REX is a powerful pattern matcher that can be called internally from within the Metamorph Query Language, and can also be used as an external stand-alone tool. Any pattern or expression outlined here can be specified from within a Metamorph query by preceding it with a forward slash (/). Similarly, any REX pattern can be entered as a Start or End Delimiter expression, or used anywhere else in a Vortex script that calls for rex; in those places no forward slash (/) is required.
REX (for Regular EXpression) allows you to search for any fixed or variable length regular expression, and it executes more efficiently the types of patterns you might normally look for with the Unix Grep family. REX puts all these facilities into one program, along with a Search & Replace capability, an easier to learn syntax, faster execution, and the ability to set delimiters within which you want to search (i.e., it can search across lines). Together these take it beyond what is possible with other search tools.
REX may be somewhat new to those who haven't previously used such tools, but if you follow its syntax very literally and practice searching first for simple and then for more complex expressions, it becomes quite understandable and easy to use. One thing to keep in mind is that REX looks both forwards and backwards, which is quite different from other types of tools. Therefore, when you construct a REX expression, make sure it makes sense from a global view of the file; that is, whether you'd be looking forwards or looking backwards.
A REX pattern can be constructed from a series of F-REXs (pronounced "f-rex"), or Fixed length Regular EXpressions, where the repetition operator after each subexpression states how many occurrences of it you are looking for. Unless otherwise specified, REX assumes you are looking for one occurrence of each subexpression stated within the expression, as specified within quotation marks.
To begin learning to use REX, first try some easy F-REX (fixed length) patterns. For example, on the query line, type in:
/life
This pattern "life" is the same as the pattern "life=", and means that you are asking the pattern matcher REX to look for one occurrence of the fixed length expression "life".
The equal sign `=' is used to designate one occurrence of a fixed length pattern, and is assumed if not otherwise stated.
The plus sign `+' is used to designate one or more occurrences of a fixed length pattern. If you were to search for "life+", REX would look for one or more occurrences of the word "life"; e.g., it would locate the pattern "life" and it would also locate the pattern "lifelife".
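The behavior of `+' can be approximated outside REX for experimentation. The sketch below uses Python's re module, which is a different engine with different syntax: the Python pattern (?:life)+ is only a stand-in for the REX pattern "life+", and the case-insensitive flag is an assumption based on the case-folded "-x" output shown in this section. The helper name first_hit is hypothetical.

```python
import re

# Approximation only: Python's (?:life)+ stands in for the REX pattern "life+".
# re.IGNORECASE is an assumption drawn from the case-folded -x output.
one_or_more_life = re.compile(r"(?:life)+", re.IGNORECASE)

def first_hit(text):
    """Return the first run of one or more "life", or None if there is none."""
    match = one_or_more_life.search(text)
    return match.group(0) if match else None

print(first_hit("a life well spent"))  # life
print(first_hit("lifelife"))           # lifelife
print(first_hit("no match here"))      # None
```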
The asterisk `*' is used to designate zero or more occurrences of a fixed length pattern. If you were to try to look for "life*", REX would be directed to look for zero or more occurrences of the word life (rather than 1 or more occurrences); therefore, while it could locate "life" or "lifelife", it would also have to look for 0 occurrences of "life", which could be found at every position in the file. This would be an impossible and unsatisfactory search, and so is not a legal pattern to look for. The rule is that a `*' search must be rooted to something else requiring one or more occurrences.
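To see why an unrooted `*' search is unsatisfactory, it can help to try the equivalent in an engine that permits it. The sketch below uses Python's re module (not REX), where (?:life)* stands in for the REX pattern "life*"; Python accepts the pattern and reports an empty match at every position, even in text that contains no "life" at all.

```python
import re

# Python stand-in for an unrooted "life*": zero or more occurrences of "life".
unrooted = re.compile(r"(?:life)*")

# Search a text that does not contain "life" anywhere.
hits = [m.span() for m in unrooted.finditer("abc")]

# Every position yields a zero-length match, so the search says nothing useful.
print(hits)  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```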
If you root "life*" to a fixed length pattern which must find at least one occurrence of something, the pattern becomes legal. Therefore, you could precede "life*" with "love=", making the pattern "love=life*". Now it is rooted to something which definitely can be found in the file; e.g., one occurrence of the word "love", followed by 0 or more occurrences of the word "life". Such an expression "love=life*" would locate "love", "lovelife", and "lovelifelife".
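The rooted pattern can be checked against those three strings with a rough Python equivalent. This is a sketch using Python's re module rather than REX itself: love(?:life)* stands in for the REX pattern "love=life*", and the case-insensitive flag is an assumption based on the "-x" output shown in this section.

```python
import re

# Python stand-in for "love=life*": one "love", then zero or more "life".
rooted = re.compile(r"love(?:life)*", re.IGNORECASE)

candidates = ("love", "lovelife", "lovelifelife", "life")

# fullmatch requires the whole candidate string to fit the pattern.
results = {text: bool(rooted.fullmatch(text)) for text in candidates}

for text, matched in results.items():
    print(f"{text}: {matched}")
# love: True
# lovelife: True
# lovelifelife: True
# life: False
```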
If there is more than one subexpression within a REX pattern, any designated repetition operator will attach itself to the longest preceding fixed length pattern. Therefore a pattern preceding a plus sign `+', even if it is made of more than one subexpression, will be treated as one or more occurrences of the whole preceding pattern. Use the equal sign `=', if necessary, to separate these subexpressions and prevent an incorrect interpretation.
For example: if you say "lovelife*" (rather than "love=life*") the `*' operator will attach itself to the whole preceding expression, "lovelife", and will therefore be translated to mean 0 or more occurrences of the entire pattern "lovelife", making it an illegal expression. On the other hand, in the expression "love=life*", REX will correctly look for 1 occurrence of "love", followed by 0 or more occurrences of "life".
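This attachment rule differs from the one used by most other regular expression tools, where `*' binds only to the single preceding character. The sketch below uses Python's re module to show the contrast; the two "rex_like" patterns are hand-written Python stand-ins for how REX is described as reading each expression, not output produced by REX.

```python
import re

# Conventional reading: * applies only to the final "e".
conventional = re.compile(r"lovelife*")

# Stand-in for the REX reading of "lovelife*": * applies to all of "lovelife".
rex_like_whole = re.compile(r"(?:lovelife)*")

# Stand-in for the REX pattern "love=life*": = ends the first subexpression.
rex_like_split = re.compile(r"love(?:life)*")

print(bool(conventional.fullmatch("lovelif")))         # True
print(bool(rex_like_split.fullmatch("lovelif")))       # False
print(bool(rex_like_split.fullmatch("lovelifelife")))  # True
print(bool(rex_like_whole.fullmatch("")))              # True
```

The last line shows the problem with the unseparated form: read the REX way, it is satisfied by nothing at all, which is why it needs a root.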
Use the "-x" option in REX (from the Windows or Unix command line) to get feedback on how REX translates the syntax you have entered into what it is really looking for. In this way you can debug your use of syntax and learn to use REX to its maximum power.
Using our same example, if you enter on the Windows or Unix command line:
rex -x "love=life*"
you will get the following output back to show you how REX will interpret your search request:
1 occurrence(s) of : [Ll][Oo][Vv][Ee]
followed by from 0 to 32000 occurrences of : [Ll][Ii][Ff][Ee]
The brackets above, as: "[Ll]", mean either of the characters shown inside the brackets, in that position; i.e., a capital or small `l' in the 1st character position, a capital or small `o' in the 2nd character position, and so on.
You can find REX syntax for all matters discussed above and delineated below, by typing in "REX" followed by a carriage return on the Windows or Unix command line.