REX, The Regular Expression Pattern Matcher

REX is a powerful pattern matcher that can be called internally from within the Metamorph Query Language. It can also be used as an external stand-alone tool. Any pattern or expression described here can be specified within a Metamorph query by preceding it with a forward slash (/). Similarly, any REX pattern can be entered as a Start or End Delimiter expression, or used anywhere else in a Vortex script that calls for a REX expression; in those places no forward slash (/) is required.

REX (for Regular EXpression) allows you to search for any fixed or variable length regular expression, and it executes more efficiently the types of patterns you might normally look for with the Unix grep family. REX puts all these facilities into one program, along with a Search & Replace capability, an easier-to-learn syntax, faster execution, and the ability to set delimiters within which you want to search (i.e., it can search across lines). In these respects it goes beyond what other search tools offer.

REX may seem unfamiliar to those who haven't previously used such tools, but if you follow its syntax very literally and practice searching for simple, then more complex, expressions, it becomes quite understandable and easy to use. One thing to keep in mind is that REX looks both forwards and backwards, which sets it apart from other tools of this type. Therefore, when you construct a REX expression, make sure it makes sense from a global view of the file, whether REX is looking forwards or looking backwards.

A REX pattern can be constructed from a series of F-REXes (pronounced "f-rex"), or Fixed length Regular EXpressions, where a repetition operator after each subexpression states how many occurrences of it you are looking for. Unless told otherwise, REX assumes you are looking for one occurrence of each subexpression stated within the expression, as specified within quotation marks.

To begin learning to use REX, first try some easy F-REX (fixed length) patterns. For example, on the query line, type in:

/life

This pattern "life" is the same as the pattern "life=", and means that you are asking the pattern matcher REX to look for one occurrence of the fixed length expression "life".

The equal sign `=' is used to designate one occurrence of a fixed length pattern, and is assumed if not otherwise stated.

The plus sign `+' is used to designate one or more occurrences of a fixed length pattern. If you were to search for "life+" REX would look for one or more occurrences of the word "life": e.g., it would locate the pattern "life" and it would also locate the pattern "lifelife".
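For readers who know Perl-style regular expressions, the way `+' applies to the whole word can be compared with Python's re module. This is only an analogy and not REX itself: in Python the operator applies to the single preceding character unless the word is grouped, so the grouped form is the closer counterpart of the REX pattern "life+". The re.IGNORECASE flag and the helper name matches_entirely are assumptions made for this sketch.

```python
import re

# Rough Python counterpart of the REX pattern "life+":
# one or more occurrences of the whole word, not just of the final "e".
whole_word = re.compile(r"(?:life)+", re.IGNORECASE)

# The same characters without grouping mean something else in Python:
# "lif" followed by one or more "e".
last_char = re.compile(r"life+", re.IGNORECASE)

def matches_entirely(regex, text):
    """Return True when the regex matches the whole of text."""
    return regex.fullmatch(text) is not None

if __name__ == "__main__":
    for text in ("life", "lifelife", "lifeee"):
        print(text, matches_entirely(whole_word, text), matches_entirely(last_char, text))
```

The grouped pattern accepts "life" and "lifelife", which are the two results described above, while the ungrouped one accepts "lifeee" instead of "lifelife".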

The asterisk `*' is used to designate zero or more occurrences of a fixed length pattern. If you were to look for "life*", REX would be directed to look for zero or more occurrences of the word "life" (rather than one or more occurrences); therefore, while it could locate "life" or "lifelife", it would also have to accept zero occurrences of "life", which can be found at every position in the file. This would be an impossible and unsatisfactory search, and so is not a legal pattern to look for. The rule is that a `*' search must be rooted to something else requiring one or more occurrences.
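To see why an unrooted zero-or-more search is unsatisfactory, consider what a regular expression engine that does permit it reports. The sketch below uses Python's re module as a stand-in (REX itself rejects the pattern), and the function name empty_match_positions is hypothetical.

```python
import re

def empty_match_positions(text):
    """Offsets where "zero or more occurrences of life" matches nothing at all."""
    unrooted = re.compile(r"(?:life)*")
    return [m.start() for m in unrooted.finditer(text) if m.group() == ""]

if __name__ == "__main__":
    sample = "no such word here"
    # Every offset in the text, including the end, yields a zero-length match.
    print(len(sample), len(empty_match_positions(sample)))
```

A text that never contains the word still produces a "hit" at every offset, which is the situation the rooting rule is meant to exclude.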

If you root "life*" to a fixed length pattern which must find at least one occurrence of something, the pattern becomes legal. Therefore, you could precede "life*" with "love=", making the pattern "love=life*". Now it is rooted to something which definitely can be found in the file; e.g., one occurrence of the word "love", followed by 0 or more occurrences of the word "life". Such an expression "love=life*" would locate "love", "lovelife", and "lovelifelife".

If there is more than one subexpression within a REX pattern, any designated repetition operator will attach itself to the longest preceding fixed length pattern. Therefore a pattern preceding a plus sign `+', even if it is made of more than one subexpression, will be treated as one or more occurrences of the whole preceding pattern. Use the equal sign `=', if necessary, to separate these subexpressions and prevent an incorrect interpretation.

For example: if you say "lovelife*" (rather than "love=life*") the `*' operator will attach itself to the whole preceding expression, "lovelife", and will therefore be translated to mean 0 or more occurrences of the entire pattern "lovelife", making it an illegal expression. On the other hand, in the expression "love=life*", REX will correctly look for 1 occurrence of "love", followed by 0 or more occurrences of "life".
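The attachment rule and the rooting rule can be modeled with a short Python sketch. It is a toy and not REX's parser: it assumes that every character other than `=', `+' and `*' is a literal, and it ignores all other REX syntax. The names split_subexpressions and is_rooted are hypothetical.

```python
def split_subexpressions(pattern):
    """Split a literal-only pattern into (text, operator) pairs.

    Toy model: each operator attaches to the whole run of literal
    characters before it; a trailing run with no operator gets '='.
    """
    parts = []
    literal = ""
    for ch in pattern:
        if ch in "=+*":
            if not literal:
                raise ValueError("operator with nothing before it")
            parts.append((literal, ch))
            literal = ""
        else:
            literal += ch
    if literal:
        parts.append((literal, "="))
    return parts

def is_rooted(parts):
    """True if at least one subexpression must occur one or more times."""
    return any(op in ("=", "+") for _, op in parts)

if __name__ == "__main__":
    for pattern in ("love=life*", "lovelife*"):
        parts = split_subexpressions(pattern)
        print(pattern, parts, "legal" if is_rooted(parts) else "illegal")
```

Under these assumptions "love=life*" splits into "love" once and "life" zero or more times, while "lovelife*" becomes a single zero-or-more subexpression with nothing to root it.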

Use the "-x" option in REX (from the Windows or Unix command line) to get feedback on how REX translates the syntax you have entered into what it is really looking for. In this way you can debug your use of syntax and learn to use REX to its maximum power. Using the same example, if you enter on the command line:

rex -x "love=life*"

you will get the following output back to show you how REX will interpret your search request:

1 occurrence(s) of : [Ll][Oo][Vv][Ee]
     followed by from 0 to 32000 occurrences of : [Ll][Ii][Ff][Ee]

The brackets above, as: "[Ll]", mean either of the characters shown inside the brackets, in that position; i.e., a capital or small `l' in the 1st character position, a capital or small `o' in the 2nd character position, and so on.
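The bracket notation is easy to reproduce by hand. The following Python sketch builds the same kind of per-letter expression for a plain word. The function name expand_case is hypothetical, and leaving non-letters unchanged is an assumption of this sketch, not a statement about how REX treats them.

```python
def expand_case(word):
    """Build one bracket expression per ASCII letter, e.g. "l" -> "[Ll]"."""
    pieces = []
    for ch in word:
        if ch.isascii() and ch.isalpha():
            pieces.append(f"[{ch.upper()}{ch.lower()}]")
        else:
            pieces.append(ch)  # assumption: non-letters are passed through as-is
    return "".join(pieces)

if __name__ == "__main__":
    print(expand_case("love"))
    print(expand_case("life"))
```

Applied to "love" and "life", this yields the two bracket sequences shown in the -x output above.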

You can find REX syntax for all matters discussed above and delineated below, by typing in "REX" followed by a carriage return on the Windows or Unix command line.


Copyright © Thunderstone Software     Last updated: Apr 15 2024