# REX Expression Syntax

• Expressions are composed of characters and operators. Operators are characters with special meaning to REX. The following characters have special meaning: "\=+*?{},[]^$.-!" and must be escaped with a '\' if they are meant to be taken literally. The string ">>" is also special and, if it is to be matched, should be written "\>>". Not all of these characters are special all the time; if an entire string is to be escaped so it will be interpreted literally, only the characters "\=?+*{[^$.!>" need be escaped.
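The literal-escaping rule above can be sketched in Python. This is a minimal illustration, not part of REX: the helper name `rex_escape_literal` and the constant `REX_LITERAL_SPECIALS` are hypothetical, and the only thing taken from the text is the list of characters that need a backslash when a whole string is to be read literally.

```python
# Characters the text lists as needing a backslash when an entire string
# is meant to be interpreted literally.
REX_LITERAL_SPECIALS = set("\\=?+*{[^$.!>")


def rex_escape_literal(text: str) -> str:
    """Hypothetical helper: prefix each listed special character with a backslash."""
    return "".join("\\" + ch if ch in REX_LITERAL_SPECIALS else ch for ch in text)


# Each ">" is escaped on its own here, which follows the listed character set.
print(rex_escape_literal("cost=$5.00 (a+b)>>c"))
```

Characters such as "(" and ")" pass through unchanged because they are not in the listed set.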

• A '\' followed by an 'R' or an 'I' means to begin respecting or ignoring alphabetic case distinction, respectively. (Ignoring case is the default.) These switches do not apply inside range brackets.

• A '\' followed by an 'L' indicates that the characters following are to be taken literally up to the next '\L'. The purpose of this operation is to remove the special meanings from characters.

• A subexpression following '\F' (followed by) or '\P' (preceded by) can be used to root the rest of an expression to which it is tied. It means to look for the rest of the expression "as long as followed by ..." or "as long as preceded by ..." the subexpression following the \F or \P, but the designated subexpression will be considered excluded from the located expression itself.
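The idea of requiring a neighbouring subexpression while excluding it from the match resembles lookbehind and lookahead in Python's `re` module. The sketch below uses Python syntax only as an analogy; it is not REX syntax, and the sample text and patterns are made up for illustration.

```python
import re

text = "price: $42, weight: 42kg"

# Rough analog of "preceded by": digits only when a "$" comes right before them.
preceded = re.search(r"(?<=\$)[0-9]+", text)

# Rough analog of "followed by": digits only when "kg" comes right after them.
followed = re.search(r"[0-9]+(?=kg)", text)

# Both hits are "42", but at different positions, and neither span
# includes the "$" or the "kg" that was required next to it.
print(preceded.group(), preceded.span())
print(followed.group(), followed.span())
```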

• A '\' followed by one of the following C language character classes matches that character class: alpha, upper, lower, digit, xdigit, alnum, space, punct, print, graph, cntrl, ascii.

• A '\' followed by one of the following special characters will assume the following meaning: n=newline, t=tab, v=vertical tab, b=backspace, r=carriage return, f=form feed, 0=the null character.

• A '\' followed by Xn or Xnn, where n is a hexadecimal digit, will match that character.

• A '\' followed by any single character (not one of the above) matches that character. Escaping a character that is not a special escape is not recommended, as the expression could change meaning if the character becomes an escape in a future release.

• The character '^' placed anywhere in an expression (except after a '[') matches the beginning of a line. (same as: \x0A in Unix or \x0D\x0A in Windows)

• The character '$' placed anywhere in an expression matches the end of a line. (\x0A in Unix, \x0D\x0A in Windows)

• The character '.' matches any character.

• A single character not having special meaning matches that character.

• A string enclosed in brackets [] is a set, and matches any single character from the string. Ranges of ASCII character codes may be abbreviated as in [a-z] or [0-9]. A '^' occurring as the first character of the set will invert the meaning of the set. A literal '-' must be preceded by a '\'. The case of alphabetic characters is always respected within brackets.

A double-dash ("--") may be used inside a bracketed set to subtract characters from the set; e.g. "[\alpha--x]" for all alphabetic characters except "x". The left-hand side of a set subtraction must be a range, character class, or another set subtraction. The right-hand side of a set subtraction must be a range, character class, or a single character. Set subtraction groups left-to-right. The range operator "-" has precedence over set subtraction. Set subtraction was added in version 6.
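The grouping rules for set subtraction can be modelled with ordinary Python sets. This is only a sketch of the semantics described above: `char_range` is a hypothetical helper, `\alpha` is assumed to cover just the ASCII letters, and the second expression, "[a-z--m-p--x]", is an illustrative example built from the stated rules rather than one taken from the REX documentation.

```python
import string


def char_range(lo: str, hi: str) -> set:
    """Hypothetical helper: the set of characters from lo to hi inclusive."""
    return {chr(code) for code in range(ord(lo), ord(hi) + 1)}


# Assumption: \alpha is modelled as the ASCII letters only.
alpha = set(string.ascii_letters)

# "[\alpha--x]": every letter except lowercase "x". Case is respected inside
# brackets, so uppercase "X" stays in the set.
alpha_minus_x = alpha - {"x"}

# "[a-z--m-p--x]": the range operator binds first, then subtraction groups
# left to right, i.e. ((a-z) minus (m-p)) minus x.
result = (char_range("a", "z") - char_range("m", "p")) - {"x"}

print("x" in alpha_minus_x, "X" in alpha_minus_x)
print("".join(sorted(result)))
```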

• The '>>' operator in the first position of a fixed expression will force REX to use that expression as the "root" expression off which the other fixed expressions are matched. This operator overrides one of the optimizers in REX. This operator can be quite handy if you are trying to match an expression with a '!' operator or if you are matching an item that is surrounded by other items. For example: "x+>>y+z+" would force REX to find the "y's" first then go backwards and forwards for the leading "x's" and trailing "z's".
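The order of operations described for "x+>>y+z+" can be sketched as a small hand-written search in Python. This is an assumption-laden illustration of "find the root, then look backwards and forwards", not how REX is implemented; the function name `find_rooted` is hypothetical, and case is ignored only because the text says that is the default.

```python
def find_rooted(text: str):
    """Hypothetical sketch of root-first matching for "x+>>y+z+"."""
    lowered = text.lower()  # ignoring case is the stated default
    pos = 0
    while True:
        # 1. Locate the root expression (a run of y's) first.
        root = lowered.find("y", pos)
        if root == -1:
            return None
        end = root
        while end < len(lowered) and lowered[end] == "y":
            end += 1
        # 2. Go backwards from the root over the leading x's.
        start = root
        while start > 0 and lowered[start - 1] == "x":
            start -= 1
        # 3. Go forwards from the root over the trailing z's.
        stop = end
        while stop < len(lowered) and lowered[stop] == "z":
            stop += 1
        # "+" means at least one x and at least one z are required.
        if start < root and stop > end:
            return text[start:stop]
        pos = end  # this run of y's did not work out; try the next one


print(find_rooted("aaxxyyyzzb"))
print(find_rooted("yyz xyz"))
```

In the second call the first run of y's has no leading x, so the search moves on to the next run.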

• The '!' character in the first position of an expression means that it is not to match the following fixed expression. For example: "start=!finish+" would match the word "start" and anything past it up to (but not including) the word "finish". Usually operations involving the "!" operator involve knowing what direction the pattern is being matched in. In these cases the '>>' operator comes in handy. If the '>>' operator is used, it comes before the '!'. For example: ">>start=!finish+finish" would match anything that began with "start" and ended with "finish". The "!" operator cannot be used by itself in an expression, or as the root expression in a compound expression. NOTE: This "!" operator "nots" the whole expression rather than its sequence of characters, as in earlier versions of REX.

Note that "!" expressions match a character at a time, so their repetition operators count characters, not expression-lengths as with normal expressions. E.g. "!finish{2,4}" matches 2 to 4 characters, whereas "finish{2,4}" matches 2 to 4 times the length of "finish".
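A rough Python analogy may help with the "!" examples. The patterns below use Python's `re` syntax, not REX syntax, and are only an approximation: a negative lookahead followed by a single character stands in for "one character that does not begin the word finish", which mirrors the character-at-a-time behaviour described above. The variable names and sample text are made up for illustration.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

# Rough analog of "start=!finish+": "start", then one or more characters,
# stopping before "finish" begins.
not_finish = re.compile(r"start(?:(?!finish).)+", FLAGS)

# Rough analog of ">>start=!finish+finish": the same, then "finish" itself.
through_finish = re.compile(r"start(?:(?!finish).)+finish", FLAGS)

# Rough analog of "!finish{2,4}": the count applies to single characters.
not_finish_2_4 = re.compile(r"(?:(?!finish).){2,4}", FLAGS)

# Rough analog of "finish{2,4}": the count applies to the whole word.
finish_2_4 = re.compile(r"(?:finish){2,4}", FLAGS)

text = "Please start the engine and finish the lap."
print(repr(not_finish.search(text).group()))
print(repr(through_finish.search(text).group()))
print(repr(not_finish_2_4.search("abcdefgh").group()))
print(repr(finish_2_4.search("finishfinishfinish!").group()))
```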

Copyright © Thunderstone Software     Last updated: Aug 4 2020