Expressions

  • REX search expressions are composed of characters and operators. Operators are characters with special meaning to REX. The following characters have special meaning: "\=+*?{},[]^$.-!" and must be escaped with a "\" if they are meant to be taken literally. The string ">>" is also special and if it is to be matched, it should be written "\>>". Not all of these characters are special all the time; if an entire string is to be escaped so it will be interpreted literally, only the characters "\=?+*{[^$.!>" need be escaped.

  • A "\" followed by an "R" or an "I" means to begin respecting or ignoring alphabetic case distinction. (Ignoring case is the default.) These switches stay in effect until the end of the subexpression. They do not apply to characters inside range brackets.

  • A "\" followed by an "L" indicates that the characters following are to be taken literally, case-sensitive, up to the next "\L". The purpose of this operation is to remove the special meanings from characters.

  • A subexpression following "\F" (followed by) or "\P" (preceded by) can be used to root the rest of an expression to which it is tied. It means to look for the rest of the expression "as long as followed by ..." or "as long as preceded by ..." the subexpression following the \F or \P. Subexpressions before and including one with \P, and subexpressions after and including one with \F, will be considered excluded from the located expression itself.

  • A "\" followed by one of the following C language character classes matches any character in that class: alpha, upper, lower, digit, xdigit, alnum, space, punct, print, graph, cntrl, ascii. Note that the definition of these classes may be affected by the current locale.

  • A "\" followed by one of the following special characters will assume the following meaning: n=newline, t=tab, v=vertical tab, b=backspace, r=carriage return, f=form feed, 0=the null character.

  • A "\" followed by Xn or Xnn, where n is a hexadecimal digit, matches the character with that hexadecimal code.

  • A "\" followed by any single character (not one of the above) matches that character. Escaping a character that is not a special escape is not recommended, as the expression could change meaning if the character becomes an escape in a future release.

  • The character "^" placed anywhere in an expression (except after a "[") matches the beginning of a line (same as \x0A in Unix or \x0D\x0A in Windows).

  • The character "$" placed anywhere in an expression matches the end of a line (\x0A in Unix, \x0D\x0A in Windows).

    Note: Under Windows, the beginning-of-line ("^") and end-of-line ("$") notations both match the same 2-character sequence; i.e., REX under Windows matches "\x0D\x0A" (carriage return, line feed) as both beginning and end of line, rather than "\x0A" as beginning and "\x0D" as end.

  • The character "." matches any character.

  • A single character not having special meaning matches that character.

  • A string enclosed in brackets ("[]") is a set, and matches any single character from the string. Ranges of ASCII character codes may be abbreviated with a dash, as in "[a-z]" or "[0-9]". A "^" occurring as the first character of the set will invert the meaning of the set, i.e. any character not in the set will match instead. A literal "-" must be preceded by a "\". The case of alphabetic characters is always respected within brackets.

    A double-dash ("--") may be used inside a bracketed set to subtract characters from the set; e.g. "[\alpha--x]" for all alphabetic characters except "x". The left-hand side of a set subtraction must be a range, character class, or another set subtraction. The right-hand side of a set subtraction must be a range, character class, or a single character. Set subtraction groups left-to-right. The range operator "-" has precedence over set subtraction. Set subtraction was added in Texis version 6.

  • The ">>" operator in the first position of a fixed expression will force REX to use that expression as the "root" expression off which the other fixed expressions are matched. This operator overrides one of the optimizers in REX. This operator can be quite handy if you are trying to match an expression with a "!" operator or if you are matching an item that is surrounded by other items. For example: "x+>>y+z+" would force REX to find the "y"s first then go backwards and forwards for the leading "x"s and trailing "z"s.

  • Normally, an empty expression such as "=" (i.e. 1 occurrence of nothing) is meaningless. However, if such an empty expression is the first or last in the list, and is the root expression (i.e. contains ">>"), it will constrain the whole expression list to only match at the start or end of the buffer. For example: ">>=first" would only match the string "first" if it occurs at the start of the search buffer. Similarly, "last=>>=" would only match "last" at the end of the buffer.

  • The "!" character in the first position of an expression means that it is not to match the following fixed expression. For example: "start=!finish+" would match the word "start" and anything past it up to (but not including) the word "finish". Operations involving the NOT operator usually require knowing in which direction the pattern is being matched. In these cases the ">>" operator comes in handy. If the ">>" operator is used, it comes before the "!". For example: ">>start=!finish+finish" would match anything that began with "start" and ended with "finish". The NOT operator cannot be used by itself in an expression, or as the root expression in a compound expression.

    Note that "!" expressions match a character at a time, so their repetition operators count characters, not expression-lengths as with normal expressions. E.g. "!finish{2,4}" matches 2 to 4 characters, whereas "finish{2,4}" matches 2 to 4 times the length of "finish".
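
The escaping rule in the first item above (only the characters "\=?+*{[^$.!>" need be escaped when an entire string is to be taken literally) can be sketched as a small helper. This is a minimal Python illustration, not part of REX or Texis: the names REX_LITERAL_SPECIALS and rex_escape are hypothetical, and the sketch assumes that writing "\>" for every ">" is an acceptable way to keep ">>" from being read as the root operator.

```python
# Characters that, per the text above, must be escaped when an entire
# string is to be interpreted literally by REX.
REX_LITERAL_SPECIALS = set("\\=?+*{[^$.!>")


def rex_escape(text: str) -> str:
    """Return `text` with each special character prefixed by a backslash.

    Hypothetical helper for illustration only; it builds the escaped
    expression string and does not call REX.
    """
    return "".join(
        "\\" + ch if ch in REX_LITERAL_SPECIALS else ch
        for ch in text
    )


if __name__ == "__main__":
    # Prints: cost\=\$5\.00 (a\+b)\>\>c
    print(rex_escape("cost=$5.00 (a+b)>>c"))
```

Characters outside that list, such as letters, spaces, and parentheses, are left untouched, in line with the advice above against escaping characters that are not special.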


Copyright © Thunderstone Software     Last updated: Nov 8 2024

 
