rex, split - regular expression search

SYNOPSIS

<rex $exprs $data[ /]>               <split $exprs $data[ /]>
  or                                   or
<rex [options] $exprs $data>         <split [options] $exprs $data>
  ...                                  ...
</rex>                               </split>

DESCRIPTION
The rex function searches for each REX expression value of $exprs in each value of $data. The split function acts the same way, except that it returns the non-matching data from $data, as if the SPLIT option below were given. The return type is varbyte if $data is of type varbyte or byte; otherwise it is varchar.
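The difference between returning hits and returning non-matching data can be illustrated with a minimal Python sketch. It uses Python's `re` module rather than REX, so the expression syntax differs, and the helper names `rex_hits` and `split_parts` are hypothetical. Whether a trailing piece after the last hit is returned by split is an assumption here.

```python
import re

def rex_hits(expr, data):
    """Return the parts of data that match expr (the rex-style result)."""
    return [m.group(0) for m in re.finditer(expr, data)]

def split_parts(expr, data):
    """Return the parts of data outside the matches (the split-style result).

    Assumption: the text after the last hit is returned as a final piece.
    """
    parts, pos = [], 0
    for m in re.finditer(expr, data):
        parts.append(data[pos:m.start()])  # text between previous hit and this one
        pos = m.end()
    parts.append(data[pos:])               # text after the last hit
    return parts

data = "alpha, beta; gamma"
hits = rex_hits(r"[a-z]+", data)      # the words themselves
parts = split_parts(r"[;,] ", data)   # the text between the delimiters
```

In this sketch the same three words come back either way: once as hits of a word pattern, once as the data left over after delimiter hits are removed.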

In version 8 syntax (i.e. when the syntaxversion pragma is 8 or more, which is the default in version 8), rex and split are non-looping if self-closed, and looping otherwise (requiring a close tag), like other loopable statements. In version 7 and earlier syntax, they are looping if any options (except SYNTAX) are given, and non-looping otherwise.

When non-looping, rex and split return a list of the matching (or non-matching) hits from $data, in $ret. In addition, the variable $ret.off contains the integer byte offsets into the current search buffer where the hits start.
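As a rough illustration of hits paired with byte offsets, the following Python sketch searches a byte buffer and collects two parallel lists, analogous to $ret and $ret.off. It uses Python's `re` module, not REX, and the function name `hits_with_offsets` is hypothetical.

```python
import re

def hits_with_offsets(expr: bytes, data: bytes):
    """Return (hits, offsets): each offset is the byte position where the hit starts."""
    ret, ret_off = [], []
    for m in re.finditer(expr, data):
        ret.append(m.group(0))
        ret_off.append(m.start())
    return ret, ret_off

# Searching the UTF-8 encoded buffer makes the offsets byte-based:
# the two-byte character before "caf" shifts its offset from 6 to 7.
data = "na\u00efve caf\u00e9".encode("utf-8")
ret, ret_off = hits_with_offsets(rb"caf", data)
```

The point of the example is that offsets count bytes in the search buffer, so they can differ from character positions when the data contains multi-byte characters.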

When looping, however, hits (and offsets) are returned one at a time per iteration, and $loop/$next are also set as in SQL ($loop starts at 0). Any statements inside the block are executed once per returned hit. The loop can be exited with BREAK or RETURN. In version 8.00.1645136290 20220217 and later, the self-closing syntax also sets $loop and $next.
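The looping behavior can be pictured as a generator that hands back one hit per iteration, together with a counter that starts at 0. This is a Python analogy only: `iter_hits` is a hypothetical helper, Python's `re` stands in for REX, and $next is not modeled.

```python
import re

def iter_hits(expr, data):
    """Yield (loop, hit, offset) one hit at a time; loop counts from 0."""
    for loop, m in enumerate(re.finditer(expr, data)):
        yield loop, m.group(0), m.start()

seen = []
for loop, hit, off in iter_hits(r"\d+", "a1 b22 c333 d4444"):
    if len(hit) > 3:
        break                    # leave the loop early, as BREAK would
    seen.append((loop, hit, off))
```

Because hits are produced lazily, the body runs once per hit and the remaining data is never examined after the early exit.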

The looping syntax was added in version 2.6.938200000 19990924; $ret.off in version 3.01.966500000 20000816 (and supported for non-looping syntax as well in version 6.00.1355622000 20121215).

Options are:

ROW
As in SQL, ROW indicates that values do not accumulate in $ret and that $ret should not be a loop variable; each new value erases the previous one. ROW should be used in a looping rex/split when a large number of return values are expected but they only need to be examined one at a time; this saves memory and time, since the hits do not all have to be stored in memory. ROW should also be used when functions are called within the block, because otherwise $ret is a loop variable, hindering multi-value returns.
ROW is only valid in version 7 and earlier syntax. In version 8 or later syntax (i.e. when the syntaxversion pragma is 8 or more), return values never accumulate in $ret or $ret.off, so the ROW flag is unneeded and is not accepted.
SKIP=$n Skip the first $n hits when returning values. This does not affect the value of $loop.
MAX=$n Return at most $n hits.
SPLIT Instead of returning the hit data, return non-matching data, i.e. the parts of $data outside the hits. The REX expressions in effect become delimiters for the data returned. This is similar to the command-line rex option -v (except there are no delimiters as with command-line rex). This is the default for the split command.
NONEMPTY Ignore empty (zero-length) return values. This is useful with SPLIT when empty values are not significant.
SYNTAX=re2|rex
The $exprs syntax is RE2 or REX; the default is REX. Note that the expression syntax may also be changed by prefixing the expression with "\<re2\>" or "\<rex\>". Added in version 7.06. See the RE2 syntax documentation for more details on RE2. This option, unlike the others, does not imply looping in syntaxversion 7.
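How SPLIT, NONEMPTY, SKIP and MAX might combine can be sketched in Python. This is an analogy under stated assumptions, not the actual implementation: the function `search` and its parameter names are hypothetical, Python's `re` stands in for REX, and applying NONEMPTY before SKIP and MAX is an assumed ordering.

```python
import re

def search(expr, data, split=False, nonempty=False, skip=0, max_hits=None):
    """Return hits (or, with split=True, the data outside the hits)."""
    if split:
        values, pos = [], 0
        for m in re.finditer(expr, data):
            values.append(data[pos:m.start()])  # piece before this delimiter hit
            pos = m.end()
        values.append(data[pos:])               # piece after the last hit
    else:
        values = [m.group(0) for m in re.finditer(expr, data)]
    if nonempty:
        # Assumed ordering: zero-length values are dropped before SKIP/MAX apply.
        values = [v for v in values if v]
    stop = None if max_hits is None else skip + max_hits
    return values[skip:stop]

data = "a,,b,c,"
all_parts = search(r",", data, split=True)
kept = search(r",", data, split=True, nonempty=True)
one = search(r",", data, split=True, nonempty=True, skip=1, max_hits=1)
```

With adjacent or trailing delimiters, the split-style result contains empty strings, which is the case the NONEMPTY option is meant for.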

DIAGNOSTICS
rex returns a list of the matching hits from $data. split returns a list of the non-matching data. The corresponding byte offsets into the current search item are returned in $ret.off as well.

The syntaxversion pragma affects the syntax of this statement: the ROW flag is not accepted in version 8 or later.