xtree - maintain sorted list of values

 

SYNOPSIS

<xtree [XTREE=$treename] $command [$values] [$treename][ /]>
or
<xtree [XTREE=$treename] [options] $command [$values] [$treename]>
  ...
</xtree>


DESCRIPTION
The <xtree> function is a fast way to maintain a sorted list of temporary string values in memory - faster than using a SQL table. Values can be added, removed, counted and deleted from a tree, named with treename (default tree name is empty). The values parameter is the list of values to operate on, for INSERT, DELETE and SEARCH; for SET it is the option(s) to set. Otherwise the values parameter is not given, or is empty if treename is specified via the last-argument syntax.

Each value in an xtree has a count and a sequence number associated with it. The count is the number of times the value was inserted. The sequence number is the order in which the value was first inserted, starting with 0 for the first value inserted. These values are retrieved either with the COUNT and SEQ commands, or in the ret.count and ret.seq variables.

If no options (or only XTREE option) are given to <xtree>, it returns values in ret as a simple function. If any of the options below are given (except XTREE), then <xtree> is a looping statement, e.g. a closing "</xtree>" tag is expected, and ret.count/ret.seq are set (and looped over) as well. The variables ret.count and ret.seq, when set, are the corresponding count and sequence numbers for ret.

When <xtree> is a looping statement, loop/next are set as in SQL (loop always starts at 0). Any statements inside the block are executed once per returned value, with ret being a loop variable accumulating values. The loop can be exited with BREAK or RETURN. The loop syntax was added in version 2.6.938200000 19990924.

Options are:

  • ROW

    As in SQL, do not accumulate hits in ret/ret.count/ret.seq, and do not make them loop variables; each new value erases the previous. ROW should be used in a looping <xtree> when a large number of return values are expected but only need to be examined one at a time; this saves memory and time since all the values do not have to be stored in memory. ROW should also be used when functions are called within the block, because otherwise ret is a loop variable, hindering multi-value returns (see --warn-if-ret-loop, here). ROW should also be used when the COUNT and SEQ commands are not to be used, as it saves the memory of preserving those values for a later COUNT or SEQ.

  • SKIP=$n Skip the first $n values. This does not affect the value of loop.

  • MAX=$n Return at most $n values.

  • XTREE=$treename Use tree named treename instead of the default. This option was added in version 5.01.1217380000 20080729. The old syntax was to give the tree name as the last argument, with no option name, which can be confusing. Only one syntax - XTREE option or last-unnamed-argument - may be used in a statement.

The $command argument determines what <xtree> does, whether it takes a values argument or not, and what values, if any, are returned. It is one of the following commands:

  • CLEAR Clear the tree. All counts are set to 0, but the tree and its values remain in memory. values need not be set. No values are returned.

  • DUMP Dump the tree. The return value ret is a unique, sorted list of the values in the tree that have non-zero counts, in ascending order. In the looping syntax, ret.count and ret.seq are set to the corresponding count and sequence numbers; in the non-looping syntax for version 7 and earlier, these values must be retrieved with separate COUNT and SEQ commands afterwards. If the CLEAR option (not command) is set with SET, each value is also cleared as it is returned. values need not be set.

  • COUNT Dump the value counts from the immediately previous non-looping DUMP, SEARCH or INSERT command on the same tree. ret will contain a list of the number of times each value from the operation occurs in the tree. For the looping syntax, nothing is returned - the counts and sequences are available during the DUMP/SEARCH/INSERT in ret.count and ret.seq. values need not be set.

  • SEQ Returns the corresponding sequence numbers (starting at 0), from the immediately previous non-looping DUMP, SEARCH or INSERT on the same tree. These indicate the order in which the strings were originally added. For the looping syntax, nothing is returned - the counts and sequences are available during the DUMP/SEARCH/INSERT in ret.count and ret.seq. values need not be set.

  • INSERT Adds the values to the tree, if not already present, and increment their counts. If the NOADD option is set (below), new values are not added; only already-present values are incremented. In versions after 3.01.963400000 20000712, the inserted values - and counts and sequence numbers, if looping - are returned. This saves an extra SEARCH to determine the numbers right after an INSERT.

  • DELETE Deletes the values from the tree, i.e. sets their counts to 0. No values are returned in ret.

  • SEARCH Returns a list of the values that are in the tree. Counts and sequences are available as per DUMP and INSERT.

  • SET

    Set the option(s) given in values. In version 6.00.1305253000 20110512 and later, 1 is returned on success, 0 on failure; previous versions always returned empty string. Looping version iterates 0 times in version 7 and earlier. Options:

    • noadd

      Do not add new values to the tree during an INSERT. Inserting a previously-cleared value will make it "reappear", because it was present before but now has a non-zero count. But wholly-new values will not be inserted.

    • clear Also clear the tree during a DUMP.

    • stringcomparemode $mode Set string compare mode - how values are compared - for the tree to $mode. The syntax is the same as the apicp setting stringcomparemode (here). The default (and the meaning of values default or builtin) is the current apicp stringcomparemode. This option must be set before any data is inserted into the tree, as it would alter the sort order. Added in version 6.

    • storefolded $onOff Set the store-folded flag for the tree to $onOff (i.e. "on" or "off"). Enabling store-folded increases speed slightly, at the expense of not retaining inserted values exactly.

      If storefolded is on, values are folded (according to stringcomparemode) once, at insert time, and stored folded. This increases speed because only one folding is needed. However, values returned from DUMP etc. will be the folded value, not the originally-inserted value. E.g. if stringcomparemode is set to ignorecase, the inserted value "McDuff" will be DUMPed as "mcduff".

      If storefolded is off (the default), values are folded whenever needed for comparisons, and stored exactly as inserted. This decreases speed because potentially more than one folding is needed. However, the originally-inserted value is preserved, i.e. if "McDuff" is inserted in an ignorecase tree, it will be DUMPed as "McDuff".

      In either case, stringcomparemode is respected for searches, inserts etc.; just the value of ret is affected. This option must be set before any data is inserted into the tree, as it would alter the sort order. Added in version 6.

    • defaults

      Restores defaults: noadd, clear and storefolded are off; stringcomparemode is the same as the <apicp> value. Note that since the last two settings can only be set on an empty tree, defaults can only be fully restored on an empty tree as well. In version 6.00.1305312600 20110513 and earlier, stringcomparemode and storefolded mode were not restored at all when setting defaults.

  • GET

    Gets the option(s) listed in values. Added in version 6.00.1305253000 20110512. Options and what they return are:

    • noadd

      1 if not adding new values to the tree during an INSERT; 0 if adding.

    • clear

      1 if clearing the tree during a DUMP, 0 if not.

    • stringcomparemode

      The current stringcomparemode for the tree.

    • storefolded

      1 if storing values folded, 0 if not.

    • numitems

      The number of items in the tree, i.e. the number of non-zero count items (returned by DUMP). It is much faster (constant-time) to obtain this value via get numitems than by DUMPing the tree and checking loop.

    • numallocateditems

      The number of allocated items in the tree, i.e. including zero-count (cleared) items.

    • memused

      The number of bytes of memory used by the tree.

  • FLUSH Remove the tree and free its memory. Nothing is returned.


DIAGNOSTICS
The <xtree> function returns either nothing, a list of strings, or a list of integer counts/sequences. If invoked as a looping command, ret.count and ret.seq may be set as well.


EXAMPLE

<$txt = "senselessness" "this" "test" "finesse">
  <rex "." $txt>          <!-- split the words into letters -->
  <xtree INSERT $ret>
  <xtree ROW DUMP>
    $ret $ret.count
  </xtree>

The output would be a count of how many times each letter occurs among all the words:

e 7
f 1
h 1
i 2
l 1
n 3
s 10
t 3


CAVEATS
The <xtree> function was added Apr. 29 1997. The loop syntax was added in version 2.6.938200000 19990924, as was support for full binary data.

Note that loop, next, ret.count and ret.seq are only set in the looping version.

It is not possible to modify an xtree while inside an <xtree> loop for the same tree, nor is it possible to start another DUMP inside the loop. An error message such as "Cannot flush xtree: still in use" or "Cannot insert into xtree while walking it" may be generated in such cases.

In long-running scripts, trees should be FLUSHed when done, to save memory. Since they are kept only in memory, they will be lost when the script exits. ROW should be used whenever possible to save memory and speed.

In version 6 and later, string comparisons use the current apicp stringcomparemode setting (here), changeable with the set stringcomparemode tree option.


SEE ALSO
sort, uniq


Copyright © Thunderstone Software     Last updated: Sep 25 2019
Copyright © 2019 Thunderstone Software LLC. All rights reserved.