Delimiters are the beginning and ending patterns in the text that define the text unit within which your query items must be located. Adjusting the delimiters adjusts the required proximity between concepts.
If you look for the words "bear" and "woods" within a sentence, the
result will be a tight match to your query. Looking for "bear" and
"woods" inside a paragraph is less restrictive. Requiring only that
"bear" and "woods" occur inside the same section might or might not
have much to do with what you are looking for. Knowing that, you would
probably add more qualifying search items when searching by section;
e.g., "bear" "woods" "winchester" "rifle". In short, the relevance of
your search results will differ greatly based on the type of text unit
by which you are searching.
Beginning and ending delimiters are defined by repeating patterns in the text, technically known as regular expressions. A sentence can be identified by its ending punctuation (period, question mark, or exclamation point); a paragraph is often identified by a few new lines in a row; a page is often identified by the presence of a page formatting character.
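As a rough illustration of how the choice of text unit changes what counts as a match, the following Python sketch splits a text into sentences or paragraphs and keeps the units that contain every query term. It is only a sketch under stated assumptions: the patterns are written for Python's `re` module rather than REX, terms are matched as plain substrings rather than Metamorph concepts, and the names `UNIT_PATTERNS` and `units_containing` are invented for this example.

```python
import re

# Hypothetical delimiter patterns in Python regex syntax (not REX).
UNIT_PATTERNS = {
    "sent": r"(?<=[.?!])\s+",  # whitespace that follows ending punctuation
    "para": r"\n\s*\n",        # a blank line between paragraphs
}

def units_containing(text, terms, unit):
    """Return the text units of the given kind that contain every term."""
    pieces = re.split(UNIT_PATTERNS[unit], text)
    return [p for p in pieces
            if all(t.lower() in p.lower() for t in terms)]

text = ("A bear crossed the road. The woods were quiet.\n\n"
        "Nothing else happened that day.")

# No single sentence holds both words, but the first paragraph does.
by_sentence = units_containing(text, ["bear", "woods"], "sent")
by_paragraph = units_containing(text, ["bear", "woods"], "para")
```

Here the sentence-level search finds nothing while the paragraph-level search returns the first paragraph, which is the tighter-versus-looser behavior described above.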
In a Metamorph search, the entered query concepts will be searched for within the bounds of two such expressions, called delimiters. These regular expressions are written in REX syntax. If you know how to write a regular expression using REX, you can enter delimiters of your own design. Otherwise, you can rely on the defaults that have been provided.
Since the most common types of search are by line, sentence,
paragraph, page, or whole document, these expressions have been
written into the Metamorph Query Language so that you can signal their
use dynamically from within any query, using English. The expression
"w/delim" means "within a ..." where the 4 characters following "w/"
are an abbreviation for a commonly delimited unit of text. If you want
to search by paragraph, you would add "w/para" as an item on the query
line.
For example:
power struggle w/para
Such a search will look within any paragraph for an occurrence of the
concept "power" and the concept "struggle".
You can designate a quantitative proximity, using the same syntax, by
stating how many characters you want before and after the located
search items. For example:
power struggle w/150
Such a search will look within a 300-character range (150 before, 150
after the first item located) for an occurrence of the concept "power"
and the concept "struggle". This is useful for text which doesn't
follow any particular text pattern, such as source code.
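The character-window idea can be sketched in a few lines of Python. This is an illustration only, not Metamorph's implementation: the function name `within_window` is hypothetical, terms are matched as plain substrings, and measuring the window backwards from the start of the first item and forwards from its end is an assumption.

```python
def within_window(text, first, second, width):
    """True if `second` occurs within `width` characters before or
    after the first occurrence of `first` (plain substring match)."""
    lowered = text.lower()
    pos = lowered.find(first.lower())
    if pos == -1:
        return False
    # Assumed window: `width` characters on each side of the first item.
    start = max(0, pos - width)
    end = min(len(text), pos + len(first) + width)
    return second.lower() in lowered[start:end]

sample = ("The power vacuum lasted years. " + "x" * 200 +
          " Then the struggle began.")

hit_150 = within_window(sample, "power", "struggle", 150)
hit_300 = within_window(sample, "power", "struggle", 300)
```

With roughly 230 characters separating the two words, the 150-character window misses the pair and the 300-character window finds it.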
You can also write your own expressions, useful for section heads
and/or tails. To enter delimiters of your own design, first create a
working REX expression, then enter it following the "w/". For example:
power struggle w/\n\RSECTION
This search uses the pattern "SECTION", where it begins on a new line
and is in capitals only, as both beginning and ending delimiter. Thus
an occurrence of the set "power" and the set "struggle" need only
occur within the same section as so demarcated, which might be several
pages long.
You can work out such useful section delimiter expressions when
setting up an application, and use the corresponding APICP flags sdexp
(start delimiter expression) and edexp (end delimiter expression) in a
Vortex script to make use of them.
The following examples of queries would dictate the proximity of the
search items "power" and "struggle", by specifying the desired
delimiters. Noting the highlighted search items, you might try these
searches on the demo text files to see the difference in what is
retrieved.
power struggle w/line   within a line (1 new line)
power struggle w/sent   within a sentence (ending punctuation)
power struggle w/para   within a paragraph (a new line + some space)
power struggle w/page   within a page (where format character exists)
power struggle w/all    within whole document (all of the record)
power struggle w/500    within a window of 500 characters forward
                        and backwards
power struggle w/$$$    within user designed expression for a section;
                        where what follows the slash `/' is assumed to
                        be a REX expression. (In this case, the
                        expression means 3 new lines in a row.)
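The three kinds of "w/" item listed above (a named unit, a character count, or a user-written expression) can be told apart mechanically. The Python sketch below shows one way to do so; it is not Metamorph's actual query parser, and the names `NAMED_UNITS`, `classify_delimiter`, and `split_query` are hypothetical.

```python
# Abbreviations taken from the examples above.
NAMED_UNITS = {"line", "sent", "para", "page", "all"}

def classify_delimiter(token):
    """Classify a query item; return None if it is not a "w/" item."""
    if not token.startswith("w/"):
        return None
    spec = token[2:]
    if spec in NAMED_UNITS:
        return ("unit", spec)        # e.g. w/para
    if spec.isdigit():
        return ("window", int(spec))  # e.g. w/500
    return ("expression", spec)      # anything else: treated as an expression

def split_query(query):
    """Separate ordinary search items from the delimiter item, if any."""
    terms, delimiter = [], None
    for token in query.split():
        kind = classify_delimiter(token)
        if kind is None:
            terms.append(token)
        else:
            delimiter = kind
    return terms, delimiter

by_para = split_query("power struggle w/para")
by_window = split_query("power struggle w/500")
by_expr = split_query(r"power struggle w/\n\RSECTION")
```

The sketch treats any unrecognized text after the slash as a user expression, mirroring the rule that what follows the slash is assumed to be a REX expression.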
More often than not the beginning and ending delimiters are the same. Therefore if you do not specify an ending delimiter (as in the above example), it will be assumed that the one specified is to be used for both. If two expressions are specified, the first will be beginning, the second will be ending. Specifying both would be required most frequently where special types of messages or sections are used which follow a prescribed format.
Another factor to consider is whether you want the expression defining the text unit to be included inside that text unit or not. For example, the ending delimiter for a sentence (ending punctuation from the located sentence) obviously belongs with the hit. However, the beginning delimiter (ending punctuation from the previous sentence) is really the end of the last sentence, and therefore should be excluded.
Inclusion or exclusion of beginning and ending delimiters with the hit has been thought out for the defaults provided. However, if you are designing your own beginning and ending expressions, you may wish to specify this yourself.
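To make the inclusion question concrete, here is a minimal Python sketch that cuts text into units bounded by a start and an end pattern, with a switch for keeping or dropping each delimiter. Python `re` patterns stand in for REX, and the function `text_units` with its `include_start` and `include_end` parameters is hypothetical; this section does not show how Metamorph itself expresses the choice. As a simplification, any text before the first start delimiter is skipped.

```python
import re

def text_units(text, start_pat, end_pat=None,
               include_start=False, include_end=True):
    """Yield spans bounded by a start and an end delimiter pattern.

    If end_pat is omitted, start_pat is used for both delimiters.
    """
    start_re = re.compile(start_pat)
    end_re = re.compile(end_pat or start_pat)
    pos = 0
    while True:
        s = start_re.search(text, pos)
        if s is None:
            return
        e = end_re.search(text, s.end())
        if e is None:
            return
        begin = s.start() if include_start else s.end()
        finish = e.end() if include_end else e.start()
        yield text[begin:finish]
        # Resume at the end delimiter so it can also open the next unit.
        pos = e.start()

# One pattern for both delimiters: keep the closing punctuation,
# drop the punctuation that ended the previous sentence.
sentences = list(text_units("Stop. The power struggle ended! Next.",
                            r"[.?!]"))

# Two patterns for a prescribed message format, both excluded.
messages = list(text_units(
    "junk BEGIN power struggle END more BEGIN other END",
    r"BEGIN", r"END", include_start=False, include_end=False))
```

The first call mirrors the sentence case described above: each unit ends with its own punctuation but does not begin with the previous sentence's.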