Discussion

Delimiters are the beginning and ending patterns in the text that bound the text unit within which your query items must be located. Adjusting the delimiters adjusts how close together the query concepts must occur.

If you look for the words "bear" and "woods" within a sentence, the result will be a tight match to your query. Looking for "bear" and "woods" inside a paragraph is less restrictive. Requiring only that "bear" and "woods" occur inside the same section might or might not have much to do with what you are looking for. Knowing that, you would probably add more qualifying search items when searching by section; e.g., "bear" "woods" "winchester" "rifle". In short, the relevance of your search results will differ greatly based on the type of text unit by which you are searching.

Beginning and ending delimiters are defined by repeating patterns in the text, technically known as regular expressions. A sentence can be identified by its ending punctuation (period, question mark, or exclamation point); a paragraph is often identified by a few new lines in a row; a page is often identified by the presence of a page formatting character.
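
As a rough illustration of how the choice of text unit changes what matches, the sketch below splits a sample text by sentence and by paragraph and reports the units that contain every query term. It uses Python's re module as a stand-in, not REX, and the patterns, function names, and sample text are assumptions made for this example rather than Metamorph's built-in definitions.

```python
import re

# Stand-in patterns written for Python's re module, NOT REX syntax.
UNIT_END = {
    "sent": r"[.?!]",    # sentence: ending punctuation
    "para": r"\n\s*\n",  # paragraph: a blank line between blocks
    "page": r"\f",       # page: form-feed character
}

def units(text, kind):
    """Split text into non-empty units of the given kind."""
    return [u for u in re.split(UNIT_END[kind], text) if u.strip()]

def matches(text, kind, terms):
    """Return the units in which every term occurs (case-insensitive substring test)."""
    return [u for u in units(text, kind)
            if all(t.lower() in u.lower() for t in terms)]

sample = "A bear slept. The woods were quiet.\n\nA bear walked in the woods!"

by_sentence = matches(sample, "sent", ["bear", "woods"])
by_paragraph = matches(sample, "para", ["bear", "woods"])
```

Searching by sentence matches only the last sentence, while searching by paragraph matches both paragraphs, which is the looser behavior described above.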

In a Metamorph search, the entered query concepts will be searched for within the bounds of two such expressions, called delimiters. These regular expressions are written in REX syntax. If you know how to write a regular expression using REX, you can enter delimiters of your own design. Otherwise you can rely upon the defaults that have been provided.

Since the most common types of search are by line, sentence, paragraph, page, or whole document, these expressions have been written into the Metamorph Query Language so that you can signal their use dynamically from within any query, using English. The expression "w/delim" means "within a ...", where the characters following "w/" (usually four) are an abbreviation for a commonly delimited unit of text. If you want to search by paragraph, you would add "w/para" as an item on the query line.

For example:

power struggle w/para

Such a search will look within any paragraph for an occurrence of the concept "power" and the concept "struggle".

You can designate a quantitative proximity, using the same syntax, by stating how many characters you want before and after the located search items. For example:

power struggle w/150

Such a search will look within a 300 character range (150 before, 150 after the first item located) for an occurrence of the concept "power" and the concept "struggle". This is useful for text which doesn't follow any particular text pattern, such as source code.
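
The character-window idea can be sketched as follows. This is a minimal Python illustration, not Metamorph's implementation: the function name is hypothetical, matching is a plain case-insensitive substring test, and measuring the window from each occurrence of the first item is an assumption of the sketch.

```python
def within_window(text, first, second, n):
    """True if `second` occurs within n characters before or after an occurrence of `first`."""
    low_text, a, b = text.lower(), first.lower(), second.lower()
    start = low_text.find(a)
    while start != -1:
        lo = max(0, start - n)    # n characters before the located item
        hi = start + len(a) + n   # n characters after the located item
        if b in low_text[lo:hi]:
            return True
        start = low_text.find(a, start + 1)
    return False

# "struggle" begins 100 characters after the end of "power".
text = "power" + " " * 100 + "struggle"

found_wide = within_window(text, "power", "struggle", 150)
found_narrow = within_window(text, "power", "struggle", 50)
```

With a window of 150 the two items fall in range; with a window of 50 they do not.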

You can also write your own expressions, which is useful for section heads and/or tails. To enter delimiters of your own design, first create a working REX expression, then enter it following the "w/". For example:

power struggle w/\n\RSECTION

This search uses the pattern "SECTION" where it begins on a new line and is in Caps only, as both beginning and ending delimiters. Thus an occurrence of the set "power" and the set "struggle" need only occur within the same section as so demarcated, which might be several pages long.
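
A rough Python analogue of this section search is shown below. The pattern is a Python re stand-in for the REX expression above (a newline followed by the case-sensitive word SECTION), and the function name and sample document are invented for illustration.

```python
import re

# Python stand-in for the REX expression \n\RSECTION:
# a newline followed by the literal, case-sensitive word SECTION.
SECTION_DELIM = re.compile(r"\nSECTION")

def sections_with_all(text, terms):
    """Return the sections that contain every term (case-insensitive substring test)."""
    chunks = SECTION_DELIM.split(text)
    return [c for c in chunks if all(t.lower() in c.lower() for t in terms)]

doc = (
    "\nSECTION 1\nA power vacuum formed.\nMonths later the struggle began."
    "\nSECTION 2\nThe section on power says nothing more."
)

result = sections_with_all(doc, ["power", "struggle"])
```

Only the first section is returned: both terms occur in it, although they sit on different lines. The lowercase word "section" in the second section is not treated as a delimiter because the pattern is case-sensitive and requires a preceding newline.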

You can work out such section delimiter expressions when setting up an application, and use the corresponding APICP flags sdexp (start delimiter expression) and edexp (end delimiter expression) in a Vortex script to make use of them.

The following examples of queries would dictate the proximity of the search items "power" and "struggle", by specifying the desired delimiters. Noting the highlighted search items, you might try these searches on the demo text files to see the difference in what is retrieved.

  power struggle w/line   within a line (1 new line)
  power struggle w/sent   within a sentence (ending punctuation)
  power struggle w/para   within a paragraph (a new line + some space)
  power struggle w/page   within a page (where format character exists)
  power struggle w/all    within whole document (all of the record)

  power struggle w/500    within a window of 500 characters forward
                          and backwards

  power struggle w/$$$    within user designed expression for a section;
                          where what follows the slash `/' is assumed to
                          be a REX expression.  (In this case, the
                          expression means 3 new lines in a row.)

More often than not the beginning and ending delimiters are the same. Therefore, if you do not specify an ending delimiter (as in the examples above), the one specified is used for both. If two expressions are specified, the first is the beginning delimiter and the second is the ending delimiter. Specifying both is most often required where special types of messages or sections follow a prescribed format.
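
To illustrate the case where the beginning and ending delimiters differ, the following Python sketch extracts every span between a begin pattern and the next end pattern, then keeps the spans containing all query terms. The <MSG> and </MSG> markers, the function names, and the sample data are hypothetical, and the patterns are Python re expressions rather than REX.

```python
import re

def units_between(text, begin, end):
    """Return each span that starts after a `begin` match and stops at the next `end` match."""
    pattern = re.compile(f"(?:{begin})(.*?)(?:{end})", re.DOTALL)
    return [m.group(1) for m in pattern.finditer(text)]

def hits(text, begin, end, terms):
    """Return the delimited units that contain every term (case-insensitive)."""
    return [u for u in units_between(text, begin, end)
            if all(t.lower() in u.lower() for t in terms)]

# Hypothetical message format with distinct start and end markers.
log = (
    "<MSG>power shifted east</MSG>"
    "<MSG>a power struggle followed</MSG>"
    "<MSG>the struggle ended</MSG>"
)

found = hits(log, r"<MSG>", r"</MSG>", ["power", "struggle"])
```

Only the middle message qualifies, since it is the only unit in which both terms occur between one start marker and its matching end marker.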

Another factor to consider is whether you want the expression defining the text unit to be included inside that text unit or not. For example, the ending delimiter for a sentence (ending punctuation from the located sentence) obviously belongs with the hit. However, the beginning delimiter (ending punctuation from the previous sentence) is really the end of the last sentence, and therefore should be excluded.

Inclusion or exclusion of beginning and ending delimiters with the hit has been thought out for the defaults provided. However, if you are designing your own beginning and ending expressions, you may wish to specify this yourself.
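
The sentence case can be sketched in Python as follows. This is an illustration under stated assumptions, not how Metamorph implements inclusion: the function and its include_end parameter are hypothetical, and the pattern is a Python re expression for ending punctuation.

```python
import re

END_PUNCT = r"[.?!]"  # Python stand-in for sentence-ending punctuation

def sentences(text, include_end=True):
    """Split text into sentences.

    The delimiter that ends a sentence is kept with it when include_end is True.
    The same delimiter is never attached to the sentence that follows it.
    """
    out = []
    start = 0
    for m in re.finditer(END_PUNCT, text):
        stop = m.end() if include_end else m.start()
        unit = text[start:stop].strip()
        if unit:
            out.append(unit)
        start = m.end()  # next unit begins after this delimiter, so it is excluded
    tail = text[start:].strip()
    if tail:
        out.append(tail)  # trailing text with no ending punctuation
    return out

text = "Who won the power struggle? Nobody did."

with_end = sentences(text)
without_end = sentences(text, include_end=False)
```

With the ending delimiter included, the first hit is "Who won the power struggle?"; the question mark does not appear at the start of the second sentence, because as a beginning delimiter it is excluded.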


Copyright © Thunderstone Software     Last updated: Apr 15 2024