Adjusting Proximity Range by Specifying Delimiters

By default Texis considers the entire field to be a hit when the full text is retrieved.

If you want your search items to occur within a more tightly constrained proximity range, you can adjust it. If you are using Vortex, you will need to allow within operators, which are disabled by default because of the extra processing they require.

Add a "within" operator to your query syntax: "w/line" indicates a line; "w/para" indicates a paragraph; "w/sent" indicates a sentence; "w/all" indicates the entire field; "w/#" indicates # characters. The default proximity is "w/all".

Example: Using the legal ordinance text, we are searching the full text bodies of those ordinances for controls issued about dogs. The following query uses sentence proximity to qualify its hits.

WHERE BODY LIKE 'dog control w/sent'

This sentence qualifies as a hit because "control" and "dogs" are in the same sentence.

Ordinances provide that the animal CONTROL officer takes
     possession of DOGS which are free of restraint.

Add a within operator to the Metamorph query to indicate both stated search items must occur within a single line of text, rather than within a sentence.

WHERE BODY LIKE 'dog control w/line'

The retrieved concept group has changed from a sentence to a line, so "dog" and "control" must occur in closer proximity to each other. Now the line, rather than the sentence, is the hit.

CONTROL officer takes possession of DOGS

Expanding the proximity range to a paragraph broadens the allowed distance between located search words.

WHERE BODY LIKE 'dog control w/para'

The same query with a different "within" operator now locates this whole paragraph as the hit:

The mayor, subject to the approval of the city council,
     shall appoint an animal CONTROL officer who is qualified to
     perform the duties of an animal control officer under the
     laws of this state and the ordinances of the city.  This
     officer shall take possession of any DOG which is free of
     restraint in the city.

The words "control" and "dog" span different lines and different sentences, but are within the same paragraph.
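
The effect of the three operators on this paragraph can be sketched in a few lines of Python. This is only a simplified model, not how Texis is implemented: the splitting rules (newline for a line, sentence punctuation followed by whitespace for a sentence, a blank line for a paragraph) and the case-insensitive substring match standing in for Metamorph word matching are assumptions made for illustration, and the names split_units and within are hypothetical.

```python
import re

def split_units(text, unit):
    """Split text into units using simplified, assumed rules."""
    if unit == "line":
        return text.split("\n")
    if unit == "sent":
        # Assumption: ., ? or ! followed by whitespace ends a sentence.
        return re.split(r"(?<=[.?!])\s+", text)
    if unit == "para":
        # Assumption: a blank line separates paragraphs.
        return re.split(r"\n\s*\n", text)
    if unit == "all":
        return [text]
    raise ValueError(f"unknown unit: {unit}")

def within(text, terms, unit):
    """Return the units that contain every term (crude substring match)."""
    hits = []
    for chunk in split_units(text, unit):
        lowered = chunk.lower()
        if all(term.lower() in lowered for term in terms):
            hits.append(chunk)
    return hits

PARAGRAPH = (
    "The mayor, subject to the approval of the city council,\n"
    "shall appoint an animal CONTROL officer who is qualified to\n"
    "perform the duties of an animal control officer under the\n"
    "laws of this state and the ordinances of the city.  This\n"
    "officer shall take possession of any DOG which is free of\n"
    "restraint in the city."
)

for unit in ("line", "sent", "para"):
    # Only the paragraph unit contains both terms.
    print(unit, len(within(PARAGRAPH, ["dog", "control"], unit)))
```

In this model no single line or sentence of the paragraph contains both words, so only the "para" unit produces a hit, which mirrors the behavior described above.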

These "within" operators for designating proximity are also referred to as delimiters. You can design your own delimiter by writing a regular expression in REX syntax after the "w/". Anything following "w/" that is not one of the previously defined special delimiters is assumed to be a REX expression. For example:

WHERE BODY LIKE 'dog control w/\RSECTION'

What follows the "w/" is now a user-designed REX expression for sections. This would work on text in which each section begins with a capitalized header starting with "SECTION".
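
The rule for interpreting what follows "w/" can be summarized with a small Python sketch. It is not the Texis parser; the function name classify_within and its return values are hypothetical, and details such as case sensitivity are assumptions. It only restates the order of interpretation described above: named delimiter first, then a character count, then a REX expression.

```python
# Named delimiters listed earlier in this section.
NAMED = {
    "line": "line",
    "sent": "sentence",
    "para": "paragraph",
    "all": "entire field",
}

def classify_within(token):
    """Classify a 'w/...' token as named, character count, or REX."""
    if not token.startswith("w/") or len(token) == 2:
        raise ValueError("not a within operator: %r" % token)
    suffix = token[2:]
    if suffix in NAMED:
        return ("named", NAMED[suffix])
    if suffix.isascii() and suffix.isdigit():
        return ("chars", int(suffix))
    # Anything else is treated as a REX expression.
    return ("rex", suffix)

for token in ("w/sent", "w/500", r"w/\RSECTION"):
    print(token, classify_within(token))
```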

Delimiters can also be expressed as a number of characters forward and backwards from the located search items. For example:

WHERE BODY LIKE 'dog control w/500'

In this example "dog" and "control" must occur within a window of 500 characters forwards and backwards from the first item located.
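
A character window of this kind can be modeled roughly as follows. This Python sketch is an illustration under stated assumptions, not the Texis implementation: it looks only at the first occurrence of the first term, uses a plain case-insensitive substring match, and the name within_chars is hypothetical.

```python
def within_chars(text, first, second, n):
    """True if `second` lies within n characters before or after `first`."""
    lowered = text.lower()
    start = lowered.find(first.lower())
    if start == -1:
        return False
    # Window extends n characters backwards and forwards from the located item.
    lo = max(0, start - n)
    hi = start + len(first) + n
    return second.lower() in lowered[lo:hi]

far_apart = "dog" + " " * 600 + "control"
print(within_chars(far_apart, "dog", "control", 500))  # False: 600 characters apart
print(within_chars(far_apart, "dog", "control", 700))  # True: inside the wider window
```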

More often than not the beginning and ending delimiters are the same. Therefore if you do not specify an ending delimiter (as in the above example), it will be assumed that the one specified is to be used for both. If two expressions are specified, the first will be beginning, the second will be ending. Specifying both would be required most frequently where special types of messages or sections are used which follow a prescribed format.

Another factor to consider is whether you want the expression defining the text unit to be included inside that text unit or not. For example, the ending delimiter for a sentence obviously belongs with the hit. However, the beginning delimiter is really the end of the last sentence, and therefore should be excluded.

Inclusion or exclusion of beginning and ending delimiters with the hit has been thought out for the defaults provided with the program. However, if you are designing your own beginning and ending expressions, you may wish to specify this.
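
The difference between including and excluding a delimiter can be seen in a short Python sketch. Python regular expressions stand in for REX here, and the function extract_unit with its include_begin and include_end flags is hypothetical; it is a model of the idea, not of how Texis lets you specify it.

```python
import re

def extract_unit(text, pos, begin, end, include_begin=False, include_end=True):
    """Return the text unit around index pos, bounded by two delimiters."""
    start = 0
    for m in re.finditer(begin, text[:pos]):
        # Keep the last beginning delimiter found before pos.
        start = m.start() if include_begin else m.end()
    m = re.search(end, text[pos:])
    if m is None:
        stop = len(text)
    else:
        stop = pos + (m.end() if include_end else m.start())
    return text[start:stop]

TEXT = "Cats roam freely. The officer takes DOGS. Fines apply."
pos = TEXT.index("DOGS")

# Sentence-style defaults: drop the beginning delimiter, keep the ending one.
print(extract_unit(TEXT, pos, r"[.?!]\s*", r"[.?!]"))
# Including the beginning delimiter pulls in the previous sentence's period.
print(extract_unit(TEXT, pos, r"[.?!]\s*", r"[.?!]", include_begin=True))
```

The first call returns the sentence with its closing period and without the period that ended the previous sentence, which matches the sentence behavior described above.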



Copyright © Thunderstone Software     Last updated: Apr 15 2024