Some Search Examples and Explanations

Example 1:

Let's say that we want to search for any occurrence of an Intel 80X86 processor on the same line as the concept of "speed" or "benchmark", as long as the string "Motorola" is not present.

The query is: +/80=[1-4]?86 -/motorola speed benchmark

Explanation:

A leading '+' means "this must be present".

A leading '-' means "this must not be present".

The '/' signals the use of a regular expression.

'/80=[1-4]?86' will locate an '80' followed by an optional ('1' or '2' or '3' or '4') followed by an '86'. This will locate: 8086, 80186, 80286, 80386 or 80486.

'/motorola' will locate 'MOTOROLA' or 'Motorola' or 'motorola' (or any other combination of alphabetic cases).

'speed' will locate any word that means "speed".

'benchmark' will locate any word that means "benchmark".

The beginning and ending delimiting expressions would be defined as '\n' (meaning a new-line character).
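The two regular expressions described above can be approximated in Python for experimentation. This is only a rough sketch: Python's `re` syntax is not Metamorph's regular-expression syntax, so the patterns below are hand-translated equivalents, and the sample strings are made up for illustration.

```python
import re

# Hand-translated equivalent of '/80=[1-4]?86' (an assumption, not Metamorph syntax):
# "80", then an optional digit 1-4, then "86".
processor = re.compile(r"80[1-4]?86")

# '/motorola' matches regardless of alphabetic case.
motorola = re.compile(r"motorola", re.IGNORECASE)

# Illustrative sample strings (not from any real data set).
samples = ["8086", "80186", "80286", "80386", "80486", "80586"]
matched = [s for s in samples if processor.fullmatch(s)]
print(matched)

print(bool(motorola.search("MOTOROLA 68000")))
```

Using `fullmatch` here keeps the check strict for whole strings; when scanning running text, `search` would be the natural choice.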

The Metamorph search engine will now optimize this search and will perform the following actions:

A:
Search for any pattern that matches '/80=[1-4]?86'. When it is located do item (B).
B:
Search backwards for the start delimiter '\n' (or the beginning of the file or record, whichever comes first).
C:
Search forwards for the end delimiter '\n' (or the end of the file or record, whichever comes first).
D:
Search for the pattern '/motorola' between the start and end delimiters. If it is not located do item (E), otherwise go to item (A).
E:
Search for the set of words that mean "benchmark". If a member is located do item (G), otherwise, do item (F).
F:
Search for the set of words that mean "speed". If a member is located do item (G), otherwise, go to item (A).
G:
Inform the user that a hit has been located.
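Steps A through G can be sketched in Python as follows. This is an illustration of the control flow only, not Metamorph's implementation. The word sets `SPEED_WORDS` and `BENCHMARK_WORDS` are hypothetical stand-ins for whatever Metamorph's concept expansion would produce, the regular expressions are hand-translated to Python syntax, and resuming the scan after the current line is an assumption made to avoid reporting the same line twice.

```python
import re

PROCESSOR = re.compile(r"80[1-4]?86")
MOTOROLA = re.compile(r"motorola", re.IGNORECASE)

# Hypothetical concept sets; the real sets come from Metamorph's own expansion.
SPEED_WORDS = {"speed", "fast", "quick", "velocity"}
BENCHMARK_WORDS = {"benchmark", "standard", "measure"}


def contains_any(segment, words):
    """Return True if any word of the set occurs in the segment."""
    tokens = set(re.findall(r"[a-z]+", segment.lower()))
    return not tokens.isdisjoint(words)


def find_hits(text):
    hits = []
    pos = 0
    while True:
        match = PROCESSOR.search(text, pos)            # A: locate the processor pattern
        if match is None:
            break
        start = text.rfind("\n", 0, match.start()) + 1  # B: back to start delimiter (or start of text)
        end = text.find("\n", match.end())              # C: forward to end delimiter
        if end == -1:
            end = len(text)                             # ... or end of text
        segment = text[start:end]
        if not MOTOROLA.search(segment):                # D: excluded term must be absent
            if (contains_any(segment, BENCHMARK_WORDS)      # E: benchmark set
                    or contains_any(segment, SPEED_WORDS)):  # F: speed set
                hits.append(segment)                    # G: report the hit
        pos = end                                       # assumption: resume after this line
    return hits


sample = (
    "The 80386 runs at high speed.\n"
    "Motorola 68000 vs 80286 benchmark results.\n"
    "The 8086 was released in 1978.\n"
    "A benchmark of the 80486."
)
for hit in find_hits(sample):
    print(hit)
```

In the sample text, the second line is rejected at step D because it contains "Motorola", and the third line is rejected because neither concept set is found.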

Example 2:

Let's say we are searching an address and phone number list, trying to find an entry for a person whose name has apparently been entered incorrectly.

The query: "%60 Jane Plaxton" "%60 234 rhoads dr." /OH /49004

Because our database is large, we want to enter as much of what we know about Ms. Plaxton as possible, to reduce the number of erroneous hits. The actual address in our database looks as follows:

Jane Plxaton
243 Roads Dr.
Middle Town OH 49004

This is a little exaggerated for reasons of clarity, but what has happened is that the data-entry operator has transposed the 'x' and the 'a' in 'Plaxton', transposed the '3' and the '4' in '234', and misspelled 'Rhoads'.

The query we performed has four sets:

A 60% approximation of:   "Jane Plaxton"
A 60% approximation of:   "234 rhoads dr."
The state string:         OH
The zip code string:      49004

The database records are separated by a blank line; therefore our start and end delimiters will be '\n\n' (two new-line characters).

The approximate pattern matcher will look for the name and street address information and will match anything that comes within 60% of the query strings. (If no percentage is given, the approximate pattern matcher will default to 80%.) The regular-expression pattern matcher will look for the state and zip-code strings. We are searching for three intersections of the four sets (this is the default action).
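The idea behind this query can be tried out with a small Python sketch. It makes several assumptions: `difflib.SequenceMatcher` is used as a stand-in similarity measure and will not score strings the way Metamorph's approximate pattern matcher does; each query string is compared against whole lines of a record, which is a simplification; and "three intersections of four sets" is read as "all four sets must be found", following the convention that n intersections require n + 1 sets. The second record in the sample data is invented.

```python
import re
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity score in [0, 1]; not Metamorph's own measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def approx_in_record(pattern, record, threshold=0.6):
    """True if any line of the record is at least `threshold` similar to pattern."""
    return any(similarity(pattern, line.strip()) >= threshold
               for line in record.splitlines())


def record_matches(record, intersections=3):
    sets_found = [
        approx_in_record("Jane Plaxton", record),
        approx_in_record("234 rhoads dr.", record),
        re.search(r"OH", record, re.IGNORECASE) is not None,
        re.search(r"49004", record) is not None,
    ]
    # Assumption: n intersections means n + 1 sets must be present.
    return sum(sets_found) >= intersections + 1


database = (
    "Jane Plxaton\n243 Roads Dr.\nMiddle Town OH 49004\n"
    "\n"
    "John Smith\n12 Elm St.\nDayton OH 45402\n"   # invented record
)

# Records are delimited by a blank line ('\n\n').
records = [r for r in database.split("\n\n") if r.strip()]
found = [r for r in records if record_matches(r)]
print(found)
```

Only the mistyped Plaxton record should be reported: the invented second record contains "OH" but not the zip code, so it cannot reach four sets.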

Example 3:

We are reading the electronic version of the Wall Street Journal and we are interested in locating any occurrence of profits and/or losses that amount to more than a million dollars.

The query: +#>1,000,000 +dollar @0 profit loss gain

The '+' symbol in front of the first two terms indicates that they must be present in the hit. The '@0' tells Metamorph to find zero intersections of the following sets. Put another way, only one of the remaining sets needs to be located.

The sets:

  • Mandatory (because of the '+' symbol):

    • Any quantity in the text that is greater than one million.

    • Any word (or string) that means "dollar".

  • Permutation (because of the '@0'):

    • Anything that means "profit".

    • Anything that means "loss".

    • Anything that means "gain".

We would probably define the delimiters to be either a sentence or a paragraph.
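The set logic of this query can be sketched in Python. The function below is hypothetical and only models the decision rule described above: every '+' set must be found, and with '@0' at least one of the remaining sets must be found. It assumes that some earlier step has already determined, for each set, whether a member was located between the delimiters; the boolean values in the usage lines are invented for illustration.

```python
def is_hit(mandatory, optional, intersections):
    """Decide whether a delimited span qualifies as a hit.

    mandatory:     dict of set name -> found?  (sets prefixed with '+')
    optional:      dict of set name -> found?  (the remaining sets)
    intersections: the number given after '@'
    """
    if not all(mandatory.values()):
        return False
    # Assumption drawn from the text: n intersections require n + 1 sets,
    # so '@0' requires at least one of the remaining sets.
    return sum(optional.values()) >= intersections + 1


# Invented example: quantity and "dollar" found, one concept set found.
print(is_hit({"quantity>1,000,000": True, "dollar": True},
             {"profit": False, "loss": False, "gain": True}, 0))

# Invented example: no "dollar" term, so the span is rejected.
print(is_hit({"quantity>1,000,000": True, "dollar": False},
             {"profit": True, "loss": False, "gain": False}, 0))
```

Keeping the mandatory check separate from the intersection count mirrors the way the query is written: the '+' terms act as a filter, and '@0' only governs the sets that follow it.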

The following would qualify as hits to this query:

  • Congress has spent 2.5 billion dollars on the stealth bomber.

  • Lockheed Corp. has taken a four million dollar contract from Boeing.

  • The Lottery income from John Q. Public last week was One Million Two Hundred and Fifty Thousand dollars and twenty five cents.

Copyright © Thunderstone Software     Last updated: Apr 15 2024