Multiple Search Algorithms

The Metamorph Query Language supports several methods of searching. You can enter a natural language question, or you can specify exactly which words, phrases, or regular expressions to look for, and your search request will be processed accordingly. To accomplish this, Metamorph internally calls on several search algorithms, each of which approaches pattern matching in a different way. The chief pattern matchers are:

SPM: Metamorph's Single Pattern Matcher (includes wildcarding with the * operator)
PPM: Metamorph's Parallel Pattern Matcher
REX: Metamorph's Regular EXpression Pattern Matcher
XPM: Metamorph's ApproXimate Pattern Matcher
NPM: Metamorph's Numeric Pattern Matcher

When you enter the most common kind of Metamorph search, a normal English word, Metamorph calls SPM or PPM. SPM handles morpheme processing for root words that have no equivalences; PPM handles root words whose lists of equivalences have been expanded into sets of words. Where a list contains only a single word (i.e., a root word that has no equivalences), SPM is used instead of PPM to optimize search speed. PPM searches in parallel for every occurrence of every valid word form of each item in a list, and it handles the multiple lists of words generated by a routine query.

PPM and SPM make it possible to execute such searches routinely at tremendous speed, locating hits for every combination of items drawn from each of these lists.
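The division of labor between SPM and PPM can be illustrated with a short Python sketch. It is not Metamorph's implementation: the hypothetical function `find_hits` treats each list as a set of already-expanded word forms, uses a plain substring scan for a single-word set (standing in for SPM) and one combined regular-expression alternation for a multi-word set (standing in for PPM's parallel search), and reports a hit only when every set contributes at least one match. Morpheme processing is not modeled.

```python
import re

def find_hits(text, word_sets):
    """Hypothetical sketch: True if text contains at least one form from every set.

    A singleton set gets a plain substring scan (standing in for SPM);
    a larger set is searched in one pass with a combined regex
    alternation (standing in for PPM). Morpheme processing is not modeled.
    """
    lowered = text.lower()
    for forms in word_sets:
        if len(forms) == 1:
            (word,) = forms  # single pattern: SPM-like substring scan
            if word.lower() not in lowered:
                return False
        else:
            # Several patterns: look for any of them in a single pass (PPM-like).
            alternation = "|".join(re.escape(w.lower()) for w in forms)
            if re.search(alternation, lowered) is None:
                return False
    return True
```

For example, `find_hits("A power struggle erupted in the cabinet.", [{"power"}, {"struggle", "conflict", "fight"}])` is True, while the same query against "The power failed." is False because nothing from the second set appears.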

Entering words in English calls PPM or SPM; this is the default, and no special notation is necessary. You can also use the wildcard (*) operator with English words. For example, entering "gorb*yelt" would locate "Gorbachev had a meeting with Yeltsin". Each asterisk matches up to 80 characters.
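A wildcard term can be approximated by translating it into a regular expression. In this Python sketch, the helper `wildcard_to_regex` is hypothetical and Python's `re` module stands in for SPM: literal pieces are escaped and each asterisk becomes a gap of 0 to 80 arbitrary characters. Allowing a zero-length gap and letting the gap cross line breaks are assumptions.

```python
import re

MAX_WILDCARD = 80  # each * spans up to 80 characters, per the text

def wildcard_to_regex(term):
    """Hypothetical helper: compile a term containing * into a regex.

    Literal pieces are escaped; each * becomes a bounded gap of 0..80
    arbitrary characters (spaces and, by assumption, newlines included).
    Matching is case-insensitive.
    """
    pieces = [re.escape(p) for p in term.split("*")]
    gap = ".{0,%d}" % MAX_WILDCARD
    return re.compile(gap.join(pieces), re.IGNORECASE | re.DOTALL)

pat = wildcard_to_regex("gorb*yelt")
print(bool(pat.search("Gorbachev had a meeting with Yeltsin")))  # True
```

The bounded gap is what keeps a wildcard from matching across arbitrarily distant text: "gorb" followed by 81 filler characters and then "yelt" would not match.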

REX makes it possible to look for fixed- or variable-length regular expressions of any kind. It is integrated into the Metamorph search routine, so you can mix and match words and regular expressions. You signal REX by putting a forward slash (/) in front of the word or expression; Metamorph then calls REX to handle that string according to all the rules of REX syntax.
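The following Python sketch shows how a query might mix plain words with slash-prefixed regular expressions. The helpers `compile_term` and `all_terms_match` are hypothetical, Python's `re` syntax stands in for REX syntax (the two differ), and splitting the query on whitespace is a simplification that prevents a pattern from containing spaces.

```python
import re

def compile_term(term):
    """Hypothetical parser for one query term.

    A leading '/' marks a regular expression (Python re syntax is used
    as a stand-in for REX syntax); any other term is matched as a
    literal word, case-insensitively.
    """
    if term.startswith("/"):
        return re.compile(term[1:])
    return re.compile(re.escape(term), re.IGNORECASE)

def all_terms_match(query, text):
    """True if every whitespace-separated term of query occurs in text."""
    return all(compile_term(t).search(text) for t in query.split())
```

With this sketch, `all_terms_match(r"invoice /\d{4}-\d{2}", "Invoice 2023-10 is overdue")` is True: the plain word matches "Invoice" and the regular expression matches "2023-10".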

XPM allows you to specify an "almost right" pattern that you are unsure of, so that you can find approximately what you have specified. XPM is also integrated into the search procedure and can be mixed with PPM word searches and REX regular expressions. You signal XPM with a percent sign (%), which denotes how close, as a percentage, a match must be to the entered pattern.
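XPM's actual algorithm is not described here, so the Python sketch below substitutes a familiar technique, Levenshtein edit distance. Each window of the text that is as long as the pattern is scored as 100 * (1 - distance / pattern length), and the best window at or above the requested percentage is returned. The function names and the scoring formula are assumptions, and fixed-length windows will miss matches where the text has extra or missing characters.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def approx_find(pattern, text, percent):
    """Hypothetical XPM stand-in: best (score, start) with score >= percent, else None."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    p, t = pattern.lower(), text.lower()
    n = len(p)
    best = None
    # Slide a pattern-length window across the text and score each one.
    for start in range(max(1, len(t) - n + 1)):
        window = t[start:start + n]
        score = 100.0 * (1 - edit_distance(p, window) / n)
        if score >= percent and (best is None or score > best[0]):
            best = (score, start)
    return best
```

For instance, `approx_find("Gorbachev", "talks with Gorbachov in Moscow", 80)` finds "Gorbachov" at offset 11 with a score of about 88.9, since one substitution separates it from the pattern.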

NPM allows you to look for numeric quantities in text, including quantities that may have been written out in English. NPM evaluates every number it finds in the text and keeps those that fall within the specified range. It is generally used in combination with some other search item, such as a unit. NPM is signalled with a pound sign (#) preceding the numeric quantity you wish to match.
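As a rough illustration of the job NPM performs, the Python sketch below pulls numbers out of text, whether written as digits ("762", "1,200.5") or as simple English words ("two thousand five hundred"), and tests them against a range. The helpers `numbers_in` and `in_range` and their `lo`/`hi` parameters are hypothetical and do not reproduce Metamorph's # syntax. The word parser covers only simple forms (no "and", fractions, or digit-word mixes such as "3 million"), and it ignores units.

```python
import re

# Hypothetical word tables for simple English number words.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}

def numbers_in(text):
    """Return every number in text, written as digits or simple English words."""
    found = []
    total = current = 0
    in_words = False

    def flush():
        # End of a run of number words: record its value and reset.
        nonlocal total, current, in_words
        if in_words:
            found.append(float(total + current))
        total = current = 0
        in_words = False

    for tok in re.findall(r"\d[\d,]*(?:\.\d+)?|[a-z]+", text.lower()):
        if tok[0].isdigit():
            flush()
            found.append(float(tok.replace(",", "")))
        elif tok in UNITS or tok in TENS:
            current += UNITS.get(tok, TENS.get(tok))
            in_words = True
        elif tok == "hundred" and in_words:
            current *= 100
        elif tok in SCALES and in_words:
            total += current * SCALES[tok]
            current = 0
        else:
            flush()
    flush()
    return found

def in_range(text, lo, hi):
    """True if any number found in text lies within [lo, hi]."""
    return any(lo <= n <= hi for n in numbers_in(text))
```

Here `numbers_in("The bridge is two thousand five hundred feet long, about 762 meters.")` returns `[2500.0, 762.0]`, so a range of 2000 to 3000 would match the written-out figure even though no digits appear in it.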

The heart of Metamorph's ability to combine so many functions and subroutines effectively, while still returning results to the user in acceptable response time, is its exceptionally fast search algorithms. Other technologies have attempted to replicate a few of Metamorph's functions on a small scale, but no such effort can succeed unless it runs fast enough to produce plentiful and accurate search results.

Metamorph on its own has been benchmarked on some fast Unix machines at an internal throughput rate of around 4.5 million characters per second. The speed and accuracy of its pattern matching techniques are what make Metamorph's versatile and flexible operation possible.
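To put the benchmark figure in perspective, scan time is simply the number of characters divided by the throughput. This sketch assumes that the 4.5 million characters per second rate holds constant over the whole corpus; `scan_seconds` is a hypothetical helper.

```python
RATE = 4.5e6  # characters per second, the benchmark figure quoted in the text

def scan_seconds(num_chars, rate=RATE):
    """Estimated time to scan num_chars at a constant throughput."""
    return num_chars / rate

print(round(scan_seconds(100_000_000), 1))  # 22.2
```

At that rate, 100 million characters (about 100 MB of single-byte text) would take roughly 22 seconds to scan.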


Copyright © Thunderstone Software     Last updated: Oct 5 2023