Technical Description

Within Metamorph a "set" can be any one of four different types of text data:

  • The set of words or phrases that mean the same thing.

  • The set of text patterns that match a regular-expression.

  • The set of text patterns that are approximately the same.

  • The set of quantities that are within some range.

There are three types of operations that can be used in conjunction with any set:

Operation     Meaning
-----------   --------------------------------
INCLUSION     The set must be present.
EXCLUSION     The set must not be present.
PERMUTATION   X out of Y sets must be present.
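
The three operations can be illustrated with a short Python sketch. It is not Metamorph's implementation: each set is modeled as a hypothetical predicate function, the names `matches_logic`, `word_set`, `include`, `exclude`, `permute`, and `min_permuted` are invented for the example, and "X out of Y" is assumed to mean "at least X".

```python
from typing import Callable, Sequence

# A "set" is modeled as a predicate: does the text contain a member of the set?
SetMatcher = Callable[[str], bool]


def matches_logic(
    text: str,
    include: Sequence[SetMatcher] = (),
    exclude: Sequence[SetMatcher] = (),
    permute: Sequence[SetMatcher] = (),
    min_permuted: int = 0,
) -> bool:
    """Return True if text satisfies the inclusion, exclusion, and permutation rules."""
    # INCLUSION: every listed set must be present.
    if not all(m(text) for m in include):
        return False
    # EXCLUSION: none of the listed sets may be present.
    if any(m(text) for m in exclude):
        return False
    # PERMUTATION: at least X (min_permuted) of the Y listed sets must be present.
    hits = sum(1 for m in permute if m(text))
    return hits >= min_permuted


def word_set(*words: str) -> SetMatcher:
    """Hypothetical helper: the set is present when any word occurs as a substring."""
    lowered = [w.lower() for w in words]
    return lambda text: any(w in text.lower() for w in lowered)


power = word_set("power", "energy")
struggle = word_set("struggle", "conflict")
peace = word_set("peace", "truce")
treaty = word_set("treaty", "accord")

sentence = "The energy conflict ended without a treaty."
result = matches_logic(
    sentence,
    include=[power],                     # must be present
    exclude=[peace],                     # must not be present
    permute=[struggle, treaty, peace],   # at least 2 of these 3
    min_permuted=2,
)
```

Here `result` is true because the sentence contains a member of the `power` set, no member of the `peace` set, and members of two of the three permuted sets.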

The set logic operations are performed within two boundaries:

  • A starting delimiter (e.g., the beginning of a sentence).

  • An ending delimiter (e.g., the end of a sentence).
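
As a rough illustration of delimiter-bounded matching, the Python sketch below splits text into sentence-like units and keeps only the units in which every search term occurs. The delimiter pattern and the function names `units_between_delimiters` and `units_containing_all` are assumptions made for this example; they are not Metamorph's delimiter syntax, and real sentence detection is more involved.

```python
import re
from typing import List

# Assumed, simplified ending delimiter: sentence-final punctuation followed by
# whitespace. Abbreviations and quotations are not handled.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def units_between_delimiters(text: str) -> List[str]:
    """Split text into the spans that lie between consecutive delimiters."""
    return [u for u in SENTENCE_END.split(text.strip()) if u]


def units_containing_all(text: str, terms: List[str]) -> List[str]:
    """Return only the units in which every term occurs (case-insensitive)."""
    return [
        u for u in units_between_delimiters(text)
        if all(t.lower() in u.lower() for t in terms)
    ]


doc = "The pump failed on Monday. A new valve was ordered. The valve fixed the pump!"
hits = units_containing_all(doc, ["pump", "valve"])
```

Both terms appear somewhere in `doc`, but only the last sentence contains them together, so only that unit is returned. Widening the delimiters (for example, to paragraphs) would loosen the match.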

Each type of set plays an important role in the real-world use of a text retrieval tool:

  • The word-list pattern matcher can locate any word form of an entire list of English words and/or phrases.

  • The regular-expression pattern matcher allows the user to search for things like dates, part numbers, social security numbers, and product codes.

  • The approximate pattern matcher can search for things like misspellings, typos, and names or addresses that are similar.

  • The numeric/quantity pattern matcher can look for numeric values that are present in the text in almost any form and allows the user to search for them generically by their value.
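
The four matcher roles can be approximated with Python's standard library. The sketch below uses stand-ins only: `re` for the regular-expression and word-form cases, `difflib.SequenceMatcher` for approximate matching, and a simple numeric scan for quantities. The sample text, the part-number format, the suffix list, the 0.8 similarity threshold, and all function names are assumptions for illustration and say nothing about how Metamorph's proprietary matchers work.

```python
import re
from difflib import SequenceMatcher
from typing import List

text = ("Orders for PN-4471 shipped 2024-04-15 to Jon Smyth; "
        "weight was 12.5 kg, not 120 kg.")


def word_forms(words: List[str], haystack: str) -> List[str]:
    """Word-list stand-in: accept each word plus a few common English suffixes."""
    stems = "|".join(re.escape(w) for w in words)
    pattern = r"\b(?:" + stems + r")(?:s|es|ed|ing)?\b"
    return re.findall(pattern, haystack, flags=re.IGNORECASE)


# Regular-expression stand-in: ISO-style dates and a made-up part-number format.
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)
part_numbers = re.findall(r"\bPN-\d{4}\b", text)


def approx_find(needle: str, haystack: str, threshold: float = 0.8) -> List[str]:
    """Approximate stand-in: word runs whose similarity to needle meets threshold."""
    words = re.findall(r"[A-Za-z]+", haystack)
    n = len(needle.split())
    candidates = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return [
        c for c in candidates
        if SequenceMatcher(None, needle.lower(), c.lower()).ratio() >= threshold
    ]


def numbers_in_range(haystack: str, low: float, high: float) -> List[float]:
    """Quantity stand-in: standalone numbers whose value lies in [low, high]."""
    # Lookarounds skip digits embedded in identifiers or dates (e.g. PN-4471).
    found = re.findall(r"(?<![\w-])\d+(?:\.\d+)?(?![\w-])", haystack)
    return [float(v) for v in found if low <= float(v) <= high]


orders = word_forms(["order"], text)
similar_names = approx_find("John Smith", text)
weights = numbers_in_range(text, 10, 20)
```

This stand-in only recognizes digits; the numeric matcher described above is said to find values "in almost any form", which a sketch of this size does not attempt.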

The Metamorph search engine always optimizes the search operations it performs, minimizing CPU utilization and maximizing search throughput. At the heart of the engine lie seven highly efficient pattern matchers for locating items within text. With the exception of the Approximate Pattern Matcher, all of these pattern matchers use a proprietary algorithmic technique that is guaranteed to outperform any other published pattern matching algorithm (including those described by Boyer-Moore-Gosper and Knuth-Morris-Pratt).

Set logic over combinations of these set types lets users search for just about anything they might want to find in their textual information. A query can be as simple or as sophisticated as the user wishes; the simplest is a plain natural-language question.


Copyright © Thunderstone Software     Last updated: Apr 15 2024