Numeric Quantities Entered as Text (NPM)

NPM, the Numeric Pattern Matcher, is one of several pattern matchers that can be called by the user in sending out a Metamorph query. It is signified by a pound sign `#' in the starting position, in the same way that the tilde `~' calls SPM, a percent sign `%' calls XPM, a forward slash `/' calls REX, and no special character in the first position (where there are equivalences) calls PPM or SPM.
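As a rough illustration of the prefix convention just described, the following Python sketch maps the first character of a query term to the matcher it would call. The table and the function name `matcher_for` are assumptions made for this example; they are not part of Metamorph, and the choice between PPM and SPM for unprefixed terms is not modeled.

```python
# Illustrative only: maps a query term's first character to the matcher
# named in the text above. This is not Metamorph's actual dispatch code.
PREFIX_TO_MATCHER = {
    "#": "NPM",  # pound sign
    "~": "SPM",  # tilde
    "%": "XPM",  # percent sign
    "/": "REX",  # forward slash
}


def matcher_for(term: str) -> str:
    """Return the matcher name implied by the term's leading character."""
    if term and term[0] in PREFIX_TO_MATCHER:
        return PREFIX_TO_MATCHER[term[0]]
    # No special leading character: PPM or SPM, depending on whether
    # the term has equivalences (not modeled in this sketch).
    return "PPM or SPM"


for term in ["cosmetic", "#>1000000", "/[0-9]+"]:
    print(term, "->", matcher_for(term))
```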

There are still many numeric patterns that are best located with a REX expression to match the range of characters desired. However, when you need the program to interpret your query as a numeric quantity, use NPM. NPM evaluates every number found in the text and keeps those that fall within the specified range. Where a lot of numeric searching is being done, you may therefore find that a math co-processor can speed up such searches.
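The idea of checking every number in the text against a range can be sketched in a few lines of Python. This is a simplified stand-in, not NPM itself: the regular expression and the helper name `numbers_greater_than` are assumptions for the example, and only digit strings with optional thousands commas and a decimal part are recognized.

```python
import re

# Hypothetical helper: finds digit strings (with optional thousands commas
# and a decimal part) and keeps those whose value exceeds a lower bound.
NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def numbers_greater_than(text: str, lower: float) -> list:
    """Return every number in text whose value is greater than lower."""
    values = [float(m.group().replace(",", "")) for m in NUMBER_RE.finditer(text)]
    return [v for v in values if v > lower]


sentence = "Income produced from lipstick brought the company $4,563,000 last year."
print(numbers_greater_than(sentence, 1_000_000))
```

Every candidate number has to be converted and compared, which is why purely numeric queries are more expensive than ordinary word matches.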

Since all numbers in the text become items to be checked for numeric validity, one should tie down the search by specifying one or more other items along with the NPM item. For example, you might enter on the query line:

     cosmetic sales $ #>1000000

Such a search would locate a sentence like:

Income produced from lipstick brought the company $4,563,000 last year.

In this case "income" is located by PPM as a match to "sales", "lipstick" is located by PPM as a match to "cosmetic", the English character "$" signifying "dollars" is located by SPM as a match to "$", and the numeric quantity represented in the text as "4,563,000" is located by NPM as a match to "#>1000000" (a number greater than one million). Another example:

     cosmetic sales $ #>million

Even though one can locate the same sentence by entering the above query, it is strongly recommended that searches on the query line be entered as precise numeric quantities. The true intent of NPM is to make it possible to locate, and treat as a numeric value, information in the text which was not entered as such.

You would find the above sentence even without specifying the string "$", but realize that the dollar sign ($) in the text is not part of the numeric quantity located by NPM. There may be cases where it is important to specify both the quantity and the unit. For example, if you are looking for quantities of coal, you wouldn't want to find coal pricing information by mistake. Compare these two searches:

     Query1:  Australia coal tons #>500
     Query2:  Australia coal $ #>500

The first would locate the sentence:

Petroleum Consolidated mined 1200 tons of coal in Australia.

The second would locate the sentence:

From dividends paid out of the $3.5 million profit in the coal industry, they were able to afford a vacation in Australia.

Some units, such as kilograms, milliliters, and nanoamps, are understood by NPM to be their true value; that is, in the first case, 1000 grams. Use NPMP to find out which units are understood and how they will be interpreted. The caret mark (^) shows where the parser stops understanding valid numeric quantities. Note that an abbreviation such as "kg" is not understood as a quantity, but only as a unit; therefore, "5 kilograms" has a numeric quantity of 5000 (grams), whereas "5 kg" has a numeric quantity of 5 (kg).
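The difference between a spelled-out unit and an abbreviation can be illustrated with a small Python sketch. The prefix table, the list of base units, and the function name `quantity` are assumptions chosen for this example; NPMP remains the authority on which units NPM actually understands.

```python
# Assumed tables for illustration; run NPMP to see what NPM really accepts.
PREFIX_FACTORS = {
    "kilo": 1e3,
    "milli": 1e-3,
    "nano": 1e-9,
}
BASE_UNITS = ("grams", "liters", "amps")


def quantity(number: float, unit: str) -> float:
    """Scale number by a spelled-out prefix; leave abbreviations unscaled."""
    for prefix, factor in PREFIX_FACTORS.items():
        if unit.startswith(prefix) and unit[len(prefix):] in BASE_UNITS:
            return number * factor
    # An abbreviation such as "kg" is only a unit, so the number is unchanged.
    return number


print(quantity(5, "kilograms"))  # scaled to grams
print(quantity(5, "kg"))         # left as 5
```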

Beware of entering something that doesn't make sense. For example, a quantity cannot be less than 6 and greater than 10 at the same time, and therefore "#<6>10" will produce a controlfile that the engine cannot process.
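A contradictory range can be caught before a query is sent. The Python sketch below parses a range specification into bounds and checks whether any number could satisfy it. The grammar handled here (plain digits after `>`, `>=`, `<`, or `<=`) and the names `parse_range` and `is_satisfiable` are assumptions for illustration, not NPM's own parser.

```python
import re

# Hypothetical parser for range specs such as "#>6<10"; plain digits only.
BOUND_RE = re.compile(r"(>=|<=|>|<)(\d+(?:\.\d+)?)")


def parse_range(spec: str):
    """Return (lower, lower_inclusive, upper, upper_inclusive) for a spec."""
    if not spec.startswith("#"):
        raise ValueError("an NPM term starts with '#'")
    lower, upper = float("-inf"), float("inf")
    lower_incl = upper_incl = True
    for op, num in BOUND_RE.findall(spec[1:]):
        value = float(num)
        if op[0] == ">":
            lower, lower_incl = value, op == ">="
        else:
            upper, upper_incl = value, op == "<="
    return lower, lower_incl, upper, upper_incl


def is_satisfiable(spec: str) -> bool:
    """True if at least one number can fall inside the range."""
    lower, lower_incl, upper, upper_incl = parse_range(spec)
    return lower < upper or (lower == upper and lower_incl and upper_incl)


print(is_satisfiable("#>6<10"))
print(is_satisfiable("#<6>10"))
```

The first call reports a usable range; the second reports the contradiction described above.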

Do not enter ambiguity on the query line; NPM is intended to deal with ambiguity in the text, not in the query. The safest way to enter NPM searches is by specifying the accurate numeric quantity desired. Example:

     date #>=1980<=1989

This query will locate lines containing a date specification and a year, where one wants only those entries from the 1980s. It would also locate dates in legal documents which are spelled out. Example:

     retirement benefits age #>50<80

This query will locate references to insurance benefits that mention an age such as 54 or 63. Reflecting the truer intent of NPM, a sentence like the following could also be retrieved.

At fifty-five one is awarded the company's special Golden Age program.
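To see why "fifty-five" can satisfy a numeric range, consider this minimal Python sketch that converts a spelled-out number to its value. The vocabulary is deliberately tiny (units and tens only, so roughly 1 to 99 without the teens), and `words_to_number` is a hypothetical helper; NPM's real vocabulary is far broader.

```python
# Tiny illustrative vocabulary; this is not NPM's actual word list.
UNITS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def words_to_number(phrase: str) -> int:
    """Convert a phrase like 'fifty-five' or 'fifty five' to an integer."""
    total = 0
    for word in phrase.lower().replace("-", " ").split():
        if word in UNITS:
            total += UNITS[word]
        elif word in TENS:
            total += TENS[word]
        else:
            raise ValueError(f"unknown number word: {word}")
    return total


age = words_to_number("fifty-five")
print(age, 50 < age < 80)
```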

If a numeric string contains a space, it must be in quotes to be interpreted correctly. So, although it is strongly discouraged, one could enter the following:

     revenue "#>fifty five"

With this, you can locate references like the following example.

Their corporate gross income was $1.4 million before they merged with Acme Industrial.

Keep in mind that an NPM search done within the context of Metamorph relies on the intersection of found items inside the specified text delimiters, just like any other Metamorph search. It is still not a database tool. The Engine will retrieve any hit which satisfies all search requirements, including hits which contain additional numeric information beyond what was called for.
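The intersection requirement can be pictured with a short Python sketch in which the delimiter is assumed to be the sentence. A sentence counts as a hit only when it contains every word item and at least one number above the bound. The function `sentences_with_all`, the literal substring matching (no equivalences), and the sentence-splitting rule are all simplifications assumed for this example.

```python
import re

NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def sentences_with_all(text: str, words: list, lower_bound: float) -> list:
    """Return sentences containing every word and a number above the bound."""
    hits = []
    # Assumed delimiter: split after sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        has_words = all(w.lower() in lowered for w in words)
        numbers = [float(n.replace(",", "")) for n in NUMBER_RE.findall(sentence)]
        if has_words and any(n > lower_bound for n in numbers):
            hits.append(sentence)
    return hits


text = (
    "Petroleum Consolidated mined 1200 tons of coal in Australia. "
    "Coal output fell to 300 tons elsewhere."
)
print(sentences_with_all(text, ["Australia", "coal", "tons"], 500))
```

Only the first sentence is returned; the second contains a number and some of the words, but not all items intersect inside the same delimiters.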

In an application where Metamorph Hit Markup has been enabled, exactly what was found will be highlighted. This is the easiest way to get feedback on what was located to satisfy search requirements. If there are any questions about results, review basic program theory and compare to the other types of searches as given elsewhere in this chapter.


Copyright © Thunderstone Software     Last updated: Dec 10 2018