Query processing and Equivalence lookup

Query processing and equivalence lookup occur in setmmapi() and openmmapi() if query!=(byte *)NULL.

Control query parsing and equivalence lookup with the following APICP variables:

byte  *query
  : The user query to interpret and get equivalences for.

byte  *eqprefix
  : The main equivalence file name.

byte  *ueqprefix
  : The user equivalence file name.

byte   see
  : Flag that says whether to look up "see" references or not.

byte   keepeqvs
  : Flag that says whether to keep equivalences or not.

byte   keepnoise
  : Flag that says whether to keep noise words or not.

byte   withinproc
  : Flag that says whether to process the within operator (w/) or not.

byte   suffixproc
  : Flag that says whether to do suffix processing or not.

int    minwordlen
  : The minimum length to which suffix stripping may reduce a word.

byte **suffixeq
  : The list of suffixes.

byte **noise
  : The list of noise words.

int  (*eqedit)(APICP *)
  : Equivalence editor function.

int  (*eqedit2)(APICP *,EQVLST ***)
  : Equivalence editor function.

void  *usr
  : An arbitrary user data pointer.

NOTE: Also see the Metamorph chapter for further descriptions of these variables.
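To make the list above concrete, here is a minimal sketch of a structure holding these variables, with the defaults documented later in this section filled in. APICP_SKETCH and init_apicp_sketch() are hypothetical stand-ins, not the real APICP or openapicp(); the real structure may contain other members, defining byte as unsigned char is an assumption, and the two lists and the editor callbacks are simply left NULL here.

```c
#include <stddef.h>

typedef unsigned char byte;               /* assumed definition of byte */
typedef struct EQVLST EQVLST;             /* opaque: layout not shown here */
typedef struct APICP_SKETCH APICP_SKETCH; /* hypothetical stand-in for APICP */

struct APICP_SKETCH {
    byte  *query;
    byte  *eqprefix;
    byte  *ueqprefix;
    byte   see;
    byte   keepeqvs;
    byte   keepnoise;
    byte   withinproc;
    byte   suffixproc;
    int    minwordlen;
    byte **suffixeq;
    byte **noise;
    int  (*eqedit)(APICP_SKETCH *);
    int  (*eqedit2)(APICP_SKETCH *, EQVLST ***);
    void  *usr;
};

/* Fill in the defaults described in this section (Unix user-equiv path). */
static void init_apicp_sketch(APICP_SKETCH *cp)
{
    cp->query      = NULL;
    cp->eqprefix   = (byte *)"builtin";                 /* compiled-in equivs */
    cp->ueqprefix  = (byte *)"/usr/local/morph3/eqvsusr";
    cp->see        = 0;
    cp->keepeqvs   = 1;
    cp->keepnoise  = 1;
    cp->withinproc = 1;
    cp->suffixproc = 1;
    cp->minwordlen = 5;
    cp->suffixeq   = NULL;   /* sketch only: real default list not reproduced */
    cp->noise      = NULL;   /* sketch only: real default list not reproduced */
    cp->eqedit     = NULL;   /* sketch only: real defaults not shown here */
    cp->eqedit2    = NULL;
    cp->usr        = NULL;   /* openapicp() also sets usr to NULL */
}
```

A program using the real API would presumably make the same kind of adjustment, such as setting keepnoise to 0, on the APICP it gets from openapicp() before query processing runs.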

byte *query:
query is a pointer to a Metamorph query. This string typically comes directly from user input, but may be constructed or preprocessed by your program. All rules of a Metamorph query apply.

  • REX patterns are prefixed by '/'.

  • XPM patterns are prefixed by '%'.

  • NPM patterns are prefixed by '#'.

  • Required sets are prefixed by '+'.

  • Exclusive sets are prefixed by '-'.

  • Normal sets are prefixed by '=' or nothing.

  • Intersection quantities are prefixed by '@'.

  • Equivalence lookup may be prevented or forced on an individual word or phrase by prefixing it with '~' (see keepeqvs below).

  • Commas will be treated as whitespace except when part of a pattern (REX, XPM, or NPM).

  • Phrases or patterns with spaces in them that should be treated as a unit are surrounded by double quotes ('"').

  • Noise stripping is controlled by the keepnoise flag (see below).

  • Equivalence lookup may be completely turned off by setting eqprefix to (byte *)NULL (see below). Turning off equiv lookup does not affect query parsing as described above.

  • New delimiters may be specified using the within operator (w/).
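The prefix rules above can be sketched as a classifier that looks at the first character of a single query term. classify_term() and term_kind are hypothetical names; the sketch does not handle quoting or comma splitting, and it does not try to model how several prefixes on one term interact.

```c
/* Hypothetical classification of one query term by its first character. */
typedef enum {
    TERM_PLAIN,        /* no prefix: a normal set */
    TERM_NORMAL,       /* '=' : explicitly a normal set */
    TERM_REQUIRED,     /* '+' : required set */
    TERM_EXCLUSIVE,    /* '-' : exclusive set */
    TERM_REX,          /* '/' : REX pattern */
    TERM_XPM,          /* '%' : XPM pattern */
    TERM_NPM,          /* '#' : NPM pattern */
    TERM_INTERSECT,    /* '@' : intersection quantity */
    TERM_TILDE         /* '~' : equivalence lookup toggled for this term */
} term_kind;

static term_kind classify_term(const char *term)
{
    switch (term[0]) {
    case '=': return TERM_NORMAL;
    case '+': return TERM_REQUIRED;
    case '-': return TERM_EXCLUSIVE;
    case '/': return TERM_REX;
    case '%': return TERM_XPM;
    case '#': return TERM_NPM;
    case '@': return TERM_INTERSECT;
    case '~': return TERM_TILDE;
    default:  return TERM_PLAIN;   /* includes the empty string */
    }
}
```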

byte *eqprefix:
This string contains the name of the main equivalence file. This typically includes the full path but may have a relative path or no path at all. The equivs may be relocated or even renamed.

The default eqprefix is "builtin", which refers to a compiled-in equiv file.

This default may be permanently adjusted by changing the macro API3EQPREFIX in the api3.h header file and recompiling api3.c and replacing the resultant object file in the library.

Equivalence lookup may be completely turned off by setting eqprefix to (byte *)NULL. Sometimes it is not appropriate to get the associations from the equiv file, or you may want to run your application without the disk space overhead of the equiv file, which is very large (around 2 megabytes). Turning off equiv lookup does not affect query parsing as described previously.
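The three possible outcomes for eqprefix can be summarized with a small hypothetical helper, eq_source_for(). It only classifies the setting; it does not open or read anything.

```c
#include <stddef.h>
#include <string.h>

typedef enum { EQ_NONE, EQ_BUILTIN, EQ_FILE } eq_source;

/* Hypothetical: where equivalences would come from for a given eqprefix. */
static eq_source eq_source_for(const char *eqprefix)
{
    if (eqprefix == NULL)
        return EQ_NONE;      /* lookup off; query parsing still happens */
    if (strcmp(eqprefix, "builtin") == 0)
        return EQ_BUILTIN;   /* the compiled-in equiv file */
    return EQ_FILE;          /* full, relative, or bare file name */
}
```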

byte *ueqprefix:
This string contains the name of the user equivalence file. This typically includes the full path but may have a relative path or no path at all. The equivs may be relocated or even renamed.

Default ueqprefix for Unix:   "/usr/local/morph3/eqvsusr"
Default ueqprefix for MS-DOS: "c:\morph3\eqvsusr"

This default may be permanently adjusted by changing the macro API3UEQPREFIX in the api3.h header file and recompiling api3.c and replacing the resultant object file in the library.

Equivalences in the user equiv file edit and/or override those in the main equiv file.
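A minimal sketch of the precedence rule, assuming a toy in-memory table of words with comma-separated equivs (not the real equiv file format). Only the override case is modelled: a user entry replaces the main entry for the same word, while editing an existing entry is not shown. equiv_entry, find_equivs() and lookup_equivs() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Toy equiv table entry: a root word and its equivs as one string. */
struct equiv_entry { const char *word; const char *equivs; };

static const char *find_equivs(const struct equiv_entry *tab, size_t n,
                               const char *word)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].word, word) == 0)
            return tab[i].equivs;
    return NULL;
}

/* User entries take precedence over main entries for the same word. */
static const char *lookup_equivs(const struct equiv_entry *user, size_t nuser,
                                 const struct equiv_entry *main_tab, size_t nmain,
                                 const char *word)
{
    const char *e = find_equivs(user, nuser, word);
    return e != NULL ? e : find_equivs(main_tab, nmain, word);
}
```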

byte withinproc:
Process the within operator (w/). The within operator allows the start and end delimiters to be changed from the query line. The argument of the within operator may be one of the built-in names, a number indicating character proximity, or a REX expression. The built-in names are:

Name    Meaning     Expression
sent    Sentence    [^\digit\upper][.?!][\space'"]
para    Paragraph   \x0a=\space+
line    Line        $
page    Page        \x0c
#       Proximity   .{,#}   (where # is the number of characters)

Any other string following the "w/" is considered a REX expression. When a REX expression is used with the within operator, both the start and end delimiters are set to that expression. To set the end delimiter to a different expression, specify a second within operator and expression. For example, "power w/tag: w/$" sets the start delimiter to "tag:" and the end delimiter to "$".
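The table and the rule that any other argument is a REX expression can be sketched as a lookup function. within_expr() is a hypothetical helper written for illustration: it returns the expression for a built-in name, builds .{,N} for an all-digit argument, and otherwise returns the argument unchanged.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: map a w/ argument to the REX expression it stands for.
 * buf receives the expression built for a numeric (proximity) argument. */
static const char *within_expr(const char *arg, char *buf, size_t bufsz)
{
    static const struct { const char *name, *expr; } builtins[] = {
        { "sent", "[^\\digit\\upper][.?!][\\space'\"]" },
        { "para", "\\x0a=\\space+" },
        { "line", "$" },
        { "page", "\\x0c" },
    };
    for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++)
        if (strcmp(arg, builtins[i].name) == 0)
            return builtins[i].expr;

    /* An all-digit argument N means "within N characters": .{,N} */
    size_t ndigits = strspn(arg, "0123456789");
    if (ndigits > 0 && arg[ndigits] == '\0') {
        snprintf(buf, bufsz, ".{,%s}", arg);
        return buf;
    }
    return arg;   /* anything else is used as a REX expression as-is */
}
```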

By default, both delimiters are excluded from the hit when a REX is used with the within operator. To include them, use "W/" instead of "w/". If you want the same expression for both delimiters but different inclusion for the end delimiter, you need not repeat the expression: use "W/" or "w/" by itself for the end delimiter. For example, "power w/$$ W/" sets both delimiters to "$$", excluding the start delimiter and including the end delimiter.
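The start/end and inclusion rules can be modelled by applying each within operator token in order. This is a simplified sketch with hypothetical names (struct delims, apply_within()); it assumes the query has already been split into tokens and does not translate built-in names such as sent or para.

```c
#include <stdbool.h>

/* Hypothetical record of the delimiters chosen by w/ and W/ operators. */
struct delims {
    const char *start_expr, *end_expr;
    bool start_incl, end_incl;
};

/* Apply one token such as "w/tag:", "W/$$" or a bare "W/".
 * `first` is true for the first within operator in the query.
 * Returns false if the token is not a within operator. */
static bool apply_within(struct delims *d, const char *tok, bool first)
{
    if ((tok[0] != 'w' && tok[0] != 'W') || tok[1] != '/')
        return false;
    bool incl = (tok[0] == 'W');      /* W/ includes, w/ excludes */
    const char *expr = tok + 2;

    if (first) {                      /* first operator sets both ends */
        d->start_expr = d->end_expr = expr;
        d->start_incl = d->end_incl = incl;
    } else {
        if (*expr != '\0')            /* a new expression replaces the end only */
            d->end_expr = expr;
        d->end_incl = incl;           /* a bare w/ or W/ changes inclusion only */
    }
    return true;
}
```

Applying "w/$$" and then "W/" leaves both expressions as "$$" with only the end delimiter included, matching the example above.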

The default value for withinproc is 1. This default may be adjusted by changing the macro API3WITHINPROC in the api3.h header file and recompiling api3.c.

See also the section "Reprogramming the Within Operator".

byte see:
Look up "see" references in the equiv file. The equiv file has "see" references much as a dictionary or thesaurus does. With this flag off, "see" references are left in the word list as is. With it on, those references are looked up and their equiv lists are added to the list for the original word. This can greatly increase the number of equivs, and the degree of abstraction, for a given word. It is not needed in most cases.
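The effect of the see flag can be sketched with a toy thesaurus. Representing a see reference as a flagged equiv is an assumption made for illustration, as are all the names. The sketch follows one level of references only and keeps the reference word itself in the list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy thesaurus: a "see" reference is modelled as an equiv flagged is_see. */
struct equiv { const char *text; bool is_see; };
struct entry { const char *word; const struct equiv *eqs; size_t neqs; };

static const struct entry *find_entry(const struct entry *tab, size_t n,
                                      const char *w)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].word, w) == 0)
            return &tab[i];
    return NULL;
}

/* Collect the equivs for `word` into out[] and return how many were stored.
 * see == 0: references stay as plain entries.
 * see != 0: each referenced word's equivs are appended as well. */
static size_t collect_equivs(const struct entry *tab, size_t n, const char *word,
                             int see, const char **out, size_t max)
{
    size_t count = 0;
    const struct entry *e = find_entry(tab, n, word);
    if (e == NULL)
        return 0;
    for (size_t i = 0; i < e->neqs && count < max; i++) {
        out[count++] = e->eqs[i].text;   /* the reference word itself is kept */
        if (see && e->eqs[i].is_see) {
            const struct entry *r = find_entry(tab, n, e->eqs[i].text);
            for (size_t j = 0; r != NULL && j < r->neqs && count < max; j++)
                out[count++] = r->eqs[j].text;
        }
    }
    return count;
}
```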

The default value for see is 0. This default may be adjusted by changing the macro API3SEE in the api3.h header file and recompiling api3.c.

byte keepnoise:
Keep noise words. With this flag off, any word in query that is not part of a larger phrase and is also found in the noise array will be removed from the list.
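A sketch of the noise-stripping rule, assuming the query has already been split into single words and quoted phrases, and that a phrase can be recognized by the space it contains. strip_noise() and is_noise() are hypothetical; the NULL-terminated array and exact, case-sensitive comparison are conventions of this sketch, not necessarily those of the real noise list.

```c
#include <stddef.h>
#include <string.h>

/* A few entries from the default noise list shown later in this section. */
static const char *const noise_words[] = { "a", "about", "the", "of", "in", NULL };

static int is_noise(const char *w, const char *const *noise)
{
    for (size_t i = 0; noise[i] != NULL; i++)
        if (strcmp(w, noise[i]) == 0)
            return 1;
    return 0;
}

/* With keepnoise == 0, drop single words found in the noise list; phrases
 * (elements containing a space) are never dropped. Compacts words[] in
 * place and returns the new count. */
static size_t strip_noise(const char **words, size_t n, int keepnoise,
                          const char *const *noise)
{
    if (keepnoise)
        return n;
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int is_phrase = strchr(words[i], ' ') != NULL;
        if (is_phrase || !is_noise(words[i], noise))
            words[kept++] = words[i];
    }
    return kept;
}
```

With keepnoise set to 0, the words the, power, of and the phrase "the people" reduce to power and "the people".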

The default value for keepnoise is 1. This default may be adjusted by changing the macro API3KEEPNOISE in the api3.h header file and recompiling api3.c.

byte keepeqvs:
Keep equivalences. Turning this flag off inverts the normal meaning of ~: with the flag off, words will not normally have equivs, and the ~ prefix must be used to get the equivs for a word.

The default value for keepeqvs is 1. This default may be adjusted by changing the macro API3KEEPEQVS in the api3.h header file and recompiling api3.c.

Setting keepeqvs to 0 does not eliminate looking for the equiv file. See the eqprefix variable for how to eliminate the equiv file completely.
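The interaction between eqprefix, keepeqvs, and the ~ prefix reduces to an exclusive-or for each word. wants_equivs() is a hypothetical helper expressing the rules described above; it does not model the fact that the equiv file is still looked for when keepeqvs is 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: does a query word get equivalences?
 * `tilde` is true when the word was written with the ~ prefix. */
static bool wants_equivs(const char *eqprefix, bool keepeqvs, bool tilde)
{
    if (eqprefix == NULL)
        return false;          /* equiv lookup turned off entirely */
    return keepeqvs != tilde;  /* ~ inverts whatever keepeqvs selects */
}
```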

byte suffixproc:
This is a flag that, if not set to 0, causes the equiv lookup process to strip suffixes from query words and from words in the equiv file to find the closest match when there is no exact match. Words will not be stripped to fewer characters than the minwordlen value (see below). This flag has a similar effect on the search process (see the Metamorph section).

The default value for suffixproc is 1. This default may be adjusted by changing the macro API3SUFFIXPROC in the api3.h header file and recompiling api3.c.

byte **suffixeq:
This is the list of word endings used by the suffix processor if suffixproc is on (see the description of lists). The suffix processor also has some permanent built-in rules for stripping. This is the default list:

'   s  ies

The default may be changed by editing the suffixeq[] array in the function openapicp() in the file api3.c and recompiling.

int minwordlen:
This only applies if suffixproc is on. It is the minimum length a word may be reduced to; suffix stripping stops and gives up rather than shorten a word below this length.

The default value for minwordlen is 5. This default may be adjusted by changing the macro API3MINWORDLEN in the api3.h header file and recompiling api3.c. This value has a similar effect on the search process (see the Metamorph section).
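Suffix stripping bounded by minwordlen can be sketched as follows. This simplified model assumes stripping repeatedly removes the longest listed suffix that still leaves at least minwordlen characters; the real suffix processor also applies built-in rules that are not reproduced here. strip_suffixes() and stem_match() are hypothetical names, and stem_match() illustrates finding the closest match by stripping both the query word and the equiv word.

```c
#include <stdio.h>
#include <string.h>

/* Repeatedly remove the longest suffix from `suffixes` that leaves at
 * least `minwordlen` characters. Modifies `word` in place. */
static void strip_suffixes(char *word, const char *const *suffixes, size_t nsuf,
                           size_t minwordlen)
{
    for (;;) {
        size_t len = strlen(word), best = 0;
        for (size_t i = 0; i < nsuf; i++) {
            size_t sl = strlen(suffixes[i]);
            if (sl > best && sl < len && len - sl >= minwordlen &&
                strcmp(word + len - sl, suffixes[i]) == 0)
                best = sl;
        }
        if (best == 0)
            return;                /* nothing more may be stripped */
        word[len - best] = '\0';
    }
}

/* Two words match if they strip down to the same stem (words longer than
 * 63 characters are truncated in this sketch). */
static int stem_match(const char *a, const char *b, const char *const *suffixes,
                      size_t nsuf, size_t minwordlen)
{
    char sa[64], sb[64];
    snprintf(sa, sizeof sa, "%s", a);
    snprintf(sb, sizeof sb, "%s", b);
    strip_suffixes(sa, suffixes, nsuf, minwordlen);
    strip_suffixes(sb, suffixes, nsuf, minwordlen);
    return strcmp(sa, sb) == 0;
}
```

With the default list and minwordlen 5, "powers" and "power" match, while "cats" is left alone because stripping the s would leave only three characters.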

byte **noise:
This is the list of noise words removed from the query when keepnoise is off. This is the default noise list:

a between got me she upon
about but gotten mine should us
after by had more so very
again came has most some was
ago can have much somebody we
all cannot having my someone went
almost come he myself something were
also could her never stand what
always did here no such whatever
am do him none sure what's
an does his not take when
and doing how now than where
another done i of that whether
any down if off the which
anybody each in on their while
anyhow else into one them who
anyone even is onto then whoever
anything ever isn't or there whom
anyway every it our these whose
are everyone just ourselves they why
as everything last out this will
at for least over those with
away from left per through within
back front less put till without
be get let putting to won't
became getting like same too would
because go make saw two wouldn't
been goes many see unless yet
before going may seen until you
being gone maybe shall up your

The default may be changed by editing the noise[] array in the function openapicp() in the file api3.c and recompiling.

void *usr:
This is a pointer that the application programmer may use to pass arbitrary application-specific information to the callback functions (*eqedit)() and (*eqedit2)(). This pointer is entirely under the control of the programmer. The Metamorph API does not reference it in any way except to set it to (void *)NULL in openapicp().
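The usual pattern for usr is to store a pointer to application data before the callbacks run and convert it back inside the callback. In this sketch MINI_APICP is a hypothetical stand-in holding only the two members involved, run_eqedit() stands in for the library calling the editor, and the meaning of the callback's return value is assumed.

```c
#include <stddef.h>

/* Hypothetical stand-in for APICP with only the members used here. */
typedef struct MINI_APICP MINI_APICP;
struct MINI_APICP {
    int  (*eqedit)(MINI_APICP *);
    void  *usr;
};

/* Application data the callback needs; purely illustrative. */
struct app_state { int calls; };

static int my_eqedit(MINI_APICP *cp)
{
    struct app_state *st = cp->usr;   /* recover the application's data */
    st->calls++;
    return 0;                         /* return-value convention assumed */
}

/* Stand-in for the library invoking the editor callback. */
static int run_eqedit(MINI_APICP *cp)
{
    return cp->eqedit != NULL ? cp->eqedit(cp) : 0;
}
```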


Copyright © Thunderstone Software     Last updated: Apr 15 2024
Copyright © 2024 Thunderstone Software LLC. All rights reserved.