Query processing and equivalence lookup occur in setmmapi() and
openmmapi() if query != (byte *)NULL.
Control query parsing and equivalence lookup with the following APICP variables:
byte *query: The user query to interpret and get equivalences for.
byte *eqprefix: The main equivalence file name.
byte *ueqprefix: The user equivalence file name.
byte see: Flag that says whether to look up "see" references.
byte keepeqvs: Flag that says whether to keep equivalences.
byte keepnoise: Flag that says whether to keep noise words.
byte withinproc: Flag that says whether to process the within operator (w/).
byte suffixproc: Flag that says whether to do suffix processing.
int minwordlen: The shortest length suffix stripping may reduce a word to.
byte **suffixeq: The list of suffixes.
byte **noise: The list of noise words.
int (*eqedit)(APICP *): Equivalence editor function.
int (*eqedit2)(APICP *,EQVLST ***): Equivalence editor function.
void *usr: An arbitrary user data pointer.
NOTE: See also the Metamorph chapter for further descriptions of
these variables.
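As a minimal sketch, the struct below mirrors the query-related APICP members listed above (leaving out suffixeq, noise, and the two callbacks) and fills in the defaults documented in the sections that follow. The struct apicp_sketch and set_documented_defaults() are hypothetical stand-ins for illustration only; in a real program these members belong to the APICP returned by openapicp(), which presumably applies the compiled-in defaults itself.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char byte;   /* assumption: byte is an unsigned char type */

/* Hypothetical stand-in for the query-related APICP members; the real
   APICP in api3.h has more members (suffixeq, noise, eqedit, ...). */
struct apicp_sketch {
    byte *query;
    byte *eqprefix;
    byte *ueqprefix;
    byte see;
    byte keepeqvs;
    byte keepnoise;
    byte withinproc;
    byte suffixproc;
    int minwordlen;
    void *usr;
};

/* Apply the default value documented for each member. */
static void set_documented_defaults(struct apicp_sketch *cp)
{
    assert(cp != NULL);
    cp->query = NULL;                    /* no query yet (assumption) */
    cp->eqprefix = (byte *)"builtin";    /* compiled-in equiv file */
    cp->ueqprefix = (byte *)"/usr/local/morph3/eqvsusr"; /* Unix default */
    cp->see = 0;          /* API3SEE */
    cp->keepeqvs = 1;     /* API3KEEPEQVS */
    cp->keepnoise = 1;    /* API3KEEPNOISE */
    cp->withinproc = 1;   /* API3WITHINPROC */
    cp->suffixproc = 1;   /* API3SUFFIXPROC */
    cp->minwordlen = 5;   /* API3MINWORDLEN */
    cp->usr = NULL;       /* openapicp() sets usr to (void *)NULL */
}
```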
byte *query:
query is a pointer to a Metamorph query. This string typically
comes directly from user input, but may be constructed or
preprocessed by your program. All rules of a Metamorph query
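apply, including the special prefix characters:
  '/'              REX regular expression
  '%'              approximate (XPM) pattern
  '#'              numeric (NPM) pattern
  '+'              must-include set
  '-'              must-exclude set
  '=' or nothing   ordinary set
  '@'              intersection quantity
  '~'              equivalence toggle (see keepeqvs)
  '"'              quoted phrase
Equivalence lookup can be turned off by setting eqprefix to
(byte *)NULL (see below). Turning off equiv lookup does not
affect query parsing as described above.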
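A rough sketch of how a parser might dispatch on the first character of a query term, using the usual Metamorph meanings of the prefix characters above. classify_term() and the enum labels are hypothetical; the real parser in the API also handles combined prefixes, the w/ operator, and phrase boundaries, all of which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the prefix characters. */
enum term_kind {
    TERM_REX,        /* '/' regular expression */
    TERM_XPM,        /* '%' approximate pattern */
    TERM_NPM,        /* '#' numeric pattern */
    TERM_MUST,       /* '+' must include */
    TERM_NOT,        /* '-' must exclude */
    TERM_SET,        /* '=' or no prefix: ordinary set */
    TERM_INTERSECT,  /* '@' intersection quantity */
    TERM_TOGGLE_EQV, /* '~' toggle equivalences */
    TERM_PHRASE      /* '"' quoted phrase */
};

/* Classify a term by its first character only. */
static enum term_kind classify_term(const char *term)
{
    assert(term != NULL);
    switch (term[0]) {
    case '/': return TERM_REX;
    case '%': return TERM_XPM;
    case '#': return TERM_NPM;
    case '+': return TERM_MUST;
    case '-': return TERM_NOT;
    case '@': return TERM_INTERSECT;
    case '~': return TERM_TOGGLE_EQV;
    case '"': return TERM_PHRASE;
    default:  return TERM_SET;   /* '=' or no prefix at all */
    }
}
```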
byte *eqprefix:
This string contains the name of the main equivalence file. This
typically includes the full path but may have a relative path or
no path at all. The equivs may be relocated or even renamed.
Default eqprefix: "builtin", which refers to a compiled-in equiv
file.
This default may be permanently adjusted by changing the macro
API3EQPREFIX in the api3.h header file, recompiling api3.c, and
replacing the resulting object file in the library.
Equivalence lookup may be completely turned off by setting
eqprefix to (byte *)NULL. Sometimes it is not appropriate to get
the associations from the equiv file, or you may want to run your
application without the disk space overhead of the equiv file,
which is very large (around 2 megabytes). Turning off equiv
lookup does not affect query parsing as described previously.
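The three possible cases for eqprefix can be summarized in a small sketch. eqv_source_for() and its enum are hypothetical names used only for illustration; they are not part of the API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum eqv_source { EQV_NONE, EQV_BUILTIN, EQV_FILE };

/* Classify an eqprefix value as described in the text. */
static enum eqv_source eqv_source_for(const char *eqprefix)
{
    if (eqprefix == NULL)
        return EQV_NONE;       /* equivalence lookup turned off entirely */
    if (strcmp(eqprefix, "builtin") == 0)
        return EQV_BUILTIN;    /* default: compiled-in equiv file */
    return EQV_FILE;           /* full path, relative path, or bare name */
}
```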
byte *ueqprefix:
This string contains the name of the user equivalence file. This
typically includes the full path but may have a relative path or
no path at all. The equivs may be relocated or even renamed.
Default ueqprefix for Unix: "/usr/local/morph3/eqvsusr"
Default ueqprefix for MS-DOS: "c:\morph3\eqvsusr"
This default may be permanently adjusted by changing the macro
API3UEQPREFIX in the api3.h header file, recompiling api3.c, and
replacing the resulting object file in the library.
Equivalences in the user equiv file edit and/or override those in the main equiv file.
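One way a platform-dependent compile-time default like this can be expressed is sketched below. MY_UEQPREFIX is a hypothetical macro name (the real default lives in API3UEQPREFIX in api3.h), and the #ifndef guard is a design choice of this sketch: it lets a build override the value with -DMY_UEQPREFIX=... rather than editing a header.

```c
#include <assert.h>

/* Hypothetical macro; pick a default per platform unless overridden. */
#ifndef MY_UEQPREFIX
#  if defined(_WIN32) || defined(__MSDOS__)
#    define MY_UEQPREFIX "c:\\morph3\\eqvsusr"          /* MS-DOS style */
#  else
#    define MY_UEQPREFIX "/usr/local/morph3/eqvsusr"    /* Unix */
#  endif
#endif

static const char *default_ueqprefix(void)
{
    return MY_UEQPREFIX;
}
```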
byte withinproc:
Process the within operator (w/). The within operator allows
changing the start and end delimiters from the query line. The
argument of the within operator may be one of the built-in names,
a number indicating character proximity, or a REX expression. The
built-in names are:
Name   Meaning     Expression
sent   Sentence    [^\digit\upper][.?!][\space'"]
para   Paragraph   \x0a=\space+
line   Line        $
page   Page        \x0c
#      Proximity   .{,#}  (where # is the number of characters)
Any other string following the "w/" is considered a REX
expression. When a REX expression is used with the within
operator, both the start and end delimiters are set to that
expression. To set the end delimiter to a different expression,
specify another within operator and expression. For example,
"power w/tag: w/$" will set the start delimiter to "tag:" and the
end delimiter to "$".
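A minimal sketch of how a w/ argument maps to a delimiter expression, following the table and the REX rule above. within_expr() is a hypothetical helper; the API's actual within processing is not shown in this section and may differ (see "Reprogramming the Within Operator").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map a w/ argument to its delimiter expression; result is written to buf. */
static const char *within_expr(const char *arg, char *buf, size_t n)
{
    static const struct { const char *name; const char *rex; } builtin[] = {
        { "sent", "[^\\digit\\upper][.?!][\\space'\"]" },
        { "para", "\\x0a=\\space+" },
        { "line", "$" },
        { "page", "\\x0c" },
    };
    size_t i;

    assert(arg != NULL && buf != NULL && n > 0);
    for (i = 0; i < sizeof builtin / sizeof builtin[0]; i++) {
        if (strcmp(arg, builtin[i].name) == 0) {
            snprintf(buf, n, "%s", builtin[i].rex);   /* built-in name */
            return buf;
        }
    }
    if (arg[0] != '\0' && strspn(arg, "0123456789") == strlen(arg)) {
        snprintf(buf, n, ".{,%s}", arg);   /* number: character proximity */
        return buf;
    }
    snprintf(buf, n, "%s", arg);           /* anything else: a REX as-is */
    return buf;
}
```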
By default, both delimiters are excluded from the hit when using a
REX with the within operator. To specify inclusion, use "W/"
instead of "w/". If you want the same expression for both
delimiters, you can still give the end delimiter a different
inclusion/exclusion without repeating the expression: simply use
"W/" or "w/" by itself for the end delimiter. For example,
"power w/$$ W/" will set both delimiters to "$$" but will exclude
the start delimiter and include the end delimiter.
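The delimiter rules above can be sketched as a small state update applied to each within token in order. struct within_delims and apply_within() are hypothetical, and the sketch assumes every argument is a REX (built-in names may be handled differently by the real API).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct within_delims {
    const char *start, *end;   /* delimiter expressions (point into tokens) */
    int incl_start, incl_end;  /* 1 = delimiter included in the hit */
};

/* Apply one "w/..." or "W/..." token; returns 0 if tok is not one. */
static int apply_within(struct within_delims *d, int *seen, const char *tok)
{
    int incl;
    const char *expr;

    assert(d != NULL && seen != NULL && tok != NULL);
    if ((tok[0] != 'w' && tok[0] != 'W') || tok[1] != '/')
        return 0;
    incl = (tok[0] == 'W');            /* "W/" means include */
    expr = tok + 2;

    if (*seen == 0) {                  /* first operator sets both */
        d->start = d->end = expr;
        d->incl_start = d->incl_end = incl;
    } else {                           /* later operator: end only */
        if (*expr != '\0')
            d->end = expr;             /* bare "w/" or "W/" keeps the expression */
        d->incl_end = incl;
    }
    (*seen)++;
    return 1;
}
```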
The default value for withinproc is 1. This default may be
adjusted by changing the macro API3WITHINPROC in the api3.h
header file and recompiling api3.c.
See also the section "Reprogramming the Within Operator".
byte see:
Look up "see" references in the equiv file. The equiv file has
"see" references, much as a dictionary or thesaurus does. With
this flag off, "see" references are left in the word list as is.
With it on, those references are looked up and their equiv lists
added to the list for the original word. This can greatly
increase the number of equivs and the degree of abstraction for a
given word, and is not needed in most cases.
The default value for see is 0. This default may be adjusted by
changing the macro API3SEE in the api3.h header file and
recompiling api3.c.
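A sketch of the see-reference expansion just described, one level deep. The in-memory struct eqv_entry and build_eqvs() are hypothetical; the real equiv file format and lookup code are not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXEQV 16

/* Hypothetical in-memory equiv entry; NULL-terminated arrays. */
struct eqv_entry {
    const char *word;
    const char *eqvs[MAXEQV];   /* ordinary equivalences */
    const char *see[MAXEQV];    /* "see" references */
};

static const struct eqv_entry *find_entry(const struct eqv_entry *tab,
                                          size_t n, const char *w)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(tab[i].word, w) == 0)
            return &tab[i];
    return NULL;
}

/* Build word's equiv list into out; returns the count (at most max). */
static size_t build_eqvs(const struct eqv_entry *tab, size_t n,
                         const char *word, int see,
                         const char **out, size_t max)
{
    const struct eqv_entry *e = find_entry(tab, n, word);
    size_t k = 0, i, j;

    assert(out != NULL);
    if (e == NULL)
        return 0;
    for (i = 0; e->eqvs[i] != NULL && k < max; i++)
        out[k++] = e->eqvs[i];
    for (i = 0; e->see[i] != NULL && k < max; i++) {
        const struct eqv_entry *r;
        out[k++] = e->see[i];              /* the reference itself stays */
        if (!see)
            continue;                      /* flag off: leave it as is */
        r = find_entry(tab, n, e->see[i]);
        for (j = 0; r != NULL && r->eqvs[j] != NULL && k < max; j++)
            out[k++] = r->eqvs[j];         /* flag on: add its equivs */
    }
    return k;
}
```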
byte keepnoise:
Keep noise words. With this flag off, any word in query that is
not part of a larger phrase and is also found in the noise array
will be removed from the list.
The default value for keepnoise is 1. This default may be
adjusted by changing the macro API3KEEPNOISE in the api3.h header
file and recompiling api3.c.
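Noise removal as described can be sketched as an in-place filter. drop_noise() and its in_phrase array are hypothetical; matching here is case-sensitive and assumes already lowercased words, which may not match the API's behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int is_noise(const char *w, const char *const *noise)
{
    for (; *noise != NULL; noise++)
        if (strcmp(w, *noise) == 0)
            return 1;
    return 0;
}

/* Compact words[0..n) in place, dropping noise words unless keepnoise
   is set. in_phrase[i] != 0 marks a word inside a larger phrase; such
   words are always kept. Returns the new word count. */
static size_t drop_noise(const char **words, const int *in_phrase,
                         size_t n, const char *const *noise, int keepnoise)
{
    size_t i, k = 0;

    assert(words != NULL && noise != NULL);
    for (i = 0; i < n; i++) {
        if (!keepnoise && !(in_phrase && in_phrase[i])
            && is_noise(words[i], noise))
            continue;                  /* standalone noise word: drop it */
        words[k++] = words[i];
    }
    return k;
}
```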
byte keepeqvs:
Keep equivalences. With this flag off, the normal meaning of ~
is inverted: words will not normally have equivs, and to get the
equivs for a word you use the ~ prefix.
The default value for keepeqvs is 1. This default may be adjusted
by changing the macro API3KEEPEQVS in the api3.h header file and
recompiling api3.c.
Setting keepeqvs to 0 does not eliminate looking for the equiv
file. See the eqprefix variable for how to eliminate the equiv
file completely.
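A sketch of the per-word decision, assuming that keepeqvs off means words get no equivs unless prefixed with ~, and that ~ suppresses equivs when keepeqvs is on (the default). wants_equivs() is a hypothetical helper, and it does not model eqprefix == (byte *)NULL, which disables lookup regardless.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: '~' toggles equiv lookup relative to keepeqvs. */
static int wants_equivs(const char *term, int keepeqvs)
{
    int tilde;

    assert(term != NULL);
    tilde = (term[0] == '~');
    return keepeqvs ? !tilde : tilde;
}
```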
byte suffixproc:
This is a flag that, if not set to 0, causes the equiv lookup
process to strip suffixes from query words and from words in the
equiv file to find the closest match when there is no exact
match. Words will not be stripped shorter than the minwordlen
value (see below). This flag has a similar effect on the search
process (see the Metamorph section).
The default value for suffixproc is 1. This default may be
adjusted by changing the macro API3SUFFIXPROC in the api3.h
header file and recompiling api3.c.
byte **suffixeq:
This is the list of word endings used by the suffix processor when
suffixproc is on (see the description of lists). The suffix
processor also has some permanent built-in rules for stripping.
This is the default list:
  '   s   ies
The default may be changed by editing the suffixeq[] array in the
function openapicp() in the file api3.c and recompiling.
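A minimal sketch of one stripping step against a suffix list like the default above. strip_one_suffix() is hypothetical, it prefers the longest matching ending (a choice made here, not documented), and it ignores the processor's permanent built-in rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return word's length after removing the longest matching suffix from
   list (NULL-terminated), or strlen(word) if none matches. */
static size_t strip_one_suffix(const char *word, const char *const *list)
{
    size_t len, best = 0;

    assert(word != NULL && list != NULL);
    len = strlen(word);
    for (; *list != NULL; list++) {
        size_t sl = strlen(*list);
        /* never strip the whole word */
        if (sl > best && sl < len && strcmp(word + len - sl, *list) == 0)
            best = sl;
    }
    return len - best;
}
```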
int minwordlen:
This applies only if suffixproc is on. It is the shortest length a
word may be reduced to before suffix stripping stops and gives up.
The default value for minwordlen is 5. This default may be
adjusted by changing the macro API3MINWORDLEN in the api3.h header
file and recompiling api3.c. This value has a similar effect on
the search process (see the Metamorph section).
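The interaction of suffixproc, suffixeq, and minwordlen can be sketched as repeated stripping until a dictionary match is found or the word would become too short. closest_match() and its dict argument are hypothetical; unlike the real processor, this sketch strips only the query word, not words from the equiv file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Is the first len characters of w exactly some dictionary entry? */
static int in_dict(const char *w, size_t len, const char *const *dict)
{
    for (; *dict != NULL; dict++)
        if (strlen(*dict) == len && strncmp(w, *dict, len) == 0)
            return 1;
    return 0;
}

/* Length of the longest suffix in list ending the first len chars of w. */
static size_t longest_suffix(const char *w, size_t len,
                             const char *const *list)
{
    size_t best = 0;
    for (; *list != NULL; list++) {
        size_t sl = strlen(*list);
        if (sl > best && sl < len && strncmp(w + len - sl, *list, sl) == 0)
            best = sl;
    }
    return best;
}

/* Strip suffixes until a prefix of word is in dict, never going below
   minwordlen characters. Returns the matched length, or 0. */
static size_t closest_match(const char *word, const char *const *suffixes,
                            const char *const *dict, int minwordlen)
{
    size_t len, sl;

    assert(word != NULL && minwordlen >= 0);
    len = strlen(word);
    for (;;) {
        if (in_dict(word, len, dict))
            return len;
        sl = longest_suffix(word, len, suffixes);
        if (sl == 0 || len - sl < (size_t)minwordlen)
            return 0;          /* nothing to strip, or would be too short */
        len -= sl;
    }
}
```

With the default minwordlen of 5, "houses" reaches "house", but "cats" is not reduced to "cat" because that would fall below 5 characters.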
byte **noise:
This is the list of noise words that are removed from the query
when keepnoise is off. This is the default noise list:
a | between | got | me | she | upon |
about | but | gotten | mine | should | us |
after | by | had | more | so | very |
again | came | has | most | some | was |
ago | can | have | much | somebody | we |
all | cannot | having | my | someone | went |
almost | come | he | myself | something | were |
also | could | her | never | stand | what |
always | did | here | no | such | whatever |
am | do | him | none | sure | what's |
an | does | his | not | take | when |
and | doing | how | now | than | where |
another | done | i | of | that | whether |
any | down | if | off | the | which |
anybody | each | in | on | their | while |
anyhow | else | into | one | them | who |
anyone | even | is | onto | then | whoever |
anything | ever | isn't | or | there | whom |
anyway | every | it | our | these | whose |
are | everyone | just | ourselves | they | why |
as | everything | last | out | this | will |
at | for | least | over | those | with |
away | from | left | per | through | within |
back | front | less | put | till | without |
be | get | let | putting | to | won't |
became | getting | like | same | too | would |
because | go | make | saw | two | wouldn't |
been | goes | many | see | unless | yet |
before | going | may | seen | until | you |
being | gone | maybe | shall | up | your |
The default may be changed by editing the noise[]
array in the
function openapicp()
in the file api3.c
and recompiling.
void *usr:
This is a pointer that the application programmer may use to pass
arbitrary application-specific information to the callback
functions (*eqedit)() and (*eqedit2)(). This pointer is entirely
under the control of the programmer. The Metamorph API does not
reference it in any way except to set it to (void *)NULL in
openapicp().
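A sketch of the usual pattern: point usr at application state after openapicp(), then cast it back inside the callback. APICP_sketch is a hypothetical cut-down struct holding only the two members this example needs, and the meaning of the eqedit return value is not documented in this section, so returning 0 here is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down APICP with only the members used here. */
typedef struct APICP_sketch APICP_sketch;
struct APICP_sketch {
    void *usr;
    int (*eqedit)(APICP_sketch *);
};

struct app_state { int calls; };   /* application-specific data */

static int my_eqedit(APICP_sketch *cp)
{
    struct app_state *st = (struct app_state *)cp->usr;  /* recover state */

    assert(st != NULL);
    st->calls++;
    return 0;   /* assumed "no error" value */
}
```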