SYNOPSIS
REX - Deterministic Regular Expression Matcher
FFS *openrex(byte *s);
FFS *closerex(FFS *fs);
FFS *mknegexp(FFS *fs);
byte *getrex(FFS *fs,byte *buf,byte *end,int operation);
int rexsize(FFS *ex);
XPM - Approximate Pattern Matcher
XPMS *openxpm(byte *s,int threshold); /* 0 < threshhold < 101 */
XPMS *closexpm(XPMS *xs);
byte *getxpm(XPMS *xs,byte *buf,byte *end,int operation);
PPM - Parallel String Pattern Matcher
PPMS *closeppm(PPMS *ps);
PPMS *openppm(byte **sl);
byte *getppm(PPMS *ps,byte *buf,byte *end,int operation);
SPM - Single String Pattern Matcher
SPMS *openspm(char *s);
SPMS *closespm(SPMS *fs);
byte *getspm(SPMS *fs,byte *buf,byte *end,int operation);
int spmhitsz(SPMS *fs);
NPM - Numeric Pattern Matcher
NPMS *opennpm(char *s);
NPMS *closenpm(NPMS *);
byte *getnpm(NPMS *,byte *,byte *,int);
DESCRIPTION
These pattern matchers represent the core search algorithms that are
present in the Metamorph program and API. Usage and syntax details
are not presented here as they are described in great detail in the
Metamorph User Manual. The programmer is encouraged to understand
their purpose and usage before implementing their own software using these
functions.
All of the pattern matchers behave in a similar fashion from a programming
perspective. The open___()
call initializes the matcher and
returns a "handle" to it, and the close___()
call free()s
the memory associated with that object's "handle". If the open___()
call fails for any reason it will return a cast pointer to NULL
.
The arguments passed to the
open___()
call are as follows:
xxxxxxxxxxxxxxxxx=xxxxxxxxxxxxxxxxxxxxxx=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Call >Parameters Type(s) > Example
openrex()
>Rex expression > "\\x0d\\x0a+"
openxpm()
>any string, threshold > "abc widgets",80
openppm()
>a list of strings > {"abc","def","ghi",""};
opennpm()
>a numeric expression > ">1000<=million"
openspm()
>a string expression > "abc*def"
In all cases, the open___()
call is computationally intensive, as
each algorithm makes extensive use of dynamic programming techniques. It
is generally considered that the pattern matcher will be processing a
great deal of information between it's creation and destruction so that
the creation overhead is justified by the dramatic increase in search
rates.
If the pattern matchers are to be used in conjunction with one another then the programmer should optimize usage by dispatching the pattern matchers in order of relative speed. The following table enumerates the relative search rates:
xxxxxxxxxxxxxx=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Pattern Types >Governing factors
SPM Fastest >longer strings are faster
REX . >longer expressions are usually faster
PPM . >shorter lists with the longest minimum strlen()
XPM . >shorter strings are faster
NPM Slowest >nothing makes a difference
The get___()
call is responsible for the location of hits
within a buffer of information. All of the pattern matchers share
a common parameter set for this operation:
(object pointer, byte *buf , byte *end, int operation)
open___()
call.
buf+bufsize
).
SEARCHNEWBUF
,CONTINUESEARCH
, BSEARCHNEWBUF
,
or BCONTINUESEARCH
.
The SEARCHNEWBUF
to locate the first occurrence of a hit within the
delineated buffer, and the CONTINUESEARCH
operation code indicates
that the pattern matcher should locate the next (or succesive) occurrence.
A pointer to the matched pattern will be returned or, a
(byte *)NULL
if no match was located within the buffer
for the given call.
The operation codes BSEARCHNEWBUF
and BCONTINUESEARCH
are
only understood by the REX pattern matcher, and are used to search
backwards from the end
pointer towards the buf
pointer. A
non NULL
return value will point to the beginning of the matched
pattern.
Some information about the matched hits for each of the pattern matchers may be obtained by looking into that pattern matcher's structure. The following structure members are the only valid ones for an API programmer to use:
NPMS /* Numeric Pattern Matcher */
{
int hitsz; /* the length of the matched quantity */
double hy; /* the maximum numeric value of the matched string */
double hx; /* the minimum numeric value of the matched string */
};
PPMS
{
byte **slist; /* the strings being sought */
int sn; /* the index of the located string within slist */
};
XPMS
{
word thresh; /* threshold above which a hit can occur */
word maxthresh; /* the max possible value a match may have */
word thishit; /* the value of this match */
word maxhit; /* max value located so far */
byte maxstr[DYNABYTE]; /* the string with highest match value */
};
SPMS
{
/* no usable members */
};
FFS /* aka REX */
{
/* no usable members */
};
Hit length information for REX and SPM is available through calls to
rexsize()
and spmhitsz()
respectively. Each of these
functions return the length of the last hit located by a call to
get___()
. The reason there are not similar calls available for the
othe pattern matchers is because their length is obtainable via the
structure.
EXAMPLE#include "api3.h"
/* this code breaks some rules in the interest of brevity */
main(argc,argv)
int argc;
char **argv;
{
void *pm=(void *)NULL;
int i;
void (*close)(); /* pointer to the close function */
void (*get)(); /* pointer to the get function */
char ln[80];
switch(*argv[1]) /* determine search type via leading character */
{
case '/' : pm=(void *)openrex(argv[1]+1);
get=getrex;
close=closerex;
break;
case '#' : pm=(void *)opennpm(argv[1]+1);
get=getnpm;
close=closenpm;
break;
case '%' : pm=(void *)openxpm(argv[1]+1,80);
get=getxpm;
close=closexpm;
break;
}
if(pm==(void *)NULL) /* check to see if it opened */
exit(1);
while(gets(ln)!=NULL)
{ /* see if there hit on this line */
if((*get)(pm,(byte *)ln,(byte *)(ln+strlen(ln)),SEARCHNEWBUF)
!=(byte *)NULL)
puts(ln);
}
(*close)(pm);
exit(0);
}