The Programmers' Interface Overview

There are thousands of stand-alone Metamorph programs in the field today, and over time we have received many requests from application developers who would like to embed our searching technology inside their own applications. It has taken us a long time to work out a simple, clean way to meet that need. We have tried to make the API as easy to use as possible while still providing maximum power and flexibility.

All of the code that comprises Metamorph is written in ANSI-compliant C. The source code to the API (only) is provided to the programmer for reference and modification. Metamorph has been compiled and tested on 22 different UNIX platforms, MS-DOS, and IBM MVS. Thunderstone can port the API to almost any machine/OS that has an ANSI-compliant C compiler.

The calls in the API are structured much like the standard library functions fopen(), fclose(), ftell(), and gets(). And just as you can have multiple files open at the same time, you can open as many simultaneous Metamorph queries as needed. (One reason you might do this is to have a different search in effect for each of two different fields of the same record.)

The API itself allows the software engineer to conduct a Metamorph search through any buffer or file that might contain text. There are two data structures that are directly involved with the API:

APICP     /* this structure contains all the control parameters */
MMAPI     /* this structure is passed around to the API calls */

The APICP structure contains all the default parameters required by the API. It is separate from the MMAPI structure so that its contents can be easily manipulated by the developer. An APICP contains the following information:

  • A flag telling Metamorph to do suffix processing

  • A flag telling it to do prefix processing

  • A flag that says whether or not to perform word derivations

  • The minimum size a word may be processed down to

  • The list of suffixes to use in suffix processing

  • The list of prefixes to use in prefix processing

  • A start delimiter expression

  • An end delimiter expression

  • A flag indicating whether to include the starting delimiter in the hit

  • A flag indicating whether to include the ending delimiter in the hit

  • A list of high frequency words/phrases to ignore

  • The default names of the Thesaurus files

  • Two optional, user-written, Thesaurus list editing functions

  • The list of suffixes to use in equivs lookup

  • A flag indicating whether to look for the within operator (w/)

  • A flag indicating whether to look up see references

  • A flag indicating whether to keep equivalences

  • A flag indicating whether to keep noise words

  • A user data pointer

Usually the developer will have no need to modify the contents of this structure more than one time to tailor it to their application, but in some applications it will be very desirable to be able to modify its contents dynamically. Two calls are provided that handle the manipulation of this structure:

APICP * openapicp(void)             /* returns an APICP pointer */

APICP * closeapicp(APICP *cp)  /* always returns a NULL pointer */

The openapicp() function creates a structure that contains a set of default parameters and then returns a pointer to it. The closeapicp() function cleans up and releases the memory allocated by the openapicp() function. Between these two calls the application developer may modify any of the contents of the APICP structure.
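The open, modify, close pattern described above can be illustrated with a small stand-in. This is only a sketch: DemoParams, its field names, its default values, and the three demo functions are hypothetical and are not part of the Metamorph API. It shows why a close call that always returns NULL is convenient for the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for APICP. Field names are illustrative only. */
typedef struct DemoParams {
    int   suffix_processing;  /* flag: do suffix processing */
    int   prefix_processing;  /* flag: do prefix processing */
    int   min_word_len;       /* minimum size a word may be processed down to */
    void *user_data;          /* user data pointer */
} DemoParams;

/* Allocate a parameter block filled with defaults; returns NULL on failure. */
DemoParams *open_demo_params(void)
{
    DemoParams *p = malloc(sizeof *p);

    if (p == NULL)
        return NULL;
    p->suffix_processing = 1;   /* arbitrary demo defaults, */
    p->prefix_processing = 1;   /* not the real Metamorph values */
    p->min_word_len      = 5;
    p->user_data         = NULL;
    return p;
}

/* Example of modifying the block between the open and close calls. */
void set_min_word_len(DemoParams *p, int n)
{
    assert(p != NULL && n > 0);
    p->min_word_len = n;
}

/* Release the block. Always returns NULL, so the caller can write
   p = close_demo_params(p); and is never left with a dangling pointer. */
DemoParams *close_demo_params(DemoParams *p)
{
    free(p);
    return NULL;
}
```

A caller would typically write p = open_demo_params(); adjust whatever fields it needs, and finish with p = close_demo_params(p); which mirrors the way closeapicp() is declared to return an APICP pointer that is always NULL.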

There are five function calls that are associated with the actual API retrieval function; they are as follows:

MMAPI *openmmapi(char *query,APICP *cp)

int   setmmapi(MMAPI *mm,char *query)

char  *getmmapi(MMAPI *mm, char *buf, char *endofbuf, int operation)

int   infommapi(MMAPI *mm, int index, char **what, char **where,
                int *size)

MMAPI *closemmapi(MMAPI *mm)

The openmmapi() function takes the set of default parameters from the APICP structure and builds an MMAPI structure that is ready to be manipulated by the other four functions. It returns a pointer to this structure.

The setmmapi() function is passed a standard Metamorph query (see examples) and does all the processing required to get the API ready to perform a search that will match the query. If the application program wishes to, it can define a function that will be called by the setmmapi() function to perform editing of the word lists and query items before the initialization is completed (this is not required).

The getmmapi() function performs the actual search of the data. All that is required is to pass the getmmapi() function the beginning and ending locations of the data to be searched. There are two operations that may be performed with the getmmapi() call: SEARCHNEWBUF and CONTINUESEARCH. Because there may be multiple hits within a single buffer, SEARCHNEWBUF tells the API to locate the first hit, and successive calls with CONTINUESEARCH locate all the remaining hits in the buffer.
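The two-operation protocol can be sketched with a self-contained stand-in. Everything named here (DemoSearch, demo_get(), demo_count_hits(), and the DEMO_ constants) is hypothetical; the stand-in matches a literal string rather than a Metamorph query, and it assumes, as the char * return type of getmmapi() suggests, that a NULL return means there are no further hits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operation codes mirroring SEARCHNEWBUF / CONTINUESEARCH. */
enum { DEMO_SEARCHNEWBUF, DEMO_CONTINUESEARCH };

/* Hypothetical stand-in for MMAPI: remembers where the next search resumes. */
typedef struct DemoSearch {
    const char *word;    /* literal text to find (not a real Metamorph query) */
    char       *resume;  /* position at which a CONTINUE call starts */
} DemoSearch;

/* Return a pointer to the next occurrence of ds->word in [buf, end),
   or NULL when there are no more hits. */
char *demo_get(DemoSearch *ds, char *buf, char *end, int operation)
{
    size_t len = strlen(ds->word);
    char  *p;

    assert(buf != NULL && buf <= end);
    if (len == 0)
        return NULL;

    /* A new-buffer search starts at the beginning; a continue search
       picks up just past the previous hit. */
    if (operation == DEMO_SEARCHNEWBUF || ds->resume == NULL)
        p = buf;
    else
        p = ds->resume;

    while ((size_t)(end - p) >= len) {
        if (memcmp(p, ds->word, len) == 0) {
            ds->resume = p + len;
            return p;
        }
        p++;
    }
    ds->resume = end;   /* later CONTINUE calls keep returning NULL */
    return NULL;
}

/* Count every hit in the buffer: one new-buffer call, then continue calls. */
int demo_count_hits(DemoSearch *ds, char *buf, char *end)
{
    int   n = 0;
    char *hit;

    for (hit = demo_get(ds, buf, end, DEMO_SEARCHNEWBUF);
         hit != NULL;
         hit = demo_get(ds, buf, end, DEMO_CONTINUESEARCH))
        n++;
    return n;
}
```

The for loop in demo_count_hits() is the shape a real caller would likely use: the first call names the new buffer, and every later call passes the same buffer bounds with the continue operation until no hit is returned.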

The infommapi() function returns information about a hit to the caller; it will give the following information:

  • Where the hit is located within the buffer.

  • The overall length of the hit.

  • For each set in the search that was matched:

    1. The query set searched for and located.

    2. The location of the set item.

    3. The length of the set item.

  • The location and length of the start and end delimiters.

The closemmapi() function cleans up and releases the memory allocated by the openmmapi() call.

The last of the important calls in the API is the function that reads data in from files. Your application may not need this function, but if files are being read in as text streams its use is required.

int rdmmapi(char *buf,int n,FILE *fh,MMAPI *mm)

This function works very much like fread(), with one important exception: it guarantees that a hit will not be broken across a buffer boundary. It works as follows:

  • A normal fread() for the number of requested bytes is performed.

  • rdmmapi() searches backwards from the end of the buffer for an occurrence of the ending delimiter regular-expression.

  • The data that is beyond the last occurrence of an ending delimiter is pushed back into the input stream. (The method that is used depends on whether an fseek() can be performed or not.)
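The three steps above can be sketched in standard C. This is a simplified stand-in, not the rdmmapi() implementation: read_whole_records() is a hypothetical name, the ending delimiter is assumed to be a single newline rather than a regular expression, and only the fseek() push-back path is shown.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for rdmmapi(). Reads up to n bytes into buf and
   returns how many bytes the caller should search. The end delimiter is
   fixed to '\n' for this sketch. */
size_t read_whole_records(char *buf, size_t n, FILE *fh)
{
    size_t got, keep;

    assert(buf != NULL && fh != NULL);

    /* Step 1: a normal fread() for the requested number of bytes. */
    got = fread(buf, 1, n, fh);

    /* Step 2: search backwards from the end of the buffer for the last
       end delimiter; keep counts the bytes up to and including it. */
    keep = got;
    while (keep > 0 && buf[keep - 1] != '\n')
        keep--;

    /* No delimiter at all, or the buffer already ends on one:
       there is nothing to push back. */
    if (keep == 0 || keep == got)
        return got;

    /* Step 3: push the trailing partial record back into the stream. */
    if (fseek(fh, -(long)(got - keep), SEEK_CUR) != 0)
        return got;   /* not seekable: this sketch just returns everything */

    return keep;
}
```

With a file containing three newline-separated records and a 6-byte request, each call returns one whole record instead of a record plus a fragment of the next. A buffer that holds no delimiter at all is returned unchanged, so the caller should size the buffer to hold at least one complete delimited unit.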


Copyright © Thunderstone Software     Last updated: Apr 15 2024