SYNOPSIS
<rmcommon $data $template [$maxrm]>
DESCRIPTION
The rmcommon function removes from each $data value the prefix
and suffix text that it shares with the corresponding
$template value. Up to $maxrm characters are removed,
rounded down to the nearest word boundary; by default, the maximum
amount of common text is removed. This function is useful for stripping
common header and footer text from web pages before indexing.
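
The behavior described above can be sketched outside Vortex. The following Python is a minimal illustration only, not the actual Texis implementation. It assumes that $maxrm limits the prefix and the suffix separately, and that a "word boundary" means a position next to whitespace or at either end of the text; neither detail is stated in this section. The helper name _is_boundary is hypothetical.

```python
from typing import Optional


def _is_boundary(text: str, pos: int) -> bool:
    """True if cutting text at pos would not split a whitespace-delimited word.

    Hypothetical helper: the real definition of "word boundary" is an assumption.
    """
    if pos <= 0 or pos >= len(text):
        return True
    return text[pos - 1].isspace() or text[pos].isspace()


def rmcommon(data: str, template: str, maxrm: Optional[int] = None) -> str:
    """Strip the prefix and suffix that data shares with template (sketch)."""
    # Length of the shared prefix.
    limit = min(len(data), len(template))
    p = 0
    while p < limit and data[p] == template[p]:
        p += 1
    if maxrm is not None:
        p = min(p, maxrm)  # assumption: maxrm applies to each side separately
    # Round down so the cut never lands inside a word.
    while not _is_boundary(data, p):
        p -= 1

    # Length of the shared suffix, never overlapping the removed prefix.
    limit = min(len(data), len(template)) - p
    s = 0
    while s < limit and data[-1 - s] == template[-1 - s]:
        s += 1
    if maxrm is not None:
        s = min(s, maxrm)
    while not _is_boundary(data, len(data) - s):
        s -= 1

    return data[p:len(data) - s]


template = "Acme Industries, Inc. [Data] Home Next Previous"
data = "Acme Industries, Inc. Widgets and Gadgets Home Next Previous"
print(rmcommon(data, template))
```

Under these assumptions, the call on the sample strings leaves only the page-specific text between the shared title and the shared navigation bar.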
DIAGNOSTICS
rmcommon returns $data with its common prefix/suffix
text removed.
EXAMPLE
<$template = "Acme Industries, Inc. [Data] Home Next Previous">
<rmcommon $data $template>
<SQL NOVARS "insert into webpages
values(counter, $Url, $ret)">
</SQL>
In the above example, $template is set to a template
representative of a typical (formatted) page from a web site, i.e., an
actual fetched page. Like all pages from this site, it contains the
same title prefix and navigation-bar suffix, which we want to strip
before indexing to prevent useless hits on "Acme", for
example. By using this template with <rmcommon> against every
fetched page $data, the prefix/suffix is stripped before
insertion into the database. Thus, if $data were initially
"Acme Industries, Inc. Widgets and Gadgets Home Next
Previous", after the <rmcommon> call it would be inserted as
"Widgets and Gadgets".
CAVEATS
The rmcommon function was added in version 3.01.984600000 20010314.