<rmcommon $data $template [$maxrm]>
The rmcommon function removes the prefix and suffix text from the
$data value that is shared with the corresponding $template value.
Up to $maxrm characters are removed, rounded down to the nearest
word boundary; the default is the maximum amount of common text.
This function is useful for stripping common header and footer text
from web pages before indexing.
Returns $data with its common prefix/suffix removed.
<$template = "Acme Industries, Inc. [Data] Home Next Previous">
<rmcommon $data $template>
<SQL NOVARS "insert into webpages
values(counter, $Url, $ret)">
In the above example,
$template is set to a template
representative of a typical (formatted) page from a web site, i.e. an
actual fetched page. Like all pages from this site, it contains the
same title prefix and navigation-bar suffix that we want to strip
before indexing, to prevent useless hits on "
example. By using this template with
<rmcommon> against every
$data, the prefix/suffix is stripped before
insertion into the database. Thus, if
$data was initially
"Acme Industries, Inc. Widgets and Gadgets Home Next
Previous", after the
<rmcommon> call it would be inserted as
"Widgets and Gadgets".
The rmcommon function was added in version 3.01.984600000 20010314.
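The prefix/suffix stripping described above can be modeled outside Vortex. The following Python sketch is an illustration only, not the Texis implementation: the name rmcommon_sketch and its helper are hypothetical, a "word boundary" is assumed to be any position that does not split a run of alphanumeric characters, and $maxrm is assumed to apply to the prefix and the suffix separately.

```python
from typing import Optional


def _at_word_boundary(text: str, pos: int) -> bool:
    """True if cutting text at index pos does not split a run of alphanumerics."""
    if pos <= 0 or pos >= len(text):
        return True
    return not (text[pos - 1].isalnum() and text[pos].isalnum())


def rmcommon_sketch(data: str, template: str, maxrm: Optional[int] = None) -> str:
    """Hypothetical model of rmcommon; assumptions are noted in the comments."""
    # Assumption: with no maxrm, remove as much common text as possible.
    limit = len(data) if maxrm is None else max(0, maxrm)

    # Length of the shared prefix, capped at the limit.
    pre = 0
    while (pre < limit and pre < len(data) and pre < len(template)
           and data[pre] == template[pre]):
        pre += 1
    # Round down so the cut does not land inside a word.
    while pre > 0 and not _at_word_boundary(data, pre):
        pre -= 1

    # Length of the shared suffix, capped and kept clear of the prefix.
    room = min(len(data), len(template)) - pre
    suf = 0
    while (suf < limit and suf < room
           and data[-1 - suf] == template[-1 - suf]):
        suf += 1
    while suf > 0 and not _at_word_boundary(data, len(data) - suf):
        suf -= 1

    return data[pre:len(data) - suf]


template = "Acme Industries, Inc. [Data] Home Next Previous"
data = "Acme Industries, Inc. Widgets and Gadgets Home Next Previous"
print(rmcommon_sketch(data, template))  # prints "Widgets and Gadgets"
```

The comparison is character by character from each end, so the bracketed placeholder in the template only needs to differ from the real page content at its first and last characters for the cut points to fall in the right place.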