Result ranking is a useful feature, but because ranking is used in many different situations, a number of variables control the ranking algorithm.
The first major choice is whether proximity is important. This
determines whether you want to use LIKER or LIKEP. LIKER uses the index
to determine the frequencies of the terms, and the presence or absence
of the terms in each document, to determine the rank for each document.
Each term is assigned a weight between 0 and 1000, and the rank value
for the document is the sum of the weights for all the terms that
occur.
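The summation described above can be sketched in a few lines of Python. This is only an illustration: the function name liker_rank and the weights are made up, and the way the engine derives each weight from the term frequencies is not shown.

```python
def liker_rank(term_weights, doc_terms):
    """Sum the weights of the query terms that occur in the document.

    term_weights: dict mapping query term -> weight (0 to 1000).
    doc_terms: set of terms present in the document.
    """
    return sum(w for term, w in term_weights.items() if term in doc_terms)


# Hypothetical weights; real weights come from index frequencies.
weights = {"metamorph": 900, "index": 400, "rank": 250}

# A document containing two of the three query terms.
doc = {"index", "rank", "table"}

rank = liker_rank(weights, doc)  # 400 + 250
```

Only presence or absence matters here; where the terms occur in the document plays no part in the sum.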
LIKER has a threshold value, such that documents with a lower
rank value than the threshold value will not be returned. This prevents
a large number of irrelevant documents from being returned. Initially
the threshold is set to the weight of the term with the highest weight.
If there are more than five terms then the threshold is doubled, and if
there are more than 10 terms the threshold is doubled again. This keeps
queries containing a lot of terms from returning irrelevant hits. It
is possible to force the threshold lower if desired to return more records.
This can be performed by specifying the maximum number of records
a term may occur in and still be returned by LIKER. This is the
likerrows variable. For example, in a three-term query, where
the terms occur in 400, 900 and 1400 records respectively, setting
likerrows to 1000 would allow records containing only the second
search term to be returned.
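As a rough sketch of the threshold rules and the likerrows example above, assuming nothing beyond what the text states. The function names are hypothetical, and the second function is one reading of the example rather than the engine's actual logic.

```python
def initial_threshold(term_weights):
    """Start at the highest term weight; double it for more than
    five terms, and double it again for more than 10 terms."""
    threshold = max(term_weights)
    if len(term_weights) > 5:
        threshold *= 2
    if len(term_weights) > 10:
        threshold *= 2
    return threshold


def terms_sufficient_alone(record_counts, likerrows):
    """Terms that occur in at most likerrows records, so a record
    containing only that term may still be returned (assumed reading)."""
    return [t for t, count in record_counts.items() if count <= likerrows]


# The three-term example from the text: 400, 900 and 1400 records.
counts = {"first": 400, "second": 900, "third": 1400}
allowed = terms_sufficient_alone(counts, likerrows=1000)
```

With likerrows set to 1000, the first and second terms qualify on their own, while the third, which occurs in 1400 records, does not.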
In general LIKEP will perform the same initial step as LIKER to
determine which documents to rank. LIKEP then looks at the
likeprows highest ranked documents from LIKER, and recalculates
the rank by actually looking inside the document to see where the
matching terms occur. Because of this it will be slower than LIKER,
although if you are using a Metamorph inverted index the ranks may
still be determinable from the index alone, saving actual table
accesses.
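The two-pass approach can be illustrated as follows. This is a sketch under stated assumptions: likep_rerank and proximity_score are hypothetical names, and the proximity formula is a stand-in chosen for illustration, not the formula LIKEP uses. What happens to documents outside the likeprows set is not covered here, so the sketch simply omits them.

```python
def proximity_score(text, terms):
    """Toy proximity measure: higher when all terms sit close together."""
    words = text.lower().split()
    positions = [words.index(t) for t in terms if t in words]
    if len(positions) < len(terms):
        return 0  # not every term is present
    span = max(positions) - min(positions)
    return 1000 // (1 + span)


def likep_rerank(liker_ranks, likeprows, rescore):
    """Take the likeprows best documents from the first pass and
    re-rank them with a score that looks inside each document."""
    top = sorted(liker_ranks, key=liker_ranks.get, reverse=True)[:likeprows]
    rescored = {doc_id: rescore(doc_id) for doc_id in top}
    return sorted(rescored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical documents and first-pass (LIKER-style) ranks.
docs = {"a": "alpha beta gamma", "b": "alpha x x x beta", "c": "alpha only"}
first_pass = {"a": 500, "b": 500, "c": 200}

result = likep_rerank(
    first_pass, 2, lambda d: proximity_score(docs[d], ["alpha", "beta"])
)
```

Documents "a" and "b" tie in the first pass, but "a" ranks higher after the second pass because its terms are adjacent. The second pass reads document text, which is why it costs more than the index-only first pass.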
There are a number of variables that can be set with LIKEP, which
affect both how documents are ranked and how many documents
are returned. See the "Rank knobs" and "Other ranking properties"
discussions in the Server Properties section of the manual.