Command Discussion

Result ranking is a useful feature, but because ranking is used in many different situations, a number of variables control the ranking algorithm.

The first major choice is whether proximity is important. This determines whether you want to use LIKER or LIKEP. LIKER uses the index to determine the frequencies of the terms, and the presence or absence of the terms in each document, to determine the rank for each document. Each term is assigned a weight between 0 and 1000, and the rank value for the document is the sum of the weights of all the terms that occur in it.
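
The summing rule can be sketched in a few lines of Python. This is only an illustration of the description above, not the Texis implementation: the function name liker_rank and the example weights are hypothetical, and the weights are taken as given because the manual does not state how they are derived from term frequencies.

```python
def liker_rank(term_weights, doc_terms):
    """Sum the weights of the query terms that occur in the document.

    term_weights: dict mapping each query term to its weight (0 to 1000).
    doc_terms: set of terms present in the document.
    """
    return sum(weight for term, weight in term_weights.items()
               if term in doc_terms)


# Hypothetical weights for a three-term query (made up for illustration).
weights = {"metamorph": 900, "index": 500, "search": 200}

# This document contains "index" and "search" but not "metamorph".
rank = liker_rank(weights, {"index", "search", "engine"})
print(rank)
```

Only presence or absence matters here: a term that occurs ten times in a document contributes the same weight as a term that occurs once.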

LIKER has a threshold value: documents with a rank value lower than the threshold are not returned. This prevents a large number of irrelevant documents from being returned. Initially the threshold is set to the weight of the term with the highest weight. If there are more than 5 terms the threshold is doubled, and if there are more than 10 terms the threshold is doubled again. This keeps queries containing a lot of terms from returning irrelevant hits. It is possible to force the threshold lower, if desired, to return more records. This is done by specifying the maximum number of records a term may occur in and still be returned by LIKER on its own. This is the likerrows variable. For example, in a three term query, where the terms occur in 400, 900 and 1400 records respectively, setting likerrows to 1000 would allow records containing only the second search term to be returned.
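
The threshold rule and the likerrows example can be sketched as follows. Both function names are hypothetical, and the second function is one reading of the likerrows description rather than the actual Texis logic. How a lowered threshold combines with the doubling for long queries is not stated in the manual, so the sketch keeps the two steps separate.

```python
def initial_threshold(term_weights):
    """Default threshold: the highest term weight, doubled for queries
    with more than 5 terms and doubled again for more than 10 terms."""
    threshold = max(term_weights.values())
    term_count = len(term_weights)
    if term_count > 5:
        threshold *= 2
    if term_count > 10:
        threshold *= 2
    return threshold


def terms_sufficient_alone(term_row_counts, likerrows):
    """Terms that occur in at most likerrows records. Under this reading
    (an assumption), a record matching only such a term is still returned."""
    return [term for term, rows in term_row_counts.items()
            if rows <= likerrows]


# The three-term example from the text: terms in 400, 900 and 1400 records.
row_counts = {"first": 400, "second": 900, "third": 1400}
print(terms_sufficient_alone(row_counts, 1000))
```

With likerrows set to 1000, the first and second terms qualify while the third, which occurs in 1400 records, does not.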

In general, LIKEP performs the same initial step as LIKER to determine which documents to rank. LIKEP then takes the likeprows highest ranked documents from LIKER and recalculates their rank by actually looking inside each document to see where the matching terms occur. Because of this it is slower than LIKER, although if you are using a Metamorph inverted index the ranks may still be determinable from the index alone, saving actual table accesses.
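
The two-stage process can be sketched as below. The proximity adjustment is a deliberately simple stand-in (subtracting the smallest word span that covers the matching terms) and is not the formula Texis uses. The names smallest_span and likep_rerank, the sample documents and the ranks are all hypothetical. The sketch also assumes that only the re-ranked candidates are returned.

```python
from itertools import product


def smallest_span(words, terms):
    """Width, in word positions, of the tightest window containing one
    occurrence of each query term that is present in the document."""
    positions = [[i for i, w in enumerate(words) if w == t] for t in terms]
    positions = [p for p in positions if p]  # skip terms that do not occur
    if not positions:
        return 0
    return min(max(combo) - min(combo) for combo in product(*positions))


def likep_rerank(liker_ranked, docs, terms, likeprows):
    """Re-rank the top likeprows LIKER results by looking inside the text.

    liker_ranked: list of (doc_id, rank) pairs, highest rank first.
    docs: dict mapping doc_id to document text.
    """
    rescored = []
    for doc_id, rank in liker_ranked[:likeprows]:
        span = smallest_span(docs[doc_id].lower().split(), terms)
        # Hypothetical adjustment: tighter clusters of terms lose less rank.
        rescored.append((doc_id, rank - span))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


docs = {
    1: "fast index search",
    2: "index of many other things for search",
    3: "search only",
}
liker_ranked = [(2, 700), (1, 700), (3, 200)]
result = likep_rerank(liker_ranked, docs, ["index", "search"], likeprows=2)
print(result)
```

Documents 1 and 2 tie under LIKER because both contain the same terms. After the second pass, document 1 ranks higher because its terms are adjacent. Document 3 is never opened because it falls outside the likeprows cutoff, which is where the cost saving over inspecting every match comes from.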

A number of variables can be set with LIKEP. They affect both how documents are ranked and how many documents are returned. See the "Rank knobs" and "Other ranking properties" discussions in the Server Properties section of the manual.


Copyright © Thunderstone Software     Last updated: Oct 5 2023