7.5 Large number of records

Databases require some special consideration when dealing with either a large number of records or a large number of fields. We'll discuss a large number of records first; in both cases, the goal is to do the least amount of work necessary.

When you have a large number of records, one of the most important steps is ensuring that you are not reading records unnecessarily. Beyond that, you need to make sure that the index can complete its work quickly. Fast search results on gigabytes of data are easy to achieve for very simple, well-chosen queries, but can be harder to achieve with more complicated ones. Don't be fooled by fast results on simple queries into thinking that all queries will perform well.

A simple query that should cause no problems is a LIKE query on a single infrequent term, one occurring in at most a few hundred records. You may want to run a number of such queries initially to get an idea of possible search performance. The first few queries may be a little slow until the cache is populated.

As queries get more complicated, the amount of work that needs to be done increases, and the goal is to keep that work to a minimum. The query protection features in Texis are designed to help with some aspects of this: they help ensure that you are not linearly scanning large amounts of data or matching too many words. Other flags can be modified to improve search performance in some cases.

When searching a large number of records, the order of the results is often critical. The two most common orderings are by relevance and by date, and LIKEP is the most common way to get results ordered by relevance. In general, you want the initial search to be as simple as possible and still return good results. Some users may want more control or more options, but if you produce good answers quickly the first time, you may reduce the number of more expensive queries people run as they fiddle with settings.

The first step for LIKEP is to make sure that you have a METAMORPH INVERTED index and that it is kept up to date, since linearly searching records can be a big performance hit. With an up-to-date METAMORPH INVERTED index, the default query protection settings in Vortex ensure that no records need to be post-searched. Some terms may be dropped from the search if they cannot be resolved. <Putmsg> is called in such cases; by default, it displays the message in the HTML source and in vortex.log.
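To illustrate why the index matters, here is a conceptual sketch in Python (not Texis code) contrasting a linear scan, which reads every record, with a lookup in a toy inverted index, which touches only the records that contain the term. The records, the whitespace tokenizer, and the helper names are hypothetical simplifications; a real METAMORPH INVERTED index is far more elaborate.

```python
from collections import defaultdict

# Hypothetical toy data: record id -> text.
records = {
    1: "power plant maintenance schedule",
    2: "annual report on solar power",
    3: "cafeteria menu for the week",
    4: "turbine maintenance checklist",
}

def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, split on whitespace.
    return text.lower().split()

def linear_search(records, term):
    """Read every record and test it: cost grows with the table size."""
    reads = 0
    hits = []
    for rid, text in records.items():
        reads += 1
        if term in tokenize(text):
            hits.append(rid)
    return hits, reads

def build_inverted_index(records):
    """Toy inverted index: map each word to the sorted ids containing it."""
    index = defaultdict(set)
    for rid, text in records.items():
        for word in tokenize(text):
            index[word].add(rid)
    return {word: sorted(ids) for word, ids in index.items()}

def indexed_search(index, term):
    """Look the term up directly: only matching records are touched."""
    hits = index.get(term, [])
    return hits, len(hits)

index = build_inverted_index(records)
print(linear_search(records, "maintenance"))   # ([1, 4], 4)
print(indexed_search(index, "maintenance"))    # ([1, 4], 2)
```

Both searches find the same records, but the scan reads all four while the index lookup touches only the two matches; on millions of records that difference dominates.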

With a large data set, it makes sense to enable likepallmatch for most searches. This causes Texis to find and rank only those documents containing all of the terms, rather than those containing any of the terms. If one of the terms is common, this has a big impact, because documents that match only the common term are no longer ranked. If a likepallmatch search returns no results, it may be acceptable to fall back to a search without likepallmatch.
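A conceptual Python sketch (not Texis's implementation) of the difference: with all-terms matching, the candidate set is the intersection of each term's postings, so a common term no longer drags in every document containing it; with any-term matching, it is the union. The toy index is made-up data, and `search_with_fallback` is a hypothetical helper illustrating the fallback strategy described above.

```python
# Hypothetical toy inverted index: word -> set of record ids.
# "power" plays the role of a common term.
index = {
    "power": {1, 2, 3, 4, 5, 6},
    "turbine": {2, 5},
    "vibration": {5, 7},
}

def candidates(index, terms, all_match):
    """Return the record ids that would be handed to the ranker."""
    postings = [index.get(t, set()) for t in terms]
    if not postings:
        return set()
    if all_match:
        return set.intersection(*postings)  # every term must be present
    return set.union(*postings)             # any single term is enough

def search_with_fallback(index, terms):
    """Try all-terms matching first; relax to any-term if nothing matches.

    Returns (hits, used_all_match).
    """
    hits = candidates(index, terms, all_match=True)
    if hits:
        return hits, True
    return candidates(index, terms, all_match=False), False

terms = ["power", "turbine", "vibration"]
print(sorted(candidates(index, terms, all_match=False)))  # [1, 2, 3, 4, 5, 6, 7]
print(sorted(candidates(index, terms, all_match=True)))   # [5]
```

Here the common term alone would send six documents to the ranker; requiring all terms leaves one.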

Setting likeprows can also affect search performance. As Texis generates the ranks, it needs to keep track of the best likeprows answers; the lower the number, the less work it has to do. For a simple LIKEP search, you can set likeprows to the number of records actually needed (max + skip). For example, displaying results 21 to 30 (skip 20, max 10) needs a likeprows of 30.
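The bookkeeping can be pictured with a bounded min-heap, a generic top-k technique. This Python sketch only illustrates why a smaller likeprows means less work; it is not Texis's actual ranker, and the scored documents and the helper names `best_rows` and `rows_needed` are hypothetical.

```python
import heapq

def best_rows(scored_docs, likeprows):
    """Keep only the best `likeprows` (doc_id, rank) pairs, best first.

    A min-heap of at most likeprows entries holds the current best answers.
    Each new document is compared with the weakest of them, so memory is
    bounded by likeprows and each insertion costs O(log likeprows).
    """
    if likeprows <= 0:
        return []
    heap = []  # min-heap of (rank, doc_id)
    for doc_id, rank in scored_docs:
        if len(heap) < likeprows:
            heapq.heappush(heap, (rank, doc_id))
        elif rank > heap[0][0]:
            heapq.heapreplace(heap, (rank, doc_id))
    return [(doc_id, rank) for rank, doc_id in sorted(heap, reverse=True)]

def rows_needed(skip, max_rows):
    """likeprows only has to cover the rows actually displayed."""
    return skip + max_rows

# Made-up (doc_id, rank) pairs.
scored = [(101, 720), (102, 310), (103, 990), (104, 455), (105, 880)]
page = best_rows(scored, rows_needed(skip=1, max_rows=2))
print(page)      # [(103, 990), (105, 880), (101, 720)]
print(page[1:])  # rows shown after skipping 1: [(105, 880), (101, 720)]
```

The heap never holds more than `skip + max` entries, regardless of how many documents matched.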

Another setting that can be helpful is likepindexthresh. This can be used with large result sets to stop the ranker after it has ranked a certain number of documents.
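A conceptual Python sketch of an early-stopping ranker in the spirit of likepindexthresh: candidates are scored in the order they arrive, and ranking stops once `index_thresh` of them have been scored. The function name, the use of `None` for "no limit", and the arrival order are assumptions for illustration, not Texis's exact semantics. The sketch also shows the trade-off: documents past the cutoff are never considered, even if they would have ranked higher.

```python
import heapq
from itertools import islice

def rank_with_threshold(candidates, score, top_n, index_thresh=None):
    """Return the top_n candidates, scoring at most index_thresh of them.

    index_thresh=None means "no limit" in this hypothetical sketch.
    """
    if index_thresh is not None:
        candidates = islice(candidates, index_thresh)  # stop after N scored
    scored = ((score(doc), doc) for doc in candidates)
    return [doc for _, doc in heapq.nlargest(top_n, scored)]

# Made-up documents and scores, in arrival order.
docs = ["d1", "d2", "d3", "d4", "d5"]
scores = {"d1": 40, "d2": 75, "d3": 10, "d4": 95, "d5": 60}
print(rank_with_threshold(docs, scores.get, top_n=2, index_thresh=3))  # ['d2', 'd1']
print(rank_with_threshold(docs, scores.get, top_n=2))                  # ['d4', 'd2']
```

With the cutoff, only three documents are scored, but the best overall match (`d4`) is missed; that is the price of bounding ranking work on large result sets.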
