wordlist, wordcount, wordoccurrencecounts - get words and frequencies from index

SYNOPSIS

<wordlist $table [$field [$wordsOrWildcards [$options]]]>
<wordcount>
<wordoccurrencecounts>


DESCRIPTION
The wordlist function returns a list of the words in the given $table, as found in a Metamorph index. If $field is given, the first Metamorph index found on that field is used; otherwise the first Metamorph index found on any field is used. Each value of $wordsOrWildcards can be a single word, in which case only that word is returned, or a word prefix followed by "*" (asterisk), in which case only words having that prefix are returned. If $wordsOrWildcards is not given, all words are returned.
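The selection rules for $wordsOrWildcards can be modeled with a short Python sketch. This is not how Vortex implements wordlist; the function name select_words, the in-memory set standing in for the index, the exact (case-sensitive) matching, and the sorted output order are all assumptions made for illustration.

```python
def select_words(index_words, patterns=None):
    """Model of wordlist selection (hypothetical helper, not a Vortex API).

    index_words: set of words standing in for the Metamorph index.
    patterns: list of single words or prefixes ending in "*"; None or
    empty means "return all words".
    """
    if not patterns:
        # No $wordsOrWildcards given: every word is returned.
        return sorted(index_words)
    selected = set()
    for pat in patterns:
        if pat.endswith("*"):
            # Prefix wildcard: keep words that start with the prefix.
            prefix = pat[:-1]
            selected.update(w for w in index_words if w.startswith(prefix))
        elif pat in index_words:
            # Single word: returned only if it is in the index.
            selected.add(pat)
    # Sorting is an assumption, used here only for a stable result.
    return sorted(selected)


words = {"compile", "computer", "cat", "dog"}
print(select_words(words, ["comp*"]))
print(select_words(words, ["dog"]))
```

Under these assumptions, "comp*" selects compile and computer, while "dog" selects only dog.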

The following values for $options are accepted:

  • NOCOUNTS

    Do not make counts available via wordcount. This conserves memory when examining a large list whose counts are not needed.

  • db $db

    Look for $table in database $db, instead of the current database. Note that db and $db must be passed as consecutive values of a single $options argument. Added in version 7.05.1459800000 20160404.

The wordcount function returns a list of row counts, one for each corresponding word returned by the previous wordlist call, i.e. the number of rows in which each word occurs.

In version 6 and later, the wordoccurrencecounts function returns a list of the occurrence counts of the corresponding words. E.g. if a word occurs twice in each of 10 documents, its wordcount value will be 10, while its wordoccurrencecounts value will be 20. Note that word occurrence information is only stored for inverted Metamorph indexes: non-inverted indexes will return 0 or nothing for word occurrence values.
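The difference between the two counts can be reproduced with a small Python model of the example above. It is a sketch only: count_words is a hypothetical helper, and splitting on whitespace after lowercasing is an assumption that stands in for whatever the index expression actually defines as a word.

```python
from collections import Counter


def count_words(rows):
    """Return (row_counts, occurrence_counts) for a list of row texts.

    row_counts models wordcount: the number of rows a word occurs in.
    occurrence_counts models wordoccurrencecounts: total occurrences.
    """
    row_counts = Counter()
    occurrence_counts = Counter()
    for text in rows:
        # Assumed tokenization: lowercase, split on whitespace.
        tokens = text.lower().split()
        occurrence_counts.update(tokens)   # every occurrence counts
        row_counts.update(set(tokens))     # each row counts once per word
    return row_counts, occurrence_counts


# Ten rows, each containing the word "data" twice.
rows = ["data about data"] * 10
row_counts, occurrence_counts = count_words(rows)
print(row_counts["data"], occurrence_counts["data"])  # prints "10 20"
```

A word that appears once per row, such as "about" here, has the same value for both counts.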


DIAGNOSTICS
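wordlist returns a list of the words found in a Metamorph index. wordcount returns the corresponding document frequencies of those words, i.e. the number of rows each word occurs in. wordoccurrencecounts returns the corresponding occurrence (hit) counts, i.e. the total number of times each word occurs across all rows.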


EXAMPLE
This example prints a list of the words and their frequencies in the title field of the table books, sorted by ascending frequency (i.e. rarest first):

<wordlist "books" "title"><$words = $ret>
<wordcount>
<sort $ret $words>
<LOOP $ret $words>
  $words $ret
</LOOP>


CAVEATS
The wordlist and wordcount functions were added Feb. 20 1997. The NOCOUNTS option was added in March 1999.

A Metamorph index must exist on the named table/field for wordlist to work. Note that what constitutes a word, and how many words there are, is dependent on the Metamorph index, how it was created (e.g. the index expression), and when it was last updated.


Copyright © Thunderstone Software     Last updated: Oct 24 2023