THUNDERSTONE NEWS

April 2003 - Archive


NEW WEBINATOR UNPACKING FEATURES

Webinator and Texis can now automatically unpack many types of compressed files and index their contents. The most common use of this feature will be to index Zip (*.zip), gzip (*.gz), and Unix tape archive (*.tar) files. It also handles a variety of other formats if you have the appropriate unpacker or translator. Examples include Rich Text Format (*.rtf), Microsoft Help (*.hlp and *.chm), and Microsoft TNEF files (attachments).

In addition, any other format for which you have a translator or unpacker can be handled by editing the config file!

The unpacking feature is now part of the Webinator and Texis File-Format plug-in (anytotx) from version 4.3 on. Texis maintenance customers or those with Webinator paid versions 4.0+ may request a copy of the new plug-in from Tech Support. Other customers may obtain the new plug-in by upgrading Webinator or joining Texis Maintenance.


NEW TEXIS FILE CRAWLING FEATURE

Texis can now crawl both local and network-accessible files. This makes it easier to index documents that are not served by a web or FTP server. The feature is implemented as an enhancement to the Vortex <fetch> statement, which can now fetch file:// URLs, so network files can be indexed directly. For example, to get the file C:\myfile.txt, use the statement <fetch "file:///c|/myfile.txt">.

This feature is available as of Texis version 4.3. The enhancement eliminates a step some customers used in the past, involving an <exec> or <stat> of a directory listing, then reading the individual file names to insert into Texis. It also eliminates having to manually map such files to a file:// URL during the search, since the URL can be fetched and stored as-is.

The new feature is also used in the dowalk script distributed with Texis releases. This means Texis customers can use the Webinator application to crawl network directories together with http:// and other URLs, right out of the box. Customers who have only Webinator need to upgrade to Texis to take advantage of this feature.


MEET US IN WASHINGTON IN APRIL

Thunderstone will be exhibiting at FOSE, April 8-10, and at the e-gov Knowledge Management Conference, April 16, both in Washington, D.C. Please stop by our display to talk about how you use Texis, Webinator, or Vortex! (Let us know you read about it here!) Admission to the exposition hall is free for government employees at both events.


TECH CORNER: CALCULATED RANKS

With the ability in Texis to evaluate SQL expressions, you can order by a computed value. One common use is to modify the relevance ranking to include data from another field, such as date or price.

For example, John Punshon of BMW UK wanted to return results ordered such that 80% weight was given to $rank (the relevance score assigned by Texis), and 20% to the age of the document. He came up with the following Vortex code:

<$now=(convert( 'now' , 'date' ))> 
<SQL ROW SKIP=$skip MAX=10 "select id, Date,
     (($$rank+5)/10)+(7300/((($now - Date)/86400)+365)) relevance
     from content where body likep $query
     order by 3 desc">

In their experience the value of $rank was between 0 and 799, so (($$rank+5)/10) returns a value between 0 and 80. The second term converts the age of the record from seconds to days (86400 is the number of seconds in a day) and adds 365 to it. A record dated today therefore scores 7300/365 = 20, a one-year-old record scores 7300/730 = 10, and the score tapers off toward 0 as records get older. ORDER BY 3 sorts on the third field selected, which is the computed relevance value.
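
To see how the two terms of the relevance expression behave, the arithmetic can be reproduced outside Texis. The following Python sketch is only an illustration of the formula in the query above, not Thunderstone code: the function names are made up for this example, and it uses floating-point division, so Texis may return truncated values if it evaluates the expression with integer arithmetic.

```python
SECONDS_PER_DAY = 86400  # same constant as in the Vortex query


def rank_part(rank):
    """Scale a rank of roughly 0..799 down to roughly 0..80."""
    return (rank + 5) / 10


def age_part(age_seconds):
    """Age weight: 20 for a record dated today, tapering toward 0."""
    age_days = age_seconds / SECONDS_PER_DAY
    return 7300 / (age_days + 365)


def relevance(rank, age_seconds):
    """Sum of the two terms, as in the 'relevance' column of the query."""
    return rank_part(rank) + age_part(age_seconds)


if __name__ == "__main__":
    # Show how the age weight falls off for records of different ages.
    for days in (0, 365, 1095):
        print(days, "days old:", age_part(days * SECONDS_PER_DAY))
```

Changing the constants shifts the balance between the two terms. For instance, replacing 7300 with 14600 would give today's records an age weight of 40 instead of 20.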

You can use this example as a guideline for your own sorts. Please let us know of other cool examples you come up with!


Feedback, suggestions and questions are welcome to
Copyright © 2024 Thunderstone Software LLC. All rights reserved.