"We use the Thunderstone Search Appliance to crawl, index and search Word files, PDFs and other content in our law firm's internal document management system. The Appliance gives us a lot of customization options in the way it operates, with excellent control over precisely what we want to make searchable and what we don't want included. It does everything we need it to do. You can just plug it in and forget about it. It works great. After years of trouble-free performance, when we finally did have a hardware failure — Thunderstone had us quickly up and running again on the same day we received our replacement unit. Their level of customer support is almost unheard of in the I.T. industry."
Michael E. Salopek
Janik, Dorman & Winter, L.L.P.
Last time we discussed exclusions and requirements for managing which pages your crawler fetches, but one setting deserves a Tech Tips of its own: Exclude by Field. It gives you extra control over both how you exclude and exactly what gets excluded.
Rather than a prefix or substring match, Exclude by Field uses a "Metamorph query", which is the full-text matching engine used for our normal searches. You can simply type in words to match, or if you begin with a slash (/) then it is treated as a REX expression (our RegEx-like pattern matching language; see the "REX" section in the Vortex documentation on our website for more details).
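As a rough illustration of the leading-slash convention described above, the following Python sketch sorts a rule into one of the two modes. The function name classify_rule and its return value are hypothetical and are not part of the Appliance. The sketch does not evaluate Metamorph or REX syntax; it only shows how the first character selects the mode.

```python
def classify_rule(rule_text):
    """Return ("rex", expression) if the rule starts with a slash,
    otherwise ("words", list_of_terms).

    Hypothetical helper for illustration only; the Appliance does its
    own parsing of Exclude by Field rules.
    """
    text = rule_text.strip()
    if text.startswith("/"):
        # Everything after the leading slash is the pattern expression.
        return ("rex", text[1:])
    # Otherwise the rule is treated as plain words to match.
    return ("words", text.split())


print(classify_rule("internal memo"))
print(classify_rule("/draft"))
```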
All previously discussed exclusion and requirement options operate only on the URL itself. Exclude by Field lets you match against other fields as well, such as the page's HTML or Body.
Beyond more power in specifying what to match, Exclude by Field also gives you more control over what to do when you get a match.
The disadvantage of using Exclude by Field on any Field except URL is that the page must be fully fetched before the rule can be applied.
With all other exclusion rules (and Exclude by Field on URL), the URL can be thrown out before the page is fetched and processed.
When performing Exclude by Field on the content of the page, though, the page must be downloaded and fully processed before we can know if it has HTML or a Body that matches the rules specified.
When possible, it's better to use other exclusion rules or the URL target for Exclude by Field, as this will allow you to prune URLs before they are fetched. Still, there are many things that Exclude by Field can do that the other settings simply can't (as mentioned below).
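The cost difference between the two kinds of rule can be sketched in a few lines of Python. This is a simplified model, not the Appliance's crawler: the crawl function, its parameters, and the sample pages are all made up for illustration. URL rules run before the fetch, while content rules can only run after it.

```python
def crawl(urls, url_excluded, content_excluded, fetch):
    """Apply URL rules before fetching and content rules after fetching.

    All parameters are hypothetical stand-ins, not Appliance settings:
    url_excluded(url) and content_excluded(body) return True to exclude,
    and fetch(url) returns the page body.
    """
    indexed = []
    fetch_count = 0
    for url in urls:
        if url_excluded(url):
            continue  # pruned before any download happens
        body = fetch(url)  # the fetch cost is paid here
        fetch_count += 1
        if content_excluded(body):
            continue  # excluded only after the page was fetched
        indexed.append(url)
    return indexed, fetch_count


# Made-up pages for the demonstration.
PAGES = {
    "http://example.com/a": "quarterly report",
    "http://example.com/tmp/b": "scratch notes",
    "http://example.com/c": "DRAFT do not publish",
}

indexed, fetch_count = crawl(
    PAGES,
    url_excluded=lambda url: "/tmp/" in url,
    content_excluded=lambda body: "DRAFT" in body,
    fetch=PAGES.__getitem__,
)
print(indexed, fetch_count)
```

In this model the /tmp/ page is never downloaded, while the DRAFT page is downloaded and then discarded, so two of the three pages are fetched but only one is indexed.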
A perfect use for Exclude by Field is directories in a file crawl. We can't exclude directories outright, because they are what link to all the files, and without them the crawler would have nothing to follow. Still, we might not want them to show up in search results. Exclude by Field lets us do exactly that.
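The directory case can be pictured with a small Python model. It does not show how the Appliance is configured; the file_crawl function and the sample tree are hypothetical. It only illustrates the idea that directories are followed for their links but kept out of the searchable results.

```python
def file_crawl(tree, root):
    """Walk a directory tree, following directories but indexing only files.

    `tree` maps a directory path to the list of entries it links to.
    Any path that is not a key in `tree` is treated as a file.
    """
    visited = []
    searchable = []
    stack = [root]
    while stack:
        path = stack.pop()
        visited.append(path)
        if path in tree:
            # Directory: follow its links, but keep it out of the results.
            stack.extend(reversed(tree[path]))
        else:
            searchable.append(path)
    return visited, searchable


# Made-up directory listing for the demonstration.
TREE = {
    "/docs/": ["/docs/contract.pdf", "/docs/briefs/"],
    "/docs/briefs/": ["/docs/briefs/memo.doc"],
}

visited, searchable = file_crawl(TREE, "/docs/")
print(visited)
print(searchable)
```

Both directories appear in the visited list, which is how the files beneath them are found, but only the two files end up searchable.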
If you have any questions about how to use Exclude by Field, please feel free to contact Thunderstone Support and we'll discuss it.
The February 2009 issue of CRN, a publication of Everything Channel and ChannelWeb.com, recognized the "top Channel Chiefs in the industry based upon their record of business innovation and dedication to the partner community." This annual list, which CRN calls "Our definitive guide to the movers and shakers of I.T. channel management," included Frederick A. Harmon (Thunderstone's Channel Director & CSO).
You can visit the CRN website (http://www.crn.com/crn/chiefs/2009cc.jhtml?chief=136) to view pertinent information about Fred Harmon in the 2009 Channel Chiefs list.
Thunderstone's John Turnbull (President and CEO) will present a workshop session entitled "The Next Generation in Search: Today's Best Practices" on Friday, April 17, 2009 (2:00 p.m. - 3:30 p.m.) during the DigitalNow 2009 Conference at Disney's Yacht and Beach Club Resorts in Lake Buena Vista, Florida.
Search has progressed from a complex tool used by librarians, through simple tools that let users perform a keyword search, to today's information access tools, which still give users a simple interface but draw on much of an association's collective knowledge. In this workshop, participants will learn what sorts of information can sit behind a search engine and how to make it more valuable to users. The session includes a case study from IEEE, the world's largest technical membership association, which significantly improved its business by focusing on its customers and helping them access content in new ways.
DigitalNow (http://www.fusionproductions.com/digitalnow/) is an annual conference that brings together senior-level executives and volunteer leaders from some of the most influential professional and trade associations in America. Produced by Fusion Productions and Disney Institute, two of the foremost authorities in adult educational design, with input from registered attendees and a conference advisory board, DigitalNow addresses the critical issues facing association leaders in the digital age.
The AIIM International Exposition + Conference, the yearly gathering for information management professionals across industries and lines of business, will take place Monday, March 30, through Thursday, April 2, 2009, at the Pennsylvania Convention Center in Philadelphia, PA. With 19 tracks, more than 135 conference sessions featuring more than 100 real-world case studies, and an Expo floor showcasing 200+ information management technology solution providers, the event aims to provide attendees with actionable insight they can use.
REGISTER TODAY FOR YOUR FREE EXPO FLOOR PASS
and get access to all keynotes, general sessions,
Expo floor education and the co-located ON DEMAND Expo!
To receive your free pass, use Registration Code: 615M
when you register at WWW.AIIMEXPO.COM
or call +1 888 824 3004.
Your FREE pass comes to you compliments of Thunderstone Software. Please stop by and visit Fred Harmon (Channel Director & CSO) and Peter Thusat (Communication Director & CMO) at Booth 1045.
Feedback, suggestions and questions are welcome. Send your email to