Group By

Parametric Search fields can also be used without searching on them directly: they can organize the results after the search. For example, a field may have been defined to indicate a sub-classification (type, style, region, etc.) that is shared by other documents or URLs. It may then be useful for a search to produce not only the standard summary of "N results" overall, but also how many results matched in each type, with links to "drill down" further into a type. The Group By search settings can provide such clustered results, or faceted navigation.

A Field may be selected to group search results by. After every search, the search results' summary will then be expanded to include a list of each distinct value of the Field found in the results, together with a count of results for that value. Each value also contains a link, which will continue the search but for that Field value only, enabling the user to "drill down" into that type/style/region/etc.
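The per-value counts described above can be pictured with a minimal Python sketch. It only illustrates the idea and is not the appliance's implementation; the `group_by_counts` helper and the in-memory list of result dictionaries are assumptions made for the example.

```python
from collections import Counter

def group_by_counts(results, field):
    """Count results per distinct value of `field` (hypothetical helper)."""
    # Results that lack the field entirely are skipped.
    return Counter(doc[field] for doc in results if field in doc)

# Hypothetical search results, each carrying a Region field.
results = [
    {"title": "A", "Region": "East"},
    {"title": "B", "Region": "West"},
    {"title": "C", "Region": "East"},
]

counts = group_by_counts(results, "Region")
for value, count in counts.items():
    # Each line corresponds to one drill-down link in the results summary.
    print(f"{value}: {count} results")
```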

The order of results for a given Group By is determined by the "Order by" search setting: either by count or field value, ascending or descending. E.g. Count Descending will sort the highest-result-count Field value first; this is useful to show "popular" values first. Field Ascending will sort by the Field value itself; this may be useful when the Field values themselves are more significant than number of results in each.
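As an illustration of the two orderings, the following Python sketch sorts a set of value counts either by count or by field value. The `order_groups` function and its `order_by` and `descending` parameters are hypothetical names chosen for the example, not actual setting names, and the handling of ties is an assumption.

```python
def order_groups(counts, order_by="count", descending=True):
    """Sort (value, count) pairs by count or by field value (illustrative only)."""
    if order_by == "count":
        key = lambda item: item[1]   # sort on the result count
    elif order_by == "field":
        key = lambda item: item[0]   # sort on the field value itself
    else:
        raise ValueError("order_by must be 'count' or 'field'")
    return sorted(counts.items(), key=key, reverse=descending)

counts = {"Sedan": 40, "Coupe": 12, "Truck": 75}

# Count Descending: the most "popular" value comes first.
by_count = order_groups(counts, "count", descending=True)

# Field Ascending: values in their own sort order.
by_field = order_groups(counts, "field", descending=False)

print(by_count)
print(by_field)
```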

By default, the number of results for a given Group By is limited to the Results per Page value. This can be changed by entering a value in the Max field. For example, a data set with only a few distinct Style values might have Max set to 3, to only show the top 3 Styles.
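The fallback from Max to Results per Page might be sketched as follows. This is an assumption-laden illustration: `limit_groups`, `max_groups` and `results_per_page` are hypothetical names, and the input is assumed to be already ordered by the "Order by" setting.

```python
def limit_groups(ordered_groups, max_groups=None, results_per_page=10):
    """Keep at most Max values; fall back to Results per Page when Max is empty."""
    limit = results_per_page if max_groups is None else max_groups
    return ordered_groups[:limit]

# Hypothetical Style counts, already sorted Count Descending.
styles = [("Modern", 120), ("Rustic", 80), ("Classic", 45), ("Gothic", 9)]

top_styles = limit_groups(styles, max_groups=3)   # Max set to 3
default_styles = limit_groups(styles)             # Max empty: Results per Page applies

print(top_styles)
```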

Up to 50 Group By fields may be specified, e.g. if there are other attributes to be grouped by. For example, a data set might be grouped by both Region and Price. However, each additional Group By field may increase search time somewhat.

To speed up searches when a large number of Group By fields are set, or simply to "unclutter" the output, Max Group Bys may be set. This is the maximum number of Group Bys to actually perform; if empty, it defaults to all Group Bys set (excluding those already in an exact-match infield query, i.e. already drilled down into). Only the top Max Group Bys are performed; the rest are still listed in the output, but collapsed, with links to expand them if the user desires. For example, for an automobile search, Group Bys might be set on Vehicle Type, Make, Model, Year, Price Range, Color, Mileage and Warranty. Expanding all of those attributes for every search might produce too much output for the user. With Max Group Bys set to 3, only the top 3 Group Bys are performed; the rest have links for the user to expand. The output is thus less cluttered (and the search faster), yet the user can still expand and drill down on any of the attributes. Once the user drills down on a specific item in an expanded Group By, that Group By is no longer shown (because all results are that item), and lower-priority Group Bys are expanded in its place.
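The selection of which Group Bys are performed and which are left collapsed can be sketched in Python. This is only a model of the behavior described here: `plan_group_bys` is a hypothetical helper, and it assumes that priority follows the order in which the Group By fields are listed.

```python
def plan_group_bys(group_by_fields, drilled_down, max_group_bys=None):
    """Split configured Group By fields into (expanded, collapsed) lists."""
    # Fields already drilled down into are not grouped again.
    candidates = [f for f in group_by_fields if f not in drilled_down]
    if max_group_bys is None:
        # Max Group Bys empty: perform every remaining Group By.
        return candidates, []
    return candidates[:max_group_bys], candidates[max_group_bys:]

fields = ["Vehicle Type", "Make", "Model", "Year",
          "Price Range", "Color", "Mileage", "Warranty"]

# Initial search with Max Group Bys set to 3.
expanded, collapsed = plan_group_bys(fields, drilled_down=set(), max_group_bys=3)

# After the user drills down on a Make value, Year moves up into the expanded set.
expanded_after, collapsed_after = plan_group_bys(fields, {"Make"}, max_group_bys=3)

print(expanded, collapsed)
print(expanded_after, collapsed_after)
```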

Setting a Group By field will produce extra elements in the XML output version; see the section on XML Elements in Search Results for details.

Another way to speed up searches with large numbers of results and Group By fields is to set Max Docs to Group By. This is the maximum number of search result documents to pass to Group By; the default (i.e. if empty) is to pass all search results. For example, if a query generates a 500,000 document result set, all 500,000 documents must be grouped, which may take some time. If Max Docs to Group By is set to 100,000, then only the first 100,000 documents are grouped, and the counts are scaled up to an estimate for the entire 500,000 document result set. This saves search time, at the expense of less accurate Group By counts and potentially missing groups (e.g. groups that do not occur at all in the first 100,000 documents).

Note that if another query returns fewer than Max Docs to Group By results, then all the results are passed to Group By, and the counts are accurate and complete. Max Docs to Group By thus serves as a threshold to balance timely searches against accurate counts, and only kicks in on "noisy" queries above the threshold. When it takes effect, the countIsEstimate attribute of the XML element <GroupByResults> is set to Y in the XML output (if selected), to indicate that counts are estimated.
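The threshold behavior and the scaled-up estimate can be illustrated with a small Python sketch. It is a rough model only: `estimated_group_counts` is a hypothetical function, the linear scaling and rounding are assumptions about how an estimate could be derived, and the returned flag merely mirrors the role of the countIsEstimate attribute.

```python
from collections import Counter

def estimated_group_counts(results, field, max_docs=None):
    """Return (counts, is_estimate) for `field`, grouping at most `max_docs` results."""
    total = len(results)
    if max_docs is None or total <= max_docs:
        # At or below the threshold: group everything, counts are exact.
        exact = Counter(doc[field] for doc in results if field in doc)
        return dict(exact), False
    # Above the threshold: group only the first max_docs results...
    sample = Counter(doc[field] for doc in results[:max_docs] if field in doc)
    # ...and scale each count up to the size of the full result set.
    scale = total / max_docs
    return {value: round(count * scale) for value, count in sample.items()}, True

regions = ["East", "East", "West", "East", "North", "East", "West", "East"]
results = [{"Region": r} for r in regions]

# Threshold of 4 on 8 results: counts are estimates, and North is missed
# because it does not occur in the first 4 results.
estimate, is_estimate = estimated_group_counts(results, "Region", max_docs=4)

# Threshold above the result count: counts are exact and complete.
exact, exact_flag = estimated_group_counts(results, "Region", max_docs=100)

print(estimate, is_estimate)
print(exact, exact_flag)
```

In this toy data the estimate reports 6 results for East where the exact count is 5, and omits North entirely, which is the accuracy trade-off described above.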


Copyright © Thunderstone Software     Last updated: Oct 10 2023