Data from Field

Syntax: REX expression, Replace expression, field to search, where to store it

This provides an alternate means of setting both the HTML fields (Modified, Title, Description, etc.) and any Additional Fields. It allows page information to be taken from non-default places by searching and optionally replacing the data. New blank rows are provided as rows are used. See below for examples.

REX Search - Allows you to specify a REX expression to narrow down what contents of the From Field will be used. Leave it empty to use the entire field. See here for details on REX search syntax.

Note that a REX Search must be specified for the following From field types:

  • HTML

  • HTML, raw output

  • Text
You can specify the entire field for these by using .+ as the REX Search.

Replace - Can be used to specify a subset of the value to be stored in the To Field (or a subset of the match, if REX Search is used). See here for details on REX replace syntax.

From Field - specifies what the source field is for the data.

  • HTML - the raw HTML source of the page. After matching, HTML tags are removed and HTML entities are resolved.

  • HTML, raw output - the raw HTML source of the page. Content is left as-is, with tags in place.

  • Text - the text of the page, after HTML rendering has been applied.

  • Title - the HTML title of the page

  • All Meta - the contents of all HTML <meta> headers - name, http-equiv, property, itemprop (but see From Meta Field footnote) - and HTTP headers specified in the document.

  • Meta Field -> - the contents of a specific <meta>/HTTP field, specified in the next input box, From Meta Field.

  • Keywords - the contents of the Keywords and/or Keyword meta field.

  • Description - the contents of the Description and/or Subject meta field.

  • Mime Type - the MIME type of the page. This may have been derived from the Content-Type header, a <meta http-equiv> tag, or the URL extension, depending on what is available.

  • URL - the URL of the page.

  • URL Decoded - the decoded version of the URL. Any %XX 'URL-safe' sequences in the URL are replaced with their real characters. E.g. Pre%20%2D%20Expense%20Report.doc is decoded into Pre - Expense Report.doc.

  • URL Protocol - the URL's protocol, e.g. http.

  • URL Host - the host (without port number) from the URL.

  • URL Host and Port - the host (and port number if given) from the URL.

  • URL Path - the file path from the URL.

  • URL Path Decoded - the file path from the URL, URL-decoded.

  • URL Anchor - the anchor from the URL (if any), i.e. the part after the # (pound sign). May not be available if already stripped.

  • URL Query - the query string from the URL (if any), i.e. the part after the ? (question mark).

  • URL Query Var -> - the value of the URL query-string variable named in From Meta Field, URL-decoded.

  • Referrer's Data - the value of a referring page's field. Store refs is required for this. The field selected will be the same field being populated.
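
To illustrate how the URL-based From Field types listed above relate to one another, here is a minimal Python sketch that splits a URL into comparable pieces using the standard urllib.parse module. It is only an approximation for illustration, not the Appliance's own implementation: the function names url_source_fields and url_query_var and the sample URL are hypothetical, and edge cases (such as how + is treated in query values) may differ from what the Appliance does.

```python
from urllib.parse import urlsplit, unquote, parse_qs


def url_source_fields(url: str) -> dict:
    """Approximate the URL-derived From Field values for one URL.

    Hypothetical helper for illustration; keys mirror the From Field names.
    """
    parts = urlsplit(url)
    return {
        "URL": url,
        # %XX sequences replaced with their real characters
        "URL Decoded": unquote(url),
        "URL Protocol": parts.scheme,
        # host only, without the port number
        "URL Host": parts.hostname or "",
        # host plus port number, if one was given
        "URL Host and Port": parts.netloc,
        "URL Path": parts.path,
        "URL Path Decoded": unquote(parts.path),
        # the part after '#', empty if absent or already stripped
        "URL Anchor": parts.fragment,
        # the part after '?', empty if absent
        "URL Query": parts.query,
    }


def url_query_var(url: str, name: str) -> str:
    """Approximate 'URL Query Var ->': the decoded value of one query variable.

    Returns the first value if the variable repeats, or "" if it is missing.
    """
    values = parse_qs(urlsplit(url).query).get(name, [])
    return values[0] if values else ""


# Sample URL (hypothetical) reusing the file name from the URL Decoded entry.
SAMPLE = ("http://www.example.com:8080/docs/"
          "Pre%20%2D%20Expense%20Report.doc?id=42&lang=en#top")
fields = url_source_fields(SAMPLE)
```

With the sample URL, fields["URL Host"] is www.example.com, fields["URL Host and Port"] is www.example.com:8080, fields["URL Path Decoded"] is /docs/Pre - Expense Report.doc, and url_query_var(SAMPLE, "id") is 42.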

From Meta Field - If Meta Field -> or URL Query Var -> is given as the From Field, this field is used to specify which meta field's or query var's contents to use as data. Leave blank otherwise.

Entering text in this field will force the use of Meta Field -> if From Field is set to anything besides Meta Field -> or URL Query Var ->.

To Field - specifies where information should be stored.

  • Modified, Title, Description, Keywords, Depth, and Body - Override the standard fields extracted from the content.

  • Authorization URL - Populates the URL used when checking this result for Results Authorization. Please see the Allow Authorization URL section (3.6.58) for more details.

  • Category - To populate the category via Data From Field, all the possible category names must be entered in the Category setting. Using one or more Data From Field rules to set Category will cause the Appliance to ignore the Categories' URL Patterns and instead set category membership based on these Data From Field rules.

    Note: due to the way categories are stored, if categories are added, reordered, or removed after content has been walked, then a New walk will need to be performed to update the content's categories. Renaming categories does not need a rewalk.

  • Additional Links - This target allows you to use Data From Field to create links that will be walked. These links are subject to the normal indexing rules; for example, they will be rejected if they match exclusions.

    Use of this Data From Field target has no effect on the existing links found on the current URL. The links generated by this target will be added to the standard set of links on the page.

  • Subfetch - This causes the Search Appliance to take the value(s) it finds and perform a fetch of them as URL(s). Each URL can be absolute, or relative to the current URL.

    Nothing is changed by the subfetch itself, but any further Data From Field rules will use the fetched document(s) as the source of their content. Please see the Subfetch example below for a situation where this could be used.

  • Additional Fields - If this profile has any Additional Fields, they will be available as a target To Field.

    If you just added the name of a new Additional Field, you will need to hit Update for the new Additional Field to appear in the To Field list.
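
The Subfetch target above accepts URLs that are either absolute or relative to the current URL. The following Python sketch shows one conventional way such values can be resolved, using urljoin from the standard library. It is an illustration under assumptions, not the Appliance's actual resolution logic; the function name resolve_subfetch_urls and the example URLs are hypothetical.

```python
from urllib.parse import urljoin


def resolve_subfetch_urls(current_url: str, values: list) -> list:
    """Resolve extracted values against the URL of the page being walked.

    Absolute URLs are returned unchanged; relative ones are resolved
    against current_url. Blank values are skipped.
    """
    resolved = []
    for value in values:
        value = value.strip()
        if value:
            resolved.append(urljoin(current_url, value))
    return resolved


# Hypothetical page URL and extracted values.
page = "http://www.example.com/products/item42.html"
targets = resolve_subfetch_urls(page, [
    "specs/item42.xml",                  # relative to the current directory
    "/data/item42.json",                 # relative to the site root
    "http://cdn.example.org/item42.pdf", # already absolute
    "   ",                               # blank, ignored
])
```

Here targets contains http://www.example.com/products/specs/item42.xml, http://www.example.com/data/item42.json, and http://cdn.example.org/item42.pdf, in that order.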

Append - If set to Y, then the Data From Field content will be appended to the field's existing data instead of overwriting it. Date-type targets, such as Modified, do not support Append.



Copyright © Thunderstone Software     Last updated: Nov 8 2024