Data From Field

 

Syntax: REX expression, Replace expression, field to search, where to store it

This setting provides an alternate means of setting both the HTML fields (Modified, Title, Description, etc.) and any Additional Fields. It allows page information to be taken from non-default places by searching and optionally replacing the data. New blank rows are provided as rows are used. See below for examples.

REX Search - Allows you to specify a REX expression to narrow down what contents of the From Field will be used. Leave it empty to use the entire field.

Note that a REX Search MUST be specified for the following From field types:

  • HTML

  • HTML, raw output

  • Text
You can specify the entire field for these by using .+ as the REX Search.

Replace - Replace can be used to specify a subset of the value to be stored in the To field (or a subset of the match, if you're using REX Search). It uses sandr replacement string syntax.
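
As a rough illustration of how REX Search and Replace combine, the sketch below uses Python's re module as a stand-in. Python regular expressions are not REX, and the \1 template is not sandr syntax. The function name extract, the sample page, and the choice to return the whole match when Replace is empty are all assumptions made for illustration, not Webinator's behavior.

```python
import re

def extract(source, search=None, replace=None):
    """Return the value a rule might store, or None when the search fails.

    Hypothetical helper: Python regex stands in for REX, and a Python
    replacement template stands in for a sandr replacement string.
    """
    if not search:
        # An empty search means the entire source field is used.
        return source
    match = re.search(search, source)
    if match is None:
        return None
    if replace:
        # Keep only the part of the match named by the template.
        return match.expand(replace)
    return match.group(0)

# Sample page content, invented for this example.
page = '<h1>Quarterly Report</h1><span class="author">By Jane Doe</span>'

author = extract(page, r'class="author">By ([^<]+)', r"\1")
print(author)
```

Here the search narrows the source to the author markup, and the replacement keeps only the captured name from that match.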

From Field - specifies what the source field is for the data.

  • HTML - the raw HTML source of the page. After matching, HTML tags are removed and HTML entities are resolved.

  • HTML, raw output - the raw HTML source of the page. Content is left as-is, with tags in place.

  • Text - the text of the page, after HTML rendering has been applied.

  • Title - the HTML title of the page

  • All Meta - the contents of all meta headers specified in the HTML page.

  • Meta Field -> - the contents of a specific meta field, specified in the next input box, From Meta Field.

  • Keywords - the contents of the keywords meta header.

  • Description - the contents of the description meta header.

  • Mime Type - the MIME type of the page. This may have been derived from the Content-Type header, a <META HTTP-EQUIV> tag, or the URL extension, depending on what is available.

  • URL - the URL of the page.

  • URL Decoded - the decoded version of the URL. Any %XX 'URL-safe' sequences in the URL are replaced with their real characters. E.g. Pre%20%2D%20Expense%20Report.doc is decoded into Pre - Expense Report.doc.

  • URL Protocol - the URL's protocol, e.g. http.

  • URL Host - the host (without port number) from the URL.

  • URL Host and Port - the host (and port number if given) from the URL.

  • URL Path - the file path from the URL.

  • URL Path Decoded - the file path from the URL, URL-decoded.

  • URL Anchor - the anchor from the URL (if any), i.e. the part after the # (pound sign). May not be available if already stripped.

  • URL Query - the query string from the URL (if any), i.e. the part after the ? (question mark).

  • URL Query Var -> - the value of the URL query-string variable named in From Meta Field, URL-decoded.

  • Referrer's Data - the value of a referring page's field. Store refs is required for this. The field selected will be the same field being populated.
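
The URL-based From Field types above can be approximated with Python's standard urllib.parse module. This is only a sketch of what each type would contain for one sample URL. The function name url_fields and the example.com URL are hypothetical, and Webinator's own parsing may differ in edge cases such as letter case or embedded credentials.

```python
from urllib.parse import parse_qs, unquote, urlsplit

def url_fields(url, query_var=None):
    """Split a URL into values resembling the URL-based From Field types."""
    parts = urlsplit(url)
    # Note: urlsplit lowercases the hostname and drops any user:password@ part.
    host = parts.hostname or ""
    fields = {
        "URL": url,
        "URL Decoded": unquote(url),
        "URL Protocol": parts.scheme,
        "URL Host": host,
        "URL Host and Port": f"{host}:{parts.port}" if parts.port else host,
        "URL Path": parts.path,
        "URL Path Decoded": unquote(parts.path),
        "URL Anchor": parts.fragment,
        "URL Query": parts.query,
    }
    if query_var is not None:
        # parse_qs URL-decodes the values; take the first one if present.
        values = parse_qs(parts.query).get(query_var, [])
        fields["URL Query Var ->"] = values[0] if values else ""
    return fields

example = url_fields(
    "http://www.example.com:8080/docs/Pre%20%2D%20Expense%20Report.doc"
    "?id=42&name=Q1%20Report#summary",
    query_var="name",
)
for name, value in example.items():
    print(f"{name}: {value}")
```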

From Meta Field - If Meta Field -> or URL Query Var -> is given as the From Field, this field is used to specify which meta field's or query var's contents to use as data. Leave blank otherwise.

Entering text in this field will force the use of Meta Field -> if From Field is set to anything besides Meta Field -> or URL Query Var ->.

To Field - specifies where information should be stored.

  • Modified, Title, Description, Keywords, Depth, and Body - Override the standard fields extracted from the content.

  • Authorization URL - Populates the URL used when checking this result for Results Authorization. Please see the Allow Authorization URL section (4.6.55) for more details.

  • Category - To populate the category via Data From Field, all the possible category names must be entered in the Category setting. Using one or more Data From Field rules to set Category will cause Webinator to ignore the Categories' URL Patterns and instead set category membership based on these Data From Field rules.

    Note: due to the way categories are stored, if categories are added, reordered, or removed after content has been walked, then a New walk will need to be performed to update the content's categories. Renaming categories does not need a rewalk.

  • Subfetch - This causes Webinator to take the value(s) it finds and fetch them as URL(s). Each URL can be absolute, or relative to the current URL.

    Nothing is changed by the subfetch itself, but any further Data From Field rules will use the fetched document(s) as the source of their content. Please see the Subfetch example below for a situation where this could be used.

  • Additional Fields - If this profile has any Additional Fields, they will be available as a target To Field.

    If you just added the name of a new Additional Field, you will need to hit Update for the new Additional Field to appear in the To Field list.

    Additional Fields are supported in the full Texis product, but not Webinator-only.

Append - If set to Y, then the Data From Field content will be appended to the field's existing data instead of overwriting it. Date-type targets, such as Modified, do not support Append.
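
The constraints described in this section can be summarized in a small checking routine. The sketch below is not Webinator code: the Rule class, the normalize function, and the wording of the messages are hypothetical, and the set of date-type targets lists only Modified because that is the only one named above.

```python
from dataclasses import dataclass, replace

# From Field types that must have a REX Search.
SEARCH_REQUIRED = {"HTML", "HTML, raw output", "Text"}
# From Field types that take their name from From Meta Field.
NAMED_SOURCES = {"Meta Field ->", "URL Query Var ->"}
# Date-type targets; only Modified is named in the text (assumption).
DATE_TARGETS = {"Modified"}

@dataclass
class Rule:
    """One Data From Field row (hypothetical representation)."""
    search: str = ""
    replace: str = ""
    from_field: str = "HTML"
    from_meta_field: str = ""
    to_field: str = "Title"
    append: bool = False

def normalize(rule):
    """Return (adjusted_rule, problems) for one rule."""
    problems = []
    # Text in From Meta Field forces the use of Meta Field ->.
    if rule.from_meta_field and rule.from_field not in NAMED_SOURCES:
        rule = replace(rule, from_field="Meta Field ->")
    if rule.from_field in SEARCH_REQUIRED and not rule.search:
        problems.append(f"{rule.from_field} requires a REX Search (use .+ for all)")
    if rule.append and rule.to_field in DATE_TARGETS:
        problems.append(f"{rule.to_field} does not support Append")
    return rule, problems

checked, issues = normalize(Rule(from_field="Title", from_meta_field="author"))
print(checked.from_field, issues)
```

A rule such as Rule(from_field="Text") with no search would be reported as a problem, while the same rule with .+ as the search would pass.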



Copyright © Thunderstone Software     Last updated: Mar 7 2019