Data From Field Example - Subfetch to use PDF Contents for a Web Page

Subfetches let you use content from other URLs to populate the current URL's record. For example, a site might publish articles, where each article has a web page describing it and a link to a PDF of the article itself. We'd like searches that match the article's contents to lead to the web page, not to the article PDF.

If the web page has a meta header called "pdfLink" with a URL to the article PDF, we can use the body of the PDF as a replacement for the web page's body with two Data from Field rules like this:

First Data from Field rule:

  • REX Search: (Empty)

  • Replace: (Empty)

  • From Field: Meta Field ->

  • From Meta Field: pdfLink

  • To Field: Subfetch

Second Data from Field rule:

  • REX Search: .+

  • Replace: (Empty)

  • From Field: Text

  • From Meta Field: (Empty)

  • To Field: Body

The first rule, the Subfetch Data from Field rule, fetches the URL specified in the pdfLink header. This retrieves the PDF, but it doesn't change the record on its own. The second rule then takes the PDF's text output and uses it as the Body of the current web page.
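
To make the data flow concrete, here is a minimal conceptual sketch in Python of what the two rules accomplish together. It is not how the product implements them: the names MetaCollector, apply_rules and fetch_text are hypothetical, the example URL is made up, and the sketch assumes that REX Search ".+" simply passes the whole non-empty text through and that a page without a pdfLink header keeps its own body.

```python
from html.parser import HTMLParser
from typing import Callable, Dict


class MetaCollector(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from a page (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name, content = attr_map.get("name"), attr_map.get("content")
        if name and content:
            self.meta[name] = content


def apply_rules(page_html: str, page_body: str,
                fetch_text: Callable[[str], str]) -> Dict[str, str]:
    """Model the two Data from Field rules on one web page record.

    fetch_text is a stand-in for "fetch this URL and return its extracted
    text"; it is an assumption of this sketch, not a real API.
    """
    collector = MetaCollector()
    collector.feed(page_html)
    record = {"Body": page_body}

    # Rule 1: From Field = Meta Field (pdfLink), To Field = Subfetch.
    # This only retrieves the PDF; the record is not changed yet.
    subfetch_text = None
    pdf_url = collector.meta.get("pdfLink")
    if pdf_url:
        subfetch_text = fetch_text(pdf_url)

    # Rule 2: From Field = Text, To Field = Body.
    # Assumption: ".+" matches any non-empty text, so all of it is copied.
    if subfetch_text:
        record["Body"] = subfetch_text
    return record


# Usage with a stub fetcher instead of a real network request.
page = ('<html><head><meta name="pdfLink" '
        'content="https://example.com/articles/a.pdf"></head>'
        '<body>Short summary</body></html>')
record = apply_rules(page, "Short summary",
                     lambda url: "Full article text from " + url)
print(record["Body"])
```

The record still belongs to the web page's URL; only its Body now holds the PDF's text, which is why a search matching the article contents leads to the web page.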


Copyright © Thunderstone Software     Last updated: Nov 8 2024