14.2 Detailed Page Info

Once a Web document is fetched, Vortex automatically processes it for additional information. Several calls exist to provide this information about the most recently fetched document:

  • <urllinks>
    We can obtain the URL links from the page with <urllinks>. These are returned fully qualified, i.e. relative links have the host name prepended, so they can be passed directly to <fetch>.

  • <urltext>
    Often we want the plain, formatted text that the user sees, without any HTML tags. This is useful for text indexing, and it is returned by the <urltext> function.

  • <urlinfo title>
    This returns the title of the document, if any.

  • <urlinfo contenttype>
    This returns the MIME Content-Type of the document. For example, if we really fetched a GIF, it would be "image/gif".

  • <urlinfo metaname Description>
    This returns the value of the <META NAME=Description> tag in the document, if any. It is useful for adding a summary of the page to a search database.

  • <urlinfo errmsg>
    This gives us the error message from the last fetch, for example "Document not found" if the Web server couldn't find the page.

There are more options available; see the Vortex manual on <urlinfo>.
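
The calls in the list above can be combined right after a fetch. The following is a minimal sketch, not a definitive script: it assumes each call leaves its result in $ret (the usual Vortex convention for function return values), that the script is wrapped in the <script language=vortex> form, and it uses http://www.example.com/ as a placeholder URL. The variable names ($err, $type, $title, $desc, $text, $links) are illustrative only.

```vortex
<script language=vortex>
<a name=main>
  <!-- Placeholder URL, chosen for illustration only -->
  <fetch "http://www.example.com/">

  <!-- Save each result at once, on the assumption that every call overwrites the return variable -->
  <urlinfo errmsg>
  <$err = $ret>
  <urlinfo contenttype>
  <$type = $ret>
  <urlinfo title>
  <$title = $ret>
  <urlinfo metaname Description>
  <$desc = $ret>
  <urltext>
  <$text = $ret>
  <urllinks>
  <$links = $ret>

  Fetch message: $err
  Content-Type: $type
  Title: $title
  Description: $desc

  <!-- The links are fully qualified, so each could be handed to another fetch -->
  Links found:
  <loop $links>
    $links
  </loop>
</a>
</script>
```

Copying $ret into a named variable after every call keeps the values from being overwritten by the next call. In an indexing script, $text, $title and $desc would typically be the values stored in the search database.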

Copyright © 2024 Thunderstone Software LLC. All rights reserved.