refInfoGetProcessedDocOffset

SYNOPSIS

int64 refInfoGetProcessedDocOffset(refInfo ref)

Parameters:

  • ref - A refInfo object

Returns:

  • The int64 character offset of the reference's element in the processed document (HTML)


DESCRIPTION
The refInfoGetProcessedDocOffset function returns the character (not byte) offset, as an int64, of the reference's element in the processed document. The processed document is the HTML that was actually processed: the downloaded HTML plus any dynamic HTML, in UTF-8. The offset is counted in characters from the start of the processed document to the start of the reference's opening tag.


EXAMPLE

<urlinfo processeddoc><$html = $ret>
  <$offset = (refInfoGetProcessedDocOffset($ref))>
  <$length = (refInfoGetProcessedDocLength($ref))>
  <substr $html $offset $length>
  Reference's HTML: $ret


CAVEATS
Note that the returned offset is in characters, not bytes. Thus the offset is typically correct for use with <substr> (with the default stringcomparemode) on <urlinfo processeddoc>, which is UTF-8 whenever possible.

The offset may be -1 (unavailable) in some instances, e.g. if the reference was generated by JavaScript.
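
Because the offset can be -1, it may be worth checking it before passing it to <substr>. The following is a minimal sketch of such a guard, not a definitive implementation. It assumes that $ref already holds a refInfo object obtained elsewhere, that the <if> comparison with lt behaves as a numeric less-than test here, and that the length is usable whenever the offset is; the "(offset unavailable)" message is purely illustrative.

  <!-- Assumption: $ref was set earlier to a refInfo object -->
  <$offset = (refInfoGetProcessedDocOffset($ref))>
  <if $offset lt 0>
    Reference's HTML: (offset unavailable)
  <else>
    <urlinfo processeddoc><$html = $ret>
    <$length = (refInfoGetProcessedDocLength($ref))>
    <substr $html $offset $length>
    Reference's HTML: $ret
  </if>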


SEE ALSO
refInfoGetRawDocOffset


Copyright © Thunderstone Software     Last updated: Apr 15 2024