refInfoGetRawDocOffset

SYNOPSIS

int64 refInfoGetRawDocOffset(refInfo ref)

Parameters:

  • ref - A refInfo object

Returns:

  • The int64 character offset of the reference's element in the raw document (HTML)


DESCRIPTION
The refInfoGetRawDocOffset function returns the int64 character (not byte) offset of the reference's element in the raw document. The raw document is the original static HTML as downloaded, after any transfer or content encodings have been decoded. The offset is the number of characters from the start of the document to the opening tag of the reference's element.
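To illustrate what the offset counts, here is a minimal Python sketch. It is not Vortex and not a Thunderstone API. It assumes a gzip content encoding and a known charset, rebuilds the "raw document" text, and slices a reference's HTML by character offset and length, much as the Vortex example below does with <substr>. The helpers `raw_doc_text` and `ref_html` are hypothetical names.

```python
import gzip

def raw_doc_text(body: bytes, content_encoding: str, charset: str) -> str:
    """Hypothetical helper: rebuild the raw document text from a response body."""
    # Undo the content encoding first (this sketch only handles gzip).
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    # Then decode bytes to characters; the offset counts these characters.
    return body.decode(charset)

def ref_html(doc: str, offset: int, length: int) -> str:
    """Hypothetical helper: character-based slice, like <substr $html $offset $length>."""
    return doc[offset:offset + length]

# Usage: a gzip-compressed UTF-8 page containing one link.
page = '<p>café</p><a href="x">link</a>'
doc = raw_doc_text(gzip.compress(page.encode("utf-8")), "gzip", "utf-8")
offset = doc.find("<a")  # character offset of the link's opening tag
snippet = ref_html(doc, offset, len('<a href="x">link</a>'))
```

Here `offset` is 11, because the 4-character "café" counts as 4 characters even though it occupies 5 bytes in UTF-8.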


EXAMPLE

<urlinfo rawdoc><$html = $ret>
  <$offset = (refInfoGetRawDocOffset($ref))>
  <$length = (refInfoGetRawDocLength($ref))>
  <substr $html $offset $length>
  Reference's HTML: $ret


CAVEATS
Note that the returned offset is in characters, not bytes. Therefore, when using <substr>, you may need to supply a $mode argument that matches the raw document's character set, e.g. ISO-8859-1 if the document is ISO-8859-1 rather than UTF-8.
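The following Python sketch illustrates the pitfall this caveat describes. It is a conceptual illustration, not Vortex code. If a character offset is used directly as a byte index into a UTF-8 buffer containing multi-byte characters, the slice lands in the wrong place. Converting the offset by encoding the preceding characters in the document's charset fixes this. `char_to_byte_offset` is a hypothetical helper.

```python
def char_to_byte_offset(doc: str, char_offset: int, charset: str) -> int:
    """Byte offset = encoded size of all characters before the element."""
    return len(doc[:char_offset].encode(charset))

doc = '<p>café</p><a>x</a>'
char_off = doc.find("<a")  # 11 characters in
utf8 = doc.encode("utf-8")

# Wrong: indexing UTF-8 bytes with a character offset is off by one here,
# because "é" occupies two bytes in UTF-8.
wrong = utf8[char_off:char_off + 2]
# Right: convert to a byte offset for the document's charset first.
byte_off = char_to_byte_offset(doc, char_off, "utf-8")
right = utf8[byte_off:byte_off + 2]
```

In ISO-8859-1 every character is one byte, so the character and byte offsets coincide. That is why the matching $mode matters for <substr>.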

The offset may be -1 (unavailable) in some instances, e.g. if the reference was generated by JavaScript.
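Because -1 is a valid index in many languages, an unchecked -1 can silently yield the wrong text rather than an error. This Python sketch guards against it. Treating a negative length as "unavailable" too is an assumption of this sketch, not documented behavior of refInfoGetRawDocLength. `ref_html_or_none` is a hypothetical name.

```python
from typing import Optional

def ref_html_or_none(doc: str, offset: int, length: int) -> Optional[str]:
    """Return the reference's HTML, or None when the offset is unavailable."""
    # -1 means unavailable (e.g. a JavaScript-generated reference).
    # Without this check, doc[-1:...] would slice from the end of the document.
    # Assumption: a negative length is treated as unavailable as well.
    if offset < 0 or length < 0:
        return None
    return doc[offset:offset + length]
```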


SEE ALSO
refInfoGetProcessedDocOffset


Copyright © Thunderstone Software     Last updated: Oct 24 2023
Copyright © 2024 Thunderstone Software LLC. All rights reserved.