<pdfxml $query $Body [options ...]>
The pdfxml function enables Metamorph hit markup to be viewed
in a PDF (Adobe Acrobat) file. It executes the Metamorph
$query on a PDF document ($Body) and returns the
information needed by a user's PDF browser plugin to mark up the hits.
The $Body must be the exact text returned by the PDF
plugin for Thunderstone's Webinator or Texis. If it is modified
in any way, character and word counts may be off, causing incorrect
highlighting information to be sent to the browser.
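The sensitivity to edits can be illustrated with a toy model that locates a hit by its character offset and by the number of words before it. This is a minimal Python sketch that assumes nothing about how pdfxml actually computes its offsets; the helper hit_position and the sample strings are hypothetical. It shows that collapsing one double space and dropping one word shifts both counts, so an offset computed on the exact text points at the wrong characters in the edited text.

```python
def hit_position(text: str, term: str) -> tuple[int, int]:
    """Return (character offset, word index) of the first occurrence of term.

    Hypothetical toy model of hit location, not pdfxml's real algorithm.
    """
    char_offset = text.find(term)
    if char_offset < 0:
        raise ValueError(f"{term!r} not found")
    # Word index = number of whitespace-separated words before the hit.
    word_index = len(text[:char_offset].split())
    return char_offset, word_index


exact = "Annual  report for the fiscal year"  # as returned by the plugin (double space)
edited = "Annual report for fiscal year"      # space collapsed, "the" removed

print(hit_position(exact, "fiscal"))   # (23, 4)
print(hit_position(edited, "fiscal"))  # (18, 3)

# Applying the offset computed on the exact text to the edited text
# highlights the wrong characters.
start, _ = hit_position(exact, "fiscal")
print(repr(edited[start:start + len("fiscal")]))  # 'l year'
```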
Options that may be specified are:
$color
    The color to show highlighted terms in. Must be an RGB color of the form
    #RRGGBB (per the Adobe specification).
words
    Flag indicating that the hit markup should use word mode. Mutually exclusive
    with characters mode. This is the default, and should be used with anytotx
    plugins that use the "Adobe Acrobat TK" library (anytotx --identify does not
    show a pdf:version; prior to version 4.02.1038324681 20021126). Added in
    version 4.00.999800000 20010906.
characters
    Flag indicating that the hit markup should use character mode. Mutually
    exclusive with words mode. This mode should be used with anytotx plugins
    that use the XPDF library (anytotx --identify shows a pdf:version; after
    version 4.02.1038324681 20021126). Added in version 4.00.999800000 20010906.
active
    Indicates that the browser's PDF viewer should jump to the first match when
    it displays the document. Mutually exclusive with passive. This is the
    default. Added in version 4.00.999800000 20010906.
passive
    Indicates that the browser's PDF viewer should not jump to the first match
    when it displays the document. Mutually exclusive with active. Added in
    version 4.00.999800000 20010906.
showhits
    Indicates that the matching terms should be included in the XML output as
    comments. This is primarily for debugging and should generally not be used
    in a production environment. Added in version 4.00.999800000 20010906.
startpg $pg
    Specifies the page number at which the $Body document actually starts. The
    default is 0 (the first page), i.e. $Body is the complete document. This
    option keeps the browser plugin in sync when partial-document $Body
    arguments (e.g. single pages) are used. For example, if $Body actually
    starts at the third page of the original document, use startpg 2 (pages
    are counted from 0). Added in version 5.00.1092761457 20040817.
charset $charset
    Specifies the character set of $Body. The default is ISO-8859-1. If $Body
    uses a multi-byte character set such as UTF-8, highlighting offsets may be
    wrong unless the character set is specified with this option. Note that
    this is the character set of $Body, not necessarily that of the original
    PDF. Added in version 5.01.1104778576 20050103.
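The charset note comes down to the difference between bytes and characters. In ISO-8859-1 every character is one byte, but in UTF-8 a character such as "é" takes two bytes, so an offset counted in bytes drifts from one counted in characters by one position for each such character. The Python sketch below shows only that general effect with a made-up sample string; it does not reproduce pdfxml's internal offset handling.

```python
# Compare a character offset with byte offsets for the same text.
text = "Résumé of café results"
term = "results"

char_offset = text.find(term)  # counted in characters
latin1_offset = text.encode("iso-8859-1").find(term.encode("iso-8859-1"))
utf8_offset = text.encode("utf-8").find(term.encode("utf-8"))

print("characters:      ", char_offset)    # 15
print("ISO-8859-1 bytes:", latin1_offset)  # 15: one byte per character
print("UTF-8 bytes:     ", utf8_offset)    # 18: each "é" takes two bytes
```

The three "é" characters before the term account for the 3-byte drift in UTF-8, which is the kind of mismatch the charset option is there to avoid.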
The pdfxml function returns a list of strings to be sent to
the Web browser's PDF viewer plugin.
<SQL MAX=1 "select Body from html where id = $id">
  <pdfxml $query $Body "#00FF00">
</SQL>

<SQL "select Url, Title, id from html
      where Title\Body like $query">
  <substr $Title 0 14>
  <IF $ret eq "PDF Document (">
    <$ret = "#xml=...">
  <ELSE>
    <$ret = "">
  </IF>
  <A HREF="http://$Url$ret">$Url</A><BR>
</SQL>
In this example, the main function excerpt prints the URLs
of documents matching $query from a Webinator table. For most
documents the link is http:// plus the $Url. For PDF documents,
however, an anchor (#xml=...) is attached that contains the URL of
the hit markup information. If the user's Web browser is configured
with a PDF viewer, the viewer will fetch this hit information URL
(which points to the function in the example that calls pdfxml)
and use it to mark up the PDF document.
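The link-building logic of the main function excerpt can be restated as a small Python sketch. It is an illustration under assumptions, not Webinator code: the helper result_link and its markup_url parameter (standing for the hit information URL that points back to the function calling pdfxml) are hypothetical, and the sample URLs and titles are placeholders. The source does not show whether the real script escapes the markup URL, so this sketch does not.

```python
PDF_TITLE_PREFIX = "PDF Document ("  # the 14 characters tested by <substr $Title 0 14>


def result_link(url: str, title: str, markup_url: str) -> str:
    """Return the href printed for one matching document.

    Hypothetical helper: markup_url stands for the hit information URL
    that the browser's PDF viewer fetches.
    """
    href = "http://" + url
    if title.startswith(PDF_TITLE_PREFIX):
        # PDF document: attach the #xml= anchor for the PDF viewer.
        href += "#xml=" + markup_url
    return href


html_href = result_link("www.example.com/index.html", "Example home page",
                        "http://www.example.com/hits?id=1")
pdf_href = result_link("www.example.com/report.pdf", "PDF Document (report.pdf)",
                       "http://www.example.com/hits?id=2")
print(html_href)  # http://www.example.com/index.html
print(pdf_href)   # http://www.example.com/report.pdf#xml=http://www.example.com/hits?id=2
```

Using startswith is equivalent to comparing the first 14 characters of the title, which is what the Vortex code does with substr.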
The pdfxml function was added in version 2.1.864700000 19970527.
The PDF (anytotx) plugin for Thunderstone's Webinator must be
used to generate the $Body value passed to pdfxml.
Also, the Web user's browser must have a configured PDF viewer to
fetch and use the PDF markup information.
The XML generated by this function conforms to Adobe's "Highlight File Format" specified in Adobe Technical Note #5172. It does not necessarily conform to any XML "standard".