SYNOPSIS
<pdfxml $query $Body [options ...]>
DESCRIPTION
The pdfxml function enables Metamorph hit markup to be viewed in a PDF (Adobe Acrobat) file. It executes the Metamorph $query on a PDF document ($Body) and returns the information that a user's PDF browser plugin needs to mark up the resulting hits.

Note that $Body must be the exact text returned by the PDF plugin for Thunderstone's Webinator or Texis. If it is modified in any way, character and word counts may be off, causing incorrect highlighting information to be sent to the browser.
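To see why even a small change to $Body matters, the following Python sketch finds the character offsets of a term in a hypothetical plugin output and in a copy whose whitespace has been collapsed. It is only an illustration of the offset problem, not how pdfxml computes positions. Every hit after the altered spot shifts by one character, so offsets computed on the modified copy would highlight the wrong text in the viewer, which still displays the original document.

```python
import re

def char_offsets(text, term):
    """Return the character offset of every occurrence of term in text."""
    return [m.start() for m in re.finditer(re.escape(term), text)]

# Hypothetical plugin output; note the double space after "Report".
original = "Annual Report  2003\nRevenue grew. Revenue fell."
# A seemingly harmless cleanup that collapses each run of whitespace to one space.
modified = re.sub(r"\s+", " ", original)

print(char_offsets(original, "Revenue"))  # [20, 34]: positions in the text the viewer shows
print(char_offsets(modified, "Revenue"))  # [19, 33]: positions computed on the altered copy
```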
Options that may be specified are:
$color
The color in which to show highlighted terms, given as an RGB value of the form #RRGGBB (as required by the Adobe specification), e.g. #00FF00.

words
Flag that indicates that the hit markup should use word mode. Mutually exclusive with characters mode. This is the default, and it should be used with anytotx plugins that use the "Adobe Acrobat TK" library, i.e. plugins for which anytotx --identify does not show a pdf: version (those prior to version 4.02.1038324681 20021126). Added in version 4.00.999800000 20010906.

characters
Flag that indicates that the hit markup should use character mode. Mutually exclusive with words mode. This mode should be used with anytotx plugins that use the XPDF library, i.e. plugins for which anytotx --identify shows a pdf: version (those after version 4.02.1038324681 20021126). Added in version 4.00.999800000 20010906.

active
Indicates that the browser's PDF viewer should jump to the first match upon displaying the document. Mutually exclusive with passive. This is the default. Added in version 4.00.999800000 20010906.

passive
Indicates that the browser's PDF viewer should not jump to the first match upon displaying the document. Mutually exclusive with active. Added in version 4.00.999800000 20010906.

showhits
Indicates that the matching terms should be included in the XML output as comments. This is primarily for debugging and should generally not be used in a production environment. Added in version 4.00.999800000 20010906.

startpage $pg or startpg $pg
Specifies the page number at which the $Body document actually starts. The default is 0 (the first page), i.e. $Body is assumed to be the complete document. This option keeps the browser plugin in sync when partial-document $Body arguments (e.g. single pages) are used. For example, if $Body actually starts at the third page of the original document, use startpg 2. Added in version 5.00.1092761457 20040817.

charset $charset
Specifies the character set of $Body. The default is ISO-8859-1. If $Body is in a multi-byte character set such as UTF-8 and this option is not given, highlighting offsets may be wrong. Note that this is the character set of $Body, not necessarily that of the original PDF. Added in version 5.01.1104778576 20050103.
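The choice between words and characters, and the charset option, both come down to how a hit's position is counted. The following Python sketch computes the word index, character offset, and UTF-8 byte offset of each hit in a short sample text. It is an illustration, not pdfxml's implementation, and it assumes whitespace-delimited words, which may differ from how a PDF library splits words. If UTF-8 text is read as single-byte ISO-8859-1, each byte is counted as a character, so the position reported is effectively the byte offset rather than the character offset the viewer expects.

```python
import re

def hit_positions(text, term):
    """Return (word index, character offset, UTF-8 byte offset) for each hit of term."""
    hits = []
    for m in re.finditer(re.escape(term), text):
        before = text[:m.start()]
        word_pos = len(before.split())          # whitespace-delimited words before the hit
        char_pos = m.start()                    # characters before the hit
        byte_pos = len(before.encode("utf-8"))  # bytes before the hit when stored as UTF-8
        hits.append((word_pos, char_pos, byte_pos))
    return hits

# "Crème brûlée and café au lait", written with escapes so each accented
# letter is a single code point.
body = "Cr\u00e8me br\u00fbl\u00e9e and caf\u00e9 au lait"

print(hit_positions(body, "caf\u00e9"))  # [(3, 17, 20)]
print(hit_positions(body, "lait"))       # [(5, 25, 29)]
```

The character and byte offsets agree until the first accented letter, then drift apart by one for every two-byte character that precedes the hit.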
DIAGNOSTICS
The pdfxml function returns a list of strings to be sent to the Web browser's PDF viewer plugin.
EXAMPLE
<EXPORT $query>
<EXPORT $id>
<A NAME=xml>
<SQL MAX=1 "select Body from html where id = $id">
<pdfxml $query $Body "#00FF00">
<LOOP $ret>
$ret
</LOOP>
</SQL>
</A>
<A NAME=main>
...
<SQL "select Url, Title, id from html
where Title\Body like $query">
<substr $Title 0 14>
<IF $ret eq "PDF Document (">
<CAPTURE>#xml=http://$HTTP_HOST$url/xml.txt</CAPTURE>
<ELSE>
<$ret = "">
</IF>
<A HREF="http://$Url$ret">$Title</A>
</SQL>
</A>
In this example, the main function excerpt prints the URLs for documents matching $query, from a Webinator html table. For most documents the link is http:// plus the $Url. PDF documents, recognized here by a Title beginning with "PDF Document (", additionally get an anchor (#xml=...) containing the URL of the hit markup information. If the user's Web browser is configured with a PDF viewer, the viewer fetches this hit information URL (which points to the xml function here) and uses it to mark up the PDF document.
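The link-building logic of the main function can be sketched in Python as follows. This is a hypothetical restatement for readers unfamiliar with Vortex: result_link and its parameters are illustrative names, http_host stands in for $HTTP_HOST, and script_url stands in for the Vortex $url variable. How the exported $query and $id reach the xml function is handled by Vortex and is not modeled here.

```python
PDF_TITLE_PREFIX = "PDF Document ("  # same test as <substr $Title 0 14> in the example

def result_link(url, title, http_host, script_url):
    """Build the href for one search result; PDF results get an #xml= anchor."""
    href = "http://" + url
    if title.startswith(PDF_TITLE_PREFIX):
        # The PDF viewer fetches this URL (the script's xml function) for hit markup.
        href += "#xml=http://" + http_host + script_url + "/xml.txt"
    return href

print(result_link("www.example.com/report.pdf", "PDF Document (Annual report)",
                  "search.example.com", "/texis/search"))
# http://www.example.com/report.pdf#xml=http://search.example.com/texis/search/xml.txt
```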
CAVEATS
The pdfxml function was added in version 2.1.864700000 19970527.

The PDF (anytotx) plugin for Thunderstone's Webinator must be used to generate the $Body value passed to pdfxml. Also, the Web user's browser must have a configured PDF viewer to fetch and use the PDF markup information.
The XML generated by this function conforms to Adobe's "Highlight File Format" specified in Adobe Technical Note #5172. It does not necessarily conform to any XML "standard".
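For orientation, the following Python sketch produces a list of strings in the general shape of a highlight file, mirroring how the options above (color, words/characters, active/passive, startpg) might map onto it. The element and attribute names (XML, Body, Highlight, loc, units, color, mode, version, pg, pos, len) reflect our reading of Adobe Technical Note #5172 and are assumptions to check against the note; highlight_xml is a hypothetical helper, not part of Texis. Note the unquoted attribute values and unclosed loc elements, which a standard XML parser would reject.

```python
import re

def highlight_xml(hits, color="#FF0000", units="words", mode="active", start_page=0):
    """Return highlight-file lines for hits given as (page_in_body, pos, length).

    pos and length are counted in `units` (words or characters).
    """
    if not re.fullmatch(r"#[0-9A-Fa-f]{6}", color):
        raise ValueError("color must have the form #RRGGBB")
    if units not in ("words", "characters"):
        raise ValueError("units must be 'words' or 'characters'")
    if mode not in ("active", "passive"):
        raise ValueError("mode must be 'active' or 'passive'")
    lines = ["<XML>",
             f"<Body units={units} color={color} mode={mode} version=2>",
             "<Highlight>"]
    for page, pos, length in hits:
        # Assumption: pages are zero-based and startpg is applied by adding it
        # to each page number within a partial $Body.
        lines.append(f"<loc pg={start_page + page} pos={pos} len={length}>")
    lines += ["</Highlight>", "</Body>", "</XML>"]
    return lines

# Two one-word hits in a $Body that starts at the third page of the original PDF.
for line in highlight_xml([(0, 12, 1), (1, 3, 1)], color="#00FF00", start_page=2):
    print(line)
```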