SYNOPSIS
<urlinfo $name [$which]>
DESCRIPTION
The urlinfo function returns information about the last page retrieved with fetch or submit. Inside a fetch loop (e.g. with the PARALLEL flag), this is the page just returned for the current loop iteration. The $name argument describes what to return; some values take a second $which argument (as noted below). Possible values for $name and what they return are:
actualurl
(string)
The last URL retrieved. It may differ from the argument to fetch or submit, e.g. if redirects were followed. Note that a fetch-terminating permanent redirect (i.e. when <urlcp followpermanentredirects> (here) is off) does not change actualurl, as the redirect target is not fetched. See also intermediateurls (here).
allmetas
(list)
The http-equiv, name, property or itemprop values (i.e. meta names) of all such <meta> tags in the document. Added in version 8.01.1652308127 20220511.
allmeta $metaName
(list)
The entire content attribute values of the <meta> tags (see allmetas definition) with single meta name $metaName. Names are case-insensitive. Added in version 8.01.1652308127 20220511.
allmetavalue $metaName
The leading parsed content value (i.e. before the ";") of the <meta> tags (see allmetas definition) with single meta name $metaName, where the content value is in semicolon-parameter format (see headervalue here for format example). Added in version 8.01.1652308127 20220511.
allmetaparams $metaName
The parameter names from the semicolon-parameter-format content of the <meta> tags (see allmetas definition) with single meta name $metaName. See headervalue here for format example. Added in version 8.01.1652308127 20220511.
allmetaparam $metaName $paramName
The parameter values of the parameters with single name $paramName from the semicolon-parameter-format content attribute of the <meta> tags (see allmetas definition) with single meta name $metaName (see headervalue here for format example). Added in version 8.01.1652308127 20220511.
allrefs [refInfo]
(list)
The list of all links, images, frames, iframes and string links from the document, including ones suppressed by e.g. ignorerefsselectors (here). This is essentially a concatenation of the links, images, frames, iframes and strlinks values, but with duplicate entries from the same tag/attribute tuple removed (e.g. a URL that is both a frame and a link from the same tuple will only appear once). Added in version 7.06.1463100000 20160512. Versions prior to 8.01.1664481650 20220929 did not include suppressed references (the concept did not exist). Note that the urlcp settings getframes, getiframes and/or getscripts may affect which URLs are returned. Note also that suppressed refs in the list can be identified by their refInfoGetSuppressedReason() (here) value: it will be a message other than the token ok.
In version 8 and later, an optional flag refInfo may be given, in which case a list of refInfo objects is returned (here) instead of string URLs.
authparams
(list)
The list of names of parsed authentication parameters sent by the server. The value for a particular parameter name can be obtained with authparam $param. Added in version 5.1. Note that authentication parameters may not be available even if authentication is used, if the server does not send them. For example, the second and later requests on a connection may not need parameters, if credentials are sent with the initial request and thus the server does not need to challenge in the response.
authparam $param
(string)
The authentication parameter $param from the server. $param may be realm for the Basic authentication realm, target for the NTLM target (i.e. domain), or serverchallenge for the NTLM server challenge nonce. Added in version 5.1. Authentication parameter names are case-insensitive.
authscheme
(string)
The authentication scheme used. Returns one of the scheme tokens used by <urlcp authschemes> (here). Added in version 5.01.1239140000 20090407.
authschemes
(list)
The list of authentication schemes currently allowed via <urlcp authschemes> (here). Added in version 7.04.
authschemehighest
(string)
The highest (most secure) authentication scheme used during the entire transaction, i.e. across redirects (if any). E.g. if a Basic authentication protected page was fetched, which then redirected to an anonymous-access page, authschemehighest would return Basic, even though authscheme would return anonymous (from the last page). Returns one of the scheme tokens used by <urlcp authschemes> (here). Added in version 5.01.1239140000 20090407.
canonicalurl
(string)
The canonical URL for the document, i.e. the URL that the document (original URL) should be fetched as - but for robots only (not humans). It is the <link rel=canonical href=...> value of the last page fetched, if there were no non-permanent (HTTP code 302, 303 etc.) redirects seen. If there is no such <link>, or non-permanent redirect(s) were seen, it is the same value as the permanenturl (here).
If <urlcp followpermanentredirects> (here) is off, and a fetch-terminating permanent redirect is encountered whose target contains such a <link>, the link does not take effect, since such a redirect target is not fetched. Otherwise followpermanentredirects does not affect canonicalurl.
Setting added in version 8.01.1656111533 20220624.
charsetconfigtotext
(string)
The current charset configuration, in the format used by <urlcp charsetconfigfromfile> (here). Added in version 6.
charsetdetected
(string)
The charset of the source page, as detected by scanning the document, without parsing explicit charset labels. Added in version 5.
charsetexplicit
(string)
The charset of the source page, as explicitly set in a header or <META HTTP-EQUIV> label. Returns Unknown if unknown or not set. Added in version 5.
charsetsrc or charsetsource
(string)
The charset of the source page, as interpreted by the parser. This is taken from the first available source, in descending priority: the charset as set by <urlcp charsetsrc>; the charset explicitly set in the page (header or meta); the charset detected by scanning the document; or the <urlcp charsetsrcdefault> charset. Added in version 5.
charsettxt or charsettext
(string)
The charset of the formatted text (as returned by <urlinfo text>). Added in version 5.
contenttype
(string)
The MIME content type of the page (without any parameters). This may have been derived from the Content-Type header, a <META HTTP-EQUIV> tag, or the URL extension, depending on what is available. In version 7.06.1477065000 20161021 and later, the value is returned lower-case for easier comparison, since media types are case-insensitive.
contenttypeparams
(list)
The names of parameters in the MIME content type, if any. In version 7.06.1477065000 20161021 and later, the names are returned lower-case for easier comparison, since media type parameter names are case-insensitive.
contenttypeparam
(list, 2 args)
The value(s) of the content type parameter(s) named $which. Multiple values may be given in $which. Parameter names are case-insensitive.
contenttypesrc
(string)
Returns the source of contenttype and related data, i.e. how it was determined. One of "generated", "header", "doctype", "metaheader", "urlpath", "contentscan" or "unknown". Added in version 5.01.1116341784 20050517. Aka contenttypesource.
cookiejar [all] [netscape4x]
The contents of the "cookie jar" (Vortex's internal cache of cookies received or set). Returned as a Netscape-cookie-file format text buffer. By default, only persistent (non-session) cookies are returned, i.e. the ones to be preserved across browser invocations. If the argument all is given, all cookies, including session cookies, are returned. Added in version 4.01.1022000000 20020521.
In version 5.01.1244880000 20090613 and later, a new fifth column was inserted in the output, containing the IsHttpOnly boolean value. To obtain the Netscape-4.x-compatible format of prior versions, set the netscape4x flag. <urlcp cookiejar> will accept input in either format.
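As a rough illustration of the two cookie-jar layouts, the following Python sketch (not Vortex code) splits one line of the text buffer into named fields. It assumes the conventional seven tab-separated Netscape cookie-file columns (domain, include-subdomains flag, path, secure flag, expiry, name, value) and assumes the newer layout simply adds IsHttpOnly at the fifth position; the helper name parse_cookie_line, the field names and the sample values are hypothetical.

```python
# Sketch only: the field names below are illustrative assumptions,
# not taken from the Vortex documentation.
NETSCAPE4X_FIELDS = ["domain", "include_subdomains", "path", "secure",
                     "expires", "name", "value"]
# Newer layout: IsHttpOnly inserted as the fifth column.
CURRENT_FIELDS = (NETSCAPE4X_FIELDS[:4] + ["is_http_only"]
                  + NETSCAPE4X_FIELDS[4:])


def parse_cookie_line(line):
    """Split one tab-separated cookie-jar line into a dict of fields.

    Returns None for blank lines and '#' comment lines.
    """
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    columns = line.split("\t")
    if len(columns) == len(CURRENT_FIELDS):
        fields = CURRENT_FIELDS
    elif len(columns) == len(NETSCAPE4X_FIELDS):
        fields = NETSCAPE4X_FIELDS
    else:
        raise ValueError("unexpected column count: %d" % len(columns))
    return dict(zip(fields, columns))
```

Detecting the layout by column count lets a reader accept either format, in the same spirit as <urlcp cookiejar> accepting both on input.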
domvalue $dompath
Gets the value of the DOM item indicated by $dompath. Note that this is not the JavaScript DOM, but the near-parallel page DOM. This can be used to get the submit URL and content for a form on the page just fetched, e.g. document.forms.myForm.submitUrl and document.forms.myForm.submitContent, after optionally setting form input values via <urlcp domvalue>. Added in version 5.
downloaddoc
(string or varbyte)
The network-transferred downloaded document body. This is the same as rawdoc if the document had no content/transfer encodings. If it did have encodings, this is the chunked/compressed/etc. document, before decompression into rawdoc. The downloaded document is normally discarded if different from rawdoc, to save memory; thus it may be empty for documents with encodings. Set <urlcp savedownloaddoc on> (normally off) to preserve the downloaded document (at potential cost in memory). Added in version 5.01.1249203000 20090802. See also rawdoc, which is usually more useful.
encodings
(list)
The list of content/transfer encodings of the response document, in the order they were applied by the server. Known encodings (e.g. gzip) are canonicalized and lowercase. Note that known and enabled encodings are already decoded (in reverse order) in the <fetch> or <urlinfo rawdoc> returned document. Added in version 5.01.1249203000 20090802.
frames [refInfo]
(list)
The list of frame URLs in the document. If the urlcp setting getframes is true, the list is empty since the frames have been fetched and appended to the document.
In version 8 and later, an optional flag refInfo may be given, in which case a list of refInfo objects is returned (here) instead of string URLs.
iframes [refInfo]
(list)
The list of <IFRAME> URLs in the document. If the urlcp setting getiframes is true, the list is empty since the iframes have been fetched and inserted into the document.
In version 8 and later, an optional flag refInfo may be given, in which case a list of refInfo objects is returned (here) instead of string URLs.
headers
(list)
The names of protocol (e.g. HTTP, HTTPS, or generated-for-file://) response headers received with the document.
header $hdrName
(list)
The full value(s) of the protocol response header(s) with single name $hdrName. Header names are case-insensitive.
headervalue $hdrName
The leading value (i.e. before the ";") of the response header(s) with single name $hdrName, where the header is in semicolon-parameterized format, i.e.:
value; param1=val1; param2="val 2"; ...
Added in version 6.00.1287436000 20101018.
headerparams $hdrName
The parameter name(s) from the semicolon-parameterized response header(s) with single name $hdrName. Added in version 6.00.1287436000 20101018.
headerparam $hdrName $paramName
The parameter value(s) of the parameter(s) with single name $paramName from the semicolon-parameterized response header(s) with single name $hdrName. Added in version 6.00.1287436000 20101018.
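To make the semicolon-parameterized format concrete, here is a minimal Python sketch (not Vortex code, and not the parser Vortex uses) that splits such a header into the three pieces the headervalue, headerparams and headerparam values correspond to. The function name parse_parameterized and the sample header are hypothetical, and the sketch deliberately ignores semicolons inside quoted strings and backslash escapes.

```python
def parse_parameterized(header_value):
    """Split 'value; p1=v1; p2="v 2"' into (leading value, {param: value}).

    Simplified: does not handle ';' inside quoted strings or escapes.
    """
    parts = [p.strip() for p in header_value.split(";")]
    leading = parts[0]
    params = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate a trailing ';'
        name, _, value = part.partition("=")
        value = value.strip()
        # Strip one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        # Lower-casing names is an assumption made for easy lookup.
        params[name.strip().lower()] = value
    return leading, params


leading, params = parse_parameterized('text/html; charset=UTF-8; note="val 2"')
print(leading)         # the leading value, cf. headervalue
print(list(params))    # the parameter names, cf. headerparams
print(params["note"])  # one parameter value, cf. headerparam
```

The same split applies to the meta-tag and request-header variants, which refer back to this format.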
errnum
(integer)
The Vortex fetch error code (not the HTTP or other protocol code), indicating a problem with the fetch. This can be non-zero even for a partially successful fetch, e.g. 15 if the page is too big. 0 indicates a completely successful fetch. See here for a list of errnum codes and what they mean.
errtoken
(string)
A string token representing the numeric errnum code, e.g. DocNotFound for error 24 (Document not found). This can be used in scripts as a more readable and self-documenting value than errnum integer values, and a more stable one than errmsg values (which may change in future releases). See here for a list of tokens and corresponding numbers and meanings. Added in version 5.01.1246963000 20090707.
errmsg
(string)
A human-readable string description of the errnum code. See here for a list of possible error messages and numbers.
httpcode
(integer)
The value of the protocol response code, if any (for HTTP or FTP). Note that this varies depending on the fetched URL protocol; the errnum value is more consistent. Typical HTTP codes and what they mean are listed below. Note that this is not an exhaustive list, as the protocol code is created and sent by the web server, not Vortex. Codes will also vary for other (non-HTTP) protocols, e.g. FTP:
httpmsg
(string)
The protocol response string, if any (HTTP or FTP). Varies by protocol and server; check errmsg instead for more portable (platform-independent) messages.
images [refInfo]
(list)
The list of image URLs in the document, e.g. <IMG> tags, background images, etc.
In version 8 and later, an optional flag refInfo may be given, in which case a list of refInfo objects is returned (here) instead of string URLs.
intermediateurls
(list)
The list of intermediate URLs, if any, that were fetched before the final URL returned. This includes redirects, FTP/file dir/file retries, authorization retries, OPTIONS Upgrades, CONNECT tunnels, and proxy retries. Added in version 7.05.1450220000 20151215. See also actualurl (here).
ipprotocols
(list)
The list of IP protocols currently allowed via <urlcp ipprotocols> (here). Added in version 8.
ipprotocolsavailable
(list)
The list of IP protocols available, i.e. ostensibly supported by the operating system. Returns zero or more of IPv4 and/or IPv6. "Supported" merely means that a socket of the given type may be opened; it does not necessarily mean a connection with that protocol will succeed to a given host or address (e.g. OS configuration, DNS, routing, firewall etc. issues may still prevent it, and it must be allowed via <urlcp ipprotocols>, here). Added in version 8.
links [refInfo]
(list)
The list of non-image link URLs in the document, e.g. <A HREF> tags, <FORM> tags, etc. Same as the return value of the obsolescent urllinks function. Note that frames will be listed as links if the urlcp setting getframes is false, iframes will be listed if getiframes is false, and script sources will be listed if getscripts is false. Note also that JavaScript string links (here) are not included in this list, as they are unreliable; but ordinary JavaScript links are included.
In version 8 and later, an optional flag refInfo may be given, in which case a list of refInfo objects is returned (here) instead of string URLs.
metaheaders
(list)
The names of <META HTTP-EQUIV> tags in the document.
metaheader $hdrName
(list)
The entire value(s) of the <META HTTP-EQUIV> tag(s) with single name $hdrName. Header names are case-insensitive.
metaheadervalue $hdrName
The leading value (i.e. before the ";") of the meta header(s) with single name $hdrName, where the header is in semicolon-parameterized format (see headervalue here for format example). Added in version 6.00.1287436000 20101018.
metaheaderparams $hdrName
The parameter names from the semicolon-parameter-format content of the meta http-equiv headers with single name $hdrName (see headervalue here for format example). Added in version 6.00.1287436000 20101018.
metaheaderparam $hdrName $paramName
The parameter value(s) of the parameter(s) with single name $paramName from the semicolon-parameter-format meta header(s) with single name $hdrName (see headervalue here for format example). Added in version 6.00.1287436000 20101018.
metaitemprops
(list)
The itemprop values of <meta itemprop> tags in the document. Added in version 8.01.1652308127 20220511.
metaitemprop $metaItemprop
(list)
The entire content attribute values of the <meta itemprop> tags with single itemprop $metaItemprop. Names are case-insensitive. Added in version 8.01.1652308127 20220511.
metaitempropvalue $metaItemprop
The leading parsed content value (i.e. before the ";") of the <meta itemprop> tags with single itemprop $metaItemprop, where the content is in semicolon-parameter format (see headervalue here for format example). Added in version 8.01.1652308127 20220511.
metaitempropparams $metaItemprop
The parameter names from the semicolon-parameter-format content of the <meta itemprop> tags with single itemprop $metaItemprop. See headervalue here for format example. Added in version 8.01.1652308127 20220511.
metaitempropparam $metaItemprop $paramName
The parameter values of the parameters with single name $paramName from the semicolon-parameter-format <meta itemprop> content attribute with single itemprop $metaItemprop (see headervalue here for format example). Added in version 8.01.1652308127 20220511.
metanames
(list)
The names of <meta name> tags in the document.
metaname $metaName
(list)
The entire content attribute values of the <meta name> tags with single name $metaName. Names are case-insensitive.
metanamevalue $metaName
The leading parsed content value (i.e. before the ";") of the <meta name> tags with single name $metaName, where the content is in semicolon-parameter format (see headervalue here for format example). Added in version 6.00.1287436000 20101018.
metanameparams $metaName
The parameter names from the semicolon-parameter-format content of the <meta name> tags with single name $metaName. See headervalue here for format example. Added in version 6.00.1287436000 20101018.
metanameparam $metaName $paramName
The parameter values of the parameters with single name $paramName from the semicolon-parameter-format <meta name> content attribute with single name $metaName (see headervalue here for format example). Added in version 6.00.1287436000 20101018.
metaproperties
(list)
The property values of <meta property> tags in the document. Added in version 8.01.1652308127 20220511. In version 8.01.1655305964 20220615 and later, the alias metapropertys is also available, for consistent pluralization when iterating over property, itemprop etc.
metaproperty $metaProperty
(list)
The entire content attribute values of the <meta property> tags with single property $metaProperty. Property names are case-insensitive. Added in version 8.01.1652308127 20220511.
metapropertyvalue $metaProperty
The leading parsed content value (i.e. before the ";") of the <meta property> tags with single property $metaProperty, where the content is in semicolon-parameter format (see headervalue here for format example). Added in version 8.01.1652308127 20220511.
metapropertyparams $metaProperty
The parameter names from the semicolon-parameter-format content of the <meta property> tags with single property $metaProperty. See headervalue here for format example. Added in version 8.01.1652308127 20220511.
metapropertyparam $metaProperty $paramName
The parameter values of the parameters with single name $paramName from the semicolon-parameter-format <meta property> content attribute with single property $metaProperty (see headervalue here for format example). Added in version 8.01.1652308127 20220511.
originalurl
(string)
The original URL retrieved (i.e. the one given to fetch or submit). It may differ from the actual last URL retrieved, e.g. if redirects were followed. Added in version 5.01.1205285000 20080311.
originalrequestdate
(string)
The date the original URL (i.e. the one given to fetch or submit) started its HTTP etc. command (i.e. after DNS). Returned as a double. Added in version 8.00.1636495457 20211109.
permanenturl
(string)
The permanent URL for the document, i.e. the URL that the document (original URL) should be fetched as - for both humans and robots. This is the last of the zero or more contiguous permanent (HTTP code 301 or equivalent) redirects seen (treating the original URL as one too), including any fetch-terminating one (i.e. if <urlcp followpermanentredirects> (here) is off). In other words, the first non-permanent (HTTP code 302, 303 etc.) redirect encountered (or the end of all zero or more redirects) makes the previous (permanent redirect target, or original) URL the permanent URL. Other intermediate URLs (e.g. for multi-fetch authorization) do not affect the permanent URL. See also canonicalurl (here). Added in version 8.01.1654749166 20220609.
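The permanent-URL rule can be modelled in a few lines. The Python sketch below (not Vortex code) walks a redirect chain and stops at the first non-permanent redirect, as the description states. The function name permanent_url, the tuple layout of the chain and the sample URLs are hypothetical, and counting HTTP 308 as "equivalent" to 301 is an assumption.

```python
PERMANENT_CODES = {301, 308}  # treating 308 as "equivalent" is an assumption


def permanent_url(original_url, redirects):
    """Return the permanent URL for a redirect chain.

    redirects: list of (http_code, target_url) tuples, in the order seen.
    The original URL counts as the start of the permanent run.
    """
    permanent = original_url
    for code, target in redirects:
        if code not in PERMANENT_CODES:
            break  # first non-permanent redirect ends the contiguous run
        permanent = target
    return permanent


chain = [(301, "http://b.example/"), (302, "http://c.example/"),
         (301, "http://d.example/")]
print(permanent_url("http://a.example/", chain))
```

In this chain the 302 ends the contiguous permanent run, so the later 301 to d.example has no effect on the result.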
prngdpid [$path]
(integer, 2 args)
The process ID of the prngd daemon (entropy gatherer) running on Unix file pipe $path, 0 if none detected, -1 on error. If no $path (or an empty one) is given, all standard paths ("/var/run/egd-pool", "/dev/egd-pool", "/etc/egd-pool", "/etc/entropy") and the configured path ([Texis] Entropy Pipe value in texis.ini) are checked.
The prngd daemon is used on certain Unix platforms (those without /dev/random) to provide entropy to seed the random number generator for the SSL/HTTPS plugin. The prngdpid value provides a way to check if the daemon is running. Note that not all platforms require an entropy daemon. Added in version 4.01.1031761163 20020911. See also the entropypipe setting of urlcp (here).
putmsgs
(list)
The fetch-related putmsgs since the most recent <fetch> or <submit>. When called inside a <fetch parallel> loop, only the messages from the just-completed fetch are returned, making disambiguation much easier than with the standard <putmsg> function callback mechanism. If <urlcp putmsg save> is off (here), no messages will be saved or returned. The message buffer is cleared at the start of each <fetch> or <submit>. If parsing these messages, it may be helpful to turn off <urlcp putmsg pass>, so that the same messages need not be seen and parsed by the script-wide <putmsg> function callback. Added in version 6.
processedchunks
(strings or varbyte values)
The ordered list of HTML document chunks that were actually processed during HTML parsing. The concatenation of these is normally the same as rawdoc. However, the chunks may differ if rawdoc is not UTF-8, as the chunks are always UTF-8. The chunks may also differ from rawdoc if JavaScript was run and modified the document; e.g. some of the chunks may be the output of document.write() statements, whereas rawdoc is always the static original document. The chunks may be zero-length/empty if no HTML processing was done, e.g. for an image. Added in version 6. The concatenation of processedchunks is available as processeddoc, which may be easier to use if individual chunks (e.g. static vs. dynamic content) are not needed.
processedchunksbufnums
(list of integers)
The ordered list of buffer numbers that the corresponding processedchunks values come from. During HTML and JavaScript processing, a document will end up with one or more buffers, the first of which (buffer 0) is the original static document source itself. JavaScript processing may create further buffers (e.g. the output of document.write()). A buffer may end up split into multiple chunks for HTML formatting if such JavaScript output occurs mid-buffer. For example, a document.write() in the middle of an HTML page may result in 3 chunks: the first part of buffer 0 (static doc), all of buffer 1 (generated by JavaScript), and the latter part of buffer 0 (rest of static doc). Added in version 6.
processeddoc
(string or varbyte)
The concatenation of processedchunks. Added in version 7.06.1463504000 20160517.
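The relationship between processedchunks, processedchunksbufnums and processeddoc can be sketched as follows. This is an illustrative Python model (not Vortex code): the function name split_static_dynamic and the sample chunk contents are made up, and the two input lists stand in for the values the two urlinfo calls would return.

```python
def split_static_dynamic(chunks, bufnums):
    """Pair each processed chunk with its buffer number.

    Returns (processed_doc, static_text, dynamic_text), where buffer 0 is
    the original static document and higher numbers are script output.
    """
    if len(chunks) != len(bufnums):
        raise ValueError("chunks and bufnums must be the same length")
    processed_doc = "".join(chunks)
    static_text = "".join(c for c, n in zip(chunks, bufnums) if n == 0)
    dynamic_text = "".join(c for c, n in zip(chunks, bufnums) if n != 0)
    return processed_doc, static_text, dynamic_text


# The three-chunk case: a document.write() in the middle of the page.
chunks = ["<p>before</p>", "<b>written by script</b>", "<p>after</p>"]
bufnums = [0, 1, 0]
doc, static, dynamic = split_static_dynamic(chunks, bufnums)
print(doc)
```

Joining every chunk gives the processeddoc equivalent, while filtering on buffer number 0 recovers only the parts that came from the static source.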
rawdoc
(string or varbyte)
The document source (after any content/transfer encodings are decoded). Same as the return value of the original fetch or submit. See also downloaddoc.
redirs
(integer)
The number of redirects encountered (not necessarily followed). Permanent (301) redirects are always counted, even if not followed due to <urlcp followpermanentredirects> (here) being off.
requestheaders
(list)
The names of protocol (e.g. HTTP, HTTPS) request headers sent. Added in version 8.01.1696358459 20231003.
requestheader $hdrName
(list)
The full value(s) of the protocol request header(s) with single name $hdrName. Header names are case-insensitive. Added in version 8.01.1696358459 20231003.
requestheadervalue $hdrName
The leading value (i.e. before the ";") of the request header(s) with single name $hdrName, where the header is in semicolon-parameterized format, i.e.:
value; param1=val1; param2="val 2"; ...
Added in version 8.01.1696358459 20231003.
requestheaderparams $hdrName
The parameter name(s) from the semicolon-parameterized request header(s) with single name $hdrName. Added in version 8.01.1696358459 20231003.
requestheaderparam $hdrName $paramName
The parameter value(s) of the parameter(s) with single name $paramName from the semicolon-parameterized request header(s) with single name $hdrName. Added in version 8.01.1696358459 20231003.
saslmechanisms
(list)
The list of enabled SASL mechanisms (under Negotiate authentication). See <urlcp saslmechanisms> (here) for more info. A putmsg is generated if SASL is not supported on the current platform.
saslmechanismsavailable
(list)
The list of available SASL mechanisms. See <urlcp saslmechanisms> (here) for more info. A putmsg is generated if SASL is not supported on the current platform.
saslpluginpath
(string)
The colon-separated path to look for SASL plugins in. See <urlcp saslpluginpath> (here) for more info. A putmsg is generated if SASL is not supported on the current platform.
secure
(list)
Which parts of the transaction were conducted securely (via SSL). Zero or more of the following values:
request - The final URL request to the server was secure.
response - The final response from the server was secure.
ancestors - All previous requests and responses that led to the final fetch (i.e. earlier redirects) were secure.
descendants - All requests and responses made to components on the final page (e.g. frames, scripts) were secure.
all - All requests and responses for the entire transaction - ancestors (if any), final page, and descendants (if any) - were secure.
See also the insecure option.
insecure
(list)
Which parts of the transaction were insecure, i.e. not conducted securely via SSL. Zero or more of the following values:
request - The final URL request to the server was insecure.
response - The final response from the server was insecure.
ancestors - One or more previous requests or responses that led to the final fetch (i.e. earlier redirects) were insecure.
descendants - One or more requests or responses made to components on the final page (e.g. frames, scripts) were insecure.
all - The request and response for the final page were insecure, one or more ancestors (if any) were insecure, and one or more descendants (if any) were insecure.
See also the secure option.
sslsecuritylevel
(integer)
Returns the currently set OpenSSL security level, an integer from 0-5. See the same-name urlcp setting (here) for details. Added in version 8.01.1686081586 20230606.
sslciphers [$group]
(string)
Returns the list of SSL ciphers currently set with <urlcp sslciphers> (here), or empty string if none set (i.e. the OpenSSL default list is in effect). Added in version 7.03.1436205000 20150706.
In version 7.07 and later, an optional cipher $group may be given, to return the cipher list for that protocol group. The group may be SSL (the default) for protocols TLSv1.2 and below, or TLSv1.3 for TLSv1.3 ciphers; the two lists are independent.
sslservercertificate
(PEM string)
Returns the SSL certificate obtained from the server, in PEM format, or empty if none (e.g. no HTTPS/SSL server contacted). If the server is an Apache or Texis Monitor web server, this certificate is typically from the server's SSLCertificateFile setting. The urlutil action sslcertificate (here) may be used to decode the certificate into a human-readable string format. Note that a server certificate may sometimes be obtainable from an HTTPS/SSL server even if the connection fails (e.g. due to verification problems). Added in version 6.00.1320460000 20111104.
sslclientcalist
(list)
Returns the list of CA (certificate authority) certificate names that the HTTPS/SSL server requested as acceptable issuers of the client's certificate. (If the server is an Apache or Texis Monitor web server, this list is typically from the server's SSLCADNRequestFile or SSLCACertificateFile setting.)
This is a list of certificate issuers that the server indicates it will accept as signers of the client's (Vortex fetch lib's) certificate. In other words, the certificate set with <urlcp sslcertificatefile> (here) should have been signed by one of these issuers, or the server might reject the connection with a "Cannot complete SSL handshake: ... alert bad certificate" (or "... alert unknown ca") or similar error.
If an HTTPS/SSL server was not contacted, or the server did not request a client (Vortex) certificate for verification, this list may be empty. Added in version 6.00.1320460000 20111104.
sslverifyservererrtoken
(string)
The string token that identifies the reason for the <urlcp sslverifyserver> error, i.e. the token for the reason part of the "Cannot verify certificate from host:port: reason at depth N" message. If no server-certificate verification was performed (e.g. sslverifyserver is off, or no SSL server was contacted), the token is empty or "unknown". If verification was performed successfully (no errors), "Ok" is returned.
To continue to verify SSL server certificates - but ignore this particular sub-type of verification error - this error can be disabled by adding the token prepended with a "-" (minus sign) to the <urlcp sslverifyserver> (here) setting. Added in version 6.00.1320460000 20111104. The list of possible tokens is detailed in the SSL Client/Server Certificate Verification appendix, here. Note that disabling individual sslverifyserver errors should be done with caution, as it can weaken the security provided by those checks.
strlinks [refInfo]
(list)
The list of JavaScript string links. These may be unreliable or require further processing, so they are not returned as part of the normal links list. See also <urlcp scriptstrlinks> (here). Added in version 5.00.1086804521 20040609.
In version 8 and later, an optional flag refInfo may be given, in which case a list of refInfo objects is returned (here) instead of string URLs.
sspipackages
(list)
The list of SSPI packages enabled/offered under Negotiate authentication. See <urlcp sspipackages> (here) for more info. A putmsg is generated if SSPI is not supported on the current platform (e.g. non-Windows).
sspipackagesavailable
(list)
The list of available SSPI packages. See <urlcp sspipackages> (here) for more info. A putmsg is generated if SSPI is not supported on the current platform (e.g. non-Windows).
strbaseurls
(list)
The list of JavaScript base URLs corresponding to strlinks. If <urlcp scriptstrlinksabs> is off, this enables the strlinks list to be made absolute, perhaps after some post-processing. Added in version 5.00.1086804521 20040609.
text
(string)
The formatted text of the document. Same as the return value of the obsolescent urltext function.
textformatter
(string)
A token describing what formatter was used to produce the <urlinfo text> value; one of the following:
unknown - Formatter is unknown.
rawdoc - No formatting: text is the raw document source.
text - Plain-text document formatter.
gopher - Gopher menu formatter.
html - HTML document formatter.
rss - RSS feed formatter.
frame - Framed document formatter/aggregator.
rss was added in version 7.02.1407881000 20140812.
title
(string)
The formatted title text of the document.
time or totaltime
(double)
The total time in seconds (including fraction) to retrieve the page. This includes DNS resolution plus request and content transfer time, across all fetches (including redirects/auth/etc.) for the request. Added in version 3.01.966019604 20000811.
dnstime
(double)
The time in seconds (including fraction) to resolve the hostname(s) via DNS, across all fetches (including redirects/auth/etc.) for the request. Added in version 3.01.966019604 20000811.
transfertime
(double)
The time in seconds (including fraction) to request and transfer content to/from the web server, i.e. time from DNS completion to response transfer completion, across all fetches (including redirects/auth/etc.) for the request. This is a more accurate measure of web server throughput because it does not include the time to resolve the hostname(s). Added in version 3.01.966019604 20000811.
The possible errnum, errtoken and errmsg values are:
errnum | errtoken | errmsg |
0 | Ok | Ok |
1 | ClientErr | Unknown client error |
2 | ServerErr | Server error |
3 | UnkResponseCode | Unrecognized response code |
4 | UnkProtocolVersion | Unrecognized protocol version |
5 | ConnTimeout | Connection timeout |
6 | UnkHost | Unknown host |
7 | CannotConn | Cannot connect to host |
8 | NotConn | Not connected |
9 | CannotCloseConn | Cannot close connection |
10 | CannotWriteConn | Cannot write to connection |
11 | CannotReadConn | Cannot read from connection |
12 | CannotWriteFile | Cannot write to file |
13 | OutOfMem | Out of memory |
14 | PageTrunc | Page not expected size, possibly truncated |
15 | MaxPageSizeExceeded | Max page size exceeded, truncated |
16 | TooManyRedirs | Too many redirects |
17 | OffsiteRef | Off-site or unapproved redirect or frame |
18 | UnkProtocol | Unknown/unimplemented access method |
19 | BadParam | Bad parameter |
20 | UnkErr | Unknown error |
21 | BadRedir | Bad redirect |
22 | DocUnauth | Document access unauthorized |
23 | DocForbidden | Document access forbidden |
24 | DocNotFound | Document not found |
25 | ServerNotImplemented | Server did not recognize request (unimplemented) |
26 | ServiceUnavailable | Service unavailable |
27 | UnkMethod | Unknown request method |
28 | CannotReadFile | Cannot read from file |
29 | CannotLoadLib | Cannot load dynamic library |
30 | ScriptErr | Script error |
31 | ScriptTimeout | Script timeout |
32 | ScriptMemExceeded | Script memory limit exceeded |
33 | DisallowedProtocol | Disallowed protocol |
34 | SslErr | SSL error |
35 | ProxyUnauth | Proxy access unauthorized |
36 | EmbeddedSecurityChange | Embedded object security change |
37 | DisallowedFilePrefix | Disallowed file prefix |
38 | DisallowedFileType | Disallowed file type |
39 | DisallowedNonlocalFileUrl | Disallowed non-local file URL |
40 | CannotConvertCharset | Cannot convert character set |
41 | DisallowedAuthScheme | Disallowed authentication scheme |
42 | SecureTransNotPossible | Secure transaction not possible |
43 | UnexpectedResponseCode | Unexpected server response |
44 | DisallowedMethod | Disallowed request method |
45 | ConnUpgradeToSslRequired | Connection upgrade to SSL required |
46 | FetchNotPermittedByLicense | Fetch not permitted by license |
47 | UnknownContentEncoding | Unknown Content- or Transfer-Encoding |
48 | DisallowedContentEncoding | Disallowed Content- or Transfer-Encoding |
49 | CannotDecodeContentEncoding | Cannot decode Content- or Transfer-Encoding |
50 | NotAcceptable | Client-acceptable version not found |
51 | CannotVerifyServerCertificate | Cannot verify server certificate |
52 | ConnectionNotReusable | Connection not reusable |
53 | CannotTunnelProtocol | Cannot tunnel protocol |
54 | PacError | Proxy auto-config error |
55 | UserDataFetchNeedsMoreData | User-data fetch needs more data |
56 | ComponentError | Page component (frame/iframe/script etc.) error |
UserDataFetchNeedsMoreData can result when a user-data fetch (i.e. when giving $downloaddoc) needs more data - such as a redirect target document indicated by headers - but the data is not / cannot be provided by the user-data API, as only one statusline/header-list/download-document tuple may be given.
ComponentError was added in version 7.07.1611356000 20210122. Previous versions would return the component error (e.g. DocNotFound) directly for the parent, making it difficult to determine whether there was an error in the parent or its component(s).
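Since errtoken is described as the more readable and stable value to test, a script might branch on tokens rather than numbers. The Python sketch below (not Vortex code) shows that pattern using a small subset of the table; the retry/partial grouping is a hypothetical policy chosen for illustration, not something the documentation prescribes, and the function name classify is made up.

```python
# Subset of the errnum -> errtoken table, copied from the rows above.
ERRTOKENS = {
    0: "Ok",
    5: "ConnTimeout",
    6: "UnkHost",
    15: "MaxPageSizeExceeded",
    16: "TooManyRedirs",
    24: "DocNotFound",
    56: "ComponentError",
}

# Hypothetical policy: which outcomes a crawler might retry or keep.
RETRYABLE = {"ConnTimeout", "UnkHost"}
USABLE_PARTIAL = {"MaxPageSizeExceeded"}


def classify(errnum):
    """Map an errnum to 'ok', 'partial', 'retry' or 'fail' via its token."""
    token = ERRTOKENS.get(errnum)  # None for codes outside this subset
    if token == "Ok":
        return "ok"
    if token in USABLE_PARTIAL:
        return "partial"
    if token in RETRYABLE:
        return "retry"
    return "fail"


print(classify(15))
```

The "partial" case reflects the note under errnum that a non-zero code such as 15 can accompany a partially successful fetch.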
DIAGNOSTICS
urlinfo returns the requested value(s).
EXAMPLE
<fetch "http://www.somesite.com/mypage.html">
<urlinfo "metanames">
<$names = $ret>
Meta data:
<LOOP $names>
<urlinfo "metaname" $names>
$names = <LOOP $ret> "$ret" </LOOP>
</LOOP>
CAVEATS
The urlinfo function was added in version 2.1.884800000 19980114.
If submit is used with TOFILE, then content and content-derived items such as links are unavailable in urlinfo, because the content was not held in memory for processing.