The following urlcp settings control how links from formatted
documents are processed:
baseurl
(string)
The base URL against which relative links are resolved when making
them absolute during formatting. The default (if empty) is to use
the page URL or <base href> value. Added in version
7.05.1452546000 20160111.
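As a minimal sketch, the setting might be used as below. The URL is a hypothetical placeholder, and the fragment is assumed to sit inside a Vortex script function before the page is formatted:

```
<!-- Resolve relative links against this base rather than the page URL.
     The URL is a hypothetical placeholder. -->
<urlcp baseurl "http://www.example.com/docs/">
```

Under standard URL resolution, a relative link such as guide/intro.html would then be expected to become http://www.example.com/docs/guide/intro.html.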
contentlocationasbaseurl
(boolean)
Whether to interpret the Content-Location header (if present) as
the base URL for the document (can be overridden by a <base> tag).
The default is on. Added in version 5.01.1249455000 20090805.
eatlinkspace
(boolean)
Whether to strip leading/trailing whitespace from links before
processing into absolute links. The default is on. Added in
version 3.01.968173351 20000905.
keeprefsselectors (list), ignorerefsselectors (list)
Lists of CSS selectors to match elements (i.e. through and
including balanced close tags, if defined) containing references
that should be kept or ignored (e.g. in <urlinfo links>, images
etc.). Only refs inside/part of keeprefsselectors elements (and
outside ignorerefsselectors elements, if given) are retained. If no
keeprefsselectors are given (the default), all refs are considered
kept.
A limited subset of CSS selector syntax is supported; see
ignoretextselectors for details. Settings added in version
8.01.1664337014 20220927. They return nonzero on success, 0 on
error. See also strictkeepselectors.
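The following fragment sketches how the two settings might be combined. The selectors main and nav are hypothetical, and plain tag-name selectors are assumed to fall within the supported subset (check ignoretextselectors to confirm):

```
<!-- Keep only references found inside the page's main element -->
<urlcp keeprefsselectors "main">
<!-- ...and drop any that sit inside a navigation block -->
<urlcp ignorerefsselectors "nav">
```

With both set, a reference is retained only if it is inside a kept element and outside every ignored one.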
refs
(2, 4 or 6 list arguments)
Which HTML tag/attributes are to be considered links, images,
frames or iframes when formatting or reparenting HTML. Removing
or adding tag/attributes removes them from, or adds them to, the
returned lists of <urlinfo links> etc. This setting takes several
parallel argument lists, in the form:
refs action flags ... tag attr [attr2 val2]
action
(single value)
A single value of add, del, or set, indicating how to apply the
following arguments as a whole. If add, the arguments are added
(flags ORed) to the existing values; if del, the arguments are
deleted (flags cleared); if set, the existing values are cleared
first and replaced with the arguments.
flags,...
(list)
Each value is a comma-separated list of one or more flags to
apply. A flag states how the tag's attr value should be
considered, e.g. link for a link or image for an image. Flags
apply to both formatting (e.g. <urlinfo links>) and reparenting.
If format or reparent is appended to a flag, the flag applies only
to that action instead.
tag
(list)
The HTML tag referred to.
attr
(list)
The attribute whose value(s) the flags apply to.
attr2
(list, optional)
An optional second attribute that, if specified, must be
present and have the value val2 for the flags to take
effect. For example, by default <INPUT SRC=...>
values are considered images, but only if the attribute
TYPE is also present with the value IMAGE.
This value can be empty or unspecified if not needed.
val2
(list, optional)
Value for attr2. Required only if an attr2 value is given.
The return value lists the previous settings, with the tag, attr,
attr2 and val2 arguments space-separated in each value. The return
value may be given as a single refs set argument to restore the
previous settings. Also, the single argument defaults may be given
to refs set to restore the built-in default settings. Note
that HTML tags/attributes that are not currently known by the
internal parser cannot be specified. Added in version
5.01.1159397148 20060927.
This example stops treating <INPUT SRC=... TYPE=IMAGE> values as
images, and adds <LINK SRC=...> values as both images and links:
<urlcp refs del image input src type image>
<urlcp refs add link,image link src>
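Because the return value can be passed back to refs set, a change can be made temporarily and then undone. The sketch below assumes the return value is available in $ret, as is usual for Vortex functions. The variable name $savedRefs is hypothetical:

```
<!-- Stop treating INPUT SRC values (with TYPE=IMAGE) as images,
     and save the returned previous settings -->
<urlcp refs del image input src type image>
<$savedRefs = $ret>

<!-- ... fetch and format pages here ... -->

<!-- Restore the settings saved above -->
<urlcp refs set $savedRefs>

<!-- Alternatively, go back to the built-in defaults -->
<urlcp refs set defaults>
```

Restoring from the saved value preserves any earlier customizations, whereas defaults discards them all.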
scriptstrlinks
(string)
Which types of JavaScript String links (those determined from
scanning all JavaScript strings, instead of known true JavaScript
links) to return. One or more of the values none (for no strings
at all), file (for strings that resemble files), protocol (for
strings that resemble URL protocols), or all (for all strings) may
be specified. Note that script string links are unreliable and not
guaranteed to be legitimate or even syntactically correct. This is
a method of attempting to obtain links that the JavaScript module
is otherwise missing. The strings are returned via
<urlinfo strlinks>. Returns previous setting. Added in version
5.00.1087588168 20040618. Default is protocol and file.
scriptstrlinkabs or scriptstrlinksabs
(boolean)
Whether to make URLs from JavaScript String links absolute. If on
(the default), these URLs are made absolute. If off, they are left
as-is (i.e. so the caller can perform additional scans or cleanup).
Returns previous setting. Added in version 5.00.1087588646 20040618.
urlnonprint
(string)
How to treat non-printable bytes (those outside the range !
through ~ inclusive) encountered in URL links of fetched pages:
asis
Leave non-printable bytes alone.
strip
Remove non-printable bytes.
encode
Default: URL-encode non-printable bytes.
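A short sketch of switching modes, using only the values listed above. The fragment is assumed to run before the pages in question are fetched:

```
<!-- Drop non-printable bytes from links instead of encoding them -->
<urlcp urlnonprint strip>

<!-- Later, return to the documented default behavior -->
<urlcp urlnonprint encode>
```

The asis value is the one to choose when the caller wants to inspect or clean the raw link bytes itself.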