urlcp settings control how links from formatted
documents are processed:
The base URL to interpret relative links from, when making them
absolute during formatting. The default (if empty) is to use the
page URL or
<base href> value. Added in version
contentlocationasbaseurl (boolean) Whether to interpret the
Content-Location header (if present) as the base URL for the document (can be overridden by a
<base> tag). The default is on. Added in version 5.01.1249455000 20090805.
eatlinkspace (boolean) Whether to strip leading/trailing whitespace from links before resolving them into absolute links. The default is on. Added in version 3.01.968173351 20000905.
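As a minimal sketch, assuming these boolean settings accept the values on and off (the text above states only that the default is on), both behaviors could be disabled before fetching a page:

```vortex
<!-- Assumed syntax: boolean settings take "on" or "off" -->
<urlcp contentlocationasbaseurl off>   <!-- ignore Content-Location as base URL -->
<urlcp eatlinkspace off>               <!-- keep leading/trailing whitespace in links -->
```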
refs (2, 4 or 6 list arguments) Which HTML tags/attributes are to be considered links, images, frames or iframes when formatting or reparenting HTML. Removing or adding tags/attributes removes them from, or adds them to, the returned lists of
<urlinfo links> etc. This setting takes several parallel argument lists, in the form:
action (single value) A single value of
add, del or set, indicating how to apply the following arguments as a whole. If
add, the arguments are added (flags ORed) to the existing values; if
del, the arguments are deleted (flags cleared); if
set, the existing values are cleared first and replaced with the arguments.
flags,... (list) Each value is a comma-separated list of one or more flags to apply, for example:
link The attr value should be considered a link.
Flags apply to both formatting (<urlinfo links>) and reparenting. If
reparent is appended to a flag, the flag applies only to that action instead.
tag (list) The HTML tag referred to.
attr (list) The attribute whose value(s) the flags apply to.
attr2 (list, optional) An optional second attribute that, if specified, must be present and have the value
val2 for the flags to take effect. For example, by default <INPUT SRC=...> values are considered images, but only if the attribute TYPE is also present with the value IMAGE. This value can be empty or unspecified if not needed.
val2 (list, optional) Value for
attr2. Required only if attr2 is given.
The previous settings are returned as a list, with the flags, tag, attr, attr2 and val2 arguments space-separated in each value. The return value may be given as a single refs set argument to restore the previous settings. Also, the single argument
defaults may be given to refs set to restore the built-in default settings. Note that HTML tags/attributes that are not currently known by the internal parser cannot be specified. Added in version 5.01.1159397148 20060927.
This example stops treating <INPUT
SRC=... TYPE=IMAGE> values as images, and adds
<LINK SRC=...> values as both images and links:
<urlcp refs del image input src type image>
<urlcp refs add link,image link src>
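The save-and-restore behavior described above can be sketched as follows. This is a hedged sketch rather than a tested script: it assumes the setting's return value is available in $ret immediately after the <urlcp> call, and $saved is a hypothetical variable name chosen for illustration.

```vortex
<!-- Change the defaults, keeping the previous settings
     (assumption: $ret holds the return value of the urlcp call) -->
<urlcp refs del image input src type image>
<$saved = $ret>

<!-- ... fetch and format pages here ... -->

<!-- Restore the saved settings with a single refs set argument -->
<urlcp refs set $saved>

<!-- Alternatively, return to the built-in defaults -->
<urlcp refs set defaults>
```

Saving the return value is useful when only one fetch needs the modified rules, since later pages are then processed with the original tag/attribute lists.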
none (for no strings at all),
file (for strings that resemble files),
protocol (for strings that resemble URL protocols), or
<urlinfo strlinks>. Returns previous setting. Added in version 5.00.1087588168 20040618. Default is
Stringlinks. If on (the default) these URLs will be absolute. If off, they are left as-is (i.e. so the caller can perform additional scans or cleanup). Returns previous setting. Added in version 5.00.1087588646 20040618.
How to treat non-printable bytes (those outside the range ! through ~ inclusive) encountered in URL links of fetched pages:
asis Leave non-printable bytes alone.
strip Remove non-printable bytes.
encode URL-encode non-printable bytes. This is the default.