Links

The following urlcp settings control how links from formatted documents are processed:

  • baseurl (string)

    The base URL against which relative links are resolved when they are made absolute during formatting. If empty (the default), the page URL or <base href> value is used. Added in version 7.05.1452546000 20160111.

  • contentlocationasbaseurl (boolean) Whether to interpret the Content-Location header (if present) as the base URL for the document; a <base> tag in the document overrides it. The default is on. Added in version 5.01.1249455000 20090805.

  • eatlinkspace (boolean) Whether to strip leading/trailing whitespace from links before processing into absolute links. The default is on. Added in version 3.01.968173351 20000905.

  • keeprefsselectors (list), ignorerefsselectors (list) Lists of CSS selectors matching elements (each extending through and including its balanced close tag, if defined) whose references should be kept or ignored (e.g. in <urlinfo links>, images etc.). Only refs inside or part of keeprefsselectors elements, and outside ignorerefsselectors elements if any are given, are retained. If no keeprefsselectors are given (the default), all refs are considered kept.

    A limited subset of CSS selector syntax is supported; see ignoretextselectors for details. These settings were added in version 8.01.1664337014 20220927. They return nonzero on success, 0 on error. See also strictkeepselectors.
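
    As a minimal sketch, the two settings might be combined as follows to keep only refs from a page's main content area while dropping those in a nested sidebar. The selectors div#content and div.sidebar are hypothetical, and the sketch assumes the supported selector subset accepts simple tag#id and tag.class forms:

    ```vortex
    <!-- Hypothetical selectors; adjust to the page markup and the supported syntax -->
    <urlcp keeprefsselectors   "div#content">
    <urlcp ignorerefsselectors "div.sidebar">
    ```

    With these in effect, a ref would be retained only if it is inside the content element and not inside a sidebar element.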

  • refs (2, 4 or 6 list arguments) Which HTML tags/attributes are to be considered links, images, frames or iframes when formatting or reparenting HTML. Removing or adding a tag/attribute removes it from, or adds it to, the returned lists of <urlinfo links> etc. This setting takes several parallel argument lists, in the form refs action flags... tag attr [attr2 val2]:

    • action (single value) One of add, del, or set, indicating how to apply the following arguments as a whole. If add, the arguments are added (flags ORed) to the existing values; if del, the arguments are deleted (flags cleared); if set, the existing values are cleared first and replaced with the arguments.

    • flags,... (list) Each value is a comma-separated list of one or more flags to apply:

      • link The tag's attr value should be considered a link.

      • image The value should be considered an image.

      • frame The value should be considered a frame.

      • iframe The value should be considered an iframe.

      The above flags apply to both formatting (<urlinfo links>) and reparenting. If format or reparent is appended to a flag, the flag applies only to that action instead.

    • tag (list) The HTML tag referred to.

    • attr (list) The attribute whose value(s) the flags apply to.

    • attr2 (list, optional) An optional second attribute that if specified, must be present and have the value val2 for the flags to take effect. For example, by default <INPUT SRC=...> values are considered images, but only if the attribute TYPE is also present with the value IMAGE. This value can be empty or unspecified if not needed.

    • val2 (list, optional) The value for attr2. Required only if attr2 is given.

    The return value is the previous setting, as a single list with the tag, attr, attr2 and val2 arguments space-separated in each value. The return value may be given as a single refs set argument to restore the previous settings. Also, the single argument defaults may be given to refs set to restore the built-in default settings. Note that HTML tags/attributes not currently known to the internal parser cannot be specified. Added in version 5.01.1159397148 20060927.

    This example stops treating <INPUT SRC=... TYPE=IMAGE> values as images, and treats <LINK SRC=...> values as both images and links:

    <urlcp refs del image      input src type image>
    <urlcp refs add link,image link  src>
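
    Because refs returns the previous setting, a change can be made temporarily and then undone. The following is a sketch only: it assumes the return value is available in $ret, as with other Vortex functions, and $oldrefs is a hypothetical variable name:

    ```vortex
    <urlcp refs del image input src type image>
    <$oldrefs = $ret>   <!-- assumed: previous setting is returned in $ret -->
    <!-- ... fetch and format pages with the modified refs here ... -->
    <urlcp refs set $oldrefs>
    ```

    To discard all changes instead, <urlcp refs set defaults> restores the built-in settings.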

  • scriptstrlinks (string) Which types of JavaScript String links to return. These are links determined by scanning all JavaScript strings, as opposed to known true JavaScript links. One or more of the values none (for no strings at all), file (for strings that resemble files), protocol (for strings that resemble URL protocols), or all (for all strings) may be specified. The default is protocol and file. Note that script string links are unreliable and not guaranteed to be legitimate or even syntactically correct; they are an attempt to obtain links that the JavaScript module would otherwise miss. The strings are returned via <urlinfo strlinks>. Returns the previous setting. Added in version 5.00.1087588168 20040618.

  • scriptstrlinkabs or scriptstrlinksabs (boolean) Whether to make URLs from JavaScript String links absolute. If on (the default), these URLs are made absolute. If off, they are left as-is (e.g. so the caller can perform additional scans or cleanup). Returns the previous setting. Added in version 5.00.1087588646 20040618.

  • urlnonprint (string)

    How to treat non-printable bytes (those outside the range ! through ~ inclusive) encountered in URL links of fetched pages:

    • asis Leave non-printable bytes alone.

    • strip Remove non-printable bytes.

    • encode URL-encode non-printable bytes. This is the default.

    Added in version 4.00.1006200000 20011119.
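
As an illustration of how several of the settings above might be applied together before fetching a page, consider the following sketch. The base URL is a hypothetical placeholder, and the values chosen are assumptions for demonstration rather than recommendations:

```vortex
<!-- Hypothetical base URL: relative links are made absolute against it -->
<urlcp baseurl "http://www.example.com/docs/">
<!-- Ignore any Content-Location header when choosing the base URL -->
<urlcp contentlocationasbaseurl off>
<!-- Return no links guessed from JavaScript strings -->
<urlcp scriptstrlinks none>
<!-- Remove non-printable bytes from links instead of URL-encoding them -->
<urlcp urlnonprint strip>
```

Each setting is independent of the others, so any subset of these lines can be used on its own.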

Copyright © Thunderstone Software     Last updated: Apr 15 2024