urlutil - URL/network utility



<urlutil $action [$arg ...]>

The urlutil function provides URL and other network-related utility functions. The $action argument determines what it does:

  • abs $absurl $relurl (or absurl $absurl $relurl)

    Makes URLs absolute (fully specified). The $absurl values are one or more absolute page URLs, and the $relurl values are the corresponding links (relative or not) from those pages. For each $relurl value, its absolute form is returned, as if it were a link on the corresponding page. If there are fewer $absurl values than $relurl values, the last $absurl value is re-used. The protocol and hostname (if any) in each returned value are lowercase.

  • charsetcanon $charset Returns the canonical name for charset name $charset, according to the current Charset Config file. Can be used to map charset aliases to canonical names.

  • charsetconv $buf $from [$to] Converts text buffer $buf from charset $from to charset $to. If $to is unspecified or empty, it defaults to the current <urlcp charsettxt> setting. Some character sets may require an external charset converter (iconv by default; see <urlcp charsetconverter> to change it), which is automatically executed when needed. Added in version 5.00.1090598954 20040723.

  • charsetdetect $buf Returns a guess at the charset of text buffer $buf, or "Unknown" if the charset cannot be determined. Only limited charset detection is supported: primarily UTF-8, UTF-16BE/UTF-16LE, and all-7-bit ISO-8859-1. Added in version 7.02.1398457000 20140425.

  • filepath $u Takes $u, which must be a file:// URL, and returns the local file path that would be used to read the file, as determined by the current <urlcp fileroot> etc. settings.

  • pacinit

    Initializes proxy auto-config by fetching the PAC script (if configured) and running it. Returns 1 if successful, 0 if not. Afterwards, the fetch error, messages from the fetch and script execution, and the body of the script (if fetched) are available via <urlinfo>. If neither a PAC script nor a PAC URL is configured, or the script was already initialized, no action is taken and 1 (success) is returned.

    Calling <urlutil pacinit> is not necessary when using a PAC script: the script is automatically fetched and run when first needed, i.e. at the first <fetch> or <submit>, and any messages from PAC initialization are reported. However, any PAC failure during such automatic initialization merely translates into a Proxy auto-config error for the <fetch>. The <urlutil pacinit> action provides a way to get more detailed information about the PAC script, if desired for diagnostic purposes.

  • split $u $parts Splits a URL into one or more parts. The $u value is the URL to split. The $parts values list the parts to return; they are returned as $ret values, in the same order. The parts can be any of protocol, user, pass, host, port, path, type, query or anchor. Note that user and pass are not yet supported.

  • sslcertificate $pem tostring Parses an SSL certificate string buffer $pem (in PEM format). The tostring sub-action returns a human-readable string version of the certificate, with subject, issuer, expiration etc. printed. This can be used to view a server certificate returned from <urlinfo sslservercertificate>.
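As a rough illustration of what split returns, the following Python sketch maps the documented part names onto fields of the standard library's urllib.parse.urlsplit result. It is not the Texis implementation: url_split is a hypothetical name, the type part is left out because its meaning is not described here, and host comes from urllib's hostname attribute, which lowercases the host (urlutil may not).

```python
from __future__ import annotations

from urllib.parse import urlsplit


def url_split(u: str, parts: list[str]) -> list[str]:
    """Hypothetical Python analogue of <urlutil split $u $parts>.

    Returns one string per requested part, in the order requested.
    "user"/"pass" (unsupported by urlutil) and "type" (not modeled
    in this sketch) raise KeyError.
    """
    s = urlsplit(u)
    fields = {
        "protocol": s.scheme,
        "host": s.hostname or "",          # urllib lowercases this
        "port": "" if s.port is None else str(s.port),
        "path": s.path,
        "query": s.query,
        "anchor": s.fragment,
    }
    return [fields[p] for p in parts]


print(url_split("http://example.com:8080/a/b.html?q=1#top",
                ["protocol", "host", "port", "path", "query", "anchor"]))
```

Like the Vortex action, the result list preserves the order of the requested parts, so callers can unpack it positionally.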

Several actions take inet-style arguments. An inet value is an IPv4 address string, optionally followed by a netmask.

For IPv4, the format is dotted-decimal, i.e. N[.N[.N[.N]]] where N is a decimal, octal or hexadecimal integer from 0 to 255. If x < 4 values of N are given, the last N is taken as the last 5-x bytes instead of 1 byte, with missing bytes padded to the right. E.g. 192.258 is valid and equivalent to 192.1.2.0: the last N is 2 bytes in size, and covers 5 - 2 = 3 needed bytes, including 1 zero pad to the right. Conversely, an address is not valid if its last N is too large to fit in the bytes it covers.

An IPv4 address may optionally be followed by a netmask, either of the form /B or :IPv4, where B is a decimal, octal or hexadecimal netmask integer from 0 to 32, and IPv4 is a dotted-decimal IPv4 address of the format described above. If an :IPv4 netmask is given, only the largest contiguous set of most-significant 1 bits is used (because netmasks are contiguous). If no netmask is given, it is calculated from standard IPv4 class A/B/C/D/E rules, but will be large enough to include all given bytes of the IP. E.g. a Class A address given with all 4 bytes has a class netmask of 8, but the netmask will be extended to 32 to include all 4 given bytes.
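The parsing rules above can be sketched in Python to make the right-padding and default-netmask logic concrete. This is a hypothetical re-implementation written from this description, not Texis code. parse_inet and its helpers are illustrative names. The leading-0 octal and 0x hexadecimal prefixes are assumed to follow C conventions, and "given bytes" is taken to mean the bytes written before right-padding. Class D/E defaults are not modeled because their netmasks are not spelled out here.

```python
from __future__ import annotations

# Hypothetical re-implementation of the inet syntax described above;
# written from the prose, not taken from Texis.


def parse_n(tok: str) -> int:
    """Parse one N value. C-style prefixes are an assumption:
    0x/0X = hexadecimal, leading 0 = octal, otherwise decimal."""
    t = tok.strip().lower()
    if t.startswith("0x"):
        return int(t[2:], 16)
    if len(t) > 1 and t.startswith("0"):
        return int(t[1:], 8)
    return int(t, 10)


def parse_ipv4(text: str) -> tuple[bytes, int]:
    """Return (4 address bytes, number of bytes given before padding)."""
    vals = [parse_n(p) for p in text.split(".")]
    if not 1 <= len(vals) <= 4:
        raise ValueError("expected 1 to 4 dotted values")
    out = bytearray()
    for v in vals[:-1]:
        if not 0 <= v <= 255:
            raise ValueError(f"byte value out of range: {v}")
        out.append(v)
    last = vals[-1]
    covers = 5 - len(vals)                       # bytes the last N fills
    size = max(1, (last.bit_length() + 7) // 8)  # bytes the value needs
    if last < 0 or size > covers:
        raise ValueError(f"last value too large: {last}")
    out += last.to_bytes(size, "big")
    given = len(out)
    out += bytes(4 - len(out))                   # zero-pad to the right
    return bytes(out), given


def leading_ones(mask: bytes) -> int:
    """Length of the most-significant run of 1 bits (for :IPv4 masks)."""
    n = int.from_bytes(mask, "big")
    bits = 0
    while bits < 32 and n & (1 << (31 - bits)):
        bits += 1
    return bits


def class_bits(first_byte: int) -> int:
    if first_byte < 128:
        return 8     # Class A
    if first_byte < 192:
        return 16    # Class B
    if first_byte < 224:
        return 24    # Class C
    raise ValueError("Class D/E default netmasks are not modeled here")


def parse_inet(text: str) -> tuple[str, int]:
    """Return (dotted-quad address, netmask length in bits)."""
    bits = None
    if "/" in text:
        text, mask = text.split("/", 1)
        bits = parse_n(mask)
        if not 0 <= bits <= 32:
            raise ValueError(f"netmask out of range: {bits}")
    elif ":" in text:
        text, mask = text.split(":", 1)
        bits = leading_ones(parse_ipv4(mask)[0])
    addr, given = parse_ipv4(text)
    if bits is None:
        # Classful default, widened to cover every given byte.
        bits = max(class_bits(addr[0]), 8 * given)
    return ".".join(str(b) for b in addr), bits


print(parse_inet("192.258"))    # ('192.1.2.0', 24)
print(parse_inet("10.1.2.3"))   # ('10.1.2.3', 32)
```

The second call shows the widening rule: 10.1.2.3 is Class A (default 8 bits), but four given bytes push the netmask to 32.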

In version 7.07.1554395000 20190404 and later, error messages are reported.

The inet actions were added in version 5.01.1112986377 20050408, and include the following (see also the SQL equivalents):

  • inetabbrev $inet

    Returns a possibly shorter-than-canonical representation of $inet, in which trailing zero byte(s) of an IPv4 address may be omitted. All bytes of the network, and leading non-zero bytes of the host, are included. E.g. <urlutil inetabbrev "192.100.0.0/24"> returns 192.100.0/24. The /B netmask is included, except if (in version 7.07.1554840000 20190409 and later) the network is host-only (i.e. the netmask is the full size of the IP address). Empty string is returned on error.

  • inetcanon $inet

    Returns canonical representation of $inet. For IPv4, this is dotted-decimal with all 4 bytes. The /B netmask is included, except if (in version 7.07.1554840000 20190409 and later) the network is host-only (i.e. netmask is the full size of the IP address). Empty string is returned on error.

  • inetnetwork $inet Returns string IP address with the network bits of $inet, and the host bits set to 0. Empty string is returned on error.

  • inethost $inet Returns string IP address with the host bits of $inet, and the network bits set to 0. Empty string is returned on error.

  • inetbroadcast $inet Returns string IP broadcast address for $inet, i.e. with the network bits, and host bits set to 1. Empty string is returned on error.

  • inetnetmask $inet Returns string IP netmask for $inet, i.e. with the network bits set to 1, and host bits set to 0. Empty string is returned on error.

  • inetnetmasklen $inet Returns integer netmask length of $inet. -1 is returned on error.

  • inetcontains $inetA $inetB Returns 1 if $inetA contains $inetB, i.e. every address in $inetB occurs within the $inetA network. 0 is returned if not, or -1 on error.

  • inetclass $inet Returns class of $inet, e.g. A, B, C, D, E or classless if a different netmask is used (or the address is IPv6). Empty string is returned on error.

  • inet2int $inet

    Returns the integer representation of the IP network/host bits of $inet (i.e. without the netmask); useful for compact storage of an address as integer(s) instead of a string. Returns -1 on error (note that -1 may also be returned for an all-ones IP address, e.g. 255.255.255.255).

  • int2inet $i Returns inet string for 1- or 4-value varint $i taken as an IP address. Since no netmask can be stored in the integer form of an IP address, the returned IP string will not have a netmask. Empty string is returned on error.
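For canonical dotted-quad input, most of the inet actions above correspond to operations in Python's standard ipaddress module. The sketch below is an approximation under that assumption, not Texis code. The function names mirror the urlutil actions but are hypothetical Python functions, and errors raise exceptions instead of returning an empty string or -1. Abbreviated input such as 192.258 is rejected by Python's parser, int2inet accepts a single integer only, and inetcontains returns 1 or 0.

```python
import ipaddress


def _iface(inet: str) -> ipaddress.IPv4Interface:
    # Python only accepts full dotted quads (optionally /B), unlike urlutil.
    return ipaddress.IPv4Interface(inet)


def inetnetwork(inet: str) -> str:
    return str(_iface(inet).network.network_address)


def inethost(inet: str) -> str:
    i = _iface(inet)
    return str(ipaddress.IPv4Address(int(i.ip) & int(i.network.hostmask)))


def inetbroadcast(inet: str) -> str:
    return str(_iface(inet).network.broadcast_address)


def inetnetmask(inet: str) -> str:
    return str(_iface(inet).network.netmask)


def inetnetmasklen(inet: str) -> int:
    return _iface(inet).network.prefixlen


def inetcontains(a: str, b: str) -> int:
    # 1 if every address of network b lies within network a, else 0.
    return int(_iface(b).network.subnet_of(_iface(a).network))


def inetcanon(inet: str) -> str:
    i = _iface(inet)
    # Host-only networks (/32) are shown without the netmask.
    return str(i.ip) if i.network.prefixlen == 32 else str(i)


def inetabbrev(inet: str) -> str:
    i = _iface(inet)
    octets = i.ip.packed
    prefix = i.network.prefixlen
    # Keep every byte holding network bits, plus host bytes up to the
    # last non-zero one; drop the remaining trailing zero bytes.
    last_nonzero = max((k + 1 for k, b in enumerate(octets) if b), default=0)
    keep = max((prefix + 7) // 8, last_nonzero, 1)
    text = ".".join(str(b) for b in octets[:keep])
    return text if prefix == 32 else f"{text}/{prefix}"


def inet2int(inet: str) -> int:
    return int(_iface(inet).ip)


def int2inet(n: int) -> str:
    return str(ipaddress.IPv4Address(n))


print(inetabbrev("192.100.0.0/24"))   # 192.100.0/24
```

Using IPv4Interface rather than IPv4Network keeps host bits intact, which inethost, inetcanon and inet2int need.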


<urlutil abs "http://example.com/dir/page.html" "other.html">

The return value in $ret would be http://example.com/dir/other.html.
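Python's urllib.parse.urljoin performs comparable relative-URL resolution, so the abs behavior, including re-use of the last page URL and lowercasing of the protocol and hostname, can be approximated as below. This is a sketch under that assumption, not the Texis algorithm. url_abs is a hypothetical name, and edge cases (for example dot-segment handling or non-HTTP schemes) may differ from urlutil.

```python
from __future__ import annotations

from urllib.parse import urljoin, urlsplit, urlunsplit


def url_abs(abs_urls: list[str], rel_urls: list[str]) -> list[str]:
    """Hypothetical Python analogue of <urlutil abs $absurl $relurl>."""
    if not abs_urls:
        raise ValueError("at least one absolute page URL is required")
    out = []
    for i, rel in enumerate(rel_urls):
        # Pair each link with its page URL; re-use the last page URL
        # when there are fewer page URLs than links.
        base = abs_urls[min(i, len(abs_urls) - 1)]
        p = urlsplit(urljoin(base, rel))
        # Lowercase the host but leave any user:pass@ prefix untouched.
        userinfo, at, hostport = p.netloc.rpartition("@")
        netloc = userinfo + at + hostport.lower()
        out.append(urlunsplit((p.scheme.lower(), netloc, p.path,
                               p.query, p.fragment)))
    return out


print(url_abs(["http://Example.COM/dir/page.html"], ["other.html"]))
```

With a single page URL, every link is resolved against it, matching the re-use rule for fewer $absurl values.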

The urlutil function was added in version 3.0.957600000 20000505.

See also: fetch, urlinfo

Copyright © Thunderstone Software     Last updated: Aug 4 2020
Copyright © 2021 Thunderstone Software LLC. All rights reserved.