SYNOPSIS
<urlutil $action [$arg ...]>
DESCRIPTION
The urlutil function provides URL and other network-related utility
functions. The $action argument determines what it does:
abs $absurl $relurl
or absurl $absurl $relurl
Makes URLs absolute (fully specified). The $absurl values are one or
more absolute page URLs. The $relurl values are corresponding links -
relative or not - from those page(s). For each $relurl value, its
absolute value is returned, as if it were a link on that page. If
there are fewer $absurl values than $relurl values, the last $absurl
value is re-used. The protocol and hostname (if any) in each returned
value will be lowercase.
charsetcanon $charset
Returns the canonical name for charset name $charset, according to
the current Charset Config file. Can be used to map charset aliases
to canonical names.
charsetconv $buf $from [$to]
Converts text buffer $buf from charset $from to charset $to. The
default for $to, if unspecified or empty, is the current
<urlcp charsettxt> setting. Some character sets may require the use
of an external charset converter (the default is iconv; see
<urlcp charsetconverter> to change it), which is automatically
executed when needed. Added in version 5.00.1090598954 20040723.
charsetdetect $buf
Returns a guess at the charset for text buffer $buf, or "Unknown" if
the charset is unknown. Only limited charset detection is supported,
primarily UTF-8, UTF-16BE/UTF-16LE, and all-7-bit ISO-8859-1. Added
in version 7.02.1398457000 20140425.
filepath $u
Takes $u, which must be a file:// URL, and returns the local file
path that would be used to read the file, as determined by the
current <urlcp fileroot> etc. settings.
pacinit
Initializes proxy auto-config by fetching the PAC script (if
configured) and running it. Returns 1 if successful, 0 if not. The
error from the fetch, messages from the fetch and script execution,
and the body of the script (if fetched) are available afterwards via
<urlinfo>. If neither a PAC script nor a PAC URL is configured, or
the script was already initialized, no action is taken, and 1
(success) is returned.
Calling <urlutil pacinit> when using a PAC script is not necessary:
the PAC script is automatically fetched and run when needed, i.e. at
the first <fetch> or <submit>, and any messages at PAC initialization
are reported. However, any PAC failure during such automatic
initialization merely translates into a Proxy auto-config error for
the <fetch>. The <urlutil pacinit> action provides a way to get more
detailed information about the PAC script, if desired for diagnostic
purposes.
split $u $part
Splits a URL into parts. The $u value is the URL to split. The $part
value is a single part to return. The part can be any of protocol,
user, pass, authority, host, hostIsIPv6, port, path, type, query,
anchor, or allpartnames.
In Texis version 8 and later, authority and hostIsIPv6 were added.
The authority part is a composite/alias of user, pass, host, and
port: it is the part of the URL after the trailing // of the protocol
and before the path, including all separators therein. Thus if
present, it contains the host (with any IPv6 brackets), optional
user/pass info, and optional port (with colon). The hostIsIPv6 value
is 1 if the host looks like a bracketed IPv6 address - the host value
will have the brackets stripped then - or 0 if not; in version
8.00.1637010861 20211115 and later, it is a long value; in earlier
versions, a string.
In version 8.00.1637010861 20211115, user and pass support was added,
and allpartnames was added. Also in this version, support for
multiple parts in $part was removed (now gives an error message).
This allows a missing part (zero return values) to be distinguished
from a present but empty part (one empty string return value). In
previous versions, multiple parts could be requested, and thus the
return values were in sync with $part, which required missing part(s)
to be returned as empty string instead; user/pass were also always
silently returned as empty. allpartnames will return a list of the
names of the zero or more part(s) that are present in the URL.
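As a rough illustration of how the parts relate to each other, the following Python sketch uses the standard urllib.parse module rather than Texis itself. The helper name split_parts and the dictionary layout are assumptions made for this example, and Python's parser may differ from <urlutil split> in edge cases (for instance, it does not distinguish a missing query from an empty one).

```python
from urllib.parse import urlsplit

def split_parts(url):
    """Map a URL onto part names similar to those of urlutil split.

    Illustration only: this is Python's parser, not Texis, and the
    returned layout is an assumption made for this sketch.
    """
    u = urlsplit(url)
    # Host-and-port text as written in the URL, after any user info.
    hostport = u.netloc.rpartition("@")[2]
    return {
        "protocol": u.scheme,
        "user": u.username,          # None when absent
        "pass": u.password,          # None when absent
        "authority": u.netloc,       # user, pass, host, port and separators
        "host": u.hostname,          # IPv6 brackets stripped
        "hostIsIPv6": 1 if hostport.startswith("[") else 0,
        "port": u.port,
        "path": u.path,
        "query": u.query,
        "anchor": u.fragment,
    }

parts = split_parts("http://user:pw@[::1]:8080/a/b?x=1#top")
print(parts["authority"])
print(parts["host"], parts["hostIsIPv6"], parts["port"])
```

The authority value keeps the brackets and all separators, while host has the brackets removed, which mirrors the relationship described above.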
sslcertificate $pem tostring
Parses an SSL certificate string buffer $pem (in PEM format). The
tostring sub-action returns a human-readable string version of the
certificate, with subject, issuer, expiration etc. printed. This can
be used to view a server certificate returned from
<urlinfo sslservercertificate>.
Several actions take inet style argument(s). This is an IPv4 or IPv6
address string, optionally followed by a netmask.
For IPv4, the format is dotted-decimal, i.e. N[.N[.N[.N]]] where N is a decimal, octal or hexadecimal integer from 0 to 255. If x < 4 values of N are given, the last N is taken as the last 5-x bytes instead of 1 byte, with missing bytes padded to the right. E.g. 192.258 is valid and equivalent to 192.1.2.0: the last N is 2 bytes in size, and covers 5 - 2 = 3 needed bytes, including 1 zero pad to the right. Conversely, 192.168.4.1027 is not valid: the last N is too large.
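The shortened-address rule can be sketched in Python as follows. This is one reading of the rule, not the Texis implementation: the names parse_num and parse_ipv4 are hypothetical, the C-style prefixes (leading 0 for octal, leading 0x for hexadecimal) are an assumption, and so is the choice that the last value occupies the fewest bytes able to hold it.

```python
def parse_num(text):
    """Parse a decimal, octal (leading 0) or hex (leading 0x) integer.

    The prefix conventions are an assumption for this sketch.
    """
    if not text.isalnum():
        raise ValueError("bad number: %r" % text)
    lower = text.lower()
    if lower.startswith("0x"):
        return int(lower[2:], 16)
    if len(lower) > 1 and lower.startswith("0"):
        return int(lower[1:], 8)
    return int(lower, 10)

def parse_ipv4(text):
    """Return the 4 address bytes for a possibly shortened IPv4 string."""
    fields = text.split(".")
    x = len(fields)
    if not 1 <= x <= 4:
        raise ValueError("expected 1 to 4 dot-separated values")
    values = [parse_num(f) for f in fields]
    head, last = values[:-1], values[-1]
    if any(v > 255 for v in head):
        raise ValueError("non-final value out of range 0-255")
    room = 5 - x  # number of bytes the last value has to cover
    # Assumption: the last value takes the fewest bytes that can hold it.
    size = max(1, (last.bit_length() + 7) // 8)
    if size > room:
        raise ValueError("last value is too large")
    # Big-endian bytes of the last value, then zero padding on the right.
    tail = last.to_bytes(size, "big") + bytes(room - size)
    return bytes(head) + tail

print(".".join(str(b) for b in parse_ipv4("192.258")))
```

With this reading, 192.258 yields the bytes 192, 1, 2, 0, and 192.168.4.1027 is rejected because 1027 needs 2 bytes where only 1 is available.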
An IPv4 address may optionally be followed by a netmask, either of the form /B or :IPv4, where B is a decimal, octal or hexadecimal netmask integer from 0 to 32, and IPv4 is a dotted-decimal IPv4 address of the same format described above. If an :IPv4 netmask is given, only the largest contiguous set of most-significant 1 bits is used (because netmasks are contiguous). If no netmask is given, it will be calculated from standard IPv4 class A/B/C/D/E rules, but will be large enough to include all given bytes of the IP. E.g. 1.2.3.4 is Class A which has a netmask of 8, but the netmask will be extended to 32 to include all 4 given bytes.
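The two netmask rules above can be sketched in Python. The function names are hypothetical, and the value used for classes D and E (32) is an assumption, since the text does not give their netmask lengths; how many bytes count as "given" is left to the caller.

```python
def mask_to_bits(mask_bytes):
    """Count the leading (most-significant) 1 bits of a 4-byte netmask."""
    value = int.from_bytes(mask_bytes, "big")
    bits = 0
    while bits < 32 and value & (1 << (31 - bits)):
        bits += 1
    return bits

def default_bits(first_byte, given_bytes):
    """Default netmask length when none is given.

    Starts from the address class and widens the result so that it
    covers every byte that was given.
    """
    if first_byte < 128:
        class_bits = 8       # class A
    elif first_byte < 192:
        class_bits = 16      # class B
    elif first_byte < 224:
        class_bits = 24      # class C
    else:
        class_bits = 32      # classes D/E: assumed value for this sketch
    return max(class_bits, 8 * given_bytes)

print(mask_to_bits(bytes([255, 255, 0, 255])))  # trailing 1 bits are ignored
print(default_bits(1, 4))                       # the 1.2.3.4 case
```

A mask such as 255.255.0.255 is treated as 16 bits because only the leading run of 1 bits counts, and 1.2.3.4 gets 32 bits because all 4 bytes were given.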
In version 8 and later, IPv6 addresses are supported as well. These are given in standard IPv6 hex format, i.e. H:H:H:H where H is a 16-bit hexadecimal number, with :: supported for a single span of zero bits, as per canonical IPv6 text representation.
An IPv6 address may optionally be followed by a netmask, of the form /B, where B is a decimal, octal or hexadecimal netmask integer from 0 to 128. If no netmask is given, it defaults to the host-only network (i.e. 128).
In version 7.07.1554395000 20190404 and later, error messages are reported.
The inet actions were added in version 5.01.1112986377 20050408, and
include the following (see also the SQL equivalents):
inetabbrev $inet
Returns a possibly shorter-than-canonical representation of $inet,
where trailing zero byte(s) of an IPv4 address may be omitted. All
bytes of the network, and leading non-zero bytes of the host, will be
included. E.g. <urlutil inetabbrev "192.100.0.0/24"> returns
192.100.0/24. The /B netmask is included, except if (in version
7.07.1554840000 20190409 and later) the network is host-only (i.e.
netmask is the full size of the IP address). Empty string is returned
on error.
inetcanon $inet
Returns canonical representation of $inet. For IPv4, this is
dotted-decimal with all 4 bytes. For IPv6, this is eight 16-bit
hexadecimal integers (no leading zeroes), colon-separated, possibly
with a :: for zero bits. The /B netmask is included, except if (in
version 7.07.1554840000 20190409 and later) the network is host-only
(i.e. netmask is the full size of the IP address). Empty string is
returned on error.
inetnetwork $inet
Returns string IP address with the network bits of $inet, and the
host bits set to 0. Empty string is returned on error.
inethost $inet
Returns string IP address with the host bits of $inet, and the
network bits set to 0. Empty string is returned on error.
inetbroadcast $inet
Returns string IP broadcast address for $inet, i.e. with the network
bits of $inet, and the host bits set to 1. Empty string is returned
on error.
inetnetmask $inet
Returns string IP netmask for $inet, i.e. with the network bits set
to 1, and host bits set to 0. Empty string is returned on error.
inetnetmasklen $inet
Returns integer netmask length of $inet. -1 is returned on error.
inetcontains $inetA $inetB
Returns 1 if $inetA contains $inetB, i.e. every address in $inetB
occurs within the $inetA network. 0 is returned if not, or -1 on
error.
Note that an IPv4 address is not considered to be contained within
the equivalent IPv4-mapped IPv6 address, nor vice-versa (e.g.
::ffff:1.2.3.4 is considered different from 1.2.3.4). To treat IPv4
addresses the same as their IPv4-mapped IPv6 equivalents, promote
both arguments to IPv6 with inetToIPv6.
inetclass $inet
Returns class of $inet, e.g. A, B, C, D, E or classless if a
different netmask is used (or the address is IPv6). Empty string is
returned on error.
inet2int $inet
Returns integer representation of IP network/host bits of $inet (i.e.
without netmask); useful for compact storage of address as integer(s)
instead of string. Returns a varint with 1 value for IPv4 addresses,
4 for IPv6 addresses, or 0 values on error (i.e. return compares
equal to empty string on error). Note that in version 7 and earlier,
a single int was always returned, with -1 for error (or
255.255.255.255).
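The one-value and four-value forms can be pictured with the Python sketch below, built on the standard ipaddress module. It is not the Texis implementation: the function names are hypothetical, and splitting the address into unsigned 32-bit words in network byte order is an assumption, since the text does not state the word layout or signedness.

```python
import ipaddress

def inet_to_ints(text):
    """Return 1 integer for IPv4, 4 for IPv6, or an empty list on error."""
    try:
        addr = ipaddress.ip_interface(text).ip  # drops any /B netmask
    except ValueError:
        return []
    raw = addr.packed  # 4 or 16 bytes, network byte order
    # Assumption: unsigned 32-bit words, most significant word first.
    return [int.from_bytes(raw[i:i + 4], "big") for i in range(0, len(raw), 4)]

def ints_to_inet(values):
    """Rebuild an address string (without netmask) from the integers."""
    raw = b"".join(v.to_bytes(4, "big") for v in values)
    return str(ipaddress.ip_address(raw))

print(inet_to_ints("1.2.3.4/24"))
print(ints_to_inet(inet_to_ints("2000::a:b:c:d")))
```

Because the netmask is dropped on the way in, the string rebuilt from the integers never carries one.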
int2inet $i
Returns inet string for integer $i taken as an IP address. Since no
netmask can be stored in the integer form of an IP address, the
returned IP string will not have a netmask. Empty string is returned
on error.
inetToIPv4 $inet
Converts $inet to IPv4 (including netmask), iff IPv4-mapped IPv6.
Returns the equivalent IPv4 address for $inet iff it is an
IPv4-mapped IPv6 address; e.g. ::ffff:1.2.3.4 would return 1.2.3.4.
Otherwise, returns canonical version of $inet iff it is some other
IPv6 address; e.g. 2000::a:000b:c:d would return 2000::a:b:c:d.
Otherwise returns empty string (i.e. on error). May be useful when
storing both IPv4 and IPv6 addresses in a common compact int(4) field
from inet2int, in order to recover original IP family format on
display (after int2inet reconversion). Added in version 8.
inetToIPv6 $inet
Converts $inet to IPv4-mapped IPv6 (including netmask), iff IPv4.
Returns the equivalent IPv4-mapped IPv6 address for $inet iff it is
IPv4; e.g. 1.2.3.4 would return ::ffff:1.2.3.4. Otherwise, returns
canonical version of $inet iff it is IPv6; e.g. 2000::a:000b:c:d
would return 2000::a:b:c:d. Otherwise returns empty string (i.e. on
error). May be useful when storing both IPv4 and IPv6 addresses in a
common compact int(4) field from inet2int, in order to convert
potential IPv4 addresses to IPv6 before inet2int conversion. Added in
version 8.
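The promotion to IPv4-mapped IPv6 can be sketched in Python with the standard ipaddress module. The function name to_ipv6 is hypothetical, the sketch ignores netmasks (an address with a /B suffix is treated as an error here), and Python's notion of canonical IPv6 text may not match Texis in every case.

```python
import ipaddress

def to_ipv6(text):
    """IPv4 -> IPv4-mapped IPv6; IPv6 -> canonical text; otherwise ""."""
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        return ""
    if addr.version == 4:
        # Build the mapped form by prefixing ::ffff: to the dotted quad.
        return "::ffff:" + str(addr)
    return str(addr)

print(to_ipv6("1.2.3.4"))
print(to_ipv6("2000::a:000b:c:d"))

# After promotion, an IPv4 address and its mapped form compare equal.
same = ipaddress.ip_address(to_ipv6("1.2.3.4")) == ipaddress.ip_address("::ffff:1.2.3.4")
print(same)
```

Promoting both sides first is what lets IPv4 addresses and their IPv4-mapped equivalents be compared on equal terms.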
inetAddressFamily $inet
Returns IP address family for $inet: IPv4 iff IPv4 address, IPv6 iff
IPv6 address, otherwise empty string. Added in version 8.
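To make the network/host bit terminology used by the inet actions concrete, here is a Python sketch based on the standard ipaddress module. It is an analogy, not the Texis code: describe and contains are hypothetical names, and unlike Texis, Python treats an address without a netmask as host-only rather than applying class rules.

```python
import ipaddress

def describe(text):
    """Break an address/netmask string into network-related pieces."""
    iface = ipaddress.ip_interface(text)
    net = iface.network
    # Host bits: the address ANDed with the inverse of the netmask.
    host_bits = int(iface.ip) & int(net.hostmask)
    return {
        "network": str(net.network_address),     # host bits set to 0
        "host": str(type(iface.ip)(host_bits)),  # network bits set to 0
        "broadcast": str(net.broadcast_address), # host bits set to 1
        "netmask": str(net.netmask),
        "netmasklen": net.prefixlen,
        "family": "IPv%d" % iface.version,
    }

def contains(a, b):
    """Return 1 if every address of network b lies within network a."""
    net_a = ipaddress.ip_interface(a).network
    net_b = ipaddress.ip_interface(b).network
    if net_a.version != net_b.version:
        return 0  # IPv4 and IPv6 are never mixed, as noted for inetcontains
    return 1 if net_b.subnet_of(net_a) else 0

info = describe("192.168.4.10/24")
print(info["network"], info["host"], info["broadcast"])
print(contains("192.168.0.0/16", "192.168.4.10/24"))
```

The same address yields all of the network, host, broadcast and netmask values, which is why each action above only needs the single $inet argument.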
EXAMPLE
<urlutil abs "http://example.com/dir/page.html" "other.html">
The return value in $ret would be http://example.com/dir/other.html.
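For comparison, the same resolution, together with the re-use of the last page URL when there are more links than pages, can be sketched in Python with urllib.parse.urljoin. The helper name make_absolute is hypothetical; the sketch expects at least one page URL and does not reproduce the lowercasing of protocol and hostname that <urlutil abs> performs.

```python
from urllib.parse import urljoin

def make_absolute(bases, links):
    """Resolve each link against its page URL, re-using the last page URL
    when there are fewer page URLs than links."""
    out = []
    for i, link in enumerate(links):
        base = bases[min(i, len(bases) - 1)]
        out.append(urljoin(base, link))
    return out

result = make_absolute(
    ["http://example.com/dir/page.html"],
    ["other.html", "/top.html", "http://other.example/x"],
)
for url in result:
    print(url)
```

A link that is already absolute, such as the third one, comes back unchanged, matching the "relative or not" wording of the abs action.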
CAVEATS
The urlutil function was added in version 3.0.957600000 20000505.