urlutil - URL/network utility

 

SYNOPSIS

<urlutil $action [$arg ...]>


DESCRIPTION
The urlutil function provides URL and other network-related utility functions. The $action argument determines what it does:

  • abs $absurl $relurl or absurl $absurl $relurl Makes URLs absolute (fully specified). The $absurl values are one or more absolute page URLs. The $relurl values are corresponding links - relative or not - from those page(s). For each $relurl value, its absolute value is returned, as if it were a link on that page. If there are fewer $absurl values than $relurl values, the last $absurl value is re-used.

  • charsetcanon $charset Returns canonical name for charset name $charset, according to the current Charset Config file. Can be used to map charset aliases to canonical names.

  • charsetconv $buf $from [$to] Converts text buffer $buf from charset $from to charset $to. The default for $to if unspecified or empty is the current <urlcp charsettxt> setting. Some character sets may require the use of an external charset converter (the default is iconv, see <urlcp charsetconverter> to change it), which is automatically executed when needed. Added in version 5.00.1090598954 20040723.

  • charsetdetect $buf Returns guess at charset for text buffer $buf, or "Unknown" if charset unknown. Only limited charset detection is supported, primarily UTF-8, UTF-16BE/UTF-16LE, and all-7-bit ISO-8859-1. Added in version 7.02.1398457000 20140425.

  • filepath $u Takes $u, which must be a file:// URL, and returns the local file path that would be used to read the file, as determined by the current <urlcp fileroot> etc. settings.

  • pacinit

    Initializes proxy auto-config by fetching the PAC script (if configured) and running it. Returns 1 if successful, 0 if not. The error from the fetch, messages from the fetch and script execution, and the body of the script (if fetched) are available afterwards via <urlinfo>. If neither a PAC script nor a PAC URL is configured, or the script was already initialized, no action is taken, and 1 (success) is returned.

    Calling <urlutil pacinit> when using a PAC script is not necessary: the PAC script is automatically fetched and run when needed, i.e. at the first <fetch> or <submit>, and any messages at PAC initialization are reported. However, any PAC failure during such automatic initialization is reported only as a Proxy auto-config error for the <fetch>. The <urlutil pacinit> action provides a way to get more detailed information about the PAC script, if desired for diagnostic purposes.

  • split $u $parts Splits a URL into one or more parts. The $u value is the URL to split. The $parts values are a list of the parts to return, in the same order, as values in $ret. The parts can be protocol, user, pass, host, port, path, type, query or anchor.

  • sslcertificate $pem tostring Parses an SSL certificate string buffer $pem (in PEM format). The tostring sub-action returns a human-readable string version of the certificate, with subject, issuer, expiration etc. printed. This can be used to view a server certificate returned from <urlinfo sslservercertificate>.
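The part names accepted by the split action can be illustrated with a rough Python analogue built on the standard library's urllib.parse.urlsplit. This is a sketch of the idea only, not urlutil's implementation: the helper name split_url is hypothetical, the type part is not modeled, and the exact strings urlutil returns (for example whether delimiters are included) may differ.

```python
from urllib.parse import urlsplit

def split_url(url, parts):
    """Return the requested parts of url, in the order requested.

    Hypothetical helper; "type" is deliberately not modeled here.
    """
    u = urlsplit(url)
    table = {
        "protocol": u.scheme,
        "user": u.username or "",
        "pass": u.password or "",
        "host": u.hostname or "",
        "port": "" if u.port is None else str(u.port),
        "path": u.path,
        "query": u.query,
        "anchor": u.fragment,
    }
    # Preserve the caller's ordering, as the split action does for $ret
    return [table[p] for p in parts]

print(split_url("http://user:pw@example.com:8080/dir/page.html?x=1#top",
                ["host", "port", "path"]))
```

As with the split action, the values come back in the same order the parts were requested, so the caller can pick them out by position.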

Several actions take inet type arguments, which are IP network and/or host address specification strings of the form: N[.N[.N[.N]]][{/B|:IP}] where N is a decimal, octal or hexadecimal integer from 0 to 255, B is a decimal, octal or hexadecimal netmask integer from 0 to 32, and IP is an IP address netmask of the form N[.N[.N[.N]]]. If only x Ns are specified, the last N may be 5-x bytes in size instead of 1 byte. E.g. "1.2.65535" is legal (last N is 2 bytes), whereas "1.2.3.65535" is not. If no netmask (/B or :IP) is specified, the netmask is calculated from standard class A/B/C/D/E rules, but it will be at least large enough to include all specified bytes of the IP. (Thus, to get the class A/B/C/D/E netmask of an IP address via inetnetmask, just give the first/highest N byte of the IP, as this is the sole determiner of class.) If an IP netmask is specified, only the largest contiguous set of most-significant 1 bits is used. Examples: "1.2.3.4" (netmask is /32 because 4 bytes given), "10" (netmask is /8 because it is a Class A address), "1.2.3.4/10", "67305985" (i.e. 4.3.2.1).
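The arithmetic behind these rules can be checked with a short Python sketch. It is illustrative only and does not parse the full inet syntax: the function names are hypothetical, the class boundaries are the standard classful ones, and the default netmask is modeled for classes A, B and C only, since the text does not state the class D/E defaults.

```python
import ipaddress

def int_to_dotted(value):
    """Spell a 32-bit integer as dotted-decimal, most-significant byte first."""
    return str(ipaddress.IPv4Address(value))

def address_class(first_byte):
    """Classful category, determined solely by the first (highest) byte."""
    if first_byte < 128:
        return "A"
    if first_byte < 192:
        return "B"
    if first_byte < 224:
        return "C"
    if first_byte < 240:
        return "D"
    return "E"

def default_netmask_len(first_byte, bytes_given):
    """Classful default, widened to cover every byte given (A/B/C only)."""
    class_bits = {"A": 8, "B": 16, "C": 24}
    cls = address_class(first_byte)
    if cls not in class_bits:
        raise ValueError("class D/E defaults are not modeled in this sketch")
    return max(class_bits[cls], 8 * bytes_given)

def leading_ones(ip_netmask):
    """Length of the run of most-significant 1 bits in a dotted netmask."""
    bits = format(int(ipaddress.IPv4Address(ip_netmask)), "032b")
    return len(bits) - len(bits.lstrip("1"))

print(int_to_dotted(67305985))        # the single-integer example
print(default_netmask_len(1, 4))      # "1.2.3.4": four bytes given
print(default_netmask_len(10, 1))     # "10": class A, one byte given
print(leading_ones("255.255.0.255"))  # only the leading run of 1 bits counts
```

The last call shows why a non-contiguous IP netmask is truncated: only the leading run of 1 bits contributes to the netmask length.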

The inet actions were added in version 5.01.1112986377 20050408, and include the following (see also the SQL equivalents):

  • inetabbrev $inet Returns the shortest representation of $inet. This will include all contiguous most-significant bytes of the network, non-zero bytes of the host, and a netmask. Other trailing 0 bytes of the IP may be trimmed. Empty string is returned on error.

  • inetcanon $inet Returns canonical representation of $inet. This is dotted-decimal with all 4 bytes, and a /N netmask. Empty string is returned on error.

  • inetnetwork $inet Returns 4-decimal IP address with the network bits of $inet, and the host bits set to 0. Empty string is returned on error.

  • inethost $inet Returns 4-decimal IP address with the host bits of $inet, and the network bits set to 0. Empty string is returned on error.

  • inetbroadcast $inet Returns 4-decimal IP broadcast address for $inet, i.e. with the network bits of $inet kept, and the host bits set to 1. Empty string is returned on error.

  • inetnetmask $inet Returns 4-decimal IP netmask for $inet, i.e. with the network bits set to 1, and host bits set to 0. Empty string is returned on error.

  • inetnetmasklen $inet Returns integer netmask length of $inet. -1 is returned on error.

  • inetcontains $inetA $inetB Returns 1 if $inetA contains $inetB, i.e. every address in $inetB occurs within the $inetA network. 0 is returned if not, or -1 on error.

  • inetclass $inet Returns class of $inet: A, B, C, D or E, or classless if a different netmask is used. Empty string is returned on error.

  • inet2int $inet Returns integer representation of IP network/host bits of $inet (i.e. without netmask). Useful for compact storage of IPv4 addresses as integers instead of strings. -1 is returned on error (note that -1 may also be legitimately returned for an all-ones IP address, e.g. "255.255.255.255").

  • int2inet $i Returns inet string for integer $i taken as an IP address. Since no netmask can be stored in the integer form of an IP address, the returned IP string will not have a netmask. Empty string is returned on error.
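The relationships among several of the inet actions above can be sketched with Python's standard ipaddress module. This is an analogue for checking the bit arithmetic, not the urlutil implementation: the names describe and contains are hypothetical, Python requires a full dotted quad with an explicit /length (abbreviated forms such as "10" are not accepted), and the error return values described above are not modeled.

```python
import ipaddress

def describe(inet):
    """Compute network, host, broadcast, netmask, length and integer forms."""
    iface = ipaddress.IPv4Interface(inet)  # accepts "a.b.c.d/len"
    net = iface.network
    ip = int(iface.ip)
    return {
        "network": str(net.network_address),   # host bits set to 0
        "host": str(ipaddress.IPv4Address(ip & int(net.hostmask))),  # network bits set to 0
        "broadcast": str(net.broadcast_address),  # host bits set to 1
        "netmask": str(net.netmask),
        "netmasklen": net.prefixlen,
        "int": ip,                              # address bits only, no netmask
    }

def contains(inet_a, inet_b):
    """Return 1 if every address in inet_b lies within inet_a's network, else 0."""
    a = ipaddress.IPv4Interface(inet_a).network
    b = ipaddress.IPv4Interface(inet_b).network
    return 1 if b.subnet_of(a) else 0

info = describe("1.2.3.4/10")
for key, value in info.items():
    print(key, value)
print(contains("10.0.0.0/8", "10.1.2.3/32"))
print(str(ipaddress.IPv4Address(info["int"])))  # integer back to dotted form, no netmask
```

Note that the integer form carries no netmask, which is why converting it back yields a bare address, as described for int2inet.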


EXAMPLE

<urlutil abs "http://example.com/dir/page.html" "other.html">

The return value in $ret would be "http://example.com/dir/other.html".
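The pairing rule of the abs action (the last $absurl value is re-used when there are fewer page URLs than links) can be sketched in Python with the standard library's urllib.parse.urljoin. This only illustrates the described behavior and is not how urlutil is implemented; the helper name abs_urls is hypothetical, and edge cases may resolve differently in urlutil.

```python
from urllib.parse import urljoin

def abs_urls(page_urls, links):
    """Resolve each link against its page URL.

    When there are fewer page URLs than links, the last page URL is re-used.
    """
    if not page_urls:
        return []
    resolved = []
    for i, link in enumerate(links):
        # Clamp the index so extra links pair with the last page URL
        base = page_urls[min(i, len(page_urls) - 1)]
        resolved.append(urljoin(base, link))
    return resolved

print(abs_urls(["http://example.com/dir/page.html"],
               ["other.html", "/top.html", "http://example.org/x"]))
```

A link that is already absolute passes through unchanged, which matches the statement that $relurl values may be "relative or not".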


CAVEATS
The urlutil function was added in version 3.0.957600000 20000505.


SEE ALSO
fetch, urlinfo


Copyright © Thunderstone Software     Last updated: Dec 10 2018
Copyright © 2019 Thunderstone Software LLC. All rights reserved.