The following urlcp settings control how or whether pages and
related URLs (such as frames and iframes) are fetched:
encodings [add|del|set] [$encodings ...]
Sets the list of allowed content/transfer encodings for pages
fetched. The $encodings argument(s) are zero or more of
the values 7bit, 8bit, binary,
identity, chunked, gzip, deflate or
compress. The chunked encoding only applies to
transfer encodings; the remainder apply to both content and
transfer encodings. If the first value of the first argument is
add, the given encoding(s) will be added to the allowed
list; if del, deleted from it; if set, the list is
cleared and set to $encodings (this is the default action
if no add/del/set action is given). The
keyword all may be used to refer to all encodings, and
default may be used (with set) to re-set the default
(which is identity, chunked, gzip,
deflate and compress).
The Vortex fetch library will declare the list of encodings it
allows in Accept-Encoding and TE request headers, if
httpversion is set to 1.1 (these are 1.1 headers,
and some servers do not handle them as expected in a 1.0 request;
httpversion is 1.1 by default in version 6 and later).
It is up to the remote server to then choose encoding(s) from the
declared list(s). The content encoding(s) (if any) of the returned
document should be declared by the server in the
Content-Encoding header, and transfer encoding(s) in the
Transfer-Encoding header. Both types of encodings will be
decoded before the document is returned from <fetch> or
<urlinfo rawdoc>. If an encoding that is not allowed is
encountered, a "Disallowed Content- or Transfer-Encoding"
error is generated.
Added in version 5.01.1249073000 20090731. Returns previous list
of allowed encodings. See also the maxpgsize and
maxdownloadsize settings for how they interact with
encodings.
7bit, 8bit and binary were added in version
7.03.1430243000 20150428. These are MIME
Content-Transfer-Encoding values; some web servers (Apache)
are known to use them as HTTP Content-Encoding values
however.
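The add/del/set convention described above recurs in several list-valued settings (encodings, filetypes, ipprotocols, linkprotocols, methods, protocols). The following Python sketch models that list logic only; it is not Vortex code and not the fetch library's implementation. The function name apply_list_action, the ordering of the result, and the rejection of default outside of set are assumptions made for illustration.

```python
# Values and defaults taken from the encodings description above.
ALL_ENCODINGS = ["7bit", "8bit", "binary", "identity",
                 "chunked", "gzip", "deflate", "compress"]
DEFAULT_ENCODINGS = ["identity", "chunked", "gzip", "deflate", "compress"]


def apply_list_action(current, args,
                      all_values=ALL_ENCODINGS, default=DEFAULT_ENCODINGS):
    """Return the new allowed list after an add/del/set request (hypothetical helper)."""
    args = list(args)
    action = "set"                       # default action if none is given
    if args and args[0] in ("add", "del", "set"):
        action = args.pop(0)

    values = []
    for arg in args:
        if arg == "all":                 # keyword: every known value
            values.extend(all_values)
        elif arg == "default":           # keyword: documented for use with set
            if action != "set":
                # Assumption: this sketch rejects 'default' with add/del.
                raise ValueError("'default' is only handled with 'set' here")
            values.extend(default)
        elif arg in all_values:
            values.append(arg)
        else:
            raise ValueError(f"unknown value: {arg!r}")
    values = list(dict.fromkeys(values))  # drop duplicates, keep order

    if action == "add":
        return current + [v for v in values if v not in current]
    if action == "del":
        return [v for v in current if v not in values]
    return values                         # set: clear, then use the given list
```

For example, apply_list_action(DEFAULT_ENCODINGS, ["del", "gzip", "deflate"]) leaves identity, chunked and compress allowed, while passing just ["gzip"] replaces the whole list, since set is the implied action.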
fileexclude (list)
List of file trees to exclude (disallow) when fetching a local
file:// URL. The default is none (no restrictions) for
Windows, and "/dev/", "/proc/" and
"/debug/" for Unix. After fileroot is applied, if
the resulting local file path from a file:// URL has one of
these paths as a prefix, the URL will not be fetched. This can be
used to protect certain unsafe or private directories on a local
filesystem from being inadvertently walked. Does not apply to
FTP-mapped non-localhost file:// URLs. Added in version
4.02.1048785087 20030327. Aka fileexcludes. Returns
previous setting.
fileinclude (list)
List of file trees to include (require) when fetching a local
file:// URL. The default is none (no restrictions). After
fileroot is applied, if the resulting local file path from
a file:// URL does not have one of these paths as a
prefix, the URL will not be fetched. This can be used to keep a
local filesystem walk within certain directories. Does not apply
to FTP-mapped non-localhost file:// URLs. Added in version
4.02.1048785087 20030327. Aka fileincludes. Returns
previous setting.
filenonlocal (string)
How to handle non-localhost file:// URLs, i.e. ones
with a specific host other than empty string or
"localhost". The value can be one of:
off
Default: do not allow non-localhost file:// URLs. This
ensures that no FTP or UNC paths are used.
unc
Map non-localhost file:// URLs to their UNC paths and
attempt to open as a local file. E.g. the URL
"file://myhost/mydir/myfile" would map to the file
"\\myhost\mydir\myfile" under Windows and
"//myhost/mydir/myfile" under Unix (but see
modifications under fileroot below). This allows the
behavior of web browsers that support UNC paths to be emulated
on operating systems that support UNC, for consistency with
browser views.
ftp
Map non-localhost file:// URLs to FTP. E.g. the URL
"file://myhost/mydir/myfile" would map to the URL
"ftp://myhost/mydir/myfile" and be fetched as such.
This allows the behavior of some browsers/operating systems
that do not support UNC paths to be emulated.
filenonlocal only applies when a
proxy is not set; when a proxy is active, all file://
URLs are passed to the proxy.
fileroot (string)
Sets the root directory to prepend to local file:// URL
paths; default none. E.g. with fileroot set to
"/docs", the URL "file://localhost/dir/file.txt"
would be read from the file
"/docs/dir/file.txt". Also applies to
non-localhost URLs when filenonlocal is set to
unc, e.g. the URL "file://myhost/mydir/myfile" is
read from the file "/docs/myhost/mydir/myfile". This
allows both localhost and non-localhost
file:// URLs to be mapped to a single directory hierarchy,
perhaps where network filesystems corresponding to individual
host(s) are mounted. Added in version 4.02.1048785087 20030327.
Returns previous setting.
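The fileexclude and fileinclude checks described earlier both operate on the local path that results after fileroot has been applied. A minimal Python sketch of that prefix test follows, assuming plain string-prefix comparison on an already-resolved Unix path; the real library may normalize paths differently. The function name file_path_allowed is hypothetical, and the exclude defaults are the Unix defaults listed above.

```python
def file_path_allowed(path,
                      excludes=("/dev/", "/proc/", "/debug/"),
                      includes=()):
    """Return True if a resolved local path passes both prefix checks.

    path     -- local file path, after fileroot has been applied
    excludes -- fileexclude list: any matching prefix disallows the path
    includes -- fileinclude list: if non-empty, some prefix must match
    """
    # fileexclude: a matching prefix means the URL will not be fetched.
    if any(path.startswith(prefix) for prefix in excludes):
        return False
    # fileinclude: an empty list means no restriction.
    if includes and not any(path.startswith(prefix) for prefix in includes):
        return False
    return True
```

With the defaults, "/dev/zero" is rejected and "/home/user/a.txt" is accepted; adding includes=("/docs/",) would then reject the latter as well.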
filetypes [add|del|set] [file|dir|device|symlink|other ...]
Sets the list of allowed file types for local file:// URLs.
The possible values are file for ordinary files, dir
for directories, device for devices, symlink for
symbolic links (if supported by operating system), and
other for other types (sockets etc.). If the first value
of the first argument is add, the given list will be added
to the allowed list; if del, deleted from; if set,
cleared and set (the default). The default list is
file, dir and symlink. If the file derived
from a local file:// URL is not one of these types, it is
disallowed. This prevents links to URLs like
"file://localhost/dev/zero" from hampering a walk. Added
in version 4.02.1048785087 20030327. Returns previous
setting.
followpermanentredirects (boolean)
Whether to follow (fetch) permanent (301) redirects and
their equivalents (e.g. file:// directory trailing-slash
redirects). The default is on, which follows them. Turning
this off results in a fetch error when such redirects are
encountered, reported as the <urlinfo errtoken> NotFollowingPermanentRedirect. See
<urlinfo permanenturl>, canonicalurl,
actualurl, redirs for how they are affected when
followpermanentredirects is off.
Stopping at permanent redirects allows a script to take other action when they are encountered (such as updating a stored URL) before re-fetching the redirect. Added in version 8.01.1689976778 20230721; previous versions behaved as if this setting were always on.
ftpactivepassivefallback (boolean)
If on (the default), FTP passive mode fetches will fall back to
active mode on failure, and vice-versa. This may help resolve a
fetch to an FTP server that does not support the current mode (or
is firewalled), i.e. in cases where ftppassive is not set
properly for the given situation. Only failures of the
PORT or PASV command, or a temporary (4nn) error
response to the main (RETR/STOR/etc.) command will
trigger the mode switch. Added in version 6.00.1304040000
20110428. Returns previous setting.
Note that if the correct mode (active or passive) is already known
in advance, it is preferable to set it from the outset via the
ftppassive setting, to avoid potential delays and/or errors
from relying on this fallback switchover.
ftppassive (boolean)
If on (the default), FTP passive mode is used first for FTP protocol fetches. If off, FTP active mode is used first. Passive mode can be useful in situations where a firewall on the client (Vortex) side of the network prevents an FTP transfer (e.g. causes a timeout). This is due to the nature of active-mode data transfers, where the remote (server) side is required to initiate a separate socket connection back to the client (even though the client initiates the original control connection). Many firewalls will block such incoming connections, causing the transfer to time out. Passive mode allows the client to initiate both the control and data connections, which is often permitted by the client's firewall. Added in version 5.01.1121350905 20050714. Note: Prior to version 6.00.1304040000 20110428 this setting was off by default. Returns previous setting.
Note that if ftpactivepassivefallback is on (the default),
the alternate mode may be used if the first mode (set by this
setting) fails.
ftprelativepaths (boolean)
If on (the default), FTP paths are assumed to be
login-dir-relative, so the URL "ftp://host/dir/file.txt"
would be fetched with "RETR home/dir/file.txt"
instead of "RETR /dir/file.txt" (where home is the FTP
user's login directory). For most (i.e. anonymous) FTP URLs this
makes no difference, as the FTP login dir is typically at the root
of the FTP-accessible tree. However, for many FTP URLs that
require a true login, the FTP login dir is not the root dir, but
the user's home directory. Thus, with ftprelativepaths on,
the above URL would fetch "dir/file.txt" from the user's
home directory - not "/dir/file.txt" from the root dir,
where it may not exist. With ftprelativepaths off, the
user's home directory - which may be unknown or vary from user to
user - would have to be specified in the FTP URL in order to get
back to the FTP login dir.
Dirs outside the FTP login dir may still be accessed when
ftprelativepaths is on, however, by encoding an extra slash
in the URL, e.g. "ftp://host/%2Fdir/file.txt". Added in
version 6. In previous versions the setting was effectively off.
ftpsendrelativepathsasabsolute (boolean)
If on (the default), relative FTP paths (i.e. due to
ftprelativepaths) are changed to absolute paths when sent
to the server, by prefixing the login directory (obtained with a
PWD after login). This avoids the occasional need for a
no-argument "CWD" command to go back to the login
directory (which some servers do not support), while still
supporting the functionality of ftprelativepaths (no home
dir needed in URLs). If off, the login directory is not prefixed;
i.e. the URL "ftp://host/dir/file.txt" is fetched with
"RETR dir/file.txt". This setting has no effect if
ftprelativepaths is off. Added in version 6.00.1301360000
20110328.
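The combined effect of ftprelativepaths and ftpsendrelativepathsasabsolute on the path sent to the server can be sketched in Python as follows. This is an illustration of the rules described above, not the library's code: the function name retr_command and its parameters are hypothetical, and login_dir stands for the directory obtained with PWD after login.

```python
from urllib.parse import unquote, urlsplit


def retr_command(url, login_dir, relative_paths=True, send_as_absolute=True):
    """Return the RETR command this sketch would send for an FTP URL."""
    raw_path = urlsplit(url).path          # e.g. "/dir/file.txt"
    # Drop the slash that merely separates host from path, then decode,
    # so an encoded "%2F" becomes a real leading slash.
    path = unquote(raw_path[1:])

    if path.startswith("/"):
        # Extra encoded slash: explicitly absolute, outside the login dir.
        return "RETR " + path
    if not relative_paths:
        # ftprelativepaths off: the URL path is taken from the root dir.
        return "RETR /" + path
    if send_as_absolute:
        # ftpsendrelativepathsasabsolute on: prefix the login directory.
        return "RETR " + login_dir.rstrip("/") + "/" + path
    # Relative path sent as-is, resolved by the server against the login dir.
    return "RETR " + path
```

For a login directory of "/home/user", the URL "ftp://host/dir/file.txt" yields "RETR /home/user/dir/file.txt" with both settings on, "RETR dir/file.txt" with only ftprelativepaths on, and "RETR /dir/file.txt" with ftprelativepaths off.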
getframes (boolean)
If on, the <frame> objects of documents are fetched. The
raw HTML returned will remain the same (the original document),
but the formatted text from <urlinfo text> will be replaced
and instead contain each frame in sequence. The links returned by
<urlinfo links> or allrefs will be the list of all the
frames' links (i.e. the original URL, the frame parent, will not
have its own links, frames etc. included). The default is false,
i.e. frames are not fetched. Note that only one level of frames
is fetched, i.e. a <frame> link inside a <frame>
link will not be fetched.
getiframes (boolean)
If on, inline <iframe> documents are fetched. The raw HTML
returned will remain the same (the original document). The
formatted text from <urlinfo text> will also remain, except
that <iframe> blocks will be replaced with their referenced
document text in-line. Note that like frames, only one level of
<iframe>s is fetched. The <iframe> links are
removed from the iframes, links, and allrefs
lists returned by the urlinfo function. The default is
false. Added in version 3.01.963000000 20000707.
getscripts (boolean)
If on, and javascript is on,
<SCRIPT SRC=...></SCRIPT> URLs on a page will also be
fetched and run if they refer to JavaScript (and their URLs
removed from the urlinfo links and allrefs
lists). If off (default), such URLs are not fetched, and only
inline <SCRIPT>...</SCRIPT> scripts are run (if
javascript is on). Returns previous value. Added in
version 4.01.1023800000 20020611.
httpversion $version
Sets the HTTP version to use for requests. The $version
argument is one of 0.9, 1.0 or 1.1. HTTP/1.0
is the default for Texis/Webinator version 5 and earlier; HTTP/1.1
is the default for Texis/Webinator version 6 and later (and is
only conditionally supported). It may be necessary to set
1.1 to fully utilize some features, e.g. content/transfer
encodings (see the encodings setting). Added in version 5.01.1249039000
20090731. Returns previous version.
ignoreanchorframes (boolean)
Whether to ignore frames and IFRAMEs that are just anchors, e.g.
src="#". These usually just contain JavaScript, and fetching
them just doubles up the content, links etc. of the parent URL.
On by default. Added in version 6.
inputfileroot (string)
If set, all set/non-empty <input type="file"> values must
be within this local directory tree (and not contain
"../" components to get out of it), when <urlcp
domvalue "...submitContent"> or variants are called. Value(s)
that are outside this setting will cause an error such as "Will not add form input `...' file `...' to submit content: Not in
inputfileroot directory or contains `../'", and will be treated
as empty (i.e. sent as empty value with no file). This is for
security, to ensure all to-be-uploaded files are from a known
directory. Added in version 6.00.1335222312 20120423. Default is
unset (i.e. no check is performed). Returns 1 on success, 0 on
error.
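The inputfileroot check amounts to two conditions: the file must lie within the configured directory tree, and its path must not contain "../" components. A minimal Python sketch, assuming Unix-style paths and simple string handling (the library's actual comparison rules are not documented here), with the hypothetical function name input_file_allowed:

```python
import posixpath


def input_file_allowed(file_path, root):
    """Return True if file_path may be uploaded under the given root.

    root -- the inputfileroot value, or None if unset (no check performed)
    """
    if root is None:
        return True                       # default: setting is unset
    # Reject any "../" component outright, as described above.
    if ".." in file_path.split("/"):
        return False
    # Require the file to be inside the root directory tree. The trailing
    # slash keeps "/uploadsevil/x" from matching a root of "/uploads".
    root_prefix = root.rstrip("/") + "/"
    return posixpath.normpath(file_path).startswith(root_prefix)
```

Under these assumptions, "/uploads/a.txt" passes for a root of "/uploads", while "/uploads/../etc/passwd" and "/etc/passwd" do not.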
ipprotocols [add|del|set] [$protocol ...]
Sets the list of IP protocols to allow for page fetches. One or
more of IPv4 and/or IPv6. If the first value of the
first argument is add, the given list will be added to the
allowed list; if del, deleted from; if set, cleared
and set (the default). The default list, settable with set
default, is currently IPv4 IPv6. Returns nonzero
on success, zero on error. Added in version 8.
Note that for DNS, the IP protocol used over-the-wire for lookup
is not affected by this setting - as opposed to the DNS
query type (A vs. AAAA), which is. The DNS lookup
over-the-wire protocol is determined solely by the IP family of
the nameservers IP address(es). Thus it is possible to use an
IPv4 nameserver to look up and connect with IPv6 hosts, or
vice-versa.
Note that allowing both IPv4 and IPv6 may result in undesired behavior on occasion, i.e. if network/DNS/etc. configuration is inconsistent. For example, an IPv6 address may be found for a hostname, but fail to connect if the server only responds to its IPv4 address.
Note that not all protocols are available (supported) on all
systems; see <urlinfo ipprotocolsavailable>. See also
<urlinfo ipprotocols> for
a list of allowed protocols, i.e. this setting's current value.
ipv6scopeidinhostheader (boolean)
Whether to print the scope id (e.g. %eth0 part) of an IPv6
link-local address in the HTTP Host header, if such an
address is used in the URL. Some web servers (e.g. Apache) do not
accept scope ids (i.e. due to a strict interpretation of RFC 4007
11.2), and will fail web requests containing them. Turning off
ipv6scopeidinhostheader (the default) causes scope ids not
to be printed in Host headers, to conform with such servers.
Added in version 8. Returns previous value of setting.
linkprotocols [add|del|set] [$protocols|allowed ...]
Sets the list of protocols allowed to be returned in links from a
page (i.e. the links value of the urlinfo function).
Note that this setting does not
control what can be fetched, only the list of links returned from a
page. It can be used as a filter to remove invalid-protocol links
returned by a page. The $protocols argument(s) are a list
of zero or more values, each of which is either a recognized
protocol (see protocols below), the value unknown
for unknown protocols, or the value allowed for just
protocols permitted by the protocols setting. The default
is all protocols plus unknown.
If the first value of the first argument is add, the given
list will be added to the allowed list; if del, deleted
from; if set, cleared and set (the default). Returns
previous setting. Added in version 4.01.1029180431 20020812.
methods [add|del|set] [$methods ...]
Sets the list of request methods allowed for page fetching
(default all). The $methods argument(s) are zero or more
of the values OPTIONS, GET, HEAD,
POST, PUT, DELETE, TRACE,
MKDIR, RENAME, SCHEDULE, COMPILE or
RUN. Not all methods are supported by all protocols;
e.g. MKDIR is only supported by FTP. If the first value of
the first argument is add, the given method list will be
added to the allowed list; if del, deleted from; if
set, cleared and set (the default). Alternately, the
default methods may be restored with set default. Returns
previous setting. Added in version 5.01.1232696000 20090123.
netmode (string)
Sets the routines to use for page fetching. The default is
int, which uses Texis' internal routines. For Windows
versions, netmode may be set to sys, which uses the
system routines. This may allow certain authenticated sites to be
accessed, if the internal routines' NTLM authentication is not
sufficient for example. However, parallelization and many other
settings and features are unavailable in this mode. Added in version
4.04.1068000000 20031104.
offsiteok (boolean)
If on (default), URLs that are off-site from the original URL will be fetched if needed. If off, such URLs will not be fetched. This includes redirects, components such as frames, iframes and scripts, and FTP data sockets. This setting does not affect the original (given) URL.
protocols [add|del|set] [$protocols ...]
Sets the list of URL protocols allowed to be fetched.
$protocols is a list of zero or more of the supported
protocols http, ftp, gopher,
javascript, https or file. The default list
is http, ftp, gopher, javascript and
https. (Note that javascript must be also on if
JavaScript URLs are to work.) If the first value of the first
argument is add, the given list will be added to the
allowed list; if del, deleted from; if set, cleared
and set (the default). Returns the previous list of allowed
protocols. Added in version 4.01.1024300000 20020617.
file support added in version 4.02.1048785087 20030327.
Note that changing these protocols may affect what links are
returned by <urlinfo links>, if linkprotocols is set to
allowed.
proxy [type] $proxyUrl
Takes a URL as argument. The protocol, host
and port of $proxyUrl will be used as the proxy or tunnel
to fetch all future URL requests (except for javascript:
URLs, which are always evaluated internally as they are source
code, not resource locations). The $proxyUrl protocol must
be HTTP or (in version 4.02.1048785087 20030327 and later) HTTPS.
In version 4.04.1077500000 20040222 and later, an empty string
value will clear the proxy, i.e. turn it off and resume direct
connections.
In version 7.05 and later, a type may be given; one of:
pacurl
The URL given is a proxy auto-config URL, i.e. a JavaScript
proxy.pac file containing a FindProxyForURL(url, host) function
that will return the proxy(ies) to use for a given URL. The URL MIME type should be
application/x-ns-proxy-autoconfig. The PAC script will
be automatically fetched at the next <fetch> or
<submit> statement, as needed.
pacscript
The URL argument is instead the PAC script itself.
If proxy auto-config is enabled via either of these types, the
script's FindProxyForURL() function will be called for
every URL. This function returns a list of one or more proxies to
use for the given URL (or DIRECT to not use a proxy).
Thus, proxy auto-config allows URL-by-URL customization of
proxies, and/or organization-wide proxy configuration. The PAC
script is run in a restricted environment; see
findproxyforurl.com for details on the JavaScript functions
available to PAC scripts.
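By the usual PAC convention, FindProxyForURL() returns its list as a semicolon-separated string such as "PROXY proxy1.example.com:8080; DIRECT". The following Python sketch shows how such a return value can be split into entries; the function name parse_pac_result and the example hostnames are hypothetical, and this is not a description of how the fetch library itself parses the value.

```python
def parse_pac_result(result):
    """Split a FindProxyForURL() return string into (kind, target) pairs.

    kind   -- upper-cased keyword, e.g. "PROXY" or "DIRECT"
    target -- "host:port" for proxy entries, None for DIRECT
    """
    entries = []
    for part in result.split(";"):
        fields = part.split()
        if not fields:                    # skip empty segments, e.g. trailing ";"
            continue
        kind = fields[0].upper()
        target = fields[1] if len(fields) > 1 else None
        entries.append((kind, target))
    return entries
```

The entries come back in the order the script listed them, so a caller can try each proxy in turn and treat a DIRECT entry as "connect without a proxy".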
Failure to fetch the pacurl script will cause the current
<fetch> to fail with PacError, as well as all
subsequent <fetch>es until pacfetchretrydelay
expires. Proxies that are
deemed "bad" (e.g. unresponsive) will have a lower priority for
proxyretrydelay seconds.
An optional proxy mode argument (same as proxymode) may
also be given after the URL, if no type is specified. This
syntax is for back-compatibility and is deprecated in favor of the
proxymode setting; it may be removed in future releases.
proxymode $mode
Determines how to use a proxy (if set). $mode is one of:
auto
Automatically select proxy or tunnel mode depending on the
requested URL: tunnel for HTTPS, otherwise proxy. This is the
default if $mode is not specified.
proxy
Always use proxy mode, i.e. tell proxy to GET (or
whatever method was requested) the requested absolute URL.
tunnel
Always use tunnel mode, i.e. tell proxy to CONNECT to
the requested host and port, then proceed with request as if
directly connected to the requested server. If the request
URL's protocol cannot be tunneled (e.g. file: or FTP),
the request fails.
Added in version 7.05. In versions prior to 7.00.1363052000
20130311, the mode was always proxy, even for
HTTPS URLs. In version 7.00.1363052000
20130311 through 7.04, the default mode was auto.
Note that an HTTPS tunnel to an HTTPS origin server is not currently supported by the fetch library (an HTTPS proxy to an HTTPS server is supported, however). Many tunnels do not support an HTTPS tunnel to an HTTP origin server, and some proxies do not support HTTPS origin servers (since the proxy would then have to provide a certificate etc.).
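The mode selection described for proxymode can be summarized in a short Python sketch. The function name choose_proxy_mode is hypothetical, and the set of protocols treated as tunnelable (HTTP and HTTPS only) is an assumption; the text above only states that file: and FTP cannot be tunneled.

```python
from urllib.parse import urlsplit

# Assumption: only these protocols can be tunneled in this sketch.
TUNNELABLE = {"http", "https"}


def choose_proxy_mode(mode, url):
    """Return "proxy", "tunnel" or "internal" for a request URL."""
    scheme = urlsplit(url).scheme.lower()
    if scheme == "javascript":
        # javascript: URLs are evaluated internally, never sent to a proxy.
        return "internal"
    if mode == "auto":
        # Tunnel for HTTPS, otherwise plain proxy mode.
        return "tunnel" if scheme == "https" else "proxy"
    if mode == "proxy":
        return "proxy"
    if mode == "tunnel":
        if scheme not in TUNNELABLE:
            raise ValueError(f"cannot tunnel {scheme}: URLs")
        return "tunnel"
    raise ValueError(f"unknown proxy mode: {mode!r}")
```

With mode auto, an https:// URL is tunneled (CONNECT) and an http:// URL is proxied; with mode tunnel, a file: or FTP URL fails, mirroring the request failure described above.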