The following urlcp settings control how or whether pages and
related URLs (such as frames and iframes) are fetched:
encodings [add|del|set] [$encodings ...]
Sets the list of allowed content/transfer encodings for pages
fetched. The $encodings argument(s) are zero or more of the values
7bit, 8bit, binary, identity, chunked, gzip, deflate or compress.
The chunked encoding only applies to transfer encodings; the
remainder apply to both content and transfer encodings. If the
first value of the first argument is add, the given encoding(s)
will be added to the allowed list; if del, deleted from it; if set,
the list is cleared and set to $encodings (this is the default
action if no add/del/set action is given). The keyword all may be
used to refer to all encodings, and default may be used (with set)
to re-set the default (which is identity, chunked, gzip, deflate
and compress).
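The add/del/set convention above recurs in several settings (encodings, filetypes, ipprotocols, linkprotocols, methods, protocols). As a rough illustration only, the following Python sketch models it for encodings. It is not Vortex code: the function name apply_list_action is hypothetical, and accepting the all and default keywords with any action (rather than only as documented) is an assumption of the sketch.

```python
ALL_ENCODINGS = ("7bit", "8bit", "binary", "identity",
                 "chunked", "gzip", "deflate", "compress")
DEFAULT_ENCODINGS = ("identity", "chunked", "gzip", "deflate", "compress")


def apply_list_action(current, args,
                      all_values=ALL_ENCODINGS,
                      default_values=DEFAULT_ENCODINGS):
    """Model of `[add|del|set] [values ...]`.

    Returns (new_list, previous_list); the settings above are
    documented as returning the previous list.
    """
    previous = list(current)
    args = list(args)
    action = "set"                      # default action if none is given
    if args and args[0] in ("add", "del", "set"):
        action = args.pop(0)

    # Expand the `all` and `default` keywords, dropping duplicates.
    values = []
    for arg in args:
        if arg == "all":
            expanded = all_values
        elif arg == "default":
            expanded = default_values
        elif arg in all_values:
            expanded = (arg,)
        else:
            raise ValueError("unknown value: " + arg)
        for value in expanded:
            if value not in values:
                values.append(value)

    if action == "add":
        new = previous + [v for v in values if v not in previous]
    elif action == "del":
        new = [v for v in previous if v not in values]
    else:                               # set: clear, then set to values
        new = values
    return new, previous
```

With this model, passing ["del", "compress"] against the default list removes only compress, and passing ["set", "default"] restores the default list regardless of the current contents.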
The Vortex fetch library will declare the list of encodings it
allows in Accept-Encoding and TE request headers, if httpversion is
set to 1.1 (these are 1.1 headers, and some servers do not handle
them as expected in a 1.0 request; httpversion is 1.1 by default in
version 6 and later). It is up to the remote server to then choose
encoding(s) from the declared list(s). The content encoding(s) (if
any) of the returned document should be declared by the server in
the Content-Encoding header, and transfer encoding(s) in the
Transfer-Encoding header. Both types of encodings will be decoded
before the document is returned from <fetch> or <urlinfo rawdoc>.
If an encoding that is not allowed is encountered, a "Disallowed
Content- or Transfer-Encoding" error is generated.
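The response-side check described above can be pictured with a small Python sketch. It is an illustrative model under stated assumptions, not the fetch library's code: the function name check_response_encodings is hypothetical, headers are assumed to arrive as a plain dict keyed by the exact header names, and the error is modeled as a ValueError carrying the documented message.

```python
def check_response_encodings(headers, allowed):
    """Return (content_encodings, transfer_encodings) declared by the
    server, or raise ValueError if any of them is not allowed."""

    def tokens(name):
        # Both headers hold comma-separated lists of encoding names.
        value = headers.get(name, "")
        return [t.strip().lower() for t in value.split(",") if t.strip()]

    content = tokens("Content-Encoding")
    transfer = tokens("Transfer-Encoding")
    for encoding in content + transfer:
        if encoding not in allowed:
            raise ValueError(
                "Disallowed Content- or Transfer-Encoding: " + encoding)
    return content, transfer
```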
Added in version 5.01.1249073000 20090731. Returns previous list
of allowed encodings. See also the maxpgsize and maxdownloadsize
settings for how they interact with encodings.
7bit, 8bit and binary were added in version 7.03.1430243000
20150428. These are MIME Content-Transfer-Encoding values; however,
some web servers (Apache) are known to use them as HTTP
Content-Encoding values.
fileexclude (list)
List of file trees to exclude (disallow) when fetching a local
file:// URL. The default is none (no restrictions) for Windows, and
"/dev/", "/proc/" and "/debug/" for Unix. After fileroot is applied,
if the resulting local file path from a file:// URL has one of these
paths as a prefix, the URL will not be fetched. This can be used to
protect certain unsafe or private directories on a local filesystem
from being inadvertently walked. Does not apply to FTP-mapped
non-localhost file:// URLs. Added in version 4.02.1048785087
20030327. Aka fileexcludes. Returns previous setting.
fileinclude (list)
List of file trees to include (require) when fetching a local
file:// URL. The default is none (no restrictions). After fileroot
is applied, if the resulting local file path from a file:// URL does
not have one of these paths as a prefix, the URL will not be
fetched. This can be used to keep a local filesystem walk within
certain directories. Does not apply to FTP-mapped non-localhost
file:// URLs. Added in version 4.02.1048785087 20030327. Aka
fileincludes. Returns previous setting.
filenonlocal (string)
How to handle non-localhost file:// URLs, i.e. ones with a specific
host other than empty string or "localhost". The value can be one
of:
off
Default: do not allow non-localhost file:// URLs. This ensures that
no FTP or UNC paths are used.
unc
Map non-localhost file:// URLs to their UNC paths and attempt to
open as a local file. E.g. the URL "file://myhost/mydir/myfile"
would map to the file "\\myhost\mydir\myfile" under Windows and
"//myhost/mydir/myfile" under Unix (but see modifications under
fileroot below). This allows the behavior of web browsers that
support UNC paths to be emulated on operating systems that support
UNC, for consistency with browser views.
ftp
Map non-localhost file:// URLs to FTP. E.g. the URL
"file://myhost/mydir/myfile" would map to the URL
"ftp://myhost/mydir/myfile" and be fetched as such. This allows the
behavior of some browsers/operating systems that do not support UNC
paths to be emulated.
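The three filenonlocal values listed above can be summarized by a short Python model. This is only a sketch of the documented mappings, with fileroot left unset: the function name map_nonlocal_file_url and its windows flag are hypothetical, returning None for off is an assumption, and the URL path is passed through without percent-decoding.

```python
from urllib.parse import urlsplit


def map_nonlocal_file_url(url, filenonlocal="off", windows=False):
    """Map a non-localhost file:// URL per the filenonlocal value.

    Returns a UNC path, an ftp:// URL, or None when such URLs are
    not allowed.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file:// URL: " + url)
    host = parts.hostname or ""
    if host in ("", "localhost"):
        raise ValueError("localhost URL; filenonlocal does not apply")

    if filenonlocal == "off":
        return None                     # non-localhost URLs disallowed
    if filenonlocal == "unc":
        unc = "//" + host + parts.path
        # Windows UNC paths use backslashes throughout.
        return unc.replace("/", "\\") if windows else unc
    if filenonlocal == "ftp":
        return "ftp://" + host + parts.path
    raise ValueError("unknown filenonlocal value: " + filenonlocal)
```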
filenonlocal only applies when a proxy is not set; when a proxy is
active, all file:// URLs are passed to the proxy.
fileroot (string)
Sets the root directory to prepend to local file:// URL paths;
default none. E.g. with fileroot set to "/docs", the URL
"file://localhost/dir/file.txt" would be read from the file
"/docs/localhost/dir/file.txt". Also applies to non-localhost URLs
when filenonlocal is set to unc, e.g. the URL
"file://myhost/mydir/myfile" is read from the file
"/docs/myhost/mydir/myfile". This allows both localhost and
non-localhost file:// URLs to be mapped to a single directory
hierarchy, perhaps where network filesystems corresponding to
individual host(s) are mounted. Added in version 4.02.1048785087
20030327. Returns previous setting.
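Once fileroot has been applied, the fileexclude and fileinclude checks described earlier operate on the resulting local path. The Python sketch below models them as plain string-prefix tests. The function name file_path_allowed is hypothetical, and the order of the two checks and the use of simple string prefixes (no path normalization) are assumptions rather than documented behavior.

```python
# Documented Unix default for fileexclude.
UNIX_DEFAULT_FILEEXCLUDE = ("/dev/", "/proc/", "/debug/")


def file_path_allowed(path, fileinclude=(), fileexclude=()):
    """Return True if a local file path (after fileroot is applied)
    passes both the fileexclude and fileinclude lists."""
    # Excluded trees: any matching prefix blocks the fetch.
    if any(path.startswith(prefix) for prefix in fileexclude):
        return False
    # Included trees: if the list is non-empty, some prefix must match.
    if fileinclude and not any(path.startswith(p) for p in fileinclude):
        return False
    return True
```

For instance, with the Unix default exclude list this model rejects "/dev/zero", while an include list of "/docs/" rejects anything outside that tree.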
filetypes [add|del|set] [file|dir|device|symlink|other ...]
Sets the list of allowed file types for local file:// URLs. The
possible values are file for ordinary files, dir for directories,
device for devices, symlink for symbolic links (if supported by
operating system), and other for other types (sockets etc.). If the
first value of the first argument is add, the given list will be
added to the allowed list; if del, deleted from; if set, cleared and
set (the default). The default list is file, dir and symlink. If the
file derived from a local file:// URL is not one of these types, it
is disallowed. This prevents links to URLs like
"file://localhost/dev/zero" from hampering a walk. Added in version
4.02.1048785087 20030327. Returns previous setting.
followpermanentredirects (boolean)
Whether to follow (fetch) permanent (301) redirects and their
equivalents (e.g. file:// directory trailing-slash redirects). The
default is on, which follows them. Turning this off results in a
fetch error when such redirects are encountered: the
<urlinfo errtoken> value NotFollowingPermanentRedirect. See
<urlinfo permanenturl>, canonicalurl, actualurl, redirs for how they
are affected when followpermanentredirects is off.
Stopping at permanent redirects allows a script to take other action
when they are encountered (such as updating a stored URL) before
fetching the redirect target. Added in version 8.01.1689976778
20230721; previous versions behaved as if this setting were always
on.
ftpactivepassivefallback (boolean)
If on (the default), FTP passive mode fetches will fall back to
active mode on failure, and vice-versa. This may help resolve a
fetch to an FTP server that does not support the current mode (or
is firewalled), i.e. in cases where ftppassive is not set properly
for the given situation. Only failures of the PORT or PASV command,
or a temporary (4nn) error response to the main (RETR/STOR/etc.)
command will trigger the mode switch. Added in version
6.00.1304040000 20110428. Returns previous setting.
Note that if the correct mode (active or passive) is already known
in advance, it is preferable to set it from the outset via the
ftppassive setting, to avoid potential delays and/or errors from
relying on this fallback switchover.
ftppassive (boolean)
If on (the default), FTP passive mode is used first for FTP protocol
fetches. If off, FTP active mode is used first. Passive mode can be
useful in situations where a firewall on the client (Vortex) side of
the network prevents an FTP transfer (e.g. timeout). This is due to
the nature of active-mode data transfers, where the remote (server)
side is required to initiate a separate socket connection back to
the client (even though the client initiates the original control
connection). Many firewalls will block such incoming connections,
causing the transfer to time out. Passive mode allows the client to
initiate both the control and data connections, which is often
permitted by the client's firewall. Added in version 5.01.1121350905
20050714. Note: Prior to version 6.00.1304040000 20110428 this
setting was off by default. Returns previous setting.
Note that if ftpactivepassivefallback is on (the default), the
alternate mode may be used if the first mode (set by this setting)
fails.
ftprelativepaths (boolean)
If on (the default), FTP paths are assumed to be login-dir-relative,
so the URL "ftp://host/dir/file.txt" would be fetched with
"RETR home/dir/file.txt" instead of "RETR /dir/file.txt" (where home
is the FTP user's login directory). For most (i.e. anonymous) FTP
URLs this makes no difference, as the FTP login dir is typically at
the root of the FTP-accessible tree. However, for many FTP URLs that
require a true login, the FTP login dir is not the root dir, but the
user's home directory. Thus, with ftprelativepaths on, the above URL
would fetch "dir/file.txt" from the user's home directory - not
"/dir/file.txt" from the root dir, where it may not exist. With
ftprelativepaths off, the user's home directory - which may be
unknown or vary from user to user - would have to be specified in
the FTP URL in order to get back to the FTP login dir.
Dirs outside the FTP login dir may still be accessed when
ftprelativepaths is on, however, by encoding an extra slash in the
URL, e.g. "ftp://host/%2Fdir/file.txt". Added in version 6. In
previous versions the setting was effectively off.
ftpsendrelativepathsasabsolute (boolean)
If on (the default), relative FTP paths (i.e. due to
ftprelativepaths) are changed to absolute paths when sent to the
server, by prefixing the login directory (obtained with a PWD after
login). This avoids the occasional need for a no-argument "CWD"
command to go back to the login directory (which some servers do not
support), while still supporting the functionality of
ftprelativepaths (no home dir needed in URLs). If off, the login
directory is not prefixed; i.e. the URL "ftp://host/dir/file.txt" is
fetched with "RETR dir/file.txt". This setting has no effect if
ftprelativepaths is off. Added in version 6.00.1301360000 20110328.
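How ftprelativepaths and ftpsendrelativepathsasabsolute combine can be sketched in Python as follows. This is a model of the documented examples only: the function name ftp_command_path and the example login directory "/home/user" are hypothetical, and the way the path is decoded is an assumption.

```python
from urllib.parse import unquote, urlsplit


def ftp_command_path(url, login_dir,
                     relativepaths=True, sendrelativeasabsolute=True):
    """Return the path argument sent with RETR (or similar) for an
    ftp:// URL, given the login directory reported by PWD."""
    raw = urlsplit(url).path            # still percent-encoded
    if not relativepaths:
        # Paths are taken as absolute from the server root.
        return unquote(raw)
    # Drop the slash that separates host and path, then decode.
    relative = unquote(raw[1:])
    if relative.startswith("/"):
        # An encoded extra slash (%2F) addresses the root explicitly.
        return relative
    if sendrelativeasabsolute:
        # Prefix the login directory so no bare CWD is needed.
        return login_dir.rstrip("/") + "/" + relative
    return relative
```

Under this model, "ftp://host/dir/file.txt" with a login directory of "/home/user" yields "/home/user/dir/file.txt" by default, "dir/file.txt" with ftpsendrelativepathsasabsolute off, and "/dir/file.txt" with ftprelativepaths off.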
getframes (boolean)
If on, the <frame> objects of documents are fetched. The raw HTML
returned will remain the same (the original document), but the
formatted text from <urlinfo text> will be replaced and instead
contain each frame in sequence. The links returned by
<urlinfo links> or allrefs will be the list of all the frames' links
(e.g. the original URL - frame parent - will not have its links nor
frames etc. included). The default is false, i.e. frames are not
fetched. Note that only one level of frames is fetched, i.e. a
<frame> link inside a <frame> link will not be fetched.
getiframes (boolean)
If on, inline <iframe> documents are fetched. The raw HTML returned
will remain the same (the original document). The formatted text
from <urlinfo text> will also remain, except that <iframe> blocks
will be replaced with their referenced document text in-line. Note
that like frames, only one level of <iframe>s is fetched. The
<iframe> links are removed from the iframes, links, and allrefs
lists returned by the urlinfo function. The default is false. Added
in version 3.01.963000000 20000707.
getscripts (boolean)
If on, and javascript is on, <SCRIPT SRC=...></SCRIPT> URLs on a
page will also be fetched and run if they refer to JavaScript (and
their URLs removed from the urlinfo links and allrefs lists). If off
(default), such URLs are not fetched, and only inline
<SCRIPT>...</SCRIPT> scripts are run (if javascript is on). Returns
previous value. Added in version 4.01.1023800000 20020611.
httpversion $version
Sets the HTTP version to use for requests. The $version argument is
one of 0.9, 1.0 or 1.1. HTTP/1.0 is the default for Texis/Webinator
version 5 and earlier; HTTP/1.1 is the default for Texis/Webinator
version 6 and later (and is only conditionally supported). It may be
necessary to set 1.1 to fully utilize some features, e.g.
content/transfer encodings (see the encodings setting). Added in
version 5.01.1249039000 20090731. Returns previous version.
ignoreanchorframes (boolean)
Whether to ignore frames and IFRAMEs that are just anchors, e.g.
src="#". These usually just contain JavaScript, and fetching them
just doubles up the content, links etc. of the parent URL. On by
default. Added in version 6.
inputfileroot (string)
If set, all set/non-empty <input type="file"> values must be within
this local directory tree (and not contain "../" components to get
out of it), when <urlcp domvalue "...submitContent"> or variants are
called. Value(s) that are outside this setting will cause an error
such as "Will not add form input `...' file `...' to submit content:
Not in inputfileroot directory or contains `../'", and will be
treated as empty (i.e. sent as empty value with no file). This is
for security, to ensure all to-be-uploaded files are from a known
directory. Added in version 6.00.1335222312 20120423. Default is
unset (i.e. no check is performed). Returns 1 on success, 0 on
error.
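The kind of containment check that inputfileroot describes might look like the following Python sketch. It is an approximation under assumptions, not the actual implementation: the function name within_inputfileroot is hypothetical, Unix-style paths are assumed, and the test is a plain prefix comparison plus a scan for ".." components.

```python
def within_inputfileroot(path, inputfileroot):
    """Return True if path lies inside the inputfileroot tree and has
    no ".." component that could lead out of it."""
    if ".." in path.split("/"):
        return False
    # Compare against the root with exactly one trailing slash, so
    # that "/uploads2/x" does not match a root of "/uploads".
    prefix = inputfileroot.rstrip("/") + "/"
    return path.startswith(prefix)
```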
ipprotocols [add|del|set] [$protocol ...]
Sets the list of IP protocols to allow for page fetches. One or
more of IPv4 and/or IPv6. If the first value of the first argument
is add, the given list will be added to the allowed list; if del,
deleted from; if set, cleared and set (the default). The default
list, settable with set default, is currently IPv4 IPv6. Returns
nonzero on success, zero on error. Added in version 8.
Note that for DNS, the IP protocol used over-the-wire for lookup is
not affected by this setting - as opposed to the DNS query type (A
vs. AAAA), which is. The DNS lookup over-the-wire protocol is
determined solely by the IP family of the nameservers IP
address(es). Thus it is possible to use an IPv4 nameserver to look
up and connect with IPv6 hosts, or vice-versa.
Note that allowing both IPv4 and IPv6 may result in undesired
behavior on occasion, i.e. if network/DNS/etc. configuration is
inconsistent. For example, an IPv6 address may be found for a
hostname, but fail to connect if the server only responds to its
IPv4 address.
Note that not all protocols are available (supported) on all
systems; see <urlinfo ipprotocolsavailable>. See also
<urlinfo ipprotocols> for a list of allowed protocols, i.e. this
setting's current value.
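To make the effect of ipprotocols concrete, the Python sketch below models two consequences described above: which DNS query types are issued, and which resolved addresses remain usable for a connection. The function names dns_query_types and filter_addresses are hypothetical, and the example addresses are documentation-range placeholders.

```python
import ipaddress

# Map the documented protocol names to IP versions and query types.
IP_VERSION = {"IPv4": 4, "IPv6": 6}
QUERY_TYPE = {"IPv4": "A", "IPv6": "AAAA"}


def dns_query_types(ipprotocols):
    """DNS record types looked up for the allowed protocols."""
    return [QUERY_TYPE[p] for p in ipprotocols]


def filter_addresses(addresses, ipprotocols):
    """Keep only addresses whose IP version is allowed."""
    allowed = {IP_VERSION[p] for p in ipprotocols}
    return [a for a in addresses
            if ipaddress.ip_address(a).version in allowed]
```

Restricting the list to IPv4 in this model drops any IPv6 address found for a hostname, which is one way to avoid the inconsistent-configuration case noted above.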
ipv6scopeidinhostheader (boolean)
Whether to print the scope id (e.g. %eth0 part) of an IPv6
link-local address in the HTTP Host header, if such an address is
used in the URL. Some web servers (e.g. Apache) do not accept scope
ids (i.e. due to a strict interpretation of RFC 4007 11.2), and will
fail web requests containing them. Turning off
ipv6scopeidinhostheader (the default) causes scope ids not to be
printed in Host headers, to conform with such servers. Added in
version 8. Returns previous value of setting.
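As an illustration of what this setting changes, the following Python sketch builds a Host header value from a host and optional port. It is a simplified model: the function name host_header_value is hypothetical, the host is assumed to be given already decoded (e.g. "fe80::1%eth0"), and the exact way the library formats a retained scope id is not specified here.

```python
def host_header_value(host, port=None, scopeid_in_host_header=False):
    """Build a Host header value, dropping any IPv6 scope id unless
    scopeid_in_host_header is on."""
    if ":" in host:                     # treat as an IPv6 literal
        if not scopeid_in_host_header:
            host = host.split("%", 1)[0]
        host = "[" + host + "]"
    if port is not None:
        return "%s:%d" % (host, port)
    return host
```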
linkprotocols [add|del|set] [$protocols|allowed ...]
Sets the list of protocols allowed to be returned in links from a
page (i.e. the links value of the urlinfo function). Note that this
setting does not control what can be fetched, only the list of links
returned from a page. It can be used as a filter to remove
invalid-protocol links returned by a page. The $protocols
argument(s) are a list of zero or more values, each of which is
either a recognized protocol (see protocols below), the value
unknown for unknown protocols, or the value allowed for just
protocols permitted by the protocols setting. The default is all
protocols plus unknown.
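The filtering role of linkprotocols, including the special values unknown and allowed, can be modeled in Python roughly as follows. The function name filter_links and the sample links are hypothetical, and classifying a scheme such as mailto as "unknown" is an assumption of this sketch.

```python
from urllib.parse import urlsplit

# Protocols named as recognized by the protocols setting.
RECOGNIZED = {"http", "ftp", "gopher", "javascript", "https", "file"}


def filter_links(links, linkprotocols, protocols):
    """Return the links whose protocol passes linkprotocols.

    `protocols` is the current value of the protocols setting, used
    to expand the special value "allowed".
    """
    permitted = set()
    for value in linkprotocols:
        if value == "allowed":
            permitted |= set(protocols)
        else:
            permitted.add(value)

    kept = []
    for link in links:
        scheme = urlsplit(link).scheme.lower()
        key = scheme if scheme in RECOGNIZED else "unknown"
        if key in permitted:
            kept.append(link)
    return kept
```

Only the returned link list is filtered in this model; as noted above, what may actually be fetched is governed by the protocols setting.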
If the first value of the first argument is add, the given list
will be added to the allowed list; if del, deleted from; if set,
cleared and set (the default). Returns previous setting. Added in
version 4.01.1029180431 20020812.
methods [add|del|set] [$methods ...]
Sets the list of request methods allowed for page fetching (default
all). The $methods argument(s) are zero or more of the values
OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, MKDIR, RENAME,
SCHEDULE, COMPILE or RUN. Not all methods are supported by all
protocols; e.g. MKDIR is only supported by FTP. If the first value
of the first argument is add, the given method list will be added to
the allowed list; if del, deleted from; if set, cleared and set (the
default). Alternately, the default methods may be restored with set
default. Returns previous setting. Added in version 5.01.1232696000
20090123.
netmode (string)
Sets the routines to use for page fetching. The default is int,
which uses Texis' internal routines. For Windows versions, netmode
may be set to sys, which uses the system routines. This may allow
certain authenticated sites to be accessed, if the internal
routines' NTLM authentication is not sufficient for example.
However, parallelization and many other settings and features are
unavailable in sys mode. Added in version 4.04.1068000000 20031104.
offsiteok (boolean)
If on (default), URLs that are off-site from the original URL will
be fetched if needed. If off, such URLs will not be fetched. This
includes redirects, components such as frames, iframes and scripts,
and FTP data sockets. This setting does not affect the original
(given) URL.
protocols [add|del|set] [$protocols ...]
Sets the list of URL protocols allowed to be fetched. $protocols is
a list of zero or more of the supported protocols http, ftp, gopher,
javascript, https or file. The default list is http, ftp, gopher,
javascript and https. (Note that the javascript setting must also be
on if JavaScript URLs are to work.) If the first value of the first
argument is add, the given list will be added to the allowed list;
if del, deleted from; if set, cleared and set (the default). Returns
the previous list of allowed protocols. Added in version
4.01.1024300000 20020617. file support added in version
4.02.1048785087 20030327.
Note that changing these protocols may affect what links are
returned by <urlinfo links>, if linkprotocols is allowed.
proxy [type] $proxyUrl
Takes a URL as argument. The protocol, host and port of $proxyUrl
will be used as the proxy or tunnel to fetch all future URL requests
(except for javascript: URLs, which are always evaluated internally
as they are source code, not resource locations). The $proxyUrl
protocol must be HTTP or (in version 4.02.1048785087 20030327 and
later) HTTPS. In version 4.04.1077500000 20040222 and later, an
empty string value will clear the proxy, i.e. turn it off and resume
direct connections.
In version 7.05 and later, a type may be given; one of:
pacurl
The URL given is a proxy auto-config URL, i.e. a JavaScript
proxy.pac file containing a FindProxyForURL(url, host) function that
returns the prox(ies) to use for a given URL. The URL MIME type
should be application/x-ns-proxy-autoconfig. The PAC script will be
automatically fetched at the next <fetch> or <submit> statement, as
needed.
pacscript
The URL argument is instead the PAC script itself.
If proxy auto-config is enabled via either of these types, the
script's FindProxyForURL() function will be called for every URL.
This function returns a list of one or more proxies to use for the
given URL (or DIRECT to not use a proxy). Thus, proxy auto-config
allows URL-by-URL customization of proxies, and/or organization-wide
proxy configuration. The PAC script is run in a restricted
environment; see findproxyforurl.com for details on the JavaScript
functions available to PAC scripts.
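A FindProxyForURL() function conventionally returns its proxy list as a single semicolon-separated string such as "PROXY proxy1.example.com:8080; DIRECT". The Python sketch below shows one way such a string could be split into ordered entries. It is illustrative only: the function name parse_pac_result and the example host are hypothetical, and which entry keywords Vortex itself accepts is not stated here.

```python
def parse_pac_result(result):
    """Split a FindProxyForURL() return string into an ordered list
    of (kind, host_port) tuples; host_port is None for DIRECT."""
    entries = []
    for item in result.split(";"):
        fields = item.split()
        if not fields:
            continue                    # skip empty segments
        kind = fields[0].upper()
        if kind == "DIRECT" and len(fields) == 1:
            entries.append(("DIRECT", None))
        elif kind != "DIRECT" and len(fields) == 2:
            entries.append((kind, fields[1]))
        else:
            raise ValueError("malformed PAC entry: " + item.strip())
    return entries
```

The entries are kept in the order returned, since a PAC result lists proxies in order of preference.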
Failure to fetch the pacurl script will cause the current <fetch> to
fail with PacError, as well as all subsequent <fetch>es until
pacfetchretrydelay expires. Proxies that are deemed "bad" (e.g.
unresponsive) will have a lower priority for proxyretrydelay
seconds.
An optional proxy mode argument (same as proxymode) may also be
given after the URL, if no type is specified. This syntax is for
back-compatibility and is deprecated in favor of the proxymode
setting; it may be removed in future releases.
proxymode $mode
Determines how to use a proxy (if set). $mode is one of:
auto
Automatically select proxy or tunnel mode depending on the requested
URL: tunnel for HTTPS, otherwise proxy. This is the default if $mode
is not specified.
proxy
Always use proxy mode, i.e. tell proxy to GET (or whatever method
was requested) the requested absolute URL.
tunnel
Always use tunnel mode, i.e. tell proxy to CONNECT to the requested
host and port, then proceed with request as if directly connected to
the requested server. If the request URL's protocol cannot be
tunneled (e.g. file: or FTP), the request fails.
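The mode selection just described can be condensed into a small Python sketch. It models only the decision, not the proxy conversation itself. The function name select_proxy_mode is hypothetical, and treating HTTP and HTTPS as the only tunnelable protocols is an assumption drawn from the file: and FTP examples above.

```python
from urllib.parse import urlsplit


def select_proxy_mode(mode, url):
    """Return "proxy" or "tunnel" for a request, given the proxymode
    value and the requested URL."""
    scheme = urlsplit(url).scheme.lower()
    if mode == "auto":
        # Tunnel (CONNECT) for HTTPS, plain proxying otherwise.
        return "tunnel" if scheme == "https" else "proxy"
    if mode == "proxy":
        return "proxy"
    if mode == "tunnel":
        if scheme not in ("http", "https"):
            raise ValueError("protocol cannot be tunneled: " + scheme)
        return "tunnel"
    raise ValueError("unknown proxymode: " + mode)
```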
Added in version 7.05. In versions prior to 7.00.1363052000
20130311, the mode was always proxy, even for javascript: URLs. In
version 7.00.1363052000 20130311 through 7.04, the default mode was
auto.
Note that an HTTPS tunnel to an HTTPS origin server is not currently supported by the fetch library (an HTTPS proxy to an HTTPS server is supported, however). Many tunnels do not support an HTTPS tunnel to an HTTP origin server, and some proxies do not support HTTPS origin servers (since the proxy would then have to provide a certificate etc.).