The following urlcp settings control miscellaneous page fetching
behaviors:
allowbadchunkedinfo
(boolean)
If on (the default), try to allow certain bad chunked
Transfer-Encoding
information if encountered in a response
(e.g. missing chunk size), by just passing through remaining data
as-is (because chunked coding is mostly clear text anyway). If
off, fail the transaction. Note that regardless of the setting's
value, bad chunked information may indicate corrupt data in the
response. May help recover a fetch when the server erroneously
reports chunked coding in the response. Added in version
6.00.1315620000 20110909; previous versions behaved as if the
setting were off.
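The pass-through behavior described above can be illustrated with a
minimal Python sketch. It is not urlcp's implementation; the function
name decode_chunked and its allow_bad parameter are hypothetical
stand-ins for the allowbadchunkedinfo setting, and trailers are ignored.

```python
def decode_chunked(body: bytes, allow_bad: bool = True) -> bytes:
    """Decode a chunked body; on a bad chunk-size line, pass the rest
    through as-is (allow_bad) or fail the whole decode."""
    out = bytearray()
    pos = 0
    while pos < len(body):
        eol = body.find(b"\r\n", pos)
        # Chunk-size line: hex size, optionally followed by ";extension".
        size_field = body[pos:eol].split(b";", 1)[0].strip() if eol != -1 else b""
        try:
            size = int(size_field, 16)
            if size < 0:
                raise ValueError("negative chunk size")
        except ValueError:
            if allow_bad:
                # Missing or unparsable size: treat remaining data as clear text.
                out += body[pos:]
                return bytes(out)
            raise ValueError("bad chunked Transfer-Encoding information")
        if size == 0:
            break  # last-chunk marker; trailers are ignored in this sketch
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)
```

As the text notes, data recovered this way may still be corrupt; the
sketch only shows why passing it through can salvage a fetch.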
alarmclose
(boolean)
Whether to use an alarm() to terminate a blocking connection
close(). The default is off. Added in version 4.04.1050700000
20030418.
badhdrmidok
or badheadermidok
(boolean)
Whether to accept malformed headers in the middle (i.e. not last)
of the response headers. If on, malformed headers will be
discarded if followed by at least one valid header. If off, or no
valid header follows, the malformed header will be considered the
start of the body (which it might very well be). Added (and
defaults to on) in version 5.01.1102538511 20041208. Returns
previous setting.
checkidleconneof
(boolean)
If on (the default), idle connections in the Keep-Alive cache are
checked for EOF (i.e. server closure) before being reused. The
server may have timed out the connection while the socket was
idle, which would otherwise cause the next fetch to fail. Added
in version 5.01.1115130972 20050503 (default off in previous
versions). Returns previous setting.
clearproxycache
(no arguments)
Clears the cache of "bad" (non-responsive) proxies, which are
set to lower priority in PAC responses when other proxies are
listed (see proxyretrydelay). Also clears the last PAC fetch
timestamp (see pacfetchretrydelay). Added in version 7.05.
closeidleconn
(no arguments)
Closes any currently idle connections in the Keep-Alive cache.
Returns 1 if successful, 0 on error. Added in version
5.00.1093895662 20040830.
defaults
(no arguments)
Resets all urlcp settings to their default values.
delaysave
(boolean)
Sets whether to delay the saving of output when using
<submit TOFILE=$file> until the connection starts returning
data. The default is off, i.e. the file is opened immediately
(always deleting any previous copy). Turning this setting on is
useful when repeatedly downloading to the same file, e.g. obtaining
a periodic update of a large data file. The original file is then
preserved if the new fetch fails immediately (e.g. remote server
down), without the disk space cost of a separate backup copy. Added
in version 4.0.997840000 20010814.
domvalue $dompath $value
Sets the value of the DOM item indicated by $dompath to $value.
Note that this does not affect the JavaScript DOM, but the
near-parallel page DOM. This can be used to set form input values,
etc. and then obtain the submit URL and content via
<urlinfo domvalue>. Added in version 5. Returns 0 on error.
emptyhttp09ok
(boolean)
Whether to accept empty HTTP/0.9 responses, i.e. a 0-byte response
with no headers. Such responses are technically legal (an empty
HTTP/0.9 document), but since few pre-HTTP/1.0 servers exist, are
more likely indicative of a server error. If off, an error
message is issued and an error is set. Added (and defaults to
off) in version 5.01.1097502096 20041011. Returns previous
setting.
linger
(boolean)
Sets whether to set an SO_LINGER time of 4 seconds on sockets.
Added (and defaults to off) in version 5.01.1105153893 20050107.
Returns previous setting.
reparent
(string)
If given a full path (e.g. "/local/tree"), sets reroot reparent
mode and uses that path as the local tree root. If given a full
URL (e.g. "http://somesite.com/dir/page.html"), sets abs reparent
mode and uses that URL as the page's URL (this is not recommended;
links may become incorrect).
reparentimg
(boolean)
If true (default), image links will be reparented; if false, they
will not. Only significant if reparentmode is not off.
reparentmode
(string)
The returned HTML from a page will be reparented: all the links
will be changed in the raw document returned. How the links are
modified depends on the mode:
abs
or 1
Make all links absolute. With this mode, a page can be
fetched from a remote site and the returned document placed
directly in a local source tree, and even relative links will
correctly point to the original locations. If a URL is set
with reparent, the page is reparented as if it were fetched
from there, instead of its actual location (the default).
(Setting a URL is not recommended, as the links may become
incorrect.)
reroot
or 2
Re-path same-site links as if the entire remote site were
being copied locally to a subtree rooted at the URL path given
with reparent. For example, with a reparent path of
/local/tree, if the URL http://somesite.com/dir/page.html is
fetched, it is assumed it will be saved to
/local/tree/dir/page.html. Thus a link such as /top/list.html
will become /local/tree/top/list.html. The link ../upone.html
would become /local/tree/upone.html. If no path is set with
reparent, links become relative, as if the root were /.
mirror
or 3
Make all links absolute, URL-encode them, and prefix the
reparent URL.
relatedfiles
or 4
For an email message, change all internal links (to other
parts of the message) to their safe filenames, as same-dir
relative links. All other (external) links will be made
absolute. This mode is used internally by the
mimeEntityGetBody() function when reparenting. Note: since it
requires additional parsed email message information, it cannot
currently be used explicitly by Vortex scripts.
hideexternal
or 5
Change all external links - those referring to outside the
page's current directory or below - to have a prefix of
"thismessage:", to prevent their access. This can be used to
hide/disable external references in HTML email message bodies
during web display, after the HTML has been message-reparented
by the mimeEntityGetBody() function.
off
or 0
Turn off all reparenting. Added in version 3.01.968705387 20000911.
Default.
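The abs and reroot modes above can be approximated with a short
Python sketch that reproduces the documented /local/tree example. It
illustrates the link arithmetic only and is not urlcp's code; the
function names are hypothetical, and leaving off-site links absolute
in reroot mode is an assumption, since the text only describes
same-site links.

```python
from urllib.parse import urljoin, urlsplit

def reparent_abs(page_url: str, link: str) -> str:
    """abs mode: resolve the link against the page's URL."""
    return urljoin(page_url, link)

def reparent_reroot(page_url: str, link: str, root: str = "") -> str:
    """reroot mode: re-path same-site links under a local tree root."""
    absolute = urljoin(page_url, link)
    parts = urlsplit(absolute)
    if parts.netloc != urlsplit(page_url).netloc:
        return absolute  # assumption: off-site links are left absolute
    local = root.rstrip("/") + parts.path
    if parts.query:
        local += "?" + parts.query
    if parts.fragment:
        local += "#" + parts.fragment
    return local

page = "http://somesite.com/dir/page.html"
print(reparent_abs(page, "../upone.html"))
print(reparent_reroot(page, "/top/list.html", "/local/tree"))
print(reparent_reroot(page, "../upone.html", "/local/tree"))
```

With an empty root the sketch returns site-rooted paths such as
/top/list.html, which only approximates the "as if the root were /"
behavior described for reroot without a reparent path.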
The reparent and reparentmode settings do not affect the links
returned by <urlinfo links>.
sendemptycontent
(boolean)
Whether to set Content-Length: 0 for empty requests. Some
servers will time out, expecting an EOF from the client, if an
empty request is sent with no Content-Length. Added (and
defaults to on) in version 5.01.1097006042 20041005.
Returns previous setting.
shutdownwr
(boolean)
Turns on or off the use of shutdown(SHUT_WR) on HTTP or
Gopher sockets when all data has been sent and the connection is
not to be re-used (e.g. Keep-Alive has expired or is not in use).
This sends an EOF to the server to indicate that the client has
finished sending data. Some broken servers may expect such an EOF
even if Content-Length is set properly in the request, and
may thus time out the request waiting for one. (Note that for
shutdown() to actually be used, it may be necessary to
disable Keep-Alive via <urlcp maxconnrequests 1>.)
Added (and defaults to on) in version 5.01.1105300267 20050109.
Returns previous setting.
urlcanonslash
(boolean)
Whether to canonicalize backslashes ("\") to forward
slashes ("/") in URLs. On by default. Turning off may
impair URL parsing.
urlcollapseslashes
(boolean)
Whether to collapse multiple forward slashes (e.g. "//")
to a single forward slash in the path part of URLs. Note: does
not affect the double-slash that immediately follows the protocol
plus colon in some URL protocols. Off by default. Added in
version 7.03.1434400000 20150615.
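The effect of the two slash settings can be sketched in Python as
follows. This is an approximation for illustration, not the library's
actual parser; canon_slashes and collapse_slashes are hypothetical
names standing in for urlcanonslash and urlcollapseslashes.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def canon_slashes(url: str) -> str:
    """urlcanonslash-style step: turn backslashes into forward slashes."""
    return url.replace("\\", "/")

def collapse_slashes(url: str) -> str:
    """urlcollapseslashes-style step: collapse runs of "/" in the path only.

    The "//" after the protocol and colon is part of the URL's authority
    syntax, not its path, so it is left alone, as are the query and fragment.
    """
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))

print(canon_slashes("http:\\\\somesite.com\\dir\\page.html"))
print(collapse_slashes("http://somesite.com//dir///page.html"))
```

Restricting the substitution to the parsed path component is what
keeps the protocol's double slash intact.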
DIAGNOSTICS
urlcp returns 1 on success, or 0 or nothing on error,
except as noted under specific options.
EXAMPLE
<urlcp "maxpgsize" "1MB">
<urlcp "timeout" 300>
<fetch "http://www.somesite.com/bigpage.html">
CAVEATS
The urlcp function was added Mar. 26 1997. Various settings
were added later.
SEE ALSO
fetch, submit, urlinfo, nslookup