Headers

The following urlcp settings control which headers are sent, which can affect the document the remote Web server returns. Some settings also control which headers are "received", i.e. returned in the response.

  • accept (2 arguments) Set the HTTP Accept header: the list of acceptable or desired MIME types. Each value of the first argument ($value1) is a MIME media range, e.g. "text/html" or "image/*". The corresponding value of the second argument ($value2), if given, is a "quality" value: a percentage from 0 to 100. If it is greater than 0, the q value of the corresponding media range is set to that percentage (e.g. 80 means q=0.8). If $value2 has fewer values than $value1, the last value of $value2, if any, is reused. See the HTTP specification for details on how Web servers use these values. The default Accept list (if not set) is "*/*", i.e. any type.

    Changing the Accept list may affect the content type of the document a Web server sends for a given URL, but there is no guarantee that the requested type(s) will be returned. It is up to the server to send the most appropriate form of the document based on the Accept list.
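    To make the $value1/$value2 pairing concrete, here is a minimal Python sketch (not Texis code) that builds an Accept header value from a list of media ranges and a list of percentages, reusing the last percentage when the second list is shorter. It assumes a percentage maps to q = percentage/100, that a percentage of 0 simply omits the q parameter (as described above), and that 100 is written out as q=1; the exact string Texis sends may differ. The function name build_accept is hypothetical.

```python
def build_accept(media_ranges, percents=()):
    """Hypothetical sketch: pair media ranges with optional percentage qualities."""
    parts = []
    for i, media_range in enumerate(media_ranges):
        if percents:
            # Fewer percentages than ranges: keep reusing the last one.
            pct = percents[min(i, len(percents) - 1)]
        else:
            pct = None
        if pct is not None and pct > 0:
            # HTTP q-values run from 0 to 1, so a percentage of 80 becomes q=0.8.
            parts.append(f"{media_range};q={pct / 100:g}")
        else:
            # No positive percentage: send the media range without a q parameter.
            parts.append(media_range)
    # No media ranges at all: fall back to the documented default.
    return ", ".join(parts) if parts else "*/*"


print(build_accept(["text/html", "text/plain", "*/*"], [100, 50, 10]))
print(build_accept(["text/html", "image/*"], [80]))
```

    A media range sent without a q parameter has an implied quality of 1 under the HTTP specification.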

  • clearheaders (no arguments) Undo all headers set with the header setting. Any header values that overrode builtin headers are restored to their builtin values.

  • fileresolveownership (boolean)

    Whether to resolve the owner and group SIDs (under Windows) and names of locally fetched file:// URLs. If enabled, these are returned in the response headers File-Owner-SID, File-Group-SID (under Windows), and File-Owner-Name, File-Group-Name (all platforms). (File-Uid and File-Gid are always returned under Unix, since no additional traffic is needed to determine them.)

    Off by default, since resolving this information uses extra network traffic and time, possibly blocking if the domain controller or NIS server cannot be reached. Added in version 8.01.1669072604 20221121.
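    The cost difference described above can be seen in a minimal Unix-only Python sketch: the numeric owner and group IDs come directly from stat(), while the names require lookups in the user and group databases, which may be served over the network (e.g. by NIS). The function file_ownership_headers is hypothetical; it only mimics the Unix header names listed above and does not cover the Windows SID headers.

```python
import grp
import os
import pwd
import tempfile


def file_ownership_headers(path, resolve_names=False):
    """Hypothetical sketch (Unix only): ownership headers for a local file."""
    st = os.stat(path)
    # Numeric IDs come straight from stat(): no extra lookups are needed.
    headers = {"File-Uid": str(st.st_uid), "File-Gid": str(st.st_gid)}
    if resolve_names:
        # Name lookups consult the system user/group databases, which may be
        # backed by a network service and can therefore be slow or block.
        try:
            headers["File-Owner-Name"] = pwd.getpwuid(st.st_uid).pw_name
        except KeyError:
            pass  # the UID has no entry in the user database
        try:
            headers["File-Group-Name"] = grp.getgrgid(st.st_gid).gr_name
        except KeyError:
            pass  # the GID has no entry in the group database
    return headers


with tempfile.NamedTemporaryFile() as tmp:
    print(file_ownership_headers(tmp.name, resolve_names=True))
```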

  • header (list, 2 arguments)

    Set the HTTP request headers named in the first argument to the corresponding values in the second argument. This can be used to set additional headers not otherwise settable. Note that cookies are handled automatically in version 4.01.1022000000 20020521 and later, so Cookie headers generally do not need to be set in those versions.

    In version 5.01.1245974000 20090625 and later, headers specified with this setting replace builtin headers of the same name (e.g. Host), instead of causing a second copy of the header to be sent. Note that setting or overriding builtin headers can cause erratic behavior, as user-specified values may interfere with library functionality. Builtin headers include Accept, Authorization, Connection, Content-Length, Content-Type, Cookie, Host, If-Modified-Since, Proxy-Authorization, Upgrade and User-Agent. All of these are set automatically by the library and/or have other <urlcp> settings that are the preferred method of controlling them.

    Setting a single empty value for a header will clear it (prevent it from being sent, even if there is normally a builtin value for the header). Setting no values (i.e. $null in version 8+) will undo any previous <urlcp header> set for the header, i.e. the builtin value (if any) will be sent.

    It is not possible to send the same header multiple times: later values set will merely replace earlier ones and the header will be sent at most once. To send multiple values for a single header, set a single value with multiple tokens according to the HTTP syntax for the given header (typically comma-separated).
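    The override rules for header, together with the effect of clearheaders, can be modeled with a small Python class. This is a hypothetical sketch of the documented behavior, not the Texis implementation: None stands in for $null, header names are matched case-insensitively (as HTTP header names are), and the builtin values in the usage example are placeholders.

```python
class RequestHeaders:
    """Hypothetical model of the documented urlcp header override rules."""

    def __init__(self, builtin):
        # Headers the library would send on its own, keyed case-insensitively.
        self.builtin = {name.lower(): (name, value) for name, value in builtin.items()}
        self.overrides = {}

    def set(self, name, value):
        key = name.lower()
        if value is None:
            # No value: forget any earlier override so the builtin (if any) is sent.
            self.overrides.pop(key, None)
        elif value == "":
            # A single empty value: suppress the header, even a builtin one.
            self.overrides[key] = (name, None)
        else:
            # Replaces any builtin or earlier value; the header is sent at most once.
            self.overrides[key] = (name, value)

    def clear_all(self):
        # Analogous to clearheaders: drop every override, restoring builtins.
        self.overrides.clear()

    def to_send(self):
        # Overrides win over builtins; suppressed headers (None) are omitted.
        merged = dict(self.builtin)
        merged.update(self.overrides)
        return [(name, value) for name, value in merged.values() if value is not None]


h = RequestHeaders({"Accept": "*/*", "User-Agent": "ExampleAgent/1.0"})
h.set("Accept-Language", "en")
h.set("Accept-Language", "en, fr;q=0.5")  # replaces the earlier value
h.set("user-agent", "")                   # suppresses the builtin User-Agent
print(h.to_send())
```

    Note how multiple values for Accept-Language are carried as one comma-separated value rather than as a repeated header.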

  • ifmodsince (string) Sets the HTTP If-Modified-Since header to the given value. The argument is a time, either in Texis-parseable format or HTTP date format (www, dd mmm yyyy hh:mm:ss GMT). If the argument is empty, the header is cancelled.

    Setting the If-Modified-Since header creates a conditional request: the document is returned only if it has changed since the given time; otherwise an empty document is returned. Setting this header on a per-page basis to the Last-Modified value from the previous fetch can reduce traffic when re-walking a site, since only changed documents are returned. Note that it is up to the remote server to honor the If-Modified-Since header, and the given time is interpreted by the server, relative to its own clock.
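    When re-walking a site, the Last-Modified value from one fetch becomes the If-Modified-Since value for the next. The following Python sketch (standard library only; the function names are hypothetical) normalizes such a value into the HTTP date format shown above, converting other time zone offsets to GMT. At the HTTP level, a server that honors the header answers for an unchanged document with status 304 Not Modified and no body.

```python
import email.utils
from datetime import datetime, timezone


def http_date(dt):
    """Format an aware datetime as an HTTP date: www, dd mmm yyyy hh:mm:ss GMT."""
    # format_datetime(usegmt=True) requires a datetime whose tzinfo is UTC.
    return email.utils.format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def conditional_headers(last_modified):
    """Build an If-Modified-Since header from a previous Last-Modified value."""
    if not last_modified:
        # No earlier value: send no conditional header at all.
        return {}
    dt = email.utils.parsedate_to_datetime(last_modified)
    if dt.tzinfo is None:
        # A "-0000" offset parses as a naive datetime; treat it as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return {"If-Modified-Since": http_date(dt)}


print(conditional_headers("Sun, 06 Nov 1994 10:49:37 +0200"))
```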

  • useragent (string) Sets the User-Agent header sent with HTTP requests. The default is Mozilla/5.0 (compatible; T-H-U-N-D-E-R-S-T-O-N-E).
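    For comparison, this is how a User-Agent header is attached to a request with Python's standard urllib module; it is an analogy for what the useragent setting controls, not Texis code. The crawler string in the example is made up, and constructing a Request does not contact any server.

```python
import urllib.request

# Default User-Agent string quoted from the documentation above.
DEFAULT_UA = "Mozilla/5.0 (compatible; T-H-U-N-D-E-R-S-T-O-N-E)"


def make_request(url, user_agent=DEFAULT_UA):
    # Only builds the request object; nothing is sent until urlopen() is called.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = make_request("http://www.example.com/", "ExampleCrawler/1.0 (+http://www.example.com/bot)")
# urllib stores header names via str.capitalize(), so the lookup key is "User-agent".
print(req.get_header("User-agent"))
```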

Copyright © Thunderstone Software     Last updated: Apr 15 2024
Copyright © 2024 Thunderstone Software LLC. All rights reserved.