hash - produce a hash or checksum for data


<hash [options ...] [DATA=]$buf[ /]>
<hash [options ...]> ... </hash>

The hash function produces a hash or checksum value of the data $buf (or block output), using a specified algorithm. The value returned, a small type (usually counter), is nearly unique to the given data: it is very unlikely that a different input value will have the same hash value. The hash value can be used as a fast way of comparing large data fields: if the hash values are the same, the data fields are probably identical; if the hash values differ, the data fields definitely differ. This can replace a large number of very long string compares with much faster counter (or other compact type) compares. Hash values are also used to verify the integrity of transmitted data: if the hash computed by the receiver does not match the value transmitted, the data (or the transmitted hash) was corrupted.
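
The integrity check described above can be sketched outside Vortex. This is a minimal illustration in Python, assuming SHA-256 from the standard hashlib module as a stand-in digest; it is not the Vortex gw hash, and the helper names digest and verify are hypothetical.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return a hex digest of data (SHA-256 is a stand-in, not the gw hash)."""
    return hashlib.sha256(data).hexdigest()

def verify(received: bytes, transmitted_digest: str) -> bool:
    """True if the digest recomputed by the receiver matches the one sent."""
    return digest(received) == transmitted_digest

message = b"large payload " * 1000
sent_digest = digest(message)          # computed by the sender

intact = verify(message, sent_digest)                     # unchanged data
corrupted = verify(message[:-1] + b"?", sent_digest)      # last byte altered
```

Only the short digest has to be compared, no matter how large the message is.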

With no options, hash computes a gw (i.e. Webinator) style hash of $buf, returned as a counter value. In Vortex version 4.04.1066340000 20031016 and later, new hash types and options are available. No further options are available in earlier versions (including the block-function version).

If the DATA parameter is not supplied (either explicitly named, or as the last unnamed parameter), then the hash function becomes a block function, and the input is taken from the output of the enclosed Vortex code, rather than the $buf parameter. This is useful for producing a checksum of the output of part of a script, without first having to save the (possibly large) output value in a variable (see examples).
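
The idea behind block mode, hashing output as it is produced instead of collecting it first, can be illustrated with an incremental digest. The sketch below is Python, not Vortex; SHA-256 is a stand-in algorithm and produce_output is a hypothetical generator playing the role of the enclosed script code.

```python
import hashlib

def hash_chunks(chunks) -> str:
    """Feed chunks to the digest as they arrive; nothing is accumulated."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

def produce_output():
    """Hypothetical stand-in for script output emitted piece by piece."""
    yield b"Start of message\n"
    for i in range(3):
        yield f"log line {i}\n".encode()
    yield b"End of message\n"

streamed = hash_chunks(produce_output())
# For comparison only: the same data joined into one (possibly large) value.
one_shot = hashlib.sha256(b"".join(produce_output())).hexdigest()
```

Both approaches yield the same digest; the streamed form never holds the full output in memory.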

Options are as follows:

  • TYPE=$type Specifies what type of hash to produce (i.e. the algorithm). Also gives a default return variable type (see RET option), as different algorithms produce different native data types. The possible values for $type, the PROVTYPE which supports them (need not be given; see below), and default RET types are:

    • gw (supported by PROVTYPE=texis) A gw (Webinator) style hash. This is the default if TYPE is unspecified or empty. The default RET type for gw is counter, which returns a cross-platform-compatible value. Other RET types may truncate or return different values across platforms for gw.

    • crc32 (supported by PROVTYPE=texis) A CRC32-compatible 32-bit checksum. The default RET type is long, which returns a cross-platform-compatible value. Added in version 4.04.1083100000 20040427.

    • adler32 (supported by PROVTYPE=texis) An Adler32-compatible 32-bit checksum (similar to crc32 but slightly faster to compute, though incompatible). The default RET type is long, which returns a cross-platform-compatible value. Added in version 4.04.1083100000 20040427.

    • An OpenSSL digest name (supported by PROVTYPE=openssl) An OpenSSL algorithm, such as md5, md2, sha1, dss1, DSA-SHA1-old, RSA-SHA1-2, etc. The default RET type is hex, which returns a cross-platform-compatible value. Consult an OpenSSL manual (online at http://www.openssl.org/) for details on algorithms.

    • A Microsoft CryptoAPI algorithm (supported by PROVTYPE=PROV_...) A Microsoft CALG_... algorithm, such as:

      CALG_3DES_112 CALG_RC5

      Or an integer numeric value for such a CALG_... algorithm may be given. The default RET type is hex, which returns a cross-platform-compatible value. Consult a Microsoft SDK manual for details on algorithms.

  • OUTPUT=$what Specifies what to output (print). The default (or if an empty string is given) is none, i.e. nothing, because the hash value is returned in $ret. See RET for a list of the possible values for $what. This option can be used to re-print the input of a block function at the same time the hash is being returned in $ret (see examples).

  • RET=$what Specifies the value and variable type to return in $ret. The value $what may be one of:

    • bin - the hash as a varbyte value

    • counter - the hash as a single big-endian counter value

    • hex - the hash as a hexadecimal varchar value

    • input - the input data (same type), instead of the hash value; useful for OUTPUT option

    • long - the hash as a single big-endian long value (added in version 4.04.1083100000 20040427)

    • none - nothing

    Note: not all return types are compatible with all hash types, e.g. a counter value may truncate an MD5 hash on some platforms. The default return type (if unspecified or empty) is determined by what hash type is selected. See the list under the TYPE option for more information.

    Note that numeric return types (e.g. "counter", "long") are big-endian (first byte of hash is most-significant in the returned number) for consistency, regardless of whether the platform is big- or little-endian. This helps ensure portability: two platforms with the same-size counter and long types will produce the same hash values for those types (for the same input and TYPE), regardless of platform endian-ness.

  • DATA=$buf Specifies the input data value(s) to produce a hash of. If this option is given, the function is a non-block-function call, i.e. no ending </hash> is expected, and a hash is returned for each corresponding value of $buf. If this option is missing, the function is a block call, and input is taken from the output produced by the script up to the next matching required </hash> tag. Note that for back-compatibility, an unnamed parameter is taken as the DATA parameter (and must be the last option).

  • PROVTYPE=$provtype This parameter can be left unspecified; a default is usually correctly determined. It specifies the type of provider to use (and indirectly, the API). Along with PROVIDER and TYPE, this determines the exact algorithm, padding scheme, key length etc. to use to produce the hash or checksum. PROVTYPE may be texis for the Texis API, openssl for OpenSSL API, or on Windows platforms, any of the following Microsoft CryptoAPI PROV_... provider types (not all supported on all versions of Windows):


    The default (if PROVTYPE is unspecified or empty) is the first provider type that can support the TYPE specified, in the priority order texis, openssl, PROV_RSA_FULL. E.g. if TYPE=gw is specified, PROVTYPE defaults to texis; if TYPE=md5 is given, PROVTYPE defaults to openssl (even though PROV_RSA_FULL supports md5 too); if a recognized Microsoft CryptoAPI CALG_... algorithm is given for TYPE, PROVTYPE defaults to PROV_RSA_FULL.

  • PROVIDER=$provider This parameter can be left unspecified; a default is usually correctly determined. It specifies the engine or cryptographic provider to use, depending on the value of PROVTYPE:

    • PROVTYPE=texis PROVIDER must be (and the default is) texis.

    • PROVTYPE=openssl PROVIDER specifies the engine to use. The value must be a registered OpenSSL engine; the default if unspecified or empty is provided by the OpenSSL API.

    • PROVTYPE=PROV_... value (Microsoft CryptoAPI) PROVIDER specifies the cryptographic provider to use. It may be one of the following values:



      • MS_DEF_PROV





      • Microsoft Base Cryptographic Provider v1.0

      • Microsoft Enhanced Cryptographic Provider v1.0

      • Microsoft RSA Signature Cryptographic Provider

      • Microsoft Base RSA SChannel Cryptographic Provider

      • Microsoft Enhanced RSA SChannel Cryptographic Provider

      • Microsoft Base DSS Cryptographic Provider

      • Microsoft Base DSS and Diffie-Hellman Cryptographic Provider

      Or PROVIDER may be another provider name recognized by CryptoAPI. The default if unspecified or empty is provided by the CryptoAPI.
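
For readers unfamiliar with the crc32 and adler32 types listed under TYPE, the following Python sketch computes both checksums with the standard zlib module. It assumes that "CRC32-compatible" and "Adler32-compatible" refer to the same algorithms zlib implements, which the text above does not state explicitly.

```python
import zlib

data = b"123456789"

# Mask to 32 bits so the result is an unsigned value on every Python version.
crc = zlib.crc32(data) & 0xFFFFFFFF
adler = zlib.adler32(data) & 0xFFFFFFFF

# The two algorithms are incompatible: same input, different checksums.
differ = crc != adler
```

Either value fits in 32 bits, which is why a long return type can hold it without truncation.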

The hash function returns a hash or checksum value for each value of DATA=$buf (or the output of the enclosed block). The return type varies with options but defaults to counter.
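
The relationship between the bin, hex and numeric RET forms, and the big-endian rule for numeric types, can be sketched as follows. This is Python, not Vortex; treating a zlib CRC32 as four hash bytes with the most significant first is an assumption made for illustration, and ret_forms is a hypothetical helper.

```python
import zlib

def ret_forms(data: bytes) -> dict:
    """Show one 32-bit checksum in several return-type-like forms."""
    checksum = zlib.crc32(data) & 0xFFFFFFFF
    raw = checksum.to_bytes(4, "big")            # like RET=bin: raw hash bytes
    return {
        "bin": raw,
        "hex": raw.hex(),                        # like RET=hex: hexadecimal text
        "long": int.from_bytes(raw, "big"),      # first byte is most significant
        # What a little-endian reading of the same bytes would give instead:
        "little": int.from_bytes(raw, "little"),
    }

forms = ret_forms(b"123456789")
```

Fixing the byte order is what makes the numeric value identical across platforms; the "little" entry shows how the same bytes would otherwise read as a different number.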


<$theurl = "http://some.host.com/">
  <fetch $theurl>
  <urlinfo text>
  <$page = $ret>
  <hash $page>
  <$hash = $ret>
  <SQL "select Hash from pages where Hash=$hash"></SQL>
  <IF $loop gt 0>
    Page already seen elsewhere.
  <ELSE>
    <SQL "insert into pages values($theurl, $page, $hash)"></SQL>
  </IF>

In the above example, a table of fetched HTML pages is kept, along with a hash value for each page (and an index on the Hash field). The hash values are used as a quick way of keeping only unique pages. If multiple URLs being fetched point to the same page, only one copy of the page is desired. To do this, the hash values are compared: if the hash value of a newly fetched page is already present in the table, it's virtually certain that page was already fetched. Since no options are given to <hash>, a gw-style (Webinator) hash is used.

Comparing hash values is far faster than attempting to compare each page's text itself, since the hash is just a small fixed size value (counter in this case) whereas the page text may be megabytes long.
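
The same deduplication pattern can be tried outside Vortex. The sketch below uses Python's standard sqlite3 and hashlib modules with an in-memory table; SHA-256 stands in for the gw hash, and the helper store_if_new and the extra host names are hypothetical.

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (Url TEXT, Body BLOB, Hash TEXT)")
db.execute("CREATE INDEX pages_hash ON pages (Hash)")  # fast lookup by hash

def store_if_new(url: str, body: bytes) -> bool:
    """Insert the page unless a page with the same hash is already stored."""
    page_hash = hashlib.sha256(body).hexdigest()
    seen = db.execute(
        "SELECT 1 FROM pages WHERE Hash = ? LIMIT 1", (page_hash,)
    ).fetchone()
    if seen is not None:
        return False  # page already seen elsewhere
    db.execute("INSERT INTO pages VALUES (?, ?, ?)", (url, body, page_hash))
    return True

first = store_if_new("http://some.host.com/", b"<html>same page</html>")
second = store_if_new("http://mirror.host.com/", b"<html>same page</html>")
third = store_if_new("http://other.host.com/", b"<html>different page</html>")
row_count = db.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
```

The lookup compares only the short indexed hash column, never the page bodies themselves.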

<hash TYPE=md5 OUTPUT=input>
    Start of message
    <SendLargeLogFile>
    End of message
</hash>

In the above example, the block-function mode of hash is used to generate an MD5 checksum of the message being printed by the Vortex code enclosed by the block. By using the block mode, we do not have to construct the message and assign it to a variable, which not only saves code but in this case memory too, as <SendLargeLogFile> (not shown) would otherwise generate a very large variable. By using the OUTPUT=input option, the message is also printed as it's checksummed, so we do not lose it.
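
The pass-through behaviour described above, where data is printed while it is being checksummed, resembles a "tee". A minimal Python sketch follows; SHA-256 is a stand-in for md5, and hash_and_forward and message_parts are hypothetical names.

```python
import hashlib
import io

def hash_and_forward(chunks, out) -> str:
    """Write each chunk to out while folding it into the digest."""
    h = hashlib.sha256()
    for chunk in chunks:
        out.write(chunk)   # the data is still emitted
        h.update(chunk)    # and checksummed in the same pass
    return h.hexdigest()

def message_parts():
    """Hypothetical message produced piece by piece."""
    yield b"Start of message\n"
    yield b"...large log contents...\n"
    yield b"End of message\n"

sink = io.BytesIO()        # stands in for the script's output stream
checksum = hash_and_forward(message_parts(), sink)
```

The output stream receives the message unchanged, and the checksum is available once the last chunk has been written.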

The hash function was added May 16 1997. New options such as block mode were added in version 4.04.1066340000 20031016 and 4.04.1083100000 20040427.

The OpenSSL algorithms depend on the OpenSSL plugin; this is provided with all versions of Vortex and is loaded automatically when needed, but does depend on external libraries (which may be missing in a broken install).

The Microsoft CryptoAPI algorithms are only available on Windows versions of Vortex.

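It is possible, though unlikely, that two distinct input data values will have the same hash value. The probability of such a hash collision varies with the algorithm (TYPE) used: longer digests such as md5 (128 bits) or sha1 (160 bits) make accidental collisions far less likely than 32-bit checksums such as crc32 or adler32. Note that md5 and sha1 are no longer considered resistant to deliberately constructed collisions, so they should not be relied on where an adversary controls the input.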
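
To get a feel for how hash size affects the chance of an accidental collision, the standard birthday-bound approximation can be evaluated. The Python sketch below assumes hash values are uniformly distributed, which real checksums only approximate; collision_probability is a hypothetical helper.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-bound estimate: 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -n_items * (n_items - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)

p32 = collision_probability(100_000, 32)    # e.g. a 32-bit checksum
p128 = collision_probability(100_000, 128)  # e.g. a 128-bit digest
```

Under this estimate, 100,000 items hashed to 32 bits have a collision probability of about 0.69, while with 128 bits the probability is negligible.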

fetch, urlinfo

Copyright © Thunderstone Software     Last updated: Apr 15 2024