SYNOPSIS
<hash [options ...] [DATA=]$buf[ /]>
or
<hash [options ...]> ... </hash>
DESCRIPTION
The hash function produces a hash or checksum value of the data $buf
(or of the block output), using a specified algorithm. The value
returned, a small type (usually counter), is fairly unique to the
given data: it is very unlikely that a different input value will
have the same hash value. The hash value can be used as a fast way of
comparing large data fields: if the hash values are the same, the
data fields are probably identical. This can replace a large number
of very long string compares with much faster counter (or other
compact type) compares. Hash values are also used to verify the
integrity of transmitted data: if the hash computed by the receiver
does not match the value transmitted, the data is corrupted.
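
The integrity check described above can be sketched outside Vortex. The
following is a minimal Python illustration, not Vortex code: it uses the
standard zlib CRC-32 as a stand-in for whichever TYPE is chosen, and the
helper names send and receive are hypothetical.

```python
import zlib
from typing import Tuple


def send(payload: bytes) -> Tuple[bytes, int]:
    """Sender side: return the payload together with its CRC-32 checksum."""
    return payload, zlib.crc32(payload)


def receive(payload: bytes, checksum: int) -> bool:
    """Receiver side: recompute the checksum and compare it with the one sent."""
    return zlib.crc32(payload) == checksum


data, crc = send(b"123456789")
ok = receive(data, crc)                 # payload arrived unchanged
damaged = receive(b"123456780", crc)    # last byte altered in transit
print(ok, damaged)
```

A mismatch proves the data changed; a match only makes corruption very
unlikely, since distinct inputs can share a checksum.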
With no options, hash computes a gw (i.e. Webinator) style hash of
$buf, returned as a counter value. In Vortex version 4.04.1066340000
20031016 and later, new hash types and options are available. Earlier
versions accept no options and do not support the block-function
form.
If the DATA parameter is not supplied (either explicitly named, or as
the last unnamed parameter), then the hash function becomes a block
function, and the input is taken from the output of the enclosed
Vortex code, rather than the $buf parameter. This is useful for
producing a checksum of the output of part of a script, without
having to first save the (possibly large) output value in a variable
(see examples).
Options are as follows:
TYPE=$type
    Specifies what type of hash to produce (i.e. the algorithm). Also
    gives a default return variable type (see the RET option), as
    different algorithms produce different native data types. The
    possible values for $type, the PROVTYPE which supports them (need
    not be given; see below), and default RET types are:
    gw (supported by PROVTYPE=texis)
        A gw (Webinator) style hash. This is the default if TYPE is
        unspecified or empty. The default RET type for gw is counter,
        which returns a cross-platform-compatible value. Other RET
        types may truncate or return different values across
        platforms for gw.
    crc32 (supported by PROVTYPE=texis)
        A CRC32-compatible 32-bit checksum. The default RET type is
        long, which returns a cross-platform-compatible value. Added
        in version 4.04.1083100000 20040427.
    adler32 (supported by PROVTYPE=texis)
        An Adler32-compatible 32-bit checksum (similar to crc32 but
        slightly faster to compute, though incompatible). The default
        RET type is long, which returns a cross-platform-compatible
        value. Added in version 4.04.1083100000 20040427.
    (supported by PROVTYPE=openssl)
        An OpenSSL algorithm, such as md5, md2, sha1, dss1,
        DSA-SHA1-old, RSA-SHA1-2, etc. The default RET type is hex,
        which returns a cross-platform-compatible value. Consult an
        OpenSSL manual (online at http://www.openssl.org/) for
        details on algorithms.
    (supported by PROVTYPE=PROV_...)
        A Microsoft CALG_... algorithm, such as:

            CALG_3DES             CALG_RC4
            CALG_3DES_112         CALG_RC5
            CALG_AGREEDKEY_ANY    CALG_RSA_KEYX
            CALG_CYLINK_MEK       CALG_RSA_SIGN
            CALG_DES              CALG_SCHANNEL_ENC_KEY
            CALG_DH_EPHEM         CALG_SCHANNEL_MAC_KEY
            CALG_DH_SF            CALG_SCHANNEL_MASTER_HASH
            CALG_DSS_SIGN         CALG_SEAL
            CALG_HMAC             CALG_SHA
            CALG_HUGHES_MD5       CALG_SHA1
            CALG_KEA_KEYX         CALG_SKIPJACK
            CALG_MAC              CALG_SSL2_MASTER
            CALG_MD2              CALG_SSL3_MASTER
            CALG_MD4              CALG_SSL3_SHAMD5
            CALG_MD5              CALG_TEK
            CALG_PCT1_MASTER      CALG_TLS1_MASTER
            CALG_RC2

        The integer numeric value of such a CALG_... algorithm may be
        given instead of its name. The default RET type is hex, which
        returns a cross-platform-compatible value. Consult a
        Microsoft SDK manual for details on algorithms.
OUTPUT=$what
    Specifies what to output (print). The default (or if an empty
    string is given) is none, i.e. nothing, because the hash value is
    returned in $ret. See RET for a list of the possible values for
    $what. This option can be used to re-print the input of a block
    function at the same time the hash is being returned in $ret
    (see examples).
RET=$what
    Specifies the value and variable type to return in $ret. The
    value $what may be one of:

        bin     - the hash as a varbyte value
        counter - the hash as a single big-endian counter value
        hex     - the hash as a hexadecimal varchar value
        input   - the input data (same type), instead of the hash
                  value; useful for the OUTPUT option
        long    - the hash as a single big-endian long value (added
                  in version 4.04.1083100000 20040427)
        none    - nothing

    Note: not all return types are compatible with all hash types,
    e.g. a counter value may truncate an MD5 hash on some platforms.
    The default return type (if unspecified or empty) is determined
    by what hash type is selected. See the list under the TYPE option
    for more information.

    Note that numeric return types (e.g. "counter", "long") are
    big-endian (first byte of hash is most-significant in the
    returned number) for consistency, regardless of whether the
    platform is big- or little-endian. This helps ensure portability:
    two platforms with the same-size counter and long types will
    produce the same hash values for those types (for the same input
    and TYPE), regardless of platform endian-ness.
DATA=$buf
    Specifies the input data value(s) to produce a hash of. If this
    option is given, the function is a non-block-function call, i.e.
    no ending </hash> is expected, and a hash is returned for each
    corresponding value of $buf. If this option is missing, the
    function is a block call, and input is taken from the output
    produced by the script up to the next matching required </hash>
    tag. Note that for back-compatibility, an unnamed parameter is
    taken as the DATA parameter (and must be the last option).
PROVTYPE=$provtype
    This parameter can be left unspecified; a default is usually
    correctly determined. It specifies the type of provider to use
    (and indirectly, the API). Along with PROVIDER and TYPE, this
    determines the exact algorithm, padding scheme, key length etc.
    to use to produce the hash or checksum. PROVTYPE may be texis for
    the Texis API, openssl for the OpenSSL API, or on Windows
    platforms, any of the following Microsoft CryptoAPI PROV_...
    provider types (not all supported on all versions of Windows):

        PROV_DH_SCHANNEL      PROV_MS_EXCHANGE
        PROV_DSS              PROV_RSA_AES
        PROV_DSS_DH           PROV_RSA_FULL
        PROV_EC_ECDSA_FULL    PROV_RSA_SCHANNEL
        PROV_EC_ECDSA_SIG     PROV_RSA_SIG
        PROV_EC_ECNRA_FULL    PROV_SPYRUS_LYNKS
        PROV_EC_ECNRA_SIG     PROV_SSL
        PROV_FORTEZZA

    The default (if PROVTYPE is unspecified or empty) is the first
    provider type that can support the TYPE specified, in the
    priority order texis, openssl, PROV_RSA_FULL. E.g. if TYPE=gw is
    specified, PROVTYPE defaults to texis; if TYPE=md5 is given,
    PROVTYPE defaults to openssl (even though PROV_RSA_FULL supports
    md5 too); if a recognized Microsoft CryptoAPI CALG_... algorithm
    is given for TYPE, PROVTYPE defaults to PROV_RSA_FULL.
PROVIDER=$provider
    This parameter can be left unspecified; a default is usually
    correctly determined. It specifies the engine or cryptographic
    provider to use, depending on the value of PROVTYPE:

    PROVTYPE=texis
        PROVIDER must be (and the default is) texis.
    PROVTYPE=openssl
        PROVIDER specifies the engine to use. The value must be a
        registered OpenSSL engine; the default if unspecified or
        empty is provided by the OpenSSL API.
    PROVTYPE=PROV_... value (Microsoft CryptoAPI)
        PROVIDER specifies the cryptographic provider to use. It may
        be one of the following values:

            MS_DEF_DSS_DH_PROV
            MS_DEF_DSS_PROV
            MS_DEF_PROV
            MS_DEF_RSA_SCHANNEL_PROV
            MS_DEF_RSA_SIG_PROV
            MS_ENHANCED_PROV
            MS_ENHANCED_RSA_SCHANNEL_PROV

        Or PROVIDER may be another provider name recognized by
        CryptoAPI. The default if unspecified or empty is provided by
        the CryptoAPI.
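
The default PROVTYPE rule given in the options above (first provider type,
in the order texis, openssl, PROV_RSA_FULL, that supports the requested
TYPE) can be sketched as a simple lookup. This is a Python illustration
only, not how Vortex implements it. The SUPPORTED table is a hypothetical,
abbreviated stand-in; the real lists depend on the installed providers.

```python
from typing import Optional

# Priority order stated in the PROVTYPE description.
PRIORITY = ["texis", "openssl", "PROV_RSA_FULL"]

# Hypothetical, abbreviated support table for illustration only.
SUPPORTED = {
    "texis": {"gw", "crc32", "adler32"},
    "openssl": {"md5", "md2", "sha1", "dss1"},
    "PROV_RSA_FULL": {"md5", "CALG_MD5", "CALG_SHA1"},
}


def default_provtype(hash_type: str) -> Optional[str]:
    """Return the first provider type, in priority order, supporting hash_type."""
    hash_type = hash_type or "gw"      # empty TYPE means the gw default
    for provtype in PRIORITY:
        if hash_type in SUPPORTED[provtype]:
            return provtype
    return None                        # no provider type supports this TYPE


print(default_provtype("gw"), default_provtype("md5"), default_provtype("CALG_MD5"))
```

Note that md5 resolves to openssl even though the PROV_RSA_FULL entry
also lists it, because openssl comes earlier in the priority order.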
DIAGNOSTICS
The hash function returns a hash or checksum value for each value of
DATA=$buf (or the output of the enclosed block). The return type
varies with options but defaults to counter.
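
To make the difference between return types concrete, the following
Python sketch (an illustration, not Vortex code) renders one MD5 hash as
raw bytes, as hexadecimal text, and as a big-endian integer, loosely
mirroring RET=bin, RET=hex and the numeric types. Taking the leading four
bytes for the integer is an assumption made here to show truncation; the
source does not say how Vortex truncates.

```python
import hashlib

digest = hashlib.md5(b"").digest()   # 16 raw bytes, comparable to RET=bin
as_hex = digest.hex()                # hexadecimal text, comparable to RET=hex

# Big-endian: the first byte of the hash is the most significant byte of
# the number. Only the leading 4 bytes fit a 32-bit value, so the rest of
# the 128-bit hash is lost (assumed truncation, for illustration).
as_big_endian = int.from_bytes(digest[:4], "big")

# The same 4 bytes read little-endian give a different number, which is
# why a fixed byte order matters for portability.
as_little_endian = int.from_bytes(digest[:4], "little")

print(as_hex)
print(hex(as_big_endian), hex(as_little_endian))
```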
EXAMPLE
<$theurl = "http://some.host.com/">
<fetch $theurl>
<urlinfo text>
<$page = $ret>
<hash $page>
<$hash = $ret>
<SQL "select Hash from pages where Hash=$hash"></SQL>
<IF $loop gt 0>
Page already seen elsewhere.
<ELSE>
<SQL "insert into pages values($theurl, $page, $hash)"></SQL>
</IF>
In the above example, a table of fetched HTML pages is kept, along
with a hash value for each page (and an index on the Hash field). The
hash values are used as a quick way of keeping only unique pages. If
multiple URLs being fetched point to the same page, only one copy of
the page is desired. To do this, the hash values are compared: if the
hash value of a newly fetched page is already present in the table,
it's virtually certain that page was already fetched. Since no
options are given to <hash>, a gw-style (Webinator) hash is used.

Comparing hash values is far faster than attempting to compare each
page's text itself, since the hash is just a small fixed size value
(counter in this case) whereas the page text may be megabytes long.
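
The same deduplication idea can be sketched in Python (an illustration,
not Vortex code). The seen dictionary stands in for the pages table and
its Hash index, MD5 stands in for the gw hash, and store_if_new is a
hypothetical helper name.

```python
import hashlib

seen = {}  # hash value -> first URL stored with that content


def store_if_new(url: str, page: bytes) -> bool:
    """Store the page unless a page with the same hash was already stored."""
    key = hashlib.md5(page).hexdigest()
    if key in seen:
        return False       # page already seen elsewhere
    seen[key] = url
    return True


first = store_if_new("http://some.host.com/", b"<html>same page</html>")
mirror = store_if_new("http://some.host.com/index.html", b"<html>same page</html>")
other = store_if_new("http://some.host.com/about", b"<html>other page</html>")
print(first, mirror, other)
```

Each lookup compares one short key rather than the full page text, which
is the saving the example above relies on.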
<hash TYPE=md5 OUTPUT=input>
Start of message
<SendLargeLogFile>
End of message
</hash>
MD5-Checksum=$ret
In the above example, the block-function mode of hash is used to
generate an MD5 checksum of the message being printed by the Vortex
code enclosed by the block. By using the block mode, we do not have
to construct the message and assign it to a variable, which not only
saves code but in this case memory too, as <SendLargeLogFile> (not
shown) would otherwise generate a very large variable. By using the
OUTPUT=input option, the message is also printed as it's
checksummed, so we do not lose it.
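
The pass-through behaviour (hash the data while still emitting it, never
holding the whole message at once) can be sketched in Python as follows.
This is an illustration rather than Vortex code; copy_and_hash and the
in-memory streams are hypothetical stand-ins for the block output and its
destination.

```python
import hashlib
import io


def copy_and_hash(source, sink, chunk_size: int = 8192) -> str:
    """Copy source to sink chunk by chunk, returning the MD5 of what passed."""
    digest = hashlib.md5()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)      # the data is still output (like OUTPUT=input)
        digest.update(chunk)   # and hashed on the way through
    return digest.hexdigest()


message = b"Start of message\n" + b"log line\n" * 1000 + b"End of message\n"
out = io.BytesIO()
checksum = copy_and_hash(io.BytesIO(message), out)
print("MD5-Checksum=" + checksum)
```

Only one chunk is held at a time, so memory use stays flat however large
the message is.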
CAVEATS
The hash function was added May 16 1997. New options such as block
mode were added in versions 4.04.1066340000 20031016 and
4.04.1083100000 20040427.
The OpenSSL algorithms depend on the OpenSSL plugin; this is provided with all versions of Vortex and is loaded automatically when needed, but does depend on external libraries (which may be missing in a broken install).
The Microsoft CryptoAPI algorithms are only available on Windows versions of Vortex.
It is possible, though unlikely, that two distinct input data values
will have the same hash value. The probability of such a hash
collision varies with the algorithm (TYPE) used: the wider the hash,
the lower the probability, so a 128-bit md5 hash is far less likely
to collide by chance than a 32-bit crc32 or adler32 checksum.
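
As a rough guide to how collision probability depends on hash width, the
standard birthday approximation can be evaluated directly. The Python
sketch below assumes an ideal hash whose values are uniformly distributed
and inputs that were not chosen deliberately to collide. Real algorithms
only approximate this, and the function name is hypothetical.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2.0 ** hash_bits
    exponent = -n_items * (n_items - 1) / (2.0 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


# Chance that at least two of 100,000 distinct inputs share a hash value.
p32 = collision_probability(100_000, 32)    # 32-bit checksum: roughly 0.69
p128 = collision_probability(100_000, 128)  # 128-bit hash: negligible
print(p32, p128)
```

Under these assumptions a 32-bit checksum is likely to show at least one
accidental collision among a hundred thousand values, while a 128-bit
hash is not.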