Hash Query Functions
Functions for creating or validating string hashes.
Table: Event and Hash Query Functions
| Function | Default Argument | Availability | Description |
|---|---|---|---|
| crypto:md5([as], field) | field | | Computes a cryptographic MD5 hash of an input string. |
| crypto:sha1([as], field) | field | | Computes a cryptographic SHA1 hash of an input string. |
| crypto:sha256([as], field) | field | | Computes a cryptographic SHA256 hash of an input string. |
| hash([as], field, [limit], [seed]) | field | | Computes a non-cryptographic hash of a list of fields. |
| tokenHash([as], field) | field | | Calculates a hash by tokenizing the input string (split by spaces), creating a hash for each token, and then adding the results together. This generates the same hash value even if the order of the individual values in the source string is different. |
Hashes are used to create a consistent string value that can be used for comparison and identification without having to use or manipulate the original values. Hashes are typically used for three different purposes:
- General hashing to create unique identifiers; see General Hashing
- General hashing for PII or comparison; see Hashing for Privacy or Comparison
- Cryptographic hashing for handling passwords or encrypted strings; see Hashing for Cryptography
For all hashes, the principle is that the encoded version of the incoming data (the hash) cannot easily be converted back to its original format, but encoding the same string always results in the same hash value. Therefore, computing a new hash of the same string allows the two hashes to be compared in place of the original values.
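The principle above can be illustrated outside LogScale with a minimal Python sketch. It uses SHA-256 from Python's standard `hashlib` module purely as a stand-in; the `fingerprint` helper and the sample values are hypothetical and are not part of LogScale.

```python
import hashlib


def fingerprint(value: str) -> str:
    """Return a hexadecimal SHA-256 digest of a string (illustrative helper)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# A hash stored earlier in place of the original value.
stored = fingerprint("user@example.com")

# Hashing the same string again yields the same value, so the two
# hashes can be compared without keeping the original string around.
same_input_matches = fingerprint("user@example.com") == stored

# A different input produces a different hash.
other_input_matches = fingerprint("other@example.com") == stored

print(same_input_matches, other_input_matches)  # prints: True False
```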
General Hashing
The hash() function computes an integer based on one or more
incoming field values. This is useful for general hashing of
non-sensitive data, for example to create a simplified ID of a complex
value, to ensure consistent inputs, or to obtain
faster performance.
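As a rough illustration of hashing several field values into one integer, the following Python sketch may help. It rests on assumptions that the text above does not confirm: CRC-32 stands in for whatever algorithm hash() actually uses, `seed` is treated as the starting checksum value, and `limit` is treated as an upper bound applied as a modulus. The `simple_hash` name and the sample field values are hypothetical.

```python
import zlib
from typing import Optional, Sequence


def simple_hash(fields: Sequence[str], limit: Optional[int] = None, seed: int = 0) -> int:
    """Illustrative non-cryptographic hash of one or more field values.

    Assumptions: CRC-32 is a stand-in algorithm, `seed` is the starting
    checksum value, and `limit` bounds the result via a modulus.
    """
    # Join with a separator that is unlikely to occur in field values so
    # that ("ab", "c") and ("a", "bc") do not collapse to the same input.
    data = "\x1f".join(fields).encode("utf-8")
    # Mask the seed so it always fits the 32-bit range CRC-32 expects.
    value = zlib.crc32(data, seed & 0xFFFFFFFF)
    if limit is not None:
        value %= limit
    return value


# Reduce a complex combination of values to a small, stable bucket number.
bucket = simple_hash(["host-01", "/var/log/app.log"], limit=10)
print(bucket)
```

Because the same inputs always give the same integer, a value such as `bucket` could serve as a simplified ID or grouping key for non-sensitive data.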
Hashing for Privacy or Comparison
Occasionally, data that is parsed and ingested must be encoded into
a format in which the underlying value is anonymized. The
tokenHash() function is useful for this purpose: anonymizing
private data and masking Personally Identifiable Information (PII).
tokenHash() tokenizes the incoming string
(separated by spaces), creates a hash for each token, and
then adds the hashes together. Because addition does not depend on
order, the resulting hash is the same regardless of the order of the
tokens. For two inputs to produce the same hash, they must contain
identical tokens; only the order may differ.
For example, the following two log lines contain the same information even though the order of each word is different:
| strings |
|---|
| abc def ghi |
| def ghi abc |
Executing tokenHash() on each will generate the
same hash value.
This can be useful to compare, filter or deduplicate log lines during parsing or querying, even though the order of the individual values within a set of key/value pairs might be different.
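The tokenize, hash, and add approach described above can be sketched in Python as follows. This is only an illustration of the technique: CRC-32 and the 32-bit mask are assumptions chosen for the sketch, the `token_hash` name is hypothetical, and the values it produces will not match those of tokenHash().

```python
import zlib


def token_hash(text: str) -> int:
    """Order-independent hash: hash each token and add the results."""
    total = 0
    # str.split() with no argument splits on runs of whitespace.
    for token in text.split():
        total += zlib.crc32(token.encode("utf-8"))
    # Keep the sum within 32 bits so the result has a fixed size.
    return total & 0xFFFFFFFF


first = token_hash("abc def ghi")
second = token_hash("def ghi abc")

# Same tokens in a different order give the same hash.
print(first == second)  # prints: True
```

Addition is what makes the result independent of token order; a hash computed over the whole string at once would differ between the two lines.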
Hashing for Cryptography
Hashes are often used to encode passwords or other security tokens, and LogScale includes tools for creating these hashes to be used for comparison or identification with existing values stored in LogScale.
The following functions support standard methodologies for these types of hashes: crypto:md5(), crypto:sha1(), and crypto:sha256().
Each function takes a string as input and generates a hexadecimal hash representation of the value.
These functions compute hashes, not encryption: they do not produce strong encryption keys and should not be used to encrypt text.
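To show what a hexadecimal hash representation looks like for each of the three algorithms, here is a minimal Python sketch using the standard `hashlib` module. It illustrates the MD5, SHA1, and SHA256 algorithms themselves rather than the LogScale functions; the `hex_digests` helper is hypothetical.

```python
import hashlib


def hex_digests(value: str) -> dict:
    """Return MD5, SHA1, and SHA256 hex digests of a string."""
    data = value.encode("utf-8")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


digests = hex_digests("abc")
for name, digest in digests.items():
    # Digest lengths are fixed per algorithm: 32, 40, and 64 hex characters.
    print(name, len(digest), digest)
```

The output length depends only on the algorithm, not on the length of the input string, which is why these values are convenient for comparison and identification.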