Hash Query Functions
Functions for creating or validating string hashes.
Table: Event & Hash Query Functions
Function | Default Argument | Availability | Description |
---|---|---|---|
crypto:md5([as], field) | field | added in 1.104 | Computes a cryptographic MD5-hashing of an input string. |
hash([as], field, [limit], [seed]) | field | Computes a non-cryptographic hash of a list of fields. | |
tokenHash([as], field) | field | Calculates a hash by tokenizing the input string (split by spaces), creating a hash for each token and then added the result together. This generates the same hash value, even if the order of the individual values in the source string is different. |
Hashes are used to create a consistent string value that can be used for comparison and identification without having to use or manipulate the original values. Hashes are typically used for three different purposes:
General hashing to create unique identifiers, see General Hashing
General hashing for PII or comparison, see Hashing for Privacy or Comparison
Cryptographic hashing for handling passwords or encrypted strings, see Hashing for Cryptography
For all hashes, the principle is that the encoded version of the incoming data (the hash) cannot easily be converted back to it's original format, but encoding the same string should result in a consistent hash value. Therefore, computing a new hash of the same string allows it to be used for comparison.
General Hashing
The hash()
computes an integer based on one or more
incoming field values. This is useful for general hashing on
non-sensitive data (for example to create a simplified ID of a complex
value) to create consistency, ensure consistent inputs, or to obtain
faster performance.
Hashing for Privacy or Comparison
Often used when parsing and ingesting data and encoding into a format
where the underlying value needs to be anonymized. The
tokenHash()
function is useful for anonymizing
private data, masking data containing personal and/or sensitive
information - also called Personally Identifiable Information (PII).
The tokenHash()
function is useful for masking
Social Security Numbers (SSN), emails or phone numbers for data in
situations where GDPR compliance and/or protection is required.
The tokenHash()
tokenises the incoming string
(separated by spaces), and then creates a hash for each tokenised
elements and adds them together. The hash generated in this form will
therefore consistent, providing each token in the input is identical,
irrespective of the order. For example, the following two log lines
contain the same information even though the order of each word is
different:
strings |
---|
abc def ghi |
def ghi abc |
Executing tokenHash()
on each will generate the
same hash value.
This can be useful to compare, filter or deduplicate log lines during parsing or querying even though the order of the individual values within a set of key/value pairs might be different.
Hashing for Cryptography
Hashes are often used to encode passwords or other security tokens, and LogScale includes tools for creating these hashes to be used for comparison or identification with existing values stored in LogScale.
The following functions support standard methodologies for these types of hashes:
Each function takes a string as input and generates a hexadecimal hash representation of the value.
These functions are not strong encryption keys and should not be used for encryption of text as such.