Hash Query Functions

Functions for creating or validating string hashes.

Table: Event & Hash Query Functions

Function | Default Argument | Availability | Description
crypto:md5([as], field) | field | | Computes a cryptographic MD5 hash of an input string.
crypto:sha1([as], field) | field | added in 1.161 | Computes a cryptographic SHA1 hash of an input string.
crypto:sha256([as], field) | field | added in 1.161 | Computes a cryptographic SHA256 hash of an input string.
hash([as], field, [limit], [seed]) | field | | Computes a non-cryptographic hash of a list of fields.
tokenHash([as], field) | field | | Calculates a hash by tokenizing the input string (split by spaces), creating a hash for each token, and then adding the results together. This generates the same hash value even if the order of the individual values in the source string is different.

Hashes are used to create a consistent value that can be used for comparison and identification without having to use or manipulate the original values. Hashes are typically used for three different purposes: general hashing, hashing for privacy or comparison, and hashing for cryptography.

For all hashes, the principle is that the encoded version of the incoming data (the hash) cannot easily be converted back to its original format, but encoding the same string always results in the same hash value. Computing a new hash of the same string therefore allows the hashes to be compared in place of the original values.

General Hashing

The hash() function computes an integer based on one or more incoming field values. This is useful for general hashing of non-sensitive data (for example, to create a simplified ID of a complex value), to ensure consistent inputs, or to obtain faster performance.
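As a rough illustration of the idea, the following Python sketch derives an integer from a list of field values. It is not LogScale's implementation: the name simple_hash, the use of CRC32, the field separator, and the treatment of limit as a modulo bound and seed as a starting value are all assumptions made for this example.

```python
import zlib

def simple_hash(fields, limit=None, seed=0):
    """Hypothetical helper: non-cryptographic hash over a list of field values."""
    # Join with a separator so that ["ab", "c"] and ["a", "bc"] hash differently.
    data = "\x1f".join(fields).encode("utf-8")
    # CRC32 stands in for whatever hash LogScale uses internally (assumption);
    # the seed is passed as CRC32's starting value.
    value = zlib.crc32(data, seed)
    # Assumption: limit bounds the result by taking the hash modulo limit.
    return value % limit if limit else value

# Example event with made-up field names and values.
event = {"host": "web-01", "path": "/index.html"}
bucket = simple_hash([event["host"], event["path"]], limit=10)
```

The same inputs always land in the same bucket, which is what makes a value like this usable as a simplified ID or grouping key.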

Hashing for Privacy or Comparison

This type of hashing is often used when parsing and ingesting data where the underlying value needs to be anonymized. The tokenHash() function is useful for anonymizing private data by masking data that contains personal or sensitive information, also called Personally Identifiable Information (PII).

For example, tokenHash() can be used to mask Social Security Numbers (SSN), email addresses or phone numbers in situations where GDPR compliance and/or data protection is required.

The tokenHash() function tokenizes the incoming string (separated by spaces), creates a hash for each token, and adds the hashes together. The resulting hash is therefore consistent, provided each token in the input is identical, irrespective of the order. For example, the following two log lines contain the same information even though the order of the words is different:

strings
abc def ghi
def ghi abc

Executing tokenHash() on each will generate the same hash value.

This can be useful to compare, filter or deduplicate log lines during parsing or querying even though the order of the individual values within a set of key/value pairs might be different.
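The order-independence described above follows from addition being commutative. The Python sketch below shows the principle only. The function name token_hash, the per-token CRC32, and the 32-bit truncation are assumptions for illustration and do not reflect the actual algorithm behind tokenHash().

```python
import zlib

def token_hash(text):
    """Hypothetical helper: hash each token, then add the hashes together."""
    total = 0
    # split() breaks the string on whitespace, yielding the individual tokens.
    for token in text.split():
        # Per-token hash; CRC32 is a stand-in chosen for this sketch.
        total += zlib.crc32(token.encode("utf-8"))
    # Keep the sum within 32 bits (an arbitrary choice for this example).
    return total & 0xFFFFFFFF

first = token_hash("abc def ghi")
second = token_hash("def ghi abc")
same = first == second  # True: the sum does not depend on token order
```

Because only the set of tokens (and how often each occurs) affects the sum, two lines that differ solely in word order can be treated as duplicates.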

Hashing for Cryptography

Hashes are often used to encode passwords or other security tokens, and LogScale includes tools for creating these hashes to be used for comparison or identification with existing values stored in LogScale.

The crypto:md5(), crypto:sha1() and crypto:sha256() functions support standard methodologies for these types of hashes.

Each function takes a string as input and generates a hexadecimal hash representation of the value.

These functions generate hashes, not encrypted text. They are not a form of strong encryption and should not be used to encrypt text.
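For comparison outside LogScale, the same standard algorithms are available in Python's hashlib module. The sketch below shows the string-in, hexadecimal-out behaviour; the helper name hex_digests and the choice of UTF-8 encoding are assumptions, and the exact output formatting of the LogScale functions should be checked against real query results.

```python
import hashlib

def hex_digests(value):
    """Hypothetical helper: MD5, SHA1 and SHA256 hex digests of a string."""
    # Hash functions operate on bytes; UTF-8 is assumed here.
    data = value.encode("utf-8")
    return {
        "md5": hashlib.md5(data).hexdigest(),        # 32 hex characters
        "sha1": hashlib.sha1(data).hexdigest(),      # 40 hex characters
        "sha256": hashlib.sha256(data).hexdigest(),  # 64 hex characters
    }

digests = hex_digests("abc")
```

Hashing the same input again yields identical digests, so a stored digest can be matched against a freshly computed one without keeping the original value.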