Hash Query Functions

Functions for creating or validating string hashes.

Table: Event and Hash Query Functions

Function	Default Argument	Description
`crypto:md5([as], field)`	`field`	Computes a cryptographic MD5 hash of an input string.
`crypto:sha1([as], field)`	`field`	Computes a cryptographic SHA1 hash of an input string.
`crypto:sha256([as], field)`	`field`	Computes a cryptographic SHA256 hash of an input string.
`hash([as], field, [limit], [seed])`	`field`	Computes a non-cryptographic hash of a list of fields.
`tokenHash([as], field)`	`field`	Calculates a hash by tokenizing the input string (splitting on spaces), creating a hash for each token, and then adding the results together. This generates the same hash value even if the order of the individual values in the source string is different.

Hashes are used to create a consistent string value that can be used for comparison and identification without having to use or manipulate the original values. Hashes are typically used for three different purposes:

General hashing to create unique identifiers, see General Hashing
General hashing for PII or comparison, see Hashing for Privacy or Comparison
Cryptographic hashing for handling passwords or encrypted strings, see Hashing for Cryptography

For all hashes, the principle is that the encoded version of the incoming data (the hash) cannot easily be converted back to its original format, but encoding the same string always results in the same hash value. Therefore, a newly computed hash can be compared against an existing hash to determine whether the underlying strings match.
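The comparison principle can be illustrated outside LogScale with a short Python sketch. It uses SHA-256 from the standard `hashlib` module as a stand-in; the `digest` helper and the sample values are hypothetical and are not part of LogScale.

```python
import hashlib


def digest(value: str) -> str:
    """Return the SHA-256 hex digest of a string (UTF-8 encoded)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical values: only the hash is kept, not the original string.
stored = digest("user@example.com")
candidate = digest("user@example.com")  # same input, hashed again later
other = digest("User@example.com")      # differs by a single character

print(stored == candidate)  # True: identical input gives an identical hash
print(stored == other)      # False: any change in the input changes the hash
```

The two hashes can be compared without ever handling the original string again.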

General Hashing

The hash() function computes an integer based on one or more incoming field values. This is useful for general hashing of non-sensitive data, for example to create a simplified ID for a complex value, to create consistent values from varying inputs, or to obtain faster performance.
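As a rough illustration of the idea, the following Python sketch maps a list of field values to an integer. It is not LogScale's implementation: CRC-32 from the standard `zlib` module is a stand-in for whatever algorithm hash() uses, and treating `limit` as a modulus and `seed` as a starting value are assumptions made for this example. The `field_hash` name and the sample values are hypothetical.

```python
import zlib


def field_hash(values, limit=None, seed=0):
    """Map a list of field values to a non-negative integer.

    CRC-32 is a stand-in algorithm chosen for this sketch.
    Assumptions: `limit` caps the result via modulo, and `seed`
    (0 to 2**32 - 1) is used as the CRC starting value.
    """
    # Join with a separator that is unlikely to occur in the values,
    # so ["ab", "c"] and ["a", "bc"] do not collide trivially.
    joined = "\x1f".join(values).encode("utf-8")
    result = zlib.crc32(joined, seed)
    return result % limit if limit else result


# Hypothetical field values reduced to a bucket number below 10.
bucket = field_hash(["web-01", "/api/v1/login"], limit=10)
print(bucket)
```

The same field values always land in the same bucket, which is what makes the result usable as a simplified ID.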

Hashing for Privacy or Comparison

Occasionally, the data that is parsed and ingested must be encoded into a format where the underlying value is anonymized. The tokenHash() function is useful for this purpose: it can anonymize private data and mask Personally Identifiable Information (PII).

tokenHash() tokenizes the incoming string (separated by spaces), creates a hash for each token, and then adds the hashes together. Because addition does not depend on order, the resulting hash is the same for any ordering of the same tokens. For two inputs to produce the same hash, the tokens themselves must be identical; only their order may differ.

For example, the following two log lines contain the same information even though the order of the words is different:

strings
abc def ghi
def ghi abc

Executing tokenHash() on each will generate the same hash value.

This can be useful to compare, filter or deduplicate log lines during parsing or querying, even though the order of the individual values within a set of key/value pairs might be different.
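The order-independence described above can be sketched in Python. This illustrates the technique (split on spaces, hash each token, add the results) and is not LogScale's actual algorithm: the per-token hash (the first 8 bytes of SHA-256) and the 64-bit wrap-around are assumptions made for this example, and `token_hash` is a hypothetical name.

```python
import hashlib


def token_hash(text: str) -> int:
    """Hash each space-separated token and add the results together."""
    total = 0
    for token in text.split(" "):
        if not token:
            continue  # skip empty tokens produced by repeated spaces
        token_digest = hashlib.sha256(token.encode("utf-8")).digest()
        # Assumption: use the first 8 bytes of the digest as the token hash.
        total += int.from_bytes(token_digest[:8], "big")
    # Assumption: keep the sum within 64 bits.
    return total % 2**64


print(token_hash("abc def ghi") == token_hash("def ghi abc"))  # True
```

Because addition is commutative, reordering the tokens cannot change the sum, while changing any single token does.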

Hashing for Cryptography

Hashes are often used to encode passwords or other security tokens, and LogScale includes tools for creating these hashes to be used for comparison or identification with existing values stored in LogScale.

The following functions support standard methodologies for these types of hashes:

crypto:md5()
crypto:sha1()
crypto:sha256()

Each function takes a string as input and generates a hexadecimal hash representation of the value.
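For reference, the hexadecimal form of MD5, SHA-1, and SHA-256 hashes can be reproduced with Python's standard `hashlib` module. The sketch below shows what the standard algorithms produce for a sample string; it does not call LogScale, and the `hex_hashes` helper is hypothetical.

```python
import hashlib


def hex_hashes(value: str) -> dict:
    """Return hexadecimal MD5, SHA-1, and SHA-256 digests of a string."""
    data = value.encode("utf-8")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


for name, hex_value in hex_hashes("hello").items():
    # Digest lengths are fixed: 32, 40, and 64 hex characters respectively.
    print(name, len(hex_value), hex_value)
```

The output length depends only on the algorithm, never on the length of the input string.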

These functions generate hashes, not encryption keys, and should not be used to encrypt text.
