Calculates a "structure hash" which is equal for similarly structured input.

Parameter  Type    Required     Default Value  Description
as         string  optional[a]  _tokenHash     The name of output field.
field[b]   string  required                    The name of the field to hash.

[a] Optional parameters use their default value unless explicitly set.

[b] The parameter name field can be omitted.

tokenHash() Syntax Examples

The tokenHash() function tokenizes the incoming string (splitting on spaces), creates a hash for each token, and adds the hashes together. The resulting hash is therefore consistent, provided each token in the input is identical, irrespective of the order of the tokens. For example, the following two log lines contain the same information even though the order of the words is different:

valueString
abc def ghi
def ghi abc

Execute tokenHash() on the valueString field of each event:

logscale
tokenHash(field=valueString)

This generates the same hash value for both rows, even though the order of each word is different:

_tokenHash
84edeb8f
84edeb8f

This can be useful to compare, filter or deduplicate log lines during parsing or querying even though the order of the individual values within a set of key/value pairs might be different.
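
The order-independence described above follows from addition being commutative. The following Python sketch illustrates the idea only; it is not LogScale's actual implementation. The helper name toy_token_hash, the choice of CRC32 as the per-token hash, and the 32-bit wraparound are all assumptions made for illustration, so its output will not match the _tokenHash values shown above.

```python
import zlib


def toy_token_hash(text: str) -> str:
    """Hypothetical stand-in for tokenHash(): the sum of per-token hashes."""
    total = 0
    for token in text.split():  # tokenize on whitespace
        # CRC32 is an arbitrary per-token hash chosen for this sketch (assumption).
        total += zlib.crc32(token.encode("utf-8"))
    # Keep the sum within 32 bits and render it as 8 hex digits.
    return format(total & 0xFFFFFFFF, "08x")


first = toy_token_hash("abc def ghi")
second = toy_token_hash("def ghi abc")
different = toy_token_hash("abc def xyz")

print(first == second)     # True: same tokens, different order
print(first == different)  # False: one token differs
```

Because the per-token hashes are simply added, any permutation of the same tokens yields the same total, while changing a token changes the total.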

tokenHash() Examples

Group Similar Log Lines Using TokenHash

Find patterns in log messages by grouping similar structures using the tokenHash() function

Query
logscale
h := tokenHash(@rawstring)
| groupBy(h, limit=max, function=[ count(), collect(@rawstring, limit=3) ])
Introduction

In this example, the tokenHash() function is used to group log messages that share the same structure but contain different values. This helps identify common log patterns in your data.

Note that the purpose of tokenHash() is for grouping related log lines, not for cryptographic use.

Example incoming data might look like this:

@timestamp            @rawstring
2023-06-06T10:00:00Z  User john.doe logged in from 192.168.1.100
2023-06-06T10:01:00Z  User jane.smith logged in from 192.168.1.101
2023-06-06T10:02:00Z  User admin logged in from 192.168.1.102
2023-06-06T10:03:00Z  Failed login attempt from 10.0.0.1
2023-06-06T10:04:00Z  Failed login attempt from 10.0.0.2
2023-06-06T10:05:00Z  Database connection error: timeout after 30 seconds
2023-06-06T10:06:00Z  Database connection error: timeout after 45 seconds
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    h := tokenHash(@rawstring)

    Creates a hash value based on the structure of the log message in the @rawstring field and returns the token hash in a new field named h. The tokenHash() function identifies words, numbers, and special characters while ignoring their specific values.

  3. logscale
    groupBy(h, limit=max, function=[ count(), collect(@rawstring, limit=3) ])

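    Groups the events by the token hash in the field h. For each group, it:

      * Counts the number of events in the group using count(), returned in the _count field.
      * Collects up to three example log lines from the @rawstring field using collect().

    The limit=max parameter raises the number of groups returned to the maximum allowed.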

  4. Event Result set.

Summary and Results

The query is used to identify common log message patterns by grouping similar log lines together, regardless of their specific values.

This query is useful, for example, to discover the most common types of log messages in your data, to identify unusual or rare log patterns that might indicate problems, and to create log message templates for parsing or monitoring.

Sample output from the incoming example data:

h         _count  @rawstring
1111b796  3       User admin logged in from 192.168.1.102
                  User jane.smith logged in from 192.168.1.101
                  User john.doe logged in from 192.168.1.100
356fb767  2       Failed login attempt from 10.0.0.2
                  Failed login attempt from 10.0.0.1
90fadc1e  2       Database connection error: timeout after 45 seconds
                  Database connection error: timeout after 30 seconds

Note that logs with the same structure but different values are grouped together, making it easy to identify common patterns in your log data.
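
The grouping in this example can be mimicked outside LogScale with a short Python sketch. It does not reproduce tokenHash(): the hypothetical structure_key() helper simply classifies each whitespace-separated token as number-like (contains a digit) or word-like, an assumption chosen so that the example lines fall into the same three groups. The loop at the end mirrors groupBy() with count() and collect(@rawstring, limit=3).

```python
from collections import defaultdict

LINES = [
    "User john.doe logged in from 192.168.1.100",
    "User jane.smith logged in from 192.168.1.101",
    "User admin logged in from 192.168.1.102",
    "Failed login attempt from 10.0.0.1",
    "Failed login attempt from 10.0.0.2",
    "Database connection error: timeout after 30 seconds",
    "Database connection error: timeout after 45 seconds",
]


def structure_key(line: str) -> str:
    """Hypothetical structure key: N for number-like tokens, W for the rest."""
    return "".join(
        "N" if any(ch.isdigit() for ch in token) else "W"
        for token in line.split()
    )


# Group lines that share the same structure key.
groups = defaultdict(list)
for line in LINES:
    groups[structure_key(line)].append(line)

for key, members in groups.items():
    # len(members) plays the role of count(); members[:3] of collect(..., limit=3).
    print(key, len(members), members[:3])
```

A classifier this coarse would also merge unrelated lines that happen to share a token pattern, so it is useful only as a picture of the group-by-structure idea.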

Mask Sensitive SSN Data

Consistently hash social security numbers for privacy using the tokenHash() function

Query
logscale
tokenHash(ssn)
Introduction

In this example, the tokenHash() function is used to hash social security numbers, replacing the original values with consistent hash values that can still be used for analysis and correlation.

Example incoming data might look like this:

@timestamp            ssn          transaction_type  amount
2023-08-06T10:00:00Z  123-45-6789  deposit           1000.00
2023-08-06T10:01:00Z  987-65-4321  withdrawal        500.00
2023-08-06T10:02:00Z  123-45-6789  withdrawal        200.00
2023-08-06T10:03:00Z  456-78-9012  deposit           1500.00
2023-08-06T10:04:00Z  987-65-4321  deposit           750.00
2023-08-06T10:05:00Z  123-45-6789  check             300.00
2023-08-06T10:06:00Z  456-78-9012  withdrawal        400.00
2023-08-06T10:07:00Z  234-56-7890  deposit           2000.00
2023-08-06T10:08:00Z  987-65-4321  withdrawal        100.00
2023-08-06T10:09:00Z  123-45-6789  deposit           500.00
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    tokenHash(ssn)

    Creates a consistent hash value for each unique social security number in the ssn field.

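    The hash value stands in for the original SSN while remaining consistent for each distinct value, allowing for analysis of patterns and relationships without displaying the raw data. Note that tokenHash() is intended for grouping related values, not for cryptographic use, so it should not be relied on as a secure hashing algorithm. By default the result is written to the _tokenHash field; to overwrite the ssn field, as in the sample output below, the as parameter must be set to ssn.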

    The hash values are deterministic, meaning the same input will always produce the same hash value within the same repository, enabling consistent analysis across multiple queries.

  3. Event Result set.

Summary and Results

The query is used to protect sensitive social security numbers while maintaining the ability to analyze patterns and relationships in the data.

This query is useful, for example, to comply with data privacy regulations while still being able to track user behavior, identify patterns, or investigate suspicious activities across multiple transactions.

Sample output for the first six events of the incoming example data:

@timestamp            ssn                 transaction_type  amount
2023-08-06T10:00:00Z  a1b2c3d4e5f6g7h8i9  deposit           1000.00
2023-08-06T10:01:00Z  j9k8l7m6n5o4p3q2r1  withdrawal        500.00
2023-08-06T10:02:00Z  a1b2c3d4e5f6g7h8i9  withdrawal        200.00
2023-08-06T10:03:00Z  s2t3u4v5w6x7y8z9a1  deposit           1500.00
2023-08-06T10:04:00Z  j9k8l7m6n5o4p3q2r1  deposit           750.00
2023-08-06T10:05:00Z  a1b2c3d4e5f6g7h8i9  check             300.00

Note that the same SSN values are consistently hashed to the same token values, maintaining the relationships in the data while protecting the original sensitive information.
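
The property described above, that equal inputs map to equal tokens so relationships survive masking, can be illustrated with a Python sketch. The mask() helper and its use of CRC32 are assumptions made for illustration and are not what tokenHash() computes; CRC32 is likewise not a cryptographic hash.

```python
import zlib
from collections import Counter

# SSN column from the example input, in event order.
SSNS = [
    "123-45-6789", "987-65-4321", "123-45-6789", "456-78-9012",
    "987-65-4321", "123-45-6789", "456-78-9012", "234-56-7890",
    "987-65-4321", "123-45-6789",
]


def mask(value: str) -> str:
    """Hypothetical deterministic mask: CRC32 of the value as 8 hex digits."""
    return format(zlib.crc32(value.encode("utf-8")), "08x")


masked = [mask(ssn) for ssn in SSNS]

# Transactions per original SSN and per masked value.
by_ssn = Counter(SSNS)
by_mask = Counter(masked)

# Equal SSNs always receive equal masks, so per-value counts carry over.
for ssn, count in by_ssn.items():
    print(mask(ssn), count, by_mask[mask(ssn)] == count)
```

Each masked value keeps the transaction count of the SSN it stands for, which is what allows correlation across events without showing the original number.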

The hashed data can be used in various dashboard widgets such as tables to show transaction patterns by hashed SSN, or sankey diagrams to visualize transaction flows between accounts. For security monitoring, consider creating alerts based on unusual patterns of activity for specific hashed SSNs.