Computes a non-cryptographic hash of a list of fields. The hash is returned as an integer in the range [0,4294967295], that is, the range of an unsigned 32-bit integer. Calling this function with the same values and the same seed (or no seed) always produces the same hash. Because the hash is not cryptographic, do not use it to securely obscure data; use hashRewrite() and hashMatch() for that instead. This function can, for example, be used to reduce the number of groups in a groupBy(), at the cost of collisions: distinct values can end up in the same group.

Parameter  Type              Required     Default Value  Description
as         string            optional[a]  _hash          The output name of the field to set.
field[b]   array of strings  required                    The fields for which to compute hash values.
limit      number            optional[a]                 An upper bound on the number returned by this function. The returned hash is taken modulo this value and is therefore constrained to the range [0,limit-1]. Minimum: 1.
seed       string            optional[a]                 An optional seed for the hash function.

[a] Optional parameters use their default value unless explicitly set.

[b] The parameter name field can be omitted.
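The parameter contract above can be modelled outside LogScale. The following Python sketch is not LogScale's implementation: the underlying hash algorithm is not documented here, so it substitutes BLAKE2b truncated to 32 bits, and the function name hash_fields and the way field values are joined are hypothetical. It only illustrates the documented behaviour: the same values and seed give the same result, the result lies in [0,4294967295], and limit reduces it modulo limit into [0,limit-1].

```python
import hashlib
from typing import Optional, Sequence

MAX_HASH = 4294967295  # 2**32 - 1, the documented upper end of the range


def hash_fields(values: Sequence[str], limit: Optional[int] = None,
                seed: Optional[str] = None) -> int:
    """Hypothetical stand-in for hash(): NOT LogScale's actual algorithm.

    BLAKE2b with a 4-byte digest is used only because it is deterministic,
    accepts a key (used here as the seed, at most 64 bytes) and yields a
    32-bit value.
    """
    if limit is not None and limit < 1:
        raise ValueError("limit must be at least 1")
    # Join the values with a separator byte so ["ab", "c"] and ["a", "bc"] differ.
    data = b"\x1f".join(v.encode("utf-8") for v in values)
    key = seed.encode("utf-8") if seed is not None else b""
    digest = hashlib.blake2b(data, digest_size=4, key=key).digest()
    h = int.from_bytes(digest, "big")  # 0 .. 4294967295
    return h % limit if limit is not None else h


# Field values stand in for the contents of fields a, b and c.
print(hash_fields(["alice"]))                          # models hash(a)
print(hash_fields(["alice", "x", "y"], limit=10))      # models hash([a,b,c], limit=10)
print(hash_fields(["alice"], seed="10"))               # models hash(field=[a], seed=10)
```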


hash() Syntax Examples

Hash the field a and put the result into _hash:

logscale
hash(a)

Hash the fields a, b, and c and put the result modulo 10 into _hash:

logscale
hash([a,b,c], limit=10)

Hash the field a (setting the field parameter explicitly) using a seed of 10:

logscale
hash(field=[a], seed=10)

Group events into 10 buckets such that all events with the same value of a end up in the same bucket:

logscale
hash(a, limit=10)
| groupBy(_hash)
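The bucketing idea behind this query, and the collisions mentioned in the description, can be illustrated with a rough Python model. The helper hash_bucket is hypothetical and uses BLAKE2b truncated to 32 bits in place of LogScale's hash algorithm, so the bucket numbers will differ from LogScale's. The point it shows is that 1,000 distinct values collapse into at most 10 groups, and that repeated occurrences of one value always share a bucket.

```python
import hashlib
from collections import Counter


def hash_bucket(value: str, limit: int) -> int:
    """Hypothetical stand-in for hash(value, limit=limit).

    Uses BLAKE2b truncated to 32 bits, then reduces it modulo limit.
    This is NOT LogScale's algorithm; only the bucketing behaviour is modelled.
    """
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, "big") % limit


# 1,000 distinct values of a, each occurring twice.
values = [f"value-{i}" for i in range(1000)] * 2

# Rough model of: hash(a, limit=10) | groupBy(_hash), counting events per bucket.
counts = Counter(hash_bucket(v, 10) for v in values)

# At most 10 groups remain however many distinct values of a exist;
# distinct values sharing a bucket are collisions.
print(len(counts), sum(counts.values()))
```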

hash() Examples


Create Sample Groups Using Hash

Create consistent sample groups of events using the hash() function

Query
logscale
hash(ip_address, limit=10)
| groupBy(_hash, function=count())
Introduction

In this example, the hash() function is used to create sample groups from web server access logs based on IP addresses. This allows for consistent grouping of events from the same IP address while limiting the total number of groups.

Example incoming data might look like this:

bytes_sent  ip_address      request_path  status_code  @timestamp
1532        192.168.1.100   /home         200          2023-06-15T10:00:00Z
892         192.168.1.201   /notfound     404          2023-06-15T10:00:01Z
2341        192.168.10.100  /about        200          2023-06-15T10:00:02Z
721         192.168.15.102  /error        500          2023-06-15T10:00:03Z
1267        192.168.1.101   /contact      200          2023-06-15T10:00:04Z
1843        192.168.1.103   /products     200          2023-06-15T10:00:05Z
1654        192.168.15.100  /cart         200          2023-06-15T10:00:06Z
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    hash(ip_address, limit=10)

    Creates a hash value from the ip_address field and returns the result in a new field named _hash (default). This creates a consistent mapping where the same IP address will always generate the same hash value.

    The limit parameter is set to 10, so the hash values are distributed across 10 buckets (0-9). All events with the same ip_address value end up in the same bucket.

  3. logscale
    | groupBy(_hash, function=count())

    Groups the events by the _hash field. For each group, it counts the number of events and returns the result in a new field named _count. This aggregation reduces the data to show how many events fall into each hash bucket.

  4. Event Result set.
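The two query steps above can be modelled in Python as a rough sketch. The IP addresses come from the example incoming data; the helper hash_bucket is hypothetical and uses BLAKE2b truncated to 32 bits as a stand-in for LogScale's hash algorithm, so its bucket numbers will not match the sample output in this example. What carries over is the shape of the result: one row per occupied bucket, bucket numbers in 0-9, and counts that add up to the number of events.

```python
import hashlib
from collections import Counter


def hash_bucket(value: str, limit: int) -> int:
    # Hypothetical stand-in for LogScale's hash(); BLAKE2b truncated to 32 bits.
    # Bucket numbers will NOT match LogScale's output.
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, "big") % limit


# ip_address values from the example incoming data.
events = [
    {"ip_address": "192.168.1.100", "request_path": "/home"},
    {"ip_address": "192.168.1.201", "request_path": "/notfound"},
    {"ip_address": "192.168.10.100", "request_path": "/about"},
    {"ip_address": "192.168.15.102", "request_path": "/error"},
    {"ip_address": "192.168.1.101", "request_path": "/contact"},
    {"ip_address": "192.168.1.103", "request_path": "/products"},
    {"ip_address": "192.168.15.100", "request_path": "/cart"},
]

# Step 2: hash(ip_address, limit=10) writes the bucket number into _hash.
for event in events:
    event["_hash"] = hash_bucket(event["ip_address"], 10)

# Step 3: groupBy(_hash, function=count()) counts events per bucket into _count.
per_bucket = Counter(event["_hash"] for event in events)
result = [{"_hash": h, "_count": n} for h, n in sorted(per_bucket.items())]
for row in result:
    print(row)
```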

Summary and Results

The query is used to create consistent sample groups from large datasets by hashing a field value into a limited number of buckets.

This query is useful, for example, to analyze patterns in web traffic by sampling IP addresses into manageable groups while maintaining consistency: the same IP address always hashes to the same group. This can help identify behavioral patterns or anomalies in subsets of your traffic.

Sample output from the incoming example data:

_hash  _count
2      1
3      1
6      1
8      3
9      1

Note that the hash values remain consistent for the same input, enabling reliable sampling across time periods.
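The consistency across time periods can be illustrated with a small Python sketch of one possible sampling rule: keep only events whose bucket is below some cut-off. The helpers hash_bucket and in_sample, the cut-off of 2 out of 10 buckets, and the second day's IP list are all hypothetical, and BLAKE2b stands in for LogScale's hash algorithm. Because the bucket depends only on the IP address, an address seen on both days is either sampled on both days or on neither.

```python
import hashlib


def hash_bucket(value: str, limit: int) -> int:
    # Hypothetical stand-in for LogScale's hash(); BLAKE2b truncated to 32 bits.
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, "big") % limit


def in_sample(ip: str, limit: int = 10, keep: int = 2) -> bool:
    # Keep buckets 0..keep-1, roughly keep/limit of all distinct IPs.
    # The limit and keep values are illustrative assumptions.
    return hash_bucket(ip, limit) < keep


day1 = ["192.168.1.100", "192.168.1.201", "192.168.10.100", "192.168.15.102"]
day2 = ["192.168.1.100", "192.168.15.102", "192.168.1.101", "192.168.15.100"]  # hypothetical

sample1 = {ip for ip in day1 if in_sample(ip)}
sample2 = {ip for ip in day2 if in_sample(ip)}

# IPs seen on both days are either in both samples or in neither.
for ip in set(day1) & set(day2):
    assert (ip in sample1) == (ip in sample2)
print(sorted(sample1), sorted(sample2))
```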

Generate Different But Consistent Hash Values From Same Input Data

Generate different but consistent hash values from the same input data using the hash() function with a seed

Query
logscale
hash(field=user_id, limit=5, as="hash1")
| hash(field=user_id, limit=5, seed=10, as="hash2")
| hash(field=user_id, limit=5, seed=20, as="hash3")
Introduction

In this example, the hash() function is used with different seed values to demonstrate how the same input data can generate different hash values. This is useful when you need multiple independent but consistent ways to group or sample the same data.

Example incoming data might look like this:

@timestamp            user_id  action  department
2023-06-15T10:00:00Z  user123  login   sales
2023-06-15T10:00:01Z  user456  logout  marketing
2023-06-15T10:00:02Z  user123  view    sales
2023-06-15T10:00:03Z  user789  login   engineering
2023-06-15T10:00:04Z  user456  login   marketing
2023-06-15T10:00:05Z  user123  search  sales
2023-06-15T10:00:06Z  user789  logout  engineering
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    hash(field=user_id, limit=5, as="hash1")

    Creates hash values from the user_id field and returns the results in the field hash1.

  3. logscale
    | hash(field=user_id, limit=5, seed=10, as="hash2")

    Creates a second set of hash values from the same user_id field but uses a seed value of 10 and returns the results in the field hash2. Using a different seed value creates a different but equally consistent distribution of hash values.

  4. logscale
    | hash(field=user_id, limit=5, seed=20, as="hash3")

    Creates a third set of hash values using a seed value of 20 and returns the results in the field hash3. This demonstrates how different seed values create different hash distributions for the same input data.

  5. Event Result set.
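The three steps above can be sketched in Python to show why each seed gives its own consistent mapping. The helper seeded_hash is hypothetical: it uses the seed as the key of a BLAKE2b hash truncated to 32 bits, which is an assumption standing in for LogScale's seeded hash, so the values will not match the sample output in this example. Within each column, every event with the same user_id still gets the same value.

```python
import hashlib
from typing import Optional


def seeded_hash(value: str, limit: int, seed: Optional[str] = None) -> int:
    # Hypothetical stand-in for hash(field=..., limit=..., seed=...).
    # The seed is used as the BLAKE2b key, so each seed gives a different mapping.
    key = seed.encode("utf-8") if seed is not None else b""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=4, key=key).digest()
    return int.from_bytes(digest, "big") % limit


# user_id and action values from the example incoming data.
events = [
    {"user_id": "user123", "action": "login"},
    {"user_id": "user456", "action": "logout"},
    {"user_id": "user123", "action": "view"},
    {"user_id": "user789", "action": "login"},
    {"user_id": "user456", "action": "login"},
    {"user_id": "user123", "action": "search"},
    {"user_id": "user789", "action": "logout"},
]

for e in events:
    e["hash1"] = seeded_hash(e["user_id"], 5)             # no seed
    e["hash2"] = seeded_hash(e["user_id"], 5, seed="10")  # seed=10
    e["hash3"] = seeded_hash(e["user_id"], 5, seed="20")  # seed=20
    print(e)
```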

Summary and Results

The query is used to generate different but consistent hash values from the same input data by using different seed values.

This query is useful, for example, to create multiple independent sampling groups from the same data set, or to implement A/B/C testing where users need to be consistently assigned to different test groups.
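For A/B/C testing, the idea is to hash the user ID into limit=3 buckets and map each bucket to a group, using a separate seed per experiment so that assignments in one experiment do not mirror another. The Python sketch below is a hypothetical model: assign_group, the group labels and the seed strings "experiment-1" and "experiment-2" are illustrative, and keyed BLAKE2b stands in for LogScale's seeded hash.

```python
import hashlib

GROUPS = ["A", "B", "C"]


def assign_group(user_id: str, experiment_seed: str) -> str:
    # Hypothetical model of hash(field=user_id, limit=3, seed=<experiment seed>);
    # BLAKE2b keyed with the seed stands in for LogScale's hash algorithm.
    digest = hashlib.blake2b(user_id.encode("utf-8"), digest_size=4,
                             key=experiment_seed.encode("utf-8")).digest()
    return GROUPS[int.from_bytes(digest, "big") % len(GROUPS)]


users = ["user123", "user456", "user789"]
# Two independent experiments: a different seed per experiment reshuffles users.
exp1 = {u: assign_group(u, "experiment-1") for u in users}
exp2 = {u: assign_group(u, "experiment-2") for u in users}
print(exp1, exp2)
```

A user keeps the same group for as long as the seed of that experiment stays the same.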

Sample output from the incoming example data:

action  department   hash1  hash2  hash3  user_id
login   sales        3      0      2      user123
logout  marketing    2      2      2      user456
view    sales        3      0      2      user123
login   engineering  4      4      1      user789
login   marketing    2      2      2      user456
search  sales        3      0      2      user123
logout  engineering  4      4      1      user789

Important notes about the output:

  • The same user_id always generates the same hash value within each hash field.

  • Different seed values create different hash values for the same input.

  • The hash values remain within the specified limit range (0-4) regardless of the seed value.

  • The distribution pattern changes with different seeds while maintaining consistency for each input value.
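One way to make "the distribution pattern changes with different seeds" concrete is to count how often a user lands in the same bucket under two seeds. If the two mappings behave like independent hashes, that happens for roughly 1/limit of users, which is about 20% for limit=5. The sketch below assumes keyed BLAKE2b as a stand-in for LogScale's seeded hash and uses 10,000 generated user IDs; it says nothing about LogScale's actual hash quality.

```python
import hashlib
from typing import Optional


def seeded_hash(value: str, limit: int, seed: Optional[str] = None) -> int:
    # Hypothetical stand-in for LogScale's hash(); BLAKE2b keyed with the seed.
    key = seed.encode("utf-8") if seed is not None else b""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=4, key=key).digest()
    return int.from_bytes(digest, "big") % limit


users = [f"user{i}" for i in range(10_000)]
same = sum(seeded_hash(u, 5, "10") == seeded_hash(u, 5, "20") for u in users)

# For independent-looking mappings, about 1/limit = 20% of users
# land in the same bucket under both seeds.
agreement = same / len(users)
print(f"{agreement:.3f}")
```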