Computes a non-cryptographic hash of a list of fields. The hash is returned as an integer in the range [0,4294967295]. Calling this function with the same values and the same seed (or no seed) always produces the same hash. Because the hash is not cryptographic, it should not be used to securely obscure data; use hashRewrite() and hashMatch() for that. The function can, for example, be used to reduce the number of groups in a groupBy(), at the cost of introducing collisions.

Parameter | Type | Required | Default Value | Description
as | string | optional[a] | _hash | The output name of the field to set.
field[b] | array of strings | required | | The fields for which to compute hash values.
limit | number | optional[a] | | An upper bound on the number returned by this function. The returned hash is taken modulo this value and is therefore constrained to the range [0,limit), that is, 0 to limit - 1. Minimum value: 1.
seed | string | optional[a] | | An optional seed for the hash function.

[a] Optional parameters use their default value unless explicitly set.

[b] The parameter name field can be omitted.
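
As footnote [b] indicates, the parameter name field can be left out. The following minimal sketch (using a placeholder field a and arbitrary output names h1 and h2) shows that both forms compute the same hash:

logscale
hash(field=[a], as="h1")
| hash(a, as="h2")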


hash() Syntax Examples

Hash the field a and put the result into _hash:

logscale
hash(a)

Hash the fields a, b, and c and put the result modulo 10 into _hash:

logscale
hash([a,b,c], limit=10)

Hash the field a (by setting field explicitly) using a seed of 10:

logscale
hash(field=[a], seed=10)

Group events into 10 buckets such that all events with the same value of a end up in the same bucket:

logscale
hash(a, limit=10)
| groupBy(_hash)
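
The name of the output field can also be changed with the as parameter. As a minimal sketch, the following puts the hash of a into a field named myhash (the name myhash is an arbitrary choice for illustration):

logscale
hash(a, as="myhash")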

hash() Examples


Create Hash Values from Multiple Fields with Limited Range

Generate hash values from fields with modulo limit using the hash() function

Query
logscale
hash([user_id, department, action], limit=10)
Introduction

In this example, the hash() function is used to generate a hash value from multiple fields, with the result limited to a smaller range of 10 using the limit parameter.

Example incoming data might look like this:

@timestamp | user_id | department | action | resource
2025-08-06T10:00:00Z | user123 | sales | read | document1
2025-08-06T10:00:01Z | user456 | marketing | write | document2
2025-08-06T10:00:02Z | user789 | engineering | delete | document3
2025-08-06T10:00:03Z | user234 | sales | update | document4
2025-08-06T10:00:04Z | user567 | marketing | read | document5
2025-08-06T10:00:05Z | user890 | engineering | write | document6
2025-08-06T10:00:06Z | user345 | sales | delete | document7
2025-08-06T10:00:07Z | user678 | marketing | update | document8
2025-08-06T10:00:08Z | user901 | engineering | read | document9
2025-08-06T10:00:09Z | user432 | sales | write | document10
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    hash([user_id, department, action], limit=10)

    Creates a non-cryptographic hash value from the contents of fields user_id, department, and action. While the function normally returns an integer between 0 and 4294967295, the limit parameter set to 10 applies a modulo operation to the hash value, ensuring the result is between 0 and 9.

    The hash() function returns the result in a field named _hash by default. This is particularly useful for reducing the number of groups in subsequent groupBy() operations.

  3. Event Result set.

Summary and Results

The query is used to generate consistent numerical hash values from multiple fields while constraining the output to a specified range using modulo.

This query is useful, for example, to reduce the number of distinct groups in a groupBy() operation when dealing with high-cardinality data, accepting that collisions will occur due to the limited output range.

Sample output from the incoming example data:

@timestamp | user_id | department | action | resource | _hash
2025-08-06T10:00:00Z | user123 | sales | read | document1 | 7
2025-08-06T10:00:01Z | user456 | marketing | write | document2 | 3
2025-08-06T10:00:02Z | user789 | engineering | delete | document3 | 5
2025-08-06T10:00:03Z | user234 | sales | update | document4 | 2
2025-08-06T10:00:04Z | user567 | marketing | read | document5 | 8

Note that without a limit parameter, the function would return values between 0 and 4294967295. The limit parameter uses modulo to reduce the output range, in this case to 0-9.
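
To see the effect of limit directly, a variant of this query could compute both the raw hash and the bucketed hash side by side; the output names full_hash and bucket below are illustrative choices, not defaults:

logscale
hash([user_id, department, action], as="full_hash")
| hash([user_id, department, action], limit=10, as="bucket")

Events with identical full_hash values always land in the same bucket, while distinct full_hash values may collide in bucket.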

For visualizing this data, a table widget would be effective to show the original fields alongside their hash values. When using the hash() function with groupBy(), a pie chart widget could help visualize the distribution of events across the limited hash values. To monitor the effectiveness of the hash distribution within the limited range, consider using a bar chart widget to show the frequency of each hash value.

Create Sample Groups Using Hash

Create consistent sample groups of events using the hash() function

Query
logscale
hash(ip_address, limit=10)
| groupBy(_hash, function=count())
Introduction

In this example, the hash() function is used to create sample groups from web server access logs based on IP addresses. This allows for consistent grouping of events from the same IP address while limiting the total number of groups.

Example incoming data might look like this:

bytes_sent | ip_address | request_path | status_code | @timestamp
1532 | 192.168.1.100 | /home | 200 | 2023-06-15T10:00:00Z
892 | 192.168.1.201 | /notfound | 404 | 2023-06-15T10:00:01Z
2341 | 192.168.10.100 | /about | 200 | 2023-06-15T10:00:02Z
721 | 192.168.15.102 | /error | 500 | 2023-06-15T10:00:03Z
1267 | 192.168.1.101 | /contact | 200 | 2023-06-15T10:00:04Z
1843 | 192.168.1.103 | /products | 200 | 2023-06-15T10:00:05Z
1654 | 192.168.15.100 | /cart | 200 | 2023-06-15T10:00:06Z
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    hash(ip_address, limit=10)

    Creates a hash value from the ip_address field and returns the result in a new field named _hash (default). This creates a consistent mapping where the same IP address will always generate the same hash value.

    The limit parameter is set to 10, which ensures the hash values are distributed across 10 buckets (0-9). All events with the same value of ip_address end up in the same bucket.

  3. logscale
    | groupBy(_hash, function=count())

    Groups the events by the _hash field. For each group, it counts the number of events and returns the result in a new field named _count. This aggregation reduces the data to show how many events fall into each hash bucket.

  4. Event Result set.

Summary and Results

The query is used to create consistent sample groups from large datasets by hashing a field value into a limited number of buckets.

This query is useful, for example, to analyze patterns in web traffic by sampling IP addresses into manageable groups while maintaining consistency - the same IP address will always hash to the same group. This can help identify behavioral patterns or anomalies in subsets of your traffic.

Sample output from the incoming example data:

_hash | _count
2 | 1
3 | 1
6 | 1
8 | 3
9 | 1

Note that the hash values remain consistent for the same input, enabling reliable sampling across time periods.
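
Building on the sampling idea above, a consistent sample of roughly one tenth of the traffic could be selected by keeping only one of the ten buckets before aggregating; the choice of bucket 0 here is arbitrary:

logscale
hash(ip_address, limit=10)
| _hash = 0
| groupBy(ip_address, function=count())

Because the hash is deterministic, the same IP addresses are selected every time the query runs.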

Generate Different But Consistent Hash Values From Same Input Data

Generate different but consistent hash values from the same input data using the hash() function with seed

Query
logscale
hash(field=user_id, limit=5, as="hash1")
| hash(field=user_id, limit=5, seed=10, as="hash2")
| hash(field=user_id, limit=5, seed=20, as="hash3")
Introduction

In this example, the hash() function is used with different seed values to demonstrate how the same input data can generate different hash values. This is useful when you need multiple independent but consistent ways to group or sample the same data.

Example incoming data might look like this:

@timestamp | user_id | action | department
2023-06-15T10:00:00Z | user123 | login | sales
2023-06-15T10:00:01Z | user456 | logout | marketing
2023-06-15T10:00:02Z | user123 | view | sales
2023-06-15T10:00:03Z | user789 | login | engineering
2023-06-15T10:00:04Z | user456 | login | marketing
2023-06-15T10:00:05Z | user123 | search | sales
2023-06-15T10:00:06Z | user789 | logout | engineering
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    hash(field=user_id, limit=5, as="hash1")

    Creates hash values from the user_id field and returns the results in the field hash1.

  3. logscale
    | hash(field=user_id, limit=5, seed=10, as="hash2")

    Creates a second set of hash values from the same user_id field but uses a seed value of 10 and returns the results in the field hash2. Using a different seed value creates a different but equally consistent distribution of hash values.

  4. logscale
    | hash(field=user_id, limit=5, seed=20, as="hash3")

    Creates a third set of hash values using a seed value of 20 and returns the results in the field hash3. This demonstrates how different seed values create different hash distributions for the same input data.

  5. Event Result set.

Summary and Results

The query is used to generate different but consistent hash values from the same input data by using different seed values.

This query is useful, for example, to create multiple independent sampling groups from the same data set, or to implement A/B/C testing where users need to be consistently assigned to different test groups.

Sample output from the incoming example data:

action | department | hash1 | hash2 | hash3 | user_id
login | sales | 3 | 0 | 2 | user123
logout | marketing | 2 | 2 | 2 | user456
view | sales | 3 | 0 | 2 | user123
login | engineering | 4 | 4 | 1 | user789
login | marketing | 2 | 2 | 2 | user456
search | sales | 3 | 0 | 2 | user123
logout | engineering | 4 | 4 | 1 | user789

Important notes about the output:

  • The same user_id always generates the same hash value within each hash field.

  • Different seed values create different hash values for the same input.

  • The hash values remain within the specified limit range (0-4) regardless of the seed value.

  • The distribution pattern changes with different seeds while maintaining consistency for each input value.
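
For the A/B/C testing use case mentioned above, a single seeded hash could be mapped to named test groups with a case statement. This is a sketch only; the seed, the limit of 3, and the field names group and test_group are illustrative choices:

logscale
hash(field=user_id, limit=3, seed=10, as="group")
| case {
    group = 0 | test_group := "A";
    group = 1 | test_group := "B";
    group = 2 | test_group := "C"
  }

Because limit=3 constrains the hash to 0-2, every event matches exactly one branch.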

Hash a Field Using Different Seeds

Generate hash values using the hash() function with different seeds

Query
logscale
| hash_seed10 := hash(field=[username], seed=10)
| hash_seed20 := hash(field=[username], seed=20)
Introduction

In this example, the hash() function is used to demonstrate how different seed values affect the hash output while maintaining consistency for the same input values.

Example incoming data might look like this:

@timestamp | username | action
2025-08-27T08:51:51.312Z | alice | login
2025-08-27T09:15:22.445Z | bob | login
2025-08-27T10:30:15.891Z | alice | logout
2025-08-27T11:45:33.167Z | charlie | login
2025-08-27T12:20:44.723Z | bob | logout
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    | hash_seed10 := hash(field=[username], seed=10)
    | hash_seed20 := hash(field=[username], seed=20)

    Creates two new fields with different hash values for the same input:

    • Field hash_seed10 contains hash values generated with seed=10

    • Field hash_seed20 contains hash values generated with seed=20

    The field parameter specifies username as the input field in an array format. The seed parameter initializes the hashing algorithm with different values, producing different but consistent hash patterns.

  3. Event Result set.

Summary and Results

The query is used to demonstrate how different seed values affect hash generation while maintaining consistency for identical inputs.

This query is useful, for example, to create multiple different pseudonymous identifiers for the same data, compare hash distributions with different seeds, or understand how seed values affect hash generation.

Sample output from the incoming example data:

username | action | hash_seed10 | hash_seed20
alice | login | 7234981073614532891 | 8945672301234567890
bob | login | 4123567890123456789 | 5678901234567890123
alice | logout | 7234981073614532891 | 8945672301234567890
charlie | login | 9876543210987654321 | 2345678901234567890
bob | logout | 4123567890123456789 | 5678901234567890123

Note that the same username produces different hash values with different seeds (compare hash_seed10 and hash_seed20 for alice). Each seed consistently produces the same hash value for the same input (notice how alice always has the same hash value within each seed).
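
To turn such a seeded hash into a readable pseudonymous identifier, the hash value could be rendered into a string with format(); in this sketch the seed of 10 and the field names uid_hash and pseudonym are arbitrary illustrative choices:

logscale
hash(field=[username], seed=10, as="uid_hash")
| format("user-%d", field=[uid_hash], as="pseudonym")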