Create Hash Values from Multiple Fields with Limited Range

Generate hash values from multiple fields and restrict the output range with a modulo limit using the hash() function

Query

[Flow diagram: Events → Function → Result Set]
logscale
hash([user_id, department, action], limit=10)

Introduction

The hash() function can be used to generate a non-cryptographic hash value from field contents. By default, it returns an integer in the range [0, 4294967295], but this range can be reduced using the limit parameter, which applies a modulo operation to the result.

In this example, the hash() function is used to generate a hash value from multiple fields, with the result restricted to 10 possible values (0 to 9) by setting the limit parameter to 10.
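The effect of the limit parameter can be written out as a short worked equation. This is a sketch that assumes limit is applied as a plain modulo of the unlimited hash value, as described above; the symbol h is introduced here only for illustration and stands for that unlimited value.

```latex
\[
  \texttt{\_hash} = h \bmod \texttt{limit},
  \qquad 0 \le h \le 4294967295 = 2^{32} - 1
\]
\[
  \texttt{limit} = 10:\quad
  h = 4294967295 \;\Rightarrow\; \texttt{\_hash} = 4294967295 \bmod 10 = 5
\]
```

Because every value of h maps to one of the ten residues 0 to 9, more than ten distinct field combinations must share at least one hash value.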

Example incoming data might look like this:

| @timestamp | user_id | department | action | resource |
| --- | --- | --- | --- | --- |
| 2025-08-06T10:00:00Z | user123 | sales | read | document1 |
| 2025-08-06T10:00:01Z | user456 | marketing | write | document2 |
| 2025-08-06T10:00:02Z | user789 | engineering | delete | document3 |
| 2025-08-06T10:00:03Z | user234 | sales | update | document4 |
| 2025-08-06T10:00:04Z | user567 | marketing | read | document5 |
| 2025-08-06T10:00:05Z | user890 | engineering | write | document6 |
| 2025-08-06T10:00:06Z | user345 | sales | delete | document7 |
| 2025-08-06T10:00:07Z | user678 | marketing | update | document8 |
| 2025-08-06T10:00:08Z | user901 | engineering | read | document9 |
| 2025-08-06T10:00:09Z | user432 | sales | write | document10 |

Step-by-Step

  1. Starting with the source repository events.

  2. [Flow diagram: Events → Function (highlighted) → Result Set]
    logscale
    hash([user_id, department, action], limit=10)

    Creates a non-cryptographic hash value from the contents of fields user_id, department, and action. While the function normally returns an integer between 0 and 4294967295, the limit parameter set to 10 applies a modulo operation to the hash value, ensuring the result is between 0 and 9.

    The hash() function returns the result in a field named _hash by default. Limiting the range in this way is particularly useful for reducing the number of groups in subsequent groupBy() operations.

  3. Event Result set.
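As a sketch of the grouping use mentioned in step 2, the limited hash can be piped straight into groupBy(). The output field names events and distinct_users are illustrative choices that do not appear in the original example, and the count() parameters shown are worth checking against the function reference for your LogScale version.

```logscale
// Bucket events into at most 10 groups using the limited hash value
hash([user_id, department, action], limit=10)
// Per bucket: total events, and how many different users landed in it
| groupBy(_hash, function=[count(as=events), count(user_id, distinct=true, as=distinct_users)])
```

A bucket where distinct_users is greater than 1 contains more than one user, which makes the collisions introduced by the limited range visible.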

Summary and Results

The query is used to generate consistent numerical hash values from multiple fields while constraining the output to a specified range using modulo.

This query is useful, for example, to reduce the number of distinct groups in a groupBy() operation when dealing with high-cardinality data, accepting that collisions will occur due to the limited output range.

Sample output for the first five events of the incoming example data:

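| @timestamp | user_id | department | action | resource | _hash |
| --- | --- | --- | --- | --- | --- |
| 2025-08-06T10:00:00Z | user123 | sales | read | document1 | 7 |
| 2025-08-06T10:00:01Z | user456 | marketing | write | document2 | 3 |
| 2025-08-06T10:00:02Z | user789 | engineering | delete | document3 | 5 |
| 2025-08-06T10:00:03Z | user234 | sales | update | document4 | 2 |
| 2025-08-06T10:00:04Z | user567 | marketing | read | document5 | 8 |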

Note that without the limit parameter, the function would return values between 0 and 4294967295. With limit=10, the modulo operation reduces the output range to 0-9.

For visualizing this data, a table widget would be effective to show the original fields alongside their hash values. When using the hash() function with groupBy(), a pie chart widget could help visualize the distribution of events across the limited hash values. To monitor the effectiveness of the hash distribution within the limited range, consider using a bar chart widget to show the frequency of each hash value.
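To feed a bar chart of the hash distribution, one possible query counts the events per hash value and orders the buckets numerically. This is a minimal sketch rather than a definitive recipe: the events field name is an illustrative choice, and the sort() parameters should be verified against the function reference for your LogScale version.

```logscale
// Compute the limited hash, then count how many events fall into each value
hash([user_id, department, action], limit=10)
| groupBy(_hash, function=count(as=events))
// Order buckets 0-9 so the chart reads left to right
| sort(_hash, order=asc)
```

A roughly even spread of events across the buckets suggests the limited range is distributing the data well, while a few dominant buckets suggest skew in the underlying field values.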