Computes a non-cryptographic hash of a list of fields. The hash is returned as an integer in the range [0,4294967295]. Calling this function with the same values and the same seed (or no seed) always produces the same hash. This hash is not cryptographic and should not be used to securely obscure data (use hashRewrite() and hashMatch() for that). This function can, for example, be used to reduce the number of groups in a groupBy(), at the cost of collisions.
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
as | string | optional[a] | _hash | The output name of the field to set. |
field[b] | array of strings | required | | The fields for which to compute hash values. |
limit | number | optional[a] | | An upper bound on the number returned by this function. The returned hash will be modulo this value and thus be constrained to the range [0,limit]. Minimum: 1. |
seed | string | optional[a] | | An optional seed for the hash function. |
[a] Optional parameters use their default value unless explicitly set.
[b] The parameter name field can be omitted.
Omitted Argument Names
The argument name for field can be omitted; the following forms of this function are equivalent:
hash(["value"])
and:
hash(field=["value"])
These examples show basic structure only.
hash() Syntax Examples
Hash the field a and put the result into _hash:
hash(a)
Hash the fields a, b, and c and put the result modulo 10 into _hash:
hash([a,b,c], limit=10)
Hash the field a (by setting field explicitly) using a seed of 10:
hash(field=[a], seed=10)
Group events into 10 buckets such that all events with the same value of a end up in the same bucket:
hash(a, limit=10)
| groupBy(_hash)
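Because the bucket assignment is deterministic, the same idea can be used to take a consistent sample: select a single bucket so that roughly one tenth of the events are kept, with all events sharing the same value of a always kept or dropped together. This is a sketch; the field name a is a placeholder:

```logscale
hash(a, limit=10)
| _hash = 0
```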
hash() Examples
Create Sample Groups Using Hash
Create consistent sample groups of events using the hash() function.
Query
hash(ip_address, limit=10)
| groupBy(_hash, function=count())
Introduction
In this example, the hash() function is used to create sample groups from web server access logs based on IP addresses. This allows for consistent grouping of events from the same IP address while limiting the total number of groups.
Example incoming data might look like this:
bytes_sent | ip_address | request_path | status_code | @timestamp |
---|---|---|---|---|
1532 | 192.168.1.100 | /home | 200 | 2023-06-15T10:00:00Z |
892 | 192.168.1.201 | /notfound | 404 | 2023-06-15T10:00:01Z |
2341 | 192.168.10.100 | /about | 200 | 2023-06-15T10:00:02Z |
721 | 192.168.15.102 | /error | 500 | 2023-06-15T10:00:03Z |
1267 | 192.168.1.101 | /contact | 200 | 2023-06-15T10:00:04Z |
1843 | 192.168.1.103 | /products | 200 | 2023-06-15T10:00:05Z |
1654 | 192.168.15.100 | /cart | 200 | 2023-06-15T10:00:06Z |
Step-by-Step
Starting with the source repository events.
- logscale
hash(ip_address, limit=10)
Creates a hash value from the ip_address field and returns the result in a new field named _hash (default). This creates a consistent mapping where the same IP address will always generate the same hash value.
The limit parameter is set to 10, which ensures the hash values are distributed across 10 buckets (0-9). All events with the same value of ip_address end up in the same bucket. - logscale
groupBy(_hash, function=count())
Groups the events by the _hash field. For each group, it counts the number of events and returns the result in a new field named _count. This aggregation reduces the data to show how many events fall into each hash bucket.
Event Result set.
Summary and Results
The query is used to create consistent sample groups from large datasets by hashing a field value into a limited number of buckets.
This query is useful, for example, to analyze patterns in web traffic by sampling IP addresses into manageable groups while maintaining consistency: the same IP address will always hash to the same group. This can help identify behavioral patterns or anomalies in subsets of your traffic.
Sample output from the incoming example data:
_hash | _count |
---|---|
2 | 1 |
3 | 1 |
6 | 1 |
8 | 3 |
9 | 1 |
Note that the hash values remain consistent for the same input, enabling reliable sampling across time periods.
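To check how evenly events spread across the buckets, the grouped counts can be sorted. This is a sketch that builds on the query above and assumes the standard LogScale sort() function:

```logscale
hash(ip_address, limit=10)
| groupBy(_hash, function=count())
| sort(_count, order=desc)
```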
Generate Different But Consistent Hash Values From Same Input Data
Generate different but consistent hash values from the same input data using the hash() function with the seed parameter.
Query
hash(field=user_id, limit=5, as="hash1")
| hash(field=user_id, limit=5, seed=10, as="hash2")
| hash(field=user_id, limit=5, seed=20, as="hash3")
Introduction
In this example, the hash() function is used with different seed values to demonstrate how the same input data can generate different hash values. This is useful when you need multiple independent but consistent ways to group or sample the same data.
Example incoming data might look like this:
@timestamp | user_id | action | department |
---|---|---|---|
2023-06-15T10:00:00Z | user123 | login | sales |
2023-06-15T10:00:01Z | user456 | logout | marketing |
2023-06-15T10:00:02Z | user123 | view | sales |
2023-06-15T10:00:03Z | user789 | login | engineering |
2023-06-15T10:00:04Z | user456 | login | marketing |
2023-06-15T10:00:05Z | user123 | search | sales |
2023-06-15T10:00:06Z | user789 | logout | engineering |
Step-by-Step
Starting with the source repository events.
- logscale
hash(field=user_id, limit=5, as="hash1")
Creates hash values from the user_id field and returns the results in the field hash1.
- logscale
| hash(field=user_id, limit=5, seed=10, as="hash2")
Creates a second set of hash values from the same user_id field but uses a seed value of 10 and returns the results in the field hash2. Using a different seed value creates a different but equally consistent distribution of hash values. - logscale
| hash(field=user_id, limit=5, seed=20, as="hash3")
Creates a third set of hash values using a seed value of 20 and returns the results in the field hash3. This demonstrates how different seed values create different hash distributions for the same input data. Event Result set.
Summary and Results
The query is used to generate different but consistent hash values from the same input data by using different seed values.
This query is useful, for example, to create multiple independent sampling groups from the same data set, or to implement A/B/C testing where users need to be consistently assigned to different test groups.
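An A/B/C assignment along these lines could look like the following sketch, where limit=3 maps each user to one of three test groups; the seed value 42 and the output name test_group are arbitrary choices for illustration:

```logscale
hash(field=user_id, limit=3, seed=42, as="test_group")
| groupBy(test_group, function=count())
```

Every event from the same user_id lands in the same test group, and changing the seed reshuffles the assignment while keeping it consistent.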
Sample output from the incoming example data:
action | department | hash1 | hash2 | hash3 | user_id |
---|---|---|---|---|---|
login | sales | 3 | 0 | 2 | user123 |
logout | marketing | 2 | 2 | 2 | user456 |
view | sales | 3 | 0 | 2 | user123 |
login | engineering | 4 | 4 | 1 | user789 |
login | marketing | 2 | 2 | 2 | user456 |
search | sales | 3 | 0 | 2 | user123 |
logout | engineering | 4 | 4 | 1 | user789 |
Important notes about the output:
- The same user_id always generates the same hash value within each hash field.
- Different seed values create different hash values for the same input.
- The hash values remain within the specified limit range (0-4) regardless of the seed value.
- The distribution pattern changes with different seeds while maintaining consistency for each input value.
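To confirm that a given seed assigns each user to exactly one bucket, the distinct users per bucket can be counted. This sketch assumes count() supports the distinct parameter as in standard LogScale:

```logscale
hash(field=user_id, limit=5, seed=10, as="hash2")
| groupBy(hash2, function=count(user_id, distinct=true))
```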