Extends the groupBy() function for grouping
by time, dividing the search time interval into buckets. Each
event is put into a bucket based on its timestamp.
When using the bucket() function, events
are grouped by a number of notional 'buckets', each defining a
timespan, calculated by dividing the time range by the number of
required buckets. The function creates a new field,
_bucket, that contains
the corresponding bucket's start time in milliseconds (UTC
time).
Specifies which aggregate functions to perform on each group. Default is to count the elements in each group. If several aggregators are listed for the function parameter, then their outputs are combined using the rules described for stats().
Sets the minimum allowed span for each bucket, for cases where the buckets parameter is set so high that the span of each bucket would be too small to be useful. The value is defined using Relative Time Syntax, such as 1hour or 3weeks. minSpan can be at most as long as the search interval; if set to a longer value, a warning notifies that the search interval is used as the minSpan instead.
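As an illustrative sketch (the parameter values are hypothetical), a query requesting many buckets while guarding against uselessly small spans might look like:

logscale
bucket(buckets=1000, minSpan=1min, function=count())

Here, even if dividing the search interval into 1000 buckets would produce spans shorter than one minute, each bucket spans at least 1min.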
Defines the time span for each bucket. The time span is defined as a relative time modifier, such as 1hour or 3weeks. If not provided, or set to auto, the span, and thus the number of buckets, is determined dynamically from the search time interval.
Defines the time zone for bucketing. This value overrides timeZoneOffsetMinutes which may be passed in the HTTP/JSON query API. For example, timezone=UTC or timezone='+02:00'. See the full list of timezones supported by LogScale at Supported Time Zones.
Each value is a unit conversion for the given column. For instance: bytes/span to Kbytes/day converts a sum of bytes into Kbytes/day, automatically taking the time span into account. If present, this array must either have length 1 (applied to all series) or the same length as function.
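For example, a sketch (the bytes field name is an assumption) that sums bytes per bucket and reports the result as Kbytes/day:

logscale
bucket(span=1h, function=sum(bytes), unit="bytes/span to Kbytes/day")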
[a] Optional parameters use their default value unless explicitly set.
Omitted Argument Names
The argument name for span can be omitted; the following forms of this function are equivalent:
logscale Syntax
bucket("value")
and:
logscale Syntax
bucket(span="value")
These examples show basic structure only.
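A fuller invocation, shown here only as an illustrative sketch, combines an explicit span with an aggregate function:

logscale
bucket(span=5min, function=count())

This counts the events falling within each 5-minute bucket.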
When generating aggregated buckets against data, the exact
number of buckets may not match the expected number, due to the
combination of the query span, the requested number of buckets, and
the available event data.
For example, given a query displaying buckets for every one
minute, but with a query interval of 1 hour starting at
09:17:30, 61 buckets will be created, as represented by the
shaded intervals shown in
Figure 110, “Bucket Allocation using bucket()”:
Figure 110. Bucket Allocation using bucket()
The buckets are generated, first based on the requested timespan
interval or number of buckets, and then on the relevant timespan
boundary. For example:
An interval per hour across a day will start at 00:00
An interval of a minute across an hour will start at
09:00:00
Buckets will contain the following event data:
The first bucket will contain the extracted event data for
the relevant timespan (one bucket per minute from 09:17), but
only events after the start of the query interval. For example,
the bucket will start at 09:17, but contain only events with a
timestamp after 09:17:30.
The next 58 buckets will contain the event data for each
minute.
Bucket 60 will contain the event data up until 10:17:30.
Bucket 61 will contain any remaining data from the last time
interval bucket.
The result is that 61 buckets are returned, even though the
interval is one minute across a one-hour boundary. The trailing
data will always be included in the output. This may have an
impact on the data displayed when bucket() is used in
combination with a Time Chart.
Time series aggregate status codes by count()
per minute into buckets
Query
logscale
bucket(1min,field=status_code,function=count())
Introduction
In this example, the bucket() function is used with
count() to count different HTTP status codes over
time and bucket them into time intervals of 1 minute.
Step-by-Step
Starting with the source repository events.
logscale
bucket(1min,field=status_code,function=count())
Counts different HTTP status codes over time and buckets them into time
intervals of 1 minute. Notice that we group by two fields: the
status_code field and the
implicit field _bucket.
Event Result set.
Summary and Results
The query is used to optimize data storage and query performance.
Bucketing allows data to be collected according to a time range.
Using the right aggregation function quantifies the values, grouping
that information into buckets suitable for graphing, for example with a
Bar Chart, with the size of each bar based on
the declared function result, count() in this
example.
When generating a list of buckets using the
bucket() function, the output will always contain
one more bucket than the number defined in
buckets. This is to
accommodate the values that fall outside the given time frame
across the requested number of buckets. Because events are bound
to the bucket in which they were stored, bucket() selects
the buckets for the given time range plus any remainder. For
example, when requesting 24 buckets over a period of one day in
the humio-metrics repository:
Step-by-Step
Starting with the source repository events.
logscale
bucket(buckets=24,function=sum("count"))
Buckets the events into 24 groups, using the sum()
function on the count field.
logscale
|parseTimestamp(field=_bucket,format=millis)
Extracts the timestamp from the generated bucket and converts it to a
date time value; in this example, the bucket outputs the timestamp as an
epoch value in the _bucket field.
Event Result set.
Summary and Results
The resulting output shows 25 buckets: the original 24 requested, plus
one additional bucket that contains all the data after the requested
timespan for the requested number of buckets.
_bucket | _sum | @timestamp
1681290000000 | 1322658945428 | 1681290000000
1681293600000 | 1879891517753 | 1681293600000
1681297200000 | 1967566541025 | 1681297200000
1681300800000 | 2058848152111 | 1681300800000
1681304400000 | 2163576682259 | 1681304400000
1681308000000 | 2255771347658 | 1681308000000
1681311600000 | 2342791941872 | 1681311600000
1681315200000 | 2429639369980 | 1681315200000
1681318800000 | 2516589869179 | 1681318800000
1681322400000 | 2603409167993 | 1681322400000
1681326000000 | 2690189000694 | 1681326000000
1681329600000 | 2776920777654 | 1681329600000
1681333200000 | 2873523432202 | 1681333200000
1681336800000 | 2969865160869 | 1681336800000
1681340400000 | 3057623890645 | 1681340400000
1681344000000 | 3144632647026 | 1681344000000
1681347600000 | 3231759376472 | 1681347600000
1681351200000 | 3318929777092 | 1681351200000
1681354800000 | 3406027872076 | 1681354800000
1681358400000 | 3493085788508 | 1681358400000
1681362000000 | 3580128551694 | 1681362000000
1681365600000 | 3667150316470 | 1681365600000
1681369200000 | 3754207997997 | 1681369200000
1681372800000 | 3841234050532 | 1681372800000
1681376400000 | 1040019734927 | 1681376400000
Bucket Events Into Groups
Bucket events into 24 groups using the
sum() function and the
buckets parameter
In this example, the bucket() function is used to
request 24 buckets over a period of one day in the
humio-metrics repository.
Step-by-Step
Starting with the source repository events.
logscale
bucket(buckets=24,function=sum("count"))
Buckets the events into 24 groups spanning over a period of one day,
using the sum() function on the
count field.
logscale
|parseTimestamp(field=_bucket,format=millis)
Extracts the timestamp from the generated bucket and converts the
timestamp to a date time value. In this example, the bucket outputs the
timestamp as an epoch value in the
_bucket field. This results in
an additional bucket containing all the data after the requested
timespan for the requested number of buckets.
Event Result set.
Summary and Results
The query is used to optimize data storage and query performance by
making it easier to manage and locate data subsets when performing
analytics tasks. Note that the resulting output shows 25 buckets: the
24 originally requested, plus one additional bucket containing the
data after the requested timespan.
Divides the search time interval into buckets. As a time span is not
specified, the search interval is divided into 127 buckets. Events in
each bucket are counted:
Step-by-Step
Starting with the source repository events.
logscale
bucket(function=count())
Summarizes events using the count() into buckets
across the selected timespan.
Event Result set.
Summary and Results
This query organizes data into buckets according to the count of events.
Calculate Relationship Between X And Y Variables - Example 2
Calculate the linear relationship between server load and total
response size using the linReg() function
with bucket()
In this example, the linReg() function is used to
calculate the linear relationship between
bytes_sent (x variable) and
server_load_pct (y variable).
The example shows the relationship between server load percentage and
total response size across time.
Buckets the data points by time, then calculates the sum of bytes sent
for each bucket returning the result in a field named
x, and calculates the average server load
percentage for each bucket returning the result in a field named
y.
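The bucketing step itself is not shown above; a sketch of what it might look like, inferred from this description (the aggregator arrangement is an assumption):

logscale
bucket(function=[sum(bytes_sent, as=x), avg(server_load_pct, as=y)])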
logscale
|linReg(x=x, y=y)
Correlates x with y, showing
the relationship between the variables x and
y and outputs the results in fields named
_slope (slope value), _intercept (intercept value),
_r2 (adjusted R-squared value), and
_n (number of data points). These four key values
indicate relationship strength and reliability.
Event Result set.
Summary and Results
The query is used to calculate a linear relationship between
bytes_sent (x variable) and
server_load_pct (y variable).
Calculating the relationship between server load percentage and total
response size is useful to identify different operational patterns, such
as, for example, performance bottlenecks, resource allocation issues, or
to identify system optimization opportunities.
Sample output from the incoming example data:
_slope | _intercept | _r2 | _n
0.00010617525557193158 | 28.934098111407938 | 0.991172367336835 | 10
_slope is the rate of change between server load
and response size.
_intercept is the baseline relationship value.
_r2 is the statistical accuracy of the linear
model.
_n is the total number of data points analyzed.
Calculate Relationship Between X And Y Variables - Example 3
Calculate the linear relationship between server load and each of
several types of request types using the
linReg() function with
bucket() and groupBy()
In this example, the linReg() function is used to
calculate the linear relationship between
request_type (x variable) and
server_load_pct (y variable).
The example shows the relationship between server load and each of
several types of HTTP request types across time.
Buckets the data points by time, then calculates the average server
load for each time bucket returning the result in a field named
y. It also groups the request types in a field
named request_type and makes a count of requests
by type in each time bucket returning the result in a field named
x.
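The bucketing step itself is not shown above; a sketch inferred from the description, assuming groupBy() is nested as a subaggregator within bucket():

logscale
bucket(function=[avg(server_load_pct, as=y), groupBy(request_type, function=count(as=x))])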
logscale
|groupBy(request_type,function=linReg(x=x, y=y))
Correlates x with y, showing
the relationship between the variables x and
y for each HTTP request type and outputs the results
in fields named _slope (slope value),
_intercept (intercept value),
_r2 (adjusted R-squared value), and
_n (number of data points). These four key values
indicate relationship strength and reliability.
Event Result set.
Summary and Results
The query is used to analyze how different HTTP request types affect
server load. The analysis helps identify which HTTP request types have
the strongest impact on server performance.
Sample output from the incoming example data:
request_type | _slope | _intercept | _r2 | _n
DELETE | <no value> | <no value> | <no value> | <no value>
GET | -13.749999999999941 | 72.7999999999999 | 0.5941824574313592 | 5
POST | 16.29999999999992 | 32.70000000000012 | 0.7196207242484238 | 3
PUT | <no value> | <no value> | <no value> | <no value>
_slope is the impact rate of request volume on
server load.
_intercept is the baseline server load when there
are no requests of a specific type.
_r2 is the statistical accuracy of the
relationship.
_n is the total number of data points analyzed.
Compute Cumulative Aggregation Across Buckets
Compute a cumulative aggregation across buckets using the
accumulate() function with
timeChart()
In this example, the accumulate() function is used
with timeChart() to accumulate values across time
intervals.
Note that the accumulate() function must be used
after an aggregator function to ensure event ordering.
Example incoming data might look like this:
@timestamp | key | value
1451606301001 | a | 5
1451606301500 | b | 6
1451606301701 | a | 1
1451606302001 | c | 2
1451606302201 | b | 6
Step-by-Step
Starting with the source repository events.
logscale
timeChart(span=1000ms,function=sum(value))
Groups data into 1-second buckets over a 4-second period, sums the field
value for each bucket and
returns the results in a field named
_sum. The result is displayed in
a timechart.
logscale
|accumulate(sum(_sum,as=_accumulated_sum))
Calculates a running total of the sums in the
_sum field, and returns the
results in a field named
_accumulated_sum.
Event Result set.
Summary and Results
The query is used to accumulate values across time intervals/buckets.
The query is useful for tracking cumulative metrics or identifying
trends in the data.
Buckets the values, grouping by the field #repo using a
count().
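A sketch of this step, inferred from the description (the nesting of groupBy() inside bucket() is an assumption):

logscale
bucket(function=groupBy(#repo, function=count()))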
logscale
|@timestamp:=_bucket
Updates the timestamp to the value generated by the
bucket() function.
logscale
|drop(_bucket)
Discards the _bucket field from
the results.
Event Result set.
Summary and Results
The query can be run on each repo. Or, create a view that looks across
multiple repos and then run it from there to get all the repo counts in
one search.
Using a 60 second timespan for each bucket, the query displays the
percentile() for the
responsetime field.
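The query itself is not shown; a sketch inferred from the description (the percentile values are illustrative):

logscale
bucket(span=60sec, function=percentile(field=responsetime, percentiles=[50, 75, 99]))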
Event Result set.
Summary and Results
The percentile() function quantifies values by determining
whether a value is larger than a percentage of the overall values. The
output provides a powerful view of the relative significance of a value.
Combined in this example with bucket(), the query
generates buckets of data showing the comparative response time for
every 60 seconds.