Extends the groupBy() function for grouping by time.

This function produces a table. If a graph is desired, consider using timeChart() instead.

This function divides the search time interval into buckets. Each event is put into a bucket based on its timestamp.

Events are grouped by their bucket, generating the field _bucket. The value of _bucket is the corresponding bucket's start time in milliseconds (UTC time).

The bucket() function takes all of the same parameters as groupBy(). The _bucket field is added to the set of fields being grouped by.
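For instance, because bucket() accepts the groupBy() parameters, it is possible to group by several fields alongside the implicit _bucket. A minimal sketch (the field names method and status_code are illustrative assumptions, not from a specific dataset):

logscale
bucket(span=1h, field=[method, status_code], function=count())

This produces one count per hour for each combination of method and status_code, grouped together with the implicit _bucket field.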

  • buckets (number, optional): Defines the number of buckets. The time span is defined by splitting the query time interval into this many buckets. Valid range: 0..1500.

  • field (string, optional): Specifies which fields to group by. Note that it is possible to group by multiple fields.

  • function ([Aggregate], optional, default: count(as=_count)): Specifies which aggregate functions to perform on each group. The default is to count the elements in each group.

  • limit (number, optional, default: 10, maximum: 500): Defines the maximum number of series to produce. A warning is produced if this limit is exceeded, unless the parameter is specified explicitly.

  • minSpan (string, optional): Defines the minimum time span for each bucket when the span is determined dynamically, given in relative time syntax such as 1hour or 3 weeks.

  • span (string, optional, default: auto): Defines the time span for each bucket, given as a relative time modifier such as 1hour or 3 weeks. If not provided, or set to auto, the span (and thus the number of buckets) is determined dynamically from the search time interval. [a]

  • timezone (string, optional): Defines the time zone for bucketing. This value overrides timeZoneOffsetMinutes, which may be passed in the HTTP/JSON query API. For example, timezone=UTC or timezone='+02:00'. See the full list of time zones supported by LogScale at Supported Timezones.

  • unit ([string], optional): Each value is a unit conversion for the given column. For instance, bytes/span to Kbytes/day converts a sum of bytes into Kbytes/day, automatically taking the time span into account. If present, this array must either have length 1 (applied to all series) or the same length as the function parameter.

[a] If an argument name is not given, span is the default argument.
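Because span is the default argument, it can be passed without a name. A minimal sketch showing the two equivalent forms:

logscale
// span given positionally as the default argument:
bucket(1h, function=count())
// equivalent to naming the parameter explicitly:
bucket(span=1h, function=count())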

When generating aggregated buckets against data, the exact number of buckets may not match the expected number, due to the combination of the query span, the requested number of buckets, and the available event data.

For example, given a query that creates a bucket for every minute, but with a query interval of one hour starting at 09:17:30, 61 buckets will be created, as represented by the shaded intervals shown in Figure 376, “Bucket Allocation using bucket()”:


Figure 376. Bucket Allocation using bucket()


The buckets are generated first from the requested timespan or number of buckets, and then aligned to the relevant timespan boundary. For example:

  • An interval per hour across a day will start at 00:00

  • An interval of a minute across an hour will start at 09:00:00

Buckets will contain the following event data:

  • The first bucket covers the relevant timespan (one bucket per minute, starting at 09:17), but contains only events within the query interval. For example, the bucket starts at 09:17, but contains only events with a timestamp at or after 09:17:30.

  • The next 58 buckets will contain the event data for each minute.

  • Bucket 60 will contain the event data up until 10:17:30.

  • Bucket 61 will contain any remaining data from the last time interval bucket.

The result is that 61 buckets are returned, even though the interval is one minute across a one-hour period. The trailing data is always included in the output. This may affect the data displayed when bucket() is used in combination with a Time Chart.
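The allocation described above corresponds to a query like the following, a minimal sketch assuming a one-hour search interval starting at 09:17:30:

logscale
bucket(span=1m, function=count())

Run over that interval, the query returns 61 one-minute buckets rather than 60, with partial data in the buckets at each end of the interval.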

bucket() Examples

Aggregating Status Codes by count() per minute

Query
logscale
bucket(1min, field=status_code, function=count())
Introduction

Counts the different HTTP status codes over time, bucketing them into time intervals of 1 minute. Notice that the events are grouped by two fields: status_code and the implicit field _bucket.

Step-by-Step
  • Set the bucket interval to 1 minute, counting occurrences of the field status_code

    logscale
    bucket(1min, field=status_code, function=count())

Bucket Counts when using bucket()

Query
logscale
bucket(buckets=24, function=sum("count"))
| parseTimestamp(field=_bucket,format=millis)
Introduction

When generating a list of buckets using bucket(), the output will always contain one more bucket than the number defined in buckets. This accommodates all the values that fall outside the given time frame across the requested number of buckets. This happens because events are bound to the bucket in which they were stored, so bucket() selects the buckets for the given time range plus any remainder. For example, when requesting 24 buckets over a period of one day in the humio-metrics repository:

Step-by-Step
  • Bucket the events into 24 groups, using sum() on the count field.

    logscale
    bucket(buckets=24, function=sum("count"))
  • Extract the timestamp from the generated bucket and convert it to a date-time value; in this example, bucket() outputs the timestamp as an epoch value in the _bucket field.

    logscale
    | parseTimestamp(field=_bucket,format=millis)
Summary and Results

The resulting output shows 25 buckets: the original 24 requested, plus one additional bucket containing all the data after the requested timespan for the requested number of buckets.

logscale
_bucket          _sum             @timestamp
1681290000000    1322658945428    1681290000000
1681293600000    1879891517753    1681293600000
1681297200000    1967566541025    1681297200000
1681300800000    2058848152111    1681300800000
1681304400000    2163576682259    1681304400000
1681308000000    2255771347658    1681308000000
1681311600000    2342791941872    1681311600000
1681315200000    2429639369980    1681315200000
1681318800000    2516589869179    1681318800000
1681322400000    2603409167993    1681322400000
1681326000000    2690189000694    1681326000000
1681329600000    2776920777654    1681329600000
1681333200000    2873523432202    1681333200000
1681336800000    2969865160869    1681336800000
1681340400000    3057623890645    1681340400000
1681344000000    3144632647026    1681344000000
1681347600000    3231759376472    1681347600000
1681351200000    3318929777092    1681351200000
1681354800000    3406027872076    1681354800000
1681358400000    3493085788508    1681358400000
1681362000000    3580128551694    1681362000000
1681365600000    3667150316470    1681365600000
1681369200000    3754207997997    1681369200000
1681372800000    3841234050532    1681372800000
1681376400000    1040019734927    1681376400000

Bucket Events summarized by count()

Query
logscale
bucket(function=count())
Introduction

Divides the search time interval into buckets. As no time span is specified, the search interval is divided into 127 buckets. Events in each bucket are counted:

Step-by-Step
  • Summarize events using count() into buckets across the selected timespan.

    logscale
    bucket(function=count())

Showing Percentiles across Multiple Buckets

Query
logscale
bucket(span=60sec, function=percentile(field=responsetime, percentiles=[50, 75, 99, 99.9]))
Introduction

Shows response time percentiles over time, calculating percentiles per minute (bucketing time into 1-minute intervals):

Step-by-Step
  • Using a 60 second timespan for each bucket, display the percentile() for the responsetime field.

    logscale
    bucket(span=60sec, function=percentile(field=responsetime, percentiles=[50, 75, 99, 99.9]))