Extends the groupBy() function for grouping by time, dividing the search time interval into buckets. Each event is put into a bucket based on its timestamp.

When using the bucket() function, events are grouped into a number of notional 'buckets', each covering a timespan calculated by dividing the time range by the number of required buckets. The function creates a new field, _bucket, that contains the corresponding bucket's start time in milliseconds (UTC time).
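As a rough illustration of how an event's timestamp could map to its _bucket value, the following Python sketch floors a millisecond timestamp to the start of its span. It assumes that bucket boundaries fall on multiples of the span counted from the Unix epoch in UTC, which this page does not state explicitly; bucket_start is a hypothetical helper and not part of LogScale.

```python
def bucket_start(timestamp_ms: int, span_ms: int) -> int:
    """Return the start (ms since epoch, UTC) of the bucket holding the event.

    Assumption: bucket boundaries are multiples of span_ms since the epoch.
    """
    return (timestamp_ms // span_ms) * span_ms


ONE_MINUTE_MS = 60_000

# Two events 20 seconds apart within the same minute share one bucket start.
first = bucket_start(1_681_290_010_000, ONE_MINUTE_MS)
second = bucket_start(1_681_290_030_000, ONE_MINUTE_MS)
print(first, second)
```

Both events resolve to the same value, which is what allows them to be aggregated together under a single _bucket.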

The bucket() function accepts the same parameters as groupBy().

The output from bucket() is a table and can be used as the input for a variety of widgets. Alternatively, use the timeChart() function.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| buckets | number | optional[a] | | Defines the number of buckets. The time span is defined by splitting the query time interval into this many buckets. Range: 0..1500. |
| field | string | optional[a] | | Specifies which fields to group by. Note it is possible to group by multiple fields. |
| function | Array of Aggregate Functions | optional[a] | count(as=_count) | Specifies which aggregate functions to perform on each group. Default is to count the elements in each group. |
| limit | integer | optional[a] | 10 | Defines the maximum number of series to produce. A warning is produced if this limit is exceeded, unless the parameter is specified explicitly. Maximum: 500. |
| minSpan | long | optional[a] | | Sets the minimum allowed span for each bucket, for cases where the buckets parameter has a high value and therefore the span of each bucket can be so small as to be of no use. It is defined as a Relative Time Syntax such as 1hour or 3 weeks. minSpan can be at most as long as the search interval; if set longer, a warning notifies that the search interval is used as the minSpan. |
| span[b] | relative-time | optional[a] | auto | Defines the time span for each bucket. The time span is defined as a relative time modifier like 1hour or 3 weeks. If not provided or set to auto, the search time interval, and thus the number of buckets, is determined dynamically. |
| timezone | string | optional[a] | | Defines the time zone for bucketing. This value overrides timeZoneOffsetMinutes which may be passed in the HTTP/JSON query API. For example, timezone=UTC or timezone='+02:00'. See the full list of timezones supported by LogScale at Supported Timezones. |
| unit | Array of strings | optional[a] | | Each value is a unit conversion for the given column. For instance: bytes/span to Kbytes/day converts a sum of bytes into Kb/day automatically taking the time span into account. If present, this array must be either length 1 (apply to all series) or have the same length as function. |

[a] Optional parameters use their default value unless explicitly set

[b] The argument name span can be omitted.
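To make the interaction between buckets and minSpan more concrete, here is a minimal Python sketch of one plausible rule: the span obtained by splitting the interval is used unless it is narrower than minSpan, and minSpan itself is capped at the search interval. The way the two values are combined is an assumption made for illustration, and effective_span_ms is a hypothetical helper, not a LogScale function.

```python
def effective_span_ms(interval_ms: int, buckets: int, min_span_ms: int = 0) -> int:
    """Model the bucket span when buckets and minSpan are both given (assumed rule)."""
    if buckets <= 0:
        raise ValueError("this sketch needs at least one bucket")
    # minSpan can be at most the search interval; longer values fall back to it.
    capped_min_span = min(min_span_ms, interval_ms)
    # Splitting the interval into `buckets` pieces gives the requested span.
    requested_span = interval_ms // buckets
    # Assumption: the larger value wins, so buckets never get narrower than minSpan.
    return max(requested_span, capped_min_span)


HOUR_MS = 3_600_000
print(effective_span_ms(HOUR_MS, buckets=1500))                         # 2400
print(effective_span_ms(HOUR_MS, buckets=1500, min_span_ms=60_000))     # 60000
print(effective_span_ms(HOUR_MS, buckets=10, min_span_ms=2 * HOUR_MS))  # 3600000
```

The second call shows the case described for minSpan: 1500 buckets over one hour would give 2.4 second buckets, and a one minute minSpan widens them.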

Omitted Argument Names

The argument name for span can be omitted; the following forms of this function are equivalent:

logscale
bucket("auto")

and:

logscale
bucket(span="auto")

When generating aggregated buckets against data, the exact number of buckets may not match the expected number because of the combination of the query span, the requested number of buckets, and the available event data.

For example, given a query displaying buckets for every one minute, but with a query interval of 1 hour starting at 09:17:30, 61 buckets will be created, as represented by the shaded intervals shown in Figure 174, “Bucket Allocation using bucket()”:

Figure 174. Bucket Allocation using bucket()


The buckets are generated first based on the requested timespan interval or number of buckets, and then aligned to the relevant timespan boundary. For example:

  • An interval per hour across a day will start at 00:00

  • An interval of a minute across an hour will start at 09:00:00

Buckets will contain the following event data:

  • The first bucket will contain the extracted event data for the relevant timespan (1 bucket per minute from 09:17), but only events after the start of the query interval. For example, the bucket will start at 09:17, but contain only events with a timestamp after 09:17:30

  • The next 58 buckets will contain the event data for each minute.

  • Bucket 60 will contain the event data up until 10:17:30.

  • Bucket 61 will contain any remaining data from the last time interval bucket.

The result is that the number of buckets returned will be 61, even though the interval is per minute across a one hour boundary. The trailing data will always be included in the output. It may have an impact on the data displayed when bucket() is used in combination with a Time Chart.
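The count of 61 can be reproduced with a small Python model. The sketch below assumes that buckets are aligned to span boundaries counted from the Unix epoch and that every bucket overlapping the query interval is returned; the date is arbitrary, and bucket_starts is a hypothetical helper rather than anything provided by LogScale.

```python
from datetime import datetime, timedelta, timezone


def bucket_starts(start: datetime, end: datetime, span: timedelta) -> list[datetime]:
    """List the start of every span-aligned bucket that overlaps [start, end)."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Floor the query start to the previous span boundary (assumed alignment).
    current = epoch + ((start - epoch) // span) * span
    starts = []
    while current < end:
        starts.append(current)
        current += span
    return starts


# One hour query interval starting at 09:17:30 on an arbitrary day.
start = datetime(2023, 4, 12, 9, 17, 30, tzinfo=timezone.utc)
end = start + timedelta(hours=1)
buckets = bucket_starts(start, end, timedelta(minutes=1))
print(len(buckets), buckets[0].time(), buckets[-1].time())
```

Because the interval starts and ends mid-minute, it touches 61 distinct minute boundaries rather than 60.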

bucket() Examples

Aggregating Status Codes by count() per minute

Query
logscale
bucket(1min, field=status_code, function=count())
Introduction

Counts different HTTP status codes over time and buckets them into time intervals of 1 minute. Notice we group by two fields: status code and the implicit field _bucket.

Step-by-Step
  1. Starting with the source repository events

  2. Set the bucket interval to 1 minute, aggregating the count of the field status_code.

    logscale
    bucket(1min, field=status_code, function=count())
  3. Event Result set

Summary and Results

Bucketing allows data to be collected according to a time range. Using the right aggregation function to quantify the values groups that information into buckets suitable for graphing, for example with a Bar Chart, where the size of each bar is the result of the declared function, count() in this example.
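The grouping by both status_code and the implicit _bucket field can be sketched in Python as a count over pairs. The events below are made-up sample data, and flooring timestamps to multiples of the span is an assumption about how bucket starts are derived.

```python
from collections import Counter

SPAN_MS = 60_000  # one-minute buckets

# Hypothetical events as (timestamp in ms, status_code); not real data.
events = [
    (1_681_290_005_000, "200"),
    (1_681_290_030_000, "200"),
    (1_681_290_045_000, "500"),
    (1_681_290_070_000, "200"),
]

# Group by the pair (_bucket, status_code) and count members, like function=count().
counts = Counter(((ts // SPAN_MS) * SPAN_MS, status) for ts, status in events)

for (bucket, status), n in sorted(counts.items()):
    print(bucket, status, n)
```

Each output row corresponds to one series value in one bucket, which is the shape a bar or time chart would plot.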

Bucket Counts when using bucket()

Query

Search Repository: humio-metrics

logscale
bucket(buckets=24, function=sum("count"))
| parseTimestamp(field=_bucket,format=millis)
Introduction

When generating a list of buckets using the bucket() function, the output always contains one more bucket than the number defined in buckets. The extra bucket accommodates the values that fall outside the given time frame once it is divided into the requested number of buckets. This happens because events are bound to the bucket in which they are stored, so bucket() selects the buckets for the given time range plus any remainder. For example, when requesting 24 buckets over a period of one day in the humio-metrics repository:

Step-by-Step
  1. Starting with the source repository events

  2. Bucket the events into 24 groups, using the sum() function on the count field.

    logscale
    bucket(buckets=24, function=sum("count"))
  3. Extract the timestamp from the generated bucket and convert it to a date time value; in this example the bucket outputs the timestamp as an epoch value in the _bucket field.

    logscale
    | parseTimestamp(field=_bucket,format=millis)
  4. Event Result set

Summary and Results

The resulting output shows 25 buckets: the 24 requested, plus one additional bucket that contains all the data after the requested timespan for the requested number of buckets.

| _bucket | _sum | @timestamp |
|---|---|---|
| 1681290000000 | 1322658945428 | 1681290000000 |
| 1681293600000 | 1879891517753 | 1681293600000 |
| 1681297200000 | 1967566541025 | 1681297200000 |
| 1681300800000 | 2058848152111 | 1681300800000 |
| 1681304400000 | 2163576682259 | 1681304400000 |
| 1681308000000 | 2255771347658 | 1681308000000 |
| 1681311600000 | 2342791941872 | 1681311600000 |
| 1681315200000 | 2429639369980 | 1681315200000 |
| 1681318800000 | 2516589869179 | 1681318800000 |
| 1681322400000 | 2603409167993 | 1681322400000 |
| 1681326000000 | 2690189000694 | 1681326000000 |
| 1681329600000 | 2776920777654 | 1681329600000 |
| 1681333200000 | 2873523432202 | 1681333200000 |
| 1681336800000 | 2969865160869 | 1681336800000 |
| 1681340400000 | 3057623890645 | 1681340400000 |
| 1681344000000 | 3144632647026 | 1681344000000 |
| 1681347600000 | 3231759376472 | 1681347600000 |
| 1681351200000 | 3318929777092 | 1681351200000 |
| 1681354800000 | 3406027872076 | 1681354800000 |
| 1681358400000 | 3493085788508 | 1681358400000 |
| 1681362000000 | 3580128551694 | 1681362000000 |
| 1681365600000 | 3667150316470 | 1681365600000 |
| 1681369200000 | 3754207997997 | 1681369200000 |
| 1681372800000 | 3841234050532 | 1681372800000 |
| 1681376400000 | 1040019734927 | 1681376400000 |
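The bucket count can be checked directly from the first and last _bucket values in the results above. This Python sketch is only a verification aid; it takes the one hour step between consecutive rows from the output and converts the first epoch value to a readable UTC time, similar in spirit to the parseTimestamp() step.

```python
from datetime import datetime, timezone

HOUR_MS = 3_600_000
first_bucket = 1_681_290_000_000  # first _bucket value in the results
last_bucket = 1_681_376_400_000   # last _bucket value in the results

# Number of hourly bucket starts from first to last, inclusive.
bucket_count = (last_bucket - first_bucket) // HOUR_MS + 1

# Convert epoch milliseconds to a readable UTC time.
first_time = datetime.fromtimestamp(first_bucket / 1000, tz=timezone.utc)
print(bucket_count, first_time.isoformat())
```

The first and last bucket starts are exactly 24 hours apart, which is why 25 rows appear for 24 requested buckets.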

Bucket Events summarized by count()

Query
logscale
bucket(function=count())
Introduction

Divides the search time interval into buckets. As time span is not specified, the search interval is divided into 127 buckets. Events in each bucket are counted:

Step-by-Step
  1. Starting with the source repository events

  2. Summarize events using count() into buckets across the selected timespan.

    logscale
    bucket(function=count())
  3. Event Result set

Summary and Results

This query organizes events into time buckets and returns the count of events in each bucket.

Showing Percentiles across Multiple Buckets

Query
logscale
bucket(span=60sec, function=percentile(field=responsetime, percentiles=[50, 75, 99, 99.9]))
Introduction

Show response time percentiles over time. Calculate percentiles per minute by bucketing into 1 minute intervals:

Step-by-Step
  1. Starting with the source repository events

  2. Using a 60 second timespan for each bucket, display the percentile() for the responsetime field.

    logscale
    bucket(span=60sec, function=percentile(field=responsetime, percentiles=[50, 75, 99, 99.9]))
  3. Event Result set

Summary and Results

The percentile() function quantifies values by determining the value below which a given percentage of the overall values fall. The output provides a powerful view of the relative significance of a value. Combined in this example with bucket(), the query generates buckets of data showing the comparative response time for every 60 seconds.
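For intuition about what a per-bucket percentile means, the following Python sketch groups made-up response times into 60 second buckets and computes exact nearest-rank percentiles for each. The nearest-rank method is substituted here for illustration and may differ from how percentile() computes its results; the sample data, the epoch-aligned bucketing, and the nearest_rank helper are all assumptions.

```python
import math
from collections import defaultdict

SPAN_MS = 60_000  # 60 second buckets


def nearest_rank(sorted_values, pct):
    """Exact nearest-rank percentile of an ascending list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


# Hypothetical (timestamp ms, responsetime) samples: one event per second for two minutes.
events = [(1_681_290_000_000 + i * 1_000, float(i % 60 + 1)) for i in range(120)]

# Collect response times under the start time of their bucket.
by_bucket = defaultdict(list)
for ts, responsetime in events:
    by_bucket[(ts // SPAN_MS) * SPAN_MS].append(responsetime)

for bucket, values in sorted(by_bucket.items()):
    values.sort()
    print(bucket, [nearest_rank(values, p) for p in (50, 75, 99, 99.9)])
```

Each printed row is one bucket with its four percentile values, mirroring the one row per minute that the query produces.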