The percentile() function estimates percentiles over a given collection of numbers.

Parameter    Type              Required     Default Value   Description
accuracy     double            optional[a]  0.01            Relative error threshold. Must be greater than 0 and less than 1: values closer to 1 mean lower accuracy; values closer to 0 mean higher accuracy.
as           string            optional[a]                  Prefix of output fields.
field[b]     string            required                     Specifies the field for which to calculate percentiles. The field must contain numbers.
percentiles  array of numbers  optional[a]  [50, 75, 99]    Specifies which percentiles to calculate.

[a] Optional parameters use their default value unless explicitly set.

[b] The parameter name field can be omitted.

A percentile compares a particular value with the values in the rest of a group, identifying which share of the group that value exceeds. For example, if the value 75 is at the 85th percentile, then 75 is higher than 85% of the values in the group. Percentiles can be used to set thresholds and limits for triggering events, or for scoring probabilities and threats.
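To make the definition concrete, here is a minimal Python sketch of a percentile rank: the percentage of a group that lies strictly below a given value. The function name percentile_rank and the sample scores are hypothetical, and this is an exact computation for illustration only, not how LogScale computes percentiles.

```python
def percentile_rank(values, x):
    """Percentage of values in the group that are strictly lower than x."""
    if not values:
        raise ValueError("values must not be empty")
    below = sum(1 for v in values if v < x)
    return 100.0 * below / len(values)


# Hypothetical group of 20 scores: 17 of them are lower than 75.
scores = list(range(1, 18)) + [75, 80, 90]
print(percentile_rank(scores, 75))  # 85.0
```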

For example, given the values 12, 25, 50, and 99, any value between 25 and 50 is a valid 50th percentile; in this case, the percentile() function returns 25.79. To reduce resource usage, LogScale's percentile() returns any valid value rather than the mean of the valid values, which is what percentile algorithms in general often return.
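The following Python sketch illustrates why several answers are valid for the 50th percentile of 12, 25, 50, and 99, and how the common "mean of the middle values" convention differs. It uses only the standard statistics module; the variable names are illustrative, and the value 25.79 is simply the result quoted in the text above, not something this code computes.

```python
import statistics

values = sorted([12, 25, 50, 99])
n = len(values)

# With an even number of values, any number between the two middle
# values splits the group in half, so all of them are valid medians.
low, high = values[n // 2 - 1], values[n // 2]
print(low, high)  # 25 50

# Many implementations report the mean of the two middle values instead.
print(statistics.median(values))  # 37.5

# The result reported for percentile() lies inside the valid range.
reported = 25.79
print(low <= reported <= high)  # True
```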

Note

LogScale uses an approximation algorithm for percentiles to balance speed, memory usage, and accuracy.

The function returns one event with a field for each of the percentiles specified in the percentiles parameter. Each field is named by prepending _ to the corresponding value in the percentiles parameter, preceded by the as prefix if one is set. For example, with the default percentiles the event contains the fields _50, _75, and _99.
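The naming rule can be summarized with a small, hypothetical Python helper (output_field_names is not part of LogScale). It assumes the rule described above together with the median_50 example in the syntax examples, and only shows integer percentiles, since the text does not state how fractional values such as 99.9 are rendered in field names.

```python
def output_field_names(percentiles=(50, 75, 99), prefix=""):
    """Hypothetical helper: optional prefix, then '_', then the percentile value."""
    return [f"{prefix}_{p}" for p in percentiles]


print(output_field_names())                # ['_50', '_75', '_99']
print(output_field_names([50], "median"))  # ['median_50']
```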

The following conditions apply when using this function:

  • The function only works on non-negative input values.

  • The accuracy argument is a relative error tolerance: it bounds the error of each percentile relative to the number estimated (lower values imply better accuracy). Some examples:

    • An accuracy of 0.001 means the estimate is accurate to within 1/1000 of the original value (that is, accuracy=0.001 corresponds to an accuracy of 0.999). The number estimated depends on the accuracy argument and the amount of data available; a larger amount of data yields better estimates.

      For example, with an original value of 1000, the estimate would be between 999 and 1001 (1000 - 1000/1000 and 1000 + 1000/1000).

    • An accuracy of 0.01 means accuracy to within 1/100 of the original value.

      For example, with an original value of 1000, the estimate would be between 990 and 1010 (1000 - 1000/100 and 1000 + 1000/100).

      With an original value of 500, the estimate would be between 495 and 505 (500 - 500/100 and 500 + 500/100).
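The arithmetic in these examples can be reproduced with a short Python sketch. It assumes, as the examples do, that the tolerance band is symmetric around the original value; the helper names error_bounds and within_tolerance are hypothetical, and the sketch only checks the bounds, not how LogScale's algorithm keeps estimates inside them.

```python
def error_bounds(value, accuracy):
    """Return (low, high): value minus/plus value * accuracy."""
    if not 0 < accuracy < 1:
        raise ValueError("accuracy must be greater than 0 and less than 1")
    delta = value * accuracy
    return value - delta, value + delta


def within_tolerance(estimate, value, accuracy):
    """True if estimate lies inside the relative error band around value."""
    low, high = error_bounds(value, accuracy)
    return low <= estimate <= high


print(error_bounds(1000, 0.001))  # (999.0, 1001.0)
print(error_bounds(1000, 0.01))   # (990.0, 1010.0)
print(error_bounds(500, 0.01))    # (495.0, 505.0)
```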

Important

Higher accuracy implies higher memory usage. Choose an accuracy that matches the precision you need in the output value. If memory usage becomes too high, lower percentiles are discarded. If your percentiles seem off, try reducing the accuracy (that is, setting a larger accuracy value).

percentile() Syntax Examples

Calculate the 50th, 75th, 99th, and 99.9th percentiles for events with the field responsetime:

logscale
percentile(field=responsetime, percentiles=[50, 75, 99, 99.9])

In a time chart, calculate percentiles for both fields r1 and r2:

logscale
timeChart(function=[percentile(field=r1,as=r1),percentile(field=r2,as=r2)])

To calculate the median for a given value, use percentile() with percentiles set to 50:

logscale
percentile(field=allocBytes,percentiles=[50],as=median)

This creates the field median_50 with the 50th percentile value.

percentile() Examples

Create Time Chart Widget for All Events

Query
logscale
timeChart(span=1h, function=count())
Introduction

The Time Chart Widget is the most commonly used widget in LogScale. It displays bucketed time series data on a timeline. The timeChart() function is used to create time chart widgets; in this example, a time chart that shows the number of events per hour over the last 24 hours. To do this, we select the last 24 hours in the time selector in the UI, and then tell the function to make each time bucket one hour long (with span=1h).

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    timeChart(span=1h, function=count())

    Creates 24 time buckets when we search over the last 24 hours, and all searched events get sorted into groups depending on the bucket they belong to (based on their @timestamp value). When all events have been divided up by time, the count() function is run on each group, giving us the number of events per hour.

  3. Event Result set.
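The bucketing and counting steps above can be sketched in Python. This is a simplified model, assuming hypothetical @timestamp values in epoch seconds and buckets keyed by their start time; count_per_bucket is not a LogScale function, and buckets with no events are simply absent from this sketch's result.

```python
from collections import Counter

SPAN_SECONDS = 3600  # corresponds to span=1h


def count_per_bucket(timestamps, span=SPAN_SECONDS):
    """Sort epoch-second timestamps into span-second buckets and count each bucket."""
    # Each bucket is keyed by the time at which it starts.
    return Counter((ts // span) * span for ts in timestamps)


# Hypothetical @timestamp values (epoch seconds) for five events.
events = [0, 120, 3599, 3600, 7300]
print(sorted(count_per_bucket(events).items()))  # [(0, 3), (3600, 1), (7200, 1)]
```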

Summary and Results

The query is used to create time chart widgets showing the number of events per hour over the last 24 hours. The time chart shows one group of events per time bucket. Hovering over a bucket in the time chart shows its precise value and time; the time shown is the point where the bucket starts.

Create Time Chart Widget for Different Events

Query
logscale
timeChart(span=1h, function=count(), series=method)
Introduction

The Time Chart Widget is the most commonly used widget in LogScale. It displays bucketed time series data on a timeline. The timeChart() function is used to create time chart widgets; in this example, a time chart that shows the number of different kinds of events per hour over the last 24 hours. For example, you may want to count the different HTTP methods used for requests in the logs. If those are stored in a field named method, you can use this field as a series. We then select the last 24 hours in the time selector in the UI, and set each time bucket to be one hour long (with span=1h).

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    timeChart(span=1h, function=count(), series=method)

    Creates 24 time buckets when we search over the last 24 hours, and all searched events get sorted into groups depending on the bucket they belong to (based on their @timestamp value) and on their value of the series field method. When all events have been divided up, the count() function is run on each group, returning the number of events of each kind per hour.

  3. Event Result set.
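Adding a series splits every time bucket further by the series value. The Python sketch below models this with hypothetical (timestamp, method) pairs in epoch seconds; count_per_bucket_and_series is an illustrative name, not a LogScale function. The second part reproduces the arithmetic from the summary: three methods over 24 hourly buckets give 72 groups.

```python
from collections import Counter

SPAN_SECONDS = 3600  # corresponds to span=1h


def count_per_bucket_and_series(events, span=SPAN_SECONDS):
    """Count events per (bucket start, method) pair."""
    return Counter(((ts // span) * span, method) for ts, method in events)


# Hypothetical (timestamp, method) pairs.
events = [(10, "GET"), (20, "GET"), (30, "POST"), (3700, "PUT"), (3800, "GET")]
counts = count_per_bucket_and_series(events)
print(counts[(0, "GET")], counts[(0, "POST")], counts[(3600, "PUT")])  # 2 1 1

# One hypothetical event per hour for each of GET, PUT, and POST over 24 hours.
day = [(hour * SPAN_SECONDS + 42, method)
       for hour in range(24)
       for method in ("GET", "PUT", "POST")]
print(len(count_per_bucket_and_series(day)))  # 72
```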

Summary and Results

The query is used to create time chart widgets showing the number of different kinds of events per hour over the last 24 hours. In this example there is not just one group of events per time bucket, but multiple groups: one group for every value of method that exists in the timespan we are searching. So if we search over a 24-hour period and have received only GET, PUT, and POST requests in that timespan, we get three groups of events per bucket (because there are three different values for method). Therefore, we end up with 72 groups of events (24 buckets × 3 methods), and every group contains only events that belong to one time bucket and one specific value of method. Then count() is run on each of these groups, giving the number of GET events per hour, PUT events per hour, and POST events per hour. Hovering over a bucket in the time chart shows its precise value and time; the time shown is the point where the bucket starts.

Determine a Score Based on Field Value

Query
logscale
percentile(filesize, percentiles=[40,80],as=score)
| symbol := if(filesize > score_80, then=":+1:", else=if(filesize > score_40, then="so-so", else=":-1:"))
Introduction

When summarizing and displaying data, it may be necessary to derive a score or validity based on a test value. This can be achieved with if(), which assigns a score value depending on whether the underlying field is over a threshold value.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    percentile(filesize, percentiles=[40,80],as=score)

    Calculates percentiles for the filesize field, determining the file sizes that are above 40% and above 80% of the overall event set. Because as=score is set, the results are stored in the fields score_40 and score_80.

  3. logscale
    | symbol := if(filesize > score_80, then=":+1:", else=if(filesize > score_40, then="so-so", else=":-1:"))

    Checks whether filesize is greater than score_80 (the 80th percentile) and, if so, sets symbol to :+1:. Because if() calls can be nested, the else parameter is another if() that sets symbol to so-so if filesize is greater than score_40 (the 40th percentile), or to :-1: otherwise.

  4. Event Result set.
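The nested if() logic above maps onto a simple chain of comparisons. This Python sketch mirrors it with a hypothetical score_symbol function and made-up thresholds (score_40=100, score_80=500); it assumes both percentile values are already known and does not model how LogScale computes them.

```python
def score_symbol(filesize, score_40, score_80):
    """Mirror the nested if(): ':+1:' above the 80th percentile,
    'so-so' above the 40th percentile, ':-1:' otherwise."""
    if filesize > score_80:
        return ":+1:"
    if filesize > score_40:
        return "so-so"
    return ":-1:"


# Hypothetical percentile thresholds.
for size in (50, 100, 250, 900):
    print(size, score_symbol(size, score_40=100, score_80=500))
```

Note that comparisons are strict, as in the query: a file size exactly equal to score_40 is scored :-1:.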

Summary and Results

if() is a convenient way to make conditional choices about values. Unlike case, it can be embedded in other statements.

Show Percentiles Across Multiple Buckets

Query
logscale
bucket(span=60sec, function=percentile(field=responsetime, percentiles=[50, 75, 99, 99.9]))
Introduction

Show response time percentiles over time by calculating percentiles per minute, bucketing events into 1-minute intervals.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    bucket(span=60sec, function=percentile(field=responsetime, percentiles=[50, 75, 99, 99.9]))

    Using a 60-second span for each bucket, calculates the 50th, 75th, 99th, and 99.9th percentiles of the responsetime field within each bucket.

  3. Event Result set.
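To illustrate per-bucket percentiles, here is a Python sketch that groups hypothetical (timestamp, responsetime) pairs into 60-second buckets and computes percentiles for each bucket. It swaps in the exact nearest-rank method for clarity; LogScale uses an approximation and, as described above, may return any valid value, so its results can differ. The function names are illustrative only.

```python
import math
from collections import defaultdict


def nearest_rank(sorted_values, p):
    """Exact nearest-rank percentile p (0 < p <= 100) of a sorted list."""
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def percentiles_per_bucket(events, span=60, percentiles=(50, 75, 99, 99.9)):
    """Group (timestamp, value) pairs into span-second buckets and
    compute the requested percentiles for each bucket."""
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[(ts // span) * span].append(value)
    result = {}
    for start, values in sorted(buckets.items()):
        values.sort()
        result[start] = {p: nearest_rank(values, p) for p in percentiles}
    return result


# Hypothetical response times: four events in the first minute, two in the second.
events = [(0, 10), (5, 20), (30, 30), (59, 40), (60, 100), (90, 200)]
for start, stats in percentiles_per_bucket(events).items():
    print(start, stats)
```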

Summary and Results

The percentile() function quantifies values by determining which value is larger than a given percentage of the overall values, which shows the relative significance of a value. Combined with bucket() in this example, the query generates one bucket of data for every 60 seconds, showing the response time percentiles for each minute.