percentile() is an estimation function that calculates approximate percentiles over a given collection of numbers.
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
accuracy | double | optional[a] | 0.01 | Provided as a relative error threshold. Must be greater than 0 and less than 1: values closer to 1 mean lower accuracy, values closer to 0 mean higher accuracy. |
as | string | optional[a] | | Prefix of output fields. |
field [b] | string | required | | Specifies the field for which to calculate percentiles. The field must contain numbers. |
percentiles | array of numbers | optional[a] | [50, 75, 99] | Specifies which percentiles to calculate. |
[a] Optional parameters use their default value unless explicitly set.
Omitted Argument Names
The argument name for field can be omitted; the following forms of this function are equivalent:
percentile("value")
and:
percentile(field="value")
These examples show basic structure only.
A percentile expresses how a particular value compares with the rest of the values in a group: it identifies the share of values that a given value exceeds. For example, if the value 75 ranks in the 85th percentile, then 75 is higher than 85% of the values in the group. Percentiles can be used to determine thresholds and limits for triggering events, or for scoring probabilities and threats.
For example, given the values 12, 25, 50 and 99, the 50th percentile could be any value between 25 and 50; in this case the percentile() function will return 25.79. Note that, to reduce resource usage, LogScale's percentile function returns any valid value rather than the mean of the valid values, which is what percentile algorithms in general often return.
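The difference between "any valid value" and "the mean of valid values" can be illustrated with a small Python sketch. This is not LogScale's algorithm; it only checks, for the four values above, which candidates fall between the two middle values, and compares that with the midpoint that Python's statistics.median returns. The helper name is_valid_median is hypothetical.

```python
import statistics

values = [12, 25, 50, 99]
ordered = sorted(values)

# For an even number of values, every number between the two
# middle values splits the data in half.
lower, upper = ordered[len(ordered) // 2 - 1], ordered[len(ordered) // 2]

def is_valid_median(candidate):
    """Return True if the candidate lies between the two middle values."""
    return lower <= candidate <= upper

# The conventional choice is the mean of the two middle values.
midpoint = statistics.median(values)

print(lower, upper)               # the range of valid 50th percentiles
print(midpoint)                   # the conventional midpoint
print(is_valid_median(25.79))     # the value quoted in the text above
```

Both 25.79 and the midpoint pass the check, which is why either can be reported as the 50th percentile.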
Note
LogScale uses an approximate percentile algorithm in order to achieve a good balance of speed, memory usage and accuracy.
The function returns one event with a field for each of the percentiles specified in the percentiles parameter. Fields are named by prepending _ to the values specified in the percentiles parameter. For example, the event could contain the fields _50, _75 and _99. If as is set, its value is placed in front of the underscore, for example median_50.
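The naming rule can be sketched in Python as follows. The function output_field_names is a hypothetical helper written for illustration, not part of LogScale; it assumes the field name is simply the prefix, an underscore, and the percentile value.

```python
def output_field_names(percentiles, prefix=""):
    """Build the expected output field names for a list of percentiles.

    Assumption: name = <prefix> + "_" + <percentile>, matching the
    examples _50, _75, _99 and median_50 given in this document.
    """
    return [f"{prefix}_{p}" for p in percentiles]

print(output_field_names([50, 75, 99]))
print(output_field_names([50], prefix="median"))
```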
The following conditions apply when using this function:
- The function only works on non-negative input values.
- The accuracy argument specifies the accuracy of the percentile relative to the number estimated and is intended as a relative error tolerance (lower values imply better accuracy). Some examples:
  - An accuracy of 0.001 means accuracy to 1/1000 of the original value (note that specifying accuracy=0.001 actually implies that the accuracy is 0.999). The number estimated depends on the accuracy argument and the amount of data available. A larger amount of data returns better estimations. For example, with an original value of 1000 the value would be between 999 and 1001 (1000-1000/1000 and 1000+1000/1000).
  - An accuracy of 0.01 means accuracy to 1/100 of the original value. For example, with an original value of 1000 the value would be between 990 and 1010 (1000-1000/100 and 1000+1000/100). With an original value of 500 the value would be between 495 and 505 (500-500/100 and 500+500/100).
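The arithmetic in these examples can be reproduced with a short Python sketch. The function relative_error_bounds is a hypothetical helper that treats accuracy as a relative error tolerance, as described above; it says nothing about how LogScale computes the estimate itself.

```python
def relative_error_bounds(value, accuracy):
    """Return the (low, high) range implied by a relative error tolerance.

    The accuracy must be greater than 0 and less than 1, mirroring the
    constraint on the accuracy parameter.
    """
    if not 0 < accuracy < 1:
        raise ValueError("accuracy must be greater than 0 and less than 1")
    margin = value * accuracy
    return value - margin, value + margin

for value, accuracy in [(1000, 0.001), (1000, 0.01), (500, 0.01)]:
    low, high = relative_error_bounds(value, accuracy)
    print(f"value={value} accuracy={accuracy}: {low:.0f} to {high:.0f}")
```

Because the margin scales with the value, the same accuracy setting gives a wider absolute range for larger values.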
Important
Higher accuracy implies higher memory usage. Choose the accuracy according to the precision you need in the output value. Lower percentiles are discarded if the memory usage becomes too high. If your percentiles seem off, try reducing the accuracy (that is, use a larger accuracy value).
percentile() Syntax Examples
Calculate the 50th, 75th, 99th and 99.9th percentiles for events with the field responsetime:
percentile(field=responsetime, percentiles=[50, 75, 99, 99.9])
In a timechart, calculate percentiles for both of the fields r1 and r2.
timeChart(function=[percentile(field=r1,as=r1),percentile(field=r2,as=r2)])
To calculate the median for a given field, use percentile() with percentiles set to 50:
percentile(field=allocBytes,percentiles=[50],as=median)
This creates the field median_50 with the 50th percentile value.
percentile() Examples
Create Time Chart Widget for All Events
Query
timeChart(span=1h, function=count())
Introduction
The Time Chart Widget is the most commonly used widget in LogScale. It displays bucketed time series data on a timeline. The timeChart() function is used to create time chart widgets, in this example a timechart that shows the number of events per hour over the last 24 hours. We do this by selecting the last 24 hours in the time selector in the UI, and then telling the function to make each time bucket one hour long (with span=1h).
Step-by-Step
Starting with the source repository events.
timeChart(span=1h, function=count())
Creates 24 time buckets when we search over the last 24 hours, and all searched events get sorted into groups depending on the bucket they belong to (based on their @timestamp value). When all events have been divided up by time, the count() function is run on each group, giving us the number of events per hour.
Event Result set.
Summary and Results
The query is used to create timechart widgets showing number of events per hour over the last 24 hours. The timechart shows one group of events per time bucket. When viewing and hovering over the buckets within the time chart, the display will show the precise value and time for the displayed bucket, with the time showing the point where the bucket starts.
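The bucketing described above can be sketched in Python. This is an illustration of the idea only, with synthetic timestamps and a hypothetical count_per_hour helper; it assumes buckets aligned to the start of each hour, which may differ from how LogScale aligns buckets to the search interval.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def count_per_hour(timestamps):
    """Sort timestamps into one-hour buckets and count each bucket."""
    buckets = Counter()
    for ts in timestamps:
        # The bucket is identified by the point in time where it starts.
        bucket_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[bucket_start] += 1
    return buckets

# Synthetic data: one event every 20 minutes for 24 hours.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
events = [start + timedelta(minutes=20 * i) for i in range(72)]

counts = count_per_hour(events)
for bucket_start, count in sorted(counts.items())[:3]:
    print(bucket_start.isoformat(), count)
```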
Create Time Chart Widget for Different Events
Query
timeChart(span=1h, function=count(), series=method)
Introduction
The Time Chart Widget is the most commonly used widget in LogScale. It displays bucketed time series data on a timeline. The timeChart() function is used to create time chart widgets, in this example a timechart that shows the number of the different events per hour over the last 24 hours. For example, you may want to count different kinds of HTTP methods used for requests in the logs. If those are stored in a field named method, you can use this field as a series. Furthermore, we select the last 24 hours in the time selector in the UI, and also set a parameter to make each time bucket one hour long (with span=1h).
Step-by-Step
Starting with the source repository events.
timeChart(span=1h, function=count(), series=method)
Creates 24 time buckets when we search over the last 24 hours, and all searched events get sorted into groups depending on the bucket they belong to (based on their @timestamp value). When all events have been divided up by time, the count() function is run for each value of the series field, returning the number of each kind of event per hour.
Event Result set.
Summary and Results
The query is used to create timechart widgets showing the number of different kinds of events per hour over the last 24 hours. In this example we do not just have one group of events per time bucket, but multiple groups: one group for every value of method that exists in the timespan we are searching in. So if we are still searching over a 24 hour period, and we have received only GET, PUT, and POST requests in that timespan, we will get three groups of events per bucket (because we have three different values for method). Therefore, we end up with 72 groups of events, and every group contains only events that correspond to one time bucket and one specific value of method. Then count() is run on each of these groups, to give us the number of GET events per hour, PUT events per hour, and POST events per hour. When viewing and hovering over the buckets within the time chart, the display will show the precise value and time for the displayed bucket, with the time showing the point where the bucket starts.
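The count of 72 groups follows from 24 buckets multiplied by 3 series values, which the Python sketch below reproduces with synthetic data. It is an illustration of the grouping only, not of how timeChart() is implemented, and the event list is invented for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

methods = ["GET", "PUT", "POST"]
start = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Synthetic data: one request every 10 minutes for 24 hours,
# cycling through the three HTTP methods.
events = [(start + timedelta(minutes=10 * i), methods[i % 3]) for i in range(144)]

groups = Counter()
for ts, method in events:
    bucket_start = ts.replace(minute=0, second=0, microsecond=0)
    # Each group is keyed by both the time bucket and the series value.
    groups[(bucket_start, method)] += 1

print(len(groups))                   # number of (bucket, method) groups
print(groups[(start, "GET")])        # GET requests in the first hour
```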
Determine a Score Based on Field Value
Query
percentile(filesize, percentiles=[40,80],as=score)
| symbol := if(filesize > score_80, then=":+1:", else=if(filesize > score_40, then="so-so", else=":-1:"))
Introduction
When summarizing and displaying data, it may be necessary to derive a score or validity rating based on a test value. This can be achieved using if(), which sets the score depending on whether the underlying field is over a threshold value.
Step-by-Step
Starting with the source repository events.
percentile(filesize, percentiles=[40,80],as=score)
Calculates the percentile() for the filesize field, determining the filesize values that are above 40% and 80% of the overall event set. These are returned in the fields score_40 and score_80.
| symbol := if(filesize > score_80, then=":+1:", else=if(filesize > score_40, then="so-so", else=":-1:"))
Compares whether the filesize is greater than 80% of the events, setting symbol to :+1: if so. Because if() functions can be embedded, the else parameter is another if() statement that sets symbol to so-so if the size is greater than 40%, or :-1: otherwise.
Event Result set.
Summary and Results
Using if() is the best way to make conditional choices about values. The function has the benefit of being able to be embedded into other statements, unlike case.
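The nested condition can be mirrored in Python to make the three outcomes explicit. This sketch uses an exact nearest-rank percentile on a small invented list of file sizes, whereas LogScale's percentile() is approximate; the helpers nearest_rank and symbol_for are hypothetical names.

```python
import math

def nearest_rank(values, p):
    """Exact nearest-rank percentile (not LogScale's approximate algorithm)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def symbol_for(filesize, score_40, score_80):
    """Nested conditional equivalent to the embedded if() calls."""
    return ":+1:" if filesize > score_80 else ("so-so" if filesize > score_40 else ":-1:")

filesizes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
score_40 = nearest_rank(filesizes, 40)
score_80 = nearest_rank(filesizes, 80)

for size in (90, 50, 40):
    print(size, symbol_for(size, score_40, score_80))
```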
Show Percentiles Across Multiple Buckets
Query
bucket(span=60sec, function=percentile(field=responsetime, percentiles=[50, 75, 99, 99.9]))
Introduction
Show response time percentiles over time. Calculate percentiles per minute by bucketing into 1 minute intervals:
Step-by-Step
Starting with the source repository events.
bucket(span=60sec, function=percentile(field=responsetime, percentiles=[50, 75, 99, 99.9]))
Using a 60 second timespan for each bucket, displays the percentile() for the responsetime field.
Event Result set.
Summary and Results
The percentile() function quantifies values by determining whether a value is larger than a given percentage of the overall values. The output provides a view of the relative significance of a value. Combined in this example with bucket(), the query generates buckets of data showing the comparative response time for every 60 seconds.
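As a rough Python analogue of percentiles per bucket, the sketch below groups invented (epoch seconds, response time) pairs into 60 second buckets and computes exact nearest-rank percentiles for each. LogScale's percentile() is approximate, so real results may differ slightly; the helper names and the sample data are assumptions made for illustration.

```python
import math
from collections import defaultdict

def nearest_rank(values, p):
    """Exact nearest-rank percentile (not LogScale's approximate algorithm)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def percentiles_per_bucket(events, span_seconds=60, percentiles=(50, 75, 99, 99.9)):
    """Group (epoch_seconds, responsetime) pairs into buckets, then compute percentiles."""
    buckets = defaultdict(list)
    for epoch_seconds, responsetime in events:
        bucket_start = epoch_seconds // span_seconds * span_seconds
        buckets[bucket_start].append(responsetime)
    return {
        bucket_start: {p: nearest_rank(values, p) for p in percentiles}
        for bucket_start, values in sorted(buckets.items())
    }

events = [(0, 100), (10, 200), (30, 300), (50, 400), (65, 50), (90, 150)]
result = percentiles_per_bucket(events)
for bucket_start, values in result.items():
    print(bucket_start, values)
```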