Counts the number of events in the repository, or the events streaming through the function. The result is put in a field named _count. You can use this field name to pipe the results to other query functions or for general use.
It's possible to specify a field so that only events containing that field are counted. It's also possible to do a distinct count. When there are many distinct values, LogScale will not try to keep them all in memory. An estimate is used instead, so the result may not be an exact match.
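The three counting modes described above can be modelled in a few lines of Python. This is a hypothetical sketch, not LogScale's implementation. Events are plain dictionaries, and `count_events` and its `as_` parameter are illustrative names (`as` is a Python keyword). The distinct count here is exact rather than estimated.

```python
from typing import Iterable, Mapping, Optional

Event = Mapping[str, str]


def count_events(events: Iterable[Event], field: Optional[str] = None,
                 distinct: bool = False, as_: str = "_count") -> dict:
    """Hypothetical model of count(): returns {as_: n}."""
    events = list(events)
    if field is None:
        # No field given: every event is counted.
        n = len(events)
    elif distinct:
        # Distinct values of the field (exact here; LogScale uses an estimate).
        n = len({e[field] for e in events if field in e})
    else:
        # Only events that contain the field are counted.
        n = sum(1 for e in events if field in e)
    return {as_: n}


events = [{"method": "GET"}, {"method": "POST"}, {"method": "GET"}, {"status": "200"}]
print(count_events(events))                                 # {'_count': 4}
print(count_events(events, field="method"))                 # {'_count': 3}
print(count_events(events, field="method", distinct=True))  # {'_count': 2}
```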
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
as | string | optional[a] | _count | The name of the output field. |
distinct | boolean | optional[a] | | When specified, counts only distinct values. When this parameter is set to true, LogScale always uses an estimate, which may give an inexact result. |
field[b] | string | optional[a] | | Only events containing this field are counted. |
[a] Optional parameters use their default value unless explicitly set. |
[b] The argument name for this parameter can be omitted. |
Omitted Argument Names
The argument name for field can be omitted; the following forms of this function are equivalent:
count("value")
and:
count(field="value")
These examples show basic structure only.
Accuracy When Counting Distinct Values
When counting distinct values in a data stream, particularly one with repeated elements processed in a limited-memory environment, the accuracy of the count is limited so that the process does not consume too much memory. For example, consider counting 1,000,000 (one million) events. If each event contains a different value, memory is required to store each of those million values. Even if the field is only 10 bytes long, that is 10,000,000 bytes (roughly 9.5 MiB) of memory required to store the state. In LogScale, this affects the limits as outlined in State Sizes and Limits. As noted in that section, LogScale uses an estimation algorithm that produces an estimate of the number of distinct values while keeping memory usage to a minimum.
While the algorithm in question doesn't give any guarantees on the relative error of the reported result, the typical accuracy (standard error) is better than 2%, with two-thirds of all results within 1%. In tests with up to 10^7 distinct values, the result at worst deviated by less than 2% (a deviation of 0.02 in the table below). The worst results for each test can be seen in the table below:
Distinct Values | Result of distinct count | Deviation ((result - actual) / result) |
---|---|---|
10 | 10 | 0 |
100 | 100 | 0 |
1000 | 995 | -0.005025125628 |
10000 | 10039 | 0.003884849089 |
100000 | 100917 | 0.009086675189 |
1000000 | 984780 | -0.01545522858 |
10000000 | 10121302 | 0.01198482172 |
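The deviation column can be reproduced from the first two columns. Each value equals (result - actual) / result: a fraction of the reported count, which you multiply by 100 to get a percentage. The following Python check uses only the numbers from the table:

```python
# (distinct values, result of distinct count) pairs copied from the table above
ROWS = [(10, 10), (100, 100), (1000, 995), (10000, 10039),
        (100000, 100917), (1000000, 984780), (10000000, 10121302)]


def deviation(actual: int, result: int) -> float:
    """Signed deviation as a fraction of the reported result."""
    return (result - actual) / result


for actual, result in ROWS:
    d = deviation(actual, result)
    print(f"{actual:>10} {result:>10} {d:+.12f} ({d * 100:+.2f}%)")
```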
Important
For fewer than 100 distinct values, the deviation percentage will be magnified. For example, if there are only 10 distinct values, a deviation of 1 is 10%, even though it is the smallest possible deviation from the actual number of distinct values.
More typically, values used for aggregations or distinct counts will have low cardinality (that is, a small number of distinct values relative to the overall set).
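To illustrate how an estimator avoids the memory cost in the one-million-value example above, the sketch below implements HyperLogLog, a well-known distinct-count estimator. This is an assumption made for illustration only. The text above does not name LogScale's algorithm, so treat this as an example of the general technique, not as LogScale's implementation. With 2^14 registers, the expected standard error is about 1.04/√16384 ≈ 0.8%, and the state size does not grow with the number of distinct values.

```python
import hashlib
import math


def _hash64(value: str) -> int:
    """64-bit hash of a string value."""
    return int.from_bytes(hashlib.blake2b(value.encode(), digest_size=8).digest(), "big")


class HyperLogLog:
    """Minimal HyperLogLog distinct-count estimator (illustrative, not LogScale's code)."""

    def __init__(self, p: int = 14) -> None:
        self.p = p                     # number of index bits
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        h = _hash64(value)
        idx = h >> (64 - self.p)                       # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            return m * math.log(m / zeros)
        return raw


hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")  # repeated values do not change the estimate
print(round(hll.estimate()))
```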
count() Examples
Below are several examples using the count() function. Some are simple and others are more complex, with functions embedded within others.
Aggregate Status Codes by count() per Minute
Query
bucket(1min, field=status_code, function=count())
Introduction
Counts different HTTP status codes over time and buckets them into time intervals of 1 minute. Notice that we group by two fields: status_code and the implicit field _bucket.
Step-by-Step
Starting with the source repository events.
- logscale
bucket(1min, field=status_code, function=count())
Sets the bucket interval to 1 minute, aggregating the count of the field status_code.
Event Result set.
Summary and Results
Bucketing allows data to be collected according to a time range. Using the right aggregation function to quantify the values groups that information into buckets suitable for graphing, for example with a Bar Chart, with the size of each bar set by the declared function result, count() in this example.
Aggregate Status Codes by count() Per Minute
Time series aggregate status codes by count() per minute into buckets
Query
bucket(1min, field=status_code, function=count())
Introduction
Bucketing is a powerful technique for optimizing data storage and query performance. Bucketing allows for data to be collected according to a time range, dividing large datasets into manageable parts, thereby making it easier to quickly find specific events. In this example, the bucket() function is used with count() to count different HTTP status codes over time and bucket them into time intervals of 1 minute.
Step-by-Step
Starting with the source repository events.
- logscale
bucket(1min, field=status_code, function=count())
Counts different HTTP status codes over time and buckets them into time intervals of 1 minute. Notice that we group by two fields: status_code field and the implicit field _bucket.
Event Result set.
Summary and Results
The query uses bucketing, which can help optimize data storage and query performance. Bucketing allows for data to be collected according to a time range. Using the right aggregation function to quantify the values groups that information into buckets suitable for graphing, for example with a Bar Chart, with the size of each bar set by the declared function result, count() in this example.
Alert Query for Parser Issues
Reporting errors
Query
#type=humio #kind=logs| loglevel=WARN| class = c.h.d.ParserLimitingJob| "Setting reject ingest for"| groupby(id, function=[count(), min(@timestamp), max(@timestamp)] )| timeDiff:=_max-_min| timeDiff > 300000 and _count > 10
Introduction
This alert query tries to balance reacting when there are problems with parsers, without being too restrictive.
Step-by-Step
Starting with the source repository events.
- logscale
#type=humio #kind=logs
Filters on all logs across all hosts in the cluster.
- logscale
| loglevel=WARN
Filters for all events where the loglevel is equal to WARN.
- logscale
| class = c.h.d.ParserLimitingJob
Filters for events where the class field is equal to c.h.d.ParserLimitingJob.
- logscale
| "Setting reject ingest for"
Filters for events containing the string Setting reject ingest for. This is the message generated when ingested events are rejected.
- logscale
| groupby(id, function=[count(), min(@timestamp), max(@timestamp)] )
Groups the returned results by the field id, counts the events in each group, and returns the minimum and maximum timestamp. This returns a new event set, with the fields id, _count, _min, and _max.
- logscale
| timeDiff:=_max-_min
Calculates the time difference between the maximum timestamp values and the minimum timestamp values and returns the result in a new field named timeDiff.
- logscale
| timeDiff > 300000 and _count > 10
Returns all events where the value of timeDiff is greater than 300000 (milliseconds, that is, 5 minutes) and where there are more than 10 occurrences.
Event Result set.
Summary and Results
This query is used to set up alerts for parser issues. Setting up alerts for parser issues allows you to proactively reach out to customers whose queries are being throttled and help them.
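The alert logic can be modelled in Python to show what each step contributes. This is a hypothetical sketch. Events are dictionaries with @timestamp in milliseconds, and the free-text match is assumed to apply to an @rawstring key. The #type and #kind tag filter is assumed to have been applied already, and the function name is illustrative.

```python
def parser_alerts(events, min_diff_ms=300_000, min_count=10):
    """Hypothetical model of the alert query; events are dicts with @timestamp in ms."""
    groups = {}
    for e in events:
        # loglevel=WARN | class = c.h.d.ParserLimitingJob | "Setting reject ingest for"
        if e.get("loglevel") != "WARN":
            continue
        if e.get("class") != "c.h.d.ParserLimitingJob":
            continue
        if "Setting reject ingest for" not in e.get("@rawstring", ""):
            continue
        ts = e["@timestamp"]
        # groupby(id, function=[count(), min(@timestamp), max(@timestamp)])
        g = groups.setdefault(e["id"], {"_count": 0, "_min": ts, "_max": ts})
        g["_count"] += 1
        g["_min"] = min(g["_min"], ts)
        g["_max"] = max(g["_max"], ts)
    results = []
    for group_id, g in groups.items():
        time_diff = g["_max"] - g["_min"]  # timeDiff:=_max-_min
        # timeDiff > 300000 and _count > 10
        if time_diff > min_diff_ms and g["_count"] > min_count:
            results.append({"id": group_id, **g, "timeDiff": time_diff})
    return results
```

A parser that keeps rejecting ingest for more than five minutes, with more than ten warnings, produces one result row. Short bursts and isolated warnings do not.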
Bucket Events Summarized by count()
Query
bucket(function=count())
Introduction
Divides the search time interval into buckets. As the time span is not specified, the search interval is divided into 127 buckets. Events in each bucket are counted.
Step-by-Step
Starting with the source repository events.
- logscale
bucket(function=count())
Summarizes events using the count() function into buckets across the selected timespan.
Event Result set.
Summary and Results
This query organizes data into buckets and counts the number of events in each bucket.
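As a rough model of the default bucketing, the sketch below divides an interval into 127 equal-width buckets and counts the events in each. Equal widths are an assumption. LogScale may round the bucket span, so actual bucket boundaries can differ. The function name is hypothetical.

```python
def bucket_counts(timestamps, start_ms, end_ms, buckets=127):
    """Count timestamps (ms) per equal-width bucket over [start_ms, end_ms)."""
    width = (end_ms - start_ms) / buckets
    counts = [0] * buckets
    for ts in timestamps:
        if start_ms <= ts < end_ms:
            # Clamp to the last bucket in case of floating-point rounding.
            i = min(int((ts - start_ms) // width), buckets - 1)
            counts[i] += 1
    return counts
```

For example, over a 127-second interval, each bucket covers one second (1000 ms).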
Calculate a Percentage of Successful Status Codes Over Time
Query
| success := if(status >= 500, then=0, else=1)| timechart(series=customer,function=[{[sum(success,as=success),count(as=total)]| pct_successful := (success/total)*100| drop([success,total])}],span=15m,limit=100)
Introduction
Calculates a percentage of successful status codes inside the function parameter of the timeChart() function.
Step-by-Step
Starting with the source repository events.
- logscale
| success := if(status >= 500, then=0, else=1)
Adds a success field according to the following conditions: if the value of field status is greater than or equal to 500, sets the value of success to 0, otherwise to 1.
- logscale
| timechart(series=customer,function= [ { [sum(success,as=success),count(as=total)]
Creates a new timechart with one series for each value of the customer field, using a compound function. In this example, the function parameter is an array containing an embedded aggregate (defined using the {} syntax), which itself starts with an array of two aggregates: sum() and count(). Within each time bucket and customer, sum() adds up the 1 or 0 values generated by the if() function (counting the successful requests), and count() counts all the events. These values are assigned to the success and total fields. Note that at this point we are still within the embedded aggregate, so the two new fields exist only in the context of that aggregate.
- logscale
| pct_successful := (success/total)*100
Calculates the percentage of events that are successful. We are still within the embedded aggregate, so the output of this step is an embedded set of events with the success, total, and pct_successful values for each customer and time bucket.
- logscale
| drop([success,total])}],span=15m,limit=100)
Still within the embedded aggregate, drops the success and total fields. These fields were only needed to calculate the percentage of successful results and are not needed in the result set. Finally, span=15m sets the bucket size to 15 minutes, and limit=100 limits the chart to at most 100 series.
Event Result set.
Summary and Results
This query shows how an embedded aggregate can be used to generate a sequence of values that can be formatted (in this case to calculate percentages) and generate a new event series for the aggregate values.
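The per-group arithmetic of the embedded aggregate can be sketched in Python. Grouping by 15-minute bucket and customer mirrors span=15m and series=customer. The function name and event layout are hypothetical, buckets are assumed to align to multiples of 15 minutes since the epoch, and the limit=100 series cap is not modelled.

```python
from collections import defaultdict


def pct_successful(events, span_ms=15 * 60_000):
    """Per (15-minute bucket, customer): percentage of events with status below 500."""
    acc = defaultdict(lambda: [0, 0])  # key -> [success, total]
    for e in events:
        success = 0 if e["status"] >= 500 else 1  # if(status >= 500, then=0, else=1)
        key = (e["@timestamp"] - e["@timestamp"] % span_ms, e["customer"])
        acc[key][0] += success  # sum(success, as=success)
        acc[key][1] += 1        # count(as=total)
    # pct_successful := (success/total)*100; success and total are then dropped
    return {key: (s / t) * 100 for key, (s, t) in acc.items()}
```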
Count Events per Repository
Count of the events received by repository
Query
bucket(span=1d,field=#repo,function=count())| @timestamp:=_bucket| drop(_bucket)
Introduction
Counts the events received by each repository (Cloud).
Step-by-Step
Starting with the source repository events.
- logscale
bucket(span=1d,field=#repo,function=count())
Divides the events into buckets of one day (span=1d) and counts the events for each value of the #repo field within each bucket.
- logscale
| @timestamp:=_bucket
Sets @timestamp to the start time of each bucket, which the bucket() function stores in the _bucket field.
- logscale
| drop(_bucket)
Discards the _bucket field from the results.
Event Result set.
Summary and Results
The query can be run on each repo. Or, create a view that looks across multiple repos and then run it from there to get all the repo counts in one search.
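A Python model of the daily per-repository count can show how the three steps combine. It assumes @timestamp values in milliseconds since the epoch and day buckets aligned to UTC midnight. Both are assumptions, and the function name is hypothetical.

```python
from collections import Counter

DAY_MS = 86_400_000  # one day in milliseconds


def events_per_repo_per_day(events):
    """Hypothetical model of bucket(span=1d, field=#repo, function=count())."""
    counts = Counter()
    for e in events:
        day_start = e["@timestamp"] - e["@timestamp"] % DAY_MS  # _bucket: start of the day
        counts[(day_start, e["#repo"])] += 1
    # @timestamp:=_bucket, then drop(_bucket)
    return [{"@timestamp": day, "#repo": repo, "_count": n}
            for (day, repo), n in sorted(counts.items())]
```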
Count Total of Malware and Nonmalware Events
Count the malware and nonmalware events as a percentage of the total
Query
[count(malware, as=_malware), count(nonmalware, as=_nonmalware)]| total := _malware + _nonmalware| nonmalware_pct_total := (_nonmalware/total)*100| malware_pct_total := (_malware/total)*100
Introduction
It is possible to use the count() function to show the counts of two fields as a percentage of their total. In this example, the count() function is used to count the events containing the field malware and the events containing the field nonmalware, and the results are returned as percentages. A result set could, for example, be malware 30% and nonmalware 70%.
Step-by-Step
Starting with the source repository events.
- logscale
[count(malware, as=_malware), count(nonmalware, as=_nonmalware)]
Returns the counted results of the field malware in a field named _malware and the counted results of the field nonmalware in a field named _nonmalware.
- logscale
| total := _malware + _nonmalware
Assigns the total of these events to a new field named total.
- logscale
| nonmalware_pct_total := (_nonmalware/total)*100 | malware_pct_total := (_malware/total)*100
Calculates the _malware and _nonmalware as a percentage of the total.
Event Result set.
Summary and Results
The query is used to get an overview of the total number of malware versus nonmalware events.
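The percentage arithmetic can be checked in Python. Here, count(malware) is modelled as counting the events that contain the malware field. The function name is hypothetical, and the sketch assumes at least one matching event so that the total is non-zero.

```python
def malware_percentages(events):
    """Hypothetical model of the malware/nonmalware percentage query."""
    malware = sum(1 for e in events if "malware" in e)        # count(malware, as=_malware)
    nonmalware = sum(1 for e in events if "nonmalware" in e)  # count(nonmalware, as=_nonmalware)
    total = malware + nonmalware                              # total := _malware + _nonmalware
    return {
        "_malware": malware,
        "_nonmalware": nonmalware,
        "total": total,
        "nonmalware_pct_total": (nonmalware / total) * 100,
        "malware_pct_total": (malware / total) * 100,
    }
```

With 3 malware events and 7 nonmalware events, this gives the 30% and 70% split used as the example above.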
Create Time Chart Widget for Different Events
Query
timeChart(span=1h, function=count(), series=method)
Introduction
The time chart widget is the most commonly used widget in LogScale. It displays bucketed time series data on a timeline. The timeChart() function is used to create time chart widgets, in this example a timechart that shows the number of the different events per hour over the last 24 hours. For example, you may want to count the different kinds of HTTP methods used for requests in the logs. If those are stored in a field named method, you can use this field as a series. Furthermore, we choose to search over the last 24 hours in the time selector in the UI, and also set the span parameter to make each time bucket one hour long (with span=1h).
Step-by-Step
Starting with the source repository events.
- logscale
timeChart(span=1h, function=count(), series=method)
Creates 24 time buckets when we search over the last 24 hours, and all searched events get sorted into groups depending on the bucket they belong to (based on their @timestamp value). When all events have been divided up by time, the count() function is run on the group of events for each value of the series field, returning the number of events of each kind per hour.
Event Result set.
Summary and Results
The query is used to create timechart widgets showing the number of different kinds of events per hour over the last 24 hours. In this example we do not just have one group of events per time bucket, but multiple groups: one group for every value of method that exists in the timespan we are searching in. So if we are still searching over a 24-hour period, and we have received only GET, PUT, and POST requests in that timespan, we will get three groups of events per bucket (because we have three different values for method). Therefore, we end up with 72 groups of events (24 buckets × 3 methods). Every group contains only events which correspond to one time bucket and one specific value of method. Then count() is run on each of these groups, to give us the number of GET events per hour, PUT events per hour, and POST events per hour. When viewing and hovering over the buckets within the time chart, the display will show the precise value and time for the displayed bucket, with the time showing the point where the bucket starts.
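The grouping described above, with one group per (hour bucket, method) pair, can be sketched in Python. This is a hypothetical model. Events are dictionaries, and hourly buckets are assumed to align to multiples of one hour since the epoch. With three methods over 24 hours, it produces the 72 groups mentioned.

```python
from collections import Counter

HOUR_MS = 3_600_000  # one hour in milliseconds


def timechart_by_series(events, series="method", span_ms=HOUR_MS):
    """Hypothetical model of timeChart(span=1h, function=count(), series=method)."""
    counts = Counter()
    for e in events:
        bucket_start = e["@timestamp"] - e["@timestamp"] % span_ms
        counts[(bucket_start, e[series])] += 1  # count() per (bucket, series value)
    return counts
```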
Create Timechart Widget for All Events
Query
timeChart(span=1h, function=count())
Introduction
The time chart widget is the most commonly used widget in LogScale. It displays bucketed time series data on a timeline. The timeChart() function is used to create timechart widgets, in this example a timechart that shows the number of events per hour over the last 24 hours. We do this by selecting to search over the last 24 hours in the time selector in the UI, and then we tell the function to make each time bucket one hour long (with span=1h).
Step-by-Step
Starting with the source repository events.
- logscale
timeChart(span=1h, function=count())
Creates 24 time buckets when we search over the last 24 hours, and all searched events get sorted into groups depending on the bucket they belong to (based on their @timestamp value). When all events have been divided up by time, the count() function is run on each group, giving us the number of events per hour.
Event Result set.
Summary and Results
The query is used to create timechart widgets showing the number of events per hour over the last 24 hours. The timechart shows one group of events per time bucket. When viewing and hovering over the buckets within the time chart, the display will show the precise value and time for the displayed bucket, with the time showing the point where the bucket starts.
Count All Events
This is a simple example using the count() function. The query just counts the number of events found in the repository for the selected period of time:
count()
The result is just a single number, the total count.
_count |
---|
3886817 |
To format the result with a thousands separator:
count()
| format("%,i", field=_count, as=_count)
Produces
_count |
---|
3,886,817 |
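The "," flag in "%,i" requests grouping separators. For comparison only, Python's format specification provides the same behavior with its "," option. This illustrates the output format and is not LogScale's formatter.

```python
total = 3886817                 # the _count value from the example above
formatted = format(total, ",")  # "," option inserts a thousands separator
print(formatted)                # 3,886,817
```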
Group & Count
In this example, the query uses the count() function within the groupBy() function. The first parameter given is the field upon which to group the data. In this case, it's the HTTP method (for example, GET, PUT, POST). The second parameter says to use the function count() to count the number of occurrences for each method found.
groupby(field=method, function=count())
The result is a table with the column headings, method and _count, with the values for each:
method | _count |
---|---|
DELETE | 7375 |
GET | 153493 |
POST | 31654 |
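For illustration, the same group-and-count can be modelled in Python with collections.Counter. The function name and event layout are hypothetical. Events without the grouping field are skipped, which is an assumption about how LogScale handles missing fields.

```python
from collections import Counter


def group_count(events, field="method"):
    """Hypothetical model of groupby(field=..., function=count())."""
    counts = Counter(e[field] for e in events if field in e)
    # One row per distinct value, with the count in _count, sorted by value
    return [{field: value, "_count": n} for value, n in sorted(counts.items())]
```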
Chart of Daily Counts
Figure 111. count() Chart of Daily Counts
You can use the count() function in conjunction with the timeChart() function to count the number of occurrences of events or other factors. By default, the timeChart() function will aggregate the data by day. The results will look something like what you see in the screenshot shown in Figure 111, “count() Chart of Daily Counts”.
timechart(function=count())
Table of Daily Counts
When a user accesses a web site, the event is logged with a status. For instance, the status code 200 is returned when the request is successful, and 404 when the page is not found. To get a list of status codes returned and a count of each for a given period, you would enter the following query in the Search box:
groupby(field=status, function=count())
The sample output is shown below:
status | _count |
---|---|
101 | 9 |
200 | 55258 |
204 | 137834 |
307 | 2 |
400 | 2 |
401 | 4 |
403 | 57 |
404 | 265 |
504 | 62 |
stopping | 6 |
success | 6 |