Counts the number of events in the repository, or the number of events streaming through the function. The result is stored in the _count field (or in the field named by the as parameter), which you can pipe to other query functions or use directly.
You can specify a field, in which case only events containing that field are counted. You can also perform a distinct count. When there are many distinct values, LogScale does not try to keep them all in memory; it uses an estimate instead, so the result may not be a precise match.
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
as | string | optional[a] | _count | The name of the output field. |
distinct | boolean | optional[a] | | When specified, counts only distinct values. When this parameter is set to true, LogScale always uses an estimate, which may give an inexact result. |
field[b] | string | optional[a] | | Only events containing this field are counted. |
[a] Optional parameters use their default value unless explicitly set.
[b] The parameter name field can be omitted.
Omitted Argument Names
The argument name for field can be omitted; the following forms of this function are equivalent:
count("value")
and:
count(field="value")
These examples show basic structure only.
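To make the three counting modes concrete, here is a small Python model of the behavior described above. It is not LogScale code: events are assumed to be Python dicts, as_ stands in for the as parameter (as is a Python keyword), and rejecting distinct without a field is a choice of this sketch, not documented LogScale behavior.

```python
from typing import Any, Iterable, Mapping, Optional


def count(
    events: Iterable[Mapping[str, Any]],
    field: Optional[str] = None,
    distinct: bool = False,
    as_: str = "_count",
) -> dict:
    """Toy model of count(); returns a one-row result such as {"_count": 3}."""
    rows = list(events)
    if field is None:
        if distinct:
            # Modeling choice: this sketch only supports distinct counts on a field.
            raise ValueError("distinct=True needs a field in this model")
        return {as_: len(rows)}  # every event is counted
    # Only events that contain the field are counted.
    values = [e[field] for e in rows if field in e]
    # distinct=True counts different values exactly here; LogScale estimates.
    return {as_: len(set(values)) if distinct else len(values)}


events = [{"status": 200}, {"status": 404}, {"status": 200}, {"method": "GET"}]
print(count(events))                                 # {'_count': 4}
print(count(events, field="status"))                 # {'_count': 3}
print(count(events, field="status", distinct=True))  # {'_count': 2}
```

The last event has no status field, which is why the field-restricted counts drop to 3.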
Accuracy When Counting Distinct Values
When counting distinct values in a data stream, particularly one with repeated elements in a memory-limited environment, the accuracy of the count is limited to avoid consuming too much memory. For example, suppose you are counting 1,000,000 (one million) events and each event contains a different value. Memory is then required to track each of those million entries. Even if the field is only 10 bytes long, that is approximately 10 MB (about 9.5 MiB) of memory required to store the state. In LogScale, this affects the limits as outlined in State Sizes and Limits. As noted in that section, LogScale uses an estimation algorithm that produces an estimate of the number of distinct values while keeping memory usage to a minimum.
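This section does not name the estimation algorithm LogScale uses, so the sketch below swaps in a well-known technique purely for illustration: a K-minimum-values (KMV) estimator. Instead of storing every distinct value (about 10 MB for a million 10-byte values), it keeps only the k smallest hash values seen, so memory stays bounded by k. The function names and the choice of SHA-1 hashing are assumptions of this sketch.

```python
import hashlib
import heapq
from typing import Iterable


def _unit_hash(value: str) -> float:
    """Map a value to a pseudo-random float in (0, 1] using 53 bits of SHA-1."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    x = int.from_bytes(digest[:8], "big") >> 11  # keep 53 bits so the float is exact
    return (x + 1) / 2**53


def kmv_distinct_estimate(values: Iterable[str], k: int = 1024) -> float:
    """Estimate the number of distinct values while storing at most k hashes."""
    heap: list[float] = []     # max-heap (negated) of the k smallest hashes
    members: set[float] = set()  # the same hashes, for duplicate detection
    for v in values:
        h = _unit_hash(v)
        if h in members:
            continue  # repeated value: already accounted for
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heapreplace(heap, -h)  # drop the current largest hash
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct values: the count is exact
    return (k - 1) / -heap[0]    # k-th smallest hash tells us how dense the hashes are


print(round(kmv_distinct_estimate(str(i) for i in range(100_000))))
```

With k=1024 the typical relative error of KMV is around 3%, which is in the same spirit as (but not a claim about) the accuracy figures quoted for LogScale below.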
While the algorithm does not guarantee the relative error of the reported result, the typical accuracy (standard error) is less than 2%, with two thirds of all results within 1%. In tests with up to 10^7 distinct values, the result at worst deviated by less than 2% (a relative deviation below 0.02). The worst result for each test is shown in the table below:
Distinct values | Result of distinct count | Relative deviation (fraction of result) |
---|---|---|
10 | 10 | 0 |
100 | 100 | 0 |
1000 | 995 | -0.005025125628 |
10000 | 10039 | 0.003884849089 |
100000 | 100917 | 0.009086675189 |
1000000 | 984780 | -0.01545522858 |
10000000 | 10121302 | 0.01198482172 |
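The deviation column can be reproduced from the first two columns. The values match (result minus actual) divided by the reported result, expressed as a fraction rather than a percentage, so -0.0155 means about -1.55%. This sketch recomputes them; the function name is only illustrative.

```python
# (distinct values, reported result) pairs copied from the table above
ROWS = [
    (10, 10), (100, 100), (1000, 995), (10000, 10039),
    (100000, 100917), (1000000, 984780), (10000000, 10121302),
]


def relative_deviation(actual: int, reported: int) -> float:
    """Deviation as a fraction of the reported result (multiply by 100 for percent)."""
    return (reported - actual) / reported


for actual, reported in ROWS:
    d = relative_deviation(actual, reported)
    print(f"{actual:>9} {reported:>9} {d:+.12f} ({d * 100:+.2f}%)")
```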
Important
For fewer than 100 distinct values, the deviation percentage is magnified. For example, if there are only 10 distinct values, a deviation of 1 is 10%, even though it is the smallest possible deviation from the actual number of distinct values.
More typically, the values used for aggregations or distinct counts have low cardinality (that is, a small number of distinct values relative to the overall set of events).
count() Examples
Below are several examples using the count() function. Some are simple and others are more complex, with functions embedded within others.
Aggregate Status Codes by count() per Minute
Query
bucket(1min, field=status_code, function=count())
Introduction
Counts different HTTP status codes over time and buckets them into time intervals of 1 minute. Notice that we group by two fields: status_code and the implicit field _bucket.
Step-by-Step
Starting with the source repository events.
bucket(1min, field=status_code, function=count())
Sets the bucket interval to 1 minute and counts the events for each value of the status_code field.
Event Result set.
Summary and Results
Bucketing allows data to be collected according to a time range. Using the right aggregation function to quantify the values groups that information into buckets suitable for graphing, for example with a Bar Chart, where the size of each bar is the declared function result (count() in this example).
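The grouping on the two fields, _bucket and status_code, can be modeled in Python as below. This is a sketch, not LogScale's implementation: it assumes @timestamp is in epoch milliseconds, that buckets align to multiples of the span, and that events without status_code are left out; the function name bucket_count is hypothetical.

```python
from collections import Counter

SPAN_MS = 60_000  # 1min, assuming @timestamp is in epoch milliseconds


def bucket_count(events, field="status_code", span_ms=SPAN_MS):
    """Count events per (_bucket, field value), sorted by bucket then value."""
    counts = Counter()
    for e in events:
        if field not in e:
            continue  # assumption: events without the field are left out
        bucket_start = e["@timestamp"] - e["@timestamp"] % span_ms
        counts[(bucket_start, e[field])] += 1
    # Sorting assumes the field values are mutually comparable (all ints here).
    return [
        {"_bucket": b, field: v, "_count": n}
        for (b, v), n in sorted(counts.items())
    ]


events = [
    {"@timestamp": 0, "status_code": 200},
    {"@timestamp": 30_000, "status_code": 200},
    {"@timestamp": 59_999, "status_code": 500},
    {"@timestamp": 60_000, "status_code": 200},
]
for row in bucket_count(events):
    print(row)
```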
Aggregate Status Codes by count() per Minute
Time series aggregate status codes by count() per minute into buckets
Query
bucket(1min, field=status_code, function=count())
Introduction
Bucketing is a powerful technique for optimizing data storage and query performance. Bucketing allows data to be collected according to a time range, dividing large datasets into manageable parts and making it easier to find specific events quickly. In this example, the bucket() function is used with count() to count different HTTP status codes over time and bucket them into time intervals of 1 minute.
Step-by-Step
Starting with the source repository events.
bucket(1min, field=status_code, function=count())
Counts different HTTP status codes over time and buckets them into time intervals of 1 minute. Notice that we group by two fields: the status_code field and the implicit _bucket field.
Event Result set.
Summary and Results
The query is used to optimize data storage and query performance. Bucketing allows data to be collected according to a time range. Using the right aggregation function to quantify the values groups that information into buckets suitable for graphing, for example with a Bar Chart, where the size of each bar is the declared function result (count() in this example).
Alert Query for Parser Issues
Reporting errors
Query
#type=humio #kind=logs
| loglevel=WARN
| class = c.h.d.ParserLimitingJob
| "Setting reject ingest for"
| groupBy(id, function=[count(), min(@timestamp), max(@timestamp)] )
| timeDiff:=_max-_min
| timeDiff > 300000 and _count > 10
Introduction
This alert query tries to strike a balance between reacting when there are problems with parsers and not being too restrictive.
Step-by-Step
Starting with the source repository events.
#type=humio #kind=logs
Filters on all logs across all hosts in the cluster.
| loglevel=WARN
Filters for all events where the loglevel field is equal to WARN.
| class = c.h.d.ParserLimitingJob
Filters for events where the class field is equal to c.h.d.ParserLimitingJob.
| "Setting reject ingest for"
Filters for events containing the string Setting reject ingest for. This is the error message generated when ingested events are rejected.
| groupBy(id, function=[count(), min(@timestamp), max(@timestamp)] )
Groups the returned result by the field id, makes a count on the events and returns the minimum timestamp and maximum timestamp. This returns a new event set, with the fields id, _count, _min, and _max.
| timeDiff:=_max-_min
Calculates the time difference between the maximum and minimum timestamp values and returns the result in a new field named timeDiff.
| timeDiff > 300000 and _count > 10
Returns all events where the value of timeDiff is greater than 300000 (milliseconds, that is, 5 minutes) and where there are more than 10 occurrences.
Event Result set.
Summary and Results
This query is used to set up alerts for parser issues. Setting up alerts for parser issues allows you to proactively reach out to customers whose parsers are being throttled and help them.
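The groupBy, timeDiff, and threshold steps can be modeled in Python to show exactly which groups trigger the alert. This is an illustrative sketch: it assumes the events have already passed the earlier filters, that @timestamp is in epoch milliseconds, and the function name parser_alert is hypothetical.

```python
def parser_alert(events, min_span_ms=300_000, min_count=10):
    """Return groups seen more than min_count times across more than min_span_ms."""
    groups = {}
    for e in events:
        ts = e["@timestamp"]
        g = groups.setdefault(e["id"], {"id": e["id"], "_count": 0, "_min": ts, "_max": ts})
        g["_count"] += 1               # count()
        g["_min"] = min(g["_min"], ts)  # min(@timestamp)
        g["_max"] = max(g["_max"], ts)  # max(@timestamp)
    alerts = []
    for g in groups.values():
        g["timeDiff"] = g["_max"] - g["_min"]  # timeDiff := _max - _min
        if g["timeDiff"] > min_span_ms and g["_count"] > min_count:
            alerts.append(g)
    return alerts


events = (
    [{"id": "a", "@timestamp": i * 60_000} for i in range(11)]    # 11 events over 10 min
    + [{"id": "b", "@timestamp": i * 10_000} for i in range(11)]  # 11 events over 100 s
    + [{"id": "c", "@timestamp": i * 250_000} for i in range(5)]  # only 5 events
)
print([g["id"] for g in parser_alert(events)])
```

Requiring both conditions means a short burst of rejections (group b) or a handful of scattered ones (group c) does not raise an alert.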
Bucket Events Summarized by count()
Query
bucket(function=count())
Introduction
Divides the search time interval into buckets. As the time span is not specified, the search interval is divided into 127 buckets. Events in each bucket are counted.
Step-by-Step
Starting with the source repository events.
bucket(function=count())
Summarizes events using the count() function into buckets across the selected timespan.
Event Result set.
Summary and Results
This query organizes events into time buckets and counts the events in each bucket.
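A Python model of dividing the search interval into 127 equal buckets is shown below. It only illustrates the arithmetic; LogScale may align bucket boundaries differently, and the function name bucket_counts is hypothetical. Over a 24-hour search, each bucket would be roughly 11.3 minutes wide.

```python
def bucket_counts(timestamps, start, end, buckets=127):
    """Split [start, end) into `buckets` equal-width buckets and count events in each."""
    width = (end - start) / buckets
    counts = [0] * buckets
    for t in timestamps:
        if start <= t < end:
            # min() guards against float rounding pushing t past the last bucket
            counts[min(int((t - start) / width), buckets - 1)] += 1
    return counts


DAY_MS = 24 * 60 * 60 * 1000
print(DAY_MS / 127 / 60_000)  # width of one bucket in minutes for a 24-hour search
```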
Calculate a Percentage of Successful Status Codes Over Time
Query
| success := if(status >= 500, then=0, else=1)
| timeChart(series=customer,function=
[
{
[sum(success,as=success),count(as=total)]
| pct_successful := (success/total)*100
| drop([success,total])}],span=15m,limit=100)
Introduction
Calculates the percentage of successful status codes inside the timeChart() function.
Step-by-Step
Starting with the source repository events.
| success := if(status >= 500, then=0, else=1)
Adds a success field under the following conditions: if the value of the status field is greater than or equal to 500, set the value of success to 0; otherwise, set it to 1.
| timeChart(series=customer,function= [ { [sum(success,as=success),count(as=total)]
Creates a new timechart with one series for each value of customer, using a compound function. In this example, the embedded function generates an array of values, but the array values are generated by an embedded aggregate. The embedded aggregate (defined using the {} syntax) creates a sum() and a count() value across the events in each series and time bucket. The sum() adds up the 1 or 0 values generated by the if() function, so it counts the successful events, while count() counts all the events. These values are assigned to the success and total fields. Note that at this point we are still within the aggregate, so the two new fields exist only within the context of the aggregate.
| pct_successful := (success/total)*100
Calculates the percentage of events that are successful. We are still within the aggregate, so the output of this step is an embedded set of events with the success, total, and pct_successful values for each customer series and time bucket.
| drop([success,total])}],span=15m,limit=100)
Still within the embedded aggregate, drops the total and success fields from the array generated by the aggregate. These fields were only needed temporarily to calculate the percentage of successful results and are not needed in the result set. Finally, the span of each bucket is set to 15 minutes and the chart is limited to 100 series.
Event Result set.
Summary and Results
This query shows how an embedded aggregate can be used to generate a sequence of values that can be formatted (in this case to calculate percentages) and generate a new event series for the aggregate values.
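The per-series, per-bucket calculation can be modeled in Python as follows. This sketch assumes @timestamp is in epoch milliseconds and that buckets align to multiples of 15 minutes; it does not model the limit=100 series cap, and the function name success_pct is hypothetical.

```python
from collections import defaultdict

SPAN_MS = 15 * 60 * 1000  # span=15m, assuming @timestamp in epoch milliseconds


def success_pct(events):
    """Return {(bucket_start, customer): pct_successful} for each series and bucket."""
    acc = defaultdict(lambda: [0, 0])  # (bucket, customer) -> [success, total]
    for e in events:
        success = 0 if e["status"] >= 500 else 1  # if(status >= 500, then=0, else=1)
        key = (e["@timestamp"] - e["@timestamp"] % SPAN_MS, e["customer"])
        acc[key][0] += success  # sum(success, as=success)
        acc[key][1] += 1        # count(as=total)
    # pct_successful := (success/total)*100; success and total are then dropped
    return {key: (s / t) * 100 for key, (s, t) in acc.items()}


events = [
    {"@timestamp": 0, "customer": "acme", "status": 200},
    {"@timestamp": 1000, "customer": "acme", "status": 503},
    {"@timestamp": 2000, "customer": "acme", "status": 200},
    {"@timestamp": 3000, "customer": "acme", "status": 404},
    {"@timestamp": 0, "customer": "beta", "status": 500},
    {"@timestamp": SPAN_MS, "customer": "acme", "status": 200},
]
print(success_pct(events))
```

Note that a 404 counts as successful here, because the if() condition only treats status codes of 500 and above as failures.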
Collect and Group Events by Specified Field - Example 2
Collect and group events by specified field using collect() as part of a groupBy() operation
Query
LocalAddressIP4 = * RemoteAddressIP4 = * aip = *
| groupBy([LocalAddressIP4, RemoteAddressIP4], function=([count(aip, as=aipCount, distinct=true), collect([aip])]))
Introduction
The collect() function can be used to collect fields from multiple events into one event as part of a groupBy() operation. The groupBy() function groups together events by one or more specified fields, and can apply additional aggregations to each group, such as the count() function. In this example, the collect() function is used to collect the values of a field from multiple events.
Step-by-Step
Starting with the source repository events.
LocalAddressIP4 = * RemoteAddressIP4 = * aip = *
Filters for all events where the fields LocalAddressIP4, RemoteAddressIP4 and aip are all present. The actual values in these fields do not matter; the query just checks for their existence.
| groupBy([LocalAddressIP4, RemoteAddressIP4], function=([count(aip, as=aipCount, distinct=true), collect([aip])]))
Groups the returned results by the fields LocalAddressIP4 and RemoteAddressIP4, collects all the values of the aip field into an array, and performs a distinct count on the aip field. The count of distinct aip values is returned in a new field named aipCount.
Event Result set.
Summary and Results
The query is used to collect fields from multiple events into one event. Use collect() on smaller data sets to create a list, set, or map when you explicitly need such an object (for example, in order to pass it on to some other API). Using collect() on larger data sets may cause the query to run out of memory, as it returns the entire data set. The query is useful for network connection analysis and for identifying potential threats.
Sample output might look like this:
LocalAddressIP4 | RemoteAddressIP4 | aipCount | aip |
---|---|---|---|
192.168.1.100 | 203.0.113.50 | 3 | [10.0.0.1, 10.0.0.2, 10.0.0.3] |
10.0.0.5 | 198.51.100.75 | 1 | [172.16.0.1] |
172.16.0.10 | 8.8.8.8 | 5 | [192.0.2.1, 192.0.2.2, 192.0.2.3, 192.0.2.4, 192.0.2.5] |
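The filter, grouping, distinct count, and collection steps can be modeled in Python as below. This is a sketch under stated assumptions: it keeps the collected aip values distinct and sorted, which may differ from how collect() orders or de-duplicates values, and the function name group_collect is hypothetical.

```python
def group_collect(events):
    """Group by (LocalAddressIP4, RemoteAddressIP4); count and collect distinct aip."""
    required = ("LocalAddressIP4", "RemoteAddressIP4", "aip")
    groups = {}
    for e in events:
        if not all(f in e for f in required):
            continue  # LocalAddressIP4 = * RemoteAddressIP4 = * aip = *
        key = (e["LocalAddressIP4"], e["RemoteAddressIP4"])
        groups.setdefault(key, []).append(e["aip"])
    rows = []
    for (local, remote), aips in groups.items():
        distinct = sorted(set(aips))  # assumption: distinct values, sorted as strings
        rows.append({
            "LocalAddressIP4": local,
            "RemoteAddressIP4": remote,
            "aipCount": len(distinct),  # count(aip, as=aipCount, distinct=true)
            "aip": distinct,            # collect([aip])
        })
    return rows


events = [
    {"LocalAddressIP4": "10.0.0.5", "RemoteAddressIP4": "198.51.100.75", "aip": "172.16.0.1"},
    {"LocalAddressIP4": "10.0.0.5", "RemoteAddressIP4": "198.51.100.75", "aip": "172.16.0.1"},
    {"LocalAddressIP4": "10.0.0.5", "RemoteAddressIP4": "198.51.100.75", "aip": "172.16.0.2"},
    {"LocalAddressIP4": "10.0.0.5", "RemoteAddressIP4": "198.51.100.75"},  # no aip: filtered out
]
print(group_collect(events))
```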
Count Events per Repository
Count of the events received by repository
Query
bucket(span=1d,field=#repo,function=count())
| @timestamp:=_bucket
| drop(_bucket)
Introduction
Counts the events received by each repository per day (LogScale Cloud).
Step-by-Step
Starting with the source repository events.
bucket(span=1d,field=#repo,function=count())
Divides the events into 1-day buckets for each value of the #repo field and counts the events in each bucket.
| @timestamp:=_bucket
Updates @timestamp to the _bucket value generated by the bucket() function.
| drop(_bucket)
Discards the _bucket field from the results.
Event Result set.
Summary and Results
The query can be run on each repository. Alternatively, create a view that looks across multiple repositories and run the query from there to get all the repository counts in one search.
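A Python model of the three steps (daily buckets per #repo, copying _bucket into @timestamp, dropping _bucket) follows. It assumes @timestamp is in epoch milliseconds and that daily buckets start at UTC midnight; LogScale's actual bucket alignment may differ, and daily_repo_counts is a hypothetical name.

```python
from collections import Counter

DAY_MS = 24 * 60 * 60 * 1000  # span=1d, assuming @timestamp in epoch milliseconds


def daily_repo_counts(events):
    """Count events per (day bucket, #repo); the bucket start becomes @timestamp."""
    counts = Counter()
    for e in events:
        day = e["@timestamp"] - e["@timestamp"] % DAY_MS  # _bucket (UTC-midnight aligned)
        counts[(day, e["#repo"])] += 1
    # @timestamp := _bucket, then drop(_bucket): only @timestamp keeps the bucket start
    return [
        {"@timestamp": day, "#repo": repo, "_count": n}
        for (day, repo), n in sorted(counts.items())
    ]


events = [
    {"@timestamp": 0, "#repo": "web"},
    {"@timestamp": DAY_MS - 1, "#repo": "web"},
    {"@timestamp": DAY_MS, "#repo": "web"},
    {"@timestamp": 5, "#repo": "db"},
]
for row in daily_repo_counts(events):
    print(row)
```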
Count Total of Malware and Nonmalware Events
Count total of malware and nonmalware events as percentages
Query
[count(malware, as=_malware), count(nonmalware, as=_nonmalware)]
| total := _malware + _nonmalware
| nonmalware_pct_total := (_nonmalware/total)*100
| malware_pct_total := (_malware/total)*100
Introduction
It is possible to use the count() function to show the counts of two fields as percentages of their total. In this example, the count() function is used to count the field malware and the field nonmalware and have the results returned as percentages. A result set could, for example, be malware 30% and nonmalware 70%.
Step-by-Step
Starting with the source repository events.
[count(malware, as=_malware), count(nonmalware, as=_nonmalware)]
Returns the counted results of the field malware in a field named _malware and the counted results of the field nonmalware in a field named _nonmalware.
| total := _malware + _nonmalware
Assigns the total of these events to a new field named total.
| nonmalware_pct_total := (_nonmalware/total)*100 | malware_pct_total := (_malware/total)*100
Calculates _malware and _nonmalware as percentages of the total, returned in the fields malware_pct_total and nonmalware_pct_total.
Event Result set.
Summary and Results
The query is used to get an overview of the total number of malware versus nonmalware events.
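The arithmetic of this query can be checked with a short Python model. As in the query, count(malware) is modeled as the number of events that contain a malware field; events are assumed to be dicts and malware_split is a hypothetical name. The model does not guard against a zero total.

```python
def malware_split(events):
    """Model of the count/percentage pipeline; assumes total > 0."""
    malware = sum(1 for e in events if "malware" in e)        # count(malware, as=_malware)
    nonmalware = sum(1 for e in events if "nonmalware" in e)  # count(nonmalware, as=_nonmalware)
    total = malware + nonmalware                              # total := _malware + _nonmalware
    return {
        "_malware": malware,
        "_nonmalware": nonmalware,
        "total": total,
        "nonmalware_pct_total": (nonmalware / total) * 100,
        "malware_pct_total": (malware / total) * 100,
    }


events = [{"malware": 1}] * 3 + [{"nonmalware": 1}] * 7
print(malware_split(events))
```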
Create Time Chart Widget for Different Events
Query
timeChart(span=1h, function=count(), series=method)
Introduction
The time chart widget is the most commonly used widget in LogScale. It displays bucketed time series data on a timeline. The timeChart() function is used to create time chart widgets; in this example, a timechart that shows the number of different events per hour over the last 24 hours. For example, you may want to count the different kinds of HTTP methods used for requests in the logs. If those are stored in a field named method, you can use this field as a series. Furthermore, we choose to search over the last 24 hours in the time selector in the UI, and set the span parameter to make each time bucket one hour long (with span=1h).
Step-by-Step
Starting with the source repository events.
timeChart(span=1h, function=count(), series=method)
Creates 24 time buckets when we search over the last 24 hours, and all searched events get sorted into groups depending on the bucket they belong to (based on their @timestamp value). When all events have been divided up by time, the count() function is run on each group (one per bucket and value of the series field) to return the number of each kind of event per hour.
Event Result set.
Summary and Results
The query is used to create timechart widgets showing the number of different kinds of events per hour over the last 24 hours. In this example we do not just have one group of events per time bucket, but multiple groups: one group for every value of method that exists in the timespan we are searching in. So if we are still searching over a 24 hour period, and we have received only GET, PUT, and POST requests in that timespan, we will get three groups of events per bucket (because we have three different values for method). Therefore, we end up with 72 groups of events (24 buckets × 3 methods). Every group contains only events which correspond to a specific time bucket and a specific value of method. Then count() is run on each of these groups, to give us the number of GET events per hour, PUT events per hour, and POST events per hour. When viewing and hovering over the buckets within the time chart, the display will show the precise value and time for the displayed bucket, with the time showing the point where the bucket starts.
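The 72-group arithmetic can be reproduced with a small Python model of bucketing by hour and splitting by series. It assumes @timestamp is in epoch milliseconds and that the 24-hour window starts at start_ms; time_chart_counts is a hypothetical name, not a LogScale function.

```python
from collections import Counter

HOUR_MS = 60 * 60 * 1000  # span=1h, with @timestamp in epoch milliseconds


def time_chart_counts(events, start_ms, hours=24, series="method"):
    """Count events per (hour bucket index, series value) within the search window."""
    counts = Counter()
    for e in events:
        offset = e["@timestamp"] - start_ms
        if 0 <= offset < hours * HOUR_MS:
            bucket = offset // HOUR_MS  # bucket starts at start_ms + bucket * HOUR_MS
            counts[(bucket, e.get(series))] += 1
    return counts


# One GET, PUT and POST request in every hour of a 24-hour window.
start = 0
events = [
    {"@timestamp": start + h * HOUR_MS + 1, "method": m}
    for h in range(24)
    for m in ("GET", "PUT", "POST")
]
groups = time_chart_counts(events, start)
print(len(groups))  # 72 groups: 24 buckets x 3 methods
```

Dropping the series split (keying on the bucket alone) would give the 24 groups of the all-events timechart instead.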
Create Timechart Widget for All Events
Query
timeChart(span=1h, function=count())
Introduction
The time chart widget is the most commonly used widget in LogScale. It displays bucketed time series data on a timeline. The timeChart() function is used to create timechart widgets; in this example, a timechart that shows the number of events per hour over the last 24 hours. We do this by selecting to search over the last 24 hours in the time selector in the UI, and then telling the function to make each time bucket one hour long (with span=1h).
Step-by-Step
Starting with the source repository events.
timeChart(span=1h, function=count())
Creates 24 time buckets when we search over the last 24 hours, and all searched events get sorted into groups depending on the bucket they belong to (based on their @timestamp value). When all events have been divided up by time, the count() function is run on each group, giving us the number of events per hour.
Event Result set.
Summary and Results
The query is used to create timechart widgets showing number of events per hour over the last 24 hours. The timechart shows one group of events per time bucket. When viewing and hovering over the buckets within the time chart, the display will show the precise value and time for the displayed bucket, with the time showing the point where the bucket starts.
Get List of Status Codes
Get list of status codes returned and a count of each for a given period using the groupBy() function with count()
Query
groupBy(field=status, function=count())
Introduction
The groupBy() function is used to group together events by one or more specified fields, and can apply additional aggregations to each group, such as the count() function. In this example, the groupBy() function is used to get a list of status codes for logged events. For instance, the status code 200 is returned when the request is successful, and 404 when the page is not found.
Step-by-Step
Starting with the source repository events.
groupBy(field=status, function=count())
Groups events by the status field, and counts the number of events in each group.
It is possible to enhance the query for more detailed analysis. The following query groups by both the status and source fields, limits the result to 1000 groups, and sorts by count in descending order.
groupBy(field=[status, source], function=count(), limit=1000)
| sort(_count, order=desc)
Event Result set.
Summary and Results
The query is used to extract a list of status codes, each with a count of how many events have that status. The query is useful for summarizing and analyzing log data.
Sample output from the incoming example data:
status | _count |
---|---|
101 | 17 |
200 | 46183 |
204 | 3 |
307 | 1 |
400 | 2893 |
401 | 4 |
Failure | 1 |
Success | 8633 |
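Both the single-field query and the two-field, sorted variant can be modeled with one Python function. This sketch assumes events lacking any grouping field are skipped and does not model the limit parameter; group_by_count is a hypothetical name.

```python
from collections import Counter


def group_by_count(events, fields):
    """Count events per combination of field values, sorted by _count descending."""
    counts = Counter(
        tuple(e[f] for f in fields)
        for e in events
        if all(f in e for f in fields)  # assumption: events missing a field are skipped
    )
    rows = [dict(zip(fields, key), _count=n) for key, n in counts.items()]
    rows.sort(key=lambda r: r["_count"], reverse=True)  # sort(_count, order=desc)
    return rows


events = [{"status": s} for s in (200, 200, 404, 200, 500, 404)]
print(group_by_count(events, ["status"]))
```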
Count All Events
This is a simple example using the count() function. The query just counts the number of events found in the repository for the period of time selected:
count()
The result is just a single number, the total count.
_count |
---|
3886817 |
To format the result with a thousands separator:
count()
| format("%,i", field=_count, as=_count)
Produces
_count |
---|
3,886,817 |
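For comparison, the same grouping of digits in threes can be produced in Python with the , option of the format specification. This only illustrates the output formatting; it is not how LogScale's format() function is implemented, and the helper name is hypothetical.

```python
def with_thousands_separator(n: int) -> str:
    """Insert a comma between each group of three digits."""
    return f"{n:,}"


print(with_thousands_separator(3886817))  # 3,886,817
```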
Group & Count
In this example, the query uses the count() function within the groupBy() function. The first parameter given is the field upon which to group the data. In this case, it's the HTTP method (for example, GET, PUT, POST). The second parameter says to use the function count() to count the number of occurrences for each method found.
groupBy(field=method, function=count())
The result is a table with the column headings, method and _count, with the values for each:
method | _count |
---|---|
DELETE | 7375 |
GET | 153493 |
POST | 31654 |
Chart of Daily Counts
Figure 187. count() Chart of Daily Counts
You can use the count() function in conjunction with the timeChart() function to count the number of occurrences of events or other factors. By default, the timeChart() function will aggregate the data by day. The results will look something like what you see in the screenshot shown in Figure 187, “count() Chart of Daily Counts”.
timeChart(function=count())
Table of Daily Counts
When a user accesses a web site, the event is logged with a status. For instance, the status code 200 is returned when the request is successful, and 404 when the page is not found. To get a list of status codes returned and a count of each for a given period, you would enter the following query in the Search box:
groupBy(field=status, function=count())
The sample output is shown below:
status | _count |
---|---|
101 | 9 |
200 | 55258 |
204 | 137834 |
307 | 2 |
400 | 2 |
401 | 4 |
403 | 57 |
404 | 265 |
504 | 62 |
stopping | 6 |
success | 6 |