Counts the number of events in the repository, or streaming through the function. The result is put in a field named _count. You can use this field name to pipe the results to other query functions or for general use.
It is possible to specify a field, in which case only events containing that field are counted. It is also possible to do a distinct count. When there are many distinct values, LogScale will not try to keep them all in memory; an estimate is used instead, so the result will not be a precise match.
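For example, a minimal sketch of a distinct count (the field name user_id is hypothetical and not part of this reference):

count(field=user_id, distinct=true)

This returns the estimated number of distinct user_id values among the matching events, placed in the _count field.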
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
as | string | optional[a] | _count | The name of the output field. |
distinct | boolean | optional[a] |  | When specified, counts only distinct values. When this parameter is set to true, LogScale always uses an estimate, which may give an inexact result. |
field [b] | string | optional[a] |  | The field for which only events are counted. |
[a] Optional parameters use their default value unless explicitly set.
[b] The argument name for field can be omitted.
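As a sketch of the as parameter (the output field name events is an arbitrary, hypothetical choice):

count(as=events)

The total count is placed in a field named events instead of _count.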
Omitted Argument Names

The argument name for field can be omitted; the following forms of this function are equivalent:

count("field")

and:

count(field="field")

These examples show basic structure only.
Accuracy When Counting Distinct Values
When counting distinct values in a data stream, particularly when there are repeated elements in a limited memory environment, there are limits to the accuracy of the count in order to avoid consuming too much memory in the process. For example, consider counting 1,000,000 (one million) events. If each event contains a different value, memory is required to store the count for each of those million entries. Even if the field is only 10 bytes long, approximately 9MB of memory is required to store the state. In LogScale, this affects the limits as outlined in State Sizes and Limits. As noted in that section, LogScale uses an estimation algorithm that produces an estimate of the number of distinct values while keeping memory usage to a minimum.
While the algorithm in question doesn't give any guarantees on the relative error of the reported result, the typical accuracy (standard error) is less than 2%, with two thirds of all results being within 1%. In tests with up to 10^7 distinct values, the result at worst deviated by less than 2%. The worst result for each test can be seen in the table below:
Distinct Values | Result of distinct count | Relative deviation (fraction) |
---|---|---|
10 | 10 | 0 |
100 | 100 | 0 |
1000 | 995 | -0.005025125628 |
10000 | 10039 | 0.003884849089 |
100000 | 100917 | 0.009086675189 |
1000000 | 984780 | -0.01545522858 |
10000000 | 10121302 | 0.01198482172 |
Important
For fewer than 100 distinct values, the relative deviation is exaggerated. For example, if there are only 10 distinct values, a deviation of 1 is 10%, even though it is the smallest possible deviation from the actual number of distinct values.
More typically, the values used for aggregations or distinct counts will have low cardinality (i.e. a small number of distinct values relative to the overall set).
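If an exact distinct count is needed for a low-cardinality field, one hedged alternative is to group on the field and then count the groups, subject to groupBy()'s own state limits (the field name user_id is hypothetical):

groupby(field=user_id, function=count()) | count(as=_distinct_exact)

For high-cardinality fields, count(distinct=true) with its bounded memory use is usually the better trade-off.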
count() Examples

Below are several examples using the count() function. Some are simple and others are more complex, with functions embedded within others.
Aggregating Status Codes by count() per Minute
Query
bucket(1min, field=status_code, function=count())
Introduction
Counts the different HTTP status codes over time and buckets them into time intervals of 1 minute. Notice that the query groups by two fields: status_code and the implicit field _bucket.
Step-by-Step
Starting with the source repository events
bucket(1min, field=status_code, function=count())
Sets the bucket interval to 1 minute and counts the events for each status_code within each bucket.
Summary and Results
Bucketing allows data to be collected according to a time range. Using the right aggregation function to quantify the values groups that information into buckets suitable for graphing, for example with a Bar Chart, where the size of each bar is the result of the declared function, count() in this example.
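As a hedged variation (not part of the walkthrough above), a similar per-minute count per status_code could be drawn directly with timeChart(), using status_code as the series field:

timeChart(status_code, function=count(), span=1min)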
Alert Query for Parser Issues
Query
#type=humio #kind=logs
| loglevel=WARN
| class = c.h.d.ParserLimitingJob
| "Setting reject ingest for"
| groupby(id, function=[count(), min(@timestamp), max(@timestamp)] )
| timeDiff:=_max-_min
| timeDiff > 300000 and _count > 10
Introduction
This alert query tries to strike a balance between reacting when there are problems with parsers and not being too sensitive.
Step-by-Step
Starting with the source repository events
#type=humio #kind=logs
Filters for LogScale's internal logs, selecting events where #type is humio and #kind is logs.
| loglevel=WARN
| class = c.h.d.ParserLimitingJob
Narrows the results to events from the class c.h.d.ParserLimitingJob.
| "Setting reject ingest for"
Filters for events containing the free-text string "Setting reject ingest for".
| groupby(id, function=[count(), min(@timestamp), max(@timestamp)] )
Groups the events by id, returning the count of events (_count) along with the earliest (_min) and latest (_max) @timestamp for each group.
| timeDiff:=_max-_min
Calculates the difference between the latest and earliest timestamp in each group and stores it in a new field, timeDiff.
| timeDiff > 300000 and _count > 10
// 300000 ms = 5 minutes (5 * 60 * 1000). Keeps only groups where the matching events span more than 5 minutes and there are more than 10 of them.
Summary and Results
Setting up alerts based on this query makes it possible to proactively reach out to customers that are being throttled and help them.
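As a small hedged refinement, not part of the original alert, the final filter could work in seconds rather than milliseconds by converting timeDiff first:

| timeDiffSeconds := timeDiff / 1000
| timeDiffSeconds > 300 and _count > 10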
Bucket Events Summarized by count()
Query
bucket(function=count())
Introduction
Divides the search time interval into buckets. As no time span is specified, the search interval is divided into 127 buckets. Events in each bucket are counted:
Step-by-Step
Starting with the source repository events
bucket(function=count())
Summarizes the events using count() into buckets across the selected timespan.
Summary and Results
This query organizes data into buckets according to the count of events.
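If a fixed number of buckets is preferred over the default, a hedged sketch using the buckets parameter:

bucket(buckets=24, function=count())

This divides the search interval into 24 buckets instead of the default 127.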
Count All Events
This is a simple example using the count() function.

The query just counts the number of events found in the repository for the period of time selected:

count()

The result is just a single number, the total count.
_count |
---|
3886817 |
To format the count with a thousands separator:
count() | format("%,i", field=_count, as=_count)
Produces:

_count |
---|
3,886,817 |
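To keep both the raw and the formatted value, a hedged sketch writes the formatted copy to a separate field (the field name _count_fmt is an arbitrary choice):

count() | format("%,i", field=_count, as=_count_fmt)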
Group & Count
In this example, the query uses the count() function within the groupBy() function. The first parameter given is the field upon which to group the data. In this case, it's the HTTP method (e.g., GET, PUT, POST). The second parameter says to use the function count() to count the number of occurrences of each method found.

groupby(field=method, function=count())

The result is a table with the column headings method and _count, with the values for each:
method | _count |
---|---|
DELETE | 7375 |
GET | 153493 |
POST | 31654 |
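To list the methods ordered by frequency, a hedged sketch pipes the grouped result into sort():

groupby(field=method, function=count()) | sort(_count, order=desc)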
Chart of Daily Counts
Figure 192. count() Chart of Daily Counts
You can use the count() function in conjunction with the timeChart() function to count the number of occurrences of events or other factors. By default, the timeChart() function will aggregate the data by day. The results will look something like the screenshot shown in Figure 192, "count() Chart of Daily Counts".
timechart(function=count())
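To control the interval explicitly rather than relying on the default, a hedged sketch with a one-hour span:

timechart(function=count(), span=1h)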
Table of Daily Counts
When a user accesses a web site, the event is logged with a status. For instance, the status code 200 is returned when the request is successful, and 404 when the page is not found. To get a list of status codes returned and a count of each for a given period, you would enter the following query in the Search box:
groupby(field=status, function=count())
The sample output is shown below:
status | _count |
---|---|
101 | 9 |
200 | 55258 |
204 | 137834 |
307 | 2 |
400 | 2 |
401 | 4 |
403 | 57 |
404 | 265 |
504 | 62 |
stopping | 6 |
success | 6 |
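As a hedged variation, the same grouping could be restricted to error responses by filtering first (this assumes the status values being compared are numeric):

status >= 400 | groupby(field=status, function=count())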