Counts the number of events in the repository, or the number of events streaming through the function. The result is stored in a field named _count. You can use this field name to pipe the result to other query functions or for general use.

You can specify a field so that only events containing that field are counted. You can also perform a distinct count. When there are many distinct values, LogScale does not try to keep them all in memory; it uses an estimate instead, so the result may not be exact.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| as | string | optional[a] | _count | The name of the output field. |
| distinct | boolean | optional[a] | | When specified, counts only distinct values. When this parameter is set to true, LogScale always uses an estimate, which may give an inexact result. |
| field[b] | string | optional[a] | | Only events containing this field are counted. |

[a] Optional parameters use their default value unless explicitly set

[b] The argument name field can be omitted.
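
The effect of the three parameters can be sketched outside LogScale. The following Python is only a model of the semantics described above, not LogScale code: the function name count_events, the as_ argument, and the sample events are hypothetical, and the distinct count here is exact, whereas LogScale uses an estimate. Requiring a field for a distinct count is a choice made for this sketch.

```python
from typing import Any, Dict, Iterable, Mapping, Optional

Event = Mapping[str, Any]


def count_events(
    events: Iterable[Event],
    field: Optional[str] = None,
    distinct: bool = False,
    as_: str = "_count",
) -> Dict[str, int]:
    """Model of count(): returns a single-row result keyed by the output field."""
    if field is None:
        if distinct:
            # Assumption of this sketch: a distinct count needs a field to inspect.
            raise ValueError("distinct counting requires a field")
        total = sum(1 for _ in events)  # every event is counted
    elif distinct:
        # Exact distinct count of the field's values (LogScale would estimate).
        total = len({event[field] for event in events if field in event})
    else:
        total = sum(1 for event in events if field in event)  # only events with the field
    return {as_: total}


# Hypothetical sample events.
sample_events = [
    {"method": "GET", "status": "200"},
    {"method": "POST", "status": "200"},
    {"method": "GET"},
    {"status": "404"},
]

all_events = count_events(sample_events)
with_method = count_events(sample_events, field="method")
distinct_methods = count_events(sample_events, field="method", distinct=True)
renamed = count_events(sample_events, field="status", as_="n")
```

With the sample events, all four events are counted when no field is given, three when counting on method, and two when counting distinct method values.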

Accuracy When Counting Distinct Values

When counting distinct values in a data stream, particularly one with repeated elements in a limited-memory environment, the accuracy of the count is limited in order to avoid consuming too much memory. For example, consider counting 1,000,000 events. If each event contains a different value, then memory is required to store an entry for each of those million values. Even if the field is only 10 bytes long, approximately 9.5 MiB (10,000,000 bytes) of memory is required to store the state. In LogScale, this affects the limits outlined in State Sizes and Limits. As noted in that section, LogScale uses an estimation algorithm that produces an estimate of the number of distinct values while keeping memory usage to a minimum.
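
The trade-off between exact state and a bounded estimate can be illustrated with a small Python sketch. The first lines reproduce the memory arithmetic for exact counting; the rest is a K-minimum-values estimator, chosen here only because it is short. It is not LogScale's algorithm, which this page does not name, and the names hash_to_unit, estimate_distinct, and the sketch size k=1024 are assumptions made for illustration.

```python
import hashlib
import heapq
from typing import Iterable, List, Set

# Exact counting: one stored entry per distinct value.
DISTINCT_VALUES = 1_000_000
BYTES_PER_VALUE = 10
exact_state_bytes = DISTINCT_VALUES * BYTES_PER_VALUE  # 10,000,000 bytes
exact_state_mib = exact_state_bytes / 2**20            # roughly 9.5 MiB


def hash_to_unit(value: str) -> float:
    """Map a value to a pseudo-random point in [0, 1) using SHA-256."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_distinct(values: Iterable[str], k: int = 1024) -> int:
    """Estimate the number of distinct values while keeping at most k hashes."""
    heap: List[float] = []   # negated hashes, so -heap[0] is the largest kept hash
    kept: Set[float] = set()  # hashes currently in the heap, to skip repeats
    for value in values:
        h = hash_to_unit(value)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: the count is exact
    # The k-th smallest hash sits near k / n, so n is roughly (k - 1) / that hash.
    return round((k - 1) / -heap[0])
```

The state never grows beyond k hashes regardless of how many distinct values pass through, at the cost of a relative error that shrinks roughly with the square root of k.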

While the algorithm doesn't give any guarantees on the relative error of the reported result, the typical accuracy (standard error) is less than 2%, with two-thirds of all results being within 1%. In tests with up to 10^7 distinct values, the result deviated by less than 2% at worst. The worst result for each test is shown in the table below; the deviation is listed as a fraction of the reported result, so 0.01 corresponds to 1%:

| Distinct values | Result of distinct count | Deviation (fraction) |
|---|---|---|
| 10 | 10 | 0 |
| 100 | 100 | 0 |
| 1000 | 995 | -0.005025125628 |
| 10000 | 10039 | 0.003884849089 |
| 100000 | 100917 | 0.009086675189 |
| 1000000 | 984780 | -0.01545522858 |
| 10000000 | 10121302 | 0.01198482172 |
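
As a cross-check, the deviation figures in the table can be recomputed from the first two columns. The Python sketch below assumes the deviation is calculated as (result - actual) / result, which is inferred from the numbers rather than stated on this page; the names ROWS and deviation are illustrative.

```python
# (actual distinct values, reported result, deviation as listed in the table)
ROWS = [
    (10, 10, 0.0),
    (100, 100, 0.0),
    (1_000, 995, -0.005025125628),
    (10_000, 10_039, 0.003884849089),
    (100_000, 100_917, 0.009086675189),
    (1_000_000, 984_780, -0.01545522858),
    (10_000_000, 10_121_302, 0.01198482172),
]


def deviation(actual: int, result: int) -> float:
    """Signed deviation as a fraction of the reported result (assumed formula)."""
    return (result - actual) / result


# Largest absolute difference between the recomputed and the listed deviations.
max_mismatch = max(abs(deviation(a, r) - listed) for a, r, listed in ROWS)

# Worst absolute deviation across all tests, as a fraction and as a percentage.
worst_fraction = max(abs(deviation(a, r)) for a, r, _ in ROWS)
worst_percent = 100 * worst_fraction
```

Under that assumption the recomputed values agree with the listed ones to the printed precision, and the worst case (the 1,000,000 row) is about 1.5% of the reported result.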

Important

With fewer than 100 distinct values, even a small absolute error produces a large percentage deviation. For example, with only 10 distinct values, a deviation of 1 is 10%, even though it is the smallest possible deviation from the actual number of distinct values.
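
The arithmetic behind this note is simple enough to spell out. The helper name off_by_one_percent below is hypothetical; it only shows how the smallest possible miscount scales with the number of distinct values.

```python
def off_by_one_percent(distinct_values: int) -> float:
    """Percentage deviation caused by a miscount of exactly one value."""
    return 100.0 * 1 / distinct_values


# The same absolute error shrinks in relative terms as cardinality grows.
for n in (10, 100, 1_000, 10_000):
    print(f"{n} distinct values: off by one = {off_by_one_percent(n)}%")
```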

More typically, values used for aggregations or counts for distinct values will have low cardinality (i.e. a small number of distinct values against the overall set).

count() Examples

Below are several examples using the count() function. Some are simple and others are more complex, with functions embedded within others.

Count All Events

This is a simple example using the count() function. The query counts the number of events found in the repository for the selected time period:

logscale
count()

The result is just a single number, the total count.

_count
3886817

To format the count with a thousands separator:

logscale
count() | format("%,i", field=_count, as=_count)

This produces:

_count
3,886,817

Group & Count

In this example, the query uses the count() function within the groupBy() function. The first parameter is the field on which to group the data, in this case the HTTP method (e.g., GET, PUT, POST). The second parameter tells groupBy() to use count() to count the number of occurrences of each method found.

logscale
groupBy(field=method, function=count())

The result is a table with the column headings method and _count, and the values for each:

| method | _count |
|---|---|
| DELETE | 7375 |
| GET | 153493 |
| POST | 31654 |
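
For readers more familiar with general-purpose languages, the group-and-count step corresponds roughly to the Python sketch below. It is an analogy rather than LogScale behavior: the sample events are hypothetical, and skipping events that lack the method field is an assumption of the sketch.

```python
from collections import Counter

# Hypothetical sample events; only the "method" field matters here.
events = [
    {"method": "GET", "url": "/"},
    {"method": "POST", "url": "/login"},
    {"method": "GET", "url": "/about"},
    {"method": "DELETE", "url": "/item/7"},
    {"url": "/no-method"},  # no method field, so it joins no group in this sketch
]

# Group by method and count the events in each group.
counts = Counter(event["method"] for event in events if "method" in event)

# One row per group, with the same column names as the query result.
rows = [{"method": method, "_count": n} for method, n in sorted(counts.items())]
```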

Chart of Daily Counts

Figure 110. count() Chart of Daily Counts

You can use the count() function in conjunction with the timeChart() function to count the number of occurrences of events or other factors. By default, the timeChart() function will aggregate the data by day. The results will look something like the screenshot in Figure 110, “count() Chart of Daily Counts”.

logscale
timeChart(function=count())

Table of Daily Counts

When a user accesses a web site, the event is logged with a status. For instance, the status code 200 is returned when the request is successful, and 404 when the page is not found. To get a list of status codes returned and a count of each for a given period, you would enter the following query in the Search box:

logscale
groupBy(field=status, function=count())

The sample output is shown below:

| status | _count |
|---|---|
| 101 | 9 |
| 200 | 55258 |
| 204 | 137834 |
| 307 | 2 |
| 400 | 2 |
| 401 | 4 |
| 403 | 57 |
| 404 | 265 |
| 504 | 62 |
| stopping | 6 |
| success | 6 |