Counts the number of events in the repository, or streaming through the function. The result is put in a field named _count. You can use this field name to pipe the results to other query functions or for general use.
It's possible to specify a field, in which case only events containing that field are counted. It's also possible to do a distinct count. When there are many distinct values, LogScale does not try to keep them all in memory. An estimate is used instead, so the result may not be exact.
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
as | string | optional[a] | _count | The name of the output field. |
distinct | boolean | optional[a] | | When specified, counts only distinct values. When this parameter is set to true, LogScale always uses an estimate, which may give an inexact result as the value. |
field [b] | string | optional[a] | | Only events containing this field are counted. |
[a] Optional parameters use their default value unless explicitly set |
Omitted Argument Names
The argument name for field can be omitted; the following forms of this function are equivalent:
count("field")
and:
count(field="field")
These examples show basic structure only; full examples are provided below.
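The three counting modes described above (all events, events containing a field, distinct values of a field) can be modeled in a few lines of Python. This is only an illustrative model, not LogScale code: the function name `count_events` and the sample events are hypothetical, and the distinct count here is exact, whereas LogScale reports an estimate.

```python
from typing import Iterable, Mapping, Optional

Event = Mapping[str, str]


def count_events(events: Iterable[Event],
                 field: Optional[str] = None,
                 distinct: bool = False) -> int:
    """Hypothetical model of the counting modes; not LogScale's implementation."""
    if field is None:
        # No field given: every event is counted.
        return sum(1 for _ in events)
    # Field given: only events that contain the field are considered.
    values = [event[field] for event in events if field in event]
    if distinct:
        # Exact distinct count for illustration; LogScale uses an estimate.
        return len(set(values))
    return len(values)


# Sample events invented for this sketch.
events = [
    {"method": "GET", "status": "200"},
    {"method": "GET", "status": "404"},
    {"method": "POST"},
    {"status": "200"},
]

print(count_events(events))                                 # all events
print(count_events(events, field="method"))                 # events with a method
print(count_events(events, field="status", distinct=True))  # distinct statuses
```

With the sample data, the event that lacks a method field is skipped in the second call, and the repeated status value is counted once in the third.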
Accuracy When Counting Distinct Values
When counting distinct values in a data stream with limited memory, the accuracy of the count is limited so that the process does not consume too much memory. For example, consider counting 1,000,000 (one million) events. If each event contains a different value, then memory is required to store each of those million entries. Even if the field is only 10 bytes long, approximately 9 MB of memory is required to store the state. In LogScale, this affects the limits as outlined in State Sizes and Limits. As noted in that section, LogScale uses an estimation algorithm that produces an estimate of the number of distinct values while keeping memory usage to a minimum.
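To make the trade-off concrete, the sketch below first reproduces the memory arithmetic from the paragraph above and then shows one way a distinct count can be estimated with a fixed amount of state. The estimator is a k-minimum-values sketch chosen purely for illustration; the source does not say which algorithm LogScale uses, and the names `exact_state_bytes`, `estimate_distinct`, and the parameter `k` are assumptions made for this example.

```python
import hashlib
import heapq
from typing import Iterable


def exact_state_bytes(distinct_values: int, bytes_per_value: int) -> int:
    """Memory needed to remember every distinct value exactly (values only)."""
    return distinct_values * bytes_per_value


def _hash01(value: str) -> float:
    """Map a value to a pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_distinct(values: Iterable[str], k: int = 1024) -> int:
    """k-minimum-values estimate: keeps at most k hashes, whatever the input size.

    Illustrative stand-in only; not the algorithm LogScale uses.
    """
    heap = []        # max-heap (via negation) holding the k smallest hashes
    members = set()  # hashes currently in the heap, to skip repeats
    for value in values:
        h = _hash01(value)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct values seen: count is exact
    # The k-th smallest hash sits near k / n, so n is roughly (k - 1) / that hash.
    return int((k - 1) / -heap[0])


# 1,000,000 distinct values of 10 bytes each, as in the text.
size = exact_state_bytes(1_000_000, 10)
print(f"{size} bytes = {size / 2**20:.2f} MiB")

# 200,000 events containing 50,000 distinct values.
stream = (f"value-{i % 50_000}" for i in range(200_000))
print(estimate_distinct(stream))
```

The estimator's state never grows beyond `k` hashes, which is why the result is approximate once the number of distinct values exceeds `k`.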
While the algorithm in question doesn't give any guarantees on the relative error of the reported result, the typical accuracy (standard error) is less than 2%, with two thirds of all results being within 1%. In tests with up to 10^7 distinct values, the result at worst deviated by less than 2%; note that the deviation values in the table are expressed as fractions, so 0.0155 corresponds to about 1.5%. The worst results for each test can be seen in the table below:
Distinct Values | Result of distinct count | Deviation (fraction of result) |
---|---|---|
10 | 10 | 0 |
100 | 100 | 0 |
1000 | 995 | -0.005025125628 |
10000 | 10039 | 0.003884849089 |
100000 | 100917 | 0.009086675189 |
1000000 | 984780 | -0.01545522858 |
10000000 | 10121302 | 0.01198482172 |
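The deviation column can be checked with a short calculation. The sketch below assumes the deviation is computed as (reported - actual) / reported; this formula is inferred from the numbers in the table rather than stated in the source, and the function name `deviation` is chosen for this example.

```python
# (actual distinct values, result of distinct count) pairs from the table above.
rows = [
    (10, 10),
    (100, 100),
    (1_000, 995),
    (10_000, 10_039),
    (100_000, 100_917),
    (1_000_000, 984_780),
    (10_000_000, 10_121_302),
]


def deviation(actual: int, reported: int) -> float:
    """Deviation as a fraction of the reported result (inferred formula)."""
    return (reported - actual) / reported


for actual, reported in rows:
    d = deviation(actual, reported)
    # Print the fraction and the same value as a percentage.
    print(f"{actual:>10} {reported:>10} {d:+.6f} {d * 100:+.2f}%")
```

Under this assumption, the largest deviation in the table is roughly 1.5% in absolute terms, for the test with 1,000,000 distinct values.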
Important
For fewer than 100 distinct values, the deviation percentage is amplified. For example, if there are only 10 distinct values, a deviation of 1 is 10%, even though it is the smallest possible deviation from the actual number of distinct values.
More typically, values used for aggregations or counts for distinct values will have low cardinality (i.e. a small number of distinct values against the overall set).
count() Examples
Below are several examples using the count() function. Some are simple and others are more complex, with functions embedded within others.
Count All Events
This is a simple example using the count() function. The query just counts the number of events found in the repository for the selected period of time:
count()
The result is just a single number, the total count.
_count |
---|
3886817 |
To format the result with a thousands separator:
count() | format("%,i", field=_count, as=_count)
This produces:
_count |
---|
3,886,817 |
Group & Count
In this example, the query uses the count() function within the groupBy() function. The first parameter given is the field upon which to group the data. In this case, it's the HTTP method (e.g., GET, PUT, POST). The second parameter says to use the function count() to count the number of occurrences for each method found.
groupby(field=method, function=count())
The result is a table with the column headings, method and _count, with the values for each:
method | _count |
---|---|
DELETE | 7375 |
GET | 153493 |
POST | 31654 |
Chart of Daily Counts
Figure 189. count() Chart of Daily Counts
You can use the count() function in conjunction with the timeChart() function to count the number of occurrences of events or other factors. By default, the timeChart() function will aggregate the data by day. The results will look something like what you see in the screenshot shown in Figure 189, “count() Chart of Daily Counts”.
timechart(function=count())
Table of Daily Counts
When a user accesses a web site, the event is logged with a status. For instance, the status code 200 is returned when the request is successful, and 404 when the page is not found. To get a list of status codes returned and a count of each for a given period, you would enter the following query in the Search box:
groupby(field=status, function=count())
The sample output is shown below:
status | _count |
---|---|
101 | 9 |
200 | 55258 |
204 | 137834 |
307 | 2 |
400 | 2 |
401 | 4 |
403 | 57 |
404 | 265 |
504 | 62 |
stopping | 6 |
success | 6 |