Counts the number of events in the repository, or streaming
through the function. The count is returned in a field (by default
_count); you can use this field name to pipe the
results to other query functions or for general use.
You can specify a field, in which case only events containing that
field are counted. You can also do a distinct count, which counts
only the distinct values of the field. When there are many distinct
values, LogScale does not try to keep them all in memory; an estimate
is used instead, so the result may not be a precise match. When the
distinct parameter is set to true, LogScale always uses this estimate,
which may give an inexact result.
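To make the three counting modes concrete, here is a minimal Python model (not LogScale code) that treats events as dictionaries. The function name `count_events` and the event shape are illustrative assumptions. Unlike LogScale, the distinct branch here is exact, because it keeps every value in a set, which is precisely the memory cost that LogScale's estimate avoids.

```python
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


def count_events(events: List[Event], field: Optional[str] = None,
                 distinct: bool = False) -> int:
    """Hypothetical model of count(), count(field) and count(field, distinct=true)."""
    if field is None:
        # count(): every event streaming through is counted
        return len(events)
    # count(field): only events that contain the field are counted
    values = [e[field] for e in events if field in e]
    if not distinct:
        return len(values)
    # Exact distinct count; LogScale estimates this instead
    return len(set(values))


events = [
    {"status_code": 200},
    {"status_code": 404},
    {"status_code": 200},
    {"method": "GET"},  # has no status_code field
]
print(count_events(events))                                # 4
print(count_events(events, "status_code"))                 # 3
print(count_events(events, "status_code", distinct=True))  # 2
```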
Omitted Argument Names
The argument name for field can be omitted; the following forms of this function are equivalent:
logscale Syntax
count("value")
and:
logscale Syntax
count(field="value")
These examples show basic structure only.
Accuracy When Counting Distinct Values
When counting distinct values in a data stream, particularly
when there are repeated elements in a limited memory
environment, the accuracy of the count is limited in order to
avoid consuming too much memory in the process. For example,
consider counting 1,000,000 (one million) events. If each event contains
a different value, memory is required to store the count
for each of those million entries. Even if the field is only
10 bytes long, that is roughly 10,000,000 bytes (about 9.5 MiB) of memory
required to store the state. In LogScale, this affects the limits
as outlined in
State Sizes and Limits. As noted in
that section, LogScale uses an estimation algorithm
that produces an estimate of the number of distinct values
while keeping memory usage to a minimum.
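The text above does not name LogScale's estimation algorithm. As an illustration of the general trade-off only, the Python sketch below implements a different, well-known technique, the K-minimum-values (KMV) estimator: each value is hashed to a number in [0, 1), only the k smallest distinct hashes are kept, and the count is estimated from how small the k-th smallest hash is. The class name, the use of SHA-1 and the default k=1024 are assumptions for the example.

```python
import hashlib
import heapq
from typing import List, Set


def unit_hash(value: str) -> float:
    """Map a value to a pseudo-random float in [0, 1) using 53 bits of SHA-1."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


class KMinValues:
    """K-minimum-values distinct-count estimator (illustrative, not LogScale's)."""

    def __init__(self, k: int = 1024) -> None:
        self.k = k
        self._heap: List[float] = []    # negated hashes: a max-heap of the k smallest
        self._kept: Set[float] = set()  # hashes currently held in the heap

    def add(self, value: str) -> None:
        h = unit_hash(value)
        if h in self._kept:
            return  # repeated value, already tracked
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._kept.add(h)
        elif h < -self._heap[0]:
            # Replace the largest kept hash with this smaller one
            evicted = -heapq.heappushpop(self._heap, -h)
            self._kept.discard(evicted)
            self._kept.add(h)

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct values: exact
        return (self.k - 1) / (-self._heap[0])


est = KMinValues(k=1024)
for i in range(50_000):
    value = f"user-{i}"
    est.add(value)
    est.add(value)  # repeated values do not change the estimate
print(round(est.estimate()))  # close to 50000; the exact figure depends on the hashes
```

Memory stays at about k hashes however many distinct values arrive, whereas an exact count of one million distinct 10-byte values needs at least 10,000,000 bytes for the values alone. The relative standard error of KMV is roughly 1/sqrt(k), about 3% for k=1024.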
While the algorithm in question doesn't give any guarantees on
the relative error of the reported result, the typical
accuracy (standard error) is less than 2%, with two-thirds of all
results being within 1%. In tests with up to 10^7 distinct
values, the result at worst deviated by less than 0.02 (2%) of the
reported value. The worst result for each test is shown in the table below:
Distinct values   Result of distinct count   Deviation, (result - actual) / result
10                10                         0
100               100                        0
1000              995                        -0.005025125628
10000             10039                      0.003884849089
100000            100917                     0.009086675189
1000000           984780                     -0.01545522858
10000000          10121302                   0.01198482172
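The deviation column can be reproduced from the first two columns. Working backwards from the figures, each value is (result - actual) / result, a fraction of the reported result rather than a percentage, so -0.01545522858 corresponds to about -1.55%. A short Python check (the function name is illustrative):

```python
# (actual distinct values, reported distinct count, deviation as listed in the table)
ROWS = [
    (10, 10, 0.0),
    (100, 100, 0.0),
    (1_000, 995, -0.005025125628),
    (10_000, 10_039, 0.003884849089),
    (100_000, 100_917, 0.009086675189),
    (1_000_000, 984_780, -0.01545522858),
    (10_000_000, 10_121_302, 0.01198482172),
]


def deviation(actual: int, result: int) -> float:
    """Deviation as a fraction of the reported result (inferred from the table)."""
    return (result - actual) / result


for actual, result, listed in ROWS:
    d = deviation(actual, result)
    print(f"{actual:>10,} {result:>12,} {d:+.12f} ({d * 100:+.2f}%)")
```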
Important
For fewer than 100 distinct values, the deviation percentage
will be exacerbated. For example, if there are only 10
distinct values, a deviation of 1 is 10%, even though it is
the smallest possible deviation from the actual number of
distinct values.
More typically, values used for aggregations or counts of
distinct values will have low cardinality (that is, a
small number of distinct values relative to the overall set).
Time series aggregate status codes by count() per minute into buckets
Query
logscale
bucket(1min,field=status_code,function=count())
Introduction
In this example, the bucket() function is used with
count() to count different HTTP status codes over
time and bucket them into time intervals of 1 minute.
Step-by-Step
Starting with the source repository events.
logscale
bucket(1min,field=status_code,function=count())
Counts different HTTP status codes over time and buckets them into time
intervals of 1 minute. Notice that we group by two fields:
status_code field and the
implicit field _bucket.
Event Result set.
Summary and Results
The query can be used to optimize data storage and query performance.
Bucketing allows data to be collected according to a time range.
Using the right aggregation function to quantify the values groups the
information into buckets suitable for graphing, for example with a
Bar Chart, where the size of each bar is
the declared function result, count() in this
example.
Groups the returned result by the field
id, makes a count on the events
and returns the minimum timestamp and maximum timestamp. This returns a
new event set, with the fields
id,
_count,
_min, and
_max.
logscale
|timeDiff:=_max-_min
Calculates the time difference between the maximum timestamp values and
the minimum timestamp values and returns the result in a new field named
timeDiff.
logscale
|timeDiff>300000 and _count>10
Returns all events where the value of
timeDiff is greater than
300000 (300,000 milliseconds, that is, 5 minutes) and where there are more
than 10 occurrences.
Event Result set.
Summary and Results
This query is used to set up alerts for parser issues. Setting up
alerts for parser issues allows you to proactively reach out to
customers whose queries are being throttled and help them.
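The group-then-filter logic can be modelled in Python to check the thresholds. The grouping query for this example is not shown above, so this sketch assumes it groups by id and takes count(), min() and max() of @timestamp in epoch milliseconds; the function name and its parameters are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def throttled_ids(events: List[Dict[str, Any]], min_diff_ms: int = 300_000,
                  min_count: int = 10) -> List[Dict[str, Any]]:
    """Hypothetical model: group by id, keep groups spanning > 5 min with > 10 events."""
    timestamps = defaultdict(list)
    for e in events:
        timestamps[e["id"]].append(e["@timestamp"])

    rows = []
    for id_, ts in timestamps.items():
        row = {"id": id_, "_count": len(ts), "_min": min(ts), "_max": max(ts)}
        row["timeDiff"] = row["_max"] - row["_min"]  # timeDiff := _max - _min
        if row["timeDiff"] > min_diff_ms and row["_count"] > min_count:
            rows.append(row)  # timeDiff > 300000 and _count > 10
    return rows


events = (
    [{"id": "a", "@timestamp": i * 60_000} for i in range(11)]   # 11 events over 10 min
    + [{"id": "b", "@timestamp": i * 1_000} for i in range(11)]  # 11 events over 10 s
    + [{"id": "c", "@timestamp": i * 60_000} for i in range(5)]  # only 5 events
)
print([row["id"] for row in throttled_ids(events)])  # ['a']
```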
Divides the search time interval into buckets. As the span is not
specified, the search interval is divided into 127 buckets. Events in
each bucket are counted:
Step-by-Step
Starting with the source repository events.
logscale
bucket(function=count())
Summarizes events using the count() function into buckets
across the selected timespan.
Event Result set.
Summary and Results
This query organizes events into time buckets and counts the events in each bucket.
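As a rough model of this default, the Python sketch below splits a search interval into a fixed number of equal-width buckets and counts the events in each. It assumes equal-width buckets starting at the beginning of the interval; LogScale may align or round bucket boundaries differently, and the function name is hypothetical.

```python
from typing import Iterable, List


def count_per_bucket(timestamps: Iterable[int], start_ms: int, end_ms: int,
                     buckets: int = 127) -> List[int]:
    """Count timestamps in each of `buckets` equal slices of [start_ms, end_ms)."""
    width = (end_ms - start_ms) / buckets
    counts = [0] * buckets
    for ts in timestamps:
        if start_ms <= ts < end_ms:
            # min() guards against float rounding pushing ts past the last bucket
            index = min(int((ts - start_ms) // width), buckets - 1)
            counts[index] += 1
    return counts


counts = count_per_bucket([0, 500, 1_000, 126_999], start_ms=0, end_ms=127_000)
print(len(counts), counts[0], counts[1], counts[126])  # 127 2 1 1
```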
Calculate a Percentage of Successful Status Codes Over Time
Creates a new timechart, generating a new series for each value of
customer, using a compound
function. In this example, the embedded function generates an array
of values, but the array values are generated by an embedded aggregate.
The embedded aggregate (defined using the
{} syntax) creates a
sum() and a count() value across
the events grouped by the value of the
success field generated from the
filter query. This counts the 1 or
0 generated by the
if() function: counting all the values and adding
up the ones for successful values. These values are assigned to the
success and
total fields. Note that at this
point we are still within the aggregate, so the two new fields are
within the context of the aggregate, with each field being created for a
corresponding success value.
logscale
|pct_successful:=(success/total)*100
Calculates the percentage that are successful. We are still within the
aggregate, so the output of this process will be an embedded set of
events with the total and
success values grouped by each
original HTTP response code.
logscale
|drop([success,total])}],span=15m,limit=100)
Still within the embedded aggregate, drop the
total and
success fields from the array
generated by the aggregate. These fields were temporary to calculate the
percentage of successful results, but are not needed in the array for
generating the result set. Then, set a span of 15 minutes for the
buckets and limit the output to 100 series.
Event Result set.
Summary and Results
This query shows how an embedded aggregate can be used to generate a
sequence of values that can be formatted (in this case to calculate
percentages) and generate a new event series for the aggregate values.
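The arithmetic inside the embedded aggregate can be sketched in Python. The filter query that produces success is not shown above, so this sketch assumes each event already carries success set to 1 or 0, plus a customer field and an @timestamp in epoch milliseconds. The function name is hypothetical, and aligning buckets to multiples of 15 minutes is a simplification.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

SPAN_MS = 15 * 60_000  # span=15m


def pct_successful(events: List[Dict[str, Any]]) -> Dict[Tuple[int, str], float]:
    """Per (bucket start, customer): sum(success) / count() * 100."""
    totals = defaultdict(lambda: [0, 0])  # key -> [success, total]
    for e in events:
        # Assumption: buckets aligned to multiples of the span since the epoch
        bucket = e["@timestamp"] - e["@timestamp"] % SPAN_MS
        acc = totals[(bucket, e["customer"])]
        acc[0] += e["success"]  # sum(success)
        acc[1] += 1             # count()
    # success and total are temporary; only the percentage is returned
    return {key: (s / t) * 100 for key, (s, t) in totals.items()}


events = [
    {"@timestamp": 0, "customer": "acme", "success": 1},
    {"@timestamp": 60_000, "customer": "acme", "success": 1},
    {"@timestamp": 120_000, "customer": "acme", "success": 1},
    {"@timestamp": 180_000, "customer": "acme", "success": 0},
    {"@timestamp": 240_000, "customer": "beta", "success": 0},
    {"@timestamp": 300_000, "customer": "beta", "success": 1},
]
print(pct_successful(events))  # {(0, 'acme'): 75.0, (0, 'beta'): 50.0}
```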
Collect and Group Events by Specified Field - Example 2
Collect and group events by specified field using collect() as part of a groupBy() operation
In this example, the collect() function is used to
collect fields from multiple events.
Step-by-Step
Starting with the source repository events.
logscale
LocalAddressIP4=* RemoteAddressIP4=* aip=*
Filters for all events where the fields
LocalAddressIP4,
RemoteAddressIP4 and
aip are all present. The actual
values in these fields do not matter; the query just checks for their
existence.
Groups the returned results by the fields (given as an array)
LocalAddressIP4 and
RemoteAddressIP4, collects all
the AIPs (Adaptive Internet Protocol) into an array and performs a count
on the field aip. The count of
the AIP values is returned in a new field named
aipCount.
Event Result set.
Summary and Results
The query is used to collect fields from multiple events into one event.
Collecting should be used on smaller data sets to create a list (or set,
or map) when you actually need a list object explicitly
(for example, in order to pass it on to some other API). Using
collect() on a larger data set may cause out-of-memory
errors, as it returns the entire data set. The query is useful for
network connection analysis and for identifying potential threats.
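A Python model of the filter, grouping, collection and count can help when reasoning about the output shape. The groupBy() query itself is not shown above, so the grouping keys, the aipCount name taken from the text, and keeping every collected value (including duplicates) are assumptions; the function name and the sample addresses are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

REQUIRED = ("LocalAddressIP4", "RemoteAddressIP4", "aip")


def collect_aips(events: List[Dict[str, Any]]) -> Dict[Tuple[str, str], Dict[str, Any]]:
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for e in events:
        # LocalAddressIP4=* RemoteAddressIP4=* aip=* : all three fields must exist
        if all(name in e for name in REQUIRED):
            groups[(e["LocalAddressIP4"], e["RemoteAddressIP4"])].append(e["aip"])
    # One output row per address pair: the collected aip values and their count
    return {pair: {"aip": aips, "aipCount": len(aips)} for pair, aips in groups.items()}


events = [
    {"LocalAddressIP4": "10.0.0.5", "RemoteAddressIP4": "203.0.113.7", "aip": "198.51.100.1"},
    {"LocalAddressIP4": "10.0.0.5", "RemoteAddressIP4": "203.0.113.7", "aip": "198.51.100.2"},
    {"LocalAddressIP4": "10.0.0.9", "RemoteAddressIP4": "203.0.113.7"},  # no aip: filtered out
]
print(collect_aips(events))
```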
Buckets the values by the field #repo, using the
count() function.
logscale
|@timestamp:=_bucket
Updates the timestamp to the value generated by the
bucket() function.
logscale
|drop(_bucket)
Discards the _bucket field from
the results.
Event Result set.
Summary and Results
The query can be run on each repository. Alternatively, create a view that
looks across multiple repositories and then run it from there to get all
the repository counts in one search.
Count Total of Malware and Nonmalware Events
Count total of malware and nonmalware events in percentage
It is possible to use the count() function to
show the count of two fields as a percentage of the total. In this
example, the count() function is
used to count the field
malware and the field
nonmalware and have the
results returned as percentages. A result set could, for example,
be malware 30% and nonmalware 70%.
Returns the counted results of the field
malware in a field named
_malware and the counted results
of the field nonmalware in a
field named _nonmalware.
logscale
|total:=_malware+_nonmalware
Assigns the total of these events to a new field named
total.
Calculates the _malware and
_nonmalware as a percentage of
the total.
Event Result set.
Summary and Results
The query is used to get an overview of the total number of malware
versus nonmalware events.
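The percentage step can be checked with a small Python function. The query lines that count the two fields and compute the percentages are not shown above, so this sketch simply takes the two counts as inputs; the function name and the zero-total guard are assumptions.

```python
from typing import Dict


def malware_percentages(malware: int, nonmalware: int) -> Dict[str, float]:
    total = malware + nonmalware  # total := _malware + _nonmalware
    if total == 0:
        return {"total": 0, "_malware": 0.0, "_nonmalware": 0.0}
    # Multiply before dividing so whole-number percentages stay exact floats
    return {
        "total": total,
        "_malware": 100 * malware / total,
        "_nonmalware": 100 * nonmalware / total,
    }


print(malware_percentages(30, 70))  # {'total': 100, '_malware': 30.0, '_nonmalware': 70.0}
```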
Create Time Chart Widget for Different Events
Query
logscale
timeChart(span=1h,function=count(),series=method)
Introduction
The time chart widget is the most commonly used widget in
LogScale. It displays bucketed time series data on a
timeline. The timeChart() function is used to
create time chart widgets, in this example a timechart that shows
the number of different events per hour over the last 24
hours. For example, you may want to count different kinds of HTTP
methods used for requests in the logs. If those are stored in a
field named method, you
can use this field as a series.
Furthermore, we select to search over the last 24 hours in the
time selector in the UI, and also set the span parameter to make each time
bucket one hour long
(with span=1hour).
Step-by-Step
Starting with the source repository events.
logscale
timeChart(span=1h,function=count(),series=method)
Creates 24 time buckets when we search over the last 24 hours, and all
searched events get sorted into groups depending on the bucket they
belong to (based on their @timestamp value). When
all events have been divided up by time, the
count() function is run on each group, one group per value of the
series field, to return the
number of each different kind of event per hour.
Event Result set.
Summary and Results
The query is used to create timechart widgets showing number of
different kinds of events per hour over the last 24 hours. In this
example we do not just have one group of events per time bucket, but
multiple groups: one group for every value of
method that exists in the
timespan we are searching in. So if we are still searching over a 24
hour period, and we have received only GET,
PUT, and POST requests
in that timespan, we will get three groups of events per bucket (because
we have three different values for
method). Therefore, we end up
with 72 groups of events (24 buckets times 3 methods). And every group contains only events which
correspond to some time bucket and a specific value of
method. Then
count() is run on each of these groups, to give us
the number of GET events per hour,
PUT events per hour, and
POST events per hour. When viewing and hovering
over the buckets within the time chart, the display will show the
precise value and time for the displayed bucket, with the time showing
the point where the bucket starts.
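The grouping described above can be reproduced with a small Python model: events are keyed by (hour bucket, method) and counted, so three methods over 24 hourly buckets produce 72 groups. The synthetic events, the epoch-aligned hourly buckets and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Any, Dict, List

HOUR_MS = 3_600_000  # span=1h


def time_chart_counts(events: List[Dict[str, Any]],
                      series: str = "method") -> Counter:
    """Count events per (bucket start, series value)."""
    counts: Counter = Counter()
    for e in events:
        # Assumption: buckets aligned to whole hours since the epoch
        bucket = e["@timestamp"] - e["@timestamp"] % HOUR_MS
        counts[(bucket, e[series])] += 1
    return counts


# Synthetic data: one GET, one PUT and one POST in each of 24 hours
events = [
    {"@timestamp": hour * HOUR_MS + offset, "method": method}
    for hour in range(24)
    for offset, method in ((0, "GET"), (1_000, "PUT"), (2_000, "POST"))
]
groups = time_chart_counts(events)
print(len(groups))          # 72
print(groups[(0, "GET")])   # 1
```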
Create Timechart Widget for All Events
Query
logscale
timeChart(span=1h,function=count())
Introduction
The time chart widget is the most commonly used widget in
LogScale. It displays bucketed time series data on a
timeline. The timeChart() function is used to
create timechart widgets, in this example a timechart that shows
the number of events per hour over the last 24 hours. We do this
by selecting to search over the last 24 hours in the time selector
in the UI, and then we tell the function to make each time bucket
one hour long (with span=1hour).
Step-by-Step
Starting with the source repository events.
logscale
timeChart(span=1h,function=count())
Creates 24 time buckets when we search over the last 24 hours, and all
searched events get sorted into groups depending on the bucket they
belong to (based on their @timestamp value). When
all events have been divided up by time, the
count() function is run on each group, giving us
the number of events per hour.
Event Result set.
Summary and Results
The query is used to create timechart widgets showing number of events
per hour over the last 24 hours. The timechart shows one group of events
per time bucket. When viewing and hovering over the buckets within the
time chart, the display will show the precise value and time for the
displayed bucket, with the time showing the point where the bucket
starts.
Get List of Status Codes
Get list of status codes returned and a count of each for a given period using the groupBy() function with count()
Query
logscale
groupBy(field=status,function=count())
Introduction
In this example, the groupBy() function is used to
get a list of status codes for logged events. For instance, the status
code 200 is returned when the request is
successful, and 404 when the page is not
found.
Step-by-Step
Starting with the source repository events.
logscale
groupBy(field=status,function=count())
Groups events by the status field, and counts the
number of events in each group.
It is possible to enhance the query for more detailed analysis. The
following query example groups by both the fields
status and
source, limits to 1000
results, and sorts by count descending.
logscale
groupBy([status, source], function=count(), limit=1000)
| sort(_count, order=desc)
Event Result set.
Summary and Results
The query is used to extract a list of status codes, each with a count
of how many events have that status. The query is useful for summarizing
and analyzing log data.
Sample output from the incoming example data:
status     _count
101        17
200        46183
204        3
307        1
400        2893
401        4
Failure    1
Success    8633
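In Python terms, grouping by one or more fields with count() and then sorting by _count in descending order can be modelled with collections.Counter. This is an illustrative sketch with hypothetical names: it skips events missing a grouping field (an assumption) and does not model the limit parameter.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence, Tuple


def group_by_count(events: List[Dict[str, Any]],
                   fields: Sequence[str]) -> List[Tuple[Tuple[Any, ...], int]]:
    """Count events per combination of `fields`, sorted by count, highest first."""
    counts = Counter(
        tuple(e[f] for f in fields)
        for e in events
        if all(f in e for f in fields)
    )
    # sorted() is stable, so equal counts keep their first-seen order
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


events = [
    {"status": 200, "source": "web"},
    {"status": 200, "source": "web"},
    {"status": 404, "source": "web"},
    {"status": 200, "source": "api"},
    {"source": "api"},  # no status field: skipped
]
print(group_by_count(events, ["status"]))            # [((200,), 3), ((404,), 1)]
print(group_by_count(events, ["status", "source"]))  # [((200, 'web'), 2), ((404, 'web'), 1), ((200, 'api'), 1)]
```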
Count All Events
This is a simple example using the
count() function. The query just
counts the number of events found in the repository for
the period of time selected:
logscale
count()
The result is just a single number, the total count.
_count
3886817
To format adding a thousands separator:
logscale
count()|format("%,i",field=_count,as=_count)
Produces
_count
3,886,817
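As a cross-check of the expected output, Python's format mini-language applies the same thousands grouping with the "," option. This is Python, not the LogScale format() function, and is shown only to confirm the formatted value:

```python
count = 3_886_817
print(f"{count:,}")  # 3,886,817
```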
Group & Count
In this example, the query uses the
count() function within the
groupBy() function. The first
parameter given is the field upon which to group the data.
In this case, it's the HTTP method (for example,
GET,
PUT,
POST). The second parameter says
to use the function count() to count
the number of occurrences for each method found.
logscale
groupBy(field=method,function=count())
The result is a table with the column headings,
method and
_count, with the
values for each:
You can use the count() function in
conjunction with the timeChart()
function to count the number of occurrences of events or
other factors. By default, the
timeChart() function will aggregate
the data by day. The results will look something like what
you see in the screenshot shown in
Figure 191, “count() Chart of Daily Counts”.
logscale
timeChart(function=count())
Table of Daily Counts
When a user accesses a web site, the event is logged with
a status. For instance, the status code
200 is returned when the
request is successful, and
404 when the page is not
found. To get a list of status codes returned and a count
of each for a given period, you would enter the following
query in the
Search box: