Counts the number of events in the repository, or the events streaming through the
function. The result is put in a field named
_count. You can use this field
name to pipe the results to other query functions or for general use.
You can specify a field, in which case only events containing that field are
counted. You can also perform a distinct count. When there are many
distinct values, LogScale does not try to keep them all in memory;
it uses an estimate instead, so the result may not be exact.
When specified, counts only distinct values. When this parameter is set to true, LogScale always uses an estimate, so the returned value may be inexact.
Omitted Argument Names
The argument name for field can be omitted; the following forms of this function are equivalent:
logscale Syntax
count("value")
and:
logscale Syntax
count(field="value")
These examples show basic structure only.
Accuracy When Counting Distinct Values
When counting distinct values in a data stream, particularly when there
are repeated elements in a limited memory environment, the accuracy of
the count is limited to avoid consuming too much memory in the
process. For example, consider counting 1,000,000 (one million) events. If each
event contains a different value, then memory is required to store the
count for each of those million entries. Even if the field is only 10
bytes long, approximately 9.5MB (10,000,000 bytes) of memory is required to store the
state. In LogScale, this affects the limits as outlined in
State Sizes and Limits. As noted in that
section, LogScale uses an estimation algorithm that produces an
estimate of the number of distinct values while keeping the memory usage
to a minimum.
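The memory figure above is simple arithmetic. A minimal sketch in Python, assuming that only the raw field bytes are stored and ignoring any per-entry overhead (the variable names are illustrative and not part of LogScale):

```python
# Back-of-the-envelope memory estimate for an exact distinct count.
# Assumption: only the raw field bytes are kept, with no per-entry overhead.
distinct_values = 1_000_000
bytes_per_value = 10

total_bytes = distinct_values * bytes_per_value
total_mib = total_bytes / (1024 * 1024)

print(f"{total_bytes} bytes is about {total_mib:.2f} MiB")
```

A real exact counter would also need space for hashing and bookkeeping, so this figure is a lower bound on the state size.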
While the algorithm does not give any guarantees on the
relative error of the reported result, the typical accuracy (standard
error) is less than 2%, with two thirds of all results being within 1%. In tests
with up to 10^7 distinct values, the result at worst deviated by less
than 2%. The worst results for each test can be seen in the table
below:
Distinct Values   Result of distinct count   Deviation (fraction of result)
10                10                         0
100               100                        0
1000              995                        -0.005025125628
10000             10039                      0.003884849089
100000            100917                     0.009086675189
1000000           984780                     -0.01545522858
10000000          10121302                   0.01198482172
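The deviation figures in the table are consistent with the difference between the estimated and actual counts, divided by the estimated count. The following Python sketch reproduces them under that assumption; the `deviation` helper is illustrative and is not a LogScale function:

```python
def deviation(actual: int, estimated: int) -> float:
    """Deviation of the estimate, expressed as a fraction of the estimate."""
    return (estimated - actual) / estimated

# (actual distinct values, result of distinct count) pairs from the table.
rows = [
    (10, 10),
    (100, 100),
    (1_000, 995),
    (10_000, 10_039),
    (100_000, 100_917),
    (1_000_000, 984_780),
    (10_000_000, 10_121_302),
]

for actual, estimated in rows:
    print(f"{actual:>10} {estimated:>10} {deviation(actual, estimated):+.6f}")

# Largest absolute deviation across all rows of the table.
worst = max(abs(deviation(a, e)) for a, e in rows)
```

Multiplying a value by 100 gives the deviation as a percentage, so the worst case in the table is roughly 1.5%.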
Important
For fewer than 100 distinct values, the deviation percentage is
exaggerated. For example, if there are only 10 distinct values, a
deviation of 1 is 10%, even though it is the smallest possible
nonzero deviation from the actual number of distinct values.
More typically, values used for aggregations or counts of distinct
values will have low cardinality (that is, a small number of
distinct values relative to the overall set).
Counts different HTTP status codes over time and buckets them into
time intervals of 1 minute. Notice that we group by two fields:
status_code and the
implicit field _bucket.
Step-by-Step
Starting with the source repository events.
logscale
bucket(1min,field=status_code,function=count())
Sets the bucket interval to 1 minute, aggregating the count of
the field status_code.
Event Result set.
Summary and Results
Bucketing allows for data to be collected according to a time
range. Using the right aggregation function to quantify the value
groups that information into buckets suitable for graphing, for
example with a Bar Chart, where the
size of each bar is the result of the declared function,
count() in this example.
Time series aggregate status codes by count() per minute into buckets
Query
logscale
bucket(1min,field=status_code,function=count())
Introduction
Bucketing is a powerful technique for optimizing data storage and
query performance. Bucketing allows for data to be collected
according to a time range, dividing large datasets into manageable
parts, thereby making it easier to quickly find specific events.
In this example, the bucket() function is
used with count() to count different HTTP
status codes over time and bucket them into time intervals of 1
minute.
Step-by-Step
Starting with the source repository events.
logscale
bucket(1min,field=status_code,function=count())
Counts different HTTP status codes over time and buckets them
into time intervals of 1 minute. Notice that we group by two
fields: the status_code
field and the implicit field _bucket.
Event Result set.
Summary and Results
The query is used to optimize data storage and query
performance. Bucketing allows for data to be collected according
to a time range. Using the right aggregation function to
quantify the value groups that information into buckets
suitable for graphing, for example with a Bar
Chart, where the size of each bar is the result of the
declared function, count() in this
example.
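To make the two-field grouping concrete, the following Python sketch models what the query computes: one count per combination of time bucket and status code. It is an illustration only, not LogScale's implementation. It assumes events are dictionaries with a millisecond @timestamp and that buckets are aligned to whole minutes; the `bucket_counts` helper is hypothetical.

```python
from collections import Counter

def bucket_counts(events, span_ms=60_000):
    """Count events per (bucket start, status_code) pair."""
    counts = Counter()
    for event in events:
        # Assumption: buckets are aligned to multiples of the span.
        bucket_start = event["@timestamp"] - event["@timestamp"] % span_ms
        counts[(bucket_start, event["status_code"])] += 1
    return counts

events = [
    {"@timestamp": 0, "status_code": "200"},
    {"@timestamp": 30_000, "status_code": "200"},
    {"@timestamp": 45_000, "status_code": "404"},
    {"@timestamp": 61_000, "status_code": "200"},
]

counts = bucket_counts(events)
for (bucket_start, status_code), count in sorted(counts.items()):
    print(bucket_start, status_code, count)
```

Each key corresponds to one row of the result: the _bucket value, the status_code value, and the _count for that pair.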
Groups the returned result by the field
id, counts the
events, and returns the minimum timestamp and maximum timestamp.
This returns a new event set, with the fields
id,
_count, _min, and
_max.
logscale
| timeDiff := _max - _min
Calculates the time difference between the maximum timestamp
values and the minimum timestamp values and returns the result
in a new field named
timeDiff.
logscale
| timeDiff > 300000 and _count > 10
Returns all events where the value of
timeDiff is greater than
300000 and where there are
more than 10 occurrences.
Event Result set.
Summary and Results
This query is used to set up alerts for parser issues. Setting
up alerts for parser issues makes it possible to proactively reach out
to customers whose queries are being throttled and help
them.
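The grouping, time difference, and filter steps of this example can be modeled in Python as follows. This is a sketch of the logic only; it assumes events are dictionaries with an id and a millisecond @timestamp, and the `long_running_ids` helper is hypothetical.

```python
def long_running_ids(events, min_span_ms=300_000, min_count=10):
    """Group by id, track count/min/max timestamp, then filter."""
    groups = {}
    for event in events:
        ts = event["@timestamp"]
        group = groups.setdefault(
            event["id"], {"_count": 0, "_min": ts, "_max": ts}
        )
        group["_count"] += 1
        group["_min"] = min(group["_min"], ts)
        group["_max"] = max(group["_max"], ts)

    result = {}
    for group_id, group in groups.items():
        time_diff = group["_max"] - group["_min"]
        # Mirrors: timeDiff > 300000 and _count > 10
        if time_diff > min_span_ms and group["_count"] > min_count:
            result[group_id] = {**group, "timeDiff": time_diff}
    return result

# "a": 11 events spread over 400 seconds; "b": 20 events within 19 seconds.
events = [{"id": "a", "@timestamp": t * 40_000} for t in range(11)]
events += [{"id": "b", "@timestamp": t * 1_000} for t in range(20)]

matches = long_running_ids(events)
print(matches)
```

Only the group for id "a" passes both conditions; "b" has enough events but they span too short a time.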
Divides the search time interval into buckets. As the time span is
not specified, the search interval is divided into 127 buckets.
Events in each bucket are counted:
Step-by-Step
Starting with the source repository events.
logscale
bucket(function=count())
Summarizes events using the count() function into
buckets across the selected timespan.
Event Result set.
Summary and Results
This query organizes data into buckets according to the count of
events.
Calculate a Percentage of Successful Status Codes Over Time
Creates a new timechart, generating a new series,
customer that uses a
compound function. In this example, the embedded function is
generating an array of values, but the array values are
generated by an embedded aggregate. The embedded aggregate
(defined using the {} syntax),
creates a sum() and
count() value across the events grouped by
the value of the success
field generated from the filter query. This is counting the
1 or
0 generated by the
if() function; counting all the values and
adding up the ones for successful values. These values will be
assigned to the success
and total fields. Note
that at this point we are still within the aggregate, so the two
new fields are within the context of the aggregate, with each
field being created for a corresponding
success value.
logscale
| pct_successful := (success/total)*100
Calculates the percentage that are successful. We are still
within the aggregate, so the output of this process will be an
embedded set of events with the
total and
success values grouped
by each original HTTP response code.
logscale
| drop([success, total])}], span=15m, limit=100)
Still within the embedded aggregate, drop the
total and
success fields from the
array generated by the aggregate. These fields were temporary to
calculate the percentage of successful results, but are not
needed in the array for generating the result set. Then, set a
span for the buckets for the events of 15 minutes and limit to
100 results overall.
Event Result set.
Summary and Results
This query shows how an embedded aggregate can be used to
generate a sequence of values that can be formatted (in this
case to calculate percentages) and generate a new event series
for the aggregate values.
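The sum, count, and percentage steps can be sketched in Python. This models the arithmetic only and is not the LogScale query itself. The success rule (status below 400), the `status` field name, and the `pct_successful` helper are assumptions made for illustration, because the filter part of the query is not shown here.

```python
from collections import defaultdict

def pct_successful(events, span_ms=15 * 60 * 1000):
    """Percentage of successful events per time bucket."""
    totals = defaultdict(lambda: {"success": 0, "total": 0})
    for event in events:
        bucket = event["@timestamp"] - event["@timestamp"] % span_ms
        # Hypothetical success rule standing in for the if() function.
        flag = 1 if event["status"] < 400 else 0
        totals[bucket]["success"] += flag  # sum() of the 1/0 flags
        totals[bucket]["total"] += 1       # count() of all flags
    # Only the percentage is kept, as after drop([success, total]).
    return {
        bucket: (t["success"] / t["total"]) * 100
        for bucket, t in totals.items()
    }

events = [
    {"@timestamp": 0, "status": 200},
    {"@timestamp": 1_000, "status": 500},
    {"@timestamp": 2_000, "status": 200},
    {"@timestamp": 3_000, "status": 404},
    {"@timestamp": 900_000, "status": 200},
]

result = pct_successful(events)
print(result)
```

The first 15-minute bucket has two successes out of four events, and the second has one out of one.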
Buckets the values, grouped by the field #repo,
using count().
logscale
| @timestamp := _bucket
Updates the timestamp to the value generated by the
bucket() function.
logscale
| drop(_bucket)
Discards the _bucket field from the
results.
Event Result set.
Summary and Results
The query can be run on each repo. Alternatively, create a view that looks
across multiple repos and run the query from there to get all the
repo counts in one search.
Count Total of Malware and Nonmalware Events
Count the total of malware and nonmalware events as percentages
It is possible to use the count() function to
show the count of two fields as a percentage of the total. In this
example, the count() function is
used to count the field
malware and the field
nonmalware and have the
results returned as percentages. A result set could, for example,
be malware 30% and nonmalware 70%.
Returns the counted results of the field
malware in a field named
_malware and the counted results of the
field nonmalware in a
field named _nonmalware.
logscale
| total := _malware + _nonmalware
Assigns the total of these events to a new field named
total.
Calculates the _malware and
_nonmalware as a percentage of the total.
Event Result set.
Summary and Results
The query is used to get an overview of the total number of
malware versus nonmalware events.
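The counting and percentage steps of this example can be modeled in Python. This is a sketch of the arithmetic only; it assumes events are dictionaries and that an event is counted when it contains the named field. The `malware_percentages` helper is hypothetical.

```python
def malware_percentages(events):
    """Count events containing each field and express both as percentages."""
    malware = sum(1 for event in events if "malware" in event)
    nonmalware = sum(1 for event in events if "nonmalware" in event)
    total = malware + nonmalware
    if total == 0:
        # No matching events: avoid dividing by zero.
        return {"malware": 0.0, "nonmalware": 0.0}
    return {
        "malware": 100 * malware / total,
        "nonmalware": 100 * nonmalware / total,
    }

events = [{"malware": "trojan"}] * 3 + [{"nonmalware": "benign"}] * 7
percentages = malware_percentages(events)
print(percentages)
```

With three malware events and seven nonmalware events, the split matches the 30% and 70% used as the example result.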
Create Time Chart Widget for Different Events
Query
logscale
timeChart(span=1h,function=count(),series=method)
Introduction
The time chart widget is the most commonly used widget in
LogScale. It displays bucketed time series data on a timeline. The
timeChart() function is used to create time
chart widgets, in this example a timechart that shows the number
of the different events per hour over the last 24 hours. For
example, you may want to count different kinds of HTTP methods
used for requests in the logs. If those are stored in a field
named method, you can use
this field as a series. Furthermore, we select to
search over the last 24 hours in the time selector in the UI, and
also set a parameter to make each time bucket one hour long
(with span=1hour).
Step-by-Step
Starting with the source repository events.
logscale
timeChart(span=1h,function=count(),series=method)
Creates 24 time buckets when we search over the last 24 hours,
and all searched events get sorted into groups depending on the
bucket they belong to (based on their @timestamp
value). When all events have been divided up by
time, the count() function is run for each value of the
series field to return
the number of each different kind of event per hour.
Event Result set.
Summary and Results
The query is used to create timechart widgets showing number of
different kinds of events per hour over the last 24 hours. In
this example we do not just have one group of events per time
bucket, but multiple groups: one group for every value of
method that exists in
the timespan we are searching in. So if we are still searching
over a 24 hour period, and we have received only
GET, PUT, and
POST requests in that timespan, we will
get three groups of events per bucket (because we have three
different values for
method). Therefore, we
end up with 72 groups of events. And every group contains only
events which correspond to some time bucket and a specific value
of method. Then
count() is run on each of these groups, to
give us the number of GET events per
hour, PUT events per hour, and
POST events per hour. When viewing and
hovering over the buckets within the time chart, the display
will show the precise value and time for the displayed bucket,
with the time showing the point where the bucket starts.
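The 72 groups described above can be checked with a small Python model. It assumes events are dictionaries with a millisecond @timestamp and a method field, and that buckets are aligned to whole hours. The `time_chart_counts` helper is hypothetical and only illustrates the grouping, not how timeChart() is implemented.

```python
from collections import Counter

HOUR_MS = 3_600_000

def time_chart_counts(events, span_ms=HOUR_MS):
    """Count events per (hour bucket index, method) pair."""
    counts = Counter()
    for event in events:
        bucket = event["@timestamp"] // span_ms
        counts[(bucket, event["method"])] += 1
    return counts

# Synthetic data: in each of 24 hours, two GET, one PUT, and one POST request.
events = [
    {"@timestamp": hour * HOUR_MS + offset, "method": method}
    for hour in range(24)
    for offset, method in enumerate(["GET", "PUT", "POST", "GET"])
]

counts = time_chart_counts(events)
print(len(counts))  # number of (bucket, method) groups
```

With 24 hourly buckets and three distinct methods, there are 24 × 3 = 72 groups, and each group carries its own count.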
Create Timechart Widget for All Events
Query
logscale
timeChart(span=1h,function=count())
Introduction
The time chart widget is the most commonly used widget in
LogScale. It displays bucketed time series data on a timeline. The
timeChart() function is used to create
timechart widgets, in this example a timechart that shows the
number of events per hour over the last 24 hours. We do this by
selecting to search over the last 24 hours in the time selector in
the UI, and then we tell the function to make each time bucket one
hour long (with span=1hour).
Step-by-Step
Starting with the source repository events.
logscale
timeChart(span=1h,function=count())
Creates 24 time buckets when we search over the last 24 hours,
and all searched events get sorted into groups depending on the
bucket they belong to (based on their @timestamp
value). When all events have been divided up by
time, the count() function is run on each
group, giving us the number of events per hour.
Event Result set.
Summary and Results
The query is used to create timechart widgets showing number of
events per hour over the last 24 hours. The timechart shows one
group of events per time bucket. When viewing and hovering over
the buckets within the time chart, the display will show the
precise value and time for the displayed bucket, with the time
showing the point where the bucket starts.
Count All Events
This is a simple example using the count() function.
The query just counts the number of events found in the repository for
the period of time selected:
logscale
count()
The result is just a single number, the total count.
_count
3886817
To format the result with a thousands separator:
logscale
count() | format("%,i", field=_count, as=_count)
Produces
_count
3,886,817
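For comparison, the same thousands-separator formatting can be expressed in Python. This only illustrates the expected output and is unrelated to how the LogScale format() function is implemented:

```python
count = 3886817

# The "," format specifier inserts a comma as the thousands separator.
formatted = f"{count:,}"
print(formatted)  # 3,886,817
```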
Group & Count
In this example, the query uses the count()
function within the groupBy() function. The first
parameter given is the field upon which to group the data. In this
case, it's the HTTP method (for example, GET,
PUT, POST). The
second parameter says to use the function count()
to count the number of occurrences for each method found.
logscale
groupby(field=method,function=count())
The result is a table with the column headings,
method and
_count, with the values for
each:
You can use the count() function in conjunction
with the timeChart() function to count the number
of occurrences of events or other factors. By default, the
timeChart() function will aggregate the data by
day. The results will look something like what you see in the
screenshot shown in
Figure 110, “count() Chart of Daily Counts”.
logscale
timechart(function=count())
Table of Daily Counts
When a user accesses a web site, the event is logged with a status.
For instance, the status code 200 is
returned when the request is successful, and
404 when the page is not found. To get
a list of status codes returned and a count of each for a given
period, you would enter the following query in the
Search box: