Counts the number of events in the repository, or the events streaming through the function. The result is put in a field named _count. You can use this field name to pipe the results to other query functions or for general use.

You can specify a field, in which case only events containing that field are counted. You can also perform a distinct count. When there are many distinct values, LogScale does not try to keep them all in memory; an estimate is used instead, so the result may not be an exact match.
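The counting rules above can be modeled in a few lines of Python, purely as an illustration. This is a sketch, not LogScale's implementation: events are assumed to be plain dictionaries, the helper `count_events` is hypothetical, its `as_` argument stands in for the `as` parameter (a reserved word in Python), and the distinct count is exact because it uses a Python set, whereas LogScale always uses an estimate when distinct is true.

```python
from typing import Any, Dict, Iterable, Optional

def count_events(events: Iterable[Dict[str, Any]],
                 field: Optional[str] = None,
                 distinct: bool = False,
                 as_: str = "_count") -> Dict[str, int]:
    """Hypothetical model of count(): returns a one-row result such as {"_count": n}."""
    if field is None:
        # No field given: every event is counted.
        return {as_: sum(1 for _ in events)}
    # Only events that contain the field are considered.
    values = [e[field] for e in events if field in e]
    if distinct:
        # Exact distinct count via a set; LogScale uses an estimate instead.
        return {as_: len(set(values))}
    return {as_: len(values)}

events = [
    {"method": "GET", "status": 200},
    {"method": "GET", "status": 404},
    {"method": "POST"},
    {"status": 200},
]
print(count_events(events))                                 # {'_count': 4}
print(count_events(events, field="status"))                 # {'_count': 3}
print(count_events(events, field="status", distinct=True))  # {'_count': 2}
```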

Parameter | Type | Required | Default Value | Description
as | string | optional[a] | _count | The name of the output field.
distinct | boolean | optional[a] | | When specified, counts only distinct values. When this parameter is set to true, LogScale always uses an estimate, which may give an inexact result.
field[b] | string | optional[a] | | Only events containing this field are counted.

[a] Optional parameters use their default value unless explicitly set.

[b] The parameter name field can be omitted.

Accuracy When Counting Distinct Values

When counting distinct values in a data stream, particularly one with repeated elements in a memory-limited environment, the accuracy of the count is limited to avoid consuming too much memory. For example, consider counting 1,000,000 (one million) events. If each event contains a different value, memory is required to store each of those million entries. Even if the field is only 10 bytes long, that is approximately 10,000,000 bytes (about 9.5 MiB) of memory required to store the state. In LogScale, this affects the limits as outlined in State Sizes and Limits. As noted in that section, LogScale uses an estimation algorithm that produces an estimate of the number of distinct values while keeping memory usage to a minimum.

While the algorithm does not guarantee a bound on the relative error of the reported result, its typical accuracy (standard error) is better than 2%, with two-thirds of all results within 1%. In tests with up to 10^7 distinct values, the result deviated by less than 2% at worst (a deviation below 0.02 expressed as a fraction). The worst result for each test is shown in the table below:

Distinct Values | Result of Distinct Count | Deviation (fraction of result; 0.01 = 1%)
10 | 10 | 0
100 | 100 | 0
1000 | 995 | -0.005025125628
10000 | 10039 | 0.003884849089
100000 | 100917 | 0.009086675189
1000000 | 984780 | -0.01545522858
10000000 | 10121302 | 0.01198482172
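The deviation column can be reproduced from the other two columns. The values match (result - actual) / result, that is, a fraction of the reported result (0.01 means 1%), not a percentage. A short Python check, with the rows copied from the table:

```python
# (distinct values, result of distinct count) pairs from the table above
ROWS = [
    (10, 10), (100, 100), (1000, 995), (10_000, 10_039),
    (100_000, 100_917), (1_000_000, 984_780), (10_000_000, 10_121_302),
]

def deviation(actual: int, result: int) -> float:
    """Deviation as a fraction of the reported result (0.01 == 1%)."""
    return (result - actual) / result

for actual, result in ROWS:
    d = deviation(actual, result)
    print(f"{actual:>10} {result:>10} {d:+.6f} ({d * 100:+.2f}%)")
```

The largest magnitude in the table is about 0.0155, or roughly 1.5%.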

Important

For fewer than 100 distinct values, the deviation percentage is magnified. For example, if there are only 10 distinct values, a deviation of 1 is 10%, even though it is the smallest possible deviation from the actual number of distinct values.

More typically, the fields used for aggregations or distinct counts have low cardinality (that is, a small number of distinct values relative to the overall set).
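To illustrate why an estimate keeps memory bounded, the sketch below first computes the exact-state figure used earlier in this section (1,000,000 values of 10 bytes each) and then implements HyperLogLog, a well-known cardinality estimator. This section does not name the algorithm LogScale uses, so treat the class as an example of the general technique, not as LogScale's implementation. The register count (p=14, so 16,384 registers), the SHA-1-based 64-bit hash, and the class and helper names are all assumptions of this sketch.

```python
import hashlib
import math

# Exact state for 1,000,000 distinct 10-byte values, ignoring per-entry overhead.
EXACT_BYTES = 1_000_000 * 10          # 10,000,000 bytes
EXACT_MIB = EXACT_BYTES / 2**20       # about 9.54 MiB

def _hash64(value: str) -> int:
    """64-bit hash taken from the first 8 bytes of a SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")

class HyperLogLog:
    """Minimal HyperLogLog sketch: fixed memory, approximate distinct count."""

    def __init__(self, p: int = 14) -> None:
        self.p = p
        self.m = 1 << p                  # number of registers
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        h = _hash64(value)
        idx = h >> (64 - self.p)                  # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)          # bias-correction constant for large m
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            return m * math.log(m / zeros)
        return raw

hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"value-{i}")
print(f"exact state: {EXACT_MIB:.2f} MiB, estimate: {hll.estimate():.0f}")
```

However many values are added, the sketch keeps 16,384 small registers (6 bits each would suffice in a packed encoding, about 12 KB), and HyperLogLog's standard error is roughly 1.04/sqrt(m), about 0.8% for this register count.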

count() Examples

Below are several examples using the count() function. Some are simple and others are more complex, with functions embedded within others.

Aggregate Status Codes by count() per Minute

Query
logscale
bucket(1min, field=status_code, function=count())
Introduction

Counts different HTTP status codes over time and buckets them into time intervals of 1 minute. Notice that we group by two fields: the status_code field and the implicit _bucket field.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    bucket(1min, field=status_code, function=count())

    Sets the bucket interval to 1 minute and counts the events for each value of the status_code field.

  3. Event Result set.

Summary and Results

Bucketing allows data to be collected according to a time range. Using the right aggregation function to quantify the values groups that information into buckets suitable for graphing, for example with a Bar Chart, where the size of each bar is the result of the declared function (count() in this example).
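Conceptually, the query groups events by a (bucket start, status_code) pair and counts each group. The Python sketch below models that grouping, assuming epoch-millisecond @timestamp values and dictionary events; the helper `bucket_count` is hypothetical, bucket starts are simply aligned to multiples of the span, and events without the field are skipped. Real bucket() output may align and label buckets differently.

```python
from collections import Counter
from typing import Iterable

def bucket_count(events: Iterable[dict], span_ms: int, field: str) -> Counter:
    """Count events per (_bucket, field value); _bucket is the bucket start in ms."""
    counts: Counter = Counter()
    for e in events:
        if field not in e:
            continue  # this sketch skips events without the grouping field
        bucket_start = e["@timestamp"] - e["@timestamp"] % span_ms
        counts[(bucket_start, e[field])] += 1
    return counts

MINUTE = 60_000
events = [
    {"@timestamp": 0,      "status_code": 200},
    {"@timestamp": 30_000, "status_code": 200},
    {"@timestamp": 45_000, "status_code": 404},
    {"@timestamp": 61_000, "status_code": 200},
]
for (bucket, status), n in sorted(bucket_count(events, MINUTE, "status_code").items()):
    print(bucket, status, n)
```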

Aggregate Status Codes by count() Per Minute

Time series aggregate status codes by count() per minute into buckets

Query
logscale
bucket(1min, field=status_code, function=count())
Introduction

Bucketing is a powerful technique for optimizing data storage and query performance. Bucketing allows for data to be collected according to a time range, dividing large datasets into manageable parts, thereby making it easier to quickly find specific events. In this example, the bucket() function is used with count() to count different HTTP status codes over time and bucket them into time intervals of 1 minute.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    bucket(1min, field=status_code, function=count())

    Counts different HTTP status codes over time and buckets them into time intervals of 1 minute. Notice that we group by two fields: the status_code field and the implicit _bucket field.

  3. Event Result set.

Summary and Results

The query is used to optimize data storage and query performance. Bucketing allows data to be collected according to a time range. Using the right aggregation function to quantify the values groups that information into buckets suitable for graphing, for example with a Bar Chart, where the size of each bar is the result of the declared function (count() in this example).

Alert Query for Parsers Issues

Reporting errors

Query
logscale
#type=humio #kind=logs
| loglevel=WARN
| class = c.h.d.ParserLimitingJob
| "Setting reject ingest for"
| groupby(id, function=[count(), min(@timestamp), max(@timestamp)])
| timeDiff:=_max-_min
| timeDiff > 300000 and _count > 10
Introduction

This alert query aims to react when there are problems with parsers, without being too restrictive.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    #type=humio #kind=logs

    Filters on all logs across all hosts in the cluster.

  3. logscale
    | loglevel=WARN

    Filters for all events where the loglevel is equal to WARN.

  4. logscale
    | class = c.h.d.ParserLimitingJob

    Filters for events where the class field equals c.h.d.ParserLimitingJob.

  5. logscale
    | "Setting reject ingest for"

    Filters for events containing the string Setting reject ingest for. This is the error message generated when ingested events are rejected.

  6. logscale
    | groupby(id, function=[count(), min(@timestamp), max(@timestamp)] )

    Groups the returned results by the id field, counts the events in each group, and returns the minimum and maximum timestamps. This returns a new event set with the fields id, _count, _min, and _max.

  7. logscale
    | timeDiff:=_max-_min

    Calculates the difference between the maximum and minimum timestamps for each id and returns the result in a new field named timeDiff.

  8. logscale
    | timeDiff > 300000 and _count > 10

    Returns all events where the value of timeDiff is greater than 300000 (5 minutes, as timestamps are in milliseconds) and where there are more than 10 occurrences.

  9. Event Result set.

Summary and Results

This query is used to set up alerts for parser issues. Alerting on parser issues makes it possible to proactively reach out to customers whose ingest is being throttled and help them.
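The grouping and filtering steps (6 to 8) can be modeled in Python to check the alert condition on sample data. This is a hypothetical sketch: it assumes the events have already passed the filters in steps 2 to 5 and carry an id field and an epoch-millisecond @timestamp; the function `parser_alerts` is illustrative, and its default thresholds mirror the query's 300000 and 10 values.

```python
from collections import defaultdict

def parser_alerts(events, min_span_ms=300_000, min_count=10):
    """Per id: _count, _min, _max of @timestamp; keep ids matching the alert condition."""
    groups = defaultdict(list)
    for e in events:
        groups[e["id"]].append(e["@timestamp"])
    results = []
    for id_, stamps in groups.items():
        row = {"id": id_, "_count": len(stamps), "_min": min(stamps), "_max": max(stamps)}
        row["timeDiff"] = row["_max"] - row["_min"]          # timeDiff := _max - _min
        if row["timeDiff"] > min_span_ms and row["_count"] > min_count:
            results.append(row)
    return results

MINUTE = 60_000
events = (
    [{"id": "a", "@timestamp": i * MINUTE} for i in range(11)]       # 11 events over 10 minutes
    + [{"id": "b", "@timestamp": i * 1_000} for i in range(20)]      # 20 events within 20 seconds
    + [{"id": "c", "@timestamp": i * 2 * MINUTE} for i in range(5)]  # 5 events over 8 minutes
)
alerts = parser_alerts(events)
print([row["id"] for row in alerts])   # ['a']
```

Only id a is both long-lived (more than 5 minutes) and frequent (more than 10 events), which is the balance the query aims for.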

Bucket Events Summarized by count()

Query
logscale
bucket(function=count())
Introduction

Divides the search time interval into buckets. As the time span is not specified, the search interval is divided into 127 buckets, and the events in each bucket are counted.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    bucket(function=count())

    Counts the events in each bucket across the selected time span using the count() function.

  3. Event Result set.

Summary and Results

This query divides the data into time buckets and counts the events in each bucket.
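A simplified Python model of the default division into 127 buckets, assuming equal-width buckets over the search interval [start, end) in epoch milliseconds. The function name is hypothetical, and LogScale may choose bucket boundaries differently (for example, by rounding the span), which this sketch does not attempt. For a 24-hour search, each bucket in this model is about 680,315 ms, a little over 11 minutes.

```python
def default_bucket_counts(timestamps, start_ms, end_ms, buckets=127):
    """Count timestamps in `buckets` equal-width buckets covering [start_ms, end_ms)."""
    counts = [0] * buckets
    width = (end_ms - start_ms) / buckets
    for ts in timestamps:
        if start_ms <= ts < end_ms:
            # Guard against float rounding pushing the last event past the final bucket.
            idx = min(int((ts - start_ms) // width), buckets - 1)
            counts[idx] += 1
    return width, counts

DAY = 24 * 3_600_000
width, counts = default_bucket_counts([0, 1, DAY - 1, DAY], 0, DAY)
print(round(width))                        # bucket width in ms for a 24-hour search
print(counts[0], counts[-1], sum(counts))  # the event at DAY falls outside [start, end)
```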

Calculate a Percentage of Successful Status Codes Over Time

Query
logscale
| success := if(status >= 500, then=0, else=1)
| timechart(series=customer, function=
  [
    {
      [sum(success, as=success), count(as=total)]
      | pct_successful := (success/total)*100
      | drop([success, total])
    }
  ], span=15m, limit=100)
Introduction

Calculates the percentage of successful status codes inside the function parameter of timeChart().

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    | success := if(status >= 500, then=0, else=1)

    Adds a success field based on the following condition:

    • If the value of field status is greater than or equal to 500, set the value of success to 0, otherwise to 1.

  3. logscale
    | timechart(series=customer,function=
    [
      {
        [sum(success,as=success),count(as=total)]

    Creates a new timechart with one series for each value of the customer field, using a compound function. In this example, the function parameter is an array containing an embedded aggregate (defined using the {} syntax). For each time bucket and customer, the embedded aggregate computes a sum() and a count() across the events: summing the 1 or 0 values generated by the if() function adds up the successful events, while counting gives the total number of events. These values are assigned to the success and total fields. Note that at this point we are still within the aggregate, so the two new fields exist only within the context of the aggregate.

  4. logscale
    | pct_successful := (success/total)*100

    Calculates the percentage of events that are successful. We are still within the aggregate, so the output of this step is an embedded set of events with the success, total, and pct_successful values for each time bucket and customer.

  5. logscale
    | drop([success,total])}],span=15m,limit=100)

    Still within the embedded aggregate, drops the total and success fields from the array generated by the aggregate. These fields were only needed temporarily to calculate the percentage of successful results and are not needed in the result set. Finally, sets the bucket span to 15 minutes and limits the chart to at most 100 series.

  6. Event Result set.

Summary and Results

This query shows how an embedded aggregate can be used to generate a sequence of values that can be formatted (in this case to calculate percentages) and generate a new event series for the aggregate values.
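The per-bucket arithmetic of the embedded aggregate can be modeled in Python. This is a sketch under stated assumptions: events are dictionaries with epoch-millisecond @timestamp, customer, and status fields; buckets are aligned to multiples of 15 minutes; the function name `pct_successful` is hypothetical; and the limit=100 series cap is not modeled.

```python
from collections import defaultdict

def pct_successful(events, span_ms=15 * 60_000):
    """Per (bucket, customer): sum(success), count(), then the success percentage."""
    acc = defaultdict(lambda: {"success": 0, "total": 0})
    for e in events:
        success = 0 if e["status"] >= 500 else 1                  # if(status >= 500, then=0, else=1)
        key = (e["@timestamp"] - e["@timestamp"] % span_ms, e["customer"])
        acc[key]["success"] += success                            # sum(success, as=success)
        acc[key]["total"] += 1                                    # count(as=total)
    # pct_successful := (success/total)*100; success and total are then dropped
    return {key: v["success"] / v["total"] * 100 for key, v in acc.items()}

MIN15 = 15 * 60_000
events = [
    {"@timestamp": 0,       "customer": "acme",   "status": 200},
    {"@timestamp": 60_000,  "customer": "acme",   "status": 503},
    {"@timestamp": 120_000, "customer": "acme",   "status": 200},
    {"@timestamp": 180_000, "customer": "acme",   "status": 204},
    {"@timestamp": 0,       "customer": "globex", "status": 500},
    {"@timestamp": MIN15,   "customer": "acme",   "status": 200},
]
print(pct_successful(events))
```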

Count Events per Repository

Count of the events received by repository

Query
logscale
bucket(span=1d, field=#repo, function=count())
| @timestamp:=_bucket
| drop(_bucket)
Introduction

Counts the events received by each repository per day (Cloud).

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    bucket(span=1d,field=#repo,function=count())

    Buckets the events into one-day intervals (span=1d), grouped by the #repo field, and counts the events in each bucket using count().

  3. logscale
    | @timestamp:=_bucket

    Sets @timestamp to the value of the _bucket field generated by bucket(), that is, the start of each bucket.

  4. logscale
    | drop(_bucket)

    Discards the _bucket field from the results.

  5. Event Result set.

Summary and Results

The query can be run on each repository. Alternatively, create a view that looks across multiple repositories and run it from there to get all the repository counts in one search.

Count Total of Malware and Nonmalware Events

Count the total of malware and nonmalware events as percentages

Query
logscale
[count(malware, as=_malware), count(nonmalware, as=_nonmalware)]
| total := _malware + _nonmalware
| nonmalware_pct_total := (_nonmalware/total)*100
| malware_pct_total := (_malware/total)*100
Introduction

It is possible to use the count() function to express the counts of two fields as percentages of their combined total. In this example, the count() function is used to count the malware and nonmalware fields, and the results are returned as percentages. A result set could, for example, be malware 30% and nonmalware 70%.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    [count(malware, as=_malware), count(nonmalware, as=_nonmalware)]

    Returns the counted results of the field malware in a field named _malware and the counted results of the field nonmalware in a field named _nonmalware.

  3. logscale
    | total := _malware + _nonmalware

    Assigns the total of these events to a new field named total.

  4. logscale
    | nonmalware_pct_total := (_nonmalware/total)*100
    
    | malware_pct_total := (_malware/total)*100

    Calculates the _malware and _nonmalware as a percentage of the total.

  5. Event Result set.

Summary and Results

The query is used to get an overview of the total number of malware versus nonmalware events.
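The arithmetic of steps 2 to 4 as a Python sketch, assuming dictionary events in which count(malware) counts the events that contain a malware field, and likewise for nonmalware. The function name is hypothetical, and the sketch assumes at least one matching event so that total is not zero.

```python
def malware_percentages(events):
    """Model of the query: two counts, their total, and each count as a percentage."""
    _malware = sum(1 for e in events if "malware" in e)        # count(malware, as=_malware)
    _nonmalware = sum(1 for e in events if "nonmalware" in e)  # count(nonmalware, as=_nonmalware)
    total = _malware + _nonmalware                             # assumed > 0 in this sketch
    return {
        "_malware": _malware,
        "_nonmalware": _nonmalware,
        "total": total,
        "nonmalware_pct_total": (_nonmalware / total) * 100,
        "malware_pct_total": (_malware / total) * 100,
    }

events = [{"malware": True}] * 3 + [{"nonmalware": True}] * 7
print(malware_percentages(events))
```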

Create Time Chart Widget for Different Events

Query
logscale
timeChart(span=1h, function=count(), series=method)
Introduction

The time chart widget is the most commonly used widget in LogScale. It displays bucketed time series data on a timeline. The timeChart() function is used to create time chart widgets; this example creates a timechart that shows the number of different kinds of events per hour over the last 24 hours. For example, you may want to count the different HTTP methods used for requests in the logs. If those are stored in a field named method, you can use this field as a series. We select the last 24 hours in the time selector in the UI, and set span=1h so that each time bucket is one hour long.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    timeChart(span=1h, function=count(), series=method)

    Creates 24 time buckets when we search over the last 24 hours, and all searched events get sorted into groups depending on the bucket they belong to (based on their @timestamp value). When all events have been divided up by time, the count() function is run on each group (one group per bucket and value of the series field, method) to return the number of events of each kind per hour.

  3. Event Result set.

Summary and Results

The query is used to create timechart widgets showing the number of different kinds of events per hour over the last 24 hours. In this example we do not just have one group of events per time bucket, but multiple groups: one group for every value of method that exists in the timespan we are searching in. So if we are still searching over a 24-hour period, and we have received only GET, PUT, and POST requests in that timespan, we will get three groups of events per bucket (because we have three different values for method). Therefore, we end up with 72 groups of events, and every group contains only events which correspond to some time bucket and a specific value of method. Then count() is run on each of these groups, to give us the number of GET events per hour, PUT events per hour, and POST events per hour. When viewing and hovering over the buckets within the time chart, the display will show the precise value and time for the displayed bucket, with the time showing the point where the bucket starts.
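The group arithmetic described above (24 hourly buckets times 3 method values gives 72 groups) can be checked with a small Python model. The function `timechart_counts` is hypothetical; it assumes epoch-millisecond timestamps with buckets aligned to whole hours, and with no series it produces one group per bucket, as in the next example.

```python
from collections import Counter

HOUR = 3_600_000

def timechart_counts(events, span_ms=HOUR, series=None):
    """Count events per (bucket start, series value); series=None gives one group per bucket."""
    counts = Counter()
    for e in events:
        bucket = e["@timestamp"] - e["@timestamp"] % span_ms
        key = (bucket, e.get(series)) if series else (bucket,)
        counts[key] += 1
    return counts

# 24 hours of GET, PUT and POST requests, one of each per hour
events = [{"@timestamp": h * HOUR + i * 1000, "method": m}
          for h in range(24) for i, m in enumerate(["GET", "PUT", "POST"])]
by_method = timechart_counts(events, series="method")
print(len(by_method))                 # 72 groups: 24 buckets x 3 methods
print(len(timechart_counts(events)))  # 24 groups without a series
```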

Create Timechart Widget for All Events

Query
logscale
timeChart(span=1h, function=count())
Introduction

The time chart widget is the most commonly used widget in LogScale. It displays bucketed time series data on a timeline. The timeChart() function is used to create timechart widgets; this example creates a timechart that shows the number of events per hour over the last 24 hours. We do this by selecting the last 24 hours in the time selector in the UI, and then we tell the function to make each time bucket one hour long (with span=1h).

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    timeChart(span=1h, function=count())

    Creates 24 time buckets when we search over the last 24 hours, and all searched events get sorted into groups depending on the bucket they belong to (based on their @timestamp value). When all events have been divided up by time, the count() function is run on each group, giving us the number of events per hour.

  3. Event Result set.

Summary and Results

The query is used to create timechart widgets showing the number of events per hour over the last 24 hours. The timechart shows one group of events per time bucket. When viewing and hovering over the buckets within the time chart, the display will show the precise value and time for the displayed bucket, with the time showing the point where the bucket starts.

Count All Events

This is a simple example using the count() function. The query counts the number of events found in the repository for the selected time period:

logscale
count()

The result is just a single number, the total count.

_count
3886817

To format the result with a thousands separator:

logscale
count()
| format("%,i", field=_count, as=_count)

This produces:

_count
3,886,817
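The %,i format groups the digits in threes. As a quick cross-check of where the separators belong, Python's format-specification mini-language offers the same grouping through the `,` option; this is Python, not LogScale's format() function:

```python
count = 3886817
formatted = f"{count:,}"   # comma as the thousands separator
print(formatted)           # 3,886,817
```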

Group & Count

In this example, the query uses the count() function within the groupBy() function. The first parameter given is the field upon which to group the data. In this case, it's the HTTP method (for example, GET, PUT, POST). The second parameter says to use the function count() to count the number of occurrences for each method found.

logscale
groupby(field=method, function=count())

The result is a table with the column headings, method and _count, with the values for each:

method | _count
DELETE | 7375
GET | 153493
POST | 31654
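A Python model of the same grouping, for illustration only: the helper `group_count` is hypothetical, events are assumed to be dictionaries, events without the field are skipped in this sketch, and rows are sorted by value for a deterministic result (LogScale's own row order may differ).

```python
from collections import Counter

def group_count(events, field):
    """Model of groupBy(field, function=count()): one row per field value with _count."""
    counts = Counter(e[field] for e in events if field in e)
    return [{field: value, "_count": n} for value, n in sorted(counts.items())]

events = [{"method": "GET"}, {"method": "POST"}, {"method": "GET"}, {"path": "/"}]
print(group_count(events, "method"))
# [{'method': 'GET', '_count': 2}, {'method': 'POST', '_count': 1}]
```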

Chart of Daily Counts

Figure 109. count() Chart of Daily Counts


You can use the count() function in conjunction with the timeChart() function to count the number of occurrences of events or other factors. By default, the timeChart() function will aggregate the data by day. The results will look something like the screenshot shown in Figure 109, “count() Chart of Daily Counts”.

logscale
timechart(function=count())

Table of Daily Counts

When a user accesses a website, the event is logged with a status. For instance, the status code 200 is returned when the request is successful, and 404 when the page is not found. To get a list of the status codes returned and a count of each for a given period, enter the following query in the Search box:

logscale
groupby(field=status, function=count())

The sample output is shown below:

status | _count
101 | 9
200 | 55258
204 | 137834
307 | 2
400 | 2
401 | 4
403 | 57
404 | 265
504 | 62
stopping | 6
success | 6