Use this query function to find the most common values of a field in a set of events — the top of an ordered list of results. It's also possible to find the occurrences of a field using the value of another field.

The top() query function is a more succinct and powerful way to execute the groupBy() query in conjunction with count() and sort():

logscale
groupby([*fields*], function=count())
| sort(_count)

Function Traits: Aggregate

ParameterTypeRequiredDefaultDescription
asstringoptional_count or _sum The optional name of the output field.
errornumberoptional5 The error threshold in percentage for displaying a warning message when not precise enough.
field[a]Array of stringsrequired  The fields on which to group and count. An event is not counted if fields are not present.
limitnumberoptional10 Sets the number of results to return.
maxstringoptional  This changes the function used from count() to find the max value of a max field (i.e., groupby([*fields*], function=max(*max*)) | sort(_max)).
percentbooleanoptionalfalse Will add a column named percent containing the count in percentage of total.
reststringoptional  Will add an extra row containing the count of all the other values not included.
sumstringoptional  This changes the function used from count() to sum() (i.e., like groupby([*fields*], function=sum(*sum*)) | sort(_sum)).

[a] The argument name field can be omitted.

The parameter name for field can be omitted; the following forms are equivalent:

logscale
top("value")

and:

logscale
top(field="value")

LogScale's top() function uses an approximative algorithm from DataSketches to compute the most frequent items. This algorithm is guaranteed to be exact for up to 0.75* maxMapSize items, where maxMapSize is 32768 items in historical queries and 8192 items in live queries.

The algorithm provides an upper bound for the error. By default, a warning is issued if the guaranteed precision is less than five percent; such error threshold can be modified using the error parameter. See Frequent Items, Error Threshold Table for more information.

top() only returns events that are guaranteed to be in the top k events — that is to say, that are not false positives.

When the top() function is executed, if there are more fields other than those grouped and counted, the rest parameter will return an extra row containing a count of all the remaining values — those values that were not included in the top results. To enable it, set the parameter to whatever you want the row to be labeled.

top() Examples

Extracting the Top Most Viewed Pages of a Website

Query
flowchart LR; repo{{Events}} 0[/Filter/] 1[/Filter/] result{{Result Set}} repo --> 0 0 --> 1 1 --> result
logscale
regex(regex="/.*/(?<url_page>\S+\.page)", field=url)

| top(url_page, limit=12, rest=others)
Introduction

Your LogScale repository is ingesting log entries from a web server for a photography site. On this site there are several articles about photography. The URL for articles on this site ends with the extension, .page instead of .html.

You want to extract the page users viewed and then list the top most viewed pages.

Step-by-Step
  1. Starting with the source repository events

  2. flowchart LR; repo{{Events}} 0[/Filter/] 1[/Filter/] result{{Result Set}} repo --> 0 0 --> 1 1 --> result style 0 fill:#ff0000,stroke-width:4px,stroke:#000;

    Extract the page viewed by users by returning the name of the file from the url field and storing that result in a field labeled, url_page.

    logscale
    regex(regex="/.*/(?<url_page>\S+\.page)", field=url)
  3. flowchart LR; repo{{Events}} 0[/Filter/] 1[/Filter/] result{{Result Set}} repo --> 0 0 --> 1 1 --> result style 1 fill:#ff0000,stroke-width:4px,stroke:#000;

    List the top most viewed pages. The first parameter given is that url_page field coming from the first line of the query. The second parameter is to limit the results to the top twelve — instead of the default limit of ten. Because we're curious of how many pages were viewed during the selected period that were not listed in the top twelve, the rest parameter is specified with the label to use.

    logscale
    | top(url_page, limit=12, rest=others)
  4. Event Result set

Summary and Results

The table displays the matches from the most viewed pages during the selected period to the least — limited to the top twelve. The thirteenth line is a total of all other pages.

Listing the most viewed webpages with top()

Figure 107. Listing the most viewed webpages with top()


Listing URLs Not Found

Query
flowchart LR; repo{{Events}} 0[/Filter/] 1[/Filter/] result{{Result Set}} repo --> 0 0 --> 1 1 --> result
logscale
statuscode = "404"

| top(url, limit=20)
Introduction

You want to get a list of URLs that users attempted to view, but the web server could not find them.

Step-by-Step
  1. Starting with the source repository events

  2. flowchart LR; repo{{Events}} 0[/Filter/] 1[/Filter/] result{{Result Set}} repo --> 0 0 --> 1 1 --> result style 0 fill:#ff0000,stroke-width:4px,stroke:#000;

    Get only events in which the statuscode is 404: that's the HTTP code which indicates that the requested URL was not found.

    logscale
    statuscode = "404"
  3. flowchart LR; repo{{Events}} 0[/Filter/] 1[/Filter/] result{{Result Set}} repo --> 0 0 --> 1 1 --> result style 1 fill:#ff0000,stroke-width:4px,stroke:#000;

    Pipe the events to the top() function to group the results on the value of the urlurl field and to list the top twenty.

    logscale
    | top(url, limit=20)
  4. Event Result set

Summary and Results

The results show a few attempts to access pages like wp-login.php and similar pages. These are attempts to log into WordPress, Drupal, and other content management systems. Since this particular web server does not use a CMS, these pages don't exist on the server and are indications of failed hacker attempts.

Listing the most viewed webpages with top()

Figure 108. Listing URLs not found with top()