Use this query function to find the most common values of a field in a set of events — the top of an ordered list of results. It's also possible to find the occurrences of a field using the value of another field.

The top() query function is a more succinct and powerful way to execute the groupBy() query in conjunction with count() and sort():

logscale
groupby([*fields*], function=count())
| sort(_count)
ParameterTypeRequiredDefaultDescription
asstringoptional[a]_count or _sum The optional name of the output field.
errornumberoptional[a]5 The error threshold in percentage for displaying a warning message when not precise enough.
field[b]Array of stringsrequired  The fields on which to group and count. An event is not counted if fields are not present.
limitnumberoptional[a]10 Sets the number of results to find.
maxstringoptional[a]  This changes the function used from count() to find the max value of a max field (i.e., groupby([*fields*], function=max(*max*)) | sort(_max)).
percentbooleanoptional[a]false Will add a column named percent containing the count in percentage of total.
reststringoptional[a]  Will add an extra row containing the count of all the other values not included.
sumstringoptional[a]  This changes the function used from count() to sum() (i.e., like groupby([*fields*], function=sum(*sum*)) | sort(_sum)).

[a] Optional parameters use their default value unless explicitly set

[b] The argument name field can be omitted.

Omitted Argument Names

The argument name for field can be omitted; the following forms of this function are equivalent:

logscale
top("value")

and:

logscale
top(field="value")

LogScale's top() function uses an approximative algorithm from DataSketches to compute the most frequent items. This algorithm is guaranteed to be exact for up to 0.75* maxMapSize items, where maxMapSize is 32768 items in historical queries and 8192 items in live queries.

The algorithm provides an upper bound for the error. By default, a warning is issued if the guaranteed precision is less than five percent; such error threshold can be modified using the error parameter. See Frequent Items, Error Threshold Table for more information.

top() only returns events that are guaranteed to be in the top k events — that is to say, that are not false positives.

When the top() function is executed, if there are more fields other than those grouped and counted, the rest parameter will return an extra row containing a count of all the remaining values — those values that were not included in the top results. To enable it, set the parameter to whatever you want the row to be labeled.

top() Examples

Extracting the Top Most Viewed Pages of a Website

Query
flowchart LR; repo{{Events}} 0[/Filter/] 1[/Filter/] result{{Result Set}} repo --> 0 0 --> 1 1 --> result
logscale
regex(regex="/.*/(?<url_page>\S+\.page)", field=url)
| top(url_page, limit=12, rest=others)
Introduction

Your LogScale repository is ingesting log entries from a web server for a photography site. On this site there are several articles about photography. The URL for articles on this site ends with the extension, .page instead of .html.

You want to extract the page users viewed and then list the top most viewed pages.

Step-by-Step
  1. Starting with the source repository events

  2. flowchart LR; repo{{Events}} 0[/Filter/] 1[/Filter/] result{{Result Set}} repo --> 0 0 --> 1 1 --> result style 0 fill:#ff0000,stroke-width:4px,stroke:#000;

    Extract the page viewed by users by returning the name of the file from the url field and storing that result in a field labeled, url_page.

    logscale
    regex(regex="/.*/(?<url_page>\S+\.page)", field=url)
  3. flowchart LR; repo{{Events}} 0[/Filter/] 1[/Filter/] result{{Result Set}} repo --> 0 0 --> 1 1 --> result style 1 fill:#ff0000,stroke-width:4px,stroke:#000;

    List the top most viewed pages. The first parameter given is that url_page field coming from the first line of the query. The second parameter is to limit the results to the top twelve — instead of the default limit of ten. Because we're curious of how many pages were viewed during the selected period that were not listed in the top twelve, the rest parameter is specified with the label to use.

    logscale
    | top(url_page, limit=12, rest=others)
  4. Event Result set

Summary and Results

The table displays the matches from the most viewed pages during the selected period to the least — limited to the top twelve.

url_page_count
home.page51
index.page21
home-studio.page10
a-better-digital-camera.page7
is-film-better.page6
leica-q-customized.page6
student-kit.page4
focusing-screens.page4
changing-images-identity.page2
others27

Listing URLs Not Found

Query
flowchart LR; repo{{Events}} 0[/Filter/] 1[/Filter/] result{{Result Set}} repo --> 0 0 --> 1 1 --> result
logscale
statuscode = "404"
| top(url, limit=20)
Introduction

You want to get a list of URLs that users attempted to view, but the web server could not find them.

Step-by-Step
  1. Starting with the source repository events

  2. flowchart LR; repo{{Events}} 0[/Filter/] 1[/Filter/] result{{Result Set}} repo --> 0 0 --> 1 1 --> result style 0 fill:#ff0000,stroke-width:4px,stroke:#000;

    Get only events in which the statuscode is 404: that's the HTTP code which indicates that the requested URL was not found.

    logscale
    statuscode = "404"
  3. flowchart LR; repo{{Events}} 0[/Filter/] 1[/Filter/] result{{Result Set}} repo --> 0 0 --> 1 1 --> result style 1 fill:#ff0000,stroke-width:4px,stroke:#000;

    Pipe the events to the top() function to group the results on the value of the urlurl field and to list the top twenty.

    logscale
    | top(url, limit=20)
  4. Event Result set

Summary and Results

The results show a few attempts to access pages like wp-login.php and similar pages. These are attempts to log into WordPress, Drupal, and other content management systems. Since this particular web server does not use a CMS, these pages don't exist on the server and are indications of failed hacker attempts.

url_count
/.env962
/favicon.ico67
/api/.env22
/core/.env25
/backend/.env25
/info.php19
/admin/.env19
/user/login18