Use this query function to find the most common values of a field in a set of events, that is, the top of an ordered list of results. It's also possible to rank the values of a field by the sum or maximum of another field instead of by their number of occurrences.

The top() query function is a more succinct and powerful way to run groupBy() with count() followed by sort():

logscale
// fields is a placeholder for the field or fields to group on
groupBy([fields], function=count())
| sort(_count)
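As a rough sketch of that equivalence, assume events carry a url field (an assumption for illustration). The query below approximates what top(url) returns. It counts events per url value and keeps the ten largest counts, matching top()'s default limit. On very large data sets top() uses an approximation algorithm, so the two results may differ slightly.

logscale
// Approximately what top(url) computes: count events per url value,
// then keep the ten largest counts (top()'s default limit is 10).
groupBy([url], function=count())
| sort(_count, order=desc, limit=10)
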
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| as | string | false | _count or _sum | The optional name of the output field. |
| error | number | false | 5 | The error threshold, in percent, above which a warning is displayed because the results are not precise enough. |
| field | [string] | true | | The fields on which to group and count. An event is not counted if the fields aren't present. [a] |
| limit | number | false | 10 | The number of results to return. |
| max | string | false | | Changes the function used from count() to max(), so each group is ranked by the maximum value of the given field (equivalent to groupBy() with function=max() followed by sort(_max)). |
| percent | boolean | false | false | Adds a column named percent containing each count as a percentage of the total. |
| rest | string | false | | Adds an extra row containing the count of all other values not included in the results. |
| sum | string | false | | Changes the function used from count() to sum(), so each group is ranked by the sum of the given field (equivalent to groupBy() with function=sum() followed by sort(_sum)). |

[a] If an argument is given without a name, it is treated as field (the default argument).
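As a sketch of the sum and as parameters, assume events with a url field and a hypothetical numeric bytes field. The query below ranks URLs by total bytes transferred instead of by request count and renames the output field.

logscale
// Rank url values by total bytes transferred rather than by request count.
// bytes is a hypothetical field name; as= replaces the default _sum output name.
top(url, sum=bytes, as=total_bytes, limit=5)

Setting max=bytes instead of sum=bytes would rank each URL by its largest single bytes value and produce a _max field.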

When the top() query function is executed and there are more distinct values than the limit allows, the rest parameter adds an extra row containing the combined count of all the values not included in the top results. To enable it, set rest to whatever you want that row to be labeled.
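A minimal sketch combining rest with percent, assuming events with a url field. The row label "other pages" is an arbitrary choice for illustration.

logscale
// Show the five most common url values with their share of the total,
// plus one extra row labeled "other pages" counting everything else.
top(url, limit=5, rest="other pages", percent=true)
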

A warning message is displayed if the results returned are not precise enough. The error parameter specifies the error threshold in percent; the default is five percent. Lower the value if you want to be warned about smaller imprecisions as well.

When the data set becomes very large, the top() function uses a streaming approximation algorithm, implemented with the DataSketches library. By default, a warning is issued if the estimated error exceeds five percent; this threshold can be changed with the error parameter. The implementation uses a maxMapSize of 32768 for historical queries and 8192 for live queries. See Frequent Items, Error Threshold Table for more information. Only results falling within the threshold are returned.
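As a sketch of tightening the warning threshold, assume events with a url field. The query below asks for a warning whenever the estimated error exceeds one percent rather than the default five.

logscale
// Warn whenever the estimated error exceeds 1 percent instead of the default 5.
top(url, limit=20, error=1)
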

top() Examples

There are many ways in which the top() function may be used. As an example, suppose you have a LogScale repository that's ingesting log entries from a web server for a photography site. The site has several articles about photography, and their URLs end with the extension .page instead of .html. Based on this, you can use the regex() query function to extract the page users viewed and then use the top() function to list the most viewed pages. You could do that like this:

logscale
regex(regex="/.*/(?<url_page>\S+\.page)", field=url)
| top(url_page, limit=12, rest=others)

The first line uses the regex() function. Since this reference page is about the top() function, we won't discuss it in detail, other than to note that it extracts the name of the file from the url field and stores that result in a field named url_page.

The second line of the query shows how you might use the top() function. The first argument is the url_page field produced by the first line of the query. The limit parameter restricts the results to the top twelve instead of the default of ten. We want to know how many page views during the selected period were of pages not listed in the top twelve, so the rest parameter is given the label to use for that row. In Figure 405, “top() Example”, you can see that the last row of the results reads others.


Figure 405. top() Example


The results list the matching pages from the most viewed during the selected period to the least, limited to the top twelve. The thirteenth row is the total for all other pages.

As another example, suppose you want a list of URLs that users attempted to view but that the web server could not find. You could do a query like this:

logscale
statuscode = "404"
| top(url, limit=20)

In this query, we first keep only events in which statuscode is 404, the HTTP status code indicating that the requested URL was not found. Those events are then piped to the top() function on the second line of the query, which groups them on the value of the url field and lists the top twenty. The results will look something like Figure 406, “top() Example”.
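Because field accepts an array, the same query can group on more than one field. The sketch below assumes the parser also extracts a hypothetical method field holding the HTTP request method. Each row then represents one url and method combination.

logscale
// Keep only "not found" responses, then rank url/method combinations.
// method is a hypothetical field name; use whatever your parser produces.
statuscode = "404"
| top([url, method], limit=20)
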


Figure 406. top() Example


Looking at the screenshot, we can see that there are a few attempts to access pages like wp-login.php and similar pages. These are attempts to log into WordPress, Drupal, and other content management systems. Since this particular web server does not use a CMS, these pages don't exist on the server, and the requests are indications of failed hacking attempts.