Use this query function to find the most common values of a field in a set of events, that is, the top of an ordered list of results. It can also rank the values of a field by the sum or maximum of another field (see the sum and max parameters).
The top() query function is a more succinct and powerful way to execute the groupBy() query in conjunction with count() and sort():
groupBy([*fields*], function=count())
| sort(_count)
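To make the equivalence concrete, here is a minimal Python sketch of what the groupBy/count/sort pipeline computes: count events per field value, order by descending count, and keep the first few. It is a conceptual model only, not how LogScale implements top(); the `top` helper and the sample events are hypothetical.

```python
from collections import Counter

def top(events, field, limit=10):
    """Count events per value of `field` and return the `limit` most common.

    Events that lack the field are not counted, mirroring the documented
    behaviour of the field parameter.
    """
    counts = Counter(e[field] for e in events if field in e)
    # most_common() orders by descending count; ties keep first-seen order.
    return counts.most_common(limit)

# Hypothetical events; the last one has no url field and is ignored.
events = [
    {"url": "/home"}, {"url": "/about"}, {"url": "/home"},
    {"url": "/home"}, {"url": "/about"}, {"status": "500"},
]
result = top(events, "url", limit=2)
print(result)  # [('/home', 3), ('/about', 2)]
```

Unlike this exact count, the real function uses an approximate algorithm for large numbers of distinct values.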
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
as | string | optional[a] | _count or _sum | The optional name of the output field. |
error | number | optional[a] | 5 | The error threshold, as a percentage, for displaying a warning message when the result is not precise enough. |
field [b] | array of strings | required | | The fields on which to group and count. An event is not counted if the fields are not present. |
limit | number | optional[a] | 10 | Sets the number of results to find. Minimum: 1. |
max | string | optional[a] | | Changes the function used from count() to max(), finding the maximum value of the max field (for example, groupBy([*fields*], function=max(*max*)) piped into sort(_max)). |
percent | boolean | optional[a] | false | Adds a column named percent containing the count as a percentage of the total. |
rest | string | optional[a] | | Adds an extra row containing the count of all the other values not included. |
sum | string | optional[a] | | Changes the function used from count() to sum() (for example, groupBy([*fields*], function=sum(*sum*)) piped into sort(_sum)). |
[a] Optional parameters use their default value unless explicitly set.
[b] The argument name field can be omitted.
Omitted Argument Names
The argument name for field can be omitted; the following forms of this function are equivalent:
top(["value"])
and:
top(field=["value"])
These examples show basic structure only.
LogScale's top() function uses an approximate algorithm from DataSketches to compute the most frequent items. This algorithm is guaranteed to be exact for up to 0.75*maxMapSize items, where maxMapSize is 32768 items in historical queries and 8192 items in live queries.
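As a quick worked calculation, the exactness bound follows directly from the two maxMapSize values quoted above. The snippet below only performs that arithmetic; reading "items" as distinct field values is an assumption, and the dictionary names are illustrative.

```python
# maxMapSize values quoted in the text above.
MAX_MAP_SIZE = {"historical": 32768, "live": 8192}

# The result is guaranteed exact up to 0.75 * maxMapSize items.
exact_limit = {kind: int(0.75 * size) for kind, size in MAX_MAP_SIZE.items()}
print(exact_limit)  # {'historical': 24576, 'live': 6144}
```

Beyond these sizes the result may be approximate, which is where the error threshold comes into play.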
The algorithm provides an upper bound for the error. By default, a warning is issued if the guaranteed precision is less than five percent; this error threshold can be modified using the error parameter. See Frequent Items, Error Threshold Table for more information.
top() only returns events that are guaranteed to be in the top k events, that is to say, events that are not false positives.
When the top() function is executed and there are more values than those included in the top results, the rest parameter returns an extra row containing a count of all the remaining values, that is, those values that were not included in the top results. To enable it, set the parameter to whatever you want the row to be labeled.
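The effect of the rest parameter can be modelled with a short Python sketch: keep the most common values and fold the counts of everything else into one labelled row. This is an illustration under assumptions, not LogScale's implementation; `top_with_rest` is a hypothetical helper, and omitting the extra row when nothing remains is a choice made here, not documented behaviour.

```python
from collections import Counter

def top_with_rest(values, limit, rest_label):
    """Return the `limit` most common values plus one row for everything else."""
    counts = Counter(values)
    rows = counts.most_common(limit)
    # Everything not in the top rows is summed into a single extra row.
    remaining = sum(counts.values()) - sum(n for _, n in rows)
    if remaining > 0:  # assumption: no extra row when nothing is left over
        rows.append((rest_label, remaining))
    return rows

# Hypothetical page views: 5 + 3 + 2 + 1 = 11 in total.
pages = ["home"] * 5 + ["index"] * 3 + ["faq"] * 2 + ["contact"]
rows = top_with_rest(pages, limit=2, rest_label="others")
print(rows)  # [('home', 5), ('index', 3), ('others', 3)]
```

The label passed as `rest_label` plays the role of the value given to rest, such as others.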
top() Examples
Calculate Query Costs by User and Repository in a Single Field
Calculate query costs by user across multiple repositories, showing the repository/user as a single field
Query
#type=humio #kind=logs class=c.h.j.RunningQueriesLoggerJob message="Highest Cost query"
| repoUser:= format("%s/%s", field=[dataspace, initiatingUser])
| top(repoUser, sum=deltaTotalCost, as=cost)
|table([cost, repoUser], sortby=cost)
Introduction
In this example, the query filters events in the humio repository that are tagged with kind equal to logs, and then returns the events where the class field contains c.h.j.RunningQueriesLoggerJob and the message field has the value Highest Cost query. The query then combines the dataspace and initiatingUser values in a new field, repoUser, and uses the top() and table() functions to aggregate and display the results.
Example incoming data might look like this:
#type | #kind | class | message | timestamp | dataspace | initiatingUser | totalLiveCost | totalStaticCost | deltaTotalCost | repo |
---|---|---|---|---|---|---|---|---|---|---|
humio | logs | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:30:00Z | production | john.doe | 1500 | 800 | 2300 | security-logs |
humio | logs | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:31:00Z | development | jane.smith | 2000 | 1200 | 3200 | app-logs |
humio | logs | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:32:00Z | staging | bob.wilson | 1000 | 500 | 1500 | infra-logs |
humio | logs | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:33:00Z | production | john.doe | 1800 | 900 | 2700 | security-logs |
humio | logs | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:34:00Z | development | jane.smith | 2500 | 1300 | 3800 | app-logs |
humio | logs | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:35:00Z | staging | alice.cooper | 1200 | 600 | 1800 | infra-logs |
Step-by-Step
Starting with the source repository events.
- logscale
#type=humio #kind=logs class=c.h.j.RunningQueriesLoggerJob message="Highest Cost query"
Filters for Humio internal logs containing c.h.j.RunningQueriesLoggerJob in the class field and where the value in the message field is equal to Highest Cost query.
- logscale
| repoUser:= format("%s/%s", field=[dataspace, initiatingUser])
Combines the fields dataspace and initiatingUser with a / separator, and then assigns the combined value to a new field named repoUser. Example of combined value: dataspace/username.
- logscale
| top(repoUser, sum=deltaTotalCost, as=cost)
Groups the events by the field repoUser, sums the field deltaTotalCost for each group, and returns the top groups with the sum in a new field named cost.
- logscale
|table([cost, repoUser], sortby=cost)
Displays the results in a table with fields cost and repoUser, sorted by the column cost.
Event Result set.
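The steps above can be sketched in Python against the example incoming data: build the combined repoUser key, sum deltaTotalCost per key, and sort by the total. This is a rough model of the query's logic rather than LogScale code, and the descending sort order is an assumption about how the table is displayed.

```python
from collections import defaultdict

# (dataspace, initiatingUser, deltaTotalCost) from the example incoming data.
rows = [
    ("production", "john.doe", 2300),
    ("development", "jane.smith", 3200),
    ("staging", "bob.wilson", 1500),
    ("production", "john.doe", 2700),
    ("development", "jane.smith", 3800),
    ("staging", "alice.cooper", 1800),
]

cost = defaultdict(int)
for dataspace, user, delta in rows:
    repo_user = "%s/%s" % (dataspace, user)  # mirrors format("%s/%s", ...)
    cost[repo_user] += delta                 # mirrors sum=deltaTotalCost

# Highest total first (assumed descending sort on cost).
table = sorted(cost.items(), key=lambda kv: kv[1], reverse=True)
for repo_user, total in table:
    print(total, repo_user)
# 7000 development/jane.smith
# 5000 production/john.doe
# 1800 staging/alice.cooper
# 1500 staging/bob.wilson
```

Each user appears once per repository, with the costs of all matching events added together.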
Summary and Results
The query is used to search across multiple repositories and calculate query costs per user, by combining costs and showing the repository/user as a single field.
Sample output from the incoming example data:
cost | repoUser |
---|---|
7000 | development/jane.smith |
5000 | production/john.doe |
1800 | staging/alice.cooper |
1500 | staging/bob.wilson |
Extract the Top Most Viewed Pages of a Website
Query
regex(regex="/.*/(?<url_page>\S+\.page)", field=url)
| top(url_page, limit=12, rest=others)
Introduction
Your LogScale repository is ingesting log entries from a web server for a photography site. On this site there are several articles about photography. The URL for articles on this site ends with the extension .page instead of .html.
You want to extract the page users viewed and then list the top most viewed pages.
Step-by-Step
Starting with the source repository events.
- logscale
regex(regex="/.*/(?<url_page>\S+\.page)", field=url)
Extracts the page viewed by users by capturing the name of the file from the url field and storing the result in a field named url_page.
- logscale
| top(url_page, limit=12, rest=others)
Lists the top most viewed pages. The first parameter is the url_page field extracted in the first line of the query. The second parameter limits the results to the top twelve instead of the default limit of ten. To see how many views during the selected period went to pages that are not listed in the top twelve, the rest parameter is specified with the label to use for that row.
Event Result set.
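To see what the regular expression in the first step captures, here is a small Python sketch applying the same pattern to a few hypothetical URLs. Python writes named groups as `(?P<name>...)` rather than `(?<name>...)`; the sample URLs are invented for illustration, and unanchored matching is assumed.

```python
import re

# Same pattern as in the query, with Python's named-group spelling.
PAGE_RE = re.compile(r"/.*/(?P<url_page>\S+\.page)")

# Hypothetical request paths.
urls = [
    "/articles/photography/home-studio.page",
    "/articles/is-film-better.page",
    "/images/logo.png",
]

pages = []
for url in urls:
    match = PAGE_RE.search(url)
    if match:  # URLs without a .page file are skipped in this sketch
        pages.append(match.group("url_page"))
print(pages)  # ['home-studio.page', 'is-film-better.page']
```

The greedy `/.*/` consumes the directory part of the path, so only the final file name ends up in url_page.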
Summary and Results
The table lists the pages from the most viewed to the least viewed during the selected period, limited to the top twelve, followed by the others row counting the views of all remaining pages.
url_page | _count |
---|---|
home.page | 51 |
index.page | 21 |
home-studio.page | 10 |
a-better-digital-camera.page | 7 |
is-film-better.page | 6 |
leica-q-customized.page | 6 |
student-kit.page | 4 |
focusing-screens.page | 4 |
changing-images-identity.page | 2 |
others | 27 |
List URLs Not Found
Query
statuscode = "404"
| top(url, limit=20)
Introduction
You want to get a list of URLs that users attempted to view, but the web server could not find them.
Step-by-Step
Starting with the source repository events.
- logscale
statuscode = "404"
Keeps only events in which statuscode is 404, the HTTP status code indicating that the requested URL was not found.
- logscale
| top(url, limit=20)
Pipes the events to the top() function to group the results on the value of the url field and to list the top twenty.
Event Result set.
Summary and Results
The results show a few attempts to access pages like wp-login.php and similar pages. These are attempts to log into WordPress, Drupal, and other content management systems. Since this particular web server does not use a CMS, these pages don't exist on the server and are indications of failed hacker attempts.
url | _count |
---|---|
/.env | 962 |
/favicon.ico | 67 |
/api/.env | 22 |
/core/.env | 25 |
/backend/.env | 25 |
/info.php | 19 |
/admin/.env | 19 |
/user/login | 18 |