Frequently Used Query Operations
Querying events in LogScale means to combine, mix or match, filter or group elements together to get different results from the same logs.
Searching Strings
The first and simplest query that can be done in LogScale is searching your data as you would normally do in a web browser, by means of the symbols commonly used to refine web searches.
"erro"
This search will look for strings in your data that have the word
error
, or
gingerroot
, or
ferrous
, or even a portion of
these words — for example, looking for
"info"
will also return any string
containing the word sinfonie
.
"64.*"
In this example we are searching all events that have IP addresses
starting with 64
in their strings.
@rawstring=/^64/
This search will return all strings where the
@rawstring field starts with
64
.
url=/prod/
This search would return all strings having the word
product
in the
url field.
Making Searches Case-Insensitive
Free-text searches in LogScale are case-sensitive by default, meaning
that searching for ERROR
will
not match events with the string
error
.
Searches can be made case-insensitive by adding the option
i
at the end of a
regex, this indicates you want the regex to be case-insensitive:
/windows/i
In this example, all strings with either the word
WINDOWS
or
Windows
or
wInDoWs
will be
returned in the results.
Filtering Fields
With filter expressions (or, simply, filters) you can filter your events finding only events that have a specific value of a certain field:
method="GET"
In this example we are selecting all events that have the value
GET
in the field method.
If you are used to SQL, filters are very much like the
WHERE
clause of an
SQL-query.
Combining Filters
You can combine filters using the
and
and
or
logical operators.
method="GET" or
method="POST"
This query will find events with GET
or
POST
in the field
method.
You can also use the Wildcard character
*
to specify any text:
method="P*" or
method="D*ETE"
The and
operator is applied
implicitly if no operator is present:
method="PUT" example.com
This will find events with PUT
in the method field, and example.com
anywhere in the event.
Examples of Filters
Find listed below some of the possible filters, with examples, that are widely used in LogScale.
Full text and field filters
example.com and statuscode=200
It reads as follows: find events that contain both text
example.com
and status code equal to 200
Negating filters
not example.com
and not statuscode=200
It reads as follows: find events that do not contain the text
example.com
and do not have the status
code 200.
All events with a certain field
url=*
It reads as follows: match any event that has a field named url.
Events without a certain field
url!=*
It reads as follows: find all events, if any, that don't have a field named url
Find more details about filters in the Query Filters reference section.
Comparing Fields and Values
An effective usage of filtering in LogScale is provided by comparing a single field against a certain value, that is, to check if that field's value is greater or less than a defined value — for example:
statuscode > 200
This will return all events having a statuscode
value greater than 200
thus
filtering only certain server error codes, for instance.
test()
Function
Not only can LogScale make comparisons between one field and one
value, but it can also compare more fields and their respective
values, using the test()
function. For example,
we may need to check if the value of a certain
field1 is less than the value of another
field2.
test(field1 < field2)
This is an example from a more advanced comparison:
test(field1 != 2 * field2)
where !=
means "not equal to".
This is a test on two fields called field1 and
field2 — such a filter will only select
those events where field1 is not exactly twice
as large as field2.
Let's replace field1 or field2 with real data fields.
groupBy(userid, function=[
{statuscode = 200
| count(as=status200)},
{statuscode = 500
| count(as=status500)}
])
| test(status500 > status200)
This will find all user IDs that have more requests with "server
error" status codes (500
) than
"success" ones
(200
).
Another usage could be:
test(length(userid) == length(method))
This will find all requests where userid and
method fields have the same length, will select
for example events with Chad
and POST
, and
Peter
and
PATCH
.
Combining Expressions
Several expressions can be chained in a LogScale query using the pipe
operator (|
).
Just like in Unix, the pipeline operator takes the output of the expression on its left side and uses it as the input for the expression on its right side.
In the example below, we are counting the number of
GET
requests in the current search interval.
method="GET"
| count()
Tip
To use multiple lines in the Search
Field box, type
<ctrl>+<Enter>
then
type pipe (|
) followed by the
search term.
Aggregating Data
Whereas filters return a selection of events, aggregation functions combine events to produce new results that are no longer the events — often a single number or row. Some examples on the mostly used aggregation functions follow.
avg()
Function
avg(responsesize)
Calculates the average value of the field responsesize and returns just one number as the result, showing the field _avg in the Fields panel on the left side of the interface:
Figure 88. Aggregate Result, _avg Example
count()
Function
Used to count the number of certain events in the repository. For example, we may want to get the list of the status codes found when users access a web site, as well as the total count of each of them:
count(field=statuscode)
It returns just one number as the result, showing the field _count in the Fields panel on the left side of the interface:
Figure 89. Aggregate Result, _count Example
groupBy()
Function
Used to group events based on the value of one or more fields.
groupBy(method)
The result is a table with a row for each method and the number of times that method has been observed:
Figure 90. GroupBy Result, Example One
The query above is a shorthand for:
groupBy(field="method")
or for:
groupBy(field="method",
function=count(as="_count"))
Both will produce the exact same result.
Since groupBy()
is used very frequently, the
parameters function
and as
have default values, and
field
is the unnamed parameter
for groupBy()
.
A similar example of the groupBy()
function would
be:
userid=*
| groupBy(userid, function=count(as="_count"))
The result is a table with a row for each user and the number of times that user has appeared in the events:
Figure 91. GroupBy Result, Example Two
The as
parameter can do more,
see details at Renaming Output Fields.
Grouping can also be done by multiple fields, as in the example below:
groupBy([url, statuscode])
Note
The maximum number of elements in a groupBy()
function is set to 20000.
timeChart()
Function
Used to plot events that have a certain field and produce a chart where the X-axis is time.
For example, events that have a field called responsesize are grouped into series in the time chart, based on its values:
timeChart(responsesize)
The same result can be obtained with the query:
responsesize = *
| timeChart()
stats()
Function
Used to compute multiple aggregate functions over the input.
This is equivalent to count()
function.
stats(function=count())
In the example below, this function will find the maximum and minimum responsesize:
[min_response := min(responsesize), max_response := max(responsesize)]
See Query Functions for more descriptions and examples on these and other aggregation query functions.
Transforming or Modifying Data
While some functions - groupBy()
- result in
completely new records other functions just modify the input records, by
adding, removing or updating fields in the results set.
Some popular transformation functions are listed here below.
eval()
Function
Can modify existing fields or create new ones on-the-fly. For example, we may want to show the responsesize field in kilobytes instead of bytes:
eval(sizeInKb=responsesize / 1000)
concat()
Function
Allows you to associate the values of a given set of fields and add a new field that contains the merge between those concatenated values. Consider having events that contain either a list of users and a list of the URLs visited, see this example query:
concat([url, userid], as="combined")
The values of the fields url and userid are concatenated i.e. put together, to show which user navigated which URLs in the newly created combined field:
Figure 92. Concatenating Fields
For a detailed description of the
as
parameter, see
Renaming Output Fields.
array:contains()
Function
Takes an array-prefix and a regex pattern and searches all events for matches to the specified pattern. Any event that does not have an array containing a matching entry is filtered off, i.e. dropped from the search results. For example, given a sequence of events containing incidents in an incidents array, find all events or hosts containing the 'Cozy Bear' incident, case insensitive:
array:contains(incidents, pattern="cOzY bEaR", flags="i")
replace()
Function
Can replace text in field based on regular expressions, and also support capturing groups. Here is an example of string replacement:
replace("\/products\/(.*)", field=url, replacement="[\"category\"]=$1")
select()
Function
Can select a set of fields from each event, particularly useful if you
need to export those fields. For example, look at
PUT
HTTP method and get a table
that selects userid and
responsetime fields in the searched time range,
to be exported in a CSV
file.
method=PUT
| select([userid, responsetime])
regex()
Function
Regular expressions are described in detail in Extracting Data Fields.
See Query Functions for more descriptions and examples on these and other transformation query functions.
Extracting Data Fields
One important transformation function is regex()
,
used to extract new fields by means of a
regular expression that can
contain one or more capturing groups.
For example, you may want to use the logs to figure out which product pages are the most visited.
regex("/products/.+/.+ HTTP/")
The expression above matches events that contain text like for example
/products/books/102 HTTP/
— the Event list
will now return URLs containing such texts.
Regular expressions are so common that there is even a literal syntax for writing them:
/\/products\/.+\/.+ HTTP\//i
In the query example above, the information about the product category
and product id is encoded in the URL and is not available as a field.
That means you cannot do a groupBy()
. To solve
this, you can run:
regex("/products/(?<category>.+)/(?<productId>.+) HTTP/")
This will extract new fields for each event that matches the regex. This syntax is called Groups and backreferences.
The Event list will now contain two new fields category and productId, which you are now able to group to determine how many product categories there are:
regex("/products/(?<category>.+)/(?<productId>.+) HTTP/")
| groupBy(category)
If you are regularly extracting data from events, it is more efficient to do this during parsing. Please see Parsing for more information.
Renaming Output Fields
LogScale allows you to change the name originally assigned to the output fields, to be displayed as you want, with the scope of giving titles that would be more meaningful to the users.
This is done by adding the parameter
as
, which is typically used with
the count()
function.
We will take the example seen at figure
Figure 91, “GroupBy Result, Example Two”
and modify that query by replacing the value _count
with Total per User with the
as
parameter:
userid=*
| groupBy(userid, function=count(as="Total per User"))
The Event list will show the same count result as with the standard
groupBy()
query, but in a field now renamed
Total per User.
Figure 93. Renaming Fields, Example One
Here is a similar example of query with
as
parameters:
statuscode >= 500
| groupBy(url, function=count(as="Error Count"))
This combined expression firstly selects only those events with URLs having a statuscode equal or greater than 500, and then group the URLs by counting their respective errors.
Figure 94. Renaming Fields, Example Two
The result table will display the error occurrences under the Error Count column, with values from the field now renamed as Error Count.
Another simple way of renaming fields is by using the
rename()
.
Formatting Data
Formatting functions can serve different purposes: change how events look as they enter LogScale and before they are stored; change how results look before they are presented to the user. For example:
format("%.2f", field=amount)
This will set how the field value (the amount) should be formatted — 2 decimals — thus changing the results so that they show only two decimal places.
Joining Data
LogScale allows the pairing of two searches in order to get combined results through intersection of two types of data.
This is done with join()
query functions —
they associate a primary query with a subquery, thus correlating events
coming from different datasets.
For example, an interesting usage of such correlation of events would be the case where you have one event containing email data (email ID, sender and recipient, subject) and another event containing a score about the phishing behaviour of the emails, as well as the email ID.
In this use case example, we would have two datasets like this:
emailID=X123 from=peter to=paul subject='Click this link'
emailID=X128 from=alice to=paul subject='Remember to sign up for lunch'
emailID=X123 phishiness=1.5
emailID=X128 phishiness=0.7
which, if queried this way:
beta:repeating(5m)
| from=* to=*
| join({phishiness > 1.0}, field=emailID, include=[phishiness])
| select([emailID, phishiness, from, to, subject])
would return all the possible suspect emails.
Note
Join queries work perfectly well with static, "historical" queries;
however, you might experience some difficulties in making them work
with live queries. To solve this, a
beta:repeating()
can be added in the query.
In the example query above, the beta:repeating()
function is used because we want our query to be live.
To make these kinds of query work, you need to:
Make sure that the two different events you want to join have an ID in common i.e., share the same key (like in the example case above).
Define the keys or fields that are used to match the results, in cases where there isn't a common item to join by (this can be done by assigning a dummy value to a dummy key field), or where the key values are in different fields. See
join()
for details.Make sure your subquery (or inner query) is made against a dataset that would presumably return a small result (in the example case above, we don't expect to have many email IDs reporting phishing).
For much more details on LogScale Join concepts and behaviour and the relevant query functions, see Query Joins and Join Query Functions respectively.
Special Behaviour for Live Joins
join()
functions do not support live queries, for
the reasons explained in
Limitations of Live Joins section.
As a consequence, when you tick the
Live mode checkbox for a query
using join()
, be aware that the query won't be
run truly live but repeatedly instead, aiming to
provide the most recent results.
For this purpose, an indicator in the User Interface will specify the last time the query was run:
Figure 95. Repeating Query Indicator