Joins
LogScale supports the joining of queries using the
join()
function. It will return a combined results
set from two queries it's given.
This function provides two join modes: inner and left. If you're familiar with SQL database systems, what would be the left table in SQL is referred to as the primary in LogScale, and the right table becomes the subquery.
Inner Joins

Figure 301. Inner Joins
An inner join returns any intersection of matching values, based upon
the criteria you provide, between the primary query and the subquery.
This is illustrated in purple in Figure 1 here. Incidentally, the inner
join is the join()
function's default behavior.
As an example, suppose we're analyzing two repositories to see what kinds of web browsers are generating errors on our web servers. In this example, we want to return the intersection of two types of data: the status codes that return errors (anything 400 or higher) from the accesslog in the current repository, and the web browser type (the user agent) from the weblogs repository. To do this, we can use the following query:
#type=accesslog statuscode>=400
| join({useragent != ""}, field=sourceipaddress, key=browseripaddress, repo=weblogs)
The primary query here is the first line. It says to search the current
working repository for logs of type
accesslog
, where
statuscode >= 400
.
Next, it pipes the primary query to the join()
function. It passes in {useragent !=
""}
as the subquery argument to the
join()
function. This subquery returns any value
for the useragent
that is not an
empty string.
To join the result sets of these two queries together we will need a way
to match them. If our accesslog and weblogs both contain IP addresses,
that would be a good option and is the one we would use in this example.
However, in this example the two repositories use different field names
for the IP address. The accesslog
uses sourceipaddress
, while the
weblogs
repository uses
browseripaddress
. This query
defines that relationship by passing in the field used in the primary
query, and the key that will be matched against in the subquery —
i.e., field=sourceipaddress
, and
key=browseripaddress
.
To define the external repository that the subquery will run against,
the query here simply passes that repository's name in with the repo
parameter, repo=weblogs
.
As for the results, the primary query will find all
accesslog
entries with an error
code. The subquery will find all entries in the
weblogs
repository with a value
present for the useragent
.
However, being an inner join, the combined results will only be returned
for entries where the IP addresses match in both queries.
Left Joins

Figure 302. Left Joins
The left join returns everything in the primary and anything that matches from the subquery. In other words, a query using a left join returns any intersection of matching values, based upon the criteria you provide, between the primary query and the subquery (like an inner join) plus all the values of the primary query. This is illustrated below.
As an example of a left join, suppose we want to get as much information as possible about people who are accessing hosts. The information we have may or may not contain the location of the user, but if it does, we need to include that in our query results.
Since location information is desirable, although optional, but all access logs are required a left join query would better fit our requirements than an inner join query. Here is what it would look like
#type=accesslog
| join({username=*}, field=host, key=hostname, repo=auditlogs, mode=left)
| table(timestamp, username, host, location)
In this example, we get all results from our current repository using
#type=accesslog
as the primary
query. Our subquery (username=*
)
states that we wish to return all usernames. As in the previous example,
we need a common point of data to map the relationship between our
repositories. We'll use the host, again, defining the differing field
names used in the two repositories for that data. We passed those into
the join()
function as
field=host, key=hostname
. We
define our external repository with
repo=auditlogs
.
Since we want all the results from the primary query, even if there is
no match with the subquery, we specify a left join using
mode=left
.
Finally, we display the results in a table by piping the results to
table(timestamp, username, host,
location)
.
Join Syntax
The join()
function's signature looks like this
join({subquery}, field=arg1, key=arg2, repo=arg3, mode=inner|left)
The unnamed subquery argument, and the named
field
and
key
arguments, are required. The
named repo
,
include
, and
mode
parameters are optional.
There are other options available. Please see the
Join Query Functions reference page for more
details.
First, let's consider a join query against a single repository. In that
scenario, there is no need to define the optional
repo
parameter. The query will
look like this
primary query | join({subquery}, field=arg1, key=arg2)
So to break down how the join()
function is used
The primary query is any valid LogScale query. This provides the first set of results you wish to work with.
The arguments you pass to the
join()
function define what results should be combined and how.The subquery is any valid LogScale query. This provides the second set of results you wish to combine with those from your primary query.
The field parameter's value (
arg1
) defines the field name in the primary query. This value will be used to match data in the subquery.The key parameter's value (
arg2
) defines the field name in the subquery. This value will be used to match data in the primary query.
If the field names being used in field and key are identical, you can use the simplified syntax
primary query | field =~ join({subquery})
In the above example, the field argument would be the actual field name.
For example, if both queries share a field named
host
that you wish to join on, you
would use:
primary query | host =~ join({subquery})
Joining Two Repositories
In addition to joining two queries against a single repository, the
join()
function can also be used to return a result
set from more than two repositories. To do so, use the optional
repo
parameter to define the
repository (as arg3
) you want the
subquery to run against.
primary query
| join({subquery}, field=arg1, key=arg2, repo=arg3)
Note
Note that the user running the join() query must have permissions on both repositories.
Limitations of Live Joins
Joins are implemented by starting two queries: the main query and the subquery. For static queries, the subquery runs first and then the main query runs. For live queries, the historic part is done as in the static case, whereas the main query and subquery will run simultaneously for the live part.
LogScale's notions of liveness and joins are not compatible, meaning
that the results of live queries using the join()
function may become incorrect over time. Instead of running these
queries fully live, they are internally repeated at an interval
determined by the server, which can lead to a delay on new results that
depend on the resources the query uses.
Important
You should avoid using join()
functions in
Alerts, as they fail to detect some
matches and therefore would not trigger the alert as expected.
Instead, try using Scheduled Searches.
Support for using join()
functions in alerts may
be removed in future versions.
Best Practices for Join Queries
LogScale recommends the following when running
join()
queries.
If your
join()
query key is constantly changing with live data, you should be aware that using live queries with thejoin()
function has some significant limitations, the reasons are explained at Limitations of Live Joins and atjoin()
.When debugging an inner join query, temporarily set the join mode to left (
join(mode=left)
). This will help you debug your query by showing all results matched or unmatched by the join clause.First write the subquery alone to make sure the results are what you expect. Once they are, then you can confidently include it in a join.