Joins
Humio supports the joining of queries using the
join()
function. It will return a combined results
set from two queries it's given.
This function provides two join modes: inner and left. If you're familiar with SQL database systems, what would be the left table in SQL is referred to as the primary in Humio, and the right table becomes the subquery.
Inner Joins

Figure 266. Inner Joins
An inner join returns any intersection of matching values, based upon
the criteria you provide, between the primary query and the subquery.
This is illustrated in purple in Figure 1 here. Incidentally, the inner
join is the join()
function's default behavior.
As an example, suppose we're analyzing two repositories to see what kinds of web browsers are generating errors on our web servers. In this example, we want to return the intersection of two types of data: the status codes that return errors (anything 400 or higher) from the accesslog in the current repository, and the web browser type (the user agent) from the weblogs repository. To do this, we can use the following query:
#type=accesslog statuscode>=400
| join({useragent != ""}, field=sourceipaddress, key=browseripaddress, repo=weblogs)
The primary query here is the first line. It says to search the current
working repository for logs of type accesslog
, where
statuscode >= 400
.
Next, it pipes the primary query to the join()
function. It passes in {useragent != ""}
as the
subquery argument to the join()
function. This
subquery returns any value for the useragent
that is
not an empty string.
To join the result sets of these two queries together we will need a way
to match them. If our accesslog and weblogs both contain IP addresses,
that would be a good option and is the one we would use in this example.
However, in this example the two repositories use different field names
for the IP address. The accesslog
uses
sourceipaddress
, while the weblogs
repository uses browseripaddress
. This query defines
that relationship by passing in the field used in the primary query, and
the key that will be matched against in the subquery — i.e.,
field=sourceipaddress
, and
key=browseripaddress
.
To define the external repository that the subquery will run against,
the query here simply passes that repository's name in with the repo
parameter, repo=weblogs
.
As for the results, the primary query will find all
accesslog
entries with an error code. The subquery
will find all entries in the weblogs
repository with
a value present for the useragent
. However, being an
inner join, the combined results will only be returned for entries where
the IP addresses match in both queries.
Left Joins

Figure 267. Left Joins
The left join returns everything in the primary and anything that matches from the subquery. In other words, a query using a left join returns any intersection of matching values, based upon the criteria you provide, between the primary query and the subquery (like an inner join) plus all the values of the primary query. This is illustrated below.
As an example of a left join, suppose we want to get as much information as possible about people who are accessing hosts. The information we have may or may not contain the location of the user, but if it does, we need to include that in our query results.
Since location information is desirable, although optional, but all access logs are required a left join query would better fit our requirements than an inner join query. Here is what it would look like
#type=accesslog
| join({username=*}, field=host, key=hostname, repo=auditlogs, mode=left)
| table(timestamp, username, host, location)
In this example, we get all results from our current repository using
#type=accesslog
as the primary query. Our subquery
(username=*
) states that we wish to return all
usernames. As in the previous example, we need a common point of data to
map the relationship between our repositories. We'll use the host,
again, defining the differing field names used in the two repositories
for that data. We passed those into the join()
function as field=host, key=hostname
. We define our
external repository with repo=auditlogs
.
Since we want all the results from the primary query, even if there is
no match with the subquery, we specify a left join using
mode=left
.
Finally, we display the results in a table by piping the results to
table(timestamp, username, host, location)
.
Join Syntax
The join()
function's signature looks like this
join({subquery}, field=arg1, key=arg2, repo=arg3, mode=inner|left)
The unnamed subquery argument, and the named field
and key
arguments, are required. The named
repo
, include
, and
mode
parameters are optional. There are other options
available. Please see the Join Query Functions
reference page for more details.
First, let's consider a join query against a single repository. In that
scenario, there is no need to define the optional
repo
parameter. The query will look like this
primary query | join({subquery}, field=arg1, key=arg2)
So to break down how the join()
function is used
The primary query is any valid Humio query. This provides the first set of results you wish to work with.
The arguments you pass to the
join()
function define what results should be combined and how.The subquery is any valid Humio query. This provides the second set of results you wish to combine with those from your primary query.
The field parameter's value (
arg1
) defines the field name in the primary query. This value will be used to match data in the subquery.The key parameter's value (
arg2
) defines the field name in the subquery. This value will be used to match data in the primary query.
If the field names being used in field and key are identical, you can use the simplified syntax
primary query | field =~ join({subquery})
In the above example, the field argument would be the actual field name.
For example, if both queries share a field named host
that you wish to join on, you would use:
primary query | host =~ join({subquery})
Joining Two Repositories
In addition to joining two queries against a single repository, the
join()
function can also be used to return a result
set from more than two repositories. To do so, use the optional
repo
parameter to define the repository (as
arg3
) you want the subquery to run against.
primary query
| join({subquery}, field=arg1, key=arg2, repo=arg3)
Note
Note that the user running the join() query must have permissions on both repositories.
Limitations of Live Joins
Joins are implemented by starting two queries: the main query and the subquery. For static queries, the subquery runs first and then the main query runs. For live queries, the historic part is done as in the static case, whereas the main query and subquery will run simultaneously for the live part.
Because of this and the distributed nature of Humio, subqueries in live queries are not guaranteed to "see" events before the main query does. In practice this can be thought of as there being a small delay (typically in the order of 5-6 seconds) from an event being ingested to it hitting the subquery. This can lead to surprising results when making live joins on events that occur very close in time.
For example, a query that contains only the count()
function will typically result in a larger count than the query below,
which joins events to themselves:
join({*}, field=id) | count()
This is because at the time the main query runs for a given event, the
subquery has not seen that event yet and the join will therefore not
emit the event. You can avoid this by not joining events that are
ingested close in time, using static queries, or by using
beta:repeating()
.
Important
Due to this limitation, you should avoid using Alerts in combination with live queries, as they may not execute correctly and trigger the alert.
Best Practices for Join Queries
Humio recommends the following when running join()
queries
If your
join()
query key is constantly changing with live data, use live queries to keep the query updated.When debugging an inner join query, temporarily set the join mode to left (
join(mode=left)
). This will help you debug your query by showing all results matched or unmatched by the join clause.Another suggestion is to first write the subquery alone to make sure the results are what you expect. Once they are, then you can confidently include it in a join.