Using the join()
Function
join()
Syntax
The join()
function's signature looks like this
join({subquery}, field=arg1, key=arg2)
The unnamed subquery argument, and the named
field and key arguments, are
required. The named
repo
,
include
, and
mode
parameters are
optional. There are other options available. Please see the
join()
reference page for more details.
First, let's consider a join query against a single repository. In
that scenario, there is no need to define the optional
repo
argument. The query
will look like this
primary query
| join({subquery}, field=arg1, key=arg2)
To summarize the join()
function operation:
The primary query is any valid LogScale query. This provides the first set of results you wish to work with.
The arguments you pass to the
join()
function define what results should be combined and how.The subquery is any valid LogScale query. This provides the second set of results you wish to combine with those from your primary query.
The field parameter's value (
arg1
) defines the field name in the primary query. This value will be used to match data in the subquery.The key parameter's value (
arg2
) defines the field name in the subquery. This value will be used to match data in the primary query.
If the field names being used in field and key are identical, you can use the simplified syntax
primary query
| field =~ join({subquery})
In the above example, the field argument would be the actual field name. For example, if both queries share a field named host that you wish to join on, you would use:
primary query
| host =~ join({subquery})
When running a join()
query, the start and end
times for the span of the subquery in the join can be be specified
using the start
and
end
parameters. This bounds
the subquery within the time range given.
Using these two options, the parameters are relative to the start and
end time of the primary query. For example, if the primary query is
executed with a timespan of a the current day, setting
start
to 2h
will be 2 hours before the start of the current day.
The following query, if run with the default time of 1 day until now (12:20 on 23rd October), will not show users not known about 5 hours before the query was executed:
type = "authentication"
| join({type != "authentication"},field=user,key=authuser,include=[port],start=24h,end=5h)
| groupby(user,function=[max("@timestamp"),min("@timestamp")])
| formatTime(format="%m/%d/%Y %H:%M:%S %a", field=_max,as="maxtime")
| formatTime(format="%m/%d/%Y %H:%M:%S %a", field=_min,as="mintime")
This creates this output:
user | _max | _min | maxtime | mintime |
---|---|---|---|---|
bsmith | 1729682282587 | 1729663215893 | 10/23/2024 11:18:02 Wed | 10/23/2024 06:00:15 Wed |
tjones | 1729682282587 | 1729663215893 | 10/23/2024 11:18:02 Wed | 10/23/2024 06:00:15 Wed |
Removing the the start
and
end
parameters gives the
full set:
user | _max | _min | maxtime | mintime |
---|---|---|---|---|
bsmith | 1729682282587 | 1729663215893 | 10/23/2024 11:18:02 Wed | 10/23/2024 06:00:15 Wed |
tjones | 1729682282587 | 1729663215893 | 10/23/2024 11:18:02 Wed | 10/23/2024 06:00:15 Wed |
tmcdonald | 1729682282587 | 1729663215893 | 10/23/2024 11:18:02 Wed | 10/23/2024 06:00:15 Wed |
Joining Two Repositories
In addition to joining two queries against a single repository, the
join()
function can also be used to return a
result set from more than two repositories. To do so, use the optional
repo
parameter to define the
repository (as arg3
) you want the subquery to run
against.
primary query
| join({subquery}, field=arg1, key=arg2, repo=arg3)
Note
The user running the join()
query must have
permissions on both repositories.