Using the join() Function

join() Syntax

The join() function's signature looks like this

logscale
join({subquery}, field=arg1, key=arg2)

The unnamed sub-query argument, and the named field and key arguments, are required. The named repo, include, and mode parameters are optional. There are other options available. Please see the join() reference page for more details.

First, let's consider a join query against a single repository. In that scenario, there is no need to define the optional repo argument. The query will look like this

logscale
primary query
| join({subquery}, field=arg1, key=arg2)

To summarize the join() function operation:

  • The primary query is any valid LogScale query. This provides the first set of results you wish to work with.

  • The arguments you pass to the join() function define what results should be combined and how.

  • The sub-query is any valid LogScale query. This provides the second set of results you wish to combine with those from your primary query.

  • The field parameter's value (arg1) defines the field name in the primary query. This value will be used to match data in the sub-query.

  • The key parameter's value (arg2) defines the field name in the sub-query. This value will be used to match data in the primary query.

If the field names being used in field and key are identical, you can use the simplified syntax

logscale
primary query
| field =~ join({subquery})

In the above example, the field argument would be the actual field name. For example, if both queries share a field named host that you wish to join on, you would use:

logscale
primary query
| host =~ join({subquery})

Using start and end Times

When running a join() query, the start and end times for the span of the sub-query in the join can be be specified using the start and end parameters. This bounds the sub-query within the time range given.

Using these two options, the parameters are relative to the start and end time of the primary query. For example, if the primary query is executed with a timespan of a the current day, setting start to 2h will be 2 hours before the start of the current day.

The query below, if run with the default time of 1 day until now (12:20 on 23rd October), wont show users not known about 5 hours before the query was executed:

logscale
type = "authentication"
| join({type != "authentication"},field=user,key=authuser,include=[port],start=24h,end=5h)
| groupby(user,function=[max("@timestamp"),min("@timestamp")])
| formatTime(format="%m/%d/%Y %H:%M:%S %a", field=_max,as="maxtime")
| formatTime(format="%m/%d/%Y %H:%M:%S %a", field=_min,as="mintime")

This creates this output:

user_max_minmaxtimemintime
bsmith1729682282587172966321589310/23/2024 11:18:02 Wed10/23/2024 06:00:15 Wed
tjones1729682282587172966321589310/23/2024 11:18:02 Wed10/23/2024 06:00:15 Wed

Removing the the start and end parameters gives the full set:

user_max_minmaxtimemintime
bsmith1729682282587172966321589310/23/2024 11:18:02 Wed10/23/2024 06:00:15 Wed
tjones1729682282587172966321589310/23/2024 11:18:02 Wed10/23/2024 06:00:15 Wed
tmcdonald1729682282587172966321589310/23/2024 11:18:02 Wed10/23/2024 06:00:15 Wed

Joining Two Repositories

In addition to joining two queries against a single repository, the join() function can also be used to return a result set from more than two repositories. To do so, use the optional repo parameter to define the repository (as arg3) you want the sub-query to run against.

logscale
primary query
| join({subquery}, field=arg1, key=arg2, repo=arg3)

Note

The user running the join() query must have permissions on both repositories.