Using the join() Function

join() Syntax

The join() function's signature looks like this

logscale
join({subquery}, field=arg1, key=arg2)

The unnamed subquery argument, and the named field and key arguments, are required. The named repo, include, and mode parameters are optional. There are other options available. Please see the join() reference page for more details.

First, let's consider a join query against a single repository. In that scenario, there is no need to define the optional repo argument. The query will look like this

logscale
primary query
| join({subquery}, field=arg1, key=arg2)

To summarize the join() function operation:

  • The primary query is any valid LogScale query. This provides the first set of results you wish to work with.

  • The arguments you pass to the join() function define what results should be combined and how.

  • The subquery is any valid LogScale query. This provides the second set of results you wish to combine with those from your primary query.

  • The field parameter's value (arg1) defines the field name in the primary query. This value will be used to match data in the subquery.

  • The key parameter's value (arg2) defines the field name in the subquery. This value will be used to match data in the primary query.

If the field names being used in field and key are identical, you can use the simplified syntax

logscale
primary query
| field =~ join({subquery})

In the above example, the field argument would be the actual field name. For example, if both queries share a field named host that you wish to join on, you would use:

logscale
primary query
| host =~ join({subquery})

Using start and end Times

When running a join() query, the start and end times for the span of the subquery in the join can be be specified using the start and end parameters. This bounds the subquery within the time range given.

Using these two options, the parameters are relative to the start and end time of the primary query. For example, if the primary query is executed with a timespan of a the current day, setting start to 2h will be 2 hours before the start of the current day.

The following query, if run with the default time of 1 day until now (12:20 on 23rd October), will not show users not known about 5 hours before the query was executed:

logscale
type = "authentication"
| join({type != "authentication"},field=user,key=authuser,include=[port],start=24h,end=5h)
| groupby(user,function=[max("@timestamp"),min("@timestamp")])
| formatTime(format="%m/%d/%Y %H:%M:%S %a", field=_max,as="maxtime")
| formatTime(format="%m/%d/%Y %H:%M:%S %a", field=_min,as="mintime")

This creates this output:

user_max_minmaxtimemintime
bsmith1729682282587172966321589310/23/2024 11:18:02 Wed10/23/2024 06:00:15 Wed
tjones1729682282587172966321589310/23/2024 11:18:02 Wed10/23/2024 06:00:15 Wed

Removing the the start and end parameters gives the full set:

user_max_minmaxtimemintime
bsmith1729682282587172966321589310/23/2024 11:18:02 Wed10/23/2024 06:00:15 Wed
tjones1729682282587172966321589310/23/2024 11:18:02 Wed10/23/2024 06:00:15 Wed
tmcdonald1729682282587172966321589310/23/2024 11:18:02 Wed10/23/2024 06:00:15 Wed

Joining Two Repositories

In addition to joining two queries against a single repository, the join() function can also be used to return a result set from more than two repositories. To do so, use the optional repo parameter to define the repository (as arg3) you want the subquery to run against.

logscale
primary query
| join({subquery}, field=arg1, key=arg2, repo=arg3)

Note

The user running the join() query must have permissions on both repositories.