Types of Join

Joins match events from the two data sets from the primary query and sub query, and are typically described according to how the union of the two sets is derived. The two sets of events are matched to each other by using one or more fields from the primary query to corresponding fields in the sub-query. For example, given the following structure:

%%{init: {"flowchart": {"defaultRenderer": "elk"}} }%% classDiagram direction LR weblog: ipaddr weblog: url weblog: authuser authentication: user authentication: email authentication: firstname authentication: lastname weblog --> authentication

The weblog dataset has these events:

ipaddrurlauthuser
10.0.0.1index.htmlbsmith
10.0.0.43index.html<no value>
10.0.0.4map.htmltjones
10.0.0.84map.htmlpalvarez

The authentication dataset has these events:

useremailfirstnamelastname
bsmithbob.smith@example.comBobSmith
tjonestrevor.jones@example.comTrevorJones
tmcdonaldtay.mcdonald@example.comTayMcDonald

To perform a join operation, the query must specify how the two datasets relate to each other, typically by declaring the fields that contain the information to be used as the match between each event. For example, with the two datasets above, the authuser field in weblog can be matched to the user field in authentication.

Diagram showing how the fields in two datasets can be related to each other during a join operation

Note

It is common to refer to the two datasets in a join query as on the left or the right, as this is used to describe the result of the join operation. Within LogScale, the primary query is the left dataset, and the sub-query is the right dataset.

If weblog is the primary query (left) and authentication is the sub-query (right), then weblog is considered to be on the left then we can perform left joins, inner joins, and right joins.

Left Joins

Image showing the intersection of two datasets when using a left join

Using a left join, the query returns every row in the table 'on the left', or the primary query, and also displays any rows from the sub-query that match using the specified fields in each dataset. This shows the colour section in the figure, everything from the left, and matching rows from the right.

Left joins are used to join or display the related information from the sub-query. In the example, a left join can display the information about the authenticated user by matching the authuser user in weblog with the user in authentication:

ipaddrurlauthuseruseremailfirstnamelastname
10.0.0.1index.htmlbsmithbsmithbob.smith@example.comBobSmith
10.0.0.43index.html<no value>    
10.0.0.4map.htmltjonestjonestrevor.jones@example.comTrevorJones

The query for this operation uses join():

logscale
type=weblog
| join({type="authentication"},field=authuser,key=user,mode=left)

In a left join:

  • All the events from the primary query are returned, even if there is no corresponding entry in the sub-query. For example, in returned list of events, the second row does not contain any matching events from authentication and is blank.

  • The result set will always contain the same number of events as in the primary query.

Left joins can be achieved using:

Comparison to Left Join in SQL

In SQL databases, the primary query in LogScale is equivalent to the the left table in SQL, or the table with the primary key. The sub query is equivalent to the table with the foreign key.

For example, for a left join in LogScale, the query for the earlier examples would be:

logscale
type=#weblog
| url="index.html"
| join({#type=authentication},field=authuser,key=user,mode=left)

In SQL would be:

sql
SELECT * FROM weblog JOIN authentication ON weblog.authuser = authentication.user

Within SQL,joins are declared by using one or more clauses that show the link between the two fields; within LogScale the correlation between fields is performed by declaring the declaring the fields in each result set.

Inner Join

Image showing the intersection of two datasets when using an inner join

Inner joins return the unified set of events, or intersection, between both repositories only where matching entries exist on both sets of events:

ipaddrurlauthuseruseremailfirstnamelastname
10.0.0.1index.htmlbsmithbsmithbob.smith@example.comBobSmith
10.0.0.4map.htmltjonestjonestrevor.jones@example.comTrevorJones

The query for this operation uses join():

logscale
type=weblog
| join({type="authentication"},field=authuser,key=user,mode=inner)

In an inner join, the result set may contain a smaller set of events than in either source set of events, because only the events that match on both sides will be included. In the example, because a match for the empty username and the username palvarez cannot be found in the sub-query, no corresponding event is produced.

Comparison to Inner Joins in SQL

For a LogScale inner join:

logscale
type=#weblog
| url="index.html"
| join({#type=authentication},field=authuser,key=user,mode=inner)

In SQL would be:

sql
SELECT * FROM weblog INNER JOIN authentication ON weblog.authuser = authentication.user

The INNER JOIM keyword is similar to the classification of the query type using the mode parameter to join().

Right Joins

Image showing the intersection of two datasets when using a right join

In a right join, the query will return all of the items in the sub-query, and all the matched itesm from the primary query. Right joins can be used when matching data directly to a list of known issues or actors. With the sample data, the results would look as follows:

authuseremailfirstnameiplastnametypeurl
tjonestrevor.jones@example.comTrevor212.58.235.1Jonesauthenticationmap.html
<no value><no value><no value>10.0.0.43<no value>weblogindex.html
bsmithbob.smith@example.comBob10.0.0.1Smithauthenticationindex.html

The query for this operation uses match() with the strict parameter set to false:

logscale
type=#weblog
| match(file="auth.json",field=authuser,column=user,strict="false")

Using this parameter, the match()returns all of the rows within the file which are then matched to the incoming result set.

Comparison to Right Joins in SQL

For a LogScale inner join:

logscale
type=#weblog
| match(file="auth.json",field=authuser,column=user,strict="false")

In SQL would be:

sql
SELECT * FROM weblog RIGHT JOIN authentication ON weblog.authuser = authentication.user