Types of Join
Joins match events from the two data sets from the primary query and sub query, and are typically described according to how the union of the two sets is derived. The two sets of events are matched to each other by using one or more fields from the primary query to corresponding fields in the sub-query. For example, given the following structure:
The weblog dataset has these events:
ipaddr | url | authuser |
---|---|---|
10.0.0.1 | index.html | bsmith |
10.0.0.43 | index.html | <no value> |
10.0.0.4 | map.html | tjones |
10.0.0.84 | map.html | palvarez |
The authentication dataset has these events:
user | firstname | lastname | |
---|---|---|---|
bsmith | bob.smith@example.com | Bob | Smith |
tjones | trevor.jones@example.com | Trevor | Jones |
tmcdonald | tay.mcdonald@example.com | Tay | McDonald |
To perform a join operation, the query must specify how the two datasets relate to each other, typically by declaring the fields that contain the information to be used as the match between each event. For example, with the two datasets above, the authuser field in weblog can be matched to the user field in authentication.
Note
It is common to refer to the two datasets in a join query as on the left or the right, as this is used to describe the result of the join operation. Within LogScale, the primary query is the left dataset, and the sub-query is the right dataset.
If weblog is the primary query (left) and authentication is the sub-query (right), then weblog is considered to be on the left then we can perform left joins, inner joins, and right joins.
Left Joins
Using a left join, the query returns every row in the table 'on the left', or the primary query, and also displays any rows from the sub-query that match using the specified fields in each dataset. This shows the colour section in the figure, everything from the left, and matching rows from the right.
Left joins are used to join or display the related information from the sub-query. In the example, a left join can display the information about the authenticated user by matching the authuser user in weblog with the user in authentication:
ipaddr | url | authuser | user | firstname | lastname | |
---|---|---|---|---|---|---|
10.0.0.1 | index.html | bsmith | bsmith | bob.smith@example.com | Bob | Smith |
10.0.0.43 | index.html | <no value> | ||||
10.0.0.4 | map.html | tjones | tjones | trevor.jones@example.com | Trevor | Jones |
The query for this operation uses join()
:
type=weblog
| join({type="authentication"},field=authuser,key=user,mode=left)
In a left join:
All the events from the primary query are returned, even if there is no corresponding entry in the sub-query. For example, in returned list of events, the second row does not contain any matching events from authentication and is blank.
The result set will always contain the same number of events as in the primary query.
Left joins can be achieved using:
Comparison to Left Join in SQL
In SQL databases, the primary query in LogScale is equivalent to the the left table in SQL, or the table with the primary key. The sub query is equivalent to the table with the foreign key.
For example, for a left join in LogScale, the query for the earlier examples would be:
type=#weblog
| url="index.html"
| join({#type=authentication},field=authuser,key=user,mode=left)
In SQL would be:
SELECT * FROM weblog JOIN authentication ON weblog.authuser = authentication.user
Within SQL,joins are declared by using one or more clauses that show the link between the two fields; within LogScale the correlation between fields is performed by declaring the declaring the fields in each result set.
Inner Join
Inner joins return the unified set of events, or intersection, between both repositories only where matching entries exist on both sets of events:
ipaddr | url | authuser | user | firstname | lastname | |
---|---|---|---|---|---|---|
10.0.0.1 | index.html | bsmith | bsmith | bob.smith@example.com | Bob | Smith |
10.0.0.4 | map.html | tjones | tjones | trevor.jones@example.com | Trevor | Jones |
The query for this operation uses join()
:
type=weblog
| join({type="authentication"},field=authuser,key=user,mode=inner)
In an inner join, the result set may contain a smaller set of events
than in either source set of events, because only the events that match
on both sides will be included. In the example, because a match for the
empty username and the username
palvarez
cannot be
found in the sub-query, no corresponding event is produced.
Comparison to Inner Joins in SQL
For a LogScale inner join:
type=#weblog
| url="index.html"
| join({#type=authentication},field=authuser,key=user,mode=inner)
In SQL would be:
SELECT * FROM weblog INNER JOIN authentication ON weblog.authuser = authentication.user
The INNER JOIM
keyword is similar to
the classification of the query type using the
mode
parameter to
join()
.
Right Joins
In a right join, the query will return all of the items in the sub-query, and all the matched itesm from the primary query. Right joins can be used when matching data directly to a list of known issues or actors. With the sample data, the results would look as follows:
authuser | firstname | ip | lastname | type | url | |
---|---|---|---|---|---|---|
tjones | trevor.jones@example.com | Trevor | 212.58.235.1 | Jones | authentication | map.html |
<no value> | <no value> | <no value> | 10.0.0.43 | <no value> | weblog | index.html |
bsmith | bob.smith@example.com | Bob | 10.0.0.1 | Smith | authentication | index.html |
The query for this operation uses match()
with the
strict
parameter set to
false
:
type=#weblog
| match(file="auth.json",field=authuser,column=user,strict="false")
Using this parameter, the match()
returns all of the
rows within the file which are then matched to the incoming result set.
Comparison to Right Joins in SQL
For a LogScale inner join:
type=#weblog
| match(file="auth.json",field=authuser,column=user,strict="false")
In SQL would be:
SELECT * FROM weblog RIGHT JOIN authentication ON weblog.authuser = authentication.user