Query Joins and Lookups

LogScale searches and processes information through a series of event data sets (see Basic Query Principles). These data sets can be derived from a repository, a search, or by importing or using a previously loaded flat file.

There are many different situations where two different sets of data need to be combined to provide clearer information or results. For example:

  • Lookup table where specific field values need to be matched to a corresponding table. For example, when matching a codename with a human-readable version.

  • Seeding a query with a static list of values in order to feed the basis of a query. For example, executing a query using a fixed list of users.

  • Combining data in a repository that contains security log information by matching the different types of recorded events. For example, combining a security event log that contains both authentication and failed login events.

  • Matching entries across repositories using one or more common fields, such as username.

Each of these examples require the use a relationship between two sets of data. Within Falcon LogScale, these two sets of data can either be a set of events (as part of a query) or a file (loaded into LogScale using Uploading Files).

The primary set of events is typically referred to as the Primary Query, and the set of related events is called the Subquery. To match the data between the two sets of data, the source field (or fields) on the primary key and use the source field in the subquery must be declared.

For example, in the following relationship diagram, there are two datasets, the weblog repository contains a list of weblog accesses from the Apache HTTP server; the authentication repository contains a list of authentication events and user information.

%%{init: {"flowchart": {"defaultRenderer": "elk"}} }%% classDiagram direction LR weblog: ipaddr weblog: url weblog: authuser authentication: user authentication: email authentication: firstname authentication: lastname weblog --> authentication

The two datasets can be related to each other, the authuser field in weblog could be related to the user field in authentication. For example, to list the email address of the user that accessed each URL, the query would be on the weblog repository with a subquery to the authentication repository.

Note

When comparing to relational database terminology, a table in a relational database would be equivalent to the event set from a query or an uploaded file. The source field would refer to the foreign key in the query and the primary key would be the target field in the subquery.

Within LogScale, the primary query and subquery can be source from a repository, and uploaded file, or a query on these sources. In any LogScale scenario there is a distinction between the primary query, which provides the original dataset, and any subquery that finds related information. This allows for different scenarios with different event sets and sources of data:

  • Using a static file as the incoming event dataset and basis for the query, use the readFile() function. readFile() uses a file previously uploaded into the system, or included as part of a package. This can be used when you have a static dataset, the content of the file being used to see the stream of events before performing the rest of the query. For more information and examples, see Using the readFile() Function.

  • Using a static file as the subquery, to enhance or augment the content from a query. For example, using an uploaded static file from an authentication system that contains a username list to expand the user details in weblog. The match() function achieves this by enabling lookup of data in an uploaded file based on one or more keys. For more information and examples, see Using the match() Function.