Finds patterns involving multiple events linked with specified correlation keys.

The correlate() function is given a pattern to search for. The pattern describes multiple events and their relations, for instance: an event of type A, an event of type B and an event of type C, where the A and B events have the same id1 value and the A event's id2 value has the same value as the C event's id3 value.

correlate() then searches for groups of events matching that pattern. These groups of events in this context are referred to as a constellation.

The pattern may have temporal constraints - that is, there is a further requirement that the events must come in a specific order (see the sequence parameter).

Constellations may either involve events from anywhere in the entire search timespan, or a constraint may be added that all events of the pattern must be within a certain time interval from each other (see the within parameter).

For a simple example, when searching for suspicious login behavior, looking for a sequence of events involving failed and successful logins may require looking for a sequence of failed logins using the same login ID across multiple services. To further investigate, correlate() can identify a specific number of failure or successes, and/or identify those events within a specific timeframe. Multiple failed logins to multiple services with the same user ID within a short period of time could indicate a compromise event.

ParameterTypeRequiredDefault ValueDescription
globalConstraintsarray of stringsoptional[a]   A list of fields which all events within a constellation must agree on.
includeConstraintValuesbooleanoptional[a] true Whether to implicitly include the linked constraint fields from each query in the event output.
   Values
   falseDo not implicitly include fields from the constraints in the result set
   trueInclude fields from the constraints in the result set
includeMatchesOnceOnlybooleanoptional[a] false Matches events in the result set only once for each supplied query
   Values
   falseAllow a single event from a query to match within multiple constellations
   trueOnly match an event once for each identified constellation
iterationLimitintegeroptional[a] 5 Specifies a limit to the number of times correlate() will iterate over the events. This setting controls how much effort the function will spend to reduce the candidate event set into something manageable. If the candidate set still does not fit into memory after that point, a partial result will be reported.
  Maximum10 
  Controlling Variables
  

DEFAULT_GRAPHQUERY_ITERATION_LIMIT

Variable default: 5

  

MAX_GRAPHQUERY_ITERATION_LIMIT

Variable default: 10

jitterTolerancerelative-timeoptional[a]  

A tolerance that makes matches less sensitive to the ordering of events when timestamps are very close in time, have different precision, or a known time skew (i.e. ingest delay). Increasing this value risks false positives in cases where order matters.

Only relevant if sequence is set to true.

The jitter tolerance must be less than or equal to within.

maxPerRootintegeroptional[a] 1 Set the max number of matches to use from the root query (first query).
query[b]queryrequired   A list of the queries to be executed. Each query has the format QUERY_NAME: { QUERY } [ATTRIBUTE=VALUE], specifying the name, query definition and optional attributes.
rootstringoptional[a] first query Name of the query to be used as the root node for the constellation.
sequencebooleanoptional[a] false Whether events in a constellation must occur in the order they are declared.
   Values
   falseEvents in a constellation can occur in any order
   trueEvents in a constellation must occur in the order that queries are defined
sequenceByarrayoptional[a]   When sequence=true, this specifies the event ordering to use. Contains a list of fields to compare by. Defaults to the timestamp. Requires sequence=true.
withinrelative-timeoptional[a]   Defines the maximum time span between earliest and latest matching events within a constellation. A smaller timeframe (like 15m) ensures tighter temporal correlation but may miss related events, while larger values allow for detecting longer-running activities. The value cannot be more than the time window of the parent query.

[a] Optional parameters use their default value unless explicitly set.

[b] The parameter name query can be omitted.

Hide omitted argument names for this function

Show omitted argument names for this function

To create a correlation query, specify the following elements:

  • Query criteria — Define two or more queries. Each query has a unique name that identifies the corresponding returned events.

  • Connections — Defines how events from different queries relate to each other to produce a correlation. These relationships determine which combinations of events constitute a valid match.

  • Correlation constraints (optional) — Define time-frame or sequence based constraints.

    • Sequence: enforcing the constellation (matched events) to occur in chronological order.

    • Within: enforcing the constellation to occur within a specific time frame.

    sequence and within are constraints on the correlation used to determine how the data should be compared.

The correlate() function is a type of query that can iterate over data multiple times and use the information extracted in previous iterations to improve the result. This allows for events from multiple queries to be matched against each other, and also connected to each other by their time span or sequence.

Terminology for correlate()

correlate() is a special function which introduces unique concepts which do not always have counterparts elsewhere in LogScale. Here is a list of terms for the different things involved in expressing a correlate() query.

  • Query: a LogScale query defining events of a particular type. The queries are named, so that they can be referred to.

    In the above example, there are three queries defining the kinds of events we're interested in.

  • Link: a connection between two queries, stating that we're interested in events which share a value in particular fields. A link goes from one field in one query to one field in another query. The two fields may have different names.

    Occasionally, we're interested in more than two queries being connected through the same fields. (In the above example, this is the case for the ComputerName fields.) This can be expressed with more than one link, or in some cases with a global constraint.

  • Constraint: the criteria a set of events need to fulfill to be relevant, beside each event fulfilling their respective queries. The primary kind of constraint is the links, but there are others.

  • Constellation: A set of events which fulfill all of the criteria given, including the constraints. The result of the function is a list of constellations.

  • Global constraint: When all of the events need to agree on a particular field, this can be expressed as a global constraint on that field. This is a more concise way than writing the same constraint as a number of links.

  • Temporal constraints: constraints concerning the relative timestamps of the events of a constellation. See Sequence constraint and Within constraint.

  • Sequence constraint: A constraint which means that events of a constellation must come in a certain order (often by timestamp). See Correlation by Sequence.

  • Within constraint: A constraint which means that events of a constellation must occur 'close together' in time — with a given timespan. See Correlation by Timeframe.

  • Query Graph: A diagram of the queries and their relations. When you run a correlate() query, you can find a visualization of the query graph in the Query Graph tab. This can be used for instance to check whether the query is expressing the right event pattern and align your mental model.

correlate() Operation

The correlate() function has specific implementation and operational considerations, outlined below.

Comparison to other functions

The correlate() function is similar to using join() and defineTable() with match(), in that it pulls together different events through a common correlation key.

correlate() differs in how it operates, though, and therefore provides different properties:

  • When using for join() and defineTable() the relations between the two queries to be joined is asymmetric, for example a left or right join may match more events on the right or the left side. See Query Joins and Lookups for more information.

    By comparison, within correlate() it is symmetric, the events matched will be corrsponding; correlate() is explicitly matching events according to the link criteria and constraints.

  • While join() and defineTable()+match() can only join two queries, correlate() can join multiple queries.

  • The limitation for join() and defineTable() is that the subquery result must fit in a query's result size quota. For correlate(), the combined relevant events must fit within the quota. However, the correlation keys can be used to narrow down the set of relevant events for all of the queries.

  • This means that correlate() is particularly useful for use cases where:

    • Three or more queries are to be correlated, but the correlation key relationships are such that they cannot be expressed with regular joins.

    • Two or more queries are to be joined, but each in isolation yields too large a result - as long as the correlation constraints narrow down the set of relevant events sufficiently. (Correlations are rare, but events which might be relevant are not.)

Placement restrictions

The correlate() function cannot be used after aggregators or inside function parameters of other functions (excepting subqueries which are run as their own independent queries, such as in defineTable()). It also cannot be used after source functions such ascreateEvents() or readFile().

Output

The output of the function is a list of rows, one per constellation that satisfies the specified constraints. See Output Format, which also describes how to select the fields included in the output.

The syntax of the correlate() function is like that of a normal function, except that it contains queries which are named, and that each query may end with a number of link constraints. In addition, certain attributes may be specified per query. The structure is like this:

logscale
correlate(
  NAME1: { QUERY1 [| LINK]* } [QUERYATTR],
  NAME2: { QUERY2 [| LINK]* } [QUERYATTR],
  NAME#: { QUERY# [| LINK]* } [QUERYATTR],
  OPTIONS
)

For example:

logscale
correlate(
  LoginFail1 : { logon = FAILURE },
  LoginFail2 : { logon = FAILURE },
  LoginSuccess : { logon = SUCCESS } include: [user.email,platform],
  globalConstraints=[user.id],
  sequence=true,
  within=15m
)
| groupBy([LoginSuccess.user.email, LoginSuccess.platform])

In the above example:

  • There are three named queries, LoginFail1, LoginFail2, LoginSuccess. LoginFail1 matches failed logins, LoginFail2 matches failed logins, LoginSuccess matches successful logins.

    Note

    The underlying data may have come from multiple sources, so there are small differences in the field names used for common elements, for example the login email is in both user.email and user.id. The correlate() function supports these differences by allowing for matching across different fields.

    For more information on the query format and options, see Defining a Query Node.

  • In the queries LoginFail1 and LoginFail2, the connection is defined as UserID, using the source event field user.id, the email used to attempt the login.

  • In the query LoginSuccess, the connection field is the source event field user.email which is the email of the successfully logged in user.

  • The globalConstraints parameter defines how the values of each query are correlated with each other.

  • The sequence and within are constraints on the correlation used to determine how the data should be compared.

    In this example, we are specifically looking for events in the given order, and within 15 minutes.

Note

When running the query and visualising data in a Table widget, for better readability the UI automatically groups the fields by prefix: for more information on this format setting, see Group fields by prefix.

A more typical example might be:

logscale
correlate(
    AuthError: { 
        #event_type="authentication_error" 
    } include: [username, error_code, service],
    DatabaseError: { 
        #event_type="database_error" 
    } include: [query_type, table_name, error_message],
    within=1h,
    globalConstraints=[username]
)

In thie example, we are looking for a number of failure issues occuring within a short period of time across different applications:

  • The authentication error might happen first (user can't log in, then database queries fail)

  • OR the database error might happen first (database is down, then auth systems that depend on it fail)

Defining a Query Node

A query node must be defined in a specific way to be used as part of a correlation query.

A query node requires the following elements:

  • NAME — Identifies matching events in output

  • QUERY — Specifies the search criteria

  • LINK — Specifies link constraints between events

  • QUERYATTR — Sets query-specific parameters

Each query within the list of queries submitted can be defined using the following syntax:

logscale
NAME: { QUERY [| LINK]* } QUERYATTR

Where:

  • NAME

    The identifying name for this query. The name is required, as it is needed to identify matching events in the output. NAME must be an unquoted string.

  • QUERY

    The query to be executed. Contains the search criteria and returns events for correlation analysis and should define a filter for the events to be identified and matched.

    Important

    The query cannot not contain aggregate query functions.

  • LINK

    The link variable declaration; that is, the constraint to use when matching across events. It uses the <=> operator to establish connections.

    The link variable creates relationships between event fields, defining the fields to be used as common keys when joining queries. This can be specified either for each query, or by specifying a common field shared across all queries within the correlation.

    • The constraint works both as a search requirement and as the method of joining across all the events across all of the queries. For example:

      logscale
      "LoginSuccess" : { logon = "SUCCESS" | user.id <=> LoginFail1.user.id}

      The query will match for a specific user with a successful login, and match the field in the specified query. In this case, user.id in the LoginFail1 query. The effect here is for the correlation to look specifically for matching events for the user ID across preceding filter query, and then using the common field across all the queries in the correlation.

    • Field correlation matches are defined using the <=> operator. For example:

      logscale
      ComputerName <=> QueryName.FieldName

      Uses the field name ComputerName from the events returned by this query. To define a common field across all queries, use the globalConstraints parameter. When matching events, the corresponding fields will be used to join the events together across all the queries in the set.

    • The source field names for each query do not need to be same; the role on the constraint in this case is to define the common value to be used to join the events.

    • Multiple fields can be defined to act as connection fields between queries. For example:

      logscale
      correlate(
        "LoginA" : { logon = "FAILURE" | user.id <=> LoginB.user.id},
        "LoginB" : { logon = "FAILURE" | user.id <=> LoginC.user.email | host.ip <=> LoginC.host.ip},
        "LoginC" : { logon = "SUCCESS" | user.email <=> UserID | host.ip <=> LoginB.host.ip},
        sequence=true,
        within=15m
      )

      In this case, matches are made for both connection fields. The connection fields do not need to be consistent across each query, so that different connections can be supported between different combinations of queries. In the preceeding, we are explicitly looking at the matching of both the user.id and host.ip in the last two queries, meaning that the first failure could occur on any host, but the success and the failure queries must match on both the host and user ID.

    Important

    The <=> is the only operator supported for linking across events within the correlate(). For instance, the following will not work:

    Invalid Example for Demonstration - DO NOT USE
    logscale
    #event_simpleName=/DnsRequest/
    | DomainName=/(?<sub_domain>.+).azurewebsites.net/
    | DualRequest=1
    | correlate(
      DomainOne: { sub_domain=* } include: [#cloud, cid, aid, ContextProcessId, DomainName, sub_domain],
      DomainTwo: { sub_domain=* | sub_domain != DomainOne.sub_domain } include: [#cloud, cid, aid, ContextProcessId, DomainName, sub_domain],
      within=1m, sequence=true, globalConstraints=[#cloud, cid, aid, ContextProcessId]
      )

    In this case, the query will be comparing sub_domain and the string DomainOne.sub_domain.

  • QUERYATTR

    One or more attributes specific to this query. For example, the list of fields to extract from each event matching this query can be defined here to provide additional information as part of the result set.

    Supported attributes:

    • include

      Enables selection of additional fields to be included in the events returned by this specific query. By default, the fields added in the event stream from this query will only include the fields defined in the connection constraints. To switch off inclusion of the constraint values in the dataset, set includeConstraintValues to false.

      To specify additional fields, supply an array of fields. For example:

      logscale
      "LoginC" : { logon = "SUCCESS" | user.id <=> UserID} include:[firstname,lastname,computername]

      A wildcard can be used to include all fields:

      logscale
      "LoginC" : { logon = "SUCCESS" | user.id <=> UserID} include:*

      This will include all fields, but also adds additional memory overhead to include the data. On very large queries, this may have a significant impact on performance and memory usage. We recommend using * only in the beginning where you author the query to look at the output result, then refine the include with the explicit fields you are interested in in the input before including it in your detection pipeline for instance running it as a trigger.

Output Format

Matching events are returned as a set of unified events where each field in the returned event is composed from the query name and the individual field. For example, given the query:

logscale
correlate(
  "LoginA" : { logon = "FAILURE" | user.email <=> UserID},
  "LoginB" : { logon = "FAILURE" | user.email <=> UserID | host.ip <=> HostIP},
  "LoginC" : { logon = "SUCCESS" | user.email <=> UserID | host.ip <=> HostIP},
  sequence=true,
  within=15m
)

Matching events would be returned with the following fields derived from the queries:

  • LoginA.user.email

  • LoginB.user.email

  • LoginC.user.email

  • LoginB.host.ip

  • LoginC.host.ip

Additional fields can be added using the include parameter.

When viewing within the UI, the Group fields by prefix option can be used to view the events by their correlated prefix. This is enabled by default when a correlate() function is used.

For example:

In the display, the output shows:

  • A row for each match

  • For each column in the original correlate() query, a column for the corresponding fields returned.

Clicking on the ⋮ for a row will show the source events that make up the constallation, as seen in the video below:

All event results include the following additional fields as standard:

Correlation Options

The correlate() function includes a number of additional parameters that can be used to adjust the behavior of the correlation:

  • Correlating by Sequence

    By default, correlation looks for matching constellations of events according to the constraint. To look for a sequence of events that match each query in the sequence to the correlate() function, see Correlation by Sequence for more information.

  • Correlating by Timeframe

    By default, correlation looks for matching constellations of events according to the constraint. To look for a group of events within a given timespan that match each query in the correlate() function, see Correlation by Timeframe for more information.

These options can be used separately, or together; correlating by a specific sequence within a specific timeframe.

Correlation by Sequence

Set sequence=true to require events to match in chronological order.

When matching events across each query within correlate() one or more constraints can be added to the selection, which control how matches across each event set are determined.

Using sequence set to true forces the constraint and correlation to require matching events to occur in sequence (by their @timestamp by default) as defined in the list of queries. To change the field to be used for the order, use the sequenceBy parameter.

For example, the query:

logscale
correlate(
  LoginA: { logon = "FAILURE" },
  LoginB: { logon = "FAILURE" | user.id <=> LoginA.UserID },
  LoginC: { logon = "SUCCESS" | user.email <=> LoginA.UserID },
  sequence=true
)

Forces correlation only to occur if there were two login failures, followed by a success for this specific user. Other combinations (for example: SUCCESS, FAILURE, SUCCESS, or FAILURE, SUCCESS, SUCCESS) would not correlate.

To visualize, the following event sequence would match:

--- displayMode: compact --- %%{init: {"flowchart": {"defaultRenderer": "elk"}} }%% gantt dateFormat HH:mm axisFormat %H:%M tickInterval 1hour todayMarker off section Correlation LoginA :crit, 07:00, 10m LoginB :crit, 07:40, 10m LoginC :crit, 08:00, 20m

While this sequence would not correlate, because the sequence is incorrect:

--- displayMode: compact --- %%{init: {"flowchart": {"defaultRenderer": "elk"}} }%% gantt dateFormat HH:mm axisFormat %H:%M tickInterval 1hour todayMarker off section Correlation LoginA :crit, 07:00, 10m LoginC :crit, 07:40, 10m LoginB :crit, 08:20, 20m

The identification of this sequence covers the overall query timespan. For example, if the query is searching over the last 24 hours, the sequence will be matched if that sequence occurs within that 24 hour timeframe. If the event occurred multiple times, it will match multiple times. To look for that specific sequence within a smaller timeframe, use the within parameter.

By default, individual events can appear in multiple correlation sets. To limit it so that each event can only appear once in each event set, set the includeMatchesOnceOnly parameter to true.

Correlation Field (sequenceBy Parameter)

By default, correlate() will sequence by @timestamp and @timestamp.nanos or @ingesttimestamp based on the time field chosen when the query was submitted.

If sequenceBy is set, the first field in the array will be compared between two events, in the event they are equal, correlate() will try to compare by the next field and so on. The @id field will automatically be used in the end as the final tiebreaker.

For example, if the query has @ingesttimestamp, but you want to sequence by @timestamp and @timestamp.nanos:

logscale Syntax
sequenceBy=[@timestamp, @timestamp.nanos]

If the income events have an order, to sequence lexicographically by order:

logscale Syntax
sequenceBy=[order]
Controlling Matches with Jitter Tolerance

Because events can be registered with minor time differences, using correlated queries on event data with a specific sequence may not be identified correctly. Events occurring out of sequence due to small differences in the timestamps means that the events do not occur in the required sequence.

To account for this, correlate() supports a jitterTolerance parameter which adds a tolerance to the sequence correlation to allow for this difference. Use of this option is only valid if sequence is set to true. The tolerance allows for matches within the given relative time. For example, a jitterTolerance of 1m would allow for the sequence to match even if individual events were out of sequence within a minute of each other. Note that this does not change the value of the within parameter, which defines the timespan of the overall sequence, it only affects how individual events are matched sequentially, for example allowing a single event to occur up to a minute before, rather than after, an event.

For example, with the following query:

logscale
correlate(
  LoginA: { logon = "FAILURE" },
  LoginB: { logon = "FAILURE" | user.id <=> LoginA.UserID },
  LoginC: { logon = "SUCCESS" | user.email <=> LoginA.UserID },
  sequence=true,
  within=5m,
  jitterTolerance=2m
)

The corresponding event sequence is shown below, with events in red matching the correlation.

--- displayMode: compact --- gantt dateFormat HH:mm axisFormat %H:%M tickInterval 1min todayMarker off section Events Failure :crit, 07:01, 1m Failure :crit, 07:05, 1m Success :crit, 07:05, 1m section Event A Tolerance :done, 07:00, 1m Failure :crit, 07:01, 1m Tolerance :done, 07:02, 1m section Event B Tolerance :crit, 07:04, 1m Failure :done, 07:05, 1m Tolerance :done, 07:06, 1m section Event C Tolerance :done, 07:04, 1m Success :crit, 07:05, 1m Tolerance :done, 07:06, 1m section Correlation Correlation A :crit, 07:01, 5m

Event B and C (in red) will match the correlation (which is matching for Failure, Failure and Success, even though the sequence don't occur perfectly in sequence because the jitter tolerance means that the event overlap across the events still matches the required event sequence.

Correlation by Timeframe

Use the within parameter and Relative Time Syntax to specify time windows for correlation matches.

For example, in the query:

logscale
correlate(
  LoginA: { logon = "FAILURE" },
  LoginB: { logon = "FAILURE" | user.id <=> LoginA.UserID },
  LoginC: { logon = "SUCCESS" | user.email <=> LoginA.UserID },
  within=5m
)

Each of the logon events across LoginA, LoginB, and LoginC must occur within 5 minutes of each to be considered a match. During this time, the correlation may match multiple times:

--- displayMode: compact --- gantt dateFormat HH:mm axisFormat %H:%M tickInterval 1min todayMarker off section A Failure :crit, 07:01, 1m Failure :crit, 07:02, 1m Success :crit, 07:05, 1m Failure :crit, 07:08, 1m Failure :crit, 07:10, 1m Success :crit, 07:11, 1m Failure :crit, 07:12, 1m Failure :crit, 07:14, 1m Success :crit, 07:18, 1m section B Correlation A :crit, 07:01, 5m Correlation B :crit, 07:08, 4m Correlation C :a10, 07:12, 7m

In the above sequence, the first two sets of events (Correlate A and B) match, because they both occur within 5 minutes of each other, but the third correlation (Correlate C) does not match because the span of the events is outside the timespan.

Identifying Multiple Matches using maxPerRoot

When looking for matching events, correlate() iterates over the events for the given timespan multiple times and looks for matching events across each defined query. By default, the query will match the event correlation only once per root query. The root query is by default the first query defined in the correlation. The root parameter can be used to select an alternative root query.

When iterating over the events, correlate() first looks for matches for Query A, then looks for matching events in Query B, and then determines whether the sequence or timeframe matches apply. With maxPerRoot set to 1, the function will return the first match, to return additional matches that originate from query A that should be matched. For example, consider the event sequence below:

--- displayMode: compact --- gantt dateFormat HH:mm axisFormat %H:%M tickInterval 1min todayMarker off section Events A Success :crit, 07:01, 1m B Failure :crit, 07:03, 1m C Failure :crit, 07:04, 1m D Failure :crit, 07:08, 1m E Failure :crit, 07:10, 1m

There are 5 events, one with success (labelled A), and four with failures (B-E). With the query:

logscale
correlate(
  LoginA : { logon = "SUCCESS" | user.email <=> LoginB.user.id },
  LoginB : { logon = "FAILURE" | user.id <=> LoginA.user.email },
  LoginC : { logon = "FAILURE" | user.id <=> LoginA.user.email },
  globalConstraints=[user.id]
  within=10m
)

With maxPerRoot set to 1, only the first correlation (X) will be returned (matching events A, B and C). But within the given timespan, there is a second correlation, events A, D and E also match (Success, Failure, Failure). To return both correlations, maxPerRoot must be set to 2:

--- displayMode: compact --- gantt dateFormat HH:mm axisFormat %H:%M tickInterval 1min todayMarker off section Events E1 Success :crit, 07:01, 1m E2 Failure :crit, 07:03, 1m E3 Failure :crit, 07:04, 1m E4 Failure :crit, 07:08, 1m E5 Failure :crit, 07:10, 1m section Correlation X Correlation A :crit, 07:01, 5m section Correlation Y Correlation B :crit, 07:01, 1m Correlation B :crit, 07:08, 3m

When Should I Use correlate()

You can use correlate() when:

  • You want to pull together different events, through some correlation keys that conceptually link the events together.

  • The event patterns you're looking for can be expressed as a graph of events and these links (see below), plus optionally some timing constraints.

  • The result set is moderately small – that is, tens of thousands of results rather than millions.

    In general, correlate() is useful for finding the small number of events in a large volume of data. It is not designed to build or generate statistics over events. For that, use Query Joins and Lookups.

  • correlate() can also be used efficient when processing large volumes of data where a typical join may fail due to the volume of information. Joins require a large volume of data to bwe kept in memory for processing.

How do I use correlate()?

The function is perhaps best explained using an example.

Let's first look at the use case and how it can be visualized as a graph, then how to solve it, and finally some terminology which will be used in the rest of this documentation.

An example

Suppose you're looking for a particular pattern of suspicious activity in your logs: an Outlook process starts another process which then begins writing in a particular part of the file system.

That is, log-wise, we're looking for the combination of three things:

  • A ProcessInfo event which shows that a particular process is an outlook.exe process.

  • A ProcessSpawned event which shows that this process spawned some other process.

  • A FileOpenedForWrite event which tells that this other process started writing a file somewhere in the C:\companydata folder.

All happening on the same computer, of course.

Neither of these three kinds of events is particular rare or suspicious in themselves; it is the combination that is suspicious.

As a graph

The relations between the events we want to find can be expressed in the form of a diagram – a graph, with a node for each of the three events:

graph LR PI[ProcessInfo event] PS[ProcessSpawned event] FO[FileOpenedForWrite event] CN((ComputerName)) PI <--> |linked by parent process ID| PS PS <--> |linked by child process ID| FO CN --> PI CN --> PS CN --> FO subgraph "All three linked by ComputerName" PI PS FO end
As a correlate() query

This use case can be solved using correlate() as follows:

logscale
correlate(
  ParentInfo: {
    #event=ProcessInfo
    | FileName=outlook.exe
  },
  ChildSpawn: {
    #event=ProcessSpawned
    | ParentProcessID <=> ParentInfo.ProcessID
    | ChildProcessID <=> ChildWriting.ProcessID
    | ComputerName <=> ParentInfo.ComputerName
  },
  ChildWriting: {
    #event=FileOpenedForWrite
    | FilePath="C:\\companydata\\*"
    | ComputerName <=> ParentInfo.ComputerName
  }
)

This is basically just the above diagram expressed as a LogScale query.

Let's look at its parts:

correlate(
  ParentInfo: {
    #event=ProcessInfo
    | FileName=outlook.exe
  },
   ChildSpawn: {
    #event=ProcessSpawned
    | ParentProcessID <=> ParentInfo.ProcessID
    | ChildProcessID <=> ChildWriting.ProcessID
    | ComputerName <=> ParentInfo.ComputerName
  },
   ChildWriting: {
    #event=FileOpenedForWrite
    | FilePath="C:\\companydata\\*"
    | ComputerName <=> ParentInfo.ComputerName
  }
)

The green parts are three queries expressing how to find each of the three kinds of events in isolation.

The yellow parts are names we give to those queries. They serve both to help make the query more readable, and as labels so that the queries can be referred to from elsewhere.

The blue parts describe the relationships — the link — between the different kinds of events. These links must be placed after the end of the queries, and use a special operator <=> called the link operator. On the left hand side is an ordinary field name; on the right side is the name of a field in a different query — this is where the query names come into the picture.

correlate() Syntax Examples

Example 1 — Look for URL's being accessed by specific users

In this example, the correlate() function is used to look for constellations of events, where specific users access the same URL. No order constraint is defined.

correlate(
  UserA: { userid=peter },
  UserB: { userid=geeta },
  UserC: { userid=christian },
  globalConstraints=[url]
)

The Query graph tab in the UI represents the relationship between the correlated events (click the green node to display the fields as labels on the links):

Example 2 — Look for a user visiting a URL and getting different statuses

In this example, the correlate() function is used to look for the constellation of events, where a user gets an error response (for example statuscode=4xx) when visiting a URL the first time, followed by a successful response (statuscode=200) the second time they visit the URL. The parameter sequence=true is set to require events to match in chronological order.

correlate(
  ErrorResponse: { statuscode=/4\d\d/ },
  SuccessResponse: { statuscode=200 | userid <=> ErrorResponse.userid | url <=> ErrorResponse.url },
  sequence=true
)

The UI's Query graph tab represents the relationship between the correlated events (click the green nodes to display the fields as labels on the links):

correlate()Examples

Click + next to an example below to get the full details.

Correlate AWS Federation Token Generation with Console Logins

Correlate AWS CloudTrail events (GetFederationToken action and ConsoleLogin action) by the same user within a 60-minute window using the correlate() function

Query
logscale
#Vendor="aws" #event.kind="event" #event.module="cloudtrail"
| correlate(
    GetFederationToken:{event.action="GetFederationToken"},
    ConsoleLogin:{event.action="ConsoleLogin" 
    | user.name <=> GetFederationToken.user.name},
sequence=true,within=60m)
Introduction

In this example, the correlate() function is used to match AWS GetFederationToken events with corresponding ConsoleLogin events for the same user within a 60-minute window, ensuring the events occur in sequence.

The correlate() function finds relationships between the events within the GetFederationToken and ConsoleLogin queries. sequence=true means that events in a constellation must occur in the order that the queries are defined within the time window

Example incoming data might look like this:

@timestamp#Vendor#event.kind#event.moduleevent.actionuser.namesource.ipaws.cloudtrail.event_id
2023-06-15T08:00:00ZawseventcloudtrailGetFederationTokenalice.smith10.0.1.100a1b2c3d4
2023-06-15T08:01:30ZawseventcloudtrailConsoleLoginalice.smith10.0.1.100e5f6g7h8
2023-06-15T09:15:00ZawseventcloudtrailGetFederationTokenbob.jones10.0.2.200i9j0k1l2
2023-06-15T10:25:00ZawseventcloudtrailConsoleLoginbob.jones10.0.2.200m3n4o5p6
2023-06-15T10:30:00ZawseventcloudtrailGetFederationTokencharlie.brown10.0.3.150q7r8s9t0
2023-06-15T11:45:00ZawseventcloudtrailConsoleLogincharlie.brown10.0.3.150u1v2w3x4
2023-06-15T11:00:00ZawseventcloudtrailGetFederationTokendave.wilson10.0.4.175y5z6a7b8
2023-06-15T12:15:00ZawseventcloudtrailConsoleLogindave.wilson10.0.4.175c9d0e1f2
2023-06-15T12:45:00ZawseventcloudtrailGetFederationTokeneve.parker10.0.5.225g3h4i5j6
2023-06-15T12:46:15ZawseventcloudtrailConsoleLogineve.parker10.0.5.225k7l8m9n0
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    #Vendor="aws" #event.kind="event" #event.module="cloudtrail"

    Filters for events from #Vendor=aws with #event.kind=event and #event.module=cloudtrail.

  3. logscale
    | correlate(
        GetFederationToken:{event.action="GetFederationToken"},

    Defines the first query named GetFederationToken to match AWS federation token generation events. Filters for events with event.action=GetFederationToken which captures all AWS Security Token Service (STS) federation token creation requests.

  4. logscale
    ConsoleLogin:{event.action="ConsoleLogin" 
        | user.name <=> GetFederationToken.user.name},

    Defines the second query named ConsoleLogin to match AWS console login attempts. Filters for events with event.action=ConsoleLogin which captures all AWS Management Console authentication events.

    The correlation relationship (condition) is specified using the <=> operator which requires exact matches between fields (field correlation matches).

    The user.name field from this ConsoleLogin event must match the user.name field from the GetFederationToken event. The corresponding fields will be used to join the events together across all the queries in the set. This ensures that events will only be correlated when they relate to the same user's authentication sequence, preventing false correlations between different users' token generations and logins.

  5. logscale
    sequence=true,within=60m)

    Sets correlation parameters:

    • sequence=true ensures that GetFederationToken events must occur before ConsoleLogin events.

    • within=60m specifies that the console login must occur within 60 minutes of token generation, matching the typical federation token validity period.

    By default, the query will match the event correlation only once per root query (return the first match), as the maxPerRoot parameter is not specified.

    The correlate() function outputs each pair of matched events as a single event containing fields from both sources, prefixed with their respective subquery names (for example, GetFederationToken.user.name, consoleLogin.@timestamp).

  6. Event Result set.

Summary and Results

The query is used to track the complete authentication flow for federated users accessing the AWS Console, ensuring proper sequence and timing of authentication events. The correlate() function tracks authentication sequences by matching federation token generation events with subsequent console login attempts within a specified time window.

This query is useful, for example, to monitor federation token usage patterns, detect potential token sharing or misuse, and verify that console logins occur within expected timeframes after token generation.

Sample output from the incoming example data:

@timestampGetFederationToken.user.nameGetFederationToken.source.ipGetFederationToken.aws.cloudtrail.event_idConsoleLogin.@timestampConsoleLogin.source.ipConsoleLogin.aws.cloudtrail.event_idTimeBetweenEvents
2023-06-15T08:00:00Zalice.smith10.0.1.100a1b2c3d42023-06-15T08:01:30Z10.0.1.100e5f6g7h890
2023-06-15T09:15:00Zbob.jones10.0.2.200i9j0k1l22023-06-15T10:25:00Z10.0.2.200m3n4o5p64200
2023-06-15T12:45:00Zeve.parker10.0.5.225g3h4i5j62023-06-15T12:46:15Z10.0.5.225k7l8m9n075

Note that the following events are excluded from the results because their ConsoleLogin occurred more than 60 minutes after the GetFederationToken:

  • charlie.brown's login at 11:45:00Z (75 minutes after token generation at 10:30:00Z)

  • dave.wilson's login at 12:15:00Z (75 minutes after token generation at 11:00:00Z)

This demonstrates how the within=60m parameter filters out login attempts that occur outside the expected federation token usage window.

Note that the output includes timestamps from each query. The TimeBetweenEvents field shows the seconds between token generation and console login, useful for identifying unusual login patterns.

Correlate Inbound Email URLs with Subsequent Access Attempts

Correlate Inbound Email URLs with Subsequent Access Attempts to detect when a malicious URL in an email is subsequently accessed using the correlate() function

Query
logscale
((#repo="abnormal_security" AND #event.module="email-security") OR (#repo="corelight" AND #event.module="ids"))
| case{
    #Vendor="corelight" 
    | url.prefix := "http://" 
    | url.original := concat([url.prefix,"/",client.address,"/",url.path]); *
}
| correlate(
    emailInbound:{#event.module="email-security"},
    emailAccess:{#event.module="ids" 
    | url.original <=> emailInbound.url.original},
    sequence=true,within=1h)
Introduction

In this example, the correlate() is used to match URLs from inbound emails with subsequent HTTP requests to those URLs within a one-hour window, indicating when recipients click on email link.

The correlate() matches URLs received in emails with subsequent access attempts detected in network traffic, helping identify potential phishing or malicious link interactions.

Example incoming data might look like this:

@timestamp#repo#event.module#Vendorclient.addressurl.pathurl.originalemail.from.addressemail.to.address
2023-06-15T10:00:00Zabnormal_securityemail-securityabnormal<no value><no value>http://malicious.com/path1sender@external.comuser1@company.com
2023-06-15T10:15:30Zcorelightidscorelight10.0.1.100path1<no value><no value><no value>
2023-06-15T11:00:00Zabnormal_securityemail-securityabnormal<no value><no value>http://suspicious.net/offerphish@bad.comuser2@company.com
2023-06-15T11:05:45Zcorelightidscorelight10.0.2.200offer<no value><no value><no value>
2023-06-15T12:30:00Zabnormal_securityemail-securityabnormal<no value><no value>http://legitimate.org/docspartner@vendor.comuser3@company.com
2023-06-15T12:45:15Zcorelightidscorelight10.0.3.150docs<no value><no value><no value>
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    ((#repo="abnormal_security" AND #event.module="email-security") OR (#repo="corelight" AND #event.module="ids"))

    Filters for events that are either email-security events from the abnormal_security repository OR ids (Intrusion Detection System) events from the corelight repository.

    This filter ensures that only relevant events from these two specific security tools are processed for the subsequent correlation analysis.
  3. logscale
    | case{
        #Vendor="corelight" 
        | url.prefix := "http://" 
        | url.original := concat([url.prefix,"/",client.address,"/",url.path]); *
    }

    The case statement processes events from the corelight vendor (#Vendor="corelight") only to standardize their URL format.

    For these events, it first creates a new field named url.prefix containing the value http://. Then, using the concat() function, it constructs a complete URL string in a new field named url.original.

    This URL is built by combining the HTTP prefix, followed by a forward slash, then the client's IP address from the client.address field, another forward slash, and finally the URL path from the url.path field.

    The wildcard (*) at the end ensures all other original fields in the events are preserved.

    For example, if an event has client.address=10.0.1.100 and url.path=download/file.exe, the resulting url.original field would contain http://10.0.1.100/download/file.exe. This standardization allows for proper correlation with URLs found in email security events.

  4. logscale
    | correlate(
        emailInbound:{#event.module="email-security"},

    Defines the first query named emailInbound to match email security events. Filters for events with #event.module=email-security which captures all inbound emails containing URLs. All fields from matching events are preserved since no field restrictions are specified.

    This query forms the base pattern for correlation, and its fields (particularly url.original) will be referenced by the second query using the prefix notation (emailInbound.url.original).

    These events represent the initial detection of URLs in incoming emails, which is crucial for tracking potential phishing attempts or malicious links.

  5. logscale
    emailAccess:{#event.module="ids" 
        | url.original <=> emailInbound.url.original},

    Defines the second query named emailAccess to match URL access attempts. Filters for events with #event.module=ids which captures network traffic events.

    The correlation relationship (condition) is specified using the <=> operator which requires exact matches between fields (field correlation matches).

    The url.original field from this emailAccess event must match the url.original field from the emailInbound event. This ensures that events will only be correlated when they show access to exactly the same URL that was received in an email, helping identify when recipients click on email links.

  6. logscale
    sequence=true,within=1h)

    Sets the correlation parameters:

    • sequence=true ensures that emailInbound events must occur before emailAccess events, preventing matching of access events that occurred before the email was received.Email receipt must occur before URL access

    • within=1h specifies a one-hour maximum time window between email receipt and URL access, focusing on immediate user interactions with email links. Access attempts more than an hour after email receipt are excluded.

  7. Event Result set.

Summary and Results

The query is used to identify when users click on URLs received in emails by correlating email security events with network traffic events. The correlate() matches URLs received in emails with subsequent access attempts detected in network traffic, helping identify potential phishing or malicious link interactions.

This query is useful, for example, to detect potential security incidents where users interact with phishing emails, track the effectiveness of security awareness training, or monitor for suspicious URL access patterns.

Sample output from the incoming example data:

@timestampemailInbound.email.from.addressemailInbound.email.to.addressemailInbound.url.originalemailAccess.@timestampemailAccess.client.addressTimeBetweenEvents
2023-06-15T10:00:00Zsender@external.comuser1@company.comhttp://malicious.com/path12023-06-15T10:15:30Z10.0.1.100930
2023-06-15T11:00:00Zphish@bad.comuser2@company.comhttp://suspicious.net/offer2023-06-15T11:05:45Z10.0.2.200345
2023-06-15T12:30:00Zpartner@vendor.comuser3@company.comhttp://legitimate.org/docs2023-06-15T12:45:15Z10.0.3.150915

Correlate Two Scheduled Task Events

Correlate two scheduled task events (registration and deletion) within a 5-minute window using the correlate function

Query
logscale
correlate(
    ScheduledTaskRegistered: {
        #repo="base_sensor" #event_simpleName=ScheduledTaskRegistered RemoteIP=* 
    | upper(field=TaskName, as=scheduledTaskName)
    } include: [*],
  ScheduledTaskDeleted: {
          #repo="base_sensor" #event_simpleName=ScheduledTaskDeleted RemoteIP=* 
          | upper(field=TaskName, as=scheduledTaskName)
          | aid <=> ScheduledTaskRegistered.aid
          | scheduledTaskName <=> ScheduledTaskRegistered.scheduledTaskName
          } include: [*],
sequence=false, within=5m)
Introduction

In this example, the correlate() function is used to identify when scheduled tasks are registered and subsequently deleted by matching events based on the task identifier (aid field) and name (TaskName field) within a 5-minute window.

The correlate() function finds relationships between the events within the ScheduledTaskRegistered and ScheduledTaskDeleted queries. sequence=false means that events can occur in any order within the time window

Example incoming data might look like this:

@timestamp#repo#event_simpleNameaidTaskNameRemoteIP
2023-06-15T10:00:00Zbase_sensorScheduledTaskRegisteredaid123backup_task192.168.1.100
2023-06-15T10:02:00Zbase_sensorScheduledTaskDeletedaid123backup_task192.168.1.100
2023-06-15T10:05:00Zbase_sensorScheduledTaskRegisteredaid456cleanup_task192.168.1.101
2023-06-15T10:07:00Zbase_sensorScheduledTaskDeletedaid456cleanup_task192.168.1.101
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    correlate(
        ScheduledTaskRegistered: {
            #repo="base_sensor" #event_simpleName=ScheduledTaskRegistered RemoteIP=* 
        | upper(field=TaskName, as=scheduledTaskName)
        } include: [*],

    Defines the first query named ScheduledTaskRegistered to match scheduled task registrations. Filters for events from #repo=base_sensor with #event_simpleName=ScheduledTaskRegistered and any RemoteIP. RemoteIP=* ensures that the field exists.

    The upper() function converts the TaskName field to uppercase and returns the converted results in a new field named scheduledTaskName.

    include: [*] ensures that the query includes all fields from matching events.

  3. logscale
    ScheduledTaskDeleted: {
              #repo="base_sensor" #event_simpleName=ScheduledTaskDeleted RemoteIP=* 
              | upper(field=TaskName, as=scheduledTaskName)
              | aid <=> ScheduledTaskRegistered.aid
              | scheduledTaskName <=> ScheduledTaskRegistered.scheduledTaskName
              } include: [*],

    Defines the second query named ScheduledTaskDeleted to match scheduled task deletions. Filters for events from #repo=base_sensor with #event_simpleName=ScheduledTaskDeleted and any RemoteIP. RemoteIP=* ensures that the field exists.

    The correlation relationships (conditions) are specified using the <=> operator which requires exact matches between fields (field correlation matches).

    The aid field from this ScheduledTaskDeleted event must match the aid field from the ScheduledTaskRegistered event, and similarly for the scheduledTaskName field. The corresponding fields will be used to join the events together across all the queries in the set. This ensures that events will only by dcorrelated when related to the same task instance.

    include: [*] ensures that the query includes all fields from matching events.

  4. logscale
    sequence=false, within=5m)

    Sets correlation parameters:

    • sequence=false allows events to match regardless of order.

      Setting the sequence parameter to false in this example is useful as deletion events could theoretically be recorded before registration events due to system delays.

    • within=5m specifies a 5-minute time window for matching events meaning that only events within 5 minutes of each other will be correlated.

    By default, the query will match the event correlation only once per root query (return the first match), as the maxPerRoot parameter is not specified.

    The correlate() function outputs each pair of matched events as a single event containing fields from both sources, prefixed with their respective subquery names (for example, ScheduledTaskRegistered.aid, ScheduledTaskDeleted.@timestamp).

  5. Event Result set.

Summary and Results

The query is used to identify and correlate scheduled task registration and deletion events for the same task within a 5-minute window.

The output demonstrates successful correlation of scheduled task registration and deletion events, showing the complete lifecycle of tasks that were both created and deleted within the specified timeframe.

This query is useful, for example, to monitor task lifecycle patterns, detect unusual task deletion behavior, or audit scheduled task management activities.

Sample output from the incoming example data:

@timestampScheduledTaskRegistered.aidScheduledTaskRegistered.scheduledTaskNameScheduledTaskRegistered.RemoteIPScheduledTaskDeleted.@timestampScheduledTaskDeleted.RemoteIP
2023-06-15T10:00:00Zaid123BACKUP_TASK192.168.1.1002023-06-15T10:02:00Z192.168.1.100
2023-06-15T10:05:00Zaid456CLEANUP_TASK192.168.1.1012023-06-15T10:07:00Z192.168.1.101

Note that the output includes timestamps from each query. In this case, this allows for analysis of the time between task creation and removal.