FAQ: How do I do a join() statement?

The join() function is generally used when you have two query results that you'd like to combine, and both results share a common value. There is also a selfJoin() and selfJoinFilter() function for certain situations, both described in the official documentation. Please note the field names can be different: it's the value of those fields that should be identical.

Before we provide the example, some background:

  • We want to combine ProcessRollup2 and NetworkListenIP4 to find processes that have created a listener on a port.

  • ProcessRollup2 contains the process information. The field we're using in the results is TargetProcessId to designate the ID of the process.

  • NetworkListenIP4 contains the listener information. Instead of TargetProcessId, it uses ContextProcessId. This also designates the ID of the process.

  • Both TargetProcessId and ContextProcessId share a common value, e.g. 123456 would represent the same process.

And the join() to combine them:

#event_simpleName=ProcessRollup2
| join({#event_simpleName=NetworkListenIP4 LocalPort>1024 LocalPort!=0}, field=TargetProcessId, key=ContextProcessId, include=[LocalAddressIP4, LocalPort])

But what does that query actually mean?

  • The first query should be the larger of the two. In this case, #event_simpleName=ProcessRollup2 is the larger query.

  • The query within join() should be the smaller of the two queries, and is known as the subquery. It's the query within the { ... } after the join() function. The query is simple enough: #event_simpleName=NetworkListenIP4 LocalPort>1024 LocalPort!=0.

  • The field= and key= are the two fields that share a common value.

  • field= is tied to the larger query, i.e. #event_simpleName=ProcessRollup2. This is the field TargetProcessId in ProcessRollup2 events.

  • key= is tied to the smaller query, i.e. #event_simpleName=NetworkListenIP4 LocalPort>1024 LocalPort!=0. This is the field ContextProcessId in NetworkListenIP4 events.

  • The include=[ ... ] is a list of fields that should be pulled from the smaller query into the results.

And a visualization of everything that was just described:

Another example:

  • We want to find the UserName from a UserSid in a ProcessRollup2 event.

  • The ProcessRollup2 events only have a UserSid.

  • The UserIdentity events have a UserName and a UserSid.

  • The common field between them is UserSid.

Easy! We just join them together:

// Get the PR2 events.
#event_simpleName=ProcessRollup2
// Only grab the ones with a UserCid.
| UserSid=*
// Join the two together based on the UserSid.
| join({#event_simpleName=UserIdentity}, field=UserSid, include=[UserName], limit=100000)