Using the selfJoin() Function

The selfJoin() function enables efficient collation of information within a repository by executing two queries and joining them on a common field, similar to a traditional join operation but specifically designed for shared command fields. Through practical examples using connection data with source and destination addresses, the documentation demonstrates how selfJoin() can effectively summarize information across connection IDs, offering better performance than groupBy() for larger datasets.

The selfJoin() function does not perform a join in the traditional sense, but is designed to collate information from a repository within a query where they share a command field.

selfJoin() executes two queries, the primary query and subquery, and joins on a common field (specified by the field parameter) between the two queries in a similar fashion to join().

For example, given input data the includes the source and destination address for connections and a common connection ID:

Raw Events
from=192.168.1.15 to=10.0.0.22 connid=b36ecce120f181c6e79560c3160dd950
from=192.168.1.11 to=10.0.0.0 connid=532d011f0a5bf80e9b346e9e0dccbd0b
from=192.168.1.6 to=10.0.0.9 connid=d8b674841c6fc75f24572ccd6c9e92ea
from=192.168.1.20 to=10.0.0.11 connid=7fb64aeef6f08fd2f2724c07540adb02
from=192.168.1.14 to=10.0.0.7 connid=d8b674841c6fc75f24572ccd6c9e92ea
from=192.168.1.230 to=10.0.0.203 connid=2057644393c7ff4636d6911bfb674729
from=192.168.1.3 to=10.0.0.17 connid=1644bdaa1cad36b146955e09e0ac0319
from=192.168.1.11 to=10.0.0.13 connid=cd244fc908fb252f9a8ee5e1eb4f4d5c
from=192.168.1.22 to=10.0.0.5 connid=7b5ff133f72d6d5772f72d5db9fc5978
from=192.168.1.21 to=10.0.0.16 connid=fa59f655637787487486ab6a39c6c76e
from=192.168.1.15 to=10.0.0.0 connid=4f69ae37daf6f0f3c12cee4f0349201a
from=192.168.1.24 to=10.0.0.22 connid=b4454f0cca4d4d57f808bdc97db0cb4f
from=192.168.1.10 to=10.0.0.22 connid=904f87cd6281192c8ce4baacee1bf0c2
from=192.168.1.12 to=10.0.0.20 connid=305bb06e3a1c58a1806136b1de4a659b

The following query will summarize the information for source and destination across the connection ID:

logscale
selfJoin(field=connid,where=[{from="192.168.1.7"},{to="10.0.0.12"}])

This would output the following result:

connid
262565314144352e22375b816abbcd3c
55d29bf268390a96fc6891f8f5f3fb35
7b5ff133f72d6d5772f72d5db9fc5978
9ea512baa6655f0e27f247650f36aca5
bbad6ec69d8cc416aaefc229452586a3
cd244fc908fb252f9a8ee5e1eb4f4d5c

This process can be more efficient than using a groupBy() for larger datasets.