Using the selfJoin()
Function
The selfJoin() function enables efficient collation of information within a repository by executing two queries and joining them on a common field, similar to a traditional join operation but specifically designed for shared command fields. Through practical examples using connection data with source and destination addresses, the documentation demonstrates how selfJoin() can effectively summarize information across connection IDs, offering better performance than groupBy() for larger datasets.
The selfJoin()
function does not perform a join in
the traditional sense, but is designed to collate information from a
repository within a query where they share a command field.
selfJoin()
executes two queries, the primary query
and subquery, and joins on a common field (specified by the
field
parameter) between
the two queries in a similar fashion to join()
.
For example, given input data the includes the source and destination address for connections and a common connection ID:
from=192.168.1.15 to=10.0.0.22 connid=b36ecce120f181c6e79560c3160dd950 |
from=192.168.1.11 to=10.0.0.0 connid=532d011f0a5bf80e9b346e9e0dccbd0b |
from=192.168.1.6 to=10.0.0.9 connid=d8b674841c6fc75f24572ccd6c9e92ea |
from=192.168.1.20 to=10.0.0.11 connid=7fb64aeef6f08fd2f2724c07540adb02 |
from=192.168.1.14 to=10.0.0.7 connid=d8b674841c6fc75f24572ccd6c9e92ea |
from=192.168.1.230 to=10.0.0.203 connid=2057644393c7ff4636d6911bfb674729 |
from=192.168.1.3 to=10.0.0.17 connid=1644bdaa1cad36b146955e09e0ac0319 |
from=192.168.1.11 to=10.0.0.13 connid=cd244fc908fb252f9a8ee5e1eb4f4d5c |
from=192.168.1.22 to=10.0.0.5 connid=7b5ff133f72d6d5772f72d5db9fc5978 |
from=192.168.1.21 to=10.0.0.16 connid=fa59f655637787487486ab6a39c6c76e |
from=192.168.1.15 to=10.0.0.0 connid=4f69ae37daf6f0f3c12cee4f0349201a |
from=192.168.1.24 to=10.0.0.22 connid=b4454f0cca4d4d57f808bdc97db0cb4f |
from=192.168.1.10 to=10.0.0.22 connid=904f87cd6281192c8ce4baacee1bf0c2 |
from=192.168.1.12 to=10.0.0.20 connid=305bb06e3a1c58a1806136b1de4a659b |
The following query will summarize the information for source and destination across the connection ID:
selfJoin(field=connid,where=[{from="192.168.1.7"},{to="10.0.0.12"}])
This would output the following result:
connid |
---|
262565314144352e22375b816abbcd3c |
55d29bf268390a96fc6891f8f5f3fb35 |
7b5ff133f72d6d5772f72d5db9fc5978 |
9ea512baa6655f0e27f247650f36aca5 |
bbad6ec69d8cc416aaefc229452586a3 |
cd244fc908fb252f9a8ee5e1eb4f4d5c |
This process can be more efficient than using a
groupBy()
for larger datasets.