Sample Queries

The Corelight data provides an ideal data set for learning how to query Humio event data, and for extracting information from Corelight events to identify network activity and support threat hunting.

The following sections provide some guidance on how to search and extract information from the sample data set.


Because the Corelight sample data is the same for all users, the example output shown for a given query should match what you see when you run it, provided the same time range is selected. All the examples shown were executed over the entire data set.

Identifying Event Data

To start identifying the individual events and the information that can be extracted from the sample data, it is useful to understand the basic structure of the event data. One way to achieve this is to filter and summarise the information by event type, protocol and session.

Top Event Types

First, identify the different event types by getting the top events. This basic search can be a good way to spot outliers in the overall event stream, for example odd protocols, or protocols used less or more than you might expect.

The #path tag identifies the top level type:

top(#path, limit=100)

We increase the limit on the output so that we get the full set of different event types.

#path _count
conn 626182
dns 309720
files 99111
http 65871
ssl 61076
ntp 25809
dhcp 25648
notice 20378
smartpcap-stats 19708
x509 15073
intel 11615
corelight_overall_capture_loss 9855
suricata_corelight 9245
weird 4971
dce_rpc 2012
specific_dns_tunnels 1712
smartpcap 1007
etc_viz 811
rdp 679
ssh 410
smb_mapping 379
kerberos 367
smtp 286
smtp_links 268
ntlm 208
smb_files 184
reporter 177
software 163
dpd 113
pe 103
snmp 95
ftp 39
meterpreter 10
meterpreter_headers 10
radius 9
stepping 7
dga 6
generic_icmp_tunnels 2
tunnel 2
log4j 1

The output includes some event types, such as meterpreter, log4j and specific_dns_tunnels, that may warrant further investigation.
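
As a minimal sketch of how one of these types could be examined further, the query below filters on a single value of the #path tag. The choice of meterpreter is only an illustration taken from the table above; any other value from the #path column can be substituted.

```
// Return only the events of one rarely seen event type
#path = meterpreter
```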

Top Protocols

The sample data uses the service field to track the specific protocols used in the events. Some attacks abuse or overload a given protocol specification in order to initiate an attack, while others use invalid protocols to trigger a memory failure.

Let's see what output we get from this query:

top(service, limit=100)

The output generates a list of protocols:

service _count
dns 184074
ssl 60934
http 48452
dhcp 1033
tls 535
ssh 354
smb 294
krb_tcp 278
ntp 172
dce_rpc 156
IPC 132
krbtgt/ACMECORP.COM 97
failed 90
smtp 63
gssapi,smb,ntlm 53
gssapi,ntlm,smb 44
gssapi,smb,krb 40
ssl,smtp 36
ssl,xmpp 34
gssapi,smb 33
smtp,ssl 31
ftp 25
LDAP/ 25
cifs/ 22
smb,gssapi,ntlm 21
ldap/ 20
xmpp,ssl 20
gssapi 19
dce_rpc,ntlm 17
smb,ntlm,gssapi 15
krbtgt/ACMECORP 14
smb,krb,gssapi 14
smb,gssapi 13
rdp 11
cifs/ 11
ldap/ 10
ntlm,gssapi,smb 10
krbtgt/ 10
host/ 10
ldap/ 10
radius 9
A: 8
krb,smb,gssapi 8
ntlm,smb,gssapi 7
gssapi,krb,smb 7
ntlm,dce_rpc 7
rdpeudp 6
krbtgt/windomain.local 6
gssapi,smb,dce_rpc,krb 6
gssapi,ntlm,dce_rpc,smb 5
dce_rpc,ntlm,gssapi,smb 4
smb,gssapi,krb 4
TERMSRV/bas-ad-01.lab.local 4
ftp-data 3
http,smtp,ssl 3
gssapi,dce_rpc,ntlm,smb 3
ldap/ 3
ssl,smtp,http 3
krb,gssapi,smb 3
dce_rpc,gssapi,ntlm,smb 3
ntlm,smb,gssapi,dce_rpc 2
dce_rpc,krb,gssapi,smb 2
smb,ntlm,gssapi,dce_rpc 2
smb,gssapi,ntlm,dce_rpc 2
gssapi,smb,dce_rpc,ntlm 2
ntlm,gssapi,smb,dce_rpc 2
krb,smb,gssapi,dce_rpc 2
gssapi,smb,ntlm,dce_rpc 2
smb,dce_rpc,ntlm,gssapi 2
spicy_ipsec_ike_udp 1
dce_rpc,ntlm,smb,gssapi 1
smb,gssapi,dce_rpc,ntlm 1
dce_rpc,smb,krb,gssapi 1
dce_rpc,smb,gssapi,ntlm 1
smb,dce_rpc,krb,gssapi 1
gssapi,dce_rpc,smb,ntlm 1
gssapi,dce_rpc,krb,smb 1

The output contains perfectly valid protocols, including http and smb, but also some values that do not look like valid protocols. For example, there is no protocol named A: or TERMSRV/bas-ad-01.lab.local.
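
One way to find out where such a value comes from is to filter on it and group the matching events by event type. This is a sketch only: it assumes the value is stored exactly as shown in the table, and the value is quoted so that the colon is treated as part of the string.

```
// Which event types carry the unexpected service value?
service = "A:"
| groupBy(#path)
```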

High or Low Session Counts

The unique ID for each session identified by Corelight (see Session Identifier (uid)) can also be an indicator of unusual network traffic. A high number of events within a unique session ID may be suspicious. The opposite is also true: a low number of events for a given session may indicate an attack attempt that is merely probing for potential attack vectors.

Let's look at the sessions with the most events, grouping by the unique session ID and sorting by the event count in descending order:

groupby(uid)| sort(order=desc,limit=10)

This generates the following summary output:

uid _count
CTYqn61mJNPJsIVG96 15655
CDZxIM1utVOc5M1GSk 1457
CxbxKB3bGrLfxvYe4c 1229
CiRMgRsjQR7ksp7Me 688
CTyrEe2rZtqGUNnnj5 682
CLd2aI1qvBiKZ1vlTb 561
CsgULS3wsvCcooOv8 479
CcY0Gu2zJP3r7iWmR 452
CBBHS11aPS4RsdHkRe 451

Let's take a closer look at the last UID in that list, CBBHS11aPS4RsdHkRe:

uid=CBBHS11aPS4RsdHkRe| groupby([#path,service])

#path service _count
conn ntp 1
ntp <no value> 451
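
The sessions with the fewest events can be listed by reversing the sort order. This is a sketch under one assumption worth checking: groupBy() limits the number of groups it returns by default, so on a data set with many distinct uid values the result may not cover every session.

```
// Ten sessions with the lowest event counts
groupBy(uid)
| sort(_count, order=asc, limit=10)
```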

High Data Transfers

High transfer rates, or large amounts of data transferred by protocols or services whose exchanges are normally small and discrete, can be worth investigating.

DHCP exchanges, for example, should not normally carry excessive payloads of data, as the information returned is limited. High data returns from a DHCP exchange might indicate a fake or spoofed DHCP server masquerading on your network.

The size of the response payload is contained within the resp_bytes field. You can see how the response sizes are distributed for DHCP by running a query that summarises the most common values:

service = dhcp | top(resp_bytes)

The query returns the following data set:

resp_bytes _count
0 947
1096 11
1644 5
300 5
548 5
2192 4
900 4
2740 3
600 3
7672 2

We can see here that the majority of DHCP requests have an empty response, but there are some that return much larger payloads and that warrant investigation.
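
To focus on the exchanges that actually returned data, one option is to exclude the empty responses and list the largest ones first. This is a sketch rather than a definitive detection; it assumes resp_bytes can be compared and sorted as a number.

```
// DHCP events with a non-empty response, largest first
service = dhcp
| resp_bytes > 0
| sort(resp_bytes, order=desc, limit=10)
```

The uid of any event returned here can then be used in a uid filter, in the same way as in the session example earlier.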