Sample Queries
The Corelight data provides an ideal data set for learning how to query Humio event data, and for extracting information from Corelight events to identify network and threat hunting data.
The following sections provide some guidance on how to search and extract information from the sample data set.
Note
Because the Corelight sample data is the same for all users, the example output shown for a given query will match your own results, provided the same time range is selected. All the examples shown were executed over the entire data set.
Identifying Event Data
To start identifying the individual events, and what information can be extracted from the sample data, it is useful to understand the basic structure of the event data. One way to achieve this is to filter and summarise the information by event type, protocol and session.
Top Event Types
First, start by identifying the different event types by getting the top events. This basic search can be a good way to identify outliers in the overall event stream, for example, odd protocols, or protocols used less or more than you might expect.
The #path tag identifies the top-level type:
top(#path, limit=100)
We increase the limit on the output so that we get the full set of different event types.
#path | _count |
---|---|
conn | 626182 |
dns | 309720 |
files | 99111 |
http | 65871 |
ssl | 61076 |
ntp | 25809 |
dhcp | 25648 |
notice | 20378 |
smartpcap-stats | 19708 |
x509 | 15073 |
intel | 11615 |
corelight_overall_capture_loss | 9855 |
suricata_corelight | 9245 |
weird | 4971 |
dce_rpc | 2012 |
specific_dns_tunnels | 1712 |
smartpcap | 1007 |
etc_viz | 811 |
rdp | 679 |
ssh | 410 |
smb_mapping | 379 |
kerberos | 367 |
smtp | 286 |
smtp_links | 268 |
ntlm | 208 |
smb_files | 184 |
reporter | 177 |
software | 163 |
dpd | 113 |
pe | 103 |
snmp | 95 |
ftp | 39 |
meterpreter | 10 |
meterpreter_headers | 10 |
radius | 9 |
stepping | 7 |
dga | 6 |
generic_icmp_tunnels | 2 |
tunnel | 2 |
log4j | 1 |
The output highlights some specific protocols that may warrant further investigation.
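One way to narrow the list down to the rare event types is to filter the aggregated result on its count. The following is a minimal sketch; the threshold of 20 is an arbitrary value chosen for illustration and is not part of the original examples:

```
top(#path, limit=100)
| _count < 20
```

Against the counts shown above, this should keep only the entries from meterpreter downwards.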
Top Protocols
The sample data uses the service field to track the specific protocols used in the events. Some attacks will use or overload a given protocol specification in order to initiate an attack, or use invalid protocols to trigger a memory failure.
Let's see what output we get from this query:
top(service, limit=100)
The output generates a list of protocols:
service | _count |
---|---|
dns | 184074 |
ssl | 60934 |
http | 48452 |
dhcp | 1033 |
tls | 535 |
ssh | 354 |
smb | 294 |
krb_tcp | 278 |
ntp | 172 |
dce_rpc | 156 |
IPC | 132 |
krbtgt/ACMECORP.COM | 97 |
failed | 90 |
smtp | 63 |
gssapi,smb,ntlm | 53 |
gssapi,ntlm,smb | 44 |
gssapi,smb,krb | 40 |
ssl,smtp | 36 |
ssl,xmpp | 34 |
gssapi,smb | 33 |
smtp,ssl | 31 |
ftp | 25 |
LDAP/DC1.ACMECORP.com/ACMECORP.com | 25 |
cifs/dc1.acmecorp.com | 22 |
smb,gssapi,ntlm | 21 |
ldap/DC1.ACMECORP.com | 20 |
<finance$@ACMECORP.COM> | 20 |
xmpp,ssl | 20 |
gssapi | 19 |
dce_rpc,ntlm | 17 |
smb,ntlm,gssapi | 15 |
krbtgt/ACMECORP | 14 |
smb,krb,gssapi | 14 |
smb,gssapi | 13 |
rdp | 11 |
cifs/DC1.ACMECORP.com | 11 |
ldap/DC1.ACMECORP.com/ACMECORP.com | 10 |
ntlm,gssapi,smb | 10 |
FINANCE$ | 10 |
krbtgt/ACMECORP.com | 10 |
host/finance.acmecorp.com | 10 |
ldap/dc1.acmecorp.com | 10 |
radius | 9 |
A: | 8 |
krb,smb,gssapi | 8 |
ntlm,smb,gssapi | 7 |
gssapi,krb,smb | 7 |
ntlm,dce_rpc | 7 |
rdpeudp | 6 |
krbtgt/windomain.local | 6 |
krbtgt/PODTRONICS.ORG | 6 |
gssapi,smb,dce_rpc,krb | 6 |
gssapi,ntlm,dce_rpc,smb | 5 |
dce_rpc,ntlm,gssapi,smb | 4 |
smb,gssapi,krb | 4 |
TERMSRV/bas-ad-01.lab.local | 4 |
ftp-data | 3 |
http,smtp,ssl | 3 |
gssapi,dce_rpc,ntlm,smb | 3 |
ldap/podtronics-dc.podtronics.org | 3 |
ssl,smtp,http | 3 |
krb,gssapi,smb | 3 |
dce_rpc,gssapi,ntlm,smb | 3 |
ntlm,smb,gssapi,dce_rpc | 2 |
dce_rpc,krb,gssapi,smb | 2 |
smb,ntlm,gssapi,dce_rpc | 2 |
smb,gssapi,ntlm,dce_rpc | 2 |
gssapi,smb,dce_rpc,ntlm | 2 |
ntlm,gssapi,smb,dce_rpc | 2 |
krb,smb,gssapi,dce_rpc | 2 |
gssapi,smb,ntlm,dce_rpc | 2 |
smb,dce_rpc,ntlm,gssapi | 2 |
spicy_ipsec_ike_udp | 1 |
dce_rpc,ntlm,smb,gssapi | 1 |
smb,gssapi,dce_rpc,ntlm | 1 |
dce_rpc,smb,krb,gssapi | 1 |
dce_rpc,smb,gssapi,ntlm | 1 |
smb,dce_rpc,krb,gssapi | 1 |
gssapi,dce_rpc,smb,ntlm | 1 |
gssapi,dce_rpc,krb,smb | 1 |
The output contains perfectly valid protocols, including http and smb. But there are also some values that do not look like valid protocols. For example, there is no protocol A:, or TERMSRV/bas-ad-01.lab.local.
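To see where one of these unexpected values comes from, a reasonable next step is to filter on the value and group by event type. This is a sketch rather than part of the original walkthrough; the value is quoted because it contains characters other than letters and digits:

```
service = "TERMSRV/bas-ad-01.lab.local"
| groupBy([#path, service])
```

The same pattern applies to the A: value by substituting it in the filter.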
High or Low Session Counts
The unique ID for each session identified by Corelight (see Session Identifier (uid)) can also be an indicator of unusual network traffic. A high number of events within a unique session ID may be suspicious. The opposite is also true: a low number of events for a given session may indicate an attack attempt that is merely probing for potential attack vectors.
Let's look at the top and bottom ten sessions by the number of events for each unique session ID, starting with the top:
groupby(uid) | sort(order=desc, limit=10)
This generates the following summary output:
uid | _count |
---|---|
CTYqn61mJNPJsIVG96 | 15655 |
CDZxIM1utVOc5M1GSk | 1457 |
CxbxKB3bGrLfxvYe4c | 1229 |
CiRMgRsjQR7ksp7Me | 688 |
CTyrEe2rZtqGUNnnj5 | 682 |
CLd2aI1qvBiKZ1vlTb | 561 |
CIKUnZ1EPs7PAW2ZIi | 499 |
CsgULS3wsvCcooOv8 | 479 |
CcY0Gu2zJP3r7iWmR | 452 |
CBBHS11aPS4RsdHkRe | 451 |
Let's take a closer look at that last UID:
uid=CBBHS11aPS4RsdHkRe | groupby([#path, service])
#path | service | _count |
---|---|---|
conn | ntp | 1 |
ntp | <no value> | 451 |
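The sessions with the fewest events can be listed by reversing the sort order. A minimal sketch, assuming the default _count field produced by the grouping is used as the sort key:

```
groupBy(uid)
| sort(_count, order=asc, limit=10)
```

Many sessions are likely to share the lowest count, so the ten rows returned are only a sample of the low end, not a complete list.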
High Data Transfers
High transfer rates, or high amounts of data transferred for protocols or services that are normally small and discrete, can be worth investigating.
DHCP requests, for example, should not normally involve large payloads of data, as the information returned is small. High data returns from a DHCP exchange might indicate a fake or spoofed DHCP server masquerading on your network.
The response payload size for requests is contained within the resp_bytes field. You can summarise the response sizes for DHCP traffic by running the following query:
service = dhcp | top(resp_bytes)
The query returns the following data set:
resp_bytes | _count |
---|---|
0 | 947 |
1096 | 11 |
1644 | 5 |
300 | 5 |
548 | 5 |
2192 | 4 |
900 | 4 |
2740 | 3 |
600 | 3 |
7672 | 2 |
We can see here that the majority of DHCP requests have an empty response, but there are some that return much larger payloads that require investigating.
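To exclude the empty responses and look only at exchanges that returned data, a numeric filter on resp_bytes can be added before the aggregation. This is a sketch under the assumption that resp_bytes is parsed as a number in the sample data:

```
service = dhcp
| resp_bytes > 0
| top(resp_bytes)
```

Raising the threshold above zero narrows the result further to the largest responses.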