Sample Queries
The Corelight data provides an ideal data set for learning how to query LogScale event data, and for extracting information from Corelight events to support network analysis and threat hunting.
The following sections provide some guidance on how to search and extract information from the sample data set.
Note
Because the Corelight sample data is the same for all users, the example output shown for a given query will match when executed, provided that the same time range is selected. All the examples shown were executed over the entire data set.
Identifying Event Data
To start processing and identifying the individual events and what information can be extracted from the sample data, it is useful to understand the basic structure of the event data. One way to achieve this is to filter and summarise the information by the event types, protocols and session information.
Top Event Types
Start by identifying the different event types and how often each occurs. This basic search can be a good way to identify outliers in the overall event stream, for example odd protocols, or protocols used less or more than you might expect.
The #path tag identifies the top-level event type:
top(#path, limit=100)
We increase the limit on the output so that we get the full set of different event types.
#path | _count |
---|---|
conn | 626182 |
dns | 309720 |
files | 99111 |
http | 65871 |
ssl | 61076 |
ntp | 25809 |
dhcp | 25648 |
notice | 20378 |
smartpcap-stats | 19708 |
x509 | 15073 |
intel | 11615 |
corelight_overall_capture_loss | 9855 |
suricata_corelight | 9245 |
weird | 4971 |
dce_rpc | 2012 |
specific_dns_tunnels | 1712 |
smartpcap | 1007 |
etc_viz | 811 |
rdp | 679 |
ssh | 410 |
smb_mapping | 379 |
kerberos | 367 |
smtp | 286 |
smtp_links | 268 |
ntlm | 208 |
smb_files | 184 |
reporter | 177 |
software | 163 |
dpd | 113 |
pe | 103 |
snmp | 95 |
ftp | 39 |
meterpreter | 10 |
meterpreter_headers | 10 |
radius | 9 |
stepping | 7 |
dga | 6 |
generic_icmp_tunnels | 2 |
tunnel | 2 |
log4j | 1 |
The output includes some specific event types that may warrant further investigation, such as meterpreter, dga and log4j.
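To surface the rarest event types directly, rather than reading them from the end of the list, the same count can be sorted in ascending order. This is a minimal sketch using groupBy() and sort(); the limit of ten is an arbitrary choice for illustration.

```
groupBy(#path)
| sort(_count, order=asc, limit=10)
```

Against the counts shown above, this would return the types at the bottom of the table, such as log4j, tunnel, generic_icmp_tunnels and dga.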
Top Protocols
The sample data uses the service field to track specific protocols used in the events. Some attacks will use or overload a given protocol specification in order to initiate an attack, or they use invalid protocols to trigger a memory failure.
Let's see what output we get from this query:
top(service, limit=100)
The output generates a list of protocols:
service | _count |
---|---|
dns | 184074 |
ssl | 60934 |
http | 48452 |
dhcp | 1033 |
tls | 535 |
ssh | 354 |
smb | 294 |
krb_tcp | 278 |
ntp | 172 |
dce_rpc | 156 |
IPC | 132 |
krbtgt/ACMECORP.COM | 97 |
failed | 90 |
smtp | 63 |
gssapi,smb,ntlm | 53 |
gssapi,ntlm,smb | 44 |
gssapi,smb,krb | 40 |
ssl,smtp | 36 |
ssl,xmpp | 34 |
gssapi,smb | 33 |
smtp,ssl | 31 |
ftp | 25 |
LDAP/DC1.ACMECORP.com/ACMECORP.com | 25 |
cifs/dc1.acmecorp.com | 22 |
smb,gssapi,ntlm | 21 |
ldap/DC1.ACMECORP.com | 20 |
finance$@ACMECORP.COM | 20 |
xmpp,ssl | 20 |
gssapi | 19 |
dce_rpc,ntlm | 17 |
smb,ntlm,gssapi | 15 |
krbtgt/ACMECORP | 14 |
smb,krb,gssapi | 14 |
smb,gssapi | 13 |
rdp | 11 |
cifs/DC1.ACMECORP.com | 11 |
ldap/DC1.ACMECORP.com/ACMECORP.com | 10 |
ntlm,gssapi,smb | 10 |
FINANCE$ | 10 |
krbtgt/ACMECORP.com | 10 |
host/finance.acmecorp.com | 10 |
ldap/dc1.acmecorp.com | 10 |
radius | 9 |
A: | 8 |
krb,smb,gssapi | 8 |
ntlm,smb,gssapi | 7 |
gssapi,krb,smb | 7 |
ntlm,dce_rpc | 7 |
rdpeudp | 6 |
krbtgt/windomain.local | 6 |
krbtgt/PODTRONICS.ORG | 6 |
gssapi,smb,dce_rpc,krb | 6 |
gssapi,ntlm,dce_rpc,smb | 5 |
dce_rpc,ntlm,gssapi,smb | 4 |
smb,gssapi,krb | 4 |
TERMSRV/bas-ad-01.lab.local | 4 |
ftp-data | 3 |
http,smtp,ssl | 3 |
gssapi,dce_rpc,ntlm,smb | 3 |
ldap/podtronics-dc.podtronics.org | 3 |
ssl,smtp,http | 3 |
krb,gssapi,smb | 3 |
dce_rpc,gssapi,ntlm,smb | 3 |
ntlm,smb,gssapi,dce_rpc | 2 |
dce_rpc,krb,gssapi,smb | 2 |
smb,ntlm,gssapi,dce_rpc | 2 |
smb,gssapi,ntlm,dce_rpc | 2 |
gssapi,smb,dce_rpc,ntlm | 2 |
ntlm,gssapi,smb,dce_rpc | 2 |
krb,smb,gssapi,dce_rpc | 2 |
gssapi,smb,ntlm,dce_rpc | 2 |
smb,dce_rpc,ntlm,gssapi | 2 |
spicy_ipsec_ike_udp | 1 |
dce_rpc,ntlm,smb,gssapi | 1 |
smb,gssapi,dce_rpc,ntlm | 1 |
dce_rpc,smb,krb,gssapi | 1 |
dce_rpc,smb,gssapi,ntlm | 1 |
smb,dce_rpc,krb,gssapi | 1 |
gssapi,dce_rpc,smb,ntlm | 1 |
gssapi,dce_rpc,krb,smb | 1 |
The output contains perfectly valid protocols, including http and smb. But there are also some values that do not look like valid protocols. For example, there is no protocol named A: or TERMSRV/bas-ad-01.lab.local.
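One way to find out where the unexpected values come from is to break them down by event type. The following is a minimal sketch that selects values of service containing a forward slash and groups them by #path. The slash is only an assumed filter, chosen because many of the odd values listed above (for example krbtgt/ACMECORP.COM or ldap/DC1.ACMECORP.com) resemble Kerberos service principal names rather than protocol names.

```
regex("/", field=service)
| groupBy([#path, service])
```

If most of these values turn out to belong to a single event type, that would suggest the service field carries a different meaning in those events, rather than indicating an invalid protocol on the wire.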
High or Low Session Counts
The unique ID for each session identified by Corelight (see Session Identifier (uid)) can also be an indicator of unusual network traffic. A high number of events within a unique session ID may be suspicious. The opposite is also true: a low number of events for a given session may indicate an attack that is merely probing for potential attack vectors.
Let's look at the ten sessions with the highest number of events, grouped by the unique session ID:
groupby(uid)
| sort(order=desc,limit=10)
This generates the following summary output:
uid | _count |
---|---|
CTYqn61mJNPJsIVG96 | 15655 |
CDZxIM1utVOc5M1GSk | 1457 |
CxbxKB3bGrLfxvYe4c | 1229 |
CiRMgRsjQR7ksp7Me | 688 |
CTyrEe2rZtqGUNnnj5 | 682 |
CLd2aI1qvBiKZ1vlTb | 561 |
CIKUnZ1EPs7PAW2ZIi | 499 |
CsgULS3wsvCcooOv8 | 479 |
CcY0Gu2zJP3r7iWmR | 452 |
CBBHS11aPS4RsdHkRe | 451 |
Let's take a closer look at that last UID:
uid=CBBHS11aPS4RsdHkRe
| groupby([#path,service])
#path | service | _count |
---|---|---|
conn | ntp | 1 |
ntp | <no value> | 451 |
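To look at the opposite end of the range, the sessions with the fewest events, the sort order can be reversed. This is a sketch rather than a definitive query: groupBy() caps the number of groups it returns by default, so limit=max is added here on the assumption that the data set contains more unique sessions than the default allows.

```
groupBy(uid, limit=max)
| sort(_count, order=asc, limit=10)
```

Many sessions are likely to contain only a single event, so the ten rows returned may be an arbitrary sample of those sessions rather than a meaningful ranking.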
High Data Transfers
High transfer rates, or high amounts of data transferred, for protocols or services that are normally small and discrete can be worth investigating.
DHCP requests, for example, should not normally return excessive payloads of data, as the information returned is limited to address and configuration details. High data returns from a DHCP exchange might indicate a fake or spoofed DHCP server masquerading on your network.
The size of the response payload is contained within the resp_bytes field. You can examine the distribution of response sizes for DHCP by running the following query:
service = dhcp
| top(resp_bytes)
The query returns the following data set:
resp_bytes | _count |
---|---|
0 | 947 |
1096 | 11 |
1644 | 5 |
300 | 5 |
548 | 5 |
2192 | 4 |
900 | 4 |
2740 | 3 |
600 | 3 |
7672 | 2 |
We can see here that the majority of DHCP requests have an empty response, but there are some that return much larger payloads and may warrant investigation.
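To follow up on the larger responses, it can help to see which responders sent them. The sketch below keeps only DHCP events with a non-zero response and summarises them per responder. The field name id.resp_h is an assumption based on Zeek naming conventions for the responding host address, and max_resp_bytes is just an illustrative output name; check the field names in your own events before relying on it.

```
service = dhcp
| resp_bytes > 0
| groupBy(id.resp_h, function=[count(), max(resp_bytes, as=max_resp_bytes)])
| sort(max_resp_bytes, order=desc)
```

A responder that is not one of your known DHCP servers, or one with an unusually large maximum response, would be a reasonable place to start investigating.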