Corelight Sample Repository Data

The Corelight Sample Data Repository is accessible within LogScale Community Edition and provides a sample dataset that you can use to learn and understand the types of events and data within LogScale:

  • The data set is based on real captured network traffic and provides a wide range of sample event types.

  • Using the data set will help you learn about events in LogScale and how to query and extract information to identify security and threat details.

  • You can use the Corelight packages to view the information using preset dashboards and queries, or follow the Sample Queries guide.

  • The sample data set consists of events captured by a Corelight device. Data is organised according to distinct network packet types and augmented by Corelight with additional information, including related session data, threat identification information, and the details of request and response packets.

For more information on the contents and structure of the data, see Data Format.

To try some sample queries, see Sample Queries.

Accessing the Sample Data

To access the Corelight Sample Data View:

  1. Click the blue Feature Box on your HCE homepage.


    Figure 330. Accessing Sample Corelight Data


  2. Switch to the Repository and Views page.

  3. Access the humio-organization-corelight-demo view.


Figure 331. Corelight Sample Data


You should now see the Corelight sample data view and its active data:


Figure 332. Corelight Sample Data Repository


Using Corelight Packages

Another way to work with the sample Corelight data set is to install the Corelight packages, which parse and display the information through a collection of preconfigured queries and dashboards.

Two packages are available that support processing the data:

  • corelight/sensor

    The corelight/sensor package includes the core parser and dashboard widgets for viewing the sensor data. The package includes the following dashboards:

    • Corelight Connectivity

    • Corelight DNS

    • Corelight Exec Overview

    • Corelight Files

    • Corelight HTTP

    • Corelight Intel

    • Corelight Log Hunter


    Figure 333. Corelight Sensor Sample Data Repository


    • Corelight Notice

    • Corelight SSH Inference

    • Corelight SSL

    • Corelight Software

    • Corelight Suricata


      Figure 334. Suricata



    • Corelight x509

  • corelight/threathuntingguide

    The package includes the following content:

    • Saved Searches

    • Alerts

Installing Corelight Packages

To install the Corelight packages:

  1. Go to the humio-organization-corelight-demo view.

  2. Click the Settings button.

  3. Go to the Marketplace under the Settings tab in your view.

  4. Choose the corelight/sensor package from the package summary.

  5. Click the Install package button.

Repeat the process with the corelight/threathuntingguide package.

To start querying the data, see Sample Queries.

Data Format

The sample data is derived from a Corelight installation dataset, parsed and presented within the Corelight repository. The data was extracted from a running Corelight capture service and includes a range of information, triggers, and threats from the captured traffic.

The repository contains 74 minutes of data, replayed on a loop so that the repository always holds active data. Although the data repeats, its format and structure make it an ideal resource for running queries and understanding their output.

Common Event Data

Events have been parsed and tagged from the raw JSON produced by Corelight. Each event contains the following core information:

  • Source and Destination IP address and Port

  • Top-level Protocol (UDP or TCP)

  • Service-level Protocol (dns, ssl, http)

  • Bytes sent and received

  • Hardware address for source and destination

  • Timestamps for the packet and when it was recorded

  • Duration of the connection

  • Protocol specific data

    Corelight recognises a number of distinct types of data for which additional fields and information are identified and included. The following are examples only:

    • For DNS, the requested query type (e.g. address, server, or MX (mail) record)

    • For HTTP, if identifiable, the source and content of the data

    • For DHCP, the lease, IP address, MAC address, and whether a specific name or ID is used

    • For file transfers, the file size, name, and content digest

  • Alerts or triggers identified by Corelight threat detection:

    • Alert level

    • Category, including a description
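As an illustration of protocol-specific data, the following Python sketch picks a few DNS-specific fields out of a raw dns record. The DNS field names (`query`, `qtype_name`, `rcode_name`, `answers`) appear in the sample DNS event shown later in this section; the files field names follow the Zeek files log. The `PROTOCOL_FIELDS` mapping and the `protocol_details` function are hypothetical helpers, and the selection of fields is illustrative only.

```python
import json

# Hypothetical selection of protocol-specific fields per raw "_path" value.
# Field names follow the Zeek/Corelight dns and files logs; the choice of
# fields is illustrative only.
PROTOCOL_FIELDS = {
    "dns": ["query", "qtype_name", "rcode_name", "answers"],
    "files": ["filename", "total_bytes", "md5", "sha1"],
}


def protocol_details(raw_line: str) -> dict:
    """Return the protocol-specific fields present in one raw JSON event."""
    event = json.loads(raw_line)
    wanted = PROTOCOL_FIELDS.get(event.get("_path"), [])
    # Individual records can omit fields, so keep only those actually present
    return {name: event[name] for name in wanted if name in event}


# Trimmed copy of the sample DNS event for uid CCsBUu3O2Z0QfCd6Y8
dns_line = (
    '{"_path":"dns","uid":"CCsBUu3O2Z0QfCd6Y8","query":"mail.aepl.com.pk",'
    '"qtype_name":"A","rcode_name":"NOERROR","answers":["162.250.121.176"]}'
)
print(protocol_details(dns_line))
```

Event types without an entry in the mapping simply return an empty dictionary.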

The raw data from Corelight is presented as JSON, for example:

json
{
   "_path" : "conn",
   "_system_name" : "SmartPCAP_192_168_5_1",
   "_write_ts" : "2022-02-18T16:07:48.737324Z",
   "community_id" : "1:WOlQiyEP/B3qO3ib+RwAYV06Av8=",
   "conn_state" : "S0",
   "corelight_shunted" : false,
   "duration" : 0.00754499435424805,
   "history" : "S",
   "id.orig_h" : "10.9.18.101",
   "id.orig_p" : 49218,
   "id.resp_h" : "66.96.147.100",
   "id.resp_p" : 587,
   "local_orig" : true,
   "local_resp" : false,
   "missed_bytes" : 0,
   "orig_bytes" : 0,
   "orig_ip_bytes" : 152,
   "orig_l2_addr" : "00:08:02:1c:47:ae",
   "orig_pkts" : 3,
   "proto" : "tcp",
   "resp_bytes" : 0,
   "resp_cc" : "US",
   "resp_ip_bytes" : 0,
   "resp_l2_addr" : "20:e5:2a:b6:93:f1",
   "resp_pkts" : 0,
   "spcap.rule" : 6,
   "spcap.trigger" : "all-unencrypted",
   "spcap.url" : "https://192.168.5.1/spcap/v1/?uid=CxZD5YDQVXFfwHSW7",
   "ts" : "2022-02-18T16:07:43.737232Z",
   "uid" : "CxZD5YDQVXFfwHSW7"
}
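To connect the raw record above to the core information listed earlier, the following Python sketch extracts the source and destination address and port, protocol, byte counts, hardware addresses, and duration from a conn record. The field names come from the sample JSON; the `ConnSummary` type and `summarize_conn` function are hypothetical helpers for illustration, not part of LogScale or the Corelight packages.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ConnSummary:
    """Hypothetical container for the core fields of one conn event."""
    path: str
    src: Tuple[str, int]  # originator (IP address, port)
    dst: Tuple[str, int]  # responder (IP address, port)
    proto: str
    service: Optional[str]
    orig_bytes: int
    resp_bytes: int
    src_mac: str
    dst_mac: str
    duration: float


def summarize_conn(raw: str) -> ConnSummary:
    e = json.loads(raw)
    return ConnSummary(
        path=e["_path"],
        src=(e["id.orig_h"], e["id.orig_p"]),
        dst=(e["id.resp_h"], e["id.resp_p"]),
        proto=e["proto"],
        # "service" may be absent from a record; the sample above has none
        service=e.get("service"),
        orig_bytes=e["orig_bytes"],
        resp_bytes=e["resp_bytes"],
        src_mac=e["orig_l2_addr"],
        dst_mac=e["resp_l2_addr"],
        duration=e["duration"],
    )


# Trimmed copy of the sample record above
RAW = (
    '{"_path":"conn","id.orig_h":"10.9.18.101","id.orig_p":49218,'
    '"id.resp_h":"66.96.147.100","id.resp_p":587,"proto":"tcp",'
    '"orig_bytes":0,"resp_bytes":0,"orig_l2_addr":"00:08:02:1c:47:ae",'
    '"resp_l2_addr":"20:e5:2a:b6:93:f1","duration":0.00754499435424805}'
)
summary = summarize_conn(RAW)
print(summary.src, "->", summary.dst, summary.proto)
# prints ('10.9.18.101', 49218) -> ('66.96.147.100', 587) tcp
```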

Within the sample repository data, the raw JSON has been parsed into a combination of tags and fields.

The following common fields from the raw data are parsed into each event within LogScale:

  • #path

    The primary event type, which indicates the main log event, for example HTTP, SSL, or raw TCP. See Event Types (#path).

  • @id

    Unique ID for each event

  • uid

    A unique ID for the session, which may include multiple events. This can be used to identify a sequence of communication that may include multiple types of events. See Session Identifier (uid).

  • ts

    Timestamp for the event, to the nearest millisecond.
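The raw `ts` values in the sample carry microsecond precision (for example `2022-02-18T16:07:43.737232Z`). The following Python sketch reduces such a value to an epoch timestamp in milliseconds, assuming rounding to the nearest millisecond as described above. The function name is hypothetical, and this is not necessarily how the LogScale parser performs the conversion.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ts_to_epoch_ms(ts: str) -> int:
    """Convert an ISO 8601 UTC timestamp to epoch milliseconds, rounded to the nearest ms."""
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalise it
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    delta = dt - EPOCH
    # Integer timedelta arithmetic avoids floating-point error; adding half a
    # millisecond before floor division rounds to the nearest millisecond
    return (delta + timedelta(microseconds=500)) // timedelta(milliseconds=1)


print(ts_to_epoch_ms("2022-02-18T16:07:43.737232Z"))  # 1645200463737
```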

Event Types (#path)

The events consist of the following major event types (identified through the event #path tag):

  • conn

    IP, TCP, UDP and ICMP connection details

  • dce_rpc

    DCE/RPC communication information

  • dhcp

    DHCP lease information

  • dns

    DNS query and response details

  • files

    File analysis results

  • notice

    Notices of identified information generated by the Corelight device

  • smartpcap-stats

    SmartPCAP statistics from the Corelight device

  • ssl

    SSL handshakes

  • x509

    X.509 certificate information

For more information on the specific fields within each event type, consult the Zeek log files reference <https://docs.zeek.org/en/master/script-reference/log-files.html>.

To get a full list of all the event types in the sample data, use:

logscale
groupBy(#path) | sort(_count)

This produces a list similar to the following (the exact counts depend on the selected time range):

logscale
#path                           _count
conn                            626182
dns                             309720
files                           99111
http                            65871
ssl                             61076
ntp                             25809
dhcp                            25648
notice                          20378
smartpcap-stats                 19708
x509                            15073
intel                           11615
corelight_overall_capture_loss  9855
suricata_corelight              9245
weird                           4971
dce_rpc                         2012
specific_dns_tunnels            1712
smartpcap                       1007
etc_viz                         811
rdp                             679
ssh                             410
smb_mapping                     379
kerberos                        367
smtp                            286
smtp_links                      268
ntlm                            208
smb_files                       184
reporter                        177
software                        163
dpd                             113
pe                              103
snmp                            95
ftp                             39
meterpreter                     10
meterpreter_headers             10
radius                          9
stepping                        7
dga                             6
generic_icmp_tunnels            2
tunnel                          2
log4j                           1
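Conceptually, the query counts the events for each distinct `#path` value (stored in the `_count` field) and orders the groups by that count, largest first. The following Python sketch performs the same aggregation over a small list of hypothetical in-memory events; it only illustrates the result, not how LogScale executes the query.

```python
from collections import Counter


def count_by_path(events):
    """Count events per "#path" value and return (path, count) pairs, largest count first."""
    counts = Counter(event["#path"] for event in events)
    # most_common() orders the pairs by count, in descending order
    return counts.most_common()


# Hypothetical sample events; only the #path tag matters here
events = [
    {"#path": "conn"}, {"#path": "dns"}, {"#path": "conn"},
    {"#path": "ssl"}, {"#path": "conn"}, {"#path": "dns"},
]
for path, count in count_by_path(events):
    print(path, count)
```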

Session Identifier (uid)

The uid field identifies events related to an individual session. The number of events in a session depends on the communication involved. For example, a simple DNS lookup might produce only two events (a dns event and a conn event), while a longer session can produce thousands of events across a variety of event types.

You can see this by searching for the uid CCsBUu3O2Z0QfCd6Y8:

logscale
uid = CCsBUu3O2Z0QfCd6Y8

This returns just two events:

logscale
{"_path":"dns","_system_name":"SmartPCAP_192_168_5_1","_write_ts":"2022-02-18T16:07:44.147273Z",
 "ts":"2022-02-18T16:07:44.146859Z","uid":"CCsBUu3O2Z0QfCd6Y8","id.orig_h":"10.9.18.101",
 "id.orig_p":52060,"id.resp_h":"10.9.18.1","id.resp_p":53,"proto":"udp","trans_id":39459,
 "rtt":0.00041413307189941406,"query":"mail.aepl.com.pk","qclass":1,"qclass_name":"C_INTERNET",
 "qtype":1,"qtype_name":"A","rcode":0,"rcode_name":"NOERROR","AA":false,"TC":false,"RD":true,
 "RA":true,"Z":0,"answers":["162.250.121.176"],"TTLs":[5.0],"rejected":false}
{"_path":"conn","_system_name":"SmartPCAP_192_168_5_1","_write_ts":"2022-02-18T16:07:54.147004Z",
"ts":"2022-02-18T16:07:44.146859Z","uid":"CCsBUu3O2Z0QfCd6Y8","id.orig_h":"10.9.18.101",
"id.orig_p":52060,"id.resp_h":"10.9.18.1","id.resp_p":53,"proto":"udp","service":"dns",
"duration":0.00041413307189941406,"orig_bytes":34,"resp_bytes":50,"conn_state":"SF",
"local_orig":true,"local_resp":true,"missed_bytes":0,"history":"Dd","orig_pkts":1,
"orig_ip_bytes":62,"resp_pkts":1,"resp_ip_bytes":78,"corelight_shunted":false,
"orig_l2_addr":"00:08:02:1c:47:ae","resp_l2_addr":"20:e5:2a:b6:93:f1",
"spcap.url":"https://192.168.5.1/spcap/v1/?uid=CCsBUu3O2Z0QfCd6Y8",
"spcap.rule":6,"spcap.trigger":"all-unencrypted","community_id":"1:xd/MFWG+2N7nVcDr5rAJjgE3Jaw="}

By comparison, searching for the uid CTYqn61mJNPJsIVG96 returns over 15,000 events.

Using the uid field enables you to tie multiple streams of events together. When diagnosing errors or attacks, this helps you collect related events together.
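The same grouping can be sketched in Python: collect raw events by `uid` to rebuild sessions, then list the event types in each session. The first two records are trimmed copies of the `CCsBUu3O2Z0QfCd6Y8` events shown above (using `_path` as it appears in the raw JSON); the remaining events, the uid `CHypotheticalUid1`, and the `sessions_by_uid` helper are hypothetical. The sketch also assumes that some event types may not carry a uid at all and skips them.

```python
import json
from collections import defaultdict


def sessions_by_uid(raw_lines):
    """Group raw Corelight JSON events by their "uid" session identifier."""
    sessions = defaultdict(list)
    for line in raw_lines:
        event = json.loads(line)
        uid = event.get("uid")
        if uid is None:
            # Assumption: some event types may not belong to a session, so skip them
            continue
        sessions[uid].append(event)
    return dict(sessions)


raw_lines = [
    '{"_path":"dns","uid":"CCsBUu3O2Z0QfCd6Y8","query":"mail.aepl.com.pk"}',
    '{"_path":"conn","uid":"CCsBUu3O2Z0QfCd6Y8","service":"dns"}',
    '{"_path":"conn","uid":"CHypotheticalUid1","service":"ssl"}',  # hypothetical event
    '{"_path":"smartpcap-stats"}',  # hypothetical event without a uid
]

for uid, events in sessions_by_uid(raw_lines).items():
    # Number of events in each session and the event types it contains
    print(uid, len(events), sorted(e["_path"] for e in events))
```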