Configuring Humio Log Collector
The Humio Log Collector is configured through a YAML configuration file, which can be found in:
Linux
/etc/humio-log-collector/config.yaml
Windows
C:\Program Files (x86)\CrowdStrike\Humio Log Collector\config.yaml
On Linux, additional environment variables can be configured in the file /etc/default/humio-log-collector.
On Windows, environment variables have to be configured in System Properties.
Editing the Configuration
These steps explain how to configure the config.yaml file to ship data to Humio.
Open the config.yaml file in the editor of your choice, for example on Linux:
sudo vi /etc/humio-log-collector/config.yaml
Edit the file and specify the fields and values described in Configuration Objects, or try data ingestion by specifying only the following:
name
Under sources, you must specify type and include.
Under sinks, you must specify type, token and url.
Save the changes and restart the service.
sudo systemctl restart humio-log-collector.service
Minimal Configuration Example - File Collection
This configuration is the minimal configuration needed to collect events from local log files. The sources section describes the data that should be collected, and the sinks section describes where those events should be sent. Sinks can be reused and are referenced by name in the source.
dataDirectory: data
sources:
  apache_logs:
    type: file
    include: /var/log/apache/*.log
    sink: my_humio_instance
sinks:
  my_humio_instance:
    type: humio
    token: <ingest-token>
    url: https://cloud.community.humio.com
Note
You must set the url and token values that correspond to your Humio instance and repository.
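As a sketch of one way to avoid hard-coding these values, the same minimal configuration can read them from environment variables, which is the approach the Linux example at the end of this page takes. The variable names INGEST_TOKEN and HUMIO_URL are placeholders; they are assumed to be defined in the collector's environment before the service starts.

```yaml
dataDirectory: data
sources:
  apache_logs:
    type: file
    include: /var/log/apache/*.log
    sink: my_humio_instance
sinks:
  my_humio_instance:
    type: humio
    # Placeholder variable names; define them in the collector's environment
    token: $INGEST_TOKEN
    url: $HUMIO_URL
```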
Advanced Example - File, Windows Events, JournalD and Syslog
This configuration is an example of the sections needed to collect file data, Windows Events, JournalD data and Syslog messages.
dataDirectory: data
sources:
  apache_logs:
    type: file
    # Glob patterns
    include: /var/log/apache/*.log
    exclude: /var/log/apache/not_me.log
    sink: my_humio_instance
    parser: accesslog
    multiLineBeginsWith: ^20\d{2}-
    transforms:
      # static_fields transform adds configured key, value pairs as fields
      - type: static_fields
        fields:
          mykey: myvalue
          # Passing environment variables is supported
          myenvvar: $MY_ENV_VAR
  syslog:
    type: syslog
    # Mode must be 'udp' or 'tcp'
    mode: udp
    # Port number to listen on
    # Default: 514
    port: 514
    # Optional bind address.
    # If unspecified the source will listen on all interfaces
    # Don't specify port here. Use 'port' field for that
    bind: 0.0.0.0
    sink: my_other_humio_instance
  wineventlog:
    type: wineventlog
    # Channels to open.
    # If no channels are specified, all available channels will be used.
    channels:
      - name: System
      - name: Application
        # Only collect if event id matches any of these
        onlyEventIDs:
          - 1003
          - 1004
      - name: Security
    sink: my_humio_instance
  journal:
    # Example for reading journald log data (Linux only)
    type: journald
    sink: my_humio_instance
    # Optional. If not specified collect from the local journal
    directory: /var/log/journal
    # If specified only collect from these units
    includeUnits:
      - systemd-modules-load.service
    # If specified collect from all units except these
    excludeUnits:
      - systemd-modules-load.service
    # Default: false. If true only collect logs from the current boot
    currentBootOnly: false
sinks:
  my_other_humio_instance:
    type: humio
    token: <ingest-token_repo1>
    url: https://cloud.us.humio.com
  my_humio_instance:
    type: humio
    # The token can also be given as an environment variable
    token: <ingest-token-repo2>
    url: https://cloud.us.humio.com
    # auto, none, gzip, deflate. Default: auto
    compression: gzip
    # Number between: 1 ... 9.
    # 1 = highest speed
    # 9 = highest compression
    # If unspecified or 0 the default value for chosen compression algorithm is used
    compressionLevel: 9
    # Override default tls configuration
    # Only one of the following options should be used at a time.
    # If multiple are given, the precedence is: 'insecure', 'caCert', 'caFile'.
    tls:
      # Specify insecure to skip certificate validation
      insecure: false
      # Specify caCert to load a PEM certificate from the config file
      caCert: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      # Specify caFile to load PEM certificate from an external file.
      caFile: /etc/ssl/cert.pem
    # Override proxy configuration for the sink. Must be set to 'none' for Windows Server.
    # Accepted values: 'system', 'none' or a URL such as: http://127.0.0.1:3129 for an http proxy.
    # Defaults to system, which will try to determine the appropriate proxy or fallback to none.
    proxy: none
    queue:
      # Default: 1024
      maxEventsPerRequest: 4096
      # fullAction determines queue behavior when it is full.
      # pause = queue pauses ingesting new batches if it is full (default).
      #   deleteLatest is no longer supported and is automatically set to pause.
      # deleteOldest = queue deletes the oldest batch to accept new batches if it is full
      # Default: pause
      fullAction: deleteOldest
      memory:
        # Default: 1000
        flushTimeOutInMillisecond: 200
        # Default: 2048
        maxLimitInMB: 1024
Exec Example
dataDirectory: data
sources:
  cmd_ls:
    type: cmd
    cmd: ls
    # scheduled or streaming
    mode: scheduled
    args:
      - -l
      - -h
    workingDir: /foo
    # Interval between each invocation of the cmd
    interval: 60
    # Environment variables can be configured and passed to the command
    environment:
      # define CONFIGURED_ENV1 as environment variable
      CONFIGURED_ENV1: my_configured_env_1
      # Pass environment variable: MY_ENV_VAR to command
      MY_ENV_VAR: $MY_ENV_VAR
    sink: my_humio
  cmd_tail:
    type: cmd
    cmd: tail
    mode: streaming
    args:
      - -F
    workingDir: /foo
    sink: my_humio
Checkpoints
By default, the configuration file points to the directory /var/lib/humio-log-collector as the storage for checkpoints. To reset the state of the installation:
Stop the Log Collector service humio-log-collector.service.
Delete the checkpoints.json file.
Restart the Humio Log Collector service.
Debug Log
The Humio Log Collector debug log can be sent to a Humio instance by providing the following environment variables:
HUMIO_DEBUG_LOG_ADDRESS=https://<your-humio-instance>
HUMIO_DEBUG_LOG_TOKEN=<your-ingest-token>
To view low-level logs for the Humio Log Collector, run the shipper on the CLI and pass the switch:
-log -level debug
Logs are then emitted to stdout; this can be helpful if connectivity is part of the problem.
Configuration Objects
This section describes the objects and keys under pipelines.
name
Specify a unique name for your pipeline.
sources
The sources block configures the sources of data that the log collector will send to Humio.
type
This key specifies the type of log. Possible values are file, syslog, journald, cmd, and wineventlog.
file
If type is set to file, the include field must be specified; exclude is optional.
include
Specify which logs to include by specifying the path of the file or using a glob pattern.
exclude
Specify which logs to exclude, also using a glob pattern. This is only applied to type file.
parser
Specify the parser to use to parse the logs. If you install the parser through a package, you must specify the type and name as displayed on the parsers page, for example linux/system-logs:linux-filebeat.
multiLineBeginsWith or multiLineContinuesWith
The file input can join consecutive lines together to create multiline events by using a regular expression. It can be configured to use a pattern to look for either the beginning or the continuation of multiline events.
Example: all multiline events begin with a date, e.g. 2022-
multiLineBeginsWith: ^20\d{2}-
In this case, every line that doesn't match the pattern is appended to the latest line that did.
Example: lines that start with whitespace are continuations of the previous line
multiLineContinuesWith: ^\s+
In this case, every line that matches the pattern is appended to the latest line that didn't.
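To show where such a pattern sits in a source definition, here is a minimal sketch of a file source that treats indented lines (a stack trace, for instance) as continuations of the preceding event. The source name app_logs and the path /var/log/myapp/*.log are hypothetical; the sink name refers to a sink assumed to be defined elsewhere in the file.

```yaml
sources:
  app_logs:
    type: file
    # Hypothetical path; replace with your own log location
    include: /var/log/myapp/*.log
    # Lines starting with whitespace are appended to the previous line
    multiLineContinuesWith: ^\s+
    sink: my_humio_instance
```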
transforms
Specify transforms to use for this source (optional). If static_fields is specified, you must specify a key and a value, which can be an environment variable, for example myenvvar: $MY_ENV_VAR.
syslog
If type is set to syslog, you must specify the port, address and mode fields.
port
Specify the number of the port on which to listen. The default is 514.
address
Specify the address to bind to; the advanced example above sets it with the bind key. This defaults to all addresses.
mode
Specify the protocol to listen to, which can be tcp or udp.
wineventlog
If type is set to wineventlog, you can specify the channels field.
channels
Specify the Windows event log channels to read. If no channel is specified, the log collector subscribes to all available channels. You can also restrict collection to specific event IDs using onlyEventIDs.
Important
Subscribing to all channels may impact performance as the amount of data logged would be very high.
channels:
- <Channel Name>
- ...
journald
If type is set to journald in order to read JournalD log data (Linux only), you can specify the following fields:
directory
Specify the journal directory to collect from. If not specified, the collector reads from the local journal.
includeUnits
If specified, only collect from these units.
excludeUnits
If specified, collect from all units except these.
currentBootOnly
Set to false by default. If true, only collect logs from the current boot.
cmd
If type is set to cmd, you must specify the fields:
cmd
Specify the command to run.
mode
Can be set to scheduled, to collect data at intervals (in which case you must specify the interval), or streaming, to collect data constantly.
args
The arguments of the command.
workingDir
Specifies the directory in which to run the command.
interval
Specifies how frequently the command should be invoked when mode is set to scheduled.
environment
Specify the environment variables to pass to the command in this section.
sink
Specify the name of the sink to send the events to, as defined under sinks, for example humio.
queue
This block defines the behavior of the queue.
type
This key defines how the queue is stored and can be set to:
memory
This is the default. maxLimitInMB can be set but is set to 1024 MB by default.
disk
When set to disk, the data is written to <dataDirectory>/queue/<sinkName>/ unless another location is specified using storageDir. maxLimitInMB must be set to the maximum size of the queue when set to disk; by default it is set to 1024.
maxEventsPerRequest
Specify the maximum number of events in a request before the log is sent to Humio. This is applied along with flushTimeOutInMillisecond, and whichever is reached first triggers the data to be sent to Humio. The default is 1024.
fullAction
Specify the action to take when the queue is full. The possible values are:
deleteOldest
Accepts new batches but deletes the oldest batch.
pause
This is the default value. The queue does not ingest new batches when it is full. Note that deleteLatest is no longer supported and is automatically set to pause.
flushTimeOutInMillisecond
Specify how often, in milliseconds, data is sent to Humio. The default is 1000.
maxBufferedEvents
Specify the maximum number of events maintained in the buffer, if for example Humio cannot be reached. The default is 100000.
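The queue keys described above could be combined into a disk-backed queue along the following lines. This is only a sketch: it assumes that type, maxLimitInMB and storageDir sit directly under queue, as this section describes them, whereas the advanced example earlier nests the memory settings under a memory key, so check the layout against your collector version. The storageDir path is hypothetical.

```yaml
sinks:
  my_humio_instance:
    type: humio
    token: <ingest-token>
    url: https://cloud.us.humio.com
    queue:
      # Assumption: type is a key directly under queue
      type: disk
      # Maximum size of the on-disk queue
      maxLimitInMB: 1024
      # Optional, hypothetical path; otherwise <dataDirectory>/queue/<sinkName>/ is used
      storageDir: /var/lib/humio-log-collector/queue
      # Stop accepting new batches when the queue is full (the default)
      fullAction: pause
```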
sinks
This object defines details on the sink.
type
Specify the type of sink. This must be set to humio.
token
Specify the ingest token for your Humio repository or an environment variable.
url
Specify the URL of your Humio account, for example https://cloud.humio.com.
compression
Specify the type of data compression. Possible values: auto, none, gzip, deflate. The default value is auto.
compressionLevel
Specify the level of compression, where 1 is best speed and 9 is best compression. If undefined or 0, the default value for the chosen compression algorithm is used.
tls
This object contains details on the PEM certificates and allows you to override the default TLS configuration. Only one of the following options should be specified:
insecure
Specify whether to skip certificate validation. If set to true, certificate validation is skipped.
caCert
Specify this key to load a certificate from the config file.
caFile
Specify this key to load the PEM certificate from an external file.
proxy
Set to none for Windows Server or specify, if required, an override proxy configuration for the sink. Possible values: 'system', 'none' or a URL such as http://127.0.0.1:3129 for an HTTP proxy. The default is system, which tries to determine the appropriate proxy and falls back to none.
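For illustration, a sink that trusts a certificate from an external file and sends traffic through an HTTP proxy might look like the sketch below. The certificate path and the proxy address are the example values used on this page, not recommendations, and only one of the tls options is set, as advised above.

```yaml
sinks:
  my_humio_instance:
    type: humio
    token: <ingest-token>
    url: https://cloud.us.humio.com
    tls:
      # Load the PEM certificate from an external file
      caFile: /etc/ssl/cert.pem
    # Send traffic through an HTTP proxy instead of the system default
    proxy: http://127.0.0.1:3129
```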
Additional Example Configuration Files
These example configurations can be copied and customized to meet your needs.
Linux
This is an example configuration for a Linux environment.
dataDirectory: /var/lib/humio-log-collector
sources:
  # Collect local files.
  var_log:
    type: file
    include: /var/log/*
    exclude: /var/log/*.gz
    sink: humio
  # Collect syslog udp 5140.
  syslog_udp_5140:
    type: syslog
    mode: udp
    port: 5140
    sink: humio
  # Collect syslog tcp 5140.
  syslog_tcp_5140:
    type: syslog
    mode: tcp
    port: 5140
    sink: humio
sinks:
  humio:
    type: humio
    # Replace with your specified ingest token.
    token: $INGEST_TOKEN
    # Replace with your "standard endpoint" API URL: https://library.humio.com/endpoints/
    url: $HUMIO_URL