Configuring Humio Log Collector

The Humio Log Collector is configured through a YAML configuration file, which is located at:

  • Linux

    /etc/humio-log-collector/config.yaml

  • Windows

    C:\Program Files (x86)\CrowdStrike\Humio Log Collector\config.yaml

On Linux, additional environment variables can be configured in the file /etc/default/humio-log-collector. On Windows, environment variables have to be configured in System Properties.

Editing the Configuration

These steps explain how to configure the config.yaml file to ship data to Humio.

  1. Open the file config.yaml to edit using the editor of your choice, for example on Linux:

    humio
    sudo vi /etc/humio-log-collector/config.yaml

  2. Edit the file and specify the fields and values described in Configuration Objects. To try data ingestion with a minimal configuration, specify:

    • name

    • under sources you must specify type and include

    • under sinks you must specify type, token and url

  3. Save the changes and restart the service.

    humio
    sudo systemctl restart humio-log-collector.service

Minimal Configuration Example - File Collection

This configuration is the minimal configuration needed to collect events from local log files. The sources section describes the data that should be collected, and the sinks section describes where those events should be sent. The sinks can be reused and are referenced by name in the source.

yaml
dataDirectory: data
sources:
  apache_logs:
    type: file
    include: /var/log/apache/*.log
    sink: my_humio_instance

sinks:
  my_humio_instance:
    type: humio
    token: <ingest-token>
    url: https://cloud.community.humio.com

Note

You must set the url and token values that correspond to your Humio instance and repository.
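
Because sinks are referenced by name, several sources can share one sink. The following is a minimal sketch of that reuse, built only from the keys shown above. The source names nginx_logs and app_logs and their include paths are hypothetical placeholders, not values from a real installation.

yaml
dataDirectory: data
sources:
  # Hypothetical source name and path
  nginx_logs:
    type: file
    include: /var/log/nginx/*.log
    sink: my_humio_instance
  # Second hypothetical source that reuses the same sink
  app_logs:
    type: file
    include: /var/log/myapp/*.log
    sink: my_humio_instance

sinks:
  my_humio_instance:
    type: humio
    token: <ingest-token>
    url: https://cloud.community.humio.com

Both sources reference my_humio_instance, so the token and url only need to be configured once.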

Advanced Example - File, Windows Events, JournalD and Syslog

This configuration is an example of the sections needed to collect file data, Windows Events, JournalD logs and Syslog.

yaml
dataDirectory: data
sources:
  apache_logs:
    type: file
    # Glob patterns
    include: /var/log/apache/*.log
    exclude: /var/log/apache/not_me.log
    sink: my_humio_instance
    parser: accesslog
    multiLineBeginsWith: ^20\d{2}-
    transforms:
      # static_fields transform adds configured key, value pairs as fields
      - type: static_fields
        fields:
          mykey: myvalue
          # Passing environment variables is supported
          myenvvar: $MY_ENV_VAR

  syslog:
    type: syslog
    # Mode must be 'udp' or 'tcp'
    mode: udp
    # Port number to listen on
    # Default: 514
    port: 514
    # Optional bind address.
    # If unspecified the source will listen on all interfaces
    # Don't specify port here. Use 'port' field for that
    bind: 0.0.0.0
    sink: my_other_humio_instance

  wineventlog:
    type: wineventlog
    # Channels to open.
    # If no channels are specified, all available channels will be used.
    channels:
      - name: System
      - name: Application
        # Only collect if event id matches any of these
        onlyEventIDs:
          - 1003
          - 1004
      - name: Security
    sink: my_humio_instance

  journal:
    # Example for reading journald log data (linux only)
    type: journald
    sink: my_humio_instance
    # Optional. If not specified collect from the local journal
    directory: /var/log/journal
    # If specified only collect from these units
    includeUnits:
      - systemd-modules-load.service
    # If specified collect from all units except these
    excludeUnits:
      - systemd-modules-load.service
    # Default: false. If true only collect logs from the current boot
    currentBootOnly: false

sinks:
  my_other_humio_instance:
    type: humio
    token: <ingest-token-repo1>
    url: https://cloud.us.humio.com
  my_humio_instance:
    type: humio
    # Ingest token, or an environment variable that holds it
    token: <ingest-token-repo2>
    url: https://cloud.us.humio.com

    # auto, none, gzip, deflate. Default: auto
    compression: gzip

    # Number between: 1 ... 9.
    #   1 = highest speed
    #   9 = highest compression
    # If unspecified or 0 the default value for chosen compression algorithm is used
    compressionLevel: 9

    # Override default tls configuration
    # Only one of the following options should be used at a time.
    # If multiple are given, the precedence is: 'insecure', 'caCert', 'caFile'.
    tls:
      # Specify insecure to skip certificate validation
      insecure: false

      # Specify caCert to load a PEM certificate from the config file
      caCert: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----

      # Specify caFile to load PEM certificate from an external file.
      caFile: /etc/ssl/cert.pem

    # Override proxy configuration for the sink. Must be set to 'none' for Windows Server.
    # Accepted values: 'system', 'none' or a URL such as: http://127.0.0.1:3129 for an http proxy.
    # Defaults to system, which will try to determine the appropriate proxy or fallback to none.
    proxy: none

queue:
  # Default: 1024
  maxEventsPerRequest: 4096

  # fullAction determines queue behavior when it is full.
  #   pause = queue pauses ingesting new batches if it is full
  #   deleteOldest = queue deletes the oldest batch to accept new batches if it is full
  # deleteLatest is no longer supported and is automatically set to pause.
  # Default: pause
  fullAction: deleteOldest

  memory:
    # Default: 1000
    flushTimeOutInMillisecond: 200
    # Default: 2048
    maxLimitInMB: 1024

Exec Example

yaml
dataDirectory: data
sources:
   cmd_ls:
     type: cmd
     cmd: ls
     # scheduled or streaming
     mode: scheduled
     args:
       - -l
       - -h
     workingDir: /foo
     # Interval between each invocation of the cmd
     interval: 60


     # Environment variables can be configured and passed to the command
     environment:
       # define CONFIGURED_ENV1 as environment variable
       CONFIGURED_ENV1: my_configured_env_1
       # Pass environment variable: MY_ENV_VAR to command
       MY_ENV_VAR: $MY_ENV_VAR
     sink: my_humio

   cmd_tail:
     type: cmd
     cmd: tail
     mode: streaming
     args:
       - -F
     workingDir: /foo
     sink: my_humio

Checkpoints

By default, the configuration file points to the directory /var/lib/humio-log-collector as the storage for checkpoints. To reset the checkpoints:

  1. Stop the Log Collector service humio-log-collector.service.

  2. Delete the checkpoints.json file to reset the state of the installation.

  3. Restart the Humio Log Collector service.

Debug Log

The Humio Log Collector debug log can be sent to a Humio instance by providing the following environment variables:

humio
HUMIO_DEBUG_LOG_ADDRESS=https://<your-humio-instance>
HUMIO_DEBUG_LOG_TOKEN=<your-ingest-token>

To view low-level logs for the Humio Log Collector, run the Log Collector from the command line and pass the switch:

humio
-log -level debug

Logs are then emitted to stdout; this can be helpful if connectivity is part of the problem.

Configuration Objects

This section describes the objects and keys under pipelines.

  • name Specify a unique name for your pipeline.

sources

The sources block configures the sources of data that the log collector will send to Humio.

  • type

    This key specifies the type of log. Possible values are file, syslog, journald, cmd, and wineventlog.

  • file

    If type is set to file, the include field must be specified; exclude is optional.

    • include

      Specify which logs to include by giving the path of a file or a glob pattern.

    • exclude

      Specify which logs to exclude, also using a glob pattern. This is only applied to type file.

  • parser

    Specify the parser to use to parse the logs. If you installed the parser through a package, you must specify the type and name as displayed on the parsers page, for example linux/system-logs:linux-filebeat.

  • multiLineBeginsWith or multiLineContinuesWith

    The file input can join consecutive lines together to create multiline events by using a regular expression. It can be configured to use a pattern that matches either the beginning or the continuation of a multiline event.

    • Example: all multiline events begin with a date, e.g. 2022-. With multiLineBeginsWith: ^20\d{2}- every line that doesn't match the pattern gets appended to the latest line that did.

    • Example: lines that start with whitespace are continuations of the previous line. With multiLineContinuesWith: ^\s+ every line that matches the pattern gets appended to the latest line that didn't.

  • transforms

    Specify the transforms to use for this source (optional). If static_fields is specified, you must specify a key and a value under fields. The value can be an environment variable, for example myenvvar: $MY_ENV_VAR

  • syslog

    If type is set to syslog, the following fields configure the listener:

    • port

    Specify the number of the port on which to listen. The default is 514.

    • bind

    Specify the address to bind to. This defaults to all addresses.

    • mode

    Specify the protocol to listen to, which can be tcp or udp.

  • wineventlog

    If type is set to wineventlog you can specify the channels to read.

    • channels

      Specify the Windows event log channels to read. If no channels are specified, the log collector subscribes to all available channels. You can also restrict a channel to specific event IDs using onlyEventIDs.

Important

Subscribing to all channels may impact performance as the amount of data logged would be very high.

yaml
channels:
- <Channel Name>
- ...

  • journald

    If type is set to journald, which reads JournalD log data (Linux only), you can specify the following optional fields:

    • directory

      Specify the journal directory to collect from. If not specified, the local journal is used.

    • includeUnits

      If specified, only collect from these units.

    • excludeUnits

      If specified collect from all units except these.

    • currentBootOnly

      Set to false by default. If true only collect logs from the current boot.

  • cmd

    If type is set to cmd you must specify the fields:

    • cmd

      Specify the command to run.

    • mode

      Set to scheduled to collect data at intervals, in which case you must also specify interval, or to streaming to collect data continuously.

    • args

      The arguments of the command.

    • workingDir

      Specifies the directory in which to run the command.

    • interval

      Specifies how frequently the command should be invoked when set to scheduled.

    • environment

      Specify the environment variables to pass to the command.

    • sink

      Specify the name of the sink, as defined under sinks, that the data is sent to.
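
The multiLineContinuesWith behavior described above can be sketched as a file source. This is an illustrative fragment under stated assumptions: the source name java_app, the include path and the field name environment are hypothetical, and only keys described in this section are used.

yaml
sources:
  # Hypothetical source name
  java_app:
    type: file
    # Hypothetical path; glob patterns are supported
    include: /var/log/myapp/*.log
    # Lines starting with whitespace (for example indented stack trace lines)
    # are appended to the latest line that did not match
    multiLineContinuesWith: ^\s+
    transforms:
      - type: static_fields
        fields:
          # Hypothetical field; the value is read from an environment variable
          environment: $MY_ENV_VAR
    # Must match the name of a sink defined under sinks
    sink: my_humio_instance

With this pattern, an exception message followed by indented stack frames would be expected to ship as a single event rather than one event per line.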

queue

This block defines the behavior of the queue.

  • type

    This key defines where queued data is stored and can be set to:

    • memory This is the default. maxLimitInMB can be set and defaults to 1024 MB.

    • disk When set to disk, the data is written to <dataDirectory>/queue/<sinkName>/ unless another location is specified using storageDir. maxLimitInMB sets the maximum size of the queue and defaults to 1024.

  • maxEventsPerRequest

    Specify the maximum number of events to batch into a single request sent to Humio. This limit is applied together with flushTimeOutInMillisecond; whichever is reached first triggers the data to be sent. The default is 1024.

  • fullAction

    Specify the action to take when the queue is full. The possible values are:

    • deleteOldest The queue deletes the oldest batch to accept new batches when it is full.

    • pause This is the default value. The queue does not ingest new batches when it is full. Note that deleteLatest is no longer supported and is automatically set to pause.

  • flushTimeOutInMillisecond

    Specify how often, in milliseconds, queued data is sent to Humio. The default is 1000.

  • maxBufferedEvents

    Specify the maximum number of events maintained in the buffer, for example when Humio cannot be reached. The default is 100000.
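
As a rough sketch of the keys described above, a disk-backed queue might look like the following. The nesting shown here is an assumption derived from the key names in this section, so check it against the Advanced Example and your installed version before use. The storageDir path is a hypothetical placeholder.

yaml
queue:
  # Store queued data on disk instead of in memory
  type: disk
  # Maximum size of the queue in MB
  maxLimitInMB: 1024
  # Hypothetical path; if omitted, <dataDirectory>/queue/<sinkName>/ is used
  storageDir: /var/lib/humio-log-collector/queue
  # Stop accepting new batches, rather than deleting old ones, when full
  fullAction: pause
  # Send a request once this many events are queued (or the flush timeout passes)
  maxEventsPerRequest: 1024

Choosing pause favors completeness over freshness; deleteOldest makes the opposite trade when the destination is unreachable for a long time.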

sinks

This object defines details on the sink.

  • type

    Specify the type of sink. This must be set to humio.

  • token

    Specify the ingest token for your Humio repository or an environment variable.

  • url

    Specify the URL of your Humio instance, for example https://cloud.humio.com.

  • compression

    Specify the type of data compression, possible values: auto, none, gzip, deflate. The default value is auto.

  • compressionLevel

    Specify the level of compression, where 1 is best speed and 9 is best compression. If unspecified or set to 0, the default level for the chosen compression algorithm is used.

  • tls

    This object contains details on the PEM certificates and allows you to override the default TLS configuration. Only one of the following options should be specified:

    • insecure

      Specify whether certificate validation is skipped. If set to true, certificate validation is skipped.

    • caCert

      Specify this key to load a certificate from the config file.

    • caFile

      Specify this key to load the PEM certificate from an external file.

  • proxy

    Specify, if required, an override of the proxy configuration for the sink. Possible values are system, none, or a URL such as http://127.0.0.1:3129 for an HTTP proxy. This must be set to none for Windows Server. The default is system, which tries to determine the appropriate proxy and falls back to none.
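
The sink keys above can be combined as in the following sketch. It is an illustration rather than a recommended setup: the sink name my_humio_instance and the environment variable name INGEST_TOKEN are placeholders, and the certificate path and proxy address are simply the sample values used elsewhere on this page.

yaml
sinks:
  my_humio_instance:
    type: humio
    # Ingest token supplied through an environment variable (placeholder name)
    token: $INGEST_TOKEN
    url: https://cloud.humio.com
    compression: gzip
    # 1 = best speed, 9 = best compression
    compressionLevel: 1
    tls:
      # Only one tls option is set: trust the CA in an external PEM file
      caFile: /etc/ssl/cert.pem
    # Send traffic through an HTTP proxy instead of the system default
    proxy: http://127.0.0.1:3129

Setting a single tls option avoids relying on the precedence order that applies when several are given.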

Additional Example Configuration Files

These example configurations can be copied and customized to meet your needs.

Linux

This is an example configuration for a Linux environment.

yaml
dataDirectory: /var/lib/humio-log-collector
sources:
  # Collect local files.
  var_log:
    type: file
    include: /var/log/*
    exclude: /var/log/*.gz
    sink: humio

  # Collect syslog udp 5140.
  syslog_udp_5140:
    type: syslog
    mode: udp
    port: 5140
    sink: humio

  # Collect syslog tcp 5140.
  syslog_tcp_5140:
    type: syslog
    mode: tcp
    port: 5140
    sink: humio

sinks:
  humio:
    type: humio
    # Replace with your specified ingest token.
    token: $INGEST_TOKEN
    # Replace with your "standard endpoint" API URL: https://library.humio.com/endpoints/
    url: $HUMIO_URL