Configuring Humio Log Collector

The Humio Log Collector is configured through a YAML configuration file, which can be found in:

  • Linux

    /etc/humio-log-collector/config.yaml

  • Windows

    C:\Program Files (x86)\CrowdStrike\Humio Log Collector\config.yaml

On Linux, additional environment variables can be configured in the file /etc/default/humio-log-collector. On Windows, the environment variables have to be configured in system properties.

Editing the Configuration

These steps explain how to configure the config.yaml file to ship data to Humio.

  1. Open config.yaml in the editor of your choice, for example on Linux:

    shell
    sudo vi /etc/humio-log-collector/config.yaml

  2. Edit the file and specify the fields and values described in Configuration Objects, or try out data ingestion by specifying at least:

    • name

    • under sources you must specify type and include

    • under sinks you must specify type, token and url

  3. Save the changes and restart the service.

    shell
    sudo systemctl restart humio-log-collector.service
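
The checklist in step 2 can be expressed as a small pre-flight check. The sketch below is illustrative only: it works on a plain Python dictionary that mirrors the YAML structure for a file source, and the function name `missing_required_fields` is hypothetical, not part of the Log Collector.

```python
# Required keys taken from the checklist in step 2 (file source, humio sink).
REQUIRED_SOURCE_KEYS = ("type", "include")
REQUIRED_SINK_KEYS = ("type", "token", "url")


def missing_required_fields(config):
    """Return a sorted list of dotted paths for required keys that are absent."""
    missing = []
    for section, required in (("sources", REQUIRED_SOURCE_KEYS),
                              ("sinks", REQUIRED_SINK_KEYS)):
        entries = config.get(section) or {}
        if not entries:
            # The whole section is missing or empty.
            missing.append(section)
            continue
        for name, fields in entries.items():
            for key in required:
                if not (fields or {}).get(key):
                    missing.append(f"{section}.{name}.{key}")
    return sorted(missing)


example = {
    "sources": {"apache_logs": {"type": "file", "include": "/var/log/apache/*.log"}},
    "sinks": {"my_humio_instance": {"type": "humio", "url": "https://cloud.community.humio.com"}},
}
print(missing_required_fields(example))
```

In this example the sink has no token, so the only reported path is sinks.my_humio_instance.token.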

Minimal Configuration Example: File Collection

This configuration is the minimal configuration needed to collect events from local log files. The sources section describes the data that should be collected, and the sinks section describes where those events should be sent. The sinks can be reused and are referenced by name in the source.

yaml
dataDirectory: data
sources:
  apache_logs:
    type: file
    include: /var/log/apache/*.log
    sink: my_humio_instance

sinks:
  my_humio_instance:
    type: humio
    token: <ingest-token>
    url: https://cloud.community.humio.com

Note

You must set the url and token values that correspond to your Humio instance and repository.

Possible Sources and Example Configuration Files

The following sections detail the specific configuration for each source type, along with example configuration files. A description of the fields follows each example.

File. 
yaml
fleetManagement:
   token: 4b09c4f7-2364-605t-a55f-d5d2fg881d66
   url: https://cloud.us.humio.com

dataDirectory: data
sources:
  apache_logs:
    type: file
    # Glob patterns
    include: /var/log/apache/*.log
    exclude: /var/log/apache/not_me.log
    sink: my_humio_instance
    parser: accesslog
    multiLineBeginsWith: ^20\d{2}-
    transforms:
      # static_fields transform adds configured key, value pairs as fields
      - type: static_fields
        fields:
          mykey: myvalue
          # Passing environment variables is supported
          myenvvar: $MY_ENV_VAR
sinks:
  my_other_humio_instance:
    type: humio
    token: <ingest-token_repo1>
    url: https://cloud.us.humio.com
  my_humio_instance:
    type: humio
    token: <ingest-token-repo2>  # or an environment variable
    url: https://cloud.us.humio.com
    compression: gzip
    compressionLevel: 9
    tls:
      insecure: false
      caCert: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      caFile: /etc/ssl/cert.pem

    proxy: none

    queue:
      fullAction: deleteOldest
      memory:
        flushTimeOutInMillisecond: 200
        maxLimitInMB: 1024

File Source

When type is set to file, the following fields apply:

  • If type is set to file, the include field must be specified and the exclude field can optionally be specified.

    • include

      Specify which logs to include by specifying the path of the file or using a glob pattern.

    • exclude

      Specify which logs to exclude, also using a glob pattern. This is only applied to type file.

  • inactivityTimeout

    Specify the period of inactivity (default: 60 seconds) after which the file descriptor of a monitored file is closed to release system resources. The file is then watched for changes instead, and it is reopened whenever it changes.

  • parser

    Specify the parser to use to parse the logs. If you install the parser through a package, you must specify the type and name as displayed on the parsers page, for example linux/system-logs:linux-filebeat.

  • multiLineBeginsWith or multiLineContinuesWith

    The file input can join consecutive lines together to create multiline events by using a regular expression. It can be configured with a pattern that matches either the beginning or the continuation of a multiline event.

    • Example: all multiline events begin with a date, e.g. 2022-. With multiLineBeginsWith: ^20\d{2}- every line that doesn't match the pattern is appended to the latest line that did.

    • Example: lines that start with whitespace are continuations of the previous line. With multiLineContinuesWith: ^\s+ every line that matches the pattern is appended to the latest line that didn't.

  • transforms

    Specify transforms to use for this source (optional). If static_fields is specified, you must specify a key and a value, which can be an environment variable, for example myenvvar: $MY_ENV_VAR
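
The two multiline rules described for multiLineBeginsWith and multiLineContinuesWith can be sketched in a few lines of Python. This is only an illustration of the joining logic as described in this section, not the collector's implementation; the function name `join_multiline` and the sample log lines are hypothetical.

```python
import re


def join_multiline(lines, begins_with=None, continues_with=None):
    """Group raw lines into events using exactly one of the two patterns."""
    if (begins_with is None) == (continues_with is None):
        raise ValueError("specify exactly one of begins_with or continues_with")
    pattern = re.compile(begins_with if begins_with is not None else continues_with)
    events = []
    for line in lines:
        matched = pattern.search(line) is not None
        # begins_with: a match starts a new event.
        # continues_with: a match extends the previous event.
        starts_new = matched if begins_with is not None else not matched
        if starts_new or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


log = [
    "2022-01-01 ERROR something failed",
    "  at frame one",
    "  at frame two",
    "2022-01-01 INFO recovered",
]
print(join_multiline(log, begins_with=r"^20\d{2}-"))
print(join_multiline(log, continues_with=r"^\s+"))
```

For this sample input both patterns produce the same two events: the error line joined with its two indented lines, and the final info line on its own.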

See Common Configuration Elements for information on the common elements in the configuration file.
Syslog. 
yaml
fleetManagement:
  token: 4b09c4f7-2364-605t-a55f-d5d2fg881d66
  url: https://cloud.us.humio.com

dataDirectory: data
sources:
  syslog:
    type: syslog
    # Mode must be 'udp' or 'tcp'
    mode: udp
    # Port number to listen on
    # Default: 514
    port: 514
    # Optional bind address.
    # If unspecified the source will listen on all interfaces
    # Don't specify port here. Use 'port' field for that
    bind: 0.0.0.0
    sink: my_other_humio_instance
sinks:
  my_other_humio_instance:
    type: humio
    token: <ingest-token_repo1>
    url: https://cloud.us.humio.com
  my_humio_instance:
    type: humio
    token: <ingest-token-repo2>  # or an environment variable
    url: https://cloud.us.humio.com
    compression: gzip
    compressionLevel: 9
    tls:
      insecure: false
      caCert: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      caFile: /etc/ssl/cert.pem

    proxy: none

    queue:
      fullAction: deleteOldest
      memory:
        flushTimeOutInMillisecond: 200
        maxLimitInMB: 1024

Syslog Source

If type is set to syslog, you can specify the port, bind and mode fields.
  • port

    Specify the number of the port on which to listen. The default is 514.

  • bind

    Specify the address to bind to. This is optional and defaults to all interfaces. Do not include the port here; use the port field for that.

  • mode

    Specify the protocol to listen on, which must be tcp or udp.

See Common Configuration Elements for information on the common elements in the configuration file.
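
To check that a syslog source is listening, it can help to send it a test message. The following is a minimal sketch, assuming a UDP source on the local machine; the helper names `build_syslog_message` and `send_udp` are hypothetical, and the message carries only a priority prefix and tag (real senders usually add a timestamp and hostname).

```python
import socket


def build_syslog_message(text, facility=1, severity=6, tag="collector-test"):
    """Format a minimal syslog-style message: <PRI>TAG: TEXT."""
    priority = facility * 8 + severity  # PRI value as defined by the syslog RFCs
    return f"<{priority}>{tag}: {text}".encode("utf-8")


def send_udp(message, host="127.0.0.1", port=514):
    """Send one datagram to the address and port the syslog source listens on."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))


# Only build and show the message here; sending is left to the caller.
print(build_syslog_message("hello from the test sender"))
```

Calling send_udp(build_syslog_message("hello"), port=5140) would then send the message to a source configured with mode: udp and port: 5140.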
Windows Event Log Example. 
yaml
fleetManagement:
  token: b2XXXXXX-fd23-XXXX-98e9-1890e6XXXXXX
  ## Change the URL if needed to reflect your LogScale URL.
  url: https://cloud.us.humio.com
  ## Keep this option as "none" unless you actually need a proxy.
  proxy: none
  ## The TLS option can be uncommented if you're using a self-signed certificate. 
  #tls:
    #insecure: true
 
dataDirectory: C:\ProgramData\CrowdStrike\Humio Log Collector\
 
sources:
  windows_events:
    type: wineventlog
    ## Add other channels by simply adding additional "name" lines.
    ## The following command can be used to find other channels:
    ## Get-WinEvent -ListLog * -EA silentlycontinue | sort-object -Property Recordcount -desc
    channels:
      - name: Application
      - name: Security
      - name: System
      - name: Windows PowerShell
    ## You can manually specify a parser to be used here.
    ## This overrides the parser specified in the LogScale UI.
    #parser: myparser
    sink: humio
     
sinks:
  humio:
    type: humio
    token: 2eXXXXXX-81d1-XXXX-bc22-05e430XXXXXX
    ## Change the URL if needed to reflect your LogScale URL.    
    url: https://cloud.us.humio.com
    ## Keep this option as "none" unless you actually need a proxy.
    proxy: none
    ## The TLS option can be uncommented if you're using a self-signed certificate. 
    #tls:
      #insecure: true
    ## This increases the maximum single event size to 8 MB. You can change as needed.
    maxEventSize: 8388608
    #maxBatchSize: 16777216
    ## Uncomment if you would like to force a specific level of gzip compression. 9 is the highest.
    #compression: gzip
    #compressionLevel: 9

Windows Event Log Source

If type is set to wineventlog, you can specify the following fields:
  • channels Specify the Windows event log channels to read. If no channel is specified, the log collector subscribes to all available channels. You can also specify specific event IDs using onlyEventIDs.

    Important

    Subscribing to all channels may impact performance as the amount of data logged would be very high.

    yaml
    channels:
    - name: <Channel Name>
    - ...
  • providers Specify an array of provider names to filter events by provider.

  • parser Specify the parser to use to parse the logs. If you install the parser through a package, you must specify the type and name as displayed on the parsers page, for example linux/system-logs:linux-filebeat. See Parsers for more information.

See Common Configuration Elements for information on the common elements in the configuration file.

Important

The proxy setting overrides the proxy configuration for the sink. It must be set to none for Windows Server.

Journal. 
yaml
fleetManagement:
  token: 4b09c4f7-2364-605t-a55f-d5d2fg881d66
  url: https://cloud.us.humio.com

dataDirectory: data
sources:
  journal:
    # Example for reading journald log data (linux only)
    type: journald
    sink: my_humio_instance
    # Optional. If not specified collect from the local journal
    directory: /var/log/journal
    # If specified only collect from these units
    includeUnits:
      - systemd-modules-load.service
    # If specified collect from all units except these
    excludeUnits:
      - systemd-modules-load.service
    # Default: false. If true only collect logs from the current boot
    currentBootOnly: false
sinks:
    my_other_humio_instance:
      type: humio
      token: <ingest-token_repo1>
      url: https://cloud.us.humio.com
    my_humio_instance:
      type: humio
      token: <ingest-token-repo2>  # or an environment variable
      url: https://cloud.us.humio.com
      compression: gzip
      compressionLevel: 9
      tls:
        insecure: false
        caCert: |
          -----BEGIN CERTIFICATE-----
          ...
          -----END CERTIFICATE-----
        caFile: /etc/ssl/cert.pem
  
      proxy: none
  
      queue:
        fullAction: deleteOldest
        memory:
          flushTimeOutInMillisecond: 200
          maxLimitInMB: 1024

Journal Source

If type is set to journald to read journald log data (Linux only), you can specify the following fields:
  • directory

    Specify the journal directory to collect from. If not specified, the collector reads from the local journal.

  • includeUnits

    If specified, only collect from these units.

  • excludeUnits

    If specified collect from all units except these.

  • currentBootOnly

    Set to false by default. If true only collect logs from the current boot.

See Common Configuration Elements for information on the common elements in the configuration file.
Exec Example. 
yaml
fleetManagement:
    token: 4b09c4f7-2364-605t-a55f-d5d2fg881d66
    url: https://cloud.us.humio.com

dataDirectory: data
sources:
   cmd_ls:
     type: cmd
     cmd: ls
     # scheduled or streaming
     mode: scheduled
     args:
       - -l
       - -h
     workingDir: /foo
     # Interval between each invocation of the cmd
     interval: 60
     
     # Output mode when using mode 'scheduled'. Either 'streaming' (default) or 'consolidateOutput'.
     # When outputMode is set to 'consolidateOutput', the entire output of the scheduled command is sent as a single event.
     # outputMode: consolidateOutput


     # Environment variables can be configured and passed to the command
     environment:
       # define CONFIGURED_ENV1 as environment variable
       CONFIGURED_ENV1: my_configured_env_1
       # Pass environment variable: MY_ENV_VAR to command
       MY_ENV_VAR: $MY_ENV_VAR
     sink: my_humio_instance

   cmd_tail:
     type: cmd
     cmd: tail
     mode: streaming
     args:
       - -F
     workingDir: /foo
     sink: my_humio_instance

sinks:
  my_other_humio_instance:
    type: humio
    token: <ingest-token_repo1>
    url: https://cloud.us.humio.com
  my_humio_instance:
    type: humio
    token: <ingest-token-repo2>  # or an environment variable
    url: https://cloud.us.humio.com
    compression: gzip
    compressionLevel: 9
    tls:
      insecure: false
      caCert: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      caFile: /etc/ssl/cert.pem

    proxy: none

    queue:
      fullAction: deleteOldest
      memory:
        flushTimeOutInMillisecond: 200
        maxLimitInMB: 1024

Exec Source

If type is set to cmd, specify the following fields:
  • cmd

    Specify the command to run.

  • mode

    Can be set to scheduled to collect data at intervals, in which case you must specify the interval, or to streaming to collect data constantly. To create a single multiline event when running in scheduled mode, set outputMode to consolidateOutput.

  • args

    The arguments of the command.

  • workingDir

    Specifies the directory in which to run the command.

  • interval

    Specifies how frequently the command should be invoked when set to scheduled.

  • environment

    Specify environment variables and pass them to the command using this section.

  • sink

    Specify the name of the sink, as defined under sinks, that receives the events.

See Common Configuration Elements for information on the common elements in the configuration file.
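
The difference between the two output modes of a scheduled command can be illustrated as follows. This is a sketch under the assumption that the default streaming output mode produces one event per output line; the function name `to_events` and the sample output are hypothetical.

```python
def to_events(command_output, output_mode="streaming"):
    """Split the captured output of one scheduled run into events."""
    if output_mode == "consolidateOutput":
        # The entire output of the run becomes a single (multiline) event.
        return [command_output.rstrip("\n")] if command_output.strip() else []
    if output_mode == "streaming":
        # Assumption: each non-empty output line becomes its own event.
        return [line for line in command_output.splitlines() if line]
    raise ValueError(f"unknown output mode: {output_mode}")


sample = "total 8\n-rw-r--r-- 1 root root 1.2K a.log\n-rw-r--r-- 1 root root 3.4K b.log\n"
print(len(to_events(sample)))                       # 3
print(len(to_events(sample, "consolidateOutput")))  # 1
```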
Linux Example. 
yaml
dataDirectory: /var/lib/humio-log-collector
sources:
  # Collect local files.
  var_log:
    type: file
    include: /var/log/*
    exclude: /var/log/*.gz
    sink: humio

  # Collect syslog udp 5140.
  syslog_udp_5140:
    type: syslog
    mode: udp
    port: 5140
    sink: humio

  # Collect syslog tcp 5140.
  syslog_tcp_5140:
    type: syslog
    mode: tcp
    port: 5140
    sink: humio

sinks:
  humio:
    type: humio
    # Replace with your specified ingest token.
    token: $INGEST_TOKEN
    # Replace with your "standard endpoint" API URL: https://library.humio.com/endpoints/
    url: $HUMIO_URL

File Linux Source

This configuration example uses the file and syslog sources with specific values for collecting the logs in /var/log and syslog messages on port 5140. See Common Configuration Elements for information on the common elements in the configuration file.

Common Configuration Elements

Configuration Elements that apply to all log sources.

Fleet Management (fleetManagement)

The fleet management block configures instances of the log collector to work with Log Collector Fleet Management.

yaml
fleetManagement:
  token: 4b09c4f7-2364-605t-a55f-d5d2fg881d66
  url: https://cloud.us.humio.com
  • token

    Specify the token that allows instances of the log collector to be visualized on the Log Collector Fleet Management page.

  • url

    Specify the URL of the Humio installation where the fleet management page is hosted.

Sources (sources)

The sources block configures the sources of data that the log collector will send to Humio.

Sinks (sinks)

The sinks block configures the sinks that are used by the source or sources.

yaml
sinks:
  my_other_humio_instance:
    type: humio
    token: <ingest-token_repo1>
    url: https://cloud.us.humio.com
  my_humio_instance:
    type: humio
    token: <ingest-token-repo2>  # or an environment variable
    url: https://cloud.us.humio.com
    # maxEventSize (default: 1 MB) sets the limit for a single event in bytes. If exceeded, the event is truncated.
    maxEventSize: 1048576

    # maxBatchSize (default: 16 MB) sets the maximum size in bytes of a batch that is sent to the configured sink.
    # This includes fields as well as event data. If exceeded, data is sent in a subsequent batch.
    maxBatchSize: 16777216

    # auto, none, gzip, deflate. Default: auto
    compression: gzip

    # Number between: 1 ... 9.
    #   1 = highest speed
    #   9 = highest compression
    # If unspecified or 0 the default value for the compression algorithm specified in compression is used
    compressionLevel: 9

    # Override default tls configuration
    # Only one of the following options should be used at a time.
    # If multiple are given, the precedence is: 'insecure', 'caCert', 'caFile'.
    tls:
      # Specify insecure to skip certificate validation
      insecure: false

      # Specify caCert to load a PEM certificate from the config file
      caCert: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----

      # Specify caFile to load PEM certificate from an external file.
      caFile: /etc/ssl/cert.pem
    # Override proxy configuration for the sink. Must be set to 'none' for Windows Server.
    # Accepted values: 'system', 'none' or a URL such as: http://127.0.0.1:3129 for an http proxy.
    # Defaults to system, which will try to determine the appropriate proxy or fallback to none.
    proxy: none
  • type

    Specify the type of sink. This must be set to humio.

  • token

    Specify the ingest token for your Humio repository, or an environment variable that holds it.

  • url

    Specify the URL of your Humio account, for example https://cloud.humio.com.

  • maxBatchSize

    Specify the maximum size of a batch in bytes (default: 16 MB). This limit is propagated to the queue, where it replaces maxEventsPerRequest, and to all the sources that reference the sink.

  • maxEventSize

    Specify the maximum size of a single event in bytes (default: 1 MB). Events that exceed the limit are truncated. This setting works together with maxBatchSize.

  • compression

    Specify the type of data compression, possible values: auto, none, gzip, deflate. The default value is auto.

  • compressionLevel

    Specify the level of compression, where 1 is best speed and 9 is best compression. If unspecified or set to 0, the default value for the compression algorithm specified in compression is applied.

  • tls

    This object contains details on the PEM certificates and allows you to override the default TLS configuration. Only one of the following options should be specified; if multiple are given, the precedence is insecure, caCert, caFile:

    • insecure

      Specify whether certificate validation is skipped. If set to true, certificate validation is skipped.

    • caCert

      Specify this key to load a certificate from the config file.

    • caFile

      Specify this key to load the PEM certificate from an external file.

  • proxy

    Specify, if required, an override of the proxy configuration for the sink. Possible values are system, none, or a URL such as http://127.0.0.1:3129 for an HTTP proxy. The default is system, which tries to determine the appropriate proxy and falls back to none. This must be set to none for Windows Server.
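
The size limits in the sink configuration are given in bytes. The short sketch below shows how the values used in this document are derived from sizes in megabytes (counted as 1024 × 1024 bytes). The helper names are hypothetical, and the check that a maximum-size event should still fit into one batch is an assumption, not a documented rule.

```python
MIB = 1024 * 1024  # bytes in one megabyte as used by the values in this document


def mib_to_bytes(mib):
    """Convert a size in megabytes to the byte value used in the configuration."""
    return mib * MIB


def check_limits(max_event_size, max_batch_size):
    """Return True when a maximum-size event still fits in one batch (assumed sanity check)."""
    return 0 < max_event_size <= max_batch_size


print(mib_to_bytes(1))   # 1048576, the maxEventSize default
print(mib_to_bytes(8))   # 8388608, the value used in the Windows example
print(mib_to_bytes(16))  # 16777216, the maxBatchSize default
print(check_limits(mib_to_bytes(8), mib_to_bytes(16)))  # True
```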

Queue (queue)

The queue block is part of the Sinks (sinks) and configures the behaviour of the queue.

Note

The memory queue no longer supports configuration of maxEventsPerRequest. It inherits the maximum bytes per request from the sink's maxBatchSize.

yaml
queue:
  # Default: 1024

  # fullAction determines queue behavior when it is full.
  #   pause = queue pauses ingesting new batches if it is full (default if not specified).
  #           deleteLatest is no longer supported and is automatically set to pause.
  #   deleteOldest = queue deletes the oldest batch to accept new batches if it is full
  # Default: pause
  fullAction: deleteOldest

  memory:
    # Default: 1000
    flushTimeOutInMillisecond: 200
    # Default: 2048
    maxLimitInMB: 1024
  • type

    This key defines how the queue is stored and can be set to:

    • memory: this is the default. The maxLimitInMB can be set and is 1024 MB by default.

    • disk: the data is written to <dataDirectory>/queue/<sinkName>/ unless another directory is specified using storageDir. The maxLimitInMB sets the maximum size of the queue on disk and is 1024 by default.

  • fullAction

    Specify the action to take when the queue is full. The possible values are:

    • deleteOldest: the queue deletes the oldest batch to accept new batches.

    • pause: this is the default value. The queue does not ingest new batches while it is full. Note that deleteLatest is no longer supported and is automatically set to pause.

  • flushTimeOutInMillisecond

    Specify how often, in milliseconds, data is sent to Humio. The default is 1000.
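
The two fullAction values can be compared with a toy model. This is a simplified sketch and not the collector's queue: the class name `BoundedQueue` is hypothetical, and the limit counts batches rather than megabytes.

```python
from collections import deque


class BoundedQueue:
    """Toy in-memory queue that counts batches instead of megabytes."""

    def __init__(self, max_batches, full_action="pause"):
        if full_action not in ("pause", "deleteOldest"):
            raise ValueError(f"unsupported fullAction: {full_action}")
        self.max_batches = max_batches
        self.full_action = full_action
        self.batches = deque()

    def offer(self, batch):
        """Try to enqueue a batch; return True if it was accepted."""
        if len(self.batches) < self.max_batches:
            self.batches.append(batch)
            return True
        if self.full_action == "deleteOldest":
            self.batches.popleft()  # drop the oldest batch to make room
            self.batches.append(batch)
            return True
        return False  # pause: the caller has to retry later


paused = BoundedQueue(2, "pause")
rolling = BoundedQueue(2, "deleteOldest")
for batch in ("a", "b", "c"):
    paused.offer(batch)
    rolling.offer(batch)
print(list(paused.batches))   # ['a', 'b']
print(list(rolling.batches))  # ['b', 'c']
```

With pause, the newest batch waits outside the queue and no data is dropped by the queue itself; with deleteOldest, ingestion continues at the cost of losing the oldest data.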

Disabling Updates

By default, the log collector is updated automatically. However, if you have connection issues or the server on which you are installing the log collector is not connected to the internet, you may need to disable automatic updates.

LOG_COLLECTOR_UPDATE_SERVER=disabled

The LOG_COLLECTOR_UPDATE_SERVER setting can be in one of the following states:

  • Set to disabled. In this case, updates are disabled. This is useful in air-gapped environments.

  • Not set. In this case, the log collector uses the default update server via a URL defined in the code.

  • Set to a specific URL. In this case, the log collector connects to the specified URL for updates.
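
The three cases can be summarised as a small decision function. This sketch is only an illustration of the rules listed here: the function name `update_behaviour`, the returned labels and the example URL are hypothetical, and treating an empty value as "not set" is an assumption.

```python
def update_behaviour(env):
    """Classify the update configuration from a mapping of environment variables."""
    value = env.get("LOG_COLLECTOR_UPDATE_SERVER")
    if not value:
        # Not set (an empty value is treated as unset here; that is an assumption).
        return ("default", None)
    if value == "disabled":
        return ("disabled", None)
    # Any other value is interpreted as the URL of a custom update server.
    return ("custom", value)


print(update_behaviour({}))
print(update_behaviour({"LOG_COLLECTOR_UPDATE_SERVER": "disabled"}))
print(update_behaviour({"LOG_COLLECTOR_UPDATE_SERVER": "https://updates.example.internal"}))
```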

Checkpoints

By default, the configuration file points to the directory /var/lib/humio-log-collector as the storage for checkpoints. To reset the state of the installation:

  1. Stop the Log Collector service humio-log-collector.service.

  2. Delete the checkpoints.json file to reset the state of the installation.

  3. Restart the Humio Log Collector service.

Troubleshooting

You can troubleshoot the Humio Log Collector using console stderr or the debug log.

Using Console Stderr

The Log Collector sends information to stderr if it is run from the CLI. The information is sent in JSON format, and the level of detail is controlled by the log level. The log level can be specified in two ways (highest priority first):

  • Using a command line argument: -log-level debug

  • Configuring a log level in the config file (YAML): logLevel: debug

The following log-levels are supported:

  • trace (highest verbosity)

  • debug

  • info

  • warn

  • error

  • fatal

The -log-pretty command line argument enables pretty-printing of console output for all logs. It has no effect on logs sent to Humio, which always use JSON format.
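
The precedence between the command line argument and the configuration file can be sketched as follows. The function name `effective_log_level` is hypothetical, and the fallback level info is an assumption, since this section does not state the default.

```python
LEVELS = ("trace", "debug", "info", "warn", "error", "fatal")


def effective_log_level(cli_level=None, config_level=None, default="info"):
    """Resolve the log level: command line first, then config file, then default."""
    for candidate in (cli_level, config_level):
        if candidate is None:
            continue
        if candidate not in LEVELS:
            raise ValueError(f"unsupported log level: {candidate}")
        return candidate
    return default  # assumed fallback, not confirmed by this document


print(effective_log_level(cli_level="debug", config_level="warn"))  # debug
print(effective_log_level(config_level="warn"))                     # warn
print(effective_log_level())                                        # info
```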

Debug Log

The Humio Log Collector debug log can be sent to a Humio instance by setting the HUMIO_DEBUG_LOG_ADDRESS and HUMIO_DEBUG_LOG_TOKEN environment variables:

shell
HUMIO_DEBUG_LOG_ADDRESS=https://<your-humio-instance>
HUMIO_DEBUG_LOG_TOKEN=<your-ingest-token>

To view low-level logs for the Humio Log Collector, run the shipper on the CLI and pass the switch:

shell
-log-level debug

Logs are then emitted to stderr; this can be helpful if connectivity is part of the problem.

Configuration Objects

This section describes the objects and keys under pipelines.

  • name Specify a unique name for your pipeline.