Humio Log Collector

The Humio Log Collector is Humio's native collector for gathering events and sending them to a Humio repository. The Log Collector uses Humio ingest tokens to route data to the relevant repositories.

See the following sections for more information on the supported event sources, outputs, buffering, and metadata.

Event Sources

The Humio Log Collector currently supports the following inputs or data sources:

Collecting Events from Files

Collecting events from local files on disk is one of the most common log collection scenarios. Examples include logs produced by custom applications, web servers, and firewalls. The file input supports the following; a configuration sketch follows the list.

  • Glob patterns to specify the files to collect, including recursively collecting files from a directory

  • Glob patterns to exclude files

  • Sends the entire existing content of the files it finds

  • Tails existing files, looking for new events

  • Supports multiline logs

  • Handles log rotation scenarios
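
As an illustration, a file source could be configured along the following lines. This is a minimal sketch only: the key names shown here (include, exclude, sink, and so on), the paths, and the URL are assumptions chosen for illustration and should be checked against the Log Collector configuration reference.

  dataDirectory: /var/lib/humio-log-collector
  sources:
    web_server_logs:
      type: file
      # Glob pattern for the files to collect (assumed key name).
      include: /var/log/nginx/*.log
      # Glob pattern for files to skip, for example rotated archives.
      exclude: /var/log/nginx/*.gz
      # Name of the sink, defined below, that receives these events.
      sink: humio
  sinks:
    humio:
      type: humio
      # Ingest token that routes the data to the target repository.
      token: ${INGEST_TOKEN}
      url: https://cloud.humio.com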

Collecting Windows Events

Collecting Windows Events is simple and produces rich events. The Log Collector attempts to automatically detect which channels are available, or you can explicitly identify which channels you want to collect.

The Log Collector uses the internal Windows event templates to ensure that events are fully parsed where possible; this means that, in addition to the human-readable representation of the event, you get all fields parsed automatically as well as the XML representation of the event.
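
As a sketch, explicitly selecting channels might look like the following. The source type name (wineventlog) and the channels key are assumptions to verify against the configuration reference; omitting the channel list would rely on the automatic channel detection described above.

  sources:
    windows_events:
      type: wineventlog
      # Explicit list of channels to collect; leave out to rely on
      # automatic channel detection (assumed key names).
      channels:
        - name: Application
        - name: Security
        - name: System
      sink: humio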

Syslog Receiver

Collecting TCP and UDP syslog streams from within the infrastructure is an important part of securing legacy logging scenarios. The Log Collector can listen for TCP or UDP traffic on any port; it receives and buffers that data and streams it to Humio. If you have an appliance that can only send unsecured syslog traffic to a local device, or sends syslog over UDP and you want maximum durability for those events, deploying the Log Collector close to such data sources is the answer.

The Log Collector has no side effects in this scenario and does not tamper with the events in any way (that is, no manipulation of the syslog headers), but it does provide additional useful metadata on the events outside of the syslog envelope.
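
A syslog listener might be sketched as follows, assuming a source type named syslog with mode and port keys; the key names and the port are illustrative and should be verified against the configuration reference.

  sources:
    appliance_syslog:
      type: syslog
      # Listen for UDP syslog from nearby appliances; "tcp" would be
      # the assumed alternative for stream-based senders.
      mode: udp
      port: 514
      sink: humio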

Exec Input

The Log Collector supports running a user-configured subprocess to gather log data. The process is run on a schedule, and all output produced by the subprocess on stdout and stderr is streamed to Humio as events.

This allows the Log Collector to gather any information from the host that is available through standard tools, or administrators can provide a script. This custom input type can be used to extend the Log Collector to check host metrics, perform ping or HTTP-based polling, or pull data from any other kind of API or service.
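
A sketch of a scheduled exec-style source is shown below. The type name (cmd), the cmd and interval keys, and the script path are assumptions used for illustration; check the configuration reference for the exact names.

  sources:
    disk_usage:
      type: cmd
      # Script whose stdout and stderr are streamed to Humio as events.
      cmd: /usr/local/bin/check-disk.sh
      # Run the command on a schedule, here every 60 seconds (assumed key).
      interval: 60
      sink: humio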

Collecting Logs from SystemD on Linux

The journald source collects systemd logs from the local Linux journal. The structured journal has some advantages over plain text files, including built-in filtering on specific systemd units, reading logs from the current boot only, and built-in log rotation. Depending on the configuration, the output of the source is similar to what you would see with the journal viewer journalctl.
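
A journald source might be sketched as follows; the units and currentBoot keys are assumptions used to illustrate unit filtering and current-boot-only reading, and should be verified against the configuration reference.

  sources:
    system_journal:
      type: journald
      # Collect only from specific systemd units (assumed key name).
      units:
        - sshd.service
        - nginx.service
      # Read entries from the current boot only (assumed key name).
      currentBoot: true
      sink: humio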

Outputs or Sinks

The Humio Log Collector is designed to send data to Humio only. It uses Humio's proprietary ingest APIs, as these are optimized for efficient transport of event data, including features such as hierarchical metadata.

The Humio ingest APIs currently transport data over HTTP to the same ports used for the Humio web interface, so no special ports need to be configured. By default the data is compressed and HTTPS is required, although both settings can be configured. The Log Collector also supports custom TLS configuration and HTTP(S) proxies as required.
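
A sink definition might be sketched as follows. The token and url keys match the minimal example earlier in this section; the commented TLS and proxy keys are assumptions standing in for whatever the configuration reference actually names them.

  sinks:
    humio:
      type: humio
      # Ingest token for the target repository.
      token: ${INGEST_TOKEN}
      # Same scheme, host, and port as the Humio web interface; compression
      # and HTTPS are on by default.
      url: https://cloud.humio.com
      # Custom TLS settings and an HTTP(S) proxy can also be configured;
      # the exact key names below are assumptions.
      # tls:
      #   insecure: false
      # proxy: http://proxy.example.com:3128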

Buffering

The Humio Log Collector buffers events before sending them to Humio. This allows the Log Collector to balance efficient batch sizes against minimal ingest lag. For input types where the data cannot be re-read (syslog and exec), these buffers also provide some durability for the data.

Metadata

To ensure the data that comes from the Log Collector is useful, we attach metadata to all the events that are sent. The exact metadata depends on the source, but everything is prefixed with @collect.*; this includes details such as the host that sent the event.