Best Practice: Choosing a Log Shipper

Last Updated: 2022-02-01

Humio can ingest data from a variety of sources.

This KB article covers recommendations specifically related to choosing log shippers to use and how to use them in a highly available way.

Log Shipper Recommendations

The choice of log shippers to use in conjunction with Humio should be made based on the following factors:

  • Functionality

    Can the log shipper handle all of the log sources that an organization needs to send to their logging platform? Does the log shipper feature the ability to transform events and/or fields if required by the organization?

  • Ease of deployment

    How easy is it to deploy the selected log shipper, configure it to send data to Humio, and deploy updates as required?

  • Reliability

    How reliable is the chosen log shipper? Does your organization have the ability to troubleshoot issues should they arise?

  • Speed

    Will the log shipper be able to keep up with the volume of data you need to send to Humio?

Humio supports a wide variety of log shippers (see Log Shippers for more information); however, the three most commonly used are Filebeat, Vector, and Fluentd.

Highly Available Configurations

Most modern log shippers are designed to handle common failure scenarios, including:

  • Log shipper failure

    Log shippers maintain a record of the last event successfully transmitted to the target platform. When a log shipper recovers from a failure, it refers to this record and resumes sending data from that point.

  • Network failure/target unreachable

    Under network failure conditions, when the target is unavailable, or when the target is unable to keep up with the volume of logs being sent, most log shippers are able to buffer outgoing logs until the condition resolves.
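
The two recovery behaviours above can be sketched in Python. This is only an illustration of the general idea, not how Filebeat, Vector, or Fluentd are actually implemented. The class name `CheckpointedShipper`, the `run_once` method, the JSON checkpoint file, and the `send` callable (assumed to raise `ConnectionError` when the target is unreachable) are all hypothetical.

```python
import json
import os
from collections import deque


class CheckpointedShipper:
    """Hypothetical toy shipper, for illustration only."""

    def __init__(self, checkpoint_path, send, max_buffer=1000):
        self.checkpoint_path = checkpoint_path
        self.send = send  # callable(event); assumed to raise ConnectionError on failure
        self.max_buffer = max_buffer
        self.buffer = deque()  # events read from the source but not yet delivered
        self.offset = self._load_offset()  # number of events confirmed as delivered

    def _load_offset(self):
        # After a restart, resume from the last recorded position (0 if none exists).
        try:
            with open(self.checkpoint_path) as f:
                return int(json.load(f)["offset"])
        except (FileNotFoundError, ValueError, KeyError, TypeError):
            return 0

    def _save_offset(self):
        # Write to a temporary file and rename it, so a crash cannot leave
        # a half-written checkpoint behind.
        tmp = self.checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"offset": self.offset}, f)
        os.replace(tmp, self.checkpoint_path)

    def run_once(self, lines):
        """Read new lines into the buffer, then try to deliver them in order.

        Returns True if the buffer was fully flushed, False if the target
        was unreachable and events remain buffered.
        """
        next_unread = self.offset + len(self.buffer)
        for line in lines[next_unread:]:
            if len(self.buffer) >= self.max_buffer:
                break  # buffer full; remaining lines are read on a later call
            self.buffer.append(line)

        while self.buffer:
            try:
                self.send(self.buffer[0])
            except ConnectionError:
                return False  # keep the event buffered and retry later
            self.buffer.popleft()
            self.offset += 1
            self._save_offset()
        return True
```

Because the offset is only advanced after a successful send, a shipper that crashes between the send and the checkpoint write may resend one event on restart. Real log shippers make similar trade-offs, and their documentation describes the delivery guarantees they offer.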

While it isn't possible to configure log shippers to prevent network or target system failures, it is possible to deploy them in highly available configurations using a load balancer or queuing system, which can provide the following benefits:

  • Increased uptime

    Minimize the downtime associated with log shipper failures and application upgrades.

  • Increased throughput

    Increase the number of events per second that can be shipped by combining multiple log shippers behind a load balancer or queue. This can be particularly useful in scenarios such as using the Windows Event Forwarding framework.

Recommendations for how to deploy the log shippers and which load balancer or queuing system to use are use-case specific. For many organizations the choice of approach will be obvious, but for others the answer might be a combination of both load balancing and queue-based approaches. The following sections briefly describe the load balancer and queue-based high availability models and their pros and cons.

High Availability Using A Load Balancer

One way to achieve high availability with log shipping is to place a pool of log shipping agents behind a load balancer. In this model, source systems sending logs to Humio send their logs to a single address (URL or IP), and the load balancer distributes the traffic across a series of log shippers. This model of load balancing log shippers is commonly used for the following reasons:

  • The organization wants to minimize the number of devices sending log events through their firewall to external targets (Humio);

  • The organization uses a framework like the Windows Event Forwarding framework that gathers event logs from Windows servers and desktops in a centralized manner before forwarding on to Humio;

  • The organization wants to take advantage of the transformation capabilities offered by a log shipper like Fluentd or Vector without having to deploy the log shipper on every source machine (when sending syslog from firewalls or network switches for example).

For most organizations, the primary reason to deploy log shippers in a highly available configuration is that those log shippers act as a centralized point for forwarding logs on to Humio. Most common load balancers (HAProxy, Nginx, etc.) support this use case. The following diagram presents a high-level architectural overview of a load-balanced log shipping solution:

graph TD
  src1[Source 1] --> lb1[Load Balancer]
  src2[Source 2] --> lb1
  src3[Source 3] --> lb1
  src4[Source 4] --> lb1
  src5[Source 5] --> lb1
  srcN[Source N] --> lb1

  lb1 --> logS1
  lb1 --> logS2
  lb1 --> logS3

  logS1[Log Shipper] --> lb2[Load Balancer]
  logS2[Log Shipper] --> lb2
  logS3[Log Shipper] --> lb2

  lb2 --> humio1
  lb2 --> humio2
  lb2 --> humioN

  humio1[Humio 1]
  humio2[Humio 2]
  humioN[Humio N]

  style lb1 fill:#2ac76d;
  style lb2 fill:#32a852;
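
To make the load balancer's role in this model concrete, the following Python sketch shows round-robin selection across a pool of log shippers, skipping any that have been marked as down. It is a simplified illustration rather than a description of how HAProxy or Nginx work internally. The class `RoundRobinBalancer` and its methods are hypothetical names, and the addresses are placeholders.

```python
class RoundRobinBalancer:
    """Hypothetical round-robin selector over a pool of log shippers."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the backend to try next

    def mark_down(self, backend):
        # In a real load balancer this is driven by health checks.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once, starting from the rotation point.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy log shippers available")


# Example usage with placeholder addresses:
# lb = RoundRobinBalancer(["10.100.0.5:514", "10.100.0.6:514"])
# lb.pick()  # alternates between the two shippers while both are healthy
```

When one shipper is taken out of the pool for an upgrade, the sources keep sending to the same single address and the remaining shippers absorb the traffic, which is the uptime benefit described earlier.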

Example Nginx Configuration

The following sample Nginx configuration demonstrates how to load balance syslog connections over TCP and UDP from the Nginx proxy (10.100.0.26:514) to two log shipping nodes (10.100.0.5:514, 10.100.0.6:514):

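stream {
    # Pool of log shippers that receive syslog over UDP.
    upstream syslog_udp {
        server 10.100.0.5:514;
        server 10.100.0.6:514;
    }

    # Pool of log shippers that receive syslog over TCP.
    upstream syslog_tcp {
        server 10.100.0.5:514;
        server 10.100.0.6:514;
    }

    # UDP listener. Syslog over UDP is one-way, so no replies
    # are expected from the log shippers.
    server {
        listen 10.100.0.26:514 udp;
        proxy_pass syslog_udp;
        proxy_responses 0;
    }

    # TCP listener.
    server {
        listen 10.100.0.26:514;
        proxy_pass syslog_tcp;
    }
}

# Raise the open file limit for worker processes so they can
# hold a large number of concurrent connections.
worker_rlimit_nofile 1000000;

events {
    worker_connections 20000;
}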
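
One way to check that a configuration like the one above is forwarding traffic is to send a test syslog message to the proxy address over both UDP and TCP and confirm that it arrives at one of the log shippers. The Python sketch below is an assumption-laden helper rather than part of any Humio or Nginx tooling. The function names, the `test-host` hostname, and the `lb-test` tag are hypothetical, and the message layout only loosely follows the traditional BSD syslog format.

```python
import socket
from datetime import datetime


def format_syslog(message, facility=1, severity=6,
                  hostname="test-host", tag="lb-test"):
    """Build a BSD-style syslog line; PRI is facility * 8 + severity."""
    pri = facility * 8 + severity  # defaults: user (1), informational (6) -> 14
    now = datetime.now()
    timestamp = f"{now:%b} {now.day:2d} {now:%H:%M:%S}"
    return f"<{pri}>{timestamp} {hostname} {tag}: {message}\n".encode("utf-8")


def send_udp(host, port, payload):
    # UDP is fire-and-forget: a successful return does not prove delivery.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


def send_tcp(host, port, payload, timeout=5.0):
    # Raises OSError if the proxy refuses or does not accept the connection.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


# Example usage against the proxy address from the sample configuration:
# send_udp("10.100.0.26", 514, format_syslog("UDP test via load balancer"))
# send_tcp("10.100.0.26", 514, format_syslog("TCP test via load balancer"))
```

Sending several messages in a row and checking which log shipper received each one gives a rough indication that traffic is being spread across the pool.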

High Availability Using A Queue

An alternative solution is to use a queue. In this high availability model, all of your log producers write events to the target queue, and multiple agents (log shippers) are deployed to read from it. Several queue platforms can be used, including Kafka, Amazon SQS, and Amazon S3. The choice of platform will depend on your organization's existing infrastructure. Whichever platform is chosen, it is important that a mechanism is in place to prevent the same event from being read and forwarded by more than one log shipper.
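
The queue model can be illustrated with a small Python sketch that uses the standard library's in-process `queue.Queue` as a stand-in for a platform such as Kafka or Amazon SQS. It is not a substitute for those platforms. The function `run_shippers` and the shipper names are hypothetical, and "forwarding" is simulated by recording which shipper handled each event.

```python
import queue
import threading


def run_shippers(events, shipper_count=3):
    """Drain a shared queue with several competing shipper threads.

    Returns a dict mapping each shipper name to the events it handled.
    """
    q = queue.Queue()
    for event in events:
        q.put(event)

    shipped = {}
    lock = threading.Lock()

    def shipper(name):
        while True:
            try:
                # The queue hands each event to exactly one caller, which is
                # what prevents two shippers from forwarding the same event.
                event = q.get_nowait()
            except queue.Empty:
                return
            with lock:
                shipped.setdefault(name, []).append(event)  # simulated forward
            q.task_done()

    threads = [
        threading.Thread(target=shipper, args=(f"shipper-{i}",))
        for i in range(shipper_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shipped
```

Hosted queue platforms provide their own mechanisms for the same guarantee, so it is worth consulting the platform's documentation for how it assigns messages to consumers rather than relying on this simplified picture.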

Generally speaking, organizations should not attempt to build ad hoc queuing systems when it is possible to use a platform like Amazon SQS or Kafka. These platforms have existing integrations with Humio that minimize the level of effort required to implement a queuing solution.