LogScale Collector Sizing Guide

Minimum Resource Recommendations

When the LogScale Collector runs on a laptop or desktop to gather system logs, the requirements are modest and the background service should not be noticeable.

An example of such a setup could consist of:

The System and Application channels from the Windows Event Log source
Log files from your VPN
A cmd source which measures the system's resource usage

In a scenario like this we recommend these resources as a minimum:

Resource	Recommendation
Memory	`4 GB`
Disk	`4 GB`

Note

These numbers are conservative to account for peak buffer and queue usage. During normal operation with a working network connection, the actual memory consumption in a scenario like the one above would be below 100 MB.

Scaling

Generally speaking, the concurrency model behind the LogScale Collector automatically takes advantage of the system's CPU resources.

Source Throughput

Each source has different performance characteristics. The throughput numbers below are based on measurements, but they will vary depending on your actual workload.

Source	Throughput	Notes
File	`154 MB/s/vCPU`	Throughput of the file source is bound by disk and/or network I/O. This measurement was done with AWS `io1` disks (64000 IOPS)
Journald	`32 MB/s`
Syslog (TCP)	`100 MB/s/vCPU`	The vCPUs are only utilized when multiple TCP connections are sending data to the LogScale Collector
Syslog (UDP)	`26 MB/s`	The throughput is measured with UDP packets of size 1472 bytes.
Windows Event Logs	`5 MB/s`	Measured average of around 3000 events/s. Currently the WinEventLog source does not scale automatically with the number of vCPUs. To improve throughput, isolate high-load channels to their own source in the configuration.

1 vCPU = 1 ARM physical CPU or 0.5 Intel physical CPU with hyper-threading.

Sink Workers

In some high throughput scenarios the LogScale ingestion endpoint can be a bottleneck, meaning that the measured throughput of a LogScale Collector deployment is lower than expected given the table above.

In those cases it can be beneficial to increase the number of concurrent requests a sink is using to ship logs towards the LogScale ingestion endpoint.

The default number of concurrent requests per sink is 4. It can be increased in the configuration using workers:

yaml

sinks:
  my_sink:
    type: humio
    url: <..>
    token: <..>
    # Increases number of concurrent connections to LogScale to 8
    workers: 8

It should only be necessary to increase the number of workers when the bottleneck is the number of parallel requests. This can happen when an expensive parser is being used, causing the ingest requests to take longer.

The throughput of a sink is constrained by the time per request, according to the following formula: maxBatchSize * workers * 1/timePerRequest.
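As a rough illustration of the formula above, the following Python sketch computes the theoretical ceiling for a sink. The function name and the 0.5 s request time are assumptions made for illustration; the 16 MB batch size and 4 workers are the defaults mentioned in this guide.

```python
def sink_throughput_mb_per_s(max_batch_size_mb: float,
                             workers: int,
                             time_per_request_s: float) -> float:
    """Theoretical upper bound: maxBatchSize * workers * 1/timePerRequest."""
    if time_per_request_s <= 0:
        raise ValueError("time_per_request_s must be positive")
    return max_batch_size_mb * workers / time_per_request_s


# Default batch size (16 MB) and workers (4), with a hypothetical
# 0.5 s response time per ingest request.
default_ceiling = sink_throughput_mb_per_s(16, 4, 0.5)
print(default_ceiling)  # 128.0
```

This is an upper bound only: it assumes every request carries a full batch and that the source can supply data fast enough.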

If the machine running the LogScale Collector is not the bottleneck, and LogScale has the capacity to process more requests in parallel, then the number of workers should be increased.

Note

Each worker keeps an internal buffer, starting at 16 MB per worker, which it uses to serialize requests. Increasing the number of workers therefore puts additional memory pressure on the LogScale Collector. If a larger pool of workers is specified than necessary, the LogScale Collector will use more memory than necessary.

Sink Workers Example

How many workers to use in any situation depends on the response time per request of the LogScale server, which in turn depends on the parser used, whether requests are going to an on-prem or SaaS solution, the server configuration, and so on.

Description	Value
Goal	11 TB/day = 139 MB/s
Measured server response time	600 ms

Using the default and recommended batchSize of 16 MB, the theoretical limit per worker in this example is: 1 / 0.600 s * 16 MB ≈ 26.67 MB/s.

Thus, the number of workers should be: 139 / 26.67 ≈ 5.2, rounded up to 6 workers.

This calculation is based on the assumption that data can be read fast enough from the source.
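The worked example above can be reproduced with a short Python sketch. The function name is hypothetical; the inputs are the goal, batch size, and measured response time from the example. The last line estimates the starting size of the per-worker serialization buffers (16 MB each, per the note in the Sink Workers section), which is a lower bound since the buffers may grow.

```python
import math


def workers_needed(target_mb_per_s: float,
                   batch_size_mb: float,
                   response_time_s: float) -> int:
    """Smallest worker count whose theoretical throughput meets the target."""
    per_worker_mb_per_s = batch_size_mb / response_time_s
    return math.ceil(target_mb_per_s / per_worker_mb_per_s)


workers = workers_needed(target_mb_per_s=139, batch_size_mb=16, response_time_s=0.600)
print(workers)       # 6
print(workers * 16)  # 96 (MB of initial worker buffers)
```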

Memory

The memory requirement grows linearly with the number of sinks in the configuration, on top of a constant baseline requirement of 1 GB.

It should not be necessary to increase the default memory queue size. The purpose of the memory queue is to ensure that data is always readily available to the sink, so that the LogScale Collector can always be actively ingesting. Increasing the queue size does not increase the throughput of the sink. If the throughput of the sink is lower than the rate at which data is being collected, the queue will eventually fill up.

The default queue size per sink is 1 GB and can be increased (or decreased) in the configuration:

yaml

sinks:
  my_sink:
    type: humio
    token: <..>
    url: <..>
    # Increases queue size to 2 GiB
    queue:
      type: memory
      maxLimitInMB: 2048

  another_sink:
    type: humio
    token: <..>
    url: <..>

The configuration above therefore has a total memory requirement of 1 GB (baseline) + 2 GB (my_sink) + 1 GB (another_sink) = 4 GB.
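A minimal Python sketch of this sum, assuming 1 GB corresponds to 1024 MB so that maxLimitInMB: 2048 matches the 2 GB used in the text. The function and constant names are hypothetical and are not part of the LogScale Collector.

```python
BASELINE_MB = 1024       # constant 1 GB baseline from this guide
DEFAULT_QUEUE_MB = 1024  # default memory queue size per sink


def total_memory_mb(sink_queue_limits_mb):
    """Sum the baseline and each sink's queue limit.

    sink_queue_limits_mb holds one entry per sink; None means the sink
    uses the default queue size.
    """
    queues = [DEFAULT_QUEUE_MB if q is None else q for q in sink_queue_limits_mb]
    return BASELINE_MB + sum(queues)


# my_sink sets maxLimitInMB: 2048, another_sink keeps the default.
total = total_memory_mb([2048, None])
print(total / 1024)  # 4.0
```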

Back-filling

A running LogScale Collector that can deliver logs continuously to LogScale would not normally use the resources listed above. However, some situations can cause log data to pile up, for instance if a machine is without an internet connection for a while but still generates logs.

In such a scenario the LogScale Collector will back-fill the log data when an active internet connection is re-established. The internal memory buffers will fill up for efficient log shipping, and the utilization of the queue could reach 100% (by default, this limit is 1 GB per sink).

In addition, if the LogScale Collector is unable to deliver the logs to the server fast enough or not at all, a large amount of memory could potentially be used.

For instance, if the LogScale Collector is tasked with back-filling 1000 large files, data will potentially be read into the system faster than it can be delivered to the LogScale server. In such an example the memory usage would rise to: 1 GB (baseline) + 1 GB (sink) + 1000 * 16 MB (internal buffer per file, one batch size) = 18 GB.
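The same worst-case estimate, written as a Python sketch. The function name and parameters are hypothetical. To match the rounded figure in the text, it treats 1 GB as 1000 MB, so that 1000 * 16 MB comes out as 16 GB.

```python
def backfill_memory_mb(sinks: int, files: int,
                       baseline_mb: int = 1000,
                       queue_mb: int = 1000,
                       batch_mb: int = 16) -> int:
    """Worst case: baseline + one full queue per sink + one batch per open file."""
    return baseline_mb + sinks * queue_mb + files * batch_mb


worst_case_mb = backfill_memory_mb(sinks=1, files=1000)
print(worst_case_mb / 1000)  # 18.0
```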

Disk

Disk size is only relevant if the disk queue is used. Whether the disk queue makes sense depends on the deployment setup.

For instance, the disk queue is unnecessary if the LogScale Collector is able to read back the data from a source in case of an interruption. This is the case for these sources: Windows Event Logs, journald and file sources. All of these use a bookmarking system to keep track of how far data has been read and processed.

So, essentially, the disk queue only makes sense for sources where such a bookkeeping system is impossible, which at the moment is only the syslog source.

When using the disk queue, it is usually sufficient to keep 10 minutes' worth of data. So, if data flowing through a LogScale Collector deployment is averaging 40 MB/s, you should provision at least 24 GB of disk space (40 MB/s * 60 s/min * 10 min = 24,000 MB).
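The rule of thumb above is simple enough to capture in a few lines of Python. The function name is hypothetical, and 1 GB is treated as 1000 MB to match the 24 GB figure in the text.

```python
def disk_queue_size_gb(avg_mb_per_s: float, retention_minutes: float = 10) -> float:
    """Disk space needed to hold retention_minutes of data at the average rate."""
    return avg_mb_per_s * 60 * retention_minutes / 1000


print(disk_queue_size_gb(40))  # 24.0
```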

Example Deployments

Make sure your LogScale deployment is provisioned accordingly and meets the requirements for the ingestion volume. See Installing LogScale.

Large Syslog (TCP) Deployment - 10 TB/Day

10 TB/Day = 121.4 MB/s
(121.4 MB/s) / (100 MB/s/vCPU) = 1.21 vCPUs, rounded up to 2 vCPUs
Recommended: m6i.xlarge with 4 vCPUs, to account for spikes in traffic and possible backpressure from the network

Table: Large Syslog Source

Software	Instances	EC2 Instance Type / vCPU	Memory	Storage
LogScale Collector	1	m6i.xlarge / 4	16 GB	gp2

Medium Windows Event Logs Deployment - 1 TB/Day

By isolating the ForwardedEvents channel to its own source in the configuration, it is possible to get a throughput of roughly 10 MB/s on an instance.
1 TB/Day = 12.14 MB/s
(12.14 MB/s) / (10 MB/s/instance) = 1.2 instances, rounded up to 2

Table: Medium Windows Event Source

Software	Instances	EC2 Instance Type / vCPU	Memory	Storage
LogScale Collector	2	m6i.large / 2	16 GB	gp2

Large File Source Deployment - 100 TB/Day

100 TB/Day = 1214 MB/s
(1214 MB/s) / (154 MB/s/vCPU) = 7.9 vCPUs, rounded up to 8.
Since 1214 MB/s exceeds the 1000 MB/s maximum throughput of an AWS io1 volume, we use two instances.

Table: Large File Source

Software	Instances	EC2 Instance Type / vCPU	Memory	Storage
LogScale Collector	2	m6i.xlarge / 4	16 GB	io2
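The arithmetic behind the three example deployments can be sketched in Python as follows. The helper names are hypothetical. The conversion assumes binary units (1 TB = 1024 * 1024 MB), which is what reproduces the 121.4 MB/s figure for 10 TB/day used in this guide.

```python
import math

SECONDS_PER_DAY = 86_400


def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
    """Convert a daily ingest volume to an average rate (binary units assumed)."""
    return tb_per_day * 1024 * 1024 / SECONDS_PER_DAY


def units_needed(rate_mb_per_s: float, mb_per_s_per_unit: float) -> int:
    """Number of vCPUs (or instances) needed, rounded up."""
    return math.ceil(rate_mb_per_s / mb_per_s_per_unit)


syslog = tb_per_day_to_mb_per_s(10)    # Syslog (TCP): 100 MB/s per vCPU
windows = tb_per_day_to_mb_per_s(1)    # Windows Event Logs: 10 MB/s per instance
files = tb_per_day_to_mb_per_s(100)    # File: 154 MB/s per vCPU

print(round(syslog, 1), units_needed(syslog, 100))   # 121.4 2
print(round(windows, 2), units_needed(windows, 10))  # 12.14 2
print(round(files), units_needed(files, 154))        # 1214 8
```

These are averages over a full day. The instance types in the tables above leave headroom beyond the computed minimum to absorb traffic spikes.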
