Falcon LogScale Collector Sizing Guide
The numbers in this guide are based on measurements and experience from running the LogScale Collector in production. However, the actual size needed for your LogScale Collector instances depends on your workloads, and we recommend testing to determine the right numbers for your environment.
See the following for more information:
Minimum Resource Recommendations
When the LogScale Collector is used on a laptop or desktop for gathering system logs, the requirements are modest and the service running in the background should not be noticeable.
An example of such a setup (sketched in the configuration below) could consist of:
- The System and Application channels from the Windows Event Log source
- Log files from your VPN
- A cmd source measuring the system's resource usage
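A minimal configuration along these lines might look as follows. The sink name, file path, command, and the source option names (channels, include, cmd, interval) are illustrative assumptions; check the LogScale Collector configuration reference for the exact options supported by your version.
sinks:
  my_sink:
    type: humio
    url: <..>
    token: <..>
sources:
  # System and Application channels from the Windows Event Log
  windows_events:
    type: wineventlog
    channels:
      - name: System
      - name: Application
    sink: my_sink
  # Log files written by the VPN client (path is an example)
  vpn_logs:
    type: file
    include: C:\ProgramData\ExampleVPN\logs\*.log
    sink: my_sink
  # Periodically run a command that reports system resource usage (illustrative command)
  resource_usage:
    type: cmd
    cmd: tasklist
    interval: 60
    sink: my_sink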
In a scenario like this we recommend these resources as a minimum:
Resource | Recommendation |
---|---|
Memory | 4 GB |
Disk | 4 GB |
Note
These numbers are conservative to account for peak buffer/queue usage. During normal operation with a working network connection, the actual memory consumption in a scenario like the one above would be below 100 MB.
Scaling
Generally speaking, the concurrency model behind the LogScale Collector automatically takes advantage of the system's CPU resources.
Source Throughput
Each source has different performance characteristics. The throughput numbers below are based on measurements, but actual throughput will vary depending on your workload.
Source | Throughput | Notes |
---|---|---|
File | 154 MB/s/vCPU | Throughput of the file source is bound by disk and/or network I/O. This measurement was done with AWS io1 disks (64000 IOPS). |
Journald | 32 MB/s | |
Syslog (TCP) | 100 MB/s/vCPU | The vCPUs are only utilized when multiple TCP connections are sending data to the LogScale Collector. |
Syslog (UDP) | 26 MB/s | The throughput is with UDP packets of size 1472 bytes. |
Windows Event Logs | 5 MB/s | Measured average of around 3000 events/s. Currently the WinEventLog source does not scale automatically with the number of vCPUs. To improve throughput, isolate high-load channels to their own source in the configuration (see the sketch below the table). |
1 vCPU = 1 ARM physical CPU or 0.5 Intel physical CPU with hyperthreading.
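The channel-isolation advice for the WinEventLog source can be illustrated with a hedged configuration sketch: the high-volume channel gets its own source so it is read independently of the remaining channels. The source names and the channels layout are assumptions to verify against the configuration reference.
sources:
  # High-volume channel isolated in its own source
  win_security:
    type: wineventlog
    channels:
      - name: Security
    sink: my_sink
  # Remaining channels grouped in a second source
  win_other:
    type: wineventlog
    channels:
      - name: System
      - name: Application
    sink: my_sink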
Sink Workers
In some high throughput scenarios the LogScale ingestion endpoint can be a bottleneck, meaning that the measured throughput of a LogScale Collector deployment is lower than expected given the table above.
In those cases it can be beneficial to increase the number of concurrent requests a sink is using to ship logs towards the LogScale ingestion endpoint.
The default number of concurrent requests per sink is 4 and can be increased in the configuration using the workers setting:
sinks:
  my_sink:
    type: humio
    url: <..>
    token: <..>
    # Increases number of concurrent connections to LogScale to 8
    workers: 8
Sink Workers Example
How many workers to use depends on the response time per request of the LogScale server, which in turn depends on the parser used, whether requests are going to a self-hosted or SaaS deployment, the server configuration, and so on.
Description | Value |
---|---|
Goal | 12 TB/day = 139 MB/s |
Measured server response time | 600 ms |
Using the default and recommended batchSize of 16 MB, the theoretical limit per worker in this example is: 1/0.600 s * 16 MB = 26.66 MB/s.
Thus, the number of workers should be: 139 / 26.66 = 5.2, rounded up to 6 workers.
This calculation assumes that data can be read fast enough from the source.
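Applied to the sink configuration shown earlier, the result of this example calculation is simply a higher workers value (the sink name and endpoint are placeholders):
sinks:
  my_sink:
    type: humio
    url: <..>
    token: <..>
    # 139 MB/s target / ~26.66 MB/s per worker => 6 workers
    workers: 6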
Memory
The memory requirement scales linearly with the number of sinks in the configuration, on top of a constant baseline requirement of 1 GB.
The default queue size per sink is 1 GB and can be increased (or decreased) in the configuration:
sinks:
  my_sink:
    type: humio
    token: <..>
    url: <..>
    # Increases queue size to 2 GiB
    queue:
      type: memory
      maxLimitInMB: 2048
  another_sink:
    type: humio
    token: <..>
    url: <..>
The configuration above therefore has a total memory requirement of 1 GB (baseline) + 2 GB (my_sink) + 1 GB (another_sink) = 4 GB.
Memory Usage Log Messages
The LogScale Collector will output the log messages below in case of high memory usage.
For sinks with the default queue configuration (fullAction: pause):
Table: fullAction: pause
Queue Utilization | Log Level | Log Message |
---|---|---|
100% | Warning | Memory queue is full. Sources that are sending to this sink are paused until space is available again. |
80% | Warning | Memory queue is 80% full. If the queue becomes full, sources will be paused until there is space. |
50% | Warning | Memory queue is 50% full. If the queue becomes full, sources will be paused until there is space. |
For sinks with the queue configured with fullAction: deleteOldest (a configuration sketch follows the table):
Table: fullAction: deleteOldest
Queue Utilization | Log Level | Log Message |
---|---|---|
100% | Info | mem-queue is full, dropping oldest batch as configured |
80% | Warning | Memory queue is 80% full. If the queue becomes full, the oldest data will be deleted to make space for new data. |
50% | Warning | Memory queue is 50% full. If the queue becomes full, the oldest data will be deleted to make space for new data. |
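As a sketch of how such a sink could be configured, based on the queue options shown earlier, fullAction is set on the queue; treat its exact placement as an assumption to verify against the configuration reference.
sinks:
  my_sink:
    type: humio
    url: <..>
    token: <..>
    queue:
      type: memory
      maxLimitInMB: 1024
      # Drop the oldest batch instead of pausing sources when the queue is full
      fullAction: deleteOldest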
Back-filling
A running LogScale Collector which is able to deliver the logs continuously to LogScale would not normally use the resources listed above. However, some situations can cause log data to pile up, for instance if a machine is without an internet connection for a while but still generates logs.
In such a scenario the LogScale Collector will back-fill the log data when an active internet connection is re-established. The internal memory buffers will fill up for efficient log shipping, and the utilization of the queue could reach 100% (this limit is 1 GB per sink by default).
In addition, if the LogScale Collector is unable to deliver the logs to the server fast enough or not at all, a large amount of memory could potentially be used.
For instance, if the LogScale Collector is tasked with back-filling 1000 large files, data will potentially be read into the system faster than it can be delivered to the LogScale server. In such an example the memory usage would rise to: 1 GB (baseline) + 1 GB (sink) + 1000 * 16 MB (internal buffer per file, one batch size) = 18 GB.
Disk
Disk size is only relevant if the disk queue is used. Whether the disk queue makes sense depends on the deployment setup.
For instance, the disk queue is unnecessary if the LogScale Collector is able to read back the data from a source in case of an interruption. This is the case for the Windows Event Logs, journald, and file sources, which all use a bookmarking system to keep track of how far data has been read and processed. Essentially, the disk queue only makes sense for sources where such a bookkeeping system is impossible, which at the moment is only the syslog source.
When using the disk queue, it is usually sufficient to keep 10 minutes' worth of data. For example, if data flowing through a LogScale Collector deployment averages 40 MB/s, you should provision at least 24 GB of disk space (40 MB/s * 60 seconds * 10 minutes), as in the sketch below.
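Assuming the disk queue accepts the same queue options as the memory queue with type: disk, a sink sized for this example (40 MB/s, 10 minutes) could be sketched roughly as follows; the type: disk value is an assumption to verify against the configuration reference.
sinks:
  my_sink:
    type: humio
    url: <..>
    token: <..>
    queue:
      # Assumed disk queue type; 24000 MB covers 10 minutes at 40 MB/s
      type: disk
      maxLimitInMB: 24000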
Example Deployments
Make sure your LogScale deployment is provisioned accordingly and meets the requirements for the ingestion amount. See Recommended Installation Architectures.
Large Syslog (TCP) Deployment - 10 TB/Day
10 TB/Day = 121.4 MB/s
(121.4 MB/s) / (100 MB/s/vCPU) = 1.21 vCPUs, rounded up to 2 vCPUs
Recommended: m6i.xlarge with 4 vCPUs, to account for spikes in traffic and possible backpressure from the network. A source configuration sketch follows the table.
Table: Large Syslog Source
Software | Instances | EC2 Instance Type / vCPU | Memory | Storage |
---|---|---|---|---|
LogScale Collector | 1 | m6i.xlarge / 4 | 16 GB | gp2 |
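A hedged sketch of the corresponding syslog source for this deployment; the mode and port option names are assumptions based on typical LogScale Collector syslog configuration.
sources:
  syslog_tcp:
    type: syslog
    # Listen for syslog over TCP (port is an example)
    mode: tcp
    port: 514
    sink: my_sink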
Medium Windows Event Logs Deployment - 1 TB/Day
By isolating the ForwardedEvents channel to its own source in the configuration, it is possible to get a throughput of roughly 10 MB/s on an instance.
1 TB/Day = 12.14 MB/s
(12.14 MB/s) / (10 MB/s/instance) = 1.2 instances, rounded up to 2 instances
Table: Medium Windows Event Source
Software | Instances | EC2 Instance Type / vCPU | Memory | Storage |
---|---|---|---|---|
LogScale Collector | 2 | m6i.large / 2 | 16 GB | gp2 |
Large File Source Deployment - 100 TB/Day
100 TB/Day = 1214 MB/s
(1214 MB/s) / (154 MB/s/vCPU) = 7.9 vCPUs, rounded up to 8.
Since 1214 MB/s is more than the maximum throughput of AWS io1 volumes of 1000 MB/s, we go with two instances. A file source sketch follows the table.
Table: Large File Source
Software | Instances | EC2 Instance Type / vCPU | Memory | Storage |
---|---|---|---|---|
LogScale Collector | 2 | m6i.xlarge / 4 vCPU | 16 GB | io2 |
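For completeness, a hedged sketch of a file source feeding such a deployment; the include option and the path are illustrative assumptions.
sources:
  app_logs:
    type: file
    # Glob covering the high-volume log files (path is an example)
    include: /var/log/app/*.log
    sink: my_sink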