Falcon Log Collector Sizing Guide
The numbers in this guide are based on measurements and experience from running the Falcon Log Collector in production. However, the actual size needed for your Falcon Log Collector instances depends on the workloads, and we recommend testing to determine those numbers.
Minimum Resource Recommendations
In cases where the Falcon Log Collector is used on a laptop or desktop to gather system logs, the requirements are quite modest and the service running in the background should not be noticeable.
An example of such a setup (a minimal configuration sketch is shown below) could consist of:

- The `System` and `Application` channels from the Windows Event Log source
- Log files from your VPN
- A `cmd` source which measures the system's resource usage
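The following is a minimal configuration sketch for such a setup. The source types and fields (`wineventlog`, `file`, `cmd`, `channels`, `include`, `interval`) follow the collector's standard source types, but treat the exact keys as assumptions and verify them against the configuration reference for your collector version; the VPN log path, command, and sink credentials are placeholders.

```yaml
sources:
  windows_events:
    type: wineventlog
    channels:
      - name: System
      - name: Application
    sink: my_sink
  vpn_logs:
    type: file
    # Placeholder path; point this at your VPN client's log files
    include: C:\ProgramData\MyVPN\logs\*.log
    sink: my_sink
  resource_usage:
    type: cmd
    # Placeholder; replace with the command that reports resource usage
    cmd: <..>
    interval: 60
    sink: my_sink
sinks:
  my_sink:
    type: humio
    url: <..>
    token: <..>
```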
In a scenario like this we recommend these resources as a minimum:
Resource | Recommendation |
---|---|
Memory | 4 GB |
Disk | 4 GB |
Note
These numbers are conservative to account for peak buffer/queue usage. During normal operation with a working network connection, the actual memory consumption in a scenario like the one above would be below 100 MB.
Scaling
Generally speaking, the concurrency model behind the LogScale Collector automatically takes advantage of the system's CPU resources.
Source Throughput
Each source has different performance characteristics. The throughput numbers are based on measurements, but they will vary depending on your actual workload. The Version column states the version of the LogScale Collector used for the throughput test.
Source | Throughput | Version | Notes |
---|---|---|---|
File | 154 MB/s/vCPU | 1.3.3 | Throughput of the file source is bound by disk and/or network I/O. This measurement was done with AWS io1 disks (64000 IOPS). |
Journald | 32 MB/s | 1.3.3 | |
Syslog (TCP) | 100 MB/s/vCPU | 1.3.3 | The vCPUs are only utilized when multiple TCP connections are sending data to the Falcon Log Collector. |
Syslog (UDP) | 26 MB/s | 1.4.2 | The throughput is with UDP packets of size 1472 bytes. |
Windows Event Logs | 15 MB/s | 1.4.1 | Measured average of around 3000 events/s. Currently the WinEventLog source does not scale automatically with the number of vCPUs. Note: To improve throughput, isolate high-load channels to their own source in the configuration. |
Unified Log Data | 30 MB/s | 1.4.2 | Measured by having the collector catch up after a 2-hour pause. Approximately 1 GB of JSON data. |
1 vCPU = 1 ARM physical CPU or 0.5 Intel physical CPU with hyper-threading.
Sink Workers
In some high throughput scenarios the LogScale ingestion endpoint can be a bottleneck, meaning that the measured throughput of a Falcon Log Collector deployment is lower than expected given the table above.
In those cases it can be beneficial to increase the number of concurrent requests a sink is using to ship logs towards the LogScale ingestion endpoint.
The default number of concurrent requests per sink is 4 and can be increased in the configuration using `workers`:
```yaml
sinks:
  my_sink:
    type: humio
    url: <..>
    token: <..>
    # Increases the number of concurrent connections to LogScale to 8
    workers: 8
```
It should only be necessary to increase the number of workers when the bottleneck is the number of parallel requests. This can happen when an expensive parser is being used, causing the ingest requests to take longer.
The throughput of a sink is constrained by the time per request according to the following function: `maxBatchSize * workers * (1 / timePerRequest)`.
If the machine running the Falcon Log Collector is not the bottleneck, and LogScale has the capacity to process more requests in parallel, then the number of workers should be increased.
Note
Each worker keeps an internal buffer, starting at 16 MB per worker, which it uses to serialize requests. Therefore, increasing the number of workers also puts additional memory pressure on the Falcon Log Collector. If a larger pool of workers than necessary is specified, the Falcon Log Collector will also use more memory than necessary.
Sink Workers Example
How many workers to use in any given situation depends on the response time per request of the LogScale server, which in turn depends on the parser used, whether requests are going to an on-prem or SaaS deployment, the server configuration, and so on.
Description | Value |
---|---|
Goal | 11 TB/day = 139 MB/s |
Measured server response time | 600 ms |
Using the default and recommended `batchSize` of 16 MB, the theoretical limit per worker in this example is: 1/0.600s * 16 MB = 26.66 MB/s.

Thus, the number of workers should be: 139 / 26.66 = 5.2, rounded up to 6 workers.
This calculation is based on the assumption that data can be read fast enough from the source.
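Applied to the sink configuration shown earlier, the worked example translates into a sketch like this (URL and token are placeholders):

```yaml
sinks:
  my_sink:
    type: humio
    url: <..>
    token: <..>
    # 139 MB/s goal / ~26.66 MB/s per worker = 5.2, rounded up to 6
    workers: 6
```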
Memory
The memory requirement is linearly proportional to the number of sinks in the configuration plus a constant baseline requirement of 1 GB.
It should not be necessary to increase the default memory queue size. The purpose of the memory queue is to ensure that data is always readily available to the sink, such that the Falcon Log Collector can always be actively ingesting. Increasing the queue size is not going to increase the throughput of the sink. If the throughput of the sink is lower than that of the data that is being collected, the queue will eventually fill up.
The default queue size per sink is 1 GB and can be increased (or decreased) in the configuration:
```yaml
sinks:
  my_sink:
    type: humio
    token: <..>
    url: <..>
    # Increases queue size to 2 GiB
    queue:
      type: memory
      maxLimitInMB: 2048
  another_sink:
    type: humio
    token: <..>
    url: <..>
```
The configuration above therefore has a total memory requirement of 1 GB (baseline) + 2 GB (my_sink) + 1 GB (another_sink) = 4 GB.
Back-filling
A running Falcon Log Collector that is able to deliver logs continuously to LogScale would not normally use the resources listed above. However, some situations can cause log data to pile up, for instance if a machine is without an internet connection for a while but still generates logs.
In such a scenario the Falcon Log Collector will back-fill the log data when an active internet connection is re-established. The internal memory buffers will fill up for efficient log shipping, and the utilization of the queue could reach 100% (by default this limit is 1 GB per sink).
Disk
Disk size is only relevant if the disk queue is used. Whether the disk queue makes sense depends on the deployment setup.
For instance, the disk queue is unnecessary if the Falcon Log Collector is able to read back the data from a source after an interruption. This is the case for the Windows Event Log, journald, and file sources, all of which use a bookmarking system to keep track of how far data has been read and processed.
Essentially, the disk queue only makes sense for sources where such a bookkeeping system is impossible, which at the moment is only the syslog source.
When using the disk queue, 10 minutes' worth of data is usually sufficient. For example, if data flowing through a Falcon Log Collector deployment averages 40 MB/s, you should provision at least 24 GB of disk space (40 MB/s * 60 seconds * 10 minutes). The value should be 2x the maximum downtime allowed for any connection in the path to Falcon LogScale, including Falcon LogScale itself.
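As a sketch only: sizing the disk queue for the 24 GB example above could look like the following, assuming the disk queue is declared analogously to the memory queue shown earlier (`type: disk` with a `maxLimitInMB` option). Verify the exact queue options against the configuration reference for your collector version.

```yaml
sinks:
  my_sink:
    type: humio
    token: <..>
    url: <..>
    queue:
      # Assumed to mirror the memory queue options shown earlier;
      # ~10 minutes of data at 40 MB/s = 24 GB = 24576 MB
      type: disk
      maxLimitInMB: 24576
```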
Example Deployments
Make sure your LogScale deployment is provisioned accordingly and meets the requirements for the ingestion amount. See Installing LogScale.
Large Syslog (TCP) Deployment - 10 TB/Day
10 TB/Day = 121.4 MB/s
(121.4 MB/s) / (100 MB/s/vCPU) = 1.21 vCPUs, rounded up to 2 vCPUs
Recommended: `m6i.xlarge` with 4 vCPUs to account for spikes in traffic and possible backpressure from the network
Table: Large Syslog Source
Software | Instances | EC2 Instance Type / vCPU | Memory | Storage |
---|---|---|---|---|
Falcon Log Collector | 1 | m6i.xlarge / 4 | 16 GB | gp2 |
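A corresponding source definition for this deployment could be sketched as follows. The syslog source options (`mode`, `port`) are assumptions based on the collector's standard syslog source and should be checked against the configuration reference; remember that the per-vCPU throughput only applies when multiple TCP connections are sending data.

```yaml
sources:
  syslog_tcp:
    type: syslog
    # Assumed option names for a TCP listener; verify against your version
    mode: tcp
    port: 514
    sink: my_sink
sinks:
  my_sink:
    type: humio
    url: <..>
    token: <..>
```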
Medium Windows Event Logs Deployment - 1 TB/Day
By isolating the ForwardedEvents channel to its own source in the configuration, it is possible to get a throughput of roughly 15 MB/s on an instance.
1 TB/Day = 12.14 MB/s
(12.14 MB/s) / (15 MB/s/instance) = 0.8 instances, rounded up to 1 instance.
Table: Medium Windows Event Source
Software | Instances | EC2 Instance Type / vCPU | Memory | Storage |
---|---|---|---|---|
Falcon Log Collector | 1 | m6i.large / 2 | 16 GB | gp2 |
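Isolating the ForwardedEvents channel as described above could be sketched like this; the channel list syntax follows the collector's standard Windows Event Log source, so confirm the exact keys for your collector version.

```yaml
sources:
  forwarded_events:
    type: wineventlog
    # High-load channel isolated to its own source
    channels:
      - name: ForwardedEvents
    sink: my_sink
  other_windows_events:
    type: wineventlog
    channels:
      - name: System
      - name: Application
    sink: my_sink
sinks:
  my_sink:
    type: humio
    url: <..>
    token: <..>
```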
Large File Source Deployment - 100 TB/Day
100 TB/Day = 1214 MB/s
(1214 MB/s) / (154 MB/s/vCPU) = 7.9 vCPUs, rounded up to 8.
Since 1214 MB/s is more than the max throughput of AWS io1 volumes of 1000 MB/s, we go with two instances.
Table: Large File Source
Software | Instances | EC2 Instance Type / vCPU | Memory | Storage |
---|---|---|---|---|
Falcon Log Collector | 2 | m6i.xlarge / 4 | 16 GB | io1 |
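A file source for a deployment like this could be sketched as follows; the include pattern is a placeholder, and the `include` option is assumed from the collector's standard file source, so verify it against the configuration reference for your version.

```yaml
sources:
  app_logs:
    type: file
    # Placeholder glob; point this at the directories producing the log volume
    include: /var/log/app/*.log
    sink: my_sink
sinks:
  my_sink:
    type: humio
    url: <..>
    token: <..>
```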