Best Practice: Log Collector Resiliency and Monitoring

When collecting critical data with the log collector, we recommend:

  1. Tune the collector configuration and test it under load in a non-production environment before deploying to production; see the Falcon LogScale Collector Sizing Guide

  2. Load test to determine the correct sizing and configuration:

    1. Simulate log volumes that exceed your highest traffic events/workloads

    2. Watch for log collector errors or delays in ingestion and tune parameters as needed

  3. Monitoring:

    1. Production alerts for missing data (for example, any collector that has not reported logs for x minutes), plus a process to investigate and remediate them

    2. Production monitoring for critical servers (errors, CPU, memory, and other resources), especially production servers hosting the log collector

    3. Monitoring the collector itself - collect the collector log file and set up alerts for errors/issues

    4. You could place two collectors on the same server, but our Collector Engineering team does not recommend this because it doubles the resources needed for collection. For alternatives, review this doc: Highly Available Configurations.
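
As an illustration of the missing-data alert described under Monitoring, here is a minimal Python sketch of the "collector not reporting for x minutes" check. It is not a LogScale feature or API: the function name find_silent_collectors, the last_seen mapping (collector name to the timestamp of its most recent log event), the host names, and the 15-minute threshold standing in for "x" are all hypothetical. How you obtain the last-seen timestamps depends on your environment and is not shown.

```python
from datetime import datetime, timedelta, timezone


def find_silent_collectors(last_seen, now, max_silence):
    """Return the names of collectors whose most recent event is older than max_silence.

    last_seen   -- dict mapping collector name -> datetime of its latest log event
                   (hypothetical input; populate it from your own monitoring data)
    now         -- current time as a timezone-aware datetime
    max_silence -- timedelta; the "x minutes" threshold from the guidance above
    """
    return sorted(
        name for name, ts in last_seen.items()
        if now - ts > max_silence
    )


if __name__ == "__main__":
    # Example data only; these hosts and timestamps are made up.
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    last_seen = {
        "web-01": now - timedelta(minutes=2),
        "web-02": now - timedelta(minutes=45),
        "db-01": now - timedelta(minutes=16),
    }
    threshold = timedelta(minutes=15)  # assumed value for "x"
    for name in find_silent_collectors(last_seen, now, threshold):
        print(f"ALERT: no logs received from {name} in the last {threshold}")
```

A check like this would typically run on a schedule and feed whatever alerting and remediation process you already use; pick the threshold based on how often each source normally logs, so that quiet but healthy servers do not trigger false alerts.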