Troubleshooting FDR Ingest
Duplicate Messages in the SQS Queue
If the same message appears in the SQS queue more than once, make sure your consumer script reads, processes, and explicitly deletes the SQS message within the visibility timeout period (typically two hours). If the consumer does not download the SQS message or does not finish processing it within the timeout period, the message returns to the queue to be consumed again.
If your consumer script is based on the sample that CrowdStrike provides, data_replicator_sample_consumer.py, be sure the msg.delete() call is not commented out.
Also be sure that, in the data_replicator_config.py configuration file for the sample script, the VISIBILITY_TIMEOUT value gives your consumer enough time to process any downloaded files and delete the SQS message.
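The read, process, delete order described above can be sketched as follows. This is a minimal illustration, not the CrowdStrike sample script: handle_message and process_body are hypothetical names, and msg is assumed to be any object that exposes a body attribute and a delete() method, as a boto3 SQS Message resource does.

```python
import logging


def handle_message(msg, process_body):
    """Process one SQS message and delete it only if processing succeeds.

    `msg` is assumed to expose `.body` and `.delete()`.
    `process_body` is a hypothetical callable that downloads and
    processes the files referenced by the message body.
    """
    try:
        process_body(msg.body)
    except Exception:
        # Do not delete: once the visibility timeout expires, the message
        # returns to the queue and can be consumed again.
        logging.exception("Processing failed; leaving message on the queue")
        return False
    # Explicit delete; without it the message reappears as a duplicate.
    msg.delete()
    return True
```

Deleting only after successful processing means a failed message is retried rather than lost, at the cost of a possible duplicate if processing outlasts the visibility timeout.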
Duplicate messages might start to appear as the result of an increase in the volume of events. The extra events produce more files per SQS message, which in turn increases the time needed to process the data in an SQS message.
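As a rough way to reason about this, the sketch below estimates whether a message can be processed before the visibility timeout expires. It assumes files are processed one after another at a roughly constant rate. The function name, the sample numbers, and the safety_factor headroom are all illustrative assumptions, not values from the sample script.

```python
def fits_in_visibility_timeout(files_per_message, seconds_per_file,
                               visibility_timeout_seconds, safety_factor=0.8):
    """Return True if the estimated processing time leaves some headroom.

    Assumes sequential processing at a constant per-file cost; the
    safety_factor (hypothetical) reserves 20% of the timeout by default.
    """
    estimated_seconds = files_per_message * seconds_per_file
    return estimated_seconds <= visibility_timeout_seconds * safety_factor


# Illustrative numbers only: a two-hour timeout is 7200 seconds.
print(fits_in_visibility_timeout(40, 30, 7200))
print(fits_in_visibility_timeout(400, 30, 7200))
```

If the estimate no longer fits after event volume grows, either raise VISIBILITY_TIMEOUT or speed up processing.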
FDR Ingest Lag
If you notice a large lag (more than 2 hours) between FDR event creation time and ingest time in LogScale, check the fileDownloadParallelism setting and adjust it if needed.
The query to check your fileDownloadParallelism setting is:
query {
  repository(name: "reponame") {
    fdrFeedControl(id: "fdrFeedId") {
      id
      maxNodes
      fileDownloadParallelism
    }
  }
}
If this returns null, fileDownloadParallelism is set to 1, which is the default.
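The query above could be sent from a script along the following lines. This is a sketch under several assumptions: that the GraphQL endpoint is at /graphql under your LogScale base URL, that it accepts a bearer API token, and that the response follows the standard GraphQL shape with a top-level data key. The helper names are hypothetical, and null is interpreted here as a null fileDownloadParallelism field.

```python
import json
import urllib.request

FDR_FEED_CONTROL_QUERY = """
query {
  repository(name: "reponame") {
    fdrFeedControl(id: "fdrFeedId") {
      id
      maxNodes
      fileDownloadParallelism
    }
  }
}
"""


def build_graphql_request(base_url, api_token, query):
    """Build (but do not send) a POST request carrying the GraphQL query."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/graphql",  # assumed endpoint path
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_token,  # assumed auth scheme
        },
        method="POST",
    )


def effective_parallelism(response_json):
    """Treat a null fileDownloadParallelism as the default value of 1."""
    control = response_json["data"]["repository"]["fdrFeedControl"]
    value = control["fileDownloadParallelism"]
    return 1 if value is None else value
```

To run it, pass the request to urllib.request.urlopen, decode the body with json.load, and hand the result to effective_parallelism.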
To set fileDownloadParallelism, run the following GraphQL mutation:
mutation {
  updateFdrFeedControl(input: {
    repositoryName: "reponame"
    id: "fdrFeedId"
    fileDownloadParallelism: { value: 8 }
  }) {
    id
    fileDownloadParallelism
    maxNodes
  }
}
With fileDownloadParallelism set to 8, as shown above, LogScale uses at most 8 threads to download files, and it will likely use all 8 throughout working hours, when messages tend to contain many files. If a value of 8 affects other ingest, queries, and so on, try setting it to 4 and monitor the number of messages on the queue.
The maxNodes setting affects how many SQS messages are processed in parallel, and each message can contain multiple files. For example, if maxNodes is 2 and fileDownloadParallelism is 2, then the total number of files downloaded in parallel across the entire cluster is 4 (2 * 2). The most suitable values therefore depend on the structure of your SQS messages: the number of files per SQS message is more or less what determines what fileDownloadParallelism should be.
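The arithmetic above can be written out as a small helper for comparing candidate settings. The function names are hypothetical. The threads_used_for_message helper additionally assumes that a message with fewer files than fileDownloadParallelism leaves the remaining threads idle, which fits the "at most" wording but is not stated explicitly.

```python
def total_parallel_downloads(max_nodes, file_download_parallelism):
    """Upper bound on files downloaded in parallel across the cluster."""
    return max_nodes * file_download_parallelism


def threads_used_for_message(files_in_message, file_download_parallelism):
    """Download threads one message can keep busy (assumed behaviour)."""
    return min(files_in_message, file_download_parallelism)


# The example from the text: maxNodes = 2, fileDownloadParallelism = 2.
print(total_parallel_downloads(2, 2))  # 4
# A message with 3 files cannot keep 8 threads busy.
print(threads_used_for_message(3, 8))  # 3
```

Under that assumption, raising fileDownloadParallelism beyond the typical number of files per message brings little benefit, which is why the message structure drives the choice.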