Troubleshooting FDR Ingest

Duplicate Messages in the SQS Queue

If the same message appears in the SQS queue more than once, make sure your consumer script reads, processes, and explicitly deletes each SQS message within the visibility timeout period (typically two hours). If a message is not downloaded, processed, and deleted within that period, it becomes visible in the queue again and is consumed a second time.
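The receive, process, delete cycle can be sketched as follows. This is a minimal illustration, not CrowdStrike's sample consumer: it assumes `client` is a boto3 SQS client (`boto3.client("sqs")`) or any object with the same `receive_message` and `delete_message` methods, and `process_body` is a hypothetical handler you supply. A message is deleted only after it has been processed successfully; anything else is left to reappear after the timeout.

```python
from typing import Any, Callable


def consume_once(
    client: Any,
    queue_url: str,
    process_body: Callable[[str], None],
    visibility_timeout: int = 7200,  # seconds; two hours, as in the text
) -> int:
    """Receive up to 10 messages, process each, and delete only on success.

    Returns the number of messages that were deleted.
    """
    response = client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling
        VisibilityTimeout=visibility_timeout,
    )
    deleted = 0
    for msg in response.get("Messages", []):
        try:
            process_body(msg["Body"])  # hypothetical: download and handle files
        except Exception:
            # Not deleted: the message becomes visible again after the
            # visibility timeout and is consumed a second time.
            continue
        # Explicit delete; without it every message eventually returns.
        client.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        deleted += 1
    return deleted
```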

If your consumer script is based on the sample that CrowdStrike provides, data_replicator_sample_consumer.py, make sure the msg.delete() call is not commented out. Also make sure that the VISIBILITY_TIMEOUT value in the sample's configuration file, data_replicator_config.py, gives your consumer enough time to process all downloaded files and delete the SQS message.

Duplicate messages might start to appear as a result of an increase in event volume. The extra events produce more files per SQS message, which in turn increases the time needed to process the data referenced by each SQS message.
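Because processing time grows with the number of files per message, a rough budget check can show whether the visibility timeout still leaves enough headroom. This is a sketch that assumes processing time is roughly linear in the number of files; `seconds_per_file` is a figure you would measure yourself, and the 0.8 safety margin is an arbitrary choice, not a CrowdStrike or LogScale recommendation.

```python
def fits_in_timeout(
    files_per_message: int,
    seconds_per_file: float,
    visibility_timeout: int,
    safety_margin: float = 0.8,
) -> bool:
    """True if the estimated processing time stays within a fraction of the timeout.

    Assumes linear scaling: estimated time = files * seconds_per_file.
    """
    estimated = files_per_message * seconds_per_file
    return estimated <= visibility_timeout * safety_margin


def max_files_per_message(
    seconds_per_file: float,
    visibility_timeout: int,
    safety_margin: float = 0.8,
) -> int:
    """Largest file count per message that still fits the (assumed) budget."""
    return int((visibility_timeout * safety_margin) // seconds_per_file)
```

For example, with a two-hour timeout (7,200 seconds), a 0.8 margin, and a measured 60 seconds per file, the budget is 5,760 seconds, so a message with more than about 96 files risks returning to the queue before it is deleted.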

FDR Ingest Lag

If you notice a large lag (more than two hours) between FDR event creation time and ingest time in LogScale, check and, if needed, adjust the fileDownloadParallelism setting. You can do this with the GraphQL API, which you can access from the LogScale UI through the API Explorer.
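The two-hour threshold is simple arithmetic on timestamps. The sketch below assumes you already have an event's creation time and its LogScale ingest time as epoch milliseconds; how you extract those two values is up to you, and the function names are illustrative only.

```python
TWO_HOURS_MS = 2 * 60 * 60 * 1000  # 7,200,000 milliseconds


def ingest_lag_ms(event_time_ms: int, ingest_time_ms: int) -> int:
    """Lag between event creation and ingest, in milliseconds."""
    return ingest_time_ms - event_time_ms


def is_lagging(event_time_ms: int, ingest_time_ms: int, threshold_ms: int = TWO_HOURS_MS) -> bool:
    """True when the lag exceeds the threshold (two hours by default)."""
    return ingest_lag_ms(event_time_ms, ingest_time_ms) > threshold_ms
```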

The following query uses the repository field to check the current value of fileDownloadParallelism. Replace reponame and fdrFeedId with your repository name and FDR feed ID:

```graphql
query {
  repository(name: "reponame") {
    fdrFeedControl(id: "fdrFeedId") {
      id, maxNodes, fileDownloadParallelism
    }
  }
}
```
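If you prefer to run the query from a script instead of the API Explorer, the request can be built as sketched here. The assumptions are labeled: LogScale is taken to serve GraphQL at `<base_url>/graphql` and to accept an API token as a Bearer token, and the base URL, token, repository name, and feed ID are placeholders. The helper `effective_parallelism` applies the rule that a null fileDownloadParallelism means the default of 1.

```python
import json
import urllib.request
from typing import Any, Dict


def build_query(repo_name: str, feed_id: str) -> str:
    # json.dumps yields a quoted, escaped literal that is also valid GraphQL
    # string syntax for ordinary repository names and feed IDs.
    return (
        "query { repository(name: %s) { fdrFeedControl(id: %s) "
        "{ id, maxNodes, fileDownloadParallelism } } }"
        % (json.dumps(repo_name), json.dumps(feed_id))
    )


def build_request(base_url: str, token: str, graphql: str) -> urllib.request.Request:
    # Assumed endpoint and auth scheme; POSTs a JSON body {"query": ...}.
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/graphql",
        data=json.dumps({"query": graphql}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def effective_parallelism(response: Dict[str, Any]) -> int:
    # A null fileDownloadParallelism means the default value of 1.
    control = response["data"]["repository"]["fdrFeedControl"]
    value = control.get("fileDownloadParallelism")
    return 1 if value is None else int(value)
```

To send it, pass the request to `urllib.request.urlopen` and give the decoded JSON (`json.load(resp)`) to `effective_parallelism`.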

A null value for fileDownloadParallelism means it is set to the default, 1. To set a different value, run the following GraphQL mutation:

```graphql
mutation {
  updateFdrFeedControl(
    input: {
      repositoryName: "reponame",
      id: "fdrFeedId",
      fileDownloadParallelism: { value: 8 }
    }
  ) {
    id, fileDownloadParallelism, maxNodes
  }
}
```
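When you try several values, it can help to generate the mutation rather than edit it by hand. This sketch mirrors the mutation shown above for any value; the repository name and feed ID are placeholders, and the positive-integer check is an added guard for illustration, not a documented LogScale rule. The resulting string is sent as `{"query": ...}` like any other GraphQL document.

```python
import json


def build_parallelism_mutation(repo_name: str, feed_id: str, parallelism: int) -> str:
    """Return an updateFdrFeedControl mutation setting fileDownloadParallelism."""
    # Illustrative guard only; LogScale's own validation may differ.
    if not isinstance(parallelism, int) or parallelism < 1:
        raise ValueError("fileDownloadParallelism must be a positive integer")
    return (
        "mutation { updateFdrFeedControl(input: { "
        f"repositoryName: {json.dumps(repo_name)}, "
        f"id: {json.dumps(feed_id)}, "
        f"fileDownloadParallelism: {{ value: {parallelism} }} "
        "}) { id, fileDownloadParallelism, maxNodes } }"
    )
```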

The mutation above sets fileDownloadParallelism to 8, which means LogScale uses at most eight threads to download files. It will most likely use all eight threads during busy periods, such as working hours, when SQS messages contain many files. However, setting the value too high can affect other ingest, queries, and other work on the cluster. If this happens, lower the value and then monitor the number of messages in the queue.
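The idea of "at most eight threads" is a bounded worker pool. The sketch below is a generic illustration of that concept using Python's ThreadPoolExecutor, not LogScale's internal implementation; `download` is a hypothetical stand-in that only sleeps, and the function reports the peak number of concurrent downloads so you can see the cap hold.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def download_all(file_keys: List[str], parallelism: int) -> Tuple[List[str], int]:
    """Download files with at most `parallelism` threads; return results and peak concurrency."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def download(key: str) -> str:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # hypothetical stand-in for network I/O
        with lock:
            active -= 1
        return key

    # max_workers caps how many downloads run at the same time.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(pool.map(download, file_keys))
    return results, peak
```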

If maxNodes is 2 and fileDownloadParallelism is 2, then at most four files (2 * 2) are downloaded in parallel across the entire cluster. maxNodes affects how many SQS messages are processed in parallel, and each message can contain multiple files. The most suitable values depend on the structure of your SQS messages; in particular, the number of files per SQS message can help determine what fileDownloadParallelism should be.
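The sizing arithmetic can be written down as a few helpers. This uses a simplified model, assumed for illustration rather than taken from LogScale's documentation: each node works on one SQS message at a time and downloads that message's files with up to fileDownloadParallelism threads. Under that model, threads beyond the number of files in a message would sit idle.

```python
import math


def cluster_parallel_downloads(max_nodes: int, parallelism: int) -> int:
    """Upper bound on files downloaded at once across the cluster (maxNodes * parallelism)."""
    return max_nodes * parallelism


def download_rounds(files_in_message: int, parallelism: int) -> int:
    """Batches of parallel downloads one message needs, under the assumed model."""
    return math.ceil(files_in_message / parallelism)


def useful_parallelism(files_in_message: int, parallelism: int) -> int:
    """Threads that can actually be busy for one message; extras would be idle."""
    return min(files_in_message, parallelism)
```

For instance, with maxNodes 2 and fileDownloadParallelism 2 the bound is 4, matching the text; a message with 20 files and a parallelism of 8 needs 3 rounds under this model.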