Error Handling for FDR Ingestion
Errors may occur as LogScale is polling FDR data from a feed and ingesting it into a repository.
If LogScale cannot pull messages from an SQS queue, for example if there is a network issue, it will continue to attempt to fetch messages with exponential backoff.
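As an illustration of the retry behavior described above, here is a minimal sketch of exponential backoff. It is not LogScale's implementation: the base delay, growth factor, cap, and attempt limit are assumptions chosen for the example, and `fetch_with_backoff` and `backoff_delays` are hypothetical helper names.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 1.0, factor: float = 2.0, cap: float = 60.0) -> Iterator[float]:
    """Yield an endless sequence of delays that grow by `factor` up to `cap`.

    All three parameters are illustrative assumptions, not LogScale settings.
    """
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def fetch_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it succeeds, sleeping longer after each failure.

    The attempt limit exists only so that the sketch terminates.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(next(delays))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the waiting strategy replaceable, which makes the retry logic easy to exercise without real delays.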
If a message can be pulled from the SQS queue, but an issue occurs during the download of data from S3 or during the subsequent ingest, then the message is not deleted.
Instead, it stays on the SQS queue; a message is deleted only after a successful ingest.
At a later point in time, determined by the visibility timeout of the queue, LogScale will pull the message again and retry the download and ingest.
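The delete-only-on-success pattern described above can be sketched as follows. This is a simplified illustration, not LogScale's code: `handle_message`, `download_and_ingest`, and `delete` are hypothetical names, and the queue client is abstracted behind plain callables.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]


def handle_message(
    message: Message,
    download_and_ingest: Callable[[Message], None],
    delete: Callable[[Message], None],
) -> bool:
    """Process one queue message and delete it only if processing succeeded.

    Returns True when the message was ingested and deleted, False when it was
    left on the queue for a later retry.
    """
    try:
        download_and_ingest(message)
    except Exception:
        # Do not delete: the message becomes visible again once the queue's
        # visibility timeout expires, and it will be pulled and retried then.
        return False
    delete(message)
    return True
```

Because the delete call is the last step, a failure at any earlier point leaves the message on the queue, so the retry needs no extra bookkeeping on the consumer side.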
Note
Messages on the SQS queue are retained for only 7 days after they were originally added.
If LogScale is not successful in ingesting a message before the retention period is up, then the message is removed from the queue and the data is lost.
If ingest is failing temporarily, for instance due to network issues that last a few hours, this is not a big problem, as LogScale will simply retry ingesting that data at a later point in time.
However, if the problem is persistent, such as if there is a mismatch between the S3 Identifier given by the user and the one indicated by a message, then you will need to take action before the retention period is up or risk losing data.
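To make the deadline concrete, the following sketch computes how much time remains before a message leaves the queue. It assumes the 7-day retention stated above and takes the time the message was added as a timestamp in epoch milliseconds, which is the format of the SQS `SentTimestamp` message attribute. `time_until_expiry` is a hypothetical helper, not part of LogScale.

```python
from datetime import datetime, timedelta, timezone

# Retention period as stated in the note above.
RETENTION = timedelta(days=7)


def time_until_expiry(sent_timestamp_ms: int, now: datetime) -> timedelta:
    """Return the time left before a message is dropped from the queue.

    `sent_timestamp_ms` is when the message was added, in epoch milliseconds.
    `now` must be timezone-aware. A negative result means the message has
    already expired.
    """
    sent_at = datetime.fromtimestamp(sent_timestamp_ms / 1000, tz=timezone.utc)
    return sent_at + RETENTION - now
```

A persistent failure that is still unresolved when this value approaches zero is the case in which data will be lost.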
To help you monitor such issues, the humio/activity package includes a dashboard that shows the state of the polling and ingest process.