Best Practices to Optimize Data Ingestion

There are several ways to prepare data prior to ingest that may help reduce costs. By defining the problem that the data should solve and preparing the data accordingly, you can reduce the amount of data to ingest, and thus the cost.

Performing some preparation steps prior to ingest can reduce costs significantly. The following list contains suggestions for how to prepare data.

  • Handle missing or inconsistent data, remove duplicates, and handle outliers. When inspecting your data, you might find that some entries have missing or inconsistent values. You can clean this data by correcting inconsistent or unrealistic values, dropping incomplete entries that you don't need, and so on.

  • Transform the data as needed to enable easier analysis. You might need to create new variables that combine two values into a single value. You might also want to convert continuous variables into discrete ones, or convert categorical variables into dummy variables.

  • Reduce the amount of data to make it more manageable. For example, you can ingest only events within a certain time range, or focus only on certain categories within the dataset.

  • Combine data from different sources. This may help you resolve inconsistencies or missing data across datasets and help reduce costs. Instead of ingesting two datasets where some data might overlap, you would ingest only one dataset.
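The suggestions above can be sketched as a small pre-ingest script. This is a minimal illustration that uses only the Python standard library. The two sources, the field names (timestamp, host, category, latency_ms), the latency thresholds, and the helper names are all hypothetical and chosen only for the example; adapt them to your own data.

```python
from datetime import datetime

# Two hypothetical sources with one overlapping event (illustrative data only).
SOURCE_A = [
    {"timestamp": "2024-01-01T00:00:00", "host": "web-1", "category": "auth", "latency_ms": 120.0},
    {"timestamp": "2024-01-01T00:05:00", "host": "web-2", "category": "debug", "latency_ms": 95.0},
]
SOURCE_B = [
    # Duplicate of the first event in SOURCE_A.
    {"timestamp": "2024-01-01T00:00:00", "host": "web-1", "category": "auth", "latency_ms": 120.0},
    # Unrealistic latency value (outlier).
    {"timestamp": "2024-01-02T12:00:00", "host": "web-1", "category": "auth", "latency_ms": 1_000_000.0},
    # Missing host, and outside the time range used below.
    {"timestamp": "2024-01-03T08:30:00", "host": None, "category": "auth", "latency_ms": 110.0},
]


def latency_bucket(latency_ms):
    """Transform: turn a continuous latency into a discrete label."""
    if latency_ms <= 100:
        return "fast"
    if latency_ms <= 500:
        return "normal"
    return "slow"


def prepare(events, start, end, keep_categories, max_latency_ms):
    """Clean, reduce, and transform events before ingest."""
    seen = set()
    prepared = []
    for event in events:
        # Clean: skip exact duplicates (also removes overlap between sources).
        key = tuple(sorted(event.items()))
        if key in seen:
            continue
        seen.add(key)
        # Clean: skip entries with a missing required field.
        if event["host"] is None:
            continue
        # Clean: skip unrealistic outliers.
        if event["latency_ms"] > max_latency_ms:
            continue
        # Reduce: keep only events inside the time range of interest.
        when = datetime.fromisoformat(event["timestamp"])
        if not (start <= when <= end):
            continue
        # Reduce: keep only the categories of interest.
        if event["category"] not in keep_categories:
            continue
        # Transform: add a discrete variable derived from a continuous one.
        prepared.append({**event, "latency_bucket": latency_bucket(event["latency_ms"])})
    return prepared


# Combine: treat both sources as one dataset before preparing it.
prepared = prepare(
    SOURCE_A + SOURCE_B,
    start=datetime(2024, 1, 1),
    end=datetime(2024, 1, 2, 23, 59, 59),
    keep_categories={"auth"},
    max_latency_ms=60_000,
)
print(prepared)
```

With this sample input, five events across two sources are reduced to a single event to ingest. For larger datasets, a data analysis library is usually a better fit than hand-written loops; the loop here only makes each step explicit.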

A number of tools exist to help you prepare and analyze the data prior to ingest, such as Jupyter Notebook, pandas, and more. Many of these tools are open-source, which can be important to keep in mind depending on the sensitivity of the data you're analyzing.

Test and sample data are crucial when preparing any data for ingestion into a log management system. Sample data helps you ensure that parsing the complete dataset will produce the expected events and that the parser is functioning as expected. Use the createEvents() function to generate temporary events that serve as sample data for testing and troubleshooting. The createEvents() function does NOT count against your usage. For examples of how to use this function, see createEvents().
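Before testing a parser, it can also help to draw a small, reproducible sample from the raw data and check locally which lines do not fit the format you expect. The following is a minimal sketch using only the Python standard library. The log format, the regular expression, and the helper names are assumptions made for illustration; this check does not replace testing against the actual parser.

```python
import random
import re

# Hypothetical log format: "<timestamp> <LEVEL> <message>".
LINE_PATTERN = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<message>.+)$")


def draw_sample(lines, size, seed=0):
    """Return a reproducible random sample of at most `size` lines."""
    rng = random.Random(seed)
    return rng.sample(lines, min(size, len(lines)))


def unparsed(lines):
    """Return the lines that the expected pattern fails to match."""
    return [line for line in lines if LINE_PATTERN.match(line) is None]


# Illustrative input; in practice these lines would come from your own logs.
LINES = [
    "2024-01-01T00:00:00Z INFO service started",
    "2024-01-01T00:00:05Z WARN disk usage at 91%",
    "malformed line without level",
    "2024-01-01T00:00:09Z ERROR connection refused",
]

sample = draw_sample(LINES, size=3)
failures = unparsed(LINES)
print(sample)
print(failures)
```

Using a fixed seed keeps the sample stable between runs, so a parsing problem found once can be reproduced while you troubleshoot it.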