How-To: Delete Data in Bulk

Last Updated: 2021-07-05

You may find that your stored data contains sensitive information that needs to be deleted or redacted. If the information has already been parsed and stored, there are a few ways to remove it: you can adjust Humio's retention settings so that old data is deleted automatically, or you can delete the data manually, either with GraphQL or from the command-line.

Solution: Adjust Retention


Figure 284. Data Retention


You can configure Humio to remove old data automatically by adjusting data retention. It's simple to make the adjustments in the User Interface. Data can be retained based on compressed file sizes, uncompressed file sizes, or on the age of data.

Setting data retention is available when you have Humio installed on-premise, on your own server or instance. It has not been available, though, on Humio Cloud accounts. However, it's now a new beta feature that's available to only a few Humio Cloud accounts.

Although you can set this when running Humio locally, Cloud accounts are limited to the number of days requested when the account was established, unless you've asked Support to change it.

This beta feature allows you to set data retention to a maximum of 365 days (see Figure 284). You can change this value yourself later. However, if you lower it, data older than the new limit will be deleted.


Figure 285. Organization Statistics


Please note that a long retention time uses more storage, and therefore affects the amount you're charged for Humio Cloud services. You can monitor your usage on the main page of the Humio User Interface, in the bottom right corner. There you'll see your Organization Statistics, much like what's shown and highlighted in Figure 285.

See the Data Retention documentation for more information.

Solution: Delete with GraphQL API

Figure 286. deleteEvents with GraphQL


For a more targeted method of removing data, you can use the Delete Events API. This isn't as efficient as setting Data Retention, but it works well enough for one-time, manual deletions.

To access the Delete Events API, you can use GraphQL. This works for both Cloud accounts and on-premise installations. The GraphQL mutation to use is called deleteEvents.

To perform the deletion, log in to Humio and locate the GraphQL API Explorer for your Humio installation. It can be found by clicking on the question mark icon near the top right of the User Interface. One of the choices in its pull-down menu should be API Explorer; click on it. For Humio installed on your own server or instance, you can go to the URL where Humio is installed, followed by /docs/api-explorer.


Figure 287. deleteEvents Schema


Once there, you'll see a screen similar to the one in Figure 286. Enter text in the left panel, as in the screenshot. That entry is explained in a moment.

The deleteEvents mutation has the schema shown in Figure 287. You can view it from the GraphQL API interface by clicking Docs in the upper right corner, searching for the term deleteEvents, and selecting the mutation.

With these choices in mind, to bulk delete data from a repository, you would enter something like this:

graphql
mutation {
  deleteEvents(
    repositoryName: "Testeroo"
    start: "2021-04-10T10:15:30.00Z"
    end: "2021-04-15T10:15:30.00Z"
    query: ""
    userMessage: "Testing"
  )
}

This is the text shown in the left panel of the screenshot in Figure 286. Set repositoryName to the name of the repository in which you want to delete events. In the example here, we've set a start and end date and time, which define the time range over which events will be deleted.

The query parameter is required, but since we want to delete all events in the given time period, we've left it blank as shown above. We would include a query if we wanted to delete only the events that meet specific criteria, rather than all of them.

The userMessage is optional; it's a message to record in the audit log for the action.

When you're ready, click the right-arrow icon to execute the mutation and delete the data. The results of that action will appear in the right panel (see Figure 286 again).

Solution: Delete from Command-Line

Instead of using GraphQL to delete manually, you can do the same from the command-line. A request similar to the example above would look something like the following:

curl -v https://$HUMIO_URL/api/v1/repositories/$REPO_NAME/deleteevents \
 -X POST \
 -H "Authorization: Bearer $TOKEN" \
 -H "Content-Type: application/json" \
 -d '{"repositoryName": "Testeroo", "start": 1612219536721, "end": 1618820157526, "queryString": ""}'

In this example, besides adjusting the repository name, the start and end times, and the other parameters in the JSON body passed with -d, you would replace the URL and token variables. First, though, notice that the dates and times are Unix epoch timestamps in milliseconds (UTC). You may use human-readable dates and times, formatted as shown in the previous section, but only with GraphQL. When deleting from the command-line, you have to use the epoch values.
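The numeric start and end values in the request are counts of milliseconds since 1970-01-01 UTC. As a minimal sketch, assuming a date command that accepts -d (GNU and BusyBox do; the BSD and macOS date uses different flags), a readable UTC timestamp could be converted like this. The function name to_epoch_ms and the variable names are hypothetical, chosen only for illustration:

```shell
#!/bin/sh
# Convert a UTC timestamp written as "YYYY-MM-DD hh:mm:ss" into Unix epoch
# milliseconds. Assumes a `date` that supports -d (GNU or BusyBox).
to_epoch_ms() {
  # -u makes date interpret the input as UTC and print whole seconds.
  seconds=$(date -u -d "$1" +%s) || return 1
  # Append three zeros to express whole seconds as milliseconds.
  printf '%s000\n' "$seconds"
}

# The time range used in the GraphQL example, expressed as epoch milliseconds.
START_MS=$(to_epoch_ms "2021-04-10 10:15:30")
END_MS=$(to_epoch_ms "2021-04-15 10:15:30")
echo "start=$START_MS end=$END_MS"
# prints: start=1618049730000 end=1618481730000
```

The resulting numbers can be placed in the start and end fields of the JSON body in place of the sample values.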


Figure 288. API Tokens


In the example here, replace $HUMIO_URL with the address of your own server or of the Humio Cloud environment you're using. EU Cloud accounts are served from https://cloud.humio.com:443/ and US Cloud accounts from https://cloud.us.humio.com:443/. Because the example already begins with https://, set the variable to the host portion only (for example, cloud.humio.com).

You would also replace the variable $TOKEN with the default API token for the repository. To find this token, go to the Settings tab in the Humio User Interface. Click on API Tokens to see a list of your tokens (see Figure 288). You can copy the default one from the panel there and paste it into the example above, or store it in an environment variable and reference that instead.
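If you prefer environment variables to pasting values into the command, the request could be wrapped in a small script. This is only a sketch under stated assumptions: the helper function names are hypothetical, HUMIO_URL is assumed to hold the host without a scheme (matching the https://$HUMIO_URL form in the example above), the placeholder defaults should be replaced with your own values, and the query string is assumed to contain no double quotes or backslashes. The endpoint path and JSON fields are copied from the example above.

```shell
#!/bin/sh
# Placeholder defaults; override them in your environment.
HUMIO_URL="${HUMIO_URL:-cloud.humio.com}"   # host only, no https:// prefix
REPO_NAME="${REPO_NAME:-Testeroo}"

# Build the endpoint URL used by the curl example in this section.
delete_events_url() {
  printf 'https://%s/api/v1/repositories/%s/deleteevents\n' "$HUMIO_URL" "$REPO_NAME"
}

# Build the JSON body. $1 = start, $2 = end (epoch milliseconds),
# $3 = query string (assumed free of double quotes and backslashes).
delete_events_body() {
  printf '{"repositoryName": "%s", "start": %s, "end": %s, "queryString": "%s"}\n' \
    "$REPO_NAME" "$1" "$2" "$3"
}

# Send the request. Stops with an error message if TOKEN is not set.
delete_events() {
  curl -sS -X POST "$(delete_events_url)" \
    -H "Authorization: Bearer ${TOKEN:?set TOKEN to your API token}" \
    -H "Content-Type: application/json" \
    -d "$(delete_events_body "$1" "$2" "$3")"
}

# Dry run: show what would be sent without contacting the server.
echo "POST $(delete_events_url)"
delete_events_body 1612219536721 1618820157526 ""
```

Once the dry-run output looks right, calling delete_events with the same three arguments (and TOKEN exported in the shell) would send the request. Keeping the token in the environment also keeps it out of saved scripts.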