How-To: Delete Data in Bulk

Last Updated: 2021-07-05

You may need to delete or redact sensitive information that has already been parsed and stored. There are several ways to do this: you can adjust Humio's retention settings so that old data is deleted automatically, or you can delete the data manually, using either GraphQL or the command line.

Solution: Adjust Retention

Data Retention

Figure 301. Data Retention


You can configure Humio to expire old data, which causes that data to be removed automatically. You do this by adjusting data retention in the User Interface. Data can be retained based on compressed file size, uncompressed file size, or the age of the data.

Setting data retention is available when you have Humio installed on-premise, on your own server or instance. It has not previously been available on Humio Cloud accounts, but it is now a beta feature available to a small number of them.

Although you can set this when running Humio locally, Cloud accounts are limited to the number of days requested when the account was established, unless you've asked Support to change it.

This feature allows you to set data retention to a maximum of 365 days (see Figure 301). You can change this value yourself later. However, if you lower it, data older than the new limit will be deleted.

Organization Statistics

Figure 302. Organization Statistics


Please note that a long retention time uses more storage and therefore affects the amount you're charged for Humio Cloud services. You can monitor your usage on the main page of the Humio User Interface, in the bottom right corner. There you'll see your Organization Statistics, similar to what's shown and highlighted in Figure 302.

See the Data Retention documentation for more information.

Solution: Delete with GraphQL API

For a more targeted method of removing data, you can use the Redact Events API. This isn't as efficient as setting Data Retention, but it works well enough for one-time, manual deletions.

The Redact Events API is used through GraphQL. The GraphQL mutation is redactEvents.

To perform the deletion, log in to Humio and locate the GraphQL API Explorer for your Humio instance. You can find it by clicking the question mark icon near the top right of the User Interface. One of the choices in its pull-down menu should be API Explorer.

The mutation accepts the following fields:

  • repositoryName

    The name of the repository where the data you want to redact is stored.

  • start

    The start timestamp of the data to be deleted.

  • end

    The end timestamp of the data to be deleted.

  • query

    An optional query to filter the data.

For example, you could enter the following mutation:

graphql
mutation {
  redactEvents(
    repositoryName: "apache"
    start: "2021-04-10T10:15:30.00Z"
    end: "2021-04-15T10:15:30.00Z"
    query: ""
    userMessage: "Testing"
  )
}

In this example, the mutation deletes all data between the specified timestamps in the specified repository. The query field is empty, so no filter is applied and all data in that time range is selected.

The userMessage is optional; it's a message to record in the audit log for the action.

When the mutation is executed, its results will appear in the right panel of the API Explorer.
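If you would rather script the mutation than paste it into the API Explorer, it can in principle be sent over HTTP. The following is a minimal sketch rather than a confirmed recipe: it assumes the GraphQL endpoint is served at /graphql on your Humio host and accepts a bearer token, and build_redact_payload is a hypothetical helper name. The mutation text mirrors the example above, and the helper does not escape quotes inside its arguments.

```shell
#!/bin/sh
# Hypothetical helper: wrap the redactEvents mutation from the example above
# in the {"query": "..."} JSON envelope used for GraphQL over HTTP.
# $1 = repository, $2 = start (ISO 8601), $3 = end (ISO 8601), $4 = audit message
# Note: arguments must not contain double quotes or backslashes.
build_redact_payload() {
  printf '{"query": "mutation { redactEvents(repositoryName: \\"%s\\" start: \\"%s\\" end: \\"%s\\" query: \\"\\" userMessage: \\"%s\\") }"}\n' \
    "$1" "$2" "$3" "$4"
}

# Send the payload only when connection details are present.
# ASSUMPTION: the /graphql path and bearer-token header are not confirmed here.
if [ -n "${HUMIO_URL:-}" ] && [ -n "${TOKEN:-}" ]; then
  build_redact_payload "apache" "2021-04-10T10:15:30.00Z" "2021-04-15T10:15:30.00Z" "Testing" |
    curl -s "https://$HUMIO_URL/graphql" \
      -X POST \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d @-
fi
```

Printing the payload first, by calling build_redact_payload on its own, is a cheap way to check the time range and repository before anything is deleted.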

Solution: Redact from Command-Line

Instead of using GraphQL to redact manually, you can do the same from the command line. The equivalent of the example above would look something like the following:

curl -v https://$HUMIO_URL/api/v1/repositories/$REPO_NAME/redactevents \
 -X POST \
 -H "Authorization: Bearer $TOKEN" \
 -H "Content-Type: application/json" \
 -d '{"repositoryName": "Testeroo", "start": 1612219536721, "end": 1618820157526, "queryString": ""}'

In this example, besides adjusting the repository name, start and end times, and other parameters in the -d JSON payload, you would replace the URL and token variables. Note that the start and end times here are Unix epoch timestamps in milliseconds (UTC). Formatted dates and times, as shown in the previous section, can be used only with GraphQL. When deleting from the command line, you have to use the numeric timestamps.
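Working out the millisecond values by hand is error-prone, so a small conversion helper can be useful. This is a minimal sketch that assumes GNU date (its -d option; BSD and macOS date use different flags); to_epoch_ms is a hypothetical helper name, and the dates are the ones from the GraphQL example.

```shell
#!/bin/sh
# Convert a UTC date and time ("YYYY-MM-DD HH:MM:SS") to epoch milliseconds.
# ASSUMPTION: GNU date is available; -u makes date read the input as UTC.
to_epoch_ms() {
  seconds=$(date -u -d "$1" +%s) || return 1
  # date reports whole seconds, so scale to milliseconds.
  echo "$((seconds * 1000))"
}

start_ms=$(to_epoch_ms "2021-04-10 10:15:30")
end_ms=$(to_epoch_ms "2021-04-15 10:15:30")

# Print the fragment to paste into the -d payload of the curl command.
printf '"start": %s, "end": %s\n' "$start_ms" "$end_ms"
```

The resulting numbers can be substituted for the start and end values in the curl payload.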

API Tokens

Figure 303. API Tokens


In the example here, replace the $HUMIO_URL with either the URL to your own server, or the URL to the Humio Cloud environment you're using. You would use https://cloud.humio.com:443/ for EU Cloud accounts. For US Cloud accounts, you would use https://cloud.us.humio.com:443/.

You would also replace the variable $TOKEN with the default API token for the repository. To find this token, go to the Settings tab in the Humio User Interface. Click on API Tokens to see a list of your tokens (see Figure 303). You can copy the default one from the panel there and paste it into the example above, or store it in an environment variable and reference that instead.
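Keeping these values in environment variables avoids writing the token into scripts or shell history. The sketch below is illustrative only: redact_endpoint is a hypothetical helper that builds the URL used in the curl example. Because that example already prefixes https://, the helper strips a leading scheme and a trailing slash from the base URL so the two are not doubled.

```shell
#!/bin/sh
# Hypothetical helper: build the redaction endpoint from a base URL and a
# repository name, tolerating a leading "https://" and one trailing "/".
redact_endpoint() {
  base=${1#https://}   # drop the scheme if it was included
  base=${base%/}       # drop a single trailing slash if present
  echo "https://$base/api/v1/repositories/$2/redactevents"
}

# Example values; TOKEN should be exported separately in your own shell
# session rather than stored in a script.
HUMIO_URL="https://cloud.humio.com:443/"
REPO_NAME="Testeroo"

redact_endpoint "$HUMIO_URL" "$REPO_NAME"
```

The printed URL can then be passed to curl together with -H "Authorization: Bearer $TOKEN".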