How-To: Delete Data in Bulk

Security Requirements and Controls

Some data that LogScale collects may be considered sensitive and you may want to remove it. To remove sensitive information that has already been parsed and stored, there are two methods available:

  • Adjust the retention settings of LogScale to delete old data, or

  • Delete the data manually using GraphQL or from the command-line.

Adjust Retention Method

Figure 9. Data Retention


You can configure LogScale to expire old data so that it is removed automatically. You do this by adjusting the data retention settings in the User Interface. Data can be retained based on compressed file size, uncompressed file size, or the age of the data.

Setting data retention is available when LogScale is installed on-premise, on your own server or instance. It has not previously been available on LogScale Cloud accounts, but it is now a beta feature available to a limited number of LogScale Cloud accounts.

Although you can set retention freely when running LogScale locally, Cloud accounts are limited to the number of days requested when the account was established, unless you have asked Support to change it.

This feature allows you to set data retention to a maximum of 365 days (see Figure 9). You can change this value yourself later. However, if you lower it, data older than the new limit will be deleted.

Figure 10. Organization Statistics


Note that a long retention time uses more storage and therefore affects the amount you're charged for LogScale Cloud services. You can monitor your usage on the main page of the LogScale User Interface, in the bottom right corner, where your Organization Statistics appear much as shown in Figure 10.

See the Data Retention documentation for more information.

Solution: Delete with GraphQL API

For a more targeted method of removing data, you can use the Redact Events API. This isn't as efficient as setting Data Retention, but it works well enough for one-time, manual deletions.

The Redact Events API is available through GraphQL, as the redactEvents mutation.

To perform the deletion, log in to LogScale and open the GraphQL API Explorer for your LogScale instance. You can find it by clicking the question mark icon near the top right of the User Interface; one of the choices in its pull-down menu should be API Explorer.

The mutation accepts the following fields:

  • repositoryName

    The name of the repository where the data you want to redact is stored.

  • start

    The start timestamp of the data to be deleted.

  • end

    The end timestamp of the data to be deleted.

  • query

    An optional query to filter the data.

For example, you could enter the following query:

graphql
mutation {
  redactEvents(
    input: {
      repositoryName: "REPO_NAME"
      start: "2023-08-09T10:52:18.041Z"
      end: "2023-08-09T11:02:12.662Z"
      query: ""
      userMessage: "Testing"
    }
  )
}

In this example, the mutation deletes all data between the specified timestamps in the specified repository. Because the query field is empty, no filter is applied and every event in that time range is selected.

The userMessage is optional; it's a message to record in the audit log for the action.

When you execute the mutation, its result appears in the right-hand panel of the API Explorer.
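The same mutation can also be submitted without the API Explorer. The following is a minimal sketch, not an official procedure: it assumes the GraphQL endpoint is reachable at the /graphql path of your LogScale URL and that it accepts the same bearer token as the command-line example. The helper name build_redact_body is hypothetical, and the curl call only runs when both YOUR_LOGSCALE_URL and TOKEN are set.

```shell
#!/usr/bin/env bash
# Build the JSON request body that wraps the redactEvents mutation.
# Args: repository, start, end (ISO 8601 strings), filter query, audit message.
# Caveat: values containing double quotes or backslashes need extra escaping.
build_redact_body() {
  local repo="$1" start="$2" end="$3" query="$4" message="$5"
  # \\" in the format prints \" so the inner quotes stay valid inside JSON.
  printf '{"query": "mutation { redactEvents(input: {repositoryName: \\"%s\\", start: \\"%s\\", end: \\"%s\\", query: \\"%s\\", userMessage: \\"%s\\"}) }"}\n' \
    "$repo" "$start" "$end" "$query" "$message"
}

body=$(build_redact_body "REPO_NAME" "2023-08-09T10:52:18.041Z" \
  "2023-08-09T11:02:12.662Z" "" "Testing")
echo "$body"

# Send the request only when both variables are set.
# Assumption: the GraphQL endpoint lives at /graphql on your LogScale host.
if [ -n "${YOUR_LOGSCALE_URL:-}" ] && [ -n "${TOKEN:-}" ]; then
  curl -s "https://$YOUR_LOGSCALE_URL/graphql" \
    -X POST \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "$body"
fi
```

Printing the body before sending it gives you a chance to check the repository name and time range, which matters because the deletion cannot be previewed from the request alone.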

Redact from Command-Line Method

Instead of using GraphQL to redact manually, you can do the same from the command line. A request equivalent to the example above would look something like the following:

shell
$ curl -v https://$YOUR_LOGSCALE_URL/api/v1/repositories/$REPO_NAME/redactevents \
   -X POST \
   -H "Authorization: Bearer $TOKEN" \
   -H "Content-Type: application/json" \
   -d '{"repositoryName": "Testeroo", "start": 1612219536721, "end": 1618820157526, "queryString": ""}'

In this example, besides adjusting the repository name, the start and end times, and the other parameters in the JSON body passed with -d, you would replace the URL and token variables. First, though, notice that the start and end times are Unix epoch timestamps in milliseconds (UTC). Date and time strings formatted as shown in the previous section are accepted only with GraphQL. For deleting from the command line, you have to use the epoch values.
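If you have the time range as date and time strings, one way to obtain the epoch millisecond values is sketched below. It assumes GNU date from coreutils (the -d option and the %3N format are not available in the BSD date shipped with macOS), and the function name iso_to_epoch_ms is hypothetical.

```shell
#!/usr/bin/env bash
# Convert an ISO 8601 UTC timestamp to Unix epoch milliseconds.
# Assumes GNU date: %s is whole seconds, %3N is the millisecond part.
iso_to_epoch_ms() {
  date -u -d "$1" +%s%3N
}

start_ms=$(iso_to_epoch_ms "2023-08-09T10:52:18.041Z")
end_ms=$(iso_to_epoch_ms "2023-08-09T11:02:12.662Z")

# Print values ready to paste into the "start" and "end" fields of the -d body.
echo "start=$start_ms end=$end_ms"
```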

Figure 11. API Tokens


In the example here, replace $YOUR_LOGSCALE_URL with either the URL of your own server or the URL of the LogScale Cloud environment you're using: https://cloud.humio.com:443/ for EU Cloud accounts, or https://cloud.us.humio.com:443/ for US Cloud accounts. Because the example command already includes the https:// prefix, supply only the host and port as the variable's value.

You would also replace the variable $TOKEN with the default API token for the repository. To find this token, go to the Settings tab in the LogScale User Interface and click API Tokens to see a list of your tokens (see Figure 11). You can copy the default one from that panel and paste it into the example above, or store it in an environment variable and reference that instead.
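Since a deletion request with an empty URL, repository, or token fails in confusing ways, it can help to check the variables before calling curl. The following is a small sketch of that idea for bash; the function name require_env and the placeholder values in the comments are hypothetical.

```shell
#!/usr/bin/env bash
# Return non-zero and name the first variable that is unset or empty.
require_env() {
  local name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
}

# Typical setup before running the curl command (placeholder values):
#   export YOUR_LOGSCALE_URL="cloud.humio.com:443"
#   export REPO_NAME="my-repo"
#   export TOKEN="paste-token-here"
if require_env YOUR_LOGSCALE_URL REPO_NAME TOKEN; then
  echo "ready: https://$YOUR_LOGSCALE_URL/api/v1/repositories/$REPO_NAME/redactevents"
fi
```

Keeping the token in an environment variable rather than typing it into the command also keeps it out of your shell history.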