Best Practice: Estimating Local Disk Threshold

Last Updated: 2022-01-01

You can estimate the local disk threshold value by running two queries against the Humio repository.

The following two queries calculate what percentage of users' queries search data in the 30-day, 60-day, 90-day, or longer time frames. The results help you estimate how your queries use Humio data and set a realistic threshold that supports disaster recovery.

Calculate the local disk storage threshold carefully so that it does not compromise query speed. If a large portion of the queries search over longer time ranges, you may not be able to lower the threshold. The default is 95%.

"creating new query"
| Relative
| top([start],percent=true,limit=20)
| sort(_count)

We can break this down to:

  • Humio reports new query events in the Humio event log using the following text:

    "creating new query"
  • Search for Relative events

    | Relative
  • Select the top 20 values of the start field, showing each as a percentage of the total

    | top([start],percent=true,limit=20)

    For more information, see top()

  • Sort the output by the count of items

    | sort(_count)

    For more information see sort()
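
To make the percentage step concrete, the following is a minimal Python sketch of what a top-N aggregation with percentages computes. It is an illustration only, not Humio's implementation: the function name top_with_percent and the sample start values are hypothetical, and the percentage is assumed to be each value's share of all counted events.

```python
from collections import Counter


def top_with_percent(values, limit=20):
    """Return (value, count, percent) rows for the most frequent values.

    Rows are ordered by descending count. The percent column is assumed
    to be the value's share of all counted events.
    """
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    for value, count in counts.most_common(limit):
        rows.append((value, count, round(100.0 * count / total, 2)))
    return rows


# Hypothetical relative start values, as they might appear in the start field.
sample_starts = ["24h", "24h", "7d", "30d", "24h", "90d", "7d", "24h"]
rows = top_with_percent(sample_starts)

for value, count, percent in rows:
    print(value, count, percent)
```

In this sample, half of the queries look back 24 hours and only one in eight looks back 90 days, which is the kind of distribution that indicates how much older data needs to remain on local disk.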

The following query, which must be run in the Humio repository, shows the distribution of searches across fixed time buckets based on each query's start time: within the last 30 days, between 30 and 60 days ago, and more than 60 days ago.

"creating new query"
| Instant
| /start=Instant\((?<startTime>\d+)\)/
| /end=Instant\((?<endTime>\d+)\)/
| time:monthName(startTime,as="startMonth")
| time:monthName(endTime,as="endMonth")
| timeDiff := endTime - startTime
| timeDiffMinute := timeDiff/1000/60
| now()
| 30dTime := _now - 2592000000
| 60dTime := _now - (2592000000*2)
| case {
 test(startTime > 30dTime) | timeGroup := "last30d";
 test(startTime>60dTime) | test(startTime <= 30dTime) | timeGroup:= "30dto60d";
 test(startTime <= 60dTime) | timeGroup := "60dplus";
 *;
}
| top(timeGroup,percent=true)

The query extracts the start and end time of each search, assigns each search to a time group, and outputs the percentage of searches in each group. The query can be broken down as follows:

  • Search for the events from the log:

    "creating new query"
  • Filter the events by the Instant entries

    | Instant
  • Extract the time from the event using a regular expression to use as the startTime

    | /start=Instant\((?<startTime>\d+)\)/
    
    See Regex Field Extraction for more information on extracting fields using regular expressions.
  • Extract the end time from the event using a regular expression to use as the endTime

    | /end=Instant\((?<endTime>\d+)\)/
  • Extract the month name to form the startMonth

    | time:monthName(startTime,as="startMonth")

    See time:monthName()

  • Extract the month name to form the endMonth

    | time:monthName(endTime,as="endMonth")
  • Determine the difference between the start and the end time for each event

    | timeDiff := endTime - startTime
  • Calculate the difference in minutes. Times are in milliseconds, so the value needs to be divided by 1000 to get seconds, and then by 60 to get minutes

    | timeDiffMinute := timeDiff/1000/60
  • Get the current time; this will create the new field _now:

    | now()

    See now().

  • Calculate the time 30 days ago by subtracting 30 days in milliseconds (30 days x 24 hours x 60 minutes x 60 seconds x 1000 milliseconds = 2592000000) from _now to create 30dTime

    | 30dTime := _now - 2592000000
  • Calculate the time 60 days ago by subtracting 60 days in milliseconds (2 x 30 days x 24 hours x 60 minutes x 60 seconds x 1000 milliseconds) from _now to create 60dTime

    | 60dTime := _now - (2592000000*2)
  • Assign each event to a time group by comparing startTime against the two time boundaries, storing the result in a new field, timeGroup. Events that match none of the tests pass through unchanged:

    | case {
     test(startTime > 30dTime) | timeGroup := "last30d";
     test(startTime>60dTime) | test(startTime <= 30dTime) | timeGroup:= "30dto60d";
     test(startTime <= 60dTime) | timeGroup := "60dplus";
     *;
    }
  • Aggregate the event data as percentages using the time group:

    | top(timeGroup,percent=true)
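
The extraction and bucketing steps above can be sketched outside Humio as follows. This is a minimal Python illustration under stated assumptions, not the query engine's behavior: the sample log lines are hypothetical (only the start=Instant(...) pattern is taken from the query), and time_group and group_percentages are illustrative names. The boundaries follow the case statement: a start time exactly 30 days old falls into 30dto60d, and one exactly 60 days old falls into 60dplus.

```python
import re
from collections import Counter

DAY_MS = 24 * 60 * 60 * 1000     # milliseconds in one day
THIRTY_DAYS_MS = 30 * DAY_MS     # 2592000000, the constant used in the query

# Same pattern as the query's start-time extraction, in Python syntax.
START_RE = re.compile(r"start=Instant\((?P<startTime>\d+)\)")


def time_group(start_ms, now_ms):
    """Classify a start time (epoch milliseconds) into one of three groups."""
    t30 = now_ms - THIRTY_DAYS_MS
    t60 = now_ms - 2 * THIRTY_DAYS_MS
    if start_ms > t30:
        return "last30d"
    if start_ms > t60:
        return "30dto60d"
    return "60dplus"


def group_percentages(lines, now_ms):
    """Return {group: percent} for lines that contain a start=Instant(...) value."""
    groups = Counter()
    for line in lines:
        match = START_RE.search(line)
        if match is None:
            continue  # lines without a start time are skipped in this sketch
        groups[time_group(int(match.group("startTime")), now_ms)] += 1
    total = sum(groups.values())
    return {group: 100.0 * count / total for group, count in groups.items()}


# Hypothetical log lines; the real event format may differ.
now_ms = 1_700_000_000_000
sample_lines = [
    f"creating new query start=Instant({now_ms - 1 * DAY_MS}) end=Instant({now_ms})",
    f"creating new query start=Instant({now_ms - 10 * DAY_MS}) end=Instant({now_ms})",
    f"creating new query start=Instant({now_ms - 45 * DAY_MS}) end=Instant({now_ms})",
    f"creating new query start=Instant({now_ms - 90 * DAY_MS}) end=Instant({now_ms})",
]
result = group_percentages(sample_lines, now_ms)
print(result)
```

A high share in the 60dplus group suggests that many searches reach older data, which argues against lowering the local disk threshold.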