Setting Alert Throttle Periods

The throttle period can be set along with the other properties when creating a new alert. It controls how often the alert can trigger: once the alert has triggered, it will not trigger again until the throttle period has passed. Typically, you get one alert for a batch of events rather than one for each event.

There may be times when several events that meet the search criteria are found within a short period of time. You probably don't need to be alerted multiple times in a row. In the example here, we're accepting the default throttle setting of once per hour.

Figure 163. Alert Throttling


Two options are available:

  • Throttle all actions: once the alert has triggered, it will not trigger again until after the throttle period has passed.

  • Field-based throttling: once the alert triggers for a given value of the field specified in Throttle field name, it will not trigger again for events with the same value for that field until the throttle period has passed. See details at Field-Based Throttling.

Field-Based Throttling

You can use field-based throttling if you want to only throttle certain results from your alert.

Example

Say you have an alert that triggers when a machine is running out of disk space. You want to throttle further messages for the same machine, but you still want to receive a message if another machine also starts running out of disk space within the throttle period. You can choose to throttle only on events with identical field values, and select the field in your logs that contains the name of the machine.

This alert searches for a specific log event with a time window of 1 hour and a throttle period of 1 hour. At some point, machine1 runs out of disk space, which results in an event in the log, and the alert triggers on this event. The alert search will continue to run and find this event every time, but it will not trigger the alert, since it is throttled. After some time, machine2 also runs out of disk space. The alert search will now find both events, but will only trigger for machine2, since machine1 is throttled. After an hour, if machine1 is still out of disk space (and thus there are newer log events for this), the alert will trigger again for machine1.
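The walkthrough above can be sketched as a small simulation. This is only an illustration of the described behavior, not Humio's implementation: the function name, the timeline, and the 10-minute spacing between search runs are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def field_throttled_triggers(
    runs: List[Tuple[int, List[str]]], throttle_minutes: int
) -> List[Tuple[int, str]]:
    """Return (minute, machine) pairs for which the alert would trigger.

    runs: (minute, machines found by the search at that minute), in time order.
    """
    last_triggered: Dict[str, int] = {}  # machine -> minute of its last trigger
    triggers: List[Tuple[int, str]] = []
    for minute, machines in runs:
        for machine in machines:
            last = last_triggered.get(machine)
            # Trigger only if this field value is not currently throttled.
            if last is None or minute - last >= throttle_minutes:
                triggers.append((minute, machine))
                last_triggered[machine] = minute
    return triggers


# Hypothetical timeline: machine1 fails at minute 0, machine2 at minute 20,
# and both are still logging the error at minute 60.
runs = [
    (0, ["machine1"]),
    (10, ["machine1"]),
    (20, ["machine1", "machine2"]),
    (30, ["machine1", "machine2"]),
    (60, ["machine1", "machine2"]),
]
triggers = field_throttled_triggers(runs, throttle_minutes=60)
print(triggers)
```

With this timeline the alert triggers for machine1 at minute 0, for machine2 at minute 20, and again for machine1 at minute 60. At minute 60 machine2 is still throttled, because its own throttle period started at minute 20.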

The field you throttle on should be in the result of the query, not just in the events that are input to the query. If a result from the query does not contain the field, it will be treated as if it had an empty value for the field.

Best Practices and Limitations

  • Select a field for which there are not too many different values.

  • Currently, there is a limit on how many values are stored (defaults to 100), so choose a field that does not have more values than this. @id is never a good choice: it is unique per event, so in effect no throttling would be applied. The only metadata field that normally does not have too many values is @timezone.

  • There is a fixed limit on the values of the throttle field that Humio can store in memory per alert. If the throttle field assumes more values than this limit, the alert might trigger more frequently than indicated by the given throttle period.
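To see why exceeding the limit can make the alert trigger more often, here is a minimal sketch of a bounded store of throttled values. How Humio actually discards values when the limit is reached is not stated here; the sketch assumes the oldest value is forgotten first, and the function name and host names are hypothetical.

```python
from collections import OrderedDict
from typing import List


def count_triggers(values: List[str], max_stored: int) -> int:
    """Count triggers when all events fall inside one throttle period."""
    stored: OrderedDict = OrderedDict()  # remembered (throttled) field values
    triggers = 0
    for value in values:
        if value in stored:
            continue  # value is remembered, so this event is throttled
        triggers += 1
        stored[value] = None
        if len(stored) > max_stored:
            # Assumed policy for this sketch: forget the oldest value.
            stored.popitem(last=False)
    return triggers


# Three hosts reporting in rotation, four rounds, all within one throttle period.
events = ["host-a", "host-b", "host-c"] * 4

within_limit = count_triggers(events, max_stored=3)
over_limit = count_triggers(events, max_stored=2)
print(within_limit, over_limit)
```

When the store can hold all three values, each host triggers once. With room for only two, every value has been forgotten by the time it comes around again, so under this assumed policy every event triggers.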

Multiple Fields

It is only possible to throttle on a single field. If you need to throttle on multiple fields, you can simply add a new field that concatenates these fields in the alert query.

For example, if your events have a service and a host field, and you want to throttle on the combination of these, you can add a new field in the alert query by adding the following line to it:

humio
| serviceathost := concat([service, host])

and then throttle on serviceathost.
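One thing to keep in mind is that plain concatenation can map different field combinations to the same value, in which case they would share one throttle. The Python sketch below only illustrates this point and is not Humio query syntax; the helper name, the field values, and the choice of separator are assumptions.

```python
def combined_key(service: str, host: str, separator: str = "") -> str:
    """Build the value that would serve as the combined throttle field."""
    return separator.join([service, host])


# Two different (service, host) pairs that collide without a separator.
plain_a = combined_key("web", "1a")
plain_b = combined_key("web1", "a")

# The same pairs with a separator that occurs in neither field.
separated_a = combined_key("web", "1a", separator="@")
separated_b = combined_key("web1", "a", separator="@")

print(plain_a == plain_b, separated_a == separated_b)
```

If such collisions are possible in your data, including a separator that cannot occur in either field keeps the combined values distinct.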

Relation between Throttle Period and Time Window

If your search finds specific events that you want to trigger the alert on, for example specific errors, set the throttle period to match the time window of the search. If you set the throttle period higher than the time window, you might miss events, and if you set it lower, you might get duplicate alerts.
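The effect of a mismatch can be sketched with a simplified model of an alert that throttles all actions. The model assumes the search runs once per minute and that each triggered alert reports every event in the current time window. The function name, event times, and run interval are assumptions for illustration only.

```python
from typing import Dict, List, Optional


def report_counts(
    event_minutes: List[int], window: int, throttle: int, end: int
) -> Dict[int, int]:
    """Return how many triggered alerts include each event (keyed by minute)."""
    counts = {minute: 0 for minute in event_minutes}
    last_trigger: Optional[int] = None
    for now in range(end + 1):  # assumed: the search runs once per minute
        if last_trigger is not None and now - last_trigger < throttle:
            continue  # still inside the throttle period
        # Events inside the time window (now - window, now].
        hits = [m for m in event_minutes if now - window < m <= now]
        if hits:
            last_trigger = now
            for m in hits:
                counts[m] += 1
    return counts


events = [0, 30, 90]  # hypothetical minutes at which matching events occur

matched = report_counts(events, window=60, throttle=60, end=240)
throttle_higher = report_counts(events, window=60, throttle=120, end=240)
throttle_lower = report_counts(events, window=60, throttle=30, end=240)
print(matched, throttle_higher, throttle_lower)
```

In this model, matching the two values reports each event exactly once. A 120-minute throttle with a 60-minute window never reports the event at minute 30, and a 30-minute throttle reports every event twice.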

If your search involves an aggregate, you might want the time window to be larger than the throttle period in some cases, for example if you want to be notified every hour whenever there are more than 5 errors within a 4-hour search window. You probably do not want to set the time window smaller than the throttle period, as this means that some events are never evaluated by the alert. For Actions like email and Slack, you want a higher throttle period, since these Actions do not deduplicate.