Setting Alert Throttle Period

You set the throttle period along with the other properties when creating a new alert. It controls how often the alert can trigger: once the alert has triggered, it won't trigger again until the throttle period has passed.

Setting a throttle period is optional for Filter Alerts (because each event matching the query triggers the alert), but it is mandatory for Standard Alerts. In the latter case, the throttle period prevents the query from triggering a configured action too frequently: you get one alert for a batch of events, rather than one for each event. For the same reason, in Standard Alerts the default throttle period set in the UI matches the query time window (which, in turn, is irrelevant for Filter Alerts).

There may be times when several events are found in a short period of time that meet the search criteria. You probably don't need to be alerted multiple times in a row. In the example here, we're accepting the default throttle setting of once per hour.

Figure 199. Alert Throttling


The following options are available:

  • Throttle period

    The minimum time that must pass between two triggers of the alert. The alert will be triggered at most once per period.

    Starting from version 1.146, the maximum allowed throttle period is 1 week.

    The throttle period can be specified in units ranging from seconds to weeks.

  • Throttle all actions

    Once the alert has triggered, it will not trigger again until after the throttle period has passed.

  • Field-based throttling

    Once the alert triggers for a value of the field specified in Throttle field name, it will not trigger again for results with the same value for that field until the throttle period has passed. See details at Field-Based Throttling.

Field-Based Throttling

You can configure the alert with field-based throttling if you want to throttle only certain results from your alert.

Example

Say you have an alert that triggers when a machine is running out of disk space: you want to throttle further messages for the same machine, but you still want to receive a message if another machine also starts running out of disk space within the throttle period. You can throttle only on events with identical field values by selecting the field in your logs that contains the name of the machine.

This alert searches for a specific log event with a time window of 1 hour and a throttle period of 1 hour. At some point, machine1 runs out of disk space, which results in an event in the log, and the alert triggers on this event. The alert search will continue to run and find this event every time, but it will not trigger the alert, since it is throttled. After some time, machine2 also runs out of disk space. The alert search will now find both events, but will only trigger for machine2, since machine1 is throttled. After an hour, if machine1 is still out of disk space (and thus there are newer log events for this), the alert will trigger again for machine1.

The field you throttle on should be in the result of the query, not just in the events that are input to the query. If a result from the query does not contain the field, it will be treated as if it had an empty value for the field.

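While a value is throttled, the alert will not trigger again for that value. Within the throttle period, it triggers only when the throttled field in the search results has a value different from the values that already triggered the alert. The search itself keeps running continuously.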
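The machine1 and machine2 walk-through above can be sketched as a small simulation. This is an illustration only, not LogScale's implementation: the function name, the result format (one dictionary per query result) and the exact boundary behaviour (a value may trigger again once exactly one throttle period has elapsed) are assumptions made for the sketch.

```python
from typing import Dict, List, Tuple

HOUR = 3600  # seconds


def simulate_field_throttle(
    runs: List[Tuple[int, List[dict]]],
    throttle_field: str,
    throttle_seconds: int,
) -> List[Tuple[int, str]]:
    """Return the (time, field value) pairs for which the alert triggers."""
    last_triggered: Dict[str, int] = {}  # field value -> time of last trigger
    triggers: List[Tuple[int, str]] = []
    for now, results in runs:
        for result in results:
            # A result without the field is treated as having an empty value.
            value = result.get(throttle_field, "")
            last = last_triggered.get(value)
            # Assumption: a value is released once a full period has elapsed.
            if last is None or now - last >= throttle_seconds:
                last_triggered[value] = now
                triggers.append((now, value))
    return triggers


# Each run lists the results found by the search at that time (in seconds).
runs = [
    (0, [{"host": "machine1"}]),                            # machine1 fills up
    (1200, [{"host": "machine1"}, {"host": "machine2"}]),   # machine2 follows
    (HOUR, [{"host": "machine1"}, {"host": "machine2"}]),   # one hour later
]
triggers = simulate_field_throttle(runs, "host", HOUR)
print(triggers)  # [(0, 'machine1'), (1200, 'machine2'), (3600, 'machine1')]
```

At the one-hour mark machine1 triggers again, while machine2 stays throttled because only 40 minutes have passed since its own trigger.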

Best Practices and Limitations

  • Select a field for which there are not too many different values. For example, @id will never be a good choice, since that is unique per event, so it basically means that no throttling would be applied. The only metadata field that in normal cases does not have too many values is @timezone.

  • Currently, there is a limit on how many values are stored (defaults to 100), so you should choose a field that does not have more values than this. If the throttle field assumes more values than this limit, the alert might trigger more frequently than indicated by the given throttle period.
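To see why exceeding the limit can make the alert trigger more often than the throttle period suggests, consider the following sketch. It assumes that when the store of throttled values is full, the oldest entry is dropped; the documentation above does not state how LogScale actually handles the overflow, so the class name and the eviction policy are purely hypothetical.

```python
from collections import OrderedDict


class BoundedThrottle:
    """Remembers at most max_values throttled field values (hypothetical)."""

    def __init__(self, period: int, max_values: int = 100) -> None:
        self.period = period
        self.max_values = max_values
        self._last = OrderedDict()  # field value -> time of last trigger

    def should_trigger(self, value: str, now: int) -> bool:
        last = self._last.get(value)
        if last is not None and now - last < self.period:
            return False  # still throttled
        self._last[value] = now
        self._last.move_to_end(value)
        # Assumed policy: forget the oldest values once the limit is exceeded.
        while len(self._last) > self.max_values:
            self._last.popitem(last=False)
        return True


# With room for only two values, "a" is forgotten when "c" arrives,
# so "a" triggers again long before the one-hour period has passed.
small = BoundedThrottle(period=3600, max_values=2)
print([small.should_trigger(v, t) for t, v in enumerate(["a", "b", "c", "a"])])
# [True, True, True, True]

# With enough room, the repeated "a" is throttled as expected.
large = BoundedThrottle(period=3600, max_values=3)
print([large.should_trigger(v, t) for t, v in enumerate(["a", "b", "c", "a"])])
# [True, True, True, False]
```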

Throttling on Multiple Fields

It is only possible to throttle on a single field. If you need to throttle on multiple fields, you can simply add a new field that concatenates these fields in the alert query.

For example, if your events have a service and a host field, and you want to throttle on the combination of these, you can add a new field in the alert query by adding the following line to it:

logscale
| serviceathost := concat([service, host])

and then throttle on serviceathost.
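One point to keep in mind when building a combined field: plain concatenation can map different field combinations to the same value, in which case they are throttled together. The Python sketch below only illustrates the effect; the helper name and the "@" separator are hypothetical, and whether this matters depends on the values your service and host fields can take.

```python
def throttle_key(service: str, host: str, sep: str = "") -> str:
    """Combine two field values into one throttle value (illustrative)."""
    return sep.join([service, host])


# Without a separator, two different combinations collide.
print(throttle_key("web", "1db") == throttle_key("web1", "db"))  # True

# With a separator that does not occur in the values, they stay distinct.
print(throttle_key("web", "1db", "@"), throttle_key("web1", "db", "@"))
# web@1db web1@db
```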

Relation between Throttle Period and Time Window in Standard Alerts

If your search finds specific events that you want to trigger a Standard Alert on, for example specific errors, you want to set the throttle period to match the time window of the search. If you set the throttle period higher than the time window, you might miss events, and if you set it lower, you might get duplicate alerts.
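The trade-off can be made concrete with a little arithmetic. The sketch below assumes the worst case in which the alert triggers as soon as each throttle period ends, and each triggering search covers exactly one time window; the function name is made up for the illustration.

```python
from typing import Tuple


def window_vs_throttle(window_s: int, throttle_s: int) -> Tuple[int, int]:
    """Return (gap, overlap) in seconds between two consecutive triggers.

    gap:     time span whose events are in neither triggering search
             (events you might miss).
    overlap: time span covered by both triggering searches
             (events you might be alerted about twice).
    """
    gap = max(0, throttle_s - window_s)
    overlap = max(0, window_s - throttle_s)
    return gap, overlap


HOUR = 3600
print(window_vs_throttle(HOUR, HOUR))      # (0, 0): matching settings
print(window_vs_throttle(HOUR, 2 * HOUR))  # (3600, 0): an hour may be missed
print(window_vs_throttle(2 * HOUR, HOUR))  # (0, 3600): an hour may be repeated
```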

If your search involves an aggregate, you might want to set the time window larger than the throttle period in some cases, for example if you want to be notified every hour whether there are more than 5 errors within a 4-hour search window. You probably do not want to set the time window smaller than the throttle period, as this means that there will be events that are never evaluated by the alert. For actions like Email and Slack, you want a higher throttle period, since these actions do not deduplicate.
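The aggregate case can be illustrated with a short simulation of the "more than 5 errors within 4 hours, checked every hour" example. It is a simplified model, not the way LogScale schedules searches: the function name and the sample timestamps are made up, and the alert is assumed to be evaluated exactly once per throttle period.

```python
from typing import List, Tuple

HOUR = 3600  # seconds


def evaluate_aggregate_alert(
    error_times: List[int],
    window_s: int,
    throttle_s: int,
    threshold: int,
    end_s: int,
) -> List[Tuple[int, bool]]:
    """Evaluate once per throttle period whether the error count in the
    trailing time window exceeds the threshold."""
    results = []
    for now in range(throttle_s, end_s + 1, throttle_s):
        count = sum(1 for t in error_times if now - window_s < t <= now)
        results.append((now, count > threshold))
    return results


# Six errors spread over roughly three hours (timestamps in seconds).
errors = [600, 1800, 4000, 7500, 9000, 10000]

# 4-hour window, hourly check: the threshold is exceeded from hour 3 on.
print(evaluate_aggregate_alert(errors, 4 * HOUR, HOUR, 5, 4 * HOUR))
# [(3600, False), (7200, False), (10800, True), (14400, True)]

# 1-hour window, hourly check: no single hour holds more than 5 errors.
print(evaluate_aggregate_alert(errors, HOUR, HOUR, 5, 4 * HOUR))
# [(3600, False), (7200, False), (10800, False), (14400, False)]
```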