Diagnosing Alerts
If errors occur while managing alerts, warnings and errors are reported in the Notifications area of the user interface, and more detailed reports are written to the humio-activity repository (see Monitor Alerts with humio-activity Repository).
Whenever an alert fails, whether because of errors in the query that triggers it or because of how an action is configured, error notifications are sent. There is at most one notification per alert.
Errors or warnings are generated at different points in the execution depending on the alert type, and cleared using different criteria:
Condition | Standard Alert
---|---
Query start | Yes, per alert
Query poll | Yes, per alert
Time Window | Yes, per alert
Trigger action | Yes, per action against the aggregate result
Error/Warning Cleared | When the next invocation of the corresponding phase succeeds
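The clearing rule in the table can be pictured as a small per-alert error store. The following is a minimal Python sketch under stated assumptions, not LogScale's implementation: the `Phase` enum, the `AlertErrorState` class and its `record`/`notification` methods are hypothetical names, and keying trigger-action errors by action name is an assumption drawn from the "per action" entry.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Tuple


class Phase(Enum):
    # Phases listed in the table (hypothetical enum, not a LogScale type).
    QUERY_START = "query start"
    QUERY_POLL = "query poll"
    TIME_WINDOW = "time window"
    TRIGGER_ACTION = "trigger action"


@dataclass
class AlertErrorState:
    """Hypothetical per-alert error store; models the documented rule only."""

    # (phase, action name or None) -> latest error message for that phase
    errors: Dict[Tuple[Phase, Optional[str]], str] = field(default_factory=dict)

    def record(self, phase: Phase, ok: bool, message: str = "",
               action: Optional[str] = None) -> None:
        key = (phase, action)
        if ok:
            # A successful run of a phase clears only that phase's error.
            self.errors.pop(key, None)
        else:
            self.errors[key] = message

    def notification(self) -> Optional[str]:
        # At most one notification per alert: surface the earliest still-active error.
        return next(iter(self.errors.values()), None)
```

Keying on the phase means that, for example, a successful query start does not clear an error left by a failed query poll.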
Because aggregate and filter alerts by default retry sending events to actions for up to 24 hours, failure notifications keep reappearing in the Notifications area for each failed alert for as long as the error remains on the alert.
An error with an action in a filter alert is only shown in the UI if the alert has not successfully triggered on the failing event or on a later event; if a later action succeeds, the error is cleared and no indication is given.
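The visibility rule for filter-alert action errors can be written as a predicate over the alert's trigger attempts. This sketch assumes a hypothetical `ActionAttempt` record whose `event_index` orders events (later events have larger indexes); it models only the documented rule, not how LogScale actually stores attempts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ActionAttempt:
    event_index: int  # hypothetical: position of the triggering event, later = larger
    succeeded: bool


def action_error_visible(attempts: List[ActionAttempt]) -> bool:
    """True if a failed action should still be shown in the UI.

    A failure on event n stays visible only while there is no successful
    trigger on event n or any later event.
    """
    failures = [a.event_index for a in attempts if not a.succeeded]
    if not failures:
        return False
    # If any failure is still visible, the most recent failure is too,
    # so checking the latest failure is sufficient.
    last_failure = max(failures)
    successes = [a.event_index for a in attempts if a.succeeded]
    return not any(s >= last_failure for s in successes)
```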
When analyzing errors and warnings for alerts, the following additional factors should be taken into consideration:
Errors that occur when running an alert are stored and also set on the alert itself, so that they can be seen on the properties overview page.
Errors in standard alerts with multiple actions attached. If some of the actions fail to run, the failures are logged, but no error is set on the alert. The alert is considered to have fired and is throttled as normal. It is only considered an error if all actions fail.
Warnings aimed at discouraging queries that include a live join() function in standard alerts. For more information, see Errors when Using Live join() Functions. (Standard alerts only)
Behavior affecting only v1.93-1.111:
Transient errors. Many query warnings may appear on alert queries as errors at startup, but disappear after a while. Such warnings may indicate that LogScale is trying to catch up on ingested data; because of this, the default behavior is not to fire an alert while the alert query reports warnings, and instead to wait for the warnings to go away. See ALERT_DESPITE_WARNINGS.
Errors that require some user interaction, for instance a warning about too many groups in a groupBy() function invocation in the alert query. (Standard alerts only)
Errors due to the alert query returning only partial results, which may trigger the alert when it should not have been triggered, or cause the alert to return only some of the events it would otherwise have returned. (Standard alerts only)
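Two of the rules above, the multi-action outcome for standard alerts and the default of not firing while the alert query reports warnings, can be modeled as plain functions. This is a hypothetical sketch: `evaluate_trigger`, `should_fire`, `TriggerOutcome`, and the `alert_despite_warnings` flag (standing in for the ALERT_DESPITE_WARNINGS setting) are illustrative names, and treating a run in which every action fails as "not fired" is an assumption the text does not state.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TriggerOutcome:
    fired: bool        # alert counts as fired (and is throttled as normal)
    error_set: bool    # an error is set on the alert
    logged_failures: List[str] = field(default_factory=list)


def evaluate_trigger(action_results: Dict[str, bool]) -> TriggerOutcome:
    """action_results maps a hypothetical action name to True (ran) or False (failed)."""
    if not action_results:
        raise ValueError("expected at least one attached action")
    failed = [name for name, ok in action_results.items() if not ok]
    all_failed = len(failed) == len(action_results)
    # Partial failures are only logged; the alert still counts as fired.
    # Assumption: when every action fails, the alert is treated as not fired.
    return TriggerOutcome(fired=not all_failed, error_set=all_failed,
                          logged_failures=failed)


def should_fire(has_results: bool, query_warnings: List[str],
                alert_despite_warnings: bool = False) -> bool:
    """Default: hold off while the alert query reports warnings.

    alert_despite_warnings is a hypothetical stand-in for ALERT_DESPITE_WARNINGS.
    """
    if query_warnings and not alert_despite_warnings:
        return False
    return has_results
```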