Diagnosing Alerts
If errors occur while managing alerts, warnings and errors are reported through the user interface, with more detailed reports available in the humio-activity repository.
Errors or warnings are generated at different points in the execution depending on the alert type, and cleared using different criteria:
Condition | Standard Alert | Filter Alert |
---|---|---|
Query start | Yes, per alert | Yes, per alert |
Query poll | Yes, per alert | Yes, per alert |
Time Window | Yes, per alert | No |
Trigger action | Yes, per action against the aggregate result | Yes, per action and event; individual events track both whether they were triggered and whether an action was started successfully. |
Error/Warning Cleared | When the next invocation of the corresponding phase succeeds | Errors with starting or polling the query are cleared when that step later succeeds, as for standard alerts. Action errors are cleared when the same event or a later event successfully triggers. |
This is important because an action error in a filter alert is only shown in the UI while the alert has not successfully triggered on the failing event or a later event; if a later action succeeds, the error is cleared and no indication is given.
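The clearing rule for filter alert action errors can be modeled with a small Python sketch. This is an illustration only, not LogScale's implementation: the class `FilterAlertActionState`, its `record` method, and the event IDs and message are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilterAlertActionState:
    """Hypothetical model of the action error shown on a filter alert."""

    current_error: Optional[str] = None

    def record(self, event_id: str, succeeded: bool, message: str = "") -> None:
        if succeeded:
            # A successful trigger on this or a later event clears the error.
            self.current_error = None
        else:
            # A failed action sets (or replaces) the visible error.
            self.current_error = f"{event_id}: {message}"


state = FilterAlertActionState()
state.record("event-1", succeeded=False, message="webhook timeout")
error_after_failure = state.current_error

state.record("event-2", succeeded=True)
error_after_success = state.current_error
```

In this model the failure on `event-1` is no longer visible once `event-2` triggers successfully, which is why the detailed reports in the humio-activity repository remain the place to look for past action failures.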
When analyzing errors and warnings for alerts, the following additional factors should be taken into consideration:
Errors when running an alert will be stored and also set on the alert as an error, so that they can be seen on the properties' overview page.
Errors in standard alerts where multiple actions have been attached: if some of the actions fail to run, this is logged, but no error is set on the alert. The alert is considered to have fired and is throttled as normal. It is only considered an error if all actions fail.
For filter alerts, this information is tracked for each event.
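The rule for standard alerts with several actions can be summarized in a short sketch. It assumes nothing about LogScale internals; `standard_alert_outcome` and its result keys are hypothetical, and the handling of an alert with no actions (no error) is an assumption made only so the function is total.

```python
from typing import Dict, List


def standard_alert_outcome(action_succeeded: List[bool]) -> Dict[str, object]:
    """Summarize one firing of a standard alert with several actions.

    Each list entry says whether the corresponding action ran successfully.
    """
    failures = action_succeeded.count(False)
    # An error is set on the alert only when every attached action failed.
    all_failed = bool(action_succeeded) and failures == len(action_succeeded)
    return {
        "failures_logged": failures,       # individual failures are still logged
        "error_set_on_alert": all_failed,  # partial failure sets no alert error
    }


partial = standard_alert_outcome([True, False, True])
total = standard_alert_outcome([False, False])
```

Here `partial` records one logged failure without an alert error, while `total` sets the error because both actions failed.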
Warnings aimed at discouraging queries that include a live join() function in standard alerts. For more information, see Errors when Using Live join() Functions. (Standard alerts only)
Behavior affecting only v1.93-1.111:
Transient errors. Many query warnings might appear on alert queries as errors at startup, but they disappear after a while. This may indicate that LogScale is trying to catch up on ingested data; because of this, the default behavior is to not fire an alert if there are warnings from the alert query and instead wait for the warning to go away. See ALERT_DESPITE_WARNINGS.
Errors that require some user interaction, for instance a warning on too many groups in a groupBy() function invocation in the alert query. (Standard alerts only)
Errors due to the alert query only returning partial results, which may trigger the alert when it should not have been triggered, or make the alert only return some of the events it would otherwise have returned. (Standard alerts only)
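The default handling of transient warnings described above can be sketched as a simple decision function. This is a hedged model, not LogScale code: `should_fire` is a hypothetical helper, and its `alert_despite_warnings` parameter is assumed to mirror the effect of enabling the ALERT_DESPITE_WARNINGS setting.

```python
from typing import List


def should_fire(
    query_matched: bool,
    warnings: List[str],
    alert_despite_warnings: bool = False,
) -> bool:
    """Decide whether an alert fires for one evaluation of its query."""
    if warnings and not alert_despite_warnings:
        # Default: hold back and wait for the query warnings to go away.
        return False
    return query_matched


held_back = should_fire(True, ["query is catching up on ingest"])
forced = should_fire(True, ["query is catching up on ingest"], alert_despite_warnings=True)
clean = should_fire(True, [])
```

With the default, a matching query that still carries warnings does not fire; it fires once the warnings disappear or when the override is enabled. The warning text in the example is made up for illustration.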