Add docs about custom threshold alert details page (#3871)
---------

Co-authored-by: Maryam Saeidi <[email protected]>
dedemorton and maryam-saeidi authored May 15, 2024
1 parent 397a493 commit cb7ccef
Showing 5 changed files with 55 additions and 2 deletions.
1 change: 1 addition & 0 deletions docs/en/observability/index.asciidoc
@@ -172,6 +172,7 @@ include::create-alerts.asciidoc[leveloffset=+1]
include::aggregation-options.asciidoc[leveloffset=+2]
include::view-observability-alerts.asciidoc[leveloffset=+2]
include::triage-slo-burn-rate-breaches.asciidoc[leveloffset=+3]
include::triage-threshold-breaches.asciidoc[leveloffset=+3]

//SLOs
include::slo-overview.asciidoc[leveloffset=+1]
6 changes: 4 additions & 2 deletions docs/en/observability/triage-slo-burn-rate-breaches.asciidoc
@@ -9,8 +9,8 @@ When this happens, you are at risk of exhausting your error budget and violating

To triage issues quickly, go to the alert details page:

. Go to **{observability}** -> **Alerts** (or open the SLO and click **Alerts**.)
. From the Alerts table, click the image:images/icons/boxesHorizontal.svg[More actions icon] icon next to the alert and select **View alert details**.
. Go to **{observability}** → **Alerts** (or open the SLO and click **Alerts**).
. From the Alerts table, click the image:images/icons/boxesHorizontal.svg[More actions] icon next to the alert and select **View alert details**.

The alert details page shows information about the alert, including when the alert was triggered,
the duration of the alert, the source SLO, and the rule that triggered the alert.
@@ -37,3 +37,5 @@ After investigating the alert, you may want to:
* Click **Snooze the rule** to snooze notifications for a specific time period or indefinitely.
* Click the image:images/icons/boxesVertical.svg[Actions] icon and select **Add to case** to add the alert to a new or existing case. To learn more, refer to <<create-cases>>.
* Click the image:images/icons/boxesVertical.svg[Actions] icon and select **Mark as untracked**.
When an alert is marked as untracked, actions are no longer generated.
You can choose to move active alerts to this state when you disable or delete rules.
48 changes: 48 additions & 0 deletions docs/en/observability/triage-threshold-breaches.asciidoc
@@ -0,0 +1,48 @@
[[triage-threshold-breaches]]
= Triage threshold breaches
++++
<titleabbrev>Threshold breaches</titleabbrev>
++++

Threshold breaches occur when an {observability} data type reaches or exceeds the threshold set in your <<custom-threshold-alert,custom threshold rule>>.
For example, you might have a custom threshold rule that triggers an alert when the total number of log documents with a log level of `error` reaches 100.

To triage issues quickly, go to the alert details page:

. Go to **{observability}** → **Alerts**.
. From the Alerts table, click the image:images/icons/boxesHorizontal.svg[More actions] icon next to the alert and select **View alert details**.

The alert details page shows information about the alert, including when the alert was triggered,
the duration of the alert, and the last status update.
If there is a "group by" field specified in the rule, the page also includes the source.
You can follow the links to navigate to the rule definition.

Explore charts on the page to learn more about the threshold breach:

[role="screenshot"]
image::images/log-threshold-breach.png[Alert details for log threshold breach]

* The page includes a chart for each condition specified in the rule.
These charts help you understand when the breach occurred and its severity.
* If your rule is intended to detect log threshold breaches
(that is, it has a single condition that uses a count aggregation),
you can run a log rate analysis, assuming you have the required license.
Running a log rate analysis is useful for detecting significant dips or spikes in the number of logs.
Notice that you can adjust the baseline and deviation, and then run the analysis again.
For more information about using the log rate analysis feature,
refer to the {kibana-ref}/xpack-ml-aiops.html#log-rate-analysis[AIOps Labs] documentation.
* The page may also include an alerts history chart that shows the number of triggered alerts per day for the last 30 days.
This chart is currently only available for rules that specify a single condition.
* Timelines on the page are annotated to show when the threshold was breached.
You can hover over an alert icon to see the timestamp of the alert.

Analyze these charts to better understand when the breach started, its current
state, and how the issue is trending.

After investigating the alert, you may want to:

* Click **Snooze the rule** to snooze notifications for a specific time period or indefinitely.
* Click the image:images/icons/boxesVertical.svg[Actions] icon and select **Add to case** to add the alert to a new or existing case. To learn more, refer to <<create-cases>>.
* Click the image:images/icons/boxesVertical.svg[Actions] icon and select **Mark as untracked**.
When an alert is marked as untracked, actions are no longer generated.
You can choose to move active alerts to this state when you disable or delete rules.
2 changes: 2 additions & 0 deletions docs/en/observability/view-observability-alerts.asciidoc
@@ -34,6 +34,8 @@ An alert is "Active" when the condition defined in the rule currently matches.
An alert has "Recovered" when that condition, which previously matched, is currently no longer matching.
An alert is "Untracked" when its corresponding rule is disabled or you mark the alert as untracked.
To mark the alert as untracked, go to the Alerts table, click the image:images/icons/boxesHorizontal.svg[More actions] icon to expand the "More actions" menu, and click *Mark as untracked*.
When an alert is marked as untracked, actions are no longer generated.
You can choose to move active alerts to this state when you disable or delete rules.

NOTE: There is also a "Flapping" status, which means the alert is switching repeatedly between active and recovered states.
This status is possible only if you have enabled alert flapping detection.