Skip to content

Commit

Permalink
[8.16](backport #4345) [synthetics] Update monitor status rule docs (#…
Browse files Browse the repository at this point in the history
…4424)

* [synthetics] Update monitor status rule docs (#4345)

* document synthetics monitor status rule

* document how to move from the uptime rule to the synthetics rule

* add code comments

* address feedback from @paulb-elastic

* add info on using slos for availability

* add link

* delete table

* port to serverless

(cherry picked from commit 08cf555)

# Conflicts:
#	docs/en/serverless/serverless-observability.docnav.json

* Delete docs/en/serverless directory

---------

Co-authored-by: Colleen McGinnis <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
  • Loading branch information
3 people authored Oct 23, 2024
1 parent c56683d commit 66ace86
Show file tree
Hide file tree
Showing 7 changed files with 257 additions and 7 deletions.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
264 changes: 257 additions & 7 deletions docs/en/observability/monitor-status-alert.asciidoc
Original file line number Diff line number Diff line change
@@ -1,14 +1,158 @@
[[monitor-status-alert]]
= Create a monitor status rule

++++
<titleabbrev>Monitor status</titleabbrev>
++++

There are two types of monitor status rules:

* <<monitor-status-alert-synthetics>> for use with <<monitor-uptime-synthetics,Elastic Synthetics>>.
* deprecated:[8.15.0] <<monitor-status-alert-uptime>> for use with the Uptime app.
+
[WARNING]
====
*The Uptime app and the Uptime monitor status rule are deprecated as of version 8.15.0.*
If you are using the Uptime monitor status rule with the Uptime app, you should migrate the Uptime monitor and the Uptime monitor status rule to Elastic Synthetics and the Synthetics monitor rule.
If you are using the Uptime monitor status rule with a monitor created with Elastic Synthetics, you should migrate the Uptime monitor status rule to the Synthetics monitor rule.
Learn how in <<migrate-monitor-rule>>.
====

[discrete]
[[monitor-status-alert-synthetics]]
== Synthetics monitor status

Within the Synthetics UI, create a **Monitor Status** rule to receive notifications
based on errors and outages.

. To access this page, go to **{observability}** → **Synthetics**.
. At the top of the page, click **Alerts and rules** → **Create rule**.
. Select **Monitor status rule**.

[discrete]
[[synthetic-monitor-filters]]
=== Filters

The *Filter by* section controls the scope of the rule.
The rule will only check monitors that match the filters defined in this section.
In this example, the rule will only alert on `browser` monitors located in `Asia/Pacific - Japan`.

[role="screenshot"]
image::images/synthetic-monitor-filters.png[Filter by section of the Synthetics monitor status rule,width=600]

[discrete]
[[synthetic-monitor-conditions]]
=== Conditions

Conditions for each rule will be applied to all monitors that match the filters in the <<synthetic-monitor-filters,*Filter by* section>>.
You can choose the number of times the monitor has to be down relative to either a number of checks run
or a time range in which checks were run, and the minimum number of locations the monitor must be down in.

[NOTE]
====
Retests are included in the number of checks.
====

The *Rule schedule* defines how often to evaluate the condition. Note that checks are queued, and they run as close
to the defined value as capacity allows. For example, if a check is scheduled to run every 2 minutes, but the check
takes longer than 2 minutes to run, a check will not run until the previous check has finished.

You can also set *Advanced options* such as the number of consecutive runs that must meet the rule conditions before
an alert occurs.

In this example, the conditions will be met any time a `browser` monitor is down `3` of the last `5` times
the monitor ran across any locations that match the filter. These conditions will be evaluated every minute,
and you will only receive an alert when the conditions are met three times consecutively.

[role="screenshot"]
image::images/synthetic-monitor-conditions.png[Filters and conditions defining a Synthetics monitor status rule,width=600]

[discrete]
[[synthetic-monitor-action-types]]
=== Action types

Extend your rules by connecting them to actions that use the following supported built-in integrations.

include::../shared/alerting-connectors.asciidoc[width=600]

After you select a connector, you must set the action frequency.
You can choose to create a summary of alerts on each check interval or on a custom interval.
For example, send email notifications that summarize the new, ongoing, and recovered alerts each hour:

[role="screenshot"]
image::images/synthetic-monitor-action-types-summary.png[width=600]

Alternatively, you can set the action frequency such that you choose how often the action runs
(for example, at each check interval, only when the alert status changes, or at a custom action interval).
In this case, you must also select the specific threshold condition that affects when actions run:
the _Synthetics monitor status_ changes or when it is _Recovered_ (went from down to up).

[role="screenshot"]
image::images/synthetic-monitor-action-types-each-alert.png[width=600]

You can also further refine the conditions under which actions run by specifying that actions only run
when they match a KQL query or when an alert occurs within a specific time frame:

* *If alert matches query*: Enter a KQL query that defines field-value pairs or query conditions that must
be met for notifications to send. The query only searches alert documents in the indices specified for the rule.
* *If alert is generated during timeframe*: Set timeframe details. Notifications are only sent if alerts are
generated within the timeframe you define.

[role="screenshot"]
image::images/synthetic-monitor-action-types-more-options.png[width=600]

[discrete]
[[synthetic-monitor-action-variables]]
==== Action variables

Use the default notification message or customize it.
You can add more context to the message by clicking the icon above the message text box
and selecting from a list of available variables.

[role="screenshot"]
image::images/synthetic-monitor-action-variables.png[width=600]

The following variables are specific to this rule type.
You an also specify {kibana-ref}/rule-action-variables.html[variables common to all rules].

`context.checkedAt`:: Timestamp of the monitor run.
`context.hostName`:: Hostname of the location from which the check is performed.
`context.lastErrorMessage`:: Monitor last error message.
`context.locationId`:: Location id from which the check is performed.
`context.locationName`:: Location name from which the check is performed.
`context.locationNames`:: Location names from which the checks are performed.
`context.message`:: A generated message summarizing the status of monitors currently down.
`context.monitorId`:: ID of the monitor.
`context.monitorName`:: Name of the monitor.
`context.monitorTags`:: Tags associated with the monitor.
`context.monitorType`:: Type (for example, HTTP/TCP) of the monitor.
`context.monitorUrl`:: URL of the monitor.
`context.reason`:: A concise description of the reason for the alert.
`context.recoveryReason`:: A concise description of the reason for the recovery.
`context.status`:: Monitor status (for example, "down").
`context.viewInAppUrl`:: Open alert details and context in Synthetics app.

[discrete]
[[monitor-status-alert-uptime]]
== Uptime monitor status

[WARNING]
====
*The Uptime app and the Uptime monitor status rule are deprecated as of version 8.15.0.*
If you are using the Uptime monitor status rule with the Uptime app, you should migrate the Uptime monitor and the Uptime monitor status rule to Elastic Synthetics and the Synthetics monitor rule.
If you are using the Uptime monitor status rule with a monitor created with Elastic Synthetics, you should migrate the Uptime monitor status rule to the Synthetics monitor rule.
Learn how in <<migrate-monitor-rule>>.
====

Within the {uptime-app}, create a **Monitor Status** rule to receive notifications
based on errors and outages.

. To access this page, go to **{observability}** -> **Uptime**.
. At the top of the page, click **Alerts and rules** -> **Create rule**.
. To access this page, go to **{observability}** **Uptime**.
. At the top of the page, click **Alerts and rules** **Create rule**.
. Select **Monitor status rule**.

[TIP]
Expand All @@ -18,7 +162,7 @@ If you already have a query in the overview page search bar, it's populated here

[discrete]
[[status-alert-conditions]]
== Conditions
=== Conditions

You can specify the following thresholds for your rule.

Expand All @@ -45,7 +189,7 @@ the alert is triggered.

[discrete]
[[action-types-status]]
== Action types
=== Action types

You can extend your rules by connecting them to actions that use the following
supported built-in integrations. Actions are {kib} services or integrations with
Expand All @@ -69,7 +213,7 @@ image::images/uptime-run-when-selection.png[Action frequency for each alert,widt

[discrete]
[[action-variables-status]]
== Action variables
=== Action variables

Use the default notification message or customize it.
You can add more context to the message by clicking the icon above the message text box
Expand All @@ -80,9 +224,115 @@ image::images/monitor-status-alert-default-message.png[Default notification mess

[discrete]
[[recovery-variables-status]]
== Alert recovery
=== Alert recovery

To receive a notification when the alert recovers, select *Run when Recovered*. Use the default notification message or customize it. You can add more context to the message by clicking the icon above the message text box and selecting from a list of available variables.

[role="screenshot"]
image::images/monitor-status-alert-recovery.png[Default recovery message for monitor status rules with open "Add variable" popup listing available action variables,width=600]
image::images/monitor-status-alert-recovery.png[Default recovery message for monitor status rules with open "Add variable" popup listing available action variables,width=600]

[discrete]
[[migrate-monitor-rule]]
== Migrate from the Uptime rule to the Synthetics rule

If you are currently using the Uptime monitor status with a monitor created with Elastic Synthetics,
you should migrate the Uptime monitor status rule to:

* The *Synthetics monitor rule* for <<migrate-monitor-rule-synthetics-rule,synthetic monitor _status_ checks>>.
* The *Synthetics availability SLI* for <<migrate-monitor-rule-synthetics-sli,synthetic monitor _availability_ checks>>.

[discrete]
[[migrate-monitor-rule-synthetics-rule]]
=== Uptime status check to Synthetics monitor rule

[discrete]
==== Filters

The KQL syntax that you used in the Uptime monitor status rule is also valid in the *Filter by* section
of the Synthetics monitor status rule. The Synthetics monitor status rule also offers dropdowns for several
categories for easy filtering. However, you can still use KQL syntax for these categories if you prefer.

[discrete]
==== Conditions

[NOTE]
====
If you are using the _Uptime availability condition_ refer to <<migrate-monitor-rule-synthetics-sli>>.
====

If you're using the Uptime status check condition, you can recreate similar effects using the following
Synthetics monitor status rule condition equivalents:

|===
| | Uptime | Synthetics equivalent

| *Number of times the monitor is down*
a| `ANY MONITOR IS DOWN >=` `{number}` times

*Example*: `ANY MONITOR IS DOWN >=` `5` times
a| `IS DOWN` `{number}` times

*Example*: `IS DOWN` `5` times

| *Timeframe*
a| `WITHIN last` `{number}` `{time range unit}`

*Example*: `WITHIN last` `15` `minutes`

a| `WITHIN THE LAST` `{number}` `{time range unit}`

*Example*: `WITHIN THE LAST` `15` `minutes`

|===

[discrete]
==== Actions

The default messages for the Uptime monitor status rule and Synthetics monitor status rule are different,
but you can recreate similar messages using <<synthetic-monitor-action-variables,Synthetics monitor status rule action variables>>.

[discrete]
[[migrate-monitor-rule-synthetics-sli]]
=== Uptime availability check to Synthetics availability SLI

SLOs allow you to set clear, measurable targets for your service performance, based on factors like availability.
The <<synthetics-availability-sli,Synthetics availability SLI>> is a service-level indicator (SLI) based on the
availability of your synthetic monitors.

[discrete]
==== Filters

The KQL syntax that you used in the Uptime monitor status rule is also valid in the *Query filter*
field of the Synthetics availability SLI.

[discrete]
==== Conditions

Use the following Synthetics availability SLI fields to replace the Uptime monitor status rule's
availability conditions:

|===
| | Uptime | Synthetics equivalent

| *Number of checks that are down relative to all checks run*
a| `ANY MONITOR IS UP IN <` `{percent}` of checks

*Example*: `ANY MONITOR IS UP IN <` `90%` of checks
a| *Target / SLO (%)* field

*Example*: `90%`

| *Timeframe*
a| `WITHIN THE LAST` `{number}` `{time range unit}`

*Example*: `WITHIN THE LAST` `30` `days`
a| *Time window* and *Duration* fields

*Example*: Time window: `Rolling`, Duration: `30 days`
|===

[discrete]
==== Actions

After creating a new SLO using the Synthetics availability SLI, you can use the SLO burn rate rule.
For more information about configuring the rule, see <<slo-burn-rate-alert,Create an SLO burn rate rule>>.

0 comments on commit 66ace86

Please sign in to comment.