updating the alert-conditions document #19383

Open · wants to merge 7 commits into develop
@@ -229,6 +229,28 @@ For all methods except for our guided mode, the process for creating an alert co
>
The [lost signal threshold](/docs/alerts/create-alert/create-alert-condition/create-nrql-alert-conditions/#signal-loss) determines how long to wait before considering a missing signal lost. If the signal doesn't return within that time, you can choose to open a new incident or close any related ones. You can also choose to skip opening an incident when a signal is expected to terminate. Set the threshold based on your system's expected behavior and data collection frequency. For example, if a website experiences a complete loss of traffic (throughput), the corresponding telemetry data sent to New Relic will also cease. Monitoring for this loss of signal can serve as an early warning system for such outages.
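
For instance, a minimal sketch of such a signal (assuming a `Transaction` event type and a placeholder `appName` value; substitute your own data) might look like this:

```sql
// Hypothetical throughput signal: the event type and appName value are placeholders for your own data.
// If the site stops serving traffic, no matching events arrive and the signal goes quiet,
// so the lost signal threshold decides whether to open a new incident or close existing ones.
SELECT count(*) FROM Transaction WHERE appName = 'My Website'
```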
</Collapser>

<Collapser
id="alert-condition-optimization"
title="Alert condition optimization"
>
Consider removing the “filter” clause and instead adding a “loss of signal” threshold setting to close any open incidents after the signal is lost (see the screenshot below for an example). This will ensure that:

1. Any open incidents related to “count is above the threshold for too long” are closed in a timely manner.
2. The entity status color will return to green (not gray).

<Callout variant="tip">
The entity status color remains gray until the entity has had at least one incident opened against it.
</Callout>

* Avoid broad queries targeting large amounts of data; use `WHERE` filters to reduce the number of events scanned.
* Filter in the `WHERE` clause rather than inside the `SELECT` statement where possible. Alert conditions match incoming data points against every part of the query except the `SELECT` clause, so `WHERE` filters reduce the data that reaches the condition (see the sketch after this list).
* Remove alert conditions that trigger issues but do not send notifications (noise/non-actionable).
* Remove noisy alert conditions (conditions that notify very frequently or whose incidents stay open for less than 5 minutes).
* Combine duplicate conditions targeting the same entities or signals.
* Validate whether a sliding window is truly needed. When enabled, sliding windows can cause data points to match multiple overlapping time windows, increasing CCU.
* In some cases, it is possible to create events-to-metrics rules that streamline data and reduce its volume. If you need to use logs for a simple up/down alert, consider converting that signal to a metric first.
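
To illustrate the `WHERE` versus `SELECT` filtering point above, here is a simplified sketch comparing two ways of counting errors (the `Transaction` event type and `error` attribute are examples, not prescriptions):

```sql
// Less efficient: every Transaction event reaches the condition, and the error
// filtering only happens inside the SELECT aggregation.
SELECT filter(count(*), WHERE error IS true) FROM Transaction

// More efficient: the WHERE clause limits which events the condition evaluates at all.
// Note that when no events match, the signal disappears, so pair this with a lost
// signal threshold if you need open incidents to close automatically.
SELECT count(*) FROM Transaction WHERE error IS true
```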
</Collapser>
</CollapserGroup>
</Step>
