From c8218c8ee89f71d4ae10b1e081de3a5539cebc38 Mon Sep 17 00:00:00 2001 From: nbaenam Date: Wed, 16 Oct 2024 12:57:24 +0200 Subject: [PATCH] fix(Alerts): Improving the alert conditions section --- .../create-alert-conditions.mdx | 383 ++++++++ .../create-nrql-conditions.mdx | 882 ++++++++++++++++++ src/nav/alerts.yml | 96 +- ...alert-policies-create-alert-condition.webp | Bin 0 -> 98374 bytes ...hot-crop_chart-create-alert-condition.webp | Bin 0 -> 69896 bytes ...screenshot-crop_condition-add-details.webp | Bin 0 -> 102000 bytes ...ot-crop_conditions-guided-mode-option.webp | Bin 0 -> 24450 bytes ...enshot-crop_conditions-set-thresholds.webp | Bin 0 -> 108618 bytes ...hot-crop_conditions-writing-own-query.webp | Bin 0 -> 24634 bytes 9 files changed, 1323 insertions(+), 38 deletions(-) create mode 100644 src/content/docs/alerts/alert-conditions/create-alert-conditions.mdx create mode 100644 src/content/docs/alerts/alert-conditions/create-nrql-conditions.mdx create mode 100644 static/images/alerts_screenshot-crop_alert-policies-create-alert-condition.webp create mode 100644 static/images/alerts_screenshot-crop_chart-create-alert-condition.webp create mode 100644 static/images/alerts_screenshot-crop_condition-add-details.webp create mode 100644 static/images/alerts_screenshot-crop_conditions-guided-mode-option.webp create mode 100644 static/images/alerts_screenshot-crop_conditions-set-thresholds.webp create mode 100644 static/images/alerts_screenshot-crop_conditions-writing-own-query.webp diff --git a/src/content/docs/alerts/alert-conditions/create-alert-conditions.mdx b/src/content/docs/alerts/alert-conditions/create-alert-conditions.mdx new file mode 100644 index 00000000000..d6612d88ccb --- /dev/null +++ b/src/content/docs/alerts/alert-conditions/create-alert-conditions.mdx @@ -0,0 +1,383 @@ +--- +title: Create alert conditions +tags: + - Alerts + - Alert conditions +translate: + - jp +metaDescription: "Use the conditions page to identify what triggers an alert policy's notification, starting with the product and type of metric or service." +redirects: + - /docs/alerts-applied-intelligence/new-relic-alerts/get-started/your-first-nrql-condition + - /docs/alerts/alert-policies/configuring-alerts/managing-your-alerts + - /docs/alerts-applied-intelligence/new-relic-alerts/alert-conditions/alert-conditions + - /docs/alerts-applied-intelligence/new-relic-alerts/advanced-alerts/advanced-techniques/select-product-targets-alert-condition + - /docs/alerts/create-alert/create-alert-condition/create-alert-conditions + - /docs/alerts/create-alert/create-alert-condition/update-or-disable-policies-conditions + - /docs/alerts/new-relic-alerts-beta/configuring-alert-policies/define-alert-conditions +freshnessValidatedDate: never +--- + +An alert condition is the core element that defines when an [incident](/docs/alerts-applied-intelligence/new-relic-alerts/advanced-alerts/understand-technical-concepts/incident-event-attributes/#definition) is created. It acts as the essential starting point for building any meaningful alert. Alert conditions contain the parameters or thresholds met before you're informed. They can mitigate excessive alerting or tell your team when new or unusual behavior appears. + +An alert condition is a continuously running query that measures a given set of events against a defined threshold and opens an [incident](/docs/alerts-applied-intelligence/new-relic-alerts/alert-policies/specify-when-alerts-create-incidents/) when the threshold is met for a specified window of time. 
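For example, a condition might continuously run a query like the following and open an incident whenever the result crosses the threshold you set for the time window you choose. This is only an illustrative sketch; the application name is a placeholder:

```sql
SELECT average(duration)
FROM Transaction
WHERE appName = 'MyApp'
```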
There are a lot of ways to create an alert condition. You can create an alert condition from:

* A [chart](#create-chart)
* Alert [policies](#create-policy)
* The [**Alert coverage gaps**](/docs/alerts/create-alert/alert-coverage-gaps/#create-an-alert) option
* The [**Use guided mode**](#create-guided-mode) option in the UI
* The [**Write your own query**](#create-own-query) option in the UI

Create from a chart

Create from alert policies

Create from the guided mode

Create by writing your own query

You can create an alert condition from pre-existing NRQL queries that are part of a chart.

To create a new alert condition from a chart, follow these steps:

1. Go to **[one.newrelic.com > All capabilities](https://one.newrelic.com/all-capabilities) > Dashboards** and select a dashboard.

2. Find the chart you want to use to create your alert condition, click the icon in the right corner of the chart, and select **Create alert condition**.

   Create an alert condition from a chart

3. The create new alert condition page opens with the chart's query already in place. Click **Run**.

4. Review your NRQL query and click **Next**.

You can create a new alert condition from alert policies.

To create an alert condition from alert policies, follow these steps:

1. Go to **[one.newrelic.com > All capabilities](https://one.newrelic.com/all-capabilities) > Alerts**.

2. Select **Alert Policies** in the left navigation.

3. Click **+ New alert condition**.

   Create an alert condition from alert policies

4. Select one of these options:

   * **[Use guided mode](#create-guided-mode)**
   * **[Write your own query](#create-own-query)**

When you create a new alert condition using the guided mode, you'll choose from several options, and we'll build your query from the options you select. We recommend this option if you'd rather not write a NRQL query from scratch.

To create an alert condition using the guided mode, follow these steps:

1. Go to **[one.newrelic.com > All capabilities](https://one.newrelic.com/all-capabilities) > Alerts**.

2. Select **Alert Conditions** in the left navigation.

3. Click **+ New alert condition**.

4. Select **Use guided mode**.

   Create an alert condition using the guided mode

5. Select the part or parts of your system you want to include in your alert condition.

6. Click **Next**.

7. Select the entities to watch.

8. Select a metric to monitor. Depending on the selected parts of your system, you'll see different metrics. These are the most common:

   * [Golden metrics](/docs/apis/nerdgraph/examples/golden-metrics-entities-nerdgraph-api-tutorial/)
   * Other metrics

   If you selected Host, you'll see these metrics:

   * Golden metrics
   * Host metrics
   * Host not reporting
   * Storage metrics
   * Network metrics
   * Process metrics

   If you selected Synthetic monitors, you'll see these metrics:

   * Median duration (s)
   * Failures

9. Review your NRQL query.

10. Click **Next**.

This option allows you to use NRQL to define your alert from scratch.

To create an alert condition by writing your own query, follow these steps:

1. Go to **[one.newrelic.com > All capabilities](https://one.newrelic.com/all-capabilities) > Alerts**.

2. Select **Alert Conditions** in the left navigation.

3. Click **New alert condition**.

4. Select **Write your own query**.

   Create an alert condition writing your own query

5.
Select the part or parts of your system you want to include in your alert condition.

6. Click **Next**.

7. Write your query and click **Run**.

8. Review your NRQL query and click **Next**.

### Set thresholds for alert conditions [#thresholds]

A [threshold](/docs/new-relic-solutions/get-started/glossary/#alert-threshold) is a value that you define in your alert condition. Thresholds are the rules each alert condition must follow. When this defined value is reached for a specified window of time, an [incident](/docs/new-relic-solutions/get-started/glossary/#alert-incident) is created. An incident means there is a problem with your system and you should investigate.

Set thresholds for alert conditions

Setting the window duration for your alert condition tells New Relic how to group your data. If you're creating an alert condition for a data set that sends a signal to New Relic once every hour, you'd want to set the window duration to something closer to sixty minutes because it'll help spot patterns and unusual behavior. But if you're creating an alert condition for web transaction time and New Relic collects a signal for that data every minute, we'd recommend setting the window duration to one minute.

For your first alert we recommend sticking with our default settings, but as you become more familiar with creating alert conditions, we encourage you to customize these fields based on your own experience.

Throughout the day, data streams from your application into New Relic. Instead of evaluating that data immediately for incidents, alert conditions collect the data over a period of time known as the **aggregation window**. An additional delay allows for slower data points to arrive before the window is aggregated.

Sliding windows are helpful when you need to smooth out "spiky" charts. One common use case is to use sliding windows to smooth line graphs that have a lot of variation over short periods of time in cases where the rolling aggregate is more important than aggregates from narrow windows of time.

We recommend using our sliding window aggregation if you're not expecting to have a steady and consistent stream of data but are expecting some dips and spikes in data.

In general, we recommend using the **event flow** streaming method. This is best for data that comes into your system frequently and steadily. There are specific cases where **event timer** might be a better method to choose, but for your first alert we recommend our default, **event flow**. To better understand which streaming method to choose, see [Streaming alerts: key terms and concepts](/docs/alerts/create-alert/fine-tune/streaming-alerts-key-terms-concepts/#aggregation-methods).

This field indicates how long we need to wait after each data point to make sure we've processed the entire batch.

Note that if your timer is much shorter than your window duration and your data flow is inconsistent, your alerts may not be accurate.

Gap filling lets you customize the values to use when your signals don't have any data. You can fill gaps in your data streams with one of these settings:

* **None**: (Default) Choose this if you don't want to take any action on empty aggregation windows. On evaluation, an empty aggregation window will reset the threshold duration timer.
For example, if a condition says that all aggregation windows must have data points above the threshold for 5 minutes, and 1 of the 5 aggregation windows is empty, then the condition won't open an incident.

* **Custom static value**: Choose this if you'd like to insert a custom static value into the empty aggregation windows before they're evaluated. This option has an additional, required parameter of `fillValue` (as named in the API) that specifies what static value should be used. This defaults to `0`.

* **Last known value**: This option inserts the last seen value before evaluation occurs. We maintain the state of the last seen value for a minimum of 2 hours. If the configured threshold duration is longer than 2 hours, this value is kept for that duration instead.

Evaluation delay is how long we wait before we start evaluating a signal against the thresholds in this condition. You can enable the `Use evaluation delay` flag and delay the evaluation of incoming signals by up to 120 minutes.

When new entities are first deployed, resource utilization on the entity is often unusually high. In autoscale environments this can easily create a lot of false alerts. By delaying the start of alert detection on signals emitted from new entities, you can significantly reduce the number of false alarms associated with deployments in orchestrated or autoscale environments.

Anomaly thresholds are ideal when you're more concerned about deviations from expected patterns than specific numerical values. They enable you to monitor for unusual activity without needing to set predefined limits. New Relic's anomaly detection dynamically analyzes your data over time, adapting thresholds to reflect evolving system behavior.

Setting up anomaly detection:

1. Choose upper or lower:
   * **Upper and lower**: To be alerted about deviations both higher and lower than expected.
   * **Lower only**: To focus solely on unusually low values.
   * **Upper only**: To focus solely on unusually high values.
2. Assign a priority level:
   * **Critical**: Set the priority level to critical for your initial alert to ensure prompt attention to potential issues.
   * **Warning**: Use this level for deviations worth knowing about that don't require an immediate response.
3. Set how long a query must return a value outside the threshold before an incident opens.

You can learn more about priority levels in our [alert condition docs](/docs/alerts-applied-intelligence/new-relic-alerts/advanced-alerts/advanced-techniques/set-thresholds-alert-condition#threshold-levels). You can also check our documentation about [anomaly threshold and model behaviors](/docs/alerts/create-alert/set-thresholds/anomaly-detection/).

Unlike anomaly thresholds, a static threshold doesn't look at your data set as a whole and determine what behavior is unusual based on your system's history. Instead, a static threshold will open an incident whenever your system behaves differently than the criteria that you set.

You need to set the priority level for both anomaly and static thresholds. See the section above for more details.

You can use the **Consider the signal lost after** option to adjust the time window from 30 seconds to 48 hours. The [lost signal threshold](/docs/alerts/create-alert/create-alert-condition/create-nrql-alert-conditions/#signal-loss) determines how long to wait before considering a missing signal lost. If the signal doesn't return within that time, you can choose to open a new incident or close any related ones. You can also choose to skip opening an incident when a signal is expected to terminate.
Set the threshold based on your system's expected behavior and data collection frequency. For example, if a website experiences a complete loss of traffic, or throughput, the corresponding telemetry data sent to New Relic will also cease. Monitoring for this loss of signal can serve as an early warning system for such outages. + + + + + + ### Add alert condition details [#add-details] + + Add alert condition details + + + + A best practice for condition naming involves a structured format that conveys essential information at a glance. Include the following elements in your condition names: + + * **Priority**: Indicate the severity or urgency of the alert, like P1, P2, P3. + * **Signal**: Specify the metric or condition being monitored, like High Avg Latency or Low Throughput. + * **Entity**: Identify the affected system, application, or component, like WebPortal App or Database Server. + + + An example of a well-formed condition name following this structure would be `P2 | High Avg Latency | WebPortal App`. + + + + If you already have a policy you want to connect to an alert condition, then select the existing policy. Learn more about policies [here](/docs/alerts/organize-alerts/create-edit-or-find-alert-policy/). + + If you prefer to create a new policy, you'll have these options: + + * **Policy name**: Type a [meaningful name](/docs/alerts/organize-alerts/create-edit-or-find-alert-policy/#best-practices-policies) for the policy (maximum 64 characters). + + * **Group incidents into issues**: You have to choose an issue preference option. See [Issue preference options](/docs/alerts/organize-alerts/specify-when-alerts-create-incidents/#preference-options) for more information about it. + + + Check the box **Correlate and suppress noise** to enable [correlation](/docs/alerts/organize-alerts/change-applied-intelligence-correlation-logic-decisions/#configure-correlation) for the alert policy and only get notified when you need to take action. + + + + An incident automatically closes when the targeted signal returns to a non-breaching state for the period indicated in the condition's thresholds. This wait time is called the recovery period. + + When an incident closes automatically: + + 1. The closing timestamp is backdated to the start of the recovery period. + 2. The evaluation resets and restarts from when the previous incident ended. + + + All conditions have an incident time limit setting that automatically force-close a long-lasting incident. New Relic automatically defaults to 3 days and recommends that you use our default settings for your first alert. Another way to close an open incident when the signal does not return data is by configuring a [`loss of signal`](/docs/alerts/create-alert/create-alert-condition/create-nrql-alert-conditions/#signal-loss) threshold. + + + + + + A [title template](/docs/alerts/create-alert/condition-details/title-template) is used when incidents are opened by the condition. It overrides the default title. Your title should use handlebars for incident event attributes. For example, `{{conditionName}}` targeting `{{targetName}}` incident. + + You can use the **Description template** field to define a description template with tags and custom attributes such as host name, owning team, and product, to consistently pass useful information downstream. + + + If you'd like to link to a runbook for the condition that triggered the incident, you can add the URL in the runbook URL field. 
+ + + + + + ### Save your alert condition + + Once you've finished, click **Save condition**. You'll see a summary of your alert condition. + + + diff --git a/src/content/docs/alerts/alert-conditions/create-nrql-conditions.mdx b/src/content/docs/alerts/alert-conditions/create-nrql-conditions.mdx new file mode 100644 index 00000000000..e83fb9eb4c3 --- /dev/null +++ b/src/content/docs/alerts/alert-conditions/create-nrql-conditions.mdx @@ -0,0 +1,882 @@ +--- +title: Create NRQL alert conditions +tags: + - Alerts + - Alert conditions +translate: + - jp + - kr +metaDescription: How to define thresholds that trigger alert notifications based on your NRQL queries. +redirects: + - /docs/new-relic-alerts-nrql-alerts + - /docs/new-relic-alerts-alert-nrql-queries + - /docs/alerts/new-relic-alerts/configuring-alert-policies/alert-conditions-nrql-queries + - /docs/alerts/new-relic-alerts/configuring-alert-policies/create-alert-conditions-nrql-queries + - /docs/alerts/new-relic-alerts/defining-conditions/create-alert-conditions-nrql-queries + - /docs/alerts-applied-intelligence/new-relic-alerts/alert-conditions/create-nrql-alert-conditions + - /docs/alerts-applied-intelligence/new-relic-alerts/alert-conditions/queries-nrql_screenshot-full_nrql-alert-conditions/ +freshnessValidatedDate: 2024-08-01 +--- + +We recommend the use of [New Relic Query Language (NRQL)](/docs/nrql/get-started/introduction-nrql-new-relics-query-language/) to create alert conditions. This doc will guide you through formatting and configuring your NRQL alert conditions to maximize efficiency and reduce noise. If you've just started with New Relic, or you haven't created an alert condition yet, we recommend starting with [Alert conditions](/docs/alerts/alert-conditions/create-alert-conditions). + +No matter where you begin creating an alert condition, whether through a chart, from alert policies, or by writing your own query, NRQL is the building block upon which you can define your signal and set your thresholds. + +## NRQL alert syntax [#syntax] + +Here's the basic syntax for creating all NRQL alert conditions. + +```sql +SELECT function(attribute) +FROM Event +WHERE attribute [comparison] [AND|OR ...] +``` + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + **Clause** + + + + **Notes** + +
+ `SELECT function(attribute)` + + + **Required** + + + Supported [functions](/docs/query-your-data/nrql-new-relic-query-language/get-started/nrql-syntax-clauses-functions/#functions) that return numbers include: + + * `apdex` + * `average` + * `count` + * `latest` + * `max` + * `min` + * `percentage` + * `percentile` + * `sum` + * `uniqueCount` + + + If you use the `percentile` aggregator in a faceted alert condition with many facets, this may cause this error: + + `An error occurred while fetching chart data.` + + If you see this error, use `average` instead. + +
+ `FROM data type` + + + **Required** + + + Multiple [data types](/docs/data-apis/understand-data/new-relic-data-types/) can be targeted. + + Supported data types: + + * Events + * `Metric` (RAW data points will be returned) +
+ `WHERE attribute [comparison] [AND|OR ...]` + + Use the `WHERE` clause to specify a series of one or more conditions. All the [operators](/docs/nrql/nrql-syntax-clauses-functions/#sel-where) are supported. It's used for filtering down the data returned in the query. +
`FACET` attribute

Include an optional `FACET` clause in your NRQL syntax depending on the [threshold type](#threshold-types) (static or anomaly).

Use the [`FACET`](/docs/query-your-data/nrql-new-relic-query-language/get-started/nrql-syntax-clauses-functions/#sel-facet) clause to separate your results by attribute and alert on each attribute independently. No `LIMIT` clause is allowed, but all queries will receive the maximum number of facets possible.

Faceted queries can return a maximum of 5000 values for [static and anomaly](#threshold-types) conditions.

If the query returns more than the maximum number of values, the alert condition can't be created. If you create the condition and the query later returns more than this number, the alert will fail. Modify your query so that it returns fewer values.
+ +## Reformatting incompatible NRQL [#reformatting] + +Some elements of NRQL used in charts don't make sense in the context of streaming alerts. Here's a list of the most common incompatible elements and suggestions for reformatting a NRQL alert query to achieve the same effect. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + **Element** + + + + **Notes** + +
+ `SINCE` and `UNTIL` + + Example: + + ```sql + SELECT percentile(largestContentfulPaint, 75) + FROM PageViewTiming + WHERE (appId = 837807) SINCE yesterday + ``` + + NRQL conditions produce a never-ending stream of windowed query results, so the `SINCE` and `UNTIL` keywords to scope the query to a point in time are not compatible. As a convenience, we automatically strip `SINCE` and `UNTIL` from a query when creating a condition from the context of a chart. +
+ `TIMESERIES` + + In NRQL queries, the `TIMESERIES` clause is used to return data as a time series broken out by a specified period of time. + + For NRQL conditions and if not using sliding window aggregation, the equivalent property to `TIMESERIES` is the data aggregation window duration. If you are using sliding window aggregation, the equivalent property is the value of the sliding window aggregation. +
+ `histogram()` + + The `histogram()` aggregation function is used to generate histograms. + + `histogram()` is not compatible with NRQL alerting: histogram aggregations can not be formatted as a time series. To create an alert from a portion of a histogram (for example, 95th percentile), use the [`percentile()`](/docs/query-your-data/nrql-new-relic-query-language/get-started/nrql-syntax-clauses-functions/#func-percentile) aggregation function. +
+ `bytecountestimate()`, `cardinality()` + + These functions are not yet supported for NRQL alerting. +
+ Multiple aggregation functions + + Each condition can only target a single aggregated value. To alert on multiple values simultaneously, you'll need to decompose them into individual conditions within the same policy. + + Original query: + + ```sql + SELECT count(foo), average(bar), max(baz) + FROM Transaction + ``` + + Decomposed: + + ```sql + SELECT count(foo) FROM Transaction + + SELECT average(bar) FROM Transaction + + SELECT max(baz) FROM Transaction + ``` +
+ `COMPARE WITH` + + The `COMPARE WITH` clause is used to compare the values for two different time ranges. This type of query is incompatible with NRQL alerting. We recommend using an [anomaly alert condition](/docs/alerts-applied-intelligence/applied-intelligence/anomaly-detection/custom-anomalies/) to dynamically detect deviations for a particular signal. +
`SLIDE BY`

The `SLIDE BY` clause supports a feature known as [sliding windows](/docs/alerts/alert-conditions/create-alert-conditions/#sliding-window). With sliding windows, data is gathered into "windows" of time that overlap with each other. These windows can help to smooth out line graphs with a lot of variation in cases where the rolling aggregate (such as a rolling mean) is more important than aggregates from narrow windows of time.

This example creates an alert condition with a data aggregation window duration of 5 minutes and a sliding window aggregation of 1 minute:

```sql
SELECT count(*)
FROM Transaction
TIMESERIES 5 minutes SLIDE BY 1 minute
```
+ `LIMIT` + + In NRQL queries, the `LIMIT` clause is used to control the amount of data a query returns, either the maximum number of facet values returned by `FACET` queries or the maximum number of items returned by `SELECT *` queries. + + `LIMIT` is not compatible with NRQL alerting: evaluation is always performed on the full result set. +
+ Subqueries + + [Subqueries](/docs/query-your-data/nrql-new-relic-query-language/get-started/subqueries-in-nrql) are not compatible with streaming because subquery execution requires multiple passes through data. +
+ Subquery JOINs + + [Subquery JOINS](/docs/query-your-data/nrql-new-relic-query-language/nrql-query-tutorials/subquery-joins) are not compatible with streaming alerts because subquery execution requires multiple passes through data. +
+ +## NRQL alert threshold examples [#examples] + +Here are some common use cases for NRQL conditions. These queries will work for static and anomaly [condition types](#threshold-types). + + + + Create constrained alerts that target a specific segment of your data, such as a few key customers or a range of data. Use the `WHERE` clause to define those conditions. + + ```sql + SELECT average(duration) + FROM Transaction + WHERE account_id IN (91290, 102021, 20230) + ``` + + ```sql + SELECT percentile(duration, 95) + FROM Transaction + WHERE name LIKE 'Controller/checkout/%' + ``` + + + + Create alerts when an Nth percentile of your data hits a specified threshold; for example, maintaining SLA service levels. Since we evaluate the NRQL query based on the aggregation window duration, percentiles will be calculated for each duration separately. + + ```sql + SELECT percentile(duration, 95) + FROM Transaction + ``` + + ```sql + SELECT percentile(databaseDuration, 75) + FROM Transaction + ``` + + + + Create alerts when your data hits a certain maximum, minimum, or average; for example, ensuring that a duration or response time does not pass a certain threshold. + + ```sql + SELECT max(duration) + FROM Transaction + ``` + + ```sql + SELECT min(duration) + FROM Transaction + ``` + + ```sql + SELECT average(duration) + FROM Transaction + ``` + + + + Create alerts when a proportion of your data goes above or below a certain threshold. + + ```sql + SELECT percentage(count(*), WHERE duration > 2) + FROM Transaction + ``` + + ```sql + SELECT percentage(count(*), WHERE http.statusCode = '500') + FROM Transaction + ``` + + + + Create alerts on [Apdex](/docs/apm/new-relic-apm/apdex/apdex-measuring-user-satisfaction), applying your own T-value for certain transactions. For example, get an alert notification when your Apdex for a T-value of 500ms on transactions for production apps goes below 0.8. + + ```sql + SELECT apdex(duration, t:0.5) + FROM Transaction + WHERE appName LIKE '%prod%' + ``` + + + +## NRQL conditions and query order of operations [#query-order] + +By default, the aggregation window duration is 1 minute, but you can change the window to suit your needs. Whatever the aggregation window, New Relic will collect data for that window using the function in the NRQL condition's query. The query is parsed and executed by our systems in the following order: + +1. `FROM` clause. Which event type needs to be grabbed? +2. `WHERE` clause. What can be filtered out? +3. `SELECT` clause. What information needs to be returned from the now-filtered data set? + +### Example: null value returned [#example-null] + +Let's say this is your alert condition query: + +```sql +SELECT count(*) +FROM SyntheticCheck +WHERE monitorName = 'My Cool Monitor' AND result = 'FAILED' +``` + +If there are no failures for the aggregation window: + +1. The system will execute the `FROM` clause by grabbing all `SyntheticCheck` events on your account. +2. Then it will execute the `WHERE` clause to filter through those events by looking only for the ones that match the monitor name and result specified. +3. If there are still events left to scan through after completing the `FROM` and `WHERE` operations, the `SELECT` clause will be executed. If there are no remaining events, the `SELECT` clause will not be executed. + +This means that aggregators like `count()` and `uniqueCount()` will never return a zero value. When there is a count of 0, the `SELECT` clause is ignored and no data is returned, resulting in a value of `NULL`. 
+ +### Example: zero value returned [#example-zero] + +If you have a data source delivering legitimate numeric zeroes, the query will return zero values and not null values. + +Let's say this is your alert condition query, and that `MyCoolEvent` is an attribute that can sometimes return a zero value. + +```sql +SELECT average(MyCoolAttribute) +FROM MyCoolEvent +``` + +If, in the aggregation window being evaluated, there's at least one instance of `MyCoolEvent` and if the average value of all `MyCoolAttribute` attributes from that window is equal to zero, then a `0` value will be returned. If there are no `MyCoolEvent` events during that minute, then a `NULL` will be returned due to the order of operations. + +### Example: null vs. zero value returned [#example-null-zero] + +To determine how null values will be handled, adjust the loss of signal and gap filling settings in the [Alert conditions UI](/docs/alerts-applied-intelligence/new-relic-alerts/alert-conditions/create-nrql-alert-conditions/#signal-loss). + +You can avoid `NULL` values entirely with a query order of operations shortcut. To do this, use a `filter` sub-clause, then include all filter elements within that sub-clause. The main body of the query should include a `WHERE` clause that defines at least one entity so, for any aggregation window where the monitor performs a check, the signal will be tied to that entity. The `SELECT` clause will then run and apply the filter elements to the data returned by the main body of the query, which will return a value of `0` if the filter elements result in no matching data. + +Here's an example to alert on `FAILED` results: + +```sql +SELECT filter(count(*), WHERE result = 'FAILED') +FROM SyntheticCheck +WHERE monitorName = 'My Favorite Monitor' +``` + +In this example, a window with a successful result would return a `0`, allowing the condition's threshold to resolve on its own. + +For more information, check out our [blog post](https://discuss.newrelic.com/t/relic-solution-how-can-i-figure-out-when-to-use-gap-filling-and-loss-of-signal/120401) on troubleshooting for zero versus null values. + +## Nested aggregation NRQL alerts [#h2-nested-aggregation-nrql-alerts] + +[Nested aggregation queries](/docs/query-your-data/nrql-new-relic-query-language/nrql-query-tutorials/nested-aggregation-make-ordered-computations-single-query) are a powerful way to query your data. However, they have a few restrictions that are important to note. + + + + Without a `FACET`, the inner query produces a single result, giving the outer query nothing to aggregate. If you're using a nested query, make sure your inner query is faceted. + + ```sql + SELECT max(cpu) + FROM + ( + SELECT min(cpuPercent) AS 'cpu' + FROM SystemSample + FACET hostname + ) + ``` + + + + With an alert aggregation window of 1 minute, the inner query would produce two smaller windows of 30 seconds. In theory, these two windows could be aggregated by the outer query. However, this is not currently supported. + + ```sql + SELECT max(cpu) + FROM + ( + SELECT min(cpuTime) AS cpu TIMESERIES 30 seconds + FROM Event + ) + ``` + + + + For more information on signal loss, see [NerdGraph API: Loss of signal and gap filling](/docs/alerts-applied-intelligence/new-relic-alerts/alerts-nerdgraph/nerdgraph-api-loss-signal-gap-filling). + + + + Nested queries for [metric timeslice](/docs/data-apis/understand-data/new-relic-data-types/#timeslice-data) isn't supported. 
More specifically, these terms are not allowed in the inner query of NRQL alert conditions: + + * `WITH METRIC_FORMAT` + * `metricTimesliceName` + * `keyset`, `uniques`, `nativesizeestimate`, or `bytecountestimate` called on the `Metric` type + * `newrelic.timeslice.value` + * `apm.service.*`, `apm.browser.*` , `apm.mobile.*`, `apm.key.transaction.*` + + + +## NRQL condition creation tips [#condition-tips] + +Here are some tips for creating and using a NRQL condition: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ Topic + + Tips +
+ Condition types + + NRQL condition types include [static and anomaly](#threshold-types). +
+ Create a description + + For NRQL conditions, you can create a custom [description](/docs/alerts/new-relic-alerts/defining-conditions/alert-condition-descriptions) to add to each incident. Descriptions can be enhanced with variable substitution based on metadata in the specific incident. +
+ Query results + + Queries must return a number. The condition evaluates the returned number against the thresholds you've set. +
+ Time period + + NRQL conditions evaluate data based on how it's aggregated, using aggregation windows from 30 seconds to 120 minutes, in increments of 15 seconds. For best results, we recommend using the event flow or event timer aggregation methods. + + For the cadence aggregation method, the implicit `SINCE ... UNTIL` clause specifying which minute to evaluate is controlled by your [delay/timer](#delay-timer) setting. Since very recent data may be incomplete, you may want to query data from 3 minutes ago or longer, especially for: + + * Applications that run on multiple hosts. + * `SyntheticCheck` data: Timeouts can take 3 minutes, so 5 minutes or more is recommended. + + Also, if a query will generate intermittent data, consider using the advanced signal [`slide by`](#sliding-window-aggregation) option. +
+ Lost signal threshold + (loss of signal detection) + + You can use loss of signal detection to alert on when your data (a telemetry signal) should be considered lost. A signal loss can indicate that a service or entity is no longer online or that a periodic job failed to run. You can also use this to make sure that incidents for sporadic data, such as error counts, are closed when no signal is coming in. +
+ Advanced signal settings + + These settings give you options for better handling continuous, streaming data signals that may sometimes be missing. These settings include the aggregation window duration, the delay/timer, and an option for filling data gaps. For more on using these, see [Advanced signal settings](#advanced-signal). +
+ Condition settings + + Use the **Condition settings** to: + + * Create a concise, descriptive [condition name](/docs/alerts/new-relic-alerts/configuring-alert-policies/define-alert-conditions#rename-condition). + * Provide a custom incident description for the condition on the **Add details** page that will be included in incidents and notifications. + * Add the runbook URL to include your organization's procedures for handling incidents. You may also add this information to the custom incident description. +
+ Limits on conditions + + See the [maximum values](/docs/alerts/new-relic-alerts/getting-started/minimum-maximum-values). +
+ Health status + + In order for a NRQL alert condition [health status display](/docs/alerts-applied-intelligence/new-relic-alerts/alert-conditions/view-entity-health-status-find-entities-without-alert-conditions) to function properly, the query must be scoped to a single entity. To do this, either use a `WHERE` clause (for example, `WHERE appName = 'MyFavoriteApp'`) or use a `FACET` clause to scope each signal to a single entity (for example, `FACET hostname` or `FACET appName`). +
+ Examples + + For more information, see: + + * [Expected NRQL syntax](#syntax) + * [Examples of NRQL condition queries](#examples) +
+ +## Managing tags on conditions [#condition-edit] + +When you edit an existing NRQL condition, you have the option to add or remove tags associated with the condition entity. To do this, click the **Manage tags** button below the condition name. In the menu that pops up, add or delete a tag. + +## Condition edits can reset condition evaluation [#evaluation-resets] + +When you edit NRQL alert conditions in some specific ways (detailed below), their evaluations are reset, meaning that any evaluation up until that point is lost, and the evaluation starts over from that point. The two ways this will affect you are: + +* For "for at least x minutes" thresholds: because the evaluation window has been reset, there will be a delay of at least x minutes before any incidents can be reported. +* For [anomaly conditions](/docs/alerts-applied-intelligence/applied-intelligence/anomaly-detection/custom-anomalies/): the condition starts over again and all anomaly learning is lost. + +The following actions cause an evaluation reset for NRQL conditions: + +* Changing the query +* Changing the aggregation window, aggregation method, or aggregation delay/timer setting +* Changing the "close incidents on signal loss" setting +* Changing any gap fill settings +* Changing the anomaly direction (if applicable)- higher, lower, or higher/lower +* Change the threshold value, threshold window, or threshold operator +* Change the slide-by interval (on [sliding windows aggregation](/docs/alerts-applied-intelligence/new-relic-alerts/alert-conditions/create-nrql-alert-conditions/#sliding-window-aggregation) conditions only) + +The following actions (along with any other actions not covered in the above list) will **not** reset the evaluation: + +* Changing the loss of signal time window (expiration duration) +* Changing the time function (switching "for at least" to "at least once in," or vice-versa) +* Toggling the "open incident on signal loss" setting + +## Alert condition types [#threshold-types] + +When you create a NRQL alert, you can choose from different types of conditions: + + + + + + + + + + + + + + + + + + + + + + + +
+ NRQL alert condition types + + Description +
+ Static + + This is the simplest type of NRQL condition. It allows you to create a condition based on a NRQL query that returns a numeric value. + + Optional: Include a `FACET` clause. +
+ [Anomaly](/docs/alerts-applied-intelligence/applied-intelligence/anomaly-detection/custom-anomalies/) + (Dynamic anomaly) + + Uses a self-adjusting condition based on the past behavior of the monitored values. Uses the same NRQL query form as the static type, + including the optional `FACET` clause. +
+ +## Set the loss of signal threshold [#signal-loss] + + + The loss of signal feature requires a signal to be present before it can detect that the signal is lost. If you enable a condition while a signal is not present, no loss of signal will be detected and the loss of signal feature will not activate. + + +Loss of signal occurs when no data matches the NRQL condition over a specific period of time. You can set your loss of signal threshold duration and also what happens when the threshold is crossed. + +screenshot of signal loss options + +
+ Go to **[one.newrelic.com > All capabilities](https://one.newrelic.com/all-capabilities) > Alerts > Alert conditions (Policies)**, then **+ New alert condition**. Loss of signal is only available for NRQL conditions. +
+ +You may also manage these settings using the GraphQL API (recommended), or the REST API. Go here for specific [GraphQL API examples](/docs/alerts-applied-intelligence/new-relic-alerts/alerts-nerdgraph/nerdgraph-api-loss-signal-gap-filling). + + + **Loss of signal settings:** + + +Loss of signal settings include a time duration and a few actions. + +* + **Signal loss expiration time** + + * UI label: **Signal is lost after:** + * GraphQL Node: [expiration.expirationDuration](/docs/apis/nerdgraph/examples/nerdgraph-api-loss-signal-gap-filling/#loss-of-signal) + * Expiration duration is a timer that starts and resets when we receive a data point in the streaming alerts pipeline. If we don't receive another data point before your 'expiration time' expires, we consider that signal to be lost. This can be because no data is being sent to New Relic or the `WHERE` clause of your NRQL query is filtering that data out before it is streamed to the alerts pipeline. Note that when you have a faceted query, each facet is a signal. So if any one of those signals ends during the duration specified, that will be considered a loss of signal. + * The loss of signal expiration time is independent of the threshold duration and triggers as soon as the timer expires. + * The maximum expiration duration is 48 hours. This is helpful when monitoring for the execution of infrequent jobs. The minimum is 30 seconds, but we recommend using at least 3-5 minutes. +* + **Loss of signal actions** + + Once a signal is considered lost, you have a few options: + * Close all current open incidents: This closes all open incidents that are related to a specific signal. It won't necessarily close all incidents for a condition. If you're alerting on an ephemeral service, or on a sporadic signal, you'll want to choose this action to ensure that incidents are closed properly. The GraphQL node name for this is [`closeViolationsOnExpiration`](/docs/apis/nerdgraph/examples/nerdgraph-api-loss-signal-gap-filling/#loss-of-signal). + * Open new incidents: This will open a new incident when the signal is considered lost. These incidents will indicate that they are due to a loss of signal. Based on your incident preferences, this should trigger a notification. The graphQL node name for this is [`openViolationOnExpiration`](/docs/apis/nerdgraph/examples/nerdgraph-api-loss-signal-gap-filling/#loss-of-signal). + * When you enable both of the above actions, we'll close all open incidents first, and then open a new incident for loss of signal. + * Do not open "lost signal" incidents on expected termination. When a signal is expected to terminate, you can choose not to open a new incident. This is useful when you know that a signal will be lost at a certain time, and you don't want to open a new incident for that signal loss. The GraphQL node name for this is [`ignoreOnExpectedTermination`](/docs/apis/nerdgraph/examples/nerdgraph-api-loss-signal-gap-filling/#loss-of-signal). + + + In order to prevent a loss of signal incident from opening when "Do not open "lost signal" incident on expected termination", the tag `termination: expected` must be added to the entity. This tag tells us the signal was expected to terminate. See [how to add the tag directly to the entity](/docs/new-relic-solutions/new-relic-one/core-concepts/use-tags-help-organize-find-your-data/#add-tags). + + +To create a NRQL alert configured with loss of signal detection in the UI: + +1. 
Follow the [instructions to create a NRQL alert condition](/docs/alerts/create-alert/create-alert-condition/alert-conditions/#set-your-signal-behavior).
2. On the [Set thresholds step](/docs/alerts/create-alert/create-alert-condition/alert-conditions/#thresholds), you'll find the option to **Add lost signal threshold**. Click this button.
3. Set the signal expiration duration time in minutes or seconds in the **Consider the signal lost after** field.
4. Choose what you want to happen when the signal is lost. You can check any or all of the following options: **Close all current open incidents**, **Open new "lost signal" incident**, and **Do not open "lost signal" incident on expected termination**. These control how loss of signal incidents will be handled for the condition.
5. You can optionally add or remove static/anomaly numeric thresholds. A condition that has only a loss of signal threshold and no static/anomaly numeric thresholds is valid, and it's considered a standalone loss of signal condition.
6. Continue through the steps to save your condition.
7. If you selected **Do not open "lost signal" incident on expected termination**, you must add the `termination: expected` tag to the entity to prevent a loss of signal incident from opening. See [how to add the tag directly to the entity](/docs/new-relic-solutions/new-relic-one/core-concepts/use-tags-help-organize-find-your-data/#add-tags).

You might be curious why you'd ever want to have both **Open new "lost signal" incident** and **Do not open "lost signal" incident on expected termination** set to true. Think of it like this: you always want to be notified when a signal is lost, until the one time you know the signal is scheduled to stop and you don't want to be notified. In that case, you'd set both to true, and when you expect the signal to be lost, you'd add the `termination: expected` tag to the relevant entity.

Incidents opened due to loss of signal close when:

* The signal comes back. Newly opened lost signal incidents will close immediately when new data is evaluated.
* The condition they belong to expires. By default, conditions expire after 3 days.
* You manually close the incident with the **Close all current open incidents** option.

Loss of signal detection doesn't work on NRQL queries that use nested aggregation or sub-queries.

## Advanced signal settings [#advanced-signal]

Screenshot showing advanced signal settings
+ When creating a NRQL alert condition, use the advanced signal settings to control [streaming alert data](/docs/alerts-applied-intelligence/new-relic-alerts/get-started/streaming-alerts-key-terms-concepts) and avoid false alarms. +
+ +When creating a NRQL condition, there are several [advanced signal settings](/docs/alerts-applied-intelligence/new-relic-alerts/get-started/your-first-nrql-condition/#advanced-signal-settings): + +* Aggregation window duration +* Sliding window aggregation +* Streaming method +* Delay/timer +* Fill data gaps +* Evaluation delay + +To read an explanation of what these settings are and how they relate to each other, see [Streaming alerts concepts](/docs/alerts-applied-intelligence/new-relic-alerts/get-started/streaming-alerts-key-terms-concepts). Below are instructions and tips on how to configure them. + +### Aggregation window duration [#window-duration] + +You can set the [aggregation window duration](/docs/alerts-applied-intelligence/new-relic-alerts/advanced-alerts/understand-technical-concepts/streaming-alerts-key-terms-concepts/#window-duration) to choose how long data is accumulated in a streaming time window before it's aggregated. You can set it to anything between 30 seconds and 120 minutes. The default is one minute. + +### Sliding window aggregation [#sliding-window-aggregation] + +You can use [sliding windows](/docs/query-your-data/nrql-new-relic-query-language/nrql-query-tutorials/create-smoother-charts-sliding-windows) to create smoother charts. This is done by creating overlapping windows of data. + +Learn how to set sliding windows in this short video (2:30 minutes): + +
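In NRQL terms, sliding window aggregation corresponds to the `SLIDE BY` clause. As a rough sketch (the event type, attribute, and window sizes here are illustrative), this query aggregates 5-minute windows that slide forward 1 minute at a time:

```sql
SELECT average(duration)
FROM Transaction
TIMESERIES 5 minutes SLIDE BY 1 minute
```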