Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

document metrics temporal aggregation #63

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

SylvainJuge
Copy link
Member

@SylvainJuge SylvainJuge commented Dec 6, 2024

Document current recommended setting for metric temporal aggregation.

checklist

  • document current limitation for histograms and how to configure SDK
  • document how and when use cumulativetodelta in collector

Comment on lines -3 to -11
The Elastic Distribution of the OpenTelemetry Collector has the following limitations:

- Because of an upstream limitation, `host.network.*` metrics aren't present from the OpenTelemetry side.
- `process.state` isn't present in the OpenTelemetry host metric. It's set to a dummy value of **Unknown** in the **State** column of the host processes table.
- The Elasticsearch exporter handles the resource attributes, but **Host OS version** and **Operating system** may show as "N/A".
- The CPU scraper needs to be enabled to collect the `systm.load.cores` metric, which affects the **Normalized Load** column in the **Hosts** table and the **Normalized Load** visualization on the host detailed view.
- The [`hostmetrics receiver`](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver) doesn't support CPU and disk metrics on MacOS. These values will stay empty for collectors running on MacOS.
- The console shows error Log messages when the [`hostmetrics receiver`](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver) can't access some of the process information due to permission issues.
- The console shows mapping errors initially until mapping occurs.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] moved to a more general docs/limitations.md file.

@SylvainJuge SylvainJuge marked this pull request as ready for review December 9, 2024 13:49
docs/limitations.md Outdated Show resolved Hide resolved
- The console shows error Log messages when the [`hostmetrics receiver`](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver) can't access some of the process information due to permission issues.
- The console shows mapping errors initially until mapping occurs.

## Metrics temporal aggregation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is looking at the issue from an SDK point of view. For example, the OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE environment variable is only relevant to SDKs. Maybe point that out a little more clearly.

docs/limitations.md Show resolved Hide resolved
docs/limitations.md Outdated Show resolved Hide resolved
docs/limitations.md Outdated Show resolved Hide resolved
docs/limitations.md Outdated Show resolved Hide resolved
docs/limitations.md Outdated Show resolved Hide resolved
Setting `OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=delta` should allow to configure SDKs to change the default value.
(see [reference](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md#environment-variables) on supported SDKs).

In the case were the producer of `counter` or `histogram` metrics can't be configured with `delta preferred` behavior to report them with `delta`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should recommend to convert counter metrics to delta temporality. There are challenges with visualizing counter metrics today but I'm not sure if it's always worth doing a cumulative to delta conversion to avoid it. Plus, ES|QL will be enhanced with better support for counter rates. What remains is that queries need to be aware of the temporality of the metrics.
Also, some of our default dashboards for k8s do expect counters to be sent in the default cumulative temporality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Histograms is the only reason we are asking for "delta preferred", because we can't store them otherwise.

Doing this change has a side effect all the counters visualizations because we don't have a nice way to handle that with a separate setting, but maybe ES|QL support would be enhanced to allow that, but so far we don't have any ETA.

If we already have dashboards that rely on cumulative counters, then we need to not apply this conversion, which means we need to instruct users to apply this conversion to some metrics and not others, which brings another layer of complexity to the end-user.

In a sense, what we need is to apply cumulativetodeltaprocessor only for cumulative histograms at the collector level, ideally close to the edge, or even better allow SDKs to apply this only to histograms without changing anything for counters because it would break the visualizations.

I wonder if contributing the ability to set the time aggregation per metric type would be less painful that the path we are trying to take here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants