-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
document metrics temporal aggregation #63
base: main
Are you sure you want to change the base?
document metrics temporal aggregation #63
Conversation
The Elastic Distribution of the OpenTelemetry Collector has the following limitations: | ||
|
||
- Because of an upstream limitation, `host.network.*` metrics aren't present from the OpenTelemetry side. | ||
- `process.state` isn't present in the OpenTelemetry host metric. It's set to a dummy value of **Unknown** in the **State** column of the host processes table. | ||
- The Elasticsearch exporter handles the resource attributes, but **Host OS version** and **Operating system** may show as "N/A". | ||
- The CPU scraper needs to be enabled to collect the `systm.load.cores` metric, which affects the **Normalized Load** column in the **Hosts** table and the **Normalized Load** visualization on the host detailed view. | ||
- The [`hostmetrics receiver`](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver) doesn't support CPU and disk metrics on MacOS. These values will stay empty for collectors running on MacOS. | ||
- The console shows error Log messages when the [`hostmetrics receiver`](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver) can't access some of the process information due to permission issues. | ||
- The console shows mapping errors initially until mapping occurs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[for reviewer] moved to a more general docs/limitations.md
file.
- The console shows error Log messages when the [`hostmetrics receiver`](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver) can't access some of the process information due to permission issues. | ||
- The console shows mapping errors initially until mapping occurs. | ||
|
||
## Metrics temporal aggregation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section is looking at the issue from an SDK point of view. For example, the OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE
environment variable is only relevant to SDKs. Maybe point that out a little more clearly.
Setting `OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=delta` should allow to configure SDKs to change the default value. | ||
(see [reference](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md#environment-variables) on supported SDKs). | ||
|
||
In the case were the producer of `counter` or `histogram` metrics can't be configured with `delta preferred` behavior to report them with `delta` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should recommend to convert counter metrics to delta temporality. There are challenges with visualizing counter metrics today but I'm not sure if it's always worth doing a cumulative to delta conversion to avoid it. Plus, ES|QL will be enhanced with better support for counter rates. What remains is that queries need to be aware of the temporality of the metrics.
Also, some of our default dashboards for k8s do expect counters to be sent in the default cumulative temporality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Histograms is the only reason we are asking for "delta preferred", because we can't store them otherwise.
Doing this change has a side effect all the counters
visualizations because we don't have a nice way to handle that with a separate setting, but maybe ES|QL support would be enhanced to allow that, but so far we don't have any ETA.
If we already have dashboards that rely on cumulative counters, then we need to not apply this conversion, which means we need to instruct users to apply this conversion to some metrics and not others, which brings another layer of complexity to the end-user.
In a sense, what we need is to apply cumulativetodeltaprocessor
only for cumulative histograms at the collector level, ideally close to the edge, or even better allow SDKs to apply this only to histograms without changing anything for counters because it would break the visualizations.
I wonder if contributing the ability to set the time aggregation per metric type would be less painful that the path we are trying to take here.
Document current recommended setting for metric temporal aggregation.
checklist
cumulativetodelta
in collector