Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add index_types for OTEL logs and metrics #3148 #3929

Merged
merged 20 commits into from
Oct 30, 2024

Conversation

juergen-walter
Copy link
Contributor

@juergen-walter juergen-walter commented Jan 9, 2024

Description

Add index_types for OTEL logs and metrics

Issues Resolved

Resolves #3148

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR. --> documentation included in this PR
    • New functionality has javadoc added --> not needed
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dlvenable
Copy link
Member

@juergen-walter , This is great, thanks for starting this draft! Have you compared this schema with the OpenSearch simple schema for observability?

I also don't see some of the fields from the logs or metrics in here. Are you using dynamic mapping for those?

We recently added support for composable index templates. There is a directly index-template under resources that has those. I meant to move the legacy templates into a new directory in the original PR for that, but didn't. I might go ahead and make that change to help make it clear that there are two sets of templates.

@juergen-walter
Copy link
Contributor Author

juergen-walter commented Jan 15, 2024

@dlvenable happy to hear your encouraging feedback. I shared early WIP to let you know I started working on it, still a lot to be improved.

I also don't see some of the fields from the logs or metrics in here. Are you using dynamic mapping for those?

Initially I just copied the index templates for trace spans, adjustments for metrics and traces still to be done. For the open source contribution in this PR I would align with the Simple Schema for Observability, thank you for the reminder.

move the legacy templates into a new directory

If you plan prepare or merge changes before this PR has been merged, it would be nice to ping me so I can update the PR

@KarstenSchnitter
Copy link
Collaborator

@dlvenable I am not so sure about the Simple Scheme for Observability. We are providing an OpenTelemetry endpoint, that should support generic data. I would expect support for the OpenTelemetry semantic conventions as a whole: https://github.com/open-telemetry/semantic-conventions.

I have run a small PoC, that transfers the YAML configuration from https://github.com/open-telemetry/semantic-conventions/tree/main/model into composable index templates. Each convention maps to one template. Certain assumptions must be made on those mappings. For the three signals traces, metrics and logs a template can be generated covering the base fields. All these templates can than be joined to form a full index pattern to be used. That allows to pick and choose, which conventions should be used. However, the full solution is not yet ready.

Copy link
Collaborator

@KarstenSchnitter KarstenSchnitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index templates should be improved. Most of the string value fields should be keywords to allow aggregations. And most of them will not have values requiring type "text". Probably all float values should be doubles as most integers should be longs.

@juergen-walter
Copy link
Contributor Author

Index templates should be improved. Most of the string value fields should be keywords to allow aggregations. And most of them will not have values requiring type "text". Probably all float values should be doubles as most integers should be longs.

I closed all the related conversations and pointed to the Simple Schema for Observability mappings I aligned with. I consider the comments by @KarstenSchnitter to be a valuable feedback/review of the simple schema mappings but I would try to avoid having the respective discussion in this PR.

@juergen-walter
Copy link
Contributor Author

Hi @dlvenable we would appreciate if we can include this into the next release, so we do not have to implement workarounds.
Can you share your plans on how to proceed?

Signed-off-by: Jürgen Walter <[email protected]>
Fixes testInstantiateSinkMetricsDefaultMetricSink

Alertnative would have been to adjust the test

Signed-off-by: Jürgen Walter <[email protected]>
Signed-off-by: Jürgen Walter <[email protected]>
Signed-off-by: Jürgen Walter <[email protected]>
Signed-off-by: Jürgen Walter <[email protected]>
Signed-off-by: Jürgen Walter <[email protected]>
Signed-off-by: Jürgen Walter <[email protected]>
This reverts commit 3fd61af.

Signed-off-by: Jürgen Walter <[email protected]>
dlvenable
dlvenable previously approved these changes Oct 7, 2024
Copy link
Member

@dlvenable dlvenable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @juergen-walter ! The build is now passing. I think one Java 11 build has failed, but this is likely a flaky test.

Copy link
Collaborator

@KarstenSchnitter KarstenSchnitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to review the index templates. There are only minor issues. Please have a look at the text types within the instrumentation scope fields. My main question is, why the metrics and logs index templates are essentially doubled. Is there a way to have only one template for each signal?

"path": "observedTimestamp"
},
"traceId": {
"ignore_above": 256,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are actually hex encoded 16 bytes values, hence 32 characters. You can shorten the length to this value.

Suggested change
"ignore_above": 256,
"ignore_above": 32,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"type": "keyword"
},
"spanId": {
"ignore_above": 256,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are actually hex encoded 8 bytes values, hence 16 characters. You can shorten the length to this value.

Suggested change
"ignore_above": 256,
"ignore_above": 16,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

},
"kind": {
"type": "keyword",
"ignore_above": 128
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are currently only 5 kinds: SUM, GAUGE, HISTOGRAM, EXPONENTIAL_HISTOGRAM and SUMMARY. The longest has 21 characters. You could decrease this value severly.

Suggested change
"ignore_above": 128
"ignore_above": 24

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

},
"aggregationTemporality": {
"type": "keyword",
"ignore_above": 128
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are currently only 2 possible values for this or an empty value: AGGREGATION_TEMPORALITY_DELTA and AGGREGATION_TEMPORALITY_CUMULATIVE

Suggested change
"ignore_above": 128
"ignore_above": 40

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"type": "boolean"
},
"startTime": {
"type": "date"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If observedTimestamp is data_nanos, why is this not the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"count": {
"type": "long"
},
"exemplar": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a nested field? I would also have expected some kind of value for the exemplar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"exemplar": {
"properties": {
"time": {
"type": "date"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a date_nanos?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"instrumentationScope": {
"properties": {
"name": {
"type": "text",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, there is no benefit in having a text for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
},
"version": {
"type": "text",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, there is no benefit in having a text for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juergen-walter
Copy link
Contributor Author

Related discussion about aligning OpenTelemetry metrics index templates with Data Pepper opensearch-project/opensearch-catalog#197 related to the open points mentioned by @KarstenSchnitter

Comment on lines +198 to +217
"instrumentationScope": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 128
}
}
},
"version": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
Copy link
Contributor Author

@juergen-walter juergen-walter Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Align with https://github.com/opensearch-project/opensearch-catalog/blob/8c8bf73a5239fb80792174eba4861b61076147cf/schema/observability/metrics/metrics-1.0.0.mapping#L227-L236

Suggested change
"instrumentationScope": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 128
}
}
},
"version": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"instrumentationScope": {
"properties": {
"name": {
"type": "keyword",
"ignore_above": 256
},
"version": {
"type": "keyword",
"ignore_above": 256
},

@KarstenSchnitter KarstenSchnitter merged commit 569e3d1 into opensearch-project:main Oct 30, 2024
68 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add additional index_types
5 participants