[jaeger-v2] Dangerous use of tracer in plugin/storage/grpc/factory.go #5971

Closed
yurishkuro opened this issue Sep 11, 2024 · 3 comments · Fixed by #6125
Labels
area/otel, area/storage, bug, help wanted

Comments

@yurishkuro
Member

yurishkuro commented Sep 11, 2024

The factory is passed a tracer that it then passes to OTEL's ToClientConn() function, which automatically adds tracing instrumentation to the gRPC connection. That means we may start generating new traces while trying to save spans to storage, causing an infinite loop of trace generation. The write path must always be free of tracing (or use a very low sampling rate) to avoid this.
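
To make the feedback loop concrete, here is a minimal toy model in Go. It does not use the real OTEL or Jaeger APIs; `Span`, `Storage`, and `Drain` are hypothetical names invented for illustration. It assumes that an instrumented write emits exactly one new span, which then has to be written as well.

```go
package main

import "fmt"

// Span is a hypothetical, simplified stand-in for a trace span.
type Span struct{ Name string }

// Storage is a toy span writer. When traced is true, every write emits one
// extra span describing the write itself, which must then be stored too.
type Storage struct {
	traced bool
	Writes int
}

// WriteSpan stores a span and returns any spans produced by instrumentation.
func (s *Storage) WriteSpan(sp Span) []Span {
	s.Writes++
	if s.traced {
		return []Span{{Name: "grpc.WriteSpan"}}
	}
	return nil
}

// Drain writes queued spans until the queue is empty or maxWrites is reached.
// It reports whether the queue was fully drained.
func Drain(s *Storage, queue []Span, maxWrites int) bool {
	for len(queue) > 0 && s.Writes < maxWrites {
		sp := queue[0]
		queue = queue[1:]
		queue = append(queue, s.WriteSpan(sp)...)
	}
	return len(queue) == 0
}

func main() {
	traced := &Storage{traced: true}
	fmt.Println(Drain(traced, []Span{{Name: "user-request"}}, 1000), traced.Writes)

	untraced := &Storage{traced: false}
	fmt.Println(Drain(untraced, []Span{{Name: "user-request"}}, 1000), untraced.Writes)
}
```

With tracing on the write path, the queue never empties and only the `maxWrites` cap stops the loop; without it, a single write drains the queue.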

Related upstream issue open-telemetry/opentelemetry-collector#10663

@yurishkuro yurishkuro converted this from a draft issue Sep 11, 2024
@yurishkuro yurishkuro added the help wanted Features that maintainers are willing to accept but do not have cycles to implement label Sep 11, 2024
ldlb9527 added a commit to ldlb9527/jaeger that referenced this issue Sep 12, 2024
ldlb9527 added a commit to ldlb9527/jaeger that referenced this issue Sep 18, 2024
ldlb9527 added a commit to ldlb9527/jaeger that referenced this issue Sep 24, 2024

@mahadzaryab1
Collaborator

@yurishkuro is this blocked on the upstream issue, or is there anything we can do right now to fix it?

@yurishkuro
Member Author

@mahadzaryab1 there is an unfinished PR, #5979. It's a somewhat hacky solution.

I would've preferred for this to be fixed upstream, but that ticket did not see much traction. We could propose a new hook in OTEL, usable from binaries like Jaeger, that would allow us to customize the tracer somehow, e.g. by using it with the Jaeger Remote Sampler SDK extension, which would allow configuring undesired endpoints with 0 probability.

Another option we could try is to inject a noop span into the context when execution reaches places that we do not want traced. This would at least prevent downstream tracing into storage writes, but would not help with, say, healthcheck endpoint tracing (tbh that one could be fixed with a configuration option in the extension itself to say "disable tracing", assuming it's an issue today).

yurishkuro pushed a commit that referenced this issue Oct 27, 2024

## Which problem is this PR solving?
- Fixes #5971 
- Towards #6113 and #5859

## Description of the changes
- This PR fixes an issue where the gRPC remote storage client was
provided a tracer, which resulted in an infinite loop of trace
generation: writing a trace to storage would generate a new trace that
also needed to be written, and so on. The fix uses a noop
tracer for the writer clients so that no traces are generated on the
write path, while the read path is still traced.
- This is likely just a temporary fix and we'll want to monitor
open-telemetry/opentelemetry-collector#10663
for a better long-term fix.
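
A rough sketch of the shape of this fix follows. This is not the code from the PR: `Tracer`, `Client`, `Clients`, and `NewClients` are hypothetical stand-ins, and the only point illustrated is that the reader keeps the supplied tracer while the writer is always given a no-op one.

```go
package main

import "fmt"

// Tracer is a hypothetical minimal tracing interface.
type Tracer interface{ StartSpan(name string) }

// countingTracer records how many spans were started.
type countingTracer struct{ spans int }

func (t *countingTracer) StartSpan(string) { t.spans++ }

// noopTracer discards every span.
type noopTracer struct{}

func (noopTracer) StartSpan(string) {}

// Client is a toy storage client whose calls are instrumented with its tracer.
type Client struct{ tracer Tracer }

// Call simulates an instrumented RPC by starting a span named after the method.
func (c Client) Call(method string) { c.tracer.StartSpan(method) }

// Clients groups the separate reader and writer clients.
type Clients struct{ Reader, Writer Client }

// NewClients wires the supplied tracer into the reader only; the writer
// always gets a no-op tracer so that storing spans never creates new ones.
func NewClients(t Tracer) Clients {
	return Clients{
		Reader: Client{tracer: t},
		Writer: Client{tracer: noopTracer{}},
	}
}

func main() {
	t := &countingTracer{}
	c := NewClients(t)
	c.Reader.Call("GetTrace")
	c.Writer.Call("WriteSpan")
	c.Writer.Call("WriteSpan")
	fmt.Println(t.spans) // 1: only the read was traced
}
```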

## How was this change tested?
- Added the healthcheck endpoint which was previously failing in #6113.

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

## Co-Authors 
This PR is a continuation of
#5979
Co-authored-by: cx <[email protected]>

---------

Signed-off-by: Mahad Zaryab <[email protected]>
@github-project-automation github-project-automation bot moved this from GA blocking to Done in Jaeger V2 Oct 27, 2024
yurishkuro added a commit that referenced this issue Oct 29, 2024
## Which problem is this PR solving?
- Resolves #5859

## Description of the changes
- Use default health check extension
- Currently it causes the test to go into infinite loop because of
recursive tracing due to #5971

---------

Signed-off-by: Yuri Shkuro <[email protected]>
Projects
Status: Done
2 participants