
CAMEL-21202 - Add a ThreadPoolFactory to propagate OpenTelemetry contexts #15496

Merged: 5 commits into apache:main on Oct 3, 2024

Conversation

@ammachado (Contributor)

Description

Add a ThreadPoolFactory to propagate OpenTelemetry contexts

Target

  • I checked that the commit is targeting the correct branch (note that Camel 3 uses camel-3.x, whereas Camel 4 uses the main branch)

Tracking

  • If this is a large change, bug fix, or code improvement, I checked there is a JIRA issue filed for the change (usually before you start working on it).

Apache Camel coding standards and style

  • I checked that each commit in the pull request has a meaningful subject line and body.
  • I have run mvn clean install -DskipTests locally from the root folder and I have committed all auto-generated changes.

github-actions bot

🌟 Thank you for your contribution to the Apache Camel project! 🌟

🤖 CI automation will test this PR automatically.

🐫 Apache Camel Committers, please review the following items:

  • First-time contributors require MANUAL approval for the GitHub Actions to run

  • You can use the command /component-test (camel-)component-name1 (camel-)component-name2.. to request a test from the test bot.

  • You can label PRs using build-all, build-dependents, skip-tests and test-dependents to fine-tune the checks executed by this PR.

  • Build and test logs are available in the Summary page. Only Apache Camel committers have access to the summary.

  • ⚠️ Be careful when sharing logs. Review their contents before sharing them publicly.

@davsclaus (Contributor)

Do you mind creating a JIRA for these kinds of PRs and linking to it in the commit message?

@davsclaus davsclaus marked this pull request as draft September 11, 2024 11:15
@davsclaus (Contributor)

Do you have any links/docs showing that wrapping these threads is an OTel best practice?
The threads are reused by other Camel Exchanges, so I wonder how OTel manages that.

@ammachado ammachado changed the title Add a ThreadPoolFactory to propagate OpenTelemetry contexts CAMEL-21202 - Add a ThreadPoolFactory to propagate OpenTelemetry contexts Sep 11, 2024
@ammachado (Contributor, Author)

Do you have any links/docs showing that wrapping these threads is an OTel best practice? The threads are reused by other Camel Exchanges, so I wonder how OTel manages that.

https://opentelemetry.io/docs/languages/java/instrumentation/#context-propagation-between-threads
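
In short, the documented pattern is to capture the current context and wrap the task so it runs under that context on the other thread. A minimal sketch, assuming the standard io.opentelemetry.context API:

    import io.opentelemetry.context.Context;

    public class WrapExample {
        public static void main(String[] args) {
            Runnable task = () -> {
                // Context.current() here is the context captured at wrap() time,
                // not the new thread's (empty) root context.
            };
            new Thread(Context.current().wrap(task)).start();
        }
    }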

@davsclaus (Contributor)

Do you have any links/docs showing that wrapping these threads is an OTel best practice? The threads are reused by other Camel Exchanges, so I wonder how OTel manages that.

https://opentelemetry.io/docs/languages/java/instrumentation/#context-propagation-between-threads

Yes, but would that not be wrong, as the thread is later reused by Camel to process other exchanges, and then the thread has stale data from the previous exchange?

@ammachado (Contributor, Author)

I replaced the implementation with one that wraps ExecutorServices. It should be fine now.
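
For reference, a minimal sketch of wrapping an ExecutorService with the OTel Java context API (not necessarily this PR's exact code): Context.taskWrapping returns an ExecutorService whose submitted tasks each run under the context that was current at submission time.

    import io.opentelemetry.context.Context;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class TaskWrappingExample {
        public static void main(String[] args) {
            // Every task submitted to "wrapped" is wrapped with the caller's
            // current context at submit() time.
            ExecutorService wrapped = Context.taskWrapping(Executors.newFixedThreadPool(4));
            wrapped.submit(() -> {
                // Spans created here parent correctly to the submitter's active span.
            });
            wrapped.shutdown();
        }
    }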

@davsclaus (Contributor)

I replaced the implementation with one that wraps ExecutorServices. It should be fine now.

I fail to understand that, because that only works for tasks/runnables that are isolated.

You do this generally for all of Camel's thread pools, but threads such as a JMS listener will then use the same task/runnable to process all the incoming JMS messages, and each of those JMS messages is a different Exchange and span ID in the OTel world.

@ammachado (Contributor, Author)

Yes, @davsclaus, you are correct. I understand the issue, but this PR handles only thread pools that are created by Camel itself. I don't have enough time/resources to test all components, but I'll be happy to help as much as possible.

Regarding the exchanges and their span IDs, this is already handled (or should be) by org.apache.camel.opentelemetry.OpenTelemetryTracer and the component-specific org.apache.camel.tracing.SpanDecorator implementations. New spans are created by default for all components that have a SpanDecorator implementation, except for LogComponent.

@johnpoth (Member)

@davsclaus IMO we would need a new SPI for ThreadPoolFactory in Camel that allows us to grab the current Exchange the thread is running on. This would allow camel-opentelemetry to do something like this for every submitted task:

    @Override
    public Future<?> submit(Runnable task, Exchange exchange) {
        Context context = exchange.getProperty(ExchangePropertyKey.ACTIVE_SPAN, io.opentelemetry.context.Context.class);
        return delegate().submit(context.wrap(task));
    }

That way, other libraries running on the thread know which context to grab. Most libraries call Context.current(), which relies on ThreadLocal storage...
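
To illustrate the ThreadLocal point, a minimal sketch: without wrapping, a worker thread sees the root context, because Context.current() reads a per-thread ThreadLocal.

    import io.opentelemetry.context.Context;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class CurrentContextDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newSingleThreadExecutor();
            // Without wrapping: the worker thread sees the root context.
            pool.submit(() -> System.out.println(Context.current())).get();
            // With wrapping: the caller's context is restored on the worker thread.
            pool.submit(Context.current().wrap(
                    () -> System.out.println(Context.current()))).get();
            pool.shutdown();
        }
    }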

@davsclaus (Contributor)

Hmm, I think this approach is somewhat wrong for these thread pools.

If the purpose is to ensure the OTel context is correct for the given exchange, and to store/get it via thread locals, then look at MDC and MDCUnitOfWork, which does this kind of thing.

It has before/after callbacks coupled to Camel's Exchange processing, which makes it easier to set up and clean up the OTel context accordingly.
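
For illustration, a rough sketch of an MDCUnitOfWork-style approach for OTel. The class name and the exchange property key are hypothetical, and as the later comments show, the naive version assumes beforeProcess/afterProcess run on the same thread:

    import io.opentelemetry.context.Context;
    import io.opentelemetry.context.Scope;
    import org.apache.camel.AsyncCallback;
    import org.apache.camel.Exchange;
    import org.apache.camel.Processor;

    // Hypothetical sketch; the real class would extend DefaultUnitOfWork
    // (constructor and other overrides elided).
    class OtelUnitOfWork {
        private Scope scope;

        public AsyncCallback beforeProcess(Processor processor, Exchange exchange, AsyncCallback callback) {
            // "camel.otel.context" is an assumed property key, not a real Camel constant.
            Context context = exchange.getProperty("camel.otel.context", Context.class);
            if (context != null) {
                scope = context.makeCurrent(); // restore the exchange's context on this thread
            }
            return callback;
        }

        public void afterProcess(Processor processor, Exchange exchange, AsyncCallback callback, boolean doneSync) {
            if (scope != null) {
                scope.close(); // must happen on the same thread that opened the scope
                scope = null;
            }
        }
    }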

@johnpoth (Member)

Ah yeah, camel-opentelemetry is doing that through CamelEvents, but perhaps the UnitOfWork API is better suited; I will check it out.

@johnpoth (Member)

@davsclaus are we guaranteed to be called on the same thread in the UnitOfWork hooks, though? Unfortunately OpenTelemetry requires it [1].

[1] https://github.com/open-telemetry/opentelemetry-java/blob/main/context/src/main/java/io/opentelemetry/context/ThreadLocalContextStorage.java#L48

@davsclaus (Contributor)

@davsclaus are we guaranteed to be called on the same thread in the UnitOfWork hooks, though? Unfortunately OpenTelemetry requires it [1].

[1] https://github.com/open-telemetry/opentelemetry-java/blob/main/context/src/main/java/io/opentelemetry/context/ThreadLocalContextStorage.java#L48

Yes, but only when you use the before/afterProcess callbacks and turn this on (like MDC does).

I think this was implemented for the same reason that MDC also needs before/after callbacks on the same thread.

PS: It may be that Camel in transacted mode does not execute the before/after processor callbacks (we can add this if needed), but transacted routes are forced to execute synchronously by Camel anyway.

@davsclaus (Contributor)

Be mindful that beforeProcess/afterProcess are executed for every step (EIP, i.e. Processor) in the route.

@johnpoth (Member)

Hi @davsclaus, thanks for your insights! After some testing I don't think this is enough... Ideally OpenTelemetry wants a hook when a thread starts some work and another hook when the work is done (everything on the same thread):

    Context context = exchange.getProperty(ExchangePropertyKey.ACTIVE_SPAN, io.opentelemetry.context.Context.class);
    Scope scope = context.makeCurrent();
    // ---- thread starts work ----

    // ---- thread ends work ----
    scope.close();

Naively using the before/after process hooks in the UnitOfWork API, I get errors like:

    Thread [Camel (camel-6) thread #36 - seda://c] opened scope, but thread [Camel (camel-6) thread #44 - Delay] closed it
    closed here: at org.apache.camel.opentelemetry.OpenTelemetryUnitOfWork.afterProcess(OpenTelemetryUnitOfWork.java:98) ~[classes/:?]
    opened here: at org.apache.camel.opentelemetry.OpenTelemetryUnitOfWork.beforeProcess(OpenTelemetryUnitOfWork.java:86) ~[classes/:?]

For MDC to achieve this in Multicast with parallel processing, it seems more code had to be added in camel-core [1].

[1] https://issues.apache.org/jira/browse/CAMEL-16378

@johnpoth (Member)

Having to use this ThreadLocal mechanism seems like a tough requirement IMO, so I'm investigating whether we can remove it [1] [2].

[1] https://github.com/johnpoth/camel/commits/opentelemetry-rewrite/
[2] open-telemetry/opentelemetry-java#6584

@davsclaus (Contributor)

Yes, thread locals and MDC are a poor fit for a modern multi-threaded world. However, it's fine if there are a few situations, like multicast + parallel, where it does not work (just document this). We just need it to work well for 99% of use-cases.
We can then see if we can add a special hook in multicast parallel mode to make it work better for that use-case. OTel is more important than legacy MDC logging.

@davsclaus (Contributor)

We can expose beforeExecute/afterExecute from ThreadPoolExecutor; that should make this easier. It is also more correct, as these callbacks are invoked just before/after the runnable is executed by the current thread.

Then we can find a nicer way to do this without having to wrap every thread pool executor implementation in Camel.

@davsclaus (Contributor)

We can expose beforeExecute/afterExecute from ThreadPoolExecutor; that should make this easier. It is also more correct, as these callbacks are invoked just before/after the runnable is executed by the current thread.

Then we can find a nicer way to do this without having to wrap every thread pool executor implementation in Camel.

One caveat is that the before and after hooks are executed when the task is being processed; by then the original thread that added the task may be doing something else, and this context propagation may come too late. However, this would only happen when the thread pool is busy and tasks have filled up in the backlog queue.
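
For illustration, a sketch of what such beforeExecute/afterExecute hooks could look like. The class is hypothetical; it captures the caller's context at execute() time, which is exactly what the caveat above is about:

    import io.opentelemetry.context.Context;
    import io.opentelemetry.context.Scope;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    class OtelThreadPoolExecutor extends ThreadPoolExecutor {

        private final ThreadLocal<Scope> currentScope = new ThreadLocal<>();

        OtelThreadPoolExecutor(int corePoolSize, int maxPoolSize) {
            super(corePoolSize, maxPoolSize, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        }

        @Override
        public void execute(Runnable task) {
            // Capture the caller's context at submission time, not at execution time.
            super.execute(new ContextTask(Context.current(), task));
        }

        @Override
        protected void beforeExecute(Thread thread, Runnable task) {
            // Runs on the worker thread just before the task starts.
            if (task instanceof ContextTask ct) {
                currentScope.set(ct.context().makeCurrent());
            }
            super.beforeExecute(thread, task);
        }

        @Override
        protected void afterExecute(Runnable task, Throwable failure) {
            super.afterExecute(task, failure);
            Scope scope = currentScope.get();
            if (scope != null) {
                scope.close(); // same worker thread that opened the scope
                currentScope.remove();
            }
        }

        record ContextTask(Context context, Runnable delegate) implements Runnable {
            @Override
            public void run() {
                delegate.run();
            }
        }
    }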

@davsclaus (Contributor)

This PR
#15679

makes it possible to build a hook into newThread so all threads can be wrapped. However, newThread is only used in a few places in Camel, for auxiliary threads such as a JMX remote listener port, and for camel-main to trigger shutdown when the max duration is hit. So it is not really in use for regular standard Camel apps. But this allows us to cover 100%, and it also gives us a hook for future use.
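
For illustration, a sketch of a newThread hook that wraps each created thread's Runnable with the current OTel context. The wrapper class is hypothetical, not the actual PR code:

    import io.opentelemetry.context.Context;
    import java.util.concurrent.ThreadFactory;

    class ContextPropagatingThreadFactory implements ThreadFactory {
        private final ThreadFactory delegate;

        ContextPropagatingThreadFactory(ThreadFactory delegate) {
            this.delegate = delegate;
        }

        @Override
        public Thread newThread(Runnable r) {
            // Context.current().wrap(...) captures the creating thread's context
            // so the new thread starts its work under that context.
            return delegate.newThread(Context.current().wrap(r));
        }
    }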

@ammachado (Contributor, Author)

I will rebase this PR against #15679.

@davsclaus (Contributor)

@ammachado is there more work, or can this PR be marked ready?

@ammachado (Contributor, Author)

👍🏻 Ready to go.

@davsclaus davsclaus marked this pull request as ready for review October 2, 2024 17:14
@davsclaus (Contributor) commented Oct 2, 2024

/component-test camel-opentelemetry

Result ✅ The tests passed successfully

github-actions bot commented Oct 2, 2024

🤖 The Apache Camel test robot will run the tests for you 👍

@davsclaus davsclaus merged commit 20ad8f7 into apache:main Oct 3, 2024
4 checks passed
davsclaus pushed a commit that referenced this pull request Oct 3, 2024
CAMEL-21202 - Add a ThreadPoolFactory to propagate OpenTelemetry contexts (#15496)

* CAMEL-21202 - Add a ThreadPoolFactory to propagate OpenTelemetry contexts

Signed-off-by: Adriano Machado <[email protected]>

* CAMEL-21202 - Adding missing implementation for ScheduledExecutorService context propagation

* CAMEL-21202 - Adding the instrumented thread pool factory to the tests

* Add strict checking for OpenTelemetry contexts

* Add ThreadFactoryListener implementation for OTEL context propagation

---------

Signed-off-by: Adriano Machado <[email protected]>