feat(containerd-shim-wasm): add OpenTelemetry tracing library and feature #582
Could you please take a look at this PR 🙏🏻 @devigned @jsturtevant @jprendes @cpuguy83
Great work, awesome to see that this has already been used to solve our startup perf!
use containerd_shim_wasmtime::WasmtimeInstance;

fn main() {
    shim_main::<WasmtimeInstance>("wasmtime", version!(), revision!(), "v1", None);
#[tokio::main]
Is this needed? I thought we were injecting the async runtime into the .install_batch(runtime::Tokio). Similar to how we do it for the containerd client code.
Yes, unfortunately the OpenTelemetry SDK crates in Rust all require a tokio runtime.
I am not familiar enough with Rust async. Will this cause any issues with the shim, or is it OK to sprinkle async in various areas?
TL;DR: I think it's fine, but it makes me feel a bit uncomfortable.
This could be a problem with runtimes that also use tokio. I believe it is not a problem as it stands here, but we need to be careful.
When I attempted to convert everything to async, I had the problem that the wasm runtime was starting a tokio runtime, but that code was already running inside the shim's tokio runtime. Tokio does not allow nesting runtimes. The solution is to spawn a new thread before calling the run_wasi method and block/wait for that thread to finish. The new thread gets a clean new stack and can initialize a new tokio runtime. This is OK because the new thread is not managed by tokio (hence, no nested tokio runtimes), and it is documented behaviour. But we pay the cost of spawning a thread.
When the shim runs in async and the process is forked by youki, the executed code is in a state where it believes it's inside an async runtime (hence we can't start a runtime again), but this is not actually true. After the fork, the code believes everything stays the same, but it has become the entry point of a new process and can't be suspended anymore. I think we could add support for this at the youki level (don't quote me, I haven't looked too much into it).
Now, I believe in this case it's OK, as we are using the sync version of containerd-shim, which creates a thread pool to handle the ttrpc requests, which has the same result as the workaround I mentioned before.
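The thread workaround described in this comment can be sketched with the standard library alone. This is an illustration under assumptions, not the shim's code: the thread-local flag stands in for the per-thread state tokio uses to detect nesting, and block_on_new_runtime, run_on_clean_thread, and this run_wasi are hypothetical names invented for the sketch.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Stand-in (assumption) for the per-thread "a runtime is active here" marker.
    static IN_RUNTIME: Cell<bool> = Cell::new(false);
}

/// Hypothetical stand-in for "start a runtime and block on it".
/// Like a real async runtime, it refuses to start when one is already active
/// on the current thread.
fn block_on_new_runtime<T>(work: impl FnOnce() -> T) -> Result<T, &'static str> {
    if IN_RUNTIME.with(|flag| flag.get()) {
        return Err("cannot start a runtime from within a runtime");
    }
    IN_RUNTIME.with(|flag| flag.set(true));
    let out = work();
    IN_RUNTIME.with(|flag| flag.set(false));
    Ok(out)
}

/// The workaround: run `work` on a fresh OS thread and block until it finishes.
/// The new thread starts with clean thread-local state, so code running on it
/// can start its own runtime. The price is one thread spawn per call.
fn run_on_clean_thread<T: Send + 'static>(work: impl FnOnce() -> T + Send + 'static) -> T {
    thread::spawn(work).join().expect("worker thread panicked")
}

/// Hypothetical stand-in for a wasm engine entry point that starts its own runtime.
fn run_wasi() -> Result<i32, &'static str> {
    block_on_new_runtime(|| 0)
}
```

Calling run_wasi directly from inside block_on_new_runtime fails with the nesting error, while wrapping it as run_on_clean_thread(run_wasi) succeeds, because the inner runtime starts on a thread the outer one does not manage.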
Did a rebase of the PR and resolved all the comments. Could you please take another look, @cpuguy83, @jsturtevant 🙏?
LGTM
this commit adds otel collector APIs and a new opentelemetry feature to the wasm shim Signed-off-by: jiaxiao zhou <[email protected]>
…n shim_main Signed-off-by: jiaxiao zhou <[email protected]>
this commit adds a new env var OTEL_EXPORTER_OTLP_PROTOCOL to configure different types of OTLP protocols such as grpc and http/protobuf. By default, it uses http/protobuf. Signed-off-by: jiaxiao zhou <[email protected]>
this commit adds the traces specific env vars to the otel module. It also refactors the code and adds unit tests. Signed-off-by: jiaxiao zhou <[email protected]>
this commit does a few things to refactor the opentelemetry codebase: 1. rename otel functions 2. bump deps of otel 3. move tokio runtime to the crate Signed-off-by: jiaxiao zhou <[email protected]>
this commit removes ConfigBuilder and adds build_from_env to Config and renames shim_main_with_otel to shim_main Signed-off-by: jiaxiao zhou <[email protected]>
Going to merge this PR in and I will work on the follow ups
Thanks everyone for reviewing and commenting! 🥂
This PR introduces an OpenTelemetry feature and new APIs on the core crate to add tracing capabilities to the shim.
It uses the OTEL_EXPORTER_OTLP_ENDPOINT environment variable to determine if the shim should be started with OpenTelemetry tracing.
It builds on top of #564.
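As a rough illustration of the environment-driven behaviour described in this PR, the sketch below enables tracing only when OTEL_EXPORTER_OTLP_ENDPOINT is set and reads OTEL_EXPORTER_OTLP_PROTOCOL with http/protobuf as the default. The type and function names (OtlpProtocol, OtlpSettings, otlp_settings_from) are hypothetical and are not the crate's actual API; the lookup is passed in as a closure so the logic can be exercised without touching the real process environment.

```rust
/// OTLP transport protocols mentioned in the commit messages.
#[derive(Debug, PartialEq)]
enum OtlpProtocol {
    Grpc,
    HttpProtobuf,
}

/// Hypothetical settings type for this sketch (not the crate's Config).
#[derive(Debug, PartialEq)]
struct OtlpSettings {
    endpoint: String,
    protocol: OtlpProtocol,
}

/// Returns Ok(None) when tracing should stay off (no endpoint configured),
/// Ok(Some(..)) when it should be enabled, and Err for an unknown protocol.
fn otlp_settings_from(
    lookup: impl Fn(&str) -> Option<String>,
) -> Result<Option<OtlpSettings>, String> {
    // Tracing is opt-in: without an endpoint, nothing is started.
    let endpoint = match lookup("OTEL_EXPORTER_OTLP_ENDPOINT") {
        Some(value) => value,
        None => return Ok(None),
    };
    // An unset protocol falls back to http/protobuf.
    let protocol = match lookup("OTEL_EXPORTER_OTLP_PROTOCOL").as_deref() {
        None | Some("http/protobuf") => OtlpProtocol::HttpProtobuf,
        Some("grpc") => OtlpProtocol::Grpc,
        Some(other) => return Err(format!("unsupported OTLP protocol: {other}")),
    };
    Ok(Some(OtlpSettings { endpoint, protocol }))
}
```

In a real binary the lookup would typically be `|key| std::env::var(key).ok()`; how the shim itself treats an unrecognised protocol value is not stated in this thread, so returning an error here is an assumption of the sketch.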