-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add dynamically disabled instrumentation capability #422
base: main
Are you sure you want to change the base?
Conversation
try { | ||
InstrumentationScopeInfo instrumentationScopeInfo = | ||
getInstrumentationScopeInfo(sdkTracer); | ||
TracerConfig tConfig = getTracerConfig(finalProvider, instrumentationScopeInfo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the tools we have available upstream, I can see why we need to do the config update this way, though I was wondering (not sure if it has been discussed) if instead of evaluating a field here, we could instead change that line by something like if (!tracerConfig.isEnabled()) {
instead? That way we might not need to keep a background thread alive constantly checking for config changes and applying them one by one by getting a new config and then resetting the tracerEnabled field as we could do the check reactively whenever spanBuilder
is called.
I guess a downside might be that, depending on what's done in the TracerConfig.isEnabled()
impl, it might cause some delays, though a good implementation shouldn't take long to return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the delay concerns are a blocker, I think another alternative might be to make TracerConfig
observable and then make the SdkTracer
impl subscribe to changes in it, updating its tracerEnabled
field whenever TracerConfig notifies about a change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delay concerns are a blocker
They are not a blocker. It does not need to be immediate, it's fine for the instrumentation to be disabled but that not take effect for a while. Also, adding an observer-subscription model doesn't change the delay, because the delay is from SdkTracer.tracerEnabled
being non-volatile. callbacks wouldn't change anything because the state change needs to propagate across threads and only making it volatile would speed that. I originally had it volatile in the PR that made it non-final, but the maintainers decided to go with a non-volatile implementation, at least for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
background thread alive constantly checking for config changes
The background thread checker is no on by default. I expect normal operation for something else - eg an OpAMP client - to do regular checking from central config (and maybe property and config file, to be decided) and update the instrumentation enablement from that. The background thread checker is there mainly so that testing can be done cleanly, though it is fully operational if someone wanted to use it in production
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are not a blocker. It does not need to be immediate
Sorry I think I didn't properly explain what I meant by delay, I was more talking about the possibility that by changing this condition to always query the config by calling TracerConfig.isEnabled()
instead of evaluating a field, then it could cause a performance overhead to the instrumented app in case that the provided TracerConfig
impl was poorly created, for example, let's say a TracerConfig
impl reads a file every time its isEnabled
method is called, then that could block the span creation process for a little while, causing the host app performance to get affected every time it creates a span. On the other hand, if we decide to keep the field then I agree we should make it volatile, though I was more of the opinion of removing that field and instead evaluating the config on every span creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though it makes sense what you mentioned about the OpAMP client triggering the update process when it gets a new config. I guess with the alternatives I mentioned I was also trying to make the "update process" something that couldn't be triggered from the outside since I'm not sure if it'd be desirable for any part of the code that might have access to the Otel instance (including via the GlobalOpenTelemetry singleton) to make config changes. However, if it's only a matter of adding the updateTracerConfigurations()
method without params, but only as a sort of "notify changes" signal to get the tracer to re-evaluate its existing config provider, then it should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in case that the provided
TracerConfig
impl was poorly created,
I tried to open the TracerConfig to be implementable by the user, but the maintainers preferred to keep it as is, essentially a boolean. So atm it's not possible for it to be inefficient
} | ||
} | ||
|
||
private static Object call(String methodname, Object target, Object arg1, Class<?> arg1Class) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private static Object call(String methodname, Object target, Object arg1, Class<?> arg1Class) { | |
private static <T> Object call(String methodname, Object target, T arg1, Class<? super T> arg1Class) { |
[Nit] introduces a minimal bit of type safety
static { | ||
if ("true".equals(System.getProperty(INSTRUMENTATION_DISABLE_OPTION + ".checker")) | ||
|| "true".equals(System.getenv("ELASTIC_OTEL_JAVA_DISABLE_INSTRUMENTATIONS_CHECKER"))) { | ||
new Thread(new OptionChecker(), "Elastic dynamic_instrumentation checker").start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will start a non daemon thread, which is a bad Idea IMO.
I'd suggest to do the following:
- Move the checker start logic into
DynamicInstrumentation.setTracerConfigurator
- Add the
ConfigProperties
as argument to that method: This means you don't have to check env variables / proeprties and to the conversion yourself - There, create a single-threaded daemon executor service with your periodic checker
- For a proper shutdown, I've used a trick before: On the
TracerProviderBuilder
, register a SpanProcessor which does nothing. It will still receive a shutdown event though, which can be used to properly shutdown the executor
} | ||
|
||
static class OptionChecker implements Runnable { | ||
private Map<String, Boolean> alreadyDisabled = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This algorithm is stateful due to carrying over alreadyDisabled
over the loop iterations.
I think this can be greatly simplified because we have that state already in another place, namely UpdatableConfigurator.map
:
What I would do is something like adding a boolean setOnlyDisabled(Set<String> scopeNames)
method to UpdateableConfigurator
which:
- Reenables all
TraceConfigs
inmap
which are not mentioned inscopeNames
- Makes all elements in
scopeNames
disabled inmap
As a result, you can now just pass the the desired target state (the list of disabled instrumentations coming form the config) to it periodically and dependent on the return value update the tracers.
The trace scoping in the SDK is still experimental at this point, so the relevant methods are non-public, hence the use of reflection to access them. Also, dynamically changing scope enablement is additionally experimental with the only additional support in SdkTracer.tracerEnabled being mutable (non-final). As it turns out, this is all that is needed to enable dynamically disabling instrumentation by scope, using reflection.