Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding disk buffering, part 4 #221

Merged
merged 15 commits into from
Mar 4, 2024

Conversation

LikeTheSalad
Copy link
Contributor

@LikeTheSalad LikeTheSalad commented Jan 12, 2024

This is the final PR to get the first iteration of the disk buffering feature fully working.

The changes added here do the following:

  • Providing config option to set custom ExportScheduleHandlers.
  • Using DefaultExportScheduleHandler if none is set.
  • Wrapping the final version (after customizers) of the SpanExporter into a SpanDiskExporter when disk buffering is enabled.
  • Created SignalDiskExporter to handle the reading and exporting processes of all signals.
  • Exporting all signals inside DefaultExportScheduler.

@@ -413,6 +436,7 @@ private SdkTracerProvider buildTracerProvider(SessionId sessionId, Application a
.addSpanProcessor(new SessionIdSpanAppender(sessionId));

SpanExporter spanExporter = buildSpanExporter();
initializationEvents.spanExporterInitialized(spanExporter);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having issues to verify that the right type of exporter (the disk exporter) was properly set in the unit tests, so one of the least ugly ways I could manage to do so was by adding this new function to the InitializationEvents interface and also a package-private setter for it below in this class (to set a mock impl from the tests). I'm open to better alternatives I might have missed 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe initializationEvents takes spanExporter in the ctor? (L331) so if you can provide/mock the spanExporter, you don't need a setter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if, instead of creating an init event for the span exporter, there was an init event for the disk buffering being initialized? That's really the feature here, and then you can assert that it was called I guess?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, also cool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if, instead of creating an init event for the span exporter, there was an init event for the disk buffering being initialized? That's really the feature here, and then you can assert that it was called I guess?

I see what you mean, although the initialization of the disk buffering tool doesn't say anything about the feature actually being used in the OpenTelemetry instance that has been built here, which I believe should be what we verify to guarantee that the RUM config is honored, and all those pieces are out of the scope of the disk buffering feature.

Copy link
Contributor Author

@LikeTheSalad LikeTheSalad Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe initializationEvents takes spanExporter in the ctor? (L331) so if you can provide/mock the spanExporter, you don't need a setter.

Maybe a setter could work, the thing about the constructor is that it needs to be called way before the exporter is created. But also it would mean that we need to cast the impl to SdkInitializationEvents which is essentially just dead code rn so I'm not sure we should rely on that impl for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I see the issue, since GitHub collapsed the code diff, I thought that these changes and the changes on L331 were a single method, so you could have used a ctor, my bad.

@@ -43,4 +44,9 @@ public void slowRenderingDetectorInitialized() {
public void crashReportingInitialized() {
// TODO: Build me "crashReportingInitialized"
}

@Override
public void spanExporterInitialized(SpanExporter spanExporter) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR though when I was changing this class I thought that maybe we don't need this implementation of InitializationEvents and instead we can just add a public setter in this class so that, if a user needs to enable initialization events, they can just set their own implementation in there? That way we wouldn't need to define this config flag either, as the default impl would be the noop one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good idea

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I set this up initially with the expectation that most users will want the InitializationEvents to be emitted by default. This can help o11y practitioners to understand how slowly/quickly the otel android sdk starts up and is also super important in understanding which features were enabled for a user session after the fact.

It probably doesn't help that the existing implementation is just a stub. I think I would not want 20 different users to have 20 different ways of formulating these various events. In other words, ideally one day in the future we will have semconv for these events, and I'd hate for opentelemetry-android users to have to go read the semconv in order to build an implementation themselves. We should provide that implementation for them, even if we do decide to default these to "off" (no-op). Users that want them shouldn't have to build it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably doesn't help that the existing implementation is just a stub. I think I would not want 20 different users to have 20 different ways of formulating these various events. In other words, ideally one day in the future we will have semconv for these events, and I'd hate for opentelemetry-android users to have to go read the semconv in order to build an implementation themselves. We should provide that implementation for them, even if we do decide to default these to "off" (no-op). Users that want them shouldn't have to build it.

Definitely having a default impl should be the way to go, although it's quite pointless having a stub for now that also cannot be overridden (since it can also be a great tool for unit tests). I would prefer to provide a default implementation returned by a InitializationEvents.getDefault() static function and a setter in OpenTelemetryRumBuilder that users (and ourselves from unit tests) could call to enable this feature, since it would be set to the noop impl by default, which essentially disables it. Rather than having to create a config flag for it, generally speaking I don't like flags, they can be very useful for small things but are not good for scalability. I also haven't seen them used in OTel Java for example, is mostly interfaces, setters for their implementations, and implementations already created in the same repo that are useful for most use cases and ready to be used so that users can take advantage of them easily.

@LikeTheSalad LikeTheSalad marked this pull request as ready for review January 12, 2024 16:53
@LikeTheSalad LikeTheSalad requested a review from a team January 12, 2024 16:53
return atLeastOneWorked
}

class Builder {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, Builders in Kotlin are a bit odd (and have downsides) since the language provides named and default parameters.
https://www.baeldung.com/kotlin/builder-pattern for context

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there are a lot of cases where Kotlin's features avoid the need to use builders. In this case, I needed a way to construct the SignalDiskExporter instance by "setting" each signal exporter as needed along the RUM initialization process, so the builder was helpful to have a mutable object and change it in different places before constructing the final one. Though I still reckon is not too Kotlin-y so I'm open to suggestions that allow the same 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems related to this comment from @breedx-splk, I haven't taken a deep look yet but probably what he suggests might avoid the need to use a builder altogether.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at createInstance, all you need is a data class with optional parameters, so you create the instance with all the parameters at once, if they are null or not it's not important, the final object is created at the very end and immutable, it's only important to construct all the parameters beforehand which you have to do with the builder anyway.

val instance =
            createInstance(
                spanDiskExporter,
                metricDiskExporter,
                logRecordDiskExporter,
                timeoutInMillis,
            )

becomes

val instance = SignalDiskExporter(
  spanDiskExporter = spanDiskExporter,
  metricDiskExporter = metricDiskExporter,
  ..
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the Java OTel SDK has builders everywhere, maybe still using builders makes sense, so it's not different everywhere, your call here, just wanted to share the Builders is not a Kotlin-friendly pattern

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at createInstance, all you need is a data class with optional parameters, so you create the instance with all the parameters at once, if they are null or not it's not important, the final object is created at the very end and immutable, it's only important to construct all the parameters beforehand which you have to do with the builder anyway.

Are you referring to the function createInstance within SignalDiskExporterTest? Because if that's the case, I agree, a builder is unnecessary there. My concern is with OpenTelemetryRumBuilder where the span exporter is created in buildSpanExporter which is called from buildTracerProvider, which is called from build where the OpenTelemetry instance is created, and the setup of SignalDiskExporter is done right after. The way the OpenTelemetry instance is built is by setting each signal up independently, which involves a couple of steps, so for the spans one, we need to set a tracer provider with a span processor with the span exporter, and the last one in that chain (the exporter setup) is where we can attach a disk exporter. There are many functions involved in setting the span signal up, and I'm guessing the same will happen with the other 2 signals, so in this scenario is handy to have a place to collect each disk exporter until they're all ready to create the SignalDiskExporter instance. It doesn't necessarily need to be called a builder though, but it's essentially what it is so it made sense. I'm not sure if Kotlin has another name for it or a more Kotlin-y way to handle these cases. I'm thinking that maybe after merging this PR we might get other options to address this without a builder though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, yeah I was looking at the createInstance as an example, just that you could replicate the same approach but got your point now, thanks.
I don't know the OTel codebase at all, just getting used to it by doing reviews and skimming the code once in a while so feel free to ignore me when I tell nonsense :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, I might miss a lot of things when creating these PRs too, especially recent features added to Kotlin so I appreciate any feedback, in this case, we could actually get rid of the builder is just that there might be different ways to create this instance and there's not a clear "best one" to me, plus it might change after this PR gets merged. I like these kinds of conversations, although I'm looking forward to be able to go through them in an Android SIG otherwise it's a lot of typing 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Haha

Comment on lines 24 to 27
var didExport = exporter.exportBatchOfEach()
while (didExport) {
didExport = exporter.exportBatchOfEach()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this can ever be an infinite loop due to a bug, not a big fan of such a while (possibleAlwaysTrue) but if there are tests in place, all good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one reason I see it might run for a long time is if there are a lot of signals in the disk and the internet connectivity is very slow, so each loop might take some time which would cause the whole data to take quite long to finish getting exported, but it shouldn't run forever as exportBatchOfEach only returns true if there's data available and it also gets exported successfully, so it's actually quite difficult for this loop to run uninterrupted.

Copy link
Contributor

@breedx-splk breedx-splk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's exciting to have things getting so close....have some questions along the way tho.

@@ -43,4 +44,9 @@ public void slowRenderingDetectorInitialized() {
public void crashReportingInitialized() {
// TODO: Build me "crashReportingInitialized"
}

@Override
public void spanExporterInitialized(SpanExporter spanExporter) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I set this up initially with the expectation that most users will want the InitializationEvents to be emitted by default. This can help o11y practitioners to understand how slowly/quickly the otel android sdk starts up and is also super important in understanding which features were enabled for a user session after the fact.

It probably doesn't help that the existing implementation is just a stub. I think I would not want 20 different users to have 20 different ways of formulating these various events. In other words, ideally one day in the future we will have semconv for these events, and I'd hate for opentelemetry-android users to have to go read the semconv in order to build an implementation themselves. We should provide that implementation for them, even if we do decide to default these to "off" (no-op). Users that want them shouldn't have to build it.

Comment on lines 459 to 465
SpanDiskExporter diskExporter =
createDiskExporter(defaultExporter, diskBufferingConfiguration);
if (signalDiskExporterBuilder == null) {
signalDiskExporterBuilder = SignalDiskExporter.builder();
}
signalDiskExporterBuilder.setSpanDiskExporter(diskExporter);
spanExporter = diskExporter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems overly complicated. It introduces a new field so that we can optionally create it in order to set the diskExporter here, which is really just a side effect that gets leveraged inside of the setUpDiskBuffering() method.

I think you should be able to get away with removing the new field and then factor out this chunk into a new method called scheduleDiskReader() or something, which does what the setUpDiskBuffering() method basically does. I think it will simplify some things, and this part will read more like a story:

  • If the disk buffering is turned on
    • create the disk exporter
    • schedule the disk reader

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the thing is that, by the time setUpDiskBuffering is called (now renamed to scheduleDiskTelemetryReader), the OpenTelemetry instance has already been created, so we'd lose the opportunity to set this disk exporter during the OTel initialization because it's immutable once it's all done (afaik), so the reason why I'm currently setting it up here is so that we can intercept the final, "normal" exporter before it's set into the span processor.

Maybe after this PR gets merged it won't need to be so complicated though. However, there's one more thing to be aware of regarding this part: If the disk buffering is turned on which is that even though the flag is enabled there's no guarantee that the buffering will actually work because there might be an issue when initializing the disk exporters (due to IO errors), causing the feature to get disabled even if it was requested to get enabled. I think @marandaneto mentioned something about this in a previous PR where we were pondering what would happen if some feature fails to get initialized during the RUM setup.

I guess a workaround that would allow to initialize the disk buffering exporters after the OpenTelemetry instance is created could be to create an exporter implementation that always wrapped the final exporter and would dynamically delegate to either the disk exporter or the "normal" exporter provided during the RUM setup depending on the config.

@@ -413,6 +436,7 @@ private SdkTracerProvider buildTracerProvider(SessionId sessionId, Application a
.addSpanProcessor(new SessionIdSpanAppender(sessionId));

SpanExporter spanExporter = buildSpanExporter();
initializationEvents.spanExporterInitialized(spanExporter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if, instead of creating an init event for the span exporter, there was an init event for the disk buffering being initialized? That's really the feature here, and then you can assert that it was called I guess?

/** Configuration for disk buffering. */
public final class DiskBufferingConfiguration {
private final boolean enabled;
private final int maxCacheSize;
private final ExportScheduleHandler exportScheduleHandler;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is problematic because it introduces an action class as a field of a configuration data class. It mixes up dumb/simple config with things that do work....and I think that's a troublesome path.

What are the use cases that you can think of for allowing a user to pass in their own ExportScheduleHandler instance to the DiskBufferingConfiguration? I thought that maybe it was to specify their own schedule/period, but the way the code is structured now I'm not even sure that's possible.

If that was the main intent, I'd rather that the period be the configurable thing, and the builders for the action classes can do the building by using the config, not by having the config do the building itself. Make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the use cases that you can think of for allowing a user to pass in their own ExportScheduleHandler instance to the DiskBufferingConfiguration?

The default "scheduler" only works while the app is running and I'm guessing some users would like to attempt exporting data even if their app isn't running by using WorkManager for example.

Though I agree that the config should generally be made of data instead of functions or the like, maybe since the disk buffering feature is needed to initialize RUM, we could categorize it as an "initialization" feature, which will allow us to get rid of DiskBufferingConfiguration altogether as we would only use OtelRumConfig to set all initialization params, and then we could move the ExportScheduleHandler setter over to OpenTelemetryRumBuilder if we decided to go with the proposed structure in here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with moving this around after this PR, possibly as part of dealing with the larger config structure questions.

@@ -14,11 +18,20 @@ class DefaultExportScheduler : PeriodicRunnable() {
}

override fun onRun() {
// TODO for next PR.
val exporter = SignalDiskExporter.get() ?: return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just pass an instance to this class instead of reaching out to a global static singleton?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we could, though there are still benefits of having it as a singleton which I mentioned here. I think we should address that comment first before deciding how to handle this scenario.

@LikeTheSalad
Copy link
Contributor Author

LikeTheSalad commented Jan 30, 2024

What if, instead of creating an init event for the span exporter, there was an init event for the disk buffering being initialized? That's really the feature here, and then you can assert that it was called I guess?

This has been answered here (not sure why GH shows it duplicated).

@LikeTheSalad
Copy link
Contributor Author

So I set this up initially with the expectation that most users will want the InitializationEvents to be emitted by default. This can help o11y practitioners to understand how slowly/quickly the otel android sdk starts up and is also super important in understanding which features were enabled for a user session after the fact.

It probably doesn't help that the existing implementation is just a stub. I think I would not want 20 different users to have 20 different ways of formulating these various events. In other words, ideally one day in the future we will have semconv for these events, and I'd hate for opentelemetry-android users to have to go read the semconv in order to build an implementation themselves. We should provide that implementation for them, even if we do decide to default these to "off" (no-op). Users that want them shouldn't have to build it.

Answered here.

@breedx-splk
Copy link
Contributor

@LikeTheSalad let's get the build passing!

@LikeTheSalad
Copy link
Contributor Author

@LikeTheSalad let's get the build passing!

@breedx-splk
Copy link
Contributor

So the disk buffering changes are now available in contrib https://github.com/open-telemetry/opentelemetry-java-contrib/releases/tag/v1.33.0. @LikeTheSalad are you going to pull that in here now?

@LikeTheSalad
Copy link
Contributor Author

So the disk buffering changes are now available in contrib https://github.com/open-telemetry/opentelemetry-java-contrib/releases/tag/v1.33.0. @LikeTheSalad are you going to pull that in here now?

Awesome, it's updated now.

Copy link
Contributor

@breedx-splk breedx-splk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we made it! Thanks for all the good work on this @LikeTheSalad and for sticking with it.

@breedx-splk
Copy link
Contributor

@LikeTheSalad needs a rebase. 🙏🏻

# Conflicts:
#	gradle/libs.versions.toml
@LikeTheSalad LikeTheSalad merged commit 3632f6a into open-telemetry:main Mar 4, 2024
2 checks passed
@LikeTheSalad LikeTheSalad deleted the disk-buffering-part-4 branch March 5, 2024 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants