-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Defer PreferencesService to lazy/later and/or worker thread #149
Comments
I think the solution should be to run all the initialization in a background thread and provide callbacks for the user to know when OTel is ready to be used. The reason I mention this is because |
In theory, it's ok but in practice, this has downsides, for example:
Maybe an option is to init. the "core" parts of the SDK in the main thread and defer everything else to a worker. Another option is just to defer the blocking bits to a worker (preferred IMO), and await the worker whenever needed even if that blocks for a tiny bit, an example of that is the storage init., it's gonna be init. in a worker, but if a trace or error has to be cached in storage and the worker is still ongoing, it's ok to For the NTP bits, I don't know any other solution than an NTP sync, there are libs that do that async instead, that doesn't need to be a blocking call, the very first few events before the NTP sync is finished might be off, but that's fine if all the following events are accurate. Another option is doing a clock drift on the server side, for example, event A has 12:00 as a time when calling the API to send the event, you also send the current timestamp (request time), let's say, 12:10, the server compares with its current timestamp and do a clock drift, if the server is 12:15 when it receives the request, and the request was done at 12:10 device time, the server knows that the device is 5 minutes late, so the event that was at 12:00 is 12:05. |
I agree, although it seems like having a lazy init for when disk caching is enabled could also mean losing errors during app start, since they won't be cached in disk before the lazy disk set-up work is done. I guess we could have some exporter wrapper that's init-aware and sends the data straight to the server if the disk set-up is not done yet... Though I'm wondering also what would be a nice amount of "opinionated" features we should add by default to this lib before someone complains about unexpected behaviors.
I think this is the ideal, balanced choice 👍 although the signal exporters, which are the ones in charge of caching data in disk, are unfortunately part of that "core", so it wouldn't be that easy, there's probably something we could do such as what I mentioned about having a wrapper exporter that's init-aware and knows where to delegate the data depending on the init state, though that's only an assumption of mine, I'll need to check how feasible it is in code.
If I understood correctly, this part suggests blocking the main thread for a tiny bit, not long enough to cause any harm to the user experience, while ensuring that no data gets lost even if some signals happen during the app init. This is what we're currently doing in the Elastic agent to avoid losing signals, also because, in my experience, the SharedPref's init part (done in
I agree, and I'm glad to hear this as this is the way we're currently doing so in the Elastic agent so it's a nice feedback 👍 The downside is, as you mentioned, that the first couple of signals will be off, but I think it's still worth it, besides, we're making TrueTime to use SharedPrefs as its cache system to avoid unnecessary extra network calls. To summarize, I think all options will have pros and cons, I don't see a "perfect" one, but generally speaking, I like the idea of having OTel ready in the main thread to make sure no data gets lost during the app's initialization, and in order to achieve that, I think we should be able to do certain disk operations (reading ones, not writing) in the main thread without causing issues to the end user, and leave all the writing and network related operations to be initialized in a worker one. I think that'd be the most straightforward approach at least based on my own experience with disk-reading operations, though I'm open to any opinions on why reading from sharedprefs could cause important issues to the app's initialization if done in the main thread for this use case, and if it's really an issue, then I guess the alternative would be to try and add special behaviors to the agent, such as bypassing the disk buffering exporter if the disk-related setup is not ready before some data gets created. |
Another option is to offer async init as an option (but not the default) with the known issues that that may bring. Having the SDK ready and instrumenting after the init method is invoked is powerful and gives us some guarantees about when telemetry is expected to roll in, but not everyone requires that level of guarantee, so it may make sense to offer that as an option for those who don't want disk reads of simply don't want to the app to be blocked on startup. And FWIW, having certain services' init be deferred or done async is an attractive compromise to improve the init time and do more expensive work off the startup blocking path, but there's a level of complexity and overhead that it will create to make it just a bit annoying to maintain. It's less so if it's instrumentation that collects data passively, as you just have to make clear that the data collection lifecycle for that instrumentation is decoupled from the core SDK. But anything that activates abilities that affect manual telemetry recording by an app will have to come with API that reports back whether that feature is ready, which is cumbersome to use. So unless there a really really good reason, it's best if that's avoided. |
Thanks @bidetofevil - I agree, it might be useful to provide that as an option just in case some projects would prefer losing some data over making reading disk operations or adding any kind of overhead during app launch 👍 So to summarize, based on what's been discussed in this thread, it seems like this would be the overall conclusion:
|
Just wanting to bump this up to see if there will be a future release where some of the initialization work will be moved off main. We are seeing a few ANRs on lower end devices in relation to the android.os.BinderProxy.transact (BinderProxy.java:624)
android.os.storage.IStorageManager$Stub$Proxy.getAllocatableBytes (IStorageManager.java:2819)
android.os.storage.StorageManager.getAllocatableBytes (StorageManager.java:2298)
android.os.storage.StorageManager.getAllocatableBytes (StorageManager.java:2288)
io.opentelemetry.android.internal.services.CacheStorage.getAvailableSpace (CacheStorage.java:66)
io.opentelemetry.android.internal.services.CacheStorage.ensureCacheSpaceAvailable (CacheStorage.java:50)
io.opentelemetry.android.internal.features.persistence.DiskManager.getMaxFolderSize (DiskManager.kt:58)
io.opentelemetry.android.OpenTelemetryRumBuilder.createStorageConfiguration (OpenTelemetryRumBuilder.java:338)
io.opentelemetry.android.OpenTelemetryRumBuilder.build (OpenTelemetryRumBuilder.java:286)
io.opentelemetry.android.OpenTelemetryRumBuilder.build (OpenTelemetryRumBuilder.java:271) In the documentation for |
As part of the first disk buffering PR #145, it was raised by @marandaneto that the
PreferencesService
is loading shared preferences from the context, which does perform IO and will result in a warning when StrictMode is used. This suggests that this work should either be deferred until needed or moved to a worker thread.See:
#145 (comment)
The text was updated successfully, but these errors were encountered: