Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashes as logs #237

Merged
merged 5 commits into from
Mar 6, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -31,10 +31,6 @@ public Throwable getCause() {
return cause;
}

String spanName() {
return getCause().getClass().getSimpleName();
}

@Override
public boolean equals(Object o) {
if (this == o) return true;
Expand Down

This file was deleted.

Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,17 @@
package io.opentelemetry.android.instrumentation.crash;

import io.opentelemetry.android.instrumentation.InstrumentedApplication;
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.common.AttributesBuilder;
import io.opentelemetry.api.logs.Logger;
import io.opentelemetry.api.logs.LoggerProvider;
import io.opentelemetry.context.Context;
import io.opentelemetry.instrumentation.api.instrumenter.AttributesExtractor;
import io.opentelemetry.instrumentation.api.instrumenter.Instrumenter;
import io.opentelemetry.semconv.SemanticAttributes;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.List;
import java.util.function.Consumer;

/** Entrypoint for installing the crash reporting instrumentation. */
public final class CrashReporter {
Expand Down Expand Up @@ -38,16 +45,47 @@ public void installOn(InstrumentedApplication instrumentedApplication) {
Thread.getDefaultUncaughtExceptionHandler();
Thread.setDefaultUncaughtExceptionHandler(
new CrashReportingExceptionHandler(
buildInstrumenter(instrumentedApplication.getOpenTelemetrySdk()),
instrumentedApplication.getOpenTelemetrySdk().getSdkTracerProvider(),
buildInstrumenter(
instrumentedApplication
.getOpenTelemetrySdk()
.getSdkLoggerProvider()),
instrumentedApplication.getOpenTelemetrySdk().getSdkLoggerProvider(),
existingHandler));
}

private Instrumenter<CrashDetails, Void> buildInstrumenter(OpenTelemetry openTelemetry) {
return Instrumenter.<CrashDetails, Void>builder(
openTelemetry, "io.opentelemetry.crash", CrashDetails::spanName)
.addAttributesExtractor(new CrashDetailsAttributesExtractor())
.addAttributesExtractors(additionalExtractors)
.buildInstrumenter();
private void emitCrashEvent(Logger crashReporter, CrashDetails crashDetails) {
Throwable throwable = crashDetails.getCause();
Thread thread = crashDetails.getThread();
AttributesBuilder attributesBuilder =
Attributes.builder()
.put(SemanticAttributes.EXCEPTION_ESCAPED, true)
.put(SemanticAttributes.THREAD_ID, thread.getId())
.put(SemanticAttributes.THREAD_NAME, thread.getName())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth considering adding a new attribute if it is the main thread or not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm actually not sure if the concept of "main thread" is as relevant in every other platform as it is in Android in order to have it as a general thread attr, but I guess we can always try and check what people think about it. If you have some use cases in mind that would benefit from this attr, I think you could mention them in an issue in this repo, or maybe just create a draft adding the new attr in this file, is usually easier to convey ideas with code changes rather than only with a description in an issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, will do, just for context, exception triage, IMO main thread errors are most likely more important than background errors (if the thread is a daemon) maybe the app won't even crash, but not sure if in this case the default signal handler would be called anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's the same thing but here there's thread id already.

.put(SemanticAttributes.EXCEPTION_MESSAGE, throwable.getMessage())
.put(SemanticAttributes.EXCEPTION_STACKTRACE, stackTraceToString(throwable))
.put(SemanticAttributes.EXCEPTION_TYPE, throwable.getClass().getName());

for (AttributesExtractor<CrashDetails, Void> extractor : additionalExtractors) {
extractor.onStart(attributesBuilder, Context.current(), crashDetails);
}

crashReporter.logRecordBuilder().setAllAttributes(attributesBuilder.build()).emit();
breedx-splk marked this conversation as resolved.
Show resolved Hide resolved
}

private String stackTraceToString(Throwable throwable) {
StringWriter sw = new StringWriter();
PrintWriter pw = new PrintWriter(sw);

throwable.printStackTrace(pw);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the printStackTrace also stringify all the inner exceptions? if not, we'd need to iterate the cause recursively and append it in a string builder maybe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it prints the "Caused by: ..." parts too.

pw.flush();

return sw.toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for some reason Timber allocates only 256 and flush manually, see https://github.com/JakeWharton/timber/blob/dfc4e70bd1a4b4e1e136647b96fc9c9dbcb41f65/timber/src/main/java/timber/log/Timber.kt#L175-L181
not sure there's any performance gain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's quite interesting, the 256 initial buffer size seems related to performance indeed. I'm usually not keen on doing premature optimizations, though I think, based on how popular Timber is, I guess it's worth following their approach.

Regarding the and flush manually part I'm a bit confused because it seems like that's the default behavior of PrintWriter, so I'm not sure why they had to use that constructor if they didn't need to change the defaults.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the 256 initial size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably because you call flush manually anyway? so makes sense that the autoFlush=false, unless the default is false already

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the default is false, that's why I'm a little confused about the code in Timber 😅

}

private Consumer<CrashDetails> buildInstrumenter(LoggerProvider loggerProvider) {
return crashDetails ->
emitCrashEvent(
loggerProvider.loggerBuilder("io.opentelemetry.crash").build(),
breedx-splk marked this conversation as resolved.
Show resolved Hide resolved
crashDetails);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -6,24 +6,23 @@
package io.opentelemetry.android.instrumentation.crash;

import androidx.annotation.NonNull;
import io.opentelemetry.context.Context;
import io.opentelemetry.instrumentation.api.instrumenter.Instrumenter;
import io.opentelemetry.sdk.common.CompletableResultCode;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.logs.SdkLoggerProvider;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class CrashReportingExceptionHandler implements Thread.UncaughtExceptionHandler {

private final Instrumenter<CrashDetails, Void> instrumenter;
private final SdkTracerProvider sdkTracerProvider;
private final Consumer<CrashDetails> crashSender;
private final SdkLoggerProvider sdkLoggerProvider;
private final Thread.UncaughtExceptionHandler existingHandler;

CrashReportingExceptionHandler(
Instrumenter<CrashDetails, Void> instrumenter,
SdkTracerProvider sdkTracerProvider,
Consumer<CrashDetails> crashSender,
SdkLoggerProvider sdkLoggerProvider,
Thread.UncaughtExceptionHandler existingHandler) {
this.instrumenter = instrumenter;
this.sdkTracerProvider = sdkTracerProvider;
this.crashSender = crashSender;
this.sdkLoggerProvider = sdkLoggerProvider;
this.existingHandler = existingHandler;
}

Expand All @@ -32,7 +31,7 @@ public void uncaughtException(@NonNull Thread t, @NonNull Throwable e) {
reportCrash(t, e);

// do our best to make sure the crash makes it out of the VM
CompletableResultCode flushResult = sdkTracerProvider.forceFlush();
CompletableResultCode flushResult = sdkLoggerProvider.forceFlush();
flushResult.join(10, TimeUnit.SECONDS);

// preserve any existing behavior
Expand All @@ -43,7 +42,6 @@ public void uncaughtException(@NonNull Thread t, @NonNull Throwable e) {

private void reportCrash(Thread t, Throwable e) {
CrashDetails crashDetails = CrashDetails.create(t, e);
Context context = instrumenter.start(Context.current(), crashDetails);
instrumenter.end(context, crashDetails, null, e);
crashSender.accept(crashDetails);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -7,20 +7,18 @@

import static io.opentelemetry.api.common.AttributeKey.stringKey;
import static io.opentelemetry.instrumentation.api.instrumenter.AttributesExtractor.constant;
import static org.awaitility.Awaitility.await;
import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import io.opentelemetry.android.instrumentation.InstrumentedApplication;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.testing.assertj.TraceAssert;
import io.opentelemetry.sdk.logs.data.LogRecordData;
import io.opentelemetry.sdk.testing.assertj.OpenTelemetryAssertions;
import io.opentelemetry.sdk.testing.junit5.OpenTelemetryExtension;
import io.opentelemetry.sdk.trace.data.StatusData;
import io.opentelemetry.semconv.SemanticAttributes;
import java.time.Duration;
import java.util.function.Consumer;
import java.util.List;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
Expand Down Expand Up @@ -56,7 +54,8 @@ void integrationTest() throws InterruptedException {
.build()
.installOn(instrumentedApplication);

RuntimeException crash = new RuntimeException("boooom!");
String exceptionMessage = "boooom!";
RuntimeException crash = new RuntimeException(exceptionMessage);
Thread crashingThread =
new Thread(
() -> {
Expand All @@ -66,26 +65,18 @@ void integrationTest() throws InterruptedException {
crashingThread.start();
crashingThread.join();

Attributes expectedAttributes =
Attributes.builder()
.put(SemanticAttributes.EXCEPTION_ESCAPED, true)
.put(SemanticAttributes.THREAD_ID, crashingThread.getId())
.put(SemanticAttributes.THREAD_NAME, crashingThread.getName())
.put(stringKey("test.key"), "abc")
.build();
assertTrace(
trace ->
trace.hasSpansSatisfyingExactly(
span ->
span.hasName("RuntimeException")
.hasKind(SpanKind.INTERNAL)
.hasStatus(StatusData.error())
.hasException(crash)
.hasAttributes(expectedAttributes)));
}
List<LogRecordData> logRecords = testing.getLogRecords();
assertThat(logRecords).hasSize(1);

private static void assertTrace(Consumer<TraceAssert> assertion) {
await().atMost(Duration.ofSeconds(30))
.untilAsserted(() -> testing.assertTraces().hasTracesSatisfyingExactly(assertion));
Attributes crashAttributes = logRecords.get(0).getAttributes();
OpenTelemetryAssertions.assertThat(crashAttributes)
.containsEntry(SemanticAttributes.EXCEPTION_ESCAPED, true)
.containsEntry(SemanticAttributes.EXCEPTION_MESSAGE, exceptionMessage)
.containsEntry(SemanticAttributes.EXCEPTION_TYPE, "java.lang.RuntimeException")
.containsEntry(SemanticAttributes.THREAD_ID, crashingThread.getId())
.containsEntry(SemanticAttributes.THREAD_NAME, crashingThread.getName())
.containsEntry(stringKey("test.key"), "abc");
assertThat(crashAttributes.get(SemanticAttributes.EXCEPTION_STACKTRACE))
.startsWith("java.lang.RuntimeException: boooom!");
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -5,17 +5,13 @@

package io.opentelemetry.android.instrumentation.crash;

import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.ArgumentMatchers.isNull;
import static org.mockito.Mockito.inOrder;
import static org.mockito.Mockito.when;

import io.opentelemetry.context.Context;
import io.opentelemetry.instrumentation.api.instrumenter.Instrumenter;
import io.opentelemetry.sdk.common.CompletableResultCode;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.logs.SdkLoggerProvider;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.InOrder;
Expand All @@ -25,29 +21,27 @@
@ExtendWith(MockitoExtension.class)
class CrashReportingExceptionHandlerTest {

@Mock Instrumenter<CrashDetails, Void> instrumenter;
@Mock SdkTracerProvider sdkTracerProvider;
@Mock Consumer<CrashDetails> crashSender;
@Mock SdkLoggerProvider sdkLoggerProvider;
@Mock Thread.UncaughtExceptionHandler existingHandler;
@Mock CompletableResultCode flushResult;

@Test
void shouldReportCrash() {
when(sdkTracerProvider.forceFlush()).thenReturn(flushResult);
when(sdkLoggerProvider.forceFlush()).thenReturn(flushResult);

CrashReportingExceptionHandler handler =
new CrashReportingExceptionHandler(
instrumenter, sdkTracerProvider, existingHandler);
new CrashReportingExceptionHandler(crashSender, sdkLoggerProvider, existingHandler);

NullPointerException oopsie = new NullPointerException("oopsie");
Thread crashThread = new Thread("badThread");

handler.uncaughtException(crashThread, oopsie);

CrashDetails crashDetails = CrashDetails.create(crashThread, oopsie);
InOrder io = inOrder(instrumenter, sdkTracerProvider, flushResult, existingHandler);
io.verify(instrumenter).start(Context.current(), crashDetails);
io.verify(instrumenter).end(any(), eq(crashDetails), isNull(), eq(oopsie));
io.verify(sdkTracerProvider).forceFlush();
InOrder io = inOrder(crashSender, sdkLoggerProvider, flushResult, existingHandler);
io.verify(crashSender).accept(crashDetails);
io.verify(sdkLoggerProvider).forceFlush();
io.verify(flushResult).join(10, TimeUnit.SECONDS);
io.verify(existingHandler).uncaughtException(crashThread, oopsie);
io.verifyNoMoreInteractions();
Expand Down