feat: expose error fingerprints as an HTTP header #796

rowanmanning · 2023-11-17T10:14:49Z

This sets a new header in the error rendering middleware which exposes the fingerprint of an error if it has one. This allows us to better debug in production environments - our logs also include this fingerprint so that you can map our generic error pages to the type of error that occurred and where in the code it was thrown.

How it works

Complete: the serializeError method, used in our logging and error rendering, now adds a fingerprint property to an error object. This fingerprint is a hash of the first two lines of the error stack trace, meaning that the fingerprint can locate an error in our codebase without giving away error information to end-users or attackers
Complete: the latest version Reliability Kit error logging middleware already logs this alongside the rest of the error. This allows us to easily group errors that occur in the same part of the source code
The change in this PR: we now expose this fingerprint as an error-fingerprint HTTP header in the error rendering middleware. This is a safe way of exposing error information to the end user and we can match it to the error in our logs to extract the full error details
Future: once this header is present on our error pages, we'll be able to use it to aid debugging production issues. We have a couple of ideas:
- Expose the error fingerprint in the UI: we already do this with "customer care codes", giving customers a code to quote to Customer Care which then speeds up their diagnosis of which team to contact. If we expose the error fingerprint then the engineer who picks up the bug report will be able to find the exact error being thrown via our logs
- Read the error fingerprint in a browser extension: if the header is present in the browser then we could use it in an internal-facing browser extension. We could allow Operations and Customer Care to navigate directly from an error page to a Splunk search for the error fingerprint, reducing the time required to identify the responsible system

Some considerations

If an app is behind a CDN and error pages are cached, then we will need to vary on the error-fingerprint header in order for this to be useful - otherwise the first fingerprint will get cached
Not all apps will get this functionality at the same time. They'll need to either start using the error rendering middleware or bump the major version, because this is landing as part of a breaking change

This sets a new header in the error rendering middleware which exposes the fingerprint of an error if it has one. This allows us to better debug in production environments - our logs also include this fingerprint so that you can map our generic error pages to the type of error that occurred and where in the code it was thrown. Some considerations: * If an app is behind a CDN and error pages are cached, then we will need to vary on the `x-error-fingerprint` header in order for this to be useful - otherwise the first fingerprint will get cached * Not all apps will get this functionality at the same time. They'll need to either start using the error rendering middleware or bump the major version, because this is landing as part of a breaking change

wheresrhys · 2023-11-17T10:31:43Z

ooh - sweet idea. One question - why the first 2 lines? How confident are you that this is sufficient information to generate unique enough identifiers. Also, does it need to be a hash; could it not just be a randomly generated unique identifier?

rowanmanning · 2023-11-17T10:38:53Z

Maybe missing a bit of context: this is not a unique identifier for the specific instance of an error that occurred, it's also designed to help us group errors of the same type so that engineers can identify the parts of their codebase where the most errors occur. So:

Why the first two lines: the first line is the error name/message, the second line is the line of code where the error was thrown which can change if the file is changed. The later lines contain the full trace through the code and the more lines we look at the more likely it is that the hash will change when changes are made to the codebase. So the first line is like a compromise - it's unique enough that we can group changes without being so unique that a deep dependency update changes the hash.
Why not a random ID: this would identify the individual instance of an error rather than the line of code where it's thrown, we already have this with request IDs which is another tool we could use. I see it as:
- error fingerprint = the file / line of code where the error occurred
- request ID = the exact instance of the error

alexmuller · 2023-11-20T09:14:21Z

suggestion: Would you be happy to do this without the X- prefix? It's deprecated. We could either do it unprefixed or make it an Ft-/FT-/ft-/fT- prefix?

rowanmanning · 2023-11-20T09:16:03Z

Yeah I can do that. I'd rather go unprefixed then as Using FT- could be an annoying future issue if, for example, one of the specialist titles wants to use this package.

packages/middleware-render-error-info/README.md

This way of defining headers is deprecated: https://datatracker.ietf.org/doc/html/rfc6648 Co-Authored-By: Alex Muller <[email protected]> Co-Authored-By: Cynthia Mbulu <[email protected]>

CyntiBinti

Nice one Rowan! Exciting times 😸

rowanmanning requested a review from a team as a code owner November 17, 2023 10:14

CyntiBinti reviewed Nov 20, 2023

View reviewed changes

packages/middleware-render-error-info/README.md Outdated Show resolved Hide resolved

fix: stop using deprecated x- header prefix

fbcfc3a

This way of defining headers is deprecated: https://datatracker.ietf.org/doc/html/rfc6648 Co-Authored-By: Alex Muller <[email protected]> Co-Authored-By: Cynthia Mbulu <[email protected]>

CyntiBinti approved these changes Nov 20, 2023

View reviewed changes

rowanmanning force-pushed the expose-error-fingerprints branch from 298622c to fbcfc3a Compare November 20, 2023 09:44

rowanmanning merged commit 5de9864 into main Nov 20, 2023
3 checks passed

rowanmanning deleted the expose-error-fingerprints branch November 20, 2023 10:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: expose error fingerprints as an HTTP header #796

feat: expose error fingerprints as an HTTP header #796

rowanmanning commented Nov 17, 2023 •

edited

Loading

wheresrhys commented Nov 17, 2023

rowanmanning commented Nov 17, 2023

alexmuller commented Nov 20, 2023

rowanmanning commented Nov 20, 2023

CyntiBinti left a comment

feat: expose error fingerprints as an HTTP header #796

feat: expose error fingerprints as an HTTP header #796

Conversation

rowanmanning commented Nov 17, 2023 • edited Loading

How it works

Some considerations

wheresrhys commented Nov 17, 2023

rowanmanning commented Nov 17, 2023

alexmuller commented Nov 20, 2023

rowanmanning commented Nov 20, 2023

CyntiBinti left a comment

Choose a reason for hiding this comment

rowanmanning commented Nov 17, 2023 •

edited

Loading