Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: expose error fingerprints as an HTTP header #796

Merged
merged 2 commits into from
Nov 20, 2023

Conversation

rowanmanning
Copy link
Member

@rowanmanning rowanmanning commented Nov 17, 2023

This sets a new header in the error rendering middleware which exposes the fingerprint of an error if it has one. This allows us to better debug in production environments - our logs also include this fingerprint so that you can map our generic error pages to the type of error that occurred and where in the code it was thrown.

How it works

  • Complete: the serializeError method, used in our logging and error rendering, now adds a fingerprint property to an error object. This fingerprint is a hash of the first two lines of the error stack trace, meaning that the fingerprint can locate an error in our codebase without giving away error information to end-users or attackers

  • Complete: the latest version Reliability Kit error logging middleware already logs this alongside the rest of the error. This allows us to easily group errors that occur in the same part of the source code

  • The change in this PR: we now expose this fingerprint as an error-fingerprint HTTP header in the error rendering middleware. This is a safe way of exposing error information to the end user and we can match it to the error in our logs to extract the full error details

  • Future: once this header is present on our error pages, we'll be able to use it to aid debugging production issues. We have a couple of ideas:

    • Expose the error fingerprint in the UI: we already do this with "customer care codes", giving customers a code to quote to Customer Care which then speeds up their diagnosis of which team to contact. If we expose the error fingerprint then the engineer who picks up the bug report will be able to find the exact error being thrown via our logs

    • Read the error fingerprint in a browser extension: if the header is present in the browser then we could use it in an internal-facing browser extension. We could allow Operations and Customer Care to navigate directly from an error page to a Splunk search for the error fingerprint, reducing the time required to identify the responsible system

Some considerations

  • If an app is behind a CDN and error pages are cached, then we will need to vary on the error-fingerprint header in order for this to be useful - otherwise the first fingerprint will get cached

  • Not all apps will get this functionality at the same time. They'll need to either start using the error rendering middleware or bump the major version, because this is landing as part of a breaking change

This sets a new header in the error rendering middleware which exposes
the fingerprint of an error if it has one. This allows us to better
debug in production environments - our logs also include this
fingerprint so that you can map our generic error pages to the type of
error that occurred and where in the code it was thrown.

Some considerations:

  * If an app is behind a CDN and error pages are cached, then we will
    need to vary on the `x-error-fingerprint` header in order for this
    to be useful - otherwise the first fingerprint will get cached

  * Not all apps will get this functionality at the same time. They'll
    need to either start using the error rendering middleware or bump
    the major version, because this is landing as part of a breaking
    change
@rowanmanning rowanmanning requested a review from a team as a code owner November 17, 2023 10:14
@wheresrhys
Copy link
Contributor

ooh - sweet idea. One question - why the first 2 lines? How confident are you that this is sufficient information to generate unique enough identifiers. Also, does it need to be a hash; could it not just be a randomly generated unique identifier?

@rowanmanning
Copy link
Member Author

Maybe missing a bit of context: this is not a unique identifier for the specific instance of an error that occurred, it's also designed to help us group errors of the same type so that engineers can identify the parts of their codebase where the most errors occur. So:

  • Why the first two lines: the first line is the error name/message, the second line is the line of code where the error was thrown which can change if the file is changed. The later lines contain the full trace through the code and the more lines we look at the more likely it is that the hash will change when changes are made to the codebase. So the first line is like a compromise - it's unique enough that we can group changes without being so unique that a deep dependency update changes the hash.

  • Why not a random ID: this would identify the individual instance of an error rather than the line of code where it's thrown, we already have this with request IDs which is another tool we could use. I see it as:

    • error fingerprint = the file / line of code where the error occurred
    • request ID = the exact instance of the error

@alexmuller
Copy link
Member

suggestion: Would you be happy to do this without the X- prefix? It's deprecated. We could either do it unprefixed or make it an Ft-/FT-/ft-/fT- prefix?

@rowanmanning
Copy link
Member Author

Yeah I can do that. I'd rather go unprefixed then as Using FT- could be an annoying future issue if, for example, one of the specialist titles wants to use this package.

This way of defining headers is deprecated:
https://datatracker.ietf.org/doc/html/rfc6648

Co-Authored-By: Alex Muller <[email protected]>
Co-Authored-By: Cynthia Mbulu <[email protected]>
Copy link
Contributor

@CyntiBinti CyntiBinti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one Rowan! Exciting times 😸

@rowanmanning rowanmanning force-pushed the expose-error-fingerprints branch from 298622c to fbcfc3a Compare November 20, 2023 09:44
@rowanmanning rowanmanning merged commit 5de9864 into main Nov 20, 2023
3 checks passed
@rowanmanning rowanmanning deleted the expose-error-fingerprints branch November 20, 2023 10:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants