Libp2p randomly throws unhandled error #2055

Open
EmiM opened this issue Nov 10, 2023 · 13 comments
Assignees
Labels
bug Something isn't working libp2p

Comments

@EmiM
Contributor

EmiM commented Nov 10, 2023

Libp2p sometimes throws an exception, which the user sees as "abnormal backend termination" (like all other unhandled backend errors).

Our hypothesis is that it's caused by our websocketOverTor and that Tor changes the behavior of the transport. Libp2p is not prepared for that and does not handle the error properly.

We can probably just ignore this particular error for now.

Errors I got:

node:internal/event_target:1006
  process.nextTick(() => { throw err; });
                           ^

Error: stream ended before 1 bytes became available
    at eval (webpack://@quiet/backend/./node_modules/it-reader/dist/src/index.js?:59:33)
    at async Object.next (webpack://@quiet/backend/./node_modules/@libp2p/multistream-select/dist/src/multistream.js?:63:27)
    at async abortable (webpack://@quiet/backend/./node_modules/@libp2p/multistream-select/node_modules/abortable-iterator/dist/src/index.js?:38:26)
    at async decoder (webpack://@quiet/backend/./node_modules/it-length-prefixed/dist/src/decode.js?:37:26)
    at async first (webpack://@quiet/backend/./node_modules/it-first/index.js?:11:20)
    at async eval (webpack://@quiet/backend/./node_modules/@libp2p/multistream-select/dist/src/multistream.js?:75:243)
    at async read (webpack://@quiet/backend/./node_modules/@libp2p/multistream-select/dist/src/multistream.js?:75:17)
    at async Module.readString (webpack://@quiet/backend/./node_modules/@libp2p/multistream-select/dist/src/multistream.js?:85:17)
    at async Module.select (webpack://@quiet/backend/./node_modules/@libp2p/multistream-select/dist/src/select.js?:38:20)
    at async ConnectionImpl.newStream [as _newStream] (webpack://@quiet/backend/./node_modules/libp2p/dist/src/upgrader.js?:336:50) {
  code: 'ERR_UNDER_READ',
  buffer: Uint8ArrayList { bufs: [], length: 0 }
}

node:internal/event_target:1006
  process.nextTick(() => { throw err; });
                           ^

AbortError: The operation was aborted
    at nextAbortHandler (webpack://@quiet/backend/./node_modules/libp2p/node_modules/abortable-iterator/dist/src/index.js?:34:32)
    at EventTarget.abortHandler (webpack://@quiet/backend/./node_modules/libp2p/node_modules/abortable-iterator/dist/src/index.js?:21:17)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:731:20)
    at EventTarget.dispatchEvent (node:internal/event_target:673:26)
    at abortSignal (node:internal/abort_controller:308:10)
    at TimeoutController.abort (node:internal/abort_controller:338:5)
    at TimeoutController.abort (webpack://@quiet/backend/./node_modules/timeout-abort-controller/index.js?:26:18)
    at eval (webpack://@quiet/backend/./node_modules/timeout-abort-controller/index.js?:16:38)
    at Retimer._timerWrapper (webpack://@quiet/backend/./node_modules/retimer/retimer.js?:21:18)
    at listOnTimeout (node:internal/timers:564:17) {
  type: 'aborted',
  code: 'ABORT_ERR'
}
@EmiM EmiM added this to Quiet Nov 10, 2023
@EmiM EmiM converted this from a draft issue Nov 10, 2023
@holmesworcester
Contributor

If we ignore the error, does libp2p continue to work properly?

@EmiM
Contributor Author

EmiM commented Nov 13, 2023

OK, ignoring the error is not recommended. An unhandled exception leaves the process in an undefined state.
https://nodejs.org/api/process.html#warning-using-uncaughtexception-correctly
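The Node.js guidance linked above can be sketched roughly as follows: treat `uncaughtException` as a last chance to report the error, then exit so that something outside the process can restart it. This is a minimal sketch, not the Quiet backend's actual code; `installFatalHandler`, `report`, and `exit` are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper: report a fatal error, then exit instead of continuing
// in an undefined state. `report` and `exit` are injected so the behavior can
// be exercised without terminating the current process.
function installFatalHandler(
  report: (err: Error) => void,
  exit: (code: number) => void = (code) => process.exit(code)
): (err: Error) => void {
  const handler = (err: Error): void => {
    try {
      // Keep this synchronous and small; the process state is no longer trustworthy.
      report(err)
    } finally {
      // Exit with a non-zero code even if reporting itself throws.
      exit(1)
    }
  }
  process.on('uncaughtException', handler)
  return handler
}
```

The non-zero exit code is what a parent process or the app shell would key on to decide whether to restart the backend or show an error modal.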

Probably the only real solution would be forking libp2p and handling those errors ourselves.

Workarounds:

  • Restart the backend without closing the app. iOS already does that, so it is possible.
  • Add (bring back?) an error modal that appears on 'abnormal backend termination', with a button for restarting the application. From a UX perspective this would be better than throwing a JS error in the user's face.
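The first workaround could look something like the sketch below: a small supervisor that relaunches the backend whenever it exits abnormally, and gives up after a fixed number of restarts so that the error modal from the second workaround can take over. All names here (`supervise`, `BackendProcess`, `maxRestarts`, `onGiveUp`) are assumptions made for illustration and are not taken from the Quiet codebase.

```typescript
// Minimal shape of a restartable backend: anything that reports its exit code once.
interface BackendProcess {
  once(event: 'exit', listener: (code: number | null) => void): unknown
}

// Hypothetical supervisor: start the backend and restart it after an abnormal
// exit, up to `maxRestarts` times. A clean exit (code 0) is left alone.
function supervise(
  start: () => BackendProcess,
  maxRestarts = 5,
  onGiveUp: () => void = () => {}
): void {
  let restarts = 0
  const launch = (): void => {
    start().once('exit', (code) => {
      if (code === 0) return // clean shutdown, nothing to do
      if (restarts >= maxRestarts) {
        onGiveUp() // e.g. show the error modal instead of restarting again
        return
      }
      restarts++
      launch()
    })
  }
  launch()
}
```

In a real app, `start` might wrap Node's `child_process.fork` pointed at the backend bundle; the restart cap keeps a backend that crashes on startup from looping forever.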

Moving this task to blocked until we decide what to do.

@EmiM EmiM moved this from In progress to Blocked in Quiet Nov 13, 2023
@holmesworcester
Contributor

Restarting backend seems like a good workaround.

All errors would trigger a restart?

@holmesworcester holmesworcester moved this from Blocked to Next Sprint in Quiet Nov 13, 2023
@holmesworcester
Contributor

Moved out of blocked since it seems like we have a plan.

@EmiM
Contributor Author

EmiM commented Nov 13, 2023

All errors would trigger a restart?

All unhandled errors

@leblowl
Contributor

leblowl commented Nov 13, 2023

Thanks for finding this! Some initial questions come to mind. Do we feel fairly confident in the cause of the error?

Then looking at this part of the error:

Error: stream ended before 1 bytes became available
...
ConnectionImpl.newStream

I'm not entirely sure I understand what is happening, but I think it makes sense that a stream sometimes wouldn't produce data quickly enough because Tor is slow. If a stream isn't receiving data, can we simply recreate the stream and retry until we do receive data? Does libp2p recreate the stream for us?

Other thoughts:

Restarting the backend could be a good quick fix, but it also seems like an expensive thing to do. We might be able to find a fix closer to where the error occurs. I see a few layers of solutions:

  1. prevent the error
  2. recover from the error by restarting a stream or connection
  3. recover from the error by restarting libp2p
  4. recover from the error by restarting backend

Of course there are a lot more options, but that is an example of some layers to present an idea.
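Layer 2 (recover by recreating a stream) could be sketched as a generic retry wrapper like the one below. It assumes the failure reaches the caller as a rejected promise, which the traces in this issue suggest is not always the case, since they are thrown from `process.nextTick`. The error codes come from those traces; `withRetry`, `isRetryable`, and the backoff values are hypothetical.

```typescript
// Error codes copied from the stack traces in this issue.
const RETRYABLE_CODES = new Set(['ERR_UNDER_READ', 'ABORT_ERR'])

// True when the error carries one of the codes we are willing to retry on.
function isRetryable(err: unknown): boolean {
  const code = (err as { code?: unknown } | null)?.code
  return typeof code === 'string' && RETRYABLE_CODES.has(code)
}

// Hypothetical wrapper: call `open` up to `attempts` times, backing off
// exponentially between tries. Non-retryable errors are rethrown immediately.
async function withRetry<T>(
  open: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown
  for (let i = 0; i < attempts; i++) {
    try {
      return await open()
    } catch (err) {
      lastError = err
      if (!isRetryable(err)) throw err
      if (i < attempts - 1) {
        // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
        await new Promise<void>((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i))
      }
    }
  }
  throw lastError
}
```

A caller would pass a function that opens the stream, so that each attempt creates a fresh one. Whether libp2p already retries internally is the open question raised above, and this sketch does not answer it.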

@holmesworcester
Contributor

Our decision: we will restart orbitdb/libp2p when we have an unhandled exception and we will spend 1 day investigating the errors we see to understand them and possibly find a fix.

@EmiM
Contributor Author

EmiM commented Jan 9, 2024

I've been wondering whether debugging it or handling it in a special way makes sense right now:

  • We are using an older version of libp2p, so I feel debugging may be a waste of time.
  • Restarting services may require handling edge cases where something could go wrong.
  • We are in the process of planning an architecture that may lead to serious changes in the backend anyway.
  • The error occurs relatively rarely.

So maybe we just show a pretty error modal to the user (with a stack trace) and ask them to restart the app? @holmesworcester what do you think?

@holmesworcester
Contributor

For me the error occurs very consistently when I leave Quiet running. It takes a while but it always happens eventually.

Could we restart the backend automatically when it happens?

@EmiM
Contributor Author

EmiM commented Jan 9, 2024

If you are talking about the whole backend, then maybe? I don't know whether the frontend already handles the case where the backend is being restarted while the user performs actions at the same time, e.g. tries to send a message or create a channel.

@holmesworcester
Contributor

holmesworcester commented Jan 9, 2024

It happens rarely enough that if we could temporarily freeze the frontend and show some "Restarting backend..." message, that would not be disruptive.
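One way to picture this on the frontend side is a small gate that tracks whether a restart is in progress. The sketch below goes slightly beyond a plain freeze: it also defers user actions (such as sending a message) until the backend is back, which is an assumption on top of what is proposed here. `BackendGate` and its methods are hypothetical names, not existing Quiet code.

```typescript
type Action = () => void

// Hypothetical gate: while the backend is restarting, user actions are queued
// instead of executed, and the UI can read `isRestarting` to show a
// "Restarting backend..." message.
class BackendGate {
  private restarting = false
  private readonly pending: Action[] = []

  get isRestarting(): boolean {
    return this.restarting
  }

  beginRestart(): void {
    this.restarting = true
  }

  // Mark the backend as available again and run the deferred actions in order.
  endRestart(): void {
    this.restarting = false
    for (const action of this.pending.splice(0)) action()
  }

  // Run the action now, or defer it if a restart is in progress.
  run(action: Action): void {
    if (this.restarting) this.pending.push(action)
    else action()
  }
}
```

If a hard freeze is preferred, `run` could simply drop or reject actions while `isRestarting` is true; queuing is just one possible design.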

Fixing the problem is preferable, but there might be other problems that emerge in the future and it would be great if our backend is "self healing" when it encounters an error.

Also, does orbitdb need ipfs to be running to add and remove data? Can we sit around ipfs or libp2p and restart when we catch an error?

What is the cost of attempting to upgrade OrbitDB now? Should we try it, see how hard it is, and choose to restart the backend if it's too hard?

@siepra siepra added bug Something isn't working and removed 2.1.x labels Jan 11, 2024
@holmesworcester holmesworcester moved this from In progress to Next Sprint in Quiet Jan 23, 2024
@holmesworcester
Contributor

Let's revisit this after we upgrade libp2p.

@holmesworcester holmesworcester moved this from Sprint to Backlog - Desktop & Backend in Quiet Feb 23, 2024
@kingalg
Collaborator

kingalg commented Jan 30, 2025

After discussing this with Isla, we will resolve it in a separate issue (#2708), but we will keep this issue, since the information in it will be helpful.

Projects
Status: Backlog - Desktop & Backend
Development

No branches or pull requests

6 participants