Libp2p randomly throws unhandled error #2055
Comments
If we ignore the error, does libp2p continue to work properly?
OK, ignoring the error is not recommended. An unhandled exception leaves the process in an undefined state. Probably the only real solution would be forking libp2p and handling those errors ourselves. Workarounds:
Moving this task to blocked until we decide what to do.
Restarting the backend seems like a good workaround. Would all errors trigger a restart?
Moved out of blocked since it seems like we have a plan.
All unhandled errors.
Thanks for finding this! Some initial questions come to mind. Do we feel fairly confident in the cause of the error? Then looking at this part of the error:
I'm not entirely sure I understand what is happening, but I think it makes sense that a stream sometimes wouldn't produce data quickly enough because Tor is slow. If a stream isn't receiving data, can we simply recreate the stream and retry until we do receive data? Does libp2p recreate the stream for us? Other thoughts: restarting the backend could be a good quick fix, but it also seems like an expensive thing to do. We might be able to find a fix closer to where the error occurs. I see a couple of layers of solutions:
Of course there are a lot more options, but that is an example of some layers to present the idea.
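To make the "recreate the stream and retry" question concrete, here is a minimal TypeScript sketch. It is not libp2p's API: `openAndRead` is a hypothetical callback that opens a fresh stream and resolves with its first chunk, and the timeout and backoff values are placeholders. It also assumes the failure surfaces as a rejected promise at the call site; if the error is thrown from inside libp2p without reaching the caller, a wrapper like this would not catch it.

```typescript
// Hypothetical callback: opens a new stream and resolves with the first chunk of data.
type OpenAndRead<T> = () => Promise<T>

// Exponential backoff, capped so a slow Tor circuit does not push delays out indefinitely.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt)
}

// Rejects if the wrapped promise has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`no data after ${ms} ms`)), ms)
    promise.then(
      (value) => { clearTimeout(timer); resolve(value) },
      (err) => { clearTimeout(timer); reject(err) }
    )
  })
}

// Recreates the stream up to `maxAttempts` times, waiting longer between attempts.
async function retryStream<T>(
  openAndRead: OpenAndRead<T>,
  maxAttempts: number,
  timeoutMs: number
): Promise<T> {
  let lastError: unknown = new Error('maxAttempts must be at least 1')
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await withTimeout(openAndRead(), timeoutMs)
    } catch (err) {
      lastError = err
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt)
        await new Promise<void>((resolve) => setTimeout(resolve, delay))
      }
    }
  }
  throw lastError
}
```

A retry at this layer is cheaper than restarting the backend, but it only covers streams we open ourselves.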
Our decision: we will restart orbitdb/libp2p when we have an unhandled exception, and we will spend 1 day investigating the errors we see to understand them and possibly find a fix.
I was wondering whether debugging it or handling it in a special way makes sense right now:
So maybe just show the user a friendly error modal (with a stack trace) and ask them to restart the app? @holmesworcester what do you think?
For me the error occurs very consistently when I leave Quiet running. It takes a while but it always happens eventually. Could we restart the backend automatically when it happens?
If you are talking about the whole backend, then maybe? I don't know whether the frontend already handles the case where the backend is being restarted while the user performs an action, e.g. tries to send a message or create a channel.
It happens rarely enough that if we could temporarily freeze the frontend and show some "Restarting backend..." message, that would not be disruptive. Fixing the problem is preferable, but there might be other problems that emerge in the future and it would be great if our backend is "self healing" when it encounters an error. Also, does orbitdb need ipfs to be running to add and remove data? Can we sit around ipfs or libp2p and restart when we catch an error? What is the cost of attempting to upgrade OrbitDB now? Should we try it, see how hard it is, and choose to restart the backend if it's too hard?
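The "self healing" idea can be sketched roughly as below, in TypeScript. Everything here is an assumption for illustration and not Quiet's actual code: `restart` stands in for whatever tears down and recreates libp2p/OrbitDB, `notify` stands in for the channel that would drive a "Restarting backend..." message on the frontend, and the restart budget is a placeholder. Note that it deliberately continues after an unhandled error, which carries the undefined-state risk raised earlier in this thread.

```typescript
type RestartDecision = 'start' | 'busy' | 'exhausted'
type BackendStatus = 'restarting' | 'ready' | 'failed'

// Allows one restart at a time and at most `maxRestarts` within `windowMs`,
// so a persistent fault cannot cause an endless restart loop.
class RestartGuard {
  private restarting = false
  private recent: number[] = []
  private readonly maxRestarts: number
  private readonly windowMs: number

  constructor(maxRestarts: number, windowMs: number) {
    this.maxRestarts = maxRestarts
    this.windowMs = windowMs
  }

  tryBegin(now: number): RestartDecision {
    if (this.restarting) return 'busy'
    this.recent = this.recent.filter((t) => now - t < this.windowMs)
    if (this.recent.length >= this.maxRestarts) return 'exhausted'
    this.recent.push(now)
    this.restarting = true
    return 'start'
  }

  end(): void {
    this.restarting = false
  }
}

// Minimal shape of an event source; Node's `process` object fits it.
interface ErrorSource {
  on(event: string, listener: (err: unknown) => void): unknown
}

interface SelfHealingHooks {
  restart: () => Promise<void>                            // hypothetical: recreate libp2p/OrbitDB
  notify: (status: BackendStatus, cause?: unknown) => void // hypothetical: update the frontend
}

function installSelfHealing(source: ErrorSource, guard: RestartGuard, hooks: SelfHealingHooks): void {
  const onError = (err: unknown): void => {
    const decision = guard.tryBegin(Date.now())
    if (decision === 'busy') return // a restart is already in progress
    if (decision === 'exhausted') {
      hooks.notify('failed', err) // too many restarts recently: stop and surface the error
      return
    }
    hooks.notify('restarting', err)
    hooks.restart()
      .then(() => hooks.notify('ready'), (restartErr) => hooks.notify('failed', restartErr))
      .then(() => guard.end())
  }
  source.on('uncaughtException', onError)
  source.on('unhandledRejection', onError)
}

// Usage (in the backend entry point), assuming at most 3 restarts per 10 minutes:
// installSelfHealing(process, new RestartGuard(3, 10 * 60 * 1000), { restart, notify })
```

The frontend could freeze input while the status is `restarting` and fall back to the existing error screen on `failed`.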
Let's revisit this after we upgrade libp2p.
After discussing this with Isla, we will resolve it in an issue (#2708), but we will keep this issue as the information included in it will be helpful.
Libp2p sometimes throws an exception, which the user sees as "abnormal backend termination" (like all other unhandled backend errors).
Our hypothesis is that this is caused by our websocketOverTor transport: Tor changes the behavior of the transport, and libp2p is not prepared for that and does not handle the error properly.
We can probably just ignore this particular error for now. Errors I got: