🐛 Bot stops working for seemingly no reason (socket connection memory leaks?) #384

Closed
MarcusOtter opened this issue Jan 31, 2023 · 6 comments
Labels: confirmed bug 🦋 (Something isn't working), help wanted 🙏 (Extra attention is needed)

@MarcusOtter
Owner

MarcusOtter commented Jan 31, 2023

Describe the bug

Every so often (anywhere from 1 day to 1 month apart) the bot randomly stops working. It runs under pm2 (https://github.com/Unitech/pm2), so it's not the entire process that crashes (that would automatically restart the bot) - the bot just isn't connected to Discord anymore and goes offline. The process is still running and there is nothing in the logs, which could be because we're not listening to the right events (see #299).

My theory is that pm2 is not sending the proper termination signals when restarting the bot, which might leave some socket connections open and eventually exhaust all the available connections, at which point we can no longer connect. When I was setting up pm2, I could never get the "Destroyed client" message that we log on shutdown to appear. I tried to fix it with 4c28360, but I don't think it worked: the pm2 logs still don't show the "Destroyed client" message that I see when running the bot directly in node, for example.

This theory is further supported by the fact that the failure takes a varying amount of time to appear. The last time it happened, I just ran pm2 restart needle, and the bot stopped working again after only 24h (i.e., perhaps restarting with pm2 does not clear socket connections like it should?). When I manually ran pm2 stop needle followed by pm2 start needle instead of a restart, it worked fine again for 14-31 days.

Steps to reproduce the bug

  1. Run the bot at scale for ~14-31 days
  2. Bot stops working
  3. Restart bot with pm2 restart needle
  4. Bot stops working after 1 day again (could also be coincidence - but indicates that it might have to do with socket connections being busy?)
  5. Stop needle with pm2 stop needle
  6. Start needle with pm2 start needle
  7. Repeat from step 1

Expected behavior

No downtime - we shouldn't have to manually stop and start the bot every so often.

MarcusOtter added the confirmed bug 🦋 and help wanted 🙏 labels on Jan 31, 2023
@MarcusOtter
Owner Author

Would love some help in debugging this and figuring out how we can properly disconnect the old instances of needle when pm2 restarts.

A first step toward figuring out whether this is happening could be to implement #299.
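As a rough illustration of what "listening to the right events" could look like, here is a minimal sketch that logs gateway shard lifecycle events. It assumes #299 is about this kind of logging, which the thread does not spell out. The event names (`shardReady`, `shardDisconnect`, `shardReconnecting`, `shardResume`, `shardError`) are the ones discord.js emits on its client; `ShardEventSource`, `attachShardLogging`, and `formatArg` are hypothetical names made up for this example, and the function only relies on an `on(event, listener)` method so it can be tried without a real Discord connection.

```typescript
// Hypothetical minimal shape: anything with an `on(event, listener)` method,
// which is what the discord.js client is assumed to expose here.
interface ShardEventSource {
  on(event: string, listener: (...args: unknown[]) => void): unknown;
}

// Shard lifecycle events emitted by the discord.js client.
const SHARD_EVENTS = [
  "shardReady",
  "shardDisconnect",
  "shardReconnecting",
  "shardResume",
  "shardError",
] as const;

// Turns one event argument into a short, log-friendly string.
function formatArg(value: unknown): string {
  if (value instanceof Error) return `${value.name}: ${value.message}`;
  if (typeof value === "object" && value !== null) {
    // Close events carry a numeric `code`; print it when present.
    const code = (value as { code?: unknown }).code;
    return code !== undefined ? `code=${String(code)}` : "[object]";
  }
  return String(value);
}

// Registers one logging listener per shard event.
function attachShardLogging(
  client: ShardEventSource,
  log: (line: string) => void = console.log,
): void {
  for (const event of SHARD_EVENTS) {
    client.on(event, (...args: unknown[]) => {
      const details = args.map(formatArg).join(" ");
      log(`[${new Date().toISOString()}] ${event} ${details}`);
    });
  }
}
```

With something like this attached at startup, a shard that silently drops off should at least leave a `shardDisconnect` or `shardError` line in the pm2 logs, including the shard id.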

MarcusOtter pinned this issue on Jan 31, 2023
@MarcusOtter
Owner Author

@nchristopher mentioned that we should listen to SIGKILL too, not just SIGINT like we do now. I believe I briefly tried this before (along with a bunch of other termination signals), but I couldn't get anything to print out 🤔

That could be because my development machine runs Windows, where pm2 uses a different mechanism with its own flags (https://pm2.keymetrics.io/docs/usage/signals-clean-restart/#windows-graceful-stop)
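One detail worth noting here: Node.js does not allow a listener on SIGKILL at all, so only signals such as SIGINT and SIGTERM can trigger cleanup. The pm2 page linked above also describes a `shutdown` message that is sent instead of a signal on Windows when `shutdown_with_message` is enabled. The sketch below combines both paths. `ProcessLike` and `installShutdownHandlers` are hypothetical names for this example, and the `cleanup` callback stands in for whatever the bot does on shutdown (presumably destroying the client and logging "Destroyed client"); none of this is taken from the needle code base.

```typescript
// Hypothetical minimal shape of the parts of Node's `process` object used
// here, so the handler can be exercised with a fake object.
interface ProcessLike {
  on(event: string, listener: (...args: unknown[]) => void): unknown;
  exit(code?: number): void;
}

// Signals a Node.js process can react to. SIGKILL is deliberately absent:
// it terminates the process unconditionally and cannot have a listener.
const CATCHABLE_SIGNALS = ["SIGINT", "SIGTERM"] as const;

function installShutdownHandlers(
  proc: ProcessLike,
  cleanup: () => Promise<void> | void,
  log: (message: string) => void = console.log,
): void {
  let shuttingDown = false;

  const shutdown = async (reason: string): Promise<void> => {
    if (shuttingDown) return; // ignore repeated signals during cleanup
    shuttingDown = true;
    log(`Received ${reason}, shutting down`);
    let exitCode = 0;
    try {
      await cleanup(); // assumption: e.g. destroy the Discord client here
    } catch (error) {
      exitCode = 1;
      log(`Cleanup failed: ${String(error)}`);
    } finally {
      proc.exit(exitCode);
    }
  };

  for (const signal of CATCHABLE_SIGNALS) {
    proc.on(signal, () => void shutdown(signal));
  }

  // Windows path: pm2 sends the string "shutdown" as a process message
  // when shutdown_with_message is enabled.
  proc.on("message", (message) => {
    if (message === "shutdown") void shutdown("pm2 shutdown message");
  });
}

// Intended usage (not executed here):
//   installShutdownHandlers(process, async () => { /* destroy client */ });
```

Keep in mind that pm2 only waits a limited time (its `kill_timeout` setting) after the first signal before it force-kills the process, so a slow cleanup may still be cut short and never reach its final log line.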

@MarcusOtter
Owner Author

MarcusOtter commented Mar 4, 2023

Could it also be rate limit related? It seems to be happening more frequently now. On the other hand, it can happen on specific shards while other shards are fine, which seems to indicate that it's not related to rate limits.

@MarcusOtter
Owner Author

Another idea I had was that Discord may just be fed up with the number of API errors we are getting, probably around 1 every 5 seconds - most are related to #308

@MarcusOtter
Owner Author

It should definitely not be rate limits, because I seem to be creating fewer than 1 thread every 5 seconds (seems very low for the server count!). I just removed some of the Discord errors, so we'll see if that improves uptime. I also have more logs now that will hopefully reveal more information when Needle crashes.

@MarcusOtter
Owner Author

This has not happened again since https://github.com/MarcusOtter/discord-needle/releases/tag/v3.3.0, which makes 2 months of "uptime". The bot still crashes every week or so, but fully (which means pm2 restarts it automatically and all is well). I don't know exactly what changed, but I think Discord.js changed something in their socket implementation, so maybe that could be it. Either way, I will close this issue and re-open it if it happens again.

(Can't reproduce anymore)

MarcusOtter closed this as not planned (won't fix, can't repro, duplicate, stale) on May 23, 2023
MarcusOtter unpinned this issue on May 23, 2023