Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updated mr to include latest feautures with load balanced sockets #3223

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

varunbajpai
Copy link

@varunbajpai varunbajpai commented Jun 7, 2024

This MR is a followup to this issue mentioned here : #2938
Why this MR : So that we can use the latest release with our requirement of load balancing across workers


As previously noted in issues https://github.com/benoitc/gunicorn/issues/2100, https://github.com/benoitc/gunicorn/issues/1984, https://github.com/benoitc/gunicorn/issues/2465, https://github.com/benoitc/gunicorn/issues/2095, https://github.com/benoitc/gunicorn/issues/2192, and https://github.com/benoitc/gunicorn/issues/2168 - Gunicorn isn't correctly load balancing requests across processes, even if you flip on reuse_port.

An aside to really spell out what the problem is:
As @tilgovi noted in https://github.com/benoitc/gunicorn/pull/2101 and https://github.com/benoitc/gunicorn/pull/2102, the root of the problem is that when the workers fork, the socket data structure from the master process is reused for each worker. Fundamentally, this means SO_REUSEPORT doesn't work as designed, since there's really just one socket object being bound to the address.

We tried https://github.com/benoitc/gunicorn/pull/2101 as a fix for this - but soon ran into a variety of bizarre crashes including “socket operation on non-socket” and “Bad file descriptor”. After quite a bit of debugging, we found a use after free in Gunicorn's gevent worker https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/ggevent.py#L41-L45.

In this code, the original objects in self.sockets (before line 45) are eventually garbage collected and thus closed, leading to "Bad file descriptor". If the file descriptor gets reused, it can lead to "socket operation on non-socket". If you call socket.detach as in this patch - the socket is preserved by the process and it doesn't matter if socket gets garbage collected.

I ran some experiments to quantify the effects of this patch - here are some results.
All these processes are running gunicorn --workers=8 -k gevent --reuse-port test:app.
I collate the number of requests handled by each PID:

@benoitc benoitc self-requested a review July 4, 2024 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant