Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compose hangs at [uWSGI] getting INI configuration from /opt/mailman-web/uwsgi.ini #730

Open
BearGFR opened this issue Nov 1, 2024 · 8 comments

Comments

@BearGFR
Copy link

BearGFR commented Nov 1, 2024

Attempting to bring up containerized mailman/mailman-web following the instructions at https://asynchronous.in/docker-mailman/

Everything seems to progress more or less normally, then everything stops/hangs after the message:

             [uWSGI] getting INI configuration from /opt/mailman-web/uwsgi.ini

No matter how long I let it sit, it never gets past that.

@maxking
Copy link
Owner

maxking commented Nov 2, 2024

The next step is the container waiting for the database to be up and running, is your database container and up and running fine?

@BearGFR
Copy link
Author

BearGFR commented Nov 2, 2024

Yes, they are. I can see that the database and tables have been initialized and loaded.
Here are some snippets from the container log.

`Nov 02 08:25:05 RHEL01.(mydomain) (mycontainer)[479382]: MySQL is up - continuing
Nov 02 08:25:05 RHEL01.(mydomain) (mycontainer)[479382]: Creating log file for mailman web
Nov 02 08:25:05 RHEL01.(mydomain) (mycontainer)[479382]: Creating log file for uwsgi..
Nov 02 08:25:05 RHEL01.(mydomain) (mycontainer)[479382]: settings_local.py not found, it is highly recommended that you provide one
Nov 02 08:25:05 RHEL01.(mydomain) (mycontainer)[479382]: Using default configuration to run.
Nov 02 08:25:06 RHEL01.(mydomain) (mycontainer)[479382]: Compiling locale files in /usr/lib/python3.8/site-packages
Nov 02 08:25:07 RHEL01.(mydomain) (mycontainer)[479382]: processing file django.po in /usr/lib/python3.8/site-packages/django/contrib/flatpages/locale/udm/LC_MESSAGES
Nov 02 08:25:07 RHEL01.(mydomain) (mycontainer)[479382]: processing file django.po in /usr/lib/python3.8/site-packages/django/contrib/flatpages/locale/pt_BR/LC_MESSAGES
Nov 02 08:25:07 RHEL01.(mydomain) (mycontainer)[479382]: processing file django.po in /usr/lib/python3.8/site-packages/django/contrib/flatpages/locale/cy/LC_MESSAGES
Nov 02 08:25:07 RHEL01.(mydomain) (mycontainer)[479382]: processing file django.po in /usr/lib/python3.8/site-packages/django/contrib/flatpages/locale/zh_Hans/LC_MESSAGES
.
.
.
.
Nov 02 08:25:12 RHEL01.(mydomain) (mycontainer)[479382]: /opt/mailman-web
Nov 02 08:25:14 RHEL01.(mydomain) (mycontainer)[479382]: Compressing... done
Nov 02 08:25:14 RHEL01.(mydomain) (mycontainer)[479382]: Compressed 2 block(s) from 126 template(s) for 0 context(s).
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Operations to perform:
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Apply all migrations: account, admin, auth, contenttypes, django_mailman3, django_q, hyperkitty, openid, postorius, sessions, sites, socialaccount
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Running migrations:
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying contenttypes.0001_initial... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying auth.0001_initial... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying account.0001_initial... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying account.0002_email_max_length... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying admin.0001_initial... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying admin.0002_logentry_remove_auto_add... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying admin.0003_logentry_add_action_flag_choices... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying contenttypes.0002_remove_content_type_name... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying auth.0002_alter_permission_name_max_length... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying auth.0003_alter_user_email_max_length... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying auth.0004_alter_user_username_opts... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying auth.0005_alter_user_last_login_null... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying auth.0006_require_contenttypes_0002... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying auth.0007_alter_validators_add_error_messages... OK
.
.
.

Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying postorius.0011_auto_20191109_1219... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying postorius.0012_auto_20200420_2136... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying postorius.0013_auto_20201116_0058... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying sessions.0001_initial... OK
Nov 02 08:25:54 RHEL01.(mydomain) (mycontainer)[479382]: Applying sites.0002_alter_domain_unique... OK
Nov 02 08:25:55 RHEL01.(mydomain) (mycontainer)[479382]: Creating admin user admin ...
Nov 02 08:25:56 RHEL01.(mydomain) (mycontainer)[479382]: Superuser created successfully.
Nov 02 08:25:56 RHEL01.(mydomain) (mycontainer)[479382]: Setting mmweb.(mydomain) as the default domain ...
Nov 02 08:25:57 RHEL01.(mydomain) (mycontainer)[479382]: [uWSGI] getting INI configuration from /opt/mailman-web/uwsgi.ini
`

...and that's where everything hangs. There are no more messages after that.

@BearGFR
Copy link
Author

BearGFR commented Nov 2, 2024

Perusing the logs after I persisted them, I found a message in uwsgi-error.log:

       MySQLdb._exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')

I don't understand how that could be because the other components of mailmanweb that ran ahead of this were clearly able to access and update the database?

@BearGFR
Copy link
Author

BearGFR commented Nov 4, 2024

No ideas I guess?

@maxking
Copy link
Owner

maxking commented Nov 8, 2024

I don't have any great ideas why something like this would happen. Can you try it out with some other versions of mysql to see if this is an isolated issue?

@paulej
Copy link

paulej commented Jan 2, 2025

I was running 0.4 on Fedora 40. I upgraded to Fedora 41 and, not realizing it, the CPU was being pegged at nearly 100% for a while by the process uwsgi. I went searching for a reason, but I could not find one.

If I do a docker compose up, I get this as the last line of output:

mailman-web   | [uWSGI] getting INI configuration from /opt/mailman-web/uwsgi.ini

Mailman does appear to be working, but I wonder if the process is getting stuck at this point and thrashing trying to do something, while other processes are doing what they should do? I'm using MariaDB, but I did not see any errors in the logs about database access timeouts or failures. And, it's definitely able to access the database.

I tried moving to 0.5.2 hoping that might resolve it, but I get the same result.

I'm not sure if this is related. @BearGFR, were you also seeing high CPU usage from uwsgi?

EDIT: I should note that my database is outside of docker natively on the host.

EDIT2: I ran "strace" to see what system calls the stuck process was running. It spewed these lines faster than I think the display could render:

close(269438681)                        = -1 EBADF (Bad file descriptor)

I am not sure what it's doing, but definitely one of those uwsgi processes is doing something naughty.

I attached to the offending process and I could see the call stack. Best I can tell, it might be this loop that is closing sockets: https://github.com/unbit/uwsgi/blob/master/core/utils.c#L2542. If that uwsgi.max_fd value is set to some crazy large number, that might explain what I see. (I've not investigated further to see, yet.)

EDIT 3: Indeed, that line of code is the issue. I'm not sure where "max_fd" is set, but I build the code from the current source repository and printed it out the logging function in the code. The value produced was 1073741816. So, it's in that loop trying to close a ton of file descriptors. Next... to find where that is being set.

EDIT 4: It appears to be set in uwsgi_start(). There is also a log corresponding to it:

uwsgi-error.log:detected max file descriptor number: 1073741816

Specifically, this line of code: https://github.com/unbit/uwsgi/blob/master/core/uwsgi.c#L2776

I added this to the docker-compose file for the web service:

    ulimits:                                                                                                            
      nofile:                                                                                                           
        soft: 1024                                                                                                      
        hard: 2048

This seems to have tamed it. I know it's still trying to close maybe 1024 or 2048 sockets, but it doesn't seem to do it constantly. I did not investigate further, but this at least keeps the CPU from pegging at 100%.

@BearGFR
Copy link
Author

BearGFR commented Jan 3, 2025 via email

@paulej
Copy link

paulej commented Jan 3, 2025

ACK. Sounds like the issue I uncovered and yours may not be the same. It still might have been, as high CPU usage can cause weird things to happen. But, I felt it was useful to document what I found since I figured others were quite likely to encounter the same thing. In my case, there's nothing wrong with the Docker config kindly provided by @maxking. Rather, it has something to do with the combination of Docker on Fedora 41 + that uwbgsi process is handling sockets. I did open an issue at that the uwbgsi repository, as I think they ought not attempt to close 1,073,741,816 sockets -- even if the OS tells them they can :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants