I've run into two nodes in my swarm ending up in a state where the containers running on them were labelled with status 'dead', which, based on my JupyterHub configuration, means they should have been removed. However, removal of these containers failed with a 'resource busy' error (similar to moby/moby#31195).
Under this condition, when the user attempts to reconnect, the spawner finds the old container and tries to relaunch it instead of launching a new container under a new name. Docker will not allow this because the container is flagged for removal, so the end user gets a 500 error and can't interact with the hub interface at all.
Would it be possible to add some checks for this, so that either the errors make more sense or, better, a new container gets launched under a different name (e.g. with an appended `_`) so the user can continue to work while the backend issue is resolved?
edit: This was resolved for the impacted users on the backend by SSHing into the swarm nodes and issuing `docker rm -f` for each dead container. After that, users could get new containers created again. If the remove step in dockerspawner did a forced removal, that might resolve the issue on its own.
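For reference, here's a minimal sketch of what that forced removal could look like, using docker-py's low-level `APIClient` (this is not DockerSpawner's actual code, and the container name is just an example):

```python
# Hedged sketch, not DockerSpawner's actual code: force-remove a "dead"
# container with docker-py's low-level APIClient, the equivalent of
# `docker rm -f`. The container name "jupyter-someuser" is illustrative.
import docker
from docker.errors import APIError, NotFound

client = docker.APIClient(base_url="unix://var/run/docker.sock")

def force_remove(container_name):
    """Remove a container even if it is dead or busy, like `docker rm -f`."""
    try:
        client.remove_container(container_name, v=True, force=True)
    except NotFound:
        pass  # already gone, nothing to do
    except APIError as e:
        # surface the Docker error instead of letting it bubble up as a bare 500
        raise RuntimeError(f"could not remove {container_name}: {e.explanation}")

force_remove("jupyter-someuser")
```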
Adding some extra error handling would indeed be great. If you have a traceback, we can make the changes in the right place. A PR would be very welcome!
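As a hedged illustration of the kind of extra error handling meant here (a sketch, not an actual patch): when starting an existing container fails because it is dead or flagged for removal, the spawner could log the underlying Docker error, force-remove the container, and signal that a fresh one should be created instead of surfacing a 500.

```python
# Hedged illustration of the extra error handling being discussed, not an
# actual DockerSpawner patch: if starting an existing container fails because
# Docker has it flagged for removal (or it is dead), force-remove it and tell
# the caller to spawn a brand-new container instead of returning a 500.
import docker
from docker.errors import APIError

client = docker.APIClient(base_url="unix://var/run/docker.sock")

def start_or_clean_up(container_name):
    """Try to start an existing container; on failure, clean it up.

    Returns the container name if it started, or None if the caller should
    create a fresh container.
    """
    try:
        client.start(container_name)
        return container_name
    except APIError as e:
        # A conflict/server error here typically means the container is dead
        # or already scheduled for removal; include the Docker message so the
        # traceback points at the real cause.
        print(f"Could not start {container_name}: {e.explanation}; removing it")
        client.remove_container(container_name, v=True, force=True)
        return None
```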