ordered cluster shutdown #200
provide notification to any stateful protocols as we onboard them (CIFS), and figure out how to block new incoming connections on those protocols during the shutdown timeout period (while waiting for users to log off).
A shutdown needs to go through a number of stages:
Step 1 is the biggest hurdle here, since Ceph itself doesn't offer any help. There is client eviction for CephFS, but up until now no cleanup is done (the feature is tracked here: http://tracker.ceph.com/issues/9754). Considering the number of gateway protocols, this will be hard to do cleanly. The other steps will be straightforward after this. So how do we fence off all client gateways? CephFS might have a way soon; for Samba, ganesha, rgw, rbd, and igw I don't know.
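For CephFS specifically, the upstream `ceph tell mds.<name> client ls` and `client evict` admin commands are the eviction mechanism referenced above. A minimal sketch of wiring them together — the helper names, the session-dict shape, the `mds.0` target, and the dry-run behavior are illustrative assumptions, not DeepSea code:

```python
import json
import subprocess


def build_evict_cmds(sessions, mds_name="mds.0"):
    """Turn a parsed `ceph tell <mds> client ls` session list into the
    matching `client evict` commands. `sessions` is assumed to be a
    list of dicts carrying at least an "id" field."""
    return [
        ["ceph", "tell", mds_name, "client", "evict", "id=%d" % s["id"]]
        for s in sessions
    ]


def evict_all(mds_name="mds.0", dry_run=True):
    """Fetch live CephFS sessions and evict them; with dry_run the
    commands are only printed so the admin can inspect them first."""
    out = subprocess.check_output(
        ["ceph", "tell", mds_name, "client", "ls", "--format", "json"])
    cmds = build_evict_cmds(json.loads(out), mds_name)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)
    return cmds
```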
Considering that each service will likely have its own challenges in preventing client connections and terminating clients gracefully, is the first step to return a list of existing connections to the admin? That could serve as a validation check that blocks the remaining steps unless the admin specifically overrides and forces the shutdown. I would think this is in the same spirit as shutting down a host: the admin ultimately decides whether to visit and stop each remote connection or hold the power button.

If the default behavior of this orchestration returns a list of active connections with enough information about their sources, I believe the admin may have other options for disconnecting clients. Otherwise, the shutdown proceeds when there are no active connections.

Eventually, this would evolve into every service preventing/terminating clients gracefully, but would you think this is sufficient until then?
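The gate described above — report existing connections, proceed only when none remain or the admin explicitly forces it — can be sketched in a few lines. The connection-record fields here are hypothetical, since each protocol would report different details:

```python
def check_shutdown(active_connections, force=False):
    """Return (ok_to_proceed, report).

    `active_connections` is a list of dicts with hypothetical keys
    protocol/client/target; a real orchestration would gather these
    per service (CIFS, NFS, iSCSI, ...)."""
    report = [
        "%(protocol)s: %(client)s -> %(target)s" % c
        for c in active_connections
    ]
    # Shutdown may proceed only when nothing is connected,
    # unless the admin overrides with force=True.
    ok = force or not active_connections
    return ok, report
```

An orchestration runner would print `report` and abort unless `ok` is true, mirroring the admin's choice between stopping each remote connection and holding the power button.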
What I have seen in other storage systems is a timeout value. You issue the shutdown command and it then asks how many seconds to wait for connections to close. If they aren't all closed at the timeout, too bad. The shutdown can be interrupted by Ctrl-c.
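That timeout behavior — poll for open connections, give up when the clock runs out, stay interruptible — is easy to sketch. `count_connections` here is a placeholder for whatever per-protocol probe is used:

```python
import time


def wait_for_drain(count_connections, timeout=60.0, interval=5.0):
    """Poll until no client connections remain or `timeout` seconds
    elapse. Returns True if everything disconnected in time.

    A KeyboardInterrupt (Ctrl-C) is deliberately left to propagate,
    so the admin can abort the shutdown mid-wait."""
    deadline = time.monotonic() + timeout
    while True:
        if count_connections() == 0:
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # "If they aren't all closed at the timeout, too bad."
            return False
        time.sleep(min(interval, remaining))
```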
As for fencing, you can use the firewall in a worst case. Basically at time of command issue, create a rule set that only allows active connections. Recheck the active connections every 5 seconds and remove allowed hosts that are no longer connected.
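The firewall fence amounts to generated rules: accept the hosts connected at the time the command was issued, drop everyone else, and regenerate every few seconds as hosts drain off the allow list. A sketch that only builds the iptables argument vectors — the port handling and rule layout are illustrative, and a real implementation would also need to flush and replace its own chain on each recheck:

```python
def fence_rules(allowed_hosts, port):
    """Build iptables commands admitting only `allowed_hosts` on
    `port` and dropping all other traffic to that port."""
    rules = [
        ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", str(port),
         "-s", host, "-j", "ACCEPT"]
        for host in sorted(allowed_hosts)
    ]
    # Final catch-all: anything not explicitly allowed is dropped.
    rules.append(
        ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", str(port),
         "-j", "DROP"])
    return rules
```

Rechecking every 5 seconds then reduces to calling `fence_rules` again with the shrinking set of still-connected hosts.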
Is there any progress on this feature request for DeepSea? FYI: customers that have a UPS and want to shut down the cluster in a controlled way would really like a DeepSea command for doing that. In the meantime this process is also documented here: Could we get a command added that does this using DeepSea?
Add an orderly shutdown of all Ceph and gateway services.