Improve NooBaa pod recovery in the case of a node failure: find the NooBaa pods running on the failing node and force-delete them to speed up rescheduling on a healthy node.
- When a node is in the `NotReady` state, it takes 5 minutes for Deployment pods (Kubernetes Deployment) to fail over to a different node.
- StatefulSet pods, such as the noobaa-core and noobaa-db pods, are not restarted automatically until the old pod is explicitly force-deleted.
- For pods that are connected to a PV, such as noobaa-db, after the pod is force-deleted it takes additional time (~8 minutes) for the PV to be detached from the old pod so that the new pod can attach to the PV.
To make pods fail over to a new node faster, noobaa-operator watches the cluster node states. When a node transitions from `Ready` to `NotReady` status, the HA Controller looks for NooBaa pods on that node and force-deletes them. Once deleted, the pods restart on a new `Ready` node.
                                +-----------------+
                                |  HA Controller  |
                                +--------+--------+
                                         |
                                         |
                                         |
    +--------+     +---------------------+------------------------+
    |  ETCD  +-----+                  API Server                  |
    +--------+     +----+----------------+-----------------+------+
                        |                |                 |
                        |                |                 |
                        |                |                 |
                   +----+-----+     +----+-----+      +----+-----+
                   |   Node   |     |   Node   |      |   Node   |
                   +----------+     +----------+      +----------+
The High Availability (HA) controller is a controller that uses Kubernetes Nodes as the source of its events.
- Communication between the K8S API server and the kubelet running on a worker node is severed.
- The API server marks the worker node state as `NotReady`.
- The HA Controller (HAC), which watches the cluster node states, detects the worker node state transition, and reconciliation is initiated.
- The HAC Reconciler lists the NooBaa pods on the failing node and requests the API server to delete those pods. The new pod state is committed to ETCD.
- The pod controller (Deployment, StatefulSet, etc.) reacts to the pod deletion and reschedules the pod on a healthy node.
A node is ready if its status contains a `NodeReady` node condition with status `True`. A worker node becomes not ready when the connection between the worker and the master node is broken, the node reboots, or any other communication error occurs between the K8S API Server and the kubelet process.
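The readiness check described above can be sketched as follows. This is a minimal illustration, not the operator's actual code: `Node` and `NodeCondition` are simplified stand-ins for the Kubernetes core/v1 types, and `isNodeReady` is a hypothetical helper name. It assumes the ready condition has type `"Ready"` and that only a status of `"True"` counts as ready.

```go
package main

// NodeCondition and Node are simplified stand-ins (assumptions) for the
// Kubernetes core/v1 types; only the fields needed here are modeled.
type NodeCondition struct {
	Type   string // e.g. "Ready", "MemoryPressure"
	Status string // "True", "False" or "Unknown"
}

type Node struct {
	Name       string
	Conditions []NodeCondition
}

// isNodeReady (hypothetical helper) reports whether the node carries a
// "Ready" condition whose status is "True". A missing condition, or a
// status of "False" or "Unknown", is treated as not ready.
func isNodeReady(n Node) bool {
	for _, c := range n.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}
```

Treating a missing or `"Unknown"` condition as not ready is a conservative choice in this sketch: a kubelet that stops reporting is handled the same way as one that reports a failure.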
Predicates allow controllers to filter events before they are provided to EventHandlers. There are several kinds of events: CreateEvent, UpdateEvent, DeleteEvent and GenericEvent. An update event where the old Node state is `Ready` and the current state is `NotReady` indicates that a node went down. All other events are filtered out.
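The filtering rule can be illustrated with a small sketch. It does not use the real controller-runtime predicate types: `EventKind`, `NodeEvent` and `nodeWentDown` are hypothetical names, and the event is assumed to already carry the node's readiness before and after the update.

```go
package main

// EventKind enumerates the event kinds named in the text.
type EventKind int

const (
	CreateEvent EventKind = iota
	UpdateEvent
	DeleteEvent
	GenericEvent
)

// NodeEvent is a hypothetical, flattened view of a node event. For update
// events it records whether the node was ready before and after the update.
type NodeEvent struct {
	Kind     EventKind
	OldReady bool
	NewReady bool
}

// nodeWentDown returns true only for an update event in which the node
// moved from Ready to NotReady. Every other event is filtered out.
func nodeWentDown(e NodeEvent) bool {
	return e.Kind == UpdateEvent && e.OldReady && !e.NewReady
}
```

With this rule, a node that is already `NotReady` and receives further updates does not trigger reconciliation again, and neither does a node that recovers.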
`Reconcile()` is called when a node in the cluster transitions from the `Ready` to the `NotReady` state. The High Availability (HA) controller lists all NooBaa pods on the failing node, filtering by pod label, namespace, and node name:
- Pod is labeled with `app=noobaa`
- Pod runs in the watched namespace
- Pod runs on the failed node
- Pod runs in the watched namespace
- Pod runs on the failed node
All the pods matching the above are force-deleted to allow fast rescheduling on a healthy node.
For the noobaa-db pod, which is attached to a PV, it may take more time until the new pod can attach to the PV. Add noobaa-db PV handling (?): detach the db PV from the failing node.