The operator will wait forever for kube-apiserver to update in case the domain or apicert in the CR is replaced #100

eranco74 · 2023-09-12T15:45:25Z

I updated the domain and apiCert of an existing clusterrelocation CR.

The operator keeps logging this:

2023-09-12T15:17:00Z	INFO	controllers/clusterrelocation_controller.go:398	Waiting for kube-apiserver to update	{"controller": "clusterrelocation", "controllerGroup": "rhsyseng.github.io", "controllerKind": "ClusterRelocation", "ClusterRelocation": {"name":"cluster"}, "namespace": "", "name": "cluster", "reconcileID": "daa97d73-2824-4080-8eea-c8431e4c665a"}
2023-09-12T15:17:10Z	INFO	controllers/clusterrelocation_controller.go:398	Waiting for kube-apiserver to update	{"controller": "clusterrelocation", "controllerGroup": "rhsyseng.github.io", "controllerKind": "ClusterRelocation", "ClusterRelocation": {"name":"cluster"}, "namespace": "", "name": "cluster", "reconcileID": "daa97d73-2824-4080-8eea-c8431e4c665a"}

It seems that the operator will not move to Progressing because there is no need to update the kube-apiserver deployment (the secret name stays the same),
so it just hangs here:

cluster-relocation-operator/internal/util/util.go

Line 17 in e01ce59

    
           func WaitForCO(ctx context.Context, c client.Client, logger logr.Logger, operator string) error {

Expected Behavior

I expected the update to work. I guess a better check here would be good enough; after that we can just wait for the operator status to be Available. Why do we need to wait for Progressing?

Current Behavior

The operator is stuck waiting for the apiserver to update (move to progressing=true) although it's already updated.

Possible Solution

Steps to Reproduce (for bugs)

Context

Regression

Your Environment

Version used (cluster-relocation-operator):
Environment name and version (e.g. OCP v1.12.20):
Server type and version:
Operating System and version (uname -a):
Link to your deployment file:


loganmc10 · 2023-09-12T19:00:35Z

I'm not sure how it could get stuck, this is the code for that function:

func WaitForCO(ctx context.Context, c client.Client, logger logr.Logger, operator string) error {
	logger.Info(fmt.Sprintf("Waiting for %s Progressing to be %s", operator, configv1.ConditionFalse))
	if err := waitStatus(ctx, c, logger, operator, configv1.OperatorProgressing, configv1.ConditionFalse); err != nil {
		return err
	}

	logger.Info(fmt.Sprintf("Waiting for %s Available to be %s", operator, configv1.ConditionTrue))
	if err := waitStatus(ctx, c, logger, operator, configv1.OperatorAvailable, configv1.ConditionTrue); err != nil {
		return err
	}
	return nil
}

It waits for OperatorProgressing to be False and for OperatorAvailable to be True. It doesn't wait for Progressing to become True, so if the operator is good, it should return right away. Are you sure that the kube-apiserver operator was reporting the desired status?
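To illustrate why an already-settled operator should pass this check immediately, here is a minimal sketch of the combined condition. It is not the operator's code: the `condition` type and the `settled` helper are hypothetical stand-ins so the example runs without the OpenShift API packages, and the string values are assumed to match `configv1.OperatorProgressing`, `configv1.OperatorAvailable`, `configv1.ConditionTrue` and `configv1.ConditionFalse`.

```go
package main

import "fmt"

// condition is a hypothetical stand-in that mirrors only the two fields
// this sketch needs from a ClusterOperator status condition.
type condition struct {
	Type   string
	Status string
}

// settled reports whether the conditions describe an operator that is
// done rolling out: Progressing=False and Available=True.
// A missing condition counts as "not settled".
func settled(conds []condition) bool {
	progressing, available := "", ""
	for _, c := range conds {
		switch c.Type {
		case "Progressing":
			progressing = c.Status
		case "Available":
			available = c.Status
		}
	}
	return progressing == "False" && available == "True"
}

func main() {
	// An operator that never had to roll out anything is already settled.
	idle := []condition{{"Progressing", "False"}, {"Available", "True"}}
	// An operator in the middle of a rollout is not.
	rolling := []condition{{"Progressing", "True"}, {"Available", "True"}}

	fmt.Println(settled(idle))    // true
	fmt.Println(settled(rolling)) // false
}
```

Nothing in this check requires Progressing to have been True at some earlier point, which is why a hang is unlikely to originate in `WaitForCO` itself.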

loganmc10 · 2023-09-12T19:06:54Z

I think it is actually getting stuck here:

	for _, v := range urls {
		updated := false
		for {
			conn, err := tls.Dial("tcp", v["url"], &tls.Config{InsecureSkipVerify: true})
			if err != nil {
				return err
			}
			certs := conn.ConnectionState().PeerCertificates
			conn.Close()
			for _, cert := range certs {
				if cert.Subject.CommonName == v["commonName"] {
					updated = true
				}
			}
			if updated {
				// ensure that ClusterOperator has settled
				if err := util.WaitForCO(ctx, r.Client, logger, v["type"]); err != nil {
					return err
				}
				break
			} else {
				logger.Info(fmt.Sprintf("Waiting for %s to update", v["type"]))
				time.Sleep(time.Second * 10)
			}
		}
	}

It is waiting for a certificate with a commonName of api.newDomain; whatever certificate you're using doesn't have a commonName that matches that name. The default API cert that comes with the cluster has this
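One possible way to make the loop above tolerate certificates that carry the hostname only in their Subject Alternative Names is sketched below. This is an assumption about a potential fix, not what the operator does: `certMatchesHost` and `newTestCert` are hypothetical helpers, and the hostnames are made up. The sketch keeps the existing commonName comparison and falls back to the standard library's `(*x509.Certificate).VerifyHostname`, which matches against SANs (including wildcards).

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// certMatchesHost (hypothetical helper) reports whether cert was issued for
// host, either through its Subject commonName or through one of its SANs.
func certMatchesHost(cert *x509.Certificate, host string) bool {
	if cert.Subject.CommonName == host {
		return true
	}
	// VerifyHostname checks host against the certificate's SAN entries.
	return cert.VerifyHostname(host) == nil
}

// newTestCert (hypothetical helper) builds a throwaway self-signed
// certificate so the sketch can run without a cluster.
func newTestCert(commonName string, dnsNames ...string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: commonName},
		DNSNames:     dnsNames,
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	// A certificate with an empty commonName and the API hostname as a SAN.
	cert, err := newTestCert("", "api.new.example.com")
	if err != nil {
		panic(err)
	}
	host := "api.new.example.com"
	fmt.Println(cert.Subject.CommonName == host) // false: the commonName-only check never succeeds
	fmt.Println(certMatchesHost(cert, host))     // true: the SAN entry matches
}
```

In the loop quoted above, the inner comparison would become `certMatchesHost(cert, v["commonName"])`, assuming `v["commonName"]` holds the expected hostname.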
