Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated cherry pick of #5955: Persist TLS certificate and key of antrea-controller (#5955) #6004: Fix race condition in pkg/apiserver/certificate unit tests #6068

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Mar 6, 2024

Cherry pick of #5955 #6004 on release-1.15.

#5955: Persist TLS certificate and key of antrea-controller (#5955)
#6004: Fix race condition in pkg/apiserver/certificate unit tests

For details on the cherry pick process, see the cherry pick requests page.

tnqn and others added 2 commits March 6, 2024 11:05
In clusters where upgrade is performed with rolling update of Nodes and
images of new versions are only available on new Nodes, the deployment
strategy of antrea-controller is set to RollingUpdate to prevent
antrea-controller Pod from being deleted immediately when the deployment
is updated, leading to a period in which no antrea-controller is
running. However, it also causes two instances of antrea-controller to
run simultaneously in a short time, making it possible that the old
instance overrides the CA bundle stored in antrea-ca ConfigMap,
APIServices, and Webhooks, while the new instance won't update them
again.

The commit makes two changes to fix the problem:

1. CACertController will periodically sync the CA cert to improve the
   robustness.

2. Self-signed TLS certificate and key of antrea-controller will be
   stored in a Secret and will be reused after restarting controller.
   This makes running multiple antrea-controller instances
   simultaneously possible and makes restart of antrea-controller
   smoother as antrea-agents don't need to retrieve a new CA bundle most
   of the time.

Besides, the change is helpful for implementing high-availability of
antrea-controller in the future.

Signed-off-by: Quan Tian <[email protected]>
…#6004)

There was some "interference" between TestSelfSignedCertProviderRotate
and TestSelfSignedCertProviderRun. The root cause is that the
certutil.GenerateSelfSignedCertKey does not support a custom clock
implementation and always calls time.Now() to determine the current
time. It then adds a year to the current time to set the expiration time
of the certificate. This means that when rotateSelfSignedCertificate()
is called as part of TestSelfSignedCertProviderRotate, the new
certificate is already expired, and rotateSelfSignedCertificate() will
be called immediately a second time. By this time however,
TestSelfSignedCertProviderRotate has already exited, and we are already
running the next test, TestSelfSignedCertProviderRun. This creates a
race condition because the next test will overwrite
generateSelfSignedCertKey with a mock version, right as it is called by
the second call to rotateSelfSignedCertificate() from the previous
test's provider.

To avoid this race condition, we make generateSelfSignedCertKey a member
of selfSignedCertProvider.

Fixes antrea-io#5977

Signed-off-by: Antonin Bas <[email protected]>
Co-authored-by: Quan Tian <[email protected]>
@tnqn tnqn added the kind/cherry-pick Categorizes issue or PR as related to the cherry-pick of a bug fix from the main branch to a release label Mar 6, 2024
@tnqn
Copy link
Member Author

tnqn commented Mar 6, 2024

Backporting this as it's required by users who have set antrea-controller's deployment strategy to RollingUpdate. The risk discussed in #5955 (comment) doesn't exist as the next release 2.0 will have the patch.

cc @antoninbas @edwardbadboy

@tnqn
Copy link
Member Author

tnqn commented Mar 7, 2024

/skip-all

@tnqn tnqn merged commit da00d00 into antrea-io:release-1.15 Mar 7, 2024
49 of 53 checks passed
@tnqn tnqn deleted the automated-cherry-pick-of-#5955-#6004-upstream-release-1.15 branch March 7, 2024 02:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/cherry-pick Categorizes issue or PR as related to the cherry-pick of a bug fix from the main branch to a release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants