Skip to content
This repository has been archived by the owner on Oct 22, 2021. It is now read-only.

feat: QuarksJob to rotate all secrets #1346

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

mook-as
Copy link
Contributor

@mook-as mook-as commented Sep 17, 2020

Description

This adds a job to trigger secret rotation for all generated secrets.

This is useful for administrators to rotate all secrets (generally as a precaution); attempting to document manual instructions is likely to be error-prone.

Motivation and Context

This is in support of #703 — trying to get the administrator/operator to type in a script correctly is going to be painful; much better to automate it instead.

How Has This Been Tested?

Triggered the QuarksJob locally to see that secrets are rotated.

Generally speaking though, my experiences has tended towards this actually breaking the cluster (because the database password gets out of sync). That will be filed as a bug afterwards, because the user can already manually rotate the secrets; this just automates the process.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code has security implications.
    (new service account + role + role binding; all minimally scoped)
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
    (That will be a follow-up)

This adds a job to trigger secret rotation for all generated secrets.

This is useful for administrators to rotate all secrets (generally as a
precaution); attempting to document manual instructions is likely to be
error-prone.
@mook-as mook-as added the Type: Enhancement New feature or request label Sep 17, 2020
# secret-rotation is not a BOSH release. It requires ruby & kubectl.
image:
repository: splatform/fissile-stemcell-sle
tag: SLE_15_SP1-27.4
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really happy with this; suggestions welcome.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mook-as create an issue for creating such a base (we'll likely have to do it on OBS)

name: secret-rotation
namespace: {{ .Release.Namespace | quote }}
labels:
{{- list $ "secret-rotation" | include "component.labels" | nindent 4 }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

| component.labels (list $ $component)

Copy link
Member

@viovanov viovanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If running this breaks the cluster because of the db password and/or encryption keys, please exclude them from rotation.

# secret-rotation is not a BOSH release. It requires ruby & kubectl.
image:
repository: splatform/fissile-stemcell-sle
tag: SLE_15_SP1-27.4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mook-as create an issue for creating such a base (we'll likely have to do it on OBS)

@mook-as
Copy link
Contributor Author

mook-as commented Sep 21, 2020

Hmm, doing more testing, I seem to have issues with rotating secrets:

While I'm rotating everything, I'll focus on the locket database password:

  • The secret var-locket-database-password is correctly rotated (as in, it gets a new randomly generated value).
  • ig-resolved.diego-api-v1 is not updated; it has the old value. There is no v2.
  • desired-manifest-v1 also has the old value. There is also no v2.
  • I've confirmed that the diego-api QuarksStatefulSet does have updateOnConfigChange: true.

This leads to database connection errors (because the database-seeder is re-run, changing all the passwords). Not sure yet why the ig job isn't being re-started.

@mook-as
Copy link
Contributor Author

mook-as commented Sep 21, 2020

This is now blocked by cloudfoundry-incubator/quarks-secret#55 — it turns out that rotating the CA certificate does not rotate the certificate that was made with that CA. This means that if we rotate everything, there's a chance the CA certificate is rotated after the certificates it signs, causing those to no longer verify with the (new) CA certificate.

We also have an issue with cloudfoundry-incubator/quarks-operator#1136 where the various QuarksStatefulSets do not get updated when the secrets they depend on change; this means that we end up not being able to restart things on connection loss. Fortunately that could be worked around by adding annotations to the various things (to cause them to get updated correctly), even though there's a feature that should make the annotations unnecessary.

@mook-as mook-as marked this pull request as draft September 21, 2020 23:20
Attempting to get things in a state where rotation will not break the
deployment.
We need the instance groups to be updated when their dependent secrets
are updated, so that the pods can pick up the new secrets.  Otherwise
the instance groups in flight will not match the secrets stored in
Kubernetes, and can cause issues later.
Print out the names of the databases we are updating in the
database-seeder job, as otherwise we have no output and it is difficult
to determine if we were doing anything useful.
It requires it now and falls over if it can't contact the server.
@mook-as mook-as added the Status: Blocked Dependencies on other issues and/or pull requests label Sep 23, 2020
@viovanov
Copy link
Member

@mook-as this originated as a documentation story;
can this still be done manually? we can work on automation later.

@mook-as
Copy link
Contributor Author

mook-as commented Sep 29, 2020

@viovanov I do not believe attempting to get users to do this manually is reasonable:

  • Finding all the secrets will be error-prone
  • Even if they did, it will still break because quarks-operator doesn't update dependent secrets correctly; this means rotating the wrong secret will still break the cluster some of the time.

@fargozhu fargozhu added this to the 2.6.0 milestone Oct 11, 2020
@fargozhu fargozhu removed this from the 2.6.0 milestone Oct 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Status: Blocked Dependencies on other issues and/or pull requests Type: Enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants