Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deployment gating #54

Open
jku opened this issue Feb 23, 2024 · 6 comments
Open

Deployment gating #54

jku opened this issue Feb 23, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@jku
Copy link
Member

jku commented Feb 23, 2024

Currently there is no (human) deployment gating: if the tests pass, a new repository is published whenever there are changes whether those changes are timestamp updates or more meaningful changes.

Some questions to answer (for both staging and production):

  • Do we want gating?
  • For all changes? Or maybe only for changes that cover something else than timestamp.json alone?
  • is GitHub release environment (with reviewers) an appropriate choice?
@jku jku added the enhancement New feature or request label Feb 23, 2024
@haydentherapper
Copy link
Contributor

Assuming our tests are sufficient, and we test at least one newer Sigstore client and Cosign, no gating long-term seems reasonable.

In the short-term, while we're gaining confidence over the process, I would gate for root updates only. Timestamp updates seem safe enough to automatically deploy.

Separate question, do we expect target file updates outside of root updates? Does tuf-on-ci allow for that? Previously, these would always happen at the same time, given that we had to gather root key holders. If we do expect that, maybe we also gate on target file updates, so that we can confirm that the target file works in whatever context it's used in.

@jku
Copy link
Member Author

jku commented Feb 26, 2024

Separate question, do we expect target file updates outside of root updates? Does tuf-on-ci allow for that?

Yes this is allowed. Multiple roles metadata can be changed in one signing event but that is not guaranteed in any way.

(a future feature request is likely an easy way to force role resigning: currently you'll get separate signing events when a role is about to expire -- it would make sense if maintainers could easily decide that "let's resign targets now as well since we're all signing root already") theupdateframework/tuf-on-ci#198

@jku
Copy link
Member Author

jku commented Mar 7, 2024

I've got a POC running in https://github.com/jku/tuf-on-ci-sigstore-test:

  • My original idea was to use gcloud storage rsync --dry-run to find out if the deployment can be automatic or not: unfortunately the gcloud storage "API" is not great: We'll need to use an actual API if we want to do this without downloading the files. So current implementation downloads all current GCS content to do the comparison locally (like prod root-signing does)
  • The POC does not actually use the environment even though it is defined: This is likely because deploy-to-gcs.yml is a reusable workflow which does not have its own "workflow identity": so it can't use an environment either. I will have to refactor the deployment to two separate pieces:
    1. deploy-to-gcs-stage-1 (figures out if this is a timestamp change only, deploys timestamp if yes)
    2. deploy-to-gcs-stage-2 (only runs if this is not only a timestamp change, is gated behind deploy env, deploys full repo)

@jku
Copy link
Member Author

jku commented Mar 7, 2024

I will have to refactor the deployment to two separate pieces:

this did not work either: environments cannot be used with a "uses:" job.

There are now two working approaches:

@jku
Copy link
Member Author

jku commented Mar 27, 2024

Documenting the state of things:

  • The draft PR is functional
  • there has been quite a lot of discussion and it is still not clear if adding deployment gating is worth the complexity: with the improved automated client testing there does not seem to be much reason for it
    • just about the only thing I can think of is issues in test workflows: test workflows could be buggy and could result in green checkmarks even though something failed
  • I am still willing to implement this if there are strong opinions ...

@jku jku mentioned this issue Mar 29, 2024
@jku
Copy link
Member Author

jku commented May 8, 2024

Copying some content from the draft PR before I close it :

The flow currently looks like this:

graph TD;
    merge{"signing event<br/>(merges to 'main')"}-->online-sign;
    online-sign-period[online role in signing period]-->online-sign["online signing<br/>(merges to 'main' and 'publish')"];
    online-sign-->publish-pages[publish to Pages];
    publish-pages-->test-pages[test Pages with clients];
    test-pages--this is the critical point where we can add manual review-->publish-gcs[publish to GCS];
    publish-gcs-->test-gcs[test GCS with clients];    
Loading

The takeaways on branches and deployments are

  • both main and publish branch merges as well as pages deployment are done before tests
  • automated (pages) tests prevent deployment to GCS if tests fail
  • GCS publish is also tested but that is informational: the tests do not prevent anything from happening

So the question is this: will maintainers do additional manual testing using the Pages-published repository before publish to GCS if we give them the chance?

Potential human review flow

  • Rhombus is used to signify human interaction in the diagram
  • The downside of this design is some additional complexity in the publishing workflow
  • The draft pr implements this
graph TD;
    merge{"signing event<br/>(merges to 'main')"}-->online-sign;
    online-sign-period[online role in signing period]-->online-sign["online signing<br/>(merges to 'main' and 'publish')"];
    online-sign-->publish-pages[publish to Pages];
    publish-pages-->test-pages[test Pages with clients];
    test-pages--if only timestamp changes-->publish-gcs-light[publish timestamp to GCS];
    test-pages--if additional metadata or targets changes-->deployment{Deployment review<br/>or delay};
    deployment-->publish-gcs-full[publish full repository to GCS];
    publish-gcs-light-->test-gcs[test GCS with clients];    
    publish-gcs-full-->test-gcs[test GCS with clients];    
Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants