
Replace IBM Cloud with Google Cloud Platform as 3rd preservation endpoint #2237

Open
julianmorley opened this issue May 26, 2023 · 1 comment

Comments

@julianmorley
Member

Headline: Stanford's cloud security posture has changed and we must now use only UIT-approved cloud storage providers (Cardinal Cloud). IBM is not an approved cloud storage provider and is not on the roadmap to become one; therefore we must cease using it as soon as practical.

When Prescat was first implemented, we needed two cloud providers with S3 APIs and archival storage. At the time, the only three providers that met that requirement were IBM, Oracle, and AWS (and Cardinal Cloud did not yet exist). AWS was an obvious choice. IBM won out over Oracle because we required (non-archival) S3 storage for our cloud-based IBM Tivoli install, so going with IBM for preservation as well let us leverage an existing account structure.

Much has changed since then. We have moved our TSM storage to Wasabi, and both Azure and GCP have added S3-compatible APIs and archival storage to their offerings. Most critically, Stanford's cloud security posture has also changed, requiring us to cease using non-approved cloud storage providers.

This may benefit from being broken up into 3 distinct phases:

  • Add GCP as an endpoint (QA/test & prod)
  • Replicate content to GCP (will take "a while")
  • Remove IBM as an endpoint (stop replicating content, remove code references, delete from cloud)

A GCP project has already been created and is ready for use. Ping me on a non-public forum for details and access.

@jmartin-sul
Member

Discussed briefly with @vivnwong today, and this is the first-pass TODO list I Slacked to her afterwards. It gets into more detail, but ultimately I think it agrees with @julianmorley's higher-level description above? Corrections welcome, of course:

  1. Ops provisions the GCP S3 buckets for QA, stage, prod. As part of that, they provision credentials (access_key_id and secret_access_key) and grant those credentials appropriate write access to the new S3 buckets (write-once or something like that, iirc?).
  2. Small pres cat code change: add a new delivery class for the new endpoint, update the ZipEndpoint class to add that new class to its delivery_class enum, and add a new S3 provider for GCP (to be used by the new GCP delivery class). This should be a straightforward change, as the delivery classes and providers are a relatively thin wrapper around some centralized code of ours and around some classes from the AWS SDK that work with any S3-compatible vendor (so existing code should make for good examples; a rough sketch is below, after this list).
  3. We add the new stage and QA endpoints to shared_configs, and run the db:seed rake task to update the config in the DB (a deploy should do that automatically, fwiw). A console sketch of the equivalent seeding is below, after this list.
  4. We backfill the stage and QA endpoints. We'll probably have to push some content through manually if we want to test any significant volume quickly, as opposed to just waiting for the scheduled replication audit runs to backfill automatically (the audit checks a small chunk of the least recently audited druids every night, plus, weekly, all druids whose last check was 90 or more days ago).
  5. We audit the stage and QA replications once they're done (proactively, all at once). We might also want to randomly pull down some GCP-replicated content to spot-check the archived Moabs with checksum validation, just to be extra sure things are working, since automated full checksum validation isn't something we got around to implementing for replicated content, just on-prem content (a spot-check sketch is below, after this list).
  6. Once we're satisfied with those audit results, we clean up the old endpoints on stage and QA: remove them from shared_configs, delete the zipped_moab_versions and zip_parts rows for those endpoints from the database, and remove the database entry for the IBM endpoint from zip_endpoints (a cleanup sketch is below, after this list).
  7. Once we're satisfied that the stage/QA practice run went well, we repeat steps 3 through 6 for prod.
  8. Once we're satisfied with the prod migration, we remove the couple of small IBM-specific classes from the pres cat codebase (the delivery class, its reference in the ZipEndpoint.delivery_class enum, and the IBM S3 provider class).
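
A very rough sketch of what the step 2 change might look like, for discussion. The class, module, and settings names below (PreservationCatalog::Gcp, GcpDeliveryJob, AbstractDeliveryJob, the Settings keys) are placeholders I made up, not pres cat's actual API; the real change should crib from the existing AWS and IBM delivery and provider classes.

```ruby
# Hypothetical sketch only -- names and settings keys are guesses, not the real pres cat API.
require 'aws-sdk-s3'

module PreservationCatalog
  # S3 provider for GCP, talking to Google Cloud Storage's S3-compatible
  # (XML interoperability) endpoint with HMAC credentials.
  class Gcp
    def self.resource
      Aws::S3::Resource.new(
        endpoint: 'https://storage.googleapis.com',
        region: Settings.zip_endpoints.gcp_s3.region, # placeholder settings key
        credentials: Aws::Credentials.new(
          Settings.zip_endpoints.gcp_s3.access_key_id,
          Settings.zip_endpoints.gcp_s3.secret_access_key
        )
      )
    end

    def self.bucket
      resource.bucket(Settings.zip_endpoints.gcp_s3.storage_location)
    end
  end
end

# Delivery job for the new endpoint; mirrors the existing AWS/IBM delivery jobs,
# which just hand each zip part to their provider's bucket. The parent class
# name here is a placeholder.
class GcpDeliveryJob < AbstractDeliveryJob
  def bucket
    PreservationCatalog::Gcp.bucket
  end
end
# ...plus adding GcpDeliveryJob to ZipEndpoint's delivery_class enum mapping.
```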
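For step 3, db:seed should pick the new endpoint up from shared_configs; if it ever had to be done by hand, the console equivalent would be roughly the following (the endpoint name, enum value, and column values are illustrative guesses, and the real values come from shared_configs).

```ruby
# Hypothetical console equivalent of what db:seed would do for the new endpoint.
# endpoint_name, delivery_class value, and bucket name are made up for illustration.
ZipEndpoint.find_or_create_by!(endpoint_name: 'gcp_s3_south') do |ep|
  ep.delivery_class = 'GcpDeliveryJob'        # must match the enum entry added in step 2
  ep.endpoint_node = 'storage.googleapis.com'
  ep.storage_location = 'example-sdr-preservation-bucket'  # placeholder bucket name
end
```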
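For the step 5 spot check, something along these lines could pull a randomly chosen GCP zip part back down and compare its checksum to what the catalog recorded. The association and attribute names (zipped_moab_version, s3_key, md5) are guesses at the schema, and PreservationCatalog::Gcp is the placeholder provider from the step 2 sketch.

```ruby
# Hypothetical spot check: fetch a random zip part from GCP and verify its md5
# against the checksum recorded in the catalog.
require 'digest'
require 'tempfile'

gcp = ZipEndpoint.find_by!(endpoint_name: 'gcp_s3_south')
part = ZipPart.joins(:zipped_moab_version)
              .where(zipped_moab_versions: { zip_endpoint_id: gcp.id })
              .order('RANDOM()')
              .first

Tempfile.create('gcp-spot-check') do |tmp|
  PreservationCatalog::Gcp.bucket.object(part.s3_key).download_file(tmp.path)
  actual_md5 = Digest::MD5.file(tmp.path).hexdigest
  puts(actual_md5 == part.md5 ? "OK: #{part.s3_key}" : "CHECKSUM MISMATCH: #{part.s3_key}")
end
```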
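And for the step 6 database cleanup, the console work might look roughly like this, run only after the GCP audits are green (endpoint_name and association names are again guesses).

```ruby
# Hypothetical cleanup of the retired IBM endpoint's catalog rows.
ibm = ZipEndpoint.find_by!(endpoint_name: 'ibm_us_south')  # placeholder endpoint_name

ActiveRecord::Base.transaction do
  zmv_ids = ibm.zipped_moab_versions.ids
  ZipPart.where(zipped_moab_version_id: zmv_ids).delete_all
  ibm.zipped_moab_versions.delete_all
  ibm.destroy!
end
```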
