Upgrade platform Kubernetes to latest stable version #17

Open
rkalyanapurdue opened this issue Apr 10, 2024 · 6 comments
Labels: upgrade (Version upgrade of dependencies)

Comments

@rkalyanapurdue
Collaborator

The production Kubernetes cluster on Jetstream2 is running a very old version of Kubernetes (v1.22). It needs to be upgraded ASAP.

@rkalyanapurdue rkalyanapurdue added the upgrade Version upgrade of dependencies label Apr 10, 2024
@rkalyanapurdue
Collaborator Author

This needs to happen soon: we cannot add new nodes to the Kubernetes cluster because v1.22 can no longer be installed from the package repos.

@rkalyanapurdue
Collaborator Author

Had a repeat of the NFS access issues (#11) following a Jetstream2 outage, which required creating new Kubernetes worker nodes with the old v1.22.

@fbaig fbaig added this to the August 2024 release milestone May 22, 2024
@rkalyanapurdue
Collaborator Author

Tentatively planned for the week of Jul 7.

@rkalyanapurdue
Collaborator Author

Steps:

  1. Set up a Kubernetes cluster using the Terraform/Ansible recipe
  2. Make sure to retain current floating IP for the public JupyterHub since it is tied to the DNS entry
  3. Modify NFS server exports to add the right fixed IPs for the new cluster nodes
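
For step 3, a minimal sketch of how the new export entry could be generated once the fixed IPs of the new nodes are known. The export path, the IP addresses, and the option string below are placeholders rather than values from our NFS server, and `nfs_export_line` is a hypothetical helper, not part of the existing recipe.

```python
import ipaddress


def nfs_export_line(export_path, node_ips, options="rw,sync,no_subtree_check"):
    """Build one /etc/exports line that grants each node IP access to export_path.

    The default options are an assumption; use whatever the current exports use.
    """
    # Validate every address so a typo fails here instead of on the NFS server.
    clients = [f"{ipaddress.ip_address(ip)}({options})" for ip in node_ips]
    return f"{export_path} " + " ".join(clients)


# Placeholder values for illustration only.
new_node_ips = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
print(nfs_export_line("/export/home", new_node_ips))
```

The resulting line would go into `/etc/exports` on the NFS server, followed by `exportfs -ra` to reload the export table.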

@rkalyanapurdue
Collaborator Author

We will do this after the Summer School since we don't want any disruptions then.

@rkalyanapurdue
Collaborator Author

  • Determine latest stable Kubernetes version
  • Deploy a new Kubernetes cluster in the dev allocation using Terraform and Ansible
  • For now, the MetalLB setup will still be manual (including moving over the floating IP mapped to the domain)
  • (Optional) Test integrating a GPU VM into the cluster and benchmark the time taken
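
For the first item, a rough sketch of one way to look up the latest stable release and see how far behind v1.22 is. It assumes the `stable.txt` marker published at `dl.k8s.io` is the source we want to trust; the helper names are hypothetical, and the patch level of the current cluster is not stated in this issue, so `v1.22.0` in the usage comment is a placeholder.

```python
import re
import urllib.request

# Release marker used by the official kubectl install instructions.
STABLE_URL = "https://dl.k8s.io/release/stable.txt"


def parse_version(tag):
    """Turn a tag such as 'v1.22.0' into a (major, minor, patch) tuple."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unexpected version tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def fetch_latest_stable(url=STABLE_URL, timeout=10):
    """Return the latest stable tag as published at url (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def minor_versions_behind(current, latest):
    """Count the minor releases between two tags that share a major version."""
    cur, new = parse_version(current), parse_version(latest)
    if cur[0] != new[0]:
        raise ValueError("major versions differ; compare them manually")
    return new[1] - cur[1]


# Example usage (not run here because it needs network access):
#   latest = fetch_latest_stable()
#   print(latest, minor_versions_behind("v1.22.0", latest))
```

Since the plan is to deploy a fresh cluster rather than upgrade in place, the gap is mostly useful for judging how many deprecated APIs our manifests may need to move off.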
