Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR: Breaking models weights out of model images #752

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

YrrepNoj
Copy link
Member

@YrrepNoj YrrepNoj commented Jul 10, 2024

Related to #623

Copy link

netlify bot commented Jul 10, 2024

Deploy Preview for leapfrogai-docs canceled.

Name Link
🔨 Latest commit dc555c6
🔍 Latest deploy log https://app.netlify.com/sites/leapfrogai-docs/deploys/668eb115515668000880c04a

Copy link
Collaborator

@barronstone barronstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited to see this approach implemented!

Proposed

## Context
GenerativeAI models are big. Because LeapfrogAI is designed to be deployable into AirGapped environments, we need to ensure that we are bringing the big GenerativeAI models with us. Currently, we are brining the AI models with us by backing them into our container images. For example, [we download synthia into our llama-cpp-python image](https://github.com/defenseunicorns/leapfrogai/blob/d1e42d9296f6e014ffbbcec2ba295443b1675567/packages/llama-cpp-python/Dockerfile#L15) and here we [download whisper](https://github.com/defenseunicorns/leapfrogai/blob/d1e42d9296f6e014ffbbcec2ba295443b1675567/packages/whisper/Dockerfile#L14) into our whisper image. Some of the models we are trying to use are large (several GBs).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: mispelled "baking"

- The initialization time of pods is increased because of time spent moving the containers OCI layers into the pod.

## Decision
While no decision has been made yet, I am leaning towards proposing we go with the simplest solution of using PVCs to manage our GenAI models.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is still in "proposed" status, I would state PVCs as the decision rather than "leaning towards proposing" ...you are proposing!



## Rationale
N/A as no [Decision](#Decision) has been made yet.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provide (proposed) rationale for PVC path forward

### Raw PVC Attachments
[k8s PVC Docs](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)

Maybe the simplest solution is the best solution? We can create a PersistantVolume for each model that gets populated during deploy time. This PersistantVolume will be mounted by all of the Pods that want to use that model.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You say PersistantVolume here, but then use the acronym PVC (PersistentVolumeClaim) elsewhere, which is a request for the PV resources. Be consistent and/or clarify the relationship in a way non-k8s folks like me can understand easily when reading.

Would also be good to define the PVC acronym somewhere in this doc. You currently spell out PersistantVolume, but jump straight to using PVC as an acronym.



Cons of PVC:
- Hard to optimize re-deploys (If the model weights don't change) and benefit from caching.
Copy link
Collaborator

@barronstone barronstone Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on why it's hard to optimize re-deploys? (Maybe it's obvious to k8s folks, but I don't understand.)

also nit: the "I" in "If" is capitalized

@justinthelaw justinthelaw added the ADR 🧐 Architecture Decision Record label Sep 4, 2024
@jalling97 jalling97 linked an issue Sep 5, 2024 that may be closed by this pull request
@barronstone
Copy link
Collaborator

Greetings @YrrepNoj

@barronstone
Copy link
Collaborator

Howdy @YrrepNoj

@barronstone
Copy link
Collaborator

Good morning @YrrepNoj

@barronstone
Copy link
Collaborator

Happy Friday @YrrepNoj

@barronstone
Copy link
Collaborator

Hello @YrrepNoj?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ADR 🧐 Architecture Decision Record
Projects
None yet
Development

Successfully merging this pull request may close these issues.

spike: model files pvc image hardening example
4 participants