Add support for ephemeral volume claims to kubernetes/argo #2103
base: master
Conversation
Also, can you help me with the scenarios you have been able to test (across @kubernetes and @parallel - locally as well as with argo-workflows and airflow) and their outputs? I am particularly curious about the unhappy paths - for example, what happens when a lot of data (TBs) is written to the EBS volume, and how that impacts workload termination. Additionally, the UX can potentially be simplified significantly - in line with the UX for persistent_volume_claims.
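(To make the UX comparison concrete, here is a rough, hypothetical sketch - not code from this PR - of a flow using the decorator argument in the shape exercised by the tests further down, with a comment showing what a flatter, persistent_volume_claims-style mapping might look like instead. The claim name, mount path, and file contents are illustrative assumptions.)

```python
from metaflow import FlowSpec, kubernetes, step


class EphemeralVolumeFlow(FlowSpec):

    # Shape exercised in this PR's tests: claim name -> options dict.
    # A simplified, persistent_volume_claims-style alternative might be a plain
    # claim-name -> mount-path mapping, e.g.
    #   @kubernetes(ephemeral_volume_claims={"my-temp-volume": "/my_temp_volume"})
    @kubernetes(ephemeral_volume_claims={"my-temp-volume": {"path": "/my_temp_volume"}})
    @step
    def start(self):
        # The volume exists only for the lifetime of this step's pod.
        with open("/my_temp_volume/scratch.bin", "wb") as f:
            f.write(b"\0" * 1024 * 1024)
        self.next(self.end)

    @step
    def end(self):
        print("done")


if __name__ == "__main__":
    EphemeralVolumeFlow()
```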
I'll work on running through the scenarios below:
I don't have access to airflow, so that one will be harder to test. Re: impact on workload termination - is your concern that the step will fail to terminate if a lot of data is being written when the request to terminate comes in?
Are you thinking just having …?
UPDATE: I added the tests I've run through to the PR description.
tested this with …
I'm going back and forth regarding the UX here. On one hand, exposing the controls seems like a valid use case for fine-tuning the ephemeral storage. So far I'm slightly leaning towards passing the raw spec through.
At a minimum, I think you'd need:
FWIW this API has been "stable" since 1.25
As defined here: https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes
This would allow a single step to have a dynamically attached "ephemeral volume" dedicated to itself (rather than a PVC, which needs to be created before running the job).
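For reference, this is roughly the shape a generic ephemeral volume takes in the pod spec per the Kubernetes docs linked above. The sketch below is illustrative only (not this PR's implementation); the claim name, storage class, and size are assumptions.

```python
# Illustrative sketch of a generic ephemeral volume in a pod spec (plain dicts,
# not this PR's implementation). Kubernetes creates the PVC when the pod is
# scheduled and deletes it when the pod is deleted, so the volume's lifetime
# is tied to the step's pod.
volume = {
    "name": "my-temp-volume",  # assumed claim/volume name
    "ephemeral": {
        "volumeClaimTemplate": {
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "storageClassName": "gp3",  # assumed EBS-backed storage class
                "resources": {"requests": {"storage": "50Gi"}},  # assumed size
            }
        }
    },
}
volume_mount = {"name": "my-temp-volume", "mountPath": "/my_temp_volume"}
```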
Tested: with the hello_cloud.py example.
I've tested:
python metaflow/tutorials/05-hello-cloud/hello-cloud.py run
python metaflow/tutorials/05-hello-cloud/hello-cloud.py run --with kubernetes:ephemeral_volume_claims='{"my-temp-volume":{"path":"/my_temp_volume"}}'
python metaflow/tutorials/05-hello-cloud/hello-cloud.py argo-workflows create + trigger
And verified that the flow runs successfully, and that the ephemeral volume is created / destroyed as intended.
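(As a side note, one way to spot-check that the claim is created while the step's pod is running and garbage-collected afterwards - not part of this PR, and the namespace here is an assumption - is to list PVCs before, during, and after the run:)

```python
# Rough sketch: list PVCs in the namespace where the Metaflow pods run to
# confirm the ephemeral claim appears during the step and disappears afterwards.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
for pvc in v1.list_namespaced_persistent_volume_claim("default").items:
    print(pvc.metadata.name, pvc.status.phase)
```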
Also, I tested:
python metaflow/tutorials/05-hello-cloud/hello-cloud.py airflow create my_flow.py
returns an error.
I haven't yet tested @parallel because I don't have access to a cluster with JobSet installed. If I did, I would run:
python test/parallel/parallel_test_flow.py run --with kubernetes:ephemeral_volume_claims='{"my-temp-volume":{"path":"/my_temp_volume"}}'
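For completeness, here is a rough sketch of the kind of @parallel flow that command would exercise, assuming the standard num_parallel gang pattern. This is not code from the PR; the flow name, node count, and file paths are illustrative assumptions.

```python
from metaflow import FlowSpec, current, kubernetes, parallel, step


class ParallelEphemeralTestFlow(FlowSpec):

    @step
    def start(self):
        # Fan out into a gang of pods scheduled together (via JobSet on Kubernetes).
        self.next(self.multinode, num_parallel=2)

    @kubernetes(ephemeral_volume_claims={"my-temp-volume": {"path": "/my_temp_volume"}})
    @parallel
    @step
    def multinode(self):
        # Each worker in the gang should get its own dynamically provisioned volume.
        with open("/my_temp_volume/node-%d.txt" % current.parallel.node_index, "w") as f:
            f.write("hello from node %d of %d\n"
                    % (current.parallel.node_index, current.parallel.num_nodes))
        self.next(self.join)

    @step
    def join(self, inputs):
        self.next(self.end)

    @step
    def end(self):
        print("done")


if __name__ == "__main__":
    ParallelEphemeralTestFlow()
```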