Take env vars from .env or shell vars #543
base: main
Conversation
RAY_WORKER_GPUS ?= "0.0"
RAY_WORKER_GPUS_FRACTION ?= "0.0"
INFERENCE_PIP_REQS ?= ../jobs/inference/requirements_cpu.txt
I think including the .env here is perfect, but shouldn't we put these vars that you just added in the .env as well? You can put defaults in the .env.example file.
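For illustration, putting those defaults in .env.example could look like this (variable names and default values are taken from the diff above; the exact file contents are an assumption):

```
# .env.example (hypothetical excerpt): CPU-only defaults.
# Copy to .env and adjust for GPU setups.
RAY_WORKER_GPUS="0.0"
RAY_WORKER_GPUS_FRACTION="0.0"
INFERENCE_PIP_REQS=../jobs/inference/requirements_cpu.txt
```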
How would this affect cloud/hybrid settings? On those, the user should be able to use a GPU.
This relates to #453 (comment) from @aittalam. If we keep the PR as is, then I'll include the .env entries. We may use some other, more flexible, per-job way of specifying non-GPU requirements at a later stage.
@ividal this is currently resolved at Lumigator startup time. Whoever creates the cloud service would be responsible for setting this value appropriately (e.g. via the Helm charts provided with Lumigator). AFAICT these Makefiles would not be used in those cases.
Thanks! FYI, explicitly setting RAY_WORKER_GPUS and RAY_WORKER_GPUS_FRACTION solved the integration test problems on #519.
My only concern is how test and regular use differ from each other and how that affects GPU users.
-include .env
# GPU related settings
#
RAY_WORKER_GPUS ?= "0.0"
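For context on the mechanism (not part of the diff): `-include .env` reads the file but silently skips it when absent, and `?=` only assigns a value when the variable is not already defined, so both shell environment variables and .env entries take precedence over these defaults. A minimal standalone sketch of the pattern, using one of the variables from this PR:

```makefile
# Hypothetical standalone Makefile illustrating the pattern: a value coming
# from the environment or from an optional .env file wins over the ?= default.
-include .env
RAY_WORKER_GPUS ?= "0.0"
$(info RAY_WORKER_GPUS resolves to $(RAY_WORKER_GPUS))
```

With `RAY_WORKER_GPUS=0.5` exported in the shell (or set in a local .env), the `$(info ...)` line reports 0.5; otherwise it falls back to "0.0".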
Suggestion to avoid issues with users that do have a GPU: check if "nvidia-smi" is present, and if it isn't, force CPU mode.
This would not work for AMD GPU users, but my assumption is that the README request to set the vars via the .env file, coupled with your "?=", takes care of them.
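A minimal sketch of that suggestion in Makefile terms, reusing the variables from this PR's diff (the auto-detection itself is not part of the PR):

```makefile
# Hypothetical auto-detection: if nvidia-smi is not on PATH, force the
# CPU-only values regardless of what was configured.
NVIDIA_SMI := $(shell command -v nvidia-smi 2> /dev/null)

ifeq ($(NVIDIA_SMI),)
RAY_WORKER_GPUS := "0.0"
RAY_WORKER_GPUS_FRACTION := "0.0"
INFERENCE_PIP_REQS := ../jobs/inference/requirements_cpu.txt
endif
```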
> Suggestion to avoid issues with users that do have a GPU: check if "nvidia-smi" is present, and if it isn't, force CPU mode.
My impression is that we have a few settings (like the Ray GPU env vars and the requirements.txt repo URL prefix lines) that we need to coordinate. I don't really like auto-detection, since IMHO the tools themselves (vendor tools like this, Python settings, Ray itself...) don't seem that well coordinated, but issuing a warning or some logs seems OK to me ("GPU selected, but no nvidia-smi present" or the like).
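A sketch of that warning-only alternative, again reusing the PR's variables (not part of the actual change):

```makefile
# Hypothetical warning-only check: keep the configured values, but tell the
# user when GPUs are requested and nvidia-smi cannot be found on PATH.
ifneq ($(RAY_WORKER_GPUS),"0.0")
ifeq ($(shell command -v nvidia-smi 2> /dev/null),)
$(warning GPU selected (RAY_WORKER_GPUS=$(RAY_WORKER_GPUS)), but no nvidia-smi present)
endif
endif
```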
So...
- Is the user responsible for setting GPUs? [Y/n]
- Is the default behaviour CPU? [Y/n]
- How exactly does a user force GPU usage? [On the .env via RAY_WORKER_GPU*/...; see the sketch below]
If so, we just need to be explicit in our guide as to how the user can run Lumigator with GPU support. TODO:
- here: add a "setting GPU/CPU" log line so the user knows what's going on
- separate issue: document GPU/CPU usage in our user docs.
Does this sound about right?
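For the third point, something like the following .env entries would force GPU usage; the variable names come from this PR, while the GPU values and the non-CPU requirements file name are assumptions:

```
# Hypothetical .env overriding the CPU-only defaults
RAY_WORKER_GPUS="1.0"
RAY_WORKER_GPUS_FRACTION="1.0"
# file name assumed; the PR itself only shows requirements_cpu.txt
INFERENCE_PIP_REQS=../jobs/inference/requirements.txt
```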
@@ -0,0 +1,11 @@
--extra-index-url https://download.pytorch.org/whl/cpu
Are we sure this is only used in tests? What gets installed for regular usage and what happens to people who do not have a GPU in that case?
This is somewhat addressed in:
* INFERENCE_PIP_REQS (default value `../jobs/inference/requirements_cpu.txt`)
What's changing
The Makefile will now incorporate the RAY_WORKER_GPUS, RAY_WORKER_GPUS_FRACTION and INFERENCE_PIP_REQS environment variables from the shell or the .env file (if it exists). This allows selective use of GPUs and prevents tasks from being stuck in pending for CPU-only tests.
How to test it
From the base directory:
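The exact command from the original description did not survive extraction; purely as an illustration (the make target name is hypothetical), the point is that the GPU-related variables can be supplied from the shell or from .env:

```shell
# Hypothetical invocation: the ?= defaults in the Makefile are overridden by
# whatever is set in the environment (or in .env).
RAY_WORKER_GPUS="0.0" RAY_WORKER_GPUS_FRACTION="0.0" make test
```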
Tests should pass, but the Ray dashboard will show failed jobs (instead of pending).
Additional notes for reviewers
N/A