[Core] Support Intel GPU #36493

Closed
wants to merge 7 commits into from

Conversation

harborn
Contributor

@harborn harborn commented Jun 16, 2023

Why are these changes needed?

This PR aims to support Intel GPUs in Ray.
It makes the minimum changes needed to do so. Intel GPUs are commonly referred to as XPUs.

Upgrade list:

  1. Define GPU as the common accelerator type: a CUDA device is a GPU, and an XPU is also a GPU.
  2. Define an environment variable, RAY_ACCELERATOR, to specify the accelerator type.
  3. Use the same scheduling policy as for CUDA devices.

Usage comparison:

CUDA:

  1. Set the environment variable RAY_ACCELERATOR="CUDA", or leave it undefined.
  2. Initialize Ray with CUDA: ray.init(num_gpus=3)
  3. Use the CUDA resource in a remote function.

XPU:

  1. Set the environment variable RAY_ACCELERATOR="XPU".
  2. Initialize Ray with XPU: ray.init(num_gpus=3)
  3. Use the XPU resource in a remote function.
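
The selection step in the comparison above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolve_accelerator`, `SUPPORTED_ACCELERATORS`, and `run_with_ray` are hypothetical names, and raising `ValueError` on an unknown value is an assumption. Only the `RAY_ACCELERATOR` variable, its two values, the CUDA default, and the `ray.init(num_gpus=3)` call come from the description.

```python
import os

# Hypothetical helper that mirrors the PR description; not Ray's actual code.
SUPPORTED_ACCELERATORS = ("CUDA", "XPU")


def resolve_accelerator(environ=None):
    """Return the accelerator type named by RAY_ACCELERATOR.

    Falls back to "CUDA" when the variable is not defined, as the
    description states. Rejecting unknown values is an assumption.
    """
    env = os.environ if environ is None else environ
    value = env.get("RAY_ACCELERATOR", "CUDA")
    if value not in SUPPORTED_ACCELERATORS:
        raise ValueError(
            f"Unsupported RAY_ACCELERATOR={value!r}; "
            f"expected one of {SUPPORTED_ACCELERATORS}"
        )
    return value


def run_with_ray():
    """Usage sketch; requires Ray to be installed, so it is not called here."""
    import ray  # imported lazily so the helper above works without Ray

    # Both CUDA and XPU devices are requested through the same num_gpus knob.
    ray.init(num_gpus=3)

    @ray.remote(num_gpus=1)
    def which_accelerator():
        return resolve_accelerator()

    print(ray.get(which_accelerator.remote()))
    ray.shutdown()
```

With `RAY_ACCELERATOR="XPU"` exported before calling `run_with_ray()`, the remote function is scheduled against the same logical GPU resource as in the CUDA case; only the environment variable differs.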

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

harborn added 2 commits June 20, 2023 10:05
Signed-off-by: Wu, Gangsheng <[email protected]>
Signed-off-by: Wu, Gangsheng <[email protected]>
@harborn
Contributor Author

harborn commented Jun 20, 2023

Could you help remove the additional reviewers? @jjyao

@rkooo567 rkooo567 self-assigned this Jun 20, 2023
Signed-off-by: harborn <[email protected]>
@harborn harborn reopened this Aug 18, 2023
@harborn harborn closed this Aug 18, 2023
@harborn
Contributor Author

harborn commented Aug 18, 2023

Please switch to PR https://github.com/ray-project/ray/pull/38553

@harborn
Contributor Author

harborn commented Aug 18, 2023

Thanks for the contribution and the answers to our questions. I have some high-level questions (see comments) to get out of the way; once those are answered I will go through and do a code-review pass.

I'm not sure how GPU-related code gets tested in CI. Does CI run on a real GPU, or are the tests mocked?

To say we have full XPU support, we should have a real XPU test. But since this PR seems to implement only logical scheduling, for now we just need tests that mock out the XPU detection and validate that Ray can schedule XPUs. This PR should include XPU versions of the following tests:

@xwu99 Yes, we are in the process of creating a composite CI for XPU tests (this is in development, in collaboration with other organizations such as Hugging Face and PyTorch Lightning, to name a few). This will only happen after the PR I raise (and after this PR is merged). I maintain the PyTorch ecosystem (Hugging Face for XPUs, as mentioned), so CI and the rest will come after full support. I'll raise the PR next week.

Nice! Can I read more about this effort?

Also added 3 tests in PR https://github.com/ray-project/ray/pull/38553

@harborn harborn mentioned this pull request Aug 24, 2023
8 tasks
Labels
@external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission.
7 participants