upgrade jucx to 1.18 #12058

Merged
zpuller merged 4 commits into NVIDIA:branch-25.02 on Feb 11, 2025
Collaborator

@zpuller zpuller commented Feb 3, 2025

Closes #11985

Upgrades jucx libs from 1.16 to 1.18.

Also upgrades the shuffle example Dockerfiles to CUDA 12.8 images, which support fabric memory handles and Grace Blackwell, which will be supported in UCX 1.18.1.

Based off of #11147

Tested building docker images for ubuntu and rocky, running ucx_perftest in container, and running test queries with ucx shuffle enabled (on test clusters).
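
The build step described above could be reproduced roughly as follows. This is a sketch, not the commands used for this PR: the `DOCKERFILE` path and `IMAGE` tag are placeholders, the `run` helper is hypothetical, and the `ucx_info -v` version check is an added illustration. It assumes the example Dockerfile takes `CUDA_VER` as a build argument, as its header comment indicates, and that the UCX tools are on `PATH` in the image. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch only: DOCKERFILE and IMAGE are placeholders, not paths from this PR.
CUDA_VER="${CUDA_VER:-12.8.0}"
DOCKERFILE="${DOCKERFILE:-Dockerfile.ubuntu}"
IMAGE="${IMAGE:-ucx-shuffle-example:cuda${CUDA_VER}}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Build the example image, passing the CUDA version as a build argument.
run docker build --build-arg CUDA_VER="$CUDA_VER" -f "$DOCKERFILE" -t "$IMAGE" .

# Print the UCX version installed in the image (expected to report 1.18.x).
run docker run --rm --gpus all "$IMAGE" ucx_info -v
```

Running with `CUDA_VER=11.8.0` builds the previous default for comparison.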

Signed-off-by: Zach Puller <[email protected]>
@zpuller zpuller requested a review from a team as a code owner February 3, 2025 22:18
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
@abellina
Collaborator

build

@sameerz sameerz added the feature request New feature or request label Feb 10, 2025
Collaborator

@abellina abellina left a comment

we shouldn't move the CI to 12.8, just the example docker files.

@abellina
Collaborator

build

@@ -16,15 +16,15 @@
# Sample Dockerfile to install UCX in a Rocky Linux 8 image.
#
# The parameters are:
-# - CUDA_VER: 11.8.0 by default
+# - CUDA_VER: 12.8.0 by default
Collaborator

PR description does not cover this change, is it needed?

@NvTimLiu
Collaborator

LGTM, +1

@NvTimLiu NvTimLiu self-requested a review February 11, 2025 03:36
@GaryShen2008
Collaborator

The original issue seems to be targeted at 25.04. Should we retarget this PR to branch-25.04?

@zpuller zpuller merged commit 5d8ab9d into NVIDIA:branch-25.02 Feb 11, 2025
51 checks passed
@zpuller
Collaborator Author

zpuller commented Feb 11, 2025

The original issue seems to be targeted at 25.04. Should we retarget this PR to branch-25.04?

No, we explicitly decided to get this one into 25.02, but good question.

zpuller added a commit that referenced this pull request Feb 11, 2025
zpuller added a commit that referenced this pull request Feb 12, 2025
Reverts #12058

The above PR was found to break `ucx_perftest` with `-m cuda` in the
ubuntu rdma container in a CI test. This is caused specifically by the
CUDA version upgrade in the PR.

---------

Signed-off-by: Zach Puller <[email protected]>
zpuller added a commit that referenced this pull request Feb 12, 2025
Labels
feature request New feature or request
Development

Successfully merging this pull request may close these issues.

[FEA] upgrade to ucx 1.18
6 participants