Hermetic CUDA Toolkit #283
Comments
@cloudhan, super excited for this! Thanks for starting the work on this. Roughly when do you expect this to be done? |
I am currently in the process of jumping ship, that is, I am joining NVIDIA ;). It may take some time for me to settle down, so this might take a little longer. I hope to have a working version by the end of next month. |
First of all, I wish you all the best in your work! Thank you for your efforts on this. We're eagerly looking forward to seeing progress on this feature, as it’s something we truly need. Please let us know if there’s any way we can assist or contribute. |
@cloudhan, I hope your time at NVIDIA is going well! We're really excited about the possibilities of using hermetic CUDA in RBE. We're currently facing a decision about whether to build a temporary non-hermetic solution for RBE or wait for this issue to be resolved. Could you give us an update on your plan here? Any information you can share would help us make the best decision for our project's roadmap. +1 to @honeway, and we'll be happy to assist in some way too. |
Once this effort starts, I can recommend using a rule-based toolchain, which was announced at the last BazelCon as the modern way of writing toolchains. |
@udaya2899 The cloudhan/hermetic-ctk-2 branch has actually been working for months with. It would be better to test it and provide some feedback. |
@hofbi That seems very interesting, but it appears to be at a very early stage. Better to just wait for now. |
For a preview, https://github.com/cloudhan/cuda-samples/blob/bazel-cuda-components/WORKSPACE.bazel shows what a manually configured repo will look like. Branch cloudhan/hermetic-ctk-2 contains the related feature. https://github.com/cloudhan/cuda-samples/blob/bazel-cuda-redist-json/WORKSPACE.bazel shows what an automatically configured repo will look like for a WORKSPACE-based project. Branch cloudhan/hermetic-ctk-3 contains the related feature. |
Thanks for working on this now. We're in the holiday season and I couldn't find time to experiment with your dev branch until now. Expect to hear from me by the second week of January. Unfortunately, we don't support WORKSPACE in our setup and only use MODULE.bazel. Which is the most recent branch to try? Is MODULE.bazel considered working in the tmp branch? Or is it the hermetic-ctk-2 branch? |
Happy New Year 2025! I'm just back from vacation and trying out your branch locally using git_override or local_path_override to give some early feedback, if any. Which branch has a possible working solution for MODULE.bazel? I see hermetic-ctk-2, hermetic-breaking-changes, as well as tmp. Let me know the best way to try this out on our RBE setup. |
@udaya2899 I updated my previous comment. The branches are stacked one on top of another, so blindly picking the last one should be OK. |
You can also find a MODULE-based config in the referenced cuda-samples repo. |
Auto config with redistrib.json in MODULE-based projects is not implemented at the moment. Maybe in future PRs. Another unsolved problem is how to make switching CUDA versions easier. Say export or maybe a flag to build against different releases of CUDA. A possible solution is to extend the current alias mapping to a versioned mapping with |
Hi @cloudhan ! Thank you for your efforts on this! |
@vdittmer I think once I can confirm there is a non-breaking path toward a multi-version deliverable toolchain, I can proceed to merge those PRs. |
It seems that using a select-based alias, as in
config_setting(name = "version")
constraint_setting(...)
constraint_value(...)  # version1
constraint_value(...)  # version2
alias(
    name = "cublas",
    actual = select({
        ":<some_label_for_version_12_2>": "@local_cuda_cublas_v12.2.y",
        "//conditions:default": "@local_cuda_cublas_v12.8.x",
    }),
)
should do the trick here. |
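The versioned mapping idea discussed above can be modeled outside Bazel to make the intended lookup explicit. The following is only a plain-Python sketch, not rules_cuda code: `VERSIONED_COMPONENTS`, `DEFAULT_VERSION`, and `resolve_component` are hypothetical names, and the repository labels simply reuse the placeholder names from the comment above. The default entry plays the role of `//conditions:default` in a `select()`.

```python
# Hypothetical versioned mapping: toolkit version -> component -> repository label.
# The labels reuse the placeholder repository names from the discussion above.
VERSIONED_COMPONENTS = {
    "12.2": {"cublas": "@local_cuda_cublas_v12.2.y"},
    "12.8": {"cublas": "@local_cuda_cublas_v12.8.x"},
}

# Assumed default; stands in for the //conditions:default branch of a select().
DEFAULT_VERSION = "12.8"


def resolve_component(component, version=None):
    """Return the repository label for `component` at the requested version.

    Falls back to DEFAULT_VERSION when no version, or an unknown version,
    is requested, mirroring the default branch of a select().
    """
    table = VERSIONED_COMPONENTS.get(version, VERSIONED_COMPONENTS[DEFAULT_VERSION])
    if component not in table:
        raise KeyError("unknown component: " + component)
    return table[component]
```

In this model, `resolve_component("cublas", "12.2")` picks the 12.2 repository, while omitting the version picks the default one. In an actual BUILD file the same decision would be made by `select()` on a `config_setting` or constraint value rather than by a function call.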
Thanks to your awesome work, I was able to make progress. A small potential bug is the I was able to skip that error by changing to |
For now, leaving #324 out and installing clang on the RBE system, and retrying this setup, I get:
I now understand that this is expected with the current setup since the This seems okay for the nvcc compiler since the individual components are there already, and it doesn't expect a single How should we proceed here? The CUDA code we have and want to build with Bazel is all clang-based, so an nvcc-only hermetic CTK unfortunately isn't enough. An alternative proposal I have is to gather all the files, either through symlinks or copies, in a synthetic root directory and pass that to cuda-path. For example (not verified to work):
and then use this rule's output to set cuda_path org in Thanks in advance! |
I think
is the only reasonable way to go. We don't want to be coupled with their abstraction. rules_cuda/cuda/private/repositories.bzl Lines 89 to 94 in 27d7499
nvcc, cccl, and cudart are all required for the nvcc toolchain. Generating a special repo for clang with all files colocated seems fine to me. |
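The idea of colocating all component files under one root can be sketched as a small standalone script. This is a minimal sketch under stated assumptions, not the approach rules_cuda implements: `collect_components` is a hypothetical helper, the component directories are assumed to be already extracted on disk, and whether clang accepts the resulting layout via `--cuda-path` is not verified here.

```python
import os


def collect_components(component_dirs, root):
    """Symlink every file under each component directory into `root`.

    Relative paths are preserved, so bin/, include/, and similar directories
    from separate components end up side by side under one synthetic root.
    Raises FileExistsError if two components provide the same relative path.
    Returns the sorted list of linked paths, relative to `root`.
    """
    linked = []
    for component in component_dirs:
        for dirpath, _dirnames, filenames in os.walk(component):
            rel_dir = os.path.relpath(dirpath, component)
            dest_dir = os.path.normpath(os.path.join(root, rel_dir))
            os.makedirs(dest_dir, exist_ok=True)
            for name in filenames:
                dest = os.path.join(dest_dir, name)
                if os.path.lexists(dest):
                    raise FileExistsError(dest)
                # Link to the absolute source path so the link stays valid
                # regardless of the current working directory.
                os.symlink(os.path.abspath(os.path.join(dirpath, name)), dest)
                linked.append(os.path.relpath(dest, root))
    return sorted(linked)
```

A call such as `collect_components(["/path/to/nvcc", "/path/to/cudart"], "/tmp/cuda_root")` (paths are placeholders) would produce a single directory that could then be tried with `clang --cuda-path=/tmp/cuda_root`. Symlinks are used instead of copies to keep the synthetic root cheap to build; inside a Bazel sandbox, copies or a dedicated repository rule may be needed instead.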
I kinda made a dirty hack, gathering all those files in one place. There were two problems:
I don't know if this problem is with our setup here (or even related to bazel/rules_cuda), but I get:
I get the same error even when using clang directly to compile the |
My finding with the Is this enough to compile with a flag |
For collecting those component files in a single place to pass to clang as What I did as a prototype is copy those files from the corresponding
And in
With this method, the
Although I verified that inside the execution_root, the path it mentions indeed has the cuda toolkit collected files. Is there a better "bazel rule" way of doing this? Sorry for the multiple comments. I'm posting my findings as I progress through this. |
@jsharpe I'd like to move on to 0.3.x and merge the first two changes with a minor fix for |
This issue tracks the progress of the Hermetic CUDA Toolkit implementation.
In 0.2.x
:compiler_deps to cuda_toolchain #301
In 0.3.x
load("@rules_cuda_redist_json//:redist.bzl", "rules_cuda_components") #286