
Hermetic CUDA Toolkit #283

Open · 12 of 14 tasks
cloudhan opened this issue Oct 16, 2024 · 23 comments

Comments

@udaya2899
Contributor

@cloudhan, super excited for this! Thanks for starting the work on it. Roughly when do you expect it to be done?

@cloudhan
Collaborator Author

I am currently in the middle of jumping ship; that is, I am joining NVIDIA ;). It may take some time for me to settle down, so this might take a little longer. I hope to have a working version by the end of next month.

@honeway

honeway commented Dec 5, 2024

First of all, I wish you all the best in your work! Thank you for your efforts on this. We're eagerly looking forward to seeing progress on this feature, as it’s something we truly need. Please let us know if there’s any way we can assist or contribute.

@udaya2899
Contributor

@cloudhan, I hope your time at NVIDIA is going well! We're really excited about the possibilities of using hermetic CUDA in RBE. We're currently facing a decision about whether to build a temporary non-hermetic solution for RBE or wait for this issue to be resolved.

Could you give us an update on your plan here? Any information you can share would help us make the best decision for our project's roadmap. +1 to @honeway, and we'll be happy to assist in some way too.

@hofbi
Contributor

hofbi commented Dec 12, 2024

Once this effort starts, I can recommend using a rule-based toolchain, which was announced at the last BazelCon as the modern way of writing toolchains.

@cloudhan
Collaborator Author

@udaya2899 The cloudhan/hermetic-ctk-2 branch has actually been working for months. Please test it and provide some feedback.

@cloudhan
Collaborator Author

@hofbi Seems very interesting, but it appears to be at a very early stage. Better to just wait for now.

@cloudhan
Collaborator Author

cloudhan commented Dec 23, 2024

For a preview,

https://github.com/cloudhan/cuda-samples/blob/bazel-cuda-components/WORKSPACE.bazel shows what a manually configured repo will look like. Branch cloudhan/hermetic-ctk-2 contains the related feature.

https://github.com/cloudhan/cuda-samples/blob/bazel-cuda-redist-json/WORKSPACE.bazel shows what an automatically configured repo will look like for a WORKSPACE-based project. Branch cloudhan/hermetic-ctk-3 contains the related feature.

@udaya2899
Contributor

udaya2899 commented Dec 23, 2024

Thanks for working on this. It's the holiday season, and I haven't had time to experiment with your dev branch until now.

Expect to hear from me by the second week of January.

Unfortunately, we don't support WORKSPACE in our setup and only use MODULE.bazel. Which is the most recent branch to try? Is MODULE.bazel considered working in the tmp branch, or in the hermetic-ctk-2 branch?

@udaya2899
Contributor

Happy New Year 2025! I'm just back from vacation and am trying out your branch locally using git_override or local_path_override to give some early feedback. Which branch has a working solution for MODULE.bazel? I see hermetic-ctk-2, hermetic-breaking-changes, as well as tmp. Let me know the best way to try this out on our RBE setup.

@cloudhan
Collaborator Author

cloudhan commented Jan 10, 2025

@udaya2899 I updated my previous comment. The branches are stacked one on top of another, so blindly picking the last one should be OK.

@cloudhan
Collaborator Author

You can also find a MODULE-based config in the referenced cuda-samples repo.
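
For reference, the plain (non-hermetic) MODULE.bazel setup looks roughly like the following, a sketch based on the rules_cuda examples; the version pin is illustrative, and the hermetic component mapping on the dev branches extends this same extension:

bazel_dep(name = "rules_cuda", version = "0.2.3")  # illustrative version pin

cuda = use_extension("@rules_cuda//cuda:extensions.bzl", "toolchain")
cuda.local_toolchain(
    name = "local_cuda",
    toolkit_path = "",  # empty: auto-detect a locally installed toolkit
)
use_repo(cuda, "local_cuda")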

@cloudhan
Collaborator Author

Auto-config via redistrib.json in MODULE-based projects is not implemented at the moment; maybe in future PRs. Another unsolved problem is how to make switching CUDA versions easier, say an environment variable or a flag to build against different releases of CUDA. A possible solution is to extend the current alias mapping to a versioned mapping with select in between.

@vdittmer

Hi @cloudhan ! Thank you for your efforts on this!
Do you have an estimate on when version 0.3 with these changes could be released?

@cloudhan
Collaborator Author

cloudhan commented Jan 29, 2025

@vdittmer I think once I can confirm there is a non-breaking path toward a multi-version deliverable toolchain, I can proceed to merge those PRs.

@cloudhan
Collaborator Author

It seems a selected alias in @local_cuda can solve the problem. Something along these lines (a sketch; the version labels are placeholders):

constraint_setting(name = "cuda_version")
constraint_value(name = "version_12_2", constraint_setting = ":cuda_version")
constraint_value(name = "version_12_8", constraint_setting = ":cuda_version")

config_setting(
    name = "is_cuda_12_2",
    constraint_values = [":version_12_2"],
)

alias(
    name = "cublas",
    actual = select({
        ":is_cuda_12_2": "@local_cuda_cublas_v12.2.y",
        "//conditions:default": "@local_cuda_cublas_v12.8.x",
    }),
)

should do the trick here.

@udaya2899
Contributor

Thanks to your awesome work, I was able to make progress. A small potential bug is the endswith("~") check here, which fails with Bazel 8.x, since Bazel 8 and above use + as the canonical repo name terminator. In general, it's not recommended to depend on this format. Maybe we should drop the check?

I was able to get past that error by changing the check to endswith("+").
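
If the check needs to stay for compatibility rather than being dropped, a version-tolerant variant could accept both terminators (a sketch; the helper name is hypothetical):

def _looks_like_canonical_repo(repo_name):
    # Bazel 7 ends canonical repo names with "~"; Bazel 8+ uses "+".
    return repo_name.endswith("~") or repo_name.endswith("+")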

@udaya2899
Contributor

For now, leaving #324 out and installing clang on the RBE system, and retrying this setup, I get:

(Exit 1): clang failed: error executing CudaCompile command (from target //intrinsic/gpu/gpu_adder:gpu_adder) /usr/bin/clang -x cu '--cuda-path=cuda-not-found' '-frandom-seed=bazel-out/k8-opt/bin/intrinsic/gpu/gpu_adder/_objs/gpu_adder/gpu_adder.cu.pic.o' -iquote . ... (remaining 124 arguments skipped)

clang: error: cannot find libdevice for sm_35; provide path to different CUDA installation via '--cuda-path', or pass '-nocudalib' to build without linking with libdevice
clang: error: cannot find CUDA installation; provide its path via '--cuda-path', or pass '-nocudainc' to build without CUDA includes

I now understand that this is expected with the current setup since the path is scattered and we don't set it explicitly.

This seems okay for the nvcc compiler, since the individual components are already in place and it doesn't expect a single cuda_path arg. For clang, which expects a --cuda-path, the official documentation only suggests making everything available in a single place, and I see no way to tell clang that the components are scattered across different repo paths.

How do we proceed here? The CUDA code we have and want to build with Bazel is all clang-based, so an nvcc-only hermetic CTK unfortunately isn't enough.

An alternative proposal I have is to gather all the files, either through symlinks or copies, in a synthetic root directory and pass that as --cuda-path.

For example (not verified to work):

load("@aspect_bazel_lib//lib:copy_to_directory.bzl", "copy_to_directory")

copy_to_directory(
    name = "create_cuda_root",
    srcs = [
        "@@rules_cuda//toolchain+local_cuda_cccl:cccl_all_files",
        "@@rules_cuda//toolchain+local_cuda_cudart:cudart_all_files",
        "@@rules_cuda//toolchain+local_cuda_nvcc:nvcc_all_files",
    ],
    out = "cuda_root",
    replace_prefixes = {
        "@rules_cuda//toolchain+local_cuda_cccl:cccl_all_files": "",
        "@rules_cuda//toolchain+local_cuda_cudart:cudart_all_files": "",
        "@rules_cuda//toolchain+local_cuda_nvcc:nvcc_all_files": "",
    },
    hardlink = "off",
)

and then use this rule's output to set the cuda_path arg in the struct returned by _detect_deliverable_cuda_toolkit. Please let me know if you have a better solution in mind.

Thanks in advance!

@cloudhan
Collaborator Author

I think

gather all the files

is the only reasonable way to go. We don't want to be coupled with their abstraction.

def _detect_deliverable_cuda_toolkit(repository_ctx):
    # NOTE: component nvcc contains some headers that will be used.
    required_components = ["cccl", "cudart", "nvcc"]
    for rc in required_components:
        if rc not in repository_ctx.attr.components_mapping:
            fail('component "{}" is required.'.format(rc))

nvcc, cccl, and cudart are all required for the nvcc toolchain. Generating a special repo for clang with all files colocated seems fine to me.

@udaya2899
Contributor

I made a somewhat dirty hack gathering all those files in one place. There were two problems:

  1. The downloaded repos don't have a version.txt or version.json file.
  • This also prevents clang from treating the path as valid, since I understand it reads version.json to validate the presence of a valid --cuda-path.
  2. clang expects libcurand and throws an error: fatal error: 'curand_mtgp32_kernel.h' file not found
  • I see we install libcurand-dev as a separate step in the GitHub Actions tests, exclusively for clang.
  • I was able to fix this by adding curand as a component in MODULE.bazel and adding @local_cuda//:curand explicitly to my cuda_library target, as shown in the sketch below. Can we add this to compiler_deps or similar, conditionally, when the compiler is clang?

I don't know if this problem is with our setup here (or even related to bazel/rules_cuda), but I get:

error: cannot specify -o when generating multiple output files

I get the same error even when using clang directly to compile the .cu file and passing --cuda-path with the collected files.

@udaya2899
Contributor

My finding on error: cannot specify -o when generating multiple output files is that by default clang sets the --cuda-compile-host-device flag, which compiles for both host and device and generates two .o files, while clang.bzl is only configured to expect one.

Is it enough to compile with the --cuda-device-only flag? If not, what's the alternative? Do we declare multiple .o outputs in clang.bzl and link against both?
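
As a quick experiment, the flag could be injected per target, assuming cuda_library forwards copts to the clang invocation (a sketch; the target is illustrative, and the proper fix probably belongs in clang.bzl itself):

cuda_library(
    name = "gpu_adder_device_only",  # illustrative
    srcs = ["gpu_adder.cu"],
    copts = ["--cuda-device-only"],  # clang flag: emit only the device-side object
)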

@udaya2899
Contributor

For collecting those component files in a single place to pass to clang as --cuda-path, I haven't gotten it to work reliably. It works in my local build but not on RBE.

What I did as a prototype is copy those files from the corresponding local_cuda_<component>/<component>/{bin, include, lib, nvvm} paths into a new folder (clang_cuda_toolkit) created directly inside @local_cuda.

def config_clang(repository_ctx, cuda, clang_path):
    """Generate `@local_cuda//toolchain/clang/BUILD`

    Args:
        repository_ctx: repository_ctx
        cuda: The struct returned from `detect_cuda_toolkit`
        clang_path: Path to clang executable returned from `detect_clang`
    """
    is_local_ctk = None

    if len(repository_ctx.attr.components_mapping) != 0:
        is_local_ctk = False

    # for deliverable ctk, clang needs the toolkit as cuda_path
    if not is_local_ctk:
        nvcc_repo = components_mapping_compat.repo_str(repository_ctx.attr.components_mapping["nvcc"])
        cudart_repo = components_mapping_compat.repo_str(repository_ctx.attr.components_mapping["cudart"])
        cccl_repo = components_mapping_compat.repo_str(repository_ctx.attr.components_mapping["cccl"])

        libpath = "lib"  # any special logic for linux/windows difference?
        generate_version_json(repository_ctx)

        clang_cuda_path = repository_ctx.path("clang_cuda_toolkit")
        repository_ctx.execute(["mkdir", "-p", "clang_cuda_toolkit"])  # non-hermetic mkdir call

        source_paths = [
            repository_ctx.path(Label(nvcc_repo + "//:nvcc/bin")),
            repository_ctx.path(Label(nvcc_repo + "//:nvcc/include")),
            repository_ctx.path(Label(cudart_repo + "//:cudart/include")),
            repository_ctx.path(Label(cccl_repo + "//:cccl/include")),
            repository_ctx.path(Label(nvcc_repo + "//:nvcc/" + libpath)),
            repository_ctx.path(Label(cudart_repo + "//:cudart/" + libpath)),
            repository_ctx.path(Label(cccl_repo + "//:cccl/" + libpath)),
            repository_ctx.path(Label(nvcc_repo + "//:nvcc/nvvm")),
        ]

        for source_path in source_paths:
            repository_ctx.execute(["cp", "-r", str(source_path), clang_cuda_path])  # non-hermetic cp call
    
    # Generate @local_cuda//toolchain/clang/BUILD
    template_helper.generate_toolchain_clang_build(repository_ctx, cuda, clang_path)

And in _detect_deliverable_cuda_toolkit, instead of returning the struct with path as None, I change it to the clang_cuda_toolkit path:

    cuda_path = str(Label("@local_cuda//:clang_cuda_toolkit"))
    return struct(
        path = cuda_path,  # gathered toolkit root (was None for scattered components)
        version_major = cuda_version_major,
        version_minor = cuda_version_minor,
        nvcc_version_major = nvcc_version_major,
        nvcc_version_minor = nvcc_version_minor,
        nvcc_label = nvcc,
        nvlink_label = nvlink,
        link_stub_label = link_stub,
        bin2c_label = bin2c,
        fatbinary_label = fatbinary,
    )

With this method, the @local_cuda//clang_cuda_toolkit folder has the necessary include, lib, bin, and nvvm paths, and bazel build of a cuda_library target works locally (with no CUDA installed on the machine) but not on our RBE, where it fails as if it never found the toolkit:

clang-cpp: error: cannot find libdevice for sm_70; provide path to different CUDA installation via '--cuda-path', or pass '-nocudalib' to build without linking with libdevice
clang-cpp: error: cannot find CUDA installation; provide its path via '--cuda-path', or pass '-nocudainc' to build without CUDA includes

Although I verified that inside the execution_root, the path it mentions does contain the collected CUDA toolkit files. Is there a better "bazel rule" way of doing this?
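
One more bazel-native direction worth trying (a sketch, unverified on RBE, reusing the source_paths list from config_clang above) is to replace the mkdir/cp executes with repository_ctx.symlink, which avoids shelling out:

for source_path in source_paths:
    for entry in source_path.readdir():
        # Mirror each file/directory into the synthetic root without external tools.
        repository_ctx.symlink(entry, "clang_cuda_toolkit/" + entry.basename)

If RBE still does not materialize the symlinked tree into the execution root, real copies may be unavoidable.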

Sorry for the multiple comments. I'm posting my findings as I progress through this.

@cloudhan
Collaborator Author

@jsharpe I'd like to move on to 0.3.x and merge the first two changes with a minor fix for endswith("~"). I think I can just drop the check, as we no longer rely on the presumed repo name format; we use the explicit mapping.
