Vertex AI CUDA Toolkit CTK mismatch for pip packages #516

Open
jacobtomlinson opened this issue Feb 10, 2025 · 0 comments
Labels
bug Something isn't working cloud/gcp Google Cloud

jacobtomlinson commented Feb 10, 2025

At the time of writing, there is a mismatch between the CUDA version supported by the NVIDIA driver, the system CUDA Toolkit (CTK), and the CUDA packages in the default Python environment on Vertex AI.

Driver CUDA: 12.4
CUDA Toolkit: 11.8
Python packages: cupy-cuda12x

$ nvidia-smi
Mon Feb 10 14:51:00 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       On  |   00000000:00:04.0 Off |                    0 |
| N/A   41C    P8              9W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
$ ls -ld /usr/local/cuda*
lrwxrwxrwx  1 root root   21 Dec 25 04:27 /usr/local/cuda -> /usr/local/cuda-11.8/
drwxr-xr-x 17 root root 4096 Dec 25 04:28 /usr/local/cuda-11.8
$ pip freeze | grep cuda
cupy-cuda12x==13.3.0
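
The three checks above can be folded into a small script. This is only a sketch: the helper names (`driver_cuda_major`, `ctk_major`, `cupy_cuda_major`, `report_mismatch`) are made up for illustration, and it assumes the toolkit directory follows the `cuda-<major>.<minor>` naming seen above.

```shell
# Read nvidia-smi output on stdin and print the major CUDA version it reports.
driver_cuda_major() {
  sed -n 's/.*CUDA Version: \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Print the major version from a toolkit path such as /usr/local/cuda-11.8/.
ctk_major() {
  printf '%s\n' "$1" | sed -n 's|.*cuda-\([0-9][0-9]*\)\..*|\1|p'
}

# Read `pip freeze` output on stdin and print the CUDA major of the CuPy wheel.
cupy_cuda_major() {
  sed -n 's/^cupy-cuda\([0-9][0-9]*\)x.*/\1/p' | head -n 1
}

# Compare the system toolkit with the CuPy wheel; the driver is shown for context.
report_mismatch() {
  if [ "$2" = "$3" ]; then
    echo "consistent: driver=$1 toolkit=$2 cupy=$3"
  else
    echo "mismatch: driver=$1 toolkit=$2 cupy=$3"
  fi
}

# Example usage on a GPU instance (commented out so the helpers can be sourced safely):
# report_mismatch \
#   "$(nvidia-smi | driver_cuda_major)" \
#   "$(ctk_major "$(readlink -f /usr/local/cuda)")" \
#   "$(pip freeze | cupy_cuda_major)"
```

With the output shown in this issue, the toolkit major (11) and the CuPy wheel major (12) differ, so the script would report a mismatch.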

Challenges

Because the system CTK is 11.8, it's not possible to use pip to install packages like cudf-cu12 into the default environment, as they raise errors when looking for CTK 12. However, installing cudf-cu11 instead brings in cupy-cuda11x, which then conflicts with the preinstalled cupy-cuda12x.
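
To spot the second problem after an install, something like the following could be used. It is a rough sketch with hypothetical helper names (`cupy_variants`, `check_single_cupy`) that simply counts how many distinct `cupy-cudaNNx` wheels appear in `pip freeze` output.

```shell
# Read `pip freeze` output on stdin and print each distinct CuPy wheel variant.
cupy_variants() {
  { grep -o '^cupy-cuda[0-9]*x' || true; } | sort -u
}

# Read `pip freeze` output on stdin; fail if more than one CuPy variant is present.
check_single_cupy() {
  count=$(( $(cupy_variants | wc -l) ))
  if [ "$count" -gt 1 ]; then
    echo "conflict: $count CuPy variants installed"
    return 1
  fi
  echo "ok"
}

# Example usage (commented out so the helpers can be sourced safely):
# pip freeze | check_single_cupy
```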

Workarounds

We can work around this by creating a new conda environment and registering it as an ipykernel for Jupyter to use.

There are two upsides to this:

  1. There are no conflicts with existing packages or CTK versions.
  2. Environments managed by conda can bring in their own CTK packages, which means we can use CUDA 12 because there is already a new enough driver.
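
A minimal sketch of that workaround is below. The environment name, channel list, package set, and the `cuda-version=12.4` pin are assumptions chosen for illustration and are not taken from this issue; adjust them to whatever the RAPIDS install guide currently recommends. The `run` and `setup_kernel` helpers are likewise hypothetical.

```shell
# ENV_NAME is an illustrative default, not something defined by Vertex AI.
ENV_NAME="${ENV_NAME:-rapids-cu12}"

# Echo the command instead of executing it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_kernel() {
  # Create an isolated environment; cuda-version selects the CTK that conda
  # installs inside it, independent of /usr/local/cuda (assumed package set).
  run conda create -y -n "$ENV_NAME" -c rapidsai -c conda-forge -c nvidia \
    cudf "cuda-version=12.4" python ipykernel
  # Register the environment as a kernel so Jupyter can select it.
  run conda run -n "$ENV_NAME" python -m ipykernel install --user \
    --name "$ENV_NAME" --display-name "Python ($ENV_NAME)"
}

# Example usage:
#   DRY_RUN=1 setup_kernel   # print the commands only
#   setup_kernel             # create the environment and register the kernel
```

After the kernel is registered it should appear in the Jupyter kernel picker under the display name given above, and notebooks using it pick up the conda-provided CTK rather than the system 11.8 install.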
@jacobtomlinson jacobtomlinson added bug Something isn't working cloud/gcp Google Cloud labels Feb 10, 2025
@jacobtomlinson jacobtomlinson marked this as a duplicate of #517 Feb 13, 2025