bug: Torch.dynamo is not working on H100 due to obsolete triton & pytorch #330

Open
1 of 2 tasks
Artyom17 opened this issue Jul 25, 2023 · 0 comments

Description

Torch.dynamo is not working on H100 due to obsolete triton & pytorch

Steps to reproduce

Easily reproducible on H100 by running 'pytest -k benchmark'

Expected Behavior

Works.

Actual Behavior

Doesn't work. The issue is in the old Triton (v2.0.0), which does not know anything about H100 (sm_90).
Getting the following errors:

  NVIDIA H100 PCIe with CUDA capability sm_90 is not compatible with the current PyTorch installation.
  The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.
  If you want to use the NVIDIA H100 PCIe GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

This one can be solved by installing a newer Torch (2.0.1+cu118) from the suggested URL.
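
To confirm the mismatch on a given machine, the device capability can be compared against the architectures the installed build was compiled for. This is a minimal sketch, not PyTorch's own check: the helper names `is_capability_covered` and `report` are made up for illustration, and the same-major-version matching rule is an assumption about how compatibility is decided. `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are the real calls that supply the inputs.

```python
import re

# Architectures copied from the error message in this issue.
ARCHS_FROM_ERROR = "sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86".split()


def is_capability_covered(capability, arch_list):
    """Return True if some compiled sm_XY shares the device's major version.

    Assumption: same-major matching approximates the compatibility rule.
    """
    major, _minor = capability
    compiled = []
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)[a-z]?", arch)
        if match:  # skip PTX-only entries such as "compute_80"
            compiled.append(int(match.group(1)))
    return any(sm // 10 == major for sm in compiled)


def report():
    """Print the local device capability next to the build's arch list."""
    try:
        import torch
    except ImportError:
        print("torch is not installed")
        return
    if not torch.cuda.is_available():
        print("no CUDA device visible")
        return
    capability = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print(capability, archs, is_capability_covered(capability, archs))


if __name__ == "__main__":
    report()
```

With the architecture list from the error above, an H100 capability of `(9, 0)` is not covered, which matches the warning.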

The second one is a Triton issue:

E       RuntimeError: CUDA error: no kernel image is available for execution on the device
E       CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

Triton v2.0.0 has a limitation: it only supports compute capabilities below sm_90 (sm_90 itself is not included). I could not easily install a newer Triton, since the install complains about it being incompatible. However, I was able to hack Triton: I got the source locally, synced to the v2.0.0 tag, and reverted commit d54c04ab. I am not sure it is using all SMs correctly on H100 after this surgery, though.
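
The workaround above can be sketched as a short script. This is only an illustration under assumptions: the repository URL and the working directory name are not given in the issue, and `triton_patch_commands` and `apply_triton_patch` are hypothetical helper names. Building and installing the patched checkout is not shown, since the issue does not describe that step.

```python
import subprocess

# Assumption: upstream repository URL; the issue does not name it.
TRITON_REPO = "https://github.com/openai/triton.git"


def triton_patch_commands(workdir="triton-src", tag="v2.0.0", commit="d54c04ab"):
    """Build the git commands for: clone, sync to the tag, revert the commit."""
    return [
        ["git", "clone", TRITON_REPO, workdir],
        ["git", "-C", workdir, "checkout", tag],
        ["git", "-C", workdir, "revert", "--no-edit", commit],
    ]


def apply_triton_patch(dry_run=True, **kwargs):
    """Print the commands, and run them only when dry_run is False."""
    for cmd in triton_patch_commands(**kwargs):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: nothing is cloned or modified.
    apply_triton_patch(dry_run=True)
```

The dry-run default keeps the script safe to run as-is; pass `dry_run=False` to execute the commands.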

Your environment

Using Docker:

DOCKER_BUILDKIT=1 docker build -t kernl .
docker run --rm -it --gpus all -v $(pwd):/kernl kernl

Also tried the more recent NVIDIA Docker image (12.2.0-devel-ubuntu22.04), with the same result.

Packages:

Package                   Version       Editable project location
------------------------- ------------- -------------------------
aiohttp                   3.8.5
aiosignal                 1.3.1
anyio                     3.7.1
appdirs                   1.4.4
argon2-cffi               21.3.0
argon2-cffi-bindings      21.2.0
arrow                     1.2.3
asttokens                 2.2.1
async-lru                 2.0.3
async-timeout             4.0.2
attrs                     23.1.0
audioread                 3.0.0
Babel                     2.12.1
backcall                  0.2.0
beautifulsoup4            4.12.2
black                     23.7.0
bleach                    6.0.0
blinker                   1.4
certifi                   2023.7.22
cffi                      1.15.1
charset-normalizer        3.2.0
click                     8.1.6
cmake                     3.27.0
comm                      0.1.3
cryptography              3.4.8
datasets                  2.14.0
dbus-python               1.2.18
debugpy                   1.6.7
decorator                 5.1.1
defusedxml                0.7.1
dill                      0.3.7
distro                    1.7.0
distro-info               1.1build1
exceptiongroup            1.1.2
executing                 1.2.0
fastjsonschema            2.18.0
filelock                  3.12.2
flake8                    6.0.0
fqdn                      1.5.1
frozenlist                1.4.0
fsspec                    2023.6.0
httplib2                  0.20.2
huggingface-hub           0.16.4
idna                      3.4
importlib-metadata        6.8.0
iniconfig                 2.0.0
ipykernel                 6.25.0
ipython                   8.14.0
ipython-genutils          0.2.0
ipywidgets                8.0.7
isoduration               20.11.0
isort                     5.12.0
jedi                      0.18.2
jeepney                   0.7.1
Jinja2                    3.1.2
joblib                    1.3.1
json5                     0.9.14
jsonpointer               2.4
jsonschema                4.18.4
jsonschema-specifications 2023.7.1
jupyter                   1.0.0
jupyter_client            8.3.0
jupyter-console           6.6.3
jupyter_core              5.3.1
jupyter-events            0.6.3
jupyter-lsp               2.2.0
jupyter_server            2.7.0
jupyter_server_terminals  0.4.4
jupyterlab                4.0.3
jupyterlab-pygments       0.2.2
jupyterlab_server         2.24.0
jupyterlab-widgets        3.0.8
kernl                     0.2.2         /kernl/src
keyring                   23.5.0
launchpadlib              1.10.16
lazr.restfulclient        0.14.4
lazr.uri                  1.0.6
lazy_loader               0.3
librosa                   0.10.0.post2
lit                       16.0.6
llvmlite                  0.40.1
MarkupSafe                2.1.3
matplotlib-inline         0.1.6
mccabe                    0.7.0
mistune                   3.0.1
more-itertools            8.10.0
mpmath                    1.3.0
msgpack                   1.0.5
multidict                 6.0.4
multiprocess              0.70.15
mypy-extensions           1.0.0
nbclient                  0.8.0
nbconvert                 7.7.3
nbformat                  5.9.1
nest-asyncio              1.5.6
networkx                  3.1
notebook                  7.0.0
notebook_shim             0.2.3
numba                     0.57.1
numpy                     1.24.4
nvidia-cublas-cu11        11.10.3.66
nvidia-cuda-cupti-cu11    11.7.101
nvidia-cuda-nvrtc-cu11    11.7.99
nvidia-cuda-runtime-cu11  11.7.99
nvidia-cudnn-cu11         8.5.0.96
nvidia-cufft-cu11         10.9.0.58
nvidia-curand-cu11        10.2.10.91
nvidia-cusolver-cu11      11.4.0.1
nvidia-cusparse-cu11      11.7.4.91
nvidia-nccl-cu11          2.14.3
nvidia-nvtx-cu11          11.7.91
oauthlib                  3.2.0
overrides                 7.3.1
packaging                 23.1
pandas                    2.0.3
pandocfilters             1.5.0
parso                     0.8.3
pathspec                  0.11.1
pexpect                   4.8.0
pickleshare               0.7.5
pip                       23.2.1
platformdirs              3.9.1
pluggy                    1.2.0
pooch                     1.6.0
prometheus-client         0.17.1
prompt-toolkit            3.0.39
psutil                    5.9.5
ptyprocess                0.7.0
pure-eval                 0.2.2
pyarrow                   12.0.1
pycodestyle               2.10.0
pycparser                 2.21
pyflakes                  3.0.1
Pygments                  2.15.1
PyGObject                 3.42.1
PyJWT                     2.3.0
pyparsing                 2.4.7
pytest                    7.4.0
python-apt                2.4.0+ubuntu1
python-dateutil           2.8.2
python-json-logger        2.0.7
pytz                      2023.3
PyYAML                    6.0.1
pyzmq                     25.1.0
qtconsole                 5.4.3
QtPy                      2.3.1
referencing               0.30.0
regex                     2023.6.3
requests                  2.31.0
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rpds-py                   0.9.2
safetensors               0.3.1
scikit-learn              1.3.0
scipy                     1.11.1
SecretStorage             3.3.1
Send2Trash                1.8.2
setuptools                58.1.0
six                       1.16.0
sniffio                   1.3.0
soundfile                 0.12.1
soupsieve                 2.4.1
soxr                      0.3.5
stack-data                0.6.2
sympy                     1.12
tabulate                  0.9.0
termcolor                 2.3.0
terminado                 0.17.1
threadpoolctl             3.2.0
tinycss2                  1.2.1
tokenize-rt               5.1.0
tokenizers                0.13.3
tomli                     2.0.1
torch                     2.0.0
tornado                   6.3.2
tqdm                      4.65.0
traitlets                 5.9.0
transformers              4.31.0
triton                    2.0.0
typing_extensions         4.7.1
tzdata                    2023.3
unattended-upgrades       0.1
uri-template              1.3.0
urllib3                   2.0.4
wadllib                   1.3.6
wcwidth                   0.2.6
webcolors                 1.13
webencodings              0.5.1
websocket-client          1.6.1
wheel                     0.41.0
widgetsnbextension        4.0.8
xxhash                    3.2.0
yarl                      1.9.2
zipp                      1.0.0
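
For anyone comparing environments, the versions that matter here (torch 2.0.0 and triton 2.0.0 in the table above) can be read with the standard library rather than scanning the full list. A minimal sketch; `installed_versions` is a hypothetical helper name, and the package names passed to it are the ones from the table.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, version in installed_versions(["torch", "triton", "kernl"]).items():
        print(f"{name:<10} {version or 'not installed'}")
```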

Self-service

  • I would be willing to help fix this bug myself.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@Artyom17 changed the title from "bug:" to "bug: Torch.dynamo is not working on H100 due to obsolete triton & pytorch" on Jul 25, 2023