
the supported AMDGPU versions are gfx1030gfx1100, may be lost a ',' between the devices "gfx1030,gfx1100" #2524

Open
gitleibin opened this issue May 4, 2024 · 5 comments

Comments

@gitleibin

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

v2.14.0-4248-g3448956e87e 2.14.0.600

Custom code

Yes

OS platform and distribution

No response

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

2024-05-04 09:45:04.334204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2266] Ignoring visible gpu device (device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0) with AMDGPU version : gfx1100. The supported AMDGPU versions are gfx1030gfx1100, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942.

Standalone code to reproduce the issue

The supported AMDGPU versions are gfx1030gfx1100,

Relevant log output

>>> import os
>>> from tensorflow.python.client import device_lib
>>> os.environ["TF_CPP_MIN_LOG_LEVEL"]="99"
>>> 
>>> if __name__=="__main__":
...     print(device_lib.list_local_devices())
... 
2024-05-04 09:45:04.333922: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:756] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-04 09:45:04.334007: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:756] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-04 09:45:04.334103: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:756] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-04 09:45:04.334144: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:756] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-04 09:45:04.334184: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:756] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-04 09:45:04.334204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2266] Ignoring visible gpu device (device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0) with AMDGPU version : gfx1100. The supported AMDGPU versions are gfx1030gfx1100, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942.
2024-05-04 09:45:04.334239: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:756] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-05-04 09:45:04.334253: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2266] Ignoring visible gpu device (device: 1, name: AMD Radeon Graphics, pci bus id: 0000:12:00.0) with AMDGPU version : gfx1036. The supported AMDGPU versions are gfx1030gfx1100, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942.
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 14288687369984854945
xla_global_id: -1
]
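The garbled "gfx1030gfx1100" in the log is consistent with the version list being joined without a separator between the first two entries. A minimal Python sketch of that failure mode (illustrative only; the actual bug is in TensorFlow's C++ gpu_device.cc, and these function names are hypothetical):

```python
# Hypothetical sketch of the suspected bug: building the "supported
# versions" message drops the separator between the first two entries.
supported = ["gfx1030", "gfx1100", "gfx900", "gfx906", "gfx908",
             "gfx90a", "gfx940", "gfx941", "gfx942"]

def broken_join(versions):
    # Bug: the first two entries are concatenated directly...
    msg = versions[0] + versions[1]
    # ...and only the remaining entries get a ", " separator.
    for v in versions[2:]:
        msg += ", " + v
    return msg

def fixed_join(versions):
    # Correct: separate every entry with ", ".
    return ", ".join(versions)

print(broken_join(supported))
# gfx1030gfx1100, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942
print(fixed_join(supported))
# gfx1030, gfx1100, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942
```

The broken output matches the log line above, which is why the report suggests a ',' was lost between "gfx1030" and "gfx1100".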
@briansp2020

This is already fixed in the code, but they seem to take forever to release the updated binary. If you are OK with building from source, it should work; this script should give you some idea of how to compile it. Also, the latest Docker image has the fix, so if you are OK with using a Docker container, try the ROCm 6.1 images from https://hub.docker.com/r/rocm/tensorflow/tags

@JMaravalhasSilva

This has been an issue for many months now... See #2410. If what @briansp2020 has said about the Docker image being fixed is accurate, it's quite baffling that they didn't bother to update the package on pypi...

Still, if you do not want to use the Docker image, there is an alternative to compiling TensorFlow yourself: you can download nightly wheels from http://ml-ci.amd.com:21096/job/tensorflow/job/release-rocmfork-r214-rocm-enhanced/job/release-build-whl/. This was mentioned by jayfurmanek in #2410, and it worked quite well for me.

@vivaaprimavera

This issue is still present in ROCm 6.1.2:

2024-07-21 22:16:52.680470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2266] Ignoring visible gpu device (device: 0, name: AMD Radeon RX 6600, pci bus id: 0000:03:00.0) with AMDGPU version : gfx1030. The supported AMDGPU versions are gfx1030gfx1100, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942.

TensorFlow ignores the gfx1030 GPU.

tensorflow_rocm-2.14.0.600


Agent 2
  Name:            gfx1030
  Uuid:            GPU-XX
  Marketing Name:  AMD Radeon RX 6600
  Vendor Name:     AMD
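For reference, the gfx name can be pulled out of `rocminfo`-style agent output with a few lines of plain Python (an illustrative parsing sketch, not an official tool; the sample text is the agent block quoted above):

```python
import re

# Sample of the rocminfo agent block quoted above.
rocminfo_output = """\
Agent 2
  Name:            gfx1030
  Uuid:            GPU-XX
  Marketing Name:  AMD Radeon RX 6600
  Vendor Name:     AMD
"""

def gfx_names(text):
    # Collect every "Name: gfxNNNN" field; "Marketing Name" and
    # "Vendor Name" lines do not match because the line must start
    # (after whitespace) with "Name:".
    return re.findall(r"^\s*Name:\s*(gfx\w+)", text, flags=re.MULTILINE)

print(gfx_names(rocminfo_output))  # → ['gfx1030']
```

Comparing this list against the versions in TensorFlow's log message shows whether the card should have been accepted.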

@Eskander

Eskander commented Sep 7, 2024

I think they may have given up on the pypi package, but instructions on this repo were not updated and the change was poorly communicated (no surprises here). According to https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/3rd-party/tensorflow-install.html:

As of ROCm 6.1, tensorflow-rocm packages are found at https://repo.radeon.com/rocm/manylinux. Prior to ROCm 6.1, packages were found at https://pypi.org/project/tensorflow-rocm.

@JMaravalhasSilva

JMaravalhasSilva commented Sep 8, 2024

I confirm what @Eskander is saying. They dropped the pypi package, and the pypi page has no mention of that. However, I would currently advise against installing tensorflow-rocm on your system - not because of TensorFlow per se, but because of ROCm itself (assuming you are installing ROCm on your system).

ROCm currently breaks my system. I'm on a fresh install of Ubuntu 24.04.1 with the iGPU disabled in the BIOS (I have a Ryzen 7950X). I attempted to install via amdgpu with DKMS, then uninstalled and reinstalled with --no-dkms, with no luck either way. GNOME implodes the moment you reach the login screen - I could only log in, watch a bunch of glitching, open a terminal, and uninstall.

Additionally, if you actually check their repos, there is currently no TensorFlow build for Python 3.12, which Ubuntu 24.04 now ships by default... So even if you could get ROCm working, tensorflow-rocm for Ubuntu LTS is currently broken, despite AMD claiming support for it...
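A quick sanity check before hunting for a wheel (plain Python, illustrative only; the supported-version set below is an assumption drawn from this thread, where no Python 3.12 build exists yet, not from official AMD documentation):

```python
import sys

# Assumption from this thread: the nightly tensorflow-rocm wheels are
# built for Python 3.9-3.11, with no 3.12 build available.
supported = {(3, 9), (3, 10), (3, 11)}

def wheel_available(version_info=sys.version_info):
    # Compare the interpreter's (major, minor) against the wheel matrix.
    return version_info[:2] in supported

if wheel_available():
    print("Interpreter version looks compatible with the wheels.")
else:
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} "
          "has no matching tensorflow-rocm wheel in this repo.")
```

On Ubuntu 24.04's default Python 3.12 this reports no matching wheel, which is the situation described above.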

Anyways, the Docker version seems to work perfectly fine with my 7900 XTX, so I believe this particular issue has been solved and can now be closed.

Lastly, if you are running Fedora, I hear they are now shipping with ROCm 6 installed by default. You'll still have the Python version issue, but maybe you'll have better luck there.
