Nightly built wheels from CI doesn't work on Navi 31 #2142

evshiron · 2023-06-29T05:01:18Z

Issue Type

Bug

Have you reproduced the bug with TF nightly?

Yes

Source

source

Tensorflow Version

tf_nightly_rocm-2.14.0dev20230628.550-cp310-cp310-manylinux2014_x86_64.whl

Custom Code

Yes

OS Platform and Distribution

Ubuntu 22.04.2

Tested in ROCm 5.5.1 host and ROCm 5.5.0 container.

Mobile device

No response

Python version

3.10

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

The sample script fails when using the tf-nightly-rocm wheel from CI, which should currently support Navi 31 out of the box.

The log from an earlier successful run can be found here: ROCm/ROCm#1880 (comment).

Standalone code to reproduce the issue

import tensorflow as tf
import numpy as np

features = np.random.randn(10000,25)
targets = np.random.randn(10000)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1)
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.MeanSquaredError())

model.fit(x=features, y=targets)

Relevant log output

2023-06-29 04:53:06.731832: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-06-29 04:53:06.755074: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-29 04:53:07.345334: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:809] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-29 04:53:07.352729: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:809] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-29 04:53:07.352755: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:809] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-29 04:53:07.353192: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:809] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-29 04:53:07.353228: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:809] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-29 04:53:07.353254: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:809] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-29 04:53:07.353322: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:809] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-29 04:53:07.353346: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:809] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-29 04:53:07.353374: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:809] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-06-29 04:53:07.353387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1816] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21420 MB memory:  -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:03:00.0
2023-06-29 04:53:07.479918: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
[SAME LOG TRIMMED]
2023-06-29 04:53:07.677340: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:30] 'hipModuleLoadData(&module, data)' failed with 'hipErrorNoBinaryForGpu'

2023-06-29 04:53:07.677348: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:30] 'hipModuleGetFunction(&function, module, kernel_name)' failed with 'hipErrorInvalidHandle'

2023-06-29 04:53:07.677353: W tensorflow/core/framework/op_kernel.cc:1816] INTERNAL: 'hipModuleLaunchKernel( function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<hipStream_t>(stream), params, nullptr)' failed with 'hipErrorInvalidHandle'
/usr/local/lib/python3.10/dist-packages/keras/src/initializers/initializers.py:120: UserWarning: The initializer GlorotUniform is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initializer instance more than once.
  warnings.warn(
2023-06-29 04:53:07.678085: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:30] 'hipModuleLoadData(&module, data)' failed with 'hipErrorNoBinaryForGpu'

2023-06-29 04:53:07.678093: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:30] 'hipModuleGetFunction(&function, module, kernel_name)' failed with 'hipErrorInvalidHandle'

2023-06-29 04:53:07.678098: W tensorflow/core/framework/op_kernel.cc:1816] INTERNAL: 'hipModuleLaunchKernel( function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<hipStream_t>(stream), params, nullptr)' failed with 'hipErrorInvalidHandle'
Traceback (most recent call last):
  File "/root/test.py", line 14, in <module>
    model.fit(x=features, y=targets)
  File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/polymorphic_function/autograph_util.py", line 52, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)
tensorflow.python.framework.errors_impl.InternalError: in user code:

    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1384, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1367, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1348, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1125, in train_step
        y_pred = self(x, training=True)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.10/dist-packages/keras/src/backend.py", line 2102, in random_uniform
        return tf.random.stateless_uniform(

    InternalError: Exception encountered when calling layer 'sequential' (type Sequential).
    
    {{function_node __wrapped__Sub_device_/job:localhost/replica:0/task:0/device:GPU:0}} 'hipModuleLaunchKernel( function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<hipStream_t>(stream), params, nullptr)' failed with 'hipErrorInvalidHandle' [Op:Sub] name: 
    
    Call arguments received by layer 'sequential' (type Sequential):
      • inputs=tf.Tensor(shape=(None, 25), dtype=float32)
      • training=True
      • mask=None
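
Reading the log above, the first HIP failure is hipModuleLoadData returning hipErrorNoBinaryForGpu; the hipErrorInvalidHandle errors that follow come from calls that use the module that never loaded. A minimal sketch for pulling the failures out of a long log in order, assuming the "'<call>' failed with '<error>'" format seen above (the helper name hip_failures is hypothetical, not part of TensorFlow or ROCm):

```python
import re

# Matches the "'<call>(...)' failed with '<error>'" pattern seen in the log.
_HIP_FAILURE = re.compile(r"'(hip\w+)\(.*?\)' failed with '(\w+)'")


def hip_failures(log_text):
    """Return (call, error) pairs in the order they appear in the log."""
    return [(m.group(1), m.group(2)) for m in _HIP_FAILURE.finditer(log_text)]


# Two lines copied from the log output above.
sample = (
    "2023-06-29 04:53:07.677340: W tensorflow/compiler/mlir/tools/kernel_gen/"
    "tf_gpu_runtime_wrappers.cc:30] 'hipModuleLoadData(&module, data)' "
    "failed with 'hipErrorNoBinaryForGpu'\n"
    "2023-06-29 04:53:07.677348: W tensorflow/compiler/mlir/tools/kernel_gen/"
    "tf_gpu_runtime_wrappers.cc:30] 'hipModuleGetFunction(&function, module, "
    "kernel_name)' failed with 'hipErrorInvalidHandle'\n"
)

failures = hip_failures(sample)
# The first entry is the one worth investigating; later ones are follow-on errors.
print(failures[0])
```

The first pair is usually the one to search for; here it points at a wheel that carries no kernel binary for the GPU's architecture.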

briansp2020 · 2023-08-12T21:35:40Z

Made some progress building TF for 7900XTX. #2191
I was able to run a simple example, but it leaves my computer in an unstable state, and eventually the GUI crashes.

evshiron · 2023-08-13T02:38:59Z

We have a thread for it, ROCm/ROCm#1880, and after #2101 TensorFlow should work to some extent. It's even already integrated in their CI. But in your PR there is a missing comma, so I don't know where it went.

briansp2020 · 2023-08-13T04:14:40Z

It looks like #2101 is already merged. It's weird that the comma is still missing in the main source.

With the comma added, my build is using the 7900XTX. I still see instability here and there. Hopefully, they will release the working code soon.

briansp2020 · 2023-09-10T17:27:50Z

Is anyone else having issues building TF from source? For me, the build usually fails 2 to 3 times, usually after consuming a lot of memory. However, if I run the build script again, it eventually completes and generates the whl file.

I'm now trying to build in a ROCm 5.7 environment and the problem seems to be much worse. I retried at least 10 times and it still fails, usually with the following error. If I retry, it compiles for a while and then crashes again.

INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /root/tensorflow-upstream/tensorflow/compiler/xla/stream_executor/rocm/BUILD:532:11: Compiling tensorflow/compiler/xla/stream_executor/rocm/rocm_helpers.cu.cc failed: undeclared inclusion(s) in rule '//tensorflow/compiler/xla/stream_executor/rocm:rocm_helpers':
this rule is missing dependency declarations for the following files included by 'tensorflow/compiler/xla/stream_executor/rocm/rocm_helpers.cu.cc':
'/opt/rocm/llvm/lib/clang/17.0.0/include/__clang_hip_runtime_wrapper.h'
'/opt/rocm/llvm/lib/clang/17.0.0/include/cuda_wrappers/cmath'
'/opt/rocm/llvm/lib/clang/17.0.0/include/stddef.h'
'/opt/rocm-5.7.0/include/hip/hip_version.h'
'/opt/rocm/llvm/lib/clang/17.0.0/include/__clang_hip_libdevice_declares.h'
'/opt/rocm/llvm/lib/clang/17.0.0/include/__clang_hip_math.h'
'/opt/rocm/llvm/lib/clang/17.0.0/include/cuda_wrappers/algorithm'
'/opt/rocm/llvm/lib/clang/17.0.0/include/cuda_wrappers/new'
'/opt/rocm/llvm/lib/clang/17.0.0/include/limits.h'
'/opt/rocm/llvm/lib/clang/17.0.0/include/stdint.h'
'/opt/rocm/llvm/lib/clang/17.0.0/include/__clang_hip_stdlib.h'
'/opt/rocm/llvm/lib/clang/17.0.0/include/__clang_cuda_math_forward_declares.h'
'/opt/rocm/llvm/lib/clang/17.0.0/include/__clang_hip_cmath.h'
'/opt/rocm/llvm/lib/clang/17.0.0/include/__clang_cuda_complex_builtins.h'
'/opt/rocm/llvm/lib/clang/17.0.0/include/cuda_wrappers/complex'
'/opt/rocm/llvm/lib/clang/17.0.0/include/__stddef_max_align_t.h'
'/opt/rocm/llvm/lib/clang/17.0.0/include/stdarg.h'
'/opt/rocm-5.7.0/include/hip/hip_runtime.h'
'/opt/rocm-5.7.0/include/hip/hip_common.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_runtime.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_common.h'
'/opt/rocm-5.7.0/include/hip/hip_runtime_api.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/host_defines.h'
'/opt/rocm-5.7.0/include/hip/driver_types.h'
'/opt/rocm-5.7.0/include/hip/texture_types.h'
'/opt/rocm-5.7.0/include/hip/channel_descriptor.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/amd_channel_descriptor.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_vector_types.h'
'/opt/rocm-5.7.0/include/hip/surface_types.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_runtime_pt_api.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/hip_ldg.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_atomic.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/amd_device_functions.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/math_fwd.h'
'/opt/rocm-5.7.0/include/hip/hip_vector_types.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/device_library_decls.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/amd_warp_functions.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_unsafe_atomics.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/amd_surface_functions.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/texture_fetch_functions.h'
'/opt/rocm-5.7.0/include/hip/hip_texture_types.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/ockl_image.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/texture_indirect_functions.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/amd_math_functions.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/hip_fp16_math_fwd.h'
'/opt/rocm-5.7.0/include/hip/library_types.h'
'/opt/rocm-5.7.0/include/hip/hip_bfloat16.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_bfloat16.h'
'/opt/rocm-5.7.0/include/hip/hip_fp16.h'
'/opt/rocm-5.7.0/include/hip/amd_detail/amd_hip_fp16.h'
clang++: warning: argument unused during compilation: '-fcuda-flush-denormals-to-zero' [-Wunused-command-line-argument]
Target //tensorflow/tools/pip_package:build_pip_package failed to build

i-chaochen · 2023-09-18T10:57:20Z

It seems the build couldn't find the ROCm files at all. I suspect it's related to the LLVM version, though I can't guarantee that.

If you want to try building from source, we recommend using the rocm/tensorflow-build image:
https://hub.docker.com/repository/docker/rocm/tensorflow-build

briansp2020 · 2023-09-18T13:51:57Z

To use ROCm 5.7, I changed ROCM_INSTALL_DIR in build_rocm_python3 to point to /opt/rocm. Making it point to /opt/rocm-5.7.0 explicitly seems to fix the issue. Not sure why, though, since /opt/rocm is a symlink to the exact same location. 🤷‍♂️
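
In the build error above, the HIP headers are reported under /opt/rocm-5.7.0 while the Clang headers are reported under /opt/rocm, so both spellings of the same directory are in play. Whether that mismatch is what the build trips over is an assumption; nothing in this thread confirms it. A minimal sketch for checking that two paths resolve to the same place and for seeing which reported paths fall outside a given prefix when compared as plain text (both helper names are hypothetical):

```python
import os
import tempfile


def same_location(path_a, path_b):
    """True if both paths resolve to the same real directory."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)


def outside_prefix(paths, prefix):
    """Paths that do not start with `prefix` when compared as plain text."""
    prefix = prefix.rstrip("/") + "/"
    return [p for p in paths if not p.startswith(prefix)]


# Throwaway symlink standing in for /opt/rocm -> /opt/rocm-5.7.0.
with tempfile.TemporaryDirectory() as root:
    real = os.path.join(root, "rocm-5.7.0")
    link = os.path.join(root, "rocm")
    os.mkdir(real)
    os.symlink(real, link)
    linked_ok = same_location(real, link)
print(linked_ok)

# Two of the paths listed in the build error.
reported = [
    "/opt/rocm/llvm/lib/clang/17.0.0/include/stddef.h",
    "/opt/rocm-5.7.0/include/hip/hip_version.h",
]
print(outside_prefix(reported, "/opt/rocm"))
```

A symlink makes the two directories identical on disk, but a tool that compares path strings would still treat them as different.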

i-chaochen · 2023-09-18T13:54:49Z

I am not sure what your environment is, but could you try it in rocm/tensorflow-build? Thanks

briansp2020 · 2023-09-18T16:53:47Z

@i-chaochen
Can you tell me how to build using rocm/tensorflow-build? I usually build using the build_rocm_python3 script in the tensorflow-upstream repo. It's not clear to me how to build for ROCm using rocm/tensorflow-build.

Thanks!

i-chaochen · 2023-09-18T18:43:34Z

If you have Docker installed, you can follow the instructions from our repo: https://github.com/ROCmSoftwarePlatform/tensorflow-upstream#tensorflow-rocm-port

$ alias drun='sudo docker run \
      -it \
      --network=host \
      --device=/dev/kfd \
      --device=/dev/dri \
      --ipc=host \
      --shm-size 16G \
      --group-add video \
      --cap-add=SYS_PTRACE \
      --security-opt seccomp=unconfined \
      -v $HOME/dockerx:/dockerx'

$ drun rocm/tensorflow-build:latest-python3.9-rocm5.6.0    # for example I am using rocm5.6.0

Within the launched container, you can git clone our TensorFlow repo, set the ROCm directory path in the build_rocm_python3 script to match your ROCm version, and build it.

$ git clone https://github.com/ROCmSoftwarePlatform/tensorflow-upstream.git
$ cd tensorflow-upstream/
$ ./build_rocm_python3

PS: Because I am using rocm5.6.0 as an example, the path already matches the one in this script. If you're using a rocm5.7.0 Docker container, you need to change it to rocm-5.7.0 as well, e.g., /opt/rocm-5.7.0 in https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/build_rocm_python3#L33
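
The path change described above can also be scripted. This is a rough sketch only: it assumes the script contains a plain assignment of the form ROCM_INSTALL_DIR=/opt/rocm-X.Y.Z, which this thread does not show, and the helper name set_rocm_install_dir and the excerpt are hypothetical.

```python
import re


def set_rocm_install_dir(script_text, new_dir):
    """Replace the value of every `ROCM_INSTALL_DIR=/opt/rocm...` assignment."""
    pattern = re.compile(r"^([ \t]*ROCM_INSTALL_DIR=)/opt/rocm\S*", re.MULTILINE)
    # A function replacement avoids backslash handling in the new path.
    return pattern.sub(lambda m: m.group(1) + new_dir, script_text)


# Hypothetical excerpt; the real script may phrase the assignment differently.
excerpt = (
    "ROCM_INSTALL_DIR=/opt/rocm-5.6.0\n"
    'echo "Building against $ROCM_INSTALL_DIR"\n'
)

updated = set_rocm_install_dir(excerpt, "/opt/rocm-5.7.0")
print(updated)
```

Checking the result against the actual file before building is still worthwhile, since the assignment in the real script may look different.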

johnnynunez · 2023-11-03T13:15:30Z

#!/bin/bash

set -ex

sudo apt-get update && sudo apt-get install -y patchelf openjdk-8-jdk openjdk-8-jre unzip wget git libstdc++-12-dev

# create and activate the venv first so the venv's pip is the one upgraded
python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade pip

export WORKDIR=$(pwd)
export PYTHON_BIN_PATH=$WORKDIR/venv/bin/python3
export PYTHON_LIB_PATH=$WORKDIR/venv/lib/python3.11/site-packages
export ROCM_PATH=/opt/rocm-5.7.0
export TF_NEED_ROCM=1
export GPU_DEVICE_TARGETS=gfx1100

# build tensorflow-rocm

if [ -d "tensorflow-upstream" ]; then
    echo "tensorflow-upstream folder exists. Skipping git clone."
else
    git clone --recursive https://github.com/ROCmSoftwarePlatform/tensorflow-upstream
fi
cd tensorflow-upstream

# install bazel in venv
curl -L https://github.com/bazelbuild/bazelisk/releases/download/v1.18.0/bazelisk-linux-amd64 -o $WORKDIR/venv/bin/bazel \
  && chmod +x $WORKDIR/venv/bin/bazel

bazel clean --expunge

# declare build targets
printf '%s\n' ${GPU_DEVICE_TARGETS} | sudo tee -a $ROCM_PATH/bin/target.lst

# https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/243d98f63f538787b438a15ec0e7cc2f5f9c2d10/tensorflow/tools/ci_build/Dockerfile.rocm#L108
sudo touch $ROCM_PATH/.info/version

pip install setuptools wheel numpy packaging requests

bash build_rocm_python3

cd ..

Second, modify the RESOURCE_OPTION line to fit your machine. In my case that is 32 GB of RAM, 16 cores, and 32 threads, so I changed

RESOURCE_OPTION="--local_ram_resources=60000 --local_cpu_resources=35 --jobs=70"

to

RESOURCE_OPTION="--local_ram_resources=28000 --local_cpu_resources=16 --jobs=32"
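
The values above look like they follow a simple rule: leave a few GB of RAM free, set the CPU resources to the core count, and set the job count to the thread count. That reading is an assumption drawn from this one example, and the 4000 MB headroom and the helper name resource_option are hypothetical. A sketch that builds the same string from machine specs:

```python
def resource_option(ram_mb, cores, threads, headroom_mb=4000):
    """Build a Bazel resource string from machine specs.

    headroom_mb is an assumed amount of RAM left for the rest of the system.
    """
    usable_ram = max(ram_mb - headroom_mb, 1)
    return (
        f"--local_ram_resources={usable_ram} "
        f"--local_cpu_resources={cores} "
        f"--jobs={threads}"
    )


# 32 GB (taken as 32000 MB), 16 cores, 32 threads, as in the comment above.
option = resource_option(32000, 16, 32)
print(option)
# prints: --local_ram_resources=28000 --local_cpu_resources=16 --jobs=32
```

Lowering the job count or the RAM figure further may help if the build still runs out of memory.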

evshiron mentioned this issue Aug 4, 2023

feat: add support for gfx1100 devices #2101

Merged
