
[XLA-HLO:GPU] Type errors on BERT_LARGE_FP16_JAX_* models #117

Open

pzread opened this issue Aug 4, 2023 · 0 comments
Labels
bug Something isn't working

pzread commented Aug 4, 2023

docker run --gpus all --mount="type=bind,src="${PWD}",target=/work" --workdir="/work" "gcr.io/iree-oss/openxla-benchmark/cuda11.8-cudnn8.9@sha256:c39107c4160e749b7c4bac18862c6c1b6d56e1aa60644a4fe323e315ffba0a0b" /work/xla-tools-dir/hlo_runner_main --hlo_file=/work/xla_hlo_before_optimizations.txt --device_type=gpu --num_repeats=50 --input_format=text --num_replicas=1 --num_partitions=1 --logtostderr
2023-08-04 19:15:21.721351: I xla/service/service.cc:168] XLA service 0x5640370dddd0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-08-04 19:15:21.721415: I xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA A100-SXM4-40GB, Compute Capability 8.0
2023-08-04 19:15:21.721767: I xla/pjrt/gpu/se_gpu_pjrt_client.cc:633] Using BFC allocator.
2023-08-04 19:15:21.721826: I xla/pjrt/gpu/gpu_helpers.cc:105] XLA backend allocating 31753961472 bytes on device 0 for BFCAllocator.
2023-08-04 19:15:31.158463: I xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8900
2023-08-04 19:15:34.067278: I xla/stream_executor/gpu/asm_compiler.cc:328] ptxas warning : Registers are spilled to local memory in function 'triton_gemm_dot_295', 996 bytes spill stores, 1108 bytes spill loads

2023-08-04 19:15:36.668819: W xla/service/gpu/runtime/support.cc:58] Intercepted XLA runtime error:
INTERNAL: Unexpected GEMM dtype: f32 f32 f16
2023-08-04 19:15:36.699421: F xla/tools/multihost_hlo_runner/hlo_runner_main.cc:121] Non-OK-status: xla::FunctionalHloRunner::LoadAndRunAndDump( *client.value(), preproc_options, raw_compile_options, running_options, {hlo_file}, input_format, dump_output_literal_to, task_id) status: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.gemm' failed: Unexpected GEMM dtype: f32 f32 f16; current tracing scope: custom-call; current profiling annotation: XlaModule:#hlo_module=extracted,program_id=131#.
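
For context, `xla.gpu.gemm` is XLA:GPU's runtime hook for cuBLAS GEMM calls, and the dtype triple in the message appears to read as lhs/rhs/output types. As a hedged guess at the kind of HLO pattern involved (not extracted from the failing module), a BERT-style FP16 matmul with f32 accumulation can end up as an f32 x f32 -> f16 GEMM once XLA folds the output cast into the dot:

```python
import jax
import jax.numpy as jnp

# Hypothetical reduced example, an assumption rather than the actual failing
# computation: f16 operands are upcast to f32 for the matmul, with the result
# narrowed back to f16. If the output convert is fused into the GEMM call,
# the runtime would see the "f32 f32 f16" dtype combination from the error.
lhs = jnp.ones((384, 1024), dtype=jnp.float16)
rhs = jnp.ones((1024, 1024), dtype=jnp.float16)

@jax.jit
def matmul_f32_accum(a, b):
    # Accumulate in f32, then cast the product back to f16.
    return (a.astype(jnp.float32) @ b.astype(jnp.float32)).astype(jnp.float16)

print(matmul_f32_accum(lhs, rhs).dtype)  # float16
```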

Reproduce:

wget -O xla_hlo_before_optimizations.txt https://storage.googleapis.com/iree-model-artifacts/jax/jax_models_0.4.13_1688607404/BERT_LARGE_FP16_JAX_384XI32_BATCH1/xla_hlo_before_optimizations.txt

docker run --gpus all --mount="type=bind,src=${PWD},target=/work" --workdir="/work" "gcr.io/iree-oss/openxla-benchmark/cuda11.8-cudnn8.9@sha256:c39107c4160e749b7c4bac18862c6c1b6d56e1aa60644a4fe323e315ffba0a0b" /work/xla-tools-dir/hlo_runner_main --hlo_file=/work/xla_hlo_before_optimizations.txt --device_type=gpu --num_repeats=50 --input_format=text --num_replicas=1 --num_partitions=1 --logtostderr
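
Note: the command assumes ${PWD}/xla-tools-dir on the host already contains an hlo_runner_main binary (built from xla/tools/multihost_hlo_runner in openxla/xla, or taken from a prebuilt tools archive), since the bind mount is what makes it visible at /work inside the container.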
pzread added the bug label on Aug 8, 2023