From 8e7b6ab199c75ac8fb04637347f8e675e2b864a3 Mon Sep 17 00:00:00 2001
From: myhloli
Date: Sun, 12 Jan 2025 03:57:08 +0800
Subject: [PATCH] docs(faq): add troubleshooting guide for old GPUs
 encountering CUDA errors

Added a new section to both the English and Chinese FAQs addressing the
issue where old GPUs such as the M40 encounter a RuntimeError due to
unsupported BF16 precision. The guide includes steps to manually disable
BF16 precision by modifying the relevant code in
"pdf_parse_union_core_v2.py".
---
 docs/FAQ_en_us.md | 20 ++++++++++++++++++++
 docs/FAQ_zh_cn.md | 23 ++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/docs/FAQ_en_us.md b/docs/FAQ_en_us.md
index f62a7849..053145f4 100644
--- a/docs/FAQ_en_us.md
+++ b/docs/FAQ_en_us.md
@@ -73,3 +73,23 @@ pip install -U magic-pdf[full,old_linux] --extra-index-url https://wheels.myhlol
 ```
 
 Reference: https://github.com/opendatalab/MinerU/issues/1004
+
+### 9. Old Graphics Cards Such as the M40 Encounter "RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED"
+
+The following error occurs at runtime (CUDA):
+```
+RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
+```
+BF16 precision is not supported on graphics cards older than the Turing architecture, and some of these cards are not correctly recognized by torch, so BF16 precision must be disabled manually.
+Modify lines 287-290 of `pdf_parse_union_core_v2.py` (note that the exact location may vary between versions):
+```python
+if torch.cuda.is_bf16_supported():
+    supports_bfloat16 = True
+else:
+    supports_bfloat16 = False
+```
+Change it to:
+```python
+supports_bfloat16 = False
+```
+Reference: https://github.com/opendatalab/MinerU/issues/1508
\ No newline at end of file

diff --git a/docs/FAQ_zh_cn.md b/docs/FAQ_zh_cn.md
index d3616d6a..795dd1b5 100644
--- a/docs/FAQ_zh_cn.md
+++ b/docs/FAQ_zh_cn.md
@@ -57,7 +57,6 @@ CUDA 11 has poor compatibility with newer graphics cards; the CUDA version used by Paddle needs to be upgraded
 ```bash
 pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
 ```
-
 Reference: https://github.com/opendatalab/MinerU/issues/558
 
 ### 7. On some Linux servers, the program fails on startup with `非法指令 (核心已转储)` or `Illegal instruction (core dumped)`
@@ -74,3 +73,25 @@ pip install -U magic-pdf[full,old_linux] --extra-index-url https://wheels.myhlol
 ```
 
 Reference: https://github.com/opendatalab/MinerU/issues/1004
+
+### 9. Old graphics cards such as the M40 encounter "RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED"
+
+The following error occurs at runtime (using CUDA):
+```
+RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
+```
+Graphics cards older than the Turing architecture do not support BF16 precision, and some cards are not correctly recognized by PyTorch, so BF16 precision must be disabled manually.
+
+Locate and modify lines 287-290 of `pdf_parse_union_core_v2.py` (note: the location may differ between versions). The original code is:
+```python
+if torch.cuda.is_bf16_supported():
+    supports_bfloat16 = True
+else:
+    supports_bfloat16 = False
+```
+Change it to:
+```python
+supports_bfloat16 = False
+```
+
+Reference: https://github.com/opendatalab/MinerU/issues/1508
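
A quick way to check whether this workaround applies to a given machine: the diagnostic sketch below (not part of the patch; it assumes only a PyTorch build with CUDA) prints the GPU's compute capability and what torch reports for BF16. Native BF16 kernels generally require compute capability 8.0 (Ampere) or newer, while the M40 reports 5.2, so cards below 8.0 are candidates for the `supports_bfloat16 = False` override.

```python
# bf16_check.py: diagnostic sketch, not part of the patch above.
# Prints the GPU's compute capability and torch's BF16 report so you
# can decide whether the supports_bfloat16 = False edit is needed.
import torch

if not torch.cuda.is_available():
    print("CUDA is not available; this FAQ entry does not apply.")
else:
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU: {torch.cuda.get_device_name(0)} (compute capability {major}.{minor})")
    print(f"torch.cuda.is_bf16_supported() -> {torch.cuda.is_bf16_supported()}")
    # Native BF16 kernels generally require compute capability >= 8.0
    # (Ampere). As the FAQ notes, torch may still misreport support on
    # some older cards, hence the manual override in the patch.
    if (major, minor) < (8, 0):
        print("Pre-Ampere GPU detected: apply the supports_bfloat16 = False edit.")
```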
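
Where editing the installed file is inconvenient, the same effect can in principle be obtained from a small wrapper, since the guard shown in the diff calls `torch.cuda.is_bf16_supported()` at runtime. The sketch below is a hypothetical, untested alternative that only helps when magic-pdf is driven from Python in the same process; the in-file edit described above remains the documented fix.

```python
# force_no_bf16.py: hypothetical alternative to editing
# pdf_parse_union_core_v2.py, shown only as an illustration.
import torch

# Make the probe used by the guard always report "no BF16 support".
# Accept any arguments, since the signature of is_bf16_supported()
# differs between torch versions.
torch.cuda.is_bf16_supported = lambda *args, **kwargs: False

# Import and run magic-pdf's Python API after this point in the same
# process; the guard will then set supports_bfloat16 = False.
```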