Strange output when using the return_word_box feature #14430

huameinan219 · 2024-12-20T01:42:22Z

🔎 Search before asking

I have searched the PaddleOCR Docs and found no similar bug report.
I have searched the PaddleOCR Issues and found no similar bug report.
I have searched the PaddleOCR Discussions and found no similar bug report.

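🐛 Bug (Problem description)

I installed the paddleocr package directly and called PaddleOCR(lang="ch", return_word_box=True). The output looks like this:
[[[26.0, 37.0], [304.0, 37.0], [304.0, 73.0], [26.0, 73.0]], ('纯臻营养护发素', 0.9946897625923157, [46.085826210826205, [['纯', '臻', '营', '养', '护', '发', '素']], [[3, 10, 16, 23, 30, 36, 43]], ['cn']])]
What does the trailing [46.085826210826205, [['纯', '臻', '营', '养', '护', '发', '素']], [[3, 10, 16, 23, 30, 36, 43]] part mean, and how can I convert it into four-point detection boxes? I read #10377 but could not follow the explanation there.
Judging by line 233 of the source code, the cal_ocr_word_box function should return detection boxes. Why are the per-character results returned by the paddleocr package not boxes, and how do I convert them into boxes?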

🏃‍♂️ Environment

x86 CPU

🌰 Minimal Reproducible Example

PaddleOCR(lang="ch", return_word_box=True)

GreatV · 2024-12-20T02:29:32Z

Maintainer

You mentioned that with return_word_box=True the output contains parts that are hard to interpret, for example:

[[[26.0, 37.0], [304.0, 37.0], [304.0, 73.0], [26.0, 73.0]], 
('纯臻营养护发素', 0.9946897625923157, [46.085826210826205, 
[['纯', '臻', '营', '养', '护', '发', '素']], 
[[3, 10, 16, 23, 30, 36, 43]], 
['cn']])]

The following explains the structure of this output and how to convert it into detection boxes:

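Output structure

Detection box
The first part of the output is the detection box for the whole text line, given as the coordinates of its four corners. For example: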
```
[[26.0, 37.0], [304.0, 37.0], [304.0, 73.0], [26.0, 73.0]]
```
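This is a standard four-point detection box. The corners are listed in the order top-left, top-right, bottom-right, bottom-left.

Recognition result
The second part holds the recognized text and related information:
- The recognized text, e.g. '纯臻营养护发素'.
- The recognition confidence, e.g. 0.9946897625923157.
- A nested list [...] that follows, containing:
  - A single number, e.g. 46.085826210826205. It appears to be the number of recognition columns (time steps) that span the text line, that is, the scale against which the positions below are measured.
  - The per-character split:
    - The characters: [['纯', '臻', '营', '养', '护', '发', '素']].
    - The position of each character, e.g. [[3, 10, 16, 23, 30, 36, 43]]. These are relative positions along the line, not widths and not pixel coordinates.
  - Language information, e.g. ['cn'], identifying the detected language of each group.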
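
To make the nesting concrete, the sample result from the question can be unpacked into named parts. This is only a minimal sketch: the names `col_count`, `char_groups`, `col_groups` and `states` are illustrative labels rather than identifiers from PaddleOCR, and reading the first value as a column count is an assumption.

```python
# Sample line copied from the question (one detected text line).
line = [
    [[26.0, 37.0], [304.0, 37.0], [304.0, 73.0], [26.0, 73.0]],
    ('纯臻营养护发素', 0.9946897625923157,
     [46.085826210826205,
      [['纯', '臻', '营', '养', '护', '发', '素']],
      [[3, 10, 16, 23, 30, 36, 43]],
      ['cn']]),
]

# A line is [box, (text, confidence, extra_info)].
box, (text, confidence, extra_info) = line

# Illustrative names; treating the first value as a column count is an
# assumption, not something the output itself states.
col_count, char_groups, col_groups, states = extra_info

# Each group pairs a list of characters with a list of positions.
for chars, cols, state in zip(char_groups, col_groups, states):
    for char, col in zip(chars, cols):
        print(f"{char}: position {col} of {col_count:.1f} ({state})")
```

Each character is paired with exactly one position, which is what makes the later conversion to boxes possible.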

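Why does the output contain per-character information?

Because return_word_box=True enables character-level output. In this mode PaddleOCR returns finer-grained information about each character in addition to the box for the whole line.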

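How do I convert the per-character information into detection boxes?

To extract a box for each character, proceed as follows:

Parse the position information
Suppose you already have the line box, the first value of the nested list, and the per-character positions:

overall_box = [[26.0, 37.0], [304.0, 37.0], [304.0, 73.0], [26.0, 73.0]]
col_num = 46.085826210826205               # first value of the nested list
char_cols = [[3, 10, 16, 23, 30, 36, 43]]  # position (column index) of each character

Compute the box for each character
The positions are not widths, so they should not be accumulated. Instead, divide the line width by col_num to get the width of one column in pixels, then centre each character on its column. This sketch assumes a horizontal, axis-aligned line, assumes col_num is the number of columns spanning the line, and uses the average character width as the box width:

x_left, y_top = overall_box[0]       # top-left corner
x_right, y_bottom = overall_box[2]   # bottom-right corner
overall_width = x_right - x_left
cell_width = overall_width / col_num            # pixels per column
char_width = overall_width / len(char_cols[0])  # average character width

# Build one box per character, centred on its column
char_boxes = []
for col in char_cols[0]:
    center_x = x_left + (col + 0.5) * cell_width
    x_start = max(center_x - char_width / 2, x_left)  # clamp to the line box
    x_end = min(center_x + char_width / 2, x_right)
    char_boxes.append([[x_start, y_top],      # top-left
                       [x_end, y_top],        # top-right
                       [x_end, y_bottom],     # bottom-right
                       [x_start, y_bottom]])  # bottom-left

Output the character boxes
char_boxes now holds one four-point box per character, in the same corner order as the line box.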

Example code

Here is a complete example:

from paddleocr import PaddleOCR

# Initialize OCR
ocr = PaddleOCR(lang="ch", return_word_box=True)

# Run OCR
img_path = "./your_image.jpg"
results = ocr.ocr(img_path)

# Parse the results (results[0] may be None when no text is found)
for line in results[0] or []:
    overall_box, (text, confidence, extra_info) = line
    # Assumption: extra_info[0] is the number of columns spanning the line
    # and extra_info[2] holds the column index of each character.
    col_num, _, col_groups, _ = extra_info
    char_cols = [c for group in col_groups for c in group]  # flatten groups
    x_left, y_top = overall_box[0]
    x_right, y_bottom = overall_box[2]
    cell_width = (x_right - x_left) / col_num
    char_width = (x_right - x_left) / max(len(char_cols), 1)

    # Compute the character boxes
    char_boxes = []
    for col in char_cols:
        center_x = x_left + (col + 0.5) * cell_width
        x_start = max(center_x - char_width / 2, x_left)
        x_end = min(center_x + char_width / 2, x_right)
        char_boxes.append([[x_start, y_top],
                           [x_end, y_top],
                           [x_end, y_bottom],
                           [x_start, y_bottom]])

    print(f"Text: {text}, Confidence: {confidence}")
    print(f"Overall Box: {overall_box}")
    print(f"Character Boxes: {char_boxes}")
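
The calculation above reads only the top-left and bottom-right corners of the line box, so it treats the line as axis-aligned. For a tilted line, one possible variant interpolates along the top and bottom edges instead. This is a sketch under stated assumptions, not PaddleOCR's own method: `lerp` and `char_quads` are hypothetical helpers, `col_count` is assumed to be the number of recognition columns spanning the line, and each position is assumed to mark the column on which a character is centred.

```python
def lerp(p, q, t):
    """Point at fraction t of the way from p to q."""
    return [p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t]


def char_quads(box, col_count, cols):
    """Four-point boxes for characters on a possibly tilted line.

    Assumptions (not confirmed by the library): col_count is the number of
    recognition columns spanning the line, and each entry of cols is the
    column on which a character is centred.
    """
    top_left, top_right, bottom_right, bottom_left = box
    if not cols or col_count <= 0:
        return []
    # Half of the average character width, as a fraction of the line length.
    half = 0.5 / len(cols)
    quads = []
    for col in cols:
        center = (col + 0.5) / col_count
        t0 = max(center - half, 0.0)  # clamp to the start of the line
        t1 = min(center + half, 1.0)  # clamp to the end of the line
        quads.append([
            lerp(top_left, top_right, t0),        # top-left
            lerp(top_left, top_right, t1),        # top-right
            lerp(bottom_left, bottom_right, t1),  # bottom-right
            lerp(bottom_left, bottom_right, t0),  # bottom-left
        ])
    return quads


# A made-up tilted box, for illustration only.
tilted = [[0.0, 0.0], [100.0, 10.0], [100.0, 30.0], [0.0, 20.0]]
quads = char_quads(tilted, col_count=10, cols=[0, 9])
for quad in quads:
    print(quad)
```

For an axis-aligned box this reduces to the same x-ranges as the simpler calculation, while for a tilted box each character box follows the slope of the line.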

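Summary

The per-character information in the output is by design when return_word_box=True.
Character boxes can be computed from the line box and the per-character position information.
If you only need the line box, you can ignore the per-character information and use the returned overall_box directly.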
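
If only the line-level result is needed, the extra per-character data can be dropped in one pass. A minimal sketch, assuming each result line has the [box, (text, confidence, extra_info)] shape shown in the question; `strip_word_info` is a hypothetical helper name, and the `None` guard is defensive in case no text was found.

```python
def strip_word_info(lines):
    """Reduce each result line to (box, text, confidence)."""
    return [(box, rec[0], rec[1]) for box, rec in (lines or [])]


# Sample result for one image, copied from the question.
sample = [
    [[[26.0, 37.0], [304.0, 37.0], [304.0, 73.0], [26.0, 73.0]],
     ('纯臻营养护发素', 0.9946897625923157,
      [46.085826210826205,
       [['纯', '臻', '营', '养', '护', '发', '素']],
       [[3, 10, 16, 23, 30, 36, 43]],
       ['cn']])],
]

simplified = strip_word_info(sample)
for box, text, confidence in simplified:
    print(text, round(confidence, 3), box)
```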

Feel free to ask if you have further questions!

Response generated by feifei-bot | chatgpt-4o-latest
