There are some quirks with the `ocr_only` partitioning strategy in unstructured. In general, bounding box coordinates are expected to be reported counter-clockwise starting at the top-left corner, i.e. top-left, bottom-left, bottom-right, top-right. Since the y-axis increases going down, the second and third points should have higher y-values than the first and last points. This holds for the `fast` partitioning strategy, but not always for `ocr_only`. For example, these are the coordinates reported in one case:
Here the points are reported starting at the lower-left corner and going clockwise. The OCR code in unstructured appears to be undergoing a fairly significant refactoring, so this may be actively changing, but in the meantime our bounding box conversion should handle this ordering correctly.
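One way to make the conversion independent of point order is to stop relying on which index holds which corner and instead take the min and max over all four points. The sketch below assumes each box arrives as four `(x, y)` points in image coordinates (y increasing downward) and that boxes are axis-aligned. The names `is_expected_order`, `to_box`, and `canonical_order` are hypothetical helpers, not part of unstructured's API, and the sample points are made up rather than taken from the case above.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def is_expected_order(points: Sequence[Point]) -> bool:
    """Check the convention: points 2 and 3 (bottom edge) sit lower than points 1 and 4 (top edge)."""
    p1, p2, p3, p4 = points
    # With y growing downward, "lower on the page" means a larger y-value.
    return min(p2[1], p3[1]) > max(p1[1], p4[1])


def to_box(points: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) regardless of starting corner or winding direction."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def canonical_order(points: Sequence[Point]) -> Tuple[Point, Point, Point, Point]:
    """Rebuild the points counter-clockwise from the top-left corner (assumes an axis-aligned box)."""
    left, top, right, bottom = to_box(points)
    return ((left, top), (left, bottom), (right, bottom), (right, top))


if __name__ == "__main__":
    # Hypothetical box reported clockwise from the lower-left corner.
    reported = ((10, 50), (10, 20), (90, 20), (90, 50))
    print(is_expected_order(reported))  # False
    print(to_box(reported))  # (10, 20, 90, 50)
    print(canonical_order(reported))  # ((10, 20), (10, 50), (90, 50), (90, 20))
```

Because `to_box` only uses the extremes of the coordinates, it gives the same result whichever corner the OCR output starts at and whichever direction it winds. The trade-off is that a rotated or skewed quadrilateral collapses to its enclosing axis-aligned rectangle.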