Training OCR #294

gasparuff · 2024-10-22T18:03:49Z

Hello,

I'm trying to train the OCR for Saudi Arabian license plates. I've been following issue #33, but it seems a bit outdated and I'm having trouble understanding how to do it properly.

I downloaded a dataset from roboflow that looks like this:

3 folders:

valid
train
test

Each has a "labels" folder and an "images" folder. So far, so good.

But the labels look a bit different from what I expected. Here is a photo and its corresponding label file.

photo:

label:
webp_png_jpg.rf.3fc702e1e3cda7913c8f292f8d5229d1.txt

content of the file:

7 0.25078125 0.7703125 0.0484375 0.14375
2 0.16171875 0.7703125 0.0453125 0.1453125
8 0.33828125 0.775 0.040625 0.1359375
13 0.7171875 0.7828125 0.0609375 0.153125
13 0.803125 0.7828125 0.065625 0.1484375
13 0.88828125 0.78125 0.059375 0.1578125

I feel like I have to convert it to something else before proceeding. The label file seems similar to CSV, with the first item in each row being the class (digit or letter) and the other 4 numbers being x and y coordinates in the image.

Please help me understand how to use this. Thanks!


ApelSYN · 2024-10-25T07:35:24Z

Historically, we have not used Roboflow for annotation; we prefer the VGG Image Annotator (VIA). We annotate the dataset in VIA and then convert it to the YOLO format. Which tool you annotate with is not essential, though. We recommend labeling at least 5,000 photos; our European dataset has about 15,000.
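To illustrate the VIA-to-YOLO step, here is a minimal sketch, not the converter we actually use. It assumes the VIA 2.x JSON export layout, where each region carries a `shape_attributes` dict with either a `rect` (`x`, `y`, `width`, `height`) or a `polygon` (`all_points_x`, `all_points_y`). The function name `via_region_to_yolo` is hypothetical, and the image size must be supplied by the caller because the sketch does not open the image.

```python
def via_region_to_yolo(region, img_w, img_h, class_id=0):
    """Convert one VIA region (rect or polygon) into a YOLO label line.

    Assumes VIA 2.x field names; img_w and img_h are the image size in pixels.
    """
    shape = region["shape_attributes"]
    if shape["name"] == "rect":
        x_min, y_min = shape["x"], shape["y"]
        x_max, y_max = x_min + shape["width"], y_min + shape["height"]
    elif shape["name"] == "polygon":
        # Use the axis-aligned bounding box of the polygon.
        xs, ys = shape["all_points_x"], shape["all_points_y"]
        x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    else:
        raise ValueError(f"unsupported shape: {shape['name']}")

    # YOLO stores the box center and size, normalized to the 0..1 range.
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"
```

Writing one such line per region into a `.txt` file named after the image gives a label file in the same layout as the Roboflow export.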

But finding the plate zone is only part of the task. Next you need to train an OCR model that reads the text from the plate zone found by YOLO. The task is further complicated by the fact that the plate text spans 2 lines, so you need to correctly detect the 4 corner points of the quadrilateral that bounds the plate, then split the image into 2 lines and read each one.
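The geometry of the two-line split can be sketched as follows. This is only an illustration under stated assumptions: the corner-ordering heuristic (sums and differences of coordinates) works for plates that are not strongly rotated, and cutting at the midpoints of the left and right edges assumes both text lines have roughly equal height. The function names are hypothetical.

```python
def order_corners(points):
    """Return the 4 corners as (top-left, top-right, bottom-right, bottom-left).

    Points are (x, y) in image coordinates, with y growing downward.
    """
    tl = min(points, key=lambda p: p[0] + p[1])  # smallest x + y
    br = max(points, key=lambda p: p[0] + p[1])  # largest x + y
    tr = max(points, key=lambda p: p[0] - p[1])  # largest x - y
    bl = min(points, key=lambda p: p[0] - p[1])  # smallest x - y
    return tl, tr, br, bl


def midpoint(a, b):
    """Midpoint of the segment between points a and b."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def split_into_two_lines(points):
    """Split the plate quadrilateral into a top-line and a bottom-line quadrilateral."""
    tl, tr, br, bl = order_corners(points)
    left_mid, right_mid = midpoint(tl, bl), midpoint(tr, br)
    top = (tl, tr, right_mid, left_mid)
    bottom = (left_mid, right_mid, br, bl)
    return top, bottom
```

Each resulting quadrilateral can then be rectified into an upright crop before OCR, for example with OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective`.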

In our approach, the detection dataset has a single class, "0" (let's call it Numberplate), which is a frame around the whole plate. Reading the text is done by a separate OCR model.
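For reference, each row of the label file above is in the YOLO format: a class index followed by the box center (x, y) and the box size (width, height), all normalized to the image dimensions. If you want to reuse such per-character labels to produce text for OCR training, one possible approach is sketched below. It assumes all characters sit on a single row, so sorting by the x center gives the reading order; the function names are hypothetical, and mapping class indices to actual characters depends on the class list shipped with your dataset.

```python
SAMPLE = """\
7 0.25078125 0.7703125 0.0484375 0.14375
2 0.16171875 0.7703125 0.0453125 0.1453125
8 0.33828125 0.775 0.040625 0.1359375
13 0.7171875 0.7828125 0.0609375 0.153125
13 0.803125 0.7828125 0.065625 0.1484375
13 0.88828125 0.78125 0.059375 0.1578125
"""


def parse_yolo_labels(text):
    """Parse YOLO rows: class x_center y_center width height (normalized to 0..1)."""
    boxes = []
    for line in text.strip().splitlines():
        cls, x_c, y_c, w, h = line.split()
        boxes.append((int(cls), float(x_c), float(y_c), float(w), float(h)))
    return boxes


def class_ids_left_to_right(boxes):
    """Return class indices ordered by x center (assumes a single text row)."""
    return [box[0] for box in sorted(boxes, key=lambda box: box[1])]


print(class_ids_left_to_right(parse_yolo_labels(SAMPLE)))
# -> [2, 7, 8, 13, 13, 13]
```

For a two-line plate the boxes would first have to be grouped into rows by their y center, which this sketch does not do.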
