Training OCR #294

Open
gasparuff opened this issue Oct 22, 2024 · 1 comment


gasparuff commented Oct 22, 2024

Hello,

I'm trying to train the OCR for Saudi Arabian license plates. I've been following the content in this issue: #33, but it seems a bit outdated and I'm having trouble understanding how to do it properly.

I downloaded a dataset from roboflow that looks like this:

3 folders:

  • valid
  • train
  • test

Each has a "labels" folder and an "images" folder - so far, so good.

But the labels look a bit different from what I expected. Here is a photo and the corresponding label file.

photo:
webp_png_jpg rf 3fc702e1e3cda7913c8f292f8d5229d1

label:
webp_png_jpg.rf.3fc702e1e3cda7913c8f292f8d5229d1.txt

content of the file:

7 0.25078125 0.7703125 0.0484375 0.14375
2 0.16171875 0.7703125 0.0453125 0.1453125
8 0.33828125 0.775 0.040625 0.1359375
13 0.7171875 0.7828125 0.0609375 0.153125
13 0.803125 0.7828125 0.065625 0.1484375
13 0.88828125 0.78125 0.059375 0.1578125

I feel like I have to convert it to something else before proceeding. The label file seems like something similar to CSV, with the first item in the row being the class (digit or letter) and the other 4 numbers being x and y coordinates in the image.

Please help me understand how to use this. Thanks!

Member

ApelSYN commented Oct 25, 2024


Historically, we have not used Roboflow for annotation; we prefer the VGG Image Annotator (VIA). We annotate the dataset in VIA and then convert it to the YOLO format. That said, which tool you annotate with is not essential. We recommend labeling at least 5,000 photos; our European dataset contains about 15,000.
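As a rough illustration of the conversion step, the sketch below turns one rectangle given in pixels into a YOLO label line (class id, then box center and size normalized by the image dimensions). The function name and the assumption that the annotation is an axis-aligned rectangle described by its top-left corner, width and height are illustrative only; this is not the project's own converter.

```python
def rect_to_yolo_line(x, y, width, height, img_width, img_height, class_id=0):
    """Convert a pixel rectangle (top-left x, y, width, height) to a YOLO label line.

    Hypothetical helper for illustration: YOLO labels store the box center
    and size as fractions of the image width and height.
    """
    if img_width <= 0 or img_height <= 0:
        raise ValueError("image dimensions must be positive")

    x_center = (x + width / 2) / img_width
    y_center = (y + height / 2) / img_height
    norm_w = width / img_width
    norm_h = height / img_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {norm_w:.6f} {norm_h:.6f}"


if __name__ == "__main__":
    # A 200x100 px box whose top-left corner is at (100, 50) in a 1000x500 image.
    print(rect_to_yolo_line(100, 50, 200, 100, 1000, 500))
```

One line like this is written per annotated box, into a .txt file that shares its name with the image.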

Finding the zone that contains the number is only part of the task. Next you need to train an OCR model that reads the text from the zone found by YOLO. The task is further complicated by the fact that the number consists of 2 lines, so you need to correctly detect the 4 corner points of the quadrilateral that bounds the number, then split the image into 2 lines and read each of them.
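The geometry of that last step can be sketched as follows. Given 4 corner points in any order, the code orders them and cuts the quadrilateral into a top and a bottom region through the midpoints of its left and right edges. Cutting exactly in the middle and the function names are assumptions made for illustration; a real plate layout may need a different split ratio, and each region would still have to be rectified (for example with a perspective transform) before being passed to the OCR model.

```python
def order_corners(points):
    """Return corners as (top-left, top-right, bottom-right, bottom-left).

    Uses the common sum/difference heuristic, which assumes the plate is
    not rotated by much more than about 45 degrees.
    """
    if len(points) != 4:
        raise ValueError("exactly 4 corner points are required")
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    top_right = min(points, key=lambda p: p[1] - p[0])
    bottom_left = max(points, key=lambda p: p[1] - p[0])
    return top_left, top_right, bottom_right, bottom_left


def midpoint(a, b):
    """Midpoint of the segment between points a and b."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def split_into_two_lines(points):
    """Split a plate quadrilateral into top-line and bottom-line quadrilaterals.

    Assumption: both text lines have equal height, so the cut goes through
    the midpoints of the left and right edges.
    """
    tl, tr, br, bl = order_corners(points)
    left_mid = midpoint(tl, bl)
    right_mid = midpoint(tr, br)
    top = [tl, tr, right_mid, left_mid]
    bottom = [left_mid, right_mid, br, bl]
    return top, bottom


if __name__ == "__main__":
    # Slightly skewed plate, corners given in arbitrary order.
    quad = [(100, 20), (0, 0), (0, 40), (100, 60)]
    top, bottom = split_into_two_lines(quad)
    print("top line:   ", top)
    print("bottom line:", bottom)
```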


In our approach, we mark a single class "0" (let's call it Numberplate), which is the frame around the number; reading the text is done by a separate OCR model.
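For comparison, the label file posted in the question looks like YOLO format with one box per character: each row is a class id followed by the box center x, center y, width and height, all as fractions of the image size. A minimal sketch, assuming that reading is correct, parses those rows and collapses them into one enclosing box with class 0. Using the union of the character boxes as the plate frame is only an approximation chosen for illustration (the real frame is somewhat larger than the characters), and the function names are hypothetical.

```python
SAMPLE_LABELS = """\
7 0.25078125 0.7703125 0.0484375 0.14375
2 0.16171875 0.7703125 0.0453125 0.1453125
8 0.33828125 0.775 0.040625 0.1359375
13 0.7171875 0.7828125 0.0609375 0.153125
13 0.803125 0.7828125 0.065625 0.1484375
13 0.88828125 0.78125 0.059375 0.1578125
"""


def parse_yolo_line(line):
    """Parse 'class x_center y_center width height' into a tuple."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:])
    return class_id, x_center, y_center, width, height


def enclosing_box(label_text, plate_class=0):
    """Collapse per-character boxes into one box covering all of them.

    Assumption: the union of the character boxes approximates the plate frame.
    Returns (class_id, x_center, y_center, width, height), still normalized.
    """
    boxes = [parse_yolo_line(l) for l in label_text.splitlines() if l.strip()]
    if not boxes:
        raise ValueError("no boxes found")
    x_min = min(xc - w / 2 for _, xc, _, w, _ in boxes)
    x_max = max(xc + w / 2 for _, xc, _, w, _ in boxes)
    y_min = min(yc - h / 2 for _, _, yc, _, h in boxes)
    y_max = max(yc + h / 2 for _, _, yc, _, h in boxes)
    return (plate_class,
            (x_min + x_max) / 2, (y_min + y_max) / 2,
            x_max - x_min, y_max - y_min)


if __name__ == "__main__":
    cls, xc, yc, w, h = enclosing_box(SAMPLE_LABELS)
    print(f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
```

The character class ids themselves (7, 2, 8, 13, ...) would instead be useful for building the text labels that the OCR model is trained on, provided the dataset's class-name mapping is available.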
