Recognize both digit & alphabet when fine tune digits #11

duonghb53 · 2019-08-26T09:06:51Z

Dear Shreeshrii,
I followed your guide to fine-tune tessdata_best/eng.traineddata by adding digits in the OCRB font, but when I use the resulting ocrb.traineddata for recognition, it still returns both letters and digits.
I don't know how you created digits.traineddata, which returns only digits.
Please help me.
Thank you.

Shreeshrii · 2019-08-27T08:58:30Z

You can now also use the blacklist config variable to exclude letters.

tesseract input output --oem 1 --psm 6 -l eng -c tessedit_char_blacklist=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
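
A minimal sketch of how the command above could be assembled from a script, so that the blacklist string is generated rather than typed by hand. The file names `input.png` and `output` are placeholders and the helper `build_blacklist_command` is hypothetical; the command is only printed here, not executed.

```python
import shlex
import string


def build_blacklist_command(image, output_base, lang="eng"):
    """Return the tesseract argument list that blacklists all ASCII letters.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    blacklist = string.ascii_lowercase + string.ascii_uppercase
    return [
        "tesseract", image, output_base,
        "--oem", "1",   # LSTM engine only
        "--psm", "6",   # assume a single uniform block of text
        "-l", lang,
        "-c", f"tessedit_char_blacklist={blacklist}",
    ]


command = build_blacklist_command("input.png", "output")
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run` if Tesseract is installed. As far as is known, the LSTM engine honors the whitelist and blacklist variables only from Tesseract 4.1 onward, so the result may differ on 4.0 builds.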

duonghb53 · 2019-08-28T06:30:17Z

Dear Shreeshrii,
I have some questions.

What is the difference between these two settings when I want to recognize only digits:
tessedit_char_blacklist=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
and
tessedit_char_whitelist=1234567890
When I use your digits.traineddata in the VietOCR tool, it recognizes only digits (without any extra configuration), but when I use ocrb.traineddata it recognizes both digits and letters.
I don't understand why.
Please help.
Many thanks.

Shreeshrii · 2019-08-28T08:11:47Z

tessedit_char_blacklist=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

excludes only a-z and A-Z. Punctuation, digits, and any other characters in the unicharset can still be recognized.

tessedit_char_whitelist=1234567890

Only the digits 0-9 will be recognized.

digits.traineddata

was trained only on a limited character set, the digits 0-9.

ocrb.traineddata

I will have to check, but it was trained on both letters and digits in the OCRB font, for recognition of IDs.

Please note that all these are proof of concept traineddata files for training. I have not used any of them.
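
The difference between the two settings described above can be illustrated with a toy model. This is only a sketch of the filtering semantics, not Tesseract's implementation; the small `UNICHARSET` and the helper `allowed_chars` are assumptions made for illustration.

```python
import string

# Toy stand-in for a model's unicharset (assumption: the real one is larger).
UNICHARSET = set(string.ascii_letters + string.digits + "|/]-.")


def allowed_chars(unicharset, whitelist="", blacklist=""):
    """Return the characters that remain recognizable.

    Illustrative semantics only: a non-empty whitelist keeps just the listed
    characters, and a blacklist removes the listed ones.
    """
    chars = set(unicharset)
    if whitelist:
        chars &= set(whitelist)
    return chars - set(blacklist)


after_blacklist = allowed_chars(UNICHARSET, blacklist=string.ascii_letters)
after_whitelist = allowed_chars(UNICHARSET, whitelist="1234567890")

print(sorted(after_blacklist))  # digits plus the punctuation survive
print(sorted(after_whitelist))  # only the digits survive
```

With the letter blacklist, characters such as `|` or `/` stay available to the recognizer, whereas the digit whitelist leaves nothing but 0-9.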

duonghb53 · 2019-08-29T10:15:34Z

Dear Shreeshrii,
I am using this file from your tessdata for recognition: digits.traineddata

I don't understand Tesseract well.
I trained with new data in the OCRB font, but the results are not accurate.

Can you recommend something that would help me?
Many thanks.

Shreeshrii · 2019-08-29T10:24:46Z

You should get perfect results using eng.traineddata from tessdata_best.

Make sure your image is 300 dpi.

tesseract 3324069222.png -
3324069222
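
Changing only the DPI value stored in an image file does not add pixels; what typically matters is how many pixels each character covers. A rough sketch of the pixel size to resample to, assuming the current effective resolution is known. The 96 dpi figure, the image size, and the helper name `size_at_target_dpi` are illustrative assumptions.

```python
def size_at_target_dpi(width_px, height_px, current_dpi, target_dpi=300):
    """Return the (width, height) in pixels after resampling to target_dpi."""
    factor = target_dpi / current_dpi
    return round(width_px * factor), round(height_px * factor)


# Example: a 320x48 pixel screenshot captured at an assumed 96 dpi.
new_size = size_at_target_dpi(320, 48, current_dpi=96)
print(new_size)
```

The resulting size can be passed to the resize function of whichever image library is in use before the image is handed to Tesseract.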

duonghb53 · 2019-08-30T03:22:27Z

Dear Shreeshrii,
I changed the image to 300 dpi and tested again. The result is better.
But the results differ between an image where I only changed the dpi to 300 and one where I both changed the dpi to 300 and added white padding. (I use EngineMode = Lstm and PageSegMode = PageSegMode.SingleBlock.)
I tried recognition with tessdata_best/eng.traineddata and with digits.traineddata (downloaded from https://github.com/Shreeshrii/tessdata_shreetest/blob/master/digits.traineddata).
This is the result.
The results from the two files are the same.
What do I need to do to improve accuracy?
(Improve the image quality, fine-tune the traineddata file, or try recognition with other parameters?)
I don't know how.
It often misreads 5 as 6, 7 as 2, and 9 as 2.
Regards,

duonghb53 · 2019-08-30T03:23:55Z

Dear Shreeshrii,
These are the image I used and the result:
Test.zip <https://github.com/Shreeshrii/tessdata_shreetest/files/3558152/Test.zip>
Please take a look.
Regards,

Shreeshrii · 2019-08-30T08:29:40Z

Try the suggestions in https://groups.google.com/forum/?fromgroups#!searchin/tesseract-ocr/lorenzo%7Csort:date/tesseract-ocr/2uBsbG9XHzI/1Y9QoA37BQAJ

duonghb53 · 2019-09-19T02:59:34Z

Dear Shreeshrii,
I am trying to recognize digits (using tessdata_best/eng), but it often mistakes the digit 1 for characters such as '|', '/', and ']' because my font and image are skewed.
My idea is to find this font and fine-tune from tessdata_best/eng. Is that a good approach?
Do you have any suggestions for solving this?
Thanks.

Shreeshrii · 2019-09-19T11:36:37Z

@nguyenq Quan, is it possible to use the digits config file with VietOCR?

nguyenq · 2019-09-19T23:09:45Z

According to its readme file:

You can put init-only and non-init control parameters in tessdata/configs/tess_configs and tess_configvars files, respectively, to modify Tesseract's behaviour.
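
Tesseract config files are plain text with one `name value` pair per line. A minimal sketch that writes a digits-only whitelist into such a file. A temporary directory stands in for `tessdata/configs`, the helper `write_config` is hypothetical, and whether VietOCR picks up the variable from `tess_configvars` in exactly this form is an assumption based on the readme text quoted above.

```python
import tempfile
from pathlib import Path


def write_config(configs_dir, filename, variables):
    """Write Tesseract-style config lines: one 'name value' pair per line."""
    path = Path(configs_dir) / filename
    lines = [f"{name} {value}" for name, value in variables.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# Assumption: a temporary directory stands in for tessdata/configs.
configs_dir = tempfile.mkdtemp()
config_path = write_config(
    configs_dir,
    "tess_configvars",
    {"tessedit_char_whitelist": "0123456789"},
)
print(config_path.read_text(encoding="utf-8"))
```

Since `tessedit_char_whitelist` can be changed after initialization, it would belong in the non-init file rather than in `tess_configs`.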

duonghb53 · 2019-09-20T02:59:10Z

@nguyenq I tried configuring it as your guide describes, but it still recognizes letters.
Is my configuration wrong?
Please see the attached file:

Shreeshrii · 2019-09-20T13:17:23Z

I get the correct answer in VietOCR using a screenshot copy of your image.
I am using the latest version, which I downloaded just now: VietOCR 5.5.1

duonghb53 · 2019-09-23T03:12:01Z

Dear @Shreeshrii ,
Thank you.
I tried fine-tuning from your file, digits.traineddata.
Recognition is now easier and more accurate.
But it seems to regularly misread the digit 2 as 3.
Why does this happen?

Shreeshrii · 2019-09-23T03:53:01Z

If your images are skewed, either deskew them before feeding them to tesseract or train on an italic font matching your images.

arrrrny mentioned this issue Oct 17, 2019

None of the traineddata works for me #14

Open
