List of the 10 training fonts? #1

koch-aai · 2018-10-01T14:17:35Z

Thanks for uploading this trained model - could you possibly provide some info about the training data?

Specifically the fonts used and the average string length. Has this been tested on SVHN by any chance?

Thanks!

Shreeshrii · 2018-10-01T15:20:14Z

It has NOT been tested at all. It is a proof-of-concept finetune training. Users are encouraged to finetune for their own use case, fonts, etc.
The training text used for the last training version (with commas) is at
https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.training_text
Modified versions of eng.punc and eng.numbers have been used. These could be further modified based on user requirements. They might cause minor improvements in recognition. Files used can be compared with the ones in langdata/eng and are made available at
https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.punc
https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.numbers
Fonts used for the last training were those listed in src/training/language_specific.sh for Latin script (with minor modifications). Again, finetuning with fonts used in images to be OCRed will lead to better accuracy.
Here is the list used:

./digits/eng.Arial_Bold.exp0.lstmf
./digits/eng.Arial_Bold_Italic.exp0.lstmf
./digits/eng.Arial.exp0.lstmf
./digits/eng.Arial_Italic.exp0.lstmf
./digits/eng.Courier_New_Bold.exp0.lstmf
./digits/eng.Courier_New_Bold_Italic.exp0.lstmf
./digits/eng.Courier_New.exp0.lstmf
./digits/eng.Courier_New_Italic.exp0.lstmf
./digits/eng.FreeMono.exp0.lstmf
./digits/eng.FreeSans.exp0.lstmf
./digits/eng.FreeSerif.exp0.lstmf
./digits/eng.Georgia_Bold.exp0.lstmf
./digits/eng.Georgia_Bold_Italic.exp0.lstmf
./digits/eng.Georgia.exp0.lstmf
./digits/eng.Georgia_Italic.exp0.lstmf
./digits/eng.Times_New_Roman_Bold.exp0.lstmf
./digits/eng.Times_New_Roman_Bold_Italic.exp0.lstmf
./digits/eng.Times_New_Roman.exp0.lstmf
./digits/eng.Times_New_Roman_Italic.exp0.lstmf
./digits/eng.Trebuchet_MS_Bold.exp0.lstmf
./digits/eng.Trebuchet_MS_Bold_Italic.exp0.lstmf
./digits/eng.Trebuchet_MS.exp0.lstmf
./digits/eng.Trebuchet_MS_Italic.exp0.lstmf
./digits/eng.Verdana_Bold.exp0.lstmf
./digits/eng.Verdana_Bold_Italic.exp0.lstmf
./digits/eng.Verdana.exp0.lstmf
./digits/eng.Verdana_Italic.exp0.lstmf

An earlier training with 10 fonts used only the non-italic version of the fonts and did not include the freefonts - FreeMono, FreeSans, FreeSerif.
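
For anyone who wants to build a reduced training list along the same lines, here is a minimal sketch, not the script actually used for this model. It rebuilds the 27 `.lstmf` paths listed above, derives a font name from each, and drops the italic variants and the free fonts. The helper names (`font_name`, `is_italic`, `is_free_font`) are made up for illustration.

```python
from pathlib import PurePosixPath

# Font families and styles as they appear in the file list above.
FAMILIES = ["Arial", "Courier_New", "Georgia", "Times_New_Roman",
            "Trebuchet_MS", "Verdana"]
STYLES = ["", "_Bold", "_Italic", "_Bold_Italic"]
FREE_FONTS = ["FreeMono", "FreeSans", "FreeSerif"]

# Rebuild the 27 paths from the list (6 families x 4 styles + 3 free fonts).
paths = [f"./digits/eng.{family}{style}.exp0.lstmf"
         for family in FAMILIES for style in STYLES]
paths += [f"./digits/eng.{font}.exp0.lstmf" for font in FREE_FONTS]


def font_name(lstmf_path: str) -> str:
    """Turn './digits/eng.Arial_Bold.exp0.lstmf' into 'Arial Bold'."""
    file_name = PurePosixPath(lstmf_path).name
    # file_name splits into: lang, font, exposure, extension
    return file_name.split(".")[1].replace("_", " ")


def is_italic(font: str) -> bool:
    return font.endswith("Italic")


def is_free_font(font: str) -> bool:
    return font.startswith("Free")


# Keep only the non-italic fonts that are not FreeMono/FreeSans/FreeSerif.
subset = [p for p in paths
          if not is_italic(font_name(p)) and not is_free_font(font_name(p))]

for path in subset:
    print(path)
```

Applied to the list above, this filter keeps 12 of the 27 files, so it should not be read as the exact set behind the earlier 10-font training, which is not spelled out in this thread.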

Bech007 · 2019-09-10T10:15:23Z

Hi @Shreeshrii
I have a dataset in a .txt file and I want to train Tesseract on it, but I don't know how to do that.
Thanks

Shreeshrii · 2019-09-10T11:45:49Z

Please see https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
and
https://github.com/Shreeshrii/tess4training
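
As a rough orientation before reading those pages, the sketch below only assembles the three command lines that a typical finetune run consists of: extracting the LSTM model from an existing traineddata file, continuing training from it on a list of `.lstmf` files, and packaging the checkpoint into a new traineddata file. It assumes the `.lstmf` files have already been generated from the training text as described in the linked documentation. All file paths, the `finetune_commands` helper, and the iteration count are placeholders, not values used for this model.

```python
import shlex


def finetune_commands(base_traineddata, base_lstm, train_listfile,
                      output_prefix, max_iterations=400):
    """Build (but do not run) the commands for a finetune run.

    base_lstm must end in '.lstm' so that combine_tessdata knows
    which component to extract.
    """
    # 1. Extract the LSTM model from the existing traineddata file.
    extract = ["combine_tessdata", "-e", base_traineddata, base_lstm]

    # 2. Continue training from that model on the listed .lstmf files.
    train = ["lstmtraining",
             "--continue_from", base_lstm,
             "--traineddata", base_traineddata,
             "--train_listfile", train_listfile,
             "--model_output", output_prefix,
             "--max_iterations", str(max_iterations)]

    # 3. Convert the resulting checkpoint into a usable traineddata file.
    package = ["lstmtraining", "--stop_training",
               "--continue_from", output_prefix + "_checkpoint",
               "--traineddata", base_traineddata,
               "--model_output", output_prefix + ".traineddata"]

    return [extract, train, package]


# Placeholder paths for illustration only.
commands = finetune_commands(
    base_traineddata="tessdata_best/eng.traineddata",
    base_lstm="work/eng.lstm",
    train_listfile="work/eng.training_files.txt",
    output_prefix="work/digits",
)

for command in commands:
    print(" ".join(shlex.quote(part) for part in command))
```

The commands are printed rather than executed so they can be checked against the installed Tesseract version first. Each list can then be passed to `subprocess.run(command, check=True)`.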

myjun1124 · 2020-07-28T05:07:29Z

This digit traineddata was useful to me. After the preprocessing I used, it worked well for recognizing the temperature on the screen from a thermal camera.

Shreeshrii · 2020-07-28T08:21:28Z

Thanks for the comment @myjun1124. Glad to know it worked for you.

nikhilcms · 2020-11-02T11:56:19Z

Hi @Shreeshrii, I found that Tesseract 4.1.1 works well for extracting words, but it often fails to extract digits (specifically bold ones). How can I solve this issue?

Shreeshrii mentioned this issue Dec 20, 2018

How to fine tune your digits_comma file #4

Open

arrrrny mentioned this issue Oct 17, 2019

None of the traineddata works for me #14

Open
