FAQ
Old European texts often use Fraktur or historic Antiqua fonts with long s and ligatures. Those texts require special Tesseract models, because standard models like eng, deu or script/Latin don't recognize them well.
Several models are available for such old texts. deu_frak is a model which was trained for Tesseract 3. The current standard models are frk and script/Fraktur. In addition, UB Mannheim provides models which often give better results.
deu_frak is a user contributed model. It only supports the legacy (pattern based) OCR engine, so it does not work with the LSTM neural network engine, which typically achieves better OCR results. The legacy engine has one advantage: it can detect character attributes like italic or bold.
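The engine is selected on the tesseract command line with --oem (0 selects the legacy engine, 1 the LSTM engine) and the model with -l; passing stdout as the output base prints the recognized text. The following is a minimal Python sketch, not part of Tesseract itself: the helper tesseract_args, the LEGACY_ONLY_MODELS set and the file name page.png are hypothetical, and the set lists only deu_frak because that is the one legacy-only model named on this page.

```python
import shlex

# Assumption: deu_frak is the only legacy-only model among those named here.
LEGACY_ONLY_MODELS = {"deu_frak"}


def tesseract_args(image: str, model: str) -> list[str]:
    """Build a tesseract command line that prints recognized text to stdout.

    --oem 0 selects the legacy (pattern based) engine,
    --oem 1 selects the LSTM neural network engine.
    """
    oem = "0" if model in LEGACY_ONLY_MODELS else "1"
    return ["tesseract", image, "stdout", "-l", model, "--oem", oem]


if __name__ == "__main__":
    # Hypothetical input image; print the commands instead of running them.
    print(shlex.join(tesseract_args("page.png", "deu_frak")))
    print(shlex.join(tesseract_args("page.png", "script/Fraktur")))
```

Returning an argument list rather than a single string lets the caller pass it straight to subprocess.run without shell quoting issues.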
frk is the standard model for German Fraktur texts. It includes a German dictionary. The model has some restrictions regarding the character set it can recognize. It also has problems, especially with the ch and ck ligatures.
script/Fraktur is the standard model for European Fraktur and historic Antiqua texts. It supports a wider character set than frk, but has similar problems with ch and ck.
The models from UB Mannheim typically give the best results. They eliminate the problems of frk and script/Fraktur and recognize different variants of the German umlauts. These variants are available:
- models based on script/Fraktur
- models trained from scratch
- models trained on Austrian newspapers with Fraktur
- latest models trained in 2021 (not always the best)
All these models work without a dictionary, so older Tesseract versions show a warning, which can safely be ignored. frak2021_1.069 is a model to which we added a dictionary.
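Models such as the UB Mannheim ones are downloaded separately, so they may live in a directory of your own rather than in the default tessdata location. The sketch below, assuming each model is stored as a file named <model>.traineddata, checks that the file exists and then points tesseract at that directory with --tessdata-dir. The helpers traineddata_path and ocr_command_with_tessdata and the directory /home/me/tessdata are hypothetical.

```python
from pathlib import Path


def traineddata_path(tessdata_dir: str, model: str) -> Path:
    """Return the file expected for a model name passed to tesseract's -l option.

    Assumption: models are stored as <model>.traineddata. A name with a slash,
    such as "script/Fraktur", then refers to a file in a subdirectory:
    <tessdata_dir>/script/Fraktur.traineddata.
    """
    return Path(tessdata_dir) / f"{model}.traineddata"


def ocr_command_with_tessdata(image: str, model: str, tessdata_dir: str) -> list[str]:
    """Build a tesseract command that loads `model` from `tessdata_dir`.

    Raises FileNotFoundError up front if the model file is missing,
    instead of letting tesseract fail later.
    """
    path = traineddata_path(tessdata_dir, model)
    if not path.is_file():
        raise FileNotFoundError(f"model file not found: {path}")
    return ["tesseract", image, "stdout", "--tessdata-dir", tessdata_dir, "-l", model]


if __name__ == "__main__":
    # Hypothetical directory holding a downloaded model; only show the expected path.
    print(traineddata_path("/home/me/tessdata", "frak2021_1.069"))
```

Once the file is in place, passing the returned list to subprocess.run(cmd, check=True) would run the OCR. Any warning about the missing dictionary that older Tesseract versions print can be ignored, as described above.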