Fix Czech language diacritics #13914
Replies: 4 comments 7 replies
-
How did you test the Czech language? Can you provide a demo?
-
In detail, I found:
-
In processing, Czech is treated as a Latin-script language and uses the character dictionary at https://github.com/PaddlePaddle/PaddleOCR/blob/main/ppocr/utils/dict/latin_dict.txt. Check whether it includes all Czech characters. If the dictionary already covers them, you might need to fine-tune the OCR models for your specific scenario.
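One way to run that check is a short script like the sketch below. It assumes you have saved a local copy of `latin_dict.txt` and that the file holds one character per line; the helper names (`load_dict_chars`, `missing_czech_chars`) and the local path are hypothetical and not part of PaddleOCR.

```python
import unicodedata
from pathlib import Path

# Czech letters that carry diacritics; uppercase forms are derived from the lowercase ones.
CZECH_LOWER = unicodedata.normalize("NFC", "áčďéěíňóřšťúůýž")
CZECH_CHARS = CZECH_LOWER + CZECH_LOWER.upper()


def load_dict_chars(path):
    """Read a dictionary file (assumed: one character per line) into a set."""
    text = Path(path).read_text(encoding="utf-8")
    # Normalize to NFC so precomposed and decomposed forms compare equal.
    return {unicodedata.normalize("NFC", line) for line in text.splitlines() if line}


def missing_czech_chars(dict_chars):
    """Return the Czech diacritic letters that are absent from dict_chars."""
    return [ch for ch in CZECH_CHARS if ch not in dict_chars]


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever you saved the dictionary.
    dict_path = Path("latin_dict.txt")
    if dict_path.exists():
        print("Missing:", "".join(missing_czech_chars(load_dict_chars(dict_path))))
```

Any letters it prints are ones the recognition model cannot output with that dictionary, whatever the quality of the input image.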
-
Oh, it seems that some of these characters (not all of them) are indeed missing. How can I fix that? Where exactly should I add them?
-
Hi, this is an awesome project. I first tested it with Chinese/English subtitles and was quite impressed. However, when testing Czech, I found something odd: the output is quite good except for one significant detail. All diacritical marks come out as umlauts, as in German ü, ö, etc., and the correct diacritics for Czech letters such as ě, š, č, ř, ž never appear. Could anyone fix this, please?