Remove references to "kur" and "tgl", add "fil" to man page #3165

Merged (1 commit), Dec 3, 2020
3 changes: 1 addition & 2 deletions doc/tesseract.1.asc
@@ -203,6 +203,7 @@ following languages:
*est* (Estonian),
*eus* (Basque),
*fas* (Persian),
+*fil* (Filipino),
*fin* (Finnish),
*fra* (French),
*frk* (Frankish),
@@ -232,7 +233,6 @@ following languages:
*kmr* (Kurdish Kurmanji),
*kor* (Korean),
*kor_vert* (Korean vertical),
-*kur* (Kurdish),
*lao* (Lao),
*lat* (Latin),
*lav* (Latvian),
@@ -277,7 +277,6 @@ following languages:
*tat* (Tatar),
*tel* (Telugu),
*tgk* (Tajik),
-*tgl* (Tagalog),

@stweil (Member), Dec 2, 2020:

tgl is available in tessdata (so we should not remove it here), but it is missing from tessdata_fast and tessdata_best, which is strange. We could copy the LSTM part to tessdata_fast if that helps.

Should we remove tgl from tessdata and from langdata_lstm, which also have the successor fil?

Contributor Author:


Please see https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html

Would you like me to also submit an update for that page?

In general it is not clear to me what the authoritative source on all the "officially" supported/included languages is. I used to think the man page, but now I assume the tessdata repos themselves.

In any case, it's confusing when the documentation points to language data that does not exist. My first inclination was to search the Ubuntu packages, where I did find some old ones, but importing those was likely not a good idea, which is why I started digging further. If it helps, I can try to automatically match all the other languages in the man page (or the wiki link) against the ones installed by the Ubuntu packaging and what is available in tessdata.
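
The automatic matching offered above could look roughly like the following. This is only a minimal sketch, not code from this PR: the function names are hypothetical, the regular expression assumes man-page entries of the form `*code* (Name),` as seen in the diff, and the file listing in the demo is illustrative sample data, not the actual contents of any tessdata repository.

```python
import re

# Assumption: man-page entries look like "*fil* (Filipino)," as in the diff above.
LANG_ENTRY = re.compile(r"\*([A-Za-z_]+)\*\s+\(([^)]*)\)")


def parse_manpage_languages(text):
    """Return {code: name} for every '*code* (Name)' entry found in text."""
    return {code: name for code, name in LANG_ENTRY.findall(text)}


def codes_from_filenames(filenames):
    """Return the language codes of all '<code>.traineddata' file names."""
    suffix = ".traineddata"
    return {name[: -len(suffix)] for name in filenames if name.endswith(suffix)}


def compare_languages(documented, available):
    """Return (documented but not available, available but not documented)."""
    documented = set(documented)
    available = set(available)
    return sorted(documented - available), sorted(available - documented)


if __name__ == "__main__":
    # Hypothetical inputs: a man-page excerpt and a made-up directory listing.
    manpage_excerpt = """
    *kmr* (Kurdish Kurmanji),
    *kor* (Korean),
    *kor_vert* (Korean vertical),
    *kur* (Kurdish),
    *tgk* (Tajik),
    *tgl* (Tagalog),
    """
    listing = [
        "fil.traineddata",
        "kmr.traineddata",
        "kor.traineddata",
        "kor_vert.traineddata",
        "tgk.traineddata",
        "README.md",
    ]

    documented = parse_manpage_languages(manpage_excerpt)
    missing, undocumented = compare_languages(documented, codes_from_filenames(listing))
    print("documented but no data file:", missing)
    print("data file but not documented:", undocumented)
```

Running the same comparison once per data source (each tessdata repository, the Ubuntu packages) would give one pair of lists per source, which makes it easy to see where the man page and the shipped data disagree.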

@MerlijnWajer (Contributor Author), Dec 2, 2020:

@Shreeshrii - not sure why my comment went to this thread (I clicked quote reply), but my above comment is in reply to your message.

*tha* (Thai),
*tir* (Tigrinya),
*ton* (Tonga),