Plans for tesseract 5.x.y #3673
Comments
What about releasing a 5.0.1 after Christmas at the end of December? I think there are several fixes since 5.0.0 which would be good for a new release. |
Mind reader :-) |
Right before tagging 5.0.1, you can update this sentence from the README:
|
What should be added into v5? |
We already have a wish list for improved training, a lot of issues with layout detection, want improved logging, and much more. Maintaining two branches did not work well with 4.x, and I am afraid it would not work better with 5.x. |
Maybe keep 5.0 as is? It is a good release with a number of changes. |
Do you plan to release 5.0.1 next week? |
Yes, unless we discover that something very important is still missing. |
There is still no fix, and I have no |
clang-cl is not worth it currently. |
You can release 5.0.1 without the clang-cl fix. |
Release 5.0.1 is now online. |
The next release could be a new minor version 5.1.0 with new features, maybe at the end of January (unless there is an urgent need for a bug fix release 5.0.2). I especially want image information in ALTO and hOCR output (see PR #3710 which implements that for hOCR), and maybe more from the project list. The new minor release would also disable OpenMP by default for autoconf builds. |
https://packages.ubuntu.com/search?keywords=tesseract-ocr Are you going to update Ubuntu 22.04 to 5.0.1 soon? The feature freeze date is February 24. |
I uploaded:
I hope @jbreiden will upload them to Debian. |
Hi @AlexanderP,
From https://tracker.debian.org/pkg/tesseract :
So, why can't you directly push new versions of Tesseract to Debian? |
I'd like to create a new release Tesseract 5.1.0 soon. Originally I had planned it for end of January. Are there any contributions or important bug fixes which should be included still pending (then I'd wait), or can we release now? |
I suggest you go ahead with 5.1.0 now. I would like to see improvements related to training and evaluation implemented, but they could go in a future release. |
Release 5.1.0 is now available. |
@amitdo, I have no rights to upload to Debian. |
There are now several fixes and improvements in git master, so I think it's time for a new release 5.1.1. @egorpugin, is it possible to fix the CI sw build which is currently failing? Are there any other contributions or important bug fixes which should be included still pending (then I'd wait), or can we release now? Ideally #3782 should also be included. |
Yes, I'll check. |
Unfortunately, the Windows build does not work (for me): I tried Clang (14) and MS Visual Studio (2019). Here are the logs: |
|
I fixed the sw build in CI. |
Let's continue the discussion about the PDF renderer in issue #2879. |
OK. I decided to remove my objection to the recent changes in the PDF renderer. |
What about the useless OpenCL code? It's about time we removed it. |
@jbarlow83, are the latest changes in Tesseract's PDF renderer compatible with OCRmyPDF, or would they break it? |
@stweil The changes in the PDF renderer are compatible with OCRmyPDF and yield a slight improvement in text positioning on Evince. LGTM. I tested Tesseract commit 2b07505 which includes egorpugin's changes by examining visual results in Evince using both OCRmyPDF's wrapper around the Tesseract PDF renderer ( |
The next release will be 5.4.0. |
amitdo commented Mar 18, 2024 •
Done in #4220. |
Can you please release 5.4.0 in the next few days? |
That's my plan. |
on Apr 24
So... what's your current plan? |
Done now, 5.4.0 is available. Sorry for the delay. And as always, many thanks to all contributors and supporters who helped with issues, discussions and pull requests. Some things (pull requests) remain open for follow-up releases. |
The list of changes is generated automatically by GitHub, which only uses the information from pull requests. Therefore direct commits made by maintainers might be missing. I have now updated the release information with an initial comment, but feel free to suggest further improvements (or change the release notes as required). |
I think we need a 5.4.1 because of a regression with legacy models (issue #4257) which is now fixed in main. Is there anything else which should be included in the bug fix release? |
The GA cmake win64 build started to crash 4 days ago. Also, I was able to replicate the problem described in tesseract-ocr/tesstrain#394 on Windows with the 5.4.0 code:
git clone --depth 1 https://github.com/tesseract-ocr/tesstrain
cd tesstrain
mkdir data
unzip ocrd-testset.zip -d data/ocrd-testset-ground-truth
make training MODEL_NAME=ocrd-testset START_MODEL=ces TESSDATA=..../tessdata/best 2>&1 | tee training.log
I am checking whether the latest code also fixes this... |
I agree that it would be good to clarify these two issues, but I cannot reproduce them up to now. |
The GA cmake win64 failure seems to be a GA/Windows environment issue, so you do not need to wait for it (I already have a minimal working version, and I am now adding steps to find the real cause of the problem). |
Seems like |
The new release 5.4.1 which fixes the regression with legacy or mixed models is now available. Many thanks for bug reports, reviews and other contributions! |
5.4.1 has one issue. It uses the bundled googletest that is included in the source tree as a submodule. |
It's time for a new bug fix release. Is there anything urgent which should be included or fixed in the next release? |
I am in the process of creating cmake files with autotools (leptonica has it already). This is not critical, but it is taking more time than I expected... |
... and it currently breaks the autotools builds. |
This is an unrelated topic, as cmake generates tesseract.pc from another template (
@stweil, please go ahead with a new release. |
I'll try to fix the CI failures before tagging a new release. |
I've checked this issue
I propose to improve the memory management of the public APIs in tess v6 because it is an API break. In addition, the C API implementation will be updated from
to
So the C API will stay the same. So,
|
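As a rough illustration of how a C API can keep its signature while the C++ side changes how ownership is handled, here is a minimal sketch. All names (`DemoApi`, `DemoGetText`, `DemoDeleteText`) are hypothetical and are not Tesseract's actual API. The sketch only assumes that a C++ method would return an owning `std::string` instead of a raw `char *`, and that the C wrapper would copy the result into a buffer that the caller releases as before.

```cpp
#include <cstring>
#include <string>

// Hypothetical C++ API (not Tesseract's): returns an owning std::string,
// so C++ callers no longer have to free a raw pointer themselves.
class DemoApi {
public:
  std::string GetText() const { return "hello"; }
};

// Hypothetical C wrapper: the signature is unchanged for C callers.
// It copies the string into a heap buffer that the caller must release
// with DemoDeleteText().
extern "C" char *DemoGetText(const DemoApi *api) {
  const std::string text = api->GetText();
  char *copy = new char[text.size() + 1];
  // Copy the characters plus the terminating NUL byte.
  std::memcpy(copy, text.c_str(), text.size() + 1);
  return copy;
}

// Hypothetical matching release function for buffers from DemoGetText().
extern "C" void DemoDeleteText(const char *text) {
  delete[] text;
}
```

With a wrapper like this, only the C++ callers would see a changed return type; C callers would keep pairing the getter with the delete function.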
I just added #4336, and we can discuss and track API changes there. |
I suggest focusing on 5.x for 2022 at least.
That means we should not break the API (and ABI?). Use C++17, not C++20/C++23.