-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update requirement for six library: >=1.11,<2.0 #27
base: master
Are you sure you want to change the base?
Conversation
=1.11,<2.0 is the current Wagtail requirement for six
Codecov Report
@@ Coverage Diff @@
## master #27 +/- ##
==========================================
- Coverage 89.72% 87.67% -2.06%
==========================================
Files 15 15
Lines 146 146
==========================================
- Hits 131 128 -3
- Misses 15 18 +3
Continue to review full report at Codecov.
|
@khink Any suggestions for how to remedy this code coverage issue? Not sure how updating this dependency would have caused wagtail_textract to not pick up any text in your test document. Codecov says that the following lines were never even executed:
Could it be that the 10 second delay for Tesseract to work was not long enough in this case since it never even executed the As an aside, I looked at
Your note in this test file explains that 'CORRECT' and 'STAPLE' are the only two words that Tesseract DOES recognize, so I would have thought the assertion would be:
Your tests have obviously been working, so I'm sure I'm just not seeing things right. |
Hi Dan, Sorry it took some time to get back on this. It looks like you're right, that test does not test what it should, and it really doesn't prove much. I haven't been able to get the tests running myself yet due to dependency problems (new laptop). Not sure when i get around to fixing it. What would you propose to do? I'd be happy to get your PR in. |
If you would approve this PR despite the code coverage problem, then one easy, short-term solution to the code coverage problem might be to type up a few sentences, print and scan that, and upload it in place of the current document that is being OCRed. In my attempts at OCRing, I've found that Tesseract 3 does not do well at all in identifying words that are large and handwritten, but it does do pretty well on regular typewritten documents. I even OCRed some meeting minutes from the 1950s, and it did an acceptable job (though it would be nice to step up Tesseract 4 at some point; however, I realize that this would ideally happen in Textract itself since they're only at 3). If you do the new scan, change the assertions to be positive, and perhaps set the delay to a longer period to give time for the OCRing to complete, then I would think the testing issue would be resolved. |
Hi Dan, Thanks for your research. I'm reluctant to do it while tests for the PR are failing. That said, having no test is better than what we have now. I'm sorry i left it in this state. Would you be willing to make a quick fix to get tests to pass first? Perhaps you could replace the current handwritten test file with a typewritten document, which is properly recognized, so the test makes sense. Or if you decide to just test that the management command runs without error, or that Hope to hear from you. |
Hi Dan, See my reply above, how do you propose to proceed? |
=1.11,<2.0 is the current Wagtail requirement for six