Update requirement for six library: >=1.11,<2.0 #27

DanielSwain · 2019-04-23T14:03:52Z

>=1.11,<2.0 is the current Wagtail requirement for six.
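
To make the range concrete, here is a minimal stdlib-only sketch of which six releases the specifier admits. The helper names `parse` and `in_range` are hypothetical, and the tuple comparison is only an approximation for plain numeric versions; pip itself applies the full PEP 440 rules.

```python
def parse(version):
    """Turn a plain numeric version such as '1.12.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, lower="1.11", upper="2.0"):
    """Return True if lower <= version < upper (numeric versions only)."""
    return parse(lower) <= parse(version) < parse(upper)


# Print whether each candidate release falls inside >=1.11,<2.0.
for candidate in ("1.10.0", "1.11.0", "1.12.0", "2.0.0"):
    print(candidate, in_range(candidate))
```

Under this sketch, 1.11.0 and 1.12.0 fall inside the range, while 1.10.0 and 2.0.0 fall outside it.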

codecov · 2019-04-23T14:10:20Z

Codecov Report

Merging #27 into master will decrease coverage by 2.05%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master      #27      +/-   ##
==========================================
- Coverage   89.72%   87.67%   -2.06%     
==========================================
  Files          15       15              
  Lines         146      146              
==========================================
- Hits          131      128       -3     
- Misses         15       18       +3

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/wagtail_textract/handlers.py | `61.9% <0%> (-14.29%)` | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 645b454...f83d134. Read the comment docs.

DanielSwain · 2019-04-23T16:36:01Z

@khink Any suggestions for how to remedy this code coverage issue? I'm not sure how updating this dependency would have caused wagtail_textract to not pick up any text in your test document. Codecov says that the following lines (numbered 29-31 in its report) were never executed:

if text:
    document.transcription = text.decode()
    document.save(transcribe=False)

Could it be that the 10-second delay for Tesseract to work was not long enough in this case, since the if text: statement was never even executed?

As an aside, I looked at test_management_command.py and don't understand why it has:

assert 'CORRECT' not in document.transcription
assert 'STAPLE' not in document.transcription

Your note in this test file explains that 'CORRECT' and 'STAPLE' are the only two words that Tesseract DOES recognize, so I would have thought the assertion would be:

assert 'CORRECT' in document.transcription
assert 'STAPLE' in document.transcription

Your tests have obviously been working, so I'm sure I'm just not seeing things right.
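
A small illustration of why the choice of assertion matters here, assuming `document.transcription` is an empty string when OCR produces nothing (an assumption; the field's default is not shown in this thread). The `SimpleNamespace` object is only a stand-in for the real document model.

```python
from types import SimpleNamespace

# Stand-in for a document whose OCR step produced no text at all.
document = SimpleNamespace(transcription="")

# The negative assertions pass even though nothing was transcribed.
assert 'CORRECT' not in document.transcription
assert 'STAPLE' not in document.transcription

# The positive form would catch the missing transcription.
positive_check_passes = 'CORRECT' in document.transcription
print(positive_check_passes)  # False
```

In other words, under that assumption the negative assertions cannot tell a failed transcription apart from a successful one.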

khink · 2019-05-03T07:13:03Z

Hi Dan,

Sorry it took some time to get back to you on this. It looks like you're right: that test does not test what it should, and it really doesn't prove much. I haven't been able to get the tests running myself yet due to dependency problems (new laptop). Not sure when I'll get around to fixing it. What would you propose to do? I'd be happy to get your PR in.

DanielSwain · 2019-05-13T12:56:37Z

If you would approve this PR despite the code coverage problem, then one easy, short-term solution might be to type up a few sentences, print and scan them, and upload the scan in place of the current document that is being OCRed. In my attempts at OCRing, I've found that Tesseract 3 does not do well at all in identifying words that are large and handwritten, but it does pretty well on regular typewritten documents. I even OCRed some meeting minutes from the 1950s, and it did an acceptable job (though it would be nice to step up to Tesseract 4 at some point; I realize this would ideally happen in Textract itself, since they're only at 3). If you do the new scan, change the assertions to be positive, and perhaps lengthen the delay to give the OCRing time to complete, then I would think the testing issue would be resolved.
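
One way to give the OCRing time to complete without hard-coding a longer sleep is to poll until the transcription appears or a timeout expires. This is a minimal sketch, not the project's test code: `wait_until`, its parameters, and the stand-in `document` are hypothetical names.

```python
import time
from types import SimpleNamespace


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a document that already has a transcription.
document = SimpleNamespace(transcription="some recognized text")
finished = wait_until(lambda: bool(document.transcription), timeout=5.0)
print(finished)  # True
```

In a real test the predicate would presumably need to reload the document from the database before checking it; the test can then fail with a clear message when `wait_until` returns False instead of silently skipping the transcription check.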

khink · 2019-05-15T07:22:15Z

Hi Dan,

Thanks for your research.

I'm reluctant to merge it while tests for the PR are failing. That said, having no test is better than what we have now. I'm sorry I left it in this state.

Would you be willing to make a quick fix to get tests to pass first? Perhaps you could replace the current handwritten test file with a typewritten document, which is properly recognized, so the test makes sense. Or if you decide to just test that the management command runs without error, or that document.transcription is not empty, that's also fine by me.
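
The weaker and stronger checks mentioned above could look roughly like the following. This is only a sketch under stated assumptions: `check_transcription` is a hypothetical helper, the sample text is made up, and `SimpleNamespace` stands in for the real document model.

```python
from types import SimpleNamespace


def check_transcription(document, expected_words=()):
    """Fail if the transcription is empty or lacks any expected word."""
    assert document.transcription, "transcription is empty"
    for word in expected_words:
        assert word in document.transcription, f"missing word: {word}"


# Stand-in for a document transcribed from a typewritten scan.
document = SimpleNamespace(transcription="Minutes of the monthly meeting")

# Weaker check: the transcription only has to be non-empty.
check_transcription(document)

# Stronger check: specific words from the typewritten page must be present.
check_transcription(document, expected_words=("Minutes", "meeting"))
```

Calling the helper with no expected words covers the "not empty" option; passing words from the new typewritten document covers the positive-assertion option.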

Hope to hear from you.

khink · 2021-02-04T08:01:17Z

Hi Dan,

See my reply above. How do you propose to proceed?

