-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Text is Not Rendered in Certain PDFs #348
Comments
Thank you for reporting this @TechD123 ! If you don't want to share the files publicly you can send them to us via mail if that is an option. Otherwise we can wait for another file which has this issue |
I've seen the same behavior, but can't share those PDFs publicly either. @andiwand @ViliusSutkus89 let me know if you are interested in taking a look. |
can you put it into the private testing repo? otherwise you can mail it to me @TomTasche |
If I could reproduce it, I could tell if the problem is in upstream pdf2htmlEX or if it's on our side |
@TomTasche tried those pdf's on pdf2htmlEX docker image available on docker hub and it gives the same result. Although that official pdf2htmlEX release docker image is 4 years old and using outdated Poppler and FontForge. I've inspected the HTML DOM, all the text is there, but hidden by CSS. I assume it's a font issue, will try to debug |
Since PDF is quite important for our users, I'd actually propose to set a bounty for this issue. @andiwand @ViliusSutkus89 I'd love to have a fix for this soon. |
pdf2htmlEX-Android conan is nearly there. Then I can focus on debugging upstream pdf2htmlEX |
Did some debugging, usually this error happens when fontforge errors out while trying to save a "malformed" font. Don't know how much of that malform is actually bad font and how much of that is bad parsing. Will see what I can do to workaround |
…ion-bump Upgrade pdf2htmlEX-Android to 0.18.25 to workaround issue #348
A fix for that is rolling out now, thanks a lot @ViliusSutkus89 ! 🎉 |
After the aforementioned 3.26, the next/latest version on F-Droid is now 3.31 which I just gave a spin (was on 3.19 the entire time before). Unfortunately, a similar issue has appeared, where certain PDFs will either open on a blank screen or render only seemingly random words or letters. I have attached a few for convenience, but the error rate seems far higher than 3.26 based on my gut feeling. I hope you'll give it another look :) |
True. Can reproduce on all three files |
OK, I've debugged the issue. When opening documents we first try to load them through OdfLoader, if that fails, only then we try pdf2htmlEX-Android or other loaders. For most of the pdf documents OdfLoader fails. But for these documents, OdfLoader doesn't fail, and since it doesn't fail, pdf2htmlEX isn't called. Yes, the output is totally broken when parsed by OdfLoader, but OdfLoader thinks that it's ok. I've also noticed that we haven't updated pdf2htmlEX and wvWare versions here, but it's not the cause of the issue. This issue will automagically go away once we solve #369 , which is blocked by opendocument-app/OpenDocument.core#387 . Will try to get on it this week |
@TomTasche should we just surround this by detecting if it is a PDF on the platform side? |
Thanks for the investigation! I found a (dirty) workaround for now in #377 and am rolling it out as we speak. @andiwand can you check why the core does not fail for these PDFs? I can't remember exactly, but I don't think that's "working as intended". |
(Note that my change does not fix the original issue reported here. I think that's an actual bug in the PDF parser? @ViliusSutkus89) |
@TomTasche , which one looks weird? Just tried v3.32 app-pro-release.apk from GitHub releases and all 4 files (verylong.pdf, long.pdf, short.pdf and the emailed yourdocument.pdf) look decent to me. |
Nice! I haven't tested the documents in this thread, because I assumed they were caused by a different issue. @TechD123 can you confirm this is resolved with the latest version? |
I think the current version of the core will try to open and translate all PDFs. We have no way of telling it not to try. I think the only way is to catch this upfront. |
Since updating from 3.19 to the recently published 3.26 (via F-Droid), certain PDFs don't render any text. As my main use case for this app is its PDF reading feature, I have reverted to the previous version.
I'll share a link to a sample PDF once I find one that's affected. So far I've only seen this on PDFs that would leak PII if I shared them here... :P
The text was updated successfully, but these errors were encountered: