Reading incorrect version/revision of a PDF #30

JonasEklundh opened this issue Sep 9, 2024 · 0 comments

This took a long time to debug. I am creating a PDF that embeds another PDF. After reading that PDF and saving the new file, the imported PDF showed incorrect information.

It turns out that this PDF, created by a customer in InDesign, has several "EOF" markers in the file. A PDF viewer shows the correct data, but tcpdi reads the file incorrectly. I do not know how the original PDF came to have multiple EOF markers or how that mechanism works; I only know it was created with InDesign. I could verify the behaviour by opening the PDF in a text editor, removing one of the EOF "parts" and saving the file: the saved file then showed the wrong data.
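For context: the PDF format allows incremental updates, where a new revision is appended to the end of an existing file and each revision ends with its own `%%EOF` marker. That is one plausible explanation for the multiple markers here, though it has not been confirmed for this particular file. A minimal Python sketch for checking a file before importing it; the function names and the sample bytes are hypothetical, and the count is only a heuristic because the same bytes could also occur inside stream data:

```python
import re

# Each revision of an incrementally updated PDF ends with this marker.
EOF_MARKER = re.compile(rb"%%EOF")


def count_eof_markers(data: bytes) -> int:
    """Count %%EOF markers in the raw bytes of a PDF (heuristic)."""
    return len(EOF_MARKER.findall(data))


def looks_incrementally_updated(data: bytes) -> bool:
    """True when the file appears to hold more than one revision."""
    return count_eof_markers(data) > 1


# Hypothetical, heavily simplified stand-in for a two-revision file.
sample = (
    b"%PDF-1.4\n1 0 obj\n(old price)\nendobj\n"
    b"startxref\n9\n%%EOF\n"
    b"1 0 obj\n(new price)\nendobj\n"
    b"startxref\n60\n%%EOF\n"
)

if __name__ == "__main__":
    print(count_eof_markers(sample), looks_incrementally_updated(sample))
```

For a real file, pass the result of `open(path, "rb").read()` instead of `sample`.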

The discrepancy seems to come from the difference between how a PDF viewer reads and displays the file and how tcpdi reads and imports it. I can of course let my customers know about this, but I would also like to make sure that tcpdi behaves as expected when such files occur. Is this a known problem?

Example files: below is a PDF file that has the price "9 500 kronor inklusive moms" ("9 500 kronor including VAT") in the second square:
Avtal_Nyanslutning_Hoor_Maglehill_Fiber_Privat_9_500-2.pdf

When I open this file in a reader, it displays correctly. But if I open it in a text editor, remove one part delimited by EOF, and save it, this is how it is shown in a PDF viewer:

Avtal_Nyanslutning_Hoor_Maglehill_Fiber_Privat_9_500-2 copy.pdf

I haven't otherwise edited the file; I only removed one part delimited by EOF.

The problem is that if I read the first file (the original) with tcpdi, the imported result looks like the second file, presumably because of a discrepancy in how tcpdi handles EOF.
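To illustrate the presumed discrepancy: a PDF reader is expected to start from the end of the file and follow the last `startxref` entry, so the most recently appended revision wins. A parser that stops at an earlier `startxref`/`%%EOF` pair would see older content. Whether tcpdi actually does this is an assumption, not something verified here. A small Python sketch with hypothetical function names and made-up sample bytes:

```python
import re
from typing import List, Optional

# A revision trailer ends with: startxref <byte offset> %%EOF
STARTXREF = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def startxref_offsets(data: bytes) -> List[int]:
    """Return every startxref offset in file order (oldest revision first)."""
    return [int(m.group(1)) for m in STARTXREF.finditer(data)]


def active_startxref(data: bytes) -> Optional[int]:
    """Offset a reader should use: the one from the last revision."""
    offsets = startxref_offsets(data)
    return offsets[-1] if offsets else None


# Hypothetical, heavily simplified two-revision file.
two_revisions = (
    b"%PDF-1.4\n1 0 obj\n(old price)\nendobj\n"
    b"startxref\n9\n%%EOF\n"
    b"1 0 obj\n(new price)\nendobj\n"
    b"startxref\n60\n%%EOF\n"
)

if __name__ == "__main__":
    print(startxref_offsets(two_revisions), active_startxref(two_revisions))
```

Comparing the first and last offsets on the original file would show whether the older revision is the one matching the wrong output.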
