Hi @kimyu92, Unfortunately PDFs put a lot of document information at the end of the file, so you usually need to scan the whole thing before starting. As soon as you call Perhaps pdfium is less greedy? I've not tested it for this. libvips will render a page at a time, so the actual rendering process shouldn't need that much memory.
This is a consequence of the way that libvips handles multipage images -- it represents them as a single very tall, thin image, with the pages joined together vertically (a "toilet-roll" image, sorry). If your PDF has pages that are all the same size (for example, it has no pages in landscape), then you can load the whole PDF in one go and loop over pages without reinitialisation. Sadly many PDFs are not like this, so to work for all PDFs, where each page can be a different size, you need to reinitialise. With a PDF where all pages are the same size you can do:
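As a rough sketch of the same-size case (not necessarily the code originally posted here): assuming the ruby-vips gem, `n: -1` asks `pdfload` for every page, and the `page-height` and `n-pages` metadata fields give the size of one page and the page count. The helper names `page_regions` and `each_pdf_page` are made up for illustration.

```ruby
# Crop rectangles [left, top, width, height], one per page, for a tall strip
# of equally sized pages stacked vertically. Pure Ruby, no libvips needed.
def page_regions(width, page_height, n_pages)
  (0...n_pages).map { |i| [0, i * page_height, width, page_height] }
end

# Hypothetical helper: yields each page of a PDF as its own image.
# Assumes ruby-vips is installed and that all pages are the same size.
def each_pdf_page(file_name)
  require "vips" # required here so page_regions works without the gem

  # n: -1 loads all pages as one tall image
  pdf = Vips::Image.pdfload(file_name, n: -1)
  regions = page_regions(pdf.width, pdf.get("page-height"), pdf.get("n-pages"))

  regions.each_with_index do |(left, top, width, height), index|
    yield pdf.crop(left, top, width, height), index
  end
end
```

Something like `each_pdf_page("input.pdf") { |page, i| page.write_to_file("page-#{i}.png") }` would then write one file per page, where `input.pdf` is a placeholder name. For PDFs with mixed page sizes this does not apply, and you would still call `pdfload` with `page:` once per page.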
Then you can use pyvips has |
Would it be possible to tweak the following example to stream the PDF page by page, without loading the whole PDF into memory?
Also, it seems weird that we have to initialize the object again to get to a particular page. Shouldn't there be an API like pdf.get_page(index), which is equivalent to
Vips::Image.pdfload(file_name, access: :sequential, page: page_index)
?