Can't getData() from /Contents List #1086
Replies: 5 comments
-
Hello,
-
Unfortunately, I can't share these PDFs, and I'm not sure what created them either. I will see if I can find or create a PDF that exhibits the same behavior. The structure of the PDF seems to be multiple separate tables per page, and each of these tables is getting its own indirect object. This is partly why I am trying to use getData: I need to figure out which page contains a certain table (the tables vary in length, so pagination is not constant) and then figure out where on the page it is located.
-
I actually had this same error on this document: http://www.supremecourt.gov/opinions/14pdf/14-7955_aplc.pdf
Haven't spelunked to find out what's going on, but thought I'd share.
-
In your new PDF file, /Contents has changed from a single IndirectObject to a list of IndirectObjects, so you will have to modify the syntax to:
page = PdfReader(inpdf).pages[0]
text = page.getContents()[_n_].getData()  # where _n_ is the index of the IndirectObject you want within the list
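If you want all of the page's data rather than one indexed piece, a minimal sketch along these lines may help. It treats /Contents as either a single stream or a list of references, resolves each reference, and joins the decoded pieces in order (the PDF format treats a /Contents array as one stream split into parts). The helper names `resolve`, `stream_data` and `contents_data` are hypothetical, not library functions. The sketch only assumes that references expose `getObject()` and streams expose `getData()` (the PyPDF2 1.x names used in this thread) or the newer snake_case `get_object()` / `get_data()`, and that the data comes back as bytes.

```python
def resolve(obj):
    """Follow an indirect reference if obj looks like one; otherwise return obj."""
    for name in ("get_object", "getObject"):
        method = getattr(obj, name, None)
        if callable(method):
            return method()
    return obj


def stream_data(obj):
    """Return the decoded bytes of a stream-like object."""
    for name in ("get_data", "getData"):
        method = getattr(obj, name, None)
        if callable(method):
            return method()
    raise TypeError(f"{type(obj).__name__} has no get_data()/getData() method")


def contents_data(contents):
    """Return the data behind /Contents, whether it is one stream or a list."""
    contents = resolve(contents)
    if isinstance(contents, (list, tuple)):
        # A list of references: resolve each entry, then read its stream.
        parts = [stream_data(resolve(item)) for item in contents]
    else:
        parts = [stream_data(contents)]
    # Whitespace between the pieces keeps operators from running together.
    return b"\n".join(parts)
```

With the snippet above, this would presumably be called as `data = contents_data(page.getContents())`, so the same call covers both the old and the new PDFs.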
-
I am facing the same issue... seeking help
-
I'm trying to dig deep into some PDFs by calling getData directly on part of a page (I am then parsing that data to find coordinates for a bit of text).
This worked for me in the past with essentially:
but with my new PDFs, I am getting an error like this:
"AttributeError: 'ArrayObject' object has no attribute 'getData'"
Digging in, it looks like my old PDF was structured like this (print page) with a single IndirectObject in the contents.
Then page.getContents() returns:
while my new PDF is structured like this with a list of IndirectObjects in the contents:
then page.getContents() returns:
How do I get at the underlying data of /Contents? Going after the pieces of the list with page.getContents()[0] just returns the name of the object, and I can't use getData() on that. I can't tell if this is a bug (caused by having a list as the contents) or if I am missing some feature.
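For context on the coordinate parsing mentioned above, here is a rough sketch of the kind of thing I mean, using only the standard library. It scans decoded content-stream bytes for `Tm` operators (`a b c d e f Tm` sets the text matrix, and `e`, `f` are its translation part) and collects the `(e, f)` pairs. The function name `text_matrix_origins` is made up for illustration. It is a naive regex scan: it ignores `Td`/`TD`/`T*` moves and the current transformation matrix, and it can be fooled by numbers inside string literals, so treat the results as a starting point only.

```python
import re

# One PDF numeric token: optional sign, then digits with an optional fraction.
NUM = rb"[-+]?(?:\d+\.?\d*|\.\d+)"

# Six numeric operands followed by the Tm operator.
TM_PATTERN = re.compile(rb"\s+".join([rb"(" + NUM + rb")"] * 6) + rb"\s+Tm\b")


def text_matrix_origins(data):
    """Return the (e, f) translation of every Tm operator found in data."""
    origins = []
    for match in TM_PATTERN.finditer(data):
        e = float(match.group(5).decode("ascii"))
        f = float(match.group(6).decode("ascii"))
        origins.append((e, f))
    return origins
```

Running it over the bytes behind /Contents gives a list of candidate positions to compare against the text that follows each `Tm`.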