TypeError: argument of type 'PDFObjRef' is not iterable #935
Comments
@jsvine, looking forward to your help!
Hi @caolf, just thought I'd add some info: this seems to have come up before in #316, although in this case the exception seems to be coming from the underlying pdfminer.six library. pdfminer/pdfminer.six#495 seems to be the same bug.
Thanks for reporting, @caolf. Could you please share the PDF that has the issue too?
@samkit-jain I'm sorry, this is an internal document and cannot be made public!
@caolf Okay, see if you can redact the sensitive information and attach it here. Without it, it will be a bit difficult to properly debug and fix (if it is a pdfplumber issue).
Hi @samkit-jain, there is an example PDF from pdfminer/pdfminer.six#495 (comment) which raises the same exception, if you're interested: https://github.com/pdfminer/pdfminer.six/files/11768084/pdfminer_testpart.pdf I don't really know anything about PDF internals, but the issue seems to involve https://github.com/pdfminer/pdfminer.six/blob/master/pdfminer/pdfparser.py#L88 where the stream dictionary holds an indirect reference for DecodeParms: dic={'Length': 4065, 'Length1': 8964, 'Filter': /'FlateDecode', 'DecodeParms': <PDFObjRef:49>}
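To illustrate why a dictionary like the one above can end in this TypeError, here is a minimal sketch that does not use pdfminer at all. ObjRef, resolve, has_predictor_unresolved and has_predictor_resolved are hypothetical stand-ins, and the "Predictor" membership test is an assumption about the kind of check a stream decoder performs on DecodeParms, not a copy of pdfminer's code.

```python
class ObjRef:
    """Hypothetical stand-in for an indirect reference such as <PDFObjRef:49>."""

    def __init__(self, target):
        self.target = target

    def resolve(self):
        return self.target


def resolve(value):
    """Follow indirect references until a concrete value is reached."""
    while isinstance(value, ObjRef):
        value = value.resolve()
    return value


def has_predictor_unresolved(stream_dict):
    # Assumed failure mode: the membership test runs on the reference itself.
    # ObjRef defines neither __contains__ nor __iter__, so `in` raises TypeError.
    params = stream_dict.get("DecodeParms", {})
    return "Predictor" in params


def has_predictor_resolved(stream_dict):
    # Resolving first turns the reference into the dict it points to.
    params = resolve(stream_dict.get("DecodeParms", {}))
    return "Predictor" in params


# Illustrative stream dictionary; the Predictor value is made up for the example.
stream_dict = {
    "Length": 4065,
    "Filter": "FlateDecode",
    "DecodeParms": ObjRef({"Predictor": 12}),
}

try:
    has_predictor_unresolved(stream_dict)
except TypeError as exc:
    print("unresolved lookup failed:", exc)

print("resolved lookup:", has_predictor_resolved(stream_dict))
```

If this is what happens, the fix belongs in the library: resolve the reference before inspecting it, which is consistent with the exception surfacing inside pdfminer rather than pdfplumber.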
Thanks for the PDF, @cmdlineluser. I'll see if there's something that we can do.
Also fixed by pdfminer/pdfminer.six#906!
Thanks for noting, @dhdaines!
Describe the bug
Calling extract_tables(table_settings=table_settings) on page 3 raises TypeError: argument of type 'PDFObjRef' is not iterable. Pages 1 and 2 work fine.
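Since only some pages fail, one way to narrow this down is to try every page and record which ones raise. The helper below is a rough sketch, not the reporter's code: find_failing_pages is a hypothetical name, and it only assumes page objects that expose extract_tables(table_settings=...), as pdfplumber pages do.

```python
def find_failing_pages(pages, table_settings=None):
    """Return {1-based page number: error message} for pages that raise TypeError."""
    failures = {}
    for number, page in enumerate(pages, start=1):
        try:
            page.extract_tables(table_settings=table_settings)
        except TypeError as exc:
            # Keep going so that every failing page is reported, not just the first.
            failures[number] = str(exc)
    return failures
```

With pdfplumber this could be called as find_failing_pages(pdf.pages, table_settings) inside a `with pdfplumber.open("example.pdf") as pdf:` block, where the file name is a placeholder.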
Code to reproduce the problem
PDF file
Please attach any PDFs necessary to reproduce the problem.
If you need to redact text in a sensitive PDF, you can run it through JoshData/pdf-redactor.
Screenshots
pdfplumberlib.py:293:
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfplumber/page.py:300: in extract_tables
tables = self.find_tables(tset)
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfplumber/page.py:294: in find_tables
return TableFinder(self, tset).tables
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfplumber/table.py:570: in __init__
self.edges = self.get_edges()
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfplumber/table.py:600: in get_edges
words = self.page.extract_words(**(settings.text_settings or {}))
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfplumber/page.py:356: in extract_words
return utils.extract_words(self.chars, **kwargs)
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfplumber/container.py:50: in chars
return self.objects.get("char", [])
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfplumber/page.py:215: in objects
self._objects: Dict[str, T_obj_list] = self.parse_objects()
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfplumber/page.py:275: in parse_objects
for obj in self.iter_layout_objects(self.layout._objs):
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfplumber/page.py:161: in layout
interpreter.process_page(self.page_obj)
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfminer/pdfinterp.py:997: in process_page
self.render_contents(page.resources, page.contents, ctm=ctm)
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfminer/pdfinterp.py:1014: in render_contents
self.init_resources(resources)
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfminer/pdfinterp.py:384: in init_resources
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfminer/pdfinterp.py:234: in get_font
font = self.get_font(None, subspec)
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfminer/pdfinterp.py:225: in get_font
font = PDFCIDFont(self, spec)
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfminer/pdffont.py:1072: in __init__
ttf = TrueTypeFont(self.basefont, BytesIO(self.fontfile.get_data()))
/Users/caolf/Library/Caches/pypoetry/virtualenvs/python-demo-8KG8_SfQ-py3.11/lib/python3.11/site-packages/pdfminer/pdftypes.py:396: in get_data
self.decode()
self = <PDFStream(119): raw=64251, {'Length': 64251, 'Filter': /'FlateDecode', 'DecodeParms': <PDFObjRef:133>, 'Length1': 214528}>
E TypeError: argument of type 'PDFObjRef' is not iterable
Environment
looking forward to your help!
Thanks