TypeError: argument of type 'PDFObjRef' is not iterable #1120

ibecav · 2024-04-11T18:29:28Z

Describe the bug

As with several others (for example, #935), I have encountered this error when using the module. I hit it using an exact copy of your example script for extracting form values (https://github.com/jsvine/pdfplumber?tab=readme-ov-file#extracting-form-values), but run against the example PDF I am enclosing.

Have you tried repairing the PDF?

Yes, the results were (I had to laugh because yes, it really is a pdf file and it certainly renders correctly on screen):

Traceback (most recent call last):
  File "C:\Users\PowellCh\Desktop\RProjs\production_hai\clogged_pdf_toilet.py", line 4, in <module>
    pdf = pdfplumber.open("example.pdf", repair=True)
  File "C:\Users\PowellCh\AppData\Roaming\Python\Python312\site-packages\pdfplumber\pdf.py", line 95, in open
    return cls(
  File "C:\Users\PowellCh\AppData\Roaming\Python\Python312\site-packages\pdfplumber\pdf.py", line 45, in __init__
    self.doc = PDFDocument(PDFParser(stream), password=password or "")
  File "C:\Users\PowellCh\AppData\Roaming\Python\Python312\site-packages\pdfminer\pdfdocument.py", line 752, in __init__
    raise PDFSyntaxError("No /Root object! - Is this really a PDF?")
pdfminer.pdfparser.PDFSyntaxError: No /Root object! - Is this really a PDF?

Code to reproduce the problem

As stated above, this is a simple copy of one of your examples, run against the example PDF.

import pdfplumber
from pdfplumber.utils.pdfinternals import resolve_and_decode, resolve

pdf = pdfplumber.open("example.pdf", repair=True)

def parse_field_helper(form_data, field, prefix=None):
    """ appends any PDF AcroForm field/value pairs in `field` to provided `form_data` list

        if `field` has child fields, those will be parsed recursively.
    """
    resolved_field = field.resolve()
    field_name = '.'.join(filter(lambda x: x, [prefix, resolve_and_decode(resolved_field.get("T"))]))
    if "Kids" in resolved_field:
        for kid_field in resolved_field["Kids"]:
            parse_field_helper(form_data, kid_field, prefix=field_name)
    if "T" in resolved_field or "TU" in resolved_field:
        # "T" is a field-name, but it's sometimes absent.
        # "TU" is the "alternate field name" and is often more human-readable
        # your PDF may have one, the other, or both.
        alternate_field_name  = resolve_and_decode(resolved_field.get("TU")) if resolved_field.get("TU") else None
        field_value = resolve_and_decode(resolved_field["V"]) if 'V' in resolved_field else None
        form_data.append([field_name, alternate_field_name, field_value])


form_data = []
fields = resolve(pdf.doc.catalog["AcroForm"])["Fields"]
for field in fields:
    parse_field_helper(form_data, field)

PDF file

FWIW it's a fillable form pdf created by the CDC and saved locally after filling.

example.pdf

Expected behavior

I expected it to work the same way your example code does. The code does work on other pdf files that aren't of this type.

Actual behavior

Traceback (most recent call last):
  File "C:\Users\PowellCh\Desktop\RProjs\production_hai\clogged_pdf_toilet.py", line 27, in <module>
    for field in fields:
TypeError: 'PDFObjRef' object is not iterable

Screenshots

I can't think of any that would be helpful but please inform if otherwise

Environment

pdfplumber version: [0.11.0]
Python version: [Python 3.12.2 (tags/v3.12.2:6abddd9, Feb 6 2024, 21:26:36) [MSC v.1937 64 bit (AMD64)] on win32]
OS: [Windows - although FWIW same error on a Mac]

Additional context

My apologies in advance if I forgot any details in this issue. I'm new to Python and your excellent module but have experience in other languages. My current hypothesis, based on reading other issues, is that there is something nonstandard about the PDF itself, but I am hopeful there is a workaround.

jeremybmerrill · 2024-04-19T15:05:22Z

Looks like calling resolve() on fields fixes the problem.

Replace fields = resolve(pdf.doc.catalog["AcroForm"])["Fields"] with

fields = resolve(resolve(pdf.doc.catalog["AcroForm"])["Fields"])

and it looks like it works. I think we could modify the example code to do this.
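
To illustrate why the extra resolve() matters, here is a minimal sketch that runs without pdfplumber or a PDF. FakeObjRef and the local resolve() are hypothetical stand-ins that only loosely mimic pdfminer's PDFObjRef and pdfplumber's resolve helper; they are assumptions for illustration, not the libraries' actual code. The idea is that a PDF may store "Fields" either inline as a list or as an indirect reference to a list, and only the second layout breaks a bare for loop.

```python
class FakeObjRef:
    """Hypothetical stand-in for an indirect PDF object reference."""

    def __init__(self, target):
        self._target = target

    def resolve(self):
        return self._target


def resolve(obj):
    """Follow references until a concrete object is reached; pass anything else through."""
    while isinstance(obj, FakeObjRef):
        obj = obj.resolve()
    return obj


# Layout A: "Fields" is stored inline as a list of field references.
inline_form = {"Fields": [FakeObjRef({"T": "name"})]}

# Layout B: "Fields" is itself a reference to that list.
indirect_form = {"Fields": FakeObjRef([FakeObjRef({"T": "name"})])}


def field_names(acroform):
    # Resolving both the form and its "Fields" entry handles either layout.
    fields = resolve(resolve(acroform)["Fields"])
    return [resolve(field)["T"] for field in fields]


names_inline = field_names(FakeObjRef(inline_form))
names_indirect = field_names(FakeObjRef(indirect_form))

# Without the extra resolve(), layout B fails the same way as in the report.
try:
    for _ in indirect_form["Fields"]:
        pass
    unresolved_error = None
except TypeError as exc:
    unresolved_error = str(exc)
```

Because resolve() returns non-reference objects unchanged, calling it on an already-concrete list is harmless, which would explain why the doubled call works on both kinds of PDF.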

ibecav · 2024-04-19T15:19:37Z

Thank you. I'll try this fix in a little bit. As to changing the example, I'll leave that to your discretion. I'm by no means an expert, but my understanding is that PDFs can be fickle and, as I noted, your example does work on some PDFs as is.

ibecav · 2024-04-19T15:52:31Z

Thank you, that does indeed seem to resolve the error.

jsvine · 2024-04-19T16:03:20Z

Thanks @jeremybmerrill for the solution, and @ibecav for flagging. I've now updated the example code in the README.

jeremybmerrill · 2024-04-19T16:07:10Z

great! I'm by no means an expert either -- all standards-compliant PDFs are alike, but all weird PDFs are weird in their own unique way -- but I do know that calling resolve() at every opportunity seems to make problems disappear.

ibecav added the bug label Apr 11, 2024

jsvine added a commit that referenced this issue Apr 19, 2024

Update form-parsing code in README, per #1120

2e9819c

h/t @jeremybmerrill

jsvine closed this as completed Apr 19, 2024
