TypeError: 'PDFObjRef' object is not iterable #1004

corobin · 2024-07-10T00:03:30Z

After updating to version 20240706, extract_text() on a PDF throws TypeError: 'PDFObjRef' object is not iterable

This did not occur on the previous version, 20231228.

Python 3.12.4 (tags/v3.12.4:8e8a4ba, Jun  6 2024, 19:30:16) [MSC v.1940 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> from pdfminer.high_level import extract_text
>>> text = extract_text("Working.pdf")
>>> text = extract_text("Error.pdf")
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    text = extract_text(path)
  File "C:\Program Files\Python312\Lib\site-packages\pdfminer\high_level.py", line 169, in extract_text
    for page in PDFPage.get_pages(
  File "C:\Program Files\Python312\Lib\site-packages\pdfminer\pdfpage.py", line 171, in get_pages
    for (pageno, page) in enumerate(cls.create_pages(doc)):
  File "C:\Program Files\Python312\Lib\site-packages\pdfminer\pdfpage.py", line 127, in create_pages
    yield cls(document, objid, tree, next(page_labels))
  File "C:\Program Files\Python312\Lib\site-packages\pdfminer\pdfpage.py", line 63, in __init__
    mediabox_params: List[Any] = [
TypeError: 'PDFObjRef' object is not iterable
>>>

Working.pdf - a newly created blank page made with Acrobat

Error.pdf - downloaded; I cannot change how it is created. I deleted all visible text on the page, which did not appear to affect the error.


felixxm · 2024-07-10T09:39:00Z

We hit the same issue with next(high_level.extract_pages(pdf_page_path)) calls.

myhloli · 2024-07-23T15:52:22Z

Same error here: opendatalab/MinerU#198

dhdaines · 2024-07-31T19:46:33Z

Probably need to call resolve1 on self.attrs["MediaBox"] as well... it's indirect objects all the way down...

MarcoPeli · 2024-08-03T00:49:50Z

> Probably need to call resolve1 on self.attrs["MediaBox"] as well... it's indirect objects all the way down...

I had the same error; using resolve1 fixed it for me.
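
For readers following along, here is a minimal sketch of why dereferencing the MediaBox value itself avoids the TypeError. It deliberately does not import pdfminer.six: `ObjRef`, `resolve_fully`, `mediabox_unresolved`, and `mediabox_resolved` are hypothetical stand-ins written for illustration, and the assumption is that the failing code iterates over `attrs["MediaBox"]` while that value is still an indirect reference, as the traceback and the comments above suggest. It is not the actual patch.

```python
# Stand-ins for illustration only; these are NOT pdfminer.six's real classes.
class ObjRef:
    """Hypothetical stand-in for an indirect object reference (like PDFObjRef)."""

    def __init__(self, table, objid):
        self.table = table  # maps object id -> stored object
        self.objid = objid

    def resolve(self):
        return self.table[self.objid]


def resolve_fully(obj):
    """Follow references until a direct object is reached (the idea behind resolve1)."""
    while isinstance(obj, ObjRef):
        obj = obj.resolve()
    return obj


def mediabox_unresolved(attrs):
    # Iterates the MediaBox value directly; raises TypeError if it is a reference.
    return [resolve_fully(v) for v in attrs["MediaBox"]]


def mediabox_resolved(attrs):
    # Dereference the array itself first, then each of its elements.
    return [resolve_fully(v) for v in resolve_fully(attrs["MediaBox"])]


# A toy object table: object 7 is the MediaBox array, stored indirectly.
objects = {7: [0, 0, 612, 792]}
attrs = {"MediaBox": ObjRef(objects, 7)}

try:
    mediabox_unresolved(attrs)
except TypeError as exc:
    print("unresolved:", exc)

print("resolved:", mediabox_resolved(attrs))
```

In the toy version, the first call fails because a reference object has no `__iter__`, while the second call resolves the reference before iterating. With the real library, the equivalent step would presumably be calling resolve1 on self.attrs["MediaBox"], as suggested above.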


jsvine mentioned this issue Jul 14, 2024

Update version of pdfminer-six to 20240706 jsvine/pdfplumber#1166

Open

dotlambda mentioned this issue Jul 25, 2024

python312Packages.pdfminer-six: 20231228 -> 20240706 NixOS/nixpkgs#329409

Merged

13 tasks

dhdaines added a commit to dhdaines/pdfminer.six that referenced this issue Jul 31, 2024

fix: dereference MediaBox (fixes: pdfminer#1004)

ad101c1

dhdaines linked a pull request Jul 31, 2024 that will close this issue

Make sure to dereference MediaBox in /Pages #1027

Open

sarahec mentioned this issue Sep 4, 2024

build failure: python3Packages.pdf-plumber unit tests fail due to error in python3Packages.pdfminer-six NixOS/nixpkgs#339639

Closed

dotlambda added a commit to dotlambda/nixpkgs that referenced this issue Sep 5, 2024

python312Packages.pdfminer-six: fix `TypeError: 'PDFObjRef' object is not iterable`

5410e1e

This fixes upstream issue pdfminer/pdfminer.six#1004 and the build of python3Packages.pdfplumber.

dotlambda mentioned this issue Sep 5, 2024

python312Packages.pdfminer-six: fix TypeError: 'PDFObjRef' object is not iterable NixOS/nixpkgs#339919

Merged

13 tasks
