ValueError and KeyErrors in PDFDocument #573

tongbaojia · 2021-01-27T15:54:26Z

I ran into a ValueError and then a KeyError while trying to fix the bug.
I checked out the dev branch and reproduced the bug using: python pdf2txt.py xxx.pdf
I can't share the document right now because it is private and protected,
but the stack traces tell the story:

Traceback (most recent call last):
  File "pdf2txt.py", line 204, in <module>
    sys.exit(main())
  File "pdf2txt.py", line 198, in main
    outfp = extract_text(**vars(A))
  File "pdf2txt.py", line 66, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/high_level.py", line 83, in extract_text_to_fp
    caching=not disable_caching):
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfpage.py", line 128, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 572, in __init__
    self.read_xref_from(parser, pos, self.xrefs)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 826, in read_xref_from
    self.read_xref_from(parser, pos, xrefs)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 815, in read_xref_from
    xref.load(parser)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 233, in load
    or stream['Type'] is not LITERAL_XREF:
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdftypes.py", line 219, in __getitem__
    return self.attrs[name]
KeyError: 'Type'


Traceback (most recent call last):
  File "pdf2txt.py", line 204, in <module>
    sys.exit(main())
  File "pdf2txt.py", line 198, in main
    outfp = extract_text(**vars(A))
  File "pdf2txt.py", line 66, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/high_level.py", line 83, in extract_text_to_fp
    caching=not disable_caching):
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfpage.py", line 128, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 572, in __init__
    self.read_xref_from(parser, pos, self.xrefs)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 826, in read_xref_from
    self.read_xref_from(parser, pos, xrefs)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 815, in read_xref_from
    xref.load(parser)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 231, in load
    (_, stream) = parser.nextobject()
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/psparser.py", line 610, in nextobject
    self.do_keyword(pos, token)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfparser.py", line 72, in do_keyword
    ((_, objid), (_, genno)) = self.pop(2)
ValueError: not enough values to unpack (expected 2, got 1)

I will file a PR soon to fix this.
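
For readers without access to the document, the two failure modes in the tracebacks can be shown with plain Python objects. This is only a minimal sketch and not necessarily how the PR fixes it: it does not import pdfminer, and `is_xref_stream`, `pop_object_header`, `attrs`, and `stack` are hypothetical names standing in for the stream attribute lookup and the `self.pop(2)` unpacking seen above. The `(pos, value)` shape of the stack entries is assumed from the unpacking pattern in the traceback.

```python
# Stand-alone illustration of the two failure modes; no pdfminer import.
# All names here are hypothetical stand-ins, not pdfminer's API.


def is_xref_stream(attrs):
    """Return True only if a 'Type' entry exists and equals 'XRef'."""
    # attrs["Type"] raises KeyError when the key is missing;
    # dict.get returns None instead, so the check simply fails.
    return attrs.get("Type") == "XRef"


def pop_object_header(stack):
    """Pop (objid, genno) from the end of the stack, or None if too short."""
    # Unpacking two pairs from a one-element list raises
    # "ValueError: not enough values to unpack (expected 2, got 1)",
    # so check the length before unpacking.
    if len(stack) < 2:
        return None
    (_, objid), (_, genno) = stack[-2:]
    del stack[-2:]
    return objid, genno


if __name__ == "__main__":
    print(is_xref_stream({}))                    # missing 'Type' key
    print(is_xref_stream({"Type": "XRef"}))      # well-formed stream
    print(pop_object_header([(0, 7)]))           # only one operand
    print(pop_object_header([(0, 7), (4, 0)]))   # objid and genno present
```

Whether a malformed xref stream should be skipped, reported, or handled by a fallback is a design decision for the library; the sketch only shows where the guards would sit.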

pietermarsman · 2022-08-07T12:49:45Z

Fixed by #574

tongbaojia mentioned this issue Jan 27, 2021: Fix value error and key error #574 (Merged)

pietermarsman closed this as completed Aug 7, 2022
