ValueError and KeyErrors in PDFDocument #573

Closed
tongbaojia opened this issue Jan 27, 2021 · 1 comment

Comments

@tongbaojia
Contributor

  • I ran into a ValueError and then a KeyError while trying to fix the bug.
  • I checked out the dev branch and reproduced the bug with: python pdf2txt.py xxx.pdf.
    I can't share the document right now because it is private and protected.
  • The stack traces tell the story:
Traceback (most recent call last):
  File "pdf2txt.py", line 204, in <module>
    sys.exit(main())
  File "pdf2txt.py", line 198, in main
    outfp = extract_text(**vars(A))
  File "pdf2txt.py", line 66, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/high_level.py", line 83, in extract_text_to_fp
    caching=not disable_caching):
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfpage.py", line 128, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 572, in __init__
    self.read_xref_from(parser, pos, self.xrefs)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 826, in read_xref_from
    self.read_xref_from(parser, pos, xrefs)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 815, in read_xref_from
    xref.load(parser)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 233, in load
    or stream['Type'] is not LITERAL_XREF:
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdftypes.py", line 219, in __getitem__
    return self.attrs[name]
KeyError: 'Type'
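The first trace fails in the `__getitem__` from pdftypes.py, which simply returns `self.attrs[name]`. So when the object read as an xref stream has no `/Type` entry in its dictionary, the check on line 233 raises a bare `KeyError` instead of reporting an invalid xref. A minimal sketch of the difference follows; `FakeStream`, `InvalidXRef` and the `LITERAL_XREF` sentinel are hypothetical stand-ins, not pdfminer.six's real classes, and the `.get()` lookup is one possible defensive pattern rather than the confirmed fix.

```python
# Sketch only: FakeStream, InvalidXRef and LITERAL_XREF are hypothetical
# stand-ins, not pdfminer.six's real classes or constants.


class InvalidXRef(Exception):
    """Stand-in for the parser's own 'no valid xref' error."""


# Sentinel for the interned /XRef name; the traced code compares by identity.
LITERAL_XREF = object()


class FakeStream:
    """Holds only a stream's dictionary, like the attrs in the trace."""

    def __init__(self, attrs):
        self.attrs = attrs

    def __getitem__(self, name):
        # Same as the traced __getitem__: a missing key raises KeyError.
        return self.attrs[name]

    def get(self, name, default=None):
        # Defensive lookup: a missing key yields `default` instead.
        return self.attrs.get(name, default)


def check_xref_stream_strict(stream):
    # Shape of the failing line: KeyError escapes if /Type is absent.
    if stream['Type'] is not LITERAL_XREF:
        raise InvalidXRef('Invalid PDF stream spec.')


def check_xref_stream_safe(stream):
    # A missing /Type becomes None, fails the identity check, and the
    # caller gets the parser's own error instead of a bare KeyError.
    if stream.get('Type') is not LITERAL_XREF:
        raise InvalidXRef('Invalid PDF stream spec.')


no_type = FakeStream({'Length': 42})
for check in (check_xref_stream_strict, check_xref_stream_safe):
    try:
        check(no_type)
    except (KeyError, InvalidXRef) as exc:
        print(check.__name__, '->', type(exc).__name__)
# check_xref_stream_strict -> KeyError
# check_xref_stream_safe -> InvalidXRef
```

With the defensive lookup, a malformed stream surfaces as the parser's own error, which calling code can catch and handle like any other broken xref.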


Traceback (most recent call last):
  File "pdf2txt.py", line 204, in <module>
    sys.exit(main())
  File "pdf2txt.py", line 198, in main
    outfp = extract_text(**vars(A))
  File "pdf2txt.py", line 66, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/high_level.py", line 83, in extract_text_to_fp
    caching=not disable_caching):
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfpage.py", line 128, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 572, in __init__
    self.read_xref_from(parser, pos, self.xrefs)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 826, in read_xref_from
    self.read_xref_from(parser, pos, xrefs)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 815, in read_xref_from
    xref.load(parser)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfdocument.py", line 231, in load
    (_, stream) = parser.nextobject()
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/psparser.py", line 610, in nextobject
    self.do_keyword(pos, token)
  File "/Users/baojiatong/Work/pdfminer.six/pdfminer/pdfparser.py", line 72, in do_keyword
    ((_, objid), (_, genno)) = self.pop(2)
ValueError: not enough values to unpack (expected 2, got 1)
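The second trace fails in `do_keyword`, where `((_, objid), (_, genno)) = self.pop(2)` expects two operands on the stack. That is the shape of an indirect reference such as `12 0 R`; presumably (the trace does not show the token) the parser met a reference keyword with only one operand in front of it, and `pop(2)` returned a single item. A minimal sketch of the failure and of a depth check before unpacking follows; `ToyParser` and `ObjRef` are hypothetical stand-ins, not pdfminer.six's real `PDFParser` or `PDFObjRef`, and skipping the malformed reference is an assumed recovery strategy.

```python
# Sketch only: ToyParser and ObjRef are hypothetical stand-ins, not
# pdfminer.six's real PDFParser or PDFObjRef.
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjRef:
    """An indirect reference such as '12 0 R'."""
    objid: int
    genno: int


class ToyParser:
    """Operand stack of (position, value) pairs."""

    def __init__(self):
        self.curstack = []

    def push(self, item):
        self.curstack.append(item)

    def pop(self, n):
        # Slice-based pop: returns *up to* n items, so with one operand
        # on the stack it quietly returns a one-element list.
        objs = self.curstack[-n:]
        self.curstack[-n:] = []
        return objs

    def do_keyword_r_unsafe(self, pos):
        # Unpacking raises ValueError when fewer than two operands remain.
        ((_, objid), (_, genno)) = self.pop(2)
        self.push((pos, ObjRef(int(objid), int(genno))))

    def do_keyword_r_safe(self, pos):
        # Check the depth first; a stray reference keyword is ignored
        # instead of aborting the whole document load.
        if len(self.curstack) < 2:
            return
        ((_, objid), (_, genno)) = self.pop(2)
        self.push((pos, ObjRef(int(objid), int(genno))))


good = ToyParser()
good.push((0, 12))
good.push((3, 0))
good.do_keyword_r_safe(5)
print(good.curstack)  # [(5, ObjRef(objid=12, genno=0))]

truncated = ToyParser()
truncated.push((0, 12))
truncated.do_keyword_r_safe(3)  # no exception; (0, 12) stays on the stack
try:
    truncated.do_keyword_r_unsafe(3)
except ValueError as exc:
    print(exc)  # not enough values to unpack (expected 2, got 1)
```

The unsafe variant reproduces the exact message from the trace, which supports the reading that the stack held only one operand at that point.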

I will file a PR soon to fix this.

@pietermarsman
Member

Fixed by #574