
struct.error: 'l' format requires -2147483648 <= number <= 2147483647 #186

Closed
thehayat opened this issue Sep 27, 2018 · 1 comment · Fixed by #352

@thehayat

This error occurs for thousands of PDFs. The full traceback is below.

Traceback (most recent call last):
  File "/mnt/c/Users/aq4'july/Desktop/CleanTextAPI_withNewPdfAnalyser/PdfAnalyzer.py", line 108, in GetTextFromPdf
    for page in PDFPage.get_pages(pdfFile):
  File "/usr/local/lib/python3.5/dist-packages/pdfminer/pdfpage.py", line 129, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "/usr/local/lib/python3.5/dist-packages/pdfminer/pdfdocument.py", line 577, in __init__
    self._initialize_password(password)
  File "/usr/local/lib/python3.5/dist-packages/pdfminer/pdfdocument.py", line 603, in _initialize_password
    handler = factory(docid, param, password)
  File "/usr/local/lib/python3.5/dist-packages/pdfminer/pdfdocument.py", line 303, in __init__
    self.init()
  File "/usr/local/lib/python3.5/dist-packages/pdfminer/pdfdocument.py", line 310, in init
    self.init_key()
  File "/usr/local/lib/python3.5/dist-packages/pdfminer/pdfdocument.py", line 323, in init_key
    self.key = self.authenticate(self.password)
  File "/usr/local/lib/python3.5/dist-packages/pdfminer/pdfdocument.py", line 372, in authenticate
    key = self.authenticate_user_password(password)
  File "/usr/local/lib/python3.5/dist-packages/pdfminer/pdfdocument.py", line 378, in authenticate_user_password
    key = self.compute_encryption_key(password)
  File "/usr/local/lib/python3.5/dist-packages/pdfminer/pdfdocument.py", line 357, in compute_encryption_key
    hash.update(struct.pack('<l', self.p))  # 4
struct.error: 'l' format requires -2147483648 <= number <= 2147483647

However, these PDFs were parsed successfully before, and their text was extracted. Here is the sample PDF
Note: I am using pdfminer.six==20170720 and Python 3.5.2.

@pietermarsman
Member

pietermarsman commented Jan 6, 2020

I get the same result with the current version of pdfminer.six.

I did some research to understand this error; this is what I found.

The PDF reference (Table 3.19) says the following about the P key in the encryption dictionary:

(Required) A set of flags specifying which operations are permitted when the document is opened with user access (see Table 3.20).

And:

The value of the P entry is an unsigned 32-bit integer containing a set of flags specifying which access permissions should be granted when the document is opened with user access. Table 3.20 shows the meanings of these flags. Bit positions within the flag word are numbered from 1 (low-order) to 32 (high-order). A 1 bit in any position enables the corresponding access permission. Which bits are meaningful, and in some cases how they are interpreted, depends on the security handler’s revision number (specified in the encryption dictionary’s R entry).
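To make the 1-based bit numbering in the quoted passage concrete, here is a minimal sketch of reading individual flags from a P value. The function names and the `PERMISSION_BITS` mapping are hypothetical and not part of pdfminer.six; the bit assignments (3 = print, 4 = modify, 5 = copy, 6 = annotate) are my reading of Table 3.20 for revision 2 and should be checked against the reference.

```python
# Illustrative labels for a few revision 2 flags (assumed from Table 3.20).
# Bit numbers are 1-based: bit 1 is the low-order bit.
PERMISSION_BITS = {"print": 3, "modify": 4, "copy": 5, "annotate": 6}


def has_permission(p: int, bit: int) -> bool:
    """Return True if the 1-based `bit` is set in the 32-bit flag word `p`."""
    flags = p & 0xFFFFFFFF  # view p as an unsigned 32-bit word, even if negative
    return bool((flags >> (bit - 1)) & 1)


def decode_permissions(p: int) -> dict:
    """Map each illustrative permission name to whether its bit is set."""
    return {name: has_permission(p, bit) for name, bit in PERMISSION_BITS.items()}


if __name__ == "__main__":
    # 0xFFFFFFD4 is just a sample flag word with the high-order bits set.
    print(decode_permissions(0xFFFFFFD4))
```

Masking with `0xFFFFFFFF` means the same flags come out whether the value was stored as a large positive integer or as a negative one.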

The struct.pack('<l', self.p) call creates a bytes object holding a signed 4-byte (32-bit) long integer in little-endian order, i.e. with the least significant byte first.
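A small sketch of that behaviour, using only the standard `struct` module. The helper name `pack_signed_le` is hypothetical and exists only to show where the exception comes from: any value above 2147483647 cannot be represented by the signed `'l'` format.

```python
import struct


def pack_signed_le(value: int):
    """Pack `value` as a little-endian signed 32-bit integer.

    Returns None instead of raising when the value is out of range, purely
    so the failure case is easy to show side by side with the working ones.
    """
    try:
        return struct.pack('<l', value)
    except struct.error:
        return None


if __name__ == "__main__":
    print(pack_signed_le(1))        # least significant byte comes first
    print(pack_signed_le(-1))       # all bits set in two's complement
    print(pack_signed_le(2 ** 31))  # out of range for a signed 32-bit integer
```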

And another paragraph about the encoding of the P key:

Note: PDF integer objects are represented internally in signed twos-complement form. Since all the reserved high-order flag bits in the encryption dictionary’s P value are required to be 1, the value must be specified as a negative integer. For example, assuming revision 2 of the security handler, the value -44 permits printing and copying but disallows modifying the contents and annotations.

And thus, the struct.pack command erroneously uses a signed long ('l') instead of an unsigned long ('L'), which fails for any P value above 2147483647.
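One way to handle both encodings of P is to normalise the value to an unsigned 32-bit integer before packing. The following is a minimal sketch under that assumption, not the code that was merged into pdfminer.six; `to_uint32` and `pack_permissions` are hypothetical names.

```python
import struct


def to_uint32(value: int) -> int:
    """Reinterpret a possibly negative integer as an unsigned 32-bit value."""
    if not isinstance(value, int):
        raise TypeError("only ints can be converted with two's complement")
    if value < 0:
        value += 1 << 32  # two's complement: -44 becomes 4294967252
    return value


def pack_permissions(p: int) -> bytes:
    """Pack the P value as a little-endian unsigned 32-bit integer."""
    return struct.pack('<L', to_uint32(p))


if __name__ == "__main__":
    # The negative and the large positive form yield the same four bytes.
    print(pack_permissions(-44))
    print(pack_permissions(4294967252))
```

For values that already fit in a signed long, this produces the same bytes as the original `struct.pack('<l', ...)` call, so the input to the hash is unchanged for PDFs that parsed correctly before.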

pietermarsman added a commit that referenced this issue Jan 7, 2020
… file trailer, as unsigned long (#352)

Fixes #186 

* Treat the permissions (the /P entry) as unsigned long, fix #186

* handle negative values for p

* Extract function for resolving a two's complement

* Add test for issue #352

* Add line to CHANGELOG.md

* Only ints can be converted to a uint using two's-complement method

* Standardize import style; multiple imports from same module on one line

Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>