pypdf.errors.PdfStreamError: Stream length not defined #1715

Closed
FelipeErmeson opened this issue Mar 15, 2023 · 6 comments · Fixed by #1717
Labels
is-robustness-issue From a users perspective, this is about robustness

Comments

FelipeErmeson commented Mar 15, 2023

I can read the pdf normally, but when I try to create a new one by adding the page, it gives an error.

Environment

print(pypdf.__version__)
3.5.2

Code + PDF

Here is the document to reproduce the error:
https://drive.google.com/file/d/15Y96AUD7sCoclk3_n-jLp_YOycaWp-KX/view?usp=sharing

from pypdf import PdfReader, PdfWriter

reader = PdfReader("cf danilo.pdf")
num_pages = len(reader.pages) 
for n in range(num_pages):
    writer = PdfWriter() 
    page = reader.pages[n] 
    writer.add_page(page)

Traceback

Traceback (most recent call last):
  File "/.../main.py", line 8, in <module>
    writer.add_page(page)
    ^^^^^^^^^^^^^^^^^^^^^
  File "/.../pypdf/pypdf/_writer.py", line 349, in add_page
    return self._add_page(page, list.append, excluded_keys)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../pypdf/pypdf/_writer.py", line 301, in _add_page
    page = cast("PageObject", page_org.clone(self, False, excluded_keys))
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../pypdf/pypdf/generic/_data_structures.py", line 182, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields)
  File "/.../pypdf/pypdf/generic/_data_structures.py", line 268, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/.../pypdf/pypdf/generic/_data_structures.py", line 182, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields)
  File "/.../pypdf/pypdf/generic/_data_structures.py", line 268, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/.../pypdf/pypdf/generic/_data_structures.py", line 182, in clone
    d__._clone(self, pdf_dest, force_duplicate, ignore_fields)
  File "/.../pypdf/pypdf/generic/_data_structures.py", line 268, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/.../pypdf/pypdf/generic/_base.py", line 273, in clone
    obj = self.get_object()
          ^^^^^^^^^^^^^^^^^
  File "/.../pypdf/pypdf/generic/_base.py", line 290, in get_object
    obj = self.pdf.get_object(self)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../pypdf/pypdf/_reader.py", line 1351, in get_object
    retval = read_object(self.stream, self)  # type: ignore
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../pypdf/pypdf/generic/_data_structures.py", line 1127, in read_object
    return DictionaryObject.read_from_stream(stream, pdf, forced_encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../pypdf/pypdf/generic/_data_structures.py", line 450, in read_from_stream
    raise PdfStreamError("Stream length not defined")
pypdf.errors.PdfStreamError: Stream length not defined
pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Mar 15, 2023
The field /Length is normally required, but Acrobat Reader and other readers are tolerant

closes py-pdf#1715
MartinThoma added the is-robustness-issue label Mar 15, 2023
@pubpub-zz
Collaborator

@MartinThoma so quick.
The fix is available for review

@FelipeErmeson
Your PDF does not actually conform to the PDF 1.7 specification: the /Length entry of a stream is normally mandatory.
Since Acrobat and other readers are tolerant, I've added some code for tolerance.
Be aware that warnings will be reported.
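
For readers wondering what "tolerance" can look like here, below is a minimal sketch of the general idea: use /Length when it is present, otherwise warn and scan forward to the endstream keyword. This is not the code from the fix; the function name `extract_stream_data` is hypothetical, and the sketch assumes a single stream object whose dictionary has no nested dictionaries and whose /Length, if present, is a direct integer rather than an indirect reference.

```python
import re
import warnings


def extract_stream_data(obj: bytes) -> bytes:
    """Return the payload of one PDF stream object (illustrative only)."""
    # Assumption: the first ">>" closes the stream dictionary (no nested dicts).
    dict_end = obj.index(b">>")
    header = obj[:dict_end]

    # The payload begins after the "stream" keyword and its end-of-line marker.
    data_start = obj.index(b"stream", dict_end) + len(b"stream")
    if obj[data_start:data_start + 2] == b"\r\n":
        data_start += 2
    elif obj[data_start:data_start + 1] == b"\n":
        data_start += 1

    # Normal case: the dictionary declares /Length as a direct integer.
    match = re.search(rb"/Length\s+(\d+)", header)
    if match:
        return obj[data_start:data_start + int(match.group(1))]

    # Tolerant case: no /Length, so warn and look for the closing keyword.
    warnings.warn("Stream length not defined; scanning for endstream")
    end = obj.find(b"endstream", data_start)
    if end == -1:
        raise ValueError("endstream keyword not found")
    data = obj[data_start:end]

    # Drop the single end-of-line marker that precedes "endstream", if any.
    if data.endswith(b"\r\n"):
        return data[:-2]
    if data.endswith((b"\n", b"\r")):
        return data[:-1]
    return data
```

Scanning for endstream is only a heuristic: binary stream data can itself contain those bytes, which is one reason a warning is appropriate when the fallback is used.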

MartinThoma pushed a commit that referenced this issue Mar 18, 2023
The field /Length is normally required, but Acrobat Reader and other readers are tolerant

Closes #1715
@MartinThoma
Member

@pubpub-zz Thank you 🚀

@MartinThoma
Member

The fix by @pubpub-zz was just merged and will be released today in pypdf > 3.5.2

@MartinThoma
Member

@FelipeErmeson Thanks for reporting it! If you want, I'll add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html

@FelipeErmeson
Author

Hi @MartinThoma. Glad to have helped. It would be an honor.

MartinThoma added a commit that referenced this issue Mar 19, 2023
@MartinThoma
Member

Done :-) It might take a few minutes until the docs are re-created.
