ENH: Add level parameter to compress_content_streams #2044

Merged
merged 2 commits into from
Aug 2, 2023
6 changes: 5 additions & 1 deletion docs/user/file-size.md
@@ -1,4 +1,4 @@
-# Reduce PDF Size
+# Reduce PDF File Size

 There are multiple ways to reduce the size of a given PDF file. The easiest
 one is to remove content (e.g. images) or pages.
@@ -96,6 +96,10 @@ with open("out.pdf", "wb") as f:
     writer.write(f)
 ```

+`page.compress_content_streams` uses [`zlib.compress`](https://docs.python.org/3/library/zlib.html#zlib.compress) and supports the
+`level` parameter: `level=0` is no compression, `level=9` is the
+highest compression.
+
 Using this method, we have seen a reduction by 70% (from 11.8 MB to 3.5 MB)
 with a real PDF.
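
To make the `level` values described in the added documentation concrete, the sketch below calls `zlib.compress` directly on a synthetic byte string. The sample data is invented for illustration and only stands in for a page content stream; real savings depend on the PDF.

```python
import zlib

# Hypothetical stand-in for a page content stream: repetitive PDF operators.
sample = b"BT /F1 12 Tf 72 712 Td (Hello World) Tj ET\n" * 500

# Compress the same bytes at the levels named in the docs, plus the default (-1).
sizes = {level: len(zlib.compress(sample, level)) for level in (0, 1, 9, -1)}

for level, size in sizes.items():
    print(f"level={level:>2}: {size} bytes (input: {len(sample)} bytes)")

# Whatever the level, decompression restores the original bytes.
assert zlib.decompress(zlib.compress(sample, 9)) == sample
```

With `level=0` the output is slightly larger than the input, since zlib still adds its header and checksum around the stored data.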

4 changes: 2 additions & 2 deletions pypdf/_page.py
@@ -1763,7 +1763,7 @@ def scaleTo(self, width: float, height: float) -> None: # deprecated
         deprecation_with_replacement("scaleTo", "scale_to", "3.0.0")
         self.scale_to(width, height)

-    def compress_content_streams(self) -> None:
+    def compress_content_streams(self, level: int = -1) -> None:
         """
         Compress the size of this page by joining all content streams and
         applying a FlateDecode filter.
@@ -1773,7 +1773,7 @@ def compress_content_streams(self) -> None:
         """
         content = self.get_contents()
         if content is not None:
-            content_obj = content.flate_encode()
+            content_obj = content.flate_encode(level)
             try:
                 content.indirect_reference.pdf._objects[  # type: ignore
                     content.indirect_reference.idnum - 1  # type: ignore
5 changes: 3 additions & 2 deletions pypdf/filters.py
@@ -225,17 +225,18 @@ def _decode_png_prediction(data: str, columns: int, rowlength: int) -> bytes:
         return output.getvalue()

     @staticmethod
-    def encode(data: bytes) -> bytes:
+    def encode(data: bytes, level: int = -1) -> bytes:
         """
         Compress the input data using zlib.

         Args:
             data: The data to be compressed.
+            level: See https://docs.python.org/3/library/zlib.html#zlib.compress

         Returns:
             The compressed data.
         """
-        return zlib.compress(data)
+        return zlib.compress(data, level)


 class ASCIIHexDecode:
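
Because `level` defaults to `-1` at every layer of this change, callers that pass no argument should see unchanged output. A minimal sketch of that reasoning follows; the standalone `encode` function only mirrors the new signature for illustration and is not pypdf's `FlateDecode` class, and `payload` is made-up data.

```python
import zlib


def encode(data: bytes, level: int = -1) -> bytes:
    """Stand-in mirroring the new signature; not pypdf's actual class."""
    return zlib.compress(data, level)


# Hypothetical payload resembling a small content stream.
payload = b"q 1 0 0 1 0 0 cm /Im0 Do Q\n" * 200

# -1 is zlib's own default (Z_DEFAULT_COMPRESSION), so omitting the
# argument yields the same bytes as the old one-argument call.
assert encode(payload) == zlib.compress(payload)

# An explicit level changes the encoded size, never the decoded content.
for level in (0, 9):
    assert zlib.decompress(encode(payload, level)) == payload
```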
4 changes: 2 additions & 2 deletions pypdf/generic/_data_structures.py
@@ -880,7 +880,7 @@ def flateEncode(self) -> "EncodedStreamObject": # deprecated
         deprecation_with_replacement("flateEncode", "flate_encode", "3.0.0")
         return self.flate_encode()

-    def flate_encode(self) -> "EncodedStreamObject":
+    def flate_encode(self, level: int = -1) -> "EncodedStreamObject":
         from ..filters import FlateDecode

         if SA.FILTER in self:
@@ -909,7 +909,7 @@ def flate_encode(self) -> "EncodedStreamObject":
         retval[NameObject(SA.FILTER)] = f
         if parms is not None:
             retval[NameObject(SA.DECODE_PARMS)] = parms
-        retval._data = FlateDecode.encode(self._data)
+        retval._data = FlateDecode.encode(self._data, level)
         return retval

