Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved memory usage #1032

Merged
merged 1 commit into from
Nov 6, 2021
Merged

Improved memory usage #1032

merged 1 commit into from
Nov 6, 2021

Conversation

Masterwow3
Copy link
Contributor

We have been using pdf-lib for the last 4 months to generate our PDFs on the WEB and on the server. I noticed that for a PDF with 180 pages and high resolution PNGs, the RAM usage increases a lot (~6GB).

My code change ensures that after a page.drawImage you can directly call an image.embed().

This decodes and encodes the image immediately and remove the decoded image from RAM immediately.
With this change I have the same performance but a maximum RAM usage of 220MB.

Usage without these changes:
image
Max RAM usage:
image

Usage with these changes:
image
Max RAM usage:
image

Copy link
Owner

@Hopding Hopding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @Masterwow3! Thanks for investigating this issue and opening a PR! I suspect this is a problem lots of people have run into, so it'll be great to release a fix 😃.

I like the general approach to took to eliminate the memory leak. However, I would like to change a few of the implementation details before merging:

  • Please use undefined instead of null. The null value isn't used in the pdf-lib codebase.
  • Instead of clearing the PngEmbedder.image property, we should clear the PDFImage.embedder property after the image has been embedded.
  • Please add unit tests to ensure that (1) the property is cleared after the first embed, and (2) a single PDFDocument object with an embedded image can still be updated and saved multiple times (unless there's already a test for this).

Let me know if you have any questions!

@ritterzk
Copy link

ritterzk commented Nov 2, 2021

Hello @Hopding,

i am pleased that the amendment is accepted. I have adapted the commit as you requested.

The unneeded field "alreadyEmbedded" has been removed.
(https://github.com/Hopding/pdf-lib/pull/1032/files#diff-2e6b05e8490d2c6cc7ca78f78e8fa58bcaef509fbc97754da0808bca0f9d47ffL38)

Additionally, I made the embed() method parellel safe.
(https://github.com/Hopding/pdf-lib/pull/1032/files#diff-2e6b05e8490d2c6cc7ca78f78e8fa58bcaef509fbc97754da0808bca0f9d47ffR131)

Copy link
Owner

@Hopding Hopding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks @Masterwow3!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants