-
Notifications
You must be signed in to change notification settings - Fork 647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read HTML string and generated PDF file in chunks. #949
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
When the HTML content and generated PDF files get quite large (how large is too large depends on your system, OS, config and available resources), trying to read all of the content into memory can lead to `Errno::EINVAL` errors like `Invalid argument @ io_fread` and `Invalid argument @ io_write`. Instead of reading this content entirely into memory, this content should be read in chunks to save memory usage. After some benchmarking, a chunk size of 1MB was picked (`1024 * 1024`). Here are the benchmarks comparing different methods and chunk sizes for different content sizes: ``` 13027836 bytes 13.03 MBs user system total real write: 0.000767 0.004443 0.005210 ( 0.005312) each_char: 5.756789 0.032231 5.789020 ( 5.797378) each_byte: 8.997680 0.067377 9.065057 ( 9.179755) StringIO 1 KB: 0.004029 0.006966 0.010995 ( 0.011648) StringIO 1 MB: 0.016100 0.007118 0.023218 ( 0.023509) StringIO 10 MB: 0.003347 0.006924 0.010271 ( 0.010334) StringIO 100 MB: 0.000456 0.003758 0.004214 ( 0.007080) StringIO 1 GB: 0.000468 0.003787 0.004255 ( 0.005037) 706583272 bytes 0.71 GBs user system total real write: 0.001035 0.285726 0.286761 ( 0.324529) each_char: 362.444086 1.820033 364.264119 (365.362415) each_byte: 548.788409 3.254867 552.043276 (553.390843) StringIO 1 KB: 0.310588 0.331768 0.642356 ( 0.697581) StringIO 1 MB: 0.302101 0.325285 0.627386 ( 0.671933) StringIO 10 MB: 0.254845 0.294017 0.548862 ( 0.895430) StringIO 100 MB: 0.471879 0.429933 0.901812 ( 1.181456) StringIO 1 GB: 0.000471 0.260011 0.260482 ( 0.653977) 5577825775 bytes 5.58 GBs user system total real write: ERROR ERROR ERROR ERROR each_char: 2926.215017 38.658114 2964.873131 (3008.319599) each_byte: 4305.082576 35.090730 4340.173306 (4363.944091) StringIO 1 KB: 4.145908 3.962275 8.108183 ( 9.490059) StringIO 1 MB: 3.741062 2.779802 6.520864 ( 7.423770) StringIO 10 MB: 2.916272 2.553926 5.470198 ( 6.271349) StringIO 100 MB: 4.262794 3.007702 7.270496 ( 10.986725) StringIO 1 GB: 2.063459 4.572225 6.635684 ( 9.212933) ``` You can see with the 5.58 GB content size, using `write` didn't even complete. Instead, I received a `Errno::EINVAL` error. This allows significantly large PDFs to be generated. Additionally, instead of just throwing a cryptic `Errno::EINVAL Invalid argument @ io_fread` error, I added a `rescue` that logs an error with a helpful description indicating if the HTML content or PDF file is too large.
joshuapinter
force-pushed
the
read_content_in_chunks
branch
from
October 26, 2020 02:12
e994a4e
to
7685921
Compare
unixmonkey
approved these changes
Oct 30, 2020
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice. Thank you for this, and all the details in your PR, including benchmarks.
Thank you thank you. 🙏 |
Thanks for this excellent work! |
@Roy-Mao You're very welcome! Is this something that you ran into as well? |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When the HTML content and generated PDF files get too large (how large is "too large" depends on your system, OS, config and available resources), trying to read all of the content into memory can lead to
Errno::EINVAL
errors likeInvalid argument @ io_fread
andInvalid argument @ io_write
.Initially, I thought this was a
wkhtmltopdf
issue so I posted an Issue there instead and quickly discovered the error was happening inwicked_pdf
prior to running thewkhtmltopdf
command.Instead of reading this content entirely into memory, this content should be read in chunks to save memory usage.
After some benchmarking, a chunk size of 1MB was picked (
1024 * 1024
). Here are the benchmarks comparing different methods and chunk sizes for different content sizes:You can see with the 5.58 GB content size, using
write
didn't even complete. Instead, I received aErrno::EINVAL
error.This allows significantly larger PDFs to be generated.
Additionally, instead of just throwing a cryptic
Errno::EINVAL Invalid argument @ io_fread
error, I added arescue
that logs an error with a helpful description indicating if the HTML content or PDF file is too large.