Read HTML string and generated PDF file in chunks. #949

Merged 1 commit on Oct 30, 2020

Commits on Oct 26, 2020

  1. Read HTML string and generated PDF file in chunks.

    When the HTML content or the generated PDF file gets very large (how large is too large depends on your system, OS, config, and available resources), trying to read all of the content into memory at once can lead to `Errno::EINVAL` errors like `Invalid argument @ io_fread` and `Invalid argument @ io_write`.
    
    Instead of being read entirely into memory in one go, this content should be read in chunks to reduce memory usage.
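    
    As a minimal sketch of the idea (not the exact code in this commit), a string can be wrapped in a `StringIO` and copied to any IO object one chunk at a time. The helper names `write_in_chunks` and `read_in_chunks` are hypothetical, and `CHUNK_SIZE` is assumed to be the 1 MB value discussed in this message:
    
    ```ruby
    require 'stringio'

    # Assumed chunk size: 1 MB, matching the value chosen in this commit message.
    CHUNK_SIZE = 1024 * 1024

    # Hypothetical helper: writes `string` to `io` in pieces of `chunk_size`
    # bytes and returns the total number of bytes written.
    def write_in_chunks(string, io, chunk_size = CHUNK_SIZE)
      source = StringIO.new(string)
      written = 0
      # StringIO#read(length) returns nil once the end of the string is reached.
      while (chunk = source.read(chunk_size))
        written += io.write(chunk)
      end
      written
    end

    # Hypothetical helper: reads everything from `io` in pieces of `chunk_size`
    # bytes and returns the result as one binary string.
    def read_in_chunks(io, chunk_size = CHUNK_SIZE)
      result = String.new
      while (chunk = io.read(chunk_size))
        result << chunk
      end
      result
    end
    ```
    
    With a file, this would look like `File.open(path, 'wb') { |f| write_in_chunks(html, f) }`, so no single `write` call ever receives the whole string.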
    
    After some benchmarking, a chunk size of 1 MB was picked (`1024 * 1024`). Here are the benchmarks comparing different methods and chunk sizes for different content sizes:
    
    ```
    13027836 bytes
    13.03 MBs
                               user     system      total        real
    write:                 0.000767   0.004443   0.005210 (  0.005312)
    each_char:             5.756789   0.032231   5.789020 (  5.797378)
    each_byte:             8.997680   0.067377   9.065057 (  9.179755)
    StringIO 1 KB:         0.004029   0.006966   0.010995 (  0.011648)
    StringIO 1 MB:         0.016100   0.007118   0.023218 (  0.023509)
    StringIO 10 MB:        0.003347   0.006924   0.010271 (  0.010334)
    StringIO 100 MB:       0.000456   0.003758   0.004214 (  0.007080)
    StringIO 1 GB:         0.000468   0.003787   0.004255 (  0.005037)
    
    706583272 bytes
    0.71 GBs
                               user     system      total        real
    write:                 0.001035   0.285726   0.286761 (  0.324529)
    each_char:           362.444086   1.820033 364.264119 (365.362415)
    each_byte:           548.788409   3.254867 552.043276 (553.390843)
    StringIO 1 KB:         0.310588   0.331768   0.642356 (  0.697581)
    StringIO 1 MB:         0.302101   0.325285   0.627386 (  0.671933)
    StringIO 10 MB:        0.254845   0.294017   0.548862 (  0.895430)
    StringIO 100 MB:       0.471879   0.429933   0.901812 (  1.181456)
    StringIO 1 GB:         0.000471   0.260011   0.260482 (  0.653977)
    
    5577825775 bytes
    5.58 GBs
                                user     system       total         real
    write:                     ERROR      ERROR       ERROR        ERROR
    each_char:           2926.215017  38.658114 2964.873131 (3008.319599)
    each_byte:           4305.082576  35.090730 4340.173306 (4363.944091)
    StringIO 1 KB:          4.145908   3.962275    8.108183 (   9.490059)
    StringIO 1 MB:          3.741062   2.779802    6.520864 (   7.423770)
    StringIO 10 MB:         2.916272   2.553926    5.470198 (   6.271349)
    StringIO 100 MB:        4.262794   3.007702    7.270496 (  10.986725)
    StringIO 1 GB:          2.063459   4.572225    6.635684 (   9.212933)
    ```
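    
    The script that produced these numbers is not part of this message. As a rough sketch of how a comparison like this could be set up with Ruby's `Benchmark` module, assuming a `Tempfile` as the write target (the `chunked_write` and `compare_writes` helpers are hypothetical, and the `each_char` and `each_byte` rows are left out for brevity):
    
    ```ruby
    require 'benchmark'
    require 'stringio'
    require 'tempfile'

    # Hypothetical helper: copies `content` into `file` in pieces of `chunk_size` bytes.
    def chunked_write(content, file, chunk_size)
      source = StringIO.new(content)
      while (chunk = source.read(chunk_size))
        file.write(chunk)
      end
    end

    # Hypothetical harness: times one plain `write` against chunked writes.
    # `chunk_sizes` maps a label to a size in bytes, e.g. { '1 MB' => 1024 * 1024 }.
    # Returns the list of Benchmark::Tms results, one per report.
    def compare_writes(content, chunk_sizes)
      Benchmark.bm(20) do |bm|
        bm.report('write:') do
          Tempfile.create('bench') { |f| f.binmode; f.write(content) }
        end
        chunk_sizes.each do |label, size|
          bm.report("StringIO #{label}:") do
            Tempfile.create('bench') { |f| f.binmode; chunked_write(content, f, size) }
          end
        end
      end
    end
    ```
    
    For example, `compare_writes('x' * 13_000_000, { '1 KB' => 1024, '1 MB' => 1024 * 1024 })` prints a table in the same `user system total real` layout as above. The timings will differ by machine.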
    
    As you can see, with the 5.58 GB content size, using `write` didn't even complete. Instead, I received an `Errno::EINVAL` error.
    
    Reading and writing in chunks allows significantly larger PDFs to be generated.
    
    Additionally, instead of just raising a cryptic `Errno::EINVAL Invalid argument @ io_fread` error, I added a `rescue` that logs an error with a helpful description indicating whether the HTML content or the PDF file is too large.
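    
    A minimal sketch of that kind of `rescue`, assuming a standard `Logger`. The `with_size_hint` helper and the message wording are hypothetical, and re-raising after logging is an assumption rather than something stated above:
    
    ```ruby
    require 'logger'

    # Hypothetical wrapper: runs the block and, if it fails with Errno::EINVAL,
    # logs a hint naming what was probably too large (`what`) before re-raising.
    def with_size_hint(logger, what)
      yield
    rescue Errno::EINVAL => e
      logger.error("#{what} may be too large to process on this system (#{e.message})")
      raise # Assumption: the original error is still propagated to the caller.
    end
    ```
    
    A caller would wrap each I/O step separately, e.g. `with_size_hint(logger, 'HTML content') { ... }` around the write and `with_size_hint(logger, 'PDF file') { ... }` around the read, so the log says which side hit the limit.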
    joshuapinter committed Oct 26, 2020
    7685921