Add `wicked_pdf_url_base64`. #947

joshuapinter · 2020-10-21T22:40:15Z

We're using this in production to solve our problem. Looking to get some feedback from @mileszs before finalizing this properly and, hopefully, getting it merged to help others in similar situations.

Using URLs to reference images can cause a lot of problems with wkhtmltopdf, particularly when used in the header and footer. It produces errors such as "Too many open files" as well as buffer/stack overflows. These issues can be seen here, among other places:

- Large pdf generation header and footer fails #288
- *** buffer overflow detected ***: wkhtmltopdf terminated wkhtmltopdf/wkhtmltopdf#1819
- QT crashes when callin wkhtmltopdf with large number of inputs wkhtmltopdf/wkhtmltopdf#3019
- Buffer Overflow with large PDF wkhtmltopdf/wkhtmltopdf#2093
- Wkhtmltopdf exited with status 134 *** buffer overflow detected *** wkhtmltopdf/wkhtmltopdf#4762
- Too many open files issue wkhtmltopdf/wkhtmltopdf#3081
- Crash with html footers and headers on huge pdfs. Increase count of open files per user not possible for my case. wkhtmltopdf/wkhtmltopdf#4296

As described here, one of the key solutions to avoiding this issue is providing the image encoded as base64. This helper method takes a URL of an image, opens the URL, reads the image data and encodes it to base64.

If the URL cannot be opened (`OpenURI::HTTPError` is raised), it will log a warning and return `nil`.
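At its core, the helper turns image bytes plus a MIME type into a `data:` URI, so wkhtmltopdf gets the image inline instead of fetching it. The following is a minimal, stdlib-only sketch of just that step, assuming the bytes have already been fetched; `base64_data_uri` is a hypothetical name, not part of wicked_pdf.

```ruby
require 'base64'

# Hypothetical helper (not part of wicked_pdf): wraps raw image bytes and their
# MIME type in a data URI suitable for an <img src="..."> attribute.
def base64_data_uri(bytes, content_type)
  # strict_encode64 emits a single line with no "\n", so nothing needs stripping.
  encoded = Base64.strict_encode64(bytes)
  "data:#{content_type};base64,#{encoded}"
end

# The 8-byte PNG signature stands in for a real image file.
png_signature = "\x89PNG\r\n\x1A\n".b
puts base64_data_uri(png_signature, 'image/png')
# prints data:image/png;base64,iVBORw0KGgo=
```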

TODOs:

- [ ] Better placement (since this is not technically part of the asset pipeline)?
- [ ] Better naming?
- [ ] Tests.

unixmonkey · 2020-10-27T14:08:30Z

lib/wicked_pdf/wicked_pdf_helper/assets.rb

```diff
+      # Using `image_tag` with URLs when generating PDFs (specifically large PDFs with lots of pages) can cause buffer/stack overflows.
+      #
+      def wicked_pdf_url_base64(url)
+        asset = URI.parse(url).open
```


Does read_from_uri work here instead? This project has been cautioned against using URI#open for security reasons.

Let me try it out. I originally just had `open()`, and Rubocop cautioned against it in favour of `URI.parse`. Hang on...

Okay, using read_from_uri won't work because it just returns a String and we need the proper response in order to get the content_type.

I was able to get it to work using:

```ruby
response = Net::HTTP.get_response(URI(url))
base64 = Base64.encode64(response.body).gsub(/\s+/, '')
"data:#{response.content_type};base64,#{Rack::Utils.escape(base64)}"
```

Does that work?

P.S. We're still on Rails 4.2, but this made me open an issue in our own codebase to replace all uses of `URI.parse`.

Okay, updated the code and also had to change how failed responses were handled.

Note: I wanted to set up a guard like:

```ruby
if !response.is_a?(Net::HTTPSuccess)
  Rails.logger.warn("[wicked_pdf] #{response.code} #{response.message}: #{url}")
  return nil
end
```

But your Rubocop config wanted me to change it to:

```ruby
unless response.is_a?(Net::HTTPSuccess)
  Rails.logger.warn("[wicked_pdf] #{response.code} #{response.message}: #{url}")
  return nil
end
```

And I despise that with a passion (we have that cop changed in our Rubocop config), so I went with the if/else instead, as it's much clearer.
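Putting the pieces of this thread together, a standalone sketch of the helper with the if/else shape described above might look like the following. It is not the merged wicked_pdf code: `Rails.logger` is swapped for a stdlib `Logger`, `Rack::Utils.escape` for `URI.encode_www_form_component` (which performs the same form-style escaping), and the `fetch:` keyword is a hypothetical hook added only so the method can be exercised without a network.

```ruby
require 'base64'
require 'logger'
require 'net/http'
require 'uri'

# Assumption: a stdlib Logger replaces Rails.logger so the sketch runs outside Rails.
PDF_LOGGER = Logger.new($stderr)

# Sketch of the helper using if/else rather than a guard clause.
# `fetch:` is hypothetical (not part of wicked_pdf) and exists only for testing.
def wicked_pdf_url_base64(url, fetch: ->(uri) { Net::HTTP.get_response(uri) })
  response = fetch.call(URI(url))

  if response.is_a?(Net::HTTPSuccess)
    # Same result as Base64.encode64(...).gsub(/\s+/, '')
    base64 = Base64.strict_encode64(response.body)
    # Form-style escaping, standing in for Rack::Utils.escape
    "data:#{response.content_type};base64,#{URI.encode_www_form_component(base64)}"
  else
    # Any non-2xx response (redirects included) is logged and yields nil
    PDF_LOGGER.warn("[wicked_pdf] #{response.code} #{response.message}: #{url}")
    nil
  end
end
```

Because `Net::HTTP.get_response` does not follow redirects, a URL that answers with a 301 or 302 would be logged and return `nil` under this sketch. In a real Rails helper the `fetch:` keyword would simply be dropped.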

Using URLs to reference images can cause a lot of problems with wkhtmltopdf, particularly when used in the header and footer. It produces errors such as "Too many open files" as well as buffer/stack overflows. A good solution to this is providing the image encoded as base64.

This helper method takes a URL of an image, opens the URL, reads the image data and encodes it to base64.

If the request does not return a successful response (`response.is_a?(Net::HTTPSuccess) == false`), it will log a warning and return `nil`. We could optionally make this raise, but that should be a project-wide change.

TODO:

- [ ] Better placement (since this is not technically part of the asset pipeline)?
- [ ] Better naming?
- [ ] Tests.

unixmonkey

Thanks for this!

joshuapinter · 2020-10-30T16:42:34Z

👍 Thanks for a great library.

joshuapinter mentioned this pull request Oct 22, 2020

Buffer Overflow with large PDF wkhtmltopdf/wkhtmltopdf#2093

Open

joshuapinter force-pushed the image_url_to_base64_helper branch from 5115057 to 1dad7bc on October 26, 2020 02:11

joshuapinter marked this pull request as ready for review October 26, 2020 02:21

joshuapinter marked this pull request as draft October 26, 2020 02:21

unixmonkey reviewed Oct 27, 2020


joshuapinter force-pushed the image_url_to_base64_helper branch from 1dad7bc to 113e247 on October 27, 2020 17:08

unixmonkey added the hacktoberfest-accepted label Oct 30, 2020

unixmonkey marked this pull request as ready for review October 30, 2020 16:32

unixmonkey approved these changes Oct 30, 2020


unixmonkey merged commit 60b6c40 into mileszs:master Oct 30, 2020

joshuapinter deleted the image_url_to_base64_helper branch October 30, 2020 16:42
