PDF only embeds first 19 external images, then blanks. No error present #1085

Open
jathayde opened this issue Dec 9, 2023 · 3 comments

@jathayde

jathayde commented Dec 9, 2023

Issue description

When generating a PDF of many cloth patches (a collector resource), the cover image and the first 18 items render correctly. The remaining items, which all use the same partial as the first 18, render their partial/CSS/HTML but not the image. Images are included using a method based on this comment: #36 (comment)

My full method:

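  require 'open-uri'

  # Fetches a remote image and returns it as a base64 data URI.
  def embed_remote_image(url, content_type)
    asset = URI.open(url, "r:UTF-8", &:read)
    # Base64.encode64 adds a line feed every 60 characters; strip them.
    base64 = Base64.encode64(asset.to_s).gsub(/\s+/, "")
    "data:#{content_type};base64,#{Rack::Utils.escape(base64)}"
  rescue OpenURI::HTTPError => e
    # Only HTTP errors are rescued. They are logged at debug level, and the
    # method then returns the logger call's result instead of a data URI.
    if e.message == '404 Not Found'
      Rails.logger.debug "Missing file"
    elsif e.message == '403 Forbidden'
      Rails.logger.debug "Forbidden file"
    else
      Rails.logger.debug { "HTTP Error: #{e.message}" }
    end
  end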

The PDF build is triggered from a Sidekiq process. The cover is built separately, using the same image method, and the two are merged together at the end. Images are on S3 with CloudFront in front of it, and they render fine on the web version of the page. The header and footer are also included PDF files. ulimit is unlimited on both dev (macOS) and prod (Ubuntu 20.04.6 LTS "Focal").

Expected or desired behavior

All of the images would render correctly.

System specifications

wicked_pdf gem version (output of cat Gemfile.lock | grep wicked_pdf): 2.6.4

wkhtmltopdf version (output of wkhtmltopdf --version): 0.12.6 (with patched qt)

wkhtmltopdf provider gem and version if one is used: wkhtmltopdf-binary 0.12.6.6 (uses Heroku version 2.12.6.0 in production, with similar results)

platform/distribution and version (e.g. Windows 10 / Ubuntu 16.04 / Heroku cedar): This is running on macOS Sonoma, Ruby 3.2.2 (2023-03-30 arm64-darwin22) and Rails 7.1.2. Production is Ubuntu 20.04.6 LTS, running via Dokku instances.

@dmitry

dmitry commented Jan 26, 2024

Have you tried rendering the same image more than 18 times? Have you tried debugging what the URI.open read returns?
I tried rendering base64 images, and it handles 500 easily.
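
One way to act on that suggestion is to log what each fetch actually returns, in order, so a cutoff around the 19th image becomes visible. The following is only a sketch: `audit_image_fetches` and its `fetcher:` argument are hypothetical names, not part of wicked_pdf or the method posted above. The fetcher is passed in so the same check can run against real URLs or a stub.

```ruby
require 'logger'

# Hypothetical diagnostic helper (not part of wicked_pdf or the original method).
# `fetcher` is any callable that takes a URL and returns the response body;
# in the app it could be: ->(url) { URI.open(url, "rb", &:read) }
def audit_image_fetches(urls, fetcher:, logger: Logger.new($stdout))
  urls.each_with_index.map do |url, index|
    position = index + 1
    error = nil
    body =
      begin
        fetcher.call(url)
      rescue StandardError => e
        # Record the exception instead of letting it stop the audit.
        error = "#{e.class}: #{e.message}"
        nil
      end

    # Treat nil, non-string, and zero-length results as failures.
    bytes = body.is_a?(String) ? body.bytesize : 0
    error ||= 'empty or non-string result' if bytes.zero?
    logger.warn("image #{position} #{url}: #{error}") if error

    { position: position, url: url, bytes: bytes, ok: bytes.positive?, error: error }
  end
end
```

Running this inside the background job with the same URL list as the PDF would show whether the fetches themselves start failing or returning empty bodies at a particular position.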

@unixmonkey
Collaborator

I suspect your issue is that the image has to load from a remote source multiple times, which takes longer than the timeout for fetching assets, so wkhtmltopdf gives up.

You can try adjusting the timeout, or using the window_status setting. Though, if the same images are repeated, I'd suggest caching them so they don't have to be re-downloaded every time.

Maybe something like this (untested, but I'm sure you'll see where I'm going):

def embed_remote_image(url, content_type)
  # Setup data store to memoize assets already downloaded.
  @assets ||= {}

  # Return early from cache if already cached.
  return @assets[url] if @assets[url].present?

  asset = URI.open(url, "r:UTF-8", &:read)
  base64 = Base64.encode64(asset.to_s).gsub(/\s+/, "")
  result = "data:#{content_type};base64,#{Rack::Utils.escape(base64)}"

  # Cache { url: result } to @assets hash for later requests to the same URL.
  @assets[url] = result

  result
rescue OpenURI::HTTPError => e
  if e.message == '404 Not Found'
    Rails.logger.debug "Missing file"
  elsif e.message == '403 Forbidden'
    Rails.logger.debug "Forbidden file"
  else
    Rails.logger.debug { "HTTP Error: #{e.message}" }
  end
end
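
A variation on the same idea, as a rough sketch rather than a tested fix: keep the cache in a small object, set explicit timeouts on the download, and return nil on failure. The class name `RemoteImageEmbedder`, the timeout values, and the injectable `fetcher:` are all assumptions made for illustration. `open_timeout` and `read_timeout` are open-uri options. This sketch also leaves out the `Rack::Utils.escape` call to stay dependency-free, which is a change from the method above.

```ruby
require 'base64'
require 'open-uri'
require 'timeout'

# Hypothetical wrapper; not part of wicked_pdf.
class RemoteImageEmbedder
  # Timeout values are illustrative guesses, in seconds.
  def initialize(open_timeout: 5, read_timeout: 10, fetcher: nil)
    @cache = {}
    # Default fetcher downloads the raw bytes in binary mode.
    @fetcher = fetcher || lambda do |url|
      URI.open(url, 'rb', open_timeout: open_timeout, read_timeout: read_timeout, &:read)
    end
  end

  # Returns a data URI string, or nil if the download failed.
  # Failures are not cached, so a later call retries the URL.
  def data_uri(url, content_type)
    @cache.fetch(url) do
      body = @fetcher.call(url)
      @cache[url] = "data:#{content_type};base64,#{Base64.strict_encode64(body)}"
    end
  rescue OpenURI::HTTPError, Timeout::Error => e
    # Net::OpenTimeout and Net::ReadTimeout are subclasses of Timeout::Error.
    warn "embed failed for #{url}: #{e.class}: #{e.message}"
    nil
  end
end
```

Returning nil explicitly lets the view tell a failed image apart from a successful one, which the logger's return value in the rescue branch above does not.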

@jathayde
Author

Interesting follow-up: I've been working on this intermittently. A real-time render of a similar PDF on the site handles hundreds of images with no problem. The failure only happens when the render is kicked to a background task, so something in either the controller or the task is failing silently. Nothing stands out in the console (though the output also scrolls by at a mile a minute while everything is pulled in). My guess is that something is breaking in the background task, not in the image fetching itself.
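
To make a silent failure in the background task visible, one option is to have the job raise when any image did not come back as a data URI, instead of relying on debug-level log lines (which will not appear at all if the log level is set above debug). This is a minimal sketch under that assumption. `MissingImagesError` and `embed_all!` are hypothetical names, and nothing here is confirmed as the cause of the problem.

```ruby
# Hypothetical error class and helper; neither exists in wicked_pdf or Sidekiq.
class MissingImagesError < StandardError; end

# Calls the given block once per URL and raises if any call did not return
# a data URI string, so the failure reaches the job's error reporting
# instead of only a log line.
#
# Possible use inside the job (content type is an assumption):
#   images = embed_all!(urls) { |url| embed_remote_image(url, 'image/png') }
def embed_all!(urls)
  results = urls.to_h { |url| [url, yield(url)] }

  # Anything that is not a "data:..." string counts as missing, including
  # nil or whatever value a rescue branch happened to return.
  missing = results.reject { |_url, value| value.is_a?(String) && value.start_with?('data:') }.keys
  return results if missing.empty?

  raise MissingImagesError,
        "#{missing.size} of #{results.size} images failed: #{missing.first(5).join(', ')}"
end
```

With this in place, a job that would otherwise produce a PDF with blank images fails instead, and the error message names the first few URLs involved.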
