Grim is a simple gem for extracting (reaping) a page from a PDF and converting it to an image, as well as extracting the text of the page as a string. It gives you an easy-to-use API to Ghostscript, ImageMagick, and pdftotext that is specific to this use case.
You will need Ghostscript, ImageMagick, and Poppler installed. On the Mac (OS X) I highly recommend using Homebrew to install them.
brew install ghostscript imagemagick poppler
gem install grim
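Because Grim shells out to external tools, it can help to confirm they are reachable before reaping a PDF. The following is a minimal sketch in plain Ruby; `find_executable` is a hypothetical helper (not part of Grim), and the executable names `gs`, `convert`, and `pdftotext` are assumptions based on the tools named above and may differ on your system.

```ruby
# Hypothetical helper, not part of Grim: look for an executable on a
# PATH-style string and return its full path, or nil if it is missing.
def find_executable(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Assumed executable names for Ghostscript, ImageMagick, and Poppler.
%w[gs convert pdftotext].each do |tool|
  location = find_executable(tool)
  puts(location ? "#{tool}: #{location}" : "#{tool}: not found on PATH")
end
```

If a tool is reported as missing, installing it or pointing Grim at an explicit path is likely the next step.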
pdf = Grim.reap("/path/to/pdf") # returns Grim::Pdf instance for pdf
count = pdf.count # returns the number of pages in the pdf
png = pdf[3].save('/path/to/image.png') # will return true if page was saved or false if not
text = pdf[3].text # returns text as a String
pdf.each do |page|
  puts page.text
end
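The calls above can be combined to reap every page in one pass. This is a sketch under assumptions: `reap_all` is a hypothetical helper, not part of Grim, and it relies only on the `each`, `save`, and `text` methods shown above. The `page-N.png` naming scheme is likewise made up for illustration.

```ruby
# Hypothetical helper, not part of Grim. Works with any object that yields
# pages from #each, where each page responds to #save(path) and #text.
def reap_all(pdf, image_dir)
  results = []
  index = 0
  pdf.each do |page|
    image_path = File.join(image_dir, "page-#{index}.png")
    results << {
      index: index,
      saved: page.save(image_path), # true or false, as reported by the page
      text: page.text
    }
    index += 1
  end
  results
end

# Usage with Grim (requires the gem and its external tools):
#   require 'grim'
#   reap_all(Grim.reap("/path/to/pdf"), "/path/to/images")
```

Returning one hash per page keeps the save status next to the extracted text, so pages that failed to render are easy to spot afterwards.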
Grim also supports other processors. The default uses whichever versions of ImageMagick and Ghostscript are on your path.
# specifying one processor with specific ImageMagick and Ghostscript paths
Grim.processor = Grim::ImageMagickProcessor.new({:imagemagick_path => "/path/to/convert", :ghostscript_path => "/path/to/gs"})
# multiple processors with fallback if first fails, useful if you need multiple versions of convert/gs
Grim.processor = Grim::MultiProcessor.new([
Grim::ImageMagickProcessor.new({:imagemagick_path => "/path/to/6.7/convert", :ghostscript_path => "/path/to/9.04/gs"}),
Grim::ImageMagickProcessor.new({:imagemagick_path => "/path/to/6.6/convert", :ghostscript_path => "/path/to/9.02/gs"})
])
pdf = Grim.reap('/path/to/pdf')
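The fallback idea behind multiple processors can be illustrated in plain Ruby. This is a sketch of the general pattern only: `FallbackChain` is a hypothetical class, not the implementation of `Grim::MultiProcessor`, and the `save(pdf_path, index, image_path)` signature is an assumption made for the example. How Grim itself reports a failure of every processor is not covered here.

```ruby
# Illustrative only: a hypothetical stand-in for the fallback pattern,
# not Grim::MultiProcessor itself.
class FallbackChain
  def initialize(processors)
    @processors = processors
  end

  # Try each processor in order and stop at the first one that succeeds.
  # Returns true if any processor saved the page, false otherwise.
  def save(pdf_path, index, image_path)
    @processors.each do |processor|
      return true if processor.save(pdf_path, index, image_path)
    end
    false
  end
end
```

Ordering matters in this pattern: the preferred versions of the tools go first, and later entries are only consulted when earlier ones fail.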
See LICENSE for details.