Overlay text from Google Vision Cloud (OCR) over the original PDF-as-image to create a searchable pdf

When needed

You have a PDF as image. You have recognized the text by Google vision cloud (gvc) OCR service. And now want to apply the text to the pdf to make it searchable.

Note! This too makes sense only for PDFs recognized with GVC. For images recognized by GVC you can use better toosls like [https://github.com/dinosauria123/gcv2hocr/]. You may also try to use [https://github.com/mah-jp/pdf4search]. I failed to learn Japaneese.

Problems

It's not possible to use the same font and the same size of text so the text is added approximately to the same place, where it's on image. Do not expect 100% good resut.

Quick Examples

In exmaples folder you can find in pair of files (PDF-as-image and generater by Google vision cloud JSON) and a couple of out files (for production with hidden text and for debug with visible text).

To get the idea see a video demonstrating results [https://streamable.com/r2gyk9]

Installation

Surely you need PHP.

To use in your project:

composer require gruz/gvc-pdf2pdf

OR

git clone git@github.com:gruz/gcv-pdf2pdf.git
cd gcv-pdf2pdf
composer install

Usage

Check the code, usage is obvious. For quick start:

use GvcPdf2Pdf\PdfTextApply;

$jsonFile = __DIR__ .'/01-in-google-vision-cloud.json';
$pdfSourceFile = __DIR__ .'/01-in-pdf-as-image.pdf';

$pdfOutFile1 = __DIR__ .'/01-out-pdf-with-text-hidden.pdf';
$pdf = new PdfTextApply($jsonFile, $pdfSourceFile, $pdfOutFile1);
$pdf->watermark->text = 'gruz.ml';
$pdf->run();

Running example

Singe file

Go to the project folder if not there and run

php example/example.php

This will (renerate) example/01-out-pdf-with-text-hidden.pdf and example/01-out-pdf-with-text-visible-for-debug.pdf files.

Folder

Go to the project folder if not there and run

php example/example-folder.php

It uses files pairs placed in example/in/ and generates output to example/out/

As a result you get regenerated out pdfs.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
.vscode		.vscode
example		example
src		src
.gitignore		.gitignore
README.md		README.md
composer.json		composer.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overlay text from Google Vision Cloud (OCR) over the original PDF-as-image to create a searchable pdf

When needed

Problems

Quick Examples

Installation

Usage

Running example

Singe file

Folder

About

Releases

Packages

Languages

gruz/gcv-pdf2pdf

Folders and files

Latest commit

History

Repository files navigation

Overlay text from Google Vision Cloud (OCR) over the original PDF-as-image to create a searchable pdf

When needed

Problems

Quick Examples

Installation

Usage

Running example

Singe file

Folder

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages