Feature: Azure PDF scan #282

dinhtungdu · 2021-05-23T00:58:11Z

Description of the Change

This PR uses the Azure Computer Vision Read API to extract text from multi-page PDF files. It supports both text-based and text-heavy image-based PDF files.

Because the Read API processes files asynchronously, this feature uses WP Cron to periodically check for and retrieve the result.
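
The check-and-grab step can be sketched as a single polling tick, run once per scheduled event. This is a minimal illustration in Python, not the plugin's PHP code: the function names `extract_text` and `check_once` are hypothetical, and the payload shape (`status`, `analyzeResult.readResults[].lines[].text`) is assumed from the Read v3.x response format.

```python
from typing import Callable, Optional


def extract_text(result: dict) -> str:
    """Join every recognized line from every page of a Read result payload."""
    # Assumed shape: analyzeResult.readResults is a list of pages,
    # each with a list of lines carrying a "text" field.
    pages = result.get("analyzeResult", {}).get("readResults", [])
    return "\n".join(
        line["text"] for page in pages for line in page.get("lines", [])
    )


def check_once(fetch_result: Callable[[], dict]) -> Optional[str]:
    """One polling tick, as a scheduled job might run it.

    fetch_result is a hypothetical callable that GETs the operation URL
    returned when the file was submitted and returns the decoded JSON.
    Returns the text when the operation succeeded, None while it is
    still pending, and raises if the service reports a failure.
    """
    result = fetch_result()
    status = result.get("status")
    if status == "succeeded":
        return extract_text(result)
    if status == "failed":
        raise RuntimeError("Read operation failed")
    return None  # "notStarted" or "running": try again on the next tick
```

Keeping each tick stateless means the scheduler only needs to store the operation URL per attachment and can stop rescheduling once a tick returns text or raises.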

Verification Process

Go to ClassifAI > Image Processing.
See the new setting Enable Scanning PDF.
Enable that feature.
Upload a PDF file.
Right after the file is uploaded, open its media modal and see the Classifai Read PDF field with a disabled In progress! button.
Wait a few minutes for the API to process the file, then check the modal again and see the description field filled with the content of the PDF file.
Open the attachment detail page and see a new metabox, Classifai PDF Processing, with a rescan checkbox.

Checklist:

I have read the CONTRIBUTING document.
My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have added tests to cover my change.
All new and existing tests passed.

Applicable Issues

Changelog Entry

includes/Classifai/Providers/Azure/ComputerVision.php

includes/Classifai/Providers/Azure/Read.php

dkotter

Haven't fully tested this out yet, but the code looks good. Left a few minor comments.

dinhtungdu · 2021-06-02T04:35:15Z

@dkotter Thanks for the heads-up. I fixed those typos.

phpbits · 2021-07-13T17:07:04Z

@jeffpaul @dinhtungdu Confirming that this feature is working as expected. I followed the steps and the PDF file was scanned successfully. See screenshot below:

helen

Nice, it's working! We probably need to get smarter with the scan button in a future release, because you can do something like request a scan, switch items, come back, and request another scan. It should probably check the status any time that button is loaded up, or load the button with AJAX entirely.
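
One way to read this suggestion is as a guard that consults the stored scan status before accepting a new request. The sketch below is a hypothetical Python illustration, not code from this PR: the `statuses` mapping stands in for attachment post meta, and the status names are assumed to mirror the Read API's `notStarted` and `running` values.

```python
from typing import Dict, Optional

# Statuses that mean a scan is already pending (assumed names).
IN_PROGRESS = {"notStarted", "running"}


def can_request_scan(stored_status: Optional[str]) -> bool:
    """A new scan is allowed only when no scan is already pending."""
    return stored_status not in IN_PROGRESS


def request_scan(statuses: Dict[int, str], attachment_id: int) -> bool:
    """Record a scan request unless one is already pending.

    Returns True when the request was accepted, False when it was
    rejected because an earlier scan has not finished yet.
    """
    if not can_request_scan(statuses.get(attachment_id)):
        return False
    statuses[attachment_id] = "notStarted"
    return True
```

Running the same check whenever the button is rendered would also let the UI show it as disabled after the user switches items and comes back.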

dinhtungdu added 8 commits May 17, 2021 10:46

initial read module

fa014be

Log error to attachment post meta

88d3a69

save response to database

b6737b1

store read result

5ff828a

show images related actions for image only

6c36ff2

add rescan button and checkbox

6fb3960

update feature description

b1a6b8a

refactoring

37f7858

dinhtungdu self-assigned this May 23, 2021

dinhtungdu requested review from dkotter and helen May 23, 2021 01:09

dinhtungdu added 2 commits May 23, 2021 01:17

fix: wording

aaa127d

fix: grammar

068bfb1

jeffpaul added this to the 1.7.0 milestone May 24, 2021

dkotter reviewed May 28, 2021

includes/Classifai/Providers/Azure/ComputerVision.php (outdated, resolved)

dkotter reviewed May 28, 2021

includes/Classifai/Providers/Azure/Read.php (outdated, resolved)

dkotter reviewed May 28, 2021

includes/Classifai/Providers/Azure/Read.php (outdated, resolved)

dkotter reviewed May 28, 2021

fix typo issues

542fcb5

jeffpaul requested a review from dkotter June 2, 2021 04:53

dinhtungdu linked an issue Jun 2, 2021 that may be closed by this pull request

Update to v3 API to gain PDF OCR functionality #265

Closed

jeffpaul mentioned this pull request Jul 7, 2021

Release version 1.7.0 #289

Closed

21 tasks

jeffpaul and others added 3 commits August 26, 2021 09:15

Merge branch 'develop' into features/pdf-scanning

fe37b0e

Merge branch 'develop' into features/pdf-scanning

5a7a4bf

Avoid PHP warning

7e1c8ae

helen approved these changes Aug 26, 2021

helen merged commit 0b72995 into develop Aug 26, 2021

helen deleted the features/pdf-scanning branch August 26, 2021 22:27

jeffpaul mentioned this pull request Aug 31, 2021

PDF scan button improvement #307

Closed

1 task
