Azure Image OCR #228

Merged
merged 49 commits into from
Nov 2, 2020

Conversation

dkotter
Collaborator

@dkotter dkotter commented Oct 6, 2020

Description of the Change

Add support for Azure image OCR.

Adds a new setting, off by default, that runs OCR scanning on images when enabled. When an image is first uploaded, we run a few checks on it. First, we check whether the image is one of the supported types (JPEG, PNG, GIF, BMP). If it is, we check whether a previous image scan exists and, if so, whether that scan set either the handwriting or text tag with a high confidence level (above 90%). If all of those checks pass, we run the OCR scan.
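The checks above can be sketched as a single predicate. This is a JavaScript illustration only: the plugin performs these checks on the server, and the function name, constant names, and tag shape (`name` plus a `confidence` between 0 and 1) are assumptions rather than the plugin's API. It also assumes that an image with no previous scan skips the tag check, which is one reading of the description.

```javascript
// Hypothetical names throughout; only the rules come from the PR description.
const SUPPORTED_TYPES = ['image/jpeg', 'image/png', 'image/gif', 'image/bmp'];
const OCR_TAGS = ['handwriting', 'text'];
const MIN_CONFIDENCE = 0.9; // "above 90%", assuming confidence is reported as 0..1

function shouldRunOcr(mimeType, previousScanTags) {
  // Check 1: the image must be one of the supported types.
  if (!SUPPORTED_TYPES.includes(mimeType)) {
    return false;
  }

  // Check 2: with no previous scan there are no tags to inspect
  // (assumption: the tag check is simply skipped in that case).
  if (!Array.isArray(previousScanTags)) {
    return true;
  }

  // Check 3: a handwriting or text tag must be present with high confidence.
  return previousScanTags.some(
    (tag) => OCR_TAGS.includes(tag.name) && tag.confidence > MIN_CONFIDENCE
  );
}
```

For example, `shouldRunOcr('image/png', [{ name: 'text', confidence: 0.95 }])` passes all three checks, while the same tag at a confidence of 0.5 does not.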

If we get a successful response back, we parse the text out of that response, save the text to post_content for that image, and save the full response to post meta.
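As a rough sketch of the parsing step, the function below flattens an OCR response into plain text. It assumes the response nests `regions`, `lines`, and `words` (each word carrying a `text` field), as the Azure Computer Vision OCR endpoint returns; whether this PR reads exactly that shape is an assumption, and `extractOcrText` is a hypothetical name.

```javascript
// Assumed response shape:
// { regions: [ { lines: [ { words: [ { text: '...' } ] } ] } ] }
function extractOcrText(response) {
  const regions =
    response && Array.isArray(response.regions) ? response.regions : [];

  return regions
    .map((region) =>
      (region.lines || [])
        // Words on the same line are joined with spaces.
        .map((line) => (line.words || []).map((word) => word.text).join(' '))
        // Lines within a region are joined with single newlines.
        .join('\n')
    )
    // Drop regions that produced no text.
    .filter((block) => block.length > 0)
    // Regions are separated by a blank line.
    .join('\n\n');
}
```

Joining regions with a blank line is a presentation choice made for this sketch, not something the PR specifies.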

We also add scan/rescan functionality to the media modal and single media edit screens. A scan run from there bypasses our checks, on the assumption that someone starting a scan manually doesn't care about them. Also added some better error handling to the scan/rescan buttons: if a scan fails, we no longer keep showing the loading icon but instead remove it, keep the button disabled, and change the button text to say error (ideally we would add even better error handling in a separate PR; that work is now in #231).
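The button behaviour described above can be summarised as a small state table. This is a sketch, not the PR's code: the function name, event names, and label strings are all hypothetical, and only the error-state rules (loading icon removed, button kept disabled, text changed to an error label) come from the description.

```javascript
// Returns the UI state a scan/rescan button should take for a given event.
// All names and label strings here are illustrative assumptions.
function scanButtonState(event) {
  switch (event) {
    case 'start':
      // Scan in flight: block further clicks and show the loading icon.
      return { disabled: true, loading: true, label: 'Scanning' };
    case 'success':
      // Scan finished: hide the icon and allow another run.
      return { disabled: false, loading: false, label: 'Rescan' };
    case 'error':
      // Scan failed: hide the icon, keep the button disabled, say so.
      return { disabled: true, loading: false, label: 'Error' };
    default:
      throw new Error(`Unknown scan event: ${event}`);
  }
}
```

A click handler would apply the `start` state, await the REST request, and then apply `success` or `error` depending on the outcome.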

TODO: the last piece is to add describedby text into the content when needed. This needs to support both the block editor and the classic editor.

Alternate Designs

Benefits

Images with text can now have that text automatically read and then inserted into the content, with the proper describedby tags. This helps provide more context for images, especially images that are screenshots of text (like social media posts).
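To illustrate the kind of markup this implies, here is a minimal sketch that pairs an image with its OCR text. It assumes "describedby" refers to the `aria-describedby` attribute; the function names, the `classifai-ocr-` id scheme, and the use of a paragraph element are hypothetical and may differ from what the PR inserts.

```javascript
// Escape text before placing it inside HTML content or attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build an image plus a text element that the image points to by id.
// The id scheme and element choice are assumptions made for this example.
function buildDescribedImage(src, ocrText, attachmentId) {
  const descriptionId = `classifai-ocr-${attachmentId}`;
  return (
    `<img src="${escapeHtml(src)}" aria-describedby="${descriptionId}" />\n` +
    `<p id="${descriptionId}">${escapeHtml(ocrText)}</p>`
  );
}
```

Assistive technology can then announce the paragraph's text as the description of the image.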

Possible Drawbacks

Turning this feature on causes a slight slowdown in image processing (roughly 5 extra seconds).

Verification Process

Checklist:

  • I have read the CONTRIBUTING document.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my change.
  • All new and existing tests passed.

Applicable Issues

#111

Darin Kotter added 8 commits October 1, 2020 13:04
…erate attachment metadata hook, that will kick off OCR processing. Add a new REST endpoint that will also utilize this same function. Add an OCR class that does all the processing. Currently only process PNG images.
…odal and single media edit screens. Bypass mime check if rescan button or option is used. Add filter around approved media types
…ensions. Add support for the four file types azure supports. Some code cleanup
…out supported image types besides PNG for now
… button, to populate the image description after success.
…ng. Pass the previously run image scan into our OCR function and utilize that to determine if the image needs OCR run or not.
@jeffpaul jeffpaul requested a review from helen October 6, 2020 15:54
@jeffpaul jeffpaul added this to the 1.6.0 milestone Oct 6, 2020
@jeffpaul jeffpaul added the type:enhancement New feature or request. label Oct 6, 2020
@jeffpaul
Member

Noting here that @dinhtungdu is going to work on scaffolding in the Gutenberg bits

dinhtungdu and others added 4 commits October 23, 2020 15:13
…encodings properly, so we don't end up with weird characters. Keep track of if we need to modify the content or not, so if we don't, we can just return the original and not risk messing anything up. Minor code formatting
…at WordPress itself does in other places. This will hopefully be more lenient across environments and encoding types
src/js/editor-ocr.js (review thread, outdated and resolved)
@helen
Contributor

helen commented Oct 24, 2020

This is feeling pretty good to me for a first run. I think what we might want to do is add a field to wp_prepare_attachment_for_js() that's a bool for whether classifai_computer_vision_ocr is non-empty (like classifaiHasOcr or something) and only show the prompt in that case, not just if the description field is populated because that description field could be populated manually or from unrelated EXIF data.
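On the JavaScript side, the suggestion amounts to gating the prompt on that boolean rather than on the description text. A minimal sketch, assuming the attachment object produced by wp_prepare_attachment_for_js() gains a `classifaiHasOcr` field (the name floated above, not a confirmed API); `shouldShowOcrPrompt` is a hypothetical helper.

```javascript
// `attachment` stands in for the object prepared by wp_prepare_attachment_for_js().
// `classifaiHasOcr` is the field name proposed in the comment, used here as an assumption.
function shouldShowOcrPrompt(attachment) {
  // A populated description alone is not enough: it may have been typed
  // manually or pulled from unrelated EXIF data.
  return Boolean(attachment && attachment.classifaiHasOcr === true);
}
```

With this check, `shouldShowOcrPrompt({ description: 'Typed by hand' })` is false, while an attachment carrying `classifaiHasOcr: true` shows the prompt.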

src/js/editor-ocr.js (review thread, outdated and resolved)
@dinhtungdu
Contributor

@helen I fixed the REST API response issue. This PR is working on my live site.

helen and others added 22 commits October 28, 2020 19:58
…art cropping button. Make sure we have HTML elements before trying to add event handlers to them
This way any additional line breaks will keep the single block with its ID intact.

Also massages the message in the modal.
Didn't realize verse was a pre without line wrapping :(
This is in-progress, see code comments for needs
… the classname we want to add. Minor linting fixes
This allows for a number of customizations, such as being able to store the text results in a different field in case you use post_content extensively in the editorial workflow already, or set other meta, or update the alt text based on the text results, and so forth.

Also adds the full $scan data as context for the `classifai_ocr_text` filter.
fix: switch to use internal style
Contributor

@helen helen left a comment

Nice work here, everybody!

Labels
type:enhancement New feature or request.
Development

Successfully merging this pull request may close these issues.

Integrate Azure Computer Vision for OCR text generation for uploaded files
4 participants