Document Reader

The Document Reader stack lets you deploy a serverless infrastructure for building an OCR application that produces spoken text extracted from images you provide through a REST API. It follows the producer/consumer schema presented here and uses several AWS services: API Gateway, Lambda, S3, SQS, Rekognition, Polly, and DynamoDB.

Getting Started

This stack inherits from the Producer/Consumer stack the same logic for requesting the upload URL and pushing the message. The extension is a Lambda that acts as a consumer of the uploaded object and uses Rekognition and Polly to create speech from the text recognized in the images. Below is a schema of the stack as it is:

architecture_schema

Another schema shows the parts in common with the Producer/Consumer and the Upload Form stacks.

architecture_schema
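
The hand-off between Rekognition and Polly mentioned above can be sketched as follows. This is a minimal illustration, not code from the repository: the `textToSpeak` helper and its `minConfidence` threshold are hypothetical, and the `TextDetection` interface models only the fields of a Rekognition DetectText result that the helper reads. Rekognition reports both whole lines and the individual words inside them, so the sketch keeps only the lines to avoid speaking the same text twice.

```typescript
// Minimal shape of one entry in a Rekognition DetectText response.
// Only the fields used below are modeled.
interface TextDetection {
  DetectedText?: string;
  Type?: "LINE" | "WORD";
  Confidence?: number; // percentage, 0-100
}

// Hypothetical helper (not taken from the repository): keep LINE detections
// at or above a confidence threshold and join them into the text to be spoken.
// WORD entries are skipped because they repeat the content of their parent LINE.
function textToSpeak(detections: TextDetection[], minConfidence = 80): string {
  return detections
    .filter((d) => d.Type === "LINE" && (d.Confidence ?? 0) >= minConfidence)
    .map((d) => (d.DetectedText ?? "").trim())
    .filter((line) => line.length > 0)
    .join(" ");
}

// Illustrative input: one confident line, its two words, and one doubtful line.
const sample: TextDetection[] = [
  { DetectedText: "Hello world", Type: "LINE", Confidence: 99.1 },
  { DetectedText: "Hello", Type: "WORD", Confidence: 99.3 },
  { DetectedText: "world", Type: "WORD", Confidence: 98.9 },
  { DetectedText: "smudge", Type: "LINE", Confidence: 41.0 },
];

console.log(textToSpeak(sample)); // "Hello world"
```

An empty result corresponds to the "no text found" case, in which there is nothing to send to Polly.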

A blog post is available here.

To use or modify the stack, clone the repository and move to the templates/document-reader folder from the root of the repository:

git clone https://github.com/made2591/immutable.templates
cd immutable.templates/templates/document-reader
# start deploy (see later)

Architecture overview

The user asks API Gateway (1) for a pre-signed URL to upload a document. API Gateway triggers a Lambda function (2) that invokes the getSignedUrl action of the S3 API (3) and returns the URL to API Gateway (4), which forwards it directly to the user (6). The user can now push the document to S3 with the provided URL (7).

When the document is uploaded, S3 puts a message on an SQS queue (8). The consumer retrieves the reference to the uploaded document by polling the SQS queue (9). With this message, the consumer can ask API Gateway for permission to retrieve the original document (10). Once the pre-signed URL is generated and sent back (11), it can safely retrieve the document's content directly from S3 (12).

A consumer Lambda passes the retrieved document (12) to the Rekognition service (13) and gets back the extracted text (14), if any. It then sends this text to Polly (15) and gets back an AudioStream (16) ready to be uploaded to S3. Before going ahead, it saves the references to the document, the extracted text, and the produced output in a DynamoDB table (17). Finally, it saves the AudioStream as an .mp3 file to S3 (18), in the same place where the user originally stored the document.
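
Steps (8) and (9) depend on the consumer reading the document reference out of the message that S3 placed on the queue. The sketch below shows one way to do that, assuming the SQS message body is the standard S3 event notification JSON. The `uploadedObjects` helper, the `ObjectRef` type, and the bucket name in the sample are hypothetical and are not taken from the repository.

```typescript
// Reference to an uploaded object, as needed by the consumer.
interface ObjectRef {
  bucket: string;
  key: string;
}

// Hypothetical helper (not taken from the repository): parse the body of an
// SQS message carrying an S3 event notification and list the created objects.
function uploadedObjects(messageBody: string): ObjectRef[] {
  const event = JSON.parse(messageBody);
  // Messages without a Records array (for example the s3:TestEvent that S3
  // sends when a notification is first configured) yield no objects.
  const records: any[] = Array.isArray(event.Records) ? event.Records : [];
  return records
    .filter(
      (r) =>
        typeof r.eventName === "string" &&
        r.eventName.indexOf("ObjectCreated:") === 0,
    )
    .map((r) => ({
      bucket: r.s3.bucket.name as string,
      // Object keys arrive URL-encoded, with spaces written as "+".
      key: decodeURIComponent((r.s3.object.key as string).replace(/\+/g, " ")),
    }));
}

// Illustrative message body; the bucket and key are made up.
const body = JSON.stringify({
  Records: [
    {
      eventName: "ObjectCreated:Put",
      s3: {
        bucket: { name: "documents-bucket" },
        object: { key: "scans/my+page%231.png" },
      },
    },
  ],
});

console.log(uploadedObjects(body));
```

Decoding the key matters because the raw value in the notification is not the key under which the object is stored; passing it on undecoded would make the later retrieval in step (12) fail for any file name containing spaces or special characters.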

Prerequisites

The only tools needed are Node.js (≥ 8.11.x) and the AWS Cloud Development Kit (AWS CDK). You can install the CDK by running:

$ npm i -g aws-cdk

Deployment of Stack

As with any other CDK stack, these are the main commands for the most common actions:

  • npm run build: compile TypeScript to JavaScript
  • npm run watch: watch for changes and compile
  • cdk deploy: deploy this stack to your default AWS account/region
  • cdk diff: compare the deployed stack with the current state
  • cdk synth: emit the synthesized CloudFormation template

Built With

Contributing

Please read CONTRIBUTING.md for details on how to contact me.

Authors

Almost all the stacks proposed in this repository, and their implementations, were discussed in depth among the people listed below:

Thank you for your interest!

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments

  • Fix architecture schemas
  • Inspiration
  • etc