
OpenAI Whisper Container (GPU and CPU) and Lambda (CPU) - speech recognition model


ventz/whisper-openai-container


What is this?

This is the OpenAI Whisper project (https://github.com/openai/whisper), an offline speech recognition model, packaged inside a container. You can deploy it either as a stand-alone Docker container or as a container-backed AWS Lambda function.

In summary: it lets you transcribe speech to text accurately and quickly, with no transcription-service fees (you only pay for the compute you run it on).

How do I use this?

There are two ways to run this:

  • As a "regular container" (Docker), or
  • As an AWS Lambda (container-backed) function, invoked either directly via the API or automatically by an S3 "put" event.

1.) As a "regular container":

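# Start the container in the background, then open a shell inside it
# ('docker exec' takes a running container's name, not an image name):
docker run -it --rm -d --name whisper ventz/whisper
docker exec -it whisper /bin/bash
# Assuming you have copied or mounted 'recording.mp4' into the container:
whisper 'recording.mp4' --language English --model base --fp16 False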
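For several recordings, a small loop saves retyping the command. The sketch below is a hypothetical helper (`transcribe_dir` is not shipped in the image) meant to be pasted into the container's shell; it reuses the flags from the example above and adds whisper's `--output_dir` option so the transcripts land next to the recordings.

```shell
# Hypothetical helper (not part of the image): run whisper on every .mp4
# in a directory, with the same flags as the single-file example.
transcribe_dir() {
  local dir="${1:-.}"       # directory holding the recordings
  local model="${2:-base}"  # whisper model name, e.g. tiny, base, small
  local f count=0
  for f in "$dir"/*.mp4; do
    # With no matches the glob stays literal, so skip paths that do not exist
    [ -e "$f" ] || continue
    whisper "$f" --language English --model "$model" --fp16 False \
      --output_dir "$dir" || return 1
    count=$((count + 1))
  done
  echo "transcribed $count file(s)"
}
```

Usage, inside the container: `transcribe_dir . base`.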

2.) As an AWS Lambda (container-backed) function:

The idea is that you set up an S3 bucket with an event notification that invokes this Lambda whenever a new object (a recording) is uploaded to it.

This involves:

a.) Tagging the local Docker image and pushing it to ECR:

docker tag ventz/whisper:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/whisper:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/whisper:latest
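The push above assumes the ECR repository `whisper` already exists and that Docker is logged in to the registry. A minimal sketch covering both, using the AWS CLI's `ecr get-login-password` piped into `docker login --password-stdin`; the helper names are hypothetical, and the account ID, region and repository name are the placeholders from the commands above.

```shell
# Hypothetical helper: registry hostname for an account/region pair
ecr_registry() {
  echo "${1}.dkr.ecr.${2}.amazonaws.com"
}

# Sketch: create the repo if missing, log in, then tag and push the image.
push_to_ecr() {
  local account="$1" region="${2:-us-east-1}" repo="${3:-whisper}"
  local registry uri
  registry="$(ecr_registry "$account" "$region")"
  uri="${registry}/${repo}:latest"

  # Create the repository only if it does not exist yet
  aws ecr describe-repositories --region "$region" --repository-names "$repo" >/dev/null 2>&1 \
    || aws ecr create-repository --region "$region" --repository-name "$repo" >/dev/null \
    || return 1

  # Authenticate Docker against the private registry
  aws ecr get-login-password --region "$region" \
    | docker login --username AWS --password-stdin "$registry" || return 1

  # Tag and push; print the image URI last so it can be reused in step b.)
  docker tag ventz/whisper:latest "$uri" && docker push "$uri" && echo "$uri"
}
```

For example, `push_to_ecr 123456789012 us-east-1` prints the pushed image URI as its last line, which is the value for `<ECR Image URI>` in the next step.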

b.) Deploying a new Lambda function from ECR:

aws lambda create-function --region us-east-1 --function-name transcribe \
   --package-type Image  \
   --code ImageUri=<ECR Image URI>   \
   --role  arn:aws:iam::123456789012:role/service-role/transcribe
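Lambda's defaults (a 3-second timeout and 128 MB of memory) are unlikely to be enough to run a Whisper model, so you will probably want to raise them when creating the function. A sketch using the real `create-function` options `--timeout`, `--memory-size` and `--architectures`; the values are assumptions to tune (900 seconds and 10240 MB are Lambda's upper limits, and some accounts have a lower memory quota), and `x86_64` is pinned because the bundled ffmpeg is amd64-only. The helper name is hypothetical.

```shell
# Sketch: create the 'transcribe' function with more headroom than the
# Lambda defaults. Memory/timeout values are assumptions; tune for your model.
create_transcribe_function() {
  local image_uri="$1" role_arn="$2" region="${3:-us-east-1}"
  aws lambda create-function --region "$region" --function-name transcribe \
    --package-type Image \
    --code ImageUri="$image_uri" \
    --role "$role_arn" \
    --architectures x86_64 \
    --timeout 900 \
    --memory-size 10240
}
```

For example: `create_transcribe_function <ECR Image URI> arn:aws:iam::123456789012:role/service-role/transcribe`. On an existing function, `aws lambda update-function-configuration --function-name transcribe --timeout 900 --memory-size 10240` changes the same two settings.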

NOTE: The role needs: i.) AWSLambdaBasicExecutionRole (for 'logs:CreateLogGroup', 'logs:CreateLogStream', and 'logs:PutLogEvents'), or an equivalent policy scoped to this function, such as:

{
   "Version": "2012-10-17",
   "Statement": [
       {
           "Effect": "Allow",
           "Action": "logs:CreateLogGroup",
           "Resource": "arn:aws:logs:us-east-1:123456789012:*"
       },
       {
           "Effect": "Allow",
           "Action": [
               "logs:CreateLogStream",
               "logs:PutLogEvents"
           ],
           "Resource": [
               "arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/transcribe:*"
           ]
       }
   ]
}

and

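ii.) Read/write access to the S3 bucket. Note that 's3:ListBucket' is granted on the bucket ARN itself, while the object actions apply to '<bucket>/*':

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<YOUR BUCKET NAME>",
                "arn:aws:s3:::<YOUR BUCKET NAME>/*"
            ]
        }
    ]
}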
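The role can also be created from the CLI. A minimal sketch, assuming policy ii.) above is saved as `s3-policy.json` (a hypothetical file name): it creates the role under the `/service-role/` path so its ARN matches the one used in step b.), lets the Lambda service assume it, attaches the AWS-managed `AWSLambdaBasicExecutionRole` policy for i.), and adds ii.) as an inline policy. The helper name and the inline policy name `transcribe-s3` are hypothetical.

```shell
# Sketch: create the 'transcribe' execution role referenced in step b.)
create_transcribe_role() {
  local s3_policy_file="${1:-s3-policy.json}"  # hypothetical file holding policy ii.)
  local trust='{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "lambda.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'

  # Role that the Lambda service is allowed to assume
  aws iam create-role --path /service-role/ --role-name transcribe \
    --assume-role-policy-document "$trust" >/dev/null || return 1

  # i.) AWS-managed policy for CloudWatch Logs
  aws iam attach-role-policy --role-name transcribe \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole || return 1

  # ii.) Inline S3 policy read from the JSON file
  aws iam put-role-policy --role-name transcribe --policy-name transcribe-s3 \
    --policy-document "file://${s3_policy_file}"
}
```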

c.) Update the function code whenever you re-configure the Dockerfile or rebuild the container (push the new image to ECR first, as in step a.):

# NOTE: This assumes your function was deployed with the name 'transcribe' 
aws lambda update-function-code --function-name transcribe --image-uri $(aws lambda get-function --function-name transcribe | jq -r '.Code.ImageUri')
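`update-function-code` only picks up a new build if the image behind the URI has actually been replaced, so the rebuilt image has to be tagged and pushed first. A sketch that chains these steps, assuming it runs from the directory containing the Dockerfile and that Docker is already logged in to ECR; `redeploy_whisper` is a hypothetical name.

```shell
# Sketch: rebuild, push, and point the function at the freshly pushed image.
redeploy_whisper() {
  local uri="$1" fn="${2:-transcribe}"  # uri: full ECR image URI including tag
  docker build -t ventz/whisper:latest . || return 1
  docker tag ventz/whisper:latest "$uri" || return 1
  docker push "$uri" || return 1
  # Lambda resolves the tag to the current image digest at update time
  aws lambda update-function-code --function-name "$fn" --image-uri "$uri"
}
```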

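To wait until the update has finished (the AWS CLI's built-in waiter, `aws lambda wait function-updated --function-name transcribe`, does similar polling):

while :; do
    status="$(aws lambda get-function --function-name transcribe | jq -r '.Configuration.LastUpdateStatus')"
    [ "$status" = "Successful" ] && break
    [ "$status" = "Failed" ] && { echo "Update failed" >&2; break; }
    sleep 1
done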
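The steps above deploy the function but do not connect it to a bucket. A minimal sketch of that S3 hook with the AWS CLI: `aws lambda add-permission` lets S3 invoke the function, then `aws s3api put-bucket-notification-configuration` sends `ObjectCreated` events to it. The bucket, account ID and region are placeholders and the helper name is hypothetical. The `.mp4` suffix filter is an assumption: it keeps the function from re-triggering itself if it writes its output back into the same bucket. Note that the second call replaces any notification configuration the bucket already has.

```shell
# Sketch: let S3 invoke the 'transcribe' function on new .mp4 uploads.
# All arguments are placeholders for your own values.
wire_s3_trigger() {
  local bucket="$1" account="$2" region="${3:-us-east-1}" fn="${4:-transcribe}"
  local fn_arn="arn:aws:lambda:${region}:${account}:function:${fn}"

  # 1) Allow S3 (this bucket, this account only) to invoke the function
  aws lambda add-permission --region "$region" --function-name "$fn" \
    --statement-id s3-invoke --action lambda:InvokeFunction \
    --principal s3.amazonaws.com --source-arn "arn:aws:s3:::${bucket}" \
    --source-account "$account" || return 1

  # 2) Route ObjectCreated events for *.mp4 keys to the function.
  #    This replaces the bucket's existing notification configuration.
  aws s3api put-bucket-notification-configuration --region "$region" \
    --bucket "$bucket" --notification-configuration "{
      \"LambdaFunctionConfigurations\": [{
        \"LambdaFunctionArn\": \"${fn_arn}\",
        \"Events\": [\"s3:ObjectCreated:*\"],
        \"Filter\": {\"Key\": {\"FilterRules\": [{\"Name\": \"suffix\", \"Value\": \".mp4\"}]}}
      }]
    }"
}
```

For example: `wire_s3_trigger <YOUR BUCKET NAME> 123456789012 us-east-1`.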

Works locally but not in AWS Lambda?

The container has to be built for amd64 (x86_64) because the statically compiled ffmpeg it uses is only available for amd64. This means you cannot use ARM64 Lambdas.

If you are building the container on an Apple Silicon (M-series) Mac and pushing it to ECR, replace the first line of the Dockerfile with:

FROM --platform=linux/amd64 public.ecr.aws/lambda/python:3.12
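Alternatively, the platform can be passed at build time instead of editing the Dockerfile, e.g. `docker build --platform linux/amd64 -t ventz/whisper .`. Either way, checking the image architecture before pushing can save a failed deploy. A sketch using `docker image inspect`; `require_amd64` is a hypothetical helper.

```shell
# Sketch: refuse to continue unless the local image is linux/amd64.
require_amd64() {
  local image="${1:-ventz/whisper:latest}" arch
  arch="$(docker image inspect --format '{{.Os}}/{{.Architecture}}' "$image")" || return 1
  if [ "$arch" != "linux/amd64" ]; then
    echo "$image is $arch; rebuild with --platform linux/amd64 before pushing" >&2
    return 1
  fi
  echo "$image is linux/amd64"
}
```

Running `require_amd64 && docker push ...` stops the push when the image was built for arm64.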

MANUALLY TESTING THE LAMBDA LOCALLY (not within AWS):

docker run -it --rm -d -p 9000:8080 --name whisper ventz/whisper

and then

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d @test-s3-json

NOTE: This is a "fake" event, just to confirm that the Lambda runs locally. To transcribe an actual recording, you will need a real S3 bucket, a real file/recording in it, and IAM permissions (see test-s3-json).
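To point the local test at your own recording, you can generate an event shaped like a real S3 notification. A minimal sketch, assuming the handler reads the bucket and key from `Records[0].s3.bucket.name` and `Records[0].s3.object.key` (the standard S3 event layout; check test-s3-json for the fields the handler actually uses). `make_s3_event` is a hypothetical helper that does no JSON escaping, so keep bucket and key names free of quotes and backslashes; also note that real S3 events URL-encode the object key.

```shell
# Sketch: print a minimal S3 "ObjectCreated:Put" event for one bucket/key.
# No JSON escaping is done, so keep names free of quotes and backslashes.
make_s3_event() {
  local bucket="$1" key="$2" region="${3:-us-east-1}"
  cat <<EOF
{
  "Records": [
    {
      "eventSource": "aws:s3",
      "awsRegion": "${region}",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": { "name": "${bucket}", "arn": "arn:aws:s3:::${bucket}" },
        "object": { "key": "${key}" }
      }
    }
  ]
}
EOF
}
```

For example, `make_s3_event my-bucket recording.mp4 > my-event.json`, then send it with the curl command above using `-d @my-event.json`. Locally, the container also needs AWS credentials to read from S3 (e.g. add `-e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN` to `docker run`). The same file can be sent to the deployed function with `aws lambda invoke --function-name transcribe --payload fileb://my-event.json out.json`.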
