A script that automates interview transcription, proofreading, and transcript formatting into .docx. Uses AWS Transcribe, S3, and the OpenAI API (GPT-3.5).

uitrial/Interview-Transcribe-Proofread

Automated Interview Transcribe and Proofread

A script that automates interview transcription, proofreading, and transcript formatting into .docx. Uses AWS Transcribe, S3, and the OpenAI API (GPT-3.5). Produces very high-quality transcriptions, even from input files with very poor sound quality.

Installation

  • Python 3.10+
  • AWS IAM credentials with access to S3 and AWS Transcribe
  • S3 bucket with public access, so this script can upload and read bucket contents
  • OpenAI API key
  • See .envsample for the settings that are needed
  • Install the dependencies:
  pip3 install -r requirements.txt
  • Rename .envsample to .env and fill in your keys and bucket name.
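
Before running the script, it can help to confirm that .env is filled in. The following is a minimal sketch that uses only the standard library; the variable names in REQUIRED_KEYS are hypothetical placeholders (the real names are the ones listed in .envsample), and the parser handles only simple KEY=VALUE lines.

```python
from pathlib import Path

# Hypothetical names for illustration; use the keys listed in .envsample.
REQUIRED_KEYS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
    "OPENAI_API_KEY",
)


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


env_path = Path(".env")
values = parse_env_text(env_path.read_text()) if env_path.exists() else {}
missing = missing_keys(values)
if missing:
    print("Missing settings in .env:", ", ".join(missing))
else:
    print("All required settings are present.")
```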

Usage/Examples

usage: process_transcripts.py [-h] input_folder s3_folder

positional arguments:
  input_folder  Input folder with .mp4 videos
  s3_folder     Output folder name in S3 bucket

Example:

python3 process_transcripts.py /Users/kvyb/Documents/Uitrial_Interviews testing_proofread
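
For readers who want to see how a command line like this is typically defined, here is a sketch of an argparse parser that produces the usage text shown above. It is an illustration based only on that usage text, not necessarily how process_transcripts.py is written; build_parser is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Build a parser with the two positional arguments from the usage text."""
    parser = argparse.ArgumentParser(prog="process_transcripts.py")
    parser.add_argument("input_folder", help="Input folder with .mp4 videos")
    parser.add_argument("s3_folder", help="Output folder name in S3 bucket")
    return parser


# Parse the example invocation explicitly instead of reading sys.argv.
args = build_parser().parse_args(
    ["/Users/kvyb/Documents/Uitrial_Interviews", "testing_proofread"]
)
print(args.input_folder, args.s3_folder)
```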

Features

  • Uploads two-speaker .mp4 videos to the S3 bucket
  • Transcribes the speakers' voices into text
  • Proofreads the transcribed text, improving its quality
  • Formats the text into a .docx file. Output sample:

Transcript output formatting sample
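
As a rough illustration of the formatting step, a two-speaker transcript is usually easier to read when consecutive fragments from the same speaker are merged into one turn. The sketch below assumes the transcription has already been reduced to (speaker, text) pairs; that input shape, the speaker labels, and the function names are assumptions for illustration, not the script's actual data model or the AWS Transcribe output format.

```python
def merge_speaker_turns(segments):
    """Join consecutive segments by the same speaker into a single turn."""
    turns = []
    for speaker, text in segments:
        text = text.strip()
        if not text:
            continue  # ignore empty fragments
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous turn: extend it.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns


def format_turns(turns):
    """Render turns as 'Speaker: text' paragraphs separated by blank lines."""
    return "\n\n".join(f"{speaker}: {text}" for speaker, text in turns)


segments = [
    ("Speaker 1", "Thanks for joining."),
    ("Speaker 1", "Could you introduce yourself?"),
    ("Speaker 2", "Sure."),
]
print(format_turns(merge_speaker_turns(segments)))
```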

Feedback

If you have any feedback, please reach out to me.

Note:

  • Only .mp4 input is supported.
  • The output still needs a quick manual proofread. AWS Transcribe isn't perfect.
  • Costs approximately $0.08 in total per 60-minute interview.
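
Since only .mp4 input is supported, it can be useful to check in advance which files in the input folder would be picked up. This sketch lists the .mp4 files and pairs each with an object key under the S3 folder; the key layout (s3_folder/filename), the case-insensitive extension match, and the name plan_uploads are assumptions for illustration, not the script's documented behavior.

```python
from pathlib import Path, PurePosixPath


def plan_uploads(input_folder, s3_folder):
    """Map each .mp4 file in input_folder to an assumed S3 object key."""
    videos = sorted(
        path
        for path in Path(input_folder).iterdir()
        if path.is_file() and path.suffix.lower() == ".mp4"
    )
    # S3 keys always use forward slashes, so build them with PurePosixPath.
    return {str(video): str(PurePosixPath(s3_folder) / video.name) for video in videos}
```

Calling plan_uploads("/Users/kvyb/Documents/Uitrial_Interviews", "testing_proofread") would return a dictionary from local file paths to keys such as testing_proofread/<filename>.mp4, skipping any file with another extension.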
