This simple Python project converts the audio of a file into searchable text using the cloud resources of Azure Cognitive Services.
- Python 3
- An instance of the Azure Speech service
- Recommended audio format:
  - type: WAV (required)
  - precision: 16-bit
  - sample rate: 8 kHz or 16 kHz
  - channels: mono
- Create a free Azure subscription
- Create a free instance of the Speech service (5 audio hours per month)
By default, recognition expects WAV audio (16 kHz or 8 kHz, 16-bit, mono PCM). You can convert your audio with an online tool such as Online Audio Converter.
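Before uploading a file, it can help to confirm that it matches the format listed above. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav` is hypothetical and not part of this project. It reads the WAV header and reports any mismatch in channel count, sample width, or sample rate.

```python
from __future__ import annotations

import wave


def check_wav(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file matches
    the recommended format (mono, 16-bit, 8 kHz or 16 kHz)."""
    problems: list[str] = []
    # wave.open raises wave.Error if the file is not a readable WAV/PCM file.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()          # samples per second
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, got {8 * sample_width}-bit")
    if rate not in (8000, 16000):
        problems.append(f"expected 8 kHz or 16 kHz, got {rate} Hz")
    return problems


if __name__ == "__main__":
    import sys

    # Check every file passed on the command line.
    for p in sys.argv[1:]:
        issues = check_wav(p)
        print(f"{p}: OK" if not issues else f"{p}: " + "; ".join(issues))
```

Saved as, for example, `check_wav.py`, it can be run as `python3 check_wav.py input.wav`. A stereo 44.1 kHz recording would be reported with two problems (channel count and sample rate).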
- Create a virtual environment for installing the dependencies:
  `python3 -m venv venv`
- Activate the virtual environment:
  - Linux: `source venv/bin/activate`
  - Windows: `.\venv\Scripts\activate`
- Install the dependencies:
  `pip install -r requirements.txt`
- Get the API key and region of your Speech service resource
- Enter the API key and region (location) into env_sample.txt
- Enter the input path, output path and language of your audio file into env_sample.txt
- Rename env_sample.txt to .env
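The README does not show how the `.env` file is read. Projects like this often use the `python-dotenv` package, but that is an assumption here. As a dependency-free illustration, the sketch below parses simple `KEY=VALUE` lines with the standard library; the function name `load_env` and the example key names are hypothetical and may not match the names in env_sample.txt.

```python
from __future__ import annotations

from pathlib import Path


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines. Blank lines, '#' comments and
    lines without '=' are skipped; surrounding quotes are stripped."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        value = value.strip()
        # Allow KEY="value" or KEY='value'.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values
```

With a hypothetical `.env` containing `SPEECH_KEY=abc123` and `SPEECH_REGION="westeurope"`, `load_env()` would return `{"SPEECH_KEY": "abc123", "SPEECH_REGION": "westeurope"}`. Keep `.env` out of version control, since it holds your API key.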
Finally, start the transcription:
`python3 transcription.py`
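The README does not describe what `transcription.py` does internally. The following is a hedged sketch of how a WAV file can be transcribed with the Azure Speech SDK for Python (package `azure-cognitiveservices-speech`), assuming that package is what requirements.txt installs; it is not necessarily the author's implementation. It uses continuous recognition rather than `recognize_once()`, because the latter stops after the first utterance. The function name `transcribe_file` is hypothetical.

```python
import threading


def transcribe_file(key: str, region: str, audio_path: str, language: str = "en-US") -> str:
    """Transcribe a WAV file with Azure continuous speech recognition."""
    # Imported inside the function so this module loads even without the SDK.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = language
    audio_config = speechsdk.audio.AudioConfig(filename=audio_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    segments: list[str] = []
    done = threading.Event()  # set when the session ends or is canceled

    def on_recognized(evt):
        # Keep only final results that actually contain recognized speech.
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            segments.append(evt.result.text)

    def on_stop(evt):
        # Fires at end of file (canceled with EndOfStream) or on an error.
        done.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_stop)
    recognizer.canceled.connect(on_stop)

    recognizer.start_continuous_recognition()
    done.wait()  # block until the whole file has been processed
    recognizer.stop_continuous_recognition()
    return " ".join(segments)
```

A call such as `transcribe_file(key, region, "input.wav", "de-DE")` returns the recognized text as one string, which can then be written to the output path. The SDK invokes the callbacks on its own threads, so a `threading.Event` is used to wait for the end of the session instead of polling.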