A web application that uses AI to perform multilingual speech recognition and transcription.
This web application transcribes the speech in uploaded audio files into text and supports multiple languages. It uses a fine-tuned Whisper model to achieve high transcription accuracy. Typical uses include generating accessible content, translating recorded meetings, and transcribing multilingual audio.
Speech-to-text transcription
Support for multiple languages
Intuitive, responsive web interface
High transcription accuracy from an AI model (Whisper)
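A minimal sketch of the transcription step, assuming the Hugging Face `transformers` speech-recognition pipeline and the public `openai/whisper-small` checkpoint; the project's own fine-tuned checkpoint name is not given here. The extension allow-list is hypothetical. Passing a file path to the pipeline requires `ffmpeg` to be installed for decoding.

```python
from functools import lru_cache
from pathlib import Path

# Hypothetical allow-list of upload formats; the project does not document one.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg", ".webm"}


def is_supported_audio(filename: str) -> bool:
    """Cheap pre-check on the file extension before running the model."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


@lru_cache(maxsize=1)
def load_asr(model_id: str):
    """Load the Whisper pipeline once and reuse it across requests."""
    from transformers import pipeline  # imported lazily: heavy dependency

    # chunk_length_s=30 matches Whisper's 30-second input window,
    # so longer recordings are split into chunks and stitched back together.
    return pipeline("automatic-speech-recognition", model=model_id, chunk_length_s=30)


def transcribe_file(path: str, model_id: str = "openai/whisper-small") -> str:
    """Transcribe one audio file; model_id is an assumed public checkpoint."""
    if not is_supported_audio(path):
        raise ValueError(f"unsupported audio format: {path}")
    asr = load_asr(model_id)
    # Whisper detects the spoken language automatically when none is specified.
    result = asr(path)
    return result["text"].strip()
```

Caching the pipeline matters in a web backend: loading Whisper weights takes far longer than transcribing a short clip, so the model should be loaded once per process rather than once per request.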
Frontend: HTML, CSS, JavaScript
Backend: FastAPI, WebSockets
AI Model: Whisper (fine-tuned version)
Other Tools: Python, Hugging Face, LoRA, bitsandbytes
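A minimal sketch of how the FastAPI backend might expose transcription over both HTTP and a WebSocket. The route paths (`/transcribe`, `/ws/transcribe`), the JSON response shape, and the one-file-per-message WebSocket protocol are assumptions, not the project's documented API. The transcription function is injected, so any Whisper wrapper that maps a file path to text can be plugged in. FastAPI needs the `python-multipart` package for `File` uploads.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict


def save_upload_to_temp(data: bytes, filename: str) -> str:
    """Write uploaded bytes to a temporary file, keeping the original
    extension so the audio decoder can recognise the format."""
    suffix = Path(filename).suffix
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        return tmp.name


def create_app(transcribe: Callable[[str], str]):
    """Build a FastAPI app around a transcription function (path -> text)."""
    # Imported here so the helper above works without FastAPI installed.
    from fastapi import FastAPI, File, UploadFile, WebSocket, WebSocketDisconnect
    from fastapi.concurrency import run_in_threadpool

    app = FastAPI()

    @app.post("/transcribe")
    def transcribe_upload(file: UploadFile = File(...)) -> Dict[str, str]:
        # A plain `def` endpoint runs in FastAPI's thread pool,
        # so the blocking model call does not stall the event loop.
        name = file.filename or "upload"
        path = save_upload_to_temp(file.file.read(), name)
        try:
            return {"filename": name, "text": transcribe(path)}
        finally:
            os.remove(path)

    @app.websocket("/ws/transcribe")
    async def transcribe_stream(websocket: WebSocket) -> None:
        await websocket.accept()
        try:
            while True:
                # Assumed protocol: each binary message is one complete WAV file.
                data = await websocket.receive_bytes()
                path = save_upload_to_temp(data, "message.wav")
                try:
                    text = await run_in_threadpool(transcribe, path)
                finally:
                    os.remove(path)
                await websocket.send_json({"text": text})
        except WebSocketDisconnect:
            pass  # client closed the connection

    return app
```

For example, `app = create_app(my_transcribe)` in a module served by uvicorn, where `my_transcribe` is any function from an audio file path to text. The WebSocket route suits clients that send several recordings over one connection; the HTTP route is the simpler fit for a single upload form.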
Python 3.9+: Ensure that Python is installed. You can download it from python.org.
Visual Studio Code: A lightweight, powerful editor; the Python and Prettier extensions are recommended. Download it from code.visualstudio.com.
Google Colab Pro: Used for fine-tuning the Whisper model, since it provides more computing resources than the free tier.
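Fine-tuning Whisper with LoRA and bitsandbytes typically means loading the base model with 8-bit weights and training only small low-rank adapters, which keeps memory within a single Colab GPU. A minimal sketch, assuming the Hugging Face `transformers` and `peft` libraries; the checkpoint, rank, alpha, dropout, and target modules are illustrative values, not the project's documented settings. The training loop itself (for example with `Seq2SeqTrainer`) is omitted.

```python
from typing import Sequence


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer.
    The update is B @ A with A of shape (r, d_in) and B of shape (d_out, r)."""
    return r * (d_in + d_out)


def load_whisper_for_lora(
    model_id: str = "openai/whisper-small",  # assumed base checkpoint
    r: int = 32,                             # illustrative LoRA rank
    alpha: int = 64,                         # illustrative scaling factor
    target_modules: Sequence[str] = ("q_proj", "v_proj"),  # Whisper attention projections
):
    """Load Whisper in 8-bit and wrap it with LoRA adapters (needs a CUDA GPU)."""
    from transformers import BitsAndBytesConfig, WhisperForConditionalGeneration
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model = WhisperForConditionalGeneration.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # bitsandbytes int8 weights
        device_map="auto",  # requires the accelerate package
    )
    # Freezes the base weights and prepares the quantized model for training.
    model = prepare_model_for_kbit_training(model)
    config = LoraConfig(
        r=r,
        lora_alpha=alpha,
        target_modules=list(target_modules),
        lora_dropout=0.05,
        bias="none",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # only the adapters are trainable
    return model
```

The helper shows why this is cheap: a 768 by 768 projection (the hidden size of `whisper-small`) has 589,824 frozen weights, while its r = 32 adapter adds only 32 × (768 + 768) = 49,152 trainable parameters.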
Download the project files to your local machine.
Open the project in Visual Studio Code and start the application.
Choose an audio file and upload it through the web interface.
View the transcription results once the file is processed.
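Besides the web interface, an upload can be scripted. A minimal standard-library sketch, assuming a hypothetical `POST /transcribe` endpoint that accepts a multipart field named `file` and returns JSON, served at uvicorn's default address `http://127.0.0.1:8000`; the project's actual route is not documented here. If the FastAPI object were named `app` in a hypothetical `main.py`, the server could be started with `uvicorn main:app --reload`.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Optional, Tuple


def build_multipart(
    field: str, filename: str, data: bytes, boundary: Optional[str] = None
) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; returns (body, content type)."""
    boundary = boundary or uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_for_transcription(
    path: str, url: str = "http://127.0.0.1:8000/transcribe"  # hypothetical endpoint
) -> dict:
    """POST an audio file to the (assumed) transcription endpoint and parse the JSON reply."""
    file = Path(path)
    body, content_type = build_multipart("file", file.name, file.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `upload_for_transcription("meeting.wav")` would return the parsed JSON reply, whose exact fields depend on how the backend is implemented.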
This project is licensed under the MIT License. See the LICENSE file for details.