This Python project transcribes multiple YouTube videos in one run using Whisper AI. Originally designed for Google Colab, it now runs both locally and on Colab, with GPU acceleration in either environment.
- Downloads and processes multiple YouTube videos from youtube_urls.txt
- Creates accurate transcripts using OpenAI's Whisper AI model
- Automatically manages audio files (downloads and cleanup)
- Organized output structure with all transcripts in a dedicated directory
- Comprehensive error handling and logging
- GPU acceleration support (both local and Colab)
- Uses youtube-dl in nightly mode for better compatibility
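The download, transcribe, and cleanup behavior listed above can be sketched as a single loop. This is a minimal illustration, not the project's actual main.py: the function name `process_all` and the `download` and `transcribe` callables are hypothetical stand-ins for the youtube-dl and Whisper steps.

```python
import logging
from pathlib import Path
from typing import Callable, Iterable

log = logging.getLogger("transcriptor_sketch")


def process_all(
    urls: Iterable[str],
    download: Callable[[str], Path],    # hypothetical: fetches audio, returns its path
    transcribe: Callable[[Path], str],  # hypothetical: returns the transcript text
    out_dir: Path,
) -> list[Path]:
    """Transcribe each URL and return the transcript files that were written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for url in urls:
        audio = None
        try:
            audio = download(url)
            text = transcribe(audio)
            target = out_dir / f"{audio.stem}-transcript.txt"
            target.write_text(text, encoding="utf-8")
            written.append(target)
        except Exception:
            # One failing video is logged and skipped; the batch continues.
            log.exception("Failed to process %s", url)
        finally:
            # Delete the audio file whether or not transcription succeeded.
            if audio is not None and audio.exists():
                audio.unlink()
    return written
```

Passing the two steps in as callables keeps the loop easy to test with fakes, and the `finally` clause is what guarantees that audio files never accumulate after a failure.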
transcriptor/
├── models/ # Stores Whisper AI models
├── transcripts/ # Stores generated transcripts
├── config.py # Configuration settings
├── main.py # Main transcription logic
├── download.py # Model download script
└── youtube_urls.txt # Input YouTube URLs
- Python 3.x
- openai-whisper (imported in code as whisper)
- torch
- youtube-dl (included)
- Clone the repository:
git clone https://github.com/byigitt/transcriptor.git
cd transcriptor
- Install dependencies:
pip install openai-whisper torch
chmod 755 youtube-dl
- Download the Whisper model:
python download.py
- Create youtube_urls.txt with your YouTube URLs (one per line)
- Run the transcription:
python main.py
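For reference, reading a one-URL-per-line file could look like the sketch below. The helper name `read_urls` is hypothetical, and skipping `#` comment lines and duplicate URLs are assumptions rather than documented behavior of main.py.

```python
from pathlib import Path


def read_urls(path: str = "youtube_urls.txt") -> list[str]:
    """Return the URLs in the file, in order, without blanks or duplicates."""
    urls: list[str] = []
    seen: set[str] = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        # Assumption: blank lines and lines starting with '#' are ignored.
        if not url or url.startswith("#"):
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```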
- Create a new Colab notebook
- Change runtime type to GPU
- Clone the repository:
!git clone https://github.com/byigitt/transcriptor.git
%cd transcriptor
- Install dependencies and run:
!chmod 755 youtube-dl
!pip install openai-whisper torch
!python download.py
!python main.py
The project uses config.py for centralized settings:
- Model selection and device settings
- Input/output paths configuration
- YouTube download settings
- Logging configuration
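A centralized settings module covering these four groups might be organized as in the sketch below. Every field name and default value here is hypothetical and chosen for illustration; check the real config.py for the actual names.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    # Illustrative names and defaults only, not the project's actual values.
    model_name: str = "base"                      # Whisper model size
    device: str = "cuda"                          # "cuda" or "cpu"
    urls_file: Path = Path("youtube_urls.txt")    # input path
    models_dir: Path = Path("models")             # where models are stored
    transcripts_dir: Path = Path("transcripts")   # output path
    downloader: Path = Path("./youtube-dl")       # bundled downloader binary
    log_level: str = "INFO"                       # logging verbosity


SETTINGS = Settings()
```

Freezing the dataclass makes accidental changes to settings at runtime fail loudly instead of silently altering later videos in the batch.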
- Transcripts are saved in the transcripts/ directory
- Each transcript is named after its video with a -transcript.txt suffix
- Audio files are automatically cleaned up after transcription
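The naming rule can be illustrated with a small helper. This is a sketch under assumptions: the function name `transcript_path` is hypothetical, and replacing characters that are invalid in file names is an assumed detail that the project may handle differently.

```python
import re
from pathlib import Path


def transcript_path(video_title: str, out_dir: str = "transcripts") -> Path:
    """Build '<out_dir>/<safe title>-transcript.txt' for a video title."""
    # Assumption: characters that are invalid in file names become underscores.
    safe = re.sub(r'[\\/:*?"<>|]+', "_", video_title).strip(" .")
    return Path(out_dir) / f"{safe or 'untitled'}-transcript.txt"
```

For example, a video titled "Ders 1: Giriş" would map to transcripts/Ders 1_ Giriş-transcript.txt under this scheme.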
- If you encounter GPU-related errors, the system will automatically fall back to CPU
- Check the logs for detailed error messages and debugging information
- Make sure your YouTube URLs are valid and accessible
- Keep the Colab tab open during processing; if the session disconnects, Colab discards the runtime's files, including any transcripts you have not downloaded
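The automatic CPU fallback mentioned above can be approximated as follows. This is a sketch, not the project's implementation: `choose_device` and `run_with_cpu_fallback` are hypothetical names, and retrying on `RuntimeError` is an assumption based on PyTorch generally reporting CUDA failures as `RuntimeError` or one of its subclasses.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def choose_device(cuda_available: Optional[bool] = None) -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    if cuda_available is None:
        try:
            import torch  # imported lazily so the helper works without torch
            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False
    return "cuda" if cuda_available else "cpu"


def run_with_cpu_fallback(job: Callable[[str], T], device: str) -> T:
    """Run job(device); if it raises RuntimeError on the GPU, retry on the CPU."""
    try:
        return job(device)
    except RuntimeError:
        if device == "cpu":
            raise  # already on the CPU, nothing left to fall back to
        return job("cpu")
```

With Whisper, the job could be something like `lambda d: whisper.load_model("base", device=d).transcribe("audio.mp3")`, where the model size and file name are placeholders.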
Feel free to:
- Open an issue for bugs or questions
- Submit pull requests for improvements
- Check existing issues for common problems
Google Colab provides free GPU access, which speeds up transcription compared with running on a CPU. The project works particularly well with Turkish-language content (tested with Google Oyun ve Uygulama Akademisi educational videos), and it handles any language that Whisper supports.