An AI-powered podcast generator that converts PDF documents into natural-sounding conversations.
Prerequisites:
- Docker
- Docker Compose
- Clone the repository:
git clone git@github.com:anandrmedia/podcastgen.git
cd podcastgen
- Create a .env file from the example:
cp .env.example .env
- Edit .env and add your OpenAI API key.
- Configure LLM: In server/src/index.ts, configure the LLM settings according to your needs:
const llm = new LLM({
  baseUrl: "https://api.openai.com/v1/", // For OpenAI
  // baseUrl: "https://api.deepseek.com/v1", // For Deepseek
  model: "gpt-4", // Model name
  apiKey: process.env.OPENAI_API_KEY // API key from .env
});
Supported LLM providers:
- OpenAI (api.openai.com)
- Deepseek (api.deepseek.com)
- Any other API compatible with the OpenAI format
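Switching providers means editing the settings object by hand. As a rough sketch only, the same choice could be driven by environment variables. The helper `resolveLlmSettings` and the variable names `LLM_PROVIDER`, `LLM_BASE_URL` and `LLM_MODEL` are assumptions made for illustration and are not part of the project; only `OPENAI_API_KEY` and the two base URLs come from the snippet above.

```typescript
// Hypothetical helper: the project itself hard-codes these values in server/src/index.ts.
interface LlmSettings {
  baseUrl: string;
  model: string;
  apiKey: string;
}

// Base URLs taken from the configuration snippet above.
const PROVIDER_BASE_URLS: Record<string, string> = {
  openai: "https://api.openai.com/v1/",
  deepseek: "https://api.deepseek.com/v1",
};

// LLM_PROVIDER, LLM_BASE_URL and LLM_MODEL are assumed names, not project settings.
function resolveLlmSettings(env: Record<string, string | undefined>): LlmSettings {
  const apiKey = env["OPENAI_API_KEY"];
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is not set");
  }
  const provider = (env["LLM_PROVIDER"] ?? "openai").toLowerCase();
  // An explicit LLM_BASE_URL wins, which covers any other OpenAI-compatible API.
  const baseUrl = env["LLM_BASE_URL"] ?? PROVIDER_BASE_URLS[provider];
  if (!baseUrl) {
    throw new Error(`Unknown provider "${provider}" and no LLM_BASE_URL given`);
  }
  return { baseUrl, model: env["LLM_MODEL"] ?? "gpt-4", apiKey };
}
```

Assuming the `LLM` constructor accepts the three fields shown in the original snippet, the result of `resolveLlmSettings(process.env)` could be passed to it directly.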
- Create required directories:
mkdir -p server/src/tmp_data
mkdir -p server/src/generated-files/scripts
mkdir -p voices
- Download Piper voice models:
  - Download the required .onnx voice models from https://huggingface.co/rhasspy/piper-voices/tree/main/en/en_US/
  - Place them in the voices directory
Required models (the folder structure must match exactly):
  - voices/lessac/medium/en_US-lessac-medium.onnx
  - voices/lessac/medium/en_US-lessac-medium.onnx.json
  - voices/kusal/en_US-kusal-medium.onnx
  - voices/kusal/en_US-kusal-medium.onnx.json
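Because the layout has to match exactly, it can help to check it before starting the containers. The following is a minimal sketch, not something the project ships: `findMissingVoiceFiles` is a hypothetical helper, and the existence check is passed in as a function so the logic stays independent of the file system.

```typescript
// Paths copied from the required-models list above.
const REQUIRED_VOICE_FILES: string[] = [
  "voices/lessac/medium/en_US-lessac-medium.onnx",
  "voices/lessac/medium/en_US-lessac-medium.onnx.json",
  "voices/kusal/en_US-kusal-medium.onnx",
  "voices/kusal/en_US-kusal-medium.onnx.json",
];

// Hypothetical helper: returns the required files for which `exists` reports false.
// `exists` is injected so the check can be exercised without touching the disk.
function findMissingVoiceFiles(exists: (path: string) => boolean): string[] {
  return REQUIRED_VOICE_FILES.filter((file) => !exists(file));
}
```

In Node.js, passing `existsSync` from `node:fs` as the predicate, with the repository root as the working directory, would check the real `voices` directory; an empty result means all four files are in place.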
- Start the application:
docker-compose up
The application will be available at:
- Web UI: http://localhost:3001
- API: http://localhost:3000
Usage:
- Open http://localhost:3001 in your browser
- Upload a PDF file using the "Upload PDF" button
- Wait for the processing to complete
- Select the generated script from the sidebar
- Click "Play" to start the podcast
To stop the application:
docker-compose down
To rebuild the containers after making changes:
docker-compose up --build
If you encounter any issues:
- Check the Docker logs:
docker-compose logs
- Ensure all required voice models are in the voices directory
- Verify your OpenAI API key is correct
- Make sure ports 3000 and 3001 are available on your system
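A quick first check for the API key item is whether the key is present in .env at all. The sketch below assumes a hypothetical helper, `findMissingEnvKeys`, that reads plain `KEY=value` lines and `#` comments only; it does not implement the full dotenv syntax and cannot tell whether a key is actually valid.

```typescript
// Hypothetical helper: lists required keys that are absent or empty in .env-style text.
function findMissingEnvKeys(envText: string, requiredKeys: string[]): string[] {
  const values = new Map<string, string>();
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    values.set(key, value);
  }
  // A key counts as missing when it is undefined or set to an empty string.
  return requiredKeys.filter((key) => !values.get(key));
}
```

Calling it with the contents of .env and `["OPENAI_API_KEY"]` returns an empty array when the key has a non-empty value.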
This project is licensed under the MIT License - see the LICENSE.md file for details.
This project uses several third-party components, including:
- Piper for text-to-speech conversion
- Piper voice models from rhasspy/piper-voices
For detailed license information of third-party components, see THIRD_PARTY_LICENSES.md.