Podcast Generator

An AI-powered podcast generator that converts PDF documents into natural-sounding conversations.

Prerequisites

Docker
Docker Compose

Setup Instructions

Clone the repository:

git clone git@github.com:anandrmedia/podcastgen.git
cd podcastgen

Create a .env file from the example:
```
cp .env.example .env
```
Edit .env and add your OpenAI API key.
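
A missing or empty key is an easy mistake to make at this step, so a quick pre-flight check can help. The sketch below is a minimal illustration, not part of the project: `parseEnv` and `checkOpenAiKey` are hypothetical helpers, and the only name taken from this README is the `OPENAI_API_KEY` variable. It assumes a simple `KEY=VALUE` file format with `#` comments.

```typescript
// Minimal .env parser: KEY=VALUE lines, '#' comments, optional surrounding quotes.
// Hypothetical helper for illustration; the project itself may load .env differently.
function parseEnv(text: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip lines without a key
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching single or double quotes.
    const first = value.charAt(0);
    const last = value.charAt(value.length - 1);
    if (value.length >= 2 && (first === '"' || first === "'") && last === first) {
      value = value.slice(1, -1);
    }
    vars[key] = value;
  }
  return vars;
}

// Returns a list of problems; an empty list means the key is present and non-empty.
function checkOpenAiKey(vars: Record<string, string>): string[] {
  const key: string | undefined = vars["OPENAI_API_KEY"];
  if (key === undefined) return ["OPENAI_API_KEY is missing from .env"];
  if (key === "") return ["OPENAI_API_KEY is empty"];
  return [];
}
```

This only confirms that the key is set; it does not verify that the provider accepts it.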

Configure the LLM: in server/src/index.ts, adjust the LLM settings to match your provider:

const llm = new LLM({
    baseUrl: "https://api.openai.com/v1/",    // For OpenAI
    // baseUrl: "https://api.deepseek.com/v1", // For Deepseek
    model: "gpt-4",                          // Model name
    apiKey: process.env.OPENAI_API_KEY       // API key from .env
});

Supported LLM providers:

OpenAI (api.openai.com)
Deepseek (api.deepseek.com)
Any API compatible with the OpenAI format
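
Switching providers by commenting lines in and out is error-prone, so one option is to keep the per-provider settings in a small lookup table. This is a sketch under stated assumptions: `LlmOptions`, `PROVIDERS`, and `llmOptionsFor` are hypothetical names, the option fields mirror the `new LLM({...})` call shown earlier, and the Deepseek model name `deepseek-chat` is an assumption rather than something this README specifies.

```typescript
// Shape of the options object passed to `new LLM({...})` in the configuration snippet.
interface LlmOptions {
  baseUrl: string;
  model: string;
  apiKey: string | undefined;
}

type Provider = "openai" | "deepseek";

// Base URLs and the OpenAI model come from this README.
// The Deepseek model name is an assumption; check the provider's documentation.
const PROVIDERS: Record<Provider, { baseUrl: string; model: string }> = {
  openai: { baseUrl: "https://api.openai.com/v1/", model: "gpt-4" },
  deepseek: { baseUrl: "https://api.deepseek.com/v1", model: "deepseek-chat" },
};

// Builds the options for one provider; the API key is passed in by the caller.
function llmOptionsFor(provider: Provider, apiKey: string | undefined): LlmOptions {
  const { baseUrl, model } = PROVIDERS[provider];
  return { baseUrl, model, apiKey };
}
```

In server/src/index.ts this could be used as `new LLM(llmOptionsFor("openai", process.env.OPENAI_API_KEY))`. Note that the original snippet reads the key from `OPENAI_API_KEY` regardless of provider, so a Deepseek key would go in the same variable unless you change that.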

Create required directories:

mkdir -p server/src/tmp_data
mkdir -p server/src/generated-files/scripts
mkdir -p voices

Download Piper voice models:
- Download the required .onnx voice models and their matching .onnx.json config files from https://huggingface.co/rhasspy/piper-voices/tree/main/en/en_US/
- Place them in the voices directory.

Required models (the folder structure must match exactly):
- voices/lessac/medium/en_US-lessac-medium.onnx
- voices/lessac/medium/en_US-lessac-medium.onnx.json
- voices/kusal/en_US-kusal-medium.onnx
- voices/kusal/en_US-kusal-medium.onnx.json
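
Because the layout must match exactly, it can help to check the four paths before starting the containers. The following is a minimal sketch: `REQUIRED_VOICE_FILES` and `missingVoiceFiles` are hypothetical names, and the paths are copied from the list above. The existence check is passed in as a function so the logic does not depend on a particular file-system API.

```typescript
// Paths copied from the list above, relative to the repository root.
const REQUIRED_VOICE_FILES: readonly string[] = [
  "voices/lessac/medium/en_US-lessac-medium.onnx",
  "voices/lessac/medium/en_US-lessac-medium.onnx.json",
  "voices/kusal/en_US-kusal-medium.onnx",
  "voices/kusal/en_US-kusal-medium.onnx.json",
];

// Returns the required files for which `exists` reports false.
// `exists` is injected by the caller, e.g. a wrapper around a file-system check.
function missingVoiceFiles(exists: (path: string) => boolean): string[] {
  return REQUIRED_VOICE_FILES.filter((path) => !exists(path));
}
```

In Node.js, passing `existsSync` from `node:fs` as the `exists` argument and running the check from the repository root should list any files that still need to be downloaded.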

Start the application:
```
docker-compose up
```

The application will be available at:

Web UI: http://localhost:3001
API: http://localhost:3000

Usage

Open http://localhost:3001 in your browser
Upload a PDF file using the "Upload PDF" button
Wait for the processing to complete
Select the generated script from the sidebar
Click "Play" to start the podcast

Development

To stop the application:

docker-compose down

To rebuild the containers after making changes:

docker-compose up --build

Troubleshooting

If you encounter any issues:

Check the Docker logs:
```
docker-compose logs
```
Ensure all required voice models are in the voices directory
Verify your OpenAI API key is correct
Make sure ports 3000 and 3001 are available on your system

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Third-Party Licenses

This project uses several third-party components, including:

Piper for text-to-speech conversion
Piper voice models from rhasspy/piper-voices

For detailed license information of third-party components, see THIRD_PARTY_LICENSES.md.
