diff --git a/docs/source/usage/api.md b/docs/source/usage/api.md new file mode 100644 index 0000000..931b414 --- /dev/null +++ b/docs/source/usage/api.md @@ -0,0 +1,179 @@ + +# Podcastfy REST API Documentation + +## Overview + +The Podcastfy API allows you to programmatically generate AI podcasts from various input sources. This document outlines the API endpoints and their usage. + +## Using cURL with Podcastfy API + +### Prerequisites +1. Confirm cURL installation: +```bash +curl --version +``` + +### API Request Flow +Making a prediction requires two sequential requests: +1. POST request to initiate processing - returns an `EVENT_ID` +2. GET request to fetch results - uses the `EVENT_ID` returned by the POST request + +Between steps 1 and 2, there is a delay of 1-3 minutes. We are working on reducing this delay and implementing a way to notify the user when the podcast is ready. Thanks for your patience! + +### Basic Request Structure +```bash +# Step 1: POST request to initiate processing; the response has the form {"event_id": "..."} +# Make sure to include http:// or https:// in the URL. The "data" array is positional JSON (comments are not allowed); see Parameter Details for the meaning of each index +curl -X POST https://thatupiso-podcastfy-ai-demo.hf.space/gradio_api/call/process_inputs \ -H "Content-Type: application/json" \ -d '{ "data": [ "text_input", "https://yourwebsite.com", [], [], "gemini_key", "openai_key", "elevenlabs_key", 2000, "engaging,fast-paced", "main summarizer", "questioner", "Introduction,Content,Conclusion", "PODCASTFY", "YOUR PODCAST", "openai", 0.7, "" ] }' + +# Step 2: GET request to fetch results (set EVENT_ID to the event_id returned by Step 1) +curl -N https://thatupiso-podcastfy-ai-demo.hf.space/gradio_api/call/process_inputs/$EVENT_ID + + +# Example output result +event: complete +data: [{"path":
"/tmp/gradio/bcb143f492b1c9a6dbde512557541e62f090bca083356be0f82c2e12b59af100/podcast_81106b4ca62542f1b209889832a421df.mp3", "url": "https://thatupiso-podcastfy-ai-demo.hf.space/gradio_a/gradio_api/file=/tmp/gradio/bcb143f492b1c9a6dbde512557541e62f090bca083356be0f82c2e12b59af100/podcast_81106b4ca62542f1b209889832a421df.mp3", "size": null, "orig_name": "podcast_81106b4ca62542f1b209889832a421df.mp3", "mime_type": null, "is_stream": false, "meta": {"_type": "gradio.FileData"}}] + +``` + +You can download the file by extending the URL prefix "https://thatupiso-podcastfy-ai-demo.hf.space/gradio_a/gradio_api/file=" with the path to the file in variable `path`. (Note: The variable "url" above has a bug introduced by Gradio, so please ignore it.) + +### Parameter Details +| Index | Parameter | Type | Description | +|-------|-----------|------|-------------| +| 0 | text_input | string | Direct text input for podcast generation | +| 1 | urls_input | string | URLs to process (include http:// or https://) | +| 2 | pdf_files | array | List of PDF files to process | +| 3 | image_files | array | List of image files to process | +| 4 | gemini_key | string | Google Gemini API key | +| 5 | openai_key | string | OpenAI API key | +| 6 | elevenlabs_key | string | ElevenLabs API key | +| 7 | word_count | number | Target word count for podcast | +| 8 | conversation_style | string | Conversation style descriptors (e.g. "engaging,fast-paced") | +| 9 | roles_person1 | string | Role of first speaker | +| 10 | roles_person2 | string | Role of second speaker | +| 11 | dialogue_structure | string | Structure of dialogue (e.g. 
"Introduction,Content,Conclusion") | +| 12 | podcast_name | string | Name of the podcast | +| 13 | podcast_tagline | string | Podcast tagline | +| 14 | tts_model | string | Text-to-speech model ("gemini", "openai", "elevenlabs", or "edge") | +| 15 | creativity_level | number | Level of creativity (0-1) | +| 16 | user_instructions | string | Custom instructions for generation | + + +## Using Python + +### Installation + +```bash +pip install gradio_client +``` + +### Quick Start + +```python +from gradio_client import Client, handle_file + +client = Client("thatupiso/Podcastfy.ai_demo") +``` + +### API Endpoints + +#### Generate Podcast (`/process_inputs`) + +Generates a podcast from provided text, URLs, PDFs, or images. + +##### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| text_input | str | Yes | - | Raw text input for podcast generation | +| urls_input | str | Yes | - | Comma-separated URLs to process | +| pdf_files | List[filepath] | Yes | None | List of PDF files to process | +| image_files | List[filepath] | Yes | None | List of image files to process | +| gemini_key | str | No | "" | Google Gemini API key | +| openai_key | str | No | "" | OpenAI API key | +| elevenlabs_key | str | No | "" | ElevenLabs API key | +| word_count | float | No | 2000 | Target word count for podcast | +| conversation_style | str | No | "engaging,fast-paced,enthusiastic" | Conversation style descriptors | +| roles_person1 | str | No | "main summarizer" | Role of first speaker | +| roles_person2 | str | No | "questioner/clarifier" | Role of second speaker | +| dialogue_structure | str | No | "Introduction,Main Content Summary,Conclusion" | Structure of dialogue | +| podcast_name | str | No | "PODCASTFY" | Name of the podcast | +| podcast_tagline | str | No | "YOUR PERSONAL GenAI PODCAST" | Podcast tagline | +| tts_model | Literal['openai', 'elevenlabs', 'edge'] | No | "openai" | Text-to-speech model | 
+| creativity_level | float | No | 0.7 | Level of creativity (0-1) | +| user_instructions | str | No | "" | Custom instructions for generation | + +##### Returns + +| Type | Description | +|------|-------------| +| filepath | Path to generated audio file | + +##### Example Usage + +```python +from gradio_client import Client, handle_file + +client = Client("thatupiso/Podcastfy.ai_demo") + +# Generate podcast from URL +result = client.predict( + text_input="", + urls_input="https://example.com/article", + pdf_files=[], + image_files=[], + gemini_key="your-gemini-key", + openai_key="your-openai-key", + word_count=1500, + conversation_style="casual,informative", + podcast_name="Tech Talk", + tts_model="openai", + creativity_level=0.8 +) + +print(f"Generated podcast: {result}") +``` + +### Error Handling + +The API will return appropriate error messages for: +- Invalid API keys +- Malformed input +- Failed file processing +- TTS generation errors + +### Rate Limits + +Please be aware of the rate limits for the underlying services: +- Gemini API +- OpenAI API +- ElevenLabs API + +## Notes + +- At least one input source (text, URL, PDF, or image) must be provided +- API keys are required for corresponding services +- The generated audio file format is MP3 \ No newline at end of file diff --git a/docs/source/usage/cli.md b/docs/source/usage/cli.md index 74033a1..732893e 100644 --- a/docs/source/usage/cli.md +++ b/docs/source/usage/cli.md @@ -3,7 +3,7 @@ Podcastfy can be used as a command-line interface (CLI) tool. See below some usage examples. Please make sure you follow configuration instructions first - [See Setup](README.md#setup). -1. Generate a podcast from URLs using OpenAI TTS (default): +1. 
Generate a podcast from URLs (using OpenAI TTS by default): ``` python -m podcastfy.client --url https://example.com/article1 --url https://example.com/article2 ``` @@ -47,8 +47,18 @@ Please make sure you follow configuration instructions first - [See Setup](READM ``` python -m podcastfy.client --url https://example.com/article1 --image path/to/image1.jpg ``` + +10. Generate a transcript using a local LLM: + ``` + python -m podcastfy.client --url https://example.com/article1 --transcript-only --local + ``` For more information on available options, use: ``` python -m podcastfy.client --help ``` + +11. Generate a podcast from raw text input: + ``` + python -m podcastfy.client --text "Your raw text content here that you want to convert into a podcast" + ``` diff --git a/docs/source/usage/config.md b/docs/source/usage/config.md index bb372d8..0392c56 100644 --- a/docs/source/usage/config.md +++ b/docs/source/usage/config.md @@ -8,25 +8,61 @@ The project uses a combination of a `.env` file for managing API keys and sensit 2. Add your API keys and other sensitive information to the `.env` file. For example: ``` - JINA_API_KEY=your_jina_api_key_here GEMINI_API_KEY=your_gemini_api_key_here ELEVENLABS_API_KEY=your_elevenlabs_api_key_here OPENAI_API_KEY=your_openai_api_key_here ``` API Key Requirements: -- `JINA_API_KEY`: Required only for parsing website content as input. (get your [free API key](https://jina.ai/reader/#apiform)) -- `GEMINI_API_KEY`: Mandatory for all operations. (get your [free API key](aistudio.google.com/app/apikey)) -- `OPENAI_API_KEY` or `ELEVENLABS_API_KEY`: Required for audio generation (paid service). `Edge TTS` can be also used for audio generation without an API key. +- `GEMINI_API_KEY`: Required for transcript generation if not using a [local LLM](local_llm.md). 
(get your [free API key](https://aistudio.google.com/app/apikey)) +- `OPENAI_API_KEY` or `ELEVENLABS_API_KEY`: Required for audio generation with the `openai` or `elevenlabs` TTS models, respectively; not needed when using Microsoft Edge TTS (`tts_model=edge`) or Google TTS (`tts_model=gemini`).
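The key requirements above amount to a small decision rule. As an illustration only, the sketch below encodes that rule in Python; `required_api_keys` and `missing_api_keys` are hypothetical helpers (not part of Podcastfy), and mapping `tts_model="gemini"` to `GEMINI_API_KEY` assumes you reuse your Gemini key for Google TTS, as the Google TTS setup section suggests.

```python
import os

# Hypothetical mapping (not part of Podcastfy) from tts_model to the key it needs.
# Assumption: Google TTS ("gemini") reuses the Gemini key; Edge TTS needs none.
TTS_KEYS = {
    "openai": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "edge": None,
}


def required_api_keys(tts_model: str = "openai", local_llm: bool = False) -> set:
    """Return the environment variables a given setup needs."""
    if tts_model not in TTS_KEYS:
        raise ValueError(f"unknown tts_model: {tts_model!r}")
    keys = set()
    if not local_llm:
        keys.add("GEMINI_API_KEY")  # transcript generation with Gemini
    if TTS_KEYS[tts_model] is not None:
        keys.add(TTS_KEYS[tts_model])  # audio generation
    return keys


def missing_api_keys(tts_model: str = "openai", local_llm: bool = False) -> list:
    """List required keys that are unset or empty in the current environment."""
    return sorted(
        key for key in required_api_keys(tts_model, local_llm) if not os.environ.get(key)
    )


if __name__ == "__main__":
    print(sorted(required_api_keys()))  # ['GEMINI_API_KEY', 'OPENAI_API_KEY']
    print(missing_api_keys("edge", local_llm=True))  # []
```

The three rows of the Example Configurations table correspond to `required_api_keys()`, `required_api_keys("edge", local_llm=True)` and `required_api_keys("gemini")`.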
Ensure you have the necessary API keys based on your intended usage of Podcastfy. > [!Note] > Never share your `.env` file or commit it to version control. It contains sensitive information that should be kept private. The `config.yaml` file can be shared and version-controlled as it doesn't contain sensitive data. +## Example Configurations + +Here's a table showing example configurations: + +| Configuration | Base LLM | TTS Model | API Keys Required | +|---------------|----------|-----------|-------------------| +| Default | Gemini | OpenAI | GEMINI_API_KEY and OPENAI_API_KEY | +| No API Keys Required | Local LLM | Edge | None | +| Recommended | Gemini | 'gemini' (Google) | GEMINI_API_KEY | + +In our experience, ElevenLabs and the Google TTS model produce the best audio quality, with the latter having an edge over the former due to its multispeaker capability. ElevenLabs is the most expensive, but it's easy to set up and offers great customization (voice options and multilingual capability). The Google TTS model is cheaper but is limited to English and requires some extra steps to set up. + +## Setting up Google TTS Model + +You can use the Google TTS model by setting the `tts_model` parameter to `gemini` in Podcastfy. + +The Google TTS model requires a Google Cloud API key; you can use the same API key you are already using for Gemini or create a new one. After you have secured your API key, there are two additional steps to use the Google multispeaker TTS model: + +- Step 1: Enable the Cloud Text-to-Speech API for the Google Cloud project that your API key belongs to.
+ - Go to "https://console.cloud.google.com/apis/dashboard" + - Select your project (or create one by clicking on project list and then on "new project") + - Click "+ ENABLE APIS AND SERVICES" at the top of the screen + - Enter "text-to-speech" into the search box + - Click on "Cloud Text-to-Speech API" and then on "ENABLE" + - You should be here: "https://console.cloud.google.com/apis/library/texttospeech.googleapis.com?project=..." + +- Step 2: You need to add the Cloud Text-to-Speech API permission to the API KEY you're using on the Google Cloud console. + + - Go to https://console.cloud.google.com/apis/credentials + - Click on whatever key you're using for Gemini + - Go down to API Restrictions and add the Cloud Text-to-Speech API + +Phew!!! That was a lot of steps but you only need to do it once and you might be impressed with the quality of the audio. See [Google TTS](https://cloud.google.com/text-to-speech) for more details. Thank you @mobarski and @evandempsey for the help! + ## Conversation Configuration See [conversation_custom.md](conversation_custom.md) for more details. +## Running Local LLMs + +See [local_llm.md](local_llm.md) for more details. + ## Optional configuration The `config.yaml` file in the root directory contains non-sensitive configuration settings. You can modify this file to adjust various parameters such as output directories, text-to-speech settings, and content generation options. diff --git a/docs/source/usage/config_custom copy.md b/docs/source/usage/config_custom copy.md new file mode 100644 index 0000000..4154c86 --- /dev/null +++ b/docs/source/usage/config_custom copy.md @@ -0,0 +1,63 @@ +# Podcastfy Advanced Configuration Guide + +Podcastfy uses a `config.yaml` file to manage various settings and parameters. This guide explains each configuration option available in the file. + + + +## Content Generator + +- `gemini_model`: "gemini-1.5-pro-latest" + - The Gemini AI model used for content generation. 
+- `max_output_tokens`: 8192 + - Maximum number of tokens for the output generated by the AI model. +- `temperature`: 1 + - Controls randomness in the AI's output. 0 means deterministic responses. Range for gemini-1.5-pro: 0.0 - 2.0 (default: 1.0) +- `langchain_tracing_v2`: false + - Enables LangChain tracing for debugging and monitoring. If true, requires langsmith api key + +## Content Extractor + +- `youtube_url_patterns`: + - Patterns to identify YouTube URLs. + - Current patterns: "youtube.com", "youtu.be" + +## Website Extractor + +- `markdown_cleaning`: + - `remove_patterns`: + - Patterns to remove from extracted markdown content. + - Current patterns remove image links, hyperlinks, and URLs. + +## YouTube Transcriber + +- `remove_phrases`: + - Phrases to remove from YouTube transcriptions. + - Current phrase: "[music]" + +## Logging + +- `level`: "INFO" + - Default logging level. +- `format`: "%(asctime)s - %(name)s - %(levelname)s - %(message)s" + - Format string for log messages. 
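The `level` and `format` values above are ordinary Python `logging` settings. The minimal sketch below shows how those two values behave with the standard library; it makes no claim about how Podcastfy applies them internally, and the logger name `podcastfy.example` is hypothetical.

```python
import logging

# The two `logging` values documented above, copied verbatim.
LOG_LEVEL = "INFO"
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

# Standard-library equivalent; force=True (Python 3.8+) replaces any
# handlers the root logger already has.
logging.basicConfig(level=getattr(logging, LOG_LEVEL), format=LOG_FORMAT, force=True)

logger = logging.getLogger("podcastfy.example")  # hypothetical logger name
logger.info("Transcript generated")  # emitted: INFO meets the INFO threshold
logger.debug("Raw model response")   # dropped: DEBUG is below INFO
```

With this format, each record is rendered as timestamp, logger name, level, and message separated by ` - `; setting `level` to `"DEBUG"` would also emit the second message.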
+ + +## Website Extractor + +- `markdown_cleaning`: + - `remove_patterns`: + - Additional patterns to remove from extracted markdown content: + - '\[.*?\]': Remove square brackets and their contents + - '\(.*?\)': Remove parentheses and their contents + - '^\s*[-*]\s': Remove list item markers + - '^\s*\d+\.\s': Remove numbered list markers + - '^\s*#+': Remove markdown headers +- `unwanted_tags`: + - HTML tags to be removed during extraction: + - 'script', 'style', 'nav', 'footer', 'header', 'aside', 'noscript' +- `user_agent`: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36' + - User agent string to be used for web requests +- `timeout`: 10 + - Request timeout in seconds for web scraping + + diff --git a/docs/source/usage/conversation_custom.md b/docs/source/usage/conversation_custom.md index b0663ca..39f1b6c 100644 --- a/docs/source/usage/conversation_custom.md +++ b/docs/source/usage/conversation_custom.md @@ -1,6 +1,6 @@ # Podcastfy Conversation Configuration -Podcastfy offers a range of customization options to tailor your AI-generated podcasts. This document outlines how you can adjust parameters such as conversation style, word count, and dialogue structure to suit your specific needs. See [System Config](https://github.com/souzatharsis/podcastfy/blob/main/usage/config_custom.md) for additional seetings. See [Notes of Caution](#notes-of-caution) to avoid unexpected results. +Podcastfy offers a range of customization options to tailor your AI-generated podcasts. This document outlines how you can adjust parameters such as conversation style, word count, and dialogue structure to suit your specific needs. ## Table of Contents @@ -12,11 +12,12 @@ Podcastfy offers a range of customization options to tailor your AI-generated po 3. [Customization Scenarios](#customization-scenarios) 1. [Using the Python Package](#using-the-python-package) 2. [Using the CLI](#using-the-cli) - 3. 
[Dev Config](#dev-config) 4. [Notes of Caution](#notes-of-caution) -## Parameters +## Conversation Parameters + +Podcastfy uses the default conversation configuration stored in [podcastfy/conversation_config.yaml](https://github.com/souzatharsis/podcastfy/blob/main/podcastfy/conversation_config.yaml). | Parameter | Default Value | Type | Description | |-----------|---------------|------|-------------| @@ -30,8 +31,55 @@ Podcastfy offers a range of customization options to tailor your AI-generated po | output_language | "English" | str | Language of the output | | engagement_techniques | ["rhetorical questions", "anecdotes", "analogies", "humor"] | list[str] | Techniques to engage the audience | | creativity | 0 | int | Level of creativity/temperature (0-1) | - -Podcastfy uses the the default conversation configuration stored in [podcastfy/conversation_config.yaml](https://github.com/souzatharsis/podcastfy/blob/main/podcastfy/conversation_config.yaml). +| user_instructions | "" | str | Custom instructions to guide the conversation focus and topics | + +## Text-to-Speech (TTS) Settings + +Podcastfy uses the default TTS configuration stored in [podcastfy/conversation_config.yaml](https://github.com/souzatharsis/podcastfy/blob/main/podcastfy/conversation_config.yaml). + +### ElevenLabs TTS + +- `default_voices`: + - `question`: "Chris" + - Default voice for questions in the podcast. + - `answer`: "Jessica" + - Default voice for answers in the podcast. +- `model`: "eleven_multilingual_v2" + - The ElevenLabs TTS model to use. + +### OpenAI TTS + +- `default_voices`: + - `question`: "echo" + - Default voice for questions using OpenAI TTS. + - `answer`: "shimmer" + - Default voice for answers using OpenAI TTS. +- `model`: "tts-1-hd" + - The OpenAI TTS model to use. + +### Edge TTS + +- `default_voices`: + - `question`: "en-US-JennyNeural" + - Default voice for questions using Edge TTS. + - `answer`: "en-US-EricNeural" + - Default voice for answers using Edge TTS. 
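To replace one of these default voices (for example, with a cloned ElevenLabs voice), a custom conversation config can override just the relevant keys. This is a sketch under stated assumptions: it assumes the TTS settings sit under a top-level `text_to_speech` key, as the `text_to_speech.elevenlabs.default_voices` path in the Notes of Caution suggests, and `Robbert` is a placeholder voice name.

```python
# Hypothetical override: keep the documented ElevenLabs questioner voice and
# swap the answering voice for a cloned one ("Robbert" is a placeholder).
# Assumption: TTS settings are nested under a top-level "text_to_speech" key.
custom_config = {
    "text_to_speech": {
        "elevenlabs": {
            "default_voices": {
                "question": "Chris",   # documented default
                "answer": "Robbert",   # name of your cloned voice
            },
            "model": "eleven_multilingual_v2",  # documented default model
        }
    }
}

voices = custom_config["text_to_speech"]["elevenlabs"]["default_voices"]
print(f"question={voices['question']}, answer={voices['answer']}")

# With Podcastfy installed, the dict would be passed like the other custom
# configs, assuming generate_podcast accepts a tts_model argument that
# mirrors the CLI's --tts-model flag:
#   from podcastfy.client import generate_podcast
#   generate_podcast(urls=["https://example.com/article1"],
#                    tts_model="elevenlabs", conversation_config=custom_config)
```

Keeping the override small makes it clear which defaults you changed; if in doubt about how partial overrides are merged, copy the full ElevenLabs section from `conversation_config.yaml` and edit only the voice.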
+ +### General TTS Settings + +- `default_tts_model`: "openai" + - Default text-to-speech model to use. +- `output_directories`: + - `transcripts`: "./data/transcripts" + - Directory for storing generated transcripts. + - `audio`: "./data/audio" + - Directory for storing generated audio files. +- `audio_format`: "mp3" + - Format of the generated audio files. +- `temp_audio_dir`: "data/audio/tmp/" + - Temporary directory for audio processing. +- `ending_message`: "Bye Bye!" + - Message to be appended at the end of the podcast. ## Customization Examples @@ -105,7 +153,7 @@ custom_config = { "word_count": 200, "conversation_style": ["casual", "humorous"], "podcast_name": "Tech Chuckles", - "creativity": 7 + "creativity": 0.7 } generate_podcast( @@ -130,21 +178,9 @@ conversation_style: - casual - humorous podcast_name: Tech Chuckles -creativity: 7 +creativity: 0.7 ``` -### 3. Dev Config - -For contributors to the Podcastfy package, the default configuration is stored in [podcastfy/conversation_config.yaml](https://github.com/souzatharsis/podcastfy/blob/main/podcastfy/conversation_config.yaml). This file serves as the baseline configuration for all generated podcasts. - -To modify the default configuration: - -1. Locate the `conversation_config.yaml` file in the project root. -2. Edit the file using your preferred text editor. -3. Commit and push your changes, justifying proposed changes. - -Remember that changes to this file will affect the default behavior of Podcastfy for all users. Consider the impact of your changes and discuss significant modifications with the project maintainers before implementing them. - ## Notes of Caution @@ -152,6 +188,7 @@ Remember that changes to this file will affect the default behavior of Podcastfy - The `output_language` defines both the language of the transcript and the language of the audio. Here's some relevant information: - Bottom-line: non-English transcripts are good enough but non-English audio is work-in-progress. 
- Transcripts are generated using Google's Gemini 1.5 Pro, which supports 100+ languages by default. - - Audio is generated using `openai` (default), `elevenlabs`, or `edge` TTS models. + - Audio is generated using `openai` (default), `elevenlabs`, `gemini`, or `edge` TTS models. + - The `gemini` (Google) TTS model is English only. - The `openai` TTS model supports multiple languages automatically, however non-English voices still present sub-par quality in my experience. - The `elevenlabs` TTS model has English voices by default, in order to use a non-English voice you would need to download a custom voice for the target language in your `elevenlabs` account settings and then set the `text_to_speech.elevenlabs.default_voices` parameters to the voice you want to use in the [config.yaml file](https://github.com/pedroslopez/podcastfy/blob/main/podcastfy/config.yaml) (this config file is only available in the source code of the project, not in the pip package, hence if you are using the pip package you will not be able to change the ElevenLabs voice). For more information on ElevenLabs voices, visit [ElevenLabs Voice Library](https://elevenlabs.io/voice-library) diff --git a/docs/source/usage/docker.md b/docs/source/usage/docker.md new file mode 100644 index 0000000..fd02191 --- /dev/null +++ b/docs/source/usage/docker.md @@ -0,0 +1,349 @@ +# Docker Setup Guide for Podcastfy + +This guide explains how to use Docker to run Podcastfy in your local environment or for development. + +## Prerequisites + +- Docker installed on your system [1] +- Docker Compose [1] +- API keys [2] + +[1] See Appendix A for detailed installation instructions. +[2] See [config.md](https://github.com/souzatharsis/podcastfy/blob/main/usage/config.md) for more details. + +## Available Images + +Podcastfy provides pre-built Docker images through GitHub Container Registry (ghcr.io): + +1.
**Production Image**: `ghcr.io/souzatharsis/podcastfy:latest` + - Contains the latest PyPI release + - Recommended for production use + +2. **Development Image**: `ghcr.io/souzatharsis/podcastfy:dev` + - Includes development tools and dependencies + - Used for contributing and development + +## Deployment + +### Quick Deployment Steps + +1. Create a new directory and navigate to it: +```bash +mkdir -p /path/to/podcastfy +cd /path/to/podcastfy +``` + +2. Create a `.env` file with your API keys (see [config.md](https://github.com/souzatharsis/podcastfy/blob/main/usage/config.md) for more details): +```plaintext +GEMINI_API_KEY=your_gemini_api_key +OPENAI_API_KEY=your_openai_api_key # Optional: only needed for OpenAI TTS +``` + +3. Create a `docker-compose.yml`: +```yaml +version: '3.8' + +services: + podcastfy: + image: ghcr.io/souzatharsis/podcastfy:latest + environment: + - GEMINI_API_KEY=${GEMINI_API_KEY} + - OPENAI_API_KEY=${OPENAI_API_KEY} + ports: + - "8000:8000" + command: python3 -m podcastfy.server + healthcheck: + test: ["CMD", "python3", "-c", "import podcastfy"] + interval: 30s + timeout: 10s + retries: 3 +``` + +4. Pull and start the container: +```bash +docker pull ghcr.io/souzatharsis/podcastfy:latest +docker-compose up podcastfy +``` + +The service will be available at `http://localhost:8000` + +### Directory Structure +``` +/path/to/podcastfy/ +├── .env # Environment variables +└── docker-compose.yml # Docker Compose configuration +``` + +## Development Setup + +### Using Pre-built Development Image + +1. Pull the development image: +```bash +docker pull ghcr.io/souzatharsis/podcastfy:dev +``` + +2. 
Clone the repository and start development environment: +```bash +git clone https://github.com/souzatharsis/podcastfy.git +cd podcastfy +docker-compose up podcastfy-dev +``` + +### Building Locally + +Alternatively, you can build the images locally: +```bash +# Build production image +docker-compose build podcastfy + +# Build development image +docker-compose build podcastfy-dev +``` + +## Running Tests + +Run the test suite using: +```bash +docker-compose up test +``` + +This will run tests in parallel using pytest-xdist. + +## Environment Variables + +Required environment variables: +- `GEMINI_API_KEY` - Your Google Gemini API key +- `OPENAI_API_KEY` - Your OpenAI API key (optional: only needed for OpenAI TTS) + +## Container Details + +### Production Container +- Based on Ubuntu 24.04 +- Installs Podcastfy from PyPI +- Includes FFmpeg for audio processing +- Runs in a Python virtual environment +- Exposed port: 8000 + +### Development Container +- Based on Ubuntu 24.04 +- Includes development tools (flake8, pytest) +- Mounts local code for live development +- Runs in editable mode (`pip install -e .`) +- Exposed port: 8001 + +## Continuous Integration + +The Docker images are automatically: +- Built and tested on every push to main branch +- Built and tested for all pull requests +- Published to GitHub Container Registry +- Tagged with version numbers for releases (v*.*.*) + +## Health Checks + +All services include health checks that: +- Run every 30 seconds +- Verify Podcastfy can be imported +- Timeout after 10 seconds +- Retry up to 3 times + +## Common Commands + +```bash +# Pull latest production image +docker pull ghcr.io/souzatharsis/podcastfy:latest + +# Pull development image +docker pull ghcr.io/souzatharsis/podcastfy:dev + +# Start production service +docker-compose up podcastfy + +# Start development environment +docker-compose up podcastfy-dev + +# Run tests +docker-compose up test + +# Build images locally +docker-compose build + +# View logs 
+docker-compose logs + +# Stop all containers +docker-compose down +``` + +## Troubleshooting + +### Common Issues + +1. **API Key Errors** + - Verify your `.env` file exists and contains valid API keys + - Check if the environment variables are properly passed to the container + +2. **Port Conflicts** + - Ensure ports 8000 (production) and 8001 (development) are available + - Modify the port mappings in `docker-compose.yml` if needed + +3. **Volume Mounting Issues (Development)** + - Verify the correct path to your local code + - Check permissions on the mounted directories + +4. **Image Pull Issues** + - Ensure you have access to the GitHub Container Registry + - If you see "unauthorized" errors, the image might be private + - Try authenticating with GitHub: `docker login ghcr.io -u YOUR_GITHUB_USERNAME` + +### Verifying Installation + +You can verify your installation by checking if the package can be imported: +```bash +# Check production version +docker run --rm ghcr.io/souzatharsis/podcastfy:latest python3 -c "import podcastfy" + +# Check development setup +docker-compose exec podcastfy-dev python3 -c "import podcastfy" +``` + +## System Requirements + +Minimum requirements: +- Docker Engine 20.10.0 or later +- Docker Compose 2.0.0 or later +- Sufficient disk space for Ubuntu base image (~400MB) +- Additional space for Python packages and FFmpeg + +## Support + +If you encounter any issues: +1. Check the container logs: `docker-compose logs` +2. Verify all prerequisites are installed +3. Ensure all required environment variables are set +4. Open an issue on the [Podcastfy GitHub repository](https://github.com/souzatharsis/podcastfy/issues) + +## Appendix A: Detailed Installation Guide + +### Installing Docker + +#### Windows +1. Download and install [Docker Desktop for Windows](https://docs.docker.com/desktop/install/windows-install/) + - For Windows 10/11 Pro, Enterprise, or Education: Enable WSL 2 and Hyper-V + - For Windows 10 Home: Enable WSL 2 +2. 
After installation, start Docker Desktop +3. Verify installation: +```bash +docker --version +``` + +#### macOS +1. Download and install [Docker Desktop for Mac](https://docs.docker.com/desktop/install/mac-install/) + - For Intel chip: Download Intel package + - For Apple chip: Download Apple Silicon package +2. After installation, start Docker Desktop +3. Verify installation: +```bash +docker --version +``` + +#### Ubuntu/Debian +```bash +# Remove old versions +sudo apt-get remove docker docker-engine docker.io containerd runc + +# Install prerequisites +sudo apt-get update +sudo apt-get install \ + ca-certificates \ + curl \ + gnupg \ + lsb-release + +# Add Docker's official GPG key +sudo mkdir -p /etc/apt/keyrings +curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg + +# Set up repository +echo \ + "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \ + $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null + +# Install Docker Engine +sudo apt-get update +sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin + +# Add your user to docker group (optional, to run docker without sudo) +sudo usermod -aG docker $USER +newgrp docker + +# Verify installation +docker --version +``` + +#### Other Linux Distributions +- [CentOS](https://docs.docker.com/engine/install/centos/) +- [Fedora](https://docs.docker.com/engine/install/fedora/) +- [RHEL](https://docs.docker.com/engine/install/rhel/) + +### Installing Docker Compose + +Docker Compose is included with Docker Desktop for Windows and macOS. 
For Linux: + +```bash +# Download Docker Compose v2.24.1 (check the Compose releases page for newer versions) +sudo curl -L "https://github.com/docker/compose/releases/download/v2.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose + +# Apply executable permissions +sudo chmod +x /usr/local/bin/docker-compose + +# Verify installation +docker-compose --version +``` + +### Post-Installation Steps + +1. Verify Docker is running: +```bash +docker run hello-world +``` + +2. Configure Docker to start on boot (Linux only): +```bash +sudo systemctl enable docker.service +sudo systemctl enable containerd.service +``` + +## Appendix B: Getting API Keys + +### Google Gemini API Key +1. Visit [Google AI Studio](https://makersuite.google.com/app/apikey) +2. Create or sign in to your Google account +3. Click "Create API Key" +4. Copy and save your API key + +### OpenAI API Key +You only need an OpenAI API key if you want to use the OpenAI Text-to-Speech model. +1. Visit [OpenAI API Keys](https://platform.openai.com/api-keys) +2. Create or sign in to your OpenAI account +3. Click "Create new secret key" +4. Copy and save your API key + +## Appendix C: Installation Validation + +After installing all prerequisites, verify everything is set up correctly: + +```bash +# Check Docker version +docker --version + +# Check Docker Compose version +docker-compose --version + +# Verify Docker daemon is running +docker ps + +# Test Docker functionality +docker run hello-world +``` diff --git a/docs/source/usage/how-to.md b/docs/source/usage/how-to.md new file mode 100644 index 0000000..a825820 --- /dev/null +++ b/docs/source/usage/how-to.md @@ -0,0 +1,117 @@ +# How to + +All examples below assume you have Podcastfy installed and running.
+ +## Table of Contents + +- [Custom LLM Support](#custom-llm-support) +- [Running Local LLMs](#running-local-llms) +- [How to use your own voice in audio podcasts](#how-to-use-your-own-voice-in-audio-podcasts) +- [How to customize the conversation](#how-to-customize-the-conversation) +- [How to generate multilingual content](#how-to-generate-multilingual-content) +- [How to steer the conversation](#how-to-steer-the-conversation) + + +## Custom LLM Support + +Podcastfy supports a range of LLMs for generating transcripts, including OpenAI, Anthropic, and Google models as well as local LLMs. + +### Cloud-based LLMs + +By default, Podcastfy uses Google's `gemini-1.5-pro-latest` model. To select a particular cloud-based LLM, users can pass the `llm_model_name` and `api_key_label` parameters to the `generate_podcast` function. + +For example, to use OpenAI's `gpt-4-turbo` model, users can pass `llm_model_name="gpt-4-turbo"` and `api_key_label="OPENAI_API_KEY"`. + +```python +from podcastfy.client import generate_podcast + +audio_file = generate_podcast( + urls=["https://en.wikipedia.org/wiki/Artificial_intelligence"], + llm_model_name="gpt-4-turbo", + api_key_label="OPENAI_API_KEY" +) +``` + +Remember to have the correct API key label and value in your environment variables (`.env` file). + +### Running Local LLMs + +See [local_llm.md](local_llm.md) for more details. + +## How to use your own voice in audio podcasts + +You just need to use the ElevenLabs TTS backend and pass a custom config that uses your voice instead of podcastfy's default: + +1. Create an ElevenLabs account, then get and [set up](https://github.com/souzatharsis/podcastfy/blob/main/usage/config.md) your ElevenLabs API key + +2. Clone your voice on the ElevenLabs website (let's say its name is 'Robbert') + +3. 
Create a custom conversation config (let's call it custom_config.yaml) that uses your voice name instead of the default, as described [here](https://github.com/souzatharsis/podcastfy/blob/main/usage/conversation_custom.md#text-to-speech-tts-settings). Set either the question or the answer voice to 'Robbert' under elevenlabs > default_voices. + +4. Run podcastfy with the tts-model param set to elevenlabs: + +CLI + ``` + python -m podcastfy.client --url https://example.com/article1 --url https://example.com/article2 --tts-model elevenlabs --conversation-config path/to/custom_config.yaml + ``` +For a Python example, check out the Customization section of the [python notebook](https://github.com/souzatharsis/podcastfy/blob/main/podcastfy.ipynb). + +## How to customize the conversation + +You can customize the conversation by passing a custom [conversation_config.yaml](https://github.com/souzatharsis/podcastfy/blob/main/podcastfy/conversation_config.yaml) file to the CLI: + +``` +python -m podcastfy.client --url https://example.com/article1 --url https://example.com/article2 --tts-model elevenlabs --conversation-config path/to/custom_config.yaml +``` + +You can also pass a dictionary with the custom config to the `generate_podcast` function of the Python interface: + +```python +from podcastfy.client import generate_podcast + +custom_config = { + "word_count": 200, + "conversation_style": ["casual", "humorous"], + "podcast_name": "Tech Chuckles", + "creativity": 0.7 +} + +generate_podcast( + urls=["https://example.com/tech-news"], + conversation_config=custom_config +) +``` +For more details, check out [conversation_custom.md](https://github.com/souzatharsis/podcastfy/blob/main/usage/conversation_custom.md). + +## How to generate multilingual content + +To generate transcripts in a target language, simply set `output_language` to your target language. 
See [How to customize the conversation](#how-to-customize-the-conversation) for how to pass a custom configuration to podcastfy. Set `--transcript-only` to get only the transcript without audio generation. + +To generate audio, you can simply use the openai TTS model, which is multilingual by default. However, in my experience OpenAI's multilingual TTS quality is subpar.
Instead, consider using the ElevenLabs backend. Follow [How to use your own voice in audio podcasts](#how-to-use-your-own-voice-in-audio-podcasts), but instead of your own voice, download and set a voice in your target language. + + Sample audio: + - [French](https://github.com/souzatharsis/podcastfy/blob/main/data/audio/podcast_FR_AGRO.mp3) + - [Portuguese-BR](https://github.com/souzatharsis/podcastfy/blob/main/data/audio/podcast_thatupiso_BR.mp3) + + The PT-BR audio actually uses my own cloned voice as AI Host 2. + + + ## How to steer the conversation + + You can guide the focus and topics of the conversation by setting the `user_instructions` parameter in your custom configuration. This lets you give the AI hosts specific instructions about which aspects to emphasize or explore. + + Things to try: + - Focus on a specific topic (e.g. "Focus the discussion on key capabilities and limitations of modern AI models") + - Target a specific audience (e.g. "Explain concepts in a way that's accessible to someone new to Computer Science") + + For example, add the following to a custom YAML config and pass it to the CLI: + + ```yaml + user_instructions: "Make connections with quantum computing" + ``` + + ```bash + python -m podcastfy.client --url https://en.wikipedia.org/wiki/Artificial_intelligence --conversation-config path/to/custom_config.yaml + ``` + + + + diff --git a/docs/source/usage/license-guide.md b/docs/source/usage/license-guide.md new file mode 100644 index 0000000..5ad358b --- /dev/null +++ b/docs/source/usage/license-guide.md @@ -0,0 +1,32 @@ +Podcastfy is licensed under Apache 2.0. The Apache License 2.0 is a permissive free software license that allows you to use this software for both non-commercial and commercial purposes. + Please review the [License](../LICENSE) to understand your obligations. + Here is a set of steps, listed without any warranty or liability: + + 1. 
Include a copy of the license in your project: + + In your project root, create a NOTICE.txt or THIRD_PARTY_LICENSES.txt file and include the contents of the [NOTICE](../NOTICE) file. + + 2. Add attribution in your README.md: + ```markdown + ## Acknowledgments + + This project includes code from [Podcastfy](https://github.com/souzatharsis/podcastfy/), licensed under the Apache License 2.0. + ``` + + 3. Keep the original copyright notices in any files you copy or modify + + 4. If you modified the code, indicate your changes: + ```python + # Modified from original source: [Podcastfy](https://github.com/souzatharsis/podcastfy/) + # Changes made: + # - Added feature X + # - Modified function Y + # - Removed component Z + ``` + + Important points: + - You don't need to use the same license for your project + - You must preserve all copyright, patent, and trademark notices + - State significant modifications you made + - Include the original Apache 2.0 license text + - Attribution should be clear and reasonable diff --git a/docs/source/usage/local_llm.md b/docs/source/usage/local_llm.md new file mode 100644 index 0000000..e85f1ab --- /dev/null +++ b/docs/source/usage/local_llm.md @@ -0,0 +1,68 @@ +# Local LLM Support + + Running local LLMs can offer several advantages, such as: + - Enhanced privacy and data security + - Cost control and no API rate limits + - Greater customization and fine-tuning options + - Reduced vendor lock-in + + We enable serving local LLMs with [llamafile](https://github.com/Mozilla-Ocho/llamafile). In the API, local LLM support is available through the `is_local` parameter. If `is_local=True`, a local (llamafile) LLM is used to generate the podcast transcript. Llamafiles for many LLMs can be found on [HuggingFace, which today offers 156+ models](https://huggingface.co/models?library=llamafile). + + All you need to do is: + + 1. Download a llamafile from HuggingFace + 2. Make the file executable + 3. 
Run the file + + Here's a simple bash script that shows all 3 setup steps for running TinyLlama-1.1B locally: + + ```bash + # Download a llamafile from HuggingFace + wget https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile + + # Make the file executable. On Windows, instead just rename the file to end in ".exe". + chmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile + + # Start the model server. Listens at http://localhost:8080 by default. + ./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser + ``` + + Now you can use the local LLM to generate a podcast transcript (or audio) by setting the `is_local` parameter to `True`. + + ## Python API + + ```python + from podcastfy.client import generate_podcast + + # Generate a podcast from a URL using a local LLM + generate_podcast( + urls=["https://www.souzatharsis.com"], + is_local=True # Using a local LLM + ) + ``` + + ## CLI + + To use a local LLM via the command-line interface, pass the `--local` or `-l` flag. Here's an example of how to generate a transcript using a local LLM: + + ```bash + python -m podcastfy.client --url https://example.com/article1 --transcript-only --local + ``` + + ## Notes of caution + + When using local LLMs instead of widely known private large language models, keep in mind: + + 1. Performance: Local LLMs often have lower performance compared to large private models due to size and training limitations. + + 2. Resource requirements: Running local LLMs can be computationally intensive, requiring significant CPU/GPU resources. + + 3. Limited capabilities: Local models may struggle with complex tasks or specialized knowledge that larger models handle well. + + 4. Reduced multimodal abilities: Local LLMs are assumed to be text-only. + + 5. 
Potential instability: Local models may produce less consistent or stable outputs than well-tested private models, often producing transcripts that cannot be used for podcast generation (TTS) out of the box. + + 6. Limited context window: Local models often have smaller context windows, limiting their ability to process long inputs. + + Always evaluate the trade-offs between using local LLMs and private models based on your specific use case and requirements. We highly recommend extensively testing your local LLM before putting end-to-end podcast generation into production, and/or manually checking the transcript before passing it to the TTS model. diff --git a/podcastfy/__init__.py b/podcastfy/__init__.py index 6ceef23..9de7a5d 100644 --- a/podcastfy/__init__.py +++ b/podcastfy/__init__.py @@ -1,2 +1,2 @@ # This file can be left empty for now -__version__ = "0.2.19" # or whatever version you're on +__version__ = "0.3.0" # or whatever version you're on diff --git a/pyproject.toml b/pyproject.toml index e96ce7f..d618950 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "podcastfy" -version = "0.2.19" +version = "0.3.0" description = "An Open Source alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI" authors = ["Tharsis T. P. Souza"] license = "Apache-2.0"