souzatharsis · Nov 6, 2024
diff --git a/docs/source/usage/api.md b/docs/source/usage/api.md
@@ -0,0 +1,179 @@
+
+# Podcastfy REST API Documentation
+
+## Overview
+
+The Podcastfy API allows you to programmatically generate AI podcasts from various input sources. This document outlines the API endpoints and their usage.
+
+## Using cURL with Podcastfy API
+
+### Prerequisites
+1. Confirm cURL installation:
+```bash
+curl --version
+```
+
+### API Request Flow
+Making a prediction requires two sequential requests:
+1. POST request to initiate processing - returns an `EVENT_ID`
+2. GET request to fetch results - uses the `EVENT_ID` to fetch results
+
+Expect a delay of 1-3 minutes between steps 1 and 2 while the podcast is generated. We are working on reducing this delay and on notifying the user when the podcast is ready. Thanks for your patience!
+
+### Basic Request Structure
+```bash
+# Step 1: POST request to initiate processing (returns an EVENT_ID)
+# Make sure to include http:// or https:// in the URL
+# The entries of "data" are positional; see "Parameter Details" for the order.
+# JSON does not allow comments, so the payload itself carries none.
+curl -X POST https://thatupiso-podcastfy-ai-demo.hf.space/gradio_api/call/process_inputs \
+  -H "Content-Type: application/json" \
+  -d '{
+    "data": [
+      "text_input",
+      "https://yourwebsite.com",
+      [],
+      [],
+      "gemini_key",
+      "openai_key",
+      "elevenlabs_key",
+      2000,
+      "engaging,fast-paced",
+      "main summarizer",
+      "questioner",
+      "Introduction,Content,Conclusion",
+      "PODCASTFY",
+      "YOUR PODCAST",
+      "openai",
+      0.7,
+      ""
+    ]
+  }'
+
+# Step 2: GET request to fetch results (set EVENT_ID to the value returned by step 1)
+curl -N https://thatupiso-podcastfy-ai-demo.hf.space/gradio_api/call/process_inputs/$EVENT_ID
+
+# Example output result:
+# event: complete
+# data: [{"path": "/tmp/gradio/bcb143f492b1c9a6dbde512557541e62f090bca083356be0f82c2e12b59af100/podcast_81106b4ca62542f1b209889832a421df.mp3", "url": "https://thatupiso-podcastfy-ai-demo.hf.space/gradio_a/gradio_api/file=/tmp/gradio/bcb143f492b1c9a6dbde512557541e62f090bca083356be0f82c2e12b59af100/podcast_81106b4ca62542f1b209889832a421df.mp3", "size": null, "orig_name": "podcast_81106b4ca62542f1b209889832a421df.mp3", "mime_type": null, "is_stream": false, "meta": {"_type": "gradio.FileData"}}]
+```
+
+To download the file, append the value of `path` to the URL prefix "https://thatupiso-podcastfy-ai-demo.hf.space/gradio_a/gradio_api/file=". (Note: the `url` field in the output has a bug introduced by Gradio, so please ignore it.)
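
A minimal Python sketch of the download step, using only the standard library. The helper name `download_url_from_stream` is hypothetical, the URL prefix is copied verbatim from the paragraph above, and the shortened sample path is made up for illustration.

```python
import json

# URL prefix quoted in the text above; adjust it if your Space serves files elsewhere.
FILE_PREFIX = "https://thatupiso-podcastfy-ai-demo.hf.space/gradio_a/gradio_api/file="


def download_url_from_stream(stream_text, prefix=FILE_PREFIX):
    """Return prefix + path for the first file of the `complete` event, else None."""
    event = None
    for raw_line in stream_text.splitlines():
        line = raw_line.strip()
        if line.startswith("event:"):
            # Remember which event the next data line belongs to.
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event == "complete":
            files = json.loads(line[len("data:"):])
            # Use `path`, not `url`, as recommended above.
            return prefix + files[0]["path"]
    return None


# Hypothetical, shortened response in the same shape as the example output.
sample = 'event: complete\ndata: [{"path": "/tmp/gradio/abc/podcast.mp3", "url": "ignored"}]\n'
print(download_url_from_stream(sample))
```

The function returns `None` until a `complete` event arrives, so a caller can keep reading the stream during the generation delay.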
+
+### Parameter Details
+| Index | Parameter | Type | Description |
+|-------|-----------|------|-------------|
+| 0 | text_input | string | Direct text input for podcast generation |
+| 1 | urls_input | string | URLs to process (include http:// or https://) |
+| 2 | pdf_files | array | List of PDF files to process |
+| 3 | image_files | array | List of image files to process |
+| 4 | gemini_key | string | Google Gemini API key |
+| 5 | openai_key | string | OpenAI API key |
+| 6 | elevenlabs_key | string | ElevenLabs API key |
+| 7 | word_count | number | Target word count for podcast |
+| 8 | conversation_style | string | Conversation style descriptors (e.g. "engaging,fast-paced") |
+| 9 | roles_person1 | string | Role of first speaker |
+| 10 | roles_person2 | string | Role of second speaker |
+| 11 | dialogue_structure | string | Structure of dialogue (e.g. "Introduction,Content,Conclusion") |
+| 12 | podcast_name | string | Name of the podcast |
+| 13 | podcast_tagline | string | Podcast tagline |
+| 14 | tts_model | string | Text-to-speech model ("gemini", "openai", "elevenlabs", or "edge") |
+| 15 | creativity_level | number | Level of creativity (0-1) |
+| 16 | user_instructions | string | Custom instructions for generation |
+
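Because the `data` array is positional, it is easy to misplace a value. The sketch below assembles the request body by parameter name instead. It assumes the index order from the table above; `build_payload` and `DEFAULTS` are hypothetical names, and the default values mirror the cURL example, with empty placeholders for inputs and keys.

```python
import json

# Keys are listed in table order (index 0-16); dicts preserve insertion order.
DEFAULTS = {
    "text_input": "",
    "urls_input": "",
    "pdf_files": [],
    "image_files": [],
    "gemini_key": "",
    "openai_key": "",
    "elevenlabs_key": "",
    "word_count": 2000,
    "conversation_style": "engaging,fast-paced",
    "roles_person1": "main summarizer",
    "roles_person2": "questioner",
    "dialogue_structure": "Introduction,Content,Conclusion",
    "podcast_name": "PODCASTFY",
    "podcast_tagline": "YOUR PODCAST",
    "tts_model": "openai",
    "creativity_level": 0.7,
    "user_instructions": "",
}


def build_payload(**overrides):
    """Return the JSON body for the POST request, with values in positional order."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"Unknown parameter(s): {', '.join(unknown)}")
    values = {**DEFAULTS, **overrides}
    return json.dumps({"data": [values[name] for name in DEFAULTS]})


body = build_payload(urls_input="https://yourwebsite.com", word_count=1500)
print(body)
```

The resulting string can be passed to `curl -d` or to any HTTP client as the request body.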
+
+## Using Python
+
+### Installation
+
+```bash
+pip install gradio_client
+```
+
+### Quick Start
+
+```python
+from gradio_client import Client, handle_file
+
+client = Client("thatupiso/Podcastfy.ai_demo")
+```
+
+### API Endpoints
+
+#### Generate Podcast (`/process_inputs`)
+
+Generates a podcast from provided text, URLs, PDFs, or images.
+
+##### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| text_input | str | Yes | - | Raw text input for podcast generation |
+| urls_input | str | Yes | - | Comma-separated URLs to process |
+| pdf_files | List[filepath] | Yes | None | List of PDF files to process |
+| image_files | List[filepath] | Yes | None | List of image files to process |
+| gemini_key | str | No | "" | Google Gemini API key |
+| openai_key | str | No | "" | OpenAI API key |
+| elevenlabs_key | str | No | "" | ElevenLabs API key |
+| word_count | float | No | 2000 | Target word count for podcast |
+| conversation_style | str | No | "engaging,fast-paced,enthusiastic" | Conversation style descriptors |
+| roles_person1 | str | No | "main summarizer" | Role of first speaker |
+| roles_person2 | str | No | "questioner/clarifier" | Role of second speaker |
+| dialogue_structure | str | No | "Introduction,Main Content Summary,Conclusion" | Structure of dialogue |
+| podcast_name | str | No | "PODCASTFY" | Name of the podcast |
+| podcast_tagline | str | No | "YOUR PERSONAL GenAI PODCAST" | Podcast tagline |
+| tts_model | Literal['openai', 'elevenlabs', 'edge'] | No | "openai" | Text-to-speech model |
+| creativity_level | float | No | 0.7 | Level of creativity (0-1) |
+| user_instructions | str | No | "" | Custom instructions for generation |
+
+##### Returns
+
+| Type | Description |
+|------|-------------|
+| filepath | Path to generated audio file |
+
+##### Example Usage
+
+```python
+from gradio_client import Client, handle_file
+
+client = Client("thatupiso/Podcastfy.ai_demo")
+
+# Generate podcast from URL
+result = client.predict(
+    text_input="",
+    urls_input="https://example.com/article",
+    pdf_files=[],
+    image_files=[],
+    gemini_key="your-gemini-key",
+    openai_key="your-openai-key",
+    word_count=1500,
+    conversation_style="casual,informative",
+    podcast_name="Tech Talk",
+    tts_model="openai",
+    creativity_level=0.8
+)
+
+print(f"Generated podcast: {result}")
+```
+
+### Error Handling
+
+The API will return appropriate error messages for:
+- Invalid API keys
+- Malformed input
+- Failed file processing
+- TTS generation errors
+
+### Rate Limits
+
+Please be aware of the rate limits for the underlying services:
+- Gemini API
+- OpenAI API
+- ElevenLabs API
+
+## Notes
+
+- At least one input source (text, URL, PDF, or image) must be provided
+- API keys are required for corresponding services
+- The generated audio file format is MP3
diff --git a/docs/source/usage/cli.md b/docs/source/usage/cli.md
@@ -3,7 +3,7 @@
 Podcastfy can be used as a command-line interface (CLI) tool. See below some usage examples.
 Please make sure you follow configuration instructions first - [See Setup](README.md#setup).
 
-1. Generate a podcast from URLs using OpenAI TTS (default):
+1. Generate a podcast from URLs (using OpenAI TTS by default):
    ```
    python -m podcastfy.client --url https://example.com/article1 --url https://example.com/article2
    ```
@@ -47,8 +47,18 @@ Please make sure you follow configuration instructions first - [See Setup](READM
    ```
    python -m podcastfy.client --url https://example.com/article1 --image path/to/image1.jpg
    ```
+
+10. Generate a transcript using a local LLM:
+   ```
+   python -m podcastfy.client --url https://example.com/article1 --transcript-only --local
+   ```
 
 For more information on available options, use:
    ```
    python -m podcastfy.client --help
    ```
+
+11. Generate a podcast from raw text input:
+   ```
+   python -m podcastfy.client --text "Your raw text content here that you want to convert into a podcast"
+   ```
diff --git a/docs/source/usage/config.md b/docs/source/usage/config.md
@@ -8,25 +8,61 @@ The project uses a combination of a `.env` file for managing API keys and sensit
 2. Add your API keys and other sensitive information to the `.env` file. For example:
 
    ```
-   JINA_API_KEY=your_jina_api_key_here
    GEMINI_API_KEY=your_gemini_api_key_here
    ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
    OPENAI_API_KEY=your_openai_api_key_here
    ```
 API Key Requirements:
-- `JINA_API_KEY`: Required only for parsing website content as input. (get your [free API key](https://jina.ai/reader/#apiform))
-- `GEMINI_API_KEY`: Mandatory for all operations. (get your [free API key](aistudio.google.com/app/apikey))
-- `OPENAI_API_KEY` or `ELEVENLABS_API_KEY`: Required for audio generation (paid service). `Edge TTS` can be also used for audio generation without an API key.
+- `GEMINI_API_KEY`: Required for transcript generation unless you use a [local LLM](local_llm.md). (get your [free API key](https://aistudio.google.com/app/apikey))
+- `OPENAI_API_KEY` or `ELEVENLABS_API_KEY`: Required for audio generation unless you use Microsoft Edge TTS (`tts_model=edge`).
 
 Ensure you have the necessary API keys based on your intended usage of Podcastfy.
 
 > [!Note]
 > Never share your `.env` file or commit it to version control. It contains sensitive information that should be kept private. The `config.yaml` file can be shared and version-controlled as it doesn't contain sensitive data.
 
+## Example Configurations
+
+Here's a table showing example configurations:
+
+| Configuration | Base LLM | TTS Model | API Keys Required |
+|---------------|----------|-----------|-------------------|
+| Default | Gemini | OpenAI | GEMINI_API_KEY and OPENAI_API_KEY |
+| No API Keys Required | Local LLM | Edge | None |
+| Recommended | Gemini | 'gemini' (Google) | GEMINI_API_KEY |
+
+In our experience, ElevenLabs and the Google TTS model produce the best audio quality, with the latter having an edge over the former due to its multispeaker capability. ElevenLabs is the most expensive, but it is easy to set up and offers great customization (voice options and multilingual capability). The Google TTS model is cheaper but is limited to English and requires some extra setup steps.
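
The key requirements in the table can be expressed as a small lookup. This is a sketch under stated assumptions, not Podcastfy's own logic: the function names and the `llm` labels `"gemini"` and `"local"` are hypothetical, while the `tts_model` values and environment variable names come from this page.

```python
import os


def required_api_keys(llm="gemini", tts_model="openai"):
    """Map a configuration to the environment variables it needs."""
    keys = set()
    if llm == "gemini":
        keys.add("GEMINI_API_KEY")
    if tts_model == "openai":
        keys.add("OPENAI_API_KEY")
    elif tts_model == "elevenlabs":
        keys.add("ELEVENLABS_API_KEY")
    elif tts_model == "gemini":
        keys.add("GEMINI_API_KEY")
    # tts_model == "edge" needs no API key.
    return keys


def missing_api_keys(llm="gemini", tts_model="openai", env=None):
    """List required keys that are absent or empty in the environment."""
    env = os.environ if env is None else env
    return sorted(k for k in required_api_keys(llm, tts_model) if not env.get(k))


# A local LLM with Edge TTS needs no keys at all.
print(missing_api_keys("local", "edge", env={}))
```

Running a check like this before generation gives a clearer message than a failed API call.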
+
+## Setting up Google TTS Model
+
+You can use the Google TTS model by setting the `tts_model` parameter to `gemini` in `Podcastfy`.
+
+The Google TTS model requires a Google Cloud API key. You can use the same API key you already use for Gemini or create a new one. Once you have your API key, two additional steps are needed to use the Google Multispeaker TTS model:
+
+- Step 1: You will need to enable the Cloud Text-to-Speech API on the API key.
+   - Go to "https://console.cloud.google.com/apis/dashboard"
+   - Select your project (or create one by clicking on project list and then on "new project")
+   - Click "+ ENABLE APIS AND SERVICES" at the top of the screen
+   - Enter "text-to-speech" into the search box
+   - Click on "Cloud Text-to-Speech API" and then on "ENABLE"
+   - You should be here: "https://console.cloud.google.com/apis/library/texttospeech.googleapis.com?project=..."
+
+- Step 2: You need to add the Cloud Text-to-Speech API permission to the API KEY you're using on the Google Cloud console.
+
+   - Go to https://console.cloud.google.com/apis/credentials
+   - Click on whatever key you're using for Gemini
+   - Go down to API Restrictions and add the Cloud Text-to-Speech API
+
+Phew! That was a lot of steps, but you only need to do them once, and you might be impressed with the quality of the audio. See [Google TTS](https://cloud.google.com/text-to-speech) for more details. Thank you @mobarski and @evandempsey for the help!
+
 ## Conversation Configuration
 
 See [conversation_custom.md](conversation_custom.md) for more details.
 
+## Running Local LLMs
+
+See [local_llm.md](local_llm.md) for more details.
+
 ## Optional configuration
 
 The `config.yaml` file in the root directory contains non-sensitive configuration settings. You can modify this file to adjust various parameters such as output directories, text-to-speech settings, and content generation options.

diff --git a/docs/source/usage/config_custom copy.md b/docs/source/usage/config_custom copy.md
@@ -0,0 +1,63 @@
+# Podcastfy Advanced Configuration Guide
+
+Podcastfy uses a `config.yaml` file to manage various settings and parameters. This guide explains each configuration option available in the file.
+
+
+
+## Content Generator
+
+- `gemini_model`: "gemini-1.5-pro-latest"
+  - The Gemini AI model used for content generation.
+- `max_output_tokens`: 8192
+  - Maximum number of tokens for the output generated by the AI model.
+- `temperature`: 1
+  - Controls randomness in the AI's output. 0 means deterministic responses. Range for gemini-1.5-pro: 0.0 - 2.0 (default: 1.0)
+- `langchain_tracing_v2`: false
+  - Enables LangChain tracing for debugging and monitoring. If true, requires a LangSmith API key.
+
+## Content Extractor
+
+- `youtube_url_patterns`:
+  - Patterns to identify YouTube URLs.
+  - Current patterns: "youtube.com", "youtu.be"
+
+## Website Extractor
+
+- `markdown_cleaning`:
+  - `remove_patterns`:
+    - Patterns to remove from extracted markdown content.
+    - Current patterns remove image links, hyperlinks, and URLs.
+
+## YouTube Transcriber
+
+- `remove_phrases`:
+  - Phrases to remove from YouTube transcriptions.
+  - Current phrase: "[music]"
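
To illustrate how the `youtube_url_patterns` and `remove_phrases` settings could be applied, here is a minimal sketch. It assumes plain substring matching, which may differ from Podcastfy's actual implementation; the function names and the video ID are hypothetical.

```python
# Values copied from the configuration entries above.
YOUTUBE_URL_PATTERNS = ["youtube.com", "youtu.be"]
REMOVE_PHRASES = ["[music]"]


def is_youtube_url(url):
    """Treat a URL as YouTube if it contains any configured pattern."""
    return any(pattern in url for pattern in YOUTUBE_URL_PATTERNS)


def clean_transcript(text):
    """Drop configured phrases, then collapse the leftover whitespace."""
    for phrase in REMOVE_PHRASES:
        text = text.replace(phrase, "")
    return " ".join(text.split())


print(is_youtube_url("https://youtu.be/abc123"))
print(clean_transcript("[music] welcome back [music] everyone"))
```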
+
+## Logging
+
+- `level`: "INFO"
+  - Default logging level.
+- `format`: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+  - Format string for log messages.
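
The two logging settings map directly onto Python's standard `logging` module. The sketch below shows one way to apply them; `make_logger`, `LOGGING_CONFIG`, and the logger name `podcastfy.demo` are hypothetical and only illustrate the effect of `level` and `format`.

```python
import io
import logging

# Values copied from the configuration entries above.
LOGGING_CONFIG = {
    "level": "INFO",
    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
}


def make_logger(name, stream, config=LOGGING_CONFIG):
    """Create a logger that writes to `stream` using the configured level and format."""
    logger = logging.getLogger(name)
    logger.setLevel(getattr(logging, config["level"]))
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(config["format"]))
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    return logger


buffer = io.StringIO()
log = make_logger("podcastfy.demo", buffer)
log.debug("hidden at INFO level")
log.info("podcast generated")
print(buffer.getvalue(), end="")
```

With `level` set to "INFO", the debug message is dropped and only the info message is written.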
+
+
+## Website Extractor
+
+- `markdown_cleaning`:
+  - `remove_patterns`:
+    - Additional patterns to remove from extracted markdown content:
+    - `\[.*?\]`: Remove square brackets and their contents
+    - `\(.*?\)`: Remove parentheses and their contents
+    - `^\s*[-*]\s`: Remove list item markers
+    - `^\s*\d+\.\s`: Remove numbered list markers
+    - `^\s*#+`: Remove markdown headers
+- `unwanted_tags`:
+  - HTML tags to be removed during extraction:
+    - `script`, `style`, `nav`, `footer`, `header`, `aside`, `noscript`
+- `user_agent`: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
+  - User agent string to be used for web requests
+- `timeout`: 10
+  - Request timeout in seconds for web scraping
+
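To show what the `remove_patterns` entries do, the sketch below applies them in listed order with Python's `re` module. It assumes multiline matching so that `^` anchors at each line start; `clean_markdown` is a hypothetical name, and the final whitespace trimming is added only to keep the output tidy.

```python
import re

# Patterns copied from the list above, applied in order.
REMOVE_PATTERNS = [
    r"\[.*?\]",       # square brackets and their contents
    r"\(.*?\)",       # parentheses and their contents
    r"^\s*[-*]\s",    # list item markers
    r"^\s*\d+\.\s",   # numbered list markers
    r"^\s*#+",        # markdown header markers
]


def clean_markdown(text):
    """Remove each configured pattern, then trim whitespace on every line."""
    for pattern in REMOVE_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.MULTILINE)
    return "\n".join(line.strip() for line in text.splitlines())


sample = "# Title\n- item with [link](https://example.com)\n1. first (note)\nplain text"
print(clean_markdown(sample))
```

Note that the bracket and parenthesis patterns remove all such content, not just links, so parenthetical remarks in the source text are dropped as well.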
+