Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update docs #154

Merged
merged 1 commit into from
Nov 6, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
179 changes: 179 additions & 0 deletions docs/source/usage/api.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,179 @@

# Podcastfy REST API Documentation

## Overview

The Podcastfy API allows you to programmatically generate AI podcasts from various input sources. This document outlines the API endpoints and their usage.

## Using cURL with Podcastfy API

### Prerequisites
1. Confirm cURL installation:
```bash
curl --version
```

### API Request Flow
Making a prediction requires two sequential requests:
1. POST request to initiate processing - returns an `EVENT_ID`
2. GET request to fetch results - uses the `EVENT_ID` to fetch results

Between step 1 and 2, there is a delay of 1-3 minutes. We are working on reducing this delay and implementing a way to notify the user when the podcast is ready. Thanks for your patience!

### Basic Request Structure
```bash
# Step 1: POST request to initiate processing
# Make sure to include http:// or https:// in the URL
curl -X POST https://thatupiso-podcastfy-ai-demo.hf.space/gradio_api/call/process_inputs \
-H "Content-Type: application/json" \
-d '{
"data": [
"text_input",
"https://yourwebsite.com",
[], # pdf_files
[], # image_files
"gemini_key",
"openai_key",
"elevenlabs_key",
2000, # word_count
"engaging,fast-paced", # conversation_style
"main summarizer", # roles_person1
"questioner", # roles_person2
"Introduction,Content,Conclusion", # dialogue_structure
"PODCASTFY", # podcast_name
"YOUR PODCAST", # podcast_tagline
"openai", # tts_model
0.7, # creativity_level
"" # user_instructions
]
}'

# Step 2: GET request to fetch results
curl -N https://thatupiso-podcastfy-ai-demo.hf.space/gradio_api/call/process_inputs/$EVENT_ID


# Example output result
event: complete
data: [{"path": "/tmp/gradio/bcb143f492b1c9a6dbde512557541e62f090bca083356be0f82c2e12b59af100/podcast_81106b4ca62542f1b209889832a421df.mp3", "url": "https://thatupiso-podcastfy-ai-demo.hf.space/gradio_a/gradio_api/file=/tmp/gradio/bcb143f492b1c9a6dbde512557541e62f090bca083356be0f82c2e12b59af100/podcast_81106b4ca62542f1b209889832a421df.mp3", "size": null, "orig_name": "podcast_81106b4ca62542f1b209889832a421df.mp3", "mime_type": null, "is_stream": false, "meta": {"_type": "gradio.FileData"}}]

```

You can download the file by extending the URL prefix "https://thatupiso-podcastfy-ai-demo.hf.space/gradio_a/gradio_api/file=" with the path to the file in variable `path`. (Note: The variable "url" above has a bug introduced by Gradio, so please ignore it.)

### Parameter Details
| Index | Parameter | Type | Description |
|-------|-----------|------|-------------|
| 0 | text_input | string | Direct text input for podcast generation |
| 1 | urls_input | string | URLs to process (include http:// or https://) |
| 2 | pdf_files | array | List of PDF files to process |
| 3 | image_files | array | List of image files to process |
| 4 | gemini_key | string | Google Gemini API key |
| 5 | openai_key | string | OpenAI API key |
| 6 | elevenlabs_key | string | ElevenLabs API key |
| 7 | word_count | number | Target word count for podcast |
| 8 | conversation_style | string | Conversation style descriptors (e.g. "engaging,fast-paced") |
| 9 | roles_person1 | string | Role of first speaker |
| 10 | roles_person2 | string | Role of second speaker |
| 11 | dialogue_structure | string | Structure of dialogue (e.g. "Introduction,Content,Conclusion") |
| 12 | podcast_name | string | Name of the podcast |
| 13 | podcast_tagline | string | Podcast tagline |
| 14 | tts_model | string | Text-to-speech model ("gemini", "openai", "elevenlabs", or "edge") |
| 15 | creativity_level | number | Level of creativity (0-1) |
| 16 | user_instructions | string | Custom instructions for generation |


## Using Python

### Installation

```bash
pip install gradio_client
```

### Quick Start

```python
from gradio_client import Client, handle_file

client = Client("thatupiso/Podcastfy.ai_demo")
```

### API Endpoints

#### Generate Podcast (`/process_inputs`)

Generates a podcast from provided text, URLs, PDFs, or images.

##### Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| text_input | str | Yes | - | Raw text input for podcast generation |
| urls_input | str | Yes | - | Comma-separated URLs to process |
| pdf_files | List[filepath] | Yes | None | List of PDF files to process |
| image_files | List[filepath] | Yes | None | List of image files to process |
| gemini_key | str | No | "" | Google Gemini API key |
| openai_key | str | No | "" | OpenAI API key |
| elevenlabs_key | str | No | "" | ElevenLabs API key |
| word_count | float | No | 2000 | Target word count for podcast |
| conversation_style | str | No | "engaging,fast-paced,enthusiastic" | Conversation style descriptors |
| roles_person1 | str | No | "main summarizer" | Role of first speaker |
| roles_person2 | str | No | "questioner/clarifier" | Role of second speaker |
| dialogue_structure | str | No | "Introduction,Main Content Summary,Conclusion" | Structure of dialogue |
| podcast_name | str | No | "PODCASTFY" | Name of the podcast |
| podcast_tagline | str | No | "YOUR PERSONAL GenAI PODCAST" | Podcast tagline |
| tts_model | Literal['openai', 'elevenlabs', 'edge'] | No | "openai" | Text-to-speech model |
| creativity_level | float | No | 0.7 | Level of creativity (0-1) |
| user_instructions | str | No | "" | Custom instructions for generation |

##### Returns

| Type | Description |
|------|-------------|
| filepath | Path to generated audio file |

##### Example Usage

```python
from gradio_client import Client, handle_file

client = Client("thatupiso/Podcastfy.ai_demo")

# Generate podcast from URL
result = client.predict(
text_input="",
urls_input="https://example.com/article",
pdf_files=[],
image_files=[],
gemini_key="your-gemini-key",
openai_key="your-openai-key",
word_count=1500,
conversation_style="casual,informative",
podcast_name="Tech Talk",
tts_model="openai",
creativity_level=0.8
)

print(f"Generated podcast: {result}")
```

### Error Handling

The API will return appropriate error messages for:
- Invalid API keys
- Malformed input
- Failed file processing
- TTS generation errors

### Rate Limits

Please be aware of the rate limits for the underlying services:
- Gemini API
- OpenAI API
- ElevenLabs API

## Notes

- At least one input source (text, URL, PDF, or image) must be provided
- API keys are required for corresponding services
- The generated audio file format is MP3
12 changes: 11 additions & 1 deletion docs/source/usage/cli.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
Podcastfy can be used as a command-line interface (CLI) tool. See below some usage examples.
Please make sure you follow configuration instructions first - [See Setup](README.md#setup).

1. Generate a podcast from URLs using OpenAI TTS (default):
1. Generate a podcast from URLs (using OpenAI TTS by default):
```
python -m podcastfy.client --url https://example.com/article1 --url https://example.com/article2
```
Expand Down Expand Up @@ -47,8 +47,18 @@ Please make sure you follow configuration instructions first - [See Setup](READM
```
python -m podcastfy.client --url https://example.com/article1 --image path/to/image1.jpg
```

10. Generate a transcript using a local LLM:
```
python -m podcastfy.client --url https://example.com/article1 --transcript-only --local
```

For more information on available options, use:
```
python -m podcastfy.client --help
```

11. Generate a podcast from raw text input:
```
python -m podcastfy.client --text "Your raw text content here that you want to convert into a podcast"
```
44 changes: 40 additions & 4 deletions docs/source/usage/config.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,25 +8,61 @@ The project uses a combination of a `.env` file for managing API keys and sensit
2. Add your API keys and other sensitive information to the `.env` file. For example:

```
JINA_API_KEY=your_jina_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
```
API Key Requirements:
- `JINA_API_KEY`: Required only for parsing website content as input. (get your [free API key](https://jina.ai/reader/#apiform))
- `GEMINI_API_KEY`: Mandatory for all operations. (get your [free API key](aistudio.google.com/app/apikey))
- `OPENAI_API_KEY` or `ELEVENLABS_API_KEY`: Required for audio generation (paid service). `Edge TTS` can be also used for audio generation without an API key.
- `GEMINI_API_KEY`: Required for transcript generation if not using a [local llm](local_llm.md). (get your [free API key](aistudio.google.com/app/apikey))
- `OPENAI_API_KEY` or `ELEVENLABS_API_KEY`: Required for audio generation if not using Microsoft Edge TTS `tts_model=edge`.

Ensure you have the necessary API keys based on your intended usage of Podcastfy.

> [!Note]
> Never share your `.env` file or commit it to version control. It contains sensitive information that should be kept private. The `config.yaml` file can be shared and version-controlled as it doesn't contain sensitive data.

## Example Configurations

Here's a table showing example configurations:

| Configuration | Base LLM | TTS Model | API Keys Required |
|---------------|----------|-----------|-------------------|
| Default | Gemini | OpenAI | GEMINI_API_KEY and OPENAI_API_KEY |
| No API Keys Required | Local LLM | Edge | None |
| Recommended | Gemini | 'gemini' (Google) | GEMINI_API_KEY |

In our experience, ElevenLabs and Google TTS model are the best models in terms quality of audio generation with the latter having an edge over the former due to its multispeaker capability. ElevenLabs is the most expensive but it's easy to setup and offers great customization (voice options and multilingual capability). Google TTS model is cheaper but is limited to English only and requires some extra steps to set up.

## Setting up Google TTS Model

You can use Google TTS model by setting the `tts_model` parameter to `gemini` in `Podcastfy`.

Google TTS model requires a Google Cloud API key, you can use the same API key you are already using for Gemini or create a new one. After you have secured your API Key there are two additional steps in order to use Google Multispeaker TTS model:

- Step 1: You will need to enable the Cloud Text-to-Speech API on the API key.
- Go to "https://console.cloud.google.com/apis/dashboard"
- Select your project (or create one by clicking on project list and then on "new project")
- Click "+ ENABLE APIS AND SERVICES" at the top of the screen
- Enter "text-to-speech" into the search box
- Click on "Cloud Text-to-Speech API" and then on "ENABLE"
- You should be here: "https://console.cloud.google.com/apis/library/texttospeech.googleapis.com?project=..."

- Step 2: You need to add the Cloud Text-to-Speech API permission to the API KEY you're using on the Google Cloud console.

- Go to https://console.cloud.google.com/apis/credentials
- Click on whatever key you're using for Gemini
- Go down to API Restrictions and add the Cloud Text-to-Speech API

Phew!!! That was a lot of steps but you only need to do it once and you might be impressed with the quality of the audio. See [Google TTS](https://cloud.google.com/text-to-speech) for more details. Thank you @mobarski and @evandempsey for the help!

## Conversation Configuration

See [conversation_custom.md](conversation_custom.md) for more details.

## Running Local LLMs

See [local_llm.md](local_llm.md) for more details.

## Optional configuration

The `config.yaml` file in the root directory contains non-sensitive configuration settings. You can modify this file to adjust various parameters such as output directories, text-to-speech settings, and content generation options.
Expand Down
63 changes: 63 additions & 0 deletions docs/source/usage/config_custom copy.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
# Podcastfy Advanced Configuration Guide

Podcastfy uses a `config.yaml` file to manage various settings and parameters. This guide explains each configuration option available in the file.



## Content Generator

- `gemini_model`: "gemini-1.5-pro-latest"
- The Gemini AI model used for content generation.
- `max_output_tokens`: 8192
- Maximum number of tokens for the output generated by the AI model.
- `temperature`: 1
- Controls randomness in the AI's output. 0 means deterministic responses. Range for gemini-1.5-pro: 0.0 - 2.0 (default: 1.0)
- `langchain_tracing_v2`: false
- Enables LangChain tracing for debugging and monitoring. If true, requires langsmith api key

## Content Extractor

- `youtube_url_patterns`:
- Patterns to identify YouTube URLs.
- Current patterns: "youtube.com", "youtu.be"

## Website Extractor

- `markdown_cleaning`:
- `remove_patterns`:
- Patterns to remove from extracted markdown content.
- Current patterns remove image links, hyperlinks, and URLs.

## YouTube Transcriber

- `remove_phrases`:
- Phrases to remove from YouTube transcriptions.
- Current phrase: "[music]"

## Logging

- `level`: "INFO"
- Default logging level.
- `format`: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
- Format string for log messages.


## Website Extractor

- `markdown_cleaning`:
- `remove_patterns`:
- Additional patterns to remove from extracted markdown content:
- '\[.*?\]': Remove square brackets and their contents
- '\(.*?\)': Remove parentheses and their contents
- '^\s*[-*]\s': Remove list item markers
- '^\s*\d+\.\s': Remove numbered list markers
- '^\s*#+': Remove markdown headers
- `unwanted_tags`:
- HTML tags to be removed during extraction:
- 'script', 'style', 'nav', 'footer', 'header', 'aside', 'noscript'
- `user_agent`: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
- User agent string to be used for web requests
- `timeout`: 10
- Request timeout in seconds for web scraping


Loading
Loading