Merge pull request #158 from souzatharsis/feat/podbytopic
Add generate podcast by topic #126
souzatharsis authored Nov 7, 2024
2 parents 0773cac + 8612346 commit 2eb9b83
Showing 19 changed files with 409 additions and 1,006 deletions.
15 changes: 15 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,20 @@
# Changelog

## [0.3.1] - 2024-11-07

### Breaking Changes
- Loading images from 'path' has been removed for security reasons. Please specify images by passing a 'url'.

### Added
- Add podcast generation from topic "Latest News in U.S. Politics"
- Integrate with 100+ LLMs (OpenAI, Anthropic, Google, etc.) for transcript generation
- Integrate with Google's Multispeaker TTS model for high-quality audio generation
- Deploy [REST API](https://github.com/souzatharsis/podcastfy/blob/main/usage/api.md) with FastAPI
- Support for raw text as input
- Add PRIVACY_POLICY.md
- Start TESTIMONIALS.md
- Add apps using Podcastfy to README.md

## [0.2.3] - 2024-10-15

### Added
6 changes: 6 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -121,6 +121,12 @@ Podcastfy offers a range of customization options to tailor your AI-generated podcasts
- Choose to run [Local LLMs](usage/local_llm.md) (156+ HuggingFace models)
- Set [System Settings](usage/config_custom.md) (e.g. output directory settings)

## Built with Podcastfy 🛠️

- [OpenNotebook](https://www.open-notebook.ai)
- [Podcastfy-UI](https://github.com/giulioco/podcastfy-ui)
- [Podcastfy-Gradio App](https://huggingface.co/spaces/thatupiso/Podcastfy.ai_demo)

## License

This software is licensed under [Apache 2.0](LICENSE). [Here](usage/license-guide.md) are a few instructions if you would like to use podcastfy in your software.
2 changes: 2 additions & 0 deletions TESTIMONIALS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
- "Love that you casually built an open source version of the most popular product Google built in the last decade"
- "I think it's awesome that you were inspired/recognize how hard it is to beat NotebookLM's quality, but you did an *incredible* job with this! It sounds incredible, and it's open-source! Thank you for being amazing!"
Binary file not shown.
17 changes: 17 additions & 0 deletions data/transcripts/transcript_c8b400052bbe48fa99b10c93ad8c3576.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
<Person1> "Welcome to PODCASTFY - Your Personal Generative AI Podcast! Hot off the digital press, we're diving into OpenAI's latest power move: snatching up Chat.com! Can you believe it?"
</Person1><Person2> "Seriously?! Chat.com? That's like owning prime real estate in the internet world. It's gotta be worth a fortune!"
</Person2><Person1> "Well, rumors are swirling around the $15 million mark, maybe even more! Think about it, it went for that much just last year to HubSpot's CTO, Dharmesh Shah, and he just sold it to OpenAI! Apparently even got some OpenAI shares in the deal. Pretty sweet, huh?"
</Person1><Person2> "Wow, OpenAI shares as part of the deal? That's insightful! But why Chat.com? Don't they already have ChatGPT?"
</Person2><Person1> "Exactly! It's all about accessibility, baby! Making ChatGPT even easier to find. Right now, it's just a redirect, but who knows what the future holds? Maybe a whole new platform built around it!"
</Person1><Person2> "Ooh, interesting. So, it's less about a new product, more about grabbing that sweet, sweet keyword: 'chat'."
</Person2><Person1> "Precisely! It's like buying the best billboard on the digital highway. Everyone searching for 'chat' might just stumble upon OpenAI's goldmine."
</Person1><Person2> "Smart move. But grabbing Chat.com isn't the only thing they've been up to, is it?"
</Person2><Person1> "Oh no, not even close! They're on a roll! ChatGPT search, their own built-in search engine—taking on Google, no less! And Canvas?! A brand-new way to use ChatGPT for writing and coding? Game changer!"
</Person1><Person2> "Hold on, Canvas? I haven't heard about that one. Fill me in!"
</Person2><Person1> "Think of it as a more interactive space within ChatGPT. Perfect for crafting documents, collaborative coding, you name it! It's like they're building a whole ecosystem around ChatGPT. Plus, they just dropped OpenAI o1, whatever *that* is! "
</Person1><Person2> "They're certainly not resting on their laurels! A for-profit transition in California? Hiring the former Pebble CEO, Gabor Cselle, for a 'secret project'? What's next, world domination? "
</Person2><Person1> "Haha, right? And let's not forget SimpleQA! OpenAI is pushing the boundaries of AI research left and right! I'm slightly concerned about these developments though, don't you think they are going a little too fast?"
</Person1><Person2> "I see your point. It *is* a lot, and fast. While innovation is exciting, responsible development is crucial. We need to make sure these advancements benefit humanity, not the other way around."
</Person2><Person1> "Absolutely. But hey, with all this happening, the AI landscape is definitely anything but boring! It'll be interesting to see how these moves play out, especially against giants like Google."
</Person1><Person2> "Couldn't agree more! OpenAI is certainly one to watch. This is just the beginning, folks. Buckle up!"
</Person2><Person1> "And that’s a wrap for today’s episode on OpenAI's strategic moves in the AI arena! Until next time, stay tuned to PODCASTFY!" </Person1>
2 changes: 1 addition & 1 deletion docs/source/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
project = 'podcastfy'
copyright = '2024, Tharsis T. P. Souza'
author = 'Tharsis T. P. Souza'
release = 'v0.2.10'
release = 'v0.3.1'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
823 changes: 0 additions & 823 deletions docs/source/podcastfy_demo.ipynb

This file was deleted.

127 changes: 77 additions & 50 deletions podcastfy.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion podcastfy/__init__.py
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
# This file can be left empty for now
__version__ = "0.3.0" # or whatever version you're on
__version__ = "0.3.1" # or whatever version you're on
36 changes: 26 additions & 10 deletions podcastfy/client.py
Original file line number Diff line number Diff line change
@@ -28,17 +28,18 @@


def process_content(
urls=None,
transcript_file=None,
tts_model="edge",
generate_audio=True,
config=None,
urls: Optional[List[str]] = None,
transcript_file: Optional[str] = None,
tts_model: Optional[str] = None,
generate_audio: bool = True,
config: Optional[Dict[str, Any]] = None,
conversation_config: Optional[Dict[str, Any]] = None,
image_paths: Optional[List[str]] = None,
is_local: bool = False,
text: Optional[str] = None,
model_name: Optional[str] = None,
api_key_label: Optional[str] = None,
topic: Optional[str] = None,
):
"""
Process URLs, a transcript file, image paths, or raw text to generate a podcast or transcript.
@@ -68,16 +69,21 @@ def process_content(
)

combined_content = ""
if urls or topic:
content_extractor = ContentExtractor()

if urls:
logger.info(f"Processing {len(urls)} links")
content_extractor = ContentExtractor()
contents = [content_extractor.extract_content(link) for link in urls]
combined_content += "\n\n".join(contents)

if text:
combined_content += f"\n\n{text}"

if topic:
topic_content = content_extractor.generate_topic_content(topic)
combined_content += f"\n\n{topic_content}"

# Generate Q&A content using output directory from conversation config
random_filename = f"transcript_{uuid.uuid4().hex}.txt"
transcript_filepath = os.path.join(
@@ -162,6 +168,9 @@ def main(
api_key_label: str = typer.Option(
None, "--api-key-label", "-k", help="Environment variable name for LLM API key"
),
topic: str = typer.Option(
None, "--topic", "-tp", help="Topic to generate podcast about"
),
):
"""
Generate a podcast or transcript from a list of URLs, a file containing URLs, a transcript file, image files, or raw text.
@@ -194,15 +203,16 @@
text=text,
model_name=llm_model_name,
api_key_label=api_key_label,
topic=topic,
)
else:
urls_list = urls or []
if file:
urls_list.extend([line.strip() for line in file if line.strip()])

if not urls_list and not image_paths and not text:
if not urls_list and not image_paths and not text and not topic:
raise typer.BadParameter(
"No input provided. Use --url to specify URLs, --file to specify a file containing URLs, --transcript for a transcript file, --image for image files, or --text for raw text input."
"No input provided. Use --url, --file, --transcript, --image, --text, or --topic."
)

final_output = process_content(
@@ -216,6 +226,7 @@
text=text,
model_name=llm_model_name,
api_key_label=api_key_label,
topic=topic,
)

if transcript_only:
@@ -247,6 +258,7 @@ def generate_podcast(
text: Optional[str] = None,
llm_model_name: Optional[str] = None,
api_key_label: Optional[str] = None,
topic: Optional[str] = None,
) -> Optional[str]:
"""
Generate a podcast or transcript from a list of URLs, a file containing URLs, a transcript file, or image files.
@@ -264,6 +276,7 @@
text (Optional[str]): Raw text input to be processed.
llm_model_name (Optional[str]): LLM model name for content generation.
api_key_label (Optional[str]): Environment variable name for LLM API key.
topic (Optional[str]): Topic to generate podcast about.
Returns:
Optional[str]: Path to the final podcast audio file, or None if only generating a transcript.
@@ -310,16 +323,18 @@ def generate_podcast(
text=text,
model_name=llm_model_name,
api_key_label=api_key_label,
topic=topic,
)
else:
urls_list = urls or []
if url_file:
with open(url_file, "r") as file:
urls_list.extend([line.strip() for line in file if line.strip()])

if not urls_list and not image_paths and not text:
if not urls_list and not image_paths and not text and not topic:
raise ValueError(
"No input provided. Please provide either 'urls', 'url_file', 'transcript_file', 'image_paths', or 'text'."
"No input provided. Please provide either 'urls', 'url_file', "
"'transcript_file', 'image_paths', 'text', or 'topic'."
)

return process_content(
@@ -333,6 +348,7 @@
text=text,
model_name=llm_model_name,
api_key_label=api_key_label,
topic=topic,
)

except Exception as e:
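Taken together, the new `topic` plumbing in `client.py` follows one simple assembly rule: URLs first, then raw text, then topic-generated content, with a guard that rejects empty input. A minimal, self-contained sketch of that flow (the two extractor callables below are hypothetical stand-ins, not the real `ContentExtractor` API):

```python
from typing import List, Optional


def build_combined_content(
    urls: Optional[List[str]] = None,
    text: Optional[str] = None,
    topic: Optional[str] = None,
    # Stubs for ContentExtractor.extract_content / generate_topic_content
    extract_content=lambda url: f"<content of {url}>",
    generate_topic_content=lambda t: f"<research on {t}>",
) -> str:
    """Mirror the diff's assembly order: URLs, then raw text, then topic."""
    if not urls and not text and not topic:
        # Mirrors the guard added in this PR to main() and generate_podcast()
        raise ValueError(
            "No input provided. Use --url, --file, --transcript, --image, --text, or --topic."
        )
    combined_content = ""
    if urls:
        combined_content += "\n\n".join(extract_content(u) for u in urls)
    if text:
        combined_content += f"\n\n{text}"
    if topic:
        combined_content += f"\n\n{generate_topic_content(topic)}"
    return combined_content
```

Because `topic` is appended last, it composes with the other inputs rather than replacing them, which matches the diff's `combined_content += ...` pattern.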
56 changes: 28 additions & 28 deletions podcastfy/content_generator.py
Original file line number Diff line number Diff line change
@@ -50,16 +50,20 @@ def __init__(

if is_local:
self.llm = Llamafile()
elif "gemini" in self.model_name.lower(): #keeping original gemini as a special case while we build confidence on LiteLLM
elif (
"gemini" in self.model_name.lower()
): # keeping original gemini as a special case while we build confidence on LiteLLM
self.llm = ChatGoogleGenerativeAI(
model=model_name,
temperature=temperature,
max_output_tokens=max_output_tokens,
)
else: # user should set api_key_label from input
self.llm = ChatLiteLLM(model=self.model_name,
temperature=temperature,
api_key=os.environ[api_key_label])
else: # user should set api_key_label from input
self.llm = ChatLiteLLM(
model=self.model_name,
temperature=temperature,
api_key=os.environ[api_key_label],
)


class ContentGenerator:
@@ -114,7 +118,7 @@ def __compose_prompt(self, num_images: int):
for i in range(num_images):
key = f"image_path_{i}"
image_content = {
"image_url": {"path": f"{{{key}}}", "detail": "high"},
"image_url": {"url": f"{{{key}}}", "detail": "high"},
"type": "image_url",
}
image_path_keys.append(key)
@@ -224,7 +228,7 @@ def generate_qa_content(
output_filepath: Optional[str] = None,
is_local: bool = False,
model_name: str = None,
api_key_label: str = "OPENAI_API_KEY"
api_key_label: str = "OPENAI_API_KEY",
) -> str:
"""
Generate Q&A content based on input texts.
@@ -248,15 +252,15 @@
)
if is_local:
model_name = "User provided local model"

llmbackend = LLMBackend(
is_local=is_local,
temperature=self.config_conversation.get("creativity", 0),
max_output_tokens=self.content_generator_config.get(
"max_output_tokens", 8192
),
model_name=model_name,
api_key_label=api_key_label
api_key_label=api_key_label,
)

num_images = 0 if is_local else len(image_file_paths)
@@ -287,48 +291,44 @@
logger.error(f"Error generating content: {str(e)}")
raise


def __clean_tss_markup(self, input_text: str, additional_tags: List[str] = ["Person1", "Person2"]) -> str:
def __clean_tss_markup(
self, input_text: str, additional_tags: List[str] = ["Person1", "Person2"]
) -> str:
"""
Remove unsupported TSS markup tags from the input text while preserving supported SSML tags.
Args:
input_text (str): The input text containing TSS markup tags.
            additional_tags (List[str]): Optional list of additional tags to preserve. Defaults to ["Person1", "Person2"].

        Returns:
            str: Cleaned text with unsupported TSS markup tags removed.
        """
# List of SSML tags supported by both OpenAI and ElevenLabs
supported_tags = [
"speak", "lang", "p", "phoneme",
"s", "sub"
]
supported_tags = ["speak", "lang", "p", "phoneme", "s", "sub"]

# Append additional tags to the supported tags list
supported_tags.extend(additional_tags)

# Create a pattern that matches any tag not in the supported list
pattern = r'</?(?!(?:' + '|'.join(supported_tags) + r')\b)[^>]+>'
pattern = r"</?(?!(?:" + "|".join(supported_tags) + r")\b)[^>]+>"

# Remove unsupported tags
cleaned_text = re.sub(pattern, '', input_text)
cleaned_text = re.sub(pattern, "", input_text)

# Remove any leftover empty lines
cleaned_text = re.sub(r'\n\s*\n', '\n', cleaned_text)
cleaned_text = re.sub(r"\n\s*\n", "\n", cleaned_text)

# Ensure closing tags for additional tags are preserved
for tag in additional_tags:
cleaned_text = re.sub(
f'<{tag}>(.*?)(?=<(?:{"|".join(additional_tags)})>|$)',
f'<{tag}>\\1</{tag}>',
f"<{tag}>\\1</{tag}>",
cleaned_text,
flags=re.DOTALL
flags=re.DOTALL,
)

return cleaned_text.replace('(scratchpad)', '').strip()


return cleaned_text.replace("(scratchpad)", "").strip()


def main(seed: int = 42, is_local: bool = False) -> None:
@@ -375,4 +375,4 @@ def main(seed: int = 42, is_local: bool = False) -> None:


if __name__ == "__main__":
main()
main()
23 changes: 23 additions & 0 deletions podcastfy/content_parser/content_extractor.py
Original file line number Diff line number Diff line change
@@ -74,6 +74,29 @@ def extract_content(self, source: str) -> str:
except Exception as e:
logger.error(f"Error extracting content from {source}: {str(e)}")
raise

def generate_topic_content(self, topic: str) -> str:
"""
Generate content based on a given topic using a generative model.
Args:
topic (str): The topic to generate content for.
Returns:
str: Generated content based on the topic.
"""
try:
import google.generativeai as genai

model = genai.GenerativeModel('models/gemini-1.5-pro-002')
topic_prompt = f'Be detailed. Search for {topic}'
response = model.generate_content(contents=topic_prompt, tools='google_search_retrieval')

return response.candidates[0].content.parts[0].text
except Exception as e:
logger.error(f"Error generating content for topic '{topic}': {str(e)}")
raise


def main(seed: int = 42) -> None:
"""
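The new `generate_topic_content` constructs a Gemini model inline, which makes it hard to exercise without the SDK and live API access. The same flow can be sketched with the model injected instead, so any object exposing a compatible `generate_content` works; `FakeModel` below is purely illustrative and not part of the real code:

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def generate_topic_content(topic: str, model) -> str:
    """Same flow as the PR's method, with `model` passed in instead of built inline."""
    try:
        topic_prompt = f"Be detailed. Search for {topic}"
        response = model.generate_content(
            contents=topic_prompt, tools="google_search_retrieval"
        )
        # The real code reads the first candidate's first text part
        return response.candidates[0].content.parts[0].text
    except Exception as e:
        logger.error(f"Error generating content for topic '{topic}': {str(e)}")
        raise


class FakeModel:
    """Illustrative stand-in mirroring the response shape the code reads."""

    def generate_content(self, contents, tools):
        part = SimpleNamespace(text=f"notes for: {contents}")
        candidate = SimpleNamespace(content=SimpleNamespace(parts=[part]))
        return SimpleNamespace(candidates=[candidate])
```

In production the injected object would be `genai.GenerativeModel('models/gemini-1.5-pro-002')` as in the diff; the injection only changes who constructs it.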