Merge pull request #158 from souzatharsis/feat/podbytopic
Add generate podcast by topic #126
souzatharsis authored Nov 7, 2024
2 parents 0773cac + 8612346 commit 2eb9b83
Showing 19 changed files with 409 additions and 1,006 deletions.
15 changes: 15 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,20 @@
# Changelog

## [0.3.1] - 2024-11-07

### Breaking Changes
- Loading images from 'path' has been removed for security reasons. Please specify images by passing a 'url'.

### Added
- Add podcast generation from topic "Latest News in U.S. Politics"
- Integrate with 100+ LLMs (OpenAI, Anthropic, Google, etc.) for transcript generation
- Integrate with Google's Multispeaker TTS model for high-quality audio generation
- Deploy [REST API](https://github.com/souzatharsis/podcastfy/blob/main/usage/api.md) with FastAPI
- Support for raw text as input
- Add PRIVACY_POLICY.md
- Start TESTIMONIALS.md
- Add apps using Podcastfy to README.md

## [0.2.3] - 2024-10-15

### Added
6 changes: 6 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -121,6 +121,12 @@ Podcastfy offers a range of customization options to tailor your AI-generated podcasts
- Choose to run [Local LLMs](usage/local_llm.md) (156+ HuggingFace models)
- Set [System Settings](usage/config_custom.md) (e.g. output directory settings)

## Built with Podcastfy 🛠️

- [OpenNotebook](https://www.open-notebook.ai)
- [Podcastfy-UI](https://github.com/giulioco/podcastfy-ui)
- [Podcastfy-Gradio App](https://huggingface.co/spaces/thatupiso/Podcastfy.ai_demo)

## License

This software is licensed under [Apache 2.0](LICENSE). [Here](usage/license-guide.md) are a few instructions if you would like to use podcastfy in your software.
2 changes: 2 additions & 0 deletions TESTIMONIALS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
- "Love that you casually built an open source version of the most popular product Google built in the last decade"
- "I think it's awesome that you were inspired/recognize how hard it is to beat NotebookLM's quality, but you did an *incredible* job with this! It sounds incredible, and it's open-source! Thank you for being amazing!"
Binary file not shown.
17 changes: 17 additions & 0 deletions data/transcripts/transcript_c8b400052bbe48fa99b10c93ad8c3576.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
<Person1> "Welcome to PODCASTFY - Your Personal Generative AI Podcast! Hot off the digital press, we're diving into OpenAI's latest power move: snatching up Chat.com! Can you believe it?"
</Person1><Person2> "Seriously?! Chat.com? That's like owning prime real estate in the internet world. It's gotta be worth a fortune!"
</Person2><Person1> "Well, rumors are swirling around the $15 million mark, maybe even more! Think about it, it went for that much just last year to HubSpot's CTO, Dharmesh Shah, and he just sold it to OpenAI! Apparently even got some OpenAI shares in the deal. Pretty sweet, huh?"
</Person1><Person2> "Wow, OpenAI shares as part of the deal? That's insightful! But why Chat.com? Don't they already have ChatGPT?"
</Person2><Person1> "Exactly! It's all about accessibility, baby! Making ChatGPT even easier to find. Right now, it's just a redirect, but who knows what the future holds? Maybe a whole new platform built around it!"
</Person1><Person2> "Ooh, interesting. So, it's less about a new product, more about grabbing that sweet, sweet keyword: 'chat'."
</Person2><Person1> "Precisely! It's like buying the best billboard on the digital highway. Everyone searching for 'chat' might just stumble upon OpenAI's goldmine."
</Person1><Person2> "Smart move. But grabbing Chat.com isn't the only thing they've been up to, is it?"
</Person2><Person1> "Oh no, not even close! They're on a roll! ChatGPT search, their own built-in search engine—taking on Google, no less! And Canvas?! A brand-new way to use ChatGPT for writing and coding? Game changer!"
</Person1><Person2> "Hold on, Canvas? I haven't heard about that one. Fill me in!"
</Person2><Person1> "Think of it as a more interactive space within ChatGPT. Perfect for crafting documents, collaborative coding, you name it! It's like they're building a whole ecosystem around ChatGPT. Plus, they just dropped OpenAI o1, whatever *that* is! "
</Person1><Person2> "They're certainly not resting on their laurels! A for-profit transition in California? Hiring the former Pebble CEO, Gabor Cselle, for a 'secret project'? What's next, world domination? "
</Person2><Person1> "Haha, right? And let's not forget SimpleQA! OpenAI is pushing the boundaries of AI research left and right! I'm slightly concerned about these developments though, don't you think they are going a little too fast?"
</Person1><Person2> "I see your point. It *is* a lot, and fast. While innovation is exciting, responsible development is crucial. We need to make sure these advancements benefit humanity, not the other way around."
</Person2><Person1> "Absolutely. But hey, with all this happening, the AI landscape is definitely anything but boring! It'll be interesting to see how these moves play out, especially against giants like Google."
</Person1><Person2> "Couldn't agree more! OpenAI is certainly one to watch. This is just the beginning, folks. Buckle up!"
</Person2><Person1> "And that’s a wrap for today’s episode on OpenAI's strategic moves in the AI arena! Until next time, stay tuned to PODCASTFY!" </Person1>
2 changes: 1 addition & 1 deletion docs/source/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
project = 'podcastfy'
copyright = '2024, Tharsis T. P. Souza'
author = 'Tharsis T. P. Souza'
release = 'v0.2.10'
release = 'v0.3.1'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
823 changes: 0 additions & 823 deletions docs/source/podcastfy_demo.ipynb

This file was deleted.

127 changes: 77 additions & 50 deletions podcastfy.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion podcastfy/__init__.py
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
# This file can be left empty for now
__version__ = "0.3.0" # or whatever version you're on
__version__ = "0.3.1" # or whatever version you're on
36 changes: 26 additions & 10 deletions podcastfy/client.py
Original file line number Diff line number Diff line change
@@ -28,17 +28,18 @@


def process_content(
urls=None,
transcript_file=None,
tts_model="edge",
generate_audio=True,
config=None,
urls: Optional[List[str]] = None,
transcript_file: Optional[str] = None,
tts_model: Optional[str] = None,
generate_audio: bool = True,
config: Optional[Dict[str, Any]] = None,
conversation_config: Optional[Dict[str, Any]] = None,
image_paths: Optional[List[str]] = None,
is_local: bool = False,
text: Optional[str] = None,
model_name: Optional[str] = None,
api_key_label: Optional[str] = None,
topic: Optional[str] = None,
):
"""
Process URLs, a transcript file, image paths, or raw text to generate a podcast or transcript.
@@ -68,16 +69,21 @@ def process_content(
)

combined_content = ""
if urls or topic:
content_extractor = ContentExtractor()

if urls:
logger.info(f"Processing {len(urls)} links")
content_extractor = ContentExtractor()
contents = [content_extractor.extract_content(link) for link in urls]
combined_content += "\n\n".join(contents)

if text:
combined_content += f"\n\n{text}"

if topic:
topic_content = content_extractor.generate_topic_content(topic)
combined_content += f"\n\n{topic_content}"

# Generate Q&A content using output directory from conversation config
random_filename = f"transcript_{uuid.uuid4().hex}.txt"
transcript_filepath = os.path.join(
@@ -162,6 +168,9 @@ def main(
api_key_label: str = typer.Option(
None, "--api-key-label", "-k", help="Environment variable name for LLM API key"
),
topic: str = typer.Option(
None, "--topic", "-tp", help="Topic to generate podcast about"
),
):
"""
Generate a podcast or transcript from a list of URLs, a file containing URLs, a transcript file, image files, or raw text.
@@ -194,15 +203,16 @@
text=text,
model_name=llm_model_name,
api_key_label=api_key_label,
topic=topic,
)
else:
urls_list = urls or []
if file:
urls_list.extend([line.strip() for line in file if line.strip()])

if not urls_list and not image_paths and not text:
if not urls_list and not image_paths and not text and not topic:
raise typer.BadParameter(
"No input provided. Use --url to specify URLs, --file to specify a file containing URLs, --transcript for a transcript file, --image for image files, or --text for raw text input."
"No input provided. Use --url, --file, --transcript, --image, --text, or --topic."
)

final_output = process_content(
@@ -216,6 +226,7 @@
text=text,
model_name=llm_model_name,
api_key_label=api_key_label,
topic=topic,
)

if transcript_only:
@@ -247,6 +258,7 @@ def generate_podcast(
text: Optional[str] = None,
llm_model_name: Optional[str] = None,
api_key_label: Optional[str] = None,
topic: Optional[str] = None,
) -> Optional[str]:
"""
Generate a podcast or transcript from a list of URLs, a file containing URLs, a transcript file, or image files.
@@ -264,6 +276,7 @@
text (Optional[str]): Raw text input to be processed.
llm_model_name (Optional[str]): LLM model name for content generation.
api_key_label (Optional[str]): Environment variable name for LLM API key.
topic (Optional[str]): Topic to generate podcast about.
Returns:
Optional[str]: Path to the final podcast audio file, or None if only generating a transcript.
@@ -310,16 +323,18 @@ def generate_podcast(
text=text,
model_name=llm_model_name,
api_key_label=api_key_label,
topic=topic,
)
else:
urls_list = urls or []
if url_file:
with open(url_file, "r") as file:
urls_list.extend([line.strip() for line in file if line.strip()])

if not urls_list and not image_paths and not text:
if not urls_list and not image_paths and not text and not topic:
raise ValueError(
"No input provided. Please provide either 'urls', 'url_file', 'transcript_file', 'image_paths', or 'text'."
"No input provided. Please provide either 'urls', 'url_file', "
"'transcript_file', 'image_paths', 'text', or 'topic'."
)

return process_content(
@@ -333,6 +348,7 @@
text=text,
model_name=llm_model_name,
api_key_label=api_key_label,
topic=topic,
)

except Exception as e:
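Taken together, the new `topic` plumbing in `client.py` follows one simple assembly rule: URLs first, then raw text, then topic-generated content, with a guard that rejects empty input. A minimal, self-contained sketch of that flow (the two extractor callables below are hypothetical stand-ins, not the real `ContentExtractor` API):

```python
from typing import List, Optional


def build_combined_content(
    urls: Optional[List[str]] = None,
    text: Optional[str] = None,
    topic: Optional[str] = None,
    # Stubs for ContentExtractor.extract_content / generate_topic_content
    extract_content=lambda url: f"<content of {url}>",
    generate_topic_content=lambda t: f"<research on {t}>",
) -> str:
    """Mirror the diff's assembly order: URLs, then raw text, then topic."""
    if not urls and not text and not topic:
        # Mirrors the guard added in this PR to main() and generate_podcast()
        raise ValueError(
            "No input provided. Use --url, --file, --transcript, --image, --text, or --topic."
        )
    combined_content = ""
    if urls:
        combined_content += "\n\n".join(extract_content(u) for u in urls)
    if text:
        combined_content += f"\n\n{text}"
    if topic:
        combined_content += f"\n\n{generate_topic_content(topic)}"
    return combined_content
```

Because `topic` is appended last, it composes with the other inputs rather than replacing them, which matches the diff's `combined_content += ...` pattern.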
56 changes: 28 additions & 28 deletions podcastfy/content_generator.py
Original file line number Diff line number Diff line change
@@ -50,16 +50,20 @@ def __init__(

if is_local:
self.llm = Llamafile()
elif "gemini" in self.model_name.lower(): #keeping original gemini as a special case while we build confidence on LiteLLM
elif (
"gemini" in self.model_name.lower()
): # keeping original gemini as a special case while we build confidence on LiteLLM
self.llm = ChatGoogleGenerativeAI(
model=model_name,
temperature=temperature,
max_output_tokens=max_output_tokens,
)
else: # user should set api_key_label from input
self.llm = ChatLiteLLM(model=self.model_name,
temperature=temperature,
api_key=os.environ[api_key_label])
else: # user should set api_key_label from input
self.llm = ChatLiteLLM(
model=self.model_name,
temperature=temperature,
api_key=os.environ[api_key_label],
)


class ContentGenerator:
@@ -114,7 +118,7 @@ def __compose_prompt(self, num_images: int):
for i in range(num_images):
key = f"image_path_{i}"
image_content = {
"image_url": {"path": f"{{{key}}}", "detail": "high"},
"image_url": {"url": f"{{{key}}}", "detail": "high"},
"type": "image_url",
}
image_path_keys.append(key)
@@ -224,7 +228,7 @@ def generate_qa_content(
output_filepath: Optional[str] = None,
is_local: bool = False,
model_name: str = None,
api_key_label: str = "OPENAI_API_KEY"
api_key_label: str = "OPENAI_API_KEY",
) -> str:
"""
Generate Q&A content based on input texts.
@@ -248,15 +252,15 @@
)
if is_local:
model_name = "User provided local model"

llmbackend = LLMBackend(
is_local=is_local,
temperature=self.config_conversation.get("creativity", 0),
max_output_tokens=self.content_generator_config.get(
"max_output_tokens", 8192
),
model_name=model_name,
api_key_label=api_key_label
api_key_label=api_key_label,
)

num_images = 0 if is_local else len(image_file_paths)
@@ -287,48 +291,44 @@
logger.error(f"Error generating content: {str(e)}")
raise


def __clean_tss_markup(self, input_text: str, additional_tags: List[str] = ["Person1", "Person2"]) -> str:
def __clean_tss_markup(
self, input_text: str, additional_tags: List[str] = ["Person1", "Person2"]
) -> str:
"""
Remove unsupported TSS markup tags from the input text while preserving supported SSML tags.
Args:
input_text (str): The input text containing TSS markup tags.
            additional_tags (List[str]): Optional list of additional tags to preserve. Defaults to ["Person1", "Person2"].

        Returns:
            str: Cleaned text with unsupported TSS markup tags removed.
        """
# List of SSML tags supported by both OpenAI and ElevenLabs
supported_tags = [
"speak", "lang", "p", "phoneme",
"s", "sub"
]
supported_tags = ["speak", "lang", "p", "phoneme", "s", "sub"]

# Append additional tags to the supported tags list
supported_tags.extend(additional_tags)

# Create a pattern that matches any tag not in the supported list
pattern = r'</?(?!(?:' + '|'.join(supported_tags) + r')\b)[^>]+>'
pattern = r"</?(?!(?:" + "|".join(supported_tags) + r")\b)[^>]+>"

# Remove unsupported tags
cleaned_text = re.sub(pattern, '', input_text)
cleaned_text = re.sub(pattern, "", input_text)

# Remove any leftover empty lines
cleaned_text = re.sub(r'\n\s*\n', '\n', cleaned_text)
cleaned_text = re.sub(r"\n\s*\n", "\n", cleaned_text)

# Ensure closing tags for additional tags are preserved
for tag in additional_tags:
cleaned_text = re.sub(
f'<{tag}>(.*?)(?=<(?:{"|".join(additional_tags)})>|$)',
f'<{tag}>\\1</{tag}>',
f"<{tag}>\\1</{tag}>",
cleaned_text,
flags=re.DOTALL
flags=re.DOTALL,
)

return cleaned_text.replace('(scratchpad)', '').strip()


return cleaned_text.replace("(scratchpad)", "").strip()


def main(seed: int = 42, is_local: bool = False) -> None:
@@ -375,4 +375,4 @@ def main(seed: int = 42, is_local: bool = False) -> None:


if __name__ == "__main__":
main()
main()
23 changes: 23 additions & 0 deletions podcastfy/content_parser/content_extractor.py
Original file line number Diff line number Diff line change
@@ -74,6 +74,29 @@ def extract_content(self, source: str) -> str:
except Exception as e:
logger.error(f"Error extracting content from {source}: {str(e)}")
raise

def generate_topic_content(self, topic: str) -> str:
"""
Generate content based on a given topic using a generative model.
Args:
topic (str): The topic to generate content for.
Returns:
str: Generated content based on the topic.
"""
try:
import google.generativeai as genai

model = genai.GenerativeModel('models/gemini-1.5-pro-002')
topic_prompt = f'Be detailed. Search for {topic}'
response = model.generate_content(contents=topic_prompt, tools='google_search_retrieval')

return response.candidates[0].content.parts[0].text
except Exception as e:
logger.error(f"Error generating content for topic '{topic}': {str(e)}")
raise


def main(seed: int = 42) -> None:
"""
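The new `generate_topic_content` constructs a Gemini model inline, which makes it hard to exercise without the SDK and live API access. The same flow can be sketched with the model injected instead, so any object exposing a compatible `generate_content` works; `FakeModel` below is purely illustrative and not part of the real code:

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def generate_topic_content(topic: str, model) -> str:
    """Same flow as the PR's method, with `model` passed in instead of built inline."""
    try:
        topic_prompt = f"Be detailed. Search for {topic}"
        response = model.generate_content(
            contents=topic_prompt, tools="google_search_retrieval"
        )
        # The real code reads the first candidate's first text part
        return response.candidates[0].content.parts[0].text
    except Exception as e:
        logger.error(f"Error generating content for topic '{topic}': {str(e)}")
        raise


class FakeModel:
    """Illustrative stand-in mirroring the response shape the code reads."""

    def generate_content(self, contents, tools):
        part = SimpleNamespace(text=f"notes for: {contents}")
        candidate = SimpleNamespace(content=SimpleNamespace(parts=[part]))
        return SimpleNamespace(candidates=[candidate])
```

In production the injected object would be `genai.GenerativeModel('models/gemini-1.5-pro-002')` as in the diff; the injection only changes who constructs it.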