PaperCast: AI generated podcast for each scientific research article

PaperCast is a project that turns any research article into a podcast using AI-generated audio. It is inspired by Illuminate https://illuminate.withgoogle.com/ and ScienceCast https://sciencecast.org/.

The author does not know anyone working on the Illuminate project, nor their methods. The author is still on the waiting list for its beta release.

Compare with Illuminate

Aug 30th: Illuminate is finally available; give it a try at https://illuminate.google.com/

Feature	PaperCast	Illuminate
Open source	✅ Yes	🟡 Not yet
Fine-grained control	✅ Yes	🟡 Only arXiv links
Research field	✅ Any research	🟡 Only Computer Science
Audio quality	✅ Good	✅ Very good
Voice tone	✅ Conversational	🟡 Flat
Paper source	✅ Any paper	🟡 arXiv only
Allow multiple papers	🟡 Not yet	✅ Yes
Content understanding	✅ Good	✅ Good
Computing resource	💻 Local	☁️ Cloud
Generation limit	✅ Unlimited	🟡 5 per day
Has Red Panda?	Yes, Justin and Emma	Only humans🧑‍🎓

Changelogs

July 29th, 2024: refactored the arXiv reader to use arXiv's HTML rendering and parse it into JSON + Markdown.
Jun 16th, 2024: added author interview mode, via "author_interview_prompt" in prompts.yaml and additional_questions provided by the authors; added PDF mode, which extracts the necessary information from any PDF paper in the pdfs directory. See examples/run_cognitive.yaml for an example.
Jun 15th, 2024: added subtitle (SRT) file generation. See examples/run_gorilla.yaml for how to set an offset when there is intro audio, and the example video PaperCast EP5: "Gorilla: Large Language Model Connected with Massive APIs".
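
The subtitle offset from the Jun 15th entry can be illustrated with a minimal sketch. This is not the code in this repo: the function names format_srt_time and build_srt, and the (start, end, text) segment layout, are assumptions made for illustration. The idea is to shift every cue by a fixed number of seconds so the subtitles stay aligned when intro audio is placed before the podcast.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(segments, offset=0.0):
    """Build SRT text from (start, end, text) tuples.

    `offset` (seconds) is added to every cue, e.g. the length of an intro.
    The segment layout is a hypothetical one chosen for this sketch.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(start + offset)} --> {format_srt_time(end + offset)}\n"
            f"{text}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)
```

For example, build_srt([(0.0, 1.5, "Welcome to PaperCast")], offset=2.0) yields a single cue that starts at 00:00:02,000.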

Example

To generate a podcast for "Attention Is All You Need", run the following command:

python run.py examples/run_attention.yaml

It should produce 1706.03762.json in the transcript directory and 1706.03762.wav in the audio directory.
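
The output names above appear to follow the arXiv identifier of the paper. A minimal sketch of that mapping, assuming the identifier is the last path component of the URL; the helper names arxiv_id_from_url and expected_outputs are hypothetical and not part of this repo:

```python
from pathlib import Path
from urllib.parse import urlparse


def arxiv_id_from_url(url: str) -> str:
    """Return the arXiv identifier from an abs or pdf URL."""
    last = urlparse(url).path.rstrip("/").split("/")[-1]
    # pdf links may end in ".pdf"; strip it so both URL forms agree.
    return last[:-4] if last.endswith(".pdf") else last


def expected_outputs(url: str):
    """Return the (transcript, audio) paths a run is expected to produce."""
    paper_id = arxiv_id_from_url(url)
    return Path("transcript") / f"{paper_id}.json", Path("audio") / f"{paper_id}.wav"
```

After a run, calling exists() on each returned path is a quick way to confirm that both files were written.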

Please also check out a few example videos on YouTube. The playlist link is here

Installation

Set up the OpenAI API key

export OPENAI_API_KEY=sk-xxxx

Clone the repo and place ChatTTS inside its directory

git clone https://github.com/phunterlau/papercast
cd papercast/
git clone https://github.com/2noise/ChatTTS
cd ChatTTS
pip install -r requirements.txt
cd ..
pip install -r requirements.txt

Please note that ChatTTS is still very experimental. Please refer to its repo for issues and help.

How to build a podcast

Take examples/run_attention.yaml as an example. It contains a few keys:

url: "https://arxiv.org/abs/1706.03762"
use_cache: true
episode: 3
prompt: "dialogue_prompt"
background_knowledge: |
  Current year is 2024. Attention is all you need is known as the transformer paper published in 2017 by Google.
  It is the foundation paper of the current large language model research.

url: an arXiv URL (abs or pdf) or the local path of a PDF file.
use_cache: whether to load the cached LLM-generated transcript or start over.
episode: the episode number.
prompt: the podcast style (dialogue, monologue, etc.); refer to prompts.yaml for the available prompts.
(optional) background_knowledge: additional knowledge for better context understanding. Use "None" if not available.
(optional) additional_questions: additional research questions to include as input.
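
As a rough illustration of how these keys could be checked before a run, here is a minimal sketch. It assumes the YAML file has already been parsed into a dict (for instance with PyYAML's yaml.safe_load). The function normalize_config, the split into required and optional keys, and the default values are assumptions for this sketch, not the behavior of run.py.

```python
# Keys not marked "(optional)" above are treated as required (an assumption).
REQUIRED_KEYS = ("url", "use_cache", "episode", "prompt")
# Hypothetical defaults; "None" mirrors the README's advice for missing background.
OPTIONAL_DEFAULTS = {"background_knowledge": "None", "additional_questions": None}


def normalize_config(raw: dict) -> dict:
    """Check required keys and fill in the optional ones."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"missing config keys: {', '.join(missing)}")
    config = {**OPTIONAL_DEFAULTS, **raw}
    if not isinstance(config["use_cache"], bool):
        raise ValueError("use_cache must be true or false")
    # bool is a subclass of int in Python, so reject it explicitly.
    episode = config["episode"]
    if isinstance(episode, bool) or not isinstance(episode, int):
        raise ValueError("episode must be an integer")
    return config
```

Failing early on a missing or mistyped key avoids spending LLM and TTS time on a run that cannot finish.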

How does it work

I prefer podcasts in a question-and-answer style, so the transcript must include a smooth conversation that gives a general overview, raises a few interesting questions, and discusses them. The process has 3 steps:

Predict the research field of the given article.
Have the LLM role-play a senior researcher in that field and ask a few questions.
Generate a podcast by addressing these questions.
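
The three steps can be sketched as a small pipeline. This is an illustration under stated assumptions, not the code in llm_funcs.py: the llm argument stands for any text-in, text-out function (for example a thin wrapper around an OpenAI call), the function name build_transcript is hypothetical, and the prompt wording is invented here rather than taken from prompts.yaml.

```python
from typing import Callable, List


def build_transcript(title: str, abstract: str, llm: Callable[[str], str]) -> str:
    """Run the three steps with any text-in, text-out LLM function."""
    paper = f"Title: {title}\nAbstract: {abstract}"

    # Step 1: predict the research field of the article.
    field = llm(f"Name the research field of this paper.\n{paper}").strip()

    # Step 2: role-play a senior researcher in that field and ask questions.
    raw_questions = llm(
        f"You are a senior researcher in {field}. "
        f"Ask a few interesting questions about this paper, one per line.\n{paper}"
    )
    questions: List[str] = [q.strip() for q in raw_questions.splitlines() if q.strip()]

    # Step 3: generate the podcast transcript by addressing the questions.
    question_block = "\n".join(f"- {q}" for q in questions)
    return llm(
        "Write a conversational podcast transcript about this paper that gives "
        "a general overview and then discusses these questions:\n"
        f"{question_block}\n{paper}"
    )
```

Passing the LLM in as a plain function keeps the sketch independent of any particular API client and makes it easy to test with a stub.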

Limitations

Question generation is limited to the article's title and abstract. Tree-level question generation using the full text might produce deeper and better questions.
Audio generation depends on ChatTTS https://github.com/2noise/ChatTTS. Its features are still very experimental, and the speaker voice lottery is very tricky.

Future ideas

More article readers beyond the arXiv loader
A good PDF loader to parse article metadata and sections
Add Chinese voices
Better question generation using full text
Support multi-person discussions with an agentic workflow
Support different interview modes, e.g. host vs. author

License and disclaimer

This repo uses the MIT License. It uses ChatTTS for audio generation, and ChatTTS does not allow commercial use. The music in the podcast is generated by Suno.AI.

Acknowledgements

Jina.ai has a good reader API https://jina.ai/reader/
ChatTTS https://github.com/2noise/ChatTTS
