TL;DR: check out a demo I did for My First Million.
Now, this whole flow is all over the place, tbh. I'll try to summarize it in a digestible way.
First, clone the repo and pull in its submodules:
git clone
cd podcastsearch && git submodule init && git submodule update
Then get Chroma (the vector database the transcriptions are stored in) up and running.

The repo is made up of three parts:
- app: Next.js-based frontend
- transcriber: wrapper around wordcab-transcribe that handles audio-to-text transcription
- main.ipynb: glues the individual pieces together by extracting audio from YouTube videos, transcribing it, and storing the transcriptions in Chroma. Load it into Google Colab and mount your Google Drive.
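The last step of the notebook hands transcriptions off to Chroma. Here is a rough sketch of what that hand-off could look like. It merges timestamped transcript segments into chunks and returns parallel `ids`/`documents`/`metadatas` lists. The `Segment` shape, the `build_chroma_records` name, the `max_chars=500` chunk size, and the `<video_id>-<n>` ID scheme are all hypothetical. wordcab-transcribe's real output format and what main.ipynb actually does may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Segment:
    """One timestamped transcript segment (hypothetical shape, not wordcab-transcribe's format)."""
    start: float  # seconds from the start of the episode
    end: float
    text: str


def build_chroma_records(
    video_id: str,
    segments: List[Segment],
    max_chars: int = 500,  # assumed chunk size, tune for your embedding model
) -> Dict[str, list]:
    """Merge consecutive segments into chunks of roughly max_chars characters.

    Returns parallel ids/documents/metadatas lists, one entry per chunk.
    A single segment longer than max_chars still becomes its own chunk.
    """
    ids: List[str] = []
    documents: List[str] = []
    metadatas: List[Dict[str, Union[str, float]]] = []
    buffer: List[Segment] = []

    def flush() -> None:
        # Turn the buffered segments into one record, then empty the buffer.
        if not buffer:
            return
        ids.append(f"{video_id}-{len(ids)}")  # hypothetical "<video_id>-<n>" ID scheme
        documents.append(" ".join(s.text.strip() for s in buffer))
        metadatas.append(
            {"video_id": video_id, "start": buffer[0].start, "end": buffer[-1].end}
        )
        buffer.clear()

    for seg in segments:
        buffered_chars = sum(len(s.text) for s in buffer)
        # Start a new chunk if adding this segment would exceed the budget.
        if buffer and buffered_chars + len(seg.text) > max_chars:
            flush()
        buffer.append(seg)
    flush()  # emit whatever is left over

    return {"ids": ids, "documents": documents, "metadatas": metadatas}


if __name__ == "__main__":
    segments = [
        Segment(0.0, 4.2, "Welcome back to the show."),
        Segment(4.2, 9.8, "Today we're talking about side projects."),
        Segment(9.8, 15.0, "Let's get into it."),
    ]
    records = build_chroma_records("abc123", segments, max_chars=60)
    print(records["ids"])  # ['abc123-0', 'abc123-1']
```

Chroma's Python client `Collection.add` takes `ids`, `documents`, and `metadatas` keyword arguments, so a result shaped like this could be passed in with `collection.add(**records)`. Storing the start/end timestamps as metadata lets a search hit point back to a specific moment in the episode.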