Search all of the AI-related links from the Ben's Bites newsletter using AI-powered semantic search.
This app provides a curated search for staying up to date with the latest AI resources and news.
All search results are extracted from the Ben's Bites AI Newsletter, which serves as a highly curated data source.
A cron job runs every 24 hours to update the database.
It performs the following steps:
- Crawling the source Beehiiv newsletter
- Converting each post to markdown
- Extracting and resolving unique links
- Fetching opengraph metadata for each link
- Fetching provider-specific metadata for some links (e.g. tweet text)
- Generating vector embeddings for each link using OpenAI
- Upserting all links into a Pinecone vector database
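The link-extraction step above can be sketched roughly as follows. This is a minimal Python illustration, not the project's actual code: the names `extract_unique_links` and `normalize_url`, the regex, and the normalization rules (dropping fragments and `utm_*` tracking parameters) are all assumptions. It also does not follow redirects, which a real "resolve" step would likely need to do.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Matches inline markdown links such as [text](https://example.com).
# Assumption: URLs contain no whitespace or closing parentheses.
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\((https?://[^\s)]+)\)")


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and utm_* parameters."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.rstrip("/") or "/",
        urlencode(query),
        "",  # fragment removed
    ))


def extract_unique_links(markdown: str) -> list[str]:
    """Return normalized links in first-seen order, without duplicates."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for match in MARKDOWN_LINK.finditer(markdown):
        seen.setdefault(normalize_url(match.group(1)), None)
    return list(seen)
```

Normalizing before deduplicating means that two links differing only in tracking parameters or a trailing slash collapse into one entry, which keeps the later metadata and embedding steps from doing duplicate work.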
We use Iframely to extract opengraph metadata for each link, and we special-case tweet links to extract the tweet text.
Once we have all of the links locally, we upsert them into a Pinecone vector database for semantic search.
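An upsert inserts a record or overwrites the existing one with the same ID, so a daily job can be re-run safely as long as each link maps to a stable ID. The sketch below illustrates that idea with a plain dict standing in for the vector database. It is not the Pinecone client, and the ID scheme (a truncated SHA-256 of the URL) and the `link_id` and `upsert` helpers are assumptions, since how the project derives its IDs is not stated.

```python
import hashlib


def link_id(url: str) -> str:
    """Derive a stable ID from the URL (assumed scheme, not the project's)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:32]


def upsert(index: dict, url: str, embedding: list[float], metadata: dict) -> None:
    """Insert the record, or overwrite it if the same ID already exists."""
    record_id = link_id(url)
    index[record_id] = {
        "id": record_id,
        "values": embedding,
        "metadata": {"url": url, **metadata},
    }


# In-memory stand-in for the vector database.
index: dict = {}
upsert(index, "https://example.com/a", [0.1, 0.2], {"title": "Old title"})
upsert(index, "https://example.com/a", [0.1, 0.2], {"title": "New title"})
print(len(index))  # the second call replaced the first record
```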
Semantic search is powered by OpenAI's `text-embedding-ada-002` embedding model and Pinecone's hosted vector database.
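Conceptually, semantic search embeds the query with the same model and returns the stored links whose vectors are closest to it. The sketch below illustrates only that ranking step, using cosine similarity over toy 3-dimensional vectors. Real `text-embedding-ada-002` vectors have 1536 dimensions, and the similarity metric configured on the project's Pinecone index is not stated here, so cosine is an assumption. The `search` function and the sample records are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector: list[float], records: list[dict], top_k: int = 3) -> list[tuple[float, str]]:
    """Return the top_k (score, url) pairs, highest similarity first."""
    scored = [
        (cosine_similarity(query_vector, record["values"]), record["url"])
        for record in records
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data: in practice both the query and the links would be embedded
# by the same embedding model before comparison.
records = [
    {"url": "https://example.com/a", "values": [1.0, 0.0, 0.0]},
    {"url": "https://example.com/b", "values": [0.0, 1.0, 0.0]},
    {"url": "https://example.com/c", "values": [0.9, 0.1, 0.0]},
]
for score, url in search([1.0, 0.0, 0.0], records, top_k=2):
    print(f"{score:.3f}  {url}")
```

A hosted vector database performs this nearest-neighbor lookup at scale with approximate indexes rather than the brute-force scan shown here.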
TODO:
- better search UX so the back button works
- show the number of posts / links on the home page so it's clear when it was last updated
- actually sort by recency instead of faking it
- set up cron to update the DB daily
- test on Safari/Firefox
- display which newsletter the post first appeared in
- explore hybrid search
- infinite scroll so you can keep scrolling results
MIT © Travis Fischer
All link data is extracted from Ben's Bites AI Newsletter and is licensed under CC BY-NC-ND 4.0.
If you found this project interesting, please consider sponsoring me or following me on Twitter.