Search across all of the AI-related links in the Ben's Bites newsletter – using AI-powered semantic search.
The goal of this app is to provide a highly curated search experience for staying up to date with the latest AI resources and news.
All search results are extracted from Ben's Bites AI Newsletter, which is used as a highly curated data source.
A cron job runs every 24 hours to update the database.
Each update involves the following steps:
- Crawling the source Beehiiv newsletter
- Converting each post to markdown
- Extracting and resolving unique links
- Fetching opengraph metadata for each link
- Fetching provider-specific metadata for some links (e.g. tweet text)
- Generating vector embeddings for each link using OpenAI
- Upserting all links into a Pinecone vector database
- Upserting all links into a Meilisearch database
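The "extracting and resolving unique links" step above can be illustrated with a minimal sketch. It assumes each post has already been converted to markdown and that links use the inline `[text](url)` form. The names `normalize_url` and `extract_unique_links` are hypothetical, and "resolving" is approximated here by URL normalization only; following redirects over the network is left out.

```python
import re
from urllib.parse import urldefrag, urlsplit, urlunsplit

# Matches inline markdown links of the form [text](url).
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


def normalize_url(url: str) -> str:
    """Lowercase the scheme and host, and drop the fragment and trailing slash."""
    url, _fragment = urldefrag(url)
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def extract_unique_links(markdown: str) -> list[str]:
    """Return normalized links in first-seen order, without duplicates."""
    seen = set()
    links = []
    for match in MARKDOWN_LINK.finditer(markdown):
        url = normalize_url(match.group(1))
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


post = (
    "Read [GPT-4](https://openai.com/research/gpt-4) and "
    "[again](https://OpenAI.com/research/gpt-4/#intro), "
    "plus [Pinecone](https://www.pinecone.io/)."
)
unique_links = extract_unique_links(post)
```

Normalizing before deduplicating means the same page linked with different casing, fragments, or trailing slashes is only counted once.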
We're using Iframely to extract opengraph metadata for each link, and we also special-case tweet links to extract the tweet text.
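To make concrete what "opengraph metadata" means here, the sketch below pulls `og:*` tags out of a page's HTML using only the Python standard library. This is not how Iframely works and not the app's actual code; `OpenGraphParser` and `parse_opengraph` are hypothetical names, and fetching the page itself is assumed to happen elsewhere.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.metadata.setdefault(prop[3:], content)


def parse_opengraph(html: str) -> dict:
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.metadata


sample_html = (
    "<html><head>"
    '<meta property="og:title" content="Hello">'
    '<meta property="og:description" content="World" />'
    '<meta name="viewport" content="width=device-width">'
    "</head></html>"
)
metadata = parse_opengraph(sample_html)
```

Fields such as the title and description are the kind of text that can then be embedded and indexed for each link.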
Once we have all of the links locally, we upsert them into two databases:
- A Pinecone vector database for semantic search
- A Meilisearch database for traditional keyword search
Supporting both of these search indices isn't necessary, but I wanted to have a live comparison of the two approaches in action.
In general, I've found that semantic search is more accurate than keyword search, but keyword search is much faster and can be more intuitive for users.
Semantic search is powered by OpenAI's `text-embedding-ada-002` embedding model and Pinecone's hosted vector database.
Traditional keyword-based search is powered by Meilisearch.
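The difference between the two approaches can be sketched with toy data. The example below assumes semantic search ranks links by cosine similarity between a query embedding and each link's embedding, and it uses tiny 3-dimensional vectors in place of real `text-embedding-ada-002` embeddings (which have 1536 dimensions). The function names and the naive substring matcher are hypothetical stand-ins, not the Pinecone or Meilisearch APIs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vector, index, top_k=2):
    """Rank every indexed link by similarity to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), url) for url, vec in index.items()]
    scored.sort(reverse=True)
    return [url for _score, url in scored[:top_k]]


def keyword_search(query, documents):
    """Return links whose text contains every query term (case-insensitive)."""
    terms = query.lower().split()
    return [url for url, text in documents.items() if all(t in text.lower() for t in terms)]


# Toy embeddings standing in for real model output.
toy_index = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.7, 0.7, 0.0]}
semantic_hits = semantic_search([1.0, 0.1, 0.0], toy_index)

toy_documents = {"a": "A new image generation model", "b": "Vector databases explained"}
keyword_hits = keyword_search("image model", toy_documents)
```

Semantic search always returns the nearest neighbors, even when no words overlap, while keyword search only returns links that literally contain the query terms.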
- better search UX so back button works
- show the number of posts / links on the home page so it's clear when it was last updated
- actually sort by recency instead of faking it
- set up cron to update the DB daily
- test on Safari/Firefox
- display which newsletter the post first appeared in
- explore hybrid search
- infinite scroll so you can keep scrolling results
MIT © Travis Fischer
All link data is extracted from Ben's Bites AI Newsletter and is licensed under CC BY-NC-ND 4.0.
If you found this project interesting, please consider sponsoring me or following me on Twitter.