Ben's Bites Link Search

Search across all of the AI-related links in the Ben's Bites newsletter – using AI-powered semantic search.

Intro

The goal of this app is to provide a highly curated search for staying up-to-date with the latest AI resources and news.

All search results come from the Ben's Bites AI newsletter, which serves as the curated data source.

How it works

A cron job runs every 24 hours to update the database.

The steps involved include:

  1. Crawling the source Beehiiv newsletter
  2. Converting each post to markdown
  3. Extracting and resolving unique links
  4. Fetching Open Graph metadata for each link
  5. Fetching provider-specific metadata for some links (e.g. tweet text)
  6. Generating vector embeddings for each link using OpenAI
  7. Upserting all links into a Pinecone vector database
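
As a rough illustration of step 3, here is a minimal sketch of pulling unique links out of a post that has already been converted to markdown. It is not the project's actual code: the function names, the regex, and the choice to drop `utm_*` query parameters are assumptions made for this example, and "resolving" here only means normalizing the URL, not following redirects.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Matches inline markdown links of the form [text](url).
# Assumption: URLs do not themselves contain ")" or whitespace.
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")

# Query parameter prefixes treated as tracking noise (hypothetical choice).
TRACKING_PREFIXES = ("utm_",)


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop tracking params, fragment, and trailing slash."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def extract_unique_links(markdown: str) -> list[str]:
    """Return normalized links in first-seen order, without duplicates."""
    seen = set()
    unique = []
    for match in MARKDOWN_LINK.finditer(markdown):
        url = normalize_url(match.group(1))
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique
```

Two links that differ only in tracking parameters or a trailing slash collapse to one entry, so each resource would be embedded and stored once.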

We're using Iframely to extract Open Graph metadata for each link, and we also special-case tweet links to extract the tweet text.
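
One way to special-case tweet links is to recognize them by URL shape before deciding which metadata fetcher to call. The sketch below only covers that detection step. The pattern and the `tweet_id_from_url` helper are hypothetical and are not taken from the project. Fetching the tweet text itself is left out.

```python
import re
from typing import Optional

# Assumed URL shape: https://twitter.com/<user>/status/<numeric id>
# (x.com and the www./mobile. subdomains are accepted as well).
TWEET_URL = re.compile(
    r"^https?://(?:www\.|mobile\.)?(?:twitter|x)\.com/[^/]+/status/(\d+)"
)


def tweet_id_from_url(url: str) -> Optional[str]:
    """Return the tweet ID if the URL looks like a tweet link, else None."""
    match = TWEET_URL.match(url)
    return match.group(1) if match else None
```

A link for which this returns `None` would go through the generic Open Graph path. A link that yields an ID would go to the tweet-specific fetcher.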

Once we have all of the links locally, we upsert them into a Pinecone vector database for semantic search.

Semantic Search

Semantic search is powered by OpenAI's `text-embedding-ada-002` embedding model and Pinecone's hosted vector database.
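
To make the idea concrete, the following sketch ranks stored links by cosine similarity to a query vector. It uses tiny hand-written vectors in place of real embeddings (`text-embedding-ada-002` produces 1536-dimensional vectors) and a plain in-memory dictionary in place of Pinecone, which performs this kind of nearest-neighbor lookup on the server. The function names and the choice of cosine similarity as the metric are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, items, k=3):
    """Return the k (item_id, score) pairs most similar to query_vec, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), item_id)
        for item_id, vec in items.items()
    ]
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]


# Toy 2-dimensional "embeddings" keyed by a hypothetical link ID.
links = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = top_k([1.0, 0.1], links, k=2)
```

Here `results` lists link `a` first and link `c` second, because their vectors point in directions closest to the query. At search time the query text would be embedded with the same model as the links so that the vectors are comparable.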

TODO

  • better search UX so back button works
  • show the number of posts / links on the home page so it's clear when it was last updated
  • actually sort by recency instead of faking it
  • set up cron to update the DB daily
  • test on Safari/Firefox
  • display which newsletter the post first appeared in
  • explore hybrid search
  • infinite scroll so you can keep scrolling results

License

MIT © Travis Fischer

All link data is extracted from Ben's Bites AI Newsletter and is licensed under CC BY-NC-ND 4.0.

If you found this project interesting, please consider sponsoring me or following me on Twitter.
