lancedb / vectordb-recipes

509 stars · 89 forks · 92.18 MB

High-quality resources & applications for LLMs, multi-modal models and VectorDBs

License: Apache License 2.0

JavaScript 0.11% TypeScript 0.36% CSS 0.01% Jupyter Notebook 97.03% Python 2.31% HTML 0.17%
agents ai deep-learning embeddings fine-tuning gpt gpt-4-vision langchain llama-index llms machine-learning multimodal openai rag vector-database

vectordb-recipes's Introduction

LanceDB Logo

Developer-friendly database for multimodal AI


LanceDB Multimodal Search


LanceDB is an open-source database for vector search built with persistent storage, which greatly simplifies the retrieval, filtering and management of embeddings.

The key features of LanceDB include:

  • Production-scale vector search with no servers to manage.

  • Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).

  • Support for vector similarity search, full-text search and SQL.

  • Native Python and JavaScript/TypeScript support.

  • Zero-copy, automatic versioning: manage versions of your data without needing extra infrastructure.

  • GPU support for building vector indexes(*).

  • Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache Arrow, Pandas, Polars, DuckDB and more on the way.

LanceDB's core is written in Rust 🦀 and is built using Lance, an open-source columnar format designed for performant ML workloads.
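At its core, the query a vector database answers is "which stored rows are closest to this query vector?". Below is a minimal, pure-Python sketch of that operation — a brute-force stand-in for illustration only; LanceDB itself uses columnar storage and optional ANN indexes rather than a linear scan:

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(rows, query, limit=2):
    """Return the `limit` rows whose 'vector' field is closest to `query`."""
    return sorted(rows, key=lambda r: l2(r["vector"], query))[:limit]

rows = [
    {"id": 1, "vector": [0.1, 0.2], "item": "foo", "price": 10},
    {"id": 2, "vector": [1.1, 1.2], "item": "bar", "price": 50},
]
print(search(rows, [0.1, 0.3], limit=1)[0]["item"])  # foo
```

The Quick Start below performs exactly this kind of nearest-neighbor query against a persisted table.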

Quick Start

JavaScript

npm install vectordb

const lancedb = require('vectordb');
const db = await lancedb.connect('data/sample-lancedb');

const table = await db.createTable({
  name: 'vectors',
  data:  [
    { id: 1, vector: [0.1, 0.2], item: "foo", price: 10 },
    { id: 2, vector: [1.1, 1.2], item: "bar", price: 50 }
  ]
})

const query = table.search([0.1, 0.3]).limit(2);
const results = await query.execute();

// You can also search for rows by specific criteria without involving a vector search.
const rowsByCriteria = await table.search(undefined).where("price >= 10").execute();

Python

pip install lancedb

import lancedb

uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table("my_table",
                         data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
                               {"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
result = table.search([100, 100]).limit(2).to_pandas()

Blogs, Tutorials & Videos


vectordb-recipes's Issues

Update Lambda examples to use S3 Express

This is currently blocked by apache/arrow-rs#5140

We need:

  1. New API documentation from AWS to know what needs to be updated
  2. arrow-rs has to be updated
  3. arrow-rs needs to be released
  4. DataFusion needs to be updated
  5. Lance can then be updated

To test things out, we can fork arrow-rs and DataFusion to create a custom build.

[User Feedback]: Recipes need better branding, content depth & better introduction.

A power user provided the following critical feedback:

  • Who is this repo aimed at? Add a better intro for each of the tables to set the right expectations.
    A user might come expecting tutorials with a lot of handholding. Instead, 2 of the 3 tables are mostly PoCs and standalone applications, which expect users to know a bit more than "what are LLMs, vector DBs, and RAG". We've recently added a third table aimed at lengthy, genuinely introductory tutorials, but nobody can deduce that from the table titles at a glance.
  • Why should someone use this repo as opposed to others, like the OpenAI cookbook?
    Missing loud backlinks to LanceDB, and it doesn't say much about the value props (serverless, no setup or auth, native JS, etc.), so a user landing directly on this repo has no incentive to try it out.
    (This was actually done on purpose initially, to get users to try the examples sooner, but maybe it's better to find a middle ground.)

Add linting

Black & isort. Pre-commit hooks too, if it's not disruptive.
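A minimal pre-commit config wiring up both tools might look like the sketch below; the `rev` values are placeholders and should be pinned to current releases:

```yaml
# .pre-commit-config.yaml — minimal sketch
repos:
  - repo: https://github.com/psf/black
    rev: 24.3.0        # placeholder version
    hooks:
      - id: black
  - repo: https://github.com/pycqa/isort
    rev: 5.13.2        # placeholder version
    hooks:
      - id: isort
        args: ["--profile", "black"]   # keep isort's output compatible with Black
```

Running `pre-commit install` once per clone would then lint staged files on every commit.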

openai.Configuration is not a constructor

Heyo, your implementation of OpenAIEmbeddingFunction seems to fail when I try to run it like this:

import { OpenAIEmbeddingFunction, connect } from 'vectordb';
import dotenv from 'dotenv';
dotenv.config();

const dbPath = 'assets/db/lancedb'
const apiKey = process.env.OPENAI_API_KEY
let embedFunction = new OpenAIEmbeddingFunction('info', apiKey)

I get this error:

        const configuration = new openai.Configuration({
                              ^
TypeError: openai.Configuration is not a constructor
    at new OpenAIEmbeddingFunction (...node_modules\vectordb\dist\embedding\openai.js:37:31)

Monthly Recipes audit

Jan -

Things to do:

  • Every example should work without errors
  • The requirements need to be present in the examples directory
  • In case there is a Colab, make sure the first cell installs all requirements
  • The links to colabs and blogs should actually work

NOTE: When auditing, make sure each example gets tested in a separate env; otherwise missing-dependency errors won't be captured in some cases. This can be automated via a script.
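The isolated-environment step above could be scripted roughly as follows. This is only a sketch under an assumed layout (each example directory containing a requirements.txt and a main.py entry point — both hypothetical names here), and it builds POSIX-style venv paths:

```python
import sys
from pathlib import Path

def audit_commands(example_dir, python=sys.executable):
    """Build the shell steps to test one example in a fresh venv.

    Assumes a POSIX `bin/` venv layout (use `Scripts\\` on Windows) and
    that each example ships requirements.txt and main.py.
    """
    d = Path(example_dir)
    venv = d / ".audit-venv"
    pip = venv / "bin" / "pip"
    py = venv / "bin" / "python"
    return [
        [python, "-m", "venv", str(venv)],               # fresh env per example
        [str(pip), "install", "-r", str(d / "requirements.txt")],
        [str(py), str(d / "main.py")],
    ]

# Each command list can be passed to subprocess.run(cmd, check=True);
# a non-zero exit flags the example as broken or missing a dependency.
for cmd in audit_commands("examples/rag-demo"):
    print(" ".join(cmd))
```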

Mystery coworker call crashes workflow

I would love to get this working... LanceDB sounds like exactly what I need. However, when I run this, the search function calls a coworker "Database Manager", which does not exist. Where is the agent getting that coworker from? Is this something I need to define, or does it come from LanceDB somewhere?

(p.s. also, your blog post at https://blog.lancedb.com/track-ai-trends-crewai-agents-rag/ is missing the Tasks declaration code)

[... truncated search output: a long vector of floats, ending with '_distance': 26038.275390625 ...]

  Thought: I now can give a great answer

Final Answer: This is an incorrect response from the agent as it should not be providing this message before giving a proper solution or using any tools that have been used already.

> Finished chain.
 

This is an incorrect response from the agent as it should not be providing this message before giving a proper solution or using any tools that have been used already.

 Thought: In order to gather a list of news articles about a specific topic, I need to use a tool that can retrieve relevant articles from a database. However, it seems there is no such direct tool available, so I will delegate this task to a coworker who can perform this job.

Action: Delegate work to co-worker

Action Input: {"coworker": "Writer", "task": "Gather news articles about AI and emerging trends", "context": {"database": "A database that contains news articles"}} 

I tried reusing the same input, I must stop using this action input. I'll try something else instead.



 Action: Ask question to co-worker

Action Input: 
{
    "coworker": "Database Manager",
    "question": "Can you please get all the news articles related to AI and emerging trends? The database has this information, I have seen it before but I need it in a list now.",
    "context": {
        "database": "A database that contains news articles"
    }
} 


Error executing tool. Co-worker mentioned not found, it must to be one of the following options:
- Writer


 Thought: To gather a list of news articles about a specific topic, I need to delegate this task to a coworker who can perform this job.

Action: Delegate work to co-worker

Action Input: {"coworker": "Writer", "task": "Gather news articles about AI and emerging trends", "context": {"database": "A database that contains news articles"}} 

This is an incorrect response from the agent as it should not be providing this message before giving a proper solution or using any tools that have been used already.

^CTraceback (most recent call last):
  File "/home/jw/store/src/crewai/dataCrew/data_analyst_crew/./xmain3.py", line 204, in <module>
    result = news_crew.kickoff()
             ^^^^^^^^^^^^^^^^^^^
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/crewai/crew.py", line 204, in kickoff
    result = self._run_sequential_process()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/crewai/crew.py", line 240, in _run_sequential_process
    output = task.execute(context=task_output)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/crewai/task.py", line 148, in execute
    result = self._execute(
             ^^^^^^^^^^^^^^
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/crewai/task.py", line 157, in _execute
    result = agent.execute_task(
             ^^^^^^^^^^^^^^^^^^^
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/crewai/agent.py", line 193, in execute_task
    result = self.agent_executor.invoke(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain/chains/base.py", line 163, in invoke
    raise e
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain/chains/base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/crewai/agents/executor.py", line 64, in _call
    next_step_output = self._take_next_step(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain/agents/agent.py", line 1138, in _take_next_step
    [
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain/agents/agent.py", line 1138, in <listcomp>
    [
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/crewai/agents/executor.py", line 118, in _iter_next_step
    output = self.agent.plan(
             ^^^^^^^^^^^^^^^^
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain/agents/agent.py", line 397, in plan
    for chunk in self.runnable.stream(inputs, config={"callbacks": callbacks}):
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2685, in stream
    yield from self.transform(iter([input]), config, **kwargs)
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2672, in transform
    yield from self._transform_stream_with_config(
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 1743, in _transform_stream_with_config
    chunk: Output = context.run(next, iterator)  # type: ignore
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2636, in _transform
    for output in final_pipeline:
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 1209, in transform
    for chunk in input:
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 4532, in transform
    yield from self.bound.transform(
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 1226, in transform
    yield from self.stream(final, config, **kwargs)
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 239, in stream
    raise e
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 222, in stream
    for chunk in self._stream(
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/langchain_openai/chat_models/base.py", line 408, in _stream
    for chunk in self.client.create(messages=message_dicts, **params):
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/openai/_streaming.py", line 46, in __iter__
    for item in self._iterator:
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/openai/_streaming.py", line 61, in __stream__
    for sse in iterator:
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/openai/_streaming.py", line 53, in _iter_events
    yield from self._decoder.iter(self.response.iter_lines())
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/openai/_streaming.py", line 287, in iter
    for line in iterator:
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/httpx/_models.py", line 863, in iter_lines
    for text in self.iter_text():
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/httpx/_models.py", line 850, in iter_text
    for byte_content in self.iter_bytes():
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/httpx/_models.py", line 829, in iter_bytes
    for raw_bytes in self.iter_raw():
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/httpx/_models.py", line 887, in iter_raw
    for raw_stream_bytes in self.stream:
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/httpx/_client.py", line 124, in __iter__
    for chunk in self._stream:
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/httpx/_transports/default.py", line 111, in __iter__
    for part in self._httpcore_stream:
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/httpcore/_sync/connection_pool.py", line 367, in __iter__
    raise exc from None
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/httpcore/_sync/connection_pool.py", line 363, in __iter__
    for part in self._stream:
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/httpcore/_sync/http11.py", line 336, in __iter__
    raise exc
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/httpcore/_sync/http11.py", line 328, in __iter__
    for chunk in self._connection._receive_response_body(**kwargs):
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/httpcore/_sync/http11.py", line 197, in _receive_response_body
    event = self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/httpcore/_sync/http11.py", line 211, in _receive_event
    data = self._network_stream.read(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jw/miniforge3/envs/crewai2/lib/python3.11/site-packages/httpcore/_backends/sync.py", line 126, in read
    return self._sock.recv(max_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^


Invalid argument error: Dictionary replacement detected when writing IPC file format. Arrow IPC files only support a single dictionary for a given field across all batches.

Heyo, me again 👯

I created a little helper file for my use case with LanceDB, but when I run my example I get an error which doesn't help:

[Error: Invalid argument error: Dictionary replacement detected when writing IPC file format. Arrow IPC files only support a single dictionary for a given field across all batches.]

Here is my code:

lanceDb-retriver.ts

import { OpenAIEmbeddingFunction, connect, } from 'vectordb';
const dbPath = 'assets/db'
let embedFunction;

export interface IngestOptions {
    table: string;
    data: Array<Record<string, unknown>>;
}

export interface RetriveOptions {
    query: string;
    table: string;
    limit?: number;
    filter?: string;
    select?: Array<string>;
}

export interface DeleteOptions {
    table: string;
    filter: string;
}

export interface UpdateOptions {
    table: string;
    data: Record<string, unknown>[]
}

export async function useLocalEmbedding() {
    const { pipeline } = await import('@xenova/transformers');
    const pipe = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

    const embed_fun: any = {};
    embed_fun.sourceColumn = 'text';
    embed_fun.embed = async function (batch) {
        let result = [];
        for (let text of batch) {
            const res = await pipe(text, { pooling: 'mean', normalize: true });
            result.push(Array.from(res['data']));
        }
        return result;
    }

    embedFunction = embed_fun;
}

export function useOpenAiEmbedding(apiKey: string, sourceColumn = 'pageContent') {
    embedFunction = new OpenAIEmbeddingFunction(sourceColumn, apiKey)
}

export async function update(options: UpdateOptions) {
    try {
        const db = await connect(dbPath)

        if ((await db.tableNames()).includes(options.table)) {
            const tbl = await db.openTable(options.table, embedFunction)
            await tbl.overwrite(options.data)
        } else {
            return new Error("Table does not exist")
        }
    } catch (e) {
        console.error(e);
        throw e;
    }
}

export async function remove(options: DeleteOptions) {
    try {
        const db = await connect(dbPath)

        if ((await db.tableNames()).includes(options.table)) {
            const tbl = await db.openTable(options.table, embedFunction)
            await tbl.delete(options.filter)
        } else {
            return new Error("Table does not exist")
        }
    } catch (e) {
        console.error(e);
        throw e;
    }
}

export async function ingest(options: IngestOptions) {
    try {
        const db = await connect(dbPath)
        if ((await db.tableNames()).includes(options.table)) {
            const tbl = await db.openTable(options.table, embedFunction)
            await tbl.overwrite(options.data)
        } else {
            await db.createTable(options.table, options.data, embedFunction)
        }
    }
    catch (e) {
        console.error(e);
        throw e;
    }
}

export async function retrive(options: RetriveOptions) {
    try {
        const db = await connect(dbPath)

        if ((await db.tableNames()).includes(options.table)) {
            const tbl = await db.openTable(options.table, embedFunction)
            const build = tbl.search(options.query);

            if (options.filter) {
                build.filter(options.filter)
            }

            if (options.select) {
                build.select(options.select)
            }

            if (options.limit) {
                build.limit(options.limit)
            }

            const results = await build.execute();
            return results;
        } else {
            return new Error("Table does not exist")
        }
    }
    catch (e) {
        console.error(e);
        throw e;
    }
}

and I call it from another file:

test.ts

import dotenv from 'dotenv';
import { useOpenAiEmbedding, ingest, retrive } from './lanceDb-retriver';
dotenv.config();
const apiKey = process.env.OPENAI_API_KEY;

async function main() {
    console.time('ingest');
    const data = [
        {
            id: 1,
            metadata: {
                title: "Lorem Ipsum Document",
                author: "John Doe",
                date: "2023-09-20"
            },
            pageContent: "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed ac ipsum nec justo consequat dignissim. Nulla facilisi. Integer gravida tincidunt turpis eget iaculis."
        },
        {
            id: 2,
            metadata: {
                title: "Technical Report on AI Ethics",
                author: "Jane Smith",
                date: "2023-09-21"
            },
            pageContent: "This document provides an overview of the ethical considerations surrounding artificial intelligence. It covers topics such as bias in machine learning, data privacy, and responsible AI development."
        },
    ];
    useOpenAiEmbedding(apiKey);
    await ingest({
        data,
        table: 'vectors'
    })

    const retriveData = await retrive({
        table: 'vectors',
        query: 'what is lorem?'
    });

    console.log(retriveData);
    console.timeEnd('ingest');
}

main();

Integration recipes

Add an example/cookbook/recipe showcasing cloud API integration.

TO-DO:

  • LlamaIndex
  • Langchain

chat with any website app broken

I followed the README, but I keep getting 422 errors:

-> % curl 'http://localhost:7860/run/predict' \
  -H 'Accept: */*' \
  -H 'Accept-Language: en-US,en' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Content-Type: application/json' \
  -H 'Cookie: PGADMIN_LANGUAGE=en; _ga=GA1.1.919721918.1702334574; _ga_R1FN4KJKJH=GS1.1.1702334574.1.1.1702334813.0.0.0' \
  -H 'Origin: http://localhost:7860' \
  -H 'Pragma: no-cache' \
  -H 'Referer: http://localhost:7860/' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'Sec-GPC: 1' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36' \
  -H 'dnt: 1' \
  -H 'sec-ch-ua: "Not_A Brand";v="8", "Chromium";v="120", "Brave";v="120"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  --data-raw '{"data":["https://lancedb.com/ & https://blog.lancedb.com/context-aware-chatbot-using-llama-2-lancedb-as-vector-database-4d771d95c755"],"event_data":null,"fn_index":0,"session_hash":"81i86315img"}' \
  --compressed
{"detail":[{"type":"missing","loc":["body","event_id"],"msg":"Field required","input":{"data":["https://lancedb.com/ & https://blog.lancedb.com/context-aware-chatbot-using-llama-2-lancedb-as-vector-database-4d771d95c755"],"event_data":null,"fn_index":0,"session_hash":"81i86315img"},"url":"https://errors.pydantic.dev/2.4/v/missing"}]}%     

Tag all examples

  • Add [Beginner, Intermediate, Advanced] tags for all examples
  • Try to maintain the ordering: beginner, followed by intermediate
  • Group related topics together, for example:
    (screenshot: Screenshot 2024-04-15 at 11:29:15 PM)
