anush008 / fastembed-rs
Library to generate vector embeddings. Rust implementation of Qdrant's FastEmbed.
Home Page: https://docs.rs/fastembed
License: Apache License 2.0
Hi, I'm trying to run the sample mentioned in the README file, but when I try to build I'm facing the error `linking with cc failed: exit status: 1`.
I was working on exporting the sentence-transformers/clip-ViT-B-32-multilingual-v1 model to ONNX without success. I finally figured it out by manipulating the model with torch.onnx. I can contribute the ONNX model to fastembed-rs, along with the custom torch.onnx export steps (a Python script).
Should I create a pull request?
Hello,
Firstly, I would like to express my appreciation for your work on the fastembed-rs project. It's been a great tool and has been very helpful in my tasks.
With the recent update to version 3, I noticed that the BAAI/bge-small-zh-v1.5 model has been removed. This model has been crucial to my projects, and I was wondering why it was dropped in the latest release.
Thank you for your time and for the effort you've put into this project. I look forward to your response.
Best regards,
AngelLiang
Does anyone else have this issue? How can we use the GPU?
[2023-12-01T14:17:40Z WARN ort::execution_providers] No execution providers registered successfully. Falling back to CPU.
This is my setup code:
```rust
use anyhow::{Context, Result};
use fastembed::{EmbeddingBase, EmbeddingModel, ExecutionProvider, FlagEmbedding, InitOptions};
use log::{error, info, warn};
use ort::execution_providers::{
    ArenaExtendStrategy, CUDAExecutionProviderCuDNNConvAlgoSearch, CUDAExecutionProviderOptions,
};
use std::path::PathBuf;

pub struct EmbeddingGenerator {
    model: FlagEmbedding,
}

impl EmbeddingGenerator {
    /// Creates a new instance of the EmbeddingGenerator with the specified model.
    pub fn new() -> Result<Self> {
        info!("Initializing the EmbeddingGenerator with CUDA support");
        // Initialize CUDA Execution Provider Options
        let cuda_options = CUDAExecutionProviderOptions {
            device_id: 0,              // GPU device ID, typically 0 for a single-GPU system
            gpu_mem_limit: usize::MAX, // Maximum GPU memory limit
            arena_extend_strategy: ArenaExtendStrategy::NextPowerOfTwo, // Strategy for extending the memory arena
            cudnn_conv_algo_search: CUDAExecutionProviderCuDNNConvAlgoSearch::Exhaustive, // Search strategy for cuDNN convolution algorithms
            do_copy_in_default_stream: true, // Whether to do copies in the default stream
            cudnn_conv_use_max_workspace: true, // Whether to use the maximum workspace for cuDNN operations
            cudnn_conv1d_pad_to_nc1d: false, // Padding strategy for 1D convolutions in cuDNN
            enable_cuda_graph: false,        // Whether to enable the CUDA Graphs feature
            enable_skip_layer_norm_strict_mode: false, // Whether to use strict mode in SkipLayerNormalization
        };
        let init_options = InitOptions {
            model_name: EmbeddingModel::BGEBaseENV15, // The v1.5 release of the Base English model
            execution_providers: vec![ExecutionProvider::CUDA(cuda_options)], // Add CUDA as the execution provider
            max_length: 2048,                          // Maximum length of tokenized sequences
            cache_dir: PathBuf::from("./local_cache"), // Cache directory for the model
            show_download_message: true,
        };
        match FlagEmbedding::try_new(init_options) {
            Ok(model) => {
                info!("Successfully initialized the EmbeddingGenerator with CUDA");
                Ok(Self { model })
            }
            Err(e) => {
                error!("Failed to initialize the EmbeddingGenerator with CUDA: {}", e);
                Err(e.into())
            }
        }
    }

    /// Generates embeddings for a given list of documents.
    pub fn generate_embeddings(&self, documents: Vec<&str>) -> Result<Vec<Vec<f32>>> {
        self.model
            .embed(documents, None)
            .context("Failed to generate embeddings")
    }
}
```
There is a trait implementation of ToString for EmbeddingModel: https://github.com/Anush008/fastembed-rs/blob/main/src/lib.rs#L131. The documentation for ToString says that you shouldn't implement the trait directly; it recommends implementing Display instead, and you get to_string() for free. Plus, EmbeddingModel can then be printed without using the Debug output.
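A minimal sketch of the suggested change, assuming a simplified EmbeddingModel enum (the real enum has more variants, and the string values here are illustrative only):

```rust
use std::fmt;

pub enum EmbeddingModel {
    BGEBaseEN,
    BGESmallEN,
}

impl fmt::Display for EmbeddingModel {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EmbeddingModel::BGEBaseEN => write!(f, "fast-bge-base-en"),
            EmbeddingModel::BGESmallEN => write!(f, "fast-bge-small-en"),
        }
    }
}

// to_string() now comes for free via the blanket `impl<T: Display> ToString for T`,
// and the enum can be printed with `{}` instead of the Debug `{:?}` format.
```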
Query embedding is very slow and is even beaten by Python in many cases. This has been found to be due to the padding strategy being fixed, which means short queries of a few words are padded to 512 tokens. query_embed also uses the embed function, which has a lot of overhead from parallelisation that is of no use for a single query. I propose making query_embed its own thing, as sketched below.
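A minimal sketch of the idea, assuming the `tokenizers` crate does the tokenization under the hood (`encode_query` is a hypothetical helper, not an existing fastembed API):

```rust
use tokenizers::Tokenizer;

// Hypothetical helper: encode a single query with no fixed-length padding,
// so a three-word query stays a handful of tokens instead of 512.
fn encode_query(tokenizer: &Tokenizer, query: &str) -> tokenizers::Result<Vec<u32>> {
    let encoding = tokenizer.encode(query, true)?; // true = add special tokens
    Ok(encoding.get_ids().to_vec())
}
```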
The most important thing for this library is being as performant as possible, so it would be sensible to add benchmarks with criterion or a similar package.
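A minimal Criterion benchmark sketch; the InitOptions fields mirror the v1/v2 API used in the examples elsewhere on this page and may differ across fastembed versions:

```rust
// benches/embed.rs
use criterion::{criterion_group, criterion_main, Criterion};
use fastembed::{EmbeddingBase, EmbeddingModel, FlagEmbedding, InitOptions};

fn bench_query_embed(c: &mut Criterion) {
    let model = FlagEmbedding::try_new(InitOptions {
        model_name: EmbeddingModel::BGESmallEN,
        ..Default::default()
    })
    .expect("model init failed");

    c.bench_function("query_embed", |b| {
        b.iter(|| model.query_embed("Who was Maharana Pratap?").unwrap())
    });
}

criterion_group!(benches, bench_query_embed);
criterion_main!(benches);
```

Wiring this up also needs a `[[bench]]` section with `harness = false` in Cargo.toml, as Criterion replaces the default test harness.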
Hi,
Just wanted to know which platforms fastembed-rs supports.
Hey, I'm new to Rust and may be doing something wrong, but I'm getting worse performance in Rust than in Python. I'm using a Mac with an M1 Pro.
Rust version:
init: 138.47ms
passage_embed: 579.84ms
query_embed: 147.26ms
Python version:
init: 106.26ms
passage_embed: 31.68ms
query_embed: 2.82ms
Here are my scripts:
Rust:

```rust
use fastembed::{EmbeddingBase, EmbeddingModel, FlagEmbedding, InitOptions};
use std::time::Instant;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let now = Instant::now();
    // With custom InitOptions
    let model: FlagEmbedding = FlagEmbedding::try_new(InitOptions {
        model_name: EmbeddingModel::BGESmallEN,
        show_download_message: true,
        max_length: 512,
        ..Default::default()
    })?;
    let elapsed = now.elapsed();
    println!("init: {:.2?}", elapsed);

    let now = Instant::now();
    let documents = vec![
        "Maharana Pratap was a Rajput warrior king from Mewar",
        "He fought against the Mughal Empire led by Akbar",
        "The Battle of Haldighati in 1576 was his most famous battle",
        "He refused to submit to Akbar and continued guerrilla warfare",
        "His capital was Chittorgarh, which he lost to the Mughals",
        "He died in 1597 at the age of 57",
        "Maharana Pratap is considered a symbol of Rajput resistance against foreign rule",
        "His legacy is celebrated in Rajasthan through festivals and monuments",
        "He had 11 wives and 17 sons, including Amar Singh I who succeeded him as ruler of Mewar",
        "His life has been depicted in various films, TV shows, and books",
    ];
    let embeddings = model.passage_embed(documents, Some(1))?;
    let elapsed = now.elapsed();
    println!("passage_embed: {:.2?}", elapsed);

    let now = Instant::now();
    let query = "Who was Maharana Pratap?";
    let query_embed = model.query_embed(query)?;
    let elapsed = now.elapsed();
    println!("query_embed: {:.2?}", elapsed);

    Ok(())
}
```
Python:

```python
from typing import List
import numpy as np
from fastembed.embedding import FlagEmbedding as Embedding
import time

t1 = time.time()
embedding_model = Embedding(model_name="BAAI/bge-small-en", max_length=512)
print(f'init: {(time.time()-t1)*1000:.02f}ms')

t1 = time.time()
documents: List[str] = [
    "Maharana Pratap was a Rajput warrior king from Mewar",
    "He fought against the Mughal Empire led by Akbar",
    "The Battle of Haldighati in 1576 was his most famous battle",
    "He refused to submit to Akbar and continued guerrilla warfare",
    "His capital was Chittorgarh, which he lost to the Mughals",
    "He died in 1597 at the age of 57",
    "Maharana Pratap is considered a symbol of Rajput resistance against foreign rule",
    "His legacy is celebrated in Rajasthan through festivals and monuments",
    "He had 11 wives and 17 sons, including Amar Singh I who succeeded him as ruler of Mewar",
    "His life has been depicted in various films, TV shows, and books",
]
embeddings: List[np.ndarray] = list(
    embedding_model.passage_embed(documents)
)  # notice that we are casting the generator to a list
print(f'passage_embed: {(time.time()-t1)*1000:.02f}ms')

t1 = time.time()
query = "Who was Maharana Pratap?"
query_embedding = list(embedding_model.query_embed(query))[0]
print(f'query_embed: {(time.time()-t1)*1000:.02f}ms')
```
The single test of the library takes around 5 minutes to run, which is too long for a unit test.
Suggestion: split it into 7 unit tests, one for each entry in the models_and_expected_values vector, as sketched below.
I can code this change.
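One way to get a test per model without duplicating the body; `assert_embeddings_match` is a hypothetical helper wrapping the existing shared check:

```rust
macro_rules! model_test {
    ($name:ident, $model:expr) => {
        #[test]
        fn $name() {
            // Hypothetical helper: runs the existing embed-and-compare logic
            // for a single model, so each model gets its own pass/fail result.
            assert_embeddings_match($model);
        }
    };
}

model_test!(bge_base_en, EmbeddingModel::BGEBaseEN);
model_test!(bge_small_en, EmbeddingModel::BGESmallEN);
// ... one invocation per entry in models_and_expected_values ...
```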
```
ort 1.16 is not compatible with the ONNX Runtime binary found at `onnxruntime.dll`; expected GetVersionString to return '1.16.x', but got '1.10.0'
```
On Windows, fastembed does not include onnxruntime.dll in target/debug/deps (which is used by integration tests). Therefore, if an onnxruntime.dll exists in C:/Windows/System32/, that version is used, causing the issue above.
To replicate, ensure a version of onnxruntime.dll exists in C:/Windows/System32/ that is not the same version as fastembed's (currently 1.16.x), then add a fastembed model to a library, like
```rust
lazy_static! {
    pub static ref EMBEDDING_MODEL: FlagEmbedding = FlagEmbedding::try_new(InitOptions {
        model_name: EmbeddingModel::BGEBaseEN,
        show_download_message: true,
        ..Default::default()
    }).unwrap();
}
```
Then importing that module into an integration test and attempting to use it will cause the error above.
Proposed fix: add onnxruntime.dll to target/debug/deps. If fastembed runs correctly in a binary or library crate, it should also run correctly in its integration tests. (Note: I have not tested release mode or regular unit tests with fastembed yet.)
ort has a feature that could be used, but I have not tested it: https://github.com/pykeio/ort/blob/4ab57859caa9490473bac3dfcd043dbb1b89d9a5/Cargo.toml#L44
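Alternatively, a build script could copy a locally vendored onnxruntime.dll next to the test binaries. A rough sketch only: the vendor path is an assumption, and it relies on OUT_DIR's conventional layout (target/&lt;profile&gt;/build/&lt;crate&gt;/out):

```rust
// build.rs — sketch; adjust the vendored DLL path to your project.
use std::{env, fs, path::PathBuf};

fn main() {
    let dll = PathBuf::from("vendor/onnxruntime.dll");
    if !dll.exists() {
        return;
    }
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    // Walk up from target/<profile>/build/<crate>/out to target/<profile>.
    if let Some(profile_dir) = out_dir.ancestors().nth(3) {
        let deps = profile_dir.join("deps");
        if deps.is_dir() {
            // Best-effort copy so test binaries load this DLL, not System32's.
            let _ = fs::copy(&dll, deps.join("onnxruntime.dll"));
        }
    }
}
```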
I don't see in the documentation how I can instantiate an embedding model from my local file system, given the file path to the downloaded embedding model. I'm assuming this reaches out to Hugging Face and stores the embedding model in a local cache somewhere.
Currently this library has no way of defining an embedding model using ONNX files from a third party. Would it make sense to implement a function that does this, named try_new_from_file or similar?
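A hypothetical shape for such an API, inside the fastembed crate itself (the name, signature, and paths are illustrative only, not existing fastembed code):

```rust
use std::path::Path;

impl FlagEmbedding {
    /// Hypothetical: build a model from local ONNX + tokenizer files,
    /// bypassing the Hugging Face download/cache path entirely.
    pub fn try_new_from_file(
        model_file: &Path,     // e.g. ./my-model/model.onnx
        tokenizer_file: &Path, // e.g. ./my-model/tokenizer.json
        options: InitOptions,
    ) -> anyhow::Result<Self> {
        // Sketch: construct the ort session from `model_file` and the
        // tokenizer from `tokenizer_file`, then assemble Self as try_new does.
        todo!()
    }
}
```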
Hi, I am trying to build fastembed for Android through cross-compilation, but I am getting an error:
failed to run custom build command for ort-sys v2.0.0-rc.0
Just wanted to know if fastembed is supported on Android.
Command used for cross-compilation:
cargo-ndk --target aarch64-linux-android --platform 21 -- build
I have created a sample cargo project:
$ cargo init
Added the fastembed dependency in the Cargo.toml file:
[dependencies]
fastembed = "3"
$ cargo build
It is failing. As per my analysis, it is due to yesterday's release of ort, in which APIs got changed.
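If that is the cause, one way to unblock the build until fastembed catches up is to pin the transitive ort dependency back to the pre-release it compiled against. The exact version here is an assumption; check the fastembed release's own Cargo.toml for the one it was tested with:
$ cargo update -p ort --precise 2.0.0-alpha.4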
I'm trying to get FastEmbed to utilise the GPU for acceleration using the code below.
```rust
let coreml = CoreMLExecutionProvider::default();
if !coreml.is_available().unwrap() {
    eprintln!("Please compile ONNX Runtime with CoreML!");
    std::process::exit(1);
}

let model: FlagEmbedding = FlagEmbedding::try_new(InitOptions {
    model_name: translation,
    show_download_message: true,
    execution_providers: vec![coreml.with_subgraphs().build()],
    ..Default::default()
})?;

let embeddings = model.passage_embed(text, None)?;
Ok(embeddings)
```
It seems to be making its way to the correct struct:

```
[2024-02-13T05:40:13Z INFO ort::session] drop; self=SharedSessionInner { session_ptr: 0x10e81f200, allocator: Allocator { ptr: 0x108db2900, is_default: true }, _environment: Environment { execution_providers: [CoreML(CoreMLExecutionProvider { use_cpu_only: false, enable_on_subgraph: true, only_enable_device_with_ane: false })], env_ptr: 0x600001419260 } }
```

However, it still gives me the warning:

```
ort::execution_providers] No execution providers registered successfully. Falling back to CPU.
```

Not sure what I'm doing wrong here.
Following the version 3 changes, I can no longer run the program while my machine is offline (even though the model is cached, the crate still performs a check against the hosted model). Could this be changed so that, if the machine is offline, it defaults to the cached model?
```
thread 'main' panicked at src\embedding\mod.rs:127:6:
Can't load model: request error: https://huggingface.co/api/models/Qdrant/all-MiniLM-L6-v2-onnx/revision/main: Dns Failed: resolve dns name 'huggingface.co:443': No such host is known. (os error 11001)

Caused by:
    0: https://huggingface.co/api/models/Qdrant/all-MiniLM-L6-v2-onnx/revision/main: Dns Failed: resolve dns name 'huggingface.co:443': No such host is known. (os error 11001)
    1: No such host is known. (os error 11001)

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
Hello, is there any way to disable ort logging? I can't see anything useful while embedding calls are running. Thank you!
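One option, assuming the application uses env_logger as its `log` backend (the timestamped warning format shown earlier on this page suggests it does), is to filter the ort module out programmatically:

```rust
fn main() {
    // Turn off everything logged by the `ort` crate while keeping other logs.
    env_logger::Builder::from_default_env()
        .filter_module("ort", log::LevelFilter::Off)
        .init();

    // ... run embedding calls as usual ...
}
```

The same effect can be had at runtime with `RUST_LOG=ort=off` if the logger is configured purely from the environment.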
V1 worked like a charm, but when I tried to recompile the code with V2, I immediately got the following linker error:
```
error: linking with `cc` failed: exit status: 1
  |
  = note: env -u IPHONEOS_DEPLOYMENT_TARGET -u TVOS_DEPLOYMENT_TARGET LC_ALL="C" PATH="/Users/kevin/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/aarch64-apple-darwin/bin:
    ...
    "-ladd_ort_library_path_or_enable_feature_download-binaries_see_ort_docs" "-lc++" "-framework" "Foundation" "-liconv" "-lSystem" "-lc" "-lm" "-L" "/Users/kevin/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/aarch64-apple-darwin/lib" "-o" "/Volumes/Documents/personal/wordshk-tools/examples/export_api/target/release/deps/export_api-368a468f1f948549" "-Wl,-dead_strip" "-nodefaultlibs"
  = note: ld: library 'add_ort_library_path_or_enable_feature_download-binaries_see_ort_docs' not found
          clang: error: linker command failed with exit code 1 (use -v to see invocation)
```
The message add_ort_library_path_or_enable_feature_download-binaries_see_ort_docs suggests that I need to either have a custom ONNX build on the path or enable the download-binaries feature, which is on by default. I checked the dependencies of fastembed and found that the default features are disabled, which explains why. The current workaround for me is to turn on the download-binaries feature by depending on ort directly in my code:
```toml
[dependencies.ort]
version = "2.0.0-alpha.4"
features = ["download-binaries"]
```
I think it may make sense for fastembed to keep some default features like download-binaries, so that it can be built without issue on all desktops?
macOS Sonoma 14.2.1
Apple M1 Max
rustup 1.26.0 (5af9b9484 2023-04-05)
info: This is the version for the rustup toolchain manager, not the rustc compiler.
info: The currently active `rustc` version is `rustc 1.74.1 (a28077b28 2023-12-04)`