anush008 / fastembed-rs
Library to generate vector embeddings. Rust implementation of Qdrant's FastEmbed.
Home Page: https://docs.rs/fastembed
License: Apache License 2.0
Hi, I'm trying to run the sample mentioned in the README file, but when I try to build I'm facing the error `linking with cc failed: exit status: 1`.
I was working on exporting the sentence-transformers/clip-ViT-B-32-multilingual-v1 model to ONNX without success. I finally figured it out by manipulating the model with torch.onnx. I can contribute the ONNX model to fastembed-rs, along with the custom torch.onnx export steps (a Python script).
Should I create a pull request?
Hello,
Firstly, I would like to express my appreciation for your work on the fastembed-rs project. It's been a great tool and has been very helpful in my tasks.
With the recent update to version 3, I noticed that the BAAI/bge-small-zh-v1.5 model has been removed. This model has been crucial to my projects, and I was wondering why it was dropped in the latest release.
Thank you for your time and for the effort you've put into this project. I look forward to your response.
Best regards,
AngelLiang
Does anyone else have this issue? How can we use the GPU?
[2023-12-01T14:17:40Z WARN ort::execution_providers] No execution providers registered successfully. Falling back to CPU.
This is my setup code:
```rust
use anyhow::{Context, Result};
use fastembed::{EmbeddingBase, EmbeddingModel, ExecutionProvider, FlagEmbedding, InitOptions};
use log::{error, info, warn};
use ort::execution_providers::{
    ArenaExtendStrategy, CUDAExecutionProviderCuDNNConvAlgoSearch, CUDAExecutionProviderOptions,
};
use std::path::PathBuf;

pub struct EmbeddingGenerator {
    model: FlagEmbedding,
}

impl EmbeddingGenerator {
    /// Creates a new instance of the EmbeddingGenerator with the specified model.
    pub fn new() -> Result<Self> {
        info!("Initializing the EmbeddingGenerator with CUDA support");
        // Initialize CUDA Execution Provider Options
        let cuda_options = CUDAExecutionProviderOptions {
            device_id: 0,              // GPU device ID, typically 0 for a single-GPU system
            gpu_mem_limit: usize::MAX, // Maximum GPU memory limit
            arena_extend_strategy: ArenaExtendStrategy::NextPowerOfTwo, // Strategy for extending the memory arena
            cudnn_conv_algo_search: CUDAExecutionProviderCuDNNConvAlgoSearch::Exhaustive, // Search strategy for cuDNN convolution algorithms
            do_copy_in_default_stream: true, // Whether to do copies in the default stream
            cudnn_conv_use_max_workspace: true, // Whether to use the maximum workspace for cuDNN operations
            cudnn_conv1d_pad_to_nc1d: false, // Padding strategy for 1D convolutions in cuDNN
            enable_cuda_graph: false,        // Whether to enable the CUDA Graphs feature
            enable_skip_layer_norm_strict_mode: false, // Whether to use strict mode in SkipLayerNormalization
        };
        let init_options = InitOptions {
            model_name: EmbeddingModel::BGEBaseENV15, // The v1.5 release of the Base English model
            execution_providers: vec![ExecutionProvider::CUDA(cuda_options)], // Add CUDA as the execution provider
            max_length: 2048,                          // Maximum length of tokenized sequences
            cache_dir: PathBuf::from("./local_cache"), // Cache directory for the model
            show_download_message: true,
        };
        match FlagEmbedding::try_new(init_options) {
            Ok(model) => {
                info!("Successfully initialized the EmbeddingGenerator with CUDA");
                Ok(Self { model })
            }
            Err(e) => {
                error!("Failed to initialize the EmbeddingGenerator with CUDA: {}", e);
                Err(e.into())
            }
        }
    }

    /// Generates embeddings for a given list of documents.
    pub fn generate_embeddings(&self, documents: Vec<&str>) -> Result<Vec<Vec<f32>>> {
        self.model
            .embed(documents, None)
            .context("Failed to generate embeddings")
    }
}
```
There is a trait implementation of ToString for EmbeddingModel: https://github.com/Anush008/fastembed-rs/blob/main/src/lib.rs#L131. The documentation for ToString says that you shouldn't implement the trait directly; it recommends implementing Display instead, and you get to_string() for free. Plus, EmbeddingModel can then be printed without using the Debug output.
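A minimal sketch of the suggested change, assuming a simplified EmbeddingModel enum (the real enum has more variants, and the string values here are illustrative only):

```rust
use std::fmt;

pub enum EmbeddingModel {
    BGEBaseEN,
    BGESmallEN,
}

impl fmt::Display for EmbeddingModel {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EmbeddingModel::BGEBaseEN => write!(f, "fast-bge-base-en"),
            EmbeddingModel::BGESmallEN => write!(f, "fast-bge-small-en"),
        }
    }
}

// to_string() now comes for free via the blanket `impl<T: Display> ToString for T`,
// and the enum can be printed with `{}` instead of the Debug `{:?}` format.
```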
Query embedding is very slow and is even beaten by Python in many cases. This has been found to be due to the padding strategy being fixed, which means short queries of a few words are padded to 512 tokens. query_embed also uses the embed function, which has a lot of overhead from parallelisation that is of no use for a single query. I propose making query_embed its own thing, as sketched below.
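A minimal sketch of the idea, assuming the `tokenizers` crate does the tokenization under the hood (`encode_query` is a hypothetical helper, not an existing fastembed API):

```rust
use tokenizers::Tokenizer;

// Hypothetical helper: encode a single query with no fixed-length padding,
// so a three-word query stays a handful of tokens instead of 512.
fn encode_query(tokenizer: &Tokenizer, query: &str) -> tokenizers::Result<Vec<u32>> {
    let encoding = tokenizer.encode(query, true)?; // true = add special tokens
    Ok(encoding.get_ids().to_vec())
}
```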
The most important thing for this library is being as performant as possible, so it would be sensible to add benchmarks with criterion or a similar package.
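A minimal Criterion benchmark sketch; the InitOptions fields mirror the v1/v2 API used in the examples elsewhere on this page and may differ across fastembed versions:

```rust
// benches/embed.rs
use criterion::{criterion_group, criterion_main, Criterion};
use fastembed::{EmbeddingBase, EmbeddingModel, FlagEmbedding, InitOptions};

fn bench_query_embed(c: &mut Criterion) {
    let model = FlagEmbedding::try_new(InitOptions {
        model_name: EmbeddingModel::BGESmallEN,
        ..Default::default()
    })
    .expect("model init failed");

    c.bench_function("query_embed", |b| {
        b.iter(|| model.query_embed("Who was Maharana Pratap?").unwrap())
    });
}

criterion_group!(benches, bench_query_embed);
criterion_main!(benches);
```

Wiring this up also needs a `[[bench]]` section with `harness = false` in Cargo.toml, as Criterion replaces the default test harness.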
Hi,
Just wanted to know which platforms fastembed-rs supports.
Hey, I'm new to Rust and may be doing something wrong, but I'm getting worse performance in Rust than in Python. I'm using a Mac with an M1 Pro.
Rust version:
init: 138.47ms
passage_embed: 579.84ms
query_embed: 147.26ms
Python version:
init: 106.26ms
passage_embed: 31.68ms
query_embed: 2.82ms
Here are my scripts:
Rust:

```rust
use fastembed::{EmbeddingBase, EmbeddingModel, FlagEmbedding, InitOptions};
use std::time::Instant;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let now = Instant::now();
    // With custom InitOptions
    let model: FlagEmbedding = FlagEmbedding::try_new(InitOptions {
        model_name: EmbeddingModel::BGESmallEN,
        show_download_message: true,
        max_length: 512,
        ..Default::default()
    })?;
    let elapsed = now.elapsed();
    println!("init: {:.2?}", elapsed);

    let now = Instant::now();
    let documents = vec![
        "Maharana Pratap was a Rajput warrior king from Mewar",
        "He fought against the Mughal Empire led by Akbar",
        "The Battle of Haldighati in 1576 was his most famous battle",
        "He refused to submit to Akbar and continued guerrilla warfare",
        "His capital was Chittorgarh, which he lost to the Mughals",
        "He died in 1597 at the age of 57",
        "Maharana Pratap is considered a symbol of Rajput resistance against foreign rule",
        "His legacy is celebrated in Rajasthan through festivals and monuments",
        "He had 11 wives and 17 sons, including Amar Singh I who succeeded him as ruler of Mewar",
        "His life has been depicted in various films, TV shows, and books",
    ];
    let embeddings = model.passage_embed(documents, Some(1))?;
    let elapsed = now.elapsed();
    println!("passage_embed: {:.2?}", elapsed);

    let now = Instant::now();
    let query = "Who was Maharana Pratap?";
    let query_embed = model.query_embed(query)?;
    let elapsed = now.elapsed();
    println!("query_embed: {:.2?}", elapsed);

    Ok(())
}
```
Python:

```python
from typing import List
import numpy as np
from fastembed.embedding import FlagEmbedding as Embedding
import time

t1 = time.time()
embedding_model = Embedding(model_name="BAAI/bge-small-en", max_length=512)
print(f'init: {(time.time()-t1)*1000:.02f}ms')

t1 = time.time()
documents: List[str] = [
    "Maharana Pratap was a Rajput warrior king from Mewar",
    "He fought against the Mughal Empire led by Akbar",
    "The Battle of Haldighati in 1576 was his most famous battle",
    "He refused to submit to Akbar and continued guerrilla warfare",
    "His capital was Chittorgarh, which he lost to the Mughals",
    "He died in 1597 at the age of 57",
    "Maharana Pratap is considered a symbol of Rajput resistance against foreign rule",
    "His legacy is celebrated in Rajasthan through festivals and monuments",
    "He had 11 wives and 17 sons, including Amar Singh I who succeeded him as ruler of Mewar",
    "His life has been depicted in various films, TV shows, and books",
]
embeddings: List[np.ndarray] = list(
    embedding_model.passage_embed(documents)
)  # notice that we are casting the generator to a list
print(f'passage_embed: {(time.time()-t1)*1000:.02f}ms')

t1 = time.time()
query = "Who was Maharana Pratap?"
query_embedding = list(embedding_model.query_embed(query))[0]
print(f'query_embed: {(time.time()-t1)*1000:.02f}ms')
```
The single test of the library takes around 5 minutes to run, which is too long for a unit test.
Suggestion: split it into 7 unit tests, one for each entry in the models_and_expected_values vector, as sketched below.
I can code this change.
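One way to get a test per model without duplicating the body; `assert_embeddings_match` is a hypothetical helper wrapping the existing shared check:

```rust
macro_rules! model_test {
    ($name:ident, $model:expr) => {
        #[test]
        fn $name() {
            // Hypothetical helper: runs the existing embed-and-compare logic
            // for a single model, so each model gets its own pass/fail result.
            assert_embeddings_match($model);
        }
    };
}

model_test!(bge_base_en, EmbeddingModel::BGEBaseEN);
model_test!(bge_small_en, EmbeddingModel::BGESmallEN);
// ... one invocation per entry in models_and_expected_values ...
```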
```
ort 1.16 is not compatible with the ONNX Runtime binary found at `onnxruntime.dll`; expected GetVersionString to return '1.16.x', but got '1.10.0'
```
On Windows, fastembed does not include onnxruntime.dll in target/debug/deps (which is used by integration tests). Therefore, if an onnxruntime.dll exists in C:/Windows/System32/, that version is used, causing the issue above.
To replicate, ensure a version of onnxruntime.dll exists in C:/Windows/System32/ that is not the same version as fastembed's (currently 1.16.x), then add a fastembed model to a library, like
```rust
lazy_static! {
    pub static ref EMBEDDING_MODEL: FlagEmbedding = FlagEmbedding::try_new(InitOptions {
        model_name: EmbeddingModel::BGEBaseEN,
        show_download_message: true,
        ..Default::default()
    }).unwrap();
}
```
Then importing that module into an integration test and attempting to use it will cause the error above.
Proposed fix: add onnxruntime.dll to target/debug/deps. If fastembed runs correctly in a binary or library crate, it should also run correctly in its integration tests. (Note: I have not tested release mode or regular unit tests with fastembed yet.)
ort has a feature that could be used, but I have not tested it: https://github.com/pykeio/ort/blob/4ab57859caa9490473bac3dfcd043dbb1b89d9a5/Cargo.toml#L44
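Alternatively, a build script could copy a locally vendored onnxruntime.dll next to the test binaries. A rough sketch only: the vendor path is an assumption, and it relies on OUT_DIR's conventional layout (target/&lt;profile&gt;/build/&lt;crate&gt;/out):

```rust
// build.rs — sketch; adjust the vendored DLL path to your project.
use std::{env, fs, path::PathBuf};

fn main() {
    let dll = PathBuf::from("vendor/onnxruntime.dll");
    if !dll.exists() {
        return;
    }
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    // Walk up from target/<profile>/build/<crate>/out to target/<profile>.
    if let Some(profile_dir) = out_dir.ancestors().nth(3) {
        let deps = profile_dir.join("deps");
        if deps.is_dir() {
            // Best-effort copy so test binaries load this DLL, not System32's.
            let _ = fs::copy(&dll, deps.join("onnxruntime.dll"));
        }
    }
}
```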
I don't see in the documentation how I can instantiate an embedding model from my local file system, given the file path to the downloaded embedding model. I'm assuming this reaches out to Hugging Face and stores the embedding model in a local cache somewhere.
Currently this library has no way of defining an embedding model using ONNX files from a third party. Would it make sense to implement a function that does this, named try_new_from_file or similar?
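A hypothetical shape for such an API, inside the fastembed crate itself (the name, signature, and paths are illustrative only, not existing fastembed code):

```rust
use std::path::Path;

impl FlagEmbedding {
    /// Hypothetical: build a model from local ONNX + tokenizer files,
    /// bypassing the Hugging Face download/cache path entirely.
    pub fn try_new_from_file(
        model_file: &Path,     // e.g. ./my-model/model.onnx
        tokenizer_file: &Path, // e.g. ./my-model/tokenizer.json
        options: InitOptions,
    ) -> anyhow::Result<Self> {
        // Sketch: construct the ort session from `model_file` and the
        // tokenizer from `tokenizer_file`, then assemble Self as try_new does.
        todo!()
    }
}
```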
Hi, I am trying to build fastembed for Android through cross-compilation, but I am getting an error:
failed to run custom build command for ort-sys v2.0.0-rc.0
Just wanted to know if fastembed is supported on Android.
Command used for cross-compilation:
cargo-ndk --target aarch64-linux-android --platform 21 -- build
I have created a sample cargo project:
$ cargo init
Added the fastembed dependency in the Cargo.toml file:
[dependencies]
fastembed = "3"
$ cargo build
It is failing. As per my analysis, it is due to yesterday's release of ort, in which APIs got changed.
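If that is the cause, one way to unblock the build until fastembed catches up is to pin the transitive ort dependency back to the pre-release it compiled against. The exact version here is an assumption; check the fastembed release's own Cargo.toml for the one it was tested with:
$ cargo update -p ort --precise 2.0.0-alpha.4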
I'm trying to get FastEmbed to utilise the GPU for acceleration using the code below.
```rust
let coreml = CoreMLExecutionProvider::default();
if !coreml.is_available().unwrap() {
    eprintln!("Please compile ONNX Runtime with CoreML!");
    std::process::exit(1);
}

let model: FlagEmbedding = FlagEmbedding::try_new(InitOptions {
    model_name: translation,
    show_download_message: true,
    execution_providers: vec![coreml.with_subgraphs().build()],
    ..Default::default()
})?;

let embeddings = model.passage_embed(text, None)?;
Ok(embeddings)
```
It seems to be making its way to the correct struct:

```
[2024-02-13T05:40:13Z INFO ort::session] drop; self=SharedSessionInner { session_ptr: 0x10e81f200, allocator: Allocator { ptr: 0x108db2900, is_default: true }, _environment: Environment { execution_providers: [CoreML(CoreMLExecutionProvider { use_cpu_only: false, enable_on_subgraph: true, only_enable_device_with_ane: false })], env_ptr: 0x600001419260 } }
```

However, it still gives me the warning:

```
ort::execution_providers] No execution providers registered successfully. Falling back to CPU.
```

Not sure what I'm doing wrong here.
Following the version 3 changes, I can no longer run the program while my machine is offline (even though the model is cached, the crate still performs a check against the hosted model). Could this be changed so that, if the machine is offline, it defaults to the cached model?
```
thread 'main' panicked at src\embedding\mod.rs:127:6:
Can't load model: request error: https://huggingface.co/api/models/Qdrant/all-MiniLM-L6-v2-onnx/revision/main: Dns Failed: resolve dns name 'huggingface.co:443': No such host is known. (os error 11001)

Caused by:
    0: https://huggingface.co/api/models/Qdrant/all-MiniLM-L6-v2-onnx/revision/main: Dns Failed: resolve dns name 'huggingface.co:443': No such host is known. (os error 11001)
    1: No such host is known. (os error 11001)

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
Hello, is there any way to disable ort logging? I can't see anything useful while embedding calls are running. Thank you!
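One option, assuming the application uses env_logger as its `log` backend (the timestamped warning format shown earlier on this page suggests it does), is to filter the ort module out programmatically:

```rust
fn main() {
    // Turn off everything logged by the `ort` crate while keeping other logs.
    env_logger::Builder::from_default_env()
        .filter_module("ort", log::LevelFilter::Off)
        .init();

    // ... run embedding calls as usual ...
}
```

The same effect can be had at runtime with `RUST_LOG=ort=off` if the logger is configured purely from the environment.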
V1 worked like a charm, but when I tried to recompile the code with V2, I immediately got the following linker error:
```
error: linking with `cc` failed: exit status: 1
  |
  = note: env -u IPHONEOS_DEPLOYMENT_TARGET -u TVOS_DEPLOYMENT_TARGET LC_ALL="C" PATH="/Users/kevin/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/aarch64-apple-darwin/bin:
    ...
    "-ladd_ort_library_path_or_enable_feature_download-binaries_see_ort_docs" "-lc++" "-framework" "Foundation" "-liconv" "-lSystem" "-lc" "-lm" "-L" "/Users/kevin/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/aarch64-apple-darwin/lib" "-o" "/Volumes/Documents/personal/wordshk-tools/examples/export_api/target/release/deps/export_api-368a468f1f948549" "-Wl,-dead_strip" "-nodefaultlibs"
  = note: ld: library 'add_ort_library_path_or_enable_feature_download-binaries_see_ort_docs' not found
          clang: error: linker command failed with exit code 1 (use -v to see invocation)
```
The message add_ort_library_path_or_enable_feature_download-binaries_see_ort_docs suggests that I need to either have a custom ONNX build on the path or enable the download-binaries feature, which is on by default. I checked the dependencies of fastembed and found that the default features are disabled, which explains why. The current workaround for me is to turn on the download-binaries feature by depending on ort directly in my code:
```toml
[dependencies.ort]
version = "2.0.0-alpha.4"
features = ["download-binaries"]
```
I think it may make sense for fastembed to keep some default features like download-binaries, so that it can be built without issue on all desktops?
macOS Sonoma 14.2.1
Apple M1 Max
rustup 1.26.0 (5af9b9484 2023-04-05)
info: This is the version for the rustup toolchain manager, not the rustc compiler.
info: The currently active `rustc` version is `rustc 1.74.1 (a28077b28 2023-12-04)`