sif-embedding's Introduction

sif-embedding

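This is a Rust implementation of simple but powerful sentence embedding algorithms based on SIF (Arora et al., "A Simple but Tough-to-Beat Baseline for Sentence Embeddings", ICLR 2017) and uSIF (Ethayarajh, "Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline", RepL4NLP 2018).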

Features

  • No GPU required: This library runs on CPU only.
  • Fast embeddings: This library provides fast sentence embeddings thanks to the simple algorithms of SIF and uSIF. We observed that our SIF implementation could process ~80K sentences per second on an M2 MacBook Air. (See benchmarks.)
  • Reasonable evaluation scores: SIF and uSIF do not outperform SOTA models such as SimCSE on similarity evaluation tasks. However, they are not far behind. (See evaluations.)

This library will help you if

  • DNN-based sentence embeddings are too slow for your application,
  • you do not have the option of using GPUs, or
  • you want baseline sentence embeddings for your development.

Documentation

https://docs.rs/sif-embedding/

Getting started

See tutorial.
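
The tutorial covers the crate's actual API. Separately, as a rough illustration of the weighting idea behind SIF, here is a minimal std-only sketch. It is not this crate's implementation: the function name `sif_weighted_average`, the toy word vectors, the toy unigram probabilities, and the value `a = 1e-3` are all assumptions made for the example, and the common-component removal step from the SIF paper is omitted.

```rust
use std::collections::HashMap;

/// Hypothetical helper: SIF-style weighted average of word vectors.
/// Each known word gets weight a / (a + p(w)), so frequent words count less.
/// Words without a vector are skipped; words without a probability use p(w) = 0.
fn sif_weighted_average(
    tokens: &[&str],
    vectors: &HashMap<&str, Vec<f32>>,
    probs: &HashMap<&str, f32>,
    a: f32,
    dim: usize,
) -> Vec<f32> {
    let mut sum = vec![0.0f32; dim];
    let mut count = 0usize;
    for tok in tokens {
        if let Some(vec) = vectors.get(tok) {
            let p = probs.get(tok).copied().unwrap_or(0.0);
            let weight = a / (a + p);
            for (s, v) in sum.iter_mut().zip(vec) {
                *s += weight * v;
            }
            count += 1;
        }
    }
    // Average over the words that actually contributed.
    if count > 0 {
        for s in sum.iter_mut() {
            *s /= count as f32;
        }
    }
    sum
}

fn main() {
    // Toy 2-dimensional vectors and unigram probabilities (made up for illustration).
    let vectors = HashMap::from([
        ("the", vec![1.0, 0.0]),
        ("cat", vec![0.0, 1.0]),
    ]);
    let probs = HashMap::from([("the", 0.05), ("cat", 0.0001)]);

    let emb = sif_weighted_average(&["the", "cat"], &vectors, &probs, 1e-3, 2);
    // The frequent word "the" contributes far less than the rare word "cat".
    println!("{emb:?}");
}
```

In this sketch the rare word dominates the sentence vector, which is the effect the weighting is meant to achieve.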

Benchmarks

benchmarks provides speed benchmarks.

We observed that, with an English Wikipedia dataset, our SIF implementation could process ~80K sentences per second on a MacBook Air (one core of Apple M2, 24 GB RAM).

Evaluations

evaluations provides tools to evaluate sif-embedding on several similarity evaluation tasks.

STS/SICK

evaluations/senteval provides evaluation tools and results for SentEval STS/SICK Tasks.

As one example, the following table shows the evaluation results with the Spearman's rank correlation coefficient for the STS-Benchmark.

| Model | train | dev | test | Avg. |
| --- | --- | --- | --- | --- |
| sif_embedding::Sif | 65.2 | 75.3 | 63.6 | 68.0 |
| sif_embedding::USif | 68.0 | 78.2 | 66.3 | 70.8 |
| princeton-nlp/unsup-simcse-bert-base-uncased | 76.9 | 81.7 | 76.5 | 78.4 |
| princeton-nlp/sup-simcse-bert-base-uncased | 83.3 | 86.2 | 84.3 | 84.6 |
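
For readers unfamiliar with the metric, the following is a minimal std-only sketch of Spearman's rank correlation between predicted similarities and gold scores. It is an illustration, not the code used in evaluations/senteval: the function names and the toy scores are assumptions, and the table values appear to be the coefficient multiplied by 100.

```rust
/// Assigns 1-based ranks; tied values share the average of their positions.
fn ranks(values: &[f64]) -> Vec<f64> {
    let mut order: Vec<usize> = (0..values.len()).collect();
    order.sort_by(|&i, &j| values[i].total_cmp(&values[j]));
    let mut ranks = vec![0.0; values.len()];
    let mut start = 0;
    while start < order.len() {
        // Extend `end` over the run of values equal to the one at `start`.
        let mut end = start;
        while end + 1 < order.len() && values[order[end + 1]] == values[order[start]] {
            end += 1;
        }
        let avg_rank = (start + end) as f64 / 2.0 + 1.0;
        for &idx in &order[start..=end] {
            ranks[idx] = avg_rank;
        }
        start = end + 1;
    }
    ranks
}

/// Pearson correlation of two equally long slices.
fn pearson(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;
    let (mut cov, mut var_x, mut var_y) = (0.0, 0.0, 0.0);
    for (a, b) in x.iter().zip(y) {
        let (dx, dy) = (a - mean_x, b - mean_y);
        cov += dx * dy;
        var_x += dx * dx;
        var_y += dy * dy;
    }
    cov / (var_x.sqrt() * var_y.sqrt())
}

/// Spearman's rho is the Pearson correlation of the rank vectors.
fn spearman(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len());
    pearson(&ranks(x), &ranks(y))
}

fn main() {
    // Toy data: cosine similarities from a model vs. human-annotated gold scores.
    let predicted = [0.9, 0.1, 0.5, 0.7];
    let gold = [4.8, 0.5, 3.9, 3.0];
    let rho = spearman(&predicted, &gold);
    println!("Spearman x 100 = {:.1}", 100.0 * rho); // prints "Spearman x 100 = 80.0"
}
```

Because only the ranks matter, the metric rewards a model for ordering sentence pairs the same way annotators do, regardless of the absolute scale of its similarity scores.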

JSTS/JSICK

evaluations/japanese provides evaluation tools and results for JGLUE JSTS and JSICK tasks.

As one example, the following table shows the evaluation results with the Spearman's rank correlation coefficient.

| Model | JSICK (test) | JSTS (train) | JSTS (val) | Avg. |
| --- | --- | --- | --- | --- |
| sif_embedding::Sif | 79.7 | 67.6 | 74.6 | 74.0 |
| sif_embedding::USif | 79.7 | 69.3 | 76.0 | 75.0 |
| cl-nagoya/unsup-simcse-ja-base | 79.0 | 74.5 | 79.0 | 77.5 |
| cl-nagoya/unsup-simcse-ja-large | 79.6 | 77.8 | 81.4 | 79.6 |
| cl-nagoya/sup-simcse-ja-base | 82.8 | 77.9 | 80.9 | 80.5 |
| cl-nagoya/sup-simcse-ja-large | 83.1 | 79.6 | 83.1 | 81.9 |

Similarity search

qdrant-examples provides an example of using sif-embedding with qdrant/rust-client.
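
To show what similarity search over sentence embeddings boils down to, here is a minimal brute-force sketch using cosine similarity. It does not use qdrant or this crate: the function names `cosine` and `top_k` and the toy vectors are assumptions for illustration, and a vector database would typically replace the linear scan with an index.

```rust
/// Cosine similarity of two equally long vectors; returns 0.0 if either is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the indices and scores of the `k` corpus vectors most similar to `query`.
fn top_k(query: &[f32], corpus: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = corpus
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    // Sort by descending similarity, then keep the best k.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    // Toy "sentence embeddings" (made up for illustration).
    let corpus = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 1.0]];
    let query = [1.0, 0.1];
    for (idx, score) in top_k(&query, &corpus, 2) {
        println!("corpus[{idx}] score = {score:.3}");
    }
}
```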

Wiki

Troubleshooting: Tips on how to resolve errors I faced in my environment.

Licensing

Licensed under either of

at your option.

sif-embedding's People

Contributors

kampersanda

sif-embedding's Issues

Suggestions based on the fse package

Hi,

I'm excited to see how this package grows - if I may, I'd suggest a glimpse at the FSE package roadmap in oborchers/Fast_Sentence_Embeddings#49 for some additions to the TODO things on the README. If a solid Rust base is present for multiple extensions of SIF, it'll make a great base to be used via Python.

I can't wait for a Python binding based on a Rust implementation of SIF and its variants!

Cheers!
