str_metrics's Introduction

StrMetrics

Ruby gem (native extension in Rust) providing implementations of various string metrics. The metrics currently supported are Sørensen–Dice, Levenshtein, Damerau–Levenshtein, Jaro, and Jaro–Winkler. Any string that can be converted to a UTF-8 representation is supported. All string comparison is done at the grapheme cluster level, as described by Unicode Standard Annex #29; results may therefore differ from those of many other gems that calculate string metrics. See the Known compatibility section for supported versions and platforms.
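To make the grapheme cluster point concrete, here is a minimal sketch that uses only Ruby's built-in String methods and does not call this gem. The sample string is an illustrative assumption, and String#grapheme_clusters requires Ruby 2.5 or newer.

```ruby
# "e" followed by a combining acute accent (U+0301) renders as a single "é".
decomposed = "e\u0301"

code_points = decomposed.chars              # two elements: "e" and the combining mark
graphemes   = decomposed.grapheme_clusters  # one element: the whole user-perceived character

puts code_points.length  # prints 2
puts graphemes.length    # prints 1
```

A metric that compares code points would treat this string as two symbols, while a metric that compares grapheme clusters treats it as one.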

Getting Started

Prerequisites

Install Rust (tested with version >= 1.47.0) with:

curl https://sh.rustup.rs -sSf | sh

Known compatibility

Ruby

3.1, 3.0, 2.7, 2.6, 2.5, 2.4, 2.3, jruby, truffleruby

Rust

1.60.0, 1.59.0, 1.58.1, 1.57.0, 1.56.1, 1.55.0, 1.54.0, 1.53.0, 1.52.1, 1.51.0, 1.50.0, 1.49.0, 1.48.0, 1.47.0

Platforms

Linux, MacOS, Windows

Installation

With bundler

Add this line to your application's Gemfile:

gem 'str_metrics'

And then execute:

$ bundle install

Without bundler

$ gem install str_metrics

Usage

To use the metrics provided by this gem, require str_metrics:

require 'str_metrics'

Each metric is shown below with an example and a description of its optional parameters.

Sørensen–Dice

StrMetrics::SorensenDice.coefficient('abc', 'bcd', ignore_case: false)
 => 0.5

Options:

Keyword Type Default Description
ignore_case boolean false Case insensitive comparison?
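For readers who want to see where the 0.5 comes from, the following is a pure-Ruby sketch of the usual bigram formulation of the Sørensen–Dice coefficient. It is not the gem's Rust implementation; the method name dice_coefficient_sketch is hypothetical, and the handling of edge cases (identical strings, strings with fewer than two grapheme clusters) is an assumption.

```ruby
# Sørensen–Dice over bigrams of grapheme clusters: 2 * |X ∩ Y| / (|X| + |Y|).
def dice_coefficient_sketch(a, b, ignore_case: false)
  if ignore_case
    a = a.downcase
    b = b.downcase
  end
  return 1.0 if a == b # assumption: identical inputs score 1.0

  bigrams = lambda { |s| s.grapheme_clusters.each_cons(2).map(&:join) }
  x = bigrams.call(a)
  y = bigrams.call(b)
  return 0.0 if x.empty? || y.empty? # assumption: no bigrams means no overlap

  # Multiset intersection: each bigram in y can be matched at most once.
  remaining = Hash.new(0)
  y.each { |gram| remaining[gram] += 1 }
  overlap = 0
  x.each do |gram|
    next unless remaining[gram] > 0
    remaining[gram] -= 1
    overlap += 1
  end

  2.0 * overlap / (x.length + y.length)
end

puts dice_coefficient_sketch('abc', 'bcd') # prints 0.5
```

Here 'abc' yields the bigrams "ab" and "bc", 'bcd' yields "bc" and "cd", and one shared bigram out of four gives 2 * 1 / 4.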

Levenshtein

StrMetrics::Levenshtein.distance('abc', 'acb', ignore_case: false)
 => 2

Options:

Keyword Type Default Description
ignore_case boolean false Case insensitive comparison?
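As a rough illustration of the metric itself, the sketch below computes the Levenshtein distance over grapheme clusters in pure Ruby with the standard two-row dynamic programme. It is not the gem's implementation, and the name levenshtein_sketch is hypothetical.

```ruby
# Minimum number of insertions, deletions and substitutions,
# counted in grapheme clusters rather than bytes or code points.
def levenshtein_sketch(a, b)
  s = a.grapheme_clusters
  t = b.grapheme_clusters

  prev = (0..t.length).to_a # distances from the empty prefix of s
  s.each_with_index do |sc, i|
    curr = [i + 1]
    t.each_with_index do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [prev[j + 1] + 1, # deletion
               curr[j] + 1,     # insertion
               prev[j] + cost].min # substitution or match
    end
    prev = curr
  end
  prev.last
end

puts levenshtein_sketch('abc', 'acb') # prints 2
```

Swapping the adjacent "b" and "c" costs two edits here because plain Levenshtein has no transposition operation.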

Damerau–Levenshtein

StrMetrics::DamerauLevenshtein.distance('abc', 'acb', ignore_case: false)
 => 1

Options:

Keyword Type Default Description
ignore_case boolean false Case insensitive comparison?
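The difference from plain Levenshtein is that swapping two adjacent symbols counts as a single edit. The pure-Ruby sketch below shows the restricted variant, usually called optimal string alignment; whether the gem implements this variant or the unrestricted Damerau–Levenshtein distance is not stated here, so treat it as an illustration only. The name osa_distance_sketch is hypothetical.

```ruby
# Optimal string alignment: Levenshtein plus adjacent transpositions,
# computed over grapheme clusters.
def osa_distance_sketch(a, b)
  s = a.grapheme_clusters
  t = b.grapheme_clusters

  d = Array.new(s.length + 1) { Array.new(t.length + 1, 0) }
  (0..s.length).each { |i| d[i][0] = i }
  (0..t.length).each { |j| d[0][j] = j }

  (1..s.length).each do |i|
    (1..t.length).each do |j|
      cost = s[i - 1] == t[j - 1] ? 0 : 1
      d[i][j] = [d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost].min
      # Adjacent transposition: "bc" against "cb".
      if i > 1 && j > 1 && s[i - 1] == t[j - 2] && s[i - 2] == t[j - 1]
        d[i][j] = [d[i][j], d[i - 2][j - 2] + 1].min
      end
    end
  end
  d[s.length][t.length]
end

puts osa_distance_sketch('abc', 'acb') # prints 1
```

The two variants agree on this example; they can differ on inputs such as 'ca' and 'abc', where the restricted form gives 3.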

Jaro

StrMetrics::Jaro.similarity('abc', 'aac', ignore_case: false)
 => 0.7777777777777777

Options:

Keyword Type Default Description
ignore_case boolean false Case insensitive comparison?
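The textbook definition of Jaro similarity can be sketched in pure Ruby as follows. This is an illustration over grapheme clusters, not the gem's Rust code; the name jaro_similarity_sketch and the treatment of empty strings are assumptions.

```ruby
# Jaro similarity: (m/|s| + m/|t| + (m - transpositions)/m) / 3,
# where m is the number of matches found within a sliding window.
def jaro_similarity_sketch(a, b)
  s = a.grapheme_clusters
  t = b.grapheme_clusters
  return 1.0 if s.empty? && t.empty? # assumption: two empty strings are identical
  return 0.0 if s.empty? || t.empty?

  window = [[s.length, t.length].max / 2 - 1, 0].max
  s_matched = Array.new(s.length, false)
  t_matched = Array.new(t.length, false)
  matches = 0

  s.each_with_index do |sc, i|
    lo = [i - window, 0].max
    hi = [i + window, t.length - 1].min
    (lo..hi).each do |j|
      next if t_matched[j] || sc != t[j]
      s_matched[i] = t_matched[j] = true
      matches += 1
      break
    end
  end
  return 0.0 if matches.zero?

  # Matched symbols in order; half the positional mismatches are transpositions.
  s_seq = s.each_index.select { |i| s_matched[i] }.map { |i| s[i] }
  t_seq = t.each_index.select { |j| t_matched[j] }.map { |j| t[j] }
  transpositions = s_seq.zip(t_seq).count { |x, y| x != y } / 2

  m = matches.to_f
  (m / s.length + m / t.length + (m - transpositions) / m) / 3.0
end

puts jaro_similarity_sketch('abc', 'aac').round(4) # prints 0.7778
```

For 'abc' and 'aac' the window is 0, so only "a" and "c" match in place: (2/3 + 2/3 + 2/2) / 3 = 7/9.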

Jaro–Winkler

StrMetrics::JaroWinkler.similarity('abc', 'aac', ignore_case: false, prefix_scaling_factor: 0.1, prefix_scaling_bonus_threshold: 0.7)
 => 0.7999999999999999

StrMetrics::JaroWinkler.distance('abc', 'aac', ignore_case: false, prefix_scaling_factor: 0.1, prefix_scaling_bonus_threshold: 0.7)
 => 0.20000000000000007

Options:

Keyword Type Default Description
ignore_case boolean false Case insensitive comparison?
prefix_scaling_factor decimal 0.1 Constant scaling factor that controls how much weight common prefixes receive. Should not exceed 0.25.
prefix_scaling_bonus_threshold decimal 0.7 The prefix bonus is applied only if the Jaro similarity is greater than the given value.
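To show how the two prefix options interact, here is a pure-Ruby sketch of the Winkler adjustment applied to an already computed Jaro similarity. It is not the gem's implementation; the name jaro_winkler_sketch is hypothetical, and capping the common prefix at 4 grapheme clusters follows the usual definition rather than anything stated above.

```ruby
# Winkler adjustment: jaro + prefix_length * scaling_factor * (1 - jaro),
# applied only when jaro exceeds the bonus threshold.
def jaro_winkler_sketch(jaro, a, b, prefix_scaling_factor: 0.1, prefix_scaling_bonus_threshold: 0.7)
  return jaro unless jaro > prefix_scaling_bonus_threshold

  # Length of the shared prefix, capped at 4 (assumption: standard cap).
  prefix = a.grapheme_clusters
            .zip(b.grapheme_clusters)
            .take(4)
            .take_while { |x, y| x == y }
            .length

  jaro + prefix * prefix_scaling_factor * (1.0 - jaro)
end

jaro = 7.0 / 9 # Jaro similarity of 'abc' and 'aac'
similarity = jaro_winkler_sketch(jaro, 'abc', 'aac')
distance = 1.0 - similarity

puts similarity.round(4) # prints 0.8
puts distance.round(4)   # prints 0.2
```

With a shared prefix of one grapheme cluster ("a"), the bonus is 1 * 0.1 * (1 - 7/9), which lifts the similarity from about 0.778 to about 0.8. The cap of 4 is also why a scaling factor above 0.25 could push the result past 1.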

Motivation

The main motivation was to have a central gem which can provide a variety of string metric calculations. Secondary motivation was to experiment with writing a native extension in Rust (instead of C).

Development

Getting started

gem install bundler
git clone https://github.com/anirbanmu/str_metrics.git
cd ./str_metrics
bundle install

Building (for native component)

rake rust_build

Testing (will build native component before running tests)

rake spec

Local installation

rake install

Deploying a new version

To deploy a new version of the gem to rubygems:

  1. Bump the version in version.rb according to SemVer.
  2. Get your code merged to the main branch.
  3. After a git pull on the main branch, run:
     rake build && rake release

Authors

See all repo contributors here.

Versioning

SemVer is employed. See tags for released versions.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/anirbanmu/str_metrics.

Code of Conduct

Everyone interacting in this project's codebase, issue trackers, etc. is expected to follow the code of conduct.

License

This project is licensed under the MIT License; see the LICENSE file for details.


str_metrics's Issues

JRuby?

Hi there!

Does this gem work with JRuby?

Thank you!

Add/verify Windows compatibility

Since this was all bootstrapped on Linux and all automated tests run on Linux, its compatibility with Windows is somewhat unknown (and likely broken). If possible, tests on Windows should be added and made to pass. If automated testing is not possible, manual validation will have to do.

Add/verify OSX compatibility

Since this was all bootstrapped on Linux and all automated tests run on Linux, its compatibility with OSX is somewhat unknown (and likely broken). If possible, tests on OSX should be added and made to pass. If automated testing is not possible, manual validation will have to do.
