PyNNDescent

A Python implementation of nearest neighbor descent for approximate nearest neighbors. This is a relatively straightforward Python implementation of Nearest Neighbor Descent for k-neighbor-graph construction and approximate nearest neighbor search, as per the paper:

Dong, Wei, Moses Charikar, and Kai Li. "Efficient k-nearest neighbor graph construction for generic similarity measures." Proceedings of the 20th international conference on World wide web. ACM, 2011.
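
The core idea of nearest neighbor descent is that a neighbor of a neighbor is likely to be a neighbor too. The following is a minimal pure-Python sketch of that idea, not the code this library uses: it starts from random neighbor lists and repeatedly compares points that share a neighbor, keeping any improvement. The names nn_descent, try_insert and euclidean are illustrative assumptions.

```python
import random


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def try_insert(neighbor_list, d, j):
    """Replace the worst neighbor with (d, j) if it is closer; return 1 on change."""
    if any(idx == j for _, idx in neighbor_list):
        return 0  # j is already a neighbor
    if d >= neighbor_list[-1][0]:
        return 0  # not better than the current worst neighbor
    neighbor_list[-1] = (d, j)
    neighbor_list.sort()
    return 1


def nn_descent(data, k, dist=euclidean, max_iters=10, seed=0):
    """Toy NN-descent: for each point, a sorted list of (distance, index) pairs."""
    rng = random.Random(seed)
    n = len(data)

    # Start from k random neighbors for every point (requires k < n).
    graph = []
    for i in range(n):
        others = rng.sample([j for j in range(n) if j != i], k)
        graph.append(sorted((dist(data[i], data[j]), j) for j in others))

    for _ in range(max_iters):
        updates = 0
        # Collect forward and reverse neighbors of every point.
        neighbors = [set(j for _, j in graph[i]) for i in range(n)]
        for i in range(n):
            for _, j in graph[i]:
                neighbors[j].add(i)
        # Local join: two points that share a neighbor are candidates for each other.
        for i in range(n):
            local = sorted(neighbors[i])
            for pos, a in enumerate(local):
                for b in local[pos + 1:]:
                    d = dist(data[a], data[b])
                    updates += try_insert(graph[a], d, b)
                    updates += try_insert(graph[b], d, a)
        if updates == 0:
            break  # no neighbor list changed, so the graph has converged
    return graph
```

Calling nn_descent(points, k=5) on a list of coordinate tuples returns one neighbor list per point. The result is approximate: nothing guarantees that every true nearest neighbor is found.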

This library supplements that approach with the use of random projection trees for initialisation. This can be particularly useful for the metrics that are amenable to such approaches (euclidean, minkowski, angular, cosine, etc.).
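
As a rough illustration of how random projection trees can provide a starting point, the sketch below recursively splits the data with random hyperplanes until each leaf is small. Points that land in the same leaf tend to be close, so a leaf can seed initial neighbor candidates. This is a simplified pure-Python sketch for Euclidean-style data, not the library's implementation; rp_split, rp_tree_leaves and leaf_size are hypothetical names.

```python
import random


def rp_split(data, indices, rng):
    """Split indices by the hyperplane halfway between two randomly chosen points."""
    first, second = (data[i] for i in rng.sample(indices, 2))
    normal = [a - b for a, b in zip(first, second)]
    midpoint = [(a + b) / 2 for a, b in zip(first, second)]
    offset = sum(n * m for n, m in zip(normal, midpoint))
    left, right = [], []
    for i in indices:
        # Sign of the margin tells which side of the hyperplane the point is on.
        margin = sum(n * x for n, x in zip(normal, data[i])) - offset
        (left if margin > 0 else right).append(i)
    return left, right


def rp_tree_leaves(data, leaf_size=10, seed=0):
    """Return the leaves (lists of point indices) of one random projection tree."""
    rng = random.Random(seed)
    leaves, stack = [], [list(range(len(data)))]
    while stack:
        indices = stack.pop()
        if len(indices) <= leaf_size:
            leaves.append(indices)
            continue
        left, right = rp_split(data, indices, rng)
        if not left or not right:
            # Degenerate split (for example duplicate points): split randomly in half.
            rng.shuffle(indices)
            half = len(indices) // 2
            left, right = indices[:half], indices[half:]
        stack.extend([left, right])
    return leaves
```

Every point ends up in exactly one leaf, and no leaf holds more than leaf_size points (leaf_size must be at least 1).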

Currently this library targets relatively high accuracy (90%-99% accuracy rate) approximate nearest neighbor searches.
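
One common way to measure such an accuracy rate is recall: the fraction of the true k nearest neighbors that the approximate search returns. Reading "accuracy" as recall is an assumption here. The sketch below computes recall against a brute-force search; the helper names exact_neighbors and recall are illustrative and not part of the library.

```python
def exact_neighbors(data, query, k):
    """Indices of the k points in data closest to query (brute force, Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(point, query)), i)
        for i, point in enumerate(data)
    )
    return [i for _, i in dists[:k]]


def recall(approx_indices, exact_indices):
    """Fraction of true neighbors recovered, pooled over all queries."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_indices, exact_indices))
    total = sum(len(e) for e in exact_indices)
    return hits / total
```

A recall of 0.9 to 0.99 under this definition would correspond to the 90%-99% range mentioned above.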

Why use PyNNDescent?

PyNNDescent provides fast approximate nearest neighbor queries. The ann-benchmarks system puts it solidly in the mix of top performing ANN libraries:

GIST-960 Euclidean

[Figure: ANN benchmark performance for GIST 960 dataset]

NYTimes-256 Angular

[Figure: ANN benchmark performance for NYTimes 256 dataset]

While PyNNDescent is not the fastest ANN library, it is both easy to install (pip installable) with no platform or compilation issues, and very flexible, supporting a wide variety of distance metrics by default:

Minkowski style metrics

  • euclidean
  • manhattan
  • chebyshev
  • minkowski

Miscellaneous spatial metrics

  • canberra
  • braycurtis
  • haversine

Normalized spatial metrics

  • mahalanobis
  • wminkowski
  • seuclidean

Angular and correlation metrics

  • cosine
  • correlation
  • spearmanr

Metrics for binary data

  • hamming
  • jaccard
  • dice
  • russellrao
  • kulsinski
  • rogerstanimoto
  • sokalmichener
  • sokalsneath
  • yule

and also custom user-defined distance metrics while still retaining performance.

PyNNDescent also integrates well with Scikit-learn, including providing support for the upcoming KNeighborsTransformer as a drop-in replacement for algorithms that make use of nearest neighbor computations.

How to use PyNNDescent

PyNNDescent aims to have a very simple interface. It is similar to (but more limited than) KDTrees and BallTrees in sklearn. In practice there are only two operations -- index construction, and querying an index for nearest neighbors.

To build a new search index on some training data, here called data, you can do something like

from pynndescent import NNDescent
index = NNDescent(data)

You can then use the index for searching (and can pickle it to disk if you wish). To search a pynndescent index for the 15 nearest neighbors of a test data set query_data you can do something like

index.query(query_data, k=15)

and that is pretty much all there is to it.

Installing

PyNNDescent is designed to be easy to install, being a pure Python module with relatively light requirements:

  • numpy
  • scipy
  • scikit-learn >= 0.18
  • numba >= 0.37

all of which should be pip installable. The easiest way to install should be

pip install pynndescent

To manually install this package:

wget https://github.com/lmcinnes/pynndescent/archive/master.zip
unzip master.zip
rm master.zip
cd pynndescent-master
python setup.py install

Help and Support

This project is still very young. I am currently trying to get example notebooks and documentation prepared, but it may be a while before those are available. In the meantime please open an issue and I will try to provide any help and guidance that I can. Please also check the docstrings on the code, which provide some descriptions of the parameters.

License

The pynndescent package is 2-clause BSD licensed. Enjoy.

Contributing

Contributions are more than welcome! There are lots of opportunities for potential projects, so please get in touch if you would like to help out. Everything from code to notebooks to examples and documentation is equally valuable, so please don't feel you can't contribute. To contribute, please fork the project, make your changes, and submit a pull request. We will do our best to work through any issues with you and get your code merged into the main branch.
