Introduction

PyNear is a Python library, internally built in C++, for efficient KNN search using metric distance functions such as L2 distance (see VPTreeL2Index) or Hamming distance (see VPTreeBinaryIndex and BKTreeBinaryIndex), among others. It uses AVX2 instructions to optimize search performance.
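As a point of reference for what a KNN search under these two metrics computes, here is a minimal brute-force sketch in plain NumPy. It does not use PyNear; the function names `knn_l2` and `knn_hamming` are hypothetical, and it assumes that Hamming distance on `uint8` rows counts differing bits across all bytes.

```python
import numpy as np

def knn_l2(data, queries, k):
    """Brute-force k nearest neighbours under L2 (Euclidean) distance."""
    # Pairwise distances, shape (num_queries, num_points).
    dists = np.linalg.norm(queries[:, None, :] - data[None, :, :], axis=2)
    idx = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return idx, np.take_along_axis(dists, idx, axis=1)

def knn_hamming(data, queries, k):
    """Brute-force k nearest neighbours under bitwise Hamming distance.

    Assumption: each uint8 row is a packed bit string, and the distance
    is the number of differing bits.
    """
    xor = np.bitwise_xor(queries[:, None, :], data[None, :, :])
    dists = np.unpackbits(xor, axis=2).sum(axis=2)
    idx = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return idx, np.take_along_axis(dists, idx, axis=1)

rng = np.random.default_rng(0)

# Real-valued features, queried with the first three points themselves.
points = rng.random((1000, 8))
l2_idx, l2_dist = knn_l2(points, points[:3], k=2)

# Binary descriptors of 32 bytes (256 bits) each.
codes = rng.integers(0, 256, size=(1000, 32), dtype=np.uint8)
ham_idx, ham_dist = knn_hamming(codes, codes[:3], k=2)
```

Brute force costs one distance evaluation per (query, point) pair, which is the cost that tree-based indices try to avoid.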

PyNear aims to provide several efficient algorithms for nearest neighbor search. One of its distinguishing features is the adoption of the Vantage Point Tree to mitigate (up to a point) the curse of dimensionality for high-dimensional features (see the VPTree* indices in the docs for more information).
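To give an idea of how a Vantage Point Tree prunes the search, the following is a minimal pure-Python sketch of the general technique for 1-NN search under L2 distance. It is not PyNear's C++ implementation; the names `VPNode`, `build`, and `nearest` are hypothetical, and the vantage point is simply the first remaining point rather than a tuned choice.

```python
import numpy as np

class VPNode:
    def __init__(self, index, radius, inside, outside):
        self.index = index      # index of the vantage point in the data array
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance <= radius
        self.outside = outside  # subtree of points with distance > radius

def build(data, indices):
    """Recursively split the points by their distance to a vantage point."""
    if len(indices) == 0:
        return None
    vp, rest = indices[0], indices[1:]
    if len(rest) == 0:
        return VPNode(vp, 0.0, None, None)
    dists = np.linalg.norm(data[rest] - data[vp], axis=1)
    radius = float(np.median(dists))
    return VPNode(vp, radius,
                  build(data, rest[dists <= radius]),
                  build(data, rest[dists > radius]))

def nearest(node, data, query, best=(np.inf, -1)):
    """Return (distance, index) of the nearest point to the query."""
    if node is None:
        return best
    d = float(np.linalg.norm(data[node.index] - query))
    if d < best[0]:
        best = (d, node.index)
    # Visit the side that contains the query first.
    if d <= node.radius:
        near, far = node.inside, node.outside
    else:
        near, far = node.outside, node.inside
    best = nearest(near, data, query, best)
    # Triangle inequality: every point on the far side is at least
    # |d - radius| away from the query, so skip it when that bound
    # cannot beat the best distance found so far.
    if abs(d - node.radius) < best[0]:
        best = nearest(far, data, query, best)
    return best

rng = np.random.default_rng(0)
points = rng.random((2000, 16))
tree = build(points, np.arange(len(points)))

query = rng.random(16)
dist, idx = nearest(tree, points, query)

# Brute-force check of the result.
brute = int(np.argmin(np.linalg.norm(points - query, axis=1)))
```

The pruning step only relies on the triangle inequality, which is why the same structure works for any metric distance, including Hamming distance.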

Installation

pip install pynear

Performance can decrease dramatically if this library is compiled without OpenMP and AVX support.

Requirement

This library needs OpenMP support to be built and installed. The whole compilation process occurs automatically when performing the installation step above.

Features

For more features and all available index types, refer to docs.

Pickle serialization

VPTree indices are pickle serializable:

import pickle

import numpy as np
import pynear

np.random.seed(seed=42)

num_points = 20000
dimension = 32
num_queries = 2
# Random bytes in [0, 255]; each row is one binary descriptor.
data = np.random.randint(0, 256, size=(num_points, dimension), dtype=np.uint8)

queries = np.random.randint(0, 256, size=(num_queries, dimension), dtype=np.uint8)

vptree = pynear.VPTreeBinaryIndex()
vptree.set(data)

# Round-trip the index through pickle.
serialized = pickle.dumps(vptree)
recovered = pickle.loads(serialized)

String serialization

Sometimes, to check the state of a tree, it is useful to print the whole tree, including information about its size and balance. The to_string() method renders the whole tree as a string. Be aware that this method is really slow and should not be used in performance-critical tasks.

print(vptree.to_string())

Output:

####################
# [VPTree state]
Num Data Points: 100
Total Memory: 8000 bytes
####################
[+] Root Level:
 Depth: 0
 Height: 14
 Num Sub Nodes: 100
 Index Start: 0
 Index End:   99
 Left Subtree Height: 12
 Right Subtree Height: 12
 [+] Left children:
.... Depth: 1
.... Height: 12
.... Num Sub Nodes: 49
.... Index Start: 1
.... Index End:   49
.... Left Subtree Height: 10
.... Right Subtree Height: 10
.... [+] Left children:
........ Depth: 2
........ Height: 10
........ Num Sub Nodes: 24
........ Index Start: 2
........ Index End:   25
........ Left Subtree Height: 8
........ Right Subtree Height: 8
........ [+] Left children:
............ Depth: 3
............ Height: 8
............ Num Sub Nodes: 11
............ Index Start: 3
............ Index End:   13
............ Left Subtree Height: 6
............ Right Subtree Height: 6
............ [+] Left children:

...

Note that this output can be very large.
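Because the dump can be long, it may be easier to reduce it to a few headline numbers. The sketch below is one possible way to do that with regular expressions. The helper `summarize_tree_dump` is hypothetical and not part of PyNear, and it assumes the text follows the layout of the sample output above, which may differ between versions.

```python
import re

def summarize_tree_dump(text):
    """Extract a few summary numbers from a to_string()-style dump."""
    points = re.search(r"Num Data Points:\s*(\d+)", text)
    depths = [int(d) for d in re.findall(r"Depth:\s*(\d+)", text)]
    # Match only per-node "Height:" lines, not "Left/Right Subtree Height:".
    heights = [int(h) for h in re.findall(r"^[. ]*Height:\s*(\d+)", text, re.MULTILINE)]
    return {
        "num_points": int(points.group(1)) if points else None,
        "nodes_listed": len(depths),
        "max_depth": max(depths) if depths else None,
        "root_height": heights[0] if heights else None,
    }

# A shortened copy of the sample output shown above.
sample = """\
####################
# [VPTree state]
Num Data Points: 100
Total Memory: 8000 bytes
####################
[+] Root Level:
 Depth: 0
 Height: 14
 Num Sub Nodes: 100
 Left Subtree Height: 12
 Right Subtree Height: 12
 [+] Left children:
.... Depth: 1
.... Height: 12
.... Num Sub Nodes: 49
.... Left Subtree Height: 10
.... Right Subtree Height: 10
"""

summary = summarize_tree_dump(sample)
print(summary)
```

With a real index, the same helper could be applied to the string returned by `vptree.to_string()`.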

Benchmarks

To visualize, customize, or regenerate the benchmarks, or to see the benchmark results, see the benchmarks section.

Development

Building and installing locally

pip install .

Running Python Tests

make test

Debugging and Running C++ Code on Unix

For debugging and running C++ code independently of the Python module, CMake config files are provided in pynear/CMakeLists.txt. To build and run the C++ tests, run:

make cpp-test

Since the tests are built in Debug mode (the default CMakeLists build mode), they can be debugged with gdb using the built test binary:

gdb ./build/tests/vptree-tests

Debugging and Running C++ Code on Windows

Install CMake (for example py -m pip install cmake) and pybind11 (py -m pip install pybind11).

mkdir build
cd build
cmake ..\pynear

For CMake to work correctly, you may have to specify some arguments, such as the correct generator (-G "Visual Studio 15 2017 Win64") or the paths for Python (-DPYTHON_EXECUTABLE="C:\Program Files\Python38\python.exe") and pybind11 (-Dpybind11_DIR="C:\Program Files\Python38\Lib\site-packages\pybind11\share\cmake\pybind11").

Build the generated files using Visual Studio (or whichever generator you chose) and run vptree-tests.exe.

Formatting code

make fmt
