Git Product home page Git Product logo

kgraph's Introduction

KGraph: A Library for Approximate Nearest Neighbor Search

Introduction

KGraph is a library for k-nearest neighbor (k-NN) graph construction and online k-NN search using a k-NN Graph as index. KGraph implements heuristic algorithms that are extremely generic and fast:

  • KGraph works on abstract objects. The only assumption it makes is that a similarity score can be computed on any pair of objects, with a user-provided function.
  • KGraph is among the fastest of libraries for k-NN search according to recent benchmark.

For best generality, the C++ API should be used. A python wrapper is provided under the module name kgraph, which supports Euclidean and Angular distances on rows of NumPy matrices.

!!!pykgraph has been renamed to kgraph.

Building and Installation

KGraph depends on a recent version of GCC with C++11 support, cmake and the Boost library. The package can be built and installed with

cmake -DCMAKE_BUILD_TYPE=release .
make
sudo make install

A Makefile.plain is also provided in case cmake is not available.

The Python API can be installed with

python setup.py install

Python Quick Start

from numpy import random
import kgraph

dataset = random.rand(1000000, 16)
query = random.rand(1000, 16)

index = kgraph.KGraph(dataset, 'euclidean')  # another option is 'angular'
index.build(reverse=-1)                        #
index.save("index_file");
# load with index.load("index_file");

knn = index.search(query, K=10)                       # this uses all CPU threads
knn = index.search(query, K=10, threads=1)            # one thread, slower
knn = index.search(query, K=1000, P=100)              # search for 1000-nn, no need to recompute index.

Both index.build and index.search supports a number of optional keywords arguments to fine tune the performance. The default values should work reasonably well for many datasets. One exception is that reverse=-1 should be added if the purpose of building index is to speedup search, which is the typical case, rather than to obtain the k-NN graph itself.

Two precautions should be taken:

  • Although matrices of both float32 and float64 are supported, the latter is not optimized. It is recommened that matrices be converted to float32 before being passed into kgraph.
  • The dimension (columns of matrices) should be a multiple of 4. If not, zeros must be padded.

For performance considerations, the Python API does not support user-defined similarity function, as the callback function is invoked in such a high frequency that, if written in Python, speedup will inevitably be brought down. For the full generality, the C++ API should be used.

C++ Quick Start

The KGraph C++ API is based on two central concepts: the index oracle and the search oracle. (Oracle is a fancy way of calling a user-defined callback function that behaves like a black box.) KGraph works solely with object IDs from 0 to N-1, and relies on the oracles to map the IDs to actual data objects and to compute the similarity. To use KGraph, the user has to extend the following two abstract classes

    class IndexOracle {
    public:
        // returns size N of dataset
        virtual unsigned size () const = 0;
        // computes similarity of object 0 <= i and j < N
        virtual float operator () (unsigned i, unsigned j) const = 0;
    };

    class SearchOracle {
    public:
        /// Returns the size N of the dataset.
        virtual unsigned size () const = 0;
	/// Computes similarity of query and object 0 <= i < N.
        virtual float operator () (unsigned i) const = 0;
    };

The similarity values computed by the oracles must satisfy the following two conditions:

  • The more similar the objects are, the smaller the similarity value (0.1 < 10, -10 < 1).
  • Similarity must be symmetric, i.e. f(a, b) = f(b, a).

KGraph's heuristic algorithm does not make assumption about properties such as triangle-inequality. If the similarity is ill-defined, the worst it can do is to lower the accuracy and to slow down computation.

With the oracle classes defined, index construction and online search become straightfoward:

#include <kgraph.h>

KGraph *index = KGraph::create();

if (need_to_create_new_index) {
    MyIndexOracle oracle(...);	// subclass of kgraph::IndexOracle
    KGraph::IndexParams params;  
    params.reverse = -1;
    index->build(oracle, params);
    index->save("some_path");
}
else {
    index->load("some_path");
}

MySearchOracle oracle(...);	// subclass of kgraph::SearchOracle

KGraph::SearchParams params;
params.K = K;
vector<unsigned> knn(K);    	// to save K-NN ids.
index->search(oracle, params, &knn[0]);
// knn now contains the IDs of k-NNs, highest similarity in the front

delete index;

Note that the search API does not directly imply nearest neighbor search. Rather it is a generic API for minimizing a function on top of a graph, and finds the K nodes where the function assumes minimal values.

More Documentation

Oracles for Common Tasks

KGraph provides a number of efficient oracle implementation for common tasks.

kgraph's People

Contributors

aaalgo avatar cynthia avatar erikbern avatar jonbakerfish avatar polarnick239 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

kgraph's Issues

index.search(query)

Is that possible to know that what KNN search algorithm was used for KGraph when I invoke index.search() ?

question about rnn

Hi, my question is why put a condition
if (nn.dist > nhood_o.radiusM)
when form reverse nodes.
Thanks!

image

[Improvement] batch methods for oracle interfaces

I see that both IndexOracle and SearchOracle compare one entity pair at a time, this leaves on the table room for these backends amortize access overheads over a batch of entities to retrieve, specially if the entities have to be retrieved from an key-value object store

`Floating point exception (core dumped)` during searching

I got an error when using the python search api, run index.search(dataset, query, K=100, P=1000, threads=0) and the program crashed down and got Floating point exception (core dumped). However, when P equals to other values, say 1001, 2000, 10000 etc, there's no such an error. Here's the code:

from numpy import random
import pykgraph
dataset = random.rand(1000, 4096)
index = pykgraph.KGraph()
index.build(dataset)
query = random.rand(1, 4096)
r1      = index.search(dataset, query, K=100, P=1000, threads=0, withDistance=False)
r2,dist = index.search(dataset, query, K=100, P=1000, threads=0, withDistance=True)

kgraph.cpp has a minor bug or not a bug

line 822 and 823 of kgraph.cpp:
nhoods[j].parallel_try_insert(i, dist);
if (r < params.K) ++uu;
should be
r = nhoods[j].parallel_try_insert(i, dist);
if (r < params.K) ++uu;
kgraph

How to compile kgraph with OpenBLAS?

I'm using the Python binding of kgraph.
I've tried the following to enable OpenBLAS:

  1. Update the OpenBLAS LDLIBS variable in this Makefile
  2. Run python setup.py install
  3. Pass blas=True when searching

But the speed was the same with or without the blas=True argument.
What can I do to make sure OpenBLAS is used?

By the way, kgraph is the fastest ANN (assuming the same level of recall) package I have discovered so far for my data.
But it's not as fast as I expect it to be, it's about ten times faster than my highly optimized bruteforce implementation with sklearn, while I expect it to be at least 50 times faster.
I built numpy with OpenBLAS in my bruteforce implementation, so I want to make sure kgraph also uses OpenBLAS and try it again.

Thanks.

Sample command lines

I would like to request sample command lines for running ./search and ./index. In what format do I need to provide data set for both executables and in what format do I need to give query inputs?

How should we feed a dataset as input to the search and index excutables. What should format of dataset ?

python api

hi
i am trying to use the python API but cannot seem to make the pykgraph package.
I am using a linux16.04 machine.
i cloned the repository onto my local computer in a folder and ran the initial compilation steps:
cmake -DCMAKE_BUILD_TYPE=release .
make
sudo make install
These went fine. when i run python setup.py install the only file that is created is pykgraph-2.0.egg-info in a folder where I have other python packages. There is no package folder 'pykgraph' that i can import into a python script which would then make a KNN graph. (I am using python 3.5)
when i try to import pykgraph i get a segmentation default
shobi@shobi:~/Documents/kgraph$ python
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

import pykgraph
Segmentation fault (core dumped)
shobi@shobi:~/Documents/kgraph$

Also, The first time i run setup.py install i get the following warnings, but no errors:
running install
running build
running build_ext
building 'pykgraph' extension
creating build
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/python
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I. -I/home/shobi/.local/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -c kgraph.cpp -o build/temp.linux-x86_64-3.5/kgraph.o -O3 -std=c++11 -msse2 -fopenmp -DKGRAPH_VERSION=b'2372db6\n'
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I. -I/home/shobi/.local/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -c metric.cpp -o build/temp.linux-x86_64-3.5/metric.o -O3 -std=c++11 -msse2 -fopenmp -DKGRAPH_VERSION=b'2372db6\n'
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I. -I/home/shobi/.local/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -c python/pykgraph.cpp -o build/temp.linux-x86_64-3.5/python/pykgraph.o -O3 -std=c++11 -msse2 -fopenmp -DKGRAPH_VERSION=b'2372db6\n'
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/shobi/.local/lib/python3.5/site-packages/numpy/core/include/numpy/ndarraytypes.h:1816:0,
from /home/shobi/.local/lib/python3.5/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
from python/pykgraph.cpp:5:
/home/shobi/.local/lib/python3.5/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
python/pykgraph.cpp: In function ‘PyObject* {anonymous}::load_lshkit(const string&, PyObject*)’:
python/pykgraph.cpp:64:29: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (!(descr->elsize == elsize)) throw kgraph::io_error(path);
^
python/pykgraph.cpp:66:44: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if ((!nd) || (PyArray_ITEMSIZE(nd) != elsize)) throw kgraph::runtime_er
^
python/pykgraph.cpp: In function ‘void* init()’:
python/pykgraph.cpp:459:26: warning: control reaches end of non-void function [-Wreturn-type]
init() { import_array(); }
^
creating build/lib.linux-x86_64-3.5
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.5/kgraph.o build/temp.linux-x86_64-3.5/metric.o build/temp.linux-x86_64-3.5/python/pykgraph.o -lboost_python -lboost_timer -o build/lib.linux-x86_64-3.5/pykgraph.cpython-35m-x86_64-linux-gnu.so -fopenmp
running install_lib
copying build/lib.linux-x86_64-3.5/pykgraph.cpython-35m-x86_64-linux-gnu.so -> /usr/local/lib/python3.5/dist-packages
running install_egg_info
Removing /usr/local/lib/python3.5/dist-packages/pykgraph-2.0.egg-info
Writing /usr/local/lib/python3.5/dist-packages/pykgraph-2.0.egg-info
Thank you for your help
Shobi

Header conflict when building with new versions of Boost

When building with Boost 1.74, I ran into an issue where there is a conflicting header file with the version file found in the top level of this repository.

I resolved the issue by renaming the version file [(linked here)] (https://github.com/aaalgo/kgraph/blob/master/version) to a different name (e.g. _version).

Turns out that version is now a standard Cpp header, re: this discussion here: boostorg/config#345

Changing the version filename permanently would resolve this issue for any users compiling with newer versions of Boost.

Thank you!

Returning both IDs and distances in python api

Is it possible to return both ids and distances in python api? Can I follow this line of code but replace the NULL to something likeconst_cast<float *>(distmatrix[i]) where:

PyObject *distance =  PyArray_SimpleNew(2, dims, NPY_FLOAT);
kgraph::MatrixProxy<float, 1> distmatrix(reinterpret_cast<PyArrayObject *>(distance));

What else should I change?

Python API: directly loading saved index without dataset

Thank you for sharing this project. It is very fast and powerful.

I wonder if there is a way to use the saved index without loading the raw features in python API.
I mean in the current python version, we need to doing the following steps to save the index.
--------------- first time ----------------------
index = pykgraph.KGraph(dataset, 'euclidean') # another option is 'angular'
index.build(reverse=-1)
index.save("index_file");

and when we use it next time, we need to
--------------next time--------------------
index = pykgraph.KGraph(dataset, 'euclidean')
index.load("index_file")

I wonder if we can do something like this to create the index and load the saved index without loading the raw feature again? Because in my program, raw feature is really big and it takes a lot memory....
--------------something like this--------------------
index = pykgraph.KGraph()
index.load("index_file")

P.S. http://www.kgraph.org/ seems not working. Is there any document available online about the usages of kgraph?
Thank you so much.

performance regressions

I'm rerunning ann-benchmarks and I'm seeing some issues with kgraph. Not sure what's going on but thought I'd mention it. Basically queries are very slow now

image

Example for CPP

Hello aaalgo,

I'm a CPP noob and I'm wondering how to use your code in CPP. (I have no doubt for python's API)

Could you provide an example of how to extend class IndexOracle and SearchOracle in CPP (maybe with Euclidean distance)? Do I have to save such class in additional .h .cpp?

graph construction

hi,
I am using kgraph for a fast KNN graph construction. I used to use scikit learn KNN_graph and get a csr sparse matrix with weighted edges (used L2 distance). I was wondering if it is possible to get a similar KNN Graph representation using kgraph (using python api).
I could otherwise use query to find the K nearest neighbor indices and then compute the L2 distances based on my original input data which has the coordinates of the corresponding nodes in ndim space. But this doesnt seem like a good way of doing it.
Thanks for your help
Shobi

python search wrapper error

Hi, I'm using the python wrapper index.search to retrieve nearest neighbor, but got this error:

python: ../kgraph-data.h:191: kgraph::MatrixProxy<DATA_TYPE, A>::MatrixProxy(PyArrayObject*) [with DATA_TYPE = float; unsigned int A = 16u; PyArrayObject = tagPyArrayObject_fields]: Assertion`stride % A == 0' failed.

Any idea?

kgraph-data.h error

I'm trying building and installation.

but I have an error and can not build it. The file is kgraph-data.h.

The error is as follows.

In file included from /root/Regard3D/src/thirdparty/kgraph/metric.cpp:2:0:
/root/Regard3D/src/thirdparty/kgraph/kgraph-data.h: In member function ‘void kgraph::Matrix<T, A>::reset(unsigned int, unsigned int)’:
/root/Regard3D/src/thirdparty/kgraph/kgraph-data.h:109:28: error: there are no arguments to ‘memalign’ that depend on a template parameter, so a declaration of ‘memalign’ must be available [-fpermissive]
data = (char *)memalign(A, row * stride); // SSE instruction needs data to be aligned
^~~~~~~~
/root/Regard3D/src/thirdparty/kgraph/kgraph-data.h:109:28: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
thirdparty/kgraph/CMakeFiles/kgraph.dir/build.make:86: recipe for target 'thirdparty/kgraph/CMakeFiles/kgraph.dir/metric.cpp.o' failed
make[2]: *** [thirdparty/kgraph/CMakeFiles/kgraph.dir/metric.cpp.o] Error 1
CMakeFiles/Makefile2:403: recipe for target 'thirdparty/kgraph/CMakeFiles/kgraph.dir/all' failed
make[1]: *** [thirdparty/kgraph/CMakeFiles/kgraph.dir/all] Error 2
Makefile:151: recipe for target 'all' failed
make: *** [all] Error 2

website down

Hi,
this isn't related directly to this repo, but this seemed to be the easiest way of letting you know...

The kgraph.org website shows error "403 Forbidden".

Trouble running "make release"

If I clone the repo right now and run "make release", this is what happens:

$ make release
g++ -DKGRAPH_VERSION=\"0ffe16c\" -DKGRAPH_BUILD_ID=\"\" -DKGRAPH_BUILD_NUMBER=\"\" -fPIC -Wall -g -std=c++11 -I. -fopenmp -O3 -msse2  -c kgraph.cpp 
g++ -DKGRAPH_VERSION=\"0ffe16c\" -DKGRAPH_BUILD_ID=\"\" -DKGRAPH_BUILD_NUMBER=\"\" -fPIC -Wall -g -std=c++11 -I. -fopenmp -O3 -msse2  -c metric.cpp 
g++ -shared -o libkgraph.so kgraph.o metric.o -lboost_timer -lboost_chrono -lboost_system -lboost_program_options -lgomp -lm -lrt
g++ -DKGRAPH_VERSION=\"0ffe16c\" -DKGRAPH_BUILD_ID=\"\" -DKGRAPH_BUILD_NUMBER=\"\" -fPIC -Wall -g -std=c++11 -I. -fopenmp -O3 -msse2  -static -fopenmp -o index index.cpp kgraph.o metric.o -lboost_timer -lboost_chrono -lboost_system -lboost_program_options -lgomp -lm -lrt
g++ -DKGRAPH_VERSION=\"0ffe16c\" -DKGRAPH_BUILD_ID=\"\" -DKGRAPH_BUILD_NUMBER=\"\" -fPIC -Wall -g -std=c++11 -I. -fopenmp -O3 -msse2  -static -fopenmp -o search search.cpp kgraph.o metric.o -lboost_timer -lboost_chrono -lboost_system -lboost_program_options -lgomp -lm -lrt
g++ -DKGRAPH_VERSION=\"0ffe16c\" -DKGRAPH_BUILD_ID=\"\" -DKGRAPH_BUILD_NUMBER=\"\" -fPIC -Wall -g -std=c++11 -I. -fopenmp -O3 -msse2  -static -fopenmp -o prune prune.cpp kgraph.o metric.o -lboost_timer -lboost_chrono -lboost_system -lboost_program_options -lgomp -lm -lrt
g++ -DKGRAPH_VERSION=\"0ffe16c\" -DKGRAPH_BUILD_ID=\"\" -DKGRAPH_BUILD_NUMBER=\"\" -fPIC -Wall -g -std=c++11 -I. -fopenmp -O3 -msse2  -static -fopenmp -o split split.cpp kgraph.o metric.o -lboost_timer -lboost_chrono -lboost_system -lboost_program_options -lgomp -lm -lrt
g++ -DKGRAPH_VERSION=\"0ffe16c\" -DKGRAPH_BUILD_ID=\"\" -DKGRAPH_BUILD_NUMBER=\"\" -fPIC -Wall -g -std=c++11 -I. -fopenmp -O3 -msse2  -static -fopenmp -o fvec2lshkit fvec2lshkit.cpp kgraph.o metric.o -lboost_timer -lboost_chrono -lboost_system -lboost_program_options -lgomp -lm -lrt
make -C python
make[1]: Entering directory `/home/ubuntu/kgraph/python'
g++ -shared -Wl,--export-dynamic -o pykgraph.so -g -fPIC -std=c++11 -I/usr/include/python2.7 -I.. pykgraph.cpp -L../bin -L.. -lkgraph -lboost_python -lpython2.7
In file included from /usr/include/python2.7/numpy/ndarraytypes.h:1761:0,
                 from /usr/include/python2.7/numpy/ndarrayobject.h:17,
                 from pykgraph.cpp:5:
/usr/include/python2.7/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it by " \
  ^
make[1]: Leaving directory `/home/ubuntu/kgraph/python'
echo -DKGRAPH_VERSION=\"0ffe16c\" -DKGRAPH_BUILD_ID=\"\" -DKGRAPH_BUILD_NUMBER=\"\"
-DKGRAPH_VERSION="0ffe16c" -DKGRAPH_BUILD_ID="" -DKGRAPH_BUILD_NUMBER=""
rm -rf kgraph-release
mkdir kgraph-release
cp Makefile LICENSE kgraph.h kgraph-data.h index.cpp prune.cpp search.cpp flann_index.cpp flann_search.cpp split.cpp fvec2lshkit.cpp kgraph-release
cp Makefile.sdk kgraph-release/Makefile
mkdir kgraph-release/bin
cp libkgraph.so index search prune split fvec2lshkit  flann_index flann_search kgraph-release/bin
cp: cannot stat ‘flann_index’: No such file or directory
cp: cannot stat ‘flann_search’: No such file or directory
make: *** [release] Error 1

nhoods[n].found in wrong place

In my opinion, the line
nhoods[n].found = uu > 0;
should be deleted, but put a line in parallel_try_insert()
if(l<params.K) found = true
before the last line
return l;
image
Becasue update() checks nhood.found to find a new radiusM. These actions (check and find) are about the node itself, not its neighbors.
image

Setup.py issue

I am currently trying to run the setup.py file, but when I run the python setup.py install command, I get this error: CalledProcessError: Command 'git describe --always' returned non-zero exit status 128. Any help you can offer would be greatly appreciated.

What does the function prune2() do?

From the wiki, I know the function prune2() is to improve online search speed.
But when I read the code, I do not know what happened, and how does the function improve online search speed?
Can you give more details?

PyPI package

Would it be possible to submit the python wrapper to PyPI?

RuntimeError: FATAL: module compiled as little endian, but detected different endianness at runtime

I'm using Ubuntu 20.04, and installed the kgraph package following the instructions in the README file. The only small tweak I had to do in order to compile was to create a link to like so:

sudo ln -s libboost_python38.so libboost_python3.so

after that, the setup.py script executed correctly (I used the --user flag to install without root privilege).
When I try to import pykgraph as in the example, I get:

RuntimeError: FATAL: module compiled as little endian, but detected different endianness at runtime
Segmentation fault (core dumped)

Any ideas how to fix this?

suggestion to merge rnn_new and rnn_old in kgraph.c

rnn_new is a list of objects whose distance to nhood_o is larger than nhood_o.radiusM. Each of rnn_new is an object taking nhood_o as a member of pool.
rnn_old has the same meaning but nn.flag is FALSE, which has no difference to nhood_o.
image
I would suggest using one list 'rnn' to replace rnn_new and rnn_old.
In the latter part, insert rnn to nn_old because rnn is not so important to objects.
Thus way, I think, can reduce the workload of join().

Dynamic index update

Hi,

I'm retrieving nearest neighbors to generate mini-batches to train a neural network. A side effect is that embeddings of the data points are changing constantly. Can kgraph support dynamic updates of embeddings of data points?

I assume, though, that the distribution of the data does not change drastically, so helper structures splitting the input space may not need to be rebuilt.

Thanks!

8M 3D points cause out of memory?

I am trying to use kgraph to search nearest neighbors in large numbers of 3d points (around 8M, float).
But the memory consumption went to 12G when I build them, and killed itself for some reason.
In general, does kgraph good at low dimensional data?

Thank you.

Generating control...
Initializing...
iteration: 1 recall: 0 accuracy: 5598.69 cost: 4.60438e-05 M: 8.81314 delta: 1 time: 56.845 one-recall: 0 one-ratio: 1681.14
iteration: 2 recall: 0.0004 accuracy: 653.841 cost: 6.64835e-05 M: 8.81314 delta: 0.958052 time: 97.4261 one-recall: 0 one-ratio: 239.893
iteration: 3 recall: 0.0056 accuracy: 78.795 cost: 9.06406e-05 M: 10.6469 delta: 0.957443 time: 141.039 one-recall: 0.04 one-ratio: 37.5534
iteration: 4 recall: 0.0696 accuracy: 8.98511 cost: 0.000114621 M: 10.6864 delta: 0.951843 time: 184.422 one-recall: 0.26 one-ratio: 5.3301
iteration: 5 recall: 0.48 accuracy: 0.913621 cost: 0.00013838 M: 10.82 delta: 0.897214 time: 226.866 one-recall: 0.85 one-ratio: 1.21359
iteration: 6 recall: 0.9696 accuracy: 0.0216504 cost: 0.000161372 M: 11.7296 delta: 0.556845 time: 264.739 one-recall: 1 one-ratio: 1
iteration: 7 recall: 1 accuracy: 0 cost: 0.000199729 M: 18.025 delta: 0.205682 time: 316.05 one-recall: 1 one-ratio: 1
Killed

I am wondering is this normal, maybe the way I am using is wrong?

typedef kgraph::MatrixOracle<float, kgraph::metric::l2sqr> Oracle;
typedef kgraph::Matrix<float> Matrixf;
void InitCopyData(const std::vector<Eigen::Vector3f>& points,
    const std::vector<Eigen::Vector3f>* normals) {
  Matrixf data(points.size(), 3);
  for (uint32_t i = 0; i < points.size(); i++) {
    float* row = data[i];
    for (int j = 0; j < 3; j++, row++) {
      *row = points[i](j);
    }
  }
  oracle_.reset(new Oracle(data));
  LOG(INFO) << "Finished Copying";
  kgraph::KGraph::IndexParams params;
  params.K = 25;
  params.L = 100;
  params.S = 10;
  params.R = 100;
  params.controls = 100;
  params.recall = 0.99;
  params.delta = 0.002;
  params.reverse = 0;

  LOG(INFO) << "Building kgraph";
  index_->build(*oracle_.get(), params);
  LOG(INFO) << "Finished building kgraph";
}

how to update the radius in local join?

I am reading the source code. But I have a question about the local join in KGraphConstruction class.
I don't understand the update progress of the radius of Nhood. In init(), the radius is set to numeric_limits::max(). And the size of nhood.pool is set to param.L. After the join process, the radius is updated to nhood.pool.back().dist. But if the neighbors can not fill the pool size(param.L), the radius would be set to 0. This is confusing. Can you please explain the updating of radius in the construction process of KGraphConstruction?

Usage of Vector oracle

I need some help to use Kgraph and oracle.

I have Two data set.
A, B

Each data set has thousands of 30 dimension data.
A[30][N] B[30][M] N is not equal to M

I want to find the nearest neighbor(k=1) of A[30][i] in B

and vice-versa.

So I made this vector oracle

`

	        using typeECSAD = std::array<float, 30>;
               auto fn_l2_euclid = [](typeECSAD const &a, typeECSAD const &b)
 		{
 			float sum = 0.0f;
 			for (int i = 0; i < 30; i++)
 			{
 				sum += pow((a[i] - b[i]), 2);
 			}
 			sum = sqrt(sum);
 			return sum;
 		};
	using kgraph_ECSAD_oracle = kgraph::VectorOracle<std::vector<typeECSAD>, typeECSAD>;
	kgraph_ECSAD_oracle oracle_fI(m_FeaturesSet[fi], fn_l2_euclid);
	kgraph_ECSAD_oracle oracle_fJ(m_FeaturesSet[fj], fn_l2_euclid);

	kgraph::KGraph  *index_fI = kgraph::KGraph::create();
	kgraph::KGraph  *index_fJ = kgraph::KGraph::create();
	kgraph::KGraph::IndexParams params;

	index_fI->build(oracle_fI, params, NULL);
	index_fJ->build(oracle_fJ, params, NULL);`

I am not sure that I did it in the right way.

Anyway, it makes the distance between two data during the index building.

But, I have a problem.

It holds at parallel_try_insert in sudden.
It's ok couple of times, but suddenly it stops

Environment: visual studio 2013 update 5, win7 64bit

       unsigned parallel_try_insert (unsigned id, float dist) {
                if (dist > radius) return pool.size();
                LockGuard guard(lock);
                unsigned l = UpdateKnnList(&pool[0], L, Neighbor(id, dist, true));
                if (l <= L) { // inserted
                    if (L + 1 < pool.size()) { // if l == L + 1, there's a duplicate
                        ++L;
                    }
                    else {
                        radius = pool[L-1].dist;
                    }
                }
                return l;
            }

Finding out number of distances computed when `reverse=-1`

We were trying estimate the scan rate (as defined in your paper) of your algorithm on tiny-imagenet.

On setting reverse=0 , this is the number of total distance evaluations iterations*K^2 if I understand correctly. However, I was wondering how one could obtain the total number of distance evaluations when reverse=-1 from the code.

Also what do the parameters L, M, P, S, T mean?

Obtain edge weights of kNN graph

I'm using the Python API of the library.
I correctly generated the kNN using the instruction index.search(query, k=100), but I also want the edge weights of the kNN graph.
How is it possible to obtain them?
Thanks in advance!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.