Git Product home page Git Product logo

m-tree's Introduction

An m-tree is a data structure which indexes objects according to their relative
distances. It is efficient for nearest-neighbor queries.

This implementation follows the content of the article
http://www.vldb.org/conf/1997/P426.PDF, with the following highlights:
 * The data structure is the same described in the article.
 * The algorithm for adding data had adaptations to handle with some corner cases.
 * The algorithm for removing data was designed from scratch, since it was not
   described on the article.
 * Both query algorithms (by-range and k-nearest-neighbor) were merged into one.

The query algorithm can have as criteria either the range or the maximum number
of resulting items, or both. The algorithm processes the results as needed, as
the resulting items are fetched. The results are returned in non-decreasing distance
from the query data parameter.

MTree is the core class that implements an m-tree. The data objects can be
any object that are understood by the distance function.
Some examples:
 * Data objects could be N-dimensional-space coordinates and the distance function
   could return the euclidean distance between two data objects.
 * Data objects could be regular strings and the distance function could calculate
   the edit (Levenshtein) distance between two strings.
The distance function must be provided by the user.
Two equal objects should not be added to the same MTree instance. There is no
validation regarding this and, if done, the behavior of the tree is undefined.

Given a data object type and a distance function, some aspects can directly
impact the performance of an m-tree: the constraints on the number of children for
the nodes and the support functions (split, promotion, and partition functions).
An m-tree is implemented as a tree (really!? :-P) and the minimum and maximum
number of children on each node can be customized. When the maximum capacity of
a node is exceeded, a split function is used to split the node, which is replaced
by two new nodes. The split function can be seen as being composed by a promotion
function and a partition function. The promotion function must choose (promote)
two children from the node that will be split. The partition function must partition
the set of children into two sets corresponding to the promoted children. The two
promoted children and their corresponding partitions will be used to create the
two nodes that will replace the exceeding node.
There are some pre-defined implementations for promotion and partition functions,
and the user can implement new ones at will.


The yet-to-be-implemented MTreeDict class wraps an MTree and a mapping structure,
so it helps on keeping track of objects already added. The data objects work
as keys for associated value objects, so MTreeDict can be used as a dictionary.


The yet-to-be-implemented MTreeMultiDict works like the MTreeDict, but allows
associating more than one value object to the same key. The number of values
associated to the same key are taken into account when performing queries
constrained by the number of results.



See the README file specific for each language:
 * README-py for Python.
 * README-cpp for C++.
	

For more information see:
	http://en.wikipedia.org/wiki/M-tree
	http://www.vldb.org/conf/1997/P426.PDF

m-tree's People

Contributors

erdavila avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.