ubilabs / kd-tree-javascript Goto Github PK

JavaScript k-d Tree Implementation

License: MIT License

JavaScript 100.00%

kd-tree-javascript's Introduction

k-d Tree JavaScript Library

A basic but super fast JavaScript implementation of the k-dimensional tree data structure.

As of version 1.01, the library is defined as an UMD module (based on https://github.com/umdjs/umd/blob/master/commonjsStrict.js).

In computer science, a k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest neighbor searches). k-d trees are a special case of binary space partitioning trees.

Demos

Spiders - animated multiple nearest neighbour search
Google Map - show nearest 20 out of 3000 markers on mouse move
Colors - search color names based on color space distance
Mutable - dynamically add and remove nodes

Usage

Using global exports

When you include the kd-tree script via HTML, the global variables kdTree and BinaryHeap will be exported.

// Create a new tree from a list of points, a distance function, and a
// list of dimensions.
var tree = new kdTree(points, distance, dimensions);

// Query the nearest *count* neighbours to a point, with an optional
// maximal search distance.
// Result is an array with *count* elements.
// Each element is an array with two components: the searched point and
// the distance to it.
tree.nearest(point, count, [maxDistance]);

// Insert a new point into the tree. Must be consistent with previous
// contents.
tree.insert(point);

// Remove a point from the tree by reference.
tree.remove(point);

// Get an approximation of how unbalanced the tree is.
// The higher this number, the worse query performance will be.
// It indicates how many times worse it is than the optimal tree.
// Minimum is 1. Unreliable for small trees.
tree.balanceFactor();

Using RequireJS

requirejs(['path/to/kdTree.js'], function (ubilabs) {
	// Create a new tree from a list of points, a distance function, and a
	// list of dimensions.
	var tree = new ubilabs.kdTree(points, distance, dimensions);

	// Query the nearest *count* neighbours to a point, with an optional
	// maximal search distance.
	// Result is an array with *count* elements.
	// Each element is an array with two components: the searched point and
	// the distance to it.
	tree.nearest(point, count, [maxDistance]);

	// Insert a new point into the tree. Must be consistent with previous
	// contents.
	tree.insert(point);

	// Remove a point from the tree by reference.
	tree.remove(point);

	// Get an approximation of how unbalanced the tree is.
	// The higher this number, the worse query performance will be.
	// It indicates how many times worse it is than the optimal tree.
	// Minimum is 1. Unreliable for small trees.
	tree.balanceFactor();
});

Example

var points = [
  {x: 1, y: 2},
  {x: 3, y: 4},
  {x: 5, y: 6},
  {x: 7, y: 8}
];

var distance = function(a, b){
  return Math.pow(a.x - b.x, 2) +  Math.pow(a.y - b.y, 2);
}

var tree = new kdTree(points, distance, ["x", "y"]);

var nearest = tree.nearest({ x: 5, y: 5 }, 2);

console.log(nearest);

About

Developed at Ubilabs. Released under the MIT Licence.

kd-tree-javascript's People

Contributors

Stargazers

Watchers

Forkers

nakosung sesamesushi bily suite117 mariorossano estebanz01 oss ajay1900 fth-ship philogb imclab fungiboletus factoidforrest manfredik ouhouhsami linusvanelswijk lucas-solutions madelfio jolsalazar makir215 dalequark curzonj cnc-viewer roma07 wellic jjviscomi davoli mcanthony csbrandt rwzhao ottojursch altafganihar sangkyunyoon zhonghai exialym christopher-fu allandragoon wegiangb smandyaj patricknelson meodai dingmingxin badforlabor nis24jit horasio rainerhessmer monad-one wahabtobibello swustlqsh rstallman lp249839965 jdthorpe daxianyu dieguito12 haoyaogang khaledsmq aweffr hyeong-jin xevrem shenhuang gifrancohe jrodriguesimg zuchuanzhang pandinosaurus bssrdf elmarti rpopovici sarthaklaghate katacka semirelezovikj rozhkovv mookat19 elijahkotyluk polaris-here haoyu2 jaemin032 mootari hellicesaouli thinkbotsarefree cujhg wwwk zhaolihang ebhsgit xeolabs ue007 amber-372 lisihang biomotion musixnotmusic ehtick wja36rus jupyterjones mariosgit b4park shortshor mzeey learningunity bluntworks vitaly-z easytree

kd-tree-javascript's Issues

Improper node placement

Using a large dataset ( > 500 ) and removing nodes at random will cause the KDTree to not be segmented correctly. This will cause nodes to not be found correctly.

Create a set of random 2d points
Find the single nearest point within a random 2d point.
Remove the single nearest point.
Repeat from step 2 until you have no more points.

Expected outcome. We are able to remove all records from the tree.

Actual: There are records left, as the tree has made some nodes "unreachable" by not placing them correctly due to the remove function.

Let me know if you want a proof of bug.

Distance metrics such as cosine similarity don't work

When using a metric such as Cosine similarity, the returned vector is almost never the one with the smallest distance value.
Example:

Dataset

var points = [
{ 'x': 61, 'y': 54, 'id': 'A' },
{ 'x': 7, 'y': 54, 'id': 'B' },
{ 'x': 62, 'y': 58, 'id': 'C' },
{ 'x': 75, 'y': 48, 'id': 'D' },
{ 'x': 76, 'y': 52, 'id': 'E' },
{ 'x': 41, 'y': 16, 'id': 'F' },
{ 'x': 35, 'y': 35, 'id': 'G' },
{ 'x': 34, 'y': 63, 'id': 'H' },
{ 'x': 22, 'y': 32, 'id': 'I' },
{ 'x': 0, 'y': 53, 'id': 'J' },
{ 'x': 13, 'y': 91, 'id': 'K' },
{ 'x': 87, 'y': 24, 'id': 'L' },
{ 'x': 63, 'y': 2, 'id': 'M' },
{ 'x': 10, 'y': 76, 'id': 'N' },
{ 'x': 85, 'y': 98, 'id': 'O' },
{ 'x': 94, 'y': 90, 'id': 'P' },
{ 'x': 52, 'y': 47, 'id': 'Q' },
{ 'x': 100, 'y': 38, 'id': 'R' },
{ 'x': 95, 'y': 47, 'id': 'S' },
{ 'x': 62, 'y': 15, 'id': 'T' },
{ 'x': 93, 'y': 66, 'id': 'U' },
{ 'x': 46, 'y': 54, 'id': 'V' },
{ 'x': 85, 'y': 99, 'id': 'W' },
{ 'x': 32, 'y': 53, 'id': 'X' },
{ 'x': 11, 'y': 37, 'id': 'Y' },
{ 'x': 0, 'y': 54, 'id': 'Z' },
{ 'x': 90, 'y': 100, 'id': 's' },
{ 'x': 84, 'y': 58, 'id': 't' },
{ 'x': 97, 'y': 35, 'id': 'u' },
{ 'x': 24, 'y': 34, 'id': 'v' },
{ 'x': 67, 'y': 70, 'id': 'w' },
{ 'x': 16, 'y': 7, 'id': 'x' },
{ 'x': 27, 'y': 73, 'id': 'a' },
{ 'x': 0, 'y': 35, 'id': 'b' },
{ 'x': 97, 'y': 47, 'id': 'c' },
{ 'x': 44, 'y': 56, 'id': 'd' },
{ 'x': 23, 'y': 11, 'id': 'e' },
{ 'x': 3, 'y': 80, 'id': 'f' },
{ 'x': 87, 'y': 27, 'id': 'g' },
{ 'x': 42, 'y': 29, 'id': 'h' },
{ 'x': 77, 'y': 72, 'id': 'i' },
{ 'x': 40, 'y': 10, 'id': 'j' },
{ 'x': 86, 'y': 91, 'id': 'k' },
{ 'x': 43, 'y': 23, 'id': 'l' },
{ 'x': 55, 'y': 13, 'id': 'm' },
{ 'x': 88, 'y': 14, 'id': 'n' },
{ 'x': 67, 'y': 22, 'id': 'o' },
{ 'x': 88, 'y': 91, 'id': 'p' },
{ 'x': 82, 'y': 33, 'id': 'q' },
{ 'x': 97, 'y': 29, 'id': 'r' }
]

Cosine similarity (note the negative sign at the end of the function, as it would normally increase with similar vectors):

    const cosineSimilarity = (vec1, vec2) => {
        let dotProduct = 0;
        let magnitude1 = 0;
        let magnitude2 = 0;

        for (const dim of Object.keys(vec1)) {
            if (typeof (vec1[dim]) != 'number') continue;

            dotProduct += vec1[dim] * vec2[dim];
            magnitude1 += Math.pow(vec1[dim], 2);
            magnitude2 += Math.pow(vec2[dim], 2);
        }

        return -dotProduct / (Math.sqrt(magnitude1) * Math.sqrt(magnitude2));
    };

Inefficient function the get the closest vectors from an array.

    const closestVectors = (vec, vectors) => {
        const distances = vectors.map(v => cosineSimilarity(vec, v));
        const min = Math.min(...distances);
        // Check if multiple with same distance
        const minIndices = distances.reduce((a, e, i) => {
            if (e === min) a.push(i);
            return a;
        }, []);
        // Return list of closest vectors
        return { vectors: minIndices.map(i => vectors[i]), distances: minIndices.map(i => distances[i]) };
    };

Testing

    const test = { x: 200, y: 123, label: "TEST" }
    var tree = new kdTree(points, cosineSimilarity, ['x', 'y']);

    const nearest = tree.nearest(test, 1);
    const nearest2 = closestVectors(test, points);

    console.log("K-D Tree")
    console.log(nearest)
    console.log("Custom")
    console.log(nearest2)

  // > K-D Tree
  // > [ [ { x: 76, y: 52, id: 'E' }, -0.998815642613221 ] ]
  // > Custom
  // > {
  //     vectors: [ { x: 75, y: 48, id: 'D' } ],
  //     distances: [ -0.999839132267399 ]
  //   }

Clearly the tree returns a vector with a slightly larger distance (0.998 vs 0.999).

A question about Google Map demo

When I download the code about "Google Map" demo and run, there is question that

Refused to execute script from 'http://localhost:63342/kd-tree-javascript-master/examples/map/data.json' because its MIME type ('application/json') is not executable, and strict MIME type checking is enabled.

I guess that the file "data.json" is not strictly a json file. So I recommand we rename the file with 'data.js', and this can solve the problem.

bug!bug!bug!

const points = [{ 'y': 2, 'x': 110 }, { 'y': 2, 'x': 222 }, { 'y': 2, 'x': 333 }, { 'y': 2, 'x': 444 }, { 'y': 2, 'x': 555 }, { 'y': 2, 'x': 666 }, { 'y': 2, 'x': 777 }, { 'y': 2, 'x': 888 }, { 'y': 2, 'x': 999 }, { 'y': 2, 'x': 1110 }]

R = 1000

            const tree = new kdTree([...points], distance, ["x", "z"]);

            const obj = {};
            
                const nearest = tree.nearest(points[0], 10, R);

// const nearest = tree.nearest(point[points.length-1],10,R)

In a line points , the first point and the last point , get the error data

Not choosing random node if multiple nodes have same distance

I originally posted this issue to knn, but I realised that the issue was due to a limitation here (#19).

If there are 2 nearest nodes with the exact same distance to the test node, the returned node will always be the last one.
Example:

var points = [
    { x: 1, y: 2, label: 'A' },
    { x: 3, y: 4, label: 'B' },
    { x: 5, y: 6, label: 'C' },
    { x: 5, y: 6, label: 'E' },
    { x: 7, y: 8, label: 'D' },
];

var distance = function (a, b) {
    return Math.pow(a.x - b.x, 2) + Math.pow(a.y - b.y, 2);
}

var tree = new kdTree(points, distance, ["x", "y"]);

var nearest = tree.nearest({ x: 5, y: 5 }, 1);

console.log(nearest);
// > [ [ { x: 5, y: 6, label: 'E' }, 1 ] ]

Both node E and node C have the same distance from the test node (distance=1), yet node E is always returned no matter what.

Expected behaviour:
If multiple nodes could be chosen as the nearest node and maxNodes is set to 1, the returned node should be chosen at random.

Load from existing tree

Functionality to recreate a KDTree index from raw Array Buffer data
Like this: KDTree.from(data)

A question about dimension parameter

I have items like that:

[{
   someField1:"someValue",
   someField2:"someValue",
   coordinate: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, ...]  //more than 100 dimensions
}, 
....
]

What I have to set in dimension parameter for correct work?

Readme has the MIT license, the package.json license field says "UNLICENSED"

It's not clear whether this project is meant to be dual licensed under both the MIT license and the Unlicense, or if there was a change at some point and nobody updated the package.json file.

Publish to NPM

Is it possible to publish this library to NPM?

Tttt

Bugs with quickly removing and inserting elements.

Hello and thank you for creating this library. I am creating a game that uses this library to store all enemy locations for fast nearest neighbor search. Each enemies location changes about once a second and so I remove them and insert them back into the graph to update (this may not be the best way to do it). The following coffeescript demonstrates some errors that are occurring.

enemyCount: () ->
  countR = (n) ->
    return 0 if n is null
    return 1 + countR(n.left) + countR(n.right)
  countR(@enemyKdTree.root)

#using axial coord hex grid distance function
distanceFun = (a,b) -> Hex::distanceTo.call a, b
@enemyKdTree = new kdTree [], distanceFun, ['q', 'r']

#create 
foo = []
for i in [0..200] by 1
  foo[i] = { q: Math.randInt(0,30), r: Math.randInt(0,30) }
  @enemyKdTree.insert foo[i]

for i in [0..200] by 1
  @enemyKdTree.remove foo[i]
  #update fields leaving obj ref the same
  foo[i].q = Math.randInt(0,30)
  foo[i].r = Math.randInt(0,30)
  @enemyKdTree.insert foo[i]

console.log enemyCount()
#This should remain 200 butgets number that are super off and different each time!
# First few runs = [285, 281, 290, 235]

The error rate being proportional to scale and being non deterministic makes it feel like a concurrency issue. Any thoughts? I'll try going through the code also.

Thanks

Add functions to insert and remove elements after the tree has been created

I'm looking to us a KD-tree for spatial clustering of markers on Node.js. For this the KD-tree would need to support inserting and removing elements after it has been created because it needs to be uopdated when markers are either removed from or inserted to the database.

Thanks, Rob

.

Nearest function not operating as intended

I am trying to use this library to solve the all-nearest-neighbors problem in a function I am using, but I am receiving unexpected output from the library. I have attached a codepen showing my process and the output of the codepen is wrong unless I am using this library incorrectly.

Thank you!

Add TS Support

Adding TS Typing adds many benefits such as improved IDE intellisense and general TS functionality

Proposed approach

Migrate the current project to use a basic implementation of TypeScript
Remove UMD, add UMD as a TS build target.
Use TS/ES6 class for the kdTree class
Use a class-level generic type to specify the return type for method calls. - this should not affect JS execution, but will provide a useful failure to typescript users.

I will PR for this, but I may publish an NPM package if I don't hear back within a month.

The remove method has errors (not really removing some nodes)

After having removed some nodes, they can still be found in the kdTree in some situations.

Here is an example of the remove method errors:

https://gist.github.com/santoxi/ba8d283f9a21523fa45d23d8e9c3046e

as a temporary work-around, I added a fourth argument to "nearest" to pass a predicate to filter the matched nodes, which might be also useful to add:

    this.nearest = function (point, maxNodes, maxDistance, predicate) {
			predicate = predicate || (x => true);
....
        linearDistance = metric(linearPoint, node.obj);

        if (node.right === null && node.left === null) {
          if (predicate(node.obj) && (bestNodes.size() < maxNodes || ownDistance < bestNodes.peek()[1])) {
            saveNode(node, ownDistance);
          }
          return;
        }
....
        nearestSearch(bestChild);

        if (predicate(node.obj) && (bestNodes.size() < maxNodes || ownDistance < bestNodes.peek()[1])) {
          saveNode(node, ownDistance);
        }

bug!bug!bug!

R = 1000

            const tree = new kdTree([...points], distance, ["x", "z"]);

            const obj = {};
            
                const nearest = tree.nearest(points[0], 10, R);

// const nearest = tree.nearest(point[points.length-1],10,R)

In a line points , the first point and the last point , get the error data

when building kd-tree, sort break the binding between indces and points

I think I'm removing nodes too quickly

I've run into a problem where I do something like:

for node in nodesNotInTree {
   var closestNode = tree.nearest(node, 1)[0][0];
   tree.remove(closestNode);
}

This is an attempt to match every node in nodesNotInTree to a unique node in tree. However, for some reason, this method of removal isn't working properly. Instead, closestNode is the closest node from the original tree, instead what it should be: of the closest node in the tree that hasn't yet been accessed.

Hope that explained my problem clearly. Am I doing things in the wrong order?

var kdtree = require('./kdtree'); // an npm module would be awesome ;-)
var Benchmark = require('benchmark');

var points = [
  {x: 1, y: 2},
  {x: 3, y: 4},
  {x: 5, y: 6},
  {x: 7, y: 8}
];

var distance = function(a, b){
  return Math.pow(a.x - b.x, 2) +  Math.pow(a.y - b.y, 2);
}
var tree = new kdTree(points, distance, ["x", "y"]);

var suite = new Benchmark.Suite();
suite.add('Tree#nearest', function() {
    tree.nearest({ x: 5, y: 5 }, 2);
})
.on('cycle', function(event) {
    console.log(String(event.target));
})
.run({ 'async': true });

On node 0.10.24 (tested on 0.10.39 with similar results)

Tree#nearest x 321,548 ops/sec ±1.68% (92 runs sampled)

On node 0.12.6 (tested on 0.12.0 with similar results)

Tree#nearest x 82,008 ops/sec ±3.13% (75 runs sampled)

Any idea on the matter?