Git Product home page Git Product logo

rann's Introduction

RANN

Release Version CRAN_Status_Badge Build Status Downloads

Finds the k nearest neighbours for every point in a given dataset in O(N log N) time using Arya and Mount's ANN library (v1.1.3). There is support for approximate as well as exact searches, fixed radius searches and bd as well as kd trees.

This package implements nearest neighbors for the Euclidean (L2) metric. For the Manhattan (L1) metric, install the RANN1 package.

For further details on the underlying ANN library, see http://www.cs.umd.edu/~mount/ANN.

Installation

Released versions

The recommendation is to install the released version from CRAN by doing:

install.packages("RANN")

Bleeding Edge

You can, however, download the tar ball, and run R CMD INSTALL on it, or use the devtools package to install the development version:

# install.packages("devtools")

devtools::install_github("jefferis/RANN")

Note: Windows users need Rtools and devtools to install this way.

Feedback

Please feel free to:

Copyright and License

see inst/COPYRIGHT and DESCRIPTION files for copyright and license information.

rann's People

Contributors

jefferis avatar krlmlr avatar mattbk avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

rann's Issues

weighted distance

I enjoy using the package and wonder if you can add the weight feature into the NN2 function where distances are normalised/adjusted according the weights ? (meanwhile, if there is any hack I can do, I appreciate any comments)

RANN.L1 remote dependency

I'm creating a package which requires "RANN.L1" as a dependency. The package "RANN.L1" is therefore listed under Imports: in the DESCRIPTION file of the package, and the remote dependency location is specified as jefferis/RANN@master-L1 under Remotes: in the same file. However, as RANN.L1 is a branch of the RANN repo, the package build under Travis CI fails, presumably as it can't associate the "RANN.L1" import with the "RANN" remote. Any suggestions? Having the jefferis/RANN@master-L1 branch as a separate repo under jefferis/RANN.L1 could be a quick fix? Thanks.

Objects with zero distance

I am using the version in CRAN, 2.4.1. I have a dataset that can contain objects with distances of zero. When I run a nearest neighbor search for all objects within a specified radius, the indexes can become switched. For example:

nn.idx[70,] = 71 70
instead of the expected 70 71.

I'm not sure if objects with zero distance are not allowed, or if its a bug. I can supply example data if necessary.

Solaris compile error ERR

On 11 Dec 2018, at 09:00, Prof Brian Ripley [email protected] wrote:

This concerns packages

FUNLDA RANN RJafroc SGL distances mixAK projmanr yaImpute

R-devel has switched to C++11 as the default standard, and your packages failed to compile on Solaris: see
https://www.stats.ox.ac.uk/pub/bdr/Solaris-gcc/ for the installation logs. (g++ 5.2 was used: Solaris is not often
updated ....)

You were warned about ERR in 'Writing R Extensions' ยง1.6.4 : it looks like 'EBP' gives another header clash.

For 'distances', you are trying to compile in a sub-directory with C++98 yet the src directory with C++11: they cannot
portably be mixed. Use one or the other for both.

Please correct ASAP and before Jan 11 to safely retain the package on CRAN.

points to line distance

Hi, thanks for the nn2 function, it has transformed the efficiency of distance calculations for large datasets. I have a related problem, where I would like to calculate the distance from each point to the closest point on a line. I can convert the line to a set of coordinates and then run nn2, however, this is sort of cheating as it doesn't formally calculate distance to the line that connects each of those coordinates. I just wondered whether you were working on (or could be persuaded to! :)) a function to do this? At the moment dist2line from geosphere is the best option I have, but it is very slow, even if parallelized, on large datasets. I'm working on a function to first find the nearest line, then sample points along that line and run nn2, but its pretty inelegant...

Thanks again!

return the number of point within a radius

In page 10 of the manual of ANN (https://www.cs.umd.edu/~mount/ANN/Files/1.1.2/ANNmanual_1.1.pdf) . It mentioned that we can return the number of points within a radius when setting the ANNdist to sqRad and k = 0. This is very useful when we are trying to count the number of points for each sample whose distance is within the radius. Can you help to also return this value?

I think ANNkdFRPtsInRange is actually the value we want. But I don't know how can we return that value in the get_NN_2Set and nn2 function

int ANNkd_tree::annkFRSearch(
	ANNpoint			q,				// the query point
	ANNdist				sqRad,			// squared radius search bound
	int					k,				// number of near neighbors to return
	ANNidxArray			nn_idx,			// nearest neighbor indices (returned)
	ANNdistArray		dd,				// the approximate nearest neighbor
	double				eps)			// the error bound
{
	ANNkdFRDim = dim;					// copy arguments to static equivs
	ANNkdFRQ = q;
	ANNkdFRSqRad = sqRad;
	ANNkdFRPts = pts;
	ANNkdFRPtsVisited = 0;				// initialize count of points visited
	ANNkdFRPtsInRange = 0;				// ...and points in the range

	ANNkdFRMaxErr = ANN_POW(1.0 + eps);
	ANN_FLOP(2)							// increment floating op count

	ANNkdFRPointMK = new ANNmin_k(k);	// create set for closest k points
										// search starting at the root
	root->ann_FR_search(annBoxDistance(q, bnd_box_lo, bnd_box_hi, dim));

	for (int i = 0; i < k; i++) {		// extract the k-th closest points
		if (dd != NULL)
			dd[i] = ANNkdFRPointMK->ith_smallest_key(i);
		if (nn_idx != NULL)
			nn_idx[i] = ANNkdFRPointMK->ith_smallest_info(i);
	}

	delete ANNkdFRPointMK;				// deallocate closest point set
	return ANNkdFRPtsInRange;			// return final point count
}

Please ship declarations in inst/include so we can linkingTo your c++ functions

Hi @jefferis, it would be great if your c++ functions were exported by adding // [[Rcpp::interfaces(r, cpp)]] to the relevant c++ functions for using your indexing methods.

I've already used those in my projects because it uses one of the fastest implementations of indexing, but currently I'm just copying your code and using it my personal projects.

Thanks a lot!

release RANN1 to CRAN

RANN1 (by @krlmlr on top of the original RANN) supports the Manhattan (L1) norm rather than the standard Euclidean (L2) metric. It should be pretty much for release to CRAN โ€“ and I propose that he is the maintainer! Presumably the version numbers should be kept in sync as much as possible.

Consider provinding L1 metric in the RANN package

This was difficult to do, because the metric is defined via preprocessor macros. I still think we could make it work, by including the library twice in two namespaces. This would be a cleaner approach, one package less to maintain.

Support other Minkowski metrics

The ANN docs suggest that the Euclidean metric is used by default. Some digging in the code and in above docs suggest that this is also true for RANN.

According to Section 2.2.1 (p. 14 ff.), only the macros in the code linked above need to be redefined to use other Minkowski norms. To define a code base where Manhattan, Euclidean and "max" metric are simultaneously accessible, this probably needs to be replaced by a template parameter that defines the implementation of the four operators.

The purpose is to accelerate statistical matching using the Gower distance: I think I can transform a "Gower distance problem" to a "Manhattan distance problem" but then RANN should be able to solve it (it is called by the StatMatch package).

Does this make sense to you?

bd tree

the following line leaves my R process unresponsive, no segfault, no 100% cpu:

RANN::nn2(iris[,1:4], treetype = "bd")

treetype="kd" works fine.

tic::use_tic() for automatic deployment of pkgdown and testing on AppVeyor

Could you please install

remotes::install_github(c("ropenscilabs/tic", "ropenscilabs/travis"))

and run

tic::use_tic()

If asked to overwrite .travis.yml, select "yes". You may need to set up a Travis access token with travis::browse_travis_token() .

Benefits:

  1. Automatic building of pkgdown documentation
  2. We can build pkgdown for {RANN} and {RANN.L1} and host in parallel
  3. Testing on Windows

I can also do it if you increase my permissions to this repository.

I just noticed {RANN.L1} has been orphaned on CRAN, I'd like to update.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.