
dmaps's Introduction

DMAPS is a C++-powered Python library implementing the diffusion maps manifold learning algorithm. It provides fast multi-threaded calculation of distance matrices and diffusion coordinates.

Prerequisites

DMAPS includes all external dependencies in the code base (see acknowledgements). To build and install DMAPS you need the following:

  • A compiler with C++11 support
  • CMake >= 2.8.12
  • Python development libraries

Installation

Make sure CMake and the Python development libraries are installed. On Debian-based distributions, you can install them using the apt package manager.

$ sudo apt install cmake python-dev

With the prerequisites satisfied, all you need to do is clone the repository and pip install.

$ git clone https://github.com/hsidky/dmaps.git
$ pip install ./dmaps

Features

DMAPS is designed to perform nonlinear dimensionality reduction of high-dimensional data sets using the diffusion maps [1] algorithm. In particular, this implementation is geared towards the analysis of molecular trajectories as first described in Ref. [2]. Various metrics are provided to compute the distance matrix, with the ability to save and load data from disk. Both standard and locally-scaled diffusion maps can be generated. For local scaling, values of the kernel bandwidth are calculated using the scheme in Ref. [3]. It is also possible to weight the kernel for biased input data.

Calculations of the distance matrix and local scale estimates are accelerated using OpenMP multi-threading. To improve performance for large datasets, Spectra is used to compute only the top k eigenvectors requested by the user, since k is usually small compared with the number of data points. Eigen also uses SIMD instructions for efficient linear algebra operations.
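For readers new to the method, the standard (non-locally-scaled) diffusion maps construction described above can be sketched in plain NumPy. This is an illustrative reference, not the DMAPS implementation: the function name `diffusion_map`, the Gaussian kernel convention `exp(-d^2 / (2 * bandwidth^2))`, and the dense eigensolver are all assumptions made for the sketch. DMAPS itself uses Spectra to avoid the full decomposition.

```python
import numpy as np

def diffusion_map(points, bandwidth, k):
    """Top-k eigenvalues and right eigenvectors of the diffusion operator.

    Hypothetical helper for illustration; not part of the DMAPS API.
    """
    # Pairwise Euclidean distances, shape [n, n].
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Gaussian kernel (the exact convention is an assumption).
    kernel = np.exp(-dist ** 2 / (2.0 * bandwidth ** 2))

    # Row sums d_i. The Markov matrix is M = D^-1 K; we diagonalize its
    # symmetric conjugate S = D^-1/2 K D^-1/2, which has the same eigenvalues.
    degree = kernel.sum(axis=1)
    sym = kernel / np.sqrt(np.outer(degree, degree))

    # Dense symmetric eigendecomposition, sorted by decreasing eigenvalue.
    evals, evecs = np.linalg.eigh(sym)
    order = np.argsort(evals)[::-1][:k]

    # Map back to right eigenvectors of M: psi = D^-1/2 u.
    psi = evecs[:, order] / np.sqrt(degree)[:, None]
    return evals[order], psi

rng = np.random.default_rng(0)
points = rng.normal(size=(50, 3))
evals, psi = diffusion_map(points, bandwidth=1.0, k=3)
```

In this sketch the top eigenvalue is 1 and its eigenvector is constant, so the informative coordinates start at the second column. Dividing eigenvectors of the symmetric matrix by the top one gives the same coordinates up to a constant factor.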

Examples

Below is an example that demonstrates basic usage of DMAPS on the classic Swiss roll dataset. For more detailed examples see the examples folder.

import dmaps
import numpy as np
import matplotlib.pyplot as plt

# Assume we have the following numpy arrays:
# coords contains the [n, 3] generated coordinates for the Swiss roll dataset.
# color contains the position of the points along the main dimension of the roll. 
dist = dmaps.DistanceMatrix(coords)
dist.compute(metric=dmaps.metrics.euclidean)

# Compute top three eigenvectors. 
# Here we assume a good value for the kernel bandwidth is known.
dmap = dmaps.DiffusionMap(dist)
dmap.set_kernel_bandwidth(3)
dmap.compute(3)

# Plot result. Scale by top eigenvector.
v = dmap.get_eigenvectors()
w = dmap.get_eigenvalues()
plt.scatter(v[:,1]/v[:,0], v[:,2]/v[:,0], c=color)
# Raw strings keep the LaTeX backslash from being read as an escape sequence.
plt.xlabel(r'$\Psi_2$')
plt.ylabel(r'$\Psi_3$')
plt.show()

The above code produces the diffusion map below.

[Image: diffswiss]

That's pretty much it! Be sure to take a look in the examples folder for more sophisticated applications.
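The example above assumes that `coords` and `color` already exist. One possible way to generate them is sketched below; the helper name `make_swiss_roll` and the specific parametrization (a spiral angle between 1.5π and 4.5π, height up to 21) are assumptions chosen for illustration, not something prescribed by DMAPS.

```python
import numpy as np

def make_swiss_roll(n, seed=0):
    """Generate n points on a Swiss roll (hypothetical helper).

    Returns coords with shape [n, 3] and color with shape [n], where
    color is the position of each point along the spiral.
    """
    rng = np.random.default_rng(seed)
    # Angle along the spiral, drawn uniformly from [1.5*pi, 4.5*pi).
    t = 1.5 * np.pi * (1.0 + 2.0 * rng.random(n))
    # Position along the axis of the roll.
    height = 21.0 * rng.random(n)
    coords = np.column_stack((t * np.cos(t), height, t * np.sin(t)))
    return coords, t

coords, color = make_swiss_roll(2000)
```

A kernel bandwidth that works for one parametrization may not suit another, so the value of 3 used above should be re-checked for data generated this way.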

Acknowledgements

DMAPS makes use of the following open source libraries:

License

DMAPS is provided under an MIT license that can be found in the LICENSE file. By using, distributing, or contributing to this project, you agree to the terms and conditions of this license.

References

[1] Coifman, R. R., & Lafon, S. (2006). Appl. Comput. Harmon. Anal., 21(1), 5–30.

[2] Ferguson, A. L., et al. (2010). PNAS, 107(31), 13597–13602.

[3] Zheng, W., Rohrdanz, M. A., & Clementi, C. (2013). J. Phys. Chem. B, 117(42), 12769–12776.

dmaps's People

Contributors

hsidky

dmaps's Issues

Question on re-weighting biased data

Hi, in the README.md, it's mentioned that the package is able to re-weight the kernel for biased input. I wonder how to use that feature since it's not mentioned in the README.

I also can't find the "example folder" that's mentioned in the README. The link to the blog that's mentioned in other issues is also down. Also, is the reweighting scheme the same as the umbrella integrated dmap paper?

How to set an adequate kernel bandwidth

Hello,

I'm trying to use your C++ implementation of diffusion maps with some of my data, which is really big (a 37k x 60k matrix). In the README you set the kernel bandwidth to 3. How can I properly select a kernel bandwidth for my data?

Best!
Davi
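One commonly used heuristic for this question, sketched here as an assumption rather than as anything DMAPS provides, is to evaluate the sum of all kernel entries over a range of bandwidths and pick a value from the region where the log-log curve rises roughly linearly. The function names `kernel_sum_curve` and `suggest_bandwidth` are hypothetical, the kernel convention `exp(-d^2 / (2 * eps^2))` is assumed, and taking the point of maximum slope is a simplification of "the linear region".

```python
import numpy as np

def kernel_sum_curve(dist, bandwidths):
    """Sum of all Gaussian kernel entries for each candidate bandwidth."""
    sq = dist ** 2
    return np.array([np.exp(-sq / (2.0 * eps ** 2)).sum() for eps in bandwidths])

def suggest_bandwidth(dist, bandwidths):
    """Return the bandwidth where log(sum) vs. log(eps) is steepest."""
    log_eps = np.log(bandwidths)
    log_sum = np.log(kernel_sum_curve(dist, bandwidths))
    slope = np.gradient(log_sum, log_eps)
    return bandwidths[np.argmax(slope)], slope

# Toy data: 100 random points in 3D and their dense distance matrix.
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

bandwidths = np.logspace(-2, 2, 41)
curve = kernel_sum_curve(dist, bandwidths)
eps, slope = suggest_bandwidth(dist, bandwidths)
```

The curve flattens at n for very small bandwidths (only the diagonal contributes) and at n² for very large ones, so a usable bandwidth lies between the two plateaus. For a data set with tens of thousands of points, running this on a random subsample keeps the dense distance matrix manageable.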

Basic example: need some clarifying explanations

Hello and thank you for providing access to your library!
I have been playing around with the Python version of the library, trying to figure out how it works by repeating the basic example. It seems that I am doing something wrong, but I cannot figure out what, as the resulting diffusion map is definitely not right. I would be grateful if you could explain where I am mistaken.
Thank you!

Here is the code:

import dmaps
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

length_phi = 12
length_Z = 12
sigma = 0.1
m = 10000

phi = length_phi * np.random.rand(m)
xi = np.random.rand(m)
Z = length_Z * np.random.rand(m)
X = 1./6 * (phi + sigma * xi) * np.sin(phi)
Y = 1./6 * (phi + sigma * xi) * np.cos(phi)

swiss_roll = np.array([X, Y, Z]).transpose()
print(swiss_roll.shape)

dist = dmaps.DistanceMatrix(swiss_roll)
dist.compute(metric=dmaps.metrics.euclidean)
dist.save('distMetr.jpeg')

diffMap = dmaps.DiffusionMap(dist)
diffMap.set_kernel_bandwidth(3)
diffMap.compute(3)

v = diffMap.get_eigenvectors()
w = diffMap.get_eigenvalues()

plt.rcParams["figure.figsize"] = (8, 12)
fig = plt.figure()
Axes3D
ax = fig.add_subplot(211, projection='3d')
ax.scatter(swiss_roll[:, 0], swiss_roll[:, 1], swiss_roll[:, 2], c=swiss_roll[:, 1], cmap=plt.cm.get_cmap("Spectral"))
ax.set_title("Original data")

ax = fig.add_subplot(212)
arr0 = ax.scatter(v[:, 1]/v[:, 0], v[:, 2]/v[:, 0], c=swiss_roll[:, 1], cmap=plt.cm.get_cmap("Spectral"))
plt.xlabel(r'$\Psi_2$')
plt.ylabel(r'$\Psi_3$')
plt.title('Projected data')
plt.show()

[Image: result1]
