megaman's People

Contributors

harryahh, jakevdp, jerryzcn, jmcq89, mmp2, ohkhan, yuchaz

megaman's Issues

Cyflann - ValueError: Buffer dtype mismatch, expected 'double' but got 'float'

Hi there!
I am trying to use the SpectralEmbedding "predict" function. I have training data that I want to use to create an embedding, and test data that I would like to project onto that embedding.

radius = 0.5
adjacency_method = 'cyflann'
adjacency_kwds = {'radius': radius}
affinity_method = 'gaussian'
affinity_kwds = {'radius': radius}
laplacian_method = 'symmetricnormalized'
laplacian_kwds = {'scaling_epps': radius}
color = 'b'

geom = Geometry(adjacency_kwds=adjacency_kwds, affinity_kwds=affinity_kwds)
dr_technique = SpectralEmbedding(n_components=target_dim, eigen_solver='auto',
                                 geom=geom, drop_first=False)  # use 3 for spectral
dr_technique.fit(data)
final_array = dr_technique.predict(data)
Produces:

  File "/usr/local/lib/python2.7/dist-packages/megaman/embedding/spectral_embedding.py", line 428, in predict
    X_test,adjacency_kwds)
  File "/usr/local/lib/python2.7/dist-packages/megaman/geometry/complete_adjacency_matrix.py", line 12, in complete_adjacency_matrix
    train_index = Cyflann.build_index(Xtrain)
  File "/usr/local/lib/python2.7/dist-packages/megaman/geometry/adjacency.py", line 113, in build_index
    return self._get_built_index(X)
  File "/usr/local/lib/python2.7/dist-packages/megaman/geometry/adjacency.py", line 105, in _get_built_index
    **(self.cyflann_kwds or {}))
  File "index.pyx", line 18, in megaman.geometry.cyflann.index.Index.__cinit__
ValueError: Buffer dtype mismatch, expected 'double' but got 'float'

What am I doing wrong?

Can test data be projected onto an already-created embedding using other techniques in the megaman package, such as LLE, LTSA, or Isomap?

Many Thanks!
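
The message "expected 'double' but got 'float'" suggests that the compiled index wants 64-bit floats while the array passed in holds 32-bit floats. A minimal sketch of the cast, assuming the mismatch comes from the input array's dtype; the helper name as_double is hypothetical and not part of megaman:

```python
import numpy as np


def as_double(data):
    """Return data as a C-contiguous float64 ('double') array."""
    return np.ascontiguousarray(data, dtype=np.float64)


# Stand-in for the real data: a float32 array, which C code sees as 'float'.
data32 = np.random.RandomState(0).rand(10, 3).astype(np.float32)
data64 = as_double(data32)
print(data32.dtype, "->", data64.dtype)
```

Passing the converted array to both fit and predict would be the thing to try first; whether that resolves this particular traceback is not confirmed here.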

different results with the same input

Hello,
I ran the file example.py from the examples folder.
With the same input, e.g.

X=
[[ 0.44399868  1.58345008 -0.10397256]
 [ 0.89724097  1.05778984 -1.44154121]
 [ 0.8240493   1.13608912 -0.43348191]
 [ 0.41051068  1.85119328 -0.08814421]
 [-0.65903619  0.14207212  0.24788877]
 [ 0.98089687  0.1742586  -0.80547154]
 [-0.55488662  0.04043679  0.16807402]
 [-0.5233528   1.66523969 -1.8521161 ]
 [-0.94192794  1.5563135  -1.33581507]
 [-0.89054316  1.7400243   0.54510124]]

I ran it twice and got two different results from the function spectral.fit_transform(X).
For example:
Y1=

[[ 0.2921505   0.1212865 ]
 [ 0.27392757  0.02372519]
 [ 0.5506981   0.22735989]
 [-0.05899231 -0.14670033]
 [-0.48160085 -0.21859889]
 [ 0.03642044  0.07639132]
 [ 0.46814198 -0.15150964]
 [ 0.23721618 -0.6066226 ]
 [ 0.15258655 -0.33629939]
 [ 0.02977191 -0.5948518 ]]

Y2=

[[ 0.22423667 -0.37421422]
 [ 0.25427333  0.42450319]
 [ 0.27039709  0.55471119]
 [ 0.07265556 -0.44721124]
 [-0.36659607  0.1178414 ]
 [-0.1421814  -0.14854459]
 [ 0.445397    0.05315703]
 [ 0.60791817 -0.29251698]
 [ 0.14910215  0.21826375]
 [ 0.24877077  0.00868079]]

This happens even when I set the seed as below:

import random
random.seed(1)
spectral = SpectralEmbedding(n_components=n_components, eigen_solver='arpack',
                             geom=geom)
embed_spectral = spectral.fit_transform(X)
print(X)
print(embed_spectral)

How do I use megaman so that the same input gives the same output on every run?
Thank you.
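
One thing worth checking: random.seed() only seeds Python's standard-library generator, and NumPy keeps a separate one. The sketch below shows the difference using NumPy alone; the function names are hypothetical. If SpectralEmbedding accepts a random_state argument, as its scikit-learn counterpart does, passing a fixed integer there would be the analogous fix, but that is an assumption to check against the installed version.

```python
import random

import numpy as np


def draw_with_stdlib_seed(seed):
    """Seed only Python's stdlib generator, then draw from NumPy's global one."""
    random.seed(seed)  # has no effect on numpy.random
    return np.random.rand(3)


def draw_with_numpy_seed(seed):
    """Draw from a NumPy generator that is seeded explicitly."""
    rng = np.random.RandomState(seed)
    return rng.rand(3)


# Two draws from an explicitly seeded NumPy generator are identical.
first = draw_with_numpy_seed(1)
second = draw_with_numpy_seed(1)
print(np.array_equal(first, second))
```

Even with a fixed seed, eigenvectors are only defined up to sign, so two runs may still differ by a flip of individual embedding columns.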

relative imports

We should try to use relative imports within the package; e.g. in isomap.py use

from ..utils.eigendecomp import eigen_decomposition

rather than

from Mmani.utils.eigendecomp import eigen_decomposition

The latter can bring up weird issues at times.

No operator available that can perform this conversion

When trying to compile on Windows with MSVC I get the following error:

megaman\geometry\cyflann\index.cxx(2973): error C2664: 
'int CyflannIndex::knnSearch(const std::vector<float,std::allocator<float>> &,std::vector<std::vector<int,std::allocator<int>>,std::allocator<std::vector<int,std::allocator<int>>>>,std::vector<std::vector<float,std::allocator<float>>,std::allocator<std::vector<float,std::allocator<float>>>>,int,int,int)':
cannot convert argument 2 from
std::vector<std::vector<__pyx_t_7megaman_8geometry_7cyflann_5index_dtypei_t,std::allocator<__pyx_t_7megaman_8geometry_7cyflann_5index_dtypei_t>>,std::allocator<std::vector<__pyx_t_7megaman_8geometry_7cyflann_5index_dtypei_t,std::allocator<__pyx_t_7megaman_8geometry_7cyflann_5index_dtypei_t>>>>' 
to 'std::vector<std::vector<int,std::allocator<int>>,std::allocator<std::vector<int,std::allocator<int>>>>'
megaman\geometry\cyflann\index.cxx(2973): note: No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
\path-to-megaman\megaman-master\megaman\geometry\cyflann\cyflann_index.h(32): note: see declaration of 'CyflannIndex::knnSearch'

This looks like a type-conversion issue with Cython; I'm not sure whether it is specific to my machine.

errors when running "make test"

Dear all,
I installed megaman, and it appears in the output of "conda list":
megaman 0.3.dev0
megaman 0.2 np111py27_1 conda-forge

However, when I run "make test" to check the installation, the errors below are displayed.
Could anyone help me? Thank you.

make test
mkdir -p /tmp/megaman
python setup.py install
Cythonizing sources
megaman/__check_build/_check_build.pyx has not changed
megaman/geometry/cyflann/index.pyx has not changed
Compiling FLANN with FLANN_ROOT=/anaconda2
running install
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building extension "megaman.__check_build._check_build" sources
building extension "megaman.geometry/cyflann.index" sources
building data_files sources
build_src: building npy-pkg config files
running build_py
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
running install_lib
running install_data
running install_egg_info
Removing /anaconda2/lib/python2.7/site-packages/megaman-0.3.dev0-py2.7.egg-info
Writing /anaconda2/lib/python2.7/site-packages/megaman-0.3.dev0-py2.7.egg-info
running install_clib
customize UnixCCompiler
cd /tmp/megaman && nosetests megaman
RuntimeError: module compiled against API version 0xb but this version of numpy is 0xa
EEEEEEE.E........
======================================================================
ERROR: Failure: ImportError (numpy.core.multiarray failed to import)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/anaconda2/lib/python2.7/site-packages/megaman/datasets/__init__.py", line 1, in <module>
    from .datasets import (get_megaman_image, generate_megaman_data,
  File "/anaconda2/lib/python2.7/site-packages/megaman/datasets/datasets.py", line 8, in <module>
    from sklearn.utils import check_random_state
  File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 57, in <module>
    from .base import clone
  File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/base.py", line 12, in <module>
    from .utils.fixes import signature
  File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/utils/__init__.py", line 10, in <module>
    from .murmurhash import murmurhash3_32
ImportError: numpy.core.multiarray failed to import

======================================================================
ERROR: Failure: ImportError (cannot import name __check_build)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/anaconda2/lib/python2.7/site-packages/megaman/embedding/__init__.py", line 7, in <module>
    from .locally_linear import LocallyLinearEmbedding
  File "/anaconda2/lib/python2.7/site-packages/megaman/embedding/locally_linear.py", line 17, in <module>
    from ..embedding.base import BaseEmbedding
  File "/anaconda2/lib/python2.7/site-packages/megaman/embedding/base.py", line 9, in <module>
    from sklearn.base import BaseEstimator, TransformerMixin
  File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
    from . import __check_build
ImportError: cannot import name __check_build

======================================================================
ERROR: Failure: ImportError (cannot import name __check_build)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/__init__.py", line 4, in <module>
    from .geometry import Geometry
  File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/geometry.py", line 36, in <module>
    from .adjacency import compute_adjacency_matrix
  File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/adjacency.py", line 4, in <module>
    from sklearn import neighbors
  File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
    from . import __check_build
ImportError: cannot import name __check_build

======================================================================
ERROR: Failure: ImportError (cannot import name __check_build)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/anaconda2/lib/python2.7/site-packages/megaman/relaxation/__init__.py", line 3, in <module>
    from .riemannian_relaxation import *
  File "/anaconda2/lib/python2.7/site-packages/megaman/relaxation/riemannian_relaxation.py", line 9, in <module>
    from megaman.geometry import RiemannMetric
  File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/__init__.py", line 4, in <module>
    from .geometry import Geometry
  File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/geometry.py", line 36, in <module>
    from .adjacency import compute_adjacency_matrix
  File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/adjacency.py", line 4, in <module>
    from sklearn import neighbors
  File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
    from . import __check_build
ImportError: cannot import name __check_build

======================================================================
ERROR: Failure: ImportError (cannot import name __check_build)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/anaconda2/lib/python2.7/site-packages/megaman/utils/tests/test_analyze_dimension_and_radius.py", line 4, in <module>
    import megaman.utils.analyze_dimension_and_radius as adar
  File "/anaconda2/lib/python2.7/site-packages/megaman/utils/analyze_dimension_and_radius.py", line 16, in <module>
    from megaman.geometry.adjacency import compute_adjacency_matrix
  File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/__init__.py", line 4, in <module>
    from .geometry import Geometry
  File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/geometry.py", line 36, in <module>
    from .adjacency import compute_adjacency_matrix
  File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/adjacency.py", line 4, in <module>
    from sklearn import neighbors
  File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
    from . import __check_build
ImportError: cannot import name __check_build

======================================================================
ERROR: Failure: ImportError (cannot import name __check_build)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/anaconda2/lib/python2.7/site-packages/megaman/utils/tests/test_eigendecomp.py", line 3, in <module>
    from megaman.utils.eigendecomp import (eigen_decomposition, null_space,
  File "/anaconda2/lib/python2.7/site-packages/megaman/utils/eigendecomp.py", line 8, in <module>
    from sklearn.utils.validation import check_random_state
  File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
    from . import __check_build
ImportError: cannot import name __check_build

======================================================================
ERROR: Failure: ImportError (cannot import name __check_build)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/anaconda2/lib/python2.7/site-packages/megaman/utils/tests/test_estimate_radius.py", line 4, in <module>
    from megaman.utils.estimate_radius import run_estimate_radius
  File "/anaconda2/lib/python2.7/site-packages/megaman/utils/estimate_radius.py", line 3, in <module>
    from megaman.geometry.rmetric import riemann_metric_lazy
  File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/__init__.py", line 4, in <module>
    from .geometry import Geometry
  File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/geometry.py", line 36, in <module>
    from .adjacency import compute_adjacency_matrix
  File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/adjacency.py", line 4, in <module>
    from sklearn import neighbors
  File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
    from . import __check_build
ImportError: cannot import name __check_build

======================================================================
ERROR: Failure: ImportError (cannot import name __check_build)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/anaconda2/lib/python2.7/site-packages/megaman/utils/tests/test_spectral_clustering.py", line 1, in <module>
    from sklearn import neighbors
  File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
    from . import __check_build
ImportError: cannot import name __check_build

----------------------------------------------------------------------
Ran 17 tests in 0.084s

FAILED (errors=8)
make: *** [test] Error 1
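
In the log above, megaman is imported from /anaconda2 while sklearn comes from ~/.local, and NumPy reports an API version mismatch, so it may help to list which copy of each package Python actually picks up. A small sketch for that; the function name report_module_origins is hypothetical:

```python
import importlib


def report_module_origins(names):
    """Map each module name to its version and file path, or to the import error."""
    origins = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            origins[name] = "import failed: {0}".format(exc)
        else:
            version = getattr(module, "__version__", "unknown")
            path = getattr(module, "__file__", "built-in")
            origins[name] = "{0} from {1}".format(version, path)
    return origins


for name, origin in sorted(report_module_origins(["numpy", "sklearn", "megaman"]).items()):
    print(name, origin)
```

If the paths point at different installations, rebuilding or reinstalling the packages against a single NumPy is the usual next step.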

Megaman benchmarks

I'm looking at applying megaman to cluster on a large set of sentence embeddings, but I was hoping for more detail than an ordinal ranking of the algorithms. Is there a source that has benchmarked times on the different megaman algorithms?

Thanks,

Andrew

Manifold dimension in rmetric computation

In the riemann_metric() method, n_dim is used to iterate over embedding dimensions (lines 62-65). However, n_dim, when inherited from mdimG through the get_dual_rmetric() method, refers to the manifold dimension. Therefore, I think that the iteration over embedding dimensions in the metric computation should use mdimY, and mdimG should be used only to select a subset of singular values.
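
To make the distinction concrete, here is a minimal NumPy sketch of the proposed split. It is not megaman's code: the function names, the sign convention of the Laplacian L, and the use of a dense L are all assumptions for illustration. The loops run over the embedding dimension, and the manifold dimension is used only when truncating the singular values.

```python
import numpy as np


def dual_metric(Y, L):
    """Dual metric H, shape (n, emb_dim, emb_dim), from embedding Y and Laplacian L."""
    n_points, emb_dim = Y.shape
    LY = L.dot(Y)
    H = np.zeros((n_points, emb_dim, emb_dim))
    for i in range(emb_dim):  # iterate over EMBEDDING dimensions
        for j in range(i, emb_dim):
            h_ij = 0.5 * (L.dot(Y[:, i] * Y[:, j])
                          - Y[:, i] * LY[:, j]
                          - Y[:, j] * LY[:, i])
            H[:, i, j] = h_ij
            H[:, j, i] = h_ij
    return H


def metric_from_dual(H, manifold_dim):
    """Pseudo-invert H pointwise, keeping only the top manifold_dim singular values."""
    U, s, _ = np.linalg.svd(H)  # batched SVD over the first axis
    U_top = U[:, :, :manifold_dim]
    inv_s = 1.0 / s[:, :manifold_dim]
    return np.einsum("nik,nk,njk->nij", U_top, inv_s, U_top)
```

With a 3-dimensional embedding of a 2-dimensional manifold, dual_metric returns 3x3 matrices per point and metric_from_dual(H, 2) keeps the two largest singular directions of each.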

example workflow

Dear Mega-team,
Great job with the code. Might you be able to help with some conceptual difficulties that I'm having?

I'd like to take a data set of size (rows=250M, features=5) and perform SpectralEmbedding into 2 or 3 dimensions. I'm finding very long computation times. Does it make more sense to perform the fit_transform() method on a subset of the data, and then apply this mapping to all the data in the sample? If so, I can't figure out how to do this from the documentation.

I'm following the bare-bones example I find here (recreating the mega-man image), and am quite new to many of the concepts that mega-man has to offer.

I see a fit_transform() method, but nothing like a scikit-learn transform() or predict().

Thanks,

Ben
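
For reference, the "fit on a subset, then map the rest" idea can be sketched with a Nyström-style extension in plain NumPy. This is not megaman's API: the function names, the Gaussian kernel, and the dense kernel matrix are assumptions for illustration, and the dense matrix limits the fitted subset to a few thousand points.

```python
import numpy as np


def gaussian_kernel(A, B, radius):
    """Dense Gaussian affinities between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / radius ** 2)


def fit_embedding(X, radius, n_components):
    """Eigenpairs (lam, phi) of the random-walk matrix D^-1 K on the subset X."""
    K = gaussian_kernel(X, X, radius)
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))  # symmetric normalization
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1][1:n_components + 1]  # skip the trivial pair
    lam = evals[order]
    phi = evecs[:, order] / np.sqrt(d)[:, None]
    return lam, phi


def extend(X_new, X_train, radius, lam, phi):
    """Map new points with phi(x) = (1 / lam) * sum_j p(x, x_j) * phi_j."""
    K = gaussian_kernel(X_new, X_train, radius)
    P = K / K.sum(axis=1, keepdims=True)
    return P.dot(phi) / lam


rng = np.random.RandomState(0)
X_subset = rng.rand(50, 5)  # stand-in for a random subset of the full data
lam, phi = fit_embedding(X_subset, radius=0.5, n_components=2)
X_rest = rng.rand(200, 5)  # stand-in for the remaining rows
Y_rest = extend(X_rest, X_subset, 0.5, lam, phi)
```

Applied to the subset itself, extend reproduces phi, which is a quick sanity check. The remaining rows can be processed in chunks, since each new point only needs its affinities to the fitted subset.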

Geometry

The handling of the Geometry object within the embeddings needs some work & some consistency. I've done a bit of tweaking in 3fcc3d8, but there are still problems. In particular:

  • we need a way to instantiate the estimator without passing the data.
  • we should be able to pass a dictionary of geometry properties rather than a geometry object
  • each embedding algorithm seems to treat missing geometries differently. Some call fit_geometry, some use a manual default, and some fail (I've fixed some of this in 3fcc3d8)
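
As a rough illustration of the second and third points, a single helper could normalize whatever the user passes before any embedding uses it. Everything in this sketch is hypothetical: StubGeometry stands in for the real Geometry class, and resolve_geometry is not an existing megaman function.

```python
class StubGeometry(object):
    """Hypothetical stand-in for a geometry object; not megaman's Geometry."""

    def __init__(self, adjacency_method="auto", adjacency_kwds=None):
        self.adjacency_method = adjacency_method
        self.adjacency_kwds = adjacency_kwds or {}


def resolve_geometry(geom=None, geometry_class=StubGeometry):
    """Accept None, a dict of geometry properties, or a ready-made object."""
    if geom is None:
        return geometry_class()  # one shared default for every embedding
    if isinstance(geom, dict):
        return geometry_class(**geom)  # dictionary of geometry properties
    return geom  # already a geometry object


from_dict = resolve_geometry({"adjacency_kwds": {"radius": 0.5}})
print(from_dict.adjacency_method, from_dict.adjacency_kwds)
```

Because no data is needed to build the object, an estimator could call this in its constructor and defer all computation to fit().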

Conda package: broken or missing pyflann dependency

Pyflann doesn't seem to be installed properly on macOS after installing megaman.

Steps to reproduce:

  1. Create a conda python 3.5 environment.
  2. Run conda install megaman --channel=conda-forge.
  3. Open python and run the command from megaman.geometry import Geometry.

On my computer, this fails with the following error message:

In [4]: from megaman.geometry import Geometry
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-4-3d5e7524d7bf> in <module>()
----> 1 from megaman.geometry import Geometry

/usr/local/anaconda3/envs/megaman/lib/python3.5/site-packages/megaman/geometry/__init__.py in <module>()
      2
      3 from .rmetric import RiemannMetric
----> 4 from .geometry import Geometry
      5 from .adjacency import Adjacency, compute_adjacency_matrix, adjacency_methods
      6 from .affinity import Affinity, compute_affinity_matrix, affinity_methods

/usr/local/anaconda3/envs/megaman/lib/python3.5/site-packages/megaman/geometry/geometry.py in <module>()
     34 from scipy import sparse
     35 from scipy.special import gammaln
---> 36 from .adjacency import compute_adjacency_matrix
     37 from .affinity import compute_affinity_matrix
     38 from .laplacian import compute_laplacian_matrix

/usr/local/anaconda3/envs/megaman/lib/python3.5/site-packages/megaman/geometry/adjacency.py in <module>()
      5 from scipy import sparse
      6
----> 7 from .cyflann.index import Index as CyIndex
      8 from .utils import RegisterSubclasses
      9

ImportError: dlopen(/usr/local/anaconda3/envs/megaman/lib/python3.5/site-packages/megaman/geometry/cyflann/index.cpython-35m-darwin.so, 2): Library not loaded: @rpath/libflann.1.8.dylib
  Referenced from: /usr/local/anaconda3/envs/megaman/lib/python3.5/site-packages/megaman/geometry/cyflann/index.cpython-35m-darwin.so
  Reason: image not found

Installing pyflann using conda install --channel=jakevdp pyflann resolves this.

  • Amit Moscovich

Make test failed with ERROR: Failure: ImportError (No module named _check_build)

Hi there, I am having trouble installing the package, either from conda or from source, following the instructions. I am using macosx-10.7-x86_64 with Python 2.7. It also fails on Python 3.5 and 3.6.

(manifold_env) /tmp/megaman(master) $make test
mkdir -p /tmp/megaman
python setup.py install
Cythonizing sources
megaman/__check_build/_check_build.pyx has not changed
megaman/geometry/cyflann/index.pyx has not changed
Compiling FLANN with FLANN_ROOT=/Users/fche0019/miniconda3/envs/manifold_env
running install
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building extension "megaman.__check_build._check_build" sources
building extension "megaman.geometry/cyflann.index" sources
building data_files sources
build_src: building npy-pkg config files
running build_py
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
running install_lib
running install_data
running install_egg_info
Removing /Users/fche0019/miniconda3/envs/manifold_env/lib/python2.7/site-packages/megaman-0.3.dev0-py2.7.egg-info
Writing /Users/fche0019/miniconda3/envs/manifold_env/lib/python2.7/site-packages/megaman-0.3.dev0-py2.7.egg-info
running install_clib
customize UnixCCompiler
cd /tmp/megaman && nosetests megaman
E

ERROR: Failure: ImportError (No module named _check_build


Contents of /private/tmp/megaman/megaman/__check_build:
__init__.py setup.pyc _check_build.c
setup.py __init__.pyc _check_build.pyx


It seems that megaman has not been built correctly.

If you have installed megaman from source, please do not forget
to build the package before using it: run python setup.py install
in the source directory.

If you have used an installer, please check that it is suited for your
Python version, your operating system and your platform.)

Traceback (most recent call last):
  File "/Users/fche0019/miniconda3/envs/manifold_env/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/Users/fche0019/miniconda3/envs/manifold_env/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/Users/fche0019/miniconda3/envs/manifold_env/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/private/tmp/megaman/megaman/__init__.py", line 7, in <module>
    from . import __check_build
  File "/private/tmp/megaman/megaman/__check_build/__init__.py", line 56, in <module>
    raise_build_error(e)
  File "/private/tmp/megaman/megaman/__check_build/__init__.py", line 51, in raise_build_error
    msg=msg))
ImportError: No module named _check_build


Contents of /private/tmp/megaman/megaman/__check_build:
__init__.py setup.pyc _check_build.c
setup.py __init__.pyc _check_build.pyx


It seems that megaman has not been built correctly.

If you have installed megaman from source, please do not forget
to build the package before using it: run python setup.py install
in the source directory.

If you have used an installer, please check that it is suited for your
Python version, your operating system and your platform.


Ran 1 test in 0.001s

FAILED (errors=1)
make: *** [test] Error 1

Example for performing Riemannian relaxation?

Hello,

If you get the chance, could you please provide an example to explain the usage of the function run_riemannian_relaxation()? I've utilized the code as follows:

rr = run_riemannian_relaxation(laplacian_matrix, embed_tsne, 2, dict(niter=250, verbose=True))
rr.relax_isometry()

But I am getting the following output and error:

Making Lk and nbhds
Iteration number: 0
Last step size eta: 0.0
current loss (before gradient step): 21.929018683354258
minimum loss: 21.929018683354258, at iteration: 0

Traceback (most recent call last):
File "", line 142, in
rr.relax_isometry()
File "/megaman/relaxation/riemannian_relaxation.py", line 83, in relax_isometry
self.trace_var.update(ii,self.H,self.Y,self.eta,self.loss)
File "/megaman/relaxation/trace_variable.py", line 60, in update
self.H[iiter] = H
IndexError: index 1 is out of bounds for axis 0 with size 1

Any guidance for performing the Riemannian relaxation would be appreciated. I've run your other example cases without problems. Thanks!

All the best,
Julienne

errors in OSX and Ubuntu, when `from megaman.embedding import SpectralEmbedding`

I get an error on OSX when running from megaman.embedding import SpectralEmbedding.

The error is as follows:

ImportError                               Traceback (most recent call last)
in <module>()
----> 1 from megaman.embedding import SpectralEmbedding

/Users/myusernname/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/embedding/__init__.py in <module>()
      5 # LICENSE: Simplified BSD https://github.com/mmp2/megaman/blob/master/LICENSE
      6
----> 7 from .locally_linear import LocallyLinearEmbedding
      8 from .isomap import Isomap
      9 from .ltsa import LTSA

/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/embedding/locally_linear.py in <module>()
     15 from scipy.linalg import eigh, svd, qr, solve
     16 from scipy.sparse import eye, csr_matrix
---> 17 from ..embedding.base import BaseEmbedding
     18 from ..utils.validation import check_array, check_random_state
     19 from ..utils.eigendecomp import null_space, check_eigen_solver

/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/embedding/base.py in <module>()
     10 from sklearn.utils.validation import check_array
     11
---> 12 from ..geometry.geometry import Geometry
     13
     14 # from sklearn.utils.validation import FLOAT_DTYPES

/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/__init__.py in <module>()
      2
      3 from .rmetric import RiemannMetric
----> 4 from .geometry import Geometry
      5 from .adjacency import Adjacency, compute_adjacency_matrix, adjacency_methods
      6 from .affinity import Affinity, compute_affinity_matrix, affinity_methods

/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/geometry.py in <module>()
     34 from scipy import sparse
     35 from scipy.special import gammaln
---> 36 from .adjacency import compute_adjacency_matrix
     37 from .affinity import compute_affinity_matrix
     38 from .laplacian import compute_laplacian_matrix

/Users/mysername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/adjacency.py in <module>()
      5 from scipy import sparse
      6
----> 7 from .cyflann.index import Index as CyIndex
      8 from .utils import RegisterSubclasses
      9

ImportError: dlopen(/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/cyflann/index.so, 2): Library not loaded: @rpath/libflann.1.8.dylib
  Referenced from: /Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/cyflann/index.so
  Reason: image not found

Any easy way to apply learned mapping on different set of points?

Thanks for a great library!

I wonder if there is an easy way to perform transform() only? I have sets of points A and B_t for t in [0..T] and I want to plot how B changes over time, so I was wondering if I could train an embedding on A and see B_t projections onto A's coordinates.

Thank you!

rmetric test failed on my machine

The rmetric test did not pass on my machine:

FAIL: Loads the results from a matlab run and checks that our results

Traceback (most recent call last):
File "/home/jerry/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/jerry/anaconda/lib/python2.7/site-packages/megaman/geometry/tests/test_rmetric.py", line 72, in test_equal_original
assert_allclose( Gtest, G, tol)
File "/home/jerry/anaconda/lib/python2.7/site-packages/numpy/testing/utils.py", line 1391, in assert_allclose
verbose=verbose, header=header)
File "/home/jerry/anaconda/lib/python2.7/site-packages/numpy/testing/utils.py", line 733, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1.04681e+11, atol=0

(mismatch 0.25%)
x: array([[[ 3.104310e+15, 2.125112e+01],
[ 2.119583e+01, 5.400388e+01]],
...
y: array([[[ 4.421211e+15, 4.985776e+01],
[ 5.055786e+01, 5.400388e+01]],
...
-------------------- >> begin captured stdout << ---------------------
('phi.shape = ', (200, 2))
('G.shape = ', (200, 2, 2))
('H.shape = ', (200, 2, 2))
('L.shape = ', (200, 200))

--------------------- >> end captured stdout << ----------------------


Ran 140 tests in 7.220s

FAILED (failures=1)
make: *** [test] Error 1
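
The header "Not equal to tolerance rtol=1.04681e+11" indicates that the third positional argument in assert_allclose(Gtest, G, tol) was taken as the relative tolerance, which matches NumPy's signature assert_allclose(actual, desired, rtol=1e-07, atol=0, ...). A short sketch of the positional and keyword forms; the arrays and the is_close helper are made up for illustration:

```python
import numpy as np
from numpy.testing import assert_allclose

actual = np.array([1.0, 2.0, 3.0])
desired = actual * (1.0 + 1e-9)

# Positional form: the third argument is rtol, not an absolute tolerance.
assert_allclose(actual, desired, 1e-6)

# Keyword form states the intent explicitly.
assert_allclose(actual, desired, rtol=1e-6, atol=0)


def is_close(a, b, rtol):
    """Return True if a and b agree within the given relative tolerance."""
    try:
        assert_allclose(a, b, rtol=rtol, atol=0)
    except AssertionError:
        return False
    return True


print(is_close(actual, desired, 1e-6))
```

Writing the tolerance as a keyword in the test would make it clear whether a relative or an absolute bound was intended.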

errors in OSX and Ubuntu when `from megaman.geometry import Geometry`

I get an error on OSX when running from megaman.geometry import Geometry.

The error is as follows:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
in <module>()
----> 1 from megaman.geometry import Geometry

/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/__init__.py in <module>()
      2
      3 from .rmetric import RiemannMetric
----> 4 from .geometry import Geometry
      5 from .adjacency import Adjacency, compute_adjacency_matrix, adjacency_methods
      6 from .affinity import Affinity, compute_affinity_matrix, affinity_methods

/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/geometry.py in <module>()
     34 from scipy import sparse
     35 from scipy.special import gammaln
---> 36 from .adjacency import compute_adjacency_matrix
     37 from .affinity import compute_affinity_matrix
     38 from .laplacian import compute_laplacian_matrix

/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/adjacency.py in <module>()
      5 from scipy import sparse
      6
----> 7 from .cyflann.index import Index as CyIndex
      8 from .utils import RegisterSubclasses
      9

ImportError: dlopen(/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/cyflann/index.so, 2): Library not loaded: @rpath/libflann.1.8.dylib
  Referenced from: /Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/cyflann/index.so
  Reason: image not found

TODO Future projects

Write to @jmcq89 or @mmp2 if you would like to contribute
This list is only very slightly prioritized (i.e., we think the first two tasks are currently the most important). There are almost no dependencies between tasks, so any task can be undertaken at any time.

  • lazy R-metric evaluation (small to moderate) (Xiao Wang, UW)
  • selecting the neighborhood radius (moderate to large) (@jmcq89)
    Requires: notions of high dimensional geometry, local weighted PCA, writing some visualization tools
    Resources: matlab code already written, can be "directly" ported to python (Potential for a publication too)
  • directed graph embedding (moderate or possibly small, if no visualization is included)
    This is a fun project, especially if you add the specific visualization tools that would make the results shine. There exists a matlab implementation and a published paper. All the tools needed are already in megaman.
  • Riemannian relaxation (moderate) Matlab code exists @yuchaz
  • Principal curves and surfaces after Ozertem and Erdogmus JMLR (moderate if no scalability required, large otherwise) Matlab code exists
  • applications of manifold learning to various data sets and problems (small, moderate or large). Below is a sample list.
  • spectra of galaxies
  • representations obtained by deep neural networks
  • musical recordings
  • brain activity recordings
  • hand movement data (possibly other robotic data)
  • outputs of MCMC runs
  • BYOD
  • applications related to the other tasks below, e.g., GP regression, directed embedding, spectral clustering of networks
  • Nystrom extension embedding new points into the existing coordinate system (moderate) (Xiao Wang)
    Requires - linear algebra, some reading
  • dimension estimation (moderate to large)
    This is more than an implementation task, although just implementing existing methods is a possibility. Best done in conjunction with reading research papers. High potential for resulting in a publication.
  • manifold represented by patches (large, probably a research project)
  • implement distance and area computations (moderate)
    Shortest path distances on the graph, corrected by the Rmetric. Some matlab code exists. Some independence and experimentation required as there are subtle aspects to this shortest path problem.
    Area computation would be nice but is secondary, could be a separate project.
  • implement gaussian process regression on a manifold (large)
    Matlab code exists. Good understanding of math and computational linear algebra necessary. Also some basics of machine learning, e.g., semi-supervised learning; these could be acquired.
    To investigate if one can use existing GP packages (george) or implement from scratch (using computational linear algebra tools)
  • spectral clustering for millions of points (moderate) (Xiao Wang)
    Requires: using k-means (from sklearn), some understanding of spectral clustering (there are tutorials), and of k-means. (Possible extension, not done yet: build a small library of similarity functions.)
  • k-means initializations K-log K initialization, kmeans++ (Hui Pang)
  • Visualization tools (some are related to various tasks above) - small if otherwise noted
  1. covar_plotter3 a 3D covar_plotter to display the R-metric with 3D embeddings
  2. locally isometric visualization (rescale the data so that R-metric is identity at one fixed point, display it)
  3. display a vector field on a manifold
  4. display a point cloud without outliers

Refactoring

Ping @jmcq89

I've been going through the package and adding some tests of corner-cases.

There are a lot of commonalities between the four embedding classes, and a lot of code that is repeated. By writing common unit tests for these, I found a number of inconsistencies with how they handle input arguments.

I think this is a case where building the four algorithms from a common base class would be very helpful, both in keeping the code organized & terse, and in making sure all estimators are behaving as expected.
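
To make the suggestion concrete, here is a minimal sketch of what a shared base class could look like. All names (`BaseEmbeddingSketch`, `PCAEmbeddingSketch`, `_embed`, `_validate`) are hypothetical and not part of megaman; the PCA subclass is only a stand-in for a real embedding algorithm.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseEmbeddingSketch(ABC):
    """Hypothetical shared base class; not megaman's actual API."""

    def __init__(self, n_components=2, radius=None):
        # Only store arguments here; validation happens at fit time.
        self.n_components = n_components
        self.radius = radius

    def _validate(self, X):
        # Input handling lives in one place, so every subclass
        # treats bad input the same way.
        X = np.asarray(X, dtype=float)
        if X.ndim != 2:
            raise ValueError("X must be 2-dimensional, got ndim=%d" % X.ndim)
        if self.n_components < 1:
            raise ValueError("n_components must be at least 1")
        return X

    def fit_transform(self, X):
        X = self._validate(X)
        self.embedding_ = self._embed(X)
        return self.embedding_

    @abstractmethod
    def _embed(self, X):
        """Algorithm-specific step implemented by each subclass."""


class PCAEmbeddingSketch(BaseEmbeddingSketch):
    """Stand-in subclass: a plain PCA projection via SVD."""

    def _embed(self, X):
        Xc = X - X.mean(axis=0)
        U, s, _ = np.linalg.svd(Xc, full_matrices=False)
        return U[:, :self.n_components] * s[:self.n_components]
```

With this layout, a common unit test only needs to loop over the subclasses and call `fit_transform` with the same corner-case inputs.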

Initialization for k_means_clustering

Currently only one initialization method is provided, orthogonal_initialization(data, K); this initialization is appropriate for spectral clustering.

We need to provide power_initialization as the default method, plus a custom center initialization. A custom labeling initialization is lower priority.
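
For readers unfamiliar with the idea, the sketch below shows one common reading of orthogonal initialization: pick a first center at random, then repeatedly pick the point whose largest absolute cosine with the centers chosen so far is smallest. This is an assumption about the technique, not a copy of megaman's `orthogonal_initialization`; the function name and return values are hypothetical.

```python
import numpy as np


def orthogonal_init_sketch(data, K, random_state=None):
    """Pick K rows of `data` that are as mutually orthogonal as possible."""
    rng = np.random.RandomState(random_state)
    norms = np.linalg.norm(data, axis=1, keepdims=True)
    unit = data / np.where(norms == 0, 1.0, norms)  # avoid division by zero

    chosen = [rng.randint(data.shape[0])]
    # Largest |cosine| between each point and any center chosen so far.
    max_cos = np.abs(unit @ unit[chosen[0]])
    for _ in range(1, K):
        nxt = int(np.argmin(max_cos))  # most orthogonal remaining point
        chosen.append(nxt)
        max_cos = np.maximum(max_cos, np.abs(unit @ unit[nxt]))
    chosen = np.array(chosen)
    return data[chosen], chosen
```

In a spectral embedding, points from different clusters tend to lie along nearly orthogonal directions, which is why this kind of seeding suits spectral clustering.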

When the manifold dim is lower than embedding dim

I found a couple bugs associated with the embedding dimension being greater than the manifold dimension, basically having to do with truncating the remaining singular values.

  1. compute_G_from_H should be passing the manifold dimension as an argument (it looks like it's set up to do this but is not doing it currently)

  2. In compute_G_from_H (the 'elif mdimG < n_dim:' part) there are undefined variables and so forth. I don't think this has been tested. As is, it won't work for mdimG < ndim_Y.
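
As an illustration of the truncation involved, the sketch below inverts a stack of symmetric matrices H while keeping only the m leading eigen-directions, where m plays the role of the manifold dimension. It assumes each H[i] is symmetric positive semidefinite and that G is meant to be the pseudo-inverse restricted to those directions; `truncated_metric_inverse` is a hypothetical name, not megaman's `compute_G_from_H`.

```python
import numpy as np


def truncated_metric_inverse(H, m):
    """Rank-m pseudo-inverse of each matrix in H (shape (n, d, d)), m <= d."""
    # eigh returns eigenvalues in ascending order for each stacked matrix.
    w, V = np.linalg.eigh(H)
    w_top = w[:, -m:]        # m largest eigenvalues, shape (n, m)
    V_top = V[:, :, -m:]     # matching eigenvectors, shape (n, d, m)
    # G[i] = V_top[i] @ diag(1 / w_top[i]) @ V_top[i].T
    return np.einsum('nik,nk,njk->nij', V_top, 1.0 / w_top, V_top)


# One 3x3 matrix whose third direction is numerically null.
H = np.array([np.diag([4.0, 1.0, 1e-12])])
G = truncated_metric_inverse(H, 2)
```

The discarded directions contribute zero to G instead of the huge values that inverting a near-zero eigenvalue would produce.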

performance improvements for ltsa

Here is a piece of megaman.embeddings.ltsa.ltsa() that gives a 10x speed improvement for the first stage of ltsa (before the null_space call); in my case the running time of that part dropped from 2 hours to 10 minutes (600k points, 10-dimensional space). The amg solver runs fast anyway, in about 3 minutes.

Improvements are 1) instead of searching for rows==i and then indexing cols at every iteration, we exploit the fact that rows is initially sorted as [0 0 0 1 1 2 2 2 3 3 .. ] and pre-split it; 2) updates for lil_matrix are faster. Surprisingly, those small fixes give serious improvements in run time.

Probably, if we knew in advance that n_neighbors is used rather than radius, it could be made slightly faster still by pre-allocating np.ones((1, len(neighbors_i))), but that sounds hacky.

    if eigen_solver != 'dense':
        M = sparse.lil_matrix((N, N))
    else:
        M = np.zeros((N, N))
    split_map = np.where(np.diff(rows))[0]+1
    cols_i = np.split(cols, split_map)
    for i in tqdm.trange(N):
        neighbors_i = cols_i[i]
        n_neighbors_i = len(neighbors_i)
        use_svd = (n_neighbors_i > d_in)
        Xi = geom.X[neighbors_i]
        Xi -= Xi.mean(0)
        # compute n_components largest eigenvalues of Xi * Xi^T
        if use_svd:
            v = svd(Xi, full_matrices=True)[0]
        else:
            Ci = np.dot(Xi, Xi.T)
            v = eigh(Ci)[1][:, ::-1]
        Gi = np.zeros((n_neighbors_i, n_components + 1))
        Gi[:, 1:] = v[:, :n_components]
        Gi[:, 0] = 1. / np.sqrt(n_neighbors_i)
        GiGiT = np.dot(Gi, Gi.T)
        nbrs_x, nbrs_y = np.meshgrid(neighbors_i, neighbors_i)
        M[nbrs_x, nbrs_y] -= GiGiT
        M[neighbors_i, neighbors_i] += np.ones((1, len(neighbors_i)))
    return null_space(M.tocsr(), n_components, k_skip=1, eigen_solver=eigen_solver,
                      random_state=random_state,solver_kwds=solver_kwds)
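
The pre-splitting step can be checked in isolation. The sketch below uses made-up `rows` and `cols` arrays and a hypothetical helper name; it assumes `rows` is sorted and that every row index occurs at least once, otherwise the list positions no longer line up with the row indices.

```python
import numpy as np


def presplit_neighbors(rows, cols):
    """Split cols into one array per row, assuming rows is sorted."""
    # Positions where the row index changes mark the split points.
    split_map = np.where(np.diff(rows))[0] + 1
    return np.split(cols, split_map)


rows = np.array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3])
cols = np.array([0, 1, 3, 1, 2, 0, 2, 3, 1, 3])
cols_i = presplit_neighbors(rows, cols)

# Same result as the per-iteration boolean mask, computed once up front.
for i, neighbors in enumerate(cols_i):
    assert np.array_equal(neighbors, cols[rows == i])
```

The mask `rows == i` scans the whole array on every iteration, while the split is a single pass, which is where the saving in the loop comes from.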

I am not sure how to run the test suite for ltsa, so I am reporting this here instead of opening a pull request.

Clean up and documentation

@jakevdp @mmp2

  • in SpectralEmbedding the citations are out of date -- to update
  • example.py, dev_example.py, /benchmarkes/ & /astrodemo/ should be removed.
  • turn example.py into an iPython tutorial that we can host on the website. Perhaps we can do this as we teach the API to Grace?
  • The example usage & non-auto doc documentation for: Geometry, Adjacency, Laplacian, Affinity & all the embeddings need to be fixed with respect to new API changes.
  • The sphinx build needs to be re-run when API changes are made.
  • Add some example plots to the website

how to choose parameters for megaman

Hi all,
I am running some experiments with megaman, and I am wondering how to select its parameters: the target dimension (n_components) and the radius.
I have two datasets: Dataset A has 300 samples with dim = 1.5 million, and Dataset B has about 500 samples with dim = 800. To what dimension should I reduce each of them with megaman? For example, if I reduce A to a new dimension d1 (n_components), should it be d1 = 100, d1 = 2, or something else? How do I select a good value for d1?
As for the radius, how do I know which radius is good?
Thank you.

Inherit from ``sklearn.base``

To make this maximally compatible with scikit-learn, we should have the estimators inherit from sklearn.base, and then run sklearn's check_estimator on each as part of the test suite.

The only thing to keep in mind is that sklearn estimators can't do anything in __init__ except store the arguments by name (this is to enable cloning the estimator for grid search & parallelization). It seems like we're already in good shape as far as that requirement.
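
A minimal sketch of that `__init__` constraint, assuming scikit-learn is installed. `EmbeddingSketch` is a hypothetical estimator (a plain PCA projection standing in for a real embedding), not one of megaman's classes.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone


class EmbeddingSketch(BaseEstimator, TransformerMixin):
    """Hypothetical estimator: __init__ only stores arguments by name."""

    def __init__(self, n_components=2, radius=None):
        self.n_components = n_components
        self.radius = radius

    def fit(self, X, y=None):
        # All real work happens in fit, never in __init__.
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[:self.n_components]
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return (X - self.mean_) @ self.components_.T


est = EmbeddingSketch(n_components=3, radius=0.5)
est_copy = clone(est)  # rebuilt from get_params(), as grid search does
```

`clone` works here because `get_params` can read every constructor argument back from an attribute of the same name; an `__init__` that transformed or validated its arguments would break that round trip.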

More generalized ``distance_matrix()``?

Reading through the code, it seems we could benefit greatly from a more generalized distance_matrix() function, with this functionality passed along to each estimator. I'd imagine a call signature something like this:

def distance_matrix(X, method='auto', type='radius', **kwargs):
    ...
  • If method is pyflann, then the user can either pass a pre-built flann index as a keyword argument, or leave it out and have it generated automatically
  • Similarly for method=cyflann
  • if type is radius then the user should pass a radius as a keyword argument
  • if type is knn then the user should pass an integer number of neighbors as a keyword argument.
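
A rough sketch of how such a dispatch could look, covering only a brute-force NumPy path. The function name, the `'brute'` method, and the keyword names `radius` and `n_neighbors` are assumptions for illustration; the flann-backed methods are deliberately left out. The `type` argument name is kept from the proposed signature even though it shadows the builtin.

```python
import numpy as np


def distance_matrix_sketch(X, method='auto', type='radius', **kwargs):
    """Dense neighbor-distance matrix; entries for non-neighbors are 0."""
    if method not in ('auto', 'brute'):
        raise NotImplementedError("only the brute-force path is sketched")
    X = np.asarray(X, dtype=float)
    # Full pairwise Euclidean distances, O(n^2) memory.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    if type == 'radius':
        if 'radius' not in kwargs:
            raise ValueError("type='radius' requires a radius keyword")
        keep = D <= kwargs['radius']
    elif type == 'knn':
        if 'n_neighbors' not in kwargs:
            raise ValueError("type='knn' requires an n_neighbors keyword")
        k = int(kwargs['n_neighbors'])
        # Exclude each point from its own neighbor list.
        D_no_self = D.copy()
        np.fill_diagonal(D_no_self, np.inf)
        nearest = np.argsort(D_no_self, axis=1)[:, :k]
        keep = np.zeros(D.shape, dtype=bool)
        keep[np.arange(X.shape[0])[:, None], nearest] = True
    else:
        raise ValueError("type must be 'radius' or 'knn'")
    return np.where(keep, D, 0.0)
```

A dense return value cannot distinguish a true zero distance from a missing edge, so a real implementation would more likely return a sparse matrix.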

flann/flann.hpp: No such file or directory

When trying to install megaman by running python setup.py install, I get the following fatal error:

In file included from megaman/geometry/cyflann/index.cxx:620:0:
megaman/geometry/cyflann/cyflann_index.h:8:10: fatal error: flann/flann.hpp: No such file or directory
#include <flann/flann.hpp>
^~~~~~~~~~~~~~~~~
compilation terminated.

How could I fix it?

Python 3 compatibility

I'm afraid as soon as you use adjacency_method = 'cyflann' I get the following error. It's a cython issue as far as I understand. Any thoughts?

I'm using Anaconda python 3.5.2

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-3db59b70d454> in <module>()
     16 geom.set_data_matrix(X)
     17 t0 = time.time()
---> 18 adjacency_matrix = geom.compute_adjacency_matrix()
     19 t1 = time.time() - t0
     20 print(t1)

/Users/sachin/anaconda/lib/python3.5/site-packages/megaman/geometry/geometry.py in compute_adjacency_matrix(self, copy, **kwargs)
    176         self.adjacency_matrix = compute_adjacency_matrix(self.X,
    177                                                          self.adjacency_method,
--> 178                                                          **kwds)
    179         if copy:
    180             return self.adjacency_matrix.copy()

/Users/sachin/anaconda/lib/python3.5/site-packages/megaman/geometry/adjacency.py in compute_adjacency_matrix(X, method, **kwargs)
     22         else:
     23             method = 'kd_tree'
---> 24     return Adjacency.init(method, **kwargs).adjacency_graph(X)
     25 
     26 

/Users/sachin/anaconda/lib/python3.5/site-packages/megaman/geometry/adjacency.py in adjacency_graph(self, X)
     45             return self.knn_adjacency(X)
     46         elif self.radius is not None:
---> 47             return self.radius_adjacency(X)
     48 
     49     def knn_adjacency(self, X):

/Users/sachin/anaconda/lib/python3.5/site-packages/megaman/geometry/adjacency.py in radius_adjacency(self, X)
    100 
    101     def radius_adjacency(self, X):
--> 102         cyindex = self._get_built_index(X)
    103         return cyindex.radius_neighbors_graph(X, self.radius)
    104 

/Users/sachin/anaconda/lib/python3.5/site-packages/megaman/geometry/adjacency.py in _get_built_index(self, X)
     93         if self.flann_index is None:
     94             cyindex = CyIndex(X, target_precision=self.target_precision,
---> 95                               **(self.cyflann_kwds or {}))
     96         else:
     97             cyindex = self.flann_index

megaman/geometry/cyflann/index.pyx in megaman.geometry.cyflann.index.Index.__cinit__ (megaman/geometry/cyflann/index.cxx:1802)()

/Users/sachin/anaconda/lib/python3.5/site-packages/megaman/geometry/cyflann/index.cpython-35m-darwin.so in string.from_py.__pyx_convert_string_from_py_std__in_string (megaman/geometry/cyflann/index.cxx:5777)()

TypeError: expected bytes, str found
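
The last frame of the traceback is Cython's conversion of a Python object to a C++ std::string, which in Python 3 accepts bytes but not str. A possible workaround on the calling side, sketched below, is to encode string-valued keyword arguments before they reach the extension. The helper name and the example keys are hypothetical, and this has not been verified against megaman; which argument actually triggers the error is not visible in the traceback.

```python
def encode_str_kwargs(kwargs, encoding='utf-8'):
    """Return a copy of kwargs with every str value encoded to bytes."""
    return {
        key: value.encode(encoding) if isinstance(value, str) else value
        for key, value in kwargs.items()
    }


# Hypothetical keyword arguments; non-string values pass through untouched.
cyflann_kwds = encode_str_kwargs({'algorithm': 'kmeans', 'branching': 64})
```

A more durable fix would live in the .pyx file itself, where the string parameters could be encoded once at the boundary.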

cannot import megaman.geometry

I installed megaman without errors

megaman$ python setup.py install
Cythonizing sources
megaman/__check_build/_check_build.pyx has not changed
megaman/geometry/cyflann/index.pyx has not changed
Compiling FLANN with FLANN_ROOT=/anaconda2
running install
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building extension "megaman.__check_build._check_build" sources
building extension "megaman.geometry/cyflann.index" sources
building data_files sources
build_src: building npy-pkg config files
running build_py
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
running install_lib
running install_data
running install_egg_info
Removing /anaconda2/lib/python2.7/site-packages/megaman-0.3.dev0-py2.7.egg-info
Writing /anaconda2/lib/python2.7/site-packages/megaman-0.3.dev0-py2.7.egg-info
running install_clib
customize UnixCCompiler

However, I get the errors below when importing megaman:

from megaman.geometry import Geometry
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "megaman/__init__.py", line 7, in <module>
    from . import __check_build
  File "megaman/__check_build/__init__.py", line 56, in <module>
    raise_build_error(e)
  File "megaman/__check_build/__init__.py", line 51, in raise_build_error
    msg=msg))
ImportError: No module named _check_build
___________________________________________________________________________
Contents of megaman/__check_build:
__init__.py               __init__.pyc              _check_build.c
_check_build.pyx          setup.py                  setup.pyc
___________________________________________________________________________
It seems that megaman has not been built correctly.

If you have installed megaman from source, please do not forget
to build the package before using it: run `python setup.py install`
in the source directory.

It appears that you are importing a local megaman source tree.
Please either use an inplace install or try from another location.

Could anyone help me, please? Thank you.
