mmp2 / megaman
megaman: Manifold Learning for Millions of Points
Home Page: http://mmp2.github.io/megaman/
License: BSD 2-Clause "Simplified" License
Hi there!
I am trying to use the SpectralEmbedding "predict" function. I have some training data that I want to use to create an embedding, and some test data that I would like to project onto this new embedding.
radius = 0.5
adjacency_method = 'cyflann'
adjacency_kwds = {'radius': radius}
affinity_method = 'gaussian'
affinity_kwds = {'radius': radius}
laplacian_method = 'symmetricnormalized'
laplacian_kwds = {'scaling_epps': radius}
color = 'b'
geom = Geometry(adjacency_kwds=adjacency_kwds, affinity_kwds=affinity_kwds)
dr_technique = SpectralEmbedding(n_components=target_dim, eigen_solver='auto',
                                 geom=geom, drop_first=False)  # use 3 for spectral
dr_technique.fit(data)
final_array = dr_technique.predict(data)
Produces:
File "/usr/local/lib/python2.7/dist-packages/megaman/embedding/spectral_embedding.py", line 428, in predict
X_test,adjacency_kwds)
File "/usr/local/lib/python2.7/dist-packages/megaman/geometry/complete_adjacency_matrix.py", line 12, in complete_adjacency_matrix
train_index = Cyflann.build_index(Xtrain)
File "/usr/local/lib/python2.7/dist-packages/megaman/geometry/adjacency.py", line 113, in build_index
return self._get_built_index(X)
File "/usr/local/lib/python2.7/dist-packages/megaman/geometry/adjacency.py", line 105, in _get_built_index
**(self.cyflann_kwds or {}))
File "index.pyx", line 18, in megaman.geometry.cyflann.index.Index.__cinit__
ValueError: Buffer dtype mismatch, expected 'double' but got 'float'
What am I doing wrong?
Can projecting test data on already created embedding be done using other techniques in the megaman package like LLE, LTSA or Isomap?
Many Thanks!
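The error message says the compiled index expected 'double' (float64) but received 'float' (float32), which suggests the input array is stored as float32. A minimal sketch of casting the data before calling fit or predict is below. The helper name as_float64 and the sample array are hypothetical, and whether this cast is enough to make predict work has not been checked against megaman itself.

```python
import numpy as np

def as_float64(X):
    """Return X as a C-contiguous float64 array (hypothetical helper)."""
    return np.ascontiguousarray(X, dtype=np.float64)

# Stand-in for data that was loaded or generated as float32.
data32 = np.random.RandomState(0).rand(5, 3).astype(np.float32)
data64 = as_float64(data32)
print(data32.dtype, "->", data64.dtype)
```

The converted array would then be passed in place of the original one, e.g. dr_technique.fit(data64).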
I'm not sure whether this was supposed to be in the repo. Maybe @jmcq89 knows?
see #55
Fantastic library!
Would it be possible to have the modified version of the LLE implemented in sklearn?
http://scikit-learn.org/stable/modules/generated/sklearn.manifold.LocallyLinearEmbedding.html
Hello,
I ran the file example.py in the examples folder.
With the same input, e.g.
X=
[[ 0.44399868 1.58345008 -0.10397256]
[ 0.89724097 1.05778984 -1.44154121]
[ 0.8240493 1.13608912 -0.43348191]
[ 0.41051068 1.85119328 -0.08814421]
[-0.65903619 0.14207212 0.24788877]
[ 0.98089687 0.1742586 -0.80547154]
[-0.55488662 0.04043679 0.16807402]
[-0.5233528 1.66523969 -1.8521161 ]
[-0.94192794 1.5563135 -1.33581507]
[-0.89054316 1.7400243 0.54510124]]
I ran it twice and got two different results from the function spectral.fit_transform(X).
For example:
Y1=
[[ 0.2921505 0.1212865 ]
[ 0.27392757 0.02372519]
[ 0.5506981 0.22735989]
[-0.05899231 -0.14670033]
[-0.48160085 -0.21859889]
[ 0.03642044 0.07639132]
[ 0.46814198 -0.15150964]
[ 0.23721618 -0.6066226 ]
[ 0.15258655 -0.33629939]
[ 0.02977191 -0.5948518 ]]
Y2=
[[ 0.22423667 -0.37421422]
[ 0.25427333 0.42450319]
[ 0.27039709 0.55471119]
[ 0.07265556 -0.44721124]
[-0.36659607 0.1178414 ]
[-0.1421814 -0.14854459]
[ 0.445397 0.05315703]
[ 0.60791817 -0.29251698]
[ 0.14910215 0.21826375]
[ 0.24877077 0.00868079]]
This happens even though I set the seed as below:
import random
random.seed(1)
spectral = SpectralEmbedding(n_components=n_components, eigen_solver='arpack',
                             geom=geom)
embed_spectral = spectral.fit_transform(X)
print(X)
print(embed_spectral)
How can I use megaman so that the same input gives the same output across different runs?
Thank you
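One thing worth checking here: random.seed only seeds Python's standard-library generator, while numerical code built on NumPy draws from NumPy's own generator. The sketch below illustrates that difference with plain NumPy. It does not call megaman; whether SpectralEmbedding accepts a random_state argument in the installed version is an assumption to verify.

```python
import random
import numpy as np

# Seeding the stdlib generator does not touch NumPy's global generator,
# so these two draws generally differ.
random.seed(1)
a = np.random.rand(3)
random.seed(1)
b = np.random.rand(3)

# Two NumPy generators created from the same seed produce identical draws.
rng1 = np.random.RandomState(1)
rng2 = np.random.RandomState(1)
c = rng1.rand(3)
d = rng2.rand(3)
same_with_numpy_seed = np.allclose(c, d)

# Assumption: if the estimator takes a random_state argument, passing a
# fixed integer or RandomState there would be the analogous fix.
print(same_with_numpy_seed)
```

Even with a fixed seed, eigenvectors are only defined up to sign, so embeddings from different solvers may still differ by a sign flip per column.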
We should try to use relative imports within the package; e.g. in isomap.py
use
from ..utils.eigendecomp import eigen_decomposition
rather than
from Mmani.utils.eigendecomp import eigen_decomposition
The latter can bring up weird issues at times.
We have info in the examples directory, but we should organize it, add an Index.ipynb, and link from the README & documentation.
When trying to compile on Windows with MSVC I get the following error:
megaman\geometry\cyflann\index.cxx(2973): error C2664:
'int CyflannIndex::knnSearch(const std::vector<float,std::allocator<float>> &,std::vector<std::vector<int,std::allocator<int>>,std::allocator<std::vector<int,std::allocator<int>>>>,std::vector<std::vector<float,std::allocator<float>>,std::allocator<std::vector<float,std::allocator<float>>>>,int,int,int)':
cannot convert argument 2 from
std::vector<std::vector<__pyx_t_7megaman_8geometry_7cyflann_5index_dtypei_t,std::allocator<__pyx_t_7megaman_8geometry_7cyflann_5index_dtypei_t>>,std::allocator<std::vector<__pyx_t_7megaman_8geometry_7cyflann_5index_dtypei_t,std::allocator<__pyx_t_7megaman_8geometry_7cyflann_5index_dtypei_t>>>>'
to 'std::vector<std::vector<int,std::allocator<int>>,std::allocator<std::vector<int,std::allocator<int>>>>'
megaman\geometry\cyflann\index.cxx(2973): note: No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
\path-to-megaman\megaman-master\megaman\geometry\cyflann\cyflann_index.h(32): note: see declaration of 'CyflannIndex::knnSearch'
Looks like a type-conversion issue with Cython, not sure whether it's just my box?
n_components --> embedding_dimension or just dim
adjacency_matrix -> distance_matrix
They should not replace the current names, but overload them.
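One way to read "overload" is to accept both the current name and the new alias as keyword arguments and resolve them to a single value. The sketch below shows that pattern under that assumption; ExampleEmbedding and resolve_alias are hypothetical names for illustration and are not part of megaman.

```python
def resolve_alias(primary, alias, primary_name, alias_name):
    """Return whichever of the two arguments was given; reject conflicts."""
    if primary is not None and alias is not None and primary != alias:
        raise ValueError(
            "%s and %s were both given with different values"
            % (primary_name, alias_name)
        )
    return primary if primary is not None else alias


class ExampleEmbedding(object):
    """Hypothetical estimator that accepts both spellings of one parameter."""

    def __init__(self, n_components=None, embedding_dimension=None):
        # Store under the existing name so downstream code is unchanged.
        self.n_components = resolve_alias(
            n_components, embedding_dimension,
            "n_components", "embedding_dimension",
        )


print(ExampleEmbedding(embedding_dimension=3).n_components)
```

Storing the value under the existing attribute name keeps older code working while the new spelling is introduced.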
Dear all,
I installed megaman, and it shows up in "conda list":
megaman 0.3.dev0
megaman 0.2 np111py27_1 conda-forge
However, when I run "make test" to check the installation, the errors below are displayed.
Could anyone help me? Thank you.
make test
mkdir -p /tmp/megaman
python setup.py install
Cythonizing sources
megaman/__check_build/_check_build.pyx has not changed
megaman/geometry/cyflann/index.pyx has not changed
Compiling FLANN with FLANN_ROOT=/anaconda2
running install
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building extension "megaman.__check_build._check_build" sources
building extension "megaman.geometry/cyflann.index" sources
building data_files sources
build_src: building npy-pkg config files
running build_py
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
running install_lib
running install_data
running install_egg_info
Removing /anaconda2/lib/python2.7/site-packages/megaman-0.3.dev0-py2.7.egg-info
Writing /anaconda2/lib/python2.7/site-packages/megaman-0.3.dev0-py2.7.egg-info
running install_clib
customize UnixCCompiler
cd /tmp/megaman && nosetests megaman
RuntimeError: module compiled against API version 0xb but this version of numpy is 0xa
EEEEEEE.E........
======================================================================
ERROR: Failure: ImportError (numpy.core.multiarray failed to import)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
addr.filename, addr.module)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/anaconda2/lib/python2.7/site-packages/megaman/datasets/__init__.py", line 1, in <module>
from .datasets import (get_megaman_image, generate_megaman_data,
File "/anaconda2/lib/python2.7/site-packages/megaman/datasets/datasets.py", line 8, in <module>
from sklearn.utils import check_random_state
File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 57, in <module>
from .base import clone
File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/base.py", line 12, in <module>
from .utils.fixes import signature
File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/utils/__init__.py", line 10, in <module>
from .murmurhash import murmurhash3_32
ImportError: numpy.core.multiarray failed to import
======================================================================
ERROR: Failure: ImportError (cannot import name __check_build)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
addr.filename, addr.module)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/anaconda2/lib/python2.7/site-packages/megaman/embedding/__init__.py", line 7, in <module>
from .locally_linear import LocallyLinearEmbedding
File "/anaconda2/lib/python2.7/site-packages/megaman/embedding/locally_linear.py", line 17, in <module>
from ..embedding.base import BaseEmbedding
File "/anaconda2/lib/python2.7/site-packages/megaman/embedding/base.py", line 9, in <module>
from sklearn.base import BaseEstimator, TransformerMixin
File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
from . import __check_build
ImportError: cannot import name __check_build
======================================================================
ERROR: Failure: ImportError (cannot import name __check_build)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
addr.filename, addr.module)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/__init__.py", line 4, in <module>
from .geometry import Geometry
File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/geometry.py", line 36, in <module>
from .adjacency import compute_adjacency_matrix
File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/adjacency.py", line 4, in <module>
from sklearn import neighbors
File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
from . import __check_build
ImportError: cannot import name __check_build
======================================================================
ERROR: Failure: ImportError (cannot import name __check_build)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
addr.filename, addr.module)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/anaconda2/lib/python2.7/site-packages/megaman/relaxation/__init__.py", line 3, in <module>
from .riemannian_relaxation import *
File "/anaconda2/lib/python2.7/site-packages/megaman/relaxation/riemannian_relaxation.py", line 9, in <module>
from megaman.geometry import RiemannMetric
File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/__init__.py", line 4, in <module>
from .geometry import Geometry
File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/geometry.py", line 36, in <module>
from .adjacency import compute_adjacency_matrix
File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/adjacency.py", line 4, in <module>
from sklearn import neighbors
File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
from . import __check_build
ImportError: cannot import name __check_build
======================================================================
ERROR: Failure: ImportError (cannot import name __check_build)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
addr.filename, addr.module)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/anaconda2/lib/python2.7/site-packages/megaman/utils/tests/test_analyze_dimension_and_radius.py", line 4, in <module>
import megaman.utils.analyze_dimension_and_radius as adar
File "/anaconda2/lib/python2.7/site-packages/megaman/utils/analyze_dimension_and_radius.py", line 16, in <module>
from megaman.geometry.adjacency import compute_adjacency_matrix
File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/__init__.py", line 4, in <module>
from .geometry import Geometry
File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/geometry.py", line 36, in <module>
from .adjacency import compute_adjacency_matrix
File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/adjacency.py", line 4, in <module>
from sklearn import neighbors
File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
from . import __check_build
ImportError: cannot import name __check_build
======================================================================
ERROR: Failure: ImportError (cannot import name __check_build)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
addr.filename, addr.module)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/anaconda2/lib/python2.7/site-packages/megaman/utils/tests/test_eigendecomp.py", line 3, in <module>
from megaman.utils.eigendecomp import (eigen_decomposition, null_space,
File "/anaconda2/lib/python2.7/site-packages/megaman/utils/eigendecomp.py", line 8, in <module>
from sklearn.utils.validation import check_random_state
File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
from . import __check_build
ImportError: cannot import name __check_build
======================================================================
ERROR: Failure: ImportError (cannot import name __check_build)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
addr.filename, addr.module)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/anaconda2/lib/python2.7/site-packages/megaman/utils/tests/test_estimate_radius.py", line 4, in <module>
from megaman.utils.estimate_radius import run_estimate_radius
File "/anaconda2/lib/python2.7/site-packages/megaman/utils/estimate_radius.py", line 3, in <module>
from megaman.geometry.rmetric import riemann_metric_lazy
File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/__init__.py", line 4, in <module>
from .geometry import Geometry
File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/geometry.py", line 36, in <module>
from .adjacency import compute_adjacency_matrix
File "/anaconda2/lib/python2.7/site-packages/megaman/geometry/adjacency.py", line 4, in <module>
from sklearn import neighbors
File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
from . import __check_build
ImportError: cannot import name __check_build
======================================================================
ERROR: Failure: ImportError (cannot import name __check_build)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
addr.filename, addr.module)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/anaconda2/lib/python2.7/site-packages/megaman/utils/tests/test_spectral_clustering.py", line 1, in <module>
from sklearn import neighbors
File "/Users/hainguyen/.local/lib/python2.7/site-packages/sklearn/__init__.py", line 56, in <module>
from . import __check_build
ImportError: cannot import name __check_build
----------------------------------------------------------------------
Ran 17 tests in 0.084s
FAILED (errors=8)
make: *** [test] Error 1
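In the log above, megaman is imported from /anaconda2/lib/python2.7/site-packages while sklearn is imported from /Users/hainguyen/.local/lib/python2.7/site-packages, and NumPy reports an API version mismatch. That pattern is consistent with packages from two different install locations being mixed. A small diagnostic sketch for seeing where each module would be loaded from is below. It is written for Python 3 (the log is from Python 2.7), and the helper name locate is hypothetical.

```python
import importlib.util
import site

def locate(module_name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return getattr(spec, "origin", None) if spec is not None else None

# Modules that resolve under the user site directory while others resolve
# under the conda environment are candidates for a version conflict.
for name in ("numpy", "scipy", "sklearn", "megaman"):
    print(name, "->", locate(name))

print("user site enabled:", site.ENABLE_USER_SITE)
print("user site path:", site.getusersitepackages())
```

If the paths disagree, running Python with the -s flag (which skips the user site directory) is one way to test whether the user-site copies are the cause.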
I'm looking at applying megaman to cluster on a large set of sentence embeddings, but I was hoping for more detail than an ordinal ranking of the algorithms. Is there a source that has benchmarked times on the different megaman algorithms?
Thanks,
Andrew
In the riemann_metric() method, n_dim is used to iterate over embedding dimensions (lines 62-65). However, n_dim, when inherited from mdimG through the get_dual_rmetric() method, refers to the manifold dimension. Therefore, I think that the iteration over embedding dimensions in the metric computation should be done using mdimY, and mdimG should be used to select a subset of singular values.
Dear Mega-team,
Great job with the code. Might you be able to help with some conceptual difficulties that I'm having?
I'd like to take a data set of size (rows=250M, features=5) and perform SpectralEmbedding into 2 or 3 dimensions. I'm finding very long computation times. Does it make more sense to perform the fit_transform() method on a subset of the data, and then apply this mapping to all the data in the sample? If so, I can't figure out how to do this from the documentation.
I'm following the bare-bones example I find here (recreating the mega-man image), and am quite new to many of the concepts that mega-man has to offer.
I see a fit_transform() method, but nothing like a sk-learn transform() or prediction()
Thanks,
Ben
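The idea of fitting on a subset and then mapping the remaining points is commonly done with a Nyström-style out-of-sample extension. The sketch below shows that technique in plain NumPy, not megaman's own API: the function names, the Gaussian kernel form exp(-d^2 / radius^2), and the radius value are all assumptions for illustration. It embeds a subset through the eigenvectors of the random-walk matrix P = D^-1 K and extends to new points with psi(x) = (1 / lambda) * sum_j p(x, x_j) psi(x_j).

```python
import numpy as np

def gaussian_kernel(A, B, radius):
    """Pairwise Gaussian affinities exp(-||a - b||^2 / radius^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / radius ** 2)

def fit_embedding(X, n_components, radius):
    """Eigenvalues and right eigenvectors of P = D^-1 K on the training set."""
    K = gaussian_kernel(X, X, radius)
    d = K.sum(axis=1)
    # S = D^-1/2 K D^-1/2 is symmetric and shares its eigenvalues with P.
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    # Sort descending and skip the trivial top eigenvector (eigenvalue 1).
    order = np.argsort(vals)[::-1][1:n_components + 1]
    lam = vals[order]
    psi = vecs[:, order] / np.sqrt(d)[:, None]
    return lam, psi

def nystrom_extend(X_train, X_new, lam, psi, radius):
    """Map new points into the coordinates learned on X_train."""
    K = gaussian_kernel(X_new, X_train, radius)
    P = K / K.sum(axis=1, keepdims=True)
    return P.dot(psi) / lam

rng = np.random.RandomState(0)
X_all = rng.rand(200, 3)   # stand-in for the full data set
X_sub = X_all[:40]         # subset used for fitting
lam, psi = fit_embedding(X_sub, n_components=2, radius=0.5)
Y_all = nystrom_extend(X_sub, X_all, lam, psi, radius=0.5)
print(Y_all.shape)
```

For very large inputs the extension step can be applied in chunks of rows, since each new point only needs its affinities to the training subset. This dense version is only meant for small illustrations.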
The handling of the Geometry object within the embeddings needs some work & some consistency. I've done a bit of tweaking in 3fcc3d8, but there are still problems. In particular:
fit_geometry, some use a manual default, and some fail (I've fixed some of this in 3fcc3d8)
Pyflann doesn't seem to be installed properly on MacOS after installing megaman.
Steps to recreate:
conda install megaman --channel=conda-forge
from megaman.geometry import Geometry
On my computer, this fails with the following error message:
In [4]: from megaman.geometry import Geometry
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-4-3d5e7524d7bf> in <module>()
----> 1 from megaman.geometry import Geometry
/usr/local/anaconda3/envs/megaman/lib/python3.5/site-packages/megaman/geometry/__init__.py in <module>()
2
3 from .rmetric import RiemannMetric
----> 4 from .geometry import Geometry
5 from .adjacency import Adjacency, compute_adjacency_matrix, adjacency_methods
6 from .affinity import Affinity, compute_affinity_matrix, affinity_methods
/usr/local/anaconda3/envs/megaman/lib/python3.5/site-packages/megaman/geometry/geometry.py in <module>()
34 from scipy import sparse
35 from scipy.special import gammaln
---> 36 from .adjacency import compute_adjacency_matrix
37 from .affinity import compute_affinity_matrix
38 from .laplacian import compute_laplacian_matrix
/usr/local/anaconda3/envs/megaman/lib/python3.5/site-packages/megaman/geometry/adjacency.py in <module>()
5 from scipy import sparse
6
----> 7 from .cyflann.index import Index as CyIndex
8 from .utils import RegisterSubclasses
9
ImportError: dlopen(/usr/local/anaconda3/envs/megaman/lib/python3.5/site-packages/megaman/geometry/cyflann/index.cpython-35m-darwin.so, 2): Library not loaded: @rpath/libflann.1.8.dylib
Referenced from: /usr/local/anaconda3/envs/megaman/lib/python3.5/site-packages/megaman/geometry/cyflann/index.cpython-35m-darwin.so
Reason: image not found
Installing pyflann using conda install --channel=jakevdp pyflann resolves this.
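Before reinstalling, a quick first check is whether a FLANN shared library is visible to the system at all. The sketch below uses ctypes.util.find_library from the standard library; the helper name and the candidate library names are assumptions. Note that this search does not replicate how the dynamic loader resolves @rpath entries, so a result of None here is only a hint.

```python
import ctypes.util

def find_shared_library(name):
    """Return the name or path of a shared library if the system can locate it."""
    return ctypes.util.find_library(name)

# Candidate names are guesses; the error above refers to libflann.1.8.dylib.
for lib in ("flann", "flann_cpp"):
    print(lib, "->", find_shared_library(lib))
```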
Hi there, I am having trouble installing the package either from conda or source following the instructions. I am using macosx-10.7-x86_64 with Python 2.7. Also failed on Python 3.5 and 3.6.
(manifold_env) /tmp/megaman(master) $make test
mkdir -p /tmp/megaman
python setup.py install
Cythonizing sources
megaman/__check_build/_check_build.pyx has not changed
megaman/geometry/cyflann/index.pyx has not changed
Compiling FLANN with FLANN_ROOT=/Users/fche0019/miniconda3/envs/manifold_env
running install
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building extension "megaman.__check_build._check_build" sources
building extension "megaman.geometry/cyflann.index" sources
building data_files sources
build_src: building npy-pkg config files
running build_py
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
running install_lib
running install_data
running install_egg_info
Removing /Users/fche0019/miniconda3/envs/manifold_env/lib/python2.7/site-packages/megaman-0.3.dev0-py2.7.egg-info
Writing /Users/fche0019/miniconda3/envs/manifold_env/lib/python2.7/site-packages/megaman-0.3.dev0-py2.7.egg-info
running install_clib
customize UnixCCompiler
cd /tmp/megaman && nosetests megaman
E
ERROR: Failure: ImportError (No module named _check_build
Contents of /private/tmp/megaman/megaman/__check_build:
__init__.py setup.pyc _check_build.c
setup.py __init__.pyc _check_build.pyx
It seems that megaman has not been built correctly.
If you have installed megaman from source, please do not forget
to build the package before using it: run python setup.py install
in the source directory.
Traceback (most recent call last):
File "/Users/fche0019/miniconda3/envs/manifold_env/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
addr.filename, addr.module)
File "/Users/fche0019/miniconda3/envs/manifold_env/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/Users/fche0019/miniconda3/envs/manifold_env/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/private/tmp/megaman/megaman/__init__.py", line 7, in <module>
from . import __check_build
File "/private/tmp/megaman/megaman/__check_build/__init__.py", line 56, in <module>
raise_build_error(e)
File "/private/tmp/megaman/megaman/__check_build/__init__.py", line 51, in raise_build_error
msg=msg))
ImportError: No module named _check_build
Contents of /private/tmp/megaman/megaman/__check_build:
__init__.py setup.pyc _check_build.c
setup.py __init__.pyc _check_build.pyx
It seems that megaman has not been built correctly.
If you have installed megaman from source, please do not forget
to build the package before using it: run python setup.py install
in the source directory.
If you have used an installer, please check that it is suited for your
Python version, your operating system and your platform.
Ran 1 test in 0.001s
FAILED (errors=1)
make: *** [test] Error 1
Can it be used on Windows?
Hello,
If you get the chance, could you please provide an example to explain the usage of the function run_riemannian_relaxation()? I've utilized the code as follows:
rr = run_riemannian_relaxation(laplacian_matrix, embed_tsne, 2, dict(niter=250, verbose=True))
rr.relax_isometry()
But am getting the following output and error:
Making Lk and nbhds
Iteration number: 0
Last step size eta: 0.0
current loss (before gradient step): 21.929018683354258
minimum loss: 21.929018683354258, at iteration: 0
Traceback (most recent call last):
File "", line 142, in
rr.relax_isometry()
File "/megaman/relaxation/riemannian_relaxation.py", line 83, in relax_isometry
self.trace_var.update(ii,self.H,self.Y,self.eta,self.loss)
File "/megaman/relaxation/trace_variable.py", line 60, in update
self.H[iiter] = H
IndexError: index 1 is out of bounds for axis 0 with size 1
Any guidance for performing the Riemannian relaxation would be appreciated. I've run your other example cases without problems. Thanks!
All the best,
Julienne
Errors on OSX when running from megaman.embedding import SpectralEmbedding.
The errors are as follows:
ImportError Traceback (most recent call last)
in ()
----> 1 from megaman.embedding import SpectralEmbedding
/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/embedding/__init__.py in <module>()
5 # LICENSE: Simplified BSD https://github.com/mmp2/megaman/blob/master/LICENSE
6
----> 7 from .locally_linear import LocallyLinearEmbedding
8 from .isomap import Isomap
9 from .ltsa import LTSA
/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/embedding/locally_linear.py in <module>()
15 from scipy.linalg import eigh, svd, qr, solve
16 from scipy.sparse import eye, csr_matrix
---> 17 from ..embedding.base import BaseEmbedding
18 from ..utils.validation import check_array, check_random_state
19 from ..utils.eigendecomp import null_space, check_eigen_solver
/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/embedding/base.py in <module>()
10 from sklearn.utils.validation import check_array
11
---> 12 from ..geometry.geometry import Geometry
13
14 # from sklearn.utils.validation import FLOAT_DTYPES
/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/__init__.py in <module>()
2
3 from .rmetric import RiemannMetric
----> 4 from .geometry import Geometry
5 from .adjacency import Adjacency, compute_adjacency_matrix, adjacency_methods
6 from .affinity import Affinity, compute_affinity_matrix, affinity_methods
/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/geometry.py in <module>()
34 from scipy import sparse
35 from scipy.special import gammaln
---> 36 from .adjacency import compute_adjacency_matrix
37 from .affinity import compute_affinity_matrix
38 from .laplacian import compute_laplacian_matrix
/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/adjacency.py in <module>()
5 from scipy import sparse
6
----> 7 from .cyflann.index import Index as CyIndex
8 from .utils import RegisterSubclasses
9
ImportError: dlopen(/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/cyflann/index.so, 2): Library not loaded: @rpath/libflann.1.8.dylib
Referenced from: /Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/cyflann/index.so
Reason: image not found
Thanks for a great library!
I wonder if there is an easy way to perform transform() only? I have sets of points A and B_t for t in [0..T] and I want to plot how B changes over time, so I was wondering if I could train an embedding on A and see B_t projections onto A's coordinates.
Thank you!
Traceback (most recent call last):
File "/home/jerry/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/jerry/anaconda/lib/python2.7/site-packages/megaman/geometry/tests/test_rmetric.py", line 72, in test_equal_original
assert_allclose( Gtest, G, tol)
File "/home/jerry/anaconda/lib/python2.7/site-packages/numpy/testing/utils.py", line 1391, in assert_allclose
verbose=verbose, header=header)
File "/home/jerry/anaconda/lib/python2.7/site-packages/numpy/testing/utils.py", line 733, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1.04681e+11, atol=0
(mismatch 0.25%)
x: array([[[ 3.104310e+15, 2.125112e+01],
[ 2.119583e+01, 5.400388e+01]],
...
y: array([[[ 4.421211e+15, 4.985776e+01],
[ 5.055786e+01, 5.400388e+01]],
...
-------------------- >> begin captured stdout << ---------------------
('phi.shape = ', (200, 2))
('G.shape = ', (200, 2, 2))
('H.shape = ', (200, 2, 2))
('L.shape = ', (200, 200))
--------------------- >> end captured stdout << ----------------------
Ran 140 tests in 7.220s
FAILED (failures=1)
make: *** [test] Error 1
Errors on OSX when running from megaman.geometry import Geometry.
The errors are as follows:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
in ()
----> 1 from megaman.geometry import Geometry
/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/__init__.py in <module>()
2
3 from .rmetric import RiemannMetric
----> 4 from .geometry import Geometry
5 from .adjacency import Adjacency, compute_adjacency_matrix, adjacency_methods
6 from .affinity import Affinity, compute_affinity_matrix, affinity_methods
/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/geometry.py in <module>()
34 from scipy import sparse
35 from scipy.special import gammaln
---> 36 from .adjacency import compute_adjacency_matrix
37 from .affinity import compute_affinity_matrix
38 from .laplacian import compute_laplacian_matrix
/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/adjacency.py in <module>()
5 from scipy import sparse
6
----> 7 from .cyflann.index import Index as CyIndex
8 from .utils import RegisterSubclasses
9
ImportError: dlopen(/Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/cyflann/index.so, 2): Library not loaded: @rpath/libflann.1.8.dylib
Referenced from: /Users/myusername/miniconda2/envs/ir-conda/lib/python2.7/site-packages/megaman/geometry/cyflann/index.so
Reason: image not found
Write to @jmcq89 or @mmp2 if you would like to contribute
This list is only very slightly prioritized (i.e., we think the first two tasks are currently the most important). There are almost no dependencies between tasks, so any task can be undertaken at any time.
Ping @jmcq89
I've been going through the package and adding some tests of corner-cases.
There are a lot of commonalities between the four embedding classes, and a lot of code that is repeated. By writing common unit tests for these, I found a number of inconsistencies with how they handle input arguments.
I think this is a case where building the four algorithms from a common base class would be very helpful, both in keeping the code organized & terse, and in making sure all estimators are behaving as expected.
Currently only one initialization method is provided, orthogonal_initialization(data, K); this initialization is appropriate for spectral clustering.
We need to provide power_initialization as the default method, plus a custom center initialization. Low priority: a custom labeling initialization.
I found a couple bugs associated with the embedding dimension being greater than the manifold dimension, basically having to do with truncating the remaining singular values.
compute_G_from_H should be passing the manifold dimension as an argument (it looks like it's set up to do this, but it is not doing so currently).
In compute_G_from_H (the 'elif mdimG < n_dim:' part) there are undefined variables and so forth. I don't think this has been tested. As is, it won't work for mdimG < ndim_Y.
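To make the truncation step concrete, here is a minimal NumPy sketch of inverting a stack of symmetric matrices while keeping only the leading singular values. It assumes G is obtained from H by an SVD-based pseudo-inverse restricted to the top mdim singular values; truncated_pinv is a hypothetical name and this is not megaman's compute_G_from_H.

```python
import numpy as np

def truncated_pinv(H, mdim):
    """Pseudo-invert each matrix in H (shape (n, d, d)) using its top mdim singular values."""
    U, s, Vt = np.linalg.svd(H)              # batched SVD, s sorted descending
    s_inv = np.zeros_like(s)
    s_inv[:, :mdim] = 1.0 / s[:, :mdim]      # remaining singular values are dropped
    V = Vt.transpose(0, 2, 1)
    # G[n] = V[n] @ diag(s_inv[n]) @ U[n].T
    return np.einsum('nij,nj,nkj->nik', V, s_inv, U)

rng = np.random.RandomState(0)
A = rng.rand(4, 3, 3)
H = np.matmul(A, A.transpose(0, 2, 1)) + np.eye(3)   # symmetric positive definite
G_full = truncated_pinv(H, mdim=3)   # embedding dimension equals manifold dimension
G_trunc = truncated_pinv(H, mdim=2)  # manifold dimension below embedding dimension
print(G_full.shape, G_trunc.shape)
```

With mdim equal to the matrix size this reduces to the ordinary inverse; with a smaller mdim each result has rank mdim, which is the case the report says is currently untested.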
Here is a piece of megaman.embeddings.ltsa.ltsa() that gives a 10x improvement in speed for the first stage of LTSA (before the null_space call); in my case the running time of that part dropped from 2 hours to 10 minutes (600k points, 10-dimensional space). The amg solver runs fast anyway, ~3 min.
The improvements are: 1) instead of searching for rows==i and then indexing cols at every iteration, we exploit the fact that rows is initially sorted as [0 0 0 1 1 2 2 2 3 3 .. ] and pre-split it; 2) updates for lil_matrix are faster. Surprisingly, those small fixes give serious improvements in run time.
Probably, if we could know in advance that one uses n_neighbours and not radius, it could be done even slightly faster by pre-allocating np.ones((1, len(neighbors_i))), but that sounds hacky.
if eigen_solver != 'dense':
    M = sparse.lil_matrix((N, N))
else:
    M = np.zeros((N, N))
# rows is sorted, so split cols once at the positions where the row index changes
split_map = np.where(np.diff(rows))[0]+1
cols_i = np.split(cols, split_map)
for i in tqdm.trange(N):
    neighbors_i = cols_i[i]
    n_neighbors_i = len(neighbors_i)
    use_svd = (n_neighbors_i > d_in)
    Xi = geom.X[neighbors_i]
    Xi -= Xi.mean(0)
    # compute n_components largest eigenvalues of Xi * Xi^T
    if use_svd:
        v = svd(Xi, full_matrices=True)[0]
    else:
        Ci = np.dot(Xi, Xi.T)
        v = eigh(Ci)[1][:, ::-1]
    Gi = np.zeros((n_neighbors_i, n_components + 1))
    Gi[:, 1:] = v[:, :n_components]
    Gi[:, 0] = 1. / np.sqrt(n_neighbors_i)
    GiGiT = np.dot(Gi, Gi.T)
    nbrs_x, nbrs_y = np.meshgrid(neighbors_i, neighbors_i)
    M[nbrs_x, nbrs_y] -= GiGiT
    M[neighbors_i, neighbors_i] += np.ones((1, len(neighbors_i)))
return null_space(M.tocsr(), n_components, k_skip=1, eigen_solver=eigen_solver,
                  random_state=random_state, solver_kwds=solver_kwds)
I am not sure how to run the set of tests for ltsa, so I am reporting it here instead of opening a pull request.
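For anyone who wants to check the pre-splitting trick in isolation, here is a minimal sketch on a toy edge list. The function names and the toy arrays are made up for illustration and are not part of megaman. It assumes rows is sorted and that every point has at least one neighbor; otherwise the i-th chunk no longer corresponds to row i.

```python
import numpy as np


def neighbors_by_mask(rows, cols, n_points):
    """Baseline: scan the whole rows array once per point (O(N * nnz))."""
    return [cols[rows == i] for i in range(n_points)]


def neighbors_by_split(rows, cols):
    """Pre-split cols at the positions where the sorted row index changes."""
    split_map = np.where(np.diff(rows))[0] + 1
    return np.split(cols, split_map)


# Toy adjacency in COO form; rows is sorted, as in the post above.
rows = np.array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3])
cols = np.array([0, 1, 2, 1, 0, 2, 3, 1, 3, 2])

slow = neighbors_by_mask(rows, cols, n_points=4)
fast = neighbors_by_split(rows, cols)

# Both approaches should give the same neighbor list for every point.
same = all(np.array_equal(a, b) for a, b in zip(slow, fast))
print(same)
```

The split version touches the arrays once up front, which is where the saving comes from when N is in the hundreds of thousands.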
Hi all,
I am doing some experiments with megaman, and I am wondering how to select its parameters: the dimension to reduce to (n_components) and the radius.
I have two datasets. Dataset A has 300 samples with dim = 1.5 million, while Dataset B has about 500 samples with dim = 800. What dimension should I reduce them to with megaman in each case? For example, if I reduce A to a new dimension d1 (n_components), such as d1=100 or d1=2, how do I select a good value for d1?
For the radius, how do I know which radius is good?
Thank you.
To make this maximally compatible with scikit-learn, we should have the estimators inherit from sklearn.base, and then run sklearn's check_estimator on each as part of the test suite.
The only thing to keep in mind is that sklearn estimators can't do anything in __init__ except store the arguments by name (this is to enable cloning the estimator for grid search & parallelization). It seems like we're already in good shape as far as that requirement goes.
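To illustrate the __init__ convention, here is a minimal sketch using a made-up estimator. SketchEmbedding and its parameters are hypothetical, not megaman's actual classes. It only stores its arguments in __init__ and defers validation to fit, which is what lets sklearn's clone rebuild it from get_params().

```python
from sklearn.base import BaseEstimator, clone


class SketchEmbedding(BaseEstimator):
    """Hypothetical estimator following the sklearn __init__ convention."""

    def __init__(self, n_components=2, radius=None, eigen_solver='auto'):
        # Only store the arguments under their own names; no validation here.
        self.n_components = n_components
        self.radius = radius
        self.eigen_solver = eigen_solver

    def fit(self, X, y=None):
        # Validation and any real work belong in fit, not in __init__.
        if self.radius is not None and self.radius <= 0:
            raise ValueError("radius must be positive")
        self.n_samples_ = len(X)
        return self


original = SketchEmbedding(n_components=3, radius=0.5)
copy = clone(original)  # rebuilds the estimator from get_params()
print(copy.get_params())
```

Running check_estimator on the real classes would exercise much more than this, so the sketch covers only the cloning requirement mentioned above.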
Reading through the code, it seems we could benefit greatly from a more generalized distance_matrix() function, with this functionality passed along to each estimator. I'd imagine a call signature something like this:
def distance_matrix(X, method='auto', type='radius', **kwargs):
...
pyflann, then the user can either pass a pre-built flann index as a keyword argument, or leave it out and have it generated automatically
cyflann
radius, then the user should pass a radius as a keyword argument
knn, then the user should pass an integer number of neighbors as a keyword argument.
When trying to install megaman by running python setup.py install, I get the following fatal error:
In file included from megaman/geometry/cyflann/index.cxx:620:0:
megaman/geometry/cyflann/cyflann_index.h:8:10: fatal error: flann/flann.hpp: No such file or directory
#include <flann/flann.hpp>
^~~~~~~~~~~~~~~~~
compilation terminated.
How could I fix it?
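This error means the compiler could not find the FLANN headers on its include path. A quick way to see where (or whether) flann/flann.hpp is installed is sketched below. The helper names and the list of directories are assumptions for illustration, and reading a FLANN_ROOT environment variable is a guess based on the "FLANN_ROOT=" message in the build output; this is not megaman's own build logic.

```python
import os


def candidate_include_dirs(environ=os.environ):
    """Directories to search; FLANN_ROOT (if set) is tried first."""
    dirs = []
    root = environ.get("FLANN_ROOT")  # assumed variable name, see note above
    if root:
        dirs.append(os.path.join(root, "include"))
    dirs += ["/usr/include", "/usr/local/include"]
    return dirs


def find_flann_header(include_dirs):
    """Return the first existing <dir>/flann/flann.hpp, or None."""
    for directory in include_dirs:
        candidate = os.path.join(directory, "flann", "flann.hpp")
        if os.path.isfile(candidate):
            return candidate
    return None


print(find_flann_header(candidate_include_dirs()))
```

If this prints None, FLANN's development headers are probably not installed in any of the searched locations.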
I'm afraid as soon as you use adjacency_method = 'cyflann' I get the following error. It's a Cython issue as far as I understand. Any thoughts?
I'm using Anaconda python 3.5.2
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-23-3db59b70d454> in <module>()
16 geom.set_data_matrix(X)
17 t0 = time.time()
---> 18 adjacency_matrix = geom.compute_adjacency_matrix()
19 t1 = time.time() - t0
20 print(t1)
/Users/sachin/anaconda/lib/python3.5/site-packages/megaman/geometry/geometry.py in compute_adjacency_matrix(self, copy, **kwargs)
176 self.adjacency_matrix = compute_adjacency_matrix(self.X,
177 self.adjacency_method,
--> 178 **kwds)
179 if copy:
180 return self.adjacency_matrix.copy()
/Users/sachin/anaconda/lib/python3.5/site-packages/megaman/geometry/adjacency.py in compute_adjacency_matrix(X, method, **kwargs)
22 else:
23 method = 'kd_tree'
---> 24 return Adjacency.init(method, **kwargs).adjacency_graph(X)
25
26
/Users/sachin/anaconda/lib/python3.5/site-packages/megaman/geometry/adjacency.py in adjacency_graph(self, X)
45 return self.knn_adjacency(X)
46 elif self.radius is not None:
---> 47 return self.radius_adjacency(X)
48
49 def knn_adjacency(self, X):
/Users/sachin/anaconda/lib/python3.5/site-packages/megaman/geometry/adjacency.py in radius_adjacency(self, X)
100
101 def radius_adjacency(self, X):
--> 102 cyindex = self._get_built_index(X)
103 return cyindex.radius_neighbors_graph(X, self.radius)
104
/Users/sachin/anaconda/lib/python3.5/site-packages/megaman/geometry/adjacency.py in _get_built_index(self, X)
93 if self.flann_index is None:
94 cyindex = CyIndex(X, target_precision=self.target_precision,
---> 95 **(self.cyflann_kwds or {}))
96 else:
97 cyindex = self.flann_index
megaman/geometry/cyflann/index.pyx in megaman.geometry.cyflann.index.Index.__cinit__ (megaman/geometry/cyflann/index.cxx:1802)()
/Users/sachin/anaconda/lib/python3.5/site-packages/megaman/geometry/cyflann/index.cpython-35m-darwin.so in string.from_py.__pyx_convert_string_from_py_std__in_string (megaman/geometry/cyflann/index.cxx:5777)()
TypeError: expected bytes, str found
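For context: under Python 3, Cython's automatic conversion to a C++ std::string accepts only bytes objects, so passing a str raises exactly this TypeError. One possible workaround, sketched below, is to encode string options before they reach the extension. The helper names and the example keyword names are hypothetical, and whether the proper fix belongs in the caller or in index.pyx is not something the traceback settles.

```python
def to_bytes(value, encoding="utf-8"):
    """Encode str to bytes; leave every other type untouched."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


def encode_string_kwargs(kwargs):
    """Return a copy of kwargs with all str values encoded to bytes."""
    return {key: to_bytes(value) for key, value in kwargs.items()}


# Hypothetical keyword names, used only to show the conversion.
kwds = encode_string_kwargs({"algorithm": "kmeans", "branching": 64})
print(kwds)
```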
I installed megaman without errors
megaman$ python setup.py install
Cythonizing sources
megaman/__check_build/_check_build.pyx has not changed
megaman/geometry/cyflann/index.pyx has not changed
Compiling FLANN with FLANN_ROOT=/anaconda2
running install
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building extension "megaman.__check_build._check_build" sources
building extension "megaman.geometry/cyflann.index" sources
building data_files sources
build_src: building npy-pkg config files
running build_py
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
running install_lib
running install_data
running install_egg_info
Removing /anaconda2/lib/python2.7/site-packages/megaman-0.3.dev0-py2.7.egg-info
Writing /anaconda2/lib/python2.7/site-packages/megaman-0.3.dev0-py2.7.egg-info
running install_clib
customize UnixCCompiler
However, I received the errors below when importing megaman:
from megaman.geometry import Geometry
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "megaman/__init__.py", line 7, in <module>
from . import __check_build
File "megaman/__check_build/__init__.py", line 56, in <module>
raise_build_error(e)
File "megaman/__check_build/__init__.py", line 51, in raise_build_error
msg=msg))
ImportError: No module named _check_build
___________________________________________________________________________
Contents of megaman/__check_build:
__init__.py __init__.pyc _check_build.c
_check_build.pyx setup.py setup.pyc
___________________________________________________________________________
It seems that megaman has not been built correctly.
If you have installed megaman from source, please do not forget
to build the package before using it: run `python setup.py install`
in the source directory.
It appears that you are importing a local megaman source tree.
Please either use an inplace install or try from another location.
Could anyone help me, please? Thank you.
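The last two lines of the message suggest Python is picking up the megaman folder in the current directory (the uncompiled source tree) rather than the installed copy. A small sketch for checking where a module would be imported from is below; the helper names are made up, and json is used only as a stand-in so the snippet runs anywhere.

```python
import importlib.util
import os


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_under(path, directory):
    """True if path lies inside directory (after resolving symlinks)."""
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)
    return os.path.commonpath([path, directory]) == directory


# Replace "json" with "megaman" to check the case from the traceback.
origin = module_origin("json")
print(origin, is_under(origin, os.getcwd()))
```

If the second value is True for megaman, starting Python from a different directory is the simplest thing to try.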