POT : Python Optimal Transport
Home Page: https://PythonOT.github.io/
License: MIT License
Describe the bug
The following script raises a shape mismatch error when computing sinkhorn2 with stabilization and several target distributions.
File "/Users/hichamjanati/Documents/github/forks/POT/ot/bregman.py", line 774, in sinkhorn_stabilized
log['logu'] = alpha / reg + np.log(u)
ValueError: operands could not be broadcast together with shapes (100,) (100,2)
To Reproduce
import numpy as np
import ot
from ot.bregman import sinkhorn2
n = 100
x = np.arange(n, dtype=np.float64)
# Gaussian distributions
a = ot.datasets.make_1D_gauss(n, m=20, s=5) # m= mean, s= std
b1 = ot.datasets.make_1D_gauss(n, m=60, s=8)
b2 = ot.datasets.make_1D_gauss(n, m=30, s=4)
# creating matrix A containing all distributions
b = np.vstack((b1, b2)).T
M = ot.utils.dist0(n)
M /= np.median(M)
epsilon = 0.1
w_stable, log = sinkhorn2(a, b, M, epsilon, method="sinkhorn_stabilized",
log=True)
Fix
Basically, when log=True, the current code does not take into account the case where b contains many distributions. The if nbb check should be moved up before computing the dual variables.
bregman.py
if log:
    log['logu'] = alpha / reg + np.log(u)
    log['logv'] = beta / reg + np.log(v)
    log['alpha'] = alpha + reg * np.log(u)
    log['beta'] = beta + reg * np.log(v)
    log['warmstart'] = (log['alpha'], log['beta'])
    if nbb:
        res = np.zeros((nbb))
        for i in range(nbb):
            res[i] = np.sum(get_Gamma(alpha, beta, u[:, i], v[:, i]) * M)
        return res, log
I can make a tiny PR with an additional test if you want.
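The mismatch in the traceback is a plain NumPy broadcasting failure: alpha has shape (n,) while u has shape (n, nbb). The following is a minimal sketch with stand-in arrays, not the library code; the names alpha, u and reg mirror the snippet above, the values are invented, and adding a trailing axis is only one possible way to align the shapes.

```python
import numpy as np

n, nbb = 100, 2          # 100 bins, 2 target distributions (as in the report)
reg = 0.1
alpha = np.zeros(n)      # stand-in for the dual potential, shape (n,)
u = np.ones((n, nbb))    # stand-in for the scaling vectors, shape (n, nbb)

try:
    alpha / reg + np.log(u)          # (100,) vs (100, 2): cannot broadcast
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Adding a trailing axis lets alpha broadcast across the nbb columns.
logu = alpha[:, None] / reg + np.log(u)
```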
Hello to all contributors,
The last POT 0.6 release brought new features to the library and we have now 25 papers implemented in POT. It was discussed that before making the 1.0 release, we should work on some fundamental changes inside the library. In my humble opinion, we should work on the most urgent changes before adding new features. If we keep adding new features, it will be even more complicated to make the fundamental changes afterwards. I start this issue in order to discuss these matters.
I copy-paste here what was discussed before. The list is non-exhaustive and I invite you to complete it if you have ideas/wishes:
Naming convention (clearer and more consistent)
Duplicated code (bregman module)
Clean commented code
a two letters package name -- ot -- can cause multiple headaches ..
The emd functions should be in a specific module not in the init file
In some functions, the transport plan is computed (which can be heavy to store on GPUs) even though it is not needed. I think there should be a function that explicitly computes the transport plan from the dual variables, so that the plan is only built when the user asks for it.
sinkhorn returns the distance or the plan depending on the second dimension of the input distribution b ..
make sure the CIs provide all the working infrastructure needed for this (and future) releases.
Domain adaptation name
Torch backend
I would state that the most urgent before adding features is the naming convention, because we can't add new functions with old names (ot.sinkhorn2 ...).
It will be updated each time we converge toward a new name.
------------------documentation/examples------------------
------------------variable names------------------
Hi,
Probably a minor copy/paste fix needed in the notebooks:
For example, in the latter, the following code seems to produce EMD instead of Sinkhorn
# prediction between images (using out of sample prediction as in [6])
transp_Xs_emd = ot_emd.transform(Xs=X1)
transp_Xt_emd = ot_emd.inverse_transform(Xt=X2)
transp_Xs_sinkhorn = ot_emd.transform(Xs=X1) # Shouldn't be ot_sinkhorn.transform(Xs=X1) ?
transp_Xt_sinkhorn = ot_emd.inverse_transform(Xt=X2) # Same here
At least, it would match the example:
https://github.com/rflamary/POT/blob/e757b75976ece1e6e53e655852b9f8863e7b6f5a/examples/plot_otda_color_images.py#L118-L119
Thanks
PS. Sorry if I misunderstood something.
While dockerizing POT using this POT Dockerfile, I came across an error that occurs during the command python3 setup.py install --user:
Traceback (most recent call last):
File "setup.py", line 3, in <module>
from setuptools import setup, find_packages
File "/usr/local/lib/python3.4/dist-packages/setuptools/__init__.py", line 12, in <module>
import setuptools.version
File "/usr/local/lib/python3.4/dist-packages/setuptools/version.py", line 1, in <module>
import pkg_resources
File "/usr/local/lib/python3.4/dist-packages/pkg_resources/__init__.py", line 70, in <module>
import packaging.version
ImportError: No module named 'packaging'
Any idea how to solve this?
Update
I worked out the first error but encountered another one. It seems related to Shippable/support#3316: the cause may be a new version of setuptools.
We need proper documentation for the GPU implementation module
Could you please tell me where the solver for GCG is? I've been searching for a while but couldn't find it. Thank you
Describe the bug
The ot.gromov.gromov_wasserstein method fails (TypeError) when the cost matrices are very similar but not the same
To Reproduce
The full code is available at
https://colab.research.google.com/drive/1IhnOqeLV51gWE8FodnBsgR5cQC_w2EkL
Sys specifications
Linux-3.10.0-327.22.2.el7.x86_64-x86_64-with-centos-7.2.1511-Core
Python 3.4.3 (default, Apr 28 2015, 11:29:27)
[GCC 4.9.2]
NumPy 1.16.2
SciPy 1.2.1
POT 0.5.1
Hi guys, I am a newbie in programming. I tried "pip install POT" in my terminal, and the following (first) error happened:
ot/lp/emd_wrap.cpp:6660:65: error: too many arguments to function call, expected 3, have 4
return (*((__Pyx_PyCFunctionFast)meth)) (self, args, nargs, NULL);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~
I am sure all the packages it relies on are up to date. I would appreciate any help. Thanks!
Hello,
I am trying out the GPU implementation of the sinkhorn transport, but with not much success.
>>> a=[.5,.5]
>>> b=[.5,.5]
>>> M=[[0.,1.],[1.,0.]]
>>> ot.gpu.sinkhorn(a,b,M,1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'ot' has no attribute 'gpu'
However, the ot.sinkhorn(a,b,M,1) works as expected.
I have cupy installed as well as the CUDA SDK.
Could someone help?
In paper [5], Optimal Transport for Domain Adaptation, you used Laplacian regularization. But I'm not sure how we get the matrix S, which is a similarity matrix. Is there any tutorial related to this?
Thanks for the help.
I encountered the following issue while installing POT on MacOSX Mojave, with python 3.6
python setup.py build
running build
running build_py
running build_ext
building 'ot.lp.emd_wrap' extension
/usr/bin/gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -Iot/lp -I/anaconda3/lib/python3.6/site-packages/numpy/core/include -I/Users/nico/code/POT/ot/lp -I/anaconda3/include/python3.6m -c ot/lp/emd_wrap.cpp -o build/temp.macosx-10.7-x86_64-3.6/ot/lp/emd_wrap.o
warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
In file included from ot/lp/emd_wrap.cpp:648:
In file included from /anaconda3/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /anaconda3/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:18:
In file included from /anaconda3/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1823:
/anaconda3/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
 ^
In file included from ot/lp/emd_wrap.cpp:650:
ot/lp/EMD.h:19:10: fatal error: 'iostream' file not found
#include <iostream>
         ^~~~~~~~~~
2 warnings and 1 error generated.
error: command '/usr/bin/gcc' failed with exit status 1
I finally solved it by adding in setup.py the following extra argument for the compiler
extra_compile_args=["-stdlib=libc++"]
However, before opening a PR, it is not clear to me whether adding this option will break compatibility with other OSes. Meanwhile, it is a simple workaround for this problem.
Hello,
I am trying to compute ot.emd2() distances between two histograms and for some reason, it fails with this error. I have managed to compute between other histograms so I am wondering what might be wrong with my criteria.
They satisfy the constraint of sum = 1 - which is the only one i'm aware of?
I can fix the problem using np.ascontiguousarray() but I'm trying to get an intuition if I'm doing something wrong to begin with.
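The error text is not included in the report above, so the cause here is an assumption: since np.ascontiguousarray() fixes the call, the histograms or the cost matrix were probably non-contiguous views (a transpose or a column slice, for example), which compiled solvers often reject. A minimal sketch with invented arrays shows how to check the memory layout and what the conversion changes:

```python
import numpy as np

M = np.arange(12, dtype=np.float64).reshape(3, 4)
M_t = M.T                            # transposing returns a non-contiguous view
was_contiguous = M_t.flags["C_CONTIGUOUS"]

M_c = np.ascontiguousarray(M_t)      # copies into C order; values are unchanged
is_contiguous = M_c.flags["C_CONTIGUOUS"]
same_values = np.array_equal(M_t, M_c)
```

If the flag is already True for all three inputs, the layout is not the culprit and the dtype (float64 or not) would be the next thing to check.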
I tried to use ot.gpu.sinkhorn using CUDA and I got this traceback:
File "/lib/python3.5/site-packages/ot/gpu/bregman.py", line 132, in sinkhorn_knopp
np.divide(M, -reg, out=K)
File "cupy/core/_kernel.pyx", line 831, in cupy.core._kernel.ufunc.__call__
File "cupy/core/_kernel.pyx", line 355, in cupy.core._kernel._get_out_args
TypeError: output (typecode 'd') could not be coerced to provided output parameter (typecode 'h') according to the casting rule "same_kind"
I guess it has to do with: https://github.com/rflamary/POT/blob/master/ot/gpu/bregman.py#L120 which reuses the dtype of M, but M has been computed from ot.gpu.dist using a cost matrix which has been created with dtype np.int16, so it makes sense to have this error.
I tried to set it as np.float64 to see if the error is indeed due to this. But I wonder if that's expected behavior. I can do a PR to make this error more user-friendly, but beyond this, why not have K be np.float64 anyway? I use np.int16 for the cost matrix because I have a really big matrix, and this way I can save a lot of RAM.
Thank you again for this project :)
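The casting error can be reproduced on the CPU with plain NumPy, which applies the same "same_kind" rule to the out argument. This is a sketch of the mechanism only, with an invented 2x2 cost matrix; it is not the code of ot.gpu.bregman, and allocating K as float64 is just one possible choice.

```python
import numpy as np

reg = 1.0
M = np.array([[0, 1], [1, 0]], dtype=np.int16)   # compact integer cost matrix

K = np.empty(M.shape, dtype=M.dtype)             # buffer inherits int16
try:
    np.divide(M, -reg, out=K)                    # float result, integer buffer
    cast_failed = False
except TypeError:
    cast_failed = True

K = np.empty(M.shape, dtype=np.float64)          # float buffer instead
np.divide(M, -reg, out=K)
np.exp(K, out=K)                                 # Gibbs kernel exp(-M / reg)
```

Note that M itself can stay int16 with this approach; only the kernel K needs a floating-point buffer, although K then costs as much memory as a float64 copy of M would.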
Hi everyone,
I am trying to install POT on an Ubuntu 16.04 with Anaconda and
using the instructions on http://pot.readthedocs.io/en/stable/
When executing pip install POT, I obtain the following error message.
Collecting pot
Using cached https://files.pythonhosted.org/packages/50/66/714ee432a02e95a869c8e243e369ebad60e69a72ab1a72367c31df206619/POT-0.4.0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "/tmp/pip-install-4awvn1uv/pot/setup.py", line 26, in <module>
import pypandoc
ModuleNotFoundError: No module named 'pypandoc'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-4awvn1uv/pot/setup.py", line 29, in <module>
README = open(os.path.join(ROOT, 'README.md')).read()
File "/root/anaconda3/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5501: ordinal not in range(128)
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-4awvn1uv/pot/
However, if I do conda install -c conda-forge pot, it updates and then installs POT successfully.
I have installed POT on OSX with pip successfully with a similar anaconda setup.
Got below error :
/lib/python2.7/site-packages/ot/bregman.py:347: RuntimeWarning: invalid value encountered in multiply
Kp = (1 / a).reshape(-1, 1) * K
('Warning: numerical errors at iteration', 0)
Command:
ot.sinkhorn(a=input_vector, b=output_vector, M=distance_matrix, reg=0.01, verbose=True)
Details :
input_vector.shape : (8342,) [Sums up to 1]
output_vector.shape : (8342,) [Sums up to 1]
distance_matrix.shape : (8342,8342) [Euclidean distance]
What could be the possible issue here? Please assist.
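One likely cause of the warning above (an assumption, since the data is not shown) is underflow: with reg=0.01 and unnormalized Euclidean distances, exp(-M / reg) becomes exactly zero, and a zero weight in a then gives inf * 0 = nan in the line quoted in the warning. A minimal sketch with invented values:

```python
import numpy as np

reg = 0.01
M = np.array([[0.0, 10.0], [10.0, 0.0]])   # invented distances, large relative to reg
a = np.array([1.0, 0.0])                   # one bin carries no mass

K = np.exp(-M / reg)                       # exp(-1000) underflows to exactly 0.0
with np.errstate(divide="ignore", invalid="ignore"):
    Kp = (1 / a).reshape(-1, 1) * K        # 1/0 = inf, and inf * 0 = nan

# Rescaling the cost keeps the kernel away from zero for the same reg.
K_scaled = np.exp(-(M / M.max()) / reg)
```

Rescaling M changes the effective strength of the regularization, so the result is not the same problem as before; raising reg has a similar effect.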
I'm getting the following error
Warning: numerical errors at iteration 0
when calling
d_sinkhorn = ot.sinkhorn2(v1, v2, cm, reg)
and v1 or v2 contain zeros.
How to handle this case?
Thanks
Hi,
I want to use the sinkhorn transport (and the two regularization methods) with different estimations of the target class proportions, similar to the work done here https://hal.archives-ouvertes.fr/hal-01254329/file/OT-multitemp2015-paper.pdf.
For now, the only estimation available is the uniform one, unless I missed something.
In the deprecated classes such as OTDA_lpl1, it is possible to customize the weights used (with the ws parameter).
So my questions are:
Best regards,
Benjamin.
How can I calculate the EMD distance between 2D vectors using POT? For example, I have these 2 vectors:
[(0, 1), (1, 1), (2, 2), (3, 2), (4, 1), (5, 1)],
[(0, 1), (1, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)]
distance matrix:
5x6
Actually, the 2 vectors are bags of words, and the distance matrix holds the Euclidean distances between words. I want to use this to calculate a sentence distance but don't know how to use the EMD distance. Any help?
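As a possible starting point for the question above, the (word_id, count) pairs can be turned into two weight vectors that each sum to 1, which is the input form the EMD solvers on this page expect. The helper name bow_to_histogram is hypothetical and not part of POT.

```python
def bow_to_histogram(pairs):
    """Turn [(word_id, count), ...] into (word_ids, weights) with weights summing to 1."""
    total = float(sum(count for _, count in pairs))
    ids = [word_id for word_id, _ in pairs]
    weights = [count / total for _, count in pairs]
    return ids, weights

s1 = [(0, 1), (1, 1), (2, 2), (3, 2), (4, 1), (5, 1)]
s2 = [(0, 1), (1, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)]

ids1, a = bow_to_histogram(s1)   # 6 words, total count 8
ids2, b = bow_to_histogram(s2)   # 7 words, total count 7
```

With these two sentences the cost matrix M would need shape (6, 7), one row per word in ids1 and one column per word in ids2, rather than the 5x6 mentioned in the post; ot.emd2(a, b, M) would then give the sentence distance.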
Hi, I wish to request to add the code from the paper- http://people.csail.mit.edu/jsolomon/assets/convolutional_w2.compressed.pdf
Their matlab code is here https://github.com/gpeyre/2015-SIGGRAPH-convolutional-ot.git
Thanks
Kowshik
I am trying to calculate the EMD of two sets. When one set has a few hundred entries and the other has only 2, the EMD calculation fails and returns Problem Infeasible.
Steps to reproduce the behavior:
** SEE BELOW COMMENT FOR FIXED SCRIPT **
Expected behavior
Should return EMD around 1, instead says that the sets spherEng1 and pencilEnergy are not in the simplex
Screenshots
Here is comparing the EMDs calculated for less densely tiled to most densely tiled (number of particles = number of segments) with the two element set
Desktop (please complete the following information):
import platform; print(platform.platform())
Darwin-16.7.0-x86_64-i386-64bit
import sys; print("Python", sys.version)
('Python', '2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 13:10:39) \n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]')
import numpy; print("NumPy", numpy.__version__)
('NumPy', '1.15.4')
import scipy; print("SciPy", scipy.__version__)
('SciPy', '1.1.0')
import ot; print("POT", ot.__version__)
('POT', '0.5.1')
Hi,
I may have found a line that could lead to errors when using OT objects in a supervised DA setting.
at line 992 in da.py
I propose to change classes = np.unique(ys) into classes = [c for c in np.unique(ys) if c != -1], which would enable people to use source samples with no labels to find the optimal coupling.
I also propose to add an example for semi supervised DA.
Do you agree with these propositions ? If yes, I'll open a PR.
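A small sketch of what the proposed change does, using invented labels where -1 marks an unlabeled source sample. The masks dictionary is only there to illustrate how the remaining classes might be used afterwards and is not taken from da.py.

```python
import numpy as np

ys = np.array([0, 1, -1, 1, 2, -1])      # invented labels; -1 marks "no label"
classes = [c for c in np.unique(ys) if c != -1]

# Boolean mask per class, as a class-wise cost or regularizer would need.
masks = {int(c): ys == c for c in classes}
```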
README.md contains non-ascii characters, so setup.py will fail if the locale is ascii, e.g.
$ LC_ALL=C python setup.py install
Traceback (most recent call last):
File "setup.py", line 26, in <module>
import pypandoc
ModuleNotFoundError: No module named 'pypandoc'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "setup.py", line 29, in <module>
README = open(os.path.join(ROOT, 'README.md')).read()
File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5501: ordinal not in range(128)
This is easily fixed by using open(..., encoding="utf-8") (or, if you want Py2 compatibility, codecs.open(..., encoding="utf-8")).
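A minimal sketch of the fix described above. It uses io.open rather than codecs.open (a substitution on my part: io.open also accepts encoding= on both Python 2 and 3), and the helper name read_text_utf8 and the throwaway file are invented for the demonstration. The byte 0xc3 in the traceback is the lead byte of two-byte UTF-8 characters such as é.

```python
import io
import os
import tempfile

def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the process locale."""
    # io.open accepts encoding= on both Python 2 and Python 3.
    with io.open(path, encoding="utf-8") as f:
        return f.read()

# Demo with a throwaway file containing a non-ASCII character.
tmp_dir = tempfile.mkdtemp()
readme_path = os.path.join(tmp_dir, "README.md")
with io.open(readme_path, "w", encoding="utf-8") as f:
    f.write(u"R\u00e9mi")

README = read_text_utf8(readme_path)
```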
Could anyone tell me how you transfer the labels after using barycentric mapping? Since you don't have labels for the target, would you transfer the source labels to the mapped points?
Is this available in some part of the code ? Thank you so much.
hi,
I searched the full documentation https://pot.readthedocs.io/en/latest/index.html but I cannot find any usage information about ot.gpu.
I wonder whether there is some documentation or example about the usage of ot.gpu
Thanks
Hello,
I don't have a formal background in OT, therefore pardon me if I am asking something extremely silly. In the plot_ot_1d.py, for the cost matrix calculation :
M = ot.dist(x.reshape((n, 1)), x.reshape((n, 1)))
It is a bit strange because I was expecting the cost matrix to be something between the two distributions, but it seems that the cost matrix is rather between the samples of the 2 distributions. Kindly reply.
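On the question above: the cost matrix is the ground cost between bin positions, and the two distributions only enter later as the weights a and b placed on those positions. Assuming ot.dist uses its squared Euclidean default here, the matrix from plot_ot_1d.py can be reproduced in plain NumPy as follows (n is reduced to 5 for readability):

```python
import numpy as np

n = 5
x = np.arange(n, dtype=np.float64)      # bin positions 0, 1, ..., n-1
X = x.reshape((n, 1))

# Squared Euclidean distance between every pair of positions, shape (n, n).
M = (X - X.T) ** 2
```

Entry M[i, j] is the cost of moving one unit of mass from position x[i] to position x[j]; both histograms live on the same grid x, which is why x appears twice in the call.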
Hello there,
I am currently toying around with POT to compare texts. I have a dictionary of 46k terms and I'm trying to compare 120k documents. Every document has at most 10-15 words (bibtex titles), so comparing 2 texts means comparing 2 [46000,1] vectors with at most 10 nonzero entries.
Are there any suggestions for this process? The naive approach is too slow: comparing 10k documents takes 2 days.
(emd2(p,q,C), where p, q are [46k,1] and C is [46k,46k])
Sinkhorn is even slower!
Thanks in advance!
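One way to speed this up, sketched below under the assumption that p and q are flattened to 1-D, is to drop the vocabulary entries that carry no mass before calling the solver. Bins with zero weight receive no transport, so for the exact EMD the cost is unchanged while the problem shrinks from 46k x 46k to roughly 10 x 10. The helper name restrict_to_support is hypothetical.

```python
import numpy as np

def restrict_to_support(p, q, C):
    """Keep only the bins where p or q has mass; returns (p_s, q_s, C_s)."""
    i = np.flatnonzero(p)            # indices of the words present in document 1
    j = np.flatnonzero(q)            # indices of the words present in document 2
    return p[i], q[j], C[np.ix_(i, j)]

# Tiny stand-in for the 46k-word vocabulary.
p = np.array([0.5, 0.0, 0.5, 0.0, 0.0])
q = np.array([0.0, 0.0, 0.0, 1.0, 0.0])
C = np.abs(np.subtract.outer(np.arange(5.0), np.arange(5.0)))

p_s, q_s, C_s = restrict_to_support(p, q, C)
```

The reduced arrays can then be passed to emd2(p_s, q_s, C_s) in place of the full ones.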
Dear Remi,
thank you very much for releasing and documenting this package - it's really helpful to learn from. I was wondering if there's a simpler/more explicit way to learn the network simplex algorithm?
I was looking on the web for very simple 1D emd code to help compare the number of computational steps and accuracy of the unregularized linear program algorithm with the regularized Sinkhorn-Knopp algorithms which you have here in pure numpy.
I could only find MATLAB code though, and I don't have access to MATLAB. I tried converting it to Octave, but the linear programming solver in Octave seems to be different from the MATLAB one, and I couldn't get the same values as ot.emd(a,b,M). I tried both Gaussians and simpler discrete distributions, but I couldn't find the problem.
It would be really great, and help my understanding a lot, if I could find some simple numpy code to calculate the emd by setting up the linear program as the network simplex algorithm, as you do in EMD_wrapper.cpp. I'm trying to do this with linprog-simplex?
I just wondered if you know of any such code which is available? It would be really helpful for people new to optimal transport to see how the different algorithms work, (side by side in numpy), and compare their accuracy at a basic level.
All the best,
Ajay
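For the 1D comparison asked about above, a reference value can be obtained without any LP solver. This is not the network simplex: it is the closed form for histograms on a unit-spaced grid with ground cost |i - j|, where the EMD equals the sum of absolute differences between the two cumulative sums. The function name emd_1d_unit_grid is invented for this sketch.

```python
from itertools import accumulate

def emd_1d_unit_grid(a, b):
    """EMD between two histograms on the grid 0, 1, ..., n-1 with cost |i - j|."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    # The running sum of (a - b) is the mass that must cross each bin boundary.
    diffs = (ai - bi for ai, bi in zip(a, b))
    return sum(abs(c) for c in accumulate(diffs))
```

To compare against an LP or network simplex result, the cost matrix passed to the solver has to be the absolute distance |i - j| as well, not a squared distance.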
Hello,
I have been working on your free support barycenter examples.
https://github.com/rflamary/POT/blob/master/examples/plot_free_support_barycenter.py
I went through the code and there is something which looks wrong to me. To plot your figure you used :
for (x_i, b_i) in zip(measures_locations, measures_weights):
    color = np.random.randint(low=1, high=10 * N)
    pl.scatter(x_i[:, 0], x_i[:, 1], s=b * 1000, label='input measure')
but I think it should be
I can make a PR to correct it if it is a mistake.
There are some inconsistencies in the docstrings of some classes such as the Sinkhorn class: the parameter mapping is not in the call signature, so how does one control the mapping now? There is an "out_of_sample_map" parameter in the class constructor, which should be explained in the docstrings of these classes.
Example:
Init signature: SinkhornLpl1Transport(reg_e=1.0, reg_cl=0.1, max_iter=10, max_inner_iter=200, log=False, tol=1e-08, verbose=False, metric='sqeuclidean', norm=None, distribution_estimation=<function distribution_estimation_uniform at 0x7effd9dd6400>, out_of_sample_map='ferradans', limit_max=inf)
Docstring:
Domain Adapatation OT method based on sinkhorn algorithm +
LpL1 class regularization.
reg_e : float, optional (default=1)
Entropic regularization parameter
reg_cl : float, optional (default=0.1)
Class regularization parameter
mapping : string, optional (default="barycentric")
The kind of mapping to apply to transport samples from a domain into
another one.
if "barycentric" only the samples used to estimate the coupling can
be transported from a domain to another one.
metric : string, optional (default="sqeuclidean")
The ground metric for the Wasserstein problem
norm : string, optional (default=None)
If given, normalize the ground metric to avoid numerical errors that
can occur with large metric values.
running pyflakes gives me:
$ pyflakes ot/*/*.py ot/*.py examples/*.py
ot/optim.py:9: '.bregman.sinkhorn_stabilized' imported but unused
ot/utils.py:96: undefined name 'reduce'
examples/demo_OTDA_classes.py:6: 'numpy as np' imported but unused
examples/demo_barycenter_1D.py:12: 'mpl_toolkits.mplot3d.Axes3D' imported but unused
examples/demo_barycenter_1D.py:14: 'matplotlib.colors.colorConverter' imported but unused
when running flake8
$ flake8 ot/*/*.py ot/*.py examples/*.py
you'll see that you have a lot of pep8 style violations.
Hi, just wanted to mention that I needed to add extra_link_args=["-stdlib=libc++"] inside ext_modules = cythonize(Extension( ... )) to get the cython code to compile. I'm using Python 3.7 on Mojave 10.14.4 with...
$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1
Apple LLVM version 10.0.1 (clang-1001.0.46.3)
Target: x86_64-apple-darwin18.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
Getting these things to compile is always tricky for me. Let me know if you want more info.
The following script:
import numpy as np
import ot
a = np.random.rand(1500, 95)
b = np.random.rand(50000, 95)
opt = ot.da.OTDA()
opt.fit(a,b)
print(np.sum(opt.G))
returns on Windows
0
Windows-10-10.0.15063-SP0
Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)]
NumPy 1.12.1
SciPy 0.19.0
POT 0.3.1
while it returns on debian
0.716
Linux-3.2.0-4-amd64-x86_64-with-debian-7.11
('Python', '2.7.3 (default, Jun 21 2016, 18:38:19) \n[GCC 4.7.2]')
('NumPy', '1.13.1')
('SciPy', '0.19.1')
('POT', '0.3.1')
If I'm not mistaken, it should always return 1
Hi there,
great library :) I have some small questions.
I observed that for solving one transportation problem emd is much faster than sinkhorn, but I actually expected it to be vice versa, that's one reason to use it.... how come?
If I'm using ot.gpu.sinkhorn, how could I calculate the distance from the transportation matrix?
Thanks in advance for clarification on these issues.
Best, Patrick
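On the second question above: once a transport matrix G is available, the transport cost is the elementwise product with the cost matrix, summed over all entries, which is the same expression used in the bregman.py excerpt earlier on this page. A sketch with an invented 2x2 plan; the same expression should carry over to GPU arrays, though that is not verified here.

```python
import numpy as np

M = np.array([[0.0, 1.0], [1.0, 0.0]])      # cost matrix
G = np.array([[0.3, 0.2], [0.2, 0.3]])      # invented transport plan, marginals (0.5, 0.5)

transport_cost = float(np.sum(G * M))       # <G, M> = 0.2 + 0.2
```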
Hello there,
I wanted to install the library in a PC that has a GPU to test the parallelism of optimal transport.
I cannot though because there is an error on build:
ot/lp/network_simplex_simple.h:234:46: error: macro "MAX" requires 2 arguments, but only 1 given
MAX(std::numeric_limits::max()),
I'm guessing there is a define of MAX left by an older user somewhere? If that's the case, can you give me some insight into where I should look?
OS : Ubuntu 14.04.5 LTS
Thank you for your time in advance!
you should use https://github.com/sphinx-gallery/sphinx-gallery to generate your example gallery.
you would have the notebooks for free with download links at the bottom of the page.
also, to build the doc I had to comment out this in conf.py:
# sys.path.insert(0, os.path.abspath("../.."))
# sys.setrecursionlimit(1500)
# class Mock(MagicMock):
#     @classmethod
#     def __getattr__(cls, name):
#         return Mock()
# MOCK_MODULES = ['emd', 'ot.lp.emd']
# sys.modules.update((mod_name, Mock()) for mod_name in MOCK_MODULES)
thanks for making these tools easily available !
Hello,
Would it be possible to remove line 19, from . import plot, from ot/__init__.py?
It automatically loads matplotlib, which can generate an error when using an instance without a graphical display.
Thanks in advance !
Hi, I'd just like to mention that there is a small issue with one of the demos
On the "1D Wasserstein Barycenter demo" of the notebooks (notebooks/plot_barycenter_1D.ipynb) on lines 24 and 25 of the second code block, the Gaussian distributions are generated with
a1 = ot.datasets.make_1D_gauss(n, m=20, s=5)
However, this method was renamed to get_1D_gauss
a1 = ot.datasets.get_1D_gauss(n, m=20, s=5)
The demo runs without issues after that is fixed
Thanks !
We should change the domain adaptation Classes to be more sklearn compliant.
Main issues:
@agramfort proposed to create new classes with proper names and begin deprecating the old classes.
I think it is a good move.
I have two data samples, each of size 100k, from two distributions in the 50-dimensional space, say n = 100k, p = 50. Can I use this OT library to compute the earth-mover distance between these two empirical data samples?
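A quick size estimate for the question above, assuming a dense solver that needs the full n x n cost matrix in float64. The dimension p = 50 only affects how long the pairwise distances take to compute, not the size of the matrix.

```python
n = 100_000                 # samples per distribution
bytes_per_entry = 8         # float64

cost_matrix_bytes = n * n * bytes_per_entry
cost_matrix_gb = cost_matrix_bytes / 1e9
```

That is 80 GB for the cost matrix alone, and a dense transport plan of the same shape would need as much again, so subsampling or a solver that avoids the dense matrix is probably required at this scale.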
For the moment, we only perform doctest and a simple loading of the module.
We should begin to convert and propose tests for all functions and classes.
When I use "pip install POT", it fails. POT depends on Cython, but it seems that it does not tell pip about this dependency.
I solved this problem by installing Cython first. However, if we write both Cython and POT into requirements.txt, the installation still fails.
Could anyone solve that?
We should use sphinx Gallery to generate automatically the notebooks.
To do that we need to provide proper rst documentation in the examples as in
https://sphinx-gallery.readthedocs.io/en/latest/tutorials/plot_notebook.html#sphx-glr-tutorials-plot-notebook-py
We should rename the datasets.get_* functions to datasets.make_* in order to be more sklearn compliant.
Also it should be possible to give the rng as input as in sklearn.
Hello,
I am writing to you today to discuss the possible implementation of Frank-Wolfe variants, which can be interesting for solving the GW problem. While the standard FW converges slowly in O(1/t), other methods converge faster. One of the faster methods is the away-step Frank-Wolfe, which converges linearly (https://arxiv.org/pdf/1511.05932.pdf).
This was suggested by Thomas Kedreux.
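For readers unfamiliar with the baseline being discussed, here is a sketch of the standard Frank-Wolfe iteration (not the away-step variant, and not a GW solver) on a toy quadratic over the probability simplex. The function name, the objective and the step size 2 / (t + 2) are illustrative choices; the step size is the classical one that gives the O(1/t) rate mentioned above.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_iter=2000):
    """Standard Frank-Wolfe over the probability simplex with step 2 / (t + 2)."""
    x = np.asarray(x0, dtype=np.float64).copy()
    for t in range(n_iter):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # linear minimization picks one vertex
        gamma = 2.0 / (t + 2.0)
        x = (1.0 - gamma) * x + gamma * s
    return x

# Toy objective 0.5 * ||x - y||^2, whose minimizer on the simplex is y itself.
y = np.array([0.2, 0.5, 0.3])
x_hat = frank_wolfe_simplex(lambda x: x - y, x0=[1.0, 0.0, 0.0])
```

Every iterate is a convex combination of vertices, so it stays on the simplex; the slow final approach to y is the behavior the away-step variant is meant to improve.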
Hi,
I need to get some values from the transport computation, such as the cost matrix, the value of the minimisation...
Some of these values are stored in the log. But when I do:
ot_emd,log = ot.da.EMDTransport(norm="max",log=0)
I get the following error:
TypeError: __init__() got an unexpected keyword argument 'log'
In the EMDTransport class declaration there is:
"""
Parameters
----------
...
log : int, optional (default=0)
Controls the logs of the optimization algorithm
..."""
So the question:
Is it intentional that the log cannot be recovered with this class? If so, to get it back, should I directly call the emd function without using the EMDTransport class?
Another question:
I want to get the min value computed by the minimisation problem (first with EMD but also with sinkhorn) to find a link between the effectiveness of the transport and the OA obtained in classification. How can I do this, and are there other values usable to get this kind of information?
In the end, my goal is to estimate several transports and automatically choose the best.
Regards
Benjamin
Hello,
thank you for POT!
In the [1] reference, the interpolation is discussed and an example given (see link below).
Is it feasible to do this in POT (the matlab code is https://github.com/gpeyre/2013-SIIMS-ot-splitting ) or to extend POT to do it?
Best regards
Thomas
a = [0.5, 0.5]
b = [0.5, 0.5]
M = [[0., 1.], [1., 0.]]
G0 = ot.emd(a, b, M)
# G0 = array([[ 0.5, 0. ], [ 0. , 0.5]])
a = [0.5, 0.5]
b = [0.2, 0.8]
G0 = ot.emd(a, b, M)
# G0 = array([[ 0., 0.], [ 0., 0.]])
n = 100
a = ot.datasets.get_1D_gauss(n, m=20, s=5)
b = ot.datasets.get_1D_gauss(n, m=60, s=10)
x = np.arange(n, dtype=np.float64)
M = ot.dist(x.reshape((n, 1)), x.reshape((n, 1)))
M /= M.max()
G0 = ot.emd(a, b, M)
%matplotlib inline
pl.figure(1)
ot.plot.plot1D_mat(a, b, G0, 'OT Matrix G0')
pl.show()
Describe the bug
To Reproduce
Steps to reproduce the behavior:
Expected behavior
It seems that I can never return the log dictionary using greenkhorn, and I am not sure why.
When running original sinkhorn, the log argument works fine.
Besides, I hope greenkhorn could allow lists as input (this is not a bug, but maybe it could be added in the future).
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
Output of the following code snippet:
import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import ot; print("POT", ot.__version__)
import platform; print(platform.platform())
Darwin-18.2.0-x86_64-i386-64bit
import sys; print("Python", sys.version)
Python 3.6.7 |Anaconda custom (64-bit)| (default, Oct 23 2018, 14:01:38)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
import numpy; print("NumPy", numpy.__version__)
NumPy 1.15.4
import scipy; print("SciPy", scipy.__version__)
SciPy 1.1.0
import ot; print("POT", ot.__version__)
POT 0.5.1
Additional context
Add any other context about the problem here.
The current documentation relies on sphinx-gallery, which cannot be executed on Read the Docs, so we have to compile everything to rst and notebooks for a proper documentation.
This will make the repo explode, so we should find a way to have an updated doc (staying on Read the Docs if possible), probably by keeping a compiled version of the doc in a separate repository.
The compiled notebooks also are very nice (they allow a quick look at how the toolbox works) but should be stored also in a separate repo.