pythonot / pot

POT : Python Optimal Transport

License: MIT License

Python 99.42% Makefile 0.16% Cython 0.42%

optimal-transport numerical-optimization machine-learning emd ot-mapping-estimation wasserstein-barycenter ot-solver python wasserstein wasserstein-discriminant-analysis gromov-wasserstein wasserstein-barycenters sinkhorn-divergences sinkhorn-knopp pot domain-adaptation

pot's Issues

Shape mismatch for stabilized sinkhorn with multi-distributions

Describe the bug
The following script raises a shape mismatch error when computing sinkhorn2 with stabilization and several target distributions.

  File "/Users/hichamjanati/Documents/github/forks/POT/ot/bregman.py", line 774, in sinkhorn_stabilized
    log['logu'] = alpha / reg + np.log(u)
ValueError: operands could not be broadcast together with shapes (100,) (100,2)

To Reproduce

import numpy as np
import ot
from ot.bregman import sinkhorn2


n = 100
x = np.arange(n, dtype=np.float64)

# Gaussian distributions
a = ot.datasets.make_1D_gauss(n, m=20, s=5)  # m= mean, s= std
b1 = ot.datasets.make_1D_gauss(n, m=60, s=8)
b2 = ot.datasets.make_1D_gauss(n, m=30, s=4)

# stack both target distributions as the columns of b, shape (n, 2)
b = np.vstack((b1, b2)).T

M = ot.utils.dist0(n)
M /= np.median(M)
epsilon = 0.1

w_stable, log = sinkhorn2(a, b, M, epsilon, method="sinkhorn_stabilized",
                          log=True)

Fix

When log=True, the current code does not handle the case where b contains several distributions. The if nbb check should be moved up, before the dual variables are computed.

bregman.py

    if log:
        log['logu'] = alpha / reg + np.log(u)
        log['logv'] = beta / reg + np.log(v)
        log['alpha'] = alpha + reg * np.log(u)
        log['beta'] = beta + reg * np.log(v)
        log['warmstart'] = (log['alpha'], log['beta'])
        if nbb:
            res = np.zeros((nbb))
            for i in range(nbb):
                res[i] = np.sum(get_Gamma(alpha, beta, u[:, i], v[:, i]) * M)
            return res, log

I can make a tiny PR with an additional test if you want.
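
The broadcasting failure in the traceback can be reproduced with NumPy alone. The sketch below uses stand-in arrays with the shapes from the error message; the values are arbitrary, and `alpha`, `u` and `reg` only mimic the names used in `bregman.py`. It shows the failing expression and one possible workaround (adding a trailing axis to `alpha`). It illustrates the shape problem only and is not necessarily how the library should fix it.

```python
import numpy as np

n, n_hists = 100, 2
reg = 0.1
alpha = np.zeros(n)            # one dual potential per source bin, shape (100,)
u = np.ones((n, n_hists))      # one scaling vector per target histogram, shape (100, 2)

try:
    logu = alpha / reg + np.log(u)   # (100,) + (100, 2) cannot be broadcast
except ValueError as exc:
    error_message = str(exc)

# A trailing axis turns alpha into shape (100, 1), which broadcasts over the columns.
logu = alpha[:, None] / reg + np.log(u)
```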

Road to POT 1.0

Hello to all contributors,

The last POT 0.6 release brought new features to the library and we have now 25 papers implemented in POT. It was discussed that before making the 1.0 release, we should work on some fundamental changes inside the library. In my humble opinion, we should work on the most urgent changes before adding new features. If we keep adding new features, it will be even more complicated to make the fundamental changes afterwards. I start this issue in order to discuss these matters.

I copy-paste here what was discussed before. The list is not exhaustive, and I invite you to extend it if you have ideas or wishes:

Reform changes

Naming convention (clearer and more consistent)
Duplicated code (bregman module)
Clean commented code
A two-letter package name -- ot -- can cause multiple headaches.
The emd functions should be in a specific module, not in the init file.
In some functions, the transport plan is computed (which can be heavy to store on GPUs) even though it is not needed. There should be a function that explicitly computes the transport plan from the dual variables, so that the user requests it explicitly.
sinkhorn returns the distance or the plan depending on the second dimension of the input distribution b.
Make sure the CIs provide all the working infrastructure needed for this (and future) releases.
Domain adaptation name
Torch backend

I would state that the most urgent before adding features is the naming convention, because we can't add new functions with old names (ot.sinkhorn2 ...).

Name Shifting

It will be updated each time we converge toward a new name.

------------------documentation/examples------------------

n -> n_source_samples
xs/xt -> x_source/x_target
G0/Gs -> Gamma_emd / Gamma_sinkhorn (maybe?)
reg parameter entropic -> epsilon (blur ?)
d (dimension parameter) -> n_features
N (barycenter example) -> N_distributions
X1 -> X_source (color transfer)

------------------variable names------------------

numItermax -> num_iter_max
numInnerItermax -> num_iter_max_{function name}
stopThr -> stop_threshold
(reg -> blur ?)
log (variable not bool) -> log_{namefunction}
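
A renaming like the one listed above is often rolled out with a temporary shim that still accepts the old keyword names. The following is a minimal sketch under that assumption; `renamed_kwargs` and `solver` are hypothetical names for illustration and do not exist in POT.

```python
import functools
import warnings


def renamed_kwargs(**old_to_new):
    """Accept deprecated keyword names and forward them under their new names."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for old, new in old_to_new.items():
                if old in kwargs:
                    warnings.warn(
                        "'%s' is deprecated, use '%s' instead" % (old, new),
                        DeprecationWarning,
                        stacklevel=2,
                    )
                    # The new name wins if both spellings are given.
                    kwargs.setdefault(new, kwargs.pop(old))
            return func(*args, **kwargs)
        return wrapper
    return decorator


@renamed_kwargs(numItermax="num_iter_max", stopThr="stop_threshold")
def solver(num_iter_max=1000, stop_threshold=1e-9):
    # Hypothetical solver: it only echoes its settings.
    return num_iter_max, stop_threshold
```

Calling `solver(numItermax=5)` then behaves like `solver(num_iter_max=5)` and emits a `DeprecationWarning`, so existing user code keeps working during the transition.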

Assignments

Variable names (Kilian)

Improve typos in Notebooks

Hi,
Probably a minor copy/paste fix needed in the notebooks:

For example, in the latter, the following code seems to produce EMD results instead of Sinkhorn:

# prediction between images (using out of sample prediction as in [6])
transp_Xs_emd = ot_emd.transform(Xs=X1)
transp_Xt_emd = ot_emd.inverse_transform(Xt=X2)

transp_Xs_sinkhorn = ot_emd.transform(Xs=X1) # Shouldn't be ot_sinkhorn.transform(Xs=X1) ?
transp_Xt_sinkhorn = ot_emd.inverse_transform(Xt=X2)  # Same here

At least, it would match the example:
https://github.com/rflamary/POT/blob/e757b75976ece1e6e53e655852b9f8863e7b6f5a/examples/plot_otda_color_images.py#L118-L119

Thanks
PS. Sorry if I misunderstood something.

Dockerizing POT, setting up error

While dockerizing POT using this POT Dockerfile, I came across an error that occurs during the command python3 setup.py install --user:

Traceback (most recent call last):
  File "setup.py", line 3, in <module>
    from setuptools import setup, find_packages
  File "/usr/local/lib/python3.4/dist-packages/setuptools/__init__.py", line 12, in <module>
    import setuptools.version
  File "/usr/local/lib/python3.4/dist-packages/setuptools/version.py", line 1, in <module>
    import pkg_resources
  File "/usr/local/lib/python3.4/dist-packages/pkg_resources/__init__.py", line 70, in <module>
    import packaging.version
ImportError: No module named 'packaging'

Any idea how to solve this?

Update
I worked out the first error but encountered another one. It seems related to Shippable/support#3316: the cause may be a new version of setuptools.

Doc GPU implementation

We need proper documentation for the GPU implementation module:

copy the sinkhorn and OTDA class docs
write proper docs for pairwiseEuclideanGPU, with formatting

GENERALIZED CONDITIONAL GRADIENT FOR SOLVING REGULARIZED OT PROBLEMS

Could you please tell me where the solver for GCG is? I've been searching for a while but couldn't find it. Thank you.

Gromov-Wasserstein fails when the cost matrices are slightly different

Describe the bug
The ot.gromov.gromov_wasserstein function fails (TypeError) when the cost matrices are very similar but not identical.

To Reproduce
The full code is available at
https://colab.research.google.com/drive/1IhnOqeLV51gWE8FodnBsgR5cQC_w2EkL

How was POT installed [pip]

Sys specifications

Linux-3.10.0-327.22.2.el7.x86_64-x86_64-with-centos-7.2.1511-Core
Python 3.4.3 (default, Apr 28 2015, 11:29:27) 
[GCC 4.9.2]
NumPy 1.16.2
SciPy 1.2.1
POT 0.5.1

"pip install POT" fail with Python 3.7

Hi guys, I am a newbie in programming. I tried "pip install POT" in my terminal, and the following (first) error occurred:

ot/lp/emd_wrap.cpp:6660:65: error: too many arguments to function call, expected 3, have 4
return (*((__Pyx_PyCFunctionFast)meth)) (self, args, nargs, NULL);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~

I am sure all dependent packages are up to date. I would appreciate any help. Thanks!

Cannot run gpu modules

Hello,

I am trying out the GPU implementation of the sinkhorn transport, but with not much success.

>>> a=[.5,.5]
>>> b=[.5,.5]
>>> M=[[0.,1.],[1.,0.]]
>>> ot.gpu.sinkhorn(a,b,M,1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'ot' has no attribute 'gpu'

However, the ot.sinkhorn(a,b,M,1) works as expected.
I have cupy installed as well as the CUDA SDK.

Could someone help?

Laplacian regularization

In paper [5], Optimal Transport for Domain Adaptation, you use Laplacian regularization. However, I'm not sure how to obtain the similarity matrix S. Is there a tutorial on this?

Thanks for the help.

Compilation issue with MacOSX Mojave

I encountered the following issue while installing POT on MacOSX Mojave, with
python 3.6

python setup.py build
running build
running build_py
running build_ext
building 'ot.lp.emd_wrap' extension
/usr/bin/gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -Iot/lp -I/anaconda3/lib/python3.6/site-packages/numpy/core/include -I/Users/nico/code/POT/ot/lp -I/anaconda3/include/python3.6m -c ot/lp/emd_wrap.cpp -o build/temp.macosx-10.7-x86_64-3.6/ot/lp/emd_wrap.o
warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
In file included from ot/lp/emd_wrap.cpp:648:
In file included from /anaconda3/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /anaconda3/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:18:
In file included from /anaconda3/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1823:
/anaconda3/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
 ^
In file included from ot/lp/emd_wrap.cpp:650:
ot/lp/EMD.h:19:10: fatal error: 'iostream' file not found
#include <iostream>
         ^~~~~~~~~~
2 warnings and 1 error generated.
error: command '/usr/bin/gcc' failed with exit status 1

I finally solved it by adding in setup.py the following extra argument for the compiler
extra_compile_args=["-stdlib=libc++"]

However before pushing a PR, it is not clear to me if adding this option will break compatibility with other OS. Meanwhile, it is a simple workaround for this problem.
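
One way to limit the risk for other operating systems is to add the flag only on macOS. The helper below is a hypothetical sketch (`macos_cxx_flags` is not part of POT's setup.py); its result would be passed as `extra_compile_args` to the extension. Whether other compilers on macOS accept the flag is not checked here.

```python
import sys


def macos_cxx_flags(platform=sys.platform):
    """Return the libc++ flag on macOS only; other platforms get no extra flags."""
    if platform == "darwin":
        return ["-stdlib=libc++"]
    return []


# In setup.py this could be used as Extension(..., extra_compile_args=compile_args).
compile_args = macos_cxx_flags()
```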

Error: ndarray is not C-contiguous

Hello,

I am trying to compute ot.emd2() distances between two histograms and for some reason, it fails with this error. I have managed to compute between other histograms so I am wondering what might be wrong with my criteria.

They satisfy the constraint of summing to 1, which is the only one I'm aware of.

I can fix the problem using np.ascontiguousarray() but I'm trying to get an intuition if I'm doing something wrong to begin with.
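
A common way to end up with a non-contiguous array without noticing is to transpose or slice an existing one, since both return views with different strides. The NumPy-only sketch below shows how to check the flag and what `np.ascontiguousarray` changes; whether this is the cause in the case above is an assumption.

```python
import numpy as np

M = np.arange(12, dtype=np.float64).reshape(3, 4)

view = M.T             # transposing returns a Fortran-ordered view
sliced = M[:, ::2]     # strided slicing is not C-contiguous either

# Copies into C order (no copy is made if the input is already C-contiguous).
fixed = np.ascontiguousarray(view)
```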

ot.gpu.sinkhorn uses dtype of the cost matrix

I tried to use ot.gpu.sinkhorn using CUDA and I got this traceback:

  File "/lib/python3.5/site-packages/ot/gpu/bregman.py", line 132, in sinkhorn_knopp
    np.divide(M, -reg, out=K)
  File "cupy/core/_kernel.pyx", line 831, in cupy.core._kernel.ufunc.__call__
  File "cupy/core/_kernel.pyx", line 355, in cupy.core._kernel._get_out_args
TypeError: output (typecode 'd') could not be coerced to provided output parameter (typecode 'h') according to the casting rule "same_kind"

I guess it has to do with: https://github.com/rflamary/POT/blob/master/ot/gpu/bregman.py#L120 which reuses the dtype of M, but M has been computed from ot.gpu.dist using a cost matrix which has been created with dtype: np.int16, so it makes sense to have this error.

I tried to set it as np.float64 to see if the error is indeed due to this. But I wonder if that's expected behavior. I can do a PR to make this error more user-friendly, but beyond this, why not have K be np.float64 anyway? My use case to use np.int16 on the cost matrix is because I have a really big matrix, this way I can save up a lot of RAM.

Thank you again for this project :)
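
The same casting error can be reproduced on the CPU with plain NumPy, which makes the cause easier to see. In the sketch below the array values are arbitrary stand-ins; allocating the output buffer as `float32` is one possible compromise between memory use and a valid kernel, not what the library does.

```python
import numpy as np

M = np.array([[0, 1], [1, 0]], dtype=np.int16)   # compact integer cost matrix
reg = 1.0

K = np.empty(M.shape, dtype=M.dtype)             # buffer inherits int16 from M
try:
    np.divide(M, -reg, out=K)                    # a float result cannot be stored as int16
except TypeError as exc:
    cast_error = exc

# A floating-point buffer avoids the cast; float32 uses half the memory of float64.
K = np.empty(M.shape, dtype=np.float32)
np.divide(M, -reg, out=K)
np.exp(K, out=K)                                 # K now holds exp(-M / reg)
```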

UnicodeDecodeError: 'ascii' while installing with pip

Hi everyone,

I am trying to install POT on an Ubuntu 16.04 with Anaconda and

Python 3.6
Cython 0.28.3
Numpy 1.14.5
Scipy 1.1.0
Matplotlib 2.2.2

using the instructions on http://pot.readthedocs.io/en/stable/

When executing pip install POT, I obtain the following error message.

Collecting pot
Using cached https://files.pythonhosted.org/packages/50/66/714ee432a02e95a869c8e243e369ebad60e69a72ab1a72367c31df206619/POT-0.4.0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "/tmp/pip-install-4awvn1uv/pot/setup.py", line 26, in <module>
import pypandoc
ModuleNotFoundError: No module named 'pypandoc'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-4awvn1uv/pot/setup.py", line 29, in <module>
    README = open(os.path.join(ROOT, 'README.md')).read()
  File "/root/anaconda3/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5501: ordinal not in range(128)

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-4awvn1uv/pot/

However, if I do conda install -c conda-forge pot, it updates

ca-certificates
certifi
conda
openssl

and then it installs POT successfully.

I have installed POT on OSX with pip successfully with a similar anaconda setup.

Getting Error while computing sinkhorn distance

I got the error below:

/lib/python2.7/site-packages/ot/bregman.py:347: RuntimeWarning: invalid value encountered in multiply
Kp = (1 / a).reshape(-1, 1) * K
('Warning: numerical errors at iteration', 0)

Command:
ot.sinkhorn(a=input_vector, b=output_vector, M=distance_matrix, reg=0.01, verbose=True)

Details :
input_vector.shape : (8342,) [sums to 1]
output_vector.shape : (8342,) [sums to 1]
distance_matrix.shape : (8342,8342) [Euclidean distance]

What could be the issue here? Please assist.

ot.sinkhorn returns the OT matrix. How can we convert it to a single number that is equivalent to a distance?
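
For the second question, the transport cost associated with a plan G is the sum of G * M. The sketch below uses a plain NumPy re-implementation of the Sinkhorn-Knopp iterations (`sinkhorn_plan` is a stand-in for illustration, not POT's code) on a tiny problem. It also hints at a plausible cause of the warning: `exp(-M / reg)` underflows to zero when the distances are large compared with `reg`, so rescaling M or increasing `reg` may help.

```python
import numpy as np


def sinkhorn_plan(a, b, M, reg, n_iter=500):
    """Plain Sinkhorn-Knopp iterations; returns the transport plan."""
    K = np.exp(-M / reg)          # underflows to 0 when M / reg is very large
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


a = np.array([0.5, 0.5])
b = np.array([0.2, 0.8])
M = np.array([[0.0, 1.0], [1.0, 0.0]])

G = sinkhorn_plan(a, b, M, reg=0.5)
cost = np.sum(G * M)              # a single number: the transport cost <G, M>
```

Note that this value is the cost of the regularized plan, so it is an upper bound on the exact EMD cost for the same inputs.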

OT.sinkhorn, error when an input array contains zeros

I'm getting the following error
Warning: numerical errors at iteration 0
when calling
d_sinkhorn = ot.sinkhorn2(v1, v2, cm, reg)
and v1 or v2 contain zeros.

How to handle this case?
Thanks
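
One way zeros can produce this warning: dividing by a zero weight gives inf, and inf multiplied by a kernel entry that is 0 gives nan. The NumPy sketch below reproduces that with stand-in values and shows one possible workaround, dropping the zero-mass bins before solving. It assumes the zeros are the cause in this case.

```python
import numpy as np

a = np.array([0.5, 0.0, 0.5])          # the second bin has zero mass
K = np.array([[1.0, 0.5],
              [0.0, 1.0],              # a kernel entry that underflowed to 0
              [0.5, 1.0]])

with np.errstate(divide="ignore", invalid="ignore"):
    Kp = (1 / a).reshape(-1, 1) * K    # inf * 0 gives nan in the second row

# Dropping zero-mass bins (and the matching kernel rows) avoids the division by zero.
keep = a > 0
Kp_clean = (1 / a[keep]).reshape(-1, 1) * K[keep]
```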

Use target class proportions in transport.

Hi,

I want to use the Sinkhorn transport (and the two regularization methods) with a different estimation of the target class proportions, similar to the work done here: https://hal.archives-ouvertes.fr/hal-01254329/file/OT-multitemp2015-paper.pdf.
For now, the only estimation available is the uniform one, unless I missed something.
In the deprecated classes such as OTDA_lpl1, it is possible to customise the weights used (with the ws parameter).
So my questions are:

Why do the current transport classes no longer allow customised weights?
Are there other classes that accept these parameters?
Is there any known issue with estimating the transport from estimated proportions?

Best regards,
Benjamin.

POT calculate 2D vector EMD distance which have different length.

How can I calculate the EMD between 2D vectors using POT? For example, I have these 2 vectors:

[(0, 1), (1, 1), (2, 2), (3, 2), (4, 1), (5, 1)],
[(0, 1), (1, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)]

distance matrix:
5x6

The 2 vectors are bags of words, and the distance matrix holds the Euclidean distances between words. I want to use this to compute a sentence distance but don't know how to use the EMD. Any help?
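
The two lists do not need the same length, but each must become a weight vector that sums to 1. The sketch below assumes every tuple is a (word_id, count) pair, which is an interpretation of the data above. With 6 and 7 entries, the cost matrix would then need shape (6, 7), one row per word of the first document and one column per word of the second.

```python
import numpy as np

doc1 = [(0, 1), (1, 1), (2, 2), (3, 2), (4, 1), (5, 1)]
doc2 = [(0, 1), (1, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)]


def to_histogram(bag):
    """Split (word_id, count) pairs into word ids and weights that sum to 1."""
    ids = np.array([word_id for word_id, _ in bag])
    counts = np.array([count for _, count in bag], dtype=np.float64)
    return ids, counts / counts.sum()


ids1, a = to_histogram(doc1)
ids2, b = to_histogram(doc2)

# Shape the cost matrix must have: rows follow a, columns follow b.
expected_cost_shape = (a.size, b.size)
```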

Feature request:- convolution Wasserstein distances

Hi, I would like to request adding the code from this paper: http://people.csail.mit.edu/jsolomon/assets/convolutional_w2.compressed.pdf
Their matlab code is here https://github.com/gpeyre/2015-SIGGRAPH-convolutional-ot.git

Thanks
Kowshik

Not in simplex -- two sets of largely different sizes

I am trying to calculate the EMD of two sets. When one set has a few hundred entries and the other has only 2, the EMD calculation fails and returns Problem Infeasible.

Steps to reproduce the behavior:
** SEE BELOW COMMENT FOR FIXED SCRIPT **

Expected behavior
Should return EMD around 1, instead says that the sets spherEng1 and pencilEnergy are not in the simplex

Screenshots
Here is comparing the EMDs calculated for less densely tiled to most densely tiled (number of particles = number of segments) with the two element set

Desktop (please complete the following information):

OS: [MacOSX]
Python version [3.6]
POT installed with pip

import platform; print(platform.platform())
Darwin-16.7.0-x86_64-i386-64bit
import sys; print("Python", sys.version)
('Python', '2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 13:10:39) \n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]')
import numpy; print("NumPy", numpy.__version__)
('NumPy', '1.15.4')
import scipy; print("SciPy", scipy.__version__)
('SciPy', '1.1.0')
import ot; print("POT", ot.__version__)
('POT', '0.5.1')

semi supervised da - correction + example

Hi,

I may have found a line that could lead to errors when using OT objects in a supervised DA setting.

At line 992 in da.py, I propose to change classes = np.unique(ys) into classes = [c for c in np.unique(ys) if c != -1], which would let people use source samples with no labels when finding the optimal coupling.

I also propose to add an example for semi supervised DA.

Do you agree with these proposals? If yes, I'll open a PR.
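
For reference, this is what the proposed change does on a small label vector, assuming -1 is the marker for unlabeled source samples as described above (the array values are made up):

```python
import numpy as np

ys = np.array([0, 1, -1, 1, 2, -1])       # -1 marks unlabeled source samples

classes = [c for c in np.unique(ys) if c != -1]   # real classes only
labeled_mask = ys != -1                            # samples that carry a label
```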

setup.py needs to specify an encoding when opening README.md

README.md contains non-ascii characters, so setup.py will fail if the locale is ascii, e.g.

$ LC_ALL=C python setup.py install
Traceback (most recent call last):
  File "setup.py", line 26, in <module>
    import pypandoc
ModuleNotFoundError: No module named 'pypandoc'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "setup.py", line 29, in <module>
    README = open(os.path.join(ROOT, 'README.md')).read()
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5501: ordinal not in range(128)

This is easily fixed by using open(..., encoding="utf-8") (or, if you want Py2 compatibility, codecs.open(..., encoding="utf-8")).
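
A minimal sketch of that fix, using `io.open`, which accepts `encoding` on both Python 2 and Python 3. A temporary file with one non-ASCII character stands in for README.md here.

```python
import io
import os
import tempfile

# Write a small file with a non-ASCII character, standing in for README.md.
tmp_dir = tempfile.mkdtemp()
readme_path = os.path.join(tmp_dir, "README.md")
with io.open(readme_path, "w", encoding="utf-8") as f:
    f.write(u"caf\u00e9")

# An explicit encoding makes the read independent of LC_ALL / LANG.
with io.open(readme_path, encoding="utf-8") as f:
    README = f.read()
```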

Barycentric mapping and label propagation

Could anyone tell me how you transfer the labels after using the barycentric mapping? Since there are no labels for the target, would you transfer the source labels to the mapped points?

Is this available in some part of the code ? Thank you so much.

Usage of ot.gpu

Hi,
I searched the full documentation at https://pot.readthedocs.io/en/latest/index.html but could not find any usage information about ot.gpu.
Is there any documentation or example showing how to use ot.gpu?
Thanks

A little help to clarify something.

Hello,

I don't have a formal background in OT, so pardon me if I am asking something extremely silly. In plot_ot_1d.py, the cost matrix is computed as:
M = ot.dist(x.reshape((n, 1)), x.reshape((n, 1)))

This is a bit strange to me: I expected the cost matrix to be defined between the two distributions, but it seems to be defined between the sample positions of the two distributions. Could you clarify?
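
For what it is worth, the cost matrix in discrete OT is defined on the support points: entry M[i, j] is the cost of moving one unit of mass from position x[i] to position x[j], and the two histograms only enter as the marginals. The NumPy sketch below builds the same kind of matrix by hand, assuming the default squared Euclidean metric of `ot.dist` and a small grid for readability.

```python
import numpy as np

n = 5
x = np.arange(n, dtype=np.float64)        # bin positions 0, 1, ..., n-1

# M[i, j] = (x[i] - x[j]) ** 2: cost of moving one unit of mass from bin i to bin j.
M = (x[:, None] - x[None, :]) ** 2
```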

Text mining

Hello there,

I am currently toying around with POT to compare texts. I have a dictionary of 46k terms and I'm trying to compare 120k documents. Every document has at most 10-15 words (BibTeX titles), so comparing two texts means comparing two [46000, 1] vectors with at most 10 non-zero entries.

Are there any suggestions for speeding this up? The naive approach is too slow: comparing 10k documents takes 2 days
(emd2(p, q, C), where p and q are [46k, 1] and C is [46k, 46k]).
Sinkhorn is even slower!

Thanks in advance!
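
Since a transport plan cannot move mass from or to bins with zero weight, the problem can be restricted to the non-zero entries of each document without changing the result. The NumPy sketch below uses a smaller stand-in vocabulary and random costs; the resulting `p_small`, `q_small` and `C_small` are what would be passed to the solver instead of the full-size arrays.

```python
import numpy as np

vocab_size = 1000                          # stand-in for the 46k-term dictionary
rng = np.random.RandomState(0)
C = rng.rand(vocab_size, vocab_size)       # stand-in ground cost between terms

p = np.zeros(vocab_size)
p[[3, 17, 256]] = [0.5, 0.25, 0.25]        # document with 3 distinct terms
q = np.zeros(vocab_size)
q[[17, 999]] = [0.5, 0.5]                  # document with 2 distinct terms

idx_p = np.flatnonzero(p)                  # terms present in the first document
idx_q = np.flatnonzero(q)                  # terms present in the second document
p_small, q_small = p[idx_p], q[idx_q]
C_small = C[np.ix_(idx_p, idx_q)]          # shape (3, 2) instead of (1000, 1000)
```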

A pure numpy implementation of the network simplex algorithm, `ot.emd(a,b,M)`

Dear Remi,

thank you very much for releasing and documenting this package - it's really helpful to learn from 👍 I was wondering if there's a simpler/more explicit way to learn the network simplex algorithm?

I was looking on the web for very simple 1D emd code to help to compare the number of computational steps and accuracy of the unregularized linear program algorithm, with the regularized Sinkhorn-Knopp algorithms which you have here in pure numpy.

I could only find MATLAB code, though, and I don't have access to MATLAB. I tried converting it to Octave, but the linear programming solver in Octave seems to differ from the MATLAB one, and I couldn't get the same values as ot.emd(a,b,M). I tried both Gaussians and simpler discrete distributions, but I couldn't find the problem.

It would be really great, and help my understanding a lot, if I could find some simple numpy code to calculate the emd by setting up the linear program as the network simplex algorithm, as you do in EMD_wrapper.cpp. I'm trying to do this with linprog-simplex.

I just wondered if you know of any such code which is available? It would be really helpful for people new to optimal transport to see how the different algorithms work, (side by side in numpy), and compare their accuracy at a basic level.

All the best,

Ajay
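
As a reference value for such comparisons in 1D, there is a closed form that needs no linear program: for two histograms with equal total mass on the same unit-spaced grid and ground cost |i - j|, the EMD is the sum of the absolute differences of the cumulative sums. The sketch below implements only that special case. It is not the network simplex, and it does not apply to the squared Euclidean cost.

```python
import numpy as np


def emd_1d_grid(a, b):
    """EMD between two histograms on the same unit-spaced grid, cost |i - j|."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return np.abs(np.cumsum(a - b)).sum()


# All the mass has to move two bins to the right.
dist = emd_1d_grid([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

An LP-based solver run with the cost matrix M[i, j] = |i - j| should agree with this value, which gives a quick correctness check.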

Free support barycenter examples

Hello,

I have been working on your free support barycenter examples.
https://github.com/rflamary/POT/blob/master/examples/plot_free_support_barycenter.py

I went through the code and there is something which looks wrong to me. To plot your figure you used :

for (x_i, b_i) in zip(measures_locations, measures_weights):
    color = np.random.randint(low=1, high=10 * N)
    pl.scatter(x_i[:, 0], x_i[:, 1], s=b * 1000, label='input measure')

but I think it should be $s=b_i * 1000$ instead of $s=b * 1000$.

I can make a PR to correct it if it is a mistake.

Docstring issues in ot.da

There are some inconsistencies in the docstrings of some classes, such as the Sinkhorn class: the parameter mapping is not in the signature, so how does one control the mapping now? There is an "out_of_sample_map" parameter in the constructor, which should be explained in the docstring of these classes.

Example:

Init signature: SinkhornLpl1Transport(reg_e=1.0, reg_cl=0.1, max_iter=10, max_inner_iter=200, log=False, tol=1e-08, verbose=False, metric='sqeuclidean', norm=None, distribution_estimation=<function distribution_estimation_uniform at 0x7effd9dd6400>, out_of_sample_map='ferradans', limit_max=inf)
Docstring:
Domain Adapatation OT method based on sinkhorn algorithm +
LpL1 class regularization.

Parameters

reg_e : float, optional (default=1)
Entropic regularization parameter
reg_cl : float, optional (default=0.1)
Class regularization parameter
mapping : string, optional (default="barycentric")
The kind of mapping to apply to transport samples from a domain into
another one.
if "barycentric" only the samples used to estimate the coupling can
be transported from a domain to another one.
metric : string, optional (default="sqeuclidean")
The ground metric for the Wasserstein problem
norm : string, optional (default=None)
If given, normalize the ground metric to avoid numerical errors that
can occur with large metric values.

PEP8 cleanup

Running pyflakes gives me:

$ pyflakes ot/*/*.py ot/*.py examples/*.py
ot/optim.py:9: '.bregman.sinkhorn_stabilized' imported but unused
ot/utils.py:96: undefined name 'reduce'
examples/demo_OTDA_classes.py:6: 'numpy as np' imported but unused
examples/demo_barycenter_1D.py:12: 'mpl_toolkits.mplot3d.Axes3D' imported but unused
examples/demo_barycenter_1D.py:14: 'matplotlib.colors.colorConverter' imported but unused

When running flake8

$ flake8 ot/*/*.py ot/*.py examples/*.py

you'll see that you have a lot of pep8 style violations.
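
The `undefined name 'reduce'` report is typical of code written for Python 2, where `reduce` was a builtin. In Python 3 it lives in `functools`, and the import also works on Python 2.6+. A minimal illustration, assuming that is the cause in ot/utils.py:

```python
from functools import reduce
import operator

# Multiply all the elements together, starting from 1.
product = reduce(operator.mul, [1, 2, 3, 4], 1)
```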

Need to specify extra_link_args to compile

Hi, just wanted to mention that I needed to add extra_link_args=["-stdlib=libc++"] inside ext_modules = cythonize(Extension( ... )) to get the cython code to compile. I'm using Python 3.7 on Mojave 10.14.4 with...

$ gcc --version

Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1
Apple LLVM version 10.0.1 (clang-1001.0.46.3)
Target: x86_64-apple-darwin18.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Getting these things to compile is always tricky for me. Let me know if you want more info.

Another issue on Windows with ot.da.OTDA()

The following script:

import numpy as np
import ot
a = np.random.rand(1500, 95)
b = np.random.rand(50000, 95)
opt = ot.da.OTDA()
opt.fit(a,b)
print(np.sum(opt.G))

prints on Windows
0

Windows-10-10.0.15063-SP0
Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)]
NumPy 1.12.1
SciPy 0.19.0
POT 0.3.1

while on Debian it prints
0.716

Linux-3.2.0-4-amd64-x86_64-with-debian-7.11
('Python', '2.7.3 (default, Jun 21 2016, 18:38:19) \n[GCC 4.7.2]')
('NumPy', '1.13.1')
('SciPy', '0.19.1')
('POT', '0.3.1')

If I'm not mistaken, it should always return 1

EMD and Sinkhorn

Hi there,

great library :) I have some small questions.

I installed the package and ran some simple tests. Do I understand correctly that:

ot.emd and ot.lp.emd are the same
ot.emd2 and ot.lp.emd2 are the same
emd and emd2 use the same underlying algorithm, but one returns the transportation matrix and the other one the distance
ot.sinkhorn(method='sinkhorn'), ot.bregman.sinkhorn(method='sinkhorn') and ot.bregman.sinkhorn_knopp are the same
ot.sinkhorn2 and ot.bregman.sinkhorn2(method='sinkhorn') are the same
sinkhorn(method='sinkhorn') and sinkhorn2 use the same underlying algorithm, but one returns the transportation matrix and the other one the distance?

I observed that, for a single transportation problem, emd is much faster than sinkhorn. I expected the opposite, since speed is one of the reasons to use Sinkhorn. Why is that?
If I'm using ot.gpu.sinkhorn, how can I calculate the distance from the transportation matrix?

Thanks in advance for clarification on these issues.
Best, Patrick

Cannot install library on Ubuntu 14.04.5 LTS

Hello there,

I wanted to install the library on a PC that has a GPU to test the parallelism of optimal transport.
However, I cannot, because there is an error during the build:

ot/lp/network_simplex_simple.h:234:46: error: macro "MAX" requires 2 arguments, but only 1 given
MAX(std::numeric_limits::max()),

I'm guessing MAX is already defined as a macro somewhere else? If that's the case, could you give me some insight into where I should look?

OS : Ubuntu 14.04.5 LTS

Thank you for your time in advance!

sphinx-gallery

You should use https://github.com/sphinx-gallery/sphinx-gallery to generate your example gallery.

You would get the notebooks for free, with download links at the bottom of the page.

Also, to build the doc I had to comment out this in conf.py:

# sys.path.insert(0, os.path.abspath("../.."))
#sys.setrecursionlimit(1500)



# class Mock(MagicMock):
#     @classmethod
#     def __getattr__(cls, name):
#         return Mock()

# MOCK_MODULES = [ 'emd','ot.lp.emd']
# sys.modules.update((mod_name, Mock()) for mod_name in MOCK_MODULES)

thanks for making these tools easily available !

remove import plot from ot/__init__.py

Hello,

Would it be possible to remove line 19, from . import plot, from ot/__init__.py?

It automatically loads matplotlib, which can raise an error on a machine without a graphical display.

Thanks in advance !
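
An alternative to removing the import entirely is to defer it until a plotting function is actually called. The sketch below shows the idea with a hypothetical `lazy_import` helper; `json` stands in for matplotlib so that the example runs on any machine, and `plot_something` is an invented name.

```python
import importlib


def lazy_import(name):
    """Return a zero-argument loader that imports `name` on its first call."""
    cache = {}

    def load():
        if name not in cache:
            cache[name] = importlib.import_module(name)
        return cache[name]

    return load


# 'json' stands in for the plotting backend so the sketch runs anywhere.
get_plot_backend = lazy_import("json")


def plot_something(data):
    backend = get_plot_backend()      # the import happens here, not at package import
    return backend.dumps(data)
```

With this pattern, importing the package never touches the heavy dependency; only users who call a plotting function pay for it.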

Outdated method call in "1D Wasserstein Barycenter demo"

Hi, I'd just like to mention that there is a small issue with one of the demos

On the "1D Wasserstein Barycenter demo" of the notebooks (notebooks/plot_barycenter_1D.ipynb) on lines 24 and 25 of the second code block, the Gaussian distributions are generated with

a1 = ot.datasets.make_1D_gauss(n, m=20, s=5)

However, this method was renamed to get_1D_gauss

a1 = ot.datasets.get_1D_gauss(n, m=20, s=5)

The demo runs without issues after that is fixed

Thanks !

Domain adaptation Classes

We should change the domain adaptation Classes to be more sklearn compliant.

Main issues:

Use CamelCase for classes
Use __init__ for setting parameters instead of fit.

@agramfort proposed to create new classes with proper names and begin deprecating the old ones.

I think it is a good move.
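
To make the two points concrete, here is a minimal sketch of the scikit-learn conventions in question: a CamelCase class, hyperparameters stored in `__init__` without any computation, learned attributes with a trailing underscore, and `fit` returning `self`. `ExampleTransport` and its parameters are hypothetical and do not correspond to an existing POT class.

```python
class ExampleTransport:
    """Hypothetical transport estimator following scikit-learn conventions."""

    def __init__(self, reg_e=1.0, max_iter=1000):
        # __init__ only stores hyperparameters; no computation happens here.
        self.reg_e = reg_e
        self.max_iter = max_iter

    def get_params(self):
        return {"reg_e": self.reg_e, "max_iter": self.max_iter}

    def fit(self, Xs, Xt):
        # Attributes learned from data get a trailing underscore.
        self.n_source_ = len(Xs)
        self.n_target_ = len(Xt)
        return self                   # returning self allows call chaining


est = ExampleTransport(reg_e=0.5).fit([[0.0], [1.0]], [[2.0]])
```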

Maximum input sample size

I have two data samples, each of size 100k, from two distributions in the 50-dimensional space, say n = 100k, p = 50. Can I use this OT library to compute the earth-mover distance between these two empirical data samples?
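
A quick back-of-the-envelope check helps here. The exact solver takes a dense cost matrix with one entry per pair of samples, so memory is the first limit; the dimension p = 50 only matters when computing the distances. The arithmetic below assumes float64 entries.

```python
n = 100_000                       # samples in each distribution
bytes_per_float64 = 8

cost_matrix_bytes = n * n * bytes_per_float64
cost_matrix_gb = cost_matrix_bytes / 1e9     # size of the cost matrix alone, in GB
```

This gives 80 GB for the cost matrix alone, before counting the transport plan of the same shape, so subsampling or a minibatch strategy would be needed on ordinary hardware.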

Perform proper Pytest

For the moment, we only perform doctest and a simple loading of the module.

We should begin to convert and propose tests for all functions and classes.

fail when using "pip install POT"

When I run "pip install POT", it fails. POT depends on Cython, but it seems the package does not declare this dependency to pip.

I solved the problem by installing Cython first. However, if both Cython and POT are listed in requirements.txt, the installation still fails.

Could anyone fix that?

Convert to sphinx-gallery only for notebooks

We should use sphinx-gallery to generate the notebooks automatically.

To do that we need to provide proper rst documentation in the examples as in
https://sphinx-gallery.readthedocs.io/en/latest/tutorials/plot_notebook.html#sphx-glr-tutorials-plot-notebook-py

Sklearn compliant datasets functions

We should rename the datasets.get_* functions to datasets.make_* in order to be more sklearn compliant.

Also it should be possible to give the rng as input as in sklearn.
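
In scikit-learn, functions accept a `random_state` argument that may be None, an integer seed, or an existing generator. The sketch below is a simplified re-implementation of that convention in plain NumPy (not scikit-learn's own helper), together with a hypothetical dataset function `make_2d_samples` that uses it.

```python
import numpy as np


def check_random_state(seed):
    """Turn None, an int, or a RandomState into a RandomState instance."""
    if seed is None:
        return np.random.RandomState()
    if isinstance(seed, (int, np.integer)):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError("cannot use %r to seed a RandomState" % (seed,))


def make_2d_samples(n, random_state=None):
    # Hypothetical dataset function: n standard normal points in 2D.
    rng = check_random_state(random_state)
    return rng.randn(n, 2)
```

Passing the same integer twice yields identical samples, which makes examples and tests reproducible.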

Feature request: away-step Frank-Wolfe

Hello,

I am writing to discuss the possible implementation of Frank-Wolfe variants, which could be interesting for solving the GW problem. While the standard FW converges slowly in O(1/t), other methods converge faster. One of the faster methods is the away-step Frank-Wolfe, which converges linearly (https://arxiv.org/pdf/1511.05932.pdf).

This was suggested by Thomas Kedreux.

Unusable parameter log for EMDTransport

Hi,

I need to get some values from the transport computation, such as the cost matrix or the value of the minimisation.
Some of these values are stored in the log. But when I do:

ot_emd,log = ot.da.EMDTransport(norm="max",log=0)

I get the following error:

TypeError: __init__() got an unexpected keyword argument 'log'

In the EMDTransport class declaration there is:

"""
Parameters
----------
...
log : int, optional (default=0)
Controls the logs of the optimization algorithm
..."""

So the question:
Is it intentional that the log cannot be recovered with this class? If so, to get it I should call the emd function directly instead of using the EMDTransport class.

Another question:
I want to get the minimum value reached by the optimisation problem (first with EMD, but also with Sinkhorn) to find a link between the effectiveness of the transport and the OA obtained in classification. How can I do that, and are there other values that give this kind of information?

Ultimately, my goal is to estimate several transports and automatically choose the best one.

Regards

Benjamin

Displacement interpolation?

Hello,

thank you for POT!
Reference [1] discusses displacement interpolation and gives an example (see link below).
Is it feasible to do this in POT (the MATLAB code is https://github.com/gpeyre/2013-SIIMS-ot-splitting ), or could POT be extended to do it?

Best regards

Thomas
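For discrete measures, one simple form of displacement interpolation can be sketched directly from a transport plan: each pair (i, j) carrying mass G[i, j] is placed at (1 - t) * x[i] + t * y[j]. This is the static, plan-based construction, not the proximal-splitting method of the linked MATLAB code. The function name and the small 1D data are assumptions for illustration.

```python
import numpy as np


def displacement_interpolation(x, y, G, t):
    """Support points and weights of the interpolated measure at time t."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    G = np.asarray(G, dtype=np.float64)
    i, j = np.nonzero(G)  # pairs (i, j) that actually carry mass
    points = (1.0 - t) * x[i] + t * y[j]
    weights = G[i, j]
    return points, weights


# Two source points, two target points, and the monotone plan between them.
x = np.array([0.0, 1.0])
y = np.array([2.0, 4.0])
G = np.array([[0.5, 0.0], [0.0, 0.5]])

mid_points, mid_weights = displacement_interpolation(x, y, G, 0.5)
```

At t = 0 the function returns the source points and at t = 1 the target points, with the mass of the plan attached to each pair.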

I have successfully installed the POT library (on Windows), but I have an issue with the emd function

In the following example, the emd function works well:

a = [0.5, 0.5]
b = [0.5, 0.5]
M = [[0., 1.], [1., 0.]]
G0 = ot.emd(a, b, M)
# G0 = array([[0.5, 0. ], [0. , 0.5]])

In the example below (failure case), the emd function returns a matrix of zeros:

a = [0.5, 0.5]
b = [0.2, 0.8]
G0 = ot.emd(a, b, M)
# G0 = array([[0., 0.], [0., 0.]])

The example from the documentation (Demo_1D_OT.ipynb) also fails: again, the output of emd is a matrix of zeros (see the figure).

n = 100
a = ot.datasets.get_1D_gauss(n, m=20, s=5)
b = ot.datasets.get_1D_gauss(n, m=60, s=10)
x = np.arange(n, dtype=np.float64)
M = ot.dist(x.reshape((n, 1)), x.reshape((n, 1)))
M /= M.max()
G0 = ot.emd(a, b, M)

%matplotlib inline
pl.figure(1)
ot.plot.plot1D_mat(a, b, G0, 'OT Matrix G0')
pl.show()

Please let me know how to resolve the issue.
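A quick way to confirm that a returned matrix is not a valid transport plan is to compare its row and column sums with a and b. The sketch below uses NumPy only; the helper name check_marginals is an assumption, and G_valid is a hand-written feasible plan for the failure case above, not solver output.

```python
import numpy as np


def check_marginals(G, a, b, atol=1e-8):
    """Return True if G has row sums a and column sums b."""
    G = np.asarray(G, dtype=np.float64)
    rows_ok = np.allclose(G.sum(axis=1), a, atol=atol)
    cols_ok = np.allclose(G.sum(axis=0), b, atol=atol)
    return rows_ok and cols_ok


a = [0.5, 0.5]
b = [0.2, 0.8]

G_zero = np.zeros((2, 2))  # the matrix reported in the failure case
G_valid = np.array([[0.2, 0.3], [0.0, 0.5]])  # a feasible plan for (a, b)

zero_is_plan = check_marginals(G_zero, a, b)
valid_is_plan = check_marginals(G_valid, a, b)
```

An all-zero matrix transports no mass, so it fails the check for any nonzero a and b.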

Possible bugs in greenkhorn algorithm

Describe the bug

Whenever greenkhorn is called with log=True, the algorithm raises an error.
greenkhorn does not accept list inputs (I have to convert a, b and M to np.array manually), whereas sinkhorn_knopp does (in sinkhorn_knopp, the first three lines of code convert the inputs with np.asarray).

To Reproduce
Steps to reproduce the behavior:

a = [.5, .5]
b = [.5, .5]
M = [[0., 1.], [1., 0.]]
# greenkhorn does not accept list objects, so convert manually
a, b, M = np.array(a), np.array(b), np.array(M)
# this line raises the error
T, log = ot.bregman.greenkhorn(a, b, M, 0.001, log=True)

Expected behavior
It seems that I can never get the log dictionary back from greenkhorn, and I am not sure why.
With the original sinkhorn, the log argument works fine.
Besides, I hope greenkhorn could accept lists as input (this is not a bug, but maybe it could be added in the future).
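The list-input request amounts to coercing the arguments at the top of the function, as described above for sinkhorn_knopp. A minimal sketch of such a step follows; the helper name as_float_arrays and the shape check are assumptions, not the actual POT code.

```python
import numpy as np


def as_float_arrays(a, b, M):
    """Convert list-like inputs to float64 arrays and check their shapes."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    M = np.asarray(M, dtype=np.float64)
    if M.shape != (a.shape[0], b.shape[0]):
        raise ValueError(
            "M has shape %s, expected (%d, %d)"
            % (M.shape, a.shape[0], b.shape[0])
        )
    return a, b, M


# Plain Python lists, as in the reproduction steps above.
a_arr, b_arr, M_arr = as_float_arrays(
    [.5, .5], [.5, .5], [[0., 1.], [1., 0.]]
)
```

np.asarray leaves existing float64 arrays untouched, so callers that already pass arrays pay no copy cost.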


Desktop (please complete the following information):

OS: MacOSX
Python version [2.7,3.6]: 3.6
How was POT installed [source, pip, conda]: conda

Output of the following code snippet:

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import ot; print("POT", ot.__version__)

import platform; print(platform.platform())
Darwin-18.2.0-x86_64-i386-64bit
import sys; print("Python", sys.version)
Python 3.6.7 |Anaconda custom (64-bit)| (default, Oct 23 2018, 14:01:38)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
import numpy; print("NumPy", numpy.__version__)
NumPy 1.15.4
import scipy; print("SciPy", scipy.__version__)
SciPy 1.1.0
import ot; print("POT", ot.__version__)
POT 0.5.1


Gromov-Wasserstein Distance between 1-D vectors

Remove rst compiled doc and notebooks from repository

The current documentation relies on sphinx-gallery, which cannot be executed on Read the Docs, so we have to compile everything to rst and notebooks to get proper documentation.

This will bloat the repository, so we should find a way to keep the doc up to date (staying on Read the Docs if possible), probably by keeping a compiled version of the doc in a separate repository.

The compiled notebooks are also very nice (they give a quick look at how the toolbox works), but they should likewise be stored in a separate repo.



