
alphacsc's Introduction

alphaCSC: Convolutional sparse coding for time-series


This is a library to perform shift-invariant sparse dictionary learning, also known as convolutional sparse coding (CSC), on time-series data. It includes a number of different models:

  1. univariate CSC
  2. multivariate CSC
  3. multivariate CSC with a rank-1 constraint [1]
  4. univariate CSC with an alpha-stable distribution [2]

A mathematical description of these models is available in the documentation.

Installation

The easiest way to install this package is with pip, which will also install its dependencies. The setup.py relies on numpy and cython during installation, so it is advised to install them beforehand. Please run one of the following commands:

(Latest stable version)

pip install alphacsc

(Development version)

pip install git+https://github.com/alphacsc/alphacsc.git#egg=alphacsc

(Dicodile backend)

pip install numpy cython
pip install alphacsc[dicodile]

To use the dicodile backend, do not forget to set the MPI_HOSTFILE environment variable.
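
For instance (the hostfile path below is only a placeholder):

export MPI_HOSTFILE=/path/to/hostfile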

If you do not have admin privileges on the computer, use the --user flag with pip. To upgrade, use the --upgrade flag provided by pip.

To check if everything worked fine, you can run:

python -c 'import alphacsc'

and it should not give any error messages.

Quickstart

Here is a brief example presenting the API:

import numpy as np
import matplotlib.pyplot as plt
from alphacsc import BatchCDL

# Define the different dimensions of the problem
n_atoms = 10
n_times_atom = 50
n_channels = 5
n_trials = 10
n_times = 1000

# Generate a random set of signals
X = np.random.randn(n_trials, n_channels, n_times)

# Learn a dictionary with the batch algorithm and a rank-1 constraint.
cdl = BatchCDL(n_atoms, n_times_atom, rank1=True)
cdl.fit(X)

# Display the learned atoms
fig, axes = plt.subplots(n_atoms, 2, num="Dictionary")
for k in range(n_atoms):
    axes[k, 0].plot(cdl.u_hat_[k])
    axes[k, 1].plot(cdl.v_hat_[k])

axes[0, 0].set_title("Spatial map")
axes[0, 1].set_title("Temporal map")
for ax in axes.ravel():
    ax.set_xticklabels([])
    ax.set_yticklabels([])

plt.show()
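
To encode signals with the learned dictionary, the estimator also exposes a transform method (mentioned in the issues below). A minimal sketch, assuming it returns the activations with shape (n_trials, n_atoms, n_times_valid):

# Sketch: sparse-code the signals with the learned atoms
z_hat = cdl.transform(X)
print(z_hat.shape)  # expected: (n_trials, n_atoms, n_times - n_times_atom + 1)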

Dicodile backend

AlphaCSC can use a dicodile-based backend to perform sparse encoding in parallel.

To install dicodile, run pip install alphacsc[dicodile].

Known OpenMPI issues

When self-installing OpenMPI (for instance to run dicodile on a single machine, or for continuous integration), running the dicodile solver might end up causing a deadlock (no output for a long time). It is often due to a communication issue between the workers. This issue can often be solved by disabling Docker-related virtual NICs, for instance by running export OMPI_MCA_btl_tcp_if_exclude="docker0".

Bug reports

Use the github issue tracker to report bugs.

Cite our work

If you use this code in your project, please consider citing our work:


  1. Dupré La Tour, T., Moreau, T., Jas, M., & Gramfort, A. (2018). Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals. Advances in Neural Information Processing Systems (NIPS).

  2. Jas, M., Dupré La Tour, T., Şimşekli, U., & Gramfort, A. (2017). Learning the Morphology of Brain Signals Using Alpha-Stable Convolutional Sparse Coding. Advances in Neural Information Processing Systems (NIPS), pages 1099--1108.

alphacsc's People

Contributors

agramfort, cedricallain, deepcharles, dinalzein, hndgzkn, jasmainak, rprimet, tomdlt, tommoral, umutsimsekli


alphacsc's Issues

typo in bench_methods_run.py

There is a typo in bench_methods_run.py for the methods run_fista and run_l_bfgs.

In method run_fista :

pobj, times, d_hat, z_hat = learn_d_z(
    X, n_atoms, n_times_atom, func_d=update_d_block, reg=reg,
    n_iter=n_iter, random_state=random_state, n_jobs=1,
    solver_z='fista', solver_z_kwargs=dict(max_iter=2),
    ds_init=ds_init, verbose=verbose)

should be replaced by (learn_d_z also returns the regularization parameter, so five values must be unpacked):

pobj, times, d_hat, z_hat, reg = learn_d_z(
    X, n_atoms, n_times_atom, func_d=update_d_block, reg=reg,
    n_iter=n_iter, random_state=random_state, n_jobs=1,
    solver_z='fista', solver_z_kwargs=dict(max_iter=2),
    ds_init=ds_init, verbose=verbose)

The same modification can be applied to run_l_bfgs.

potential issues?

First of all, thank you a lot for sharing this code; it is extremely helpful. I found some issues after I cloned the repo from GitHub and tried to run it.

  • convolutional_dictionary_learning.py
    lines 336-362. Issue in the initialization of BatchCDL and GreedyCDL. I solved the problem by using super(BatchCDL, self) and super(GreedyCDL, self) respectively.
    line 91. The documentation should be changed. There is a conflict between the option 'chunks' for D_init and the one in init_dict.py, which at line 77 uses 'chunk' without the s.

Then I realized that for some runs the fit method was not working and raised this error:
line 232, in learn_d_z_multi
pobj, times, D_hat, z_hat = _batch_learn(greedy=True, **kwargs)
File "/home/vanessa/src/neuralanalysis/neuralanalysis/alphacsc/learn_d_z_multi.py", line 316, in _batch_learn
D_hat = np.concatenate([D_hat, new_atom[None]])
ValueError: all the input array dimensions except for the concatenation axis must match exactly

After a while I came up with a plausible reason, which I think is related to the initialization when using the 'chunk' option. The error pops up when the atom's dimension is smaller than the required one.

  • init_dict.py
    in the get_max_error_dict function, line 355. If I understood correctly, we take the patch based on the max reconstruction error. In the unravel_index call, should the dimensions be

n0, t0 = np.unravel_index(i0, (n_trials, n_times - n_times_atom)) instead of
n0, t0 = np.unravel_index(i0, (n_trials, n_times))?

This change fixed the bug. If I am not wrong, you are selecting the best patch, but sometimes, with the previous shape, t0 is higher than (n_times - n_times_atom). So the following array
d0 = X[n0, :, t0:t0 + n_times_atom][None]
becomes shorter than n_times_atom along the time axis. Does that make sense?
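
A small self-contained check of this reasoning, with hypothetical sizes:

import numpy as np

n_trials, n_times, n_times_atom = 3, 100, 20
# one reconstruction error per valid patch start
patch_errors = np.random.rand(n_trials, n_times - n_times_atom)
i0 = patch_errors.argmax()
n0, t0 = np.unravel_index(i0, (n_trials, n_times - n_times_atom))
# with this shape, the extracted patch always has full length
assert t0 + n_times_atom <= n_times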

Thank you a lot in advance!

Experiments from the paper: Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals

I'm trying to replicate some experiments from the paper "Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals" (mainly figure 3), but I'm not sure how GreedyCDL is supported for univariate signals (is it only by forcing rank1=False and uv_constraint='auto'?).
Also, for simulating the data, are you using load_data in simulate.py, or is there another function that simulates data for multiple channels?

Cryptic error

When I try to make n_iter=20 or something small just to quickly try things, I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/Desktop/projects/github_repos/alphacsc/examples/multicsc/plot_sample_evoked_response.py in <module>()
    127 ###############################################################################
    128 # Fit the model and learn rank1 atoms
--> 129 cdl.fit(X_split)
    130 
    131 ###############################################################################

~/Desktop/projects/github_repos/alphacsc/alphacsc/convolutional_dictionary_learning.py in fit(self, X, y)
    197             random_state=self.random_state, n_jobs=self.n_jobs,
    198             name=self.name, raise_on_increase=self.raise_on_increase,
--> 199             sort_atoms=self.sort_atoms)
    200 
    201         self._pobj, self._times, self._D_hat, self._z_hat, self.reg_ = res

~/Desktop/projects/github_repos/alphacsc/alphacsc/learn_d_z_multi.py in learn_d_z_multi(X, n_atoms, n_times_atom, n_iter, n_jobs, lmbd_max, reg, loss, loss_params, rank1, uv_constraint, eps, algorithm, algorithm_params, detrending, detrending_params, solver_z, solver_z_kwargs, solver_d, solver_d_kwargs, D_init, D_init_params, unbiased_z_hat, use_sparse_z, stopping_pobj, raise_on_increase, verbose, callback, random_state, name, window, sort_atoms)
    230         pobj, times, D_hat, z_hat = _batch_learn(greedy=False, **kwargs)
    231     elif algorithm == "greedy":
--> 232         pobj, times, D_hat, z_hat = _batch_learn(greedy=True, **kwargs)
    233     elif algorithm == "online":
    234         pobj, times, D_hat, z_hat = _online_learn(**kwargs)

~/Desktop/projects/github_repos/alphacsc/alphacsc/learn_d_z_multi.py in _batch_learn(X, D_hat, z_hat, compute_z_func, compute_d_func, obj_func, end_iter_func, n_iter, lmbd_max, reg, verbose, greedy, random_state, name, uv_constraint, window)
    292             raise ValueError('The greedy method needs at least %d to learn %d '
    293                              'atoms. Got only %d. Please increase n_iter.'
--> 294                              % (n_iter_by_atom * n_iter, n_atoms, n_iter))
    295 
    296     # monitor cost function

ValueError: The greedy method needs at least 20 to learn 40 atoms. Got only 20. Please increase n_iter.

My n_iter is already 20, so I'm not sure why it's complaining ...

API add post-processing functions for activations

A particularly tricky part of analyzing results in neuroscience with alphacsc is how to process the resulting activations. In particular, we could add (a rough sketch of the first helper follows the list):

  • a helper to shift the activations' times to match the peak of an atom, or maybe a better alignment (the start of the non-zero part of the atom?)
  • a helper to compute epoched activations that takes into account the event offset and outputs an Epochs object?
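
A minimal sketch of the first helper, assuming z_hat has shape (n_trials, n_atoms, n_times_valid) and v_hat holds the temporal patterns (the function name is hypothetical):

import numpy as np

def shift_to_atom_peak(z_hat, v_hat):
    # Hypothetical helper: move each activation time from the atom's
    # onset to the atom's temporal peak.
    z_shifted = np.zeros_like(z_hat)
    for k, v in enumerate(v_hat):
        peak = int(np.argmax(np.abs(v)))  # offset of the peak within the atom
        z_shifted[:, k, peak:] = z_hat[:, k, :z_hat.shape[2] - peak]
    return z_shifted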

Somato Data Filter Question

Question about the filter and sample rate on the somato data used for plot somato mu waves. When I look at the info variable, I see that the low-pass is 100Hz and the sample rate is 150Hz. Shouldn't the low-pass be less than 1/2 of the sample rate?

These settings could be leading to some spurious findings if correct.

Best,
Tim.

shape error for CSC on 1D time series

I get this:

~/Desktop/projects/github_repos/alphacsc/alphacsc/learn_d_z.py in learn_d_z(X, n_atoms, n_times_atom, func_d, reg, lmbd_max, n_iter, random_state, n_jobs, solver_z, solver_d_kwargs, solver_z_kwargs, ds_init, ds_init_params, sample_weights, verbose, callback, stopping_pobj)
    113         Regularization parameter used.
    114     """
--> 115     n_trials, n_times = X.shape
    116 
    117     rng = check_random_state(random_state)

ValueError: not enough values to unpack (expected 2, got 1)

The shape of X for me is (621620,)
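
For reference, the unpacking in the traceback shows that learn_d_z expects X with shape (n_trials, n_times), so a 1D series needs an explicit trial axis first, e.g.:

X = X[None, :]  # (621620,) -> (1, 621620): a single trial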

add tests

Just a reminder for me to add tests and Travis CI.

fix typo in simulate.py file

The function load_data in simulate.py with f_noise=False returns an error: 'tuple' object cannot be interpreted as an integer.
It can be fixed by replacing
signal += sigma * rng.randn(signal.shape)
with
signal += sigma * rng.randn(*signal.shape)
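
The reason is that randn takes the dimensions as separate arguments rather than a shape tuple; a quick illustration:

import numpy as np

rng = np.random.RandomState(0)
shape = (2, 3)
# rng.randn(shape)               # TypeError: 'tuple' object cannot be interpreted as an integer
noise = rng.randn(*shape)        # OK: dimensions unpacked as separate arguments
noise = rng.standard_normal(shape)  # equivalent call that accepts a shape tuple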

A bug appears

When I try to import it, a bug appears. I do not know how to fix it; could you please help me?

type object 'alphacsc.cython_code.sparse_conv.array' has no attribute '__reduce_cython__'

ENH get_max_error_patch should take window parameter

When using window=True, the selection of the max error patch should be changed to take into account a weighted norm for the error computation, like it is done in dicodile.

Also, it would be nice if get_max_error_patch returned a patch, whatever the form of the dictionary, and to move the get_uv part into DSolver.get_max_error_dict, probably with a projection step for the Rank1DSolver.
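
A rough sketch of the weighted-norm idea, assuming the residual patch has shape (n_channels, n_times_atom); the function name and the choice of a Tukey taper are my assumptions, not the dicodile implementation:

import numpy as np
from scipy.signal.windows import tukey

def windowed_patch_error(residual_patch):
    # taper the patch so border samples weigh less in the selection
    w = tukey(residual_patch.shape[1])
    return np.linalg.norm(residual_patch * w[None, :]) ** 2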

PERF investigate numba perf for coordinate descent solvers

I don't remember why we did not use numba to accelerate the coordinate descent for-loop, which is often critical for the algorithm's performance, but this should be investigated.

This issue should only be treated once #70 is done, so we have an easy way to report performance gain compared to other solvers.

Update get_cost in z_encoder

Update get_cost function in _z_encoder to call compute_objective in _z_encoder.

As ztz and ztX are not initialized prior to calling compute_z, this requires some changes. Two possible solutions:

  • the get_cost function can return the value of the cost function when z_hat=0 (i.e. 0.5 * np.linalg.norm(self.X[0]) ** 2) if ztz and ztX are not set.
    However, test_get_cost fails with the following message (the final cost after running compute_z is greater than the initial cost):
solver = 'l-bfgs'
X = array([[[ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986,
         -0.23415337, -0.23413696,  1.57921282,  0.767434...     1.06667469,  1.16929559,  1.38215899,  0.64870989,
         -0.16711808,  0.14671369,  1.20650897, -0.81693567]]])
D_hat = array([[ 0.85771824,  0.1945646 ,  0.47588238,  0.69484882,  0.57908626,
        -0.30303112,  0.2946003 , -0.04693235...1623,  0.71909456,  0.68934122,  0.06912736,  0.16871127,
        -0.39607166, -0.88370123, -0.15521554,  0.06975264]])
requires_dicodile = None

    @pytest.mark.parametrize('solver, n_trials, rank1',
                             [('l-bfgs', 3, True),
                              ('dicodile', 1, False)])
    def test_get_cost(solver, X, D_hat, requires_dicodile):
        """Test for valid values."""
    
        with get_z_encoder_for(solver=solver,
                               X=X,
                               D_hat=D_hat,
                               n_atoms=N_ATOMS,
                               n_times_atom=N_TIMES_ATOM,
                               n_jobs=2) as z_encoder:
            initial_cost = z_encoder.get_cost()
    
            z_encoder.compute_z()
            z_hat = z_encoder.get_z_hat()
            final_cost = z_encoder.get_cost()
    
>           assert final_cost < initial_cost
E           assert 393.7852276977734 < 144.81588197885912

D_hat      = array([[ 0.85771824,  0.1945646 ,  0.47588238,  0.69484882,  0.57908626,
        -0.30303112,  0.2946003 , -0.04693235...1623,  0.71909456,  0.68934122,  0.06912736,  0.16871127,
        -0.39607166, -0.88370123, -0.15521554,  0.06975264]])
X          = array([[[ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986,
         -0.23415337, -0.23413696,  1.57921282,  0.767434...     1.06667469,  1.16929559,  1.38215899,  0.64870989,
         -0.16711808,  0.14671369,  1.20650897, -0.81693567]]])
final_cost = 393.7852276977734
initial_cost = 144.81588197885912
requires_dicodile = None
solver     = 'l-bfgs'
z_encoder  = <alphacsc._z_encoder.AlphaCSCEncoder object at 0x7fef89941eb0>
z_hat      = array([[[0.03148913, 0.        , 0.11964971, ..., 0.        ,
         0.02470083, 0.        ],
        [0.        , 0...0967279, 0.        ],
        [0.12684568, 0.12219768, 0.02416528, ..., 0.        ,
         0.        , 0.        ]]])

  • ztz and ztX should be set to 0 upon initialization as
        self.ztz = np.zeros((self.n_atoms, self.n_atoms,
                             2 * self.n_times_atom - 1))
        self.ztX = np.zeros((self.n_atoms, self.n_channels,
                             self.n_times_atom))

However, test_get_cost fails with the following message (the cost computed in the test is not close to the final_cost computed after compute_z is called):

solver = 'l-bfgs'
X = array([[[ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986,
         -0.23415337, -0.23413696,  1.57921282,  0.767434...     1.06667469,  1.16929559,  1.38215899,  0.64870989,
         -0.16711808,  0.14671369,  1.20650897, -0.81693567]]])
D_hat = array([[ 0.85771824,  0.1945646 ,  0.47588238,  0.69484882,  0.57908626,
        -0.30303112,  0.2946003 , -0.04693235...1623,  0.71909456,  0.68934122,  0.06912736,  0.16871127,
        -0.39607166, -0.88370123, -0.15521554,  0.06975264]])
requires_dicodile = None

    @pytest.mark.parametrize('solver, n_trials, rank1',
                             [('l-bfgs', 3, True),
                              ('dicodile', 1, False)])
    def test_get_cost(solver, X, D_hat, requires_dicodile):
        """Test for valid values."""
    
        with get_z_encoder_for(solver=solver,
                               X=X,
                               D_hat=D_hat,
                               n_atoms=N_ATOMS,
                               n_times_atom=N_TIMES_ATOM,
                               n_jobs=2) as z_encoder:
            initial_cost = z_encoder.get_cost()
    
            z_encoder.compute_z()
            z_hat = z_encoder.get_z_hat()
            final_cost = z_encoder.get_cost()
    
            assert final_cost < initial_cost
    
            X_hat = construct_X_multi(z_hat, D_hat, n_channels=N_CHANNELS)
            cost = compute_objective(X=X, X_hat=X_hat, z_hat=z_hat, reg=0.1,
                                     D=D_hat)
    
>           assert np.isclose(cost, final_cost)
E           assert False
E            +  where False = <function isclose at 0x7f9aa010ac10>(396.9063356164568, 393.7852276977734)
E            +    where <function isclose at 0x7f9aa010ac10> = np.isclose

D_hat      = array([[ 0.85771824,  0.1945646 ,  0.47588238,  0.69484882,  0.57908626,
        -0.30303112,  0.2946003 , -0.04693235...1623,  0.71909456,  0.68934122,  0.06912736,  0.16871127,
        -0.39607166, -0.88370123, -0.15521554,  0.06975264]])
X          = array([[[ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986,
         -0.23415337, -0.23413696,  1.57921282,  0.767434...     1.06667469,  1.16929559,  1.38215899,  0.64870989,
         -0.16711808,  0.14671369,  1.20650897, -0.81693567]]])
X_hat      = array([[[ 1.87670366e-02,  1.58087090e-02,  7.03257398e-02,
          1.36799839e-01,  3.19383914e-02,  5.76223312e-02...09410227e-01,  8.94328662e-02,
          6.66489290e-02,  4.55550879e-02,  1.00303005e-01,
         -1.48259646e-02]]])
cost       = 396.9063356164568
final_cost = 393.7852276977734
initial_cost = 432.6504303161992
requires_dicodile = None
solver     = 'l-bfgs'
z_encoder  = <alphacsc._z_encoder.AlphaCSCEncoder object at 0x7f9a7bbf7e50>
z_hat      = array([[[0.03148913, 0.        , 0.11964971, ..., 0.        ,
         0.02470083, 0.        ],
        [0.        , 0...0967279, 0.        ],
        [0.12684568, 0.12219768, 0.02416528, ..., 0.        ,
         0.        , 0.        ]]])

alphacsc/tests/test_z_encoder.py:274: AssertionError

Inconsistency between univariate and multivariate models.

⚠️ The API is not consistent between the different models. ⚠️

Models:

  • (1) Univariate CSC (learn_d_z)
  • (2) Univariate CSC with alpha-stable noise (learn_d_z_weighted)
  • (3) Multivariate CSC (learn_d_z_multi or BatchCDL) (with or without rank-1 constraint)

API differences:

  • (1) and (2) use z_hat.shape = (n_atoms, n_trials, n_times) while (3) uses z_hat.shape = (n_trials, n_atoms, n_times) (see the snippet after this list)
  • (1) and (2) do not have a class API.
  • (1) and (3) use different parameter names (ds_init and D_init)
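
As a purely illustrative note on the first difference, converting activations between the two layouts is a single transpose:

import numpy as np

z_uni = np.zeros((10, 5, 100))      # (n_atoms, n_trials, n_times), as in (1) and (2)
z_multi = z_uni.transpose(1, 0, 2)  # (n_trials, n_atoms, n_times), as in (3)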

Also, (1) and (2) do not implement the following features:

  • refitting z_hat on a frozen support for unbiasing
  • temporal window re-parametrization
  • chunk initialization strategy fc38373
  • scaled regularization parameter dfe1b7b
  • LGCD solver for the Z-step
  • greedy algorithm, online algorithm
  • restart of zero-activated atoms
  • atom sorting by variance

A solution would be to make the functions (1) and (2) private, to ease the use of (3) in the univariate case, and to make a new class for model (2) (using (3) internally instead of (1)).

TypeError: super() takes at least 1 argument (0 given)

Greetings.

In trying to run any of the examples, the call to convolutional_dictionary_learning.py consistently gives the above error. From online posts, it seems to be Python 3 syntax that isn't compatible with 2.7. Is that a bug, or am I simply misunderstanding which Python you work with?

Thanks!

JDS

add more examples to website

I think it would be useful to have more examples on the website and to make it more pedagogical if we want neuro folks to adopt our code. Just a few ideas here:

  • dipole fit on motor rhythm in MNE sample data
  • show recovered artifacts on MNE sample data
  • simulated and recovered atoms for multivariate CSC

happy to help if needed :)

Document exclusion of docker-related NICs

When using the dicodile backend, if OpenMPI is self-installed rather than provided (e.g. to test on a single machine, or for CI), it may be necessary to disable Docker-related virtual NICs, for instance by running export OMPI_MCA_btl_tcp_if_exclude="docker0".

Strange joblib-related errors?

When I attempted to set n_jobs > 1 for univariate CSC, I got an error in the middle of learning:

OS: Ubuntu 16.04
joblib version: 0.13.2
alphacsc version: Installed from master branch
import numpy as np
import alphacsc
x = np.fromfile('test-channel.dat', dtype='<f8')
fs = 1250
n_trials = 5
n_minutes_per_trial = 0.5
n_samples_per_trial = int(n_minutes_per_trial * 60 * fs)
data = x[:n_samples_per_trial * n_trials].reshape(n_trials, n_samples_per_trial)

_, _, d_hat, z_hat, _ = alphacsc.learn_d_z(data,
                                           n_atoms=10,
                                           n_times_atom=int(fs),
                                           reg=5,
                                           n_iter=20,
                                           solver_z='l-bfgs',
                                           solver_z_kwargs=dict(factr=1e9),
                                           solver_d_kwargs=dict(factr=1e2),
                                           random_state=0,
                                           n_jobs=8,
                                           verbose=10
                                          )
ValueError                                Traceback (most recent call last)
<ipython-input-39-b352a7f31058> in <module>
     29                                            random_state=0,
     30                                            n_jobs=8,
---> 31                                            verbose=10
     32                                           )
     33 

~/code/alphacsc/alphacsc/learn_d_z.py in learn_d_z(X, n_atoms, n_times_atom, func_d, reg, lmbd_max, n_iter, random_state, n_jobs, solver_z, solver_d_kwargs, solver_z_kwargs, ds_init, ds_init_params, sample_weights, verbose, callback, stopping_pobj)
    164                              solver=solver_z, b_hat_0=b_hat_0,
    165                              solver_kwargs=solver_z_kwargs,
--> 166                              sample_weights=sample_weights)
    167             times.append(time.time() - start)
    168 

~/code/alphacsc/alphacsc/update_z.py in update_z(X, ds, reg, z0, debug, parallel, solver, b_hat_0, solver_kwargs, sample_weights)
     63         my_update_z(X, ds, reg, z0, i, debug, solver, b_hat_0, solver_kwargs,
     64                     sample_weights)
---> 65         for i in np.array_split(np.arange(n_trials), parallel.n_jobs))
     66     z_hat = np.vstack(zhats)
     67 

~/applications/anaconda3/envs/ml/lib/python3.6/site-packages/joblib/parallel.py in __call__(self, iterable)
    932 
    933             with self._backend.retrieval_context():
--> 934                 self.retrieve()
    935             # Make sure that we get a last message telling us we are done
    936             elapsed_time = time.time() - self._start_time

~/applications/anaconda3/envs/ml/lib/python3.6/site-packages/joblib/parallel.py in retrieve(self)
    831             try:
    832                 if getattr(self._backend, 'supports_timeout', False):
--> 833                     self._output.extend(job.get(timeout=self.timeout))
    834                 else:
    835                     self._output.extend(job.get())

~/applications/anaconda3/envs/ml/lib/python3.6/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    519         AsyncResults.get from multiprocessing."""
    520         try:
--> 521             return future.result(timeout=timeout)
    522         except LokyTimeoutError:
    523             raise TimeoutError()

~/applications/anaconda3/envs/ml/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
    423                 raise CancelledError()
    424             elif self._state == FINISHED:
--> 425                 return self.__get_result()
    426 
    427             self._condition.wait(timeout)

~/applications/anaconda3/envs/ml/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

ValueError: need at least one array to concatenate

The strange thing is that I've gotten different kinds of errors running the same code, such as "assignment destination is read-only". However, when I checked the input array flags, everything looked ok:

data.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

Moreover, the errors are not always reproducible when I run this code, and I can't figure out why. But in general I encounter far fewer of them if I set n_jobs=1.

TypeError when learning full rank atoms with GreedyCDL on windows

Hello,

I'm getting the following error when trying to use GreedyCDL with rank1=False and solver_z='lgcd':

.....05
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\dumemer\.conda\envs\LG\lib\site-packages\joblib\externals\loky\process_executor.py", line 428, in _process_worker
    r = call_item()
  File "C:\Users\dumemer\.conda\envs\LG\lib\site-packages\joblib\externals\loky\process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Users\dumemer\.conda\envs\LG\lib\site-packages\joblib\_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
  File "C:\Users\dumemer\.conda\envs\LG\lib\site-packages\joblib\parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "C:\Users\dumemer\.conda\envs\LG\lib\site-packages\joblib\parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "~\Merlin\alphacsc\alphacsc\update_z_multi.py", line 135, in _update_z_multi_idx
    constants['DtD'] = compute_DtD(D=D, n_channels=n_channels)
  File "~\Merlin\alphacsc\alphacsc\utils\compute_constants.py", line 11, in compute_DtD
    return _compute_DtD_D(D)
  File "C:\Users\dumemer\.conda\envs\LG\lib\site-packages\numba\core\dispatcher.py", line 703, in _explain_matching_error
    raise TypeError(msg)
TypeError: No matching definition for argument type(s) readonly array(float64, 3d, C)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "~\Merlin\alphacsc\alphacsc\convolutional_dictionary_learning.py", line 189, in fit
    res = learn_d_z_multi(
  File "~\Merlin\alphacsc\alphacsc\learn_d_z_multi.py", line 202, in learn_d_z_multi
    pobj, times = _batch_learn(greedy=True, **kwargs)
  File "~\Merlin\alphacsc\alphacsc\learn_d_z_multi.py", line 288, in _batch_learn
    z_encoder.compute_z()
  File "~\Merlin\alphacsc\alphacsc\_z_encoder.py", line 278, in compute_z
    self.z_hat, self.ztz, self.ztX = self._compute_z_aux(self.X,
  File "~\Merlin\alphacsc\alphacsc\_z_encoder.py", line 271, in _compute_z_aux
    return update_z_multi(
  File "~\Merlin\alphacsc\alphacsc\update_z_multi.py", line 78, in update_z_multi
    results = Parallel(n_jobs=n_jobs)(
  File "C:\Users\dumemer\.conda\envs\LG\lib\site-packages\joblib\parallel.py", line 1098, in __call__
    self.retrieve()
  File "C:\Users\dumemer\.conda\envs\LG\lib\site-packages\joblib\parallel.py", line 975, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Users\dumemer\.conda\envs\LG\lib\site-packages\joblib\_parallel_backends.py", line 567, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Users\dumemer\.conda\envs\LG\lib\concurrent\futures\_base.py", line 446, in result
    return self.__get_result()
  File "C:\Users\dumemer\.conda\envs\LG\lib\concurrent\futures\_base.py", line 391, in __get_result
    raise self._exception
TypeError: No matching definition for argument type(s) readonly array(float64, 3d, C)

I'm not sure where that could be coming from. My configuration is:

  • OS: Windows Server 2016/2019 both (I know)
  • alphacsc v0.4.1.dev12 (installed from master)
  • joblib v1.2.0

It seems to work with the l-bfgs z solver, though.

overloading info object?

I saw the following lines of code: https://github.com/alphacsc/alphacsc/blob/master/alphacsc/datasets/somato.py#L130-L137. This seems a little dangerous to me, as the info object is not meant to store other kinds of data. If you try to do:

from alphacsc.datasets.somato import load_data
from mne.io import write_info

sfreq = 150.
X, info = load_data(sfreq=sfreq, dataset='sample', n_splits=1)
write_info('test.fif', info)

it throws an error. My personal preference is to simply expose the MNE Raw objects in the examples, but if this helps in making the examples simpler, that's fine too. Still, perhaps we should find another solution than overloading the info object?

numba is a dependency?

I got this:

In [1]: %run examples/multicsc/plot_sample_evoked_response.py
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
~/Desktop/projects/github_repos/alphacsc/alphacsc/utils/compat.py in <module>()
      4 try:
----> 5     import numba
      6     from numba import jit

ModuleNotFoundError: No module named 'numba'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
~/Desktop/projects/github_repos/alphacsc/examples/multicsc/plot_sample_evoked_response.py in <module>()
     41 # Next, we define the parameters for multivariate CSC
     42 
---> 43 from alphacsc import GreedyCDL
     44 cdl = GreedyCDL(
     45     # Shape of the dictionary

~/Desktop/projects/github_repos/alphacsc/alphacsc/__init__.py in <module>()
----> 1 from .update_d import update_d_block
      2 from .learn_d_z import learn_d_z, objective
      3 from .learn_d_z_mcem import learn_d_z_weighted
      4 from .learn_d_z_multi import learn_d_z_multi
      5 from .utils import construct_X, check_random_state

~/Desktop/projects/github_repos/alphacsc/alphacsc/update_d.py in <module>()
      7 from scipy import linalg, optimize
      8 
----> 9 from .utils import construct_X, check_consistent_shape
     10 
     11 

~/Desktop/projects/github_repos/alphacsc/alphacsc/utils/__init__.py in <module>()
      1 # flake8: noqa F401
      2 from .dictionary import get_D, get_uv
----> 3 from .convolution import construct_X, construct_X_multi, _choose_convolve
      4 from .validation import check_random_state, check_consistent_shape
      5 from .profile_this import profile_this

~/Desktop/projects/github_repos/alphacsc/alphacsc/utils/convolution.py in <module>()
     10 
     11 from .. import cython_code
---> 12 from .compat import numba, jit
     13 from .lil import get_z_shape, is_lil
     14 

~/Desktop/projects/github_repos/alphacsc/alphacsc/utils/compat.py in <module>()
     16 
     17     numba = object()
---> 18     numba.int64 = dummy_numba_type()
     19     numba.float64 = dummy_numba_type()
     20 

AttributeError: 'object' object has no attribute 'int64'

Should the readme be updated, or is numba not meant to be a dependency?

GreedyCDL vs BatchCDL vs learn_d_z for univariate signals

I was trying to help someone use alphaCSC and realized that it's not straightforward to understand which class to use when. We are also missing some guidelines on how to tweak certain parameters (e.g., which solver to choose?).

Empirically it seems that LGCD is better but it is not possible to use it from learn_d_z. I think this function should be deprecated and either GreedyCDL or BatchCDL should be supported also for univariate signals. Zen of Python says:

There should be one-- and preferably only one --obvious way to do it.

The GreedyCDL class should be documented in api.rst. And to use it for univariate signals, one needs to set rank1=False and uv_constraint='joint'.
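
A minimal sketch of that usage, following the settings quoted above (the reshape to a singleton channel axis is my assumption):

import numpy as np
from alphacsc import GreedyCDL

X = np.random.randn(10, 1000)  # univariate signals, shape (n_trials, n_times)
X = X[:, None, :]              # add a singleton channel axis for the multivariate API

cdl = GreedyCDL(n_atoms=5, n_times_atom=50,
                rank1=False, uv_constraint='joint')
cdl.fit(X)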

In the same vein, split_signal is supported only for 2D signals at the moment, but extending it to 1D data would increase the user base with very little effort. Some effort in making these things consistent would be great!

A bug appears

When I try to run plot_gain.py, a bug appears. I do not know how to fix it; could you please help me?
Exception: Internal MPI error!, error stack:
MPI_Comm_spawn(cmd="D:\Program\Anaconda3\python.exe", argv=0x0000023981BAA990, maxprocs=4, info=0x9c000000, root=0, MPI_COMM_SELF, intercomm=0x0000023981BAA9E0, errors=0x0000000000000000) failed
Internal MPI error! FAILspawn not supported without process manager

LGCD solver issue when using new data larger than a certain size

The LGCD solver (which is the default for BatchCDL) seems to run into some issues on new data larger than a certain size. Specifically, when I use the transform function, it freezes above a certain size threshold but works perfectly below it. None of the other solvers seem to have this issue, but they also don't seem to work as well.

ENH add optional torch-based solver

In our recent paper with @Malezieux, we achieved faster runtimes with almost no drop in solution quality using a torch-based algorithm.
It would be nice to implement such an algorithm in alphacsc, with an optional dependency on pytorch.

Where to add this algorithm is not totally straightforward. I think the easiest would probably be to implement a separate algorithm similar to _batch_learn and _online_learn. But another possibility would be to refactor the code to better take into account the bi-level optimization framework (less alternate minimization), which would ease this integration (and the follow-ups that will unfold from our work in the field).

N_TRIALS=5 fails some tests

The tests pass with N_TRIALS=2 in conftest.py, but fail for N_TRIALS=5.

Failing tests are:

`alphacsc/tests/test_learn_d_z_multi.py::test_transformers[True-alternate_adaptive-separate-OnlineCDL]`
`alphacsc/tests/test_learn_d_z_multi.py::test_transformers[True-alternate-separate-OnlineCDL]`

for rank1, solver_d, uv_constraint, klass parameter values respectively.

DOC example for population study

We should provide an example where we learn atoms from multiple subjects at once.
The script should leverage the OnlineCDL class and potentially split each subject into sub-windows.

A potential dataset for such a task could be a BCI dataset such as Moabb BNCI2014001. This dataset contains only 18 recordings for 9 subjects, so it is not too big for the processing. WDYT @agramfort?

ENH improve integration with encoder API

Things that we can do/discuss for alphacsc/dicodile integration. Let's discuss these on Monday.

  • Add tests for the z_encoder behavior and the different backends.
  • Implement a first dicodile powered Encoder backend
  • Overall, avoid calling get_z_hat if not strictly necessary: it is costly in communication, and all operations that can be distributed are more efficient.
  • In compute_d_multi, pass z_encoder instead of X, z_hat, D_hat, constants. This way, depending on the algorithm, it can call either get_sufficient_statistic or get_z_hat, depending on what it needs. Maybe we can also see whether some other operations in update_d_multi can be implemented this way.
  • For callback, also try to use the z_encoder as much as possible.
  • implement efficient operations when D_hat is composed of rank-1 atoms in dicodile
  • implement add_one_atom in dicodile
  • implement get_error_max in dicodile in a distributed algorithm. (see tomMoral/dicodile#49 ).
  • implement a multi signal dicodile version for online learning.

Validation of hyperparameters

I started using the multivariate rank-1 CSC and am unsure about how to select the hyperparameters, e.g. the regularization and the number of atoms. Which output should I look at to assess the fit independently?

Also, I would like to choose a longer time per atom and not apply a high-pass filter > 0.5 Hz, to be able to look at slow dynamics. Do you have any advice on the parameters for this?
