oist-ncbc / spykesim

Extended edit similarity measurement for high dimensional discrete-time series signal (e.g., multi-unit spike-train).

Home Page: https://pypi.org/project/spykesim

License: MIT License

Makefile 0.10% Python 98.50% C 1.31% Shell 0.10%

neuroscience spike-trains editdistance neuroinformatics theoretical-neuroscience similarity-measures python

spykesim's Introduction

spykesim is a Python module that offers functions for measuring the similarity between two segmented multi-neuronal spiking activities.

Extended edit similarity measurement is implemented. You can find details in the following paper.

https://www.frontiersin.org/articles/10.3389/fninf.2019.00039

This library is a re-implementation of the algorithm. The original implementation can be found in this repo.

Supported Operating Systems

This library has been tested on Ubuntu and macOS.

For Windows users: please consider using Ubuntu via the Windows Subsystem for Linux.

Installation

If you do not have Python 3.7 in your environment, you may use Anaconda.

Cython and NumPy need to be preinstalled, as they are used during the installation process.

If you have not installed these packages, run the following:

pip install numpy cython

You can install this library via pip as well:

pip install spykesim

Alternatively, you can clone the repository and build it yourself:

git clone https://github.com/KeitaW/spykesim.git
cd spykesim
python setup.py build_ext --inplace install

Dependencies

Python (>= 3.7)
NumPy (needs to be preinstalled)
Cython (needs to be preinstalled)
scipy
tqdm
h5py

Tutorial

You can find a tutorial in doc.

Citation

You can use the following bib entry to cite this work:

@article{Watanabe:2019eq,
author = {Watanabe, Keita and Haga, Tatsuya and Tatsuno, Masami and Euston, David R and Fukai, Tomoki},
title = {{Unsupervised Detection of Cell-Assembly Sequences by Similarity-Based Clustering}},
journal = {Frontiers in Neuroinformatics},
year = {2019},
volume = {13},
month = may
}

spykesim's Issues

Missing dependency? (hdbscan)

I tried to import editsim using the following expression:

from spykesim import editsim

and hdbscan was requested, even though previous dependency checks were successful. Maybe it should be added to the dependency list?
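
A quick way to check for this before importing is to ask Python which modules it can locate. The snippet below is a minimal sketch and not part of spykesim: the module list is an assumption assembled from the Dependencies section plus hdbscan from this report, and find_missing is a hypothetical helper name.

```python
import importlib.util

# Assumed import names: the README's dependency list plus hdbscan.
REQUIRED_MODULES = ["numpy", "Cython", "scipy", "tqdm", "h5py", "hdbscan"]


def find_missing(modules):
    """Return the names in `modules` that Python cannot locate for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules were found.")
```

This only reports whether a module can be found; it does not import it or check its version.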

System info:

Python 3.7.4 (default, Jul 16 2019, 07:12:58)
[GCC 9.1.0] on linux

Linux 5.2.9-arch1-1-ARCH #1 SMP PREEMPT x86_64 GNU/Linux

Sequence detection from Profiles.

Hi, this is an interesting method. I am trying it on a set of data and I was wondering if you have plans to include detection/extraction of sequences; or do you have any suggestion of the approach?

I have plotted the windows that belong to the same cluster, along with the spikes of those windows, but they do not seem to match well with each other, nor with the cluster profiles in es.profiles. Do you have any suggestions?

Originally posted by @tuanpham96 in #1 (comment)

Computing the profile twice

In editsim.pyx:

def gen_profile(self, th_=5, sigma=5):
    ...................................................
    for uidx, mats in zip(uidxs, mats_list):
        profile = regularize_profile(barton_sternberg(mats, self._sim_bp, 2*len(mats)))
        if profile.sum() >= th_:
            self.profiles[uidx] = regularize_profile(barton_sternberg(mats, self._sim_bp, 2*len(mats)))
    return self

I think you could just use:

    if profile.sum() >= th_:
        self.profiles[uidx] = profile

because you are computing the same thing twice. This would save some time.
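
The single-computation pattern suggested above can be sketched in isolation. This is a simplified stand-in, not the library's code: build_profiles and toy_profile are hypothetical names, and toy_profile (a column-wise mean) merely takes the place of regularize_profile(barton_sternberg(...)).

```python
calls = []  # records how many times the expensive step runs


def toy_profile(mats):
    """Hypothetical stand-in for the expensive profile computation."""
    calls.append(1)
    return [sum(col) / len(col) for col in zip(*mats)]


def build_profiles(clusters, compute_profile, threshold=5):
    """Keep one profile per cluster, computing each profile exactly once."""
    profiles = {}
    for uidx, mats in clusters.items():
        profile = compute_profile(mats)   # computed once
        if sum(profile) >= threshold:
            profiles[uidx] = profile      # reused, not recomputed
    return profiles


clusters = {
    0: [[1, 2, 3], [3, 4, 5]],   # profile sums to 9, kept
    1: [[0, 0, 1], [0, 1, 0]],   # profile sums to 1, dropped
}
profiles = build_profiles(clusters, toy_profile)
print(profiles, len(calls))
```

The counter shows one call per cluster, whereas the version quoted in the issue would make two calls for every cluster that passes the threshold.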

Deprecated module warnings

Modules related to sklearn / six give deprecation warnings upon importing editsim. Are they necessary for general computation in spykesim? Should the lib be updated?

Error:

>>> from spykesim import editsim
/usr/lib/python3.7/site-packages/sklearn/externals/six.py:31: DeprecationWarning: The module is deprecated in version 0.21 and will be removed in version 0.23 since we've dropped support for Python 2.7. Please rely on the official version of six (https://pypi.org/project/six/).
  "(https://pypi.org/project/six/).", DeprecationWarning)
/usr/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)

System info:

Python 3.7.4 (default, Jul 16 2019, 07:12:58)
[GCC 9.1.0] on linux

Linux 5.2.9-arch1-1-ARCH #1 SMP PREEMPT x86_64 GNU/Linux
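
Until the imports are updated, the warnings can be hidden at import time. The sketch below uses only the standard library; import_quietly is a hypothetical helper name, and it only hides the warnings rather than removing the deprecated imports.

```python
import importlib
import warnings


def import_quietly(module_name):
    """Import a module while ignoring DeprecationWarning raised during the import."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=DeprecationWarning)
        return importlib.import_module(module_name)


# Intended use (requires spykesim to be installed):
#     editsim = import_quietly("spykesim.editsim")
json_module = import_quietly("json")  # harmless demonstration with a stdlib module
```

The filter is restored when the with block exits, so warnings raised later in the session are still shown.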

Clustering on similarity matrix

Hi.

In spykesim/editsim.pyx you perform the clustering directly on the similarity matrix, like this:

def clustering(self, min_cluster_size=5):
    """
    Perform HDBSCAN clustering algorithm on the similarity matrix calculated by gensimmat
    """
    self.clusterer = HDBSCAN(min_cluster_size=min_cluster_size)
    self.cluster_labels = self.clusterer.fit_predict(self.simmat)

Given that self.simmat is a similarity matrix, I think it should first be converted to a distance matrix. Then the clustering should be performed using the metric='precomputed' option of HDBSCAN. Something like this (assuming distmat is the distance matrix obtained from self.simmat):

    self.clusterer = HDBSCAN(min_cluster_size=min_cluster_size, metric='precomputed')
    self.cluster_labels = self.clusterer.fit_predict(self.distmat)

I tried the code in its current state in the tutorial, and I get 4 'valid' clusters (index>=0) and 5 windows classified as noise. If I implement these modifications, I get 3 equal 'valid' clusters and 45 windows classified as noise, which makes more sense to me.

Thanks!
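
The conversion described in this issue could be sketched as follows. The mapping from similarity to distance (largest similarity minus similarity) is an assumption, since the issue does not say how distmat was obtained; similarity_to_distance and cluster_precomputed are hypothetical names. The HDBSCAN call mirrors the one proposed above and needs the hdbscan package.

```python
import numpy as np


def similarity_to_distance(simmat):
    """Convert a similarity matrix into a distance matrix.

    Assumption: distance = (largest similarity) - similarity,
    with the diagonal forced to zero. Other conversions are possible.
    """
    sim = np.asarray(simmat, dtype=np.float64)
    sim = (sim + sim.T) / 2.0      # enforce symmetry
    dist = sim.max() - sim
    np.fill_diagonal(dist, 0.0)    # a window is at distance 0 from itself
    return dist


def cluster_precomputed(distmat, min_cluster_size=5):
    """Run HDBSCAN on a precomputed distance matrix (requires hdbscan)."""
    from hdbscan import HDBSCAN    # imported lazily; optional dependency
    clusterer = HDBSCAN(min_cluster_size=min_cluster_size, metric="precomputed")
    return clusterer.fit_predict(distmat)


simmat = np.array([[5.0, 4.0, 0.5],
                   [4.0, 5.0, 1.0],
                   [0.5, 1.0, 5.0]])
distmat = similarity_to_distance(simmat)
print(distmat)
```

Subtracting from the maximum keeps the ordering of pairs intact, which matters more for density-based clustering than the absolute scale.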

Write test for the similarity calculation.

Create a branch for this issue.
Copy https://github.com/KeitaW/Chaldea/blob/master/chaldea/edit_sim.jl to this repo.
Write tests based on edit_sim.jl.
Write a draft function using NumPy.
Write a faster version using Cython.

Barton Sternberg always returns first alignment for first 2 windows

I noticed that the returned profile always closely resembles one of the first two windows (in temporal order) in a cluster, so I checked the code.

barton_sternberg returns mats[i] instead of the final alignment, where i is 1 and is not changed by the for loop that follows. The returned mats[i] therefore always corresponds to the alignment al1 between windows 1 and 2, so it is NOT representative of the entire cluster. I assume this is a simple coding mistake, but it should be corrected before it affects potential users.

def barton_sternberg(mats_, sim_bp, niter):
    ..........
    i, j = 1, 2 # for test
    dp_max, dp_max_x, dp_max_y, bp, flip = sim_bp(
        mats[i].astype(np.double), mats[j].astype(np.double))
    al1, al2 = clocal_exp_editsim_align_alt(bp, dp_max_x, dp_max_y, mats[i], mats[j], flip)
    mats[i] = al1
    mats[j] = al2
    al = (al1 + al2) / 2
    processed[i] = True
    processed[j] = True
    ..........
    return mats[i]

Question regarding Ternary operator in `clocal_exp_editsim`

I got the following question from https://github.com/rcojocaru.

"""
I have a short question about the code. In editsim.py --> clocal_exp_editsim(_withbp) you have this:

for col1 in range(nrow):
    for col2 in range(ncol):
        match = 0
        for row in range(nneuron):
            match += mat1[row, col1] * mat2[row, col2]
        match = -10 if match == 0 else match
        ...
        dp[col1+1, col2+1] = max4(
            0,
            down_score,
            right_score,
            dp[col1, col2] + match
        )

Do you remember why you introduced this if clause for the case in which match is 0? I think it can have radical effects on the edit similarity score. For example, even if comparing identical sequences, instead of getting the maximum edit similarity score, the result would be alpha dependent because of this if clause. I would really appreciate your input before modifying core things.
"""

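The effect of the if clause can be seen by isolating the scoring step quoted above. This is a minimal sketch of that step only: the dynamic-programming recursion and the gap scores are not reproduced, column_match is a hypothetical name, and plain nested lists stand in for the arrays.

```python
MISMATCH_PENALTY = -10  # value taken from the quoted snippet


def column_match(mat1, mat2, col1, col2):
    """Coincidence count between two time bins; a count of zero is penalized."""
    match = sum(row1[col1] * row2[col2] for row1, row2 in zip(mat1, mat2))
    return MISMATCH_PENALTY if match == 0 else match


# Two neurons, three time bins; the middle bin contains no spikes.
window = [[1, 0, 0],
          [0, 0, 1]]

# Score each bin of the window against the same bin of an identical window.
diagonal = [column_match(window, window, c, c) for c in range(3)]
print(diagonal)  # [1, -10, 1]
```

A bin without spikes is scored as a mismatch even when the two windows are identical, so the diagonal path through the dp table is interrupted at that bin.
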
Dimension mismatch between `mat` in `gen_profile`

I have occasionally run into problems when using certain min_cluster_size values during clustering, which then affected the gen_profile step. Apparently there was a dimension mismatch between the matrices when gen_profile was run at this line:
https://github.com/KeitaW/spykesim/blob/a18ddc4680f893b20d4c2c214228e5912649d646/spykesim/editsim.pyx#L204

I believe it has to do with how mats are produced at the line below, which does not take the sliding width into account, or assumes that slide=window:
https://github.com/KeitaW/spykesim/blob/a18ddc4680f893b20d4c2c214228e5912649d646/spykesim/editsim.pyx#L193

Is that a bug? If so, can it be fixed with something like this:

for idx in indices:
    mat = self.binarray_csc[:, self.times[idx]:(self.times[idx]+self.window)].toarray()
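
The slicing idea above can be tried on a small dense array. This is a sketch under stated assumptions: a NumPy array stands in for the sparse binarray_csc, extract_windows is a hypothetical name, and skipping windows that would run past the end of the recording is one possible policy, not necessarily what the library should do.

```python
import numpy as np


def extract_windows(binarray, start_times, window):
    """Slice fixed-width windows by start time; skip any that would be truncated."""
    n_bins = binarray.shape[1]
    mats = []
    for start in start_times:
        stop = start + window
        if stop > n_bins:
            continue  # a truncated window would have a different shape
        mats.append(binarray[:, start:stop])
    return mats


binarray = np.arange(20).reshape(2, 10)   # 2 neurons, 10 time bins
start_times = range(0, 10, 2)             # slide = 2, while window = 4
mats = extract_windows(binarray, start_times, window=4)
print([m.shape for m in mats])
```

Because each slice starts at its own start time and spans exactly window bins, every matrix has the same shape even when slide and window differ.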