
oist-ncbc / spykesim


Extended edit similarity measurement for high dimensional discrete-time series signal (e.g., multi-unit spike-train).

Home Page: https://pypi.org/project/spykesim

License: MIT License

Makefile 0.10% Python 98.50% C 1.31% Shell 0.10%
neuroscience spike-trains editdistance neuroinformatics theoretical-neuroscience similarity-measures python

spykesim's Introduction


spykesim is a Python module that offers functions for measuring the similarity between two segmented multi-neuronal spiking activities.

Extended edit similarity measurement is implemented. You can find details in the following paper.

https://www.frontiersin.org/articles/10.3389/fninf.2019.00039

This library is a re-implementation of the algorithm. The original implementation can be found in this repo.

Supported Operating Systems

This library has been tested on Ubuntu and macOS.

For Windows users: please consider using Ubuntu via Windows Subsystem for Linux.

Installation

If you do not have Python 3.7 or later in your environment, you may use Anaconda.

Cython and NumPy need to be preinstalled because they are used during the installation process.

If you have not installed these packages, run the following:

pip install numpy cython

You can install this library via pip as well:

pip install spykesim

or you may clone and build by yourself:

git clone https://github.com/KeitaW/spykesim.git
cd spykesim
python setup.py build_ext --inplace install

Dependencies

  • Python (>= 3.7)
  • NumPy (needs to be preinstalled)
  • Cython (needs to be preinstalled)
  • scipy
  • tqdm
  • h5py

Tutorial

You can find a tutorial in doc.

Citation

You can use the following bib entry to cite this work:

@article{Watanabe:2019eq,
author = {Watanabe, Keita and Haga, Tatsuya and Tatsuno, Masami and Euston, David R and Fukai, Tomoki},
title = {{Unsupervised Detection of Cell-Assembly Sequences by Similarity-Based Clustering}},
journal = {Frontiers in Neuroinformatics},
year = {2019},
volume = {13},
month = may
}


spykesim's Issues

Missing dependency? (hdbscan)

I tried to import editsim using the following expression:

from spykesim import editsim

and hdbscan was requested, even though previous dependency checks were successful. Maybe it should be added to the dependency list?

System info:

Python 3.7.4 (default, Jul 16 2019, 07:12:58)
[GCC 9.1.0] on linux
Linux 5.2.9-arch1-1-ARCH #1 SMP PREEMPT x86_64 GNU/Linux
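
One way to spot a gap like the one reported above before importing the package is to ask Python which modules it can locate. The following is a minimal sketch, not part of spykesim: the helper name missing_modules is hypothetical, and the module list simply combines the documented dependencies with hdbscan as reported in this issue.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list: the README dependencies plus hdbscan from this issue.
required = ["numpy", "scipy", "tqdm", "h5py", "hdbscan"]
print("Missing modules:", missing_modules(required))
```

An empty list means every name was found; anything printed can be installed with pip before importing editsim.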

Sequence detection from Profiles.

Hi, this is an interesting method. I am trying it on a set of data and I was wondering if you have plans to include detection/extraction of sequences; or do you have any suggestion of the approach?

I have plotted the windows that belong to the same cluster, along with the spikes of those windows, but they do not seem to match well with each other, nor with the cluster profiles in es.profiles. Do you have any suggestions?

Originally posted by @tuanpham96 in #1 (comment)

Computing the profile twice

In editsim.pyx:

def gen_profile(self, th_=5, sigma=5):
    ...................................................
    for uidx, mats in zip(uidxs, mats_list):
        profile = regularize_profile(barton_sternberg(mats, self._sim_bp, 2*len(mats)))
        if profile.sum() >= th_:
            self.profiles[uidx] = regularize_profile(barton_sternberg(mats, self._sim_bp, 2*len(mats)))
    return self 

I think you could just use:

if profile.sum() >= th_:
    self.profiles[uidx] = profile

because you are computing the same thing twice. This would save some time.
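
The compute-once pattern suggested above can be sketched independently of the library. In this illustration, build_profiles and compute_profile are hypothetical stand-ins (the latter for the regularize_profile(barton_sternberg(...)) call), and profiles are plain lists rather than arrays.

```python
def build_profiles(clusters, compute_profile, threshold=5):
    """Compute each cluster profile once and keep it only if it passes the threshold.

    clusters maps a cluster label to its list of windows; compute_profile is
    a stand-in for the expensive profile computation.
    """
    profiles = {}
    for uidx, mats in clusters.items():
        profile = compute_profile(mats)  # expensive call, done exactly once
        if sum(profile) >= threshold:
            profiles[uidx] = profile  # reuse the stored result
    return profiles
```

With this structure the expensive call runs once per cluster instead of twice for every cluster that passes the threshold.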

Deprecated module warnings

Modules related to sklearn / six give deprecation warnings upon importing editsim. Are they necessary for general computation in spykesim? Should the lib be updated?

Error:

>>> from spykesim import editsim
/usr/lib/python3.7/site-packages/sklearn/externals/six.py:31: DeprecationWarning: The module is deprecated in version 0.21 and will be removed in version 0.23 since we've dropped support for Python 2.7. Please rely on the official version of six (https://pypi.org/project/six/).
  "(https://pypi.org/project/six/).", DeprecationWarning)
/usr/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)

System info:

Python 3.7.4 (default, Jul 16 2019, 07:12:58)
[GCC 9.1.0] on linux
Linux 5.2.9-arch1-1-ARCH #1 SMP PREEMPT x86_64 GNU/Linux

Clustering on similarity matrix

Hi.

In spykesim/editsim.pyx you perform the clustering directly on the similarity matrix, like this:

def clustering(self, min_cluster_size=5):
    """
    Perform HDBSCAN clustering algorithm on the similarity matrix calculated by gensimmat
    """
    self.clusterer = HDBSCAN(min_cluster_size=min_cluster_size)
    self.cluster_labels = self.clusterer.fit_predict(self.simmat)

Given that self.simmat is a similarity matrix, I think it should first be converted to a distance matrix. Then the clustering should be performed using the metric='precomputed' option of HDBSCAN. Something like this (assuming distmat is the distance matrix obtained from self.simmat):
self.clusterer = HDBSCAN(min_cluster_size=min_cluster_size, metric='precomputed')
self.cluster_labels = self.clusterer.fit_predict(self.distmat)

I tried the code in its current state in the tutorial, and I get 4 'valid' clusters (index>=0) and 5 windows classified as noise. If I implement these modifications, I get 3 equal 'valid' clusters and 45 windows classified as noise, which makes more sense to me.
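
The issue does not say how distmat was derived from self.simmat. One common choice, shown here purely as an assumption, is to subtract each similarity from the largest similarity in the matrix and zero the diagonal. The function name similarity_to_distance is hypothetical, and the HDBSCAN call is left as a comment so the sketch runs without hdbscan installed.

```python
import numpy as np


def similarity_to_distance(simmat):
    """Turn a similarity matrix into a non-negative, symmetric distance matrix.

    Assumption: distance = (largest similarity) - similarity. Other
    conversions are possible and may suit the data better.
    """
    sim = np.asarray(simmat, dtype=np.float64)
    sim = (sim + sim.T) / 2.0   # enforce symmetry
    dist = sim.max() - sim      # high similarity becomes small distance
    np.fill_diagonal(dist, 0.0) # each window is at distance 0 from itself
    return dist


# Usage with a precomputed metric, as proposed in this issue:
# from hdbscan import HDBSCAN
# distmat = similarity_to_distance(simmat)
# labels = HDBSCAN(min_cluster_size=5, metric="precomputed").fit_predict(distmat)
```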

Thanks!

Write test for the similarity calculation.

  • Create a branch for this issue.
  • Copy https://github.com/KeitaW/Chaldea/blob/master/chaldea/edit_sim.jl to this repo.
  • Write tests based on edit_sim.jl.
  • Write a draft function using numpy
  • Write a faster-version using Cython

Barton-Sternberg always returns the first alignment of the first 2 windows

So, I had noticed that the returned profile always resembles closely one of the first two windows (in temporal order) in a cluster, so I checked the code.

barton_sternberg returns mat[i] for some reason instead of the final alignment, where i is 1 and is not affected by the for loop that follows. So mat[i] that is returned actually always corresponds to alignment1 between windows 1 and 2, so it is NOT representative for the entire cluster. I assume this is a simple coding mistake, but it should be corrected before it affects potential users.

def barton_sternberg(mats_, sim_bp, niter):
    ..........
    i, j = 1, 2 # for test
    dp_max, dp_max_x, dp_max_y, bp, flip = sim_bp(
        mats[i].astype(np.double), mats[j].astype(np.double))
    al1, al2 = clocal_exp_editsim_align_alt(bp, dp_max_x, dp_max_y, mats[i], mats[j], flip)
    mats[i] = al1
    mats[j] = al2
    al = (al1 + al2) / 2
    processed[i] = True
    processed[j] = True
    ..........
    return mats[i]

Question regarding Ternary operator in `clocal_exp_editsim`

I got the following question from https://github.com/rcojocaru.

"""
I have a short question about the code. In editsim.py --> clocal_exp_editsim(_withbp) you have this:

for col1 in range(nrow):
    for col2 in range(ncol):
        match = 0
        for row in range(nneuron):
            match += mat1[row, col1] * mat2[row, col2]
        match = -10 if match == 0 else match
        ...
        dp[col1+1, col2+1] = max4(
            0,
            down_score,
            right_score,
            dp[col1, col2] + match
        )

Do you remember why you introduced this if clause for the case in which match is 0? I think it can have radical effects on the edit similarity score. For example, even if comparing identical sequences, instead of getting the maximum edit similarity score, the result would be alpha dependent because of this if clause. I would really appreciate your input before modifying core things.
"""
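
To make the concern above concrete, here is a minimal pure-Python sketch of a Smith-Waterman-style local similarity between two binned windows. It is not the library's implementation: the name local_similarity, the constant gap cost (a stand-in for the alpha-dependent gap scoring), and the zero_penalty switch are all assumptions made for illustration.

```python
def local_similarity(mat1, mat2, gap=1.0, zero_penalty=-10.0):
    """Best local alignment score between two windows (neurons x time bins).

    Columns are compared by their dot product. If zero_penalty is not None,
    a dot product of 0 is replaced by that penalty, mirroring the ternary
    expression quoted above. The constant gap cost is a simplification.
    """
    nneuron = len(mat1)
    ncol1, ncol2 = len(mat1[0]), len(mat2[0])
    dp = [[0.0] * (ncol2 + 1) for _ in range(ncol1 + 1)]
    best = 0.0
    for col1 in range(ncol1):
        for col2 in range(ncol2):
            match = sum(mat1[row][col1] * mat2[row][col2] for row in range(nneuron))
            if match == 0 and zero_penalty is not None:
                match = zero_penalty
            dp[col1 + 1][col2 + 1] = max(
                0.0,
                dp[col1][col2 + 1] - gap,  # gap in mat2
                dp[col1 + 1][col2] - gap,  # gap in mat1
                dp[col1][col2] + match,    # align the two columns
            )
            best = max(best, dp[col1 + 1][col2 + 1])
    return best


# Two neurons, three time bins; the middle bin is silent.
window = [[1, 0, 1],
          [0, 0, 1]]
print(local_similarity(window, window, zero_penalty=None))
print(local_similarity(window, window))
```

Comparing the window with itself gives a lower score when the penalty is active, because the silent middle bin breaks the diagonal path even though the two inputs are identical.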

Dimension mismatch between `mat` in `gen_profile`

I have occasionally run into problems when using certain min_cluster_size values during clustering, and that affected the gen_profile step. Apparently there was a dimension mismatch between the matrices when gen_profile is run at this line:
https://github.com/KeitaW/spykesim/blob/a18ddc4680f893b20d4c2c214228e5912649d646/spykesim/editsim.pyx#L204

I believe it has to do with how mats are produced below, which does not take into account the sliding width, or assumes that slide=window
https://github.com/KeitaW/spykesim/blob/a18ddc4680f893b20d4c2c214228e5912649d646/spykesim/editsim.pyx#L193

Is that a bug? If so, can it be fixed with something like this:

for idx in indices:
      mat = self.binarray_csc[:, self.times[idx]:(self.times[idx]+self.window)].toarray()
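
The slicing proposed above can be checked in isolation. This sketch computes the column bounds of each window from its start time and a fixed width, so that every extracted matrix has the same number of columns. The function name window_bounds is hypothetical, and dropping windows that would run past the end of the recording is an assumption, not something the issue specifies.

```python
def window_bounds(times, window, n_bins):
    """Return (start, stop) column bounds for fixed-width windows.

    times holds the start bin of each window and n_bins is the total number
    of time bins. Windows extending past n_bins are skipped (an assumption),
    so every returned slice has exactly `window` columns.
    """
    bounds = []
    for start in times:
        stop = start + window
        if stop <= n_bins:
            bounds.append((start, stop))
    return bounds


# Windows of width 10 that slide by 5 bins over 20 bins in total.
for start, stop in window_bounds(range(0, 20, 5), window=10, n_bins=20):
    print(start, stop, stop - start)
```

Because the stop index is derived from the window width rather than from the next start time, the result does not depend on the slide being equal to the window.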
