btel / spikesort
Spike sorting library implemented in Python/NumPy/PyTables
Home Page: http://spike-sort.readthedocs.org
License: Other
Spike sorting library implemented in Python/NumPy/PyTables
----------------------------------------------------------

Project website: spikesort.org

Requirements:

* Python >= 2.6
* PyTables
* NumPy
* matplotlib

Optional:

* scikits.learn -- clustering algorithms
* neurotools -- spike train analysis

Test dependencies:

* all the above
* hdf5-tools

To see the library in action, see the examples folder.
The names of the features required by the components are currently hard-coded, like so:
class SpikeDetector(base.Component):
    """Detect spikes with alignment"""
    waveform_src = base.RequiredFeature("SignalSource",
                                        base.HasAttributes("signal"))
which imposes some limitations on how those components may be interconnected. For example, using a different filter for spike extraction is not possible without inheriting from the current SpikeExtractor and modifying the code.
One possible way to overcome this limitation without changing the API:
class SpikeDetector(base.Component):
    """Detect spikes with alignment"""
    def __init__(self, ..., waveform_src="SignalSource"):
        self.waveform_src = base.RequiredFeature(waveform_src,
                                                 base.HasAttributes("signal"))
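A self-contained sketch of the idea, using a minimal stand-in for the feature broker (the dictionary, property, and component names below are illustrative, not the actual spike_beans API):

```python
# minimal stand-in for a feature broker; illustrative only
features = {"SignalSource": "raw signal",
            "FilteredSource": "filtered signal"}

class SpikeDetector(object):
    def __init__(self, waveform_src="SignalSource"):
        # the required feature name is now a constructor argument, so an
        # alternative source can be plugged in without subclassing
        self._waveform_src_name = waveform_src

    @property
    def waveform_src(self):
        # resolve the feature by name at access time
        return features[self._waveform_src_name]

print(SpikeDetector().waveform_src)                               # raw signal
print(SpikeDetector(waveform_src="FilteredSource").waveform_src)  # filtered signal
```

With this shape, swapping the filter only requires passing a different feature name at construction time.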
Extracellular recordings are stored in a 2-D array, where the first dimension indexes the samples and the second the channels (contacts). However, from the point of view of optimisation it is better to store the fastest-changing index in the second dimension (C ordering). Rewrite IO, tests and extract to use this ordering.
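A quick numpy illustration of why this helps (shapes below are arbitrary): with the default C ordering, a single channel's trace is only one contiguous block of memory when samples are the last axis:

```python
import numpy as np

cur = np.zeros((10000, 4))   # current layout: (samples, channels)
new = np.zeros((4, 10000))   # proposed layout: (channels, samples)

# per-channel operations (filtering, extraction) walk along one channel;
# only the proposed layout makes that walk contiguous in memory
print(cur[:, 0].flags['C_CONTIGUOUS'])  # False
print(new[0, :].flags['C_CONTIGUOUS'])  # True
```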
The API changes we agreed on are:

* the f_filter argument in GenericSource is not used anymore
* filtering moved from core.extract.py to core.filters.py
Just like "+" and "-" allow scaling of the Y-axis, it would be great if e.g., "Shift-+" and "Shift--" scaled the X-axis. That way one wouldn't have to restart the GUI every time one wanted to see a different time scale.
highlighted spikes should come up in the foreground
I often find myself spending quite some time scrolling around in the SpikeBrowser to locate a cell which has few spikes. It would be nice to have a kind of selective browsing. One possible way is to have an additional attribute, something like:

>>> browser.browse_label
'all'
>>> browser.browse_label = 1

Pressing the "next" button would then scroll only through the cells with label=1.
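A sketch of how the selection could work internally (the helper below is hypothetical, not existing code): the "next"/"prev" buttons would step only through spike indices whose label matches browse_label:

```python
import numpy as np

def browsable_indices(labels, browse_label='all'):
    """Indices the "next"/"prev" buttons should step through."""
    labels = np.asarray(labels)
    if browse_label == 'all':
        return np.arange(len(labels))
    # only spikes carrying the selected cluster label
    return np.nonzero(labels == browse_label)[0]

print(browsable_indices([1, 2, 1, 3, 1], browse_label=1))  # [0 2 4]
```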
On a clean Debian system the unit tests complain about a missing module: patterns. Remove the import.
Fix spike_sort.core.extract.remove_doubles().
The Dashboard can sometimes throw an exception in components.py, line 665:

UnboundLocalError: local variable 'spt' referenced before assignment
mask = features.get('is_valid')
if mask is not None:
    valid_data = data[mask, :]  # <-- here it fails
    cl = cluster_func(valid_data, *args, **kwargs)
labels = np.zeros(data.shape[0], dtype='int') - 1
The 'mask' array is full-length, while 'data' has been truncated by ClusterAnalyzer to hold features only for the selected cells.
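The mismatch can be reproduced with a few lines of numpy (shapes are arbitrary, chosen only to show the failure mode):

```python
import numpy as np

data = np.random.randn(5, 2)    # features truncated to the selected cells
mask = np.ones(8, dtype=bool)   # validity mask kept at full length

try:
    valid_data = data[mask, :]  # boolean mask longer than the indexed axis
except IndexError as exc:
    print('fails as in the Dashboard:', exc)
```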
add LICENSE file to the root directory
When running the tutorial I get the following exception:
Traceback (most recent call last):
  File "examples/sorting/cluster_beans.py", line 58, in <module>
    browser.show()
  File "/usr/local/lib/python2.7/dist-packages/SpikeSort-0.1-py2.7.egg/spike_beans/components.py", line 475, in show
    self._draw()
  File "/usr/local/lib/python2.7/dist-packages/SpikeSort-0.1-py2.7.egg/spike_beans/components.py", line 466, in _draw
    self._set_data()
  File "/usr/local/lib/python2.7/dist-packages/SpikeSort-0.1-py2.7.egg/spike_beans/components.py", line 488, in _set_data
    labels = self.label_src.labels
  File "/usr/local/lib/python2.7/dist-packages/SpikeSort-0.1-py2.7.egg/spike_beans/base.py", line 111, in __get__
    self.result = self.Request(obj)
  File "/usr/local/lib/python2.7/dist-packages/SpikeSort-0.1-py2.7.egg/spike_beans/base.py", line 130, in Request
    % (obj, self.feature)
AssertionError: The value <spike_beans.components.ClusterAnalyzer object at 0x39f2550> of 'LabelSource' does not match the specified criteria
Using SpikeBrowser() works fine. Not sure exactly what's going on here. It looks like the labels attribute of SpikeBrowser instances doesn't exist, but the RequiredFeature class makes it a necessity to pass the (currently failing) assertion.
It would be nice to have electrode labels on the spike browsing plot, since it is fairly common for electrodes to not be linearly ordered on their respective shanks, e.g., the first shank (going from left to right) from top to bottom is numbered 1, 3, 2, 6. It would be nice to see these numbers in the spike browsing plot.
is_masked is currently counterintuitive: the name suggests that it is True for masked (invalid) spikes, but in fact it is False for invalid and True for valid spikes. The documentation describes the intuitive behaviour, which does not match the implementation!
This method is currently in the SpikeDetector component. Applying it updates all the observers, including ClusterAnalyzer, which wipes out all previous clustering results in the current session.
Support for new data formats can be added via the neo library.
It's getting not so easy to get matplotlib 1.0.1 running with new libraries. In particular, it is already impossible to compile it against libpng >= 1.1 and the new tk module in Python 2.7.2.
I've generated a couple of patches based on the new mpl releases, so we can probably mention them in the documentation. I faced this problem in a rolling-release distro (Arch).
Patches are here: https://gist.github.com/2294220
It would be extremely useful to be able to see the detected spikes highlighted on the "features" plot simultaneously as one is scrolling through them in the spike browser.
if type(thresh) is str or type(thresh) is unicode:
    # ...

should be changed to

if isinstance(thresh, basestring):
    # ...

Also, type(someInstance) is TypeObject should be avoided.
Running the browse_data.py example brings up the spike browser window, but with no buttons. Clicking in the area where the buttons should appear works, though!
Some of the code in, for example, src/spike_sort/core/evaluate.py uses peak_to_peak = avg_spike.max() - avg_spike.min(). NumPy has this functionality built in as the ptp function.
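For reference, the two forms are equivalent:

```python
import numpy as np

avg_spike = np.array([-1.5, 0.2, 3.0, -0.4])
manual = avg_spike.max() - avg_spike.min()   # peak-to-peak, spelled out
print(manual, np.ptp(avg_spike))             # 4.5 4.5
```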
I personally find it easier to grok code quickly if the classes are separated into their own individual files (together with related support functions if necessary). When they aren't in their own files, what I do is run the code and grep until I find what I'm looking for. So really, this issue could be called "Reduce the baseline amount of grep necessary to understand the code".
A little background, just in case. With the advent of OOP came one of its most useful behaviours: polymorphism. It allows repetitive code to be abstracted into a base class from which related classes inherit and can (or must) override one or more methods. In this case, many of the filtering classes perform the same action of checking the coefficient cache and then updating it if necessary. This allows one to create a method, call it m, that calls the method that must be overridden, call it om. Since m is inherited by all subclasses, a given subclass, call it Sub, can call m, and m will know to call the om method of Sub. I've written this code, but I need to write a test (and pass it!) before pushing it.
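This is the classic template-method pattern; a minimal sketch of the shape described above (BaseFilter, _design, and the coefficient cache are illustrative names, not the actual spikesort classes):

```python
class BaseFilter(object):
    def __init__(self):
        self._coefs = None

    def get_coefs(self):
        """The inherited method "m": check the cache, rebuild if empty."""
        if self._coefs is None:
            self._coefs = self._design()  # dispatches to the subclass "om"
        return self._coefs

    def _design(self):
        """The method "om" each subclass must override."""
        raise NotImplementedError

class MovingAverage(BaseFilter):
    def __init__(self, n):
        super(MovingAverage, self).__init__()
        self.n = n

    def _design(self):
        return [1.0 / self.n] * self.n

f = MovingAverage(4)
print(f.get_coefs())  # [0.25, 0.25, 0.25, 0.25]
```

The cache check lives once in the base class, while each concrete filter only supplies its own coefficient design.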
Instead of reading whole spike waveforms into memory, read them in chunks and copy them to a pytables/memmap array.
User-defined classes should (ultimately) inherit from Python's root object, object.
Some weird plotting artefacts can be observed when using spikesort with matplotlib 1.1.0. Among them: absence of text in the legend plot, and dark (almost black) coloured feature plots.
Sphinx 1.1 complains about not being able to import the *_src attributes of SpikeBeans components. These attributes are instantiated at runtime and raise an exception when the source is not defined.
Solution:
make the *_src attributes private by prefixing them with an underscore, that is: _*_src
This is a VERY rough idea, but I've run into an issue where I need to threshold each channel separately (we have many recordings that are not performed with tetrodes). It might be nice to have a parameter that allows one to run an arbitrary Python callable that conforms to the interface of detect_spikes.
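A sketch of what that could look like, assuming a detector callable applied channel by channel (detect_crossings, detect_per_channel, and their signatures are hypothetical, not the real detect_spikes API):

```python
import numpy as np

def detect_crossings(trace, thresh):
    """Sample indices where the trace crosses thresh upwards."""
    above = trace > thresh
    # upward crossing: below at sample i-1, above at sample i
    return np.nonzero(above[1:] & ~above[:-1])[0] + 1

def detect_per_channel(data, thresholds, detector=detect_crossings):
    # data: (n_channels, n_samples); one threshold per channel;
    # any callable with the detector interface can be plugged in
    return [detector(ch, th) for ch, th in zip(data, thresholds)]

data = np.array([[0.0, 0.0, 5.0, 0.0],
                 [0.0, 3.0, 0.0, 0.0]])
print(detect_per_channel(data, [1.0, 1.0]))  # [array([2]), array([1])]
```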
cluster_manual takes forever to run and does not show any windows.
add functions for basic analysis of spike trains:
Use the numpy docs convention for docstrings: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt#docstring-standard
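For example, remove_doubles documented in the numpy convention could look like this (the body is only an illustrative implementation, not the library's actual code):

```python
import numpy as np

def remove_doubles(spt, tolerance):
    """Remove spike times that follow the previous spike too closely.

    Parameters
    ----------
    spt : array_like
        Sorted spike times (ms).
    tolerance : float
        Minimum allowed inter-spike interval (ms).

    Returns
    -------
    ndarray
        Spike times with doubles removed.
    """
    spt = np.asarray(spt)
    # always keep the first spike; drop spikes closer than tolerance
    keep = np.concatenate(([True], np.diff(spt) > tolerance))
    return spt[keep]

print(remove_doubles([0.0, 0.1, 5.0], 0.5))  # [ 0.  5.]
```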
class dictproperty(object):
    """Implements collection properties with dictionary-like access.

    Adapted from `Active State Recipe 440514
    <http://code.activestate.com/recipes/440514-dictproperty-properties-for-dictionary-attributes/>`_
    """
is, imho, rather a copy of the code from there, so please be nice and at least list the original author/license (not sure yet how PSF would play with your BSD-2; it would probably put the whole project under PSF...).
Make code conform to PEP8 style for easier reading.
PyTables is a very specialised dependency and may be hard to install on some systems, which raises the "entry level" for a prospective user. Replacing PyTables with memmapped numpy arrays should not affect performance too much, so pytables could be dropped in future versions.
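A sketch of what the memmap-based storage could look like (file name and array shape are arbitrary here):

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'waveforms.dat')
shape = (1000, 32)  # e.g. n_spikes x n_samples

# create the file-backed array and use it like a normal ndarray
mm = np.memmap(path, dtype='float32', mode='w+', shape=shape)
mm[:, :] = 1.0
mm.flush()          # data lives on disk, not in RAM
del mm

# reopen read-only later, as one would reopen an HDF5 file
ro = np.memmap(path, dtype='float32', mode='r', shape=shape)
print(ro.shape, float(ro[0, 0]))  # (1000, 32) 1.0
```

Unlike PyTables this needs no extra C dependency, at the cost of HDF5 features such as compression and self-describing metadata.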
cdist does the same thing as _metric_euclidean in cluster.py and has a more efficient C implementation. It also offers different distance metrics, should the need for that ever arise in the future. A quick benchmark using IPython's timeit functionality reveals the following:
import numpy as np
from numpy.random import randn
from scipy.spatial.distance import cdist

def euc(a, b):
    m, n = a.shape
    p, k = b.shape
    if n != k:
        raise TypeError('a and b must have the same number of columns')
    delta = np.zeros((m, p))
    for d in xrange(n):
        delta += np.subtract.outer(a[:, d], b[:, d]) ** 2
    return np.sqrt(delta)

a = randn(100, 100)
b = randn(100, 100)

# IPython magic:
timeit euc(a, b)    # 100 loops, best of 3: 7.72 ms per loop
timeit cdist(a, b)  # 1000 loops, best of 3: 1.45 ms per loop
Around line 305 in src/spike_sort/core/extract.py the following code spits out an exception:

spWave[:, i, :] = sp_data[contacts, sp + win[0]:sp + win[1]].T

The reason is that spWave[:, i, :] has shape (n, 1) while sp_data[contacts, sp + win[0]:sp + win[1]].T has shape (n,). My version of NumPy doesn't allow such arrays to be assigned to one another. I'll submit a fix for this: change sp_data[...].T to np.atleast_2d(sp_data[...]).T. This adds a dimension to any 0- or 1-D array to make it 2-D, and does nothing for arrays with ndim >= 2.
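The behaviour of the proposed fix, in isolation:

```python
import numpy as np

a = np.zeros(5)                  # 1-D, shape (5,)
print(np.atleast_2d(a).shape)    # (1, 5)
print(np.atleast_2d(a).T.shape)  # (5, 1) -- now assignable to an (n, 1) slice

b = np.zeros((3, 4))
print(np.atleast_2d(b).shape)    # (3, 4) -- ndim >= 2 input is untouched
```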
Personally, I find Tk to be very ugly (no judgment on the choice to use it though, it can be MUCH easier to work with Tk than other GUI toolkits in Python), and it doesn't jive with Gnome or OS X very well.
Plotting a large number of spikes (>10000) with the PlotSpikes component takes a couple of seconds and significantly affects the spike sorting experience.
extract_spikes and align_spikes are brutally slow right now. I think it would be useful to attempt to speed up the loops in these functions using Cython. Of course, if there's something I'm missing about how to make this faster using, e.g., PyTables' (somewhat) fast I/O, then please let me know.
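Before reaching for Cython, part of the per-spike loop can often be vectorized with fancy indexing; a sketch for fixed-window extraction around spike samples (extract_windows is hypothetical, not the real extract_spikes):

```python
import numpy as np

def extract_windows(signal, spike_idx, win=(-2, 3)):
    """Extract a fixed window around each spike sample, no Python loop.

    signal : 1-D trace; spike_idx : sample indices of detected spikes.
    Assumes every window lies fully inside the signal.
    """
    offsets = np.arange(win[0], win[1])
    # build an (n_spikes, win_length) index matrix by broadcasting
    idx = np.asarray(spike_idx)[:, None] + offsets[None, :]
    return signal[idx]

sig = np.arange(20.0)
print(extract_windows(sig, [5, 10]))
# [[  3.   4.   5.   6.   7.]
#  [  8.   9.  10.  11.  12.]]
```

Cython would still help for the alignment step, but the extraction itself is largely an indexing problem.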