State space models for decoding hippocampal trajectories and determining their type using sorted or clusterless data

License: MIT License

Shell 0.01% Python 0.89% Jupyter Notebook 99.11%

python hippocampus replay state-space-model clusterless sorted-spikes

replay_trajectory_classification's Introduction

Eden-Kramer-Lab

Website

replay_trajectory_classification's People

Contributors

Stargazers

Watchers

Forkers

christophkirst ajay-zou ryanharvey1 jihyunbak synapticsage 410pfeliciano ahmeddeladly commanderpho benjamin-wkd jennymsmith94 tomgeorge1234 chenyanliu7

replay_trajectory_classification's Issues

Account for arbitrary spatial topologies in the likelihood

Right now, any Gaussians or splines used don't respect the connectivity of the spatial topologies. It would be ideal to account for this so that we are absolutely sure the right things are smoothed.

Additionally, if we used some sort of diffusion distance this would possibly help with any edge effects due to no data outside of boundaries.

Write usage notes on the relationship between `place_bin_size`, `movement_std`, and `replay_speed`

In order to represent a Gaussian random walk, the bin size must be small enough to represent at least the adjacent bins.

Handle missing data

When there is missing data, the likelihood should be 1 for all states. Add an argument to predict to specify missing data timepoints.

Fix diffusion boundaries

Unsure if diffusion implementation is correctly handling the boundaries. Need to double check.

Fix tutorial notebook to show clusterless features

Per issue #14

If I want to use speed instead of spike to do decoding / classification, which model should I use? I saw that there is a calcium likelihood with gamma distribution, but I don't know which decoder/classifier class to use, since they accept either SortedSpikes or Clusterless, but not calcium?

I'm using 1.3.15

Thanks!

Add in 'local' state

Want the flexibility to add in a local state which directly encodes the animal's position. This requires evaluation of the likelihood at the predict state and passing in position. See https://github.com/Eden-Kramer-Lab/replay_identification .

Update tutorials to have more realistic parameters

Tutorial covering track_graph, 1D, 2D

Please specify numpy version <1.24

After mamba install, my numpy version went up to 1.26, which does not work with the track linearization package. This is because the np.warning is used somewhere in the code in the track linearization package, whereas it is removed for numpy >= 1.25

reshape seems incorrect for the posterior xarray for 2D decoding?

In sorted spike decoder:
https://replay-trajectory-classification.readthedocs.io/en/latest/_modules/replay_trajectory_classification/decoder.html#SortedSpikesDecoder.fit

I find the convert_results_to_xarray behaves a bit strangely for 2D.

since new_shape = (n_time, n_x, n_y), after masking the value and then reshape and then swapaxes, the output will be (n_time, n_y, n_x), which is inconsistent with the coords for XArray. This would give an error during the "Try" and would bump things to “except”. However, because the reshape by default uses Python order and not Fortran, the resulting array will be incorrect. (This problem probably won't show up in the example in the tutorial, since the circular track is isotropic. My maze has unequal x and y dimensions and the result does not make much sense)

For my situation, the easiest fix seems to be doing
(dims, mask(value,is_track_interior).reshape(X,order='F'))
instead of
(dims, mask(value,is_track_interior).reshape(new_shape).swapaxes(-1,-2))
in the "try".

Sorry if my description is a bit confusing. Feel free to ask for clarifications.

(BTW thank you so much for your amazing work! This package is so pleasant to use and powerful. )

Add model checking methods

Compatability with open field 2D data

Hi,

I'm having some issues getting code to work, and wanted to check which aspects of the code have been verified to work with open-field 2D data. The examples of 2D data here all seem to assume an underlying 1D track geometry, and I wanted to check where this was/wasn't built into the code?

For example, I am having issues fitting place fields, following the code example in notebook 2, 'Decoding in 2D' (I changed only the movement variance and place bin size to be reasonable for my data). However place field fits are all nan for all neurons, despite no nan values being present in either position or spiking inputs. If I instead provide my own 2D place field estimates, is the decoder/classifier designed to manage 2D data without an underlying 1d geometry?

Warn user if `place_bin_size` is too small relative to `movement_std`

Add occupancy threshold for extracting track interior

If the animal only visits a bin once, then we may not want to include it in the decode.

Translating API from 0.x to 1.x

Hello,
I am somewhat lost trying to translate between the two interfaces. My data is from a linear track and I want to have the classifier represent the two directions separately to account for remapping between directions.
If I understand correctly, in 0.x I would do something like this:

clusterless_algorithm = 'multiunit_likelihood'
clusterless_algorithm_params = {
    model_kwargs': {
        bandwidth': np.array([mark_std, mark_std, mark_std, mark_std, pos_std]) 
    }
}

continuous_transition_types = [
    ['random_walk', 'uniform',  'uniform',   'uniform'],
    ['uniform',            'uniform',  'uniform',   'uniform''],
    ['uniform',            'uniform',  'random_walk', 'uniform'],
    ['uniform',            'uniform',  'uniform',   'uniform'],
    ]

encoding_group_to_state = [-1, -1, -1, 1, 1, 1] # corresponds to direction

classifier = ClusterlessClassifier(movement_var=movement_var,
                                   place_bin_size=pos_bin_size,
                                   clusterless_algorithm=clusterless_algorithm,
                                   clusterless_algorithm_params=clusterless_algorithm_params,
                                   continuous_transition_types=continuous_transition_types,
                                   discrete_transition_diag=state_diag_val)

classifier.fit(position,
               multiunits,
               encoding_group_labels=direction,
               encoding_group_to_state=encoding_group_to_state,
               is_training=is_training)

My translation attempt to 1.x is:

clusterless_algorithm = 'multiunit_likelihood'
clusterless_algorithm_params = {
          'mark_std' : mark_std, 
          'position_std' : pos_std
}

continuous_transition_types = [
    [RandomWalk(movement_var=movement_var), Uniform(),   Uniform(),                             Uniform()],
    [Uniform(),                             Uniform(),   Uniform(),                             Uniform()],
    [Uniform(),                             Uniform(),   RandomWalk(movement_var=movement_var), Uniform()],
    [Uniform(),                             Uniform(),   Uniform(),                             Uniform()],
    ]

encoding_group_to_state = [-1, -1, 1, 1] # corresponds to direction

classifier = ClusterlessClassifier(environment=Environment(place_bin_size=pos_bin_size),
                                   continuous_transition_types=continuous_transition_types,
                                   clusterless_algorithm=clusterless_algorithm,
                                   clusterless_algorithm_params=clusterless_algorithm_params,
                                   discrete_transition_diag=DiagonalDiscrete(state_diag_val))

classifier.fit(position,
               multiunits,
               encoding_group_labels=direction,
               encoding_group_to_state=encoding_group_to_state,
               is_training=is_training)

Problems:

encoding_group_to_state is no longer a keyword of ClusterlessClassifier.fit and I don't know what do to with it.
I don't understand from the docs the distinction between different encoding groups and different environments, which of them I should use, and how.

Thank you

Visual repr for Environment class

Would be nice as a visualization

Add ability to represent a previously experienced environment.

Provide convenience function to make spike indicator

Implement variational or approximate algorithm for estimating discrete state transition

spiking likelihood

Is it true that the spikes are expected to be 0/1, instead of any integer, when using the spiking_likelihood_kde? Does that mean if my bin is large and there are more than one spikes per bin, the result is not accurate?

Likelihood on first time step should be multiplied by initial conditions?

Tutorial for using clusterless spikes, and applying this library to neuropixels

The tutorial on data formatting for the clusterless spikes data does not make it clear what is in the data aside from nans. So when a spike is detected should there be potentials there?

Also this may be a special use case but I am trying to use neuropixels data with this library. How should I pick which channels on the probes to include? For instance I have 34 channels in CA3 region on one recoriding. Could I use all 34 or is your code built assuming you have a tetrode?

Add bootstrap option to account of uncertainty in estimation of place fields

Implement code so it works in 1D

Numpy 1.24 incompatability

Thank you for this awesome code package! After installation, I was initially getting errors when attempting to import the package

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Selmaan\Anaconda3\envs\swr\lib\site-packages\replay_trajectory_classification\__init__.py", line 2, in <module>
    from replay_trajectory_classification.classifier import (
  File "C:\Users\Selmaan\Anaconda3\envs\swr\lib\site-packages\replay_trajectory_classification\classifier.py", line 18, in <module>
    from replay_trajectory_classification.continuous_state_transitions import (
  File "C:\Users\Selmaan\Anaconda3\envs\swr\lib\site-packages\replay_trajectory_classification\continuous_state_transitions.py", line 9, in <module>
    from replay_trajectory_classification.environments import Environment, diffuse_each_bin
  File "C:\Users\Selmaan\Anaconda3\envs\swr\lib\site-packages\replay_trajectory_classification\environments.py", line 16, in <module>
    from track_linearization import plot_graph_as_1D
  File "C:\Users\Selmaan\Anaconda3\envs\swr\lib\site-packages\track_linearization\__init__.py", line 2, in <module>
    from track_linearization.core import get_linearized_position
  File "C:\Users\Selmaan\Anaconda3\envs\swr\lib\site-packages\track_linearization\core.py", line 10, in <module>
    np.warnings.filterwarnings('ignore')
  File "C:\Users\Selmaan\Anaconda3\envs\swr\lib\site-packages\numpy\__init__.py", line 320, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'warnings'

It looks like track_linearization\core.py is using a deprecated numpy method to access warnings? This seems to be a known issue w/ other packages, and was fixed by downgrading my numpy version from 1.24.4 to 1.23.5, as suggested here:
https://stackoverflow.com/questions/74893742/how-to-solve-attributeerror-module-numpy-has-no-attribute-bool

https://stackoverflow.com/questions/74893742/how-to-solve-attributeerror-module-numpy-has-no-attribute-bool

unit in movement_var

I'm a bit confused about the unit in movement_var from estimate_movement_var, as well as the resulting transition matrix using the RandomWalk transition type.

From the code it seems the unit of the returned movement_var is length / second. In make_state_transition, however, the transition probability is computed using the variance given by estinate_movement_var directly, without dividing it by the sampling frequency in the decoding problem. This means if my time bin for the decoded data is 0.1s, I'm still using a degree of smoothing as if my time bin were 1s.

Is this a correct interpretation of what the code is doing? Thanks!

Add discrete state transition initial conditions

Right now this is implicit in the code, but it would be better if it was explicit.

`model.predict` fails

Not sure what's going on...

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[21], line 5
      1 # predict from model
      2 # state_names = ['Inbound-Continuous', 'Inbound-Fragmented', 'Outbound-Continuous', 'Outbound-Fragmented']
      3 state_names= ['continuous', 'fragmented'],
----> 5 results = classifier.predict(spike_indicator,
      6                              state_names=state_names,
      7                              use_gpu=True)

File ~/repos/replay_trajectory_classification/replay_trajectory_classification/classifier.py:1103, in SortedSpikesClassifier.predict(self, spikes, time, is_compute_acausal, use_gpu, state_names, store_likelihood)
   1100 if store_likelihood:
   1101     self.likelihood_ = likelihood
-> 1103 return self._get_results(
   1104     likelihood, n_time, time, is_compute_acausal, use_gpu, state_names
   1105 )

File ~/repos/replay_trajectory_classification/replay_trajectory_classification/classifier.py:599, in _ClassifierBase._get_results(self, likelihood, n_time, time, is_compute_acausal, use_gpu, state_names)
    590         results["acausal_posterior"] = np.full(
    591             (n_time, n_states, n_position_bins, 1), np.nan, dtype=dtype
    592         )
    593         results["acausal_posterior"][:, :, is_track_interior] = compute_acausal(
    594             results["causal_posterior"][:, :, is_track_interior].astype(dtype),
    595             self.continuous_state_transition_[st_interior_ind].astype(dtype),
    596             self.discrete_state_transition_.astype(dtype),
    597         )
--> 599     return self._convert_results_to_xarray(
    600         results, time, state_names, data_log_likelihood
    601     )
    603 else:
    604     logger.info("Estimating causal posterior...")

File ~/repos/replay_trajectory_classification/replay_trajectory_classification/classifier.py:695, in _ClassifierBase._convert_results_to_xarray(self, results, time, state_names, data_log_likelihood)
    689     dims = ["time", "state", "position"]
    690     coords = dict(
    691         time=time,
    692         position=get_centers(edges[0]),
    693         state=state_names,
    694     )
--> 695     results = xr.Dataset(
    696         {
    697             key: (dims, (mask(value, is_track_interior).squeeze(axis=-1)))
    698             for key, value in results.items()
    699         },
    700         coords=coords,
    701         attrs=attrs,
    702     )
    704 return results

File ~/miniconda3/envs/spyglass/lib/python3.9/site-packages/xarray/core/dataset.py:750, in Dataset.__init__(self, data_vars, coords, attrs)
    747 if isinstance(coords, Dataset):
    748     coords = coords.variables
--> 750 variables, coord_names, dims, indexes, _ = merge_data_and_coords(
    751     data_vars, coords, compat="broadcast_equals"
    752 )
    754 self._attrs = dict(attrs) if attrs is not None else None
    755 self._close = None

File ~/miniconda3/envs/spyglass/lib/python3.9/site-packages/xarray/core/merge.py:482, in merge_data_and_coords(data, coords, compat, join)
    480 objects = [data, coords]
    481 explicit_coords = coords.keys()
--> 482 indexes = dict(_extract_indexes_from_coords(coords))
    483 return merge_core(
    484     objects, compat, join, explicit_coords=explicit_coords, indexes=indexes
    485 )

File ~/miniconda3/envs/spyglass/lib/python3.9/site-packages/xarray/core/merge.py:491, in _extract_indexes_from_coords(coords)
    489 """Yields the name & index of valid indexes from a mapping of coords"""
    490 for name, variable in coords.items():
--> 491     variable = as_variable(variable, name=name)
    492     if variable.dims == (name,):
    493         yield name, variable._to_xindex()

File ~/miniconda3/envs/spyglass/lib/python3.9/site-packages/xarray/core/variable.py:111, in as_variable(obj, name)
    109     obj = obj.copy(deep=False)
    110 elif isinstance(obj, tuple):
--> 111     if isinstance(obj[1], DataArray):
    112         raise TypeError(
    113             "Using a DataArray object to construct a variable is"
    114             " ambiguous, please extract the data using the .data property."
    115         )
    116     try:

IndexError: tuple index out of range

Tutorial for using GPU functions

Add ability to select electrodes or cells by state

Missing argument in estimate_place_fields()

Dear Eric,
I was reading the preprint you and your colleagues published on BioRxiv about Hippocampal replay of experience at real-world speeds. I found the model you are using to caracterize replay dynamics very interesting and wanted to test it on my own data. Working through the second tutorial via Jupyter Notebook, I encountered an issue while trying to fit the model with the simulated data. It looks like the estimate_place_fields() function is missing a parameter:

TypeError Traceback (most recent call last)
in
10 place_bin_size=np.sqrt(movement_var),
11 knot_spacing=10)
---> 12 decoder.fit(position, spikes)

~\anaconda3\lib\site-packages\replay_trajectory_classification\decoder.py in fit(self, position, spikes, is_training, is_track_interior, track_graph, edge_order, edge_spacing)
265 self.fit_state_transition(
266 position, is_training, transition_type=self.transition_type)
--> 267 self.fit_place_fields(position, spikes, is_training)
268
269 return self

~\anaconda3\lib\site-packages\replay_trajectory_classification\decoder.py in fit_place_fields(self, position, spikes, is_training)
210 position[is_training], spikes[is_training],
211 self.place_bin_centers_, penalty=self.spike_model_penalty,
--> 212 knot_spacing=self.knot_spacing)
213
214 def plot_place_fields(self, sampling_frequency=1, col_wrap=5):

TypeError: estimate_place_fields() missing 1 required positional argument: 'place_bin_edges'

Best wishes,
Tom

Handle multiple spikes per bin for clusterless

Currently, we ignore multiple spikes per bin. This seems to work okay, but we are losing data.

eden-kramer-lab / replay_trajectory_classification Goto Github PK