contextlab / supereeg Goto Github PK

Infer activity throughout the brain from a small(ish) number of electrodes using Gaussian process regression

License: Other

Makefile 0.03% Python 99.57% Dockerfile 0.40%

supereeg's Introduction

Overview

supereeg (name inspired by Robert Sawyer's The Terminal Experiment is a (fictional) tool for recording the electrical activities of every neuron in the living human brain. Our approach is somewhat less ambitious, but (we think) still "super" cool: obtain high spatiotemporal estimates of activity patterns throughout the brain using data from a small(ish) number of implanted electrodes. The toolbox is designed to analyze ECoG (electrocorticographic) data, e.g. from epilepsy patients undergoing pre-surgical evaluation.

The way the technique works is to leverage data from different patients' brains (who had electrodes implanted in different locations) to learn a "correlation model" that describes how activity patterns at different locations throughout the brain relate. Given this model, along with data from a sparse set of locations, we use Gaussian process regression to "fill in" what the patients' brains were "most probably" doing when those recordings were taken. Details on our approach may be found in this preprint. You may also be interested in watching this talk or reading this blog post from a recent conference.

Although our toolbox is designed with ECoG data in mind, in theory this tool could be applied to a very general set of applications. The general problem we solve is: given known (correlational) structure of a large number of "features," and given that (at any one time) you only observe some of those features, how much can you infer about what the remaining features are doing?

Toolbox documentation, including a full API specification, tutorials, and gallery of examples may be found here on our readthedocs page.

Installation

Recommended way of installing the toolbox

You may install the latest stable version of our toolbox using pip:

pip install supereeg

or if you have a previous version already installed:

pip install --upgrade supereeg

Dangerous/hacker/developer way of installing the toolbox (use caution!)

To install the latest (bleeding edge) version directly from this repository use:

pip install --upgrade git+https://github.com/ContextLab/supereeg.git

GPU acceleration

This is highly recommended if you are building your own models. Only NVIDIA GPUs are supported. To enable GPU acceleration for building models, pass gpu=True to se.Model, and install CuPy.

One time setup

Install Docker on your computer using the appropriate guide below:
- OSX
- Windows
- Ubuntu
- Debian
Launch Docker and adjust the preferences to allocate sufficient resources (e.g. > 4GB RAM)
Build the docker image by opening a terminal in the desired folder and enter docker pull contextualdynamicslab/supereeg
Use the image to create a new container for the workshop
- The command below will create a new container that will map your computer's Desktop to /mnt within the container, so that location is shared between your host OS and the container. Feel free to change Desktop to whatever folder you prefer to share instead, but make sure to provide the full path. The command will also share port 8888 with your host computer so any jupyter notebooks launched from within the container will be accessible at localhost:8888 in your web browser (or 192.168.99.100:8888 if using Docker Toolbox)
- docker run -it -p 8888:8888 --name supereeg -v ~/Desktop:/mnt contextualdynamicslab/supereeg
- You should now see the root@ prefix in your terminal, if so you've successfully created a container and are running a shell from inside!
To launch Jupyter: jupyter notebook --no-browser --ip=0.0.0.0 --allow-root
(Optional) Connect Docker to PyCharm or another IDE

Using the container after setup

You can always fire up the container by typing the following into a terminal
- docker start supereeg && docker attach supereeg
- When you see the root@ prefix, letting you know you're inside the container
Close a running container with ctrl + d from the same terminal you used to launch the container, or docker stop supereeg from any other terminal

Requirements

The toolbox is currently supported on Mac and Linux. It has not been tested on Windows (and we expect key functionality not to work properly on Windows systems). If using Windows, consider using Windows Subsystem for Linux or a Docker container.

Dependencies:

python 3.5+
pandas>=0.21.1
seaborn>=0.7.1
matplotlib==2.1.0
scipy>=0.17.1
numpy>=1.10.4
scikit-learn>=0.18.1
nilearn
nibabel
joblib
multiprocessing
deepdish
future
imageio
hypertools
scikit-image
cupy (HIGHLY RECOMMENDED for GPU acceleration)
pytest (for development)

Citing

We wrote a paper about supereeg, which you can read here. The paper provides full details about the approach along with some performance tests an a large ECoG dataset. If you use this toolbox or wish to cite us, please use the following citation:

Lucy L W Owen, Tudor A Muntianu, Andrew C Heusser, Patrick M Daly, Katherine W Scangos, Jeremy R Manning, A Gaussian Process Model of Human Electrocorticographic Data, Cerebral Cortex, , bhaa115, https://doi.org/10.1093/cercor/bhaa115

Here is a bibtex formatted reference:

@article{10.1093/cercor/bhaa115,
    author = {Owen, Lucy L W and Muntianu, Tudor A and Heusser, Andrew C and Daly, Patrick M and Scangos, Katherine W and Manning, Jeremy R},
    title = "{A Gaussian Process Model of Human Electrocorticographic Data}",
    journal = {Cerebral Cortex},
    year = {2020},
    month = {06},
    issn = {1047-3211},
    doi = {10.1093/cercor/bhaa115},
    url = {https://doi.org/10.1093/cercor/bhaa115},
    note = {bhaa115},
    eprint = {https://academic.oup.com/cercor/advance-article-pdf/doi/10.1093/cercor/bhaa115/33344231/bhaa115.pdf},
}

Contributing

Thanks for considering adding to our toolbox! Some text below has been borrowed from the Matplotlib contributing guide.

Submitting a bug report

If you are reporting a bug, please do your best to include the following:

A short, top-level summary of the bug. In most cases, this should be 1-2 sentences.
A short, self-contained code snippet to reproduce the bug, ideally allowing a simple copy and paste to reproduce. Please do your best to reduce the code snippet to the minimum required.
The actual outcome of the code snippet
The expected outcome of the code snippet

Contributing code

The preferred way to contribute to supereeg is to fork the main repository on GitHub, then submit a pull request.

If your pull request addresses an issue, please use the title to describe the issue and mention the issue number in the pull request description to ensure a link is created to the original issue.
All public methods should be documented in the README.
Each high-level plotting function should have a simple example in the examples folder. This should be as simple as possible to demonstrate the method.
Changes (both new features and bugfixes) should be tested using pytest. Add tests for your new feature to the tests/ repo folder.
Please note that the code is currently in beta thus the API may change at any time. BE WARNED.

Testing

To test supereeg, install pytest (pip install pytest) and run pytest in the supereeg folder.

supereeg's People

Contributors

Stargazers

Watchers

Forkers

satpreetsingh codeaudit lucywowen jeremymanning afcarl andrewheusser hl1477 vishalbelsare kscangos pmdaly ljchang tmuntianu paxtonfitzpatrick

supereeg's Issues

create model functions

a tools to create new models given a list of brain objects

function for loading model

in its current form, the full model in mni space is ~1.5 GB. Because of the size, it would be inefficient to load from a google drive link each time. I was thinking that we could have a download('example_model') function that stores the model locally, and then adapt the load function to first check to see if it exists locally before retrieving it from google drive. sound ok?

new example model

We need a new example model (131 locations) in the new numerator/denominator format. @lucywowen do you have something we can use?

bad electrode questions:

Bad electrodes:

where most electrodes are removed - higher in MTL?

model.save issue

NameError: global name 'pickle' is not defined

to_nii() failing

Need to figure out why to_nii is failing when loading example_nifti but not when created from brain object

Ideas from ECoG journal club: model improvements

Ways to improve model:

Time lag - known conduction delay
Flatmap distance function
DTI
Resting state functional MRI
Subset of nonlinear functions

appropriate radius in new 20K locations

Should we change the radius now that we've up-sampled our full set of locations.

take advantage of GPUs on discovery

some of our computations may benefits from GPUs, particularly if we use tensorflow. discovery has 4 GPU nodes. it'd be worth exploring whether they could help us

cholesky decomp of toeplitz matrix

R = scipy.linalg.toeplitz(np.linspace(-1,1,len(locs))[::-1])
R = np.dot(R, R.T)
R = np.divide(R, np.max(R))
scipy.linalg.cholesky(R)

error:
LinAlgError: 170-th leading minor not positive definite

set_context function

seaborn has a nice way to set the plot styling for a particular context (like figures for a talk, poster, paper or notebook)

I was thinking we could take a similar approach with superEEG, setting a context to run the code on a single computer, cluster, or some other setups. I idea would be to:

import superEEG
superEEG.set_context('google-cluster')

This would change an internal config object that determines the behavior of our functions (how the jobs are split up etc). thoughts?

rename function predict -> infer

more accurate name?

Task list from Andy/Lucy meeting (1/18)

Finish tests (see test issue for point by point)
Make sure all parameters are defined
Documentation for new brain plots (plot_data() and plot_locs())
Only load necessary stats functions in brain and model scripts
Update tutorials
Update examples

predict best locations

Use resting state matrix to create synthetic data, parcellate by region (like Bassett paper), and predict values based on many random samples of n electrodes. Average predicted correlations in regions and across samples to generate a 'best electrode configuration' map.

preprocess module?

for things like filtering, removing elecs by kurosis, we could have a wrapper function (superEEG.preprocess) that takes a brain objects and returns a processed brain object.

an alternative would be to pass a dictionary (or something) when creating the brain object.

suuport BIDS data format

http://bids.neuroimaging.io/

Debug predict function

Something is off with the predict function when running simulated patient data. I think it has to do with the expand_corrmat function, but I'm going to break it down to see where the issue is occurring.

Here's the outline/pseudocode for the simulated_patient_data script:

Define:

# samples
# patients
# electrodes/patient in model
# electrodes/patient in patient to be reconstructed

Extract locations from nifti

Using a downsampled set of locations = 170

Create synthetic patient data

Create toeplitz matrix = R
Multiply the cholesky decomposition of R by a new random multivariate distribution for each patient
This gives synthetic 'recordings' from each of the 170 locations
Save each of these as a brain object

Subsample patient (bo) to be reconstructed

Randomly sample locations (known locations) and corresponding timeseries
Create new brain object - bo_sub

Create synthetic model

Loop over all but one patient (bo_sub or the patient to be reconstructed)
For all other locations not including the known locations (unknown locations), subsample locations and corresponding timeseries
Create correlation matrix for each patient expanding to all unknown locations
Sum across patients for model
Model contains Numerator and Denominator fields

For patient to be reconstructed (bo_sub), predict values at all unknown locations

Create correlation matrix at all unknown locations
Add with model and average
Expand new average correlation matrix to the patient's electrode locations (known locations)
this is the step that's the problem. I think
Index new expanded correlation matrix just at unknown by known locations and known by known locations for the reconstruct function
Output will be just the predicted timeseries for the unknown locations

Calculate the correlation between predicted and actual timeseries

Actual: use the 'recordings' for unknown locations in the reconstructed patient (bo)
Predicted: use the reconstructed recordings
Calculate the correlation at each corresponding location

remap function

copying from another issue #27 -

And finally, to help with the se.save(bo2, 'bo3.bo', template='template.nii') function, it would be useful to have a helper function as part of the brain objects to do something like:
bo3 = bo2.remap('template.nii') <-- return a new brain object with locations specified in template.nii
bo4 = bo2.remap('bo2.bo') <-- return a new brain object with locations specified in bo2.bo

Also, for all of the above, it'd be useful to support both the "filename" syntax as described above, and also an alternative syntax where you just pass the (brain object or nifti) objects themselves. In other words if template is a nifti object (stored on disk as template.nii),

bo3 = bo2.remap(template) and bo3 = bo2.remap('template.nii') should do the same thing. And similarly, if template is a brain object rather than a nifti object, the code should also work.

^ from @jeremymanning

issue importing tensorflow

optimal electrode placement based on ROI

Can we include a measure in the user interface that allows to you predict the activity in one region given the activity in another region (i.e. if we want to know activity in the amygdala, can we reconstruct that activity using electrodes in the contralateral occipital cortex, and to what degree of certainty ... to inform potential electrode placement by regions of desired reconstruction.

generate new model including RAM dataset

load only subfield

modify file format (brain objects and model objects)
would be nice to load in subfield - pickle does

HDF5 or NPZ??

setting up this repository

This repository is for creating a releasable package for doing superEEG inferences on ECoG data. We should start with this template: https://github.com/ContextLab/CDL-starter-pack

reduce memory load with quantization

noticed that we are using 32 bit precision for at least some of the model components (maybe the data too?). I was reading about quantization (reducing precision) to make neural net params smaller in memory. https://www.tensorflow.org/performance/quantization

maybe worth considering?

add logo to images folder

@jeremymanning could you push a small logo for SuperEEG to the images folder?

Simulated dataset that we can reconstruct perfectly

Lucy and I are trying to come up with a test dataset where we simulate two electrodes, and then reconstruct one location equidistant between them. The idea would be to have a test where we know exactly what the reconstruction should be, and we can make sure our code recovers it exactly. we tried using 2 sinusoids where of frequency f and 2f, but the problem is that they are orthogonal so the model breaks down. Any ideas for 2 timeseries that we can use to create this test?

@jeremymanning - not a rush on this, just wanted to jot down the idea

write up readme

can model off of quail

license

need to choose a license, and modify code examples with updated license

remove synthetic data from repo

we should take these out of the repo and put them on google drive

create example scripts

define an API

We need to come up with a list of functions we want to include in the toolbox and define how users should be able to interact with them. Ideas:

supereeg.brain: data object that includes a pandas dataframe of voltage timeseries data (brain.data) and a pandas datagrame of electrode locations (brain.locs). A supereeg.template is a special case of a brain object where data is None. Note: this likely needs to be done in a fancy way whereby the data is a pointer to data on disk, but doesn't actually store it in memory. Otherwise we'll quickly run out of memory for any non-trivial example.
supereeg.predict: this is the main function. given a brain data object with the subject's data (subj) and a template object (default: standard MNI 152 brain), return a new brain object containing the inferred data from the subject at all locations in the template.
probably also some plotting functions that wrap various parts of nilearn (e.g. plot electrode locations, create an animation, make a map, etc.)
functions for easily replicating all analyses in the paper
a function that takes in a list of brain objects (one per subject) and a template object, and returns correlation matrices for each subject along with a combined correlation matrix. (question: should this be stored in the brain data objects?) this will get used by predict.

We may want to use something like tensorflow or some sort of lazy evaluation to make this toolbox work really efficiently.

brain object -> nifti and nifti -> brain object

need a function to convert inferred brain objects

refactor loading/saving of nifti files

_I'm imagining that BrainObject.nii is a function that takes either 0 or 1 argument. If 0 arguments are passed in, it returns itself as a nifti object. If 1 argument is passed in, it treats that argument as a template (which can be specified as a BrainObject object or a NiftiImage object). In other words, this should work similarly to a numpy array where x.T returns the transpose of x.

To facilitate easy translating between brain data in different formats, I'd make a read_brain function that takes in data (in whatever format) and returns a brain object. So:

If you pass in a brain object, the read_brain function returns its argument
If you pass in a string that links to a brain object, you load that brain object into memory and return it
If you pass in a nifti object, you convert it to a brain object and return it
If you pass in a string that links to a .nii or .nii.gz file, you load it into memory, convert it to a brain object, and return it.
Anytime you need to parse brain data, you can then call the read_brain function to get out the data in brain object format.

Similarly, you can create a write_brain function that takes in brain data (in whatever format) and a string (with a .bo, .nii., or .nii.gz extension). First you can call read_brain to convert whatever you've been given into a brain object, and then you'd either write that brain object to disk (if you're given the .bo extension) or convert it to nifti format and write to disk (if you're given the .nii or .nii.gz extension)._ (moved from #27)

tests

write tests for:

creating a brain object
creating a model object
updating a model object
predicting a timeseries
saving a model
saving a brain object (pickle, nifti)

se.save function

currently, we have methods to save brain objects and model objects (e.g. bo.save('filename')). To save nifti files, the current API is: nii = bo.to_nii('filename') where nii is a nibabel nifti object in memory, and if a filename is passed, it is also saved to disk.

Another possible (or complementary) approach would be to have a separate se.save function (@jeremymanning proposed):

nifti1 = se.save(bo2, 'nifti1.nii') <-- convert bo2 to a nifti image, save to disk, and return a nifti object
nift2 = se.save(bo2, 'nifti2.nii', template='template.nii') <-- convert bo2 to a nifti image using the specified template, save to disk, and return the nifti ojbect
bo2_copy = se.save(bo2, 'bo2.bo') <-- save bo2 to disk and return a copy of itself
bo3 = se.save(bo2, 'bo3.bo', template='template.nii')` <-- convert bo2 into the coordinates specified by template.nii and save to disk; return the new brain object

write tests for predict function

ideally we have tests for all helper functions, but particularly this one

add nibabel to requirements

resolved by pip install nibabel

"real time" predict function

since it seems like the result of eq10 is independent of n timepoints reconstructed, we could write a "close-to-realtime" version of the predict function, which could be output as a data stream, and that we could connect to brain plot, hypertools classifiers, etc.

Sanity checks with Uri Hasson's fMRI data

We could use fMRI data (where we have the full-brain patterns) as a sanity check for SuperEEG by pretending that some of the fMRI data were missing.

Here's what I'm thinking of for Uri's data (we could pick out one or two datasets to apply this to):

Analysis 1

For each subject s, use every other subject's data to estimate the full-brain correlation matrix
Hold out (randomly) x% of the voxels from subject s
Using the remaining (100 - x)% of the voxels and the correlation matrix estimated from everyone else's data, fill in data from the missing voxels using SuperEEG
Compare the reconstructions to the observed data (e.g. correlations between the reconstructed vs. observed activity, by voxel)

We can then make a brain map (reconstruction accuracy by voxel, averaged across subjects) for different values of x.

Key questions:

What is the minimum value of x that still leads to decent reconstruction accuracy?
Are there particular brain areas that are reconstructed especially well (or poorly)?

Analysis 2

Since everyone in each dataset experienced the same story, we could also use ISFC (in sliding windows, or using a new technique my lab is developing that estimates moment-by-moment ISFC matrices) to take into account dynamic correlation matrices. This would entail an analysis similar to the one above, except that:

We'd estimate (using everyone else's data) the moment-by-moment ISFC matrices, using our "timecorr" method
Separately for each timepoint, we'd then reconstructing the activity for the held-out voxels using that timepoint's ISFC matrix (for different proportions, x, of held-out voxels)

Key questions:

Do reconstruction accuracies improve when we take into account the dynamics of the stimulus-driven correlations?
Does this change the minimum value of x that is needed to reliably reconstruct held-out data?

support python 3.4+

Tasks to complete over break:

Write tests for helper functions
Clean up code and check breaks with pytest
Add/edit examples
Optional but would be cool: add plotting functions for brain object

expose rbf width to user

allow user to choose own radius while making model

se.Model(bos, rbf_width = 20)

Optimize code for reconstructing timesamples

I did some simulations and it seems like the answer that eq. 10 in the paper returns is not dependent on the number of time samples passed, so we have the option to pass as many time samples as we'd like. may be relevant for tensorflow implementation:

heres the code:

import numpy as np
import seaborn as sns

# unknown
Kba = np.random.rand(100,10)

# known
Kaa = np.random.rand(10,10)
Y = np.random.rand(10,5000)

# reconstruct 100 time samples
ts1 = np.squeeze(
    np.dot(np.dot(Kba, np.linalg.pinv(Kaa)), Y[:,:500]).T)

# reconstruct 2500 time samples
ts2 = np.squeeze(
    np.dot(np.dot(Kba, np.linalg.pinv(Kaa)), Y[:,:2500]).T)

print("TS 1 (one time sample) shape: " + str(ts1.shape))
print("TS 2 (2500 time samples) shape: " + str(ts2.shape))

# are the first 100 samples the same?

sns.plt.plot(ts1[:500,:])
sns.plt.title('500 time sample')
sns.plt.show()
sns.plt.plot(ts2[:500,:])
sns.plt.title('2500 time samples')
sns.plt.show()

sns.plt.scatter(ts1[:500,:], ts2[:500,:])
sns.plt.title('First 500 time samples')
sns.plt.show()

finish simulations on synthetic patient data

readthedocs

+description of toolbox
+API
+examples
+tutorial

plot functions

what kinds of plotting functions should we include int he package? here are some possibilities

plot of electrodes locations (glassbrain?)
heatmap of model (sns.heatmap)
timeseries plot of subject data (wrap mne function?)
static and dynamic brain plots of predicted data
...

generate mni space model with existing dataset

distance metrics for averaging correlation matrices

Our current implementation computed subject-wise correlation matrices, fisher's r2zs them, computes the euclidean mean and then inverts the fisher's r2z (i.e. z2r)

We could explore other distance functions such as log-euclidean and riemannian. The logic being that SPD matrices live on a curved manifold, so euclidean averaging might be sub-optimal.

This figure gives a good intuition for why a non-euclidean distance metric might be better:

tensorflow backend

Suggestion from Tim Blakely (Google):

"Depending on the use case, calculating the channels on the fly may be practical (and likely much cheaper). Along those lines, if most/all of the calculations can be expressed as a series of matrix operations, it might be worth some engineering effort to see if the calculations could be ported to TensorFlow. It's the machine learning toolkit we use internally at Google, and while it's mainly used for deep learning and analysis it's actually a generalized platform for doing arbitrary tensor computations (e.g. n-d matrix operations). The reason it might be interesting for your project is that TensorFlow works on both CPUs and GPUs. The latter is particularly useful since 1) it can perform streaming matrix multiplications orders of magnitude faster than CPUs, 2) you can get GCE instances with GPUs in them today, and 3) it may allow for much-closer-to-real-time calculation of SuperEEG channels. Only a suggestion, however; it's by no means a requirement for the project."

expand_corrmat: mode='predict'

de43ac4
In order to make these computations somewhat tractable, the second expand_corrmat (after the expanded subject correlation matrix is averaged with the model) needs to only expand to the unknownxknown and knownxknown locations of the matrix.

Comparing the mode='predict' with the default (mode='fit') gives seemingly similar values for the correlation matrix:

mode='predict':

mode='fit':

But when comparing the reconstructions, they are way off (correlation=.07). So I'm still working through this.