Git Product home page Git Product logo

supereeg's Introduction

supereeg logo

Overview

supereeg (name inspired by Robert Sawyer's The Terminal Experiment is a (fictional) tool for recording the electrical activities of every neuron in the living human brain. Our approach is somewhat less ambitious, but (we think) still "super" cool: obtain high spatiotemporal estimates of activity patterns throughout the brain using data from a small(ish) number of implanted electrodes. The toolbox is designed to analyze ECoG (electrocorticographic) data, e.g. from epilepsy patients undergoing pre-surgical evaluation.

The way the technique works is to leverage data from different patients' brains (who had electrodes implanted in different locations) to learn a "correlation model" that describes how activity patterns at different locations throughout the brain relate. Given this model, along with data from a sparse set of locations, we use Gaussian process regression to "fill in" what the patients' brains were "most probably" doing when those recordings were taken. Details on our approach may be found in this preprint. You may also be interested in watching this talk or reading this blog post from a recent conference.

Although our toolbox is designed with ECoG data in mind, in theory this tool could be applied to a very general set of applications. The general problem we solve is: given known (correlational) structure of a large number of "features," and given that (at any one time) you only observe some of those features, how much can you infer about what the remaining features are doing?

Toolbox documentation, including a full API specification, tutorials, and gallery of examples may be found here on our readthedocs page.

Installation

Recommended way of installing the toolbox

You may install the latest stable version of our toolbox using pip:

pip install supereeg

or if you have a previous version already installed:

pip install --upgrade supereeg

Dangerous/hacker/developer way of installing the toolbox (use caution!)

To install the latest (bleeding edge) version directly from this repository use:

pip install --upgrade git+https://github.com/ContextLab/supereeg.git

GPU acceleration

This is highly recommended if you are building your own models. Only NVIDIA GPUs are supported. To enable GPU acceleration for building models, pass gpu=True to se.Model, and install CuPy.

One time setup

  1. Install Docker on your computer using the appropriate guide below:
  2. Launch Docker and adjust the preferences to allocate sufficient resources (e.g. > 4GB RAM)
  3. Build the docker image by opening a terminal in the desired folder and enter docker pull contextualdynamicslab/supereeg
  4. Use the image to create a new container for the workshop
    • The command below will create a new container that will map your computer's Desktop to /mnt within the container, so that location is shared between your host OS and the container. Feel free to change Desktop to whatever folder you prefer to share instead, but make sure to provide the full path. The command will also share port 8888 with your host computer so any jupyter notebooks launched from within the container will be accessible at localhost:8888 in your web browser (or 192.168.99.100:8888 if using Docker Toolbox)
    • docker run -it -p 8888:8888 --name supereeg -v ~/Desktop:/mnt contextualdynamicslab/supereeg
    • You should now see the root@ prefix in your terminal, if so you've successfully created a container and are running a shell from inside!
  5. To launch Jupyter: jupyter notebook --no-browser --ip=0.0.0.0 --allow-root
  6. (Optional) Connect Docker to PyCharm or another IDE

Using the container after setup

  1. You can always fire up the container by typing the following into a terminal
    • docker start supereeg && docker attach supereeg
    • When you see the root@ prefix, letting you know you're inside the container
  2. Close a running container with ctrl + d from the same terminal you used to launch the container, or docker stop supereeg from any other terminal

Requirements

The toolbox is currently supported on Mac and Linux. It has not been tested on Windows (and we expect key functionality not to work properly on Windows systems). If using Windows, consider using Windows Subsystem for Linux or a Docker container.

Dependencies:

  • python 3.5+
  • pandas>=0.21.1
  • seaborn>=0.7.1
  • matplotlib==2.1.0
  • scipy>=0.17.1
  • numpy>=1.10.4
  • scikit-learn>=0.18.1
  • nilearn
  • nibabel
  • joblib
  • multiprocessing
  • deepdish
  • future
  • imageio
  • hypertools
  • scikit-image
  • cupy (HIGHLY RECOMMENDED for GPU acceleration)
  • pytest (for development)

Citing

We wrote a paper about supereeg, which you can read here. The paper provides full details about the approach along with some performance tests an a large ECoG dataset. If you use this toolbox or wish to cite us, please use the following citation:

Lucy L W Owen, Tudor A Muntianu, Andrew C Heusser, Patrick M Daly, Katherine W Scangos, Jeremy R Manning, A Gaussian Process Model of Human Electrocorticographic Data, Cerebral Cortex, , bhaa115, https://doi.org/10.1093/cercor/bhaa115

Here is a bibtex formatted reference:

@article{10.1093/cercor/bhaa115,
    author = {Owen, Lucy L W and Muntianu, Tudor A and Heusser, Andrew C and Daly, Patrick M and Scangos, Katherine W and Manning, Jeremy R},
    title = "{A Gaussian Process Model of Human Electrocorticographic Data}",
    journal = {Cerebral Cortex},
    year = {2020},
    month = {06},
    issn = {1047-3211},
    doi = {10.1093/cercor/bhaa115},
    url = {https://doi.org/10.1093/cercor/bhaa115},
    note = {bhaa115},
    eprint = {https://academic.oup.com/cercor/advance-article-pdf/doi/10.1093/cercor/bhaa115/33344231/bhaa115.pdf},
}

Contributing

Thanks for considering adding to our toolbox! Some text below has been borrowed from the Matplotlib contributing guide.

Submitting a bug report

If you are reporting a bug, please do your best to include the following:

  1. A short, top-level summary of the bug. In most cases, this should be 1-2 sentences.
  2. A short, self-contained code snippet to reproduce the bug, ideally allowing a simple copy and paste to reproduce. Please do your best to reduce the code snippet to the minimum required.
  3. The actual outcome of the code snippet
  4. The expected outcome of the code snippet

Contributing code

The preferred way to contribute to supereeg is to fork the main repository on GitHub, then submit a pull request.

  • If your pull request addresses an issue, please use the title to describe the issue and mention the issue number in the pull request description to ensure a link is created to the original issue.

  • All public methods should be documented in the README.

  • Each high-level plotting function should have a simple example in the examples folder. This should be as simple as possible to demonstrate the method.

  • Changes (both new features and bugfixes) should be tested using pytest. Add tests for your new feature to the tests/ repo folder.

  • Please note that the code is currently in beta thus the API may change at any time. BE WARNED.

Testing

Build Status

To test supereeg, install pytest (pip install pytest) and run pytest in the supereeg folder.

supereeg's People

Contributors

andrewheusser avatar jeremymanning avatar lucywowen avatar paxtonfitzpatrick avatar pmdaly avatar tmuntianu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

supereeg's Issues

function for loading model

in its current form, the full model in mni space is ~1.5 GB. Because of the size, it would be inefficient to load from a google drive link each time. I was thinking that we could have a download('example_model') function that stores the model locally, and then adapt the load function to first check to see if it exists locally before retrieving it from google drive. sound ok?

new example model

We need a new example model (131 locations) in the new numerator/denominator format. @lucywowen do you have something we can use?

to_nii() failing

Need to figure out why to_nii is failing when loading example_nifti but not when created from brain object

take advantage of GPUs on discovery

some of our computations may benefits from GPUs, particularly if we use tensorflow. discovery has 4 GPU nodes. it'd be worth exploring whether they could help us

cholesky decomp of toeplitz matrix

R = scipy.linalg.toeplitz(np.linspace(-1,1,len(locs))[::-1])
R = np.dot(R, R.T)
R = np.divide(R, np.max(R))
scipy.linalg.cholesky(R)

error:
LinAlgError: 170-th leading minor not positive definite

set_context function

seaborn has a nice way to set the plot styling for a particular context (like figures for a talk, poster, paper or notebook)

I was thinking we could take a similar approach with superEEG, setting a context to run the code on a single computer, cluster, or some other setups. I idea would be to:

import superEEG
superEEG.set_context('google-cluster')

This would change an internal config object that determines the behavior of our functions (how the jobs are split up etc). thoughts?

Task list from Andy/Lucy meeting (1/18)

  • Finish tests (see test issue for point by point)

  • Make sure all parameters are defined

  • Documentation for new brain plots (plot_data() and plot_locs())

  • Only load necessary stats functions in brain and model scripts

  • Update tutorials

  • Update examples

predict best locations

Use resting state matrix to create synthetic data, parcellate by region (like Bassett paper), and predict values based on many random samples of n electrodes. Average predicted correlations in regions and across samples to generate a 'best electrode configuration' map.

preprocess module?

for things like filtering, removing elecs by kurosis, we could have a wrapper function (superEEG.preprocess) that takes a brain objects and returns a processed brain object.

an alternative would be to pass a dictionary (or something) when creating the brain object.

Debug predict function

Something is off with the predict function when running simulated patient data. I think it has to do with the expand_corrmat function, but I'm going to break it down to see where the issue is occurring.

Here's the outline/pseudocode for the simulated_patient_data script:

  1. Define:
  • # samples
  • # patients
  • # electrodes/patient in model
  • # electrodes/patient in patient to be reconstructed
  1. Extract locations from nifti
  • Using a downsampled set of locations = 170
  1. Create synthetic patient data
  • Create toeplitz matrix = R
  • Multiply the cholesky decomposition of R by a new random multivariate distribution for each patient
  • This gives synthetic 'recordings' from each of the 170 locations
  • Save each of these as a brain object
  1. Subsample patient (bo) to be reconstructed
  • Randomly sample locations (known locations) and corresponding timeseries
  • Create new brain object - bo_sub
  1. Create synthetic model
  • Loop over all but one patient (bo_sub or the patient to be reconstructed)
  • For all other locations not including the known locations (unknown locations), subsample locations and corresponding timeseries
  • Create correlation matrix for each patient expanding to all unknown locations
  • Sum across patients for model
  • Model contains Numerator and Denominator fields
  1. For patient to be reconstructed (bo_sub), predict values at all unknown locations
  • Create correlation matrix at all unknown locations
  • Add with model and average
  • Expand new average correlation matrix to the patient's electrode locations (known locations)
    this is the step that's the problem. I think
  • Index new expanded correlation matrix just at unknown by known locations and known by known locations for the reconstruct function
  • Output will be just the predicted timeseries for the unknown locations
  1. Calculate the correlation between predicted and actual timeseries
  • Actual: use the 'recordings' for unknown locations in the reconstructed patient (bo)
  • Predicted: use the reconstructed recordings
  • Calculate the correlation at each corresponding location

remap function

copying from another issue #27 -

And finally, to help with the se.save(bo2, 'bo3.bo', template='template.nii') function, it would be useful to have a helper function as part of the brain objects to do something like:
bo3 = bo2.remap('template.nii') <-- return a new brain object with locations specified in template.nii
bo4 = bo2.remap('bo2.bo') <-- return a new brain object with locations specified in bo2.bo

Also, for all of the above, it'd be useful to support both the "filename" syntax as described above, and also an alternative syntax where you just pass the (brain object or nifti) objects themselves. In other words if template is a nifti object (stored on disk as template.nii),

bo3 = bo2.remap(template) and bo3 = bo2.remap('template.nii') should do the same thing. And similarly, if template is a brain object rather than a nifti object, the code should also work.

^ from @jeremymanning

optimal electrode placement based on ROI

Can we include a measure in the user interface that allows to you predict the activity in one region given the activity in another region (i.e. if we want to know activity in the amygdala, can we reconstruct that activity using electrodes in the contralateral occipital cortex, and to what degree of certainty ... to inform potential electrode placement by regions of desired reconstruction.

load only subfield

modify file format (brain objects and model objects)
would be nice to load in subfield - pickle does

HDF5 or NPZ??

Simulated dataset that we can reconstruct perfectly

Lucy and I are trying to come up with a test dataset where we simulate two electrodes, and then reconstruct one location equidistant between them. The idea would be to have a test where we know exactly what the reconstruction should be, and we can make sure our code recovers it exactly. we tried using 2 sinusoids where of frequency f and 2f, but the problem is that they are orthogonal so the model breaks down. Any ideas for 2 timeseries that we can use to create this test?

@jeremymanning - not a rush on this, just wanted to jot down the idea

license

need to choose a license, and modify code examples with updated license

define an API

We need to come up with a list of functions we want to include in the toolbox and define how users should be able to interact with them. Ideas:

  • supereeg.brain: data object that includes a pandas dataframe of voltage timeseries data (brain.data) and a pandas datagrame of electrode locations (brain.locs). A supereeg.template is a special case of a brain object where data is None. Note: this likely needs to be done in a fancy way whereby the data is a pointer to data on disk, but doesn't actually store it in memory. Otherwise we'll quickly run out of memory for any non-trivial example.

  • supereeg.predict: this is the main function. given a brain data object with the subject's data (subj) and a template object (default: standard MNI 152 brain), return a new brain object containing the inferred data from the subject at all locations in the template.

  • probably also some plotting functions that wrap various parts of nilearn (e.g. plot electrode locations, create an animation, make a map, etc.)

  • functions for easily replicating all analyses in the paper

  • a function that takes in a list of brain objects (one per subject) and a template object, and returns correlation matrices for each subject along with a combined correlation matrix. (question: should this be stored in the brain data objects?) this will get used by predict.

We may want to use something like tensorflow or some sort of lazy evaluation to make this toolbox work really efficiently.

refactor loading/saving of nifti files

_I'm imagining that BrainObject.nii is a function that takes either 0 or 1 argument. If 0 arguments are passed in, it returns itself as a nifti object. If 1 argument is passed in, it treats that argument as a template (which can be specified as a BrainObject object or a NiftiImage object). In other words, this should work similarly to a numpy array where x.T returns the transpose of x.

To facilitate easy translating between brain data in different formats, I'd make a read_brain function that takes in data (in whatever format) and returns a brain object. So:

If you pass in a brain object, the read_brain function returns its argument
If you pass in a string that links to a brain object, you load that brain object into memory and return it
If you pass in a nifti object, you convert it to a brain object and return it
If you pass in a string that links to a .nii or .nii.gz file, you load it into memory, convert it to a brain object, and return it.
Anytime you need to parse brain data, you can then call the read_brain function to get out the data in brain object format.

Similarly, you can create a write_brain function that takes in brain data (in whatever format) and a string (with a .bo, .nii., or .nii.gz extension). First you can call read_brain to convert whatever you've been given into a brain object, and then you'd either write that brain object to disk (if you're given the .bo extension) or convert it to nifti format and write to disk (if you're given the .nii or .nii.gz extension)._ (moved from #27)

tests

write tests for:

  • creating a brain object
  • creating a model object
  • updating a model object
  • predicting a timeseries
  • saving a model
  • saving a brain object (pickle, nifti)

se.save function

currently, we have methods to save brain objects and model objects (e.g. bo.save('filename')). To save nifti files, the current API is: nii = bo.to_nii('filename') where nii is a nibabel nifti object in memory, and if a filename is passed, it is also saved to disk.

Another possible (or complementary) approach would be to have a separate se.save function (@jeremymanning proposed):

nifti1 = se.save(bo2, 'nifti1.nii') <-- convert bo2 to a nifti image, save to disk, and return a nifti object
nift2 = se.save(bo2, 'nifti2.nii', template='template.nii') <-- convert bo2 to a nifti image using the specified template, save to disk, and return the nifti ojbect
bo2_copy = se.save(bo2, 'bo2.bo') <-- save bo2 to disk and return a copy of itself
bo3 = se.save(bo2, 'bo3.bo', template='template.nii')` <-- convert bo2 into the coordinates specified by template.nii and save to disk; return the new brain object

"real time" predict function

since it seems like the result of eq10 is independent of n timepoints reconstructed, we could write a "close-to-realtime" version of the predict function, which could be output as a data stream, and that we could connect to brain plot, hypertools classifiers, etc.

Sanity checks with Uri Hasson's fMRI data

We could use fMRI data (where we have the full-brain patterns) as a sanity check for SuperEEG by pretending that some of the fMRI data were missing.

Here's what I'm thinking of for Uri's data (we could pick out one or two datasets to apply this to):

Analysis 1

  1. For each subject s, use every other subject's data to estimate the full-brain correlation matrix
  2. Hold out (randomly) x% of the voxels from subject s
  3. Using the remaining (100 - x)% of the voxels and the correlation matrix estimated from everyone else's data, fill in data from the missing voxels using SuperEEG
  4. Compare the reconstructions to the observed data (e.g. correlations between the reconstructed vs. observed activity, by voxel)

We can then make a brain map (reconstruction accuracy by voxel, averaged across subjects) for different values of x.

Key questions:

  • What is the minimum value of x that still leads to decent reconstruction accuracy?
  • Are there particular brain areas that are reconstructed especially well (or poorly)?

Analysis 2

Since everyone in each dataset experienced the same story, we could also use ISFC (in sliding windows, or using a new technique my lab is developing that estimates moment-by-moment ISFC matrices) to take into account dynamic correlation matrices. This would entail an analysis similar to the one above, except that:

  1. We'd estimate (using everyone else's data) the moment-by-moment ISFC matrices, using our "timecorr" method
  2. Separately for each timepoint, we'd then reconstructing the activity for the held-out voxels using that timepoint's ISFC matrix (for different proportions, x, of held-out voxels)

Key questions:

  • Do reconstruction accuracies improve when we take into account the dynamics of the stimulus-driven correlations?
  • Does this change the minimum value of x that is needed to reliably reconstruct held-out data?

Tasks to complete over break:

  • Write tests for helper functions

  • Clean up code and check breaks with pytest

  • Add/edit examples

  • Optional but would be cool: add plotting functions for brain object

Optimize code for reconstructing timesamples

I did some simulations and it seems like the answer that eq. 10 in the paper returns is not dependent on the number of time samples passed, so we have the option to pass as many time samples as we'd like. may be relevant for tensorflow implementation:

heres the code:

import numpy as np
import seaborn as sns

# unknown
Kba = np.random.rand(100,10)

# known
Kaa = np.random.rand(10,10)
Y = np.random.rand(10,5000)

# reconstruct 100 time samples
ts1 = np.squeeze(
    np.dot(np.dot(Kba, np.linalg.pinv(Kaa)), Y[:,:500]).T)

# reconstruct 2500 time samples
ts2 = np.squeeze(
    np.dot(np.dot(Kba, np.linalg.pinv(Kaa)), Y[:,:2500]).T)

print("TS 1 (one time sample) shape: " + str(ts1.shape))
print("TS 2 (2500 time samples) shape: " + str(ts2.shape))

# are the first 100 samples the same?

sns.plt.plot(ts1[:500,:])
sns.plt.title('500 time sample')
sns.plt.show()
sns.plt.plot(ts2[:500,:])
sns.plt.title('2500 time samples')
sns.plt.show()

sns.plt.scatter(ts1[:500,:], ts2[:500,:])
sns.plt.title('First 500 time samples')
sns.plt.show()

image

image

image

readthedocs

+description of toolbox
+API
+examples
+tutorial

plot functions

what kinds of plotting functions should we include int he package? here are some possibilities

  • plot of electrodes locations (glassbrain?)
  • heatmap of model (sns.heatmap)
  • timeseries plot of subject data (wrap mne function?)
  • static and dynamic brain plots of predicted data
    ...

distance metrics for averaging correlation matrices

Our current implementation computed subject-wise correlation matrices, fisher's r2zs them, computes the euclidean mean and then inverts the fisher's r2z (i.e. z2r)

We could explore other distance functions such as log-euclidean and riemannian. The logic being that SPD matrices live on a curved manifold, so euclidean averaging might be sub-optimal.

This figure gives a good intuition for why a non-euclidean distance metric might be better:

image

tensorflow backend

Suggestion from Tim Blakely (Google):

"Depending on the use case, calculating the channels on the fly may be practical (and likely much cheaper). Along those lines, if most/all of the calculations can be expressed as a series of matrix operations, it might be worth some engineering effort to see if the calculations could be ported to TensorFlow. It's the machine learning toolkit we use internally at Google, and while it's mainly used for deep learning and analysis it's actually a generalized platform for doing arbitrary tensor computations (e.g. n-d matrix operations). The reason it might be interesting for your project is that TensorFlow works on both CPUs and GPUs. The latter is particularly useful since 1) it can perform streaming matrix multiplications orders of magnitude faster than CPUs, 2) you can get GCE instances with GPUs in them today, and 3) it may allow for much-closer-to-real-time calculation of SuperEEG channels. Only a suggestion, however; it's by no means a requirement for the project."

expand_corrmat: mode='predict'

de43ac4
In order to make these computations somewhat tractable, the second expand_corrmat (after the expanded subject correlation matrix is averaged with the model) needs to only expand to the unknownxknown and knownxknown locations of the matrix.

Comparing the mode='predict' with the default (mode='fit') gives seemingly similar values for the correlation matrix:

mode='predict':
screen shot 2017-08-08 at 4 23 26 pm
mode='fit':
screen shot 2017-08-08 at 4 24 54 pm

But when comparing the reconstructions, they are way off (correlation=.07). So I'm still working through this.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.