
sekupy


sekupy is a Python package created for deterging your (dirty) (and) (multivariate) neuroimaging analyses. It was designed for decoding analyses, but it also includes basic univariate analyses.

It provides utilities for varying sets of analysis parameters without struggling with for and if statements.

It deterges your results by saving them in a safe manner, keeping BIDS in mind.

sekupy is the deterged version of pyitab.

Documentation

The documentation can be found here.

Install

You can install it with:

pip install sekupy

Example

The main idea is to use a dictionary to configure all the parameters of your analysis, feed the configuration into an AnalysisPipeline object, call fit to obtain the results, then call save to store them in a BIDS-ish way.

For example, if we want to perform a RoiDecoding analysis with some preprocessing steps, the script will look like this (this is not a complete example):

from sklearn.feature_selection import SelectKBest
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut

from sekupy.analysis.configurator import AnalysisConfigurator
from sekupy.analysis.pipeline import AnalysisPipeline
from sekupy.analysis.decoding.roi_decoding import RoiDecoding

_default_config = {
                    # Here we specify that we have to transform the dataset labels
                    # then select samples and then balance data
                    'prepro': ['target_transformer', 'sample_slicer', 'balancer'],
                    
                    # Here we set which attribute to choose (dataset is a pymvpa dataset)
                    'target_transformer__attr': "image_type",
                    # Here we select samples with an image_type equal to I or O and evidence equal to 1
                    'sample_slicer__attr': {'image_type':["I", "O"], 'evidence':[1]},
                    # Then we say that we want to balance image_type at subject-level
                    "balancer__attr": 'subject',

                    # We set up the estimator in a scikit-learn way
                    'estimator': [
                        ('fsel', SelectKBest(k=50)),
                        ('clf', SVC(C=1, kernel='linear'))],
                    'estimator__clf__C': 1,
                    'estimator__clf__kernel': 'linear',
                    
                    # Then the cross-validation object (also sklearn)
                    'cv': LeaveOneGroupOut,
                    
                    'scores': ['accuracy'],
                    
                    # Then the analysis
                    'analysis': RoiDecoding,
                    'analysis__n_jobs': -1,
                    
                    'analysis__permutation': 0,
                    
                    'analysis__verbose': 0,
                    
                    # Here we select the regions with values 1 to 5 in the image+type mask
                    'kwargs__roi_values': [('image+type', [1]), ('image+type', [2]), ('image+type', [3]),
                                            ('image+type', [4]), ('image+type', [5])],
                    
                    # We want to use subject for our cross-validation
                    'kwargs__cv_attr': 'subject'
                    }

configuration = AnalysisConfigurator(**_default_config,
                                     kind='configuration')
kwargs = configuration._get_kwargs()
a = AnalysisPipeline(configuration, name="roi_decoding_across_full").fit(ds, **kwargs)
a.save() 

Surf the code, starting from the classes used here!


pyitab's Issues

BIDS reader

Build a reader that loads files organized in BIDS and returns a pymvpa dataset.
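
A minimal sketch of what such a reader could look like, assuming pymvpa's fmri_dataset and vstack helpers; the glob pattern, the handled BIDS entities, and the function name are illustrative:

import glob
import os

from mvpa2.base.dataset import vstack
from mvpa2.datasets.mri import fmri_dataset


def load_bids_dataset(bids_root, subject, task, mask=None):
    """Collect the BOLD runs of one subject/task and stack them into a pymvpa ds."""
    pattern = os.path.join(bids_root, f"sub-{subject}", "func",
                           f"sub-{subject}_task-{task}*_bold.nii.gz")
    runs = sorted(glob.glob(pattern))
    if not runs:
        raise FileNotFoundError(f"no BOLD files match {pattern}")

    # One dataset per run; chunks encode the run index for cross-validation.
    datasets = [fmri_dataset(run, chunks=i, mask=mask)
                for i, run in enumerate(runs)]
    return vstack(datasets)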

sample_slicer bug in AnalysisIterator with configuration

When we use AnalysisIterator(kind='configuration'), it may happen that sample_slicer__attribute is not overwritten. Sometimes this is fine, since we want to slice different attributes, but sometimes it is not.

One solution could be to use a single dictionary instead of separate sample_slicer__attribute keys, as sketched below.
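
A sketch of the two shapes, with illustrative key names (today's per-attribute keys versus the proposed single dictionary-valued key):

# Today: each sliced attribute is its own key, so a later configuration
# may fail to overwrite an earlier one.
config_a = {'sample_slicer__image_type': ["I", "O"]}
config_b = {'sample_slicer__evidence': [1]}  # image_type from config_a can leak through

# Proposed: a single dictionary-valued key that is replaced atomically.
config_a = {'sample_slicer__attr': {'image_type': ["I", "O"]}}
config_b = {'sample_slicer__attr': {'evidence': [1]}}  # fully replaces the previous spec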

Cross-decoding

Cross-decoding should be performed with an ad-hoc split of examples during cross-validation:

Things to do:

  • Build a custom partitioner that uses the labels of one experiment for training and those of another for testing (see the sketch below).
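
A sketch of such a partitioner under sklearn's splitter conventions; the class name and the label-set semantics are assumptions:

import numpy as np


class CrossDecodingPartitioner:
    """Train on samples whose label is in train_labels, test on test_labels."""

    def __init__(self, train_labels, test_labels):
        self.train_labels = list(train_labels)
        self.test_labels = list(test_labels)

    def split(self, X, y, groups=None):
        y = np.asarray(y)
        train = np.flatnonzero(np.isin(y, self.train_labels))
        test = np.flatnonzero(np.isin(y, self.test_labels))
        yield train, test

    def get_n_splits(self, X=None, y=None, groups=None):
        return 1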

Issues

From @robbisg on June 13, 2018 12:38

  • PermutatorTransformer
  • Save function for Transformer
  • Cross-decoding How to
  • Within/Across Searchlight

Copied from original issue: robbisg/mvpa_itab_wu#24

States

From @robbisg on June 13, 2018 12:11

Implement states.

  • Build StateAnalyzer

Copied from original issue: robbisg/mvpa_itab_wu#22

FileNotFoundError in get_results

File "/home/robbis/git/joblib/joblib/externals/loky/process_executor.py", line 418, in _process_worker
r = call_item()
File "/home/robbis/git/joblib/joblib/externals/loky/process_executor.py", line 272, in call
return self.fn(*self.args, **self.kwargs)
File "/home/robbis/git/joblib/joblib/_parallel_backends.py", line 567, in call
return self.func(*args, **kwargs)
File "/home/robbis/git/joblib/joblib/parallel.py", line 225, in call
for func, args, kwargs in self.items]
File "/home/robbis/git/joblib/joblib/parallel.py", line 225, in
for func, args, kwargs in self.items]
File "/home/robbis/git/pyitab/pyitab/analysis/results/base.py", line 23, in get_values
with open(conf_fname) as f:
FileNotFoundError: [Errno 2] No such file or directory: configuration.json'

Generate an id for the Analyzer class

The Analyzer class should generate an id to be used in conjunction with analysis.name.

If we use the iterator, should the responsibility for id generation fall on the iterator?
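
One possible scheme, purely illustrative: combine the analysis name with a timestamp and a short random suffix. Where this lives (Analyzer or iterator) is exactly the open question above.

import uuid
from datetime import datetime


def make_analysis_id(name):
    """Build an id like 'roi_decoding_20180613-123800_a1b2c3'."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return f"{name}_{stamp}_{uuid.uuid4().hex[:6]}"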

Coverage

analysis

configurator.py

  • test _get_fname_info
  • test _get_kwargs (maybe deprecated)
  • test save

pipeline.py

  • Test fit
  • test lines 53/54
  • test save

iterator.py

  • Test iterator
  • test combination, list and dict types

base.py

  • test save

results/base.py

  • test get_values
  • test get_results
  • test filter_dataframe
  • test df_fx_over_keys

io

subjects.py

  • Test with selected_subjects

configuration.py

  • Test save_configuration

Loading package

  • Test fetch with n_subjects=1
  • Test load_ds with subjects without data (e.g. monks data)

Scoring Issue in TemporalDecoding

ValueError: Classification metrics can't handle a mix of binary and unknown targets

This error is raised if scoring is set.

The current workaround is to set self.scoring = None in temporal_decoding.py at line 100.
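
A hypothetical sketch of applying the workaround without editing the file; the module path mirrors roi_decoding and the fit signature is assumed to take a dataset plus keyword arguments:

from sekupy.analysis.decoding.temporal_decoding import TemporalDecoding


class PatchedTemporalDecoding(TemporalDecoding):
    """Nulls out the scorer before fitting (a workaround, not a real fix)."""

    def fit(self, ds, **kwargs):
        self.scoring = None  # sidestep the binary/unknown-targets ValueError
        return super().fit(ds, **kwargs)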

Write results functions.

From @robbisg on June 5, 2018 10:23

I need to write functions for writing results.

Questions:

  • Where should path be used?
  • Use a different saver for the transformers/analyzers?

Searchlight results

  • Maps for each cv fold
  • Average map if within-subject cross-validation is performed
  • Merged map for a statistical test
  • Files / something for AFNI tests
  • Permutation maps?

Decoding

  • Weight map
  • Feature selection
  • Cross validation indices
  • Accuracies / Avg. Accuracy
  • Permutation values

Connectivity

States

Cross-decoding

Copied from original issue: robbisg/mvpa_itab_wu#18

New features

Permutation

  • Is it better to use a transformer? #82

Cross-decoding

  • Build a custom CrossValidator

Transformers

Connectivity

See #60

States

  • Build StateAnalyzer
  • Build Transformers for states
  • Import metrics from mvpa_itab

Results

  • Refactor results using BIDS format of derivatives #27
  • Cope with subject-wise results.

Configurator, Pipelines and Iterator

  • Create a Configurator class that is general. #30
  • Create a DecodingConfigurator class that inherits from Configurator. #30
  • Add Dataset Loading into Configurator #31
  • Study the possibility of using the iterator to run subject-wise analyses.

Deprecated

  • Evaluate load_spatiotemporal_dataset
  • Evaluate read_configuration_json
  • Evaluate read_remote_configuration

Averager/PCA/MVPC Transformer

We need a transformer that, using the fa attributes of ROIs (or something else), transforms the data by averaging, extracting principal components, or computing MVPC.
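
A sketch (not sekupy's actual API) of the averaging variant; swapping the mean for a PCA or a multivariate-distance step would give the other variants:

import numpy as np


class RoiAverager:
    """Average features within each ROI, given per-feature ROI labels
    (e.g. taken from the dataset's fa attributes)."""

    def __init__(self, roi_labels):
        self.roi_labels = np.asarray(roi_labels)
        self.rois_ = np.unique(self.roi_labels)

    def transform(self, X):
        # X is samples x features; the result is samples x n_rois.
        return np.column_stack([X[:, self.roi_labels == roi].mean(axis=1)
                                for roi in self.rois_])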

Error on detrending

ValueError: Cannot detrend the dataset, since it neither provides location information of its samples in the space spanned by the polynomials, nor does it match the number of samples this mapper has been trained on (got: 360 and was trained on 240).

IndexError: too many indices for array

pyitab/analysis/searchlight/__init__.py in _split_name(self, X, y, cv, groups)
    171     X = X[..., 0]
    172
--> 173     split = [np.unique(groups[:, 1][test])[0] for train, test in cv.split(X, y=y, groups=groups)]
    174     return split

It depends on the groups attribute, which can sometimes be a plain list; in that case a different solution is needed, for example normalizing it first (see the sketch below).
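
A hedged sketch of one fix: normalize groups to a 2-D array before indexing its columns, so a plain list no longer triggers the IndexError. The helper name is illustrative:

import numpy as np


def as_group_matrix(groups):
    """Return groups as a 2-D (n_samples, n_keys) array, whatever came in."""
    groups = np.asarray(groups)
    if groups.ndim == 1:  # a plain list or 1-D array
        groups = groups[:, np.newaxis]
    return groups

# as_group_matrix(groups)[:, -1] could then replace groups[:, 1] in
# _split_name without assuming a second column exists.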

Within subject analyses

How do we cope with this?

  1. Use the iterator, iterating the fetch (#31) ➡️ this may cause problems in saving (#27) (but we can use the AnalysisPipeline save subdir function!)
  2. Maybe it is better to develop #27 first.
  3. For some analyses (connectivity) we need the iteration machinery anyway.

Create a general Configurator class.

This class is responsible for:

  • Loading the dataset loader
  • Saving the configuration
  • Preparing all the objects needed by the AnalysisPipeline class

Specific analysis configurators should inherit from this class!

Convert the results directory structure to a BIDS-ish layout!

Problem

Use the BIDS specs to create directories and filenames!

Every analysis must be included in the derivatives folder, and the structure must be:

<ds-root>/derivatives/<pipeline-name>/<subj-dir>

In the subject dir the files must be named as:

<source_keyword>[_keyword-value]_<suffix>.<ext>

In addition, a dataset-description.json must be included. A path-building sketch follows.
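
A sketch that composes this layout; the helper names and the entity handling are illustrative:

import os


def derivatives_dir(ds_root, pipeline_name, subject):
    return os.path.join(ds_root, "derivatives", pipeline_name, f"sub-{subject}")


def result_fname(source_keyword, suffix, ext, **entities):
    parts = [source_keyword]
    parts += [f"{key}-{value}" for key, value in sorted(entities.items())]
    return "_".join(parts + [suffix]) + "." + ext

# result_fname("sub-01", "data", "mat", mask="brain", target="image_type")
# -> 'sub-01_mask-brain_target-image_type_data.mat'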

  1. The best solution is that <pipeline-name> is the one provided by AnalysisPipeline or by Analyzer
  2. dataset-description.json must include PipelineDescription.Name
  3. Maybe we can use keyword-value in file to specify different analyses.
    • A possibility is to use <pipeline-name>-<variant>
  4. Each subdir file should be described by a filename.json file with:

TODO

  • Every analysis must have a pipeline name
  • We must check whether the dataset is composed of a single subject or of several subjects
  • The derivatives/<subdir> must be [pipeline-<pipeline_name>]_analysis-<searchlight | decoding | connectivity>_[<analysis-specific-key>-<value>]
  • Analyzer must implement _get_pipeline_info to build dir name
  • For searchlight analyses we must have derivatives/analysis-searchlight_radius-0.3
  • For decoding analyses we must have derivatives/analysis-roi_decoding_area-brain
  • Obtain <source_keywords>
  • Analyzer must implement _get_fname_info in order to build filename appropriately.
  • Filenames for searchlight analyses must be <source_keywords>_target-<values>_task-<value>_date-<datetime>_num-<number>_<keyf>-<value>_<avg | cv>.nii.gz
  • Filenames for decoding analyses must be <source_keywords>_target-<values>_task-<task>_mask-<mask>_value-<roi_value>_date-<datetime>_num-<num>_<key>-<value>_data.mat
  • Each filename is paired with filename.json which is our configuration file! #43
  • We can split information about dataset in dataset-description.json at the top of the pipeline dir #43

Fixes

  • Remove underscores from directory and file names #41
  • Analyses with sample_slicer__subjects must share the same derivatives dir #42
  • Results reader
  • Remove hard-coded configuration.json #43

Within subject searchlight

From @robbisg on June 5, 2018 16:02

The problem is that I need to run a within-subject searchlight using AnalysisScript and save the results.

Possible solutions:

  • Use a script iterator that generates a unique id, then collect the results using this id.
  • Insert a name in the ScriptConfigurator or AnalysisIterator that AnalysisScript should use to
    generate a folder in which to store the data.

Copied from original issue: robbisg/mvpa_itab_wu#20

Connectivity Analyses

From @robbisg on June 13, 2018 9:37

I need to implement several things to perform a connectivity analysis:

Transformers

  • Averager across ROI
  • PCA
  • Multivariate distance
  • ...

Analysis / Measures (see nilearn)

  • Seed based Correlation
  • ICA

Preprocessing / Transformers?

  • GLM regressor

Copied from original issue: robbisg/mvpa_itab_wu#21

Searchlight partial results

The problem with searchlight is that partial results may be lost if a machine goes down.
Maybe joblib's Memory utility could be used to store partial results, as sketched below.
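
A sketch of how joblib.Memory could cache per-sphere scores so a crashed run resumes from disk; the function name and its arguments are illustrative:

from joblib import Memory

memory = Memory(location="/tmp/searchlight_cache", verbose=0)


@memory.cache
def score_sphere(center, ds_hash):
    """Score the sphere centred at `center` (placeholder body).

    ds_hash keeps the cache keyed to a specific dataset.
    """
    # A real implementation would slice the dataset and cross-validate here.
    return 0.0

# On a re-run, spheres already computed are read back from disk
# instead of being recomputed.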
