ramp-kits / autism
Data Challenge on Autism Spectrum Disorder detection
Home Page: https://paris-saclay-cds.github.io/autism_challenge/
Typical neuroimaging pipelines work from images. Hence it would be interesting to turn the extracted signals into images.
Will the top 10 submissions or the top 10 teams win the prizes?
Hi
In the anatomical features for parts of the brain in the LH and RH, I can spot the area and thickness measures (on running the starter kit), but it is not clear to me which labels correspond to the volumes of each of these regions.
The thickness and area labels are named like anatomy_lh_thickness and anatomy_lh_area for lh and rh. I can see some volume measures containing the 'Vol' substring, and there are others called 'holes'. Other than the area and thickness measures, the volume measures seem fewer in number (FreeSurfer says there are 40 subcortical regions). Can someone please explain how to interpret the volume and holes measures?
On the webpage, ramp_test_workflow is mentioned. However, I am not sure whether executing this workflow means running the Jupyter notebook, executing ramp_test_submission at the end of the notebook, or executing something else that would be found in the Ramp-Workflow repository.
Hi, I've signed up to ramp.studio but when I click on the event link on https://ramp.studio/problems/autism, it says "no event named "autism""
Some submissions may need specific dependencies. What is the proper way to make sure that they are met?
Hi, is it against the rules to use information from other datasets, for example, to help with feature selection?
Add the gcn package to the standard environment:
gcn: https://github.com/parisots/gcn
and the dependencies of this package:
tensorflow (>0.12)
networkx
joblib
Hi,
I was wondering if you could provide a couple more files so one can work with the cortical surfaces (*h.white, *h.pial, *h.sphere, *h.sphere.reg and talairach.xfm), as for now features describing the shape of the brain cannot be incorporated into the model, and it has previously been shown that this is relevant in autism.
Thanks a lot,
Pierre
Since I am more efficient working in R than in Python, can I implement my algorithms in R/RStudio (including replicating the starter kit)? Please advise.
It is said in the doc that "For each subject, we preliminary extracted signals using different brain parcellations and atlases and accounting for motion correction and confounds.", but while going through the script preprocessing/extract_time_series.py, it looks like the saved .csv files are obtained without confound regression:
https://github.com/ramp-kits/autism/blob/master/preprocessing/extract_time_series.py#L233
Am I misunderstanding something?
Dear all,
First of all, thank you for your efforts and publishing the dataset.
I am just wondering whether or not the WM/CSF signals are regressed out. It's pretty common practice for resting-state fMRI, if I'm not mistaken.
Thanks,
Makis
I'm using Arch Linux with miniconda from the AUR: how many gigabytes did you need to build this environment?
When I did some feature selection in FeatureExtractor() or Classifier() based on some relationship between X and y, I realized that the different X_split and y_split in each CV fold make the selected features different for each fold. This gives a different AUC. So I was thinking: would it be better to run FeatureExtractor() outside of cross-validation?
Something like:
features = FeatureExtractor()
X_new = features.fit_transform(X, y)
cross_validate(Classifier(), X_new, y)
I'm not a hundred percent sure about this. Maybe if the feature selection is stable enough, the variance from this could be ignored.
When I tried to build FC maps with motion regressed out and compute graph measures for the FC, it took 10 minutes to finish the feature extraction. If FeatureExtractor() were outside the cross-validation process, it would save a lot of time. This is just a small issue, as I gave up building the FC myself and calculating the graphs.
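For comparison, here is a minimal sketch of keeping the selection step inside cross-validation, using SelectKBest and LogisticRegression as stand-ins for the challenge's FeatureExtractor and Classifier (which this code does not reproduce). Because the selector sits inside the Pipeline, it is refit on each training split, so the test fold never leaks into the feature selection — at the cost of re-running the extraction per fold:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

# Toy data standing in for the challenge features.
X, y = make_classification(n_samples=100, n_features=50, random_state=0)

# Selection + classification chained, so selection is redone per CV fold.
pipe = make_pipeline(SelectKBest(f_classif, k=10),
                     LogisticRegression(solver="liblinear"))
results = cross_validate(pipe, X, y, cv=5, scoring="roc_auc")
print(results["test_score"].mean())
```

Fitting the selector once on all of (X, y) before cross_validate, as in the snippet above, lets the held-out folds influence the selection and typically inflates the AUC.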
Hello,
Thank you for organizing this challenge in the first place; it has been a great dataset to use. I know that in issue #36 you said you would not release additional metadata while the challenge was running, but I was wondering if you would be able to now that the challenge has closed. I know you have made the TR available already. Specifically, I was wondering if you could make the following parameters available:
- TE for the different imaging sites
- Were eyes open or closed, and were blink artifacts removed?
- What voxel sizes were used in the structural imaging?
- What was the flip angle of the imaging sequence?
Thank you again,
-Cooper Mellema
Hi
I was wondering whether you can give more details about the correspondence between time series and brain regions for a given atlas.
As far as I know, I can get a DataFrame object by using pandas to load a subject's fMRI data (*.csv files). I think the columns are time series extracted from brain regions; is that right? Besides, given a specific time series, how do I get the corresponding anatomical label?
I am considering using prior knowledge to do feature selection. If you can answer these questions, I will be very grateful. Thank you. @GaelVaroquaux @glemaitre
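Assuming the column order in each subject's .csv follows the region order of the atlas (which is what the question is asking the organizers to confirm), column i would map to the i-th atlas label. A hypothetical sketch with made-up labels and toy data standing in for one subject's signals:

```python
import pandas as pd

# Stand-in labels; for a real atlas you would take them from the
# atlas metadata (e.g. the labels shipped with nilearn's atlases).
atlas_labels = ["region_%d" % i for i in range(3)]

# Toy time series standing in for one subject's extracted signals:
# rows are time points, columns are regions.
ts = pd.DataFrame([[0.1, 0.2, 0.3],
                   [0.4, 0.5, 0.6]])
ts.columns = atlas_labels  # column i -> i-th atlas region
print(ts.columns.tolist())
```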
Dear all,
first of all thanks for organizing the challenge, it's a blast getting started with this :)
I was looking into creating features from the resting-state data (not necessarily based on functional connectivity) and saw that the time series are of different lengths, which is definitely understandable in a clinical setting with different sites.
However, now I am wondering whether the data were acquired using the same BOLD sequences, or whether there is some information on acquisition parameters.
Especially the TR, which would be necessary/useful for applying some temporal filtering on the data or using temporal features.
Best,
Simon
(I hope I just didn't overlook the info somewhere...)
We should make download_data.py more than a script: create a function fetch_fmri_time_series(atlas='all') which can be called in __main__ to be executed.
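A minimal sketch of the proposed refactor, with the download step left as a comment and the atlas names only illustrative (the real list is whatever the script currently handles):

```python
# Illustrative atlas names; the actual list lives in download_data.py.
ATLASES = ['basc064', 'basc122', 'basc197', 'craddock_scorr_mean',
           'harvard_oxford_cort_prob_2mm', 'msdl', 'power_2011']


def fetch_fmri_time_series(atlas='all'):
    """Fetch the time series for one atlas, or all of them."""
    atlases = ATLASES if atlas == 'all' else [atlas]
    for name in atlases:
        # _download_fmri_data(name)  # actual download, as in the script
        pass
    return atlases


if __name__ == '__main__':
    fetch_fmri_time_series(atlas='all')
```

This keeps `python download_data.py` working unchanged while letting the starting kit (or a notebook) import and call `fetch_fmri_time_series('basc122')` directly.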
How should we deal with the best_submission
branch?
Should we keep it as an independent branch and merge master into it or merge it to the master branch?
Also, two of the best submissions now fail because of changes to the scikit-learn LogisticRegression classifier. Should we freeze the scikit-learn version in order to maintain compatibility?
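If we do freeze it, one way would be to pin the version in environment.yml; the exact version number below is an assumption and should be whichever release the submissions were validated against:

```yaml
# environment.yml (fragment) -- pin scikit-learn so old submissions keep running
dependencies:
  - scikit-learn=0.19.2
```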
Hi,
I just wanted to ask what will happen to these data after the challenge? Also, is it possible to use these data for other purposes? For example, suppose one comes up with some novel idea during the challenge and would like to publish that (in collaboration with the owners of the data). Would it be possible?
Br,
Tuomas
I think that we want to avoid mail as much as possible and direct people to ask questions on the github issues.
Hence the last section of the notebook should be changed.
I am not doing it for fear of generating conflicts because of different Jupyter versions.
The data shared is a set of signals extracted on brain parcellations. However, it might be interesting to work from raw MR images. Are these available?
Can somebody with a runnable virtual environment list the version of Python and the versions of all the dependencies? I have had many issues trying to find the right versions of all the packages. Two years of updates did the damage.
We need to update the public and private data using this atlas.
Now that the competition is over, is the source code available, just to reproduce the results and see how it works?
Please update the h5py module on the server. I get this "training_error" and I don't know what the reason is; it works in the local test. I hope to solve it with an update of the h5py module. Can you help me solve it, or suggest some other solutions?
Thanks for posting such an interesting challenge. I am currently working as a Ph.D. student in the Digital Health group (https://hpi.de//boettinger/home.html). I was wondering whether the results we obtain during the whole process (insights + analysis + results) can be published in collaboration with the people who originally collected and put together the data.
Looking forward to your reply.
I am thinking of using a DL framework, but I am wondering whether such a framework is allowed or installed on the server.
python download_data.py basc122
Downloading the data from https://storage.ramp.studio/autism/basc122.zip ...
Traceback (most recent call last):
File "download_data.py", line 170, in <module>
fetch_fmri_time_series(atlas)
File "download_data.py", line 153, in fetch_fmri_time_series
_check_integrity_atlas(atlas)
File "download_data.py", line 106, in _check_integrity_atlas
_download_fmri_data(atlas)
File "download_data.py", line 77, in _download_fmri_data
_check_and_unzip(output_file, atlas, atlas_directory)
File "download_data.py", line 62, in _check_and_unzip
raise IOError('The file downloaded was corrupted. Try again '
Is it possible to have a correspondence dictionary between the .csv columns and ROI localizations? I.e., column 1 is the mean value for an ROI in an area of the prefrontal cortex. Probably the original atlas (or the version inside nilearn) has this information, but it would be nice to have the .nii.gz file to visually exploit spatial localization.
This could be useful to perform seed analyses, where the seed localization is important. I saw your response in #7 but don't know whether it would work (using nilearn's to_filename function). My concern is whether the ROI with value 1 belongs to column 1 of the .csv file.
Thanks for all the help!
This is an issue with the RAMP website. There is a problem called autism, but no event.
Clicking on the link:
leads to:
Based on the instructions, my understanding is that we have to provide the two basic functions, FeatureExtractor() and Classifier(). I would like to access the whole dataset and exclude some of the samples, so afterwards I would have to exclude their corresponding labels as well. I can exclude the data based on the condition each time FeatureExtractor is called, but I can't do the same for the labels through it. So my question is whether all the commands will be executed before FeatureExtractor is called (because that would solve my problem) or not.
Jupyter is mentioned in README.md but not in requirements.txt.
Hi
Can you please tell us where we can find some of the best solutions submitted for this challenge? Can you please post the winning code and approaches?
Hi,
I was just wondering whether there is somewhere a more detailed and structured description of the data features? For example, what is participants_site?
Thanks
Hello,
Would it be possible to have more information about the "motions" measures? Indeed, they don't seem to come from a parcellation technique as described in the project documentation.
Regards,
Vincent
Hi,
I am working on the competition data after attending the challenge. May I ask a quick question regarding preprocessing? Would you mind specifying what kind of preprocessing you did for the rs-fMRI and T1 data (especially for rs-fMRI)? For example, specific parameters for slice timing, realignment, registration, smoothing, frequency band, etc.
Looking forward to hearing from you. Thanks a lot!
Best,
Jongwoo Choi
Hi,
I am trying to run the starting kit but got the following error from the evaluation function:
results = evaluation(data_train, labels_train)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-21-1247bf8432df> in <module>()
----> 1 results = evaluation(data_train, labels_train)
2
3 print("Training score ROC-AUC: {:.3f} +- {:.3f}".format(np.mean(results['train_roc_auc']),
4 np.std(results['train_roc_auc'])))
5 print("Validation score ROC-AUC: {:.3f} +- {:.3f} \n".format(np.mean(results['test_roc_auc']),
<ipython-input-17-e7b8911b304f> in evaluation(X, y)
8 results = cross_validate(pipe, X, y, scoring=['roc_auc', 'accuracy'], cv=cv,
9 verbose=1, return_train_score=True,
---> 10 n_jobs=1)
11
12 return results
/home/salma/anaconda2/lib/python2.7/site-packages/sklearn/model_selection/_validation.pyc in cross_validate(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, return_train_score)
204 fit_params, return_train_score=return_train_score,
205 return_times=True)
--> 206 for train, test in cv.split(X, y, groups))
207
208 if return_train_score:
/home/salma/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
777 # was dispatched. In particular this covers the edge
778 # case of Parallel used with an exhausted iterator.
--> 779 while self.dispatch_one_batch(iterator):
780 self._iterating = True
781 else:
/home/salma/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch_one_batch(self, iterator)
623 return False
624 else:
--> 625 self._dispatch(tasks)
626 return True
627
/home/salma/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in _dispatch(self, batch)
586 dispatch_timestamp = time.time()
587 cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588 job = self._backend.apply_async(batch, callback=cb)
589 self._jobs.append(job)
590
/home/salma/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.pyc in apply_async(self, func, callback)
109 def apply_async(self, func, callback=None):
110 """Schedule a func to be run"""
--> 111 result = ImmediateResult(func)
112 if callback:
113 callback(result)
/home/salma/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.pyc in __init__(self, batch)
330 # Don't delay the application, to avoid keeping the input
331 # arguments in memory
--> 332 self.results = batch()
333
334 def get(self):
/home/salma/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
132
133 def __len__(self):
/home/salma/anaconda2/lib/python2.7/site-packages/sklearn/model_selection/_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, error_score)
456 estimator.fit(X_train, **fit_params)
457 else:
--> 458 estimator.fit(X_train, y_train, **fit_params)
459
460 except Exception as e:
/home/salma/anaconda2/lib/python2.7/site-packages/sklearn/pipeline.pyc in fit(self, X, y, **fit_params)
246 This estimator
247 """
--> 248 Xt, fit_params = self._fit(X, y, **fit_params)
249 if self._final_estimator is not None:
250 self._final_estimator.fit(Xt, y, **fit_params)
/home/salma/anaconda2/lib/python2.7/site-packages/sklearn/pipeline.pyc in _fit(self, X, y, **fit_params)
211 Xt, fitted_transformer = fit_transform_one_cached(
212 cloned_transformer, None, Xt, y,
--> 213 **fit_params_steps[name])
214 # Replace the transformer of the step with the fitted
215 # transformer. This is necessary when loading the transformer
/home/salma/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/memory.pyc in __call__(self, *args, **kwargs)
360
361 def __call__(self, *args, **kwargs):
--> 362 return self.func(*args, **kwargs)
363
364 def call_and_shelve(self, *args, **kwargs):
/home/salma/anaconda2/lib/python2.7/site-packages/sklearn/pipeline.pyc in _fit_transform_one(transformer, weight, X, y, **fit_params)
579 **fit_params):
580 if hasattr(transformer, 'fit_transform'):
--> 581 res = transformer.fit_transform(X, y, **fit_params)
582 else:
583 res = transformer.fit(X, y, **fit_params).transform(X)
/home/salma/anaconda2/lib/python2.7/site-packages/sklearn/base.pyc in fit_transform(self, X, y, **fit_params)
518 else:
519 # fit method of arity 2 (supervised transformation)
--> 520 return self.fit(X, y, **fit_params).transform(X)
521
522
<ipython-input-18-6d322cc43f0d> in transform(self, X_df)
10 # get only the anatomical information
11 X = X_df[[col for col in X_df.columns if col.startswith('anatomy')]]
---> 12 return X.drop(columns='anatomy_select')
TypeError: drop() got an unexpected keyword argument 'columns'
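The `columns=` keyword of DataFrame.drop was only added in pandas 0.21, so this TypeError comes from running the starter kit with an older pandas. A version-agnostic sketch of the transform (the column names here mirror the starter kit; the toy DataFrame is made up for illustration):

```python
import pandas as pd

def transform(X_df):
    # Keep only the anatomical columns, as in the starter kit.
    X = X_df[[col for col in X_df.columns if col.startswith('anatomy')]]
    # 'axis=1' works on pandas both before and after 0.21, unlike the
    # 'columns=' keyword that triggers the TypeError above.
    return X.drop('anatomy_select', axis=1)

# Toy frame with two anatomy columns and one non-anatomy column.
df = pd.DataFrame({'anatomy_select': [1],
                   'anatomy_lh_area': [2.0],
                   'participants_site': [3]})
print(transform(df).columns.tolist())
```

Alternatively, upgrading pandas to >= 0.21 makes the original `drop(columns='anatomy_select')` call work as written.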
Does the private dataset have new sites, or do all of them have at least one sample in the available data?
Hi,
I used conda env create -f environment.yml to install the packages, then source activate autism. ramp_test_submission works great, but ramp_test_notebook gives me the following error:
> ----------------------------
> Testing if the notebook can be converted to html
> Testing if the notebook can be executed
> Traceback (most recent call last):
> File "/home/luis/anaconda3/envs/autism/lib/python3.6/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
> File "/home/luis/anaconda3/envs/autism/lib/python3.6/runpy.py", line 85, in _run_code
> exec(code, run_globals)
> File "/home/luis/anaconda3/envs/autism/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
> app.launch_new_instance()
> File "/home/luis/anaconda3/envs/autism/lib/python3.6/site-packages/traitlets/config/application.py", line 657, in launch_instance
> app.initialize(argv)
> File "<decorator-gen-123>", line 2, in initialize
> File "/home/luis/anaconda3/envs/autism/lib/python3.6/site-packages/traitlets/config/application.py", line 87, in catch_config_error
> return method(app, *args, **kwargs)
> File "/home/luis/anaconda3/envs/autism/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 452, in initialize
> zmq_ioloop.install()
> File "/home/luis/.local/lib/python3.6/site-packages/zmq/eventloop/ioloop.py", line 210, in install
> assert (not ioloop.IOLoop.initialized()) or \
> AttributeError: type object 'IOLoop' has no attribute 'initialized'
Will the above statistical packages be available on the test server, or do the statistical testing tools have to be installed locally for (structural) anatomical feature analysis/extraction?