nimh-dsst / dataset-phenotypes Goto Github PK

Preparatory scripts for BIDS tabular phenotypic data in large neuroimaging datasets.

Home Page: https://bids-phenotype.readthedocs.io

License: Creative Commons Zero v1.0 Universal

Python 100.00%

bids data-dictionary dataset datasets phenotype abcd abide hbn hcp phenotypic

dataset-phenotypes's Introduction

Big neuroimaging dataset BIDS tabular phenotype tools

Preparatory scripts to output BIDS phenotypic data dictionaries and transform phenotypic data to BIDS TSVs for common neuroimaging datasets.

These tools follow the guidelines of BIDS Extension Proposal 36 (BEP036). The hope is for researchers to embrace this BIDS Phenotypic Data standard by seeing examples in their favorite datasets.

Use GitHub Issues with this repository

The GitHub Issues page found here should be used to communicate all of the following:

Asking questions of any size or difficulty
Providing feedback on any file's inaccuracies or considering changes
Suggesting or requesting features
Making new dataset requests (describe how to download the phenotypic data dictionaries and phenotypic data, and it will be added to the queue)
Proposing ideas to improve the repository
Reporting bugs in scripts or any other repository files
Corrections to README.md files, dictionary entries, or whole assessments
Style suggestions or suggested common conventions for scripts, dictionaries, or README.md files
Anything else that isn't a code contribution or a pull request

How to Pull Request

Pull Requests (PRs) are also always welcome here to initiate a review of some fix or addition or change to the repository. Please fork, make your changes, and then initiate a PR from your fork back to this repository's main branch.

Contributors ✨

Contributors are always welcome to add GitHub Issues (see section above). Thanks especially go to these wonderful people (emoji key):

Eric Earl 🐛 💻 🖋 🔣 🎨 🤔 🚧 📆 👀
Jessica Dafflon 🐛 💻 🖋 🤔 📆 👀
Josh Faskowitz 💻 🤔
RobertoFelipeSG 💻 🎨 🤔
Arshitha Basavaraj 🐛 🖋 🤔
Dustin Moraczewski 🔣
Adam Thomas 🖋 🤔 📆
Francisco Pereira 🤔 📆

This project follows the all-contributors specification. Contributions of any kind are welcome!

dataset-phenotypes's Issues

Clean up the top-level README.md to summarize the results of the brainhack

I want to make a cleaner README with the GitHub Issues parts and contributions part very forward-facing.

Add brainhack DC badge to README.md top-level, NKI, HBN, and GUI

See title.

ABCD data files without corresponding data dictionaries

I found 14 tabular data files that don't have a corresponding data dictionary. See the attached file for the list. This requires further investigation and testing.
data_files_without_dictionaries.txt

Refactor NKI script into Python

It is currently written in MATLAB, so it needs to be refactored into Python to match the style of this repository.

Suggestion of columns to have on the GUI

One of the ideas is to have a GUI that follows this layout: https://docs.newrelic.com/attribute-dictionary/
In the linked example, there are three columns on the main table: "Attribute name", "Definition", and "Data types". We could have "Name" (the question), "Description" (a description of the question), and "Dependencies" (any dependency between this question and any other on the questionnaire).

Make sure all data dictionaries and data produced by this repository remain BIDS-valid and pass the BIDS validator

We may need to inject a "false" dataset_description.json and some "blank" data for the validator to run, but we should still be able to validate the phenotypic data.
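One way to inject the placeholder file is sketched below. It is only an illustration, not a script from this repository: the function name, the default dataset name, and the `BIDSVersion` value are assumptions. `Name` and `BIDSVersion` are the two fields BIDS requires in `dataset_description.json`.

```python
import json
from pathlib import Path


def write_placeholder_description(bids_root, name="Placeholder phenotype dataset",
                                  bids_version="1.8.0"):
    """Write a minimal dataset_description.json so a validator has something to read.

    The function name and both default values are hypothetical.
    """
    root = Path(bids_root)
    root.mkdir(parents=True, exist_ok=True)
    description = {
        "Name": name,                 # required by BIDS
        "BIDSVersion": bids_version,  # required by BIDS; "1.8.0" is an assumed value
    }
    path = root / "dataset_description.json"
    path.write_text(json.dumps(description, indent=2) + "\n", encoding="utf-8")
    return path
```

Whether the validator also needs "blank" imaging data next to this file is left open here, as in the issue text.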

Add BIDS TSV conversion for NKI data

See issue #12 for more details.

ABCD data dictionaries that don't have a corresponding tabular data file

A total of 120 dictionaries don't have a corresponding data file in the ABCD Release 4.0 tabular data package (see attached file). Further investigation required as to why. Maybe someone else can download the data package and run the data_convert.py script before we look into it further?

data_dictionaries_without_data_files.txt
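A mismatch list like the attached one can be reproduced with a small set comparison. The sketch below assumes that a data file and its dictionary share a base name and differ only in extension (for example `measure.tsv` and `measure.json`); the function name and default suffixes are hypothetical and may not match how the ABCD package names its files.

```python
from pathlib import Path


def unmatched_stems(data_dir, dictionary_dir,
                    data_suffix=".tsv", dictionary_suffix=".json"):
    """Return (data files without a dictionary, dictionaries without a data file).

    Files are paired by base name only, which is an assumption.
    """
    data = {p.stem for p in Path(data_dir).glob(f"*{data_suffix}")}
    dictionaries = {p.stem for p in Path(dictionary_dir).glob(f"*{dictionary_suffix}")}
    return sorted(data - dictionaries), sorted(dictionaries - data)
```

Both directions come out of one call, so the same check covers data files lacking dictionaries and dictionaries lacking data files.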

Add HBN data dictionary generation script

Inputs

The HBN README covers how to download the HBN Data Dictionaries/ folder into this repository's HBN/ subfolder.

Task

The real GitHub Issue here is to create a dictionary.py script using a copy of the ABCD/dictionary.py as an example starter. ABCD should be used as the starter because it is the only other study right now that works with multiple data dictionaries sitting inside one folder.

Implementation

I suggest using the pandas read_excel() function: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
It looks like the first row of each Excel file is always an unnecessary title (for BIDS data dictionary JSONs anyway). But this assumption needs to be confirmed before trying to read in all the data dictionaries programmatically.
It looks like every Excel file always only has the content on Sheet 1, but that should be checked as well.
The Question looks like the BIDS Description, the Variable Name looks like the ShortName, the Variable Type and Values look unnecessary, but could be included if desired, and the Value Labels will have to be parsed into BIDS Levels usually.
If you're unsure about what should go where, refer to the BIDS Tabular Files reference here: https://bids-specification.readthedocs.io/en/stable/02-common-principles.html#tabular-files
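The mapping described above could look roughly like the sketch below. It is not the repository's `dictionary.py`. The column names come from the issue text, but the `0=No, 1=Yes` format of Value Labels, the title row, and the single sheet are all unconfirmed assumptions, and every function name is hypothetical.

```python
def parse_value_labels(text, item_sep=",", pair_sep="="):
    """Turn a string such as '0=No, 1=Yes' into {'0': 'No', '1': 'Yes'}.

    Both separators are assumptions; the real HBN files may use others.
    """
    levels = {}
    for item in str(text).split(item_sep):
        if pair_sep not in item:
            continue  # skip fragments that are not value=label pairs
        value, label = item.split(pair_sep, 1)
        levels[value.strip()] = label.strip()
    return levels


def rows_to_bids_dictionary(rows):
    """Build a BIDS-style data dictionary from row dicts with HBN column names."""
    dictionary = {}
    for row in rows:
        name = str(row.get("Variable Name") or "").strip()
        if not name:
            continue  # rows without a variable name cannot become a column entry
        entry = {"Description": str(row.get("Question") or "").strip()}
        levels = parse_value_labels(row.get("Value Labels") or "")
        if levels:
            entry["Levels"] = levels
        dictionary[name] = entry
    return dictionary


def load_rows(excel_path):
    """Read the first sheet and skip the title row (both unconfirmed assumptions)."""
    import pandas as pd  # imported lazily; reading .xlsx files also needs openpyxl

    frame = pd.read_excel(excel_path, sheet_name=0, header=1, dtype=str)
    return frame.fillna("").to_dict(orient="records")
```

Keeping the Excel loading separate from the conversion means the parsing rules can be checked on a few hand-written rows before any real file is read.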

Draft a phenotypic data browsing, filtering, and selection GUI

Concept

A browsable, filterable, searchable GUI for selecting fields of interest from any data dictionary (or possibly all data dictionaries).

Inputs

The GUI would take as input this repository. It would read from all the available data dictionaries within <STUDY>/phenotype/ subfolders, as output by dictionary.py scripts.

Drafts

Please insert draft concepts/ideas into this Google Slides deck here:

https://docs.google.com/presentation/d/1OWgNupUz2FyKCEzoAjjDtgunaiWHmjT1ArG1asgndig/edit?usp=sharing

Implementation

I recommend PySimpleGUI: it is built in Python, and in my experience it works well and is simple and stable. Draft concepts can take any form, from a drawn mockup to a messy sketch on a napkin.

Outputs

As output, you should receive a small data dictionary subset of the selected fields of interest. You should also be able to "auto-save" progress as you click so if the window closes you don't lose a bunch of reading and selection work.
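Independent of the GUI toolkit, the output and "auto-save" behavior could be sketched as follows. This is a hypothetical illustration: the function names and the idea of calling the save after each click are assumptions, not an existing design.

```python
import json
from pathlib import Path


def subset_dictionary(dictionary, selected_fields):
    """Keep only the selected fields, in selection order; unknown names are skipped."""
    return {name: dictionary[name] for name in selected_fields if name in dictionary}


def autosave_selection(dictionary, selected_fields, path):
    """Write the current selection to disk so closing the window loses no work."""
    subset = subset_dictionary(dictionary, selected_fields)
    target = Path(path)
    temporary = target.with_name(target.name + ".tmp")
    temporary.write_text(json.dumps(subset, indent=2), encoding="utf-8")
    temporary.replace(target)  # swap the finished file into place
    return subset
```

Writing to a temporary file first and then replacing the target avoids leaving a half-written selection behind if the program stops mid-save.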

The dream

In the far-flung future, this would also integrate with BIDS TSVs to give the user a sneak preview of how the data points are distributed. For example, for a Yes/No question, how many Yes and how many No answers are in the dataset?
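A preview of that kind comes down to counting values in one TSV column. The sketch below uses only the standard library; the function name is hypothetical, and it assumes missing values are written as `n/a`, as BIDS tabular files do.

```python
import csv
from collections import Counter


def value_distribution(tsv_file, column, missing="n/a"):
    """Count each non-missing value in one column of an open TSV file."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    counts = Counter()
    for row in reader:
        value = row.get(column)
        if value is None or value == missing:
            continue  # absent column or missing value: nothing to count
        counts[value] += 1
    return counts
```

The function takes an open file object, so it can be called as `value_distribution(open(path, newline=""), "some_column")` or fed an in-memory buffer.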

Questions
Feedback
Features
Ideas
Bugs in scripts
Corrections in dictionaries
Corrections to READMEs
Style suggestions or common conventions for scripts, dictionaries, or READMEs
Anything else that isn't a code contribution or pull request

Add top-level unified CLI for all scripts

Something like:

python bids-phenotype.py $STUDY $INPUT_DIR $OUTPUT_DIR --option

Where --option could be one of --dict, --data, or --all.
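A possible shape for that command line, using `argparse` from the standard library, is sketched below. Only the three positional arguments and the three flags come from the proposal; the function name, help strings, and the choice to require exactly one flag are assumptions, and the dispatch to per-study scripts is left out.

```python
import argparse


def build_parser():
    """Build a parser for: bids-phenotype.py STUDY INPUT_DIR OUTPUT_DIR --option."""
    parser = argparse.ArgumentParser(
        prog="bids-phenotype.py",
        description="Generate BIDS phenotypic dictionaries and/or data for one study.",
    )
    parser.add_argument("study", help="study name, for example ABCD or HBN")
    parser.add_argument("input_dir", help="folder holding the downloaded source files")
    parser.add_argument("output_dir", help="folder to write BIDS outputs into")
    # Exactly one of the three flags must be given (an assumed rule).
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--dict", dest="mode", action="store_const", const="dict",
                      help="write data dictionaries only")
    mode.add_argument("--data", dest="mode", action="store_const", const="data",
                      help="write data TSVs only")
    mode.add_argument("--all", dest="mode", action="store_const", const="all",
                      help="write both dictionaries and data")
    return parser
```

With this parser, `build_parser().parse_args(["ABCD", "in", "out", "--dict"])` yields `study="ABCD"` and `mode="dict"`, which a top-level script could then use to call the matching study script.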

Add all dictionary (and data) LICENSE or USAGE_AGREEMENT info to ReadTheDocs pages

See title.
