
dataset-phenotypes's Introduction


Big neuroimaging dataset BIDS tabular phenotype tools

Preparatory scripts to output BIDS phenotypic data dictionaries and transform phenotypic data to BIDS TSVs for common neuroimaging datasets.

These tools follow the guidelines of BIDS Extension Proposal 36 (BEP036). The hope is for researchers to embrace this BIDS Phenotypic Data standard by seeing examples in their favorite datasets.

Use GitHub Issues with this repository

Use this repository's GitHub Issues page to communicate all of the following:

  1. Asking questions of any size or difficulty
  2. Providing feedback on any file's inaccuracies or considering changes
  3. Suggesting or requesting features
  4. Making new dataset requests (describe how to download the phenotypic data dictionaries and phenotypic data, and the dataset will be added to the queue)
  5. Proposing ideas to improve the repository
  6. Reporting bugs in scripts or any other repository files
  7. Correcting README.md files, dictionary entries, or whole assessments
  8. Suggesting style changes or common conventions for scripts, dictionaries, or README.md files
  9. Anything else that isn't a code contribution or a pull request

How to Pull Request

Pull Requests (PRs) are always welcome as a way to start a review of a fix, addition, or change to the repository. Please fork the repository, make your changes, and then open a PR from your fork against this repository's main branch.

Contributors ✨

Contributors are always welcome to add GitHub Issues (see section above). Thanks especially go to these wonderful people (emoji key):

  • Eric Earl: 🐛 💻 🖋 🔣 🎨 🤔 🚧 📆 👀
  • Jessica Dafflon: 🐛 💻 🖋 🤔 📆 👀
  • Josh Faskowitz: 💻 🤔
  • RobertoFelipeSG: 💻 🎨 🤔
  • Arshitha Basavaraj: 🐛 🖋 🤔
  • Dustin Moraczewski: 🔣
  • Adam Thomas: 🖋 🤔 📆
  • Francisco Pereira: 🤔 📆

This project follows the all-contributors specification. Contributions of any kind are welcome!


dataset-phenotypes's Issues

Suggestion of columns to have on the GUI

One of the ideas is to have a GUI that follows this layout: https://docs.newrelic.com/attribute-dictionary/
In the linked example, the main table has three columns: "Attribute name", "Definition", and "Data types". We could have "Name" (the question), "Description" (a description of the question), and "Dependencies" (any dependency between this question and any other on the questionnaire).

Add HBN data dictionary generation script

Inputs

The HBN README covers how to download the HBN Data Dictionaries/ folder into this repository's HBN/ subfolder.

Task

The real GitHub Issue here is to create a dictionary.py script using a copy of the ABCD/dictionary.py as an example starter. ABCD should be used as the starter because it is the only other study right now that works with multiple data dictionaries sitting inside one folder.

Implementation

  1. I suggest using the pandas read_excel() function: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
  2. It looks like the first row of each Excel file is always an unnecessary title (for BIDS data dictionary JSONs anyway). But this assumption needs to be confirmed before trying to read in all the data dictionaries programmatically.
  3. It looks like every Excel file always only has the content on Sheet 1, but that should be checked as well.
  4. The Question looks like the BIDS Description and the Variable Name looks like the ShortName. The Variable Type and Values look unnecessary but could be included if desired. The Value Labels will usually have to be parsed into BIDS Levels.
  5. If you're unsure about what should go where, refer to the BIDS Tabular Files reference here: https://bids-specification.readthedocs.io/en/stable/02-common-principles.html#tabular-files
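
The mapping in step 4 could be sketched roughly as follows. This is only an illustration under stated assumptions: it guesses that Value Labels are written as comma-separated value=label pairs (not confirmed for the HBN files), it uses the Variable Name as the JSON key for each column, and parse_value_labels and row_to_bids_entry are hypothetical helper names. Rows are plain dicts with made-up content so the sketch runs without any Excel file; in practice they could come from pandas read_excel() once the title-row and single-sheet assumptions in steps 2 and 3 are confirmed.

```python
import json


def parse_value_labels(text):
    """Parse 'value=label' pairs into a BIDS Levels dict.

    Assumption: pairs are comma-separated, e.g. '0=No, 1=Yes'.
    """
    levels = {}
    for pair in text.split(","):
        if "=" not in pair:
            continue  # skip fragments that do not look like value=label
        value, label = pair.split("=", 1)
        levels[value.strip()] = label.strip()
    return levels


def row_to_bids_entry(row):
    """Turn one data dictionary row (a dict) into (column name, BIDS entry)."""
    entry = {"Description": str(row["Question"]).strip()}
    levels = parse_value_labels(str(row.get("Value Labels") or ""))
    if levels:
        entry["Levels"] = levels
    return str(row["Variable Name"]).strip(), entry


# Made-up example rows, not taken from any real HBN data dictionary.
rows = [
    {"Question": "Do you like school?", "Variable Name": "school_01",
     "Value Labels": "0=No, 1=Yes"},
    {"Question": "Age in years", "Variable Name": "age", "Value Labels": None},
]
dictionary = dict(row_to_bids_entry(row) for row in rows)
print(json.dumps(dictionary, indent=2))
```

Labels that themselves contain commas or equals signs would need a more careful parser than this one.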

Draft a phenotypic data browsing, filtering, and selection GUI

Concept

A browsable, filterable, searchable GUI for selecting fields of interest from any data dictionary (or possibly all data dictionaries).

Inputs

The GUI would take as input this repository. It would read from all the available data dictionaries within <STUDY>/phenotype/ subfolders, as output by dictionary.py scripts.
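
As a rough sketch of that input step, the dictionaries could be gathered with a glob over the repository. It assumes the dictionary.py outputs are JSON files sitting directly inside each <STUDY>/phenotype/ folder; load_dictionaries is a hypothetical name, and the demonstration folder and its contents are made up.

```python
import json
import tempfile
from pathlib import Path


def load_dictionaries(repo_root):
    """Collect every <STUDY>/phenotype/*.json file under repo_root.

    Returns {study: {file stem: parsed JSON}}.
    """
    found = {}
    for path in sorted(Path(repo_root).glob("*/phenotype/*.json")):
        study = path.parent.parent.name
        with path.open(encoding="utf-8") as handle:
            found.setdefault(study, {})[path.stem] = json.load(handle)
    return found


# Demonstration on a throwaway folder holding one made-up dictionary.
with tempfile.TemporaryDirectory() as tmp:
    phenotype_dir = Path(tmp) / "HBN" / "phenotype"
    phenotype_dir.mkdir(parents=True)
    (phenotype_dir / "example.json").write_text(
        json.dumps({"age": {"Description": "Age in years"}}), encoding="utf-8"
    )
    dictionaries = load_dictionaries(tmp)

print(dictionaries)
```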

Drafts

Please insert draft concepts/ideas into this Google Slides deck here:

https://docs.google.com/presentation/d/1OWgNupUz2FyKCEzoAjjDtgunaiWHmjT1ArG1asgndig/edit?usp=sharing

Implementation

I recommend PySimpleGUI: it is written in Python, and I know it to be simple and stable. Concepts can be drawn, mocked up, or even sketched roughly on a napkin.

Outputs

As output, you should receive a small data dictionary subset of the selected fields of interest. You should also be able to "auto-save" progress as you click so if the window closes you don't lose a bunch of reading and selection work.

The dream

In the far-flung future, this would also integrate with BIDS TSVs to give the user a sneak preview of the distribution of data points. For example, for a Yes/No question, how many Yes and how many No responses are in the dataset?
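
The counting behind such a preview might look like the sketch below. It reads a TSV with the standard library and tallies one column; count_responses and the likes_school column are hypothetical, and the TSV content is made up.

```python
import csv
import io
from collections import Counter


def count_responses(tsv_file, column):
    """Count how often each value appears in one column of a TSV."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(row[column] for row in reader)


# Made-up TSV content; "likes_school" is a hypothetical column name.
example = io.StringIO(
    "participant_id\tlikes_school\n"
    "sub-01\tYes\n"
    "sub-02\tNo\n"
    "sub-03\tYes\n"
)
counts = count_responses(example, "likes_school")
print(counts)
```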

Add BIDS participant_id to all TSVs and JSONs

The rule should just be: take whatever unique subject ID the study uses for each subject, strip it down to alphanumeric characters only, and use the result as the label after sub- in the participant_id column.
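
A minimal sketch of that rule, assuming "alphanumeric" means ASCII letters and digits. The function name to_participant_id and the example ID are made up for illustration.

```python
import re


def to_participant_id(study_id):
    """Strip non-alphanumeric characters and prefix the result with 'sub-'."""
    label = re.sub(r"[^A-Za-z0-9]", "", str(study_id))
    if not label:
        raise ValueError(f"No alphanumeric characters in {study_id!r}")
    return f"sub-{label}"


participant_id = to_participant_id("ABC_123-x")
print(participant_id)
```

Stripping characters can make two distinct study IDs collide (for example "A_1" and "A1"), so a real script would probably want to check the converted IDs for duplicates.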

Do deeper validation of expected files per study

Something more than just the presence or absence of files. Look for them by exact name or regular expression pattern and check them for expected inputs (a header with commas for CSVs, etc.).
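
One way this could look, as a sketch only: match file names against a regular expression, then check that the first line of each CSV parses as a header with more than one column. The helper names, the file name pattern, and the two-column minimum are all assumptions made for the example, and the files are made up.

```python
import csv
import re
import tempfile
from pathlib import Path


def find_matching(folder, pattern):
    """Return files in folder whose names fully match a regular expression."""
    regex = re.compile(pattern)
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and regex.fullmatch(p.name)
    )


def has_csv_header(path, min_columns=2):
    """Check that the first line parses as a comma-separated header."""
    with open(path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return len(header) >= min_columns and all(name.strip() for name in header)


# Demonstration with two made-up files in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "data_2020.csv"
    good.write_text("subject,age\n1,10\n", encoding="utf-8")
    bad = Path(tmp) / "data_2021.csv"
    bad.write_text("just one long line without commas\n", encoding="utf-8")
    matches = find_matching(tmp, r"data_\d{4}\.csv")
    results = {path.name: has_csv_header(path) for path in matches}

print(results)
```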

Include language in the README.md about ways of using GitHub Issues

For all of the following:

  1. Questions
  2. Feedback
  3. Features
  4. Ideas
  5. Bugs in scripts
  6. Corrections in dictionaries
  7. Corrections to READMEs
  8. Style suggestions or common conventions for scripts, dictionaries, or READMEs
  9. Anything else that isn't a code contribution or pull request
