fhirizer

License: MIT

Project overview:

Transforms and harmonizes data from Genomic Data Commons (GDC), Cellosaurus cell-lines, and International Cancer Genome Consortium (ICGC) repositories into 🔥 FHIR (Fast Healthcare Interoperability Resources) format.

[Figure: GDC study simplified FHIR graph]

Usage

Installation

  • from source
git clone <repo-url>
cd fhirizer
# create a virtual environment, e.g.:
# NOTE: the package_data folders must be on the Python path in virtual environments
python -m venv venv-fhirizer
source venv-fhirizer/bin/activate
pip install . 
  • Dockerfile
(sudo) docker build -t <tag-name>:latest .
(sudo) docker run -it  --mount type=bind,source=<path-to-input-ndjson>,target=/opt/data --rm <tag-name>:latest
  • Singularity
singularity build fhirizer.sif docker://quay.io/ohsu-comp-bio/fhirizer
singularity shell fhirizer.sif

Convert and Generate

A detailed step-by-step guide to FHIRizing a project's study data can be found in the project's directory overview.

  • GDC

    • convert GDC schema keys to FHIR mappings

    • generate FHIR object-model NDJSON files in a directory

      Example run for patients; replace the paths to the NDJSON files or directories.

    fhirizer generate --name case --out_dir ./projects/<my-project>/META --entity_path ./projects/<my-project>/cases_key.ndjson
    
    • to generate DocumentReference resources for the patients
    fhirizer generate --name file --out_dir ./projects/<my-project>/META --entity_path ./projects/<my-project>/files_key.ndjson
    
  • Cellosaurus

     fhirizer generate --name cellosaurus --out_dir ./projects/<my-project>/META --entity_path ./projects/<my-project>/<cellosaurus-celllines-ndjson>
    
  • ICGC

    • NOTE: Updates to the active site and data dictionary, moving from ICGC DCC to ICGC ARGO, are in progress.
     fhirizer generate --name icgc --icgc <ICGC_project_name> --has_files
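A quick way to check a `generate` run is to tally the resources it wrote. This is a minimal sketch, assuming each `.ndjson` file in the output directory (for example `./projects/<my-project>/META`) holds one FHIR resource per line, each carrying the `resourceType` field that FHIR JSON requires. `count_resource_types` is a hypothetical helper, not part of the fhirizer CLI.

```python
import json
from collections import Counter
from pathlib import Path


def count_resource_types(out_dir: str) -> Counter:
    """Tally FHIR resources by resourceType across every *.ndjson file in out_dir.

    Hypothetical helper: assumes one JSON-encoded FHIR resource per line.
    """
    counts: Counter = Counter()
    for path in sorted(Path(out_dir).glob("*.ndjson")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                resource = json.loads(line)
                counts[resource.get("resourceType", "<missing>")] += 1
    return counts
```

For example, `count_resource_types("./projects/<my-project>/META").most_common()` lists the resource types found, most frequent first.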
    

Constructing GDC maps: CLI commands

Initialize the structure of a project, case, or file to add maps to:

fhirizer project_init 
# to update mappings, run the associated labels script, e.g. ./labels/project.py

fhirizer case_init 
fhirizer file_init 

Testing

pytest --cov  # requires the pytest-cov plugin

fhirizer structure:

Data directories included in package data:

  • resources: data resources generated or used in mappings
  • mapping: JSON data maps produced by fhirizer's pydantic schema maps

fhirizer/
|-- fhirizer/
|   |-- __init__.py
|   |-- labels/
|   |   |-- __init__.py
|   |   |-- files.py
|   |   |-- case.py
|   |   └── project.py
|   |
|   |-- schema.py
|   |-- entity2fhir.py
|   |-- mapping.py
|   |-- utils.py
|   └── cli.py
|
|-- mapping/
|   |-- project.json
|   |-- case.json
|   └── file.json
|
|-- resources/
|   |-- gdc_resources/
|   |   |-- content_annotations/
|   |   |-- data_dictionary/
|   |   └── fields/
|   └── fhir_resources/
|
|-- tests/
|   |-- __init__.py
|   |-- unit/
|   |   |-- __init__.py
|   |   └── test_mapping.py
|   |-- integration/
|   |   |-- __init__.py
|   |   |-- test_generate.py
|   |   └── test_convert.py
|   └── fixtures/
|
|-- projects/
|   |-- GDC/
|   |     └── TCGA-STUDY/
|   |           |-- cases.ndjson
|   |           |-- files.ndjson
|   |           └── META/
|   └── ICGC/
|         └── ICGC-STUDY/
|                |-- data/
|                └── META/
|-- README.md
└── setup.py

fhirizer's Issues

resolve many-to-one and one-to-many mappings

There are currently two scenarios for this:

  • high level:
    an example is the samples_ids -> Specimen.identifier mapping, which is currently captured via the mapping type, e.g. array -> string.

  • content level:
    aside from content_annotations, some GDC fields have 1...* enum(s) associated with a single mapping key. In this case the enums key in the source currently holds a list of 1...* enum lists.
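Both scenarios can be illustrated with plain dictionaries. This is a hypothetical sketch, not fhirizer's implementation: the high-level case expands one source array (`samples_ids`) into one Specimen per value, with each value becoming a string in `Specimen.identifier`; the content-level case flattens a list of 1...* enum lists into one de-duplicated list. The function names and the `{"value": ...}` identifier shape (FHIR `Identifier.value`) are illustrative assumptions.

```python
from __future__ import annotations

from typing import Any


def expand_sample_ids(case: dict[str, Any]) -> list[dict[str, Any]]:
    """One-to-many: turn a case's samples_ids array into one Specimen per id.

    Hypothetical sketch of an "array -> string" mapping: each array element
    becomes the string value of that Specimen's identifier.
    """
    return [
        {"resourceType": "Specimen", "identifier": [{"value": sample_id}]}
        for sample_id in case.get("samples_ids", [])
    ]


def flatten_enums(enum_lists: list[list[str]]) -> list[str]:
    """Content level: merge a list of 1...* enum lists, keeping first-seen order."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for enums in enum_lists:
        for value in enums:
            seen.setdefault(value, None)
    return list(seen)
```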

FHIR functions

1 - fetch the schema from FHIR, e.g. Coding.schema()['properties']['code']['description']; a regex function is needed to clean the schema() output
2 - add a function to extract element_required for all required destination keys and add those keys to "destination_key_required": []
3 - pull out the FHIR hierarchy from enum_reference_types, e.g.
{'title': 'Part of larger study', 'description': 'A larger research study of which this particular study is a component or step.', 'element_property': True, 'enum_reference_types': ['ResearchStudy'], 'type': 'array', 'items': {'type': 'Reference'}}
4 - map node relations based on uppercase class names, e.g. Patient.Extension.valueInteger extension -> patient, etc.
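Items 1-3 could be sketched as plain functions over an already-fetched `schema()` dict whose `properties` entries look like the `enum_reference_types` example in item 3. This is a hypothetical sketch: the function names are illustrative, and "cleaning" is assumed to mean collapsing stray whitespace in description strings.

```python
from __future__ import annotations

import re
from typing import Any


def clean_description(text: str) -> str:
    """Item 1 (assumed meaning): collapse newlines, tabs and repeated spaces."""
    return re.sub(r"\s+", " ", text).strip()


def required_keys(schema: dict[str, Any]) -> list[str]:
    """Item 2: names of properties flagged element_required.

    The result is what would populate "destination_key_required".
    """
    return [
        name
        for name, prop in schema.get("properties", {}).items()
        if prop.get("element_required")
    ]


def reference_targets(schema: dict[str, Any]) -> dict[str, list[str]]:
    """Item 3: map each Reference-typed property to the resource types it may point to."""
    return {
        name: prop["enum_reference_types"]
        for name, prop in schema.get("properties", {}).items()
        if prop.get("enum_reference_types")
    }
```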

update body-site code for http://snomed.info/sct

  • Other and unspecified major salivary glands: 126787005
  • Retroperitoneum and peritoneum: 236019001
  • Peripheral nerves and autonomic nervous system: 188321006
  • Bones, joints and articular cartilage of limbs: 126655004
  • Nasal cavity and middle ear: 187828007
  • Other and ill-defined sites: 74964007
  • Unknown/Not reported: 54690008
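A minimal lookup for these codes might build a FHIR `CodeableConcept` (a `coding` list of `system`/`code` pairs plus free `text`). This sketch assumes site labels arrive exactly as written in the list above; `BODY_SITE_CODES` and `body_site_concept` are hypothetical names, and an unmapped label keeps only `text`. The SNOMED CT `display` is left out because the site labels are not SNOMED display terms.

```python
from __future__ import annotations

SNOMED_SYSTEM = "http://snomed.info/sct"

# Site label -> SNOMED CT code, copied from the body-site list.
BODY_SITE_CODES: dict[str, str] = {
    "Other and unspecified major salivary glands": "126787005",
    "Retroperitoneum and peritoneum": "236019001",
    "Peripheral nerves and autonomic nervous system": "188321006",
    "Bones, joints and articular cartilage of limbs": "126655004",
    "Nasal cavity and middle ear": "187828007",
    "Other and ill-defined sites": "74964007",
    "Unknown/Not reported": "54690008",
}


def body_site_concept(site: str) -> dict:
    """Build a CodeableConcept dict; add a SNOMED coding only when the label is known."""
    concept: dict = {"text": site}
    code = BODY_SITE_CODES.get(site)
    if code is not None:
        concept["coding"] = [{"system": SNOMED_SYSTEM, "code": code}]
    return concept
```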

add testing

high-level existence and quantity testing:

  1. does the data exist?
  2. if it does, how many source (GDC) keys mapped successfully?
  3. how many destination keys were successfully mapped?
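These three checks could be expressed as a small report function plus a pytest test. The sketch assumes a flattened source record and a simple source-key to destination-key dict; fhirizer's actual `mapping/*.json` layout may differ, so `mapping_report` and the example keys are hypothetical.

```python
from __future__ import annotations

from typing import Any


def mapping_report(record: dict[str, Any], mapping: dict[str, str]) -> dict[str, Any]:
    """Summarize how much of one flattened source record a key map covers.

    record:  hypothetical flattened GDC record, e.g. {"case.submitter_id": "TCGA-01"}
    mapping: hypothetical source-key -> destination-key map,
             e.g. {"case.submitter_id": "Patient.identifier"}
    """
    # 1. existence: treat None, empty strings and empty lists as missing data
    present = {key for key, value in record.items() if value not in (None, "", [])}
    # 2. source keys that have data and a mapping
    mapped_source = present & set(mapping)
    return {
        "has_data": bool(present),
        "source_keys_mapped": len(mapped_source),
        # 3. distinct destination keys reached (many-to-one maps count once)
        "destination_keys_mapped": len({mapping[key] for key in mapped_source}),
    }


def test_case_mapping_report():
    report = mapping_report(
        {"case.submitter_id": "TCGA-01", "case.disease_type": "", "case.extra": 1},
        {"case.submitter_id": "Patient.identifier", "case.disease_type": "Condition.code"},
    )
    assert report == {"has_data": True, "source_keys_mapped": 1, "destination_keys_mapped": 1}
```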
