ohdsi-etl-prias's Introduction

PRIAS to OMOP v6.x

ETL scripts to convert the Prostate cancer Research International: Active Surveillance (PRIAS) datasets to a PIONEER modification of OMOP CDM v6.x.

Mapping Document

The mapping document can be found here.

Dependencies

  • Postgres (9.5+)
    • public schema
  • Python 3
  • OMOP vocabularies:
    • OMOP generated (Gender, Race, Type Concepts)
    • SNOMED
    • LOINC
    • NAACCR (note: NAACCR is not included in the Athena download by default, and should be selected manually)
    • UCUM
    • PIONEER custom vocabulary
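
Because a missing vocabulary (NAACCR in particular) only surfaces later as a failed transformation, it can help to check the loaded vocabularies before running the ETL. The following is a minimal sketch, not part of this repository: it assumes the standard OMOP CDM `vocabulary` table and the Athena vocabulary IDs listed in the code. The function name `find_missing_vocabularies` is hypothetical, the PIONEER custom vocabulary and the Type Concept vocabularies are left out because their IDs are not stated here, and the demo uses an in-memory SQLite database purely as a stand-in for Postgres.

```python
import sqlite3

# Assumed Athena vocabulary IDs for the dependencies listed above.
# The PIONEER custom vocabulary and Type Concept vocabularies are omitted
# because their vocabulary_id values are not given in this README.
REQUIRED_VOCABULARIES = {"Gender", "Race", "SNOMED", "LOINC", "NAACCR", "UCUM"}


def find_missing_vocabularies(connection, required=REQUIRED_VOCABULARIES):
    """Return the required vocabulary IDs that are absent from the OMOP
    `vocabulary` table. Works with any DB-API 2.0 connection."""
    cursor = connection.cursor()
    cursor.execute("SELECT vocabulary_id FROM vocabulary")
    loaded = {row[0] for row in cursor.fetchall()}
    return sorted(set(required) - loaded)


def demo():
    """Simulate an Athena download where NAACCR was not selected."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE vocabulary (vocabulary_id TEXT PRIMARY KEY)")
    connection.executemany(
        "INSERT INTO vocabulary VALUES (?)",
        [("Gender",), ("Race",), ("SNOMED",), ("LOINC",), ("UCUM",)],
    )
    return find_missing_vocabularies(connection)


if __name__ == "__main__":
    print("Missing vocabularies:", demo())
```

Against the real target database, the same function could be called with a Postgres DB-API connection in place of the SQLite one.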

Run

main.py -s <path_to_source_data> -h <host> -p <port> -d <target_db> -u <user_name> -w <password>

A log of the run will be written to logs/.log

Docker

cd ohdsi-etl-prias

docker-compose up -d will start the following containers:

  • postgresql
  • broadsea-webtools
  • broadsea-methods-library
  • jupyter
  • etl

To view the progress of the database setup and etl, view the logs:

  • To check the postgres database: docker-compose logs -f postgresql
  • To check the ETL: docker-compose logs -f etl

To run the ETL again: docker-compose up -d --build etl, then check the ETL logs.

Target

The resulting OMOP CDM is written to the public schema.

Updating Docker image

To download the newest image, run docker-compose down -v followed by docker-compose up -d

Note that this removes and recreates all the OMOP vocabularies and will take a while to complete. Follow the progress of the postgres container with docker-compose logs -f postgresql.

If docker-compose down -v does not work properly, the newest image can also be pulled directly: docker pull thehyve/ohdsi_postgresql

ohdsi-etl-prias's People

Contributors

anne0507, k1hyve, maximmoinat, sofiamp, spayralbe

Forkers

lenamax2355

ohdsi-etl-prias's Issues

Update unit tests

Some of the current RiaH unit tests are out of sync with the transformation logic.

Add database setup in ETL

Unlike the erspc approach, Docker will not do the setup.
The ETL script should create the target schema, load the vocabularies, add the source daimon and run Achilles.

Using NAACCR

I see we use some mappings to NAACCR concepts, e.g. 35917476. This vocabulary is not by default included in a download from Athena (needs to be selected manually).

Is the use of NAACCR vocabularies documented somewhere? (both here and in the ohdsi-omop-pioneer repo).

Episode event domain lookup; better error if concept not found

The error below is thrown by wrapper.get_event_field_concept_id(concept_id) when the given concept_id cannot be found in the vocabulary. This actually indicates that the vocabulary has not been loaded correctly, but the error message is very cryptic: AttributeError: 'NoneType' object has no attribute 'split'.

We should gracefully handle this exception by showing a simple warning message.
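
One possible shape for such handling is sketched below. This is an illustration under assumptions, not the repository's implementation: `get_domain_prefix` and the `lookup_domain_id` callable are hypothetical names standing in for the part of get_event_field_concept_id that resolves a concept's domain_id. Only the `domain_id.split('_')[0]` step is taken from the traceback.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def get_domain_prefix(
    concept_id: int,
    lookup_domain_id: Callable[[int], Optional[str]],
) -> Optional[str]:
    """Return the domain prefix for a concept, or None with a warning when
    the concept is not present in the vocabulary.

    `lookup_domain_id` is a hypothetical stand-in for the vocabulary query;
    it returns the concept's domain_id, or None if the concept is unknown.
    """
    domain_id = lookup_domain_id(concept_id)
    if domain_id is None:
        # Explain the likely cause instead of failing later on None.split()
        logger.warning(
            "Concept %s was not found in the vocabulary. "
            "Check that all required vocabularies have been loaded.",
            concept_id,
        )
        return None
    # Same split as in wrapper.py according to the traceback
    return domain_id.split('_')[0]


if __name__ == "__main__":
    # Toy lookup table with a made-up concept_id, for demonstration only
    known_domains = {1: "Observation"}
    print(get_domain_prefix(1, known_domains.get))
    print(get_domain_prefix(2, known_domains.get))
```

The caller then has to decide what to do with a None result, for example skipping the episode event record or aborting the transformation with a clear message.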

Please note that normally this would fail at an earlier stage, because the concept_id constraint would not hold for the (observation) record inserted earlier. I got the error below after removing the vocabulary constraints while running at a client who could not load the vocabularies.

2020-05-12 10:45:50,386 - ERROR - No row was found for one()
2020-05-12 10:45:50,386 - INFO - Performing rollback
2020-05-12 10:45:50,389 - INFO - Rollback completed
2020-05-12 10:51:13,976 - ERROR - #!#! ERROR: Transformation 'basedata_to_episode_event' failed:
2020-05-12 10:51:13,981 - ERROR - Traceback (most recent call last):
  File "/Users/Maxim/Develop/OHDSI/ohdsi-etl-prias/src/main/python/model/EtlWrapper.py", line 217, in execute_transformation
    records_to_insert = statement(self)
AttributeError: 'NoneType' object has no attribute 'split'

2020-05-12 10:51:13,982 - ERROR - 'NoneType' object has no attribute 'split'
2020-05-12 10:51:13,982 - ERROR - ##### START FULL TRACEBACK #####
2020-05-12 10:51:13,984 - ERROR - 
Traceback (most recent call last):
  File "/Users/Maxim/Develop/OHDSI/ohdsi-etl-prias/src/main/python/model/EtlWrapper.py", line 217, in execute_transformation
    records_to_insert = statement(self)
  File "/Users/Maxim/Develop/OHDSI/ohdsi-etl-prias/src/main/python/transformation/basedata_to_episode_event.py", line 113, in basedata_to_episode_event
    event_field_concept_id = wrapper.get_event_field_concept_id(target.concept_id)
  File "/Users/Maxim/Develop/OHDSI/ohdsi-etl-prias/src/main/python/wrapper.py", line 321, in get_event_field_concept_id
    domain_prefix = domain_id.split('_')[0]
AttributeError: 'NoneType' object has no attribute 'split'
