mit-lcp / mimic-omop Goto Github PK

Mapping the MIMIC-III database to the OMOP schema

License: MIT License

PLpgSQL 83.89% Shell 8.44% R 4.57% Makefile 2.20% SQLPL 0.90%

mimic-omop's Introduction

MIMIC-OMOP

This repository contains an Extract-Transform-Load (ETL) process for mapping the MIMIC-III database to the OMOP Common Data Model. This process involves both transforming the structure of the database (i.e. the relational schema), but also standardizing the many concepts in the MIMIC-III database to a standard vocabulary (primarily the Athena Vocabulary, which you can explore here).

DOCUMENTATION

Resources

"WHERE IS ..."

Below in the README, we provide two sections. The first section, OMOP TABLES LOADED, lists the OMOP tables which have been populated from MIMIC-III. You can use this section to figure out what data generated each OMOP TABLE. For example, we can see that the OMOP CDM table person was populated using data from the patients and admissions table in MIMIC-III.

The second section, MIMIC TABLES EQUIVALENCE, lists all the tables in MIMIC-III, and shows where the data now exists in the OMOP CDM. For example, we can see that the MIMIC-III table patients was used to populate the OMOP CDM tables person and death.

OMOP TABLES LOADED

MIMIC TABLES EQUIVALENCE

patients
- PERSON
- DEATH
admissions
- PERSON
- DEATH
- VISIT_OCCURRENCE
- VISIT_DETAIL
- OBSERVATION
- CONDITION_OCCURRENCE
transfers
- VISIT_DETAIL
icustays
- The ICUSTAYS table is fully derived from the transfers table
service
- VISIT_DETAIL
prescriptions
- DRUG_EXPOSURE
inputevents_cv
- DRUG_EXPOSURE
inputevents_mv
- DRUG_EXPOSURE
- MEASUREMENT
outputevents
- MEASUREMENT
labevents
- SPECIMEN
- MEASUREMENT
microbiologyevents
- SPECIMEN
- MEASUREMENT
chartevents
- SPECIMEN
- MEASUREMENT
- OBSERVATION
drgcodes
- OBSERVATION
datetimeevents
- OBSERVATION
procedure_icd
- PROCEDURE_OCCURRENCE
procedureevents_mv
- PROCEDURE_OCCURRENCE
cptevents
- PROCEDURE_OCCURRENCE
diagnosis_icd
- CONDITION_OCCURRENCE
notevents
- NOTE
- NOTE_NLP
caregivers
- PROVIDER
callout
- COHORT_ATTRIBUTES

mimic-omop's People

Contributors

Stargazers

Watchers

mimic-omop's Issues

Windows instead of Unix

I finally gave up replicating the PostgreSQL steps with SAS and got a PostgreSQL installation, importing MIMIC, but I still have to use Windows instead of Unix. So how would I implement steps under "0. Open up a terminal and define parameters"?

Reason for transfer event

In the TRANSFERS table (which goes to VISIT_DETAIL) there is a care_site which tells you where the patient is moved to. However, there is no field to store the reason why a transfer was made (e.g. a bed transfer, a ward transfer, and so on).

In discussion we agree that this should go in a derived table because (1) it will make analysis easier and (2) it is inferred information.

Updating the federated database diagram with OMOP tools

@parisni thanks for your work on the federated database diagram at: https://github.com/MIT-LCP/mimic-omop/blob/master/extras/mimi-omop-federation.png. It looks great.

I would like to make some changes, including adding the OMOP tools that you mentioned:

USAGI= mapping tool https://github.com/OHDSI/Usagi
ACHILLES= quality tool https://github.com/OHDSI/Achilles
ATHENA= standardized terminologies https://www.ohdsi.org/analytic-tools/athena-standardized-vocabularies/

Is an editable version of the file available?

mimic microbiologyevents -> omop specimen / fact_relationship / measurements

OMOP

[CLINICAL] The SPECIMEN domain contains the records identifying biological samples from a person

[CLINICAL] The FACT_RELATIONSHIP table contains records about the relationships between facts stored as records in any table of the CDM. Relationships can be defined between facts from the same domain (table), or different domains. Examples of Fact Relationships include: Person relationships (parent-child), care site relationships (hierarchical organizational structure of facilities within a health system), indication relationship (between drug exposures and associated conditions), usage relationships (of devices during the course of an associated procedure), or facts derived from one another (measurements derived from an associated specimen).

Then I d'say that microbiologyevents should be split as three tables (inspired from that discussion http://forums.ohdsi.org/t/mapping-microbiology-susceptibility-into-omop-cdm4-observations/167/3 ):

specimen: stores the specimen informations
measurements: store the resistances as categorical values, or the other microbiology things in various way
fact_relationship: links measurements to specimens

What about you @tompollard & @alistairewj ?

Use this ETL as a way to provide MIMIC in OMOP directly on the Physionet website

This ETL allows local user to download and convert-at-many sites

How about convert-once and allow sites to download the converted dataset.

This would save MIMIC users some effort and make MIMIC more used. (and published about; getting credit).

Warning, Error Fields

OMOP looks not having such fields. That would be great to propose a method to describe those error/warnings, with a set of codes, and rules.

Then a quality tool might explore the data and reason on it.

I may be wrong, and OMOP does have equivalent. At this step, not found.

ERROR: duplicate key value violates unique constraint "xpk_drug_exposure"

Has anyone run into this?

At the step psql "$MIMIC" -f "etl/etl.sql"

....
Time: 734.509 ms
INSERT 0 2082294
Time: 113601.293 ms
INSERT 0 1063525
Time: 69457.080 ms
INSERT 0 716595
Time: 26671.288 ms
psql:etl/StandardizedClinicalDataTables/DRUG_EXPOSURE/etl.sql:112: ERROR:  duplicate key value violates unique constraint "xpk_drug_exposure"
DETAIL:  Key (drug_exposure_id)=(394992270) already exists.
Time: 133169.830 ms

Duplicate concepts for prescriptions when NDC is null or zero

The ETL is currently duplicating rows because there are duplicate rows in one of the concept mappings:

select
  c3.*, c.numobs
from gcpt_prescriptions_ndcisnullzero_to_concept c3
inner join 
(
  select label, count(*) as numobs from gcpt_prescriptions_ndcisnullzero_to_concept group by label having count(*)>1
) c
  on c3.label = c.label
order by c3.label, c3.concept_id;

gives:

label	concept_id	concept_name	mimic_id	numobs
Advair Diskus 100/50MCG	40170634	fluticasone / salmeterol Dry Powder Inhaler [Advair]	2001045339	2
Advair Diskus 100/50MCG	40171027	Fluticasone propionate 0.1 MG/ACTUAT / salmeterol 0.05 MG/ACTUAT [Advair]	2001045340	2
desvenlafaxine 100 mg	1593106	desvenlafaxine succinate 100 MG	2001046244	2
desvenlafaxine 100 mg	19129663	Desvenlafaxine 100 MG	2001046243	2
Meperidine HCl 100MG/2ML AMP	1102527	Meperidine	2001045607	2
Meperidine HCl 100MG/2ML AMP	40164998	Meperidine Hydrochloride 100 MG/ML	2001045608	2

The offending rows are lines 308-309, lines 575-576, and lines 1211-1212.

@aparrot89 can you recommend which concept we should keep?

Laboratory measurements

There is no specimen identifier in MIMIC-III v1.4. However, it would be beneficial to have one so we can populate the SPECIMEN table in OMOP.

Currently, the repository has:

Added a fictitious specimen in order to store data in the specimen table
Added rows in FACT_RELATIONSHIP to link lab value to specimen
Added rows in FACT_RELATIONSHIP to link specimen to lab value

We will need to add the real specimen IDs from the next update of MIMIC-III.

Data federation ; consider using a bigint sequence

Hey,

I d'guess that's would be worth nothing to use an overall bigint sequence for all mimic objects (say mimic_id).
(ie, a row_id equivalent, but identifiing uniquely each mimic concept, a unique and reproductible way)

That will be useful for many use cases:

union all on different tables
international union all : using bigint will a necessity very soon
avoid collision between mimic concepts & omop concepts (for eg by begining sequence to 10 bilion
avoid join on tables by mistake (AFAIK, you already introduce such concept by mooving primary keys in order to ovoid bad joins between tables)

Those mimic_id will be used for the ETL, and they will be used as id to populate mimic_omop
The solution looks like a huge update on the OMOP model, but mooving from int4 to int8 has already been introduced in some omop tables, and mooving to int8 is obvious, and will definitely be done one day.

Then I will build a postgreql DDL that add a mimic_id in all the mimic tables as a proof of concept.
Interested in your feedback on that - you may just way I do the job to understand exaclty the details

mimic.callout : No omop equivalent ?

Hi,

Record of when patients were ready for discharge (called out), and the actual time of their discharge (or more generally, their outcome).

We may propose to add a field in omop.visit_details, to capture that information.

BTW, is that information really relevant for research and what uses cases ?

Visit detail column names are inconsistent

The visit detail table has some columns:

admitting_source_concept_id
admitting_source_value
discharge_to_concept_id
discharge_to_source_value

This seems inconsistent for two reasons:

"admitting" and "discharge" are different tenses (perhaps it should be "admission")
admitting_source_concept_id contains the word "source" but discharge_to_concept_id does not

I think, regarding (2), that it is an error. The description is: "A foreign key to the predefined concept in the Place of Service Vocabulary reflecting the admitting source for a visit.". So I think that the idea is: admitting_source_value is the original data, and next to it we should have admitting_concept_id for the standardized data, not admitting_source_concept_id. (Actually, I'd further argue it should be admission_concept_id). Thoughts? I could raise this at OHDSI if we agree.

gcpt_note_section_to_concept

where does the table gcpt_note_section_to_concept come from?

DRUG_EXPOSURE start/end times may be reversed

Looking at some data from the DRUG_EXPOSURE table:

drug_exposure_id	person_id	drug_concept_id	drug_exposure_start_datetime	drug_exposure_end_datetime
802093	886218	35606152	5/17/35 18:22	5/17/35 17:22
801330	886282	35605935	11/14/30 21:24	11/14/30 20:54
807394	886296	35605935	6/16/81 16:00	6/16/81 15:00

Looks like flipped the start/end times here:

mimic-omop/etl/StandardizedClinicalDataTables/DRUG_EXPOSURE/etl.sql

Lines 103 to 104 in f23fd78

 , starttime as drug_exposure_end_datetime 

 , endtime as drug_exposure_start_datetime

Extracting numbers from free text fields

I was looking through the PRESCRIPTIONS -> DRUG_EXPOSURE ETL, and came across the use of the extract_value function here:

mimic-omop/etl/StandardizedClinicalDataTables/DRUG_EXPOSURE/etl.sql

Line 40 in 68e5212

 , extract_value(dose_val_rx) as quantity --extract quantity from pure numeric when possible 

The function seems to extract numbers from text - which is great - but it makes the assumption that either commas or decimals are used as the decimal separator, e.g. it allows for 1.05 and 1,05 to both map to the same number. However this is a regional thing, and in MIMIC a comma separator is almost always used as a thousandths separator.

For MIMIC, prescriptions, we see this is bad:

select
prod_strength
, count(*) as nobs
, max(dose_val_rx) as dose_str
, extract_value(max(dose_val_rx)) as dose_numeric
from prescriptions
where dose_val_rx like '%,%'
group by prod_strength
order by count(*) desc
limit 10;

Returns:

prod_strength	nobs	dose_str	dose_numeric
25,000 unit Premix Bag	31532	25,000	25
10,000 Units / mL - 5 mL Vial	842	25,000	25
10,000 Unit Vial	588	55,000	55
50,000 Unit Capsule	324	50,000	50
5000 Units / mL- 1mL Vial	194	30,000	30
500,000 Unit Tablet	110	500,000	500
200mg/mL-4mL Vial	104	16,000	16
10,000 Unit Capsule	97	50,000	50
20,000 Unit Vial	85	60,000	60
500,000 Unit UDCUP	75	500,000	500

I think there should be one function for European ETL and one function for non-European ETL. I am going to separate the functions and have extract_value_decimal_sep and looks_like_value_decimal_sep to be used with UK/American style separators. I'll fix this for prescriptions and leave this issue open because we should check the other ETLs for this bug. grep tells us to look at:

./etl/StandardizedClinicalDataTables/DRUG_EXPOSURE/etl.sql:40:, extract_value(dose_val_rx) as quantity --extract quantity from pure numeric when possible
./etl/StandardizedClinicalDataTables/MEASUREMENT/etl.sql:14: , extract_value(value) as value_as_number
./etl/StandardizedClinicalDataTables/MEASUREMENT/etl.sql:156: , extract_value(value) as value_as_number
./etl/StandardizedClinicalDataTables/MEASUREMENT/etl.sql:334: , extract_value(dilution_comparison) as value_as_number

Mapping MIMIC ITEMIDs to respective concept IDs

Hello,

I have been looking through the work that has been done to map MIMIC to the OMOP model and I was wondering if a resource exists that pairs each current MIMIC d_label with its existing itemid along with the new measurement_concept_id. In other words, taking SPO2 as an example (mapped as measurement_concept_id = 3016502 in the OMOP model), is there a reference that also lists the respective itemids that these concepts are currently listed under in MIMIC?

I may be wrongly assuming that in order to create this OMOP mapping, each concept was acquired from the clinical dataset and then assigned a measurement_concept_id based on its similarity to other concepts in the dataset. But if this is the case, wasn't the itemid for each concept also obtained during this process and (possibly) stored somewhere?

Appreciate the help with this.

Brett

Fluid output conversion issue

There is some inconsistency in the conversion of some of the output measurements. I copied over code from mimic-code and adapted it the schema:

select meas.visit_detail_id
, meas.person_id
, meas.measurement_datetime
, meas.measurement_concept_id
, mc.concept_name
, meas.value_as_number
, uc.concept_name
, meas.measurement_source_concept_id
, msc.concept_name as source_concept_name
, meas.value_source_value
-- vd.care_site_id
from visit_detail vd
left join measurement meas
 on vd.visit_detail_id = meas.visit_detail_id
 and meas.measurement_datetime >= vd.visit_start_datetime
 and meas.measurement_datetime  < (vd.visit_start_datetime + interval '1' day)
left join concept mc on meas.measurement_concept_id = mc.concept_id
left join concept msc on meas.measurement_source_concept_id = msc.concept_id
left join concept uc on meas.unit_concept_id = uc.concept_id
where measurement_source_concept_id in
(
select concept_id
from concept
where concept_code in
(
  -- these are the most frequently occurring urine output observations in CareVue
  '40055', -- "Urine Out Foley"
  '43175', -- "Urine ."
  '40069', -- "Urine Out Void"
  '40094', -- "Urine Out Condom Cath"
  '40715', -- "Urine Out Suprapubic"
  '40473', -- "Urine Out IleoConduit"
  '40085', -- "Urine Out Incontinent"
  '40057', -- "Urine Out Rt Nephrostomy"
  '40056', -- "Urine Out Lt Nephrostomy"
  '40405', -- "Urine Out Other"
  '40428', -- "Urine Out Straight Cath"
  '40086',--	Urine Out Incontinent
  '40096', -- "Urine Out Ureteral Stent #1"
  '40651', -- "Urine Out Ureteral Stent #2"

  -- these are the most frequently occurring urine output observations in MetaVision
  '226559', -- "Foley"
  '226560', -- "Void"
  '226561', -- "Condom Cath"
  '226584', -- "Ileoconduit"
  '226563', -- "Suprapubic"
  '226564', -- "R Nephrostomy"
  '226565', -- "L Nephrostomy"
  '226567', --	Straight Cath
  '226557', -- R Ureteral Stent
  '226558', -- L Ureteral Stent
  '227488', -- GU Irrigant Volume In
  '227489'  -- GU Irrigant/Urine Volume Out
)
)
and meas.person_id = 62101896
order by meas.measurement_datetime

I get the following:

visit_detail_id	person_id	measurement_datetime	measurement_concept_id	concept_name	value_as_number	concept_name	measurement_source_concept_id	source_concept_name	value_source_value
67094765	62101896	2174-03-21 20:02:21.000	3014315	Urine output	-3000	milliliter	2001028370	label:[GU Irrigant Volume In]dbsource:[metavision]linksto:[outputevents]unitname:[mL]param_type:[Numeric]	(null)
67094765	62101896	2174-03-21 20:02:21.000	3014315	Urine output	-2650	milliliter	2001028371	label:[GU Irrigant/Urine Volume Out]dbsource:[metavision]linksto:[outputevents]unitname:[mL]param_type:[Numeric]	(null)

However from the original MIMIC data:

select oe.subject_id, oe.icustay_id, oe.charttime
, oe.itemid
, di.label
, oe.value
, (
    -- we consider input of GU irrigant as a negative volume
    case when oe.itemid = 227488 then -1*value
    else value end
  ) as value_neg
from outputevents oe
inner join d_items di on oe.itemid = di.itemid
where oe.subject_id = 99031
and oe.itemid in
(
  -- these are the most frequently occurring urine output observations in CareVue
  40055, -- "Urine Out Foley"
  43175, -- "Urine ."
  40069, -- "Urine Out Void"
  40094, -- "Urine Out Condom Cath"
  40715, -- "Urine Out Suprapubic"
  40473, -- "Urine Out IleoConduit"
  40085, -- "Urine Out Incontinent"
  40057, -- "Urine Out Rt Nephrostomy"
  40056, -- "Urine Out Lt Nephrostomy"
  40405, -- "Urine Out Other"
  40428, -- "Urine Out Straight Cath"
  40086,--Urine Out Incontinent
  40096, -- "Urine Out Ureteral Stent #1"
  40651, -- "Urine Out Ureteral Stent #2"
  -- these are the most frequently occurring urine output observations in MetaVision
  226559, -- "Foley"
  226560, -- "Void"
  226561, -- "Condom Cath"
  226584, -- "Ileoconduit"
  226563, -- "Suprapubic"
  226564, -- "R Nephrostomy"
  226565, -- "L Nephrostomy"
  226567, --Straight Cath
  226557, -- R Ureteral Stent
  226558, -- L Ureteral Stent
  227488, -- GU Irrigant Volume In
  227489  -- GU Irrigant/Urine Volume Out
)
order by subject_id, charttime;

subject_id	icustay_id	charttime	itemid	label	value	value_neg

  99031 |     258971 | 2174-03-21 20:02:00 | 227488 | GU Irrigant Volume In        |  3000 | -3000
  99031 |     258971 | 2174-03-21 20:02:00 | 227489 | GU Irrigant/Urine Volume Out |  2650 | 2650

There are two things here:

(1) I think the ETL process tries to assign GU Irrigant Volume In as negative since this is technically a "negative" UO, but incorrectly makes both negative
(2) I am not sure if we should "fix" data in this way - this is a good example of the risk in this approach as the value is sometimes directly documented as a negative value and so we would incorrectly make those values positive. Perhaps better is a materialized view which has this logic.

service start_time after end_time

Apparently 167 services do have a bug in the omop etl since their start date is after the end time.

The french data provides both ward and service information too. I cannot imagin a better way of storing this information.

The alternative is to link the services directly to visit_occurrence. However this does not allow to store the parent_id linked to ward: some service cover multiple ward but only one parent_id is available. Then the way of knowing how ward and services are related would be based on calculus based on datetime that is a waste of time.

In the meantime, if we consider to propose an optimised version of omop, then this question will be reconsidered, and we will find a way to link efficiently the service information then.

That's why I suggest to go with a consistent focused version and implement the alternative way.

Any thought ?

Unit of measure

Just a general question, and forgive it if it sounds naive, but how do we harmonize unit of measure? For example, I was looking to map tidal volumes to SNOMED-CT, but I couldn't see how to specify the unit of measure (millilitres or ml/kg ideal body weight).

No standard for unknown admission/discharge location

We have hospital admission source of "** INFO NOT AVAILABLE **" and a discharge location of "OTHER FACILITY". For admission, the only decent match I found was 32199 - Information not available - from class "UB04 Point of Origin". The UB04 ontology seems to be for claims though, not sure what the impact of that would be.

For the discharge of other facility I used 8844 - Other Place of Service - though there was the option of 32209 - Unknown Value (but present in data) from class "UB04 Pt dis status".

Seems odd there is no standard concept for a location of "unknown", am I missing something?

concept mapping

Hello
Some questions about chart_label_to_concept.csv

first question = FiO2 : all the labels 'FiO2 Set', 'FiO2 (Analyzed)', 'HFO- FIO2', 'ecmo fio2' ... are linked to the same standard concept called 'Oxygen concentration breathed'. Do you think it's a good mapping because 'set' and 'analysed' mean different things!
second question = intracranial pressure. There are > 20 different items. The numbers of distinct values per label are quite differents. 'Intra Cranial Pressure' occurs 92306 times, 'icp' occurs 7 times.
Moreover the distribution of values are quite differents. The median of ICP Camino is 89.5, the median of 'ICP' is 10!!
Do you think we should map all to a unique standard concept?

general: missing fields in OMOP

Some fields are not provided by OMOP. Eg: admissions.insurance. I guess we should extend OMOP and propose new fields. That won't affect OHDSI tools. We could then propose those extension to OHDSI group.

We could then call those fields with the omop style: adding a local insurance_source & source_id and a insurance_id that would point to a standard concept.

That was just an idea, but we could also choose to loose those fields, or case by case

what about calculated variable (contrib)

Hey

eg: sofa, ventilation, sepsis....

There is 3 way of doing:

use the fact_relationship table, add them in observation, or measurement, and link them to the right table
add them as new columns (for eg: calc_venti), with documentation in the right table (visit_occurrence , measurement) according to their granularity
add a new table (measurement_calc) that contains the same primary key as measurement

I would say 2 is the best way, because SQL user friendly. Also, there is no problem with backward compatibility with OMOP, because existing queries won't be affected by new columns.

If agreed, then we need to choose witch variable need to be added, and where. I have a colleague that will be working on it 200% user = @aparrot89

Procedureevents_mv

Not very confortable with the mimic.procedureevents_mv table (that is not that well documented https://mimic.physionet.org/mimictables/procedureevents_mv/ )

omop.procedure_occurrence.code says:

The source code for the Procedure as it appears in the source data. This code is mapped to a standard procedure Concept in the Standardized Vocabularies and the original code is, stored here for reference. Procedure source codes are typically ICD-9-Proc, CPT-4, HCPCS or OPCS-4 codes.

But that procedureevent_mv looks like to fit much better in "omop.observation" table with its value field and also units

Storing continuous infusions and drug administrations

There is a bit of flexibility in how drug administrations could be stored in OMOP and this issue aims to document our discussion so far and allow us to decide on a final approach.

Prescribed medications

It seems that our databases were relatively consistent in how medications are prescribed:

MIMIC have prescriptions which have a start date and an end date
HIAE (Sao Paulo) have prescriptions which have start date, end date, and have a start time/end time
APHP have prescriptions (@parisni can comment whether it's only dates)

So this works and can be placed into DRUG_EXPOSURE with drug_type_concept_id = 38000177 ("Prescription written"). There is probably a bit more we could discuss with that ETL but let's leave that for another issue.

Continuous infusions

Continuous infusions are challenging. First I will describe the three data sources that we must make a compromise for: CareVue (in MIMIC-III), MetaVision (in MIMIC-III), and HIAE (Sao Paulo).

CareVue

The amounts are stored with a single time which represents an end time, that is each row says "the patient received X amount by Y time"
The rate of the infusion is stored with a single time which represents a start time, that is each row says "the patient is now receiving at T time the compound at Z rate"

MetaVision

The rate is stored with two times: a start time and an end time. Any time the rate is changed, a new row occurs which has the new rate and the new start time and end time. Amounts are never stored, but can be calculated from the rate.

Sao Paulo

The amounts are stored with a single time which represents an end time, that is each row says "the patient received X amount by Y time" (i.e. the same as CareVue) (@lbulgarelli to confirm)
The rate of the infusion is stored at the same time as the amount, that is each row says "the patient is receiving Z rate at Y time"

Proposed solutions

There are a few options on where this data could go, and we mostly agree it's one of: DRUG_EXPOSURE, MEASUREMENT, or OBSERVATION. Here are the proposals we discussed.

Proposal 1 - Modify the OMOP data model

Amount -> DRUG_EXPOSURE (i.e. one row per documented amount delivered)
- If only endtime is available, leave starttime as null
- If both are available, put both
Rate -> DRUG_EXPOSURE (i.e. one row per documented rate delivered)
- If only starttime is available, leave endtime as null
- If both are available, put both

Advantages

this captures the data as is (i.e. no data loss)
DRUG_EXPOSURE is the intuitive location for these administrations

Disadvantages

this violates the data model - both starttime and endtime are required fields and we can't leave one blank
analysis would require special logic to deal with nullable start times and end times

Proposal 2 - Infer the missing times

If rate available with start time and end time -> DRUG_EXPOSURE
Otherwise, derive start time/end time for amount, convert to rate -> DRUG_EXPOSURE

Advantages

Conforms to the data model
Intuitive location for the data - both orders and administrations are in DRUG_EXPOSURE (which is suggested by the documentation)

Disadvantages

Errors will be introduced when inferring start time or end time
Mixing derived data with raw source data, without explicitly marking it as derived, is undesirable

Proposal 3 - If only one time, put it into MEASUREMENT

If rate available with start time and end time -> DRUG_EXPOSURE
Rates with start time -> MEASUREMENT
Amounts with end time -> MEASUREMENT

Advantages

Conforms to the data model
If only one time is available, one time is used

Disadvantages

Splits drug administration in two tables, depending on the hospital, making queries not generalize
Unintuitive location for the data - the description of MEASUREMENT in the documentation doesn't seem to fit for an administration (could put the data in OBSERVATION to avoid this)

Other proposals

If people have other proposals, I can edit the post and put them here.

Feed observation_period

This table has been described as mandatory.

This should be easy to load it like visit_occurrence table.

icustays

Hey

Well icustay_id now.

AFAIK, icustay_id is a pure myth. (from doc "is a generated identifier..."). Apparently the method you generate them from transfers table looks like "if readmission within 24h then keep the icustay_id else create a new icustay_id". Then for facts that are well timestamped (such chartevents), you are able to link each to an icustay and for those that contain approxymativ timestamps (labs) you don't have any icustay_id.

Is this correct or can you clarify a little ?

Thanks a lot,

Ethnicity concerns

Hey

As noticed by Paul, Ethnicity does not have the same granularity in mimic and omop.
mimic stores ethnicity for each admission. Then sometime people do have multiple different ethnicity values. What is the real ethnicity value to store in the omop.person then ?

Paul made a decision based on race priority (not sure to understand what the priority is based on). For now I choosed to pick the last one.
We can still store each values in the observation table.

ETL status after French hackathon

what is the state of the ELT now when the Paris event is over?

Mimic OMOP Common Data Model

Hello,

I am an undergraduate student at Texas A&M University, and I have gotten access to the MIMIC III database and have set up the database with postgres. We are working on a senior design project that will involve this database, however we must convert this database into a common data model. The OMOP data model seems ideal as there is already work on it by you guys so we are hoping to get permission to use this github.

I have seen the scripts that are on this github, but I have no clue as to what needs to be done first to get the data model converted to the OMOP model. I was hoping for further documentation on how all of this works, or for one of the contributors to help point me in the right direction. Otherwise, I would have to analyze this github, which will take some time for me as I am unfamiliar with a few things like etl, and R.

Thanks!

Split care_site.csv in two?

Seems like the care_site.csv contains a mapping for services and units, though I think it's cleaner to have these two mappings in separate files. Thoughts?

Clarification on the "standard concepts from Athena"

The doc says:

The standard concepts from Athena have been downloaded and are available somewhere (including running the extra script to download CPT code definitions)

what is meant by "standard"? the pre-selected ones on the webapp? or is there a list somewhere?

omop death table

hey

From MIT-LCP/mimic-code#190 and from the query above:

WITH
"dead_adm" AS (SELECT distinct on (hadm_id) hadm_id, deathtime, dischtime, subject_id from mimiciii.admissions ORDER BY hadm_id, dischtime DESC),
"tmp" as ( SELECT subject_id , dod_hosp , deathtime , dischtime FROM mimiciii.patients LEFT JOIN dead_adm USING (subject_id) WHERE dod_hosp IS NOT NULL AND (dod_hosp::date is distinct from deathtime::date OR dead_adm.subject_id IS NULL) ORDER BY deathtime),	 
"total" as (SELECT *, greatest(dod_hosp,dischtime) - least(dod_hosp,dischtime)::date between '0 day'::interval AND '2 day'::interval as is_dod_patient_ok, greatest(deathtime,dischtime) - least(deathtime,dischtime)::date between '0 day'::interval and '2 day'::interval as is_dod_admissions_ok FROM tmp)
SELECT * FROM total where is_dod_patient_ok IS FALSE OR is_dod_admissions_ok IS FALSE;

There is apparently some odd aspects with dod with both admissions / patients. The proposed solution is to consider first admissions as the source of hospitalisation death, and patients dod_ssn for the foreign death source of information.
For the admissions deathtime that are odd, the dischtime is taken in consideration if patient.dod_hosp is the same (this is the case for the 11 odd admissions)

Implement care_site fact_relationship

As discussed, each physical visit_detail/care_site should act as this example:

care_site

id	type
1	bed
2	ward
3	hospital

fact_relationship

fact_id_1	fact_id_2	relationship_id
1	1	Care Site is part of Care Site
1	2	Care Site is part of Care Site
1	3	Care Site is part of Care Site
2	2	Care Site is part of Care Site
2	3	Care Site is part of Care Site
3	3	Care Site is part of Care Site

trick

adding 1-1 or 2-2 auto centered fact_relationship allows to get in one query multiple level of detailled data.

with
  visit_detail (care_site_id, visit_idd) as (VALUES (1,'visit 1'),(2,'visit 2'),(3,'visit 3'))
, care_site (care_site_id, type_id) as (VALUES (1,'bed icu'),(2,'icu'),(3,'nicu'))
, fact_relationship (fact_1,fact_2) as (values
 (1,1) -- bed linked to itself
,(1,2) -- bed linked to ward
,(2,2) -- ward linked to itself
,(3,3) -- ward linked to itself
)
select visit_detail.*
from visit_detail
join fact_relationship on fact_1 = care_site_id
join care_site c on fact_2 = c.care_site_id and  c.type_id = 'icu';

care_site_id	visit_idd
1	visit 1
2	visit 2

Then this 2 join query should work in all cases to find any nicu data or whatever. Then fact (measurement, observation..) data can be linked to bed, or any nested level of visit_detail.
In case no "linked to itself" is loaded, then the query becomes way more complicated

CONDITION_OCCURRENCE missing `condition_status_source_concept_id`

OMOP philosophy seems to be have a concept_id, _source_value, and _source_concept_id.

Looking at CONDITION_OCCURRENCE, we have condition_status_source_value, condition_status_concept_id, but no condition_status_source_concept_id. Should we add?

Notes / section

Hi @alistairewj

I have been talking to people from comlumbia that made the note_nlp table born.
I have a good idea on how to put section in it, and I was wondering wether:

you had for each category the list of possible sections values
you had a mapping to LOINC
Or if you had some equivalent work ready to share

Thanks !

Storing microbiology data

We mostly agreed on how to store microbiology data, but there are some loose ends to tie up.

Specimen

Each microbiology culture should be associated with a specimen, and this specimen is intuitively stored in the SPECIMEN table. It's unclear how disease_concept_id should be used, so we have opted not to use it.

Culture

In order to indicate that a culture was done, we should have a row in the MEASUREMENT table that indicates that a culture was performed. This row should have a relationship to the specimen using FACT_RELATIONSHIP.

Bacterial count

If one or more bacteria types were cultured, then for each bacteria the numeric measurement (e.g. colony count) should be stored in MEASUREMENT with the value_as_number being the count and the concept_id being a standardized code for this count (I believe LOINC has a good vocabulary for this). Each bacterial growth should have a relationship to the specimen using FACT_RELATIONSHIP.

Antibiotic

Tests for antibiotic resistance are stored in MEASUREMENT. At the very least, we will have one row per antibiotic tested which indicates whether the bacteria was susceptible, resistance, or indeterminate. Then, depending on the raw data, we may have another row which indicates the dilution of that antibiotic. This will link to the bacteria using FACT_RELATIONSHIP. Then, arguably, the dilution level of the antibiotic should link to the antibiotic susceptibility using FACT_RELATIONSHIP. This last part hasn't really been discussed so I welcome feedback here.

Summary

specimens in SPECIMEN table
one MEASUREMENT row indicating that the blood was cultured and positive/negative
zero or more MEASUREMENT rows indicating a count for a specific bacteria (perhaps using a LOINC concept)
zero or more MEASUREMENT rows indicating if there is an antibiotic resistance (use fact_relationship bidirectional to the culture)
zero or more MEASUREMENT rows indicating the antibiotic dilution (use FACT_RELATIONSHIP bidirectional to the antibiotic susceptibility)

mimic_id_concept_seq

Where is the sequence 'mimic_id_concept_seq' created that is then referenced by nextval('mimic_id_concept_seq')? Thx

ERROR: duplicate key value violates unique constraint "xpk_measurement"

After trying the fix for #48, I got:

...
INSERT 0 11656610
Time: 1585618.783 ms
INSERT 0 630856
Time: 81014.760 ms
psql:etl/StandardizedClinicalDataTables/MEASUREMENT/etl.sql:804: ERROR:  duplicate key value violates unique constraint "xpk_measurement"
DETAIL:  Key (measurement_id)=(301930857) already exists.
Time: 5491441.549 ms

Should `qualifier` in scripts for procedure_occurrence be `modifier`?

Got this error when rerunning the ETL

psql:omop/build-omop/postgresql/mimic-omop-alter.sql:70: ERROR:  column "qualifier_source_value" of relation "procedure_occurrence" does not exist

Looking at the docs https://github.com/OHDSI/CommonDataModel/wiki/PROCEDURE_OCCURRENCE - should this instead refer to modifier?

How can I get the start time, end time and total amount\rate?

Hi,
Which one of the proposed solutions was implemented?
For my work, I need to get the start time, end time and total amount\rate.

Nicolas's draft I2B2 paper

@parisni you mentioned that you were working on an I2B2 paper relevant to our proposal for a federated database. Do you have any more details (e.g. a draft citation, abstract etc)?

Hard-coded absolute path in loadTables.R

See
https://github.com/MIT-LCP/mimic-omop/blob/master/etl/ConceptTables/loadTables.R#L50

With the absolute path, the script silently does not load the CSVs, and then etl.sql fails on a missing table:
gcpt_note_section_to_concept (Cf. #40 )

mimic Chartevents / Labevents -> omop Measurement / Observation

OMOP

[CLINICAL] The MEASUREMENT table contains records of Measurement, i.e. structured values (numerical or categorical) obtained through systematic and standardized examination or testing of a Person or Person's sample. The MEASUREMENT table contains both orders and results of such Measurements as laboratory tests, vital signs, quantitative findings from pathology reports, etc.

[CLINICAL] The OBSERVATION table captures clinical facts about a Person obtained in the context of examination, questioning or a procedure. Any data that cannot be represented by any other domains, such as social and lifestyle facts, medical history, family history, etc. are recorded here.

MIMIC

CHARTEVENTS contains all the charted data available for a patient. During their ICU stay, the primary repository of a patient’s information is their electronic chart. The electronic chart displays patients’ routine vital signs and any additional information relevant to their care: ventilator settings, laboratory values, code status, mental status, and so on. As a result, the bulk of information about a patient’s stay is contained in CHARTEVENTS. Furthermore, even though laboratory values are captured elsewhere (LABEVENTS), they are frequently repeated within CHARTEVENTS. This occurs because it is desirable to display the laboratory values on the patient’s electronic chart, and so the values are copied from the database storing laboratory values to the database storing the CHARTEVENTS.
WARNING: Some items are duplicated between the labevents and chartevents tables. In cases where there is disagreement between measurements, labevents should be taken as the ground truth.

The LABEVENTS data contains information regarding laboratory based measurements. The process for acquiring a lab measurement is as follows: first, a member of the clinical staff acquires a fluid from a site in the patient’s body (e.g. blood from an arterial line, urine from a catheter, etc). Next, the fluid is bar coded to associate it with the patient and timestamped to record the time of the fluid acquisition. The lab analyses the data and returns a result within 4-12 hours

Based on that descriptions, I d'say the way to direct the data could be:
-> all labevents -> measurements
-> chartevents[duplicates related with labs] -> trash
-> chartevents[vital signs, with numerical values] -> measurements
-> chartevents[vital signs, with categorical values] -> measurements
-> chartevents[vital signs, with free text values] -> observations

What about you @alistairewj & @tompollard ?

Release NLP section pipeline

Right now, the NLP sectionizer is not released. This should be added as part of the ETL contributions.

visit naming improvement

As a reminder:

OHDSI/CommonDataModel#145
and OHDSI/CommonDataModel#153

tell how we go with columns mis-named by ohdsi

Urgency stays

Hey
@alistairewj @tompollard
So what about emergency stays?
That topic (MIT-LCP/mimic-code#159) confirms that emergency system and icu system are not linked.

Then, can we say that the real admittime is the edregtime when exists ?

OBSERVATION vs MEASUREMENT

The difference between these two tables is subtle:

Numeric/categorical (Quantitative) data in MEASUREMENT
String (Qualitative) data in OBSERVATION

So for example religion/ethnicity are placed in OBSERVATION, while heart rate/respiratory rate are placed in MEASUREMENT.

There are some edge cases worth discussing though.

Scales - for example Glasgow Coma Scale, Pain Scales, etc, which have a numeric component but are also qualitative - reasonably argued to go in MEASUREMENT as these are standardized scales (even if somewhat subjective)

... any other edge cases ?

test

testing

labeevents, high/low ranges

MIMIC only have a "abnormal" flag.
OMOP provides both HIGH / LOW ranges for a lab.

I d'say both are wrong. We should provide both, exactly as i2b2 does.

Does MIMIC have those ranges ?
What do you think about that ?

Thanks

	, starttime as drug_exposure_end_datetime
	, endtime as drug_exposure_start_datetime

mit-lcp / mimic-omop Goto Github PK

mimic-omop's Introduction

MIMIC-OMOP

DOCUMENTATION

"WHERE IS ..."

OMOP TABLES LOADED

MIMIC TABLES EQUIVALENCE

mimic-omop's People

Contributors

Stargazers

Watchers

Forkers

mimic-omop's Issues

OMOP

Prescribed medications

Continuous infusions

CareVue

MetaVision

Sao Paulo

Proposed solutions

Proposal 1 - Modify the OMOP data model

Proposal 2 - Infer the missing times

Proposal 3 - If only one time, put it into MEASUREMENT

Other proposals

care_site

fact_relationship

trick

Specimen

Culture

Bacterial count

Antibiotic

Summary

OMOP

MIMIC

Recommend Projects

Recommend Topics

Recommend Org