Git Product home page Git Product logo

mimic-omop's Introduction

MIMIC-OMOP

This repository contains an Extract-Transform-Load (ETL) process for mapping the MIMIC-III database to the OMOP Common Data Model. This process involves both transforming the structure of the database (i.e. the relational schema), but also standardizing the many concepts in the MIMIC-III database to a standard vocabulary (primarily the Athena Vocabulary, which you can explore here).

DOCUMENTATION

"WHERE IS ..."

Below in the README, we provide two sections. The first section, OMOP TABLES LOADED, lists the OMOP tables which have been populated from MIMIC-III. You can use this section to figure out what data generated each OMOP TABLE. For example, we can see that the OMOP CDM table person was populated using data from the patients and admissions table in MIMIC-III.

The second section, MIMIC TABLES EQUIVALENCE, lists all the tables in MIMIC-III, and shows where the data now exists in the OMOP CDM. For example, we can see that the MIMIC-III table patients was used to populate the OMOP CDM tables person and death.

OMOP TABLES LOADED

MIMIC

MIMIC TABLES EQUIVALENCE

mimic-omop's People

Contributors

alistairewj avatar aparrot89 avatar parisni avatar spfohl avatar tompollard avatar vojtechhuser avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

mimic-omop's Issues

Windows instead of Unix

I finally gave up replicating the PostgreSQL steps with SAS and got a PostgreSQL installation, importing MIMIC, but I still have to use Windows instead of Unix. So how would I implement steps under "0. Open up a terminal and define parameters"?

Reason for transfer event

In the TRANSFERS table (which goes to VISIT_DETAIL) there is a care_site which tells you where the patient is moved to. However, there is no field to store the reason why a transfer was made (e.g. a bed transfer, a ward transfer, and so on).

In discussion we agree that this should go in a derived table because (1) it will make analysis easier and (2) it is inferred information.

Updating the federated database diagram with OMOP tools

@parisni thanks for your work on the federated database diagram at: https://github.com/MIT-LCP/mimic-omop/blob/master/extras/mimi-omop-federation.png. It looks great.

I would like to make some changes, including adding the OMOP tools that you mentioned:

Is an editable version of the file available?

mimic microbiologyevents -> omop specimen / fact_relationship / measurements

OMOP

[CLINICAL] The SPECIMEN domain contains the records identifying biological samples from a person

[CLINICAL] The FACT_RELATIONSHIP table contains records about the relationships between facts stored as records in any table of the CDM. Relationships can be defined between facts from the same domain (table), or different domains. Examples of Fact Relationships include: Person relationships (parent-child), care site relationships (hierarchical organizational structure of facilities within a health system), indication relationship (between drug exposures and associated conditions), usage relationships (of devices during the course of an associated procedure), or facts derived from one another (measurements derived from an associated specimen).

Then I d'say that microbiologyevents should be split as three tables (inspired from that discussion http://forums.ohdsi.org/t/mapping-microbiology-susceptibility-into-omop-cdm4-observations/167/3 ):

  1. specimen: stores the specimen informations
  2. measurements: store the resistances as categorical values, or the other microbiology things in various way
  3. fact_relationship: links measurements to specimens

What about you @tompollard & @alistairewj ?

Warning, Error Fields

Hi

OMOP looks not having such fields. That would be great to propose a method to describe those error/warnings, with a set of codes, and rules.

Then a quality tool might explore the data and reason on it.

I may be wrong, and OMOP does have equivalent. At this step, not found.

ERROR: duplicate key value violates unique constraint "xpk_drug_exposure"

Has anyone run into this?

At the step psql "$MIMIC" -f "etl/etl.sql"

....
Time: 734.509 ms
INSERT 0 2082294
Time: 113601.293 ms
INSERT 0 1063525
Time: 69457.080 ms
INSERT 0 716595
Time: 26671.288 ms
psql:etl/StandardizedClinicalDataTables/DRUG_EXPOSURE/etl.sql:112: ERROR:  duplicate key value violates unique constraint "xpk_drug_exposure"
DETAIL:  Key (drug_exposure_id)=(394992270) already exists.
Time: 133169.830 ms

Duplicate concepts for prescriptions when NDC is null or zero

The ETL is currently duplicating rows because there are duplicate rows in one of the concept mappings:

select
  c3.*, c.numobs
from gcpt_prescriptions_ndcisnullzero_to_concept c3
inner join 
(
  select label, count(*) as numobs from gcpt_prescriptions_ndcisnullzero_to_concept group by label having count(*)>1
) c
  on c3.label = c.label
order by c3.label, c3.concept_id;

gives:

label concept_id concept_name mimic_id numobs
Advair Diskus 100/50MCG 40170634 fluticasone / salmeterol Dry Powder Inhaler [Advair] 2001045339 2
Advair Diskus 100/50MCG 40171027 Fluticasone propionate 0.1 MG/ACTUAT / salmeterol 0.05 MG/ACTUAT [Advair] 2001045340 2
desvenlafaxine 100 mg 1593106 desvenlafaxine succinate 100 MG 2001046244 2
desvenlafaxine 100 mg 19129663 Desvenlafaxine 100 MG 2001046243 2
Meperidine HCl 100MG/2ML AMP 1102527 Meperidine 2001045607 2
Meperidine HCl 100MG/2ML AMP 40164998 Meperidine Hydrochloride 100 MG/ML 2001045608 2

The offending rows are lines 308-309, lines 575-576, and lines 1211-1212.

@aparrot89 can you recommend which concept we should keep?

Laboratory measurements

There is no specimen identifier in MIMIC-III v1.4. However, it would be beneficial to have one so we can populate the SPECIMEN table in OMOP.

Currently, the repository has:

  • Added a fictitious specimen in order to store data in the specimen table
  • Added rows in FACT_RELATIONSHIP to link lab value to specimen
  • Added rows in FACT_RELATIONSHIP to link specimen to lab value

We will need to add the real specimen IDs from the next update of MIMIC-III.

Data federation ; consider using a bigint sequence

Hey,

I d'guess that's would be worth nothing to use an overall bigint sequence for all mimic objects (say mimic_id).
(ie, a row_id equivalent, but identifiing uniquely each mimic concept, a unique and reproductible way)

That will be useful for many use cases:

  • union all on different tables
  • international union all : using bigint will a necessity very soon
  • avoid collision between mimic concepts & omop concepts (for eg by begining sequence to 10 bilion
  • avoid join on tables by mistake (AFAIK, you already introduce such concept by mooving primary keys in order to ovoid bad joins between tables)

Those mimic_id will be used for the ETL, and they will be used as id to populate mimic_omop
The solution looks like a huge update on the OMOP model, but mooving from int4 to int8 has already been introduced in some omop tables, and mooving to int8 is obvious, and will definitely be done one day.

Then I will build a postgreql DDL that add a mimic_id in all the mimic tables as a proof of concept.
Interested in your feedback on that - you may just way I do the job to understand exaclty the details

mimic.callout : No omop equivalent ?

Hi,

Record of when patients were ready for discharge (called out), and the actual time of their discharge (or more generally, their outcome).

We may propose to add a field in omop.visit_details, to capture that information.

BTW, is that information really relevant for research and what uses cases ?

Visit detail column names are inconsistent

The visit detail table has some columns:

  • admitting_source_concept_id
  • admitting_source_value
  • discharge_to_concept_id
  • discharge_to_source_value

This seems inconsistent for two reasons:

  1. "admitting" and "discharge" are different tenses (perhaps it should be "admission")
  2. admitting_source_concept_id contains the word "source" but discharge_to_concept_id does not

I think, regarding (2), that it is an error. The description is: "A foreign key to the predefined concept in the Place of Service Vocabulary reflecting the admitting source for a visit.". So I think that the idea is: admitting_source_value is the original data, and next to it we should have admitting_concept_id for the standardized data, not admitting_source_concept_id. (Actually, I'd further argue it should be admission_concept_id). Thoughts? I could raise this at OHDSI if we agree.

DRUG_EXPOSURE start/end times may be reversed

Looking at some data from the DRUG_EXPOSURE table:

drug_exposure_id person_id drug_concept_id drug_exposure_start_datetime drug_exposure_end_datetime
802093 886218 35606152 5/17/35 18:22 5/17/35 17:22
801330 886282 35605935 11/14/30 21:24 11/14/30 20:54
807394 886296 35605935 6/16/81 16:00 6/16/81 15:00

Looks like flipped the start/end times here:

, starttime as drug_exposure_end_datetime
, endtime as drug_exposure_start_datetime

Extracting numbers from free text fields

I was looking through the PRESCRIPTIONS -> DRUG_EXPOSURE ETL, and came across the use of the extract_value function here:

, extract_value(dose_val_rx) as quantity --extract quantity from pure numeric when possible

The function seems to extract numbers from text - which is great - but it makes the assumption that either commas or decimals are used as the decimal separator, e.g. it allows for 1.05 and 1,05 to both map to the same number. However this is a regional thing, and in MIMIC a comma separator is almost always used as a thousandths separator.

For MIMIC, prescriptions, we see this is bad:

select
prod_strength
, count(*) as nobs
, max(dose_val_rx) as dose_str
, extract_value(max(dose_val_rx)) as dose_numeric
from prescriptions
where dose_val_rx like '%,%'
group by prod_strength
order by count(*) desc
limit 10;

Returns:

prod_strength nobs dose_str dose_numeric
25,000 unit Premix Bag 31532 25,000 25
10,000 Units / mL - 5 mL Vial 842 25,000 25
10,000 Unit Vial 588 55,000 55
50,000 Unit Capsule 324 50,000 50
5000 Units / mL- 1mL Vial 194 30,000 30
500,000 Unit Tablet 110 500,000 500
200mg/mL-4mL Vial 104 16,000 16
10,000 Unit Capsule 97 50,000 50
20,000 Unit Vial 85 60,000 60
500,000 Unit UDCUP 75 500,000 500

I think there should be one function for European ETL and one function for non-European ETL. I am going to separate the functions and have extract_value_decimal_sep and looks_like_value_decimal_sep to be used with UK/American style separators. I'll fix this for prescriptions and leave this issue open because we should check the other ETLs for this bug. grep tells us to look at:

  • ./etl/StandardizedClinicalDataTables/DRUG_EXPOSURE/etl.sql:40:, extract_value(dose_val_rx) as quantity --extract quantity from pure numeric when possible
  • ./etl/StandardizedClinicalDataTables/MEASUREMENT/etl.sql:14: , extract_value(value) as value_as_number
  • ./etl/StandardizedClinicalDataTables/MEASUREMENT/etl.sql:156: , extract_value(value) as value_as_number
  • ./etl/StandardizedClinicalDataTables/MEASUREMENT/etl.sql:334: , extract_value(dilution_comparison) as value_as_number

Mapping MIMIC ITEMIDs to respective concept IDs

Hello,

I have been looking through the work that has been done to map MIMIC to the OMOP model and I was wondering if a resource exists that pairs each current MIMIC d_label with its existing itemid along with the new measurement_concept_id. In other words, taking SPO2 as an example (mapped as measurement_concept_id = 3016502 in the OMOP model), is there a reference that also lists the respective itemids that these concepts are currently listed under in MIMIC?

I may be wrongly assuming that in order to create this OMOP mapping, each concept was acquired from the clinical dataset and then assigned a measurement_concept_id based on its similarity to other concepts in the dataset. But if this is the case, wasn't the itemid for each concept also obtained during this process and (possibly) stored somewhere?

Appreciate the help with this.

Brett

Fluid output conversion issue

There is some inconsistency in the conversion of some of the output measurements. I copied over code from mimic-code and adapted it the schema:

select meas.visit_detail_id
, meas.person_id
, meas.measurement_datetime
, meas.measurement_concept_id
, mc.concept_name
, meas.value_as_number
, uc.concept_name
, meas.measurement_source_concept_id
, msc.concept_name as source_concept_name
, meas.value_source_value
-- vd.care_site_id
from visit_detail vd
left join measurement meas
 on vd.visit_detail_id = meas.visit_detail_id
 and meas.measurement_datetime >= vd.visit_start_datetime
 and meas.measurement_datetime  < (vd.visit_start_datetime + interval '1' day)
left join concept mc on meas.measurement_concept_id = mc.concept_id
left join concept msc on meas.measurement_source_concept_id = msc.concept_id
left join concept uc on meas.unit_concept_id = uc.concept_id
where measurement_source_concept_id in
(
select concept_id
from concept
where concept_code in
(
  -- these are the most frequently occurring urine output observations in CareVue
  '40055', -- "Urine Out Foley"
  '43175', -- "Urine ."
  '40069', -- "Urine Out Void"
  '40094', -- "Urine Out Condom Cath"
  '40715', -- "Urine Out Suprapubic"
  '40473', -- "Urine Out IleoConduit"
  '40085', -- "Urine Out Incontinent"
  '40057', -- "Urine Out Rt Nephrostomy"
  '40056', -- "Urine Out Lt Nephrostomy"
  '40405', -- "Urine Out Other"
  '40428', -- "Urine Out Straight Cath"
  '40086',--	Urine Out Incontinent
  '40096', -- "Urine Out Ureteral Stent #1"
  '40651', -- "Urine Out Ureteral Stent #2"

  -- these are the most frequently occurring urine output observations in MetaVision
  '226559', -- "Foley"
  '226560', -- "Void"
  '226561', -- "Condom Cath"
  '226584', -- "Ileoconduit"
  '226563', -- "Suprapubic"
  '226564', -- "R Nephrostomy"
  '226565', -- "L Nephrostomy"
  '226567', --	Straight Cath
  '226557', -- R Ureteral Stent
  '226558', -- L Ureteral Stent
  '227488', -- GU Irrigant Volume In
  '227489'  -- GU Irrigant/Urine Volume Out
)
)
and meas.person_id = 62101896
order by meas.measurement_datetime

I get the following:

visit_detail_id person_id measurement_datetime measurement_concept_id concept_name value_as_number concept_name measurement_source_concept_id source_concept_name value_source_value
67094765 62101896 2174-03-21 20:02:21.000 3014315 Urine output -3000 milliliter 2001028370 label:[GU Irrigant Volume In]dbsource:[metavision]linksto:[outputevents]unitname:[mL]param_type:[Numeric] (null)
67094765 62101896 2174-03-21 20:02:21.000 3014315 Urine output -2650 milliliter 2001028371 label:[GU Irrigant/Urine Volume Out]dbsource:[metavision]linksto:[outputevents]unitname:[mL]param_type:[Numeric] (null)

However from the original MIMIC data:

select oe.subject_id, oe.icustay_id, oe.charttime
, oe.itemid
, di.label
, oe.value
, (
    -- we consider input of GU irrigant as a negative volume
    case when oe.itemid = 227488 then -1*value
    else value end
  ) as value_neg
from outputevents oe
inner join d_items di on oe.itemid = di.itemid
where oe.subject_id = 99031
and oe.itemid in
(
  -- these are the most frequently occurring urine output observations in CareVue
  40055, -- "Urine Out Foley"
  43175, -- "Urine ."
  40069, -- "Urine Out Void"
  40094, -- "Urine Out Condom Cath"
  40715, -- "Urine Out Suprapubic"
  40473, -- "Urine Out IleoConduit"
  40085, -- "Urine Out Incontinent"
  40057, -- "Urine Out Rt Nephrostomy"
  40056, -- "Urine Out Lt Nephrostomy"
  40405, -- "Urine Out Other"
  40428, -- "Urine Out Straight Cath"
  40086,--Urine Out Incontinent
  40096, -- "Urine Out Ureteral Stent #1"
  40651, -- "Urine Out Ureteral Stent #2"
  -- these are the most frequently occurring urine output observations in MetaVision
  226559, -- "Foley"
  226560, -- "Void"
  226561, -- "Condom Cath"
  226584, -- "Ileoconduit"
  226563, -- "Suprapubic"
  226564, -- "R Nephrostomy"
  226565, -- "L Nephrostomy"
  226567, --Straight Cath
  226557, -- R Ureteral Stent
  226558, -- L Ureteral Stent
  227488, -- GU Irrigant Volume In
  227489  -- GU Irrigant/Urine Volume Out
)
order by subject_id, charttime;
subject_id icustay_id charttime itemid label value value_neg
  99031 |     258971 | 2174-03-21 20:02:00 | 227488 | GU Irrigant Volume In        |  3000 | -3000
  99031 |     258971 | 2174-03-21 20:02:00 | 227489 | GU Irrigant/Urine Volume Out |  2650 | 2650

There are two things here:

(1) I think the ETL process tries to assign GU Irrigant Volume In as negative since this is technically a "negative" UO, but incorrectly makes both negative
(2) I am not sure if we should "fix" data in this way - this is a good example of the risk in this approach as the value is sometimes directly documented as a negative value and so we would incorrectly make those values positive. Perhaps better is a materialized view which has this logic.

service start_time after end_time

Apparently 167 services do have a bug in the omop etl since their start date is after the end time.

The french data provides both ward and service information too. I cannot imagin a better way of storing this information.

The alternative is to link the services directly to visit_occurrence. However this does not allow to store the parent_id linked to ward: some service cover multiple ward but only one parent_id is available. Then the way of knowing how ward and services are related would be based on calculus based on datetime that is a waste of time.

In the meantime, if we consider to propose an optimised version of omop, then this question will be reconsidered, and we will find a way to link efficiently the service information then.

That's why I suggest to go with a consistent focused version and implement the alternative way.

Any thought ?

Unit of measure

Just a general question, and forgive it if it sounds naive, but how do we harmonize unit of measure? For example, I was looking to map tidal volumes to SNOMED-CT, but I couldn't see how to specify the unit of measure (millilitres or ml/kg ideal body weight).

No standard for unknown admission/discharge location

We have hospital admission source of "** INFO NOT AVAILABLE **" and a discharge location of "OTHER FACILITY". For admission, the only decent match I found was 32199 - Information not available - from class "UB04 Point of Origin". The UB04 ontology seems to be for claims though, not sure what the impact of that would be.

For the discharge of other facility I used 8844 - Other Place of Service - though there was the option of 32209 - Unknown Value (but present in data) from class "UB04 Pt dis status".

Seems odd there is no standard concept for a location of "unknown", am I missing something?

concept mapping

Hello
Some questions about chart_label_to_concept.csv

  • first question = FiO2 : all the labels 'FiO2 Set', 'FiO2 (Analyzed)', 'HFO- FIO2', 'ecmo fio2' ... are linked to the same standard concept called 'Oxygen concentration breathed'. Do you think it's a good mapping because 'set' and 'analysed' mean different things!

  • second question = intracranial pressure. There are > 20 different items. The numbers of distinct values per label are quite differents. 'Intra Cranial Pressure' occurs 92306 times, 'icp' occurs 7 times.
    Moreover the distribution of values are quite differents. The median of ICP Camino is 89.5, the median of 'ICP' is 10!!
    Do you think we should map all to a unique standard concept?

general: missing fields in OMOP

Hi

Some fields are not provided by OMOP. Eg: admissions.insurance. I guess we should extend OMOP and propose new fields. That won't affect OHDSI tools. We could then propose those extension to OHDSI group.

We could then call those fields with the omop style: adding a local insurance_source & source_id and a insurance_id that would point to a standard concept.

That was just an idea, but we could also choose to loose those fields, or case by case

what about calculated variable (contrib)

Hey

eg: sofa, ventilation, sepsis....

There is 3 way of doing:

  1. use the fact_relationship table, add them in observation, or measurement, and link them to the right table
  2. add them as new columns (for eg: calc_venti), with documentation in the right table (visit_occurrence , measurement) according to their granularity
  3. add a new table (measurement_calc) that contains the same primary key as measurement

I would say 2 is the best way, because SQL user friendly. Also, there is no problem with backward compatibility with OMOP, because existing queries won't be affected by new columns.

If agreed, then we need to choose witch variable need to be added, and where. I have a colleague that will be working on it 200% user = @aparrot89

Procedureevents_mv

Hi

Not very confortable with the mimic.procedureevents_mv table (that is not that well documented https://mimic.physionet.org/mimictables/procedureevents_mv/ )

omop.procedure_occurrence.code says:

The source code for the Procedure as it appears in the source data. This code is mapped to a standard procedure Concept in the Standardized Vocabularies and the original code is, stored here for reference. Procedure source codes are typically ICD-9-Proc, CPT-4, HCPCS or OPCS-4 codes.

But that procedureevent_mv looks like to fit much better in "omop.observation" table with its value field and also units

Storing continuous infusions and drug administrations

There is a bit of flexibility in how drug administrations could be stored in OMOP and this issue aims to document our discussion so far and allow us to decide on a final approach.

Prescribed medications

It seems that our databases were relatively consistent in how medications are prescribed:

  1. MIMIC have prescriptions which have a start date and an end date
  2. HIAE (Sao Paulo) have prescriptions which have start date, end date, and have a start time/end time
  3. APHP have prescriptions (@parisni can comment whether it's only dates)

So this works and can be placed into DRUG_EXPOSURE with drug_type_concept_id = 38000177 ("Prescription written"). There is probably a bit more we could discuss with that ETL but let's leave that for another issue.

Continuous infusions

Continuous infusions are challenging. First I will describe the three data sources that we must make a compromise for: CareVue (in MIMIC-III), MetaVision (in MIMIC-III), and HIAE (Sao Paulo).

CareVue

  1. The amounts are stored with a single time which represents an end time, that is each row says "the patient received X amount by Y time"
  2. The rate of the infusion is stored with a single time which represents a start time, that is each row says "the patient is now receiving at T time the compound at Z rate"

MetaVision

  1. The rate is stored with two times: a start time and an end time. Any time the rate is changed, a new row occurs which has the new rate and the new start time and end time. Amounts are never stored, but can be calculated from the rate.

Sao Paulo

  1. The amounts are stored with a single time which represents an end time, that is each row says "the patient received X amount by Y time" (i.e. the same as CareVue) (@lbulgarelli to confirm)
  2. The rate of the infusion is stored at the same time as the amount, that is each row says "the patient is receiving Z rate at Y time"

Proposed solutions

There are a few options on where this data could go, and we mostly agree it's one of: DRUG_EXPOSURE, MEASUREMENT, or OBSERVATION. Here are the proposals we discussed.

Proposal 1 - Modify the OMOP data model

  • Amount -> DRUG_EXPOSURE (i.e. one row per documented amount delivered)
    • If only endtime is available, leave starttime as null
    • If both are available, put both
  • Rate -> DRUG_EXPOSURE (i.e. one row per documented rate delivered)
    • If only starttime is available, leave endtime as null
    • If both are available, put both

Advantages

  • this captures the data as is (i.e. no data loss)
  • DRUG_EXPOSURE is the intuitive location for these administrations

Disadvantages

  • this violates the data model - both starttime and endtime are required fields and we can't leave one blank
  • analysis would require special logic to deal with nullable start times and end times

Proposal 2 - Infer the missing times

  • If rate available with start time and end time -> DRUG_EXPOSURE
  • Otherwise, derive start time/end time for amount, convert to rate -> DRUG_EXPOSURE

Advantages

  • Conforms to the data model
  • Intuitive location for the data - both orders and administrations are in DRUG_EXPOSURE (which is suggested by the documentation)

Disadvantages

  • Errors will be introduced when inferring start time or end time
  • Mixing derived data with raw source data, without explicitly marking it as derived, is undesirable

Proposal 3 - If only one time, put it into MEASUREMENT

  • If rate available with start time and end time -> DRUG_EXPOSURE
  • Rates with start time -> MEASUREMENT
  • Amounts with end time -> MEASUREMENT

Advantages

  • Conforms to the data model
  • If only one time is available, one time is used

Disadvantages

  • Splits drug administration in two tables, depending on the hospital, making queries not generalize
  • Unintuitive location for the data - the description of MEASUREMENT in the documentation doesn't seem to fit for an administration (could put the data in OBSERVATION to avoid this)

Other proposals

If people have other proposals, I can edit the post and put them here.

Feed observation_period

This table has been described as mandatory.

This should be easy to load it like visit_occurrence table.

icustays

Hey

Well icustay_id now.

AFAIK, icustay_id is a pure myth. (from doc "is a generated identifier..."). Apparently the method you generate them from transfers table looks like "if readmission within 24h then keep the icustay_id else create a new icustay_id". Then for facts that are well timestamped (such chartevents), you are able to link each to an icustay and for those that contain approxymativ timestamps (labs) you don't have any icustay_id.

Is this correct or can you clarify a little ?

Thanks a lot,

Ethnicity concerns

Hey

As noticed by Paul, Ethnicity does not have the same granularity in mimic and omop.
mimic stores ethnicity for each admission. Then sometime people do have multiple different ethnicity values. What is the real ethnicity value to store in the omop.person then ?

Paul made a decision based on race priority (not sure to understand what the priority is based on). For now I choosed to pick the last one.
We can still store each values in the observation table.

Mimic OMOP Common Data Model

Hello,

I am an undergraduate student at Texas A&M University, and I have gotten access to the MIMIC III database and have set up the database with postgres. We are working on a senior design project that will involve this database, however we must convert this database into a common data model. The OMOP data model seems ideal as there is already work on it by you guys so we are hoping to get permission to use this github.

I have seen the scripts that are on this github, but I have no clue as to what needs to be done first to get the data model converted to the OMOP model. I was hoping for further documentation on how all of this works, or for one of the contributors to help point me in the right direction. Otherwise, I would have to analyze this github, which will take some time for me as I am unfamiliar with a few things like etl, and R.

Thanks!

Split care_site.csv in two?

Seems like the care_site.csv contains a mapping for services and units, though I think it's cleaner to have these two mappings in separate files. Thoughts?

Clarification on the "standard concepts from Athena"

The doc says:

The standard concepts from Athena have been downloaded and are available somewhere (including running the extra script to download CPT code definitions)

what is meant by "standard"? the pre-selected ones on the webapp? or is there a list somewhere?

omop death table

hey

From MIT-LCP/mimic-code#190 and from the query above:

WITH
"dead_adm" AS (SELECT distinct on (hadm_id) hadm_id, deathtime, dischtime, subject_id from mimiciii.admissions ORDER BY hadm_id, dischtime DESC),
"tmp" as ( SELECT subject_id , dod_hosp , deathtime , dischtime FROM mimiciii.patients LEFT JOIN dead_adm USING (subject_id) WHERE dod_hosp IS NOT NULL AND (dod_hosp::date is distinct from deathtime::date OR dead_adm.subject_id IS NULL) ORDER BY deathtime),	 
"total" as (SELECT *, greatest(dod_hosp,dischtime) - least(dod_hosp,dischtime)::date between '0 day'::interval AND '2 day'::interval as is_dod_patient_ok, greatest(deathtime,dischtime) - least(deathtime,dischtime)::date between '0 day'::interval and '2 day'::interval as is_dod_admissions_ok FROM tmp)
SELECT * FROM total where is_dod_patient_ok IS FALSE OR is_dod_admissions_ok IS FALSE;

There is apparently some odd aspects with dod with both admissions / patients. The proposed solution is to consider first admissions as the source of hospitalisation death, and patients dod_ssn for the foreign death source of information.
For the admissions deathtime that are odd, the dischtime is taken in consideration if patient.dod_hosp is the same (this is the case for the 11 odd admissions)

Implement care_site fact_relationship

As discussed, each physical visit_detail/care_site should act as this example:

care_site

id type
 1  bed
 2  ward
 3  hospital

fact_relationship

fact_id_1 fact_id_2 relationship_id
1 1 Care Site is part of Care Site
1 2 Care Site is part of Care Site
1 3 Care Site is part of Care Site
2 2 Care Site is part of Care Site
2 3 Care Site is part of Care Site
3 3 Care Site is part of Care Site

trick

adding 1-1 or 2-2 auto centered fact_relationship allows to get in one query multiple level of detailled data.

with
  visit_detail (care_site_id, visit_idd) as (VALUES (1,'visit 1'),(2,'visit 2'),(3,'visit 3'))
, care_site (care_site_id, type_id) as (VALUES (1,'bed icu'),(2,'icu'),(3,'nicu'))
, fact_relationship (fact_1,fact_2) as (values
 (1,1) -- bed linked to itself
,(1,2) -- bed linked to ward
,(2,2) -- ward linked to itself
,(3,3) -- ward linked to itself
)
select visit_detail.*
from visit_detail
join fact_relationship on fact_1 = care_site_id
join care_site c on fact_2 = c.care_site_id and  c.type_id = 'icu';
care_site_id visit_idd
1 visit 1
2 visit 2

Then this 2 join query should work in all cases to find any nicu data or whatever. Then fact (measurement, observation..) data can be linked to bed, or any nested level of visit_detail.
In case no "linked to itself" is loaded, then the query becomes way more complicated

Notes / section

Hi @alistairewj

I have been talking to people from comlumbia that made the note_nlp table born.
I have a good idea on how to put section in it, and I was wondering wether:

  1. you had for each category the list of possible sections values
  2. you had a mapping to LOINC
    Or if you had some equivalent work ready to share

Thanks !

Storing microbiology data

We mostly agreed on how to store microbiology data, but there are some loose ends to tie up.

Specimen

Each microbiology culture should be associated with a specimen, and this specimen is intuitively stored in the SPECIMEN table. It's unclear how disease_concept_id should be used, so we have opted not to use it.

Culture

In order to indicate that a culture was done, we should have a row in the MEASUREMENT table that indicates that a culture was performed. This row should have a relationship to the specimen using FACT_RELATIONSHIP.

Bacterial count

If one or more bacteria types were cultured, then for each bacteria the numeric measurement (e.g. colony count) should be stored in MEASUREMENT with the value_as_number being the count and the concept_id being a standardized code for this count (I believe LOINC has a good vocabulary for this). Each bacterial growth should have a relationship to the specimen using FACT_RELATIONSHIP.

Antibiotic

Tests for antibiotic resistance are stored in MEASUREMENT. At the very least, we will have one row per antibiotic tested which indicates whether the bacteria was susceptible, resistance, or indeterminate. Then, depending on the raw data, we may have another row which indicates the dilution of that antibiotic. This will link to the bacteria using FACT_RELATIONSHIP. Then, arguably, the dilution level of the antibiotic should link to the antibiotic susceptibility using FACT_RELATIONSHIP. This last part hasn't really been discussed so I welcome feedback here.

Summary

  • specimens in SPECIMEN table
  • one MEASUREMENT row indicating that the blood was cultured and positive/negative
  • zero or more MEASUREMENT rows indicating a count for a specific bacteria (perhaps using a LOINC concept)
  • zero or more MEASUREMENT rows indicating if there is an antibiotic resistance (use fact_relationship bidirectional to the culture)
  • zero or more MEASUREMENT rows indicating the antibiotic dilution (use FACT_RELATIONSHIP bidirectional to the antibiotic susceptibility)

mimic_id_concept_seq

Where is the sequence 'mimic_id_concept_seq' created that is then referenced by nextval('mimic_id_concept_seq')? Thx

ERROR: duplicate key value violates unique constraint "xpk_measurement"

After trying the fix for #48, I got:

...
INSERT 0 11656610
Time: 1585618.783 ms
INSERT 0 630856
Time: 81014.760 ms
psql:etl/StandardizedClinicalDataTables/MEASUREMENT/etl.sql:804: ERROR:  duplicate key value violates unique constraint "xpk_measurement"
DETAIL:  Key (measurement_id)=(301930857) already exists.
Time: 5491441.549 ms

Nicolas's draft I2B2 paper

@parisni you mentioned that you were working on an I2B2 paper relevant to our proposal for a federated database. Do you have any more details (e.g. a draft citation, abstract etc)?

mimic Chartevents / Labevents -> omop Measurement / Observation

hi

OMOP

[CLINICAL] The MEASUREMENT table contains records of Measurement, i.e. structured values (numerical or categorical) obtained through systematic and standardized examination or testing of a Person or Person's sample. The MEASUREMENT table contains both orders and results of such Measurements as laboratory tests, vital signs, quantitative findings from pathology reports, etc.

[CLINICAL] The OBSERVATION table captures clinical facts about a Person obtained in the context of examination, questioning or a procedure. Any data that cannot be represented by any other domains, such as social and lifestyle facts, medical history, family history, etc. are recorded here.

MIMIC

CHARTEVENTS contains all the charted data available for a patient. During their ICU stay, the primary repository of a patient’s information is their electronic chart. The electronic chart displays patients’ routine vital signs and any additional information relevant to their care: ventilator settings, laboratory values, code status, mental status, and so on. As a result, the bulk of information about a patient’s stay is contained in CHARTEVENTS. Furthermore, even though laboratory values are captured elsewhere (LABEVENTS), they are frequently repeated within CHARTEVENTS. This occurs because it is desirable to display the laboratory values on the patient’s electronic chart, and so the values are copied from the database storing laboratory values to the database storing the CHARTEVENTS.
WARNING: Some items are duplicated between the labevents and chartevents tables. In cases where there is disagreement between measurements, labevents should be taken as the ground truth.

The LABEVENTS data contains information regarding laboratory based measurements. The process for acquiring a lab measurement is as follows: first, a member of the clinical staff acquires a fluid from a site in the patient’s body (e.g. blood from an arterial line, urine from a catheter, etc). Next, the fluid is bar coded to associate it with the patient and timestamped to record the time of the fluid acquisition. The lab analyses the data and returns a result within 4-12 hours

Based on that descriptions, I d'say the way to direct the data could be:
-> all labevents -> measurements
-> chartevents[duplicates related with labs] -> trash
-> chartevents[vital signs, with numerical values] -> measurements
-> chartevents[vital signs, with categorical values] -> measurements
-> chartevents[vital signs, with free text values] -> observations

What about you @alistairewj & @tompollard ?

OBSERVATION vs MEASUREMENT

The difference between these two tables is subtle:

  1. Numeric/categorical (Quantitative) data in MEASUREMENT
  2. String (Qualitative) data in OBSERVATION

So for example religion/ethnicity are placed in OBSERVATION, while heart rate/respiratory rate are placed in MEASUREMENT.

There are some edge cases worth discussing though.

  1. Scales - for example Glasgow Coma Scale, Pain Scales, etc, which have a numeric component but are also qualitative - reasonably argued to go in MEASUREMENT as these are standardized scales (even if somewhat subjective)

... any other edge cases ?

labeevents, high/low ranges

Hi

MIMIC only have a "abnormal" flag.
OMOP provides both HIGH / LOW ranges for a lab.

I d'say both are wrong. We should provide both, exactly as i2b2 does.

Does MIMIC have those ranges ?
What do you think about that ?

Thanks

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.