arch-commons / arch-ontology

ARCH i2b2 PCORnet Ontology

License: Other

SQLPL 30.23% PLpgSQL 69.77%

arch-ontology's Issues

Age at visit does work in oracle

dbo.stringpart function doesn't use @delimiter arg

This is a minor issue, I suppose. But the dbo.stringpart function in the ontology_utils_mssql.sql script doesn't make use of its @delimiter argument. It seems that, instead of the hard-coded '\', it should use @delimiter in the following lines:
WHILE @num!=@el and CHARINDEX('\', @stringToSplit) > 0
SELECT @pos = CHARINDEX('\', @stringToSplit)
Please reject this issue, if this is an incorrect assumption. Thanks.
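
To illustrate the intended behavior, here is a minimal Python sketch of a split-and-pick helper that honors its delimiter argument. The name string_part, the 1-based element index, and the empty-string result for an out-of-range index are assumptions made for illustration; they are not taken from the actual dbo.stringpart definition.

```python
def string_part(string_to_split: str, delimiter: str, element: int) -> str:
    """Return the element-th (1-based) piece of string_to_split.

    Hypothetical stand-in for dbo.stringpart: the only point is that the
    delimiter argument, not a hard-coded backslash, drives the split.
    """
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    parts = string_to_split.split(delimiter)
    if 1 <= element <= len(parts):
        return parts[element - 1]
    return ""  # assumed behavior for an out-of-range index


if __name__ == "__main__":
    # The same helper works for a pipe and for a backslash delimiter.
    print(string_part("PCORI|DIAGNOSIS|09", "|", 2))
    print(string_part("PCORI\\DIAGNOSIS\\09", "\\", 3))
```

With a hard-coded backslash, the first call would find no delimiter at all, which is the symptom described above.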

Same Dx and PX c_basecodes

Both the Pcornet_Proc and Pcornet_Diag ontology tables have the following c_basecodes:
ICD10:B00, ICD10:B01, ICD10:B02, ICD10:B03, ICD10:B04, ICD10:B20, ICD10:B30, ICD10:B33, ICD10:B34, ICD10:B40, ICD10:B41, ICD10:B42, ICD10:B43, ICD10:B44, ICD10:B50, ICD10:B51, ICD10:B52, ICD10:B53, ICD10:B54, ICD10:B70, ICD10:B80, ICD10:B82, ICD10:B83, ICD10:B90, ICD10:B91, ICD10:B92, ICD10:C01, ICD10:C02, ICD10:C03, ICD10:C05, ICD10:C21, ICD10:C22, ICD10:C23, ICD10:C25, ICD10:C51, ICD10:C71, ICD10:C72, ICD10:C75, ICD10:C76, ICD10:C81, ICD10:C91, ICD10:D00, ICD10:D01, ICD10:D02, ICD10:D70, ICD10:D71, ICD10:D72, ICD10:D80, ICD10:D81, ICD10:D82, ICD10:F01, ICD10:F02, ICD10:F06, ICD10:F07, ICD10:F09, ICD10:F13, ICD10:F14, ICD10:F15

This may cause incorrect results in count queries. For example, if diagnosis code 'ICD10:F15' is queried, both diagnosis and procedure records will be included in the patient count. The same happens if procedure code 'ICD10:F15' is queried.

In general, diagnosis codes should always be different from procedure codes in the ICD coding system. Please look into this.
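
A check like the one behind this report can be sketched as follows. This is a minimal Python example using an in-memory SQLite database, so it does not reflect the MSSQL or Oracle setup; the tables are reduced to a single c_basecode column, and the DIAG_ONLY and PROC_ONLY values are made-up placeholders.

```python
import sqlite3


def shared_basecodes(conn: sqlite3.Connection) -> list:
    """Return c_basecode values that appear in both ontology tables."""
    rows = conn.execute(
        """
        SELECT DISTINCT d.c_basecode
        FROM pcornet_diag AS d
        JOIN pcornet_proc AS p ON p.c_basecode = d.c_basecode
        WHERE d.c_basecode IS NOT NULL AND d.c_basecode <> ''
        ORDER BY d.c_basecode
        """
    ).fetchall()
    return [row[0] for row in rows]


def demo_connection() -> sqlite3.Connection:
    """Build a tiny stand-in for the two metadata tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pcornet_diag (c_basecode TEXT)")
    conn.execute("CREATE TABLE pcornet_proc (c_basecode TEXT)")
    conn.executemany(
        "INSERT INTO pcornet_diag VALUES (?)",
        [("ICD10:B00",), ("ICD10:F15",), ("DIAG_ONLY",)],
    )
    conn.executemany(
        "INSERT INTO pcornet_proc VALUES (?)",
        [("ICD10:B00",), ("ICD10:F15",), ("PROC_ONLY",)],
    )
    return conn


if __name__ == "__main__":
    for code in shared_basecodes(demo_connection()):
        print(code)
```

Any code returned here would be counted under both the diagnosis and the procedure concept when queried.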

Labs ontology: creatinine appears twice

Demographics: large ages still queryable

Large ages are still queryable in the new ontology. They are inactive in the >=85 tree but not the >=65.

Ontology file import - literal "NULL" converted to actual null

This may be database/upload client specific. Using Oracle SQL Developer backed by Oracle 11g, literal "NULL" values were converted to actual nulls. Lori's suggestion of adding a leading space (" NULL") resulted in a successful literal upload.

Oracle Ontology utility script needs updating

I fixed the MSSQL utility script yesterday, but the Oracle utility script is out of date and needs updating.

Loading medication ontology with Oracle sqlldr results in error "column C_TOOLTIP. second enclosure string not present"

Command:
ORACLE_SID=ORCL sqlldr <user>/<password> control=PCORNET_MED.ctl data=PCORNET_MED.TXT bad=PCORNET_MED.bad log=PCORNET_MED.log errors=10000

Errors from sqlldr log file (nearly all the rows):
Record 1: Rejected - Error on table BLUEHERONMETADATA.PCORNET_MED, column C_TOOLTIP. second enclosure string not present

The control file I'm using is attached (changed to .txt since oddly github only allows specific extensions).

PCORNET_MED_CTL.txt

Age >85 should not be queryable individually!

Ages >85 should be in a single bucket for de-identified querying compliance

concept_dim updater excludes empty exclusion_codes

concept_dim updater in the documentation excludes empty exclusion_codes - need fix if sites don’t null it out

Umlauts Replaced by ? inside Diag Ontology

20 ICD10 codes in ForUpgradeOnly folder txt file had ? instead of 'oe' umlaut values, and in the pcornet_diag.txt v2.1.2 file they appear to have been partially fixed (c_tooltip still has ?). Was that part of the 2.1.2 fix you applied earlier?

Example:
Inside ForUpgradeOnly folder - ICD10-CM-SCILHS-2015AA.zip:
6|"\PCORI\DIAGNOSIS\10(C00-D49) Neop~k3n8(C81-C96) Mali~u9p7(C88) Malignan~9aq7(C88.0) Walden~cfpb\"|"**Waldenstr?m macroglobulinemia**"|"N"|"FAE"|0|"ICD10:C88.0"||"concept_cd"|"CONCEPT_DIMENSION"|"concept_path"|"T"|"LIKE"|"\PCORI\DIAGNOSIS\10(C00-D49) Neop~k3n8(C81-C96) Mali~u9p7(C88) Malignan~9aq7(C88.0) Walden~cfpb\"||"Diagnoses \ Neoplasms (c00-d49) \ Malignant neoplasms of lymphoid, hematopoietic and related tissue (c81-c96) \ Malignant immunoproliferative diseases and certain other b-cell lymphomas \ **Waldenstr?m macroglobulinemia**"|"@"|2015/01/01 12:00:00 AM|2016/06/29 04:46:17 PM|2016/06/29 04:46:18 PM|"RPDR_2015"|||"\PCORI\DIAGNOSIS\10(C00-D49) Neop~k3n8(C81-C96) Mali~u9p7(C88) Malignan~9aq7\"|"(C88.0) Walden~cfpb"|"C88.0"

Inside pcornet_diag.zip in ontology folder
6|"\PCORI\DIAGNOSIS\10(C00-D49) Neop~k3n8(C81-C96) Mali~u9p7(C88) Malignan~9aq7(C88.0) Walden~cfpb\"|**"Waldenstroem macroglobulinemia"**|"N"|"FAE"|0|"ICD10:C88.0"||"concept_cd"|"CONCEPT_DIMENSION"|"concept_path"|"T"|"LIKE"|"\PCORI\DIAGNOSIS\10(C00-D49) Neop~k3n8(C81-C96) Mali~u9p7(C88) Malignan~9aq7(C88.0) Walden~cfpb\"||"Diagnoses \ Neoplasms (c00-d49) \ Malignant neoplasms of lymphoid, hematopoietic and related tissue (c81-c96) \ Malignant immunoproliferative diseases and certain other b-cell lymphomas \ **Waldenstr?m macroglobulinemia**"|"@"|2015/01/01 12:00:00 AM|2016/06/29 04:46:17 PM|2016/06/29 04:46:18 PM|"RPDR_2015"|||"\PCORI\DIAGNOSIS\10(C00-D49) Neop~k3n8(C81-C96) Mali~u9p7(C88) Malignan~9aq7\"|"(C88.0) Walden~cfpb"|"C88.0"
This affects the following fields: c_name, c_tooltip, but not c_dimcode, so the other bug I found for 2.1.2 milestone doesn't seem to address this issue.

For consistency's sake, it would make sense to apply the change to the ForUpgradeOnly file as well.

Question: should this fix be applied consistently to both c_name and c_tooltip?

concept_dim updater skips hiddens but not inactives

Behavior should be the reverse

Minor cleanup

Some c_symbols have '/'
Some pcori basecodes have prefixes
Doesn't affect anything, just for cleanliness

Diagnosis ontology: ICD-10 tree upgrade

The ICD-10 diagnosis tree needs to be upgraded to the 2015AA version with interleaved ICD-9 codes.

Medications: several duplicates

c_dimcode needs a fix for several more pcornet_agetree.txt rows

@jklann, per our discussion Monday, I checked and HIPAA requires obscuring age if greater than 89, so my original comment was wrong. However, while I was checking for that, I noticed some more bad c_dimcodes in pcornet_agetree.txt for v2.1.1 of the ontology. (https://github.com/SCILHS/scilhs-ontology/tree/master/Ontology/ForUpgradingOnly/pcornet_agetree.txt)

Date query in c_dimcode reversed.
C_FULLNAME|C_OPERATOR|C_DIMCODE
\PCORI\ENCOUNTER\Age at visit>= 65 years old\80| BETWEEN|((select birth_date from PCORI_Dev.dbo.patient_dimension where patient_num = PCORI_Dev.dbo.visit_dimension.patient_num) + (365.25 * 81)-1) AND ((select birth_date from PCORI_Dev.dbo.patient_dimension where patient_num = PCORI_Dev.dbo.visit_dimension.patient_num) + (365.25 * 80)-1)

\PCORI\ENCOUNTER\Age at visit>= 65 years old\81| BETWEEN|((select birth_date from PCORI_Dev.dbo.patient_dimension where patient_num = PCORI_Dev.dbo.visit_dimension.patient_num) + (365.25 * 82)-1) AND ((select birth_date from PCORI_Dev.dbo.patient_dimension where patient_num = PCORI_Dev.dbo.visit_dimension.patient_num) + (365.25 * 81)-1)

Date query in c_dimcode reversed and wrong numbers
C_FULLNAME|C_OPERATOR|C_DIMCODE
\PCORI\ENCOUNTER\Age at visit>= 65 years old\89|BETWEEN|((select birth_date from PCORI_Dev.dbo.patient_dimension where patient_num = PCORI_Dev.dbo.visit_dimension.patient_num) + (365.25 * 83)-1) AND ((select birth_date from PCORI_Dev.dbo.patient_dimension where patient_num = PCORI_Dev.dbo.visit_dimension.patient_num) + (365.25 * 82)-1)

Please review and make corrections accordingly. Thank you!
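
Since x BETWEEN a AND b is equivalent to x >= a AND x <= b, a c_dimcode whose first bound is larger than its second matches no rows. The rows above could be screened with a small Python sketch like the one below. It assumes that a correct row for age N uses the multipliers N and N + 1, in that order, and the function name and messages are hypothetical.

```python
import re

# Captures the integer N in each "365.25 * N" term of a c_dimcode.
MULTIPLIER = re.compile(r"365\.25\s*\*\s*(\d+)")


def check_age_dimcode(age: int, dimcode: str) -> list:
    """Return a list of problems found in a BETWEEN-style age c_dimcode."""
    found = [int(n) for n in MULTIPLIER.findall(dimcode)]
    if len(found) != 2:
        return ["expected exactly two '365.25 * N' terms, found %d" % len(found)]
    low, high = found
    problems = []
    if low > high:
        problems.append("bounds reversed: BETWEEN needs the lower bound first")
    if sorted(found) != [age, age + 1]:
        problems.append("multipliers %s do not match age %d" % (sorted(found), age))
    return problems


if __name__ == "__main__":
    # Abbreviated forms of the rows quoted above ("b" stands for the birth_date subquery).
    print(check_age_dimcode(80, "(b + (365.25 * 81)-1) AND (b + (365.25 * 80)-1)"))
    print(check_age_dimcode(89, "(b + (365.25 * 83)-1) AND (b + (365.25 * 82)-1)"))
```

The row for age 80 is flagged only for reversed bounds, while the row for age 89 is flagged for both reversed bounds and wrong multipliers.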

Update RxNorm ontology to newest version.

Labs ontology: duplicate codes with same fullname

In the labs ontology, (LOINC:LP38332-0, LOINC:LP43019-6) have the same fullname!

About 1200 codes don't have pcori_ndc or pcori_rxnorm

About 1200 codes don't have pcori_ndc or pcori_rxnorm and thus are not transformed!

Hiddens and inactives cleanup

Hiddens and inactives could be confusing. Consider making hidden anything in the data model and inactive anything for a future release. Also consider releasing without hiddens.

SCILHS ontology meta data tables do not include indexes for labs

The script found at the link below does not include additional indexes for lab results. This is a minor issue, since the lab result table is small.
https://github.com/SCILHS/scilhs-ontology/blob/master/Scripts/SqlServer/create_sqlserver_metadata_tables.sql

Diagnosis ontology: PDX

Pcori_basecode is wrong - should be P/S, not 1/2

PCORI_BASECODE incorrect for some principal discharge diagnosis flags

The PCORI_BASECODE values are inconsistent with the [CDM v3 specification](http://www.pcornet.org/wp-content/uploads/2014/07/2015-07-29-PCORnet-Common-Data-Model-v3dot0-RELEASE.pdf) for "Principal", "Secondary", and "Unable to classify" for the principal discharge diagnosis flag (PDX).

When running SCILHS/i2p-transform, I get values like '1' and '2' in my Diagnosis table rather than 'P' or 'S'.

The following ...

select c_fullname, c_name, c_basecode, pcori_basecode 
from "&&i2b2_meta_schema".pcornet_diag 
where c_fullname like '\PCORI_MOD\PDX\%'
and c_name in ('Principal', 'Secondary', 'Unable to classify');

...yields

\PCORI_MOD\PDX\S\   Secondary   2   2
\PCORI_MOD\PDX\X\   Unable to classify  0   0
\PCORI_MOD\PDX\P\   Principal   DiagObs:PRIMARY_DX_YN   1
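
The mismatch can be flagged mechanically. The Python sketch below assumes that the expected pcori_basecode is the last segment of c_fullname (P, S, or X); that rule is inferred from the rows shown and is not confirmed by the ontology itself. The function names are hypothetical.

```python
def expected_pdx_code(c_fullname: str) -> str:
    """Return the last non-empty path segment, e.g. 'S' for '\\PCORI_MOD\\PDX\\S\\'."""
    segments = [s for s in c_fullname.split("\\") if s]
    return segments[-1]


def pdx_mismatches(rows):
    """rows: iterable of (c_fullname, c_name, c_basecode, pcori_basecode).

    Returns (c_fullname, actual, expected) for every row whose
    pcori_basecode differs from the code implied by its path.
    """
    return [
        (c_fullname, pcori_basecode, expected_pdx_code(c_fullname))
        for c_fullname, _c_name, _c_basecode, pcori_basecode in rows
        if pcori_basecode != expected_pdx_code(c_fullname)
    ]


# The three rows returned by the query above.
ROWS = [
    ("\\PCORI_MOD\\PDX\\S\\", "Secondary", "2", "2"),
    ("\\PCORI_MOD\\PDX\\X\\", "Unable to classify", "0", "0"),
    ("\\PCORI_MOD\\PDX\\P\\", "Principal", "DiagObs:PRIMARY_DX_YN", "1"),
]

if __name__ == "__main__":
    for c_fullname, actual, expected in pdx_mismatches(ROWS):
        print(c_fullname, "has", actual, "expected", expected)
```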

c_fullname in pcornet_lab fix. Connies issue #3.

In pcornet_lab c_fullname for Version row is actually "\PCORI\LAB_RESULT_CM\Version", so the setversion() function doesn't work on it, because it's case-sensitive (defined in ontology-utils-mssql.sql). The two solutions would be a) make the c_fullname consistent with other tables ('VERSION'), or b) adapt the ontology-utils-mssql script to disregard case when updating the Version c_name.

If the assumptions above are correct, we will go ahead and make the corrections on our side rather than wait for official corrections on GitHub. However, we hope these can be made official down the road, so that we stay consistent with other sites.

Also, could you please recap what the version of each pcornet table is going to be after the latest upgrade? Here's how we believe it should look in the end:

pcornet_demo: 2.1.1
pcornet_diag: 2.1.1
pcornet_enc: 2.1.1
pcornet_enroll: 2.1
pcornet_lab: 2.1
pcornet_med: 2.2
pcornet_proc: 2.1
pcornet_vital: 2.1

need to correct metadata for non-Hispanic in demographics

c_tablename should be concept_dimension and c_columnname should be concept_cd

'Hispanic' demographic fix. Connies issue #2.

The insert statement for '\PCORI\DEMOGRAPHIC\HISPANIC\R' has some probably erroneous values.
line 79: INSERT INTO [PCORI_Dev].[dbo].[pcornet_demo]([C_HLEVEL], [C_FULLNAME], [C_NAME], [C_SYNONYM_CD], [C_VISUALATTRIBUTES], [C_TOTALNUM], [C_BASECODE], [C_METADATAXML], [C_FACTTABLECOLUMN], [C_TABLENAME], [C_COLUMNNAME], [C_COLUMNDATATYPE], [C_OPERATOR], [C_DIMCODE], [C_COMMENT], [C_TOOLTIP], [M_APPLIED_PATH], [UPDATE_DATE], [DOWNLOAD_DATE], [IMPORT_DATE], [SOURCESYSTEM_CD], [VALUETYPE_CD], [M_EXCLUSION_CD], [C_PATH], [C_SYMBOL], [PCORI_BASECODE])
VALUES(3, '\PCORI\DEMOGRAPHIC\HISPANIC\R', 'Refuse to Answer', 'N', 'LAE', NULL, 'ETHNICITY:R', '', 'concept_cd', 'PATIENT_DIMENSION', 'RACE_CD', 'T', 'IN', '''07'',''r'',''refused''', '', 'Non-Hispanic', '@', '20140509 11:12:04.0', '20140509 11:12:04.0', '20140509 11:12:04.0', 'PCORNET_CDM', '', '', '\PCORI\DEMOGRAPHIC\HISPANIC', 'R', 'R')

Should that instead be:
INSERT INTO [PCORI_Dev].[dbo].[pcornet_demo]([C_HLEVEL], [C_FULLNAME], [C_NAME], [C_SYNONYM_CD], [C_VISUALATTRIBUTES], [C_TOTALNUM], [C_BASECODE], [C_METADATAXML], [C_FACTTABLECOLUMN], [C_TABLENAME], [C_COLUMNNAME], [C_COLUMNDATATYPE], [C_OPERATOR], [C_DIMCODE], [C_COMMENT], [C_TOOLTIP], [M_APPLIED_PATH], [UPDATE_DATE], [DOWNLOAD_DATE], [IMPORT_DATE], [SOURCESYSTEM_CD], [VALUETYPE_CD], [M_EXCLUSION_CD], [C_PATH], [C_SYMBOL], [PCORI_BASECODE])
VALUES(3, '\PCORI\DEMOGRAPHIC\HISPANIC\R', 'Refuse to Answer', 'N', 'LAE', NULL, 'ETHNICITY:R', '', 'PATIENT_NUM', 'PATIENT_DIMENSION', 'RACE_CD', 'T', 'IN', '''07'',''r'',''refused''', '', 'Refuse to Answer', '@', '20140509 11:12:04.0', '20140509 11:12:04.0', '20140509 11:12:04.0', 'PCORNET_CDM', '', '', '\PCORI\DEMOGRAPHIC\HISPANIC', 'R', 'R')

Encounter: Add Age at Visit tree

By request, add age at visit tree to SCILHS ontology.

Diagnosis ontology: ENCTYPE

ENCTYPE tree is in the diagnosis ontology; should be deleted.

Write propagation script for meds

Write propagation script for local children - pcori_rxnorm and pcori_ndc

Procedures: px_source needs a (default) value

PX_SOURCE should at least be getting a default value (even though it's an unsupported modifier at present)

Concept dimension: inactives in the diagnoses/ICD-10 tree should be included

Inactives in the diagnoses/ICD-10 tree should be included because they refer to ICD-9 codes. The updater script needs to be updated. Additionally, propagation of ICD-9 mappings to the interleaved tree needs to be supported.

Diagnosis 2.1.2: wacky dimcodes

Non-standard dimcodes snuck into diagnosis 2.1.2.

Medications 2.2: missing 78k NDC codes!

78,000 NDC codes that we'd previously mapped are missing in the newest medication ontology!

Labs ontology: HepC AB children have inconsistent metadata

Labs ontology: HepC AB children have inconsistent metadata. Some are quantitative, some are qualitative. It is a qualitative value.

pcornet_agetree weird c_dimcode for Age 24

@jklann , we noticed that inside https://github.com/SCILHS/scilhs-ontology/blob/master/Ontology/ForUpgradingOnly/pcornet_agetree.txt file, c_dimcode value for '\PCORI\ENCOUNTER\Age at visit\18-34 years old\24 years old' is not consistent with the other calculation formulas. For example, for 23 years old it's
((select birth_date from PCORI_Dev.dbo.patient_dimension where patient_num = PCORI_Dev.dbo.visit_dimension.patient_num) + (365.25 * 23)-1) AND ((select birth_date from PCORI_Dev.dbo.patient_dimension where patient_num = PCORI_Dev.dbo.visit_dimension.patient_num) + (365.25 * 24)-1)
But for 24 years old, it's:
getdate() - (365.25 *25) + 1 AND getdate() - (365.25 * 24) + 1

Could you please look into that and if necessary adjust the released file? Thank you!

Hispanic R refuse to answer missing from ontology

ICD-10 needs PCS and c_name fixes

ICD-10 ontology for procedures needs a different basecode prefix to differentiate it from diagnoses. Also the names have '@' signs strewn throughout, which is cluttering.

meds and labs in scope now? README out of date?

The README says:

What's not in the ontology?

Medications, labs, and patient-reported outcomes will be included in CDM v2.1

But I can see pcornet_lab and pcornet_med in MetadataFiles.

Labs: consider all LOINC codes listed by PCORI

Compare lab ontology to the upcoming LOINC code list that PCORI DRNOC is releasing soon.

Update ICD-10 diagnoses to be a crosswalk ontology.

And also make sure it's labeled ICD-10-CM, per Jim Campbell.

Environmental exposure might need to be under both smoking and non-smoking

Environmental exposure to tobacco should be outside the non-smoking folder

Update schemes table

schemes table is a few releases out of date

How are the "D~s1uh" parts of Diagnosis paths computed?

I'm thinking about adopting the SCILHS pcornet diagnosis hierarchy, but I'd like to retain our ability to update our ontologies directly from UMLS. How are the D~s1uh parts of Diagnosis paths computed?

<key>\\PCORI_DIAG\PCORI\DIAGNOSIS\09\(001-999.99) D~qlur\(460-519.99) D~s1uh\(480-488.99) P~xhpo\(486) Pneumoni~ido6\</key>
<dimcode>\PCORI\DIAGNOSIS\09\(001-999.99) D~qlur\(460-519.99) D~s1uh\(480-488.99) P~xhpo\(486) Pneumoni~ido6\</dimcode>
<tooltip>ICD9CM \ Diseases and injuries \ Diseases of the respiratory system \ Pneumonia and influenza \ Pneumonia, organism unspecified</tooltip>

Release new QT_Breakdowns table

QT_BREAKDOWN is not released with ontology, so stratified queries must be configured manually

Release new pcornet_med ontology with pcori_ndc and pcori_rxnorm

Release new pcornet_med ontology with pcori_ndc and pcori_rxnorm already populated, to obviate the need to run my complex script that fills these in. Perhaps also release a simpler propagation script that can pull the columns forward into local children.

Enrollment-based encounters should reference visit_dimension

This change is needed to follow the spec but also for speed

Labs ontology: possibly some invalid characters

From @psreeder...

On the lab file, look at 380-386. They all have the invalid characters. Here’s what it looks like in my Text Wrangler app on my Mac. Could also be a Mac issue.

Diagnosis: remove ICD-9 from ICD-10 tree?

There is a feeling that ICD-9 should be removed from the ICD-10 tree (there is still a separate ICD-9 tree). Please comment if you have feelings on this!

E-mail thread:

Shawn Murphy said:
Thinking [removal of combined tree] would make it less mapping dependent for this audience. Clearly it depends on the type of user we are trying to satisfy.

From the transforms point of view, it seems that repeating codes in various ontologies is causing problems.

Thanks,
Shawn.

From: "Weber, Griffin M"

My opinion on this is:

Partners created the mixed ICD10 and ICD9 ontology to make queries easier for investigators. This is useful in a stand alone i2b2 instance. However, I don't think this is appropriate for a federated network. There is no official ICD10-ICD9 mapping. So, different institutions might handle this differently. It is also challenging to map data across institutions. Mixing ontologies makes this more complicated. My suggestion would be to have an ICD10 ontology and a separate ICD9 ontology in SCILHS. If they have to be mixed, then update the ETL to exclude ICD9 codes within the ICD10 ontology. This can probably be done using the c_fullname in combination with a regular expression on the pcori_basecode field.
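
The last suggestion might look roughly like the Python sketch below. It rests on several assumptions: that ICD-10 tree paths start with \PCORI\DIAGNOSIS\10, that pcori_basecode holds the bare code without a prefix, and that only purely numeric ICD-9 codes (three digits, optionally followed by a dot and one or two digits) need to be caught. ICD-9 V and E codes are not handled by this pattern, and the function name is hypothetical.

```python
import re

# Assumed prefix of the ICD-10 diagnosis tree in c_fullname.
ICD10_TREE_PREFIX = "\\PCORI\\DIAGNOSIS\\10"

# Numeric ICD-9-CM form only, e.g. "486" or "250.01". ICD-10-CM codes always
# start with a letter, so they never match. ICD-9 V and E codes are NOT covered.
NUMERIC_ICD9 = re.compile(r"\d{3}(\.\d{1,2})?")


def is_icd9_in_icd10_tree(c_fullname: str, pcori_basecode: str) -> bool:
    """True when a row sits in the ICD-10 tree but carries a numeric ICD-9 code."""
    in_icd10_tree = c_fullname.startswith(ICD10_TREE_PREFIX)
    looks_like_icd9 = NUMERIC_ICD9.fullmatch(pcori_basecode or "") is not None
    return in_icd10_tree and looks_like_icd9


if __name__ == "__main__":
    # Hypothetical paths, shortened for readability.
    print(is_icd9_in_icd10_tree("\\PCORI\\DIAGNOSIS\\10\\example\\", "486"))
    print(is_icd9_in_icd10_tree("\\PCORI\\DIAGNOSIS\\10\\example\\", "C88.0"))
    print(is_icd9_in_icd10_tree("\\PCORI\\DIAGNOSIS\\09\\example\\", "486"))
```

An ETL step could use such a predicate to skip the interleaved ICD-9 rows while still loading the separate ICD-9 tree.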

HIV RNA tests in labs should be hidden

The folder is hidden but the leaves should be hidden as well to ensure compliance with regulatory law in MA

c_tooltip in pcornet_lab update. Connies Issue #1.

The pcornet_lab table upgrade had this code:

line 47: update pcornet_lab set c_tooltip=replace(c_fullname,'Renal Function','Electrolytes') where c_fullname like '%\CREATININE%'

Should that instead be:
update pcornet_lab set c_tooltip=replace(c_tooltip,'Renal function','Electrolytes') where c_fullname like '%\CREATININE%'

We ran the original code, but on review it didn't make much sense to replace c_tooltip with data from c_fullname, because c_tooltip then looked different from the other c_tooltip values. Also, none of the c_fullname values contained 'Renal Function', and c_tooltip currently has lower-case 'function', so the intended replacement didn't occur anyway.
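
The difference between the two statements can be reproduced with a short Python sketch. The row values are hypothetical, and Python's str.replace is always case-sensitive, whereas the case sensitivity of replace in the database depends on the engine and collation.

```python
def apply_original(row: dict) -> str:
    """Mirror of: c_tooltip = replace(c_fullname, 'Renal Function', 'Electrolytes')."""
    return row["c_fullname"].replace("Renal Function", "Electrolytes")


def apply_proposed(row: dict) -> str:
    """Mirror of: c_tooltip = replace(c_tooltip, 'Renal function', 'Electrolytes')."""
    return row["c_tooltip"].replace("Renal function", "Electrolytes")


# Hypothetical creatinine row; real paths and tooltips will differ.
ROW = {
    "c_fullname": "\\PCORI\\LAB_RESULT_CM\\EXAMPLE\\CREATININE\\",
    "c_tooltip": "Labs \\ Renal function \\ Creatinine",
}

if __name__ == "__main__":
    # Original statement: the tooltip is overwritten with the unchanged path.
    print(apply_original(ROW))
    # Proposed statement: the tooltip keeps its shape and only the label changes.
    print(apply_proposed(ROW))
```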