This project creates a knowledge graph version of (a subset of) the RXNORM and the FDA databases.
The graphical representation of the processing steps appears at: https://github.com/vlalcsci/RXNorm_FDA_Triples/blob/main/rxnorm_fda_kg/architecture/architecture_diagram.JPG
An annotated notebook implementing the steps below is at https://github.com/vlalcsci/RXNorm_FDA_Triples/blob/main/rxnorm_fda_kg/notebooks/generate_triples_rxnorm_fda.ipynb
RXNORM:
- Use the database creation automation scripts from the RXNORM Technical Documentation: https://www.nlm.nih.gov/research/umls/rxnorm/docs/techdoc.html#s13_0
- The RXNORM database files are available at: https://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html
- Export the resulting SQL tables to the required CSV files
FDA:
- Use the download index JSON from the OPENFDA Documentation to locate the required files: https://api.fda.gov/download.json
- Get the Required FDA files from: fda_json["results"]["drug"]["ndc"], fda_json["results"]["drug"]["label"], fda_json["results"]["drug"]["drugsfda"] & fda_json["results"]["drug"]["enforcement"]
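The lookup above can be sketched as follows. This is a minimal illustration, not the notebook's code: the function name `list_fda_files` is hypothetical, and the `"partitions"` / `"file"` key names are an assumption about the layout of download.json that should be checked against the live file.

```python
def list_fda_files(fda_json, datasets=("ndc", "label", "drugsfda", "enforcement")):
    """Collect the download URLs listed for each drug dataset."""
    drug = fda_json["results"]["drug"]
    files = {}
    for name in datasets:
        # Assumed layout: each dataset entry holds a "partitions" list,
        # and each partition holds a "file" URL.
        partitions = drug[name].get("partitions", [])
        files[name] = [part["file"] for part in partitions]
    return files


# Hand-written stand-in for the parsed download.json (URLs are placeholders).
sample_index = {
    "results": {
        "drug": {
            "ndc": {"partitions": [{"file": "https://example.org/ndc-0001.json.zip"}]},
            "label": {"partitions": [
                {"file": "https://example.org/label-0001.json.zip"},
                {"file": "https://example.org/label-0002.json.zip"},
            ]},
            "drugsfda": {"partitions": [{"file": "https://example.org/drugsfda-0001.json.zip"}]},
            "enforcement": {"partitions": [{"file": "https://example.org/enforcement-0001.json.zip"}]},
        }
    }
}

fda_files = list_fda_files(sample_index)
for name, urls in fda_files.items():
    print(name, len(urls))
```

With the real index, `fda_json` would be the parsed content of https://api.fda.gov/download.json; a source split into several partitions (such as the label data) yields several URLs.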
The csv files for RXNORM and json files for FDA used in this project are at: https://drive.google.com/drive/folders/1oB-_gNqc29ZplYAQ5MlWyCRX2hXmR5y4
RXNCONSO table provides the following information:
- RXNorm General Information: rxcui (RXNormID), name (label), tty (description), language (lat) and suppress (Y/N)
- RXNorm Synonym Information: synonym (alias)
- RXNorm Related Identifier Information: such as MSH, DRUGBANK and SNOMEDCT etc.
RXNCONSO Headers for Reference:
RXCUI,LAT,TS,LUI,STT,SUI,ISPREF,RXAUI,SAUI,SCUI,SDUI,SAB,TTY,CODE,STR,SRL,SUPPRESS,CVF
RXNorm General Information:
Example for Entresto sample record:
1656341,ENG,,,,,,7249807,7249807.0,1656341,,RXNORM,BN,1656341,Entresto,,N,4096.0
Subject: RXCUI (1656341)
Predicates & Object: [Read as Predicate: Source Header (Object Value)]
- label: STR (Entresto)
- description: TTY (BN)
- language: LAT (ENG)
- suppress: SUPPRESS (N)
- rxcui: CODE (1656341)
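As a rough sketch of this mapping, the sample record can be turned into (subject, predicate, object) triples with the standard `csv` module. The function name `general_triples` and the tuple layout are illustrative choices, not the notebook's actual implementation.

```python
import csv
import io

RXNCONSO_HEADERS = (
    "RXCUI,LAT,TS,LUI,STT,SUI,ISPREF,RXAUI,SAUI,SCUI,SDUI,"
    "SAB,TTY,CODE,STR,SRL,SUPPRESS,CVF"
).split(",")

# Predicate name -> RXNCONSO column that supplies the object value.
GENERAL_PREDICATES = {
    "label": "STR",
    "description": "TTY",
    "language": "LAT",
    "suppress": "SUPPRESS",
    "rxcui": "CODE",
}


def general_triples(csv_text):
    """Build general-information triples for rows whose source is RXNORM."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=RXNCONSO_HEADERS)
    triples = []
    for row in reader:
        if row["SAB"] != "RXNORM":
            continue  # rows from other sources carry identifier information instead
        for predicate, column in GENERAL_PREDICATES.items():
            triples.append((row["RXCUI"], predicate, row[column]))
    return triples


record = "1656341,ENG,,,,,,7249807,7249807.0,1656341,,RXNORM,BN,1656341,Entresto,,N,4096.0"
triples = general_triples(record)
for triple in triples:
    print(triple)
```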
RXNorm Synonym Information:
Example for Entresto sample record:
3 entries: one regular record for Term Type SBD, followed by 2 synonym records, PSN (Prescribable Name) and SY (Synonym)
Regular Record:
1656346,ENG,,,,,,7249812,7249812.0,1656346,,RXNORM,SBD,1656346,sacubitril 24 MG / valsartan 26 MG Oral Tablet [Entresto],,N,4096.0
2 Synonym Records:
1656346,ENG,,,,,,7249813,7249813.0,1656346,,RXNORM,PSN,1656346,Entresto 24 MG / 26 MG Oral Tablet,,N,4096.0
1656346,ENG,,,,,,7249814,7249814.0,1656346,,RXNORM,SY,1656346,Entresto (sacubitril 24 MG / valsartan 26 MG) Oral Tablet,,N,4096.0
Subject: RXCUI (1656346)
Predicates & object:
- alias: STR (Entresto 24 MG / 26 MG Oral Tablet)
- alias: STR (Entresto (sacubitril 24 MG / valsartan 26 MG) Oral Tablet)
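A minimal sketch of the synonym extraction, assuming that synonym rows are recognised by the term types PSN and SY as in the example above. The helper name `alias_triples` is hypothetical.

```python
import csv
import io

RXNCONSO_HEADERS = (
    "RXCUI,LAT,TS,LUI,STT,SUI,ISPREF,RXAUI,SAUI,SCUI,SDUI,"
    "SAB,TTY,CODE,STR,SRL,SUPPRESS,CVF"
).split(",")

SYNONYM_TTYS = {"PSN", "SY"}  # assumption: only these term types are treated as aliases


def alias_triples(csv_text):
    """Emit one alias triple per RXNORM row whose term type marks a synonym."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=RXNCONSO_HEADERS)
    return [
        (row["RXCUI"], "alias", row["STR"])
        for row in reader
        if row["SAB"] == "RXNORM" and row["TTY"] in SYNONYM_TTYS
    ]


records = "\n".join([
    "1656346,ENG,,,,,,7249812,7249812.0,1656346,,RXNORM,SBD,1656346,sacubitril 24 MG / valsartan 26 MG Oral Tablet [Entresto],,N,4096.0",
    "1656346,ENG,,,,,,7249813,7249813.0,1656346,,RXNORM,PSN,1656346,Entresto 24 MG / 26 MG Oral Tablet,,N,4096.0",
    "1656346,ENG,,,,,,7249814,7249814.0,1656346,,RXNORM,SY,1656346,Entresto (sacubitril 24 MG / valsartan 26 MG) Oral Tablet,,N,4096.0",
])
aliases = alias_triples(records)
for triple in aliases:
    print(triple)
```

The SBD row is skipped because it is the regular record; only the two synonym rows produce alias triples.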
RXNorm Identifier Information:
Example for Entresto sample record:
2 entries- One for MMSL and other for MSH:
1656341,ENG,,,,,,7255921,,,,MMSL,BN,234762,Entresto,,N,
1656341,ENG,,,,,,8138471,,M000614616,C549068,MSH,PCE,C549068,entresto,,N,
Subject: RXCUI (1656341)
Predicates & Object:
- MMSL: CODE (234762)
- MSH: CODE (C549068)
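The identifier mapping can be sketched the same way: for rows contributed by a source other than RXNORM, the source abbreviation (SAB) becomes the predicate and CODE becomes the object. The function name `identifier_triples` is illustrative, and filtering on `SAB != "RXNORM"` is an assumption about how identifier rows are selected.

```python
import csv
import io

RXNCONSO_HEADERS = (
    "RXCUI,LAT,TS,LUI,STT,SUI,ISPREF,RXAUI,SAUI,SCUI,SDUI,"
    "SAB,TTY,CODE,STR,SRL,SUPPRESS,CVF"
).split(",")


def identifier_triples(csv_text):
    """Map each non-RXNORM row to (RXCUI, source abbreviation, source code)."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=RXNCONSO_HEADERS)
    return [
        (row["RXCUI"], row["SAB"], row["CODE"])
        for row in reader
        if row["SAB"] != "RXNORM" and row["CODE"]
    ]


records = "\n".join([
    "1656341,ENG,,,,,,7255921,,,,MMSL,BN,234762,Entresto,,N,",
    "1656341,ENG,,,,,,8138471,,M000614616,C549068,MSH,PCE,C549068,entresto,,N,",
])
identifiers = identifier_triples(records)
for triple in identifiers:
    print(triple)
```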
RXNREL table provides the following information:
- RXNorm Relation Information: has_tradename, ingredient_of etc.
RXNREL Headers for Reference:
RXCUI1,RXAUI1,STYPE1,REL,RXCUI2,RXAUI2,STYPE2,RELA,RUI,SRUI,SAB,SL,RG,DIR,SUPPRESS,CVF
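Note: Each relationship is read from RXCUI2 to RXCUI1, i.e. RXCUI2 has the relationship RELA to RXCUI1 (for example, 1656341 is ingredient_of 1656355)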
RXNorm Relationship Information:
Example for Entresto sample record:
1656355.0,,CUI,RO,1656341.0,,CUI,ingredient_of,86154613.0,,RXNORM,,,,,4096.0
1656346.0,,CUI,RO,1656341.0,,CUI,ingredient_of,86154533.0,,RXNORM,,,,,4096.0
1656328.0,,CUI,RN,1656341.0,,CUI,tradename_of,86154502.0,,RXNORM,,,,,4096.0
Subject: RXCUI2 (1656341)
Predicates & Object: [Read as Predicate: Source Header (Object Value)]
- ingredient_of: RXCUI1 (1656355)
- ingredient_of: RXCUI1 (1656346)
- tradename_of: RXCUI1 (1656328)
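A small sketch of how these relationship triples could be derived, taking the subject from RXCUI2, the predicate from RELA and the object from RXCUI1. The sample values are stored as floats (`1656355.0`), so the helper `clean_id` (a hypothetical name) normalises them to plain integer strings.

```python
import csv
import io

RXNREL_HEADERS = (
    "RXCUI1,RXAUI1,STYPE1,REL,RXCUI2,RXAUI2,STYPE2,RELA,"
    "RUI,SRUI,SAB,SL,RG,DIR,SUPPRESS,CVF"
).split(",")


def clean_id(value):
    """Turn a float-formatted identifier such as '1656355.0' into '1656355'."""
    return str(int(float(value)))


def relation_triples(csv_text):
    """Read each row as: RXCUI2 --RELA--> RXCUI1."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=RXNREL_HEADERS)
    triples = []
    for row in reader:
        if not row["RELA"] or not row["RXCUI1"] or not row["RXCUI2"]:
            continue  # skip rows without a named relationship or a concept on either side
        triples.append((clean_id(row["RXCUI2"]), row["RELA"], clean_id(row["RXCUI1"])))
    return triples


records = "\n".join([
    "1656355.0,,CUI,RO,1656341.0,,CUI,ingredient_of,86154613.0,,RXNORM,,,,,4096.0",
    "1656346.0,,CUI,RO,1656341.0,,CUI,ingredient_of,86154533.0,,RXNORM,,,,,4096.0",
    "1656328.0,,CUI,RN,1656341.0,,CUI,tradename_of,86154502.0,,RXNORM,,,,,4096.0",
])
relations = relation_triples(records)
for triple in relations:
    print(triple)
```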
RXNSAT table provides the following information:
- RXNorm Strength Information: RXN_STRENGTH, RXN_AVAILABLE_STRENGTH
- NDC Code Information: NDC11 Code, NDC 2 Segment, NDC 3 Segment, SPL_SET_ID, DrugsFDA Application Number
- UMLS Code Information: UMLSCUI, UMLSAUI
Note: NDC Code Information Identifiers provide the Link to FDA
RXNSAT Headers for Reference:
RXCUI,LUI,SUI,RXAUI,STYPE,CODE,ATUI,SATUI,ATN,SAB,ATV,SUPPRESS,CVF
RXNorm Strength Information:
Example for Entresto sample record:
1656340,,,7249806,AUI,1656340,,,RXN_AVAILABLE_STRENGTH,RXNORM,24 MG / 26 MG,N,4096.0
Subject: RXCUI (1656340)
Predicates & Object: [Read as Predicate: Source Header (Object Value)]
- RXN_AVAILABLE_STRENGTH: ATV (24 MG / 26 MG)
NDC Code Information:
Example for NDC Code:
1305100,,,12332251,AUI,1305100,,,NDC,RXNORM,75142000109,N,4096.0
1305100,,,12332251,AUI,1305100,,,NDC,RXNORM,52687000201,N,4096.0
1305100,,,12387798,AUI,50563-195,,,NDC,MTHSPL,50563-195-08,N,4096.0
1305100,,,12374233,AUI,75556-001,,,NDC,MTHSPL,75556-001-05,N,4096.0
Subject: RXCUI (1305100)
Predicates & object:
- NDC11: ATV (75142000109)
- NDC11: ATV (52687000201)
- NDC 3 Segment: ATV (50563-195-08)
- NDC 3 Segment: ATV (75556-001-05)
- NDC 2 Segment: CODE (50563-195)
- NDC 2 Segment: CODE (75556-001)
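One possible way to separate the three NDC forms is sketched below. It assumes, based only on the sample rows above, that RXNORM-sourced rows carry the 11-digit NDC in ATV, while MTHSPL-sourced rows carry the 3-segment NDC in ATV and the 2-segment NDC in CODE. The function name `ndc_triples` is illustrative.

```python
import csv
import io

RXNSAT_HEADERS = "RXCUI,LUI,SUI,RXAUI,STYPE,CODE,ATUI,SATUI,ATN,SAB,ATV,SUPPRESS,CVF".split(",")


def ndc_triples(csv_text):
    """Classify NDC attribute rows into NDC11, 3-segment and 2-segment triples."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=RXNSAT_HEADERS)
    triples = []
    for row in reader:
        if row["ATN"] != "NDC":
            continue
        if row["SAB"] == "RXNORM":
            triples.append((row["RXCUI"], "NDC11", row["ATV"]))
        elif row["SAB"] == "MTHSPL":
            # Assumption: ATV holds the package-level code, CODE the product-level code.
            triples.append((row["RXCUI"], "NDC 3 Segment", row["ATV"]))
            triples.append((row["RXCUI"], "NDC 2 Segment", row["CODE"]))
    return triples


records = "\n".join([
    "1305100,,,12332251,AUI,1305100,,,NDC,RXNORM,75142000109,N,4096.0",
    "1305100,,,12332251,AUI,1305100,,,NDC,RXNORM,52687000201,N,4096.0",
    "1305100,,,12387798,AUI,50563-195,,,NDC,MTHSPL,50563-195-08,N,4096.0",
    "1305100,,,12374233,AUI,75556-001,,,NDC,MTHSPL,75556-001-05,N,4096.0",
])
ndc = ndc_triples(records)
for triple in ndc:
    print(triple)
```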
Example for SPL_SET_ID Code:
1305100,,,12388790,AUI,76861-001,,,SPL_SET_ID,MTHSPL,a4f6c932-fe40-7226-e053-2a95a90a2205,N,4096.0
Subject: RXCUI (1305100)
Predicates & object:
- SPL_SET_ID: ATV (a4f6c932-fe40-7226-e053-2a95a90a2205)
Example for Application Number:
995253,,,12387879,AUI,55700-860,,,ANDA,MTHSPL,ANDA040156,N,4096.0
Subject: RXCUI (995253)
Predicates & object:
- ANDA: ATV (ANDA040156)
UMLS Code Information:
Example for Entresto sample record:
2 entries- One for UMLSCUI and other for UMLSAUI:
1656341,,,7249807,AUI,1656341,,,UMLSCUI,RXNORM,C4033616,,4096.0
1656341,,,7255921,AUI,234762,,,UMLSAUI,RXNORM,A24842892,,
Subject: RXCUI (1656341)
Predicates & Object:
- UMLSCUI: ATV (C4033616)
- UMLSAUI: ATV (A24842892)
- Identifier Source List
- RXNORM Relationship Types
- RXNorm Term Type Dictionary
- Predicates in Wikidata Dictionary
Create Functions to:
Generate Query for RXNorm Identifier
Generate Results from the Query
Note: Wikidata SPARQL Endpoint is used to get Full Coverage. To Remove API Dependency, Wikidata Dump must be used
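The two functions described above might look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the query is the simplest form that lists every item carrying a given property, and the response is expected in the standard SPARQL JSON results format. The QID in the sample response is made up for illustration and is not a real Wikidata mapping.

```python
def build_identifier_query(property_id="P3345"):
    """Return a SPARQL query listing every item that has the given property."""
    return (
        "SELECT ?item ?rxcui WHERE { "
        f"?item wdt:{property_id} ?rxcui . "
        "}"
    )


def bindings_to_dict(result_json):
    """Convert SPARQL JSON results into {identifier value: QNode}."""
    mapping = {}
    for binding in result_json["results"]["bindings"]:
        # Item URIs look like http://www.wikidata.org/entity/Q...; keep the last part.
        qnode = binding["item"]["value"].rsplit("/", 1)[-1]
        mapping[binding["rxcui"]["value"]] = qnode
    return mapping


# Hand-written response in SPARQL JSON results format (placeholder QID).
sample_response = {
    "head": {"vars": ["item", "rxcui"]},
    "results": {"bindings": [
        {
            "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q999999999"},
            "rxcui": {"type": "literal", "value": "1656341"},
        }
    ]},
}

query = build_identifier_query()
qnode_dict_inwiki = bindings_to_dict(sample_response)
print(query)
print(qnode_dict_inwiki)
```

In practice the query string would be sent to the Wikidata SPARQL endpoint with a JSON response format requested, and the parsed response passed to `bindings_to_dict`.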
- Get the results from function created
- Create Dictionary qnode_dict_inwiki
- Write the results for P3345 [RXNORMID] to RXNorm QRXNode_PNode file
- Get the RXNORMIDs from Intermediate Triples [using results from Step 1]
- Create Dictionary qnode_dict_notinwiki by checking if RXNORMID is in Wikidata or not [using results from Step 2]
- Assign QRXNode to RXNormIDs NOT in Wikidata
- Add P31 as 'Pharmaceutical Product' for this QRXNode.
- Write the results to QRXNode_PNode file
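The assignment of QRXNodes can be sketched as below. The `QRX` prefix scheme, the function name `assign_qrxnodes`, and the `instance_of_qnode` argument are all assumptions made for illustration; the real naming scheme and the QNode used for 'Pharmaceutical Product' should be taken from the notebook.

```python
def assign_qrxnodes(rxcuis, qnode_dict_inwiki, instance_of_qnode, prefix="QRX"):
    """Create local nodes for RXNormIDs missing from Wikidata, plus their P31 triples."""
    qnode_dict_notinwiki = {}
    triples = []
    for rxcui in sorted(set(rxcuis)):
        if rxcui in qnode_dict_inwiki:
            continue  # already has a Wikidata QNode
        qrxnode = f"{prefix}{rxcui}"  # hypothetical naming scheme
        qnode_dict_notinwiki[rxcui] = qrxnode
        triples.append((qrxnode, "P31", instance_of_qnode))
    return qnode_dict_notinwiki, triples


# Placeholder values: the QNode strings below are not real Wikidata identifiers.
known = {"1656341": "Q999999999"}
qnode_dict_notinwiki, p31_triples = assign_qrxnodes(
    ["1656341", "1656346", "1656341"], known, "Q_PHARMACEUTICAL_PRODUCT"
)
print(qnode_dict_notinwiki)
print(p31_triples)
```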
Uses NDC Code information from RXNORM intermediate Triples generated in Step 1C:
- NDC 2 Segment: Used to link Predicates in Drug-NDC source
- NDC 3 Segment: Used to link Predicates in Drug-NDC and Drug-Enforcement Source
- SPL_SET_ID: Used to link Predicates in Drug-NDC and Drug-Label source
- Application Number: Used to link Predicates in Drug-Drugs@FDA source
- Load the data present in JSON Format. There is 1 file for this source
- For Drug-NDC, we get information at 3 levels: NDC 2 Segment, NDC 3 Segment and SPL_SET_ID
For FDA-NDC, we get predicates for Product NDC Code [NDC 2 segment] E.g. marketing_start_date, product_type, marketing_category etc.
For FDA-NDC, we also get predicates for OpenFDA attributes which uses [SPL_SET_ID] E.g. is_original_packager, manufacturer_name, unii etc.
For FDA-NDC, we also get predicates for Packaging attributes which uses Package_NDC_Code [NDC 3 Segment] E.g. marketing_start_date, sample, description etc.
- For Active Ingredients, we also get Qualifiers for Strength
- Write the results to 3 Intermediate Triple Files: fda_triples_product_ndc, fda_triples_package_ndc and fda_triples_spl_ndc
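The three levels described above can be illustrated with a single NDC record. This is a sketch, not the notebook's code: the field names (`product_ndc`, `packaging`, `package_ndc`, `openfda`, `spl_set_id`) are assumed to match the Drug-NDC JSON, the function name `split_ndc_record` is hypothetical, and every value in the sample record is a placeholder.

```python
PRODUCT_KEYS = ("marketing_start_date", "product_type", "marketing_category")


def split_ndc_record(record):
    """Split one Drug-NDC record into product, package and SPL level triples."""
    product, package, spl = [], [], []

    product_ndc = record["product_ndc"]  # NDC 2 Segment
    for key in PRODUCT_KEYS:
        if key in record:
            product.append((product_ndc, key, record[key]))

    for pack in record.get("packaging", []):
        package_ndc = pack["package_ndc"]  # NDC 3 Segment
        for key, value in pack.items():
            if key != "package_ndc":
                package.append((package_ndc, key, value))

    openfda = record.get("openfda", {})
    for spl_set_id in openfda.get("spl_set_id", []):
        for key, values in openfda.items():
            if key == "spl_set_id":
                continue
            for value in values if isinstance(values, list) else [values]:
                spl.append((spl_set_id, key, value))

    return product, package, spl


sample_record = {
    "product_ndc": "00000-000",
    "product_type": "HUMAN PRESCRIPTION DRUG",
    "marketing_category": "NDA",
    "marketing_start_date": "20200101",
    "packaging": [
        {"package_ndc": "00000-000-00", "description": "placeholder package", "sample": False}
    ],
    "openfda": {
        "spl_set_id": ["00000000-0000-0000-0000-000000000000"],
        "manufacturer_name": ["Example Pharma"],
        "is_original_packager": [True],
    },
}

product_triples, package_triples, spl_triples = split_ndc_record(sample_record)
print(len(product_triples), len(package_triples), len(spl_triples))
```

Each returned list corresponds to one of the three intermediate files, keyed by product NDC, package NDC and SPL_SET_ID respectively.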
- Load the data present in JSON Format. There are 9 files for this source
- For Drug-Label, we get information at 1 level: SPL_SET_ID
For FDA-Labeling, we get predicates at SPL_SET_ID level E.g. package_label_principal_display_panel, pregnancy, pharmacokinetics, drug_interactions etc.
- Write the results to 1 Intermediate Triple File: fda_triples_spl_label
- Load the data present in JSON Format. There is 1 file for this source
- For Drug-Drugs@FDA, we get information at 1 level: Application Number
For FDA-Drugs@FDA, we get predicates for Application Number: Openfda related predicates, sponsor_name, products information and submissions information
- For Products Information and Submissions Information, we also get a set of Related Qualifiers
- Write the results to 1 Intermediate Triple File: fda_triples_application_drugsfda
- Load the data present in JSON Format. There is 1 file for this source
- For Drug-Enforcement, we get information at 1 level: Package-NDC Code which is extracted from the Product Description
For FDA-Enforcement, we get predicates for Package-NDC: Recall, Location, Reason for Recall, Event_id
- Write the results to 1 Intermediate Triple File: fda_triples_package_enforcement
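Extracting the Package-NDC Code from the free-text product description could be done with a regular expression along these lines. The pattern is an assumption (hyphenated 3-segment codes with 4 to 5, 3 to 4 and 1 to 2 digits), the function name is hypothetical, and the sample descriptions are invented; real descriptions may need a more tolerant pattern.

```python
import re

# Assumed shape of a hyphenated 3-segment NDC: labeler-product-package.
NDC_PATTERN = re.compile(r"\b\d{4,5}-\d{3,4}-\d{1,2}\b")


def extract_package_ndcs(product_description):
    """Return every 3-segment NDC found in a product description, in order."""
    return NDC_PATTERN.findall(product_description)


single = extract_package_ndcs("Example Tablets, 100-count bottle, NDC 00000-000-00")
multiple = extract_package_ndcs("NDC 12345-6789-0 and NDC 1234-567-89.")
none_found = extract_package_ndcs("Example Tablets, no code printed")
print(single, multiple, none_found)
```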
Uses the RXNORM Intermediate Triples File generated in Step 1
Uses the 2 dictionaries created in Step 2 and Step 3:
qnode_dict_inwiki: RXNormIDs with QNODE in Wikidata
qnode_dict_notinwiki: RXNormIDs with QRXNODE NOT in Wikidata
- Get the data in required KGTK Format
- Dump the Output in 4 files [Naming convention is as follows]:
- Subject in Wikidata, Predicate in Wikidata: QNODE_PNODE_RXNORM
- Subject in Wikidata, Predicate NOT in Wikidata: QNODE_PRXNODE_RXNORM
- Subject NOT in Wikidata, Predicate in Wikidata: QRXNODE_PNODE_RXNORM
- Subject NOT in Wikidata, Predicate NOT in Wikidata: QRXNODE_PRXNODE_RXNORM
- Create the Predicates NOT in Wikidata dictionary
- Write the results to these 4 KGTK Triples Files
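The four-way split follows directly from two membership checks, which can be sketched as below. The function name `target_file` and the `predicates_inwiki` collection are illustrative; the file name pattern is the naming convention listed above. The same routing applies to the FDA files with the suffix `FDA`.

```python
def target_file(rxcui, predicate, qnode_dict_inwiki, predicates_inwiki, source="RXNORM"):
    """Pick the output file name from subject and predicate membership in Wikidata."""
    subject_part = "QNODE" if rxcui in qnode_dict_inwiki else "QRXNODE"
    predicate_part = "PNODE" if predicate in predicates_inwiki else "PRXNODE"
    return f"{subject_part}_{predicate_part}_{source}"


# Placeholder dictionaries: the QNode value and predicate set are illustrative only.
inwiki = {"1656341": "Q999999999"}
wikidata_predicates = {"label", "alias"}

routes = [
    target_file("1656341", "label", inwiki, wikidata_predicates),
    target_file("1656341", "suppress", inwiki, wikidata_predicates),
    target_file("1656346", "alias", inwiki, wikidata_predicates),
    target_file("1656346", "suppress", inwiki, wikidata_predicates),
]
print(routes)
```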
- Segregate and Get the data in required KGTK Format for Edges and DataType using the Predicates NOT in Wikidata dictionary
- Dump the Output in 3 files [Naming convention is as follows]:
- Predicates NOT in Wikidata: PRXNODE_RXNORM [For Reference Only]
- Predicates NOT in Wikidata Edges: PRXNODE_Edges_RXNORM
- Predicates NOT in Wikidata DataType: PRXNODE_DataType_RXNORM
- Write the results to these 3 KGTK Triples Files
Uses the FDA Intermediate Triple files generated in Step 4
Uses the 2 dictionaries created in Step 2 and Step 3:
qnode_dict_inwiki: RXNormIDs with QNODE in Wikidata
qnode_dict_notinwiki: RXNormIDs with QRXNODE NOT in Wikidata
- Uses the fda_product_ndc Intermediate Triple files generated in Step 4
- Get the data in required KGTK Format
- Handle the 4 cases to Dump the Output in 4 files [Naming convention is as follows]:
- Subject in Wikidata, Predicate in Wikidata: QNODE_PNODE_FDA
- Subject in Wikidata, Predicate NOT in Wikidata: QNODE_PRXNODE_FDA
- Subject NOT in Wikidata, Predicate in Wikidata: QRXNODE_PNODE_FDA
- Subject NOT in Wikidata, Predicate NOT in Wikidata: QRXNODE_PRXNODE_FDA
- Create the Predicates NOT in Wikidata dictionary
- Write the results to these 4 KGTK Triples Files
- Uses the fda_package_ndc Intermediate Triple files generated in Step 4
- Get the data in required KGTK Format
- Handle the 4 cases to Dump the Output in 4 files [Naming convention is as follows]:
- Subject in Wikidata, Predicate in Wikidata: QNODE_PNODE_FDA
- Subject in Wikidata, Predicate NOT in Wikidata: QNODE_PRXNODE_FDA
- Subject NOT in Wikidata, Predicate in Wikidata: QRXNODE_PNODE_FDA
- Subject NOT in Wikidata, Predicate NOT in Wikidata: QRXNODE_PRXNODE_FDA
- Create the Predicates NOT in Wikidata dictionary
- Write the results to these 4 KGTK Triples Files
- Uses the fda_spl_ndc Intermediate Triple files generated in Step 4
- Get the data in required KGTK Format
- Handle the 4 cases to Dump the Output in 4 files [Naming convention is as follows]:
- Subject in Wikidata, Predicate in Wikidata: QNODE_PNODE_FDA
- Subject in Wikidata, Predicate NOT in Wikidata: QNODE_PRXNODE_FDA
- Subject NOT in Wikidata, Predicate in Wikidata: QRXNODE_PNODE_FDA
- Subject NOT in Wikidata, Predicate NOT in Wikidata: QRXNODE_PRXNODE_FDA
- Create the Predicates NOT in Wikidata dictionary
- Write the results to these 4 KGTK Triples Files
- Uses the fda_spl_labeling Intermediate Triple files generated in Step 4
- Get the data in required KGTK Format
- Handle the 4 cases to Dump the Output in 4 files [Naming convention is as follows]:
- Subject in Wikidata, Predicate in Wikidata: QNODE_PNODE_FDA
- Subject in Wikidata, Predicate NOT in Wikidata: QNODE_PRXNODE_FDA
- Subject NOT in Wikidata, Predicate in Wikidata: QRXNODE_PNODE_FDA
- Subject NOT in Wikidata, Predicate NOT in Wikidata: QRXNODE_PRXNODE_FDA
- Create the Predicates NOT in Wikidata dictionary
- Write the results to these 4 KGTK Triples Files
- Uses the fda_application_drugsfda Intermediate Triple files generated in Step 4
- Get the data in required KGTK Format
- Handle the 4 cases to Dump the Output in 4 files [Naming convention is as follows]:
- Subject in Wikidata, Predicate in Wikidata: QNODE_PNODE_FDA
- Subject in Wikidata, Predicate NOT in Wikidata: QNODE_PRXNODE_FDA
- Subject NOT in Wikidata, Predicate in Wikidata: QRXNODE_PNODE_FDA
- Subject NOT in Wikidata, Predicate NOT in Wikidata: QRXNODE_PRXNODE_FDA
- Create the Predicates NOT in Wikidata dictionary
- Write the results to these 4 KGTK Triples Files
- Uses the fda_package_enforcement Intermediate Triple files generated in Step 4
- Get the data in required KGTK Format
- Handle the 4 cases to Dump the Output in 4 files [Naming convention is as follows]:
- Subject in Wikidata, Predicate in Wikidata: QNODE_PNODE_FDA
- Subject in Wikidata, Predicate NOT in Wikidata: QNODE_PRXNODE_FDA
- Subject NOT in Wikidata, Predicate in Wikidata: QRXNODE_PNODE_FDA
- Subject NOT in Wikidata, Predicate NOT in Wikidata: QRXNODE_PRXNODE_FDA
- Create the Predicates NOT in Wikidata dictionary
- Write the results to these 4 KGTK Triples Files
- Segregate and Get the data in required KGTK Format for Edges and DataType using the Predicates NOT in Wikidata dictionary
- Dump the Output in 3 files [Naming convention is as follows]:
- Predicates NOT in Wikidata: PRXNODE_FDA [For Reference Only]
- Predicates NOT in Wikidata Edges: PRXNODE_Edges_FDA
- Predicates NOT in Wikidata DataType: PRXNODE_DataType_FDA
- Write the results to these 3 KGTK Triples Files
- Perform KGTK Compact Transformation for RXNorm KGTK triples NODES:
4 Files- QRXNode_PNode, QRXNode_PRXNode, QNode_PNode, QNode_PRXNode
- Perform KGTK Compact Transformation for FDA KGTK triples NODES:
4 Files- QRXNode_PNode, QRXNode_PRXNode, QNode_PNode, QNode_PRXNode
- Perform KGTK Compact Transformation for RXNorm KGTK triples PROPERTIES:
1 File- PRXNode
- Perform KGTK Compact Transformation for FDA KGTK triples PROPERTIES:
1 File- PRXNode
- Perform KGTK ADD-ID Transformation for RXNorm KGTK triples NODES:
4 Files- QRXNode_PNode, QRXNode_PRXNode, QNode_PNode, QNode_PRXNode
- Perform KGTK ADD-ID Transformation for FDA KGTK triples NODES:
4 Files- QRXNode_PNode, QRXNode_PRXNode, QNode_PNode, QNode_PRXNode
- Perform KGTK ADD-ID Transformation for RXNorm KGTK triples PROPERTIES:
1 File- PRXNode
- Perform KGTK ADD-ID Transformation for FDA KGTK triples PROPERTIES:
1 File- PRXNode
- Perform KGTK Validate for RXNorm KGTK triples NODES:
4 Files- QRXNode_PNode, QRXNode_PRXNode, QNode_PNode, QNode_PRXNode
- Perform KGTK Validate for FDA KGTK triples NODES:
4 Files- QRXNode_PNode, QRXNode_PRXNode, QNode_PNode, QNode_PRXNode
- Perform KGTK Validate for RXNorm KGTK triples PROPERTIES:
1 File- PRXNode
- Perform KGTK Validate for FDA KGTK triples PROPERTIES:
1 File- PRXNode
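The compact, add-id and validate sequence for one file can be sketched as a list of command lines built in Python. This only assembles the commands and does not run them. The `.tsv` file naming is hypothetical, and the `-i` / `-o` options are assumed to be the usual KGTK input and output flags; confirm them with `kgtk <command> --help` for the installed version.

```python
def kgtk_commands(stem):
    """Build the compact -> add-id -> validate command lines for one triples file."""
    raw = f"{stem}.tsv"                  # hypothetical naming of the input file
    compacted = f"{stem}_compact.tsv"
    with_ids = f"{stem}_id.tsv"
    return [
        ["kgtk", "compact", "-i", raw, "-o", compacted],
        ["kgtk", "add-id", "-i", compacted, "-o", with_ids],
        ["kgtk", "validate", "-i", with_ids],
    ]


STEMS = ["QRXNode_PNode", "QRXNode_PRXNode", "QNode_PNode", "QNode_PRXNode", "PRXNode"]

all_commands = [cmd for stem in STEMS for cmd in kgtk_commands(stem)]
for cmd in all_commands:
    print(" ".join(cmd))
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` once the file names match the actual outputs of the earlier steps.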
Uses the FDA and RXNORM KGTK Triples with IDs [created in Step 7] to generate Qualifier Edges
- Generate KGTK Triples for Qualifiers Related to Active Ingredients in Drug-NDC, Products information in Drugs@FDA and Submissions Information in Drugs@FDA
- Get the qualifiers for 2 files: QNODE_PRXNODE_QUALIFIER and QRXNODE_PRXNODE_QUALIFIER
- Write the results to these 2 files
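A qualifier edge in KGTK hangs off the ID of the statement edge it qualifies, which is why the triples with IDs are needed here. The sketch below shows that idea for an active ingredient with a strength qualifier. The function name, the edge ID `E1`, the node name and all values are placeholders, not output of the actual pipeline.

```python
def qualifier_edges(statement_edge_id, qualifiers):
    """Build qualifier edges whose node1 is the ID of the qualified statement edge."""
    return [
        {"node1": statement_edge_id, "label": label, "node2": value}
        for label, value in qualifiers.items()
    ]


# A statement edge that already received an ID in the add-id step (placeholder values).
statement = {
    "id": "E1",
    "node1": "QRX0000000",
    "label": "active_ingredients",
    "node2": "EXAMPLE INGREDIENT",
}

edges = qualifier_edges(statement["id"], {"strength": "10 mg/1"})
print(edges)
```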
- Generate KGTK Triples for Qualifiers Related to RXNORM Identifiers
- Get the qualifiers for 4 files: QNODE_PRXNODE_QUALIFIER, QRXNODE_PRXNODE_QUALIFIER, QNODE_PNODE_QUALIFIER, QRXNODE_PNODE_QUALIFIER
- Write the results to these 4 files
- Generate KGTK Triples for FDA Properties Related to Qualifiers
- Get the properties:
PRXNODE_QUALIFIER_FDA
PRXNODE_edges_QUALIFIER_FDA
PRXNODE_datatype_QUALIFIER_FDA
- Write the results to these 3 files
- Perform KGTK Compact Transformation for RXNORM Qualifiers
- Perform KGTK Compact Transformation for FDA Qualifiers
- Perform KGTK Compact Transformation for FDA Properties related to Qualifiers
- Perform KGTK ADD-ID Transformation for RXNORM Qualifiers
- Perform KGTK ADD-ID Transformation for FDA Qualifiers
- Perform KGTK ADD-ID Transformation for FDA Properties related to Qualifiers
- Perform KGTK Validate for RXNORM Qualifiers
- Perform KGTK Validate for FDA Qualifiers
- Perform KGTK Validate for FDA Properties related to Qualifiers
- Perform KGTK Concatenate Transformation for RXNorm KGTK triples- NODES edges:
4 Files- QRXNode_PNode, QRXNode_PRXNode, QNode_PNode, QNode_PRXNode
- Perform KGTK Concatenate Transformation for RXNorm KGTK triples- QUALIFIERS edges:
4 Files- QRXNode_PNode_Qualifier, QRXNode_PRXNode_Qualifier, QNode_PNode_Qualifier, QNode_PRXNode_Qualifier
- Perform KGTK Concatenate Transformation for RXNorm KGTK triples- PROPERTIES edges:
1 File- PRXNode_edges
- Perform KGTK Concatenate Transformation for RXNorm KGTK triples- PROPERTIES datatype:
1 File- PRXNode_datatype
- Perform KGTK Concatenate Transformation for FDA KGTK triples- NODES edges:
4 Files- QRXNode_PNode, QRXNode_PRXNode, QNode_PNode, QNode_PRXNode
- Perform KGTK Concatenate Transformation for FDA KGTK triples- QUALIFIERS edges:
4 Files- QRXNode_PNode_Qualifier, QRXNode_PRXNode_Qualifier, QNode_PNode_Qualifier, QNode_PRXNode_Qualifier
- Perform KGTK Concatenate Transformation for FDA KGTK triples- PROPERTIES edges:
2 Files- PRXNode_edges, PRXNode_edges_Qualifier
- Perform KGTK Concatenate Transformation for FDA KGTK triples- PROPERTIES datatype:
2 Files- PRXNode_datatype, PRXNode_datatype_Qualifier
- Perform KGTK Concatenate Transformation for RXNorm KGTK triples- ALL edges:
3 Files- NODES_EDGES, PROPERTIES_EDGES and QUALIFIERS_EDGES
- Perform KGTK Concatenate Transformation for FDA KGTK triples- ALL edges:
3 Files- NODES_EDGES, PROPERTIES_EDGES and QUALIFIERS_EDGES
- Perform KGTK Concatenate Transformation for ALL edges:
2 Files- ALLEDGES_RXNORM, ALLEDGES_FDA
- Perform KGTK Concatenate Transformation for ALL Property DataType:
2 Files- PRXNODE_DATATYPE_RXNORM, PRXNODE_DATATYPE_FDA
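The staged concatenation above can be sketched as a series of `kgtk cat` command lines built in Python. As before, this only assembles the commands: the `.tsv` extensions and the output names `ALLEDGES` and `PRXNODE_DATATYPE` are hypothetical, and the `-i` / `-o` options are assumed to be the standard KGTK input and output flags.

```python
def kgtk_cat(inputs, output):
    """Build one `kgtk cat` command that concatenates several KGTK files."""
    return ["kgtk", "cat", "-i", *inputs, "-o", output]


def final_merge_commands():
    """Last stage: merge the per-source edge files and datatype files."""
    return [
        kgtk_cat(["ALLEDGES_RXNORM.tsv", "ALLEDGES_FDA.tsv"], "ALLEDGES.tsv"),
        kgtk_cat(
            ["PRXNODE_DATATYPE_RXNORM.tsv", "PRXNODE_DATATYPE_FDA.tsv"],
            "PRXNODE_DATATYPE.tsv",
        ),
    ]


merge_commands = final_merge_commands()
for cmd in merge_commands:
    print(" ".join(cmd))
```

The earlier stages (NODES, QUALIFIERS and PROPERTIES per source) follow the same shape, with the file lists given in the steps above as `inputs`.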