The xmeso-termextractor's intro from frdougal

The xmeso-TermExtractor code builds a dictionary of terms for use by Ruta from i2b2's ontology. The code interfaces with the Unified Medical Language System (UMLS) API

https://documentation.uts.nlm.nih.gov/

** Pre-requisites **

-- dbconf.py -- The code contains a dbconf_example.py file. This file is an example. Please copy it and rename it dbconf.py. In dbconf.py, you need to set the following values:

Oracle connection information (username, password, etc.) to connect to your i2b2 metadata schema
UMLS api key, username and password. To apply for a UMLS account, please see this link:

https://documentation.uts.nlm.nih.gov/rest/authentication.html

outputDir the directory where the completed

-- Python pre-requisites -- You must install the following python modules: PyQuery cx_Oracle urllib2 requests

-- Running the code -- Use python to execute the main method in TermExtractor.py The output will contain different messages depending on the codes extracted from i2b2.

"Found new category. Loading terms for category: XXXX" this indicates the code found a new category of i2b2 terms to load.

"Could not find prefix for basecode: XXXX" this message is caused by some i2b2 terms with basecodes not found in UMLS (ex: PATH|IMMUNH:UN)

"Error retrieving synonyms for concept [XXXXX]. Error: concept" this message is caused by i2b2 terms with basecodes not found in UMLS (see above).

"Error retrieving extractSeedConcept: 'NoneType' object has no attribute 'group'" this indicates that the code is attempting to process a top-level node (ex: /SURGICAL MARGINS/). These i2b2 items do not need to be searched.

The code generate several lines when it outputs the data to files (where XXXX is the "category"):

Opening files for output. Opening file: /Users/chb69/borromeocd/nmvb/nlp_work/ruta_XXXX.txt Closing file ... Done.

The code performs the following steps:

Execute a SQL query to retrieve the term list from i2b2's ontology (the metadata schema)
For each item in the i2b2 term list, add it to the Ruta dictionary.
For each item in the i2b2 term list, determine its "category".
For each item in the i2b2 term list, find all synonyms from UMLS. Add the UMLS synonyms to the Ruta dictionary.
After all the items in the i2b2 term list are finished, dump the data into separate semi-colon delimited files for Ruta. Create one file for each "category".

frdougal / xmeso-termextractor Goto Github PK

xmeso-termextractor's Introduction

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent