
xmeso-termextractor's Introduction

The xmeso-TermExtractor code builds a dictionary of terms for use by Ruta from i2b2's ontology. The code interfaces with the Unified Medical Language System (UMLS) API:

https://documentation.uts.nlm.nih.gov/

** Pre-requisites **

-- dbconf.py -- The code includes an example configuration file, dbconf_example.py. Copy it and rename the copy dbconf.py. In dbconf.py, set the following values:

  • Oracle connection information (username, password, etc.) to connect to your i2b2 metadata schema
  • UMLS API key, username, and password. To apply for a UMLS account, see this link:

https://documentation.uts.nlm.nih.gov/rest/authentication.html

  • outputDir the directory where the completed

-- Python pre-requisites -- The code uses the following Python modules:

  • PyQuery
  • cx_Oracle
  • urllib2 (part of the Python 2 standard library, so no separate install is needed)
  • requests

-- Running the code -- Use Python to execute the main method in TermExtractor.py. The output will contain different messages depending on the codes extracted from i2b2.

"Found new category. Loading terms for category: XXXX" indicates that the code found a new category of i2b2 terms to load.

"Could not find prefix for basecode: XXXX" is caused by i2b2 terms whose basecodes are not found in UMLS (ex: PATH|IMMUNH:UN).

"Error retrieving synonyms for concept [XXXXX]. Error: concept" is also caused by i2b2 terms whose basecodes are not found in UMLS (see above).

"Error retrieving extractSeedConcept: 'NoneType' object has no attribute 'group'" indicates that the code is attempting to process a top-level node (ex: /SURGICAL MARGINS/). These i2b2 items do not need to be searched.

The code generates several lines when it outputs the data to files (where XXXX is the "category"):

Opening files for output.
Opening file: /Users/chb69/borromeocd/nmvb/nlp_work/ruta_XXXX.txt
Closing file ... Done.
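
Among the messages listed above, the extractSeedConcept error has the form of the AttributeError that Python raises when .group() is called on the None returned by a regular expression that did not match. The sketch below reproduces that failure mode and shows a guard against it. It is only an illustration: SEED_PATTERN, extract_seed_concept, and unguarded_extract are hypothetical names, and the pattern (which requires at least two path levels) is an assumption, not the pattern used in TermExtractor.py.

```python
import re

# Hypothetical pattern (an assumption, not taken from TermExtractor.py):
# captures the last segment of a path that has at least two levels.
SEED_PATTERN = re.compile(r"^/(?:[^/]+/)+([^/]+)/$")


def unguarded_extract(path):
    """Call .group() directly; raises AttributeError if nothing matched."""
    return SEED_PATTERN.match(path).group(1)


def extract_seed_concept(path):
    """Return the last path segment, or None for a top-level node."""
    match = SEED_PATTERN.match(path)
    if match is None:
        # Top-level nodes such as /SURGICAL MARGINS/ have no second
        # level, so there is nothing to search for.
        return None
    return match.group(1)


if __name__ == "__main__":
    print(extract_seed_concept("/SURGICAL MARGINS/POSITIVE/"))
    print(extract_seed_concept("/SURGICAL MARGINS/"))
    try:
        unguarded_extract("/SURGICAL MARGINS/")
    except AttributeError as err:
        print("Error retrieving extractSeedConcept: %s" % err)
```

Since the README states that top-level nodes do not need to be searched, returning None for them (rather than raising) would be one way to keep the message out of the output.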

The code performs the following steps:

  1. Execute a SQL query to retrieve the term list from i2b2's ontology (the metadata schema).
  2. For each item in the i2b2 term list, add it to the Ruta dictionary.
  3. For each item in the i2b2 term list, determine its "category".
  4. For each item in the i2b2 term list, find all synonyms from UMLS. Add the UMLS synonyms to the Ruta dictionary.
  5. After all the items in the i2b2 term list have been processed, write the data to separate semicolon-delimited files for Ruta, creating one file for each "category".
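
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in TermExtractor.py. The function names build_dictionary and write_ruta_files are hypothetical, and the Oracle query and UMLS lookup are replaced by plain callables passed in by the caller. The row layout (a term followed by its synonyms) is an assumption. Only the ruta_XXXX.txt file name and the semicolon delimiter come from the description above.

```python
import os
import tempfile
from collections import defaultdict


def build_dictionary(terms, categorize, find_synonyms):
    """Steps 2-4: group each term and its synonyms by category.

    `categorize` and `find_synonyms` are stand-ins for the category
    logic and the UMLS lookup, which are not shown here.
    """
    dictionary = defaultdict(list)
    for term in terms:
        category = categorize(term)
        # Assumed row layout: the term first, then its synonyms.
        row = [term] + list(find_synonyms(term))
        dictionary[category].append(row)
    return dictionary


def write_ruta_files(dictionary, output_dir):
    """Step 5: write one semicolon-delimited file per category."""
    paths = []
    for category, rows in sorted(dictionary.items()):
        path = os.path.join(output_dir, "ruta_%s.txt" % category)
        with open(path, "w") as handle:
            for row in rows:
                handle.write(";".join(row) + "\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Made-up sample data standing in for the i2b2 query result.
    sample_terms = ["margin positive", "tumor size"]
    result = build_dictionary(
        sample_terms,
        categorize=lambda t: "MARGINS" if "margin" in t else "SIZE",
        find_synonyms=lambda t: ["positive margin"] if "margin" in t else [],
    )
    for written in write_ruta_files(result, tempfile.mkdtemp()):
        print("Opening file: %s" % written)
```

Passing the lookups in as callables keeps the grouping and file-writing logic testable without an Oracle connection or a UMLS account.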
