
ctakes-client-py's Introduction

Purpose: Extract Medical Concepts from Physician Notes

This package simplifies communication with cTAKES NLP servers, which match clinical text to UMLS concepts.

  • Clinical Text and Knowledge Extraction System (cTAKES)
  • Unified Medical Language System (UMLS)

Quickstart

Send a clinical text fragment or an entire physician note to the server:

import ctakesclient

physician_note = 'Chief Complaint: Patient c/o cough, denies fever, recent COVID test negative. Denies smoking.'
output = await ctakesclient.client.post(physician_note)

Note that ctakesclient uses an async API. If your code is not async, you can simply wrap calls in asyncio.run():

import asyncio

output = asyncio.run(ctakesclient.client.post(physician_note))

Output

This client parses responses into lists of MatchText and UmlsConcept.

Wrap the raw response in CtakesJSON(output), which provides:

  • list_match() -> List[MatchText]
  • list_concept() -> List[UmlsConcept]
  • list_sign_symptom() -> List[MatchText]
  • list_disease_disorder() -> List[MatchText]
  • list_medication() -> List[MatchText]
  • list_procedure() -> List[MatchText]
  • list_anatomical_site() -> List[MatchText]
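To illustrate how a raw response maps onto these lists, here is a minimal sketch that works directly on the response dict. It assumes, as in the cTAKES JSON sample further down this page, that the top-level keys are mention types such as SignSymptomMention, each holding a list of matches with a conceptAttributes list. The helper names mirror the methods above but are hypothetical; this is not the CtakesJSON implementation.

```python
from typing import Any, Dict, List

# Assumed response shape: {mention type: [match dict, ...]}, as in the cTAKES JSON sample on this page
Response = Dict[str, List[Dict[str, Any]]]


def list_match(response: Response, mention_type: str = "") -> List[Dict[str, Any]]:
    """All matches, or only those filed under one mention type such as 'SignSymptomMention'."""
    if mention_type:
        return list(response.get(mention_type, []))
    return [match for matches in response.values() for match in matches]


def list_concept(response: Response) -> List[Dict[str, Any]]:
    """Flatten the conceptAttributes of every match into a single list."""
    return [concept for match in list_match(response) for concept in match["conceptAttributes"]]


def list_sign_symptom(response: Response) -> List[Dict[str, Any]]:
    """Only the SignSymptomMention matches."""
    return list_match(response, "SignSymptomMention")


response: Response = {
    "SignSymptomMention": [
        {"begin": 31, "end": 42, "text": "sore throat", "polarity": 0,
         "conceptAttributes": [{"cui": "C0242429", "tui": "T184"}]},
    ],
    "AnatomicalSiteMention": [
        {"begin": 36, "end": 42, "text": "throat", "polarity": 0,
         "conceptAttributes": [{"cui": "C0230069", "tui": "T029"}]},
    ],
}
print([m["text"] for m in list_sign_symptom(response)])  # ['sore throat']
print(len(list_concept(response)))  # 2
```

Grouping by mention type first and flattening on demand keeps each list_* helper a one-line filter over the same data.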

MatchText: Physician Notes

A MatchText records the character positions (begin, end) in the physician note where one or more UmlsConcepts were found.

MatchText::= begin end text polarity UmlsConcept+
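The grammar above can be read as a small data structure. The sketch below uses hypothetical stand-in dataclasses (not ctakesclient's own MatchText and UmlsConcept classes) to show the key property: begin and end are character offsets, so slicing the note with them recovers the matched text. The offsets here are computed with str.find for illustration, not taken from a real cTAKES response; the CUI/TUI pair is the "sore throat" concept from the cTAKES JSON sample on this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UmlsConcept:  # hypothetical stand-in, not ctakesclient's class
    cui: str
    tui: str


@dataclass
class MatchText:  # hypothetical stand-in: begin end text polarity UmlsConcept+
    begin: int
    end: int
    text: str
    polarity: int
    concepts: List[UmlsConcept] = field(default_factory=list)

    def matches_note(self, note: str) -> bool:
        """True when note[begin:end] is exactly the matched text."""
        return note[self.begin:self.end] == self.text


note = "Patient has COVID-19 and a sore throat."
begin = note.find("sore throat")
match = MatchText(begin, begin + len("sore throat"), "sore throat", 0,
                  [UmlsConcept("C0242429", "T184")])
print(match.matches_note(note))  # True
```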

MatchText: Polarity

UMLS Concept

UMLS Vocabulary

UMLS Semantic Types and Groups

You can browse the list of UMLS Semantic Types at the National Library of Medicine.

ctakes-client-py's People

Contributors

mikix, comorbidity, dogversioning


ctakes-client-py's Issues

Some ICD10 codes should be added to COVID SYMPTOMS dictionary BSV

Per review with ED doctors Amy Zipursky and Alon Geva, we expanded the set of ICD10 codes to allow a fair comparison against ICD10 coding. Note that because cTAKES splits sentences on the period ".", this will not greatly increase performance, and in most cases these ICD10 codes are not present in the original text.

Feature request: helper methods for testing Span overlaps

from ctakesclient.typesystem import Span  # assumed location of the Span type (begin/end offsets)


def intersect(span1: Span, span2: Span) -> set:
    """
    :param span1: 1st text Span
    :param span2: 2nd text Span
    :return: set of CHAR positions (convertible to range or Span)
    """
    range1 = range(span1.begin, span1.end)
    range2 = range(span2.begin, span2.end)
    return set(range1).intersection(range2)


def overlaps(span1: Span, span2: Span, min_length=2, max_length=20) -> bool:
    """
    True/False text overlap exists between two spans of 'highlighted' text.

    :param span1: 1st text Span
    :param span2: 2nd text Span
    :param min_length: MIN length of comparison, default 2 chars
    :param max_length: MAX length of comparison, default 20 chars (or equals)
    :return: true/false the two spans overlap
    """
    shared = intersect(span1, span2)
    # every character of span1 is shared, i.e. span1 lies inside span2
    if len(shared) == len(range(span1.begin, span1.end)):
        return True
    # otherwise require a partial overlap between min_length and max_length chars
    return min_length <= len(shared) <= max_length

scripts/polarity-diff-report should allow user to filter by UMLS Semantic Type Mention(s)

Suggested examples, with and without filtering by one or more semantic types.
See also:
ctakesclient.typesystem.UmlsTypeMention(Enum)

./scripts/polarity-diff-report show MyNoteCohort.ndjson
./scripts/polarity-diff-report show MyNoteCohort.ndjson --mention=SignSymptomMention
./scripts/polarity-diff-report show MyNoteCohort.ndjson --mention=SignSymptomMention,DiseaseDisorderMention
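One way the --mention option could be parsed, sketched with argparse. The option name and comma-separated format come from the examples above; the local MentionType enum is a hypothetical stand-in for ctakesclient.typesystem.UmlsTypeMention, whose exact members are not shown here. Its values are cTAKES mention type names (MedicationMention and ProcedureMention are standard cTAKES types not otherwise spelled out on this page).

```python
import argparse
from enum import Enum
from typing import List, Optional


class MentionType(Enum):  # hypothetical stand-in for UmlsTypeMention
    DiseaseDisorder = "DiseaseDisorderMention"
    SignSymptom = "SignSymptomMention"
    AnatomicalSite = "AnatomicalSiteMention"
    Medication = "MedicationMention"
    Procedure = "ProcedureMention"


def parse_mentions(value: str) -> List[MentionType]:
    """Turn 'SignSymptomMention,DiseaseDisorderMention' into enum members."""
    try:
        return [MentionType(name.strip()) for name in value.split(",") if name.strip()]
    except ValueError as err:
        # argparse turns ArgumentTypeError into a clean usage error
        raise argparse.ArgumentTypeError(str(err)) from err


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="polarity-diff-report")
    sub = parser.add_subparsers(dest="command", required=True)
    show = sub.add_parser("show")
    show.add_argument("ndjson")
    # Omitting --mention means no filtering: every semantic type is reported
    show.add_argument("--mention", type=parse_mentions, default=None)
    return parser


def keep(mention_type: str, selected: Optional[List[MentionType]]) -> bool:
    """True if a match of this cTAKES mention type passes the filter."""
    return selected is None or mention_type in {m.value for m in selected}
```

Keeping None (rather than an empty list) as the "no filter" value makes the unfiltered first example above behave exactly like the current report.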

Directory to CACHE ctakes JSON results should be set by an $env variable

default: no cache.

import os

import ctakesclient
from ctakesclient.typesystem import CtakesJSON


async def cache_ctakes(physician_note: str) -> CtakesJSON:
    """
    Write through cache -- this probably belongs in cTAKES.
    @param physician_note: optionally cleaned, will call clean_text(...)
    @return: ctakes response from cache or lazy-loaded
    """
    # _target_filename, dir_folder and common.read_json/write_json currently live in cumulus-etl;
    # _target_filename uses sha256 to generate the "key" for the JSON result
    path = _target_filename(physician_note)

    if os.path.exists(path):
        return CtakesJSON(common.read_json(path))

    dir_folder(path)
    res = await ctakesclient.client.extract(physician_note)  # the client API is async
    common.write_json(path, res.as_json())
    return res
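The issue asks for the cache directory to come from an environment variable, with no caching by default. A minimal sketch of such a path helper follows. The variable name CTAKES_CACHE_DIR and the function name cache_path are hypothetical, and the sha256 keying follows the comment above rather than cumulus-etl's actual _target_filename.

```python
import hashlib
import os
from typing import Optional

CACHE_ENV_VAR = "CTAKES_CACHE_DIR"  # hypothetical variable name


def cache_path(physician_note: str) -> Optional[str]:
    """Return the cache file for a note, or None when caching is disabled (the default)."""
    cache_dir = os.environ.get(CACHE_ENV_VAR)
    if not cache_dir:
        return None
    # sha256 of the note text is the cache key, so identical notes share one JSON file
    key = hashlib.sha256(physician_note.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, f"{key}.json")
```

A caller would skip the read and write steps entirely when cache_path returns None, which gives the "default: no cache" behavior.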

duplicate concepts written to text2fhir coding statement

example: Appearance of rash consistent with erythema migrans. Given recent travel in Lyme disease endemic area, will treat with doxycycline 50mg BID x 2 weeks.

The response includes:

"medicationCodeableConcept": {
  "coding": [
    { "code": "10504007", "system": "http://snomed.info/sct" },
    { "code": "C0013090", "system": "http://terminology.hl7.org/CodeSystem/umls" },
    { "code": "3640", "system": "http://www.nlm.nih.gov/research/umls/rxnorm" },
    { "code": "C0013090", "system": "http://terminology.hl7.org/CodeSystem/umls" },
    { "code": "372478003", "system": "http://snomed.info/sct" },
    { "code": "C0013090", "system": "http://terminology.hl7.org/CodeSystem/umls" }
  ],
  "text": "doxycycline"
}

The UMLS coding C0013090 appears three times.
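A minimal sketch of one possible fix: drop repeated codings keyed on (system, code), keeping the first occurrence and the original order. The function name is hypothetical, and this is not how text2fhir currently builds the statement.

```python
from typing import Dict, List


def dedupe_codings(codings: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Remove duplicate FHIR Coding entries, keeping the first of each (system, code)."""
    seen = set()
    unique = []
    for coding in codings:
        key = (coding["system"], coding["code"])
        if key not in seen:
            seen.add(key)
            unique.append(coding)
    return unique
```

Applied to the six codings above, it keeps four: 10504007, C0013090, 3640 and 372478003. Suppressing the repeated UMLS coding where it is generated, instead of filtering afterwards, would be an alternative design.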

Allow CI to run the client/server tests too

We'll need to add a mock server or some test framework setup. A mock is my gut preference right now: it makes it easier to start hacking on this client, to focus on testing just this Python wrapper, and to create error conditions and the like.
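A minimal sketch of the mock approach using the standard library's unittest.mock.AsyncMock. Rather than patching ctakesclient internals (not shown here), it assumes a hypothetical wrapper, sign_symptom_texts, that receives the async post function as a parameter, so a test can substitute canned responses or raise errors without a running cTAKES server.

```python
import asyncio
import unittest
from typing import Awaitable, Callable, Dict, List
from unittest.mock import AsyncMock

PostFn = Callable[[str], Awaitable[Dict[str, list]]]


async def sign_symptom_texts(note: str, post: PostFn) -> List[str]:
    """Hypothetical wrapper: call the async post function and pull out symptom texts."""
    response = await post(note)
    return [match["text"] for match in response.get("SignSymptomMention", [])]


class SignSymptomTest(unittest.TestCase):
    def test_canned_response(self):
        fake_post = AsyncMock(return_value={"SignSymptomMention": [{"text": "sore throat"}]})
        texts = asyncio.run(sign_symptom_texts("sore throat", fake_post))
        self.assertEqual(texts, ["sore throat"])
        fake_post.assert_awaited_once_with("sore throat")

    def test_server_error(self):
        # Error conditions are easy to simulate with side_effect
        fake_post = AsyncMock(side_effect=ConnectionError("cTAKES unreachable"))
        with self.assertRaises(ConnectionError):
            asyncio.run(sign_symptom_texts("note", fake_post))
```

Run it with python -m unittest; no network access is needed.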

add README for Text2FHIR

Text2FHIR should have documentation for:

  • FHIR DocumentReference
  • FHIR Observation
  • FHIR Condition
  • FHIR Procedure
  • FHIR MedicationStatement

Fix integration tests

I don't know when they broke, but many of the tests are failing right now.

Reminder: to run the tests, you need a running cTAKES server, a running negation transformer (on port 8000), and a running termexists transformer (on port 8001).
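Because these tests depend on live services, a pre-flight check can report which ones are missing before the suite runs. A minimal sketch with the standard socket module: the ports come from the reminder above, while localhost and the function names are assumptions. The cTAKES server is left out because its address is not given here.

```python
import socket
from typing import Dict, List, Optional

# Ports from the reminder above; localhost is an assumption
REQUIRED_SERVICES: Dict[str, int] = {"negation transformer": 8000, "termexists transformer": 8001}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_services(host: str = "localhost", services: Optional[Dict[str, int]] = None) -> List[str]:
    """Names of required services that are not reachable."""
    services = REQUIRED_SERVICES if services is None else services
    return [name for name, port in services.items() if not port_is_open(host, port)]
```

A test module could call missing_services() once at setup and skip the integration tests with a clear message instead of failing one by one.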

add support for MedSpacy compatible endpoint

MedSpacy output:
{'IdentifiedAnnotation': [{'begin': 4, 'conceptAttributes': [{'cui': 'COVID-19', 'tui': ['COVID']}], 'end': 24, 'polarity': 1, 'text': 'patient has COVID-19'}, {'begin': 31, 'conceptAttributes': [{'cui': 'C0242429', 'tui': ['T184']}], 'end': 42, 'polarity': 1, 'text': 'sore throat'}]}

cTAKES JSON output:
{'AnatomicalSiteMention': [{'begin': 36, 'conceptAttributes': [{'code': '49928004', 'codingScheme': 'SNOMEDCT_US', 'cui': 'C0230069', 'tui': 'T029'}], 'end': 42, 'polarity': 0, 'text': 'throat', 'type': 'AnatomicalSiteMention'}], 'DiseaseDisorderMention': [{'begin': 16, 'conceptAttributes': [{'code': '840539006', 'codingScheme': 'SNOMEDCT_US', 'cui': 'C5203670', 'tui': 'T047'}], 'end': 24, 'polarity': 0, 'text': 'COVID-19', 'type': 'DiseaseDisorderMention'}], 'SignSymptomMention': [{'begin': 31, 'conceptAttributes': [{'code': '267102003', 'codingScheme': 'SNOMEDCT_US', 'cui': 'C0242429', 'tui': 'T184'}, {'code': '162397003', 'codingScheme': 'SNOMEDCT_US', 'cui': 'C0242429', 'tui': 'T184'}], 'end': 42, 'polarity': 0, 'text': 'sore throat', 'type': 'SignSymptomMention'}, {'begin': 31, 'conceptAttributes': [{'code': 'n/a', 'codingScheme': 'custom', 'cui': 'C0242429', 'tui': 'T184'}], 'end': 42, 'polarity': 0, 'text': 'sore throat', 'type': 'SignSymptomMention'}]}
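A minimal sketch of regrouping the MedSpacy shape above into cTAKES-style mention lists. The TUI-to-mention mapping is hypothetical and covers only the TUIs seen in the cTAKES sample (T184, T047, T029); anything else lands in a fallback bucket. Polarity is copied unchanged because the two samples use different values for what looks like a non-negated mention (1 in MedSpacy, 0 in cTAKES), so the correct mapping is an open question.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical mapping, built only from the TUIs in the cTAKES sample above
TUI_TO_MENTION = {
    "T184": "SignSymptomMention",
    "T047": "DiseaseDisorderMention",
    "T029": "AnatomicalSiteMention",
}
FALLBACK_MENTION = "IdentifiedAnnotation"  # hypothetical bucket for unmapped TUIs


def medspacy_to_ctakes(medspacy: Dict[str, Any]) -> Dict[str, List[dict]]:
    """Regroup MedSpacy IdentifiedAnnotation entries into cTAKES-style mention lists."""
    result: Dict[str, List[dict]] = defaultdict(list)
    for ann in medspacy.get("IdentifiedAnnotation", []):
        for attr in ann["conceptAttributes"]:
            for tui in attr["tui"]:  # MedSpacy gives a list of TUIs, cTAKES a single string
                mention = TUI_TO_MENTION.get(tui, FALLBACK_MENTION)
                result[mention].append({
                    "begin": ann["begin"],
                    "end": ann["end"],
                    "text": ann["text"],
                    "polarity": ann["polarity"],  # copied as-is, see note above
                    "type": mention,
                    "conceptAttributes": [{"cui": attr["cui"], "tui": tui}],
                })
    return dict(result)
```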

Some ICD10 codes missing precision due to sentence splitting "." between codes

Affects only consumers of this dictionary that want to know which ICD10 codes are in the set.
cTAKES sentence splitting sometimes truncates ICD10 codes such as R05.1 into R05 and .1, so only R05 was included in the COVID symptoms BSV. The full codes are being re-added to the BSV for documentation purposes.

Cough | R05.1 | Acute Cough
Loss of taste or smell (Anosmia) | R43 | Disturbances of smell and taste

This was most prominent in SOB/Dyspnea:

Dyspnea | R06.00 | Dyspnea, unspecified
Dyspnea | R06.01 | Orthopnea
Dyspnea | R06.02 | Shortness of breath
Dyspnea | R06.03 | Acute respiratory distress
Dyspnea | R06.09 | Other forms of dyspnea
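To see why the subcodes were lost, compare a naive split on "." with a pattern that keeps the ICD10 subcode attached. Both are illustrative only: the naive split stands in for the sentence-splitting behavior described above, not cTAKES's actual sentence detector, and the regex is a simplified, hypothetical ICD10-CM shape (letter, two digits, optional dot plus up to four alphanumerics) rather than the full code grammar.

```python
import re
from typing import List

# Simplified, hypothetical ICD10-CM shape: letter, two digits, optional ".xxxx" subcode
ICD10_PATTERN = re.compile(r"\b[A-Z]\d{2}(?:\.[0-9A-Z]{1,4})?\b")


def naive_sentence_split(text: str) -> List[str]:
    """Split on every period, which cuts a code like R05.1 into R05 and 1."""
    return [part.strip() for part in text.split(".") if part.strip()]


def find_icd10_codes(text: str) -> List[str]:
    """Extract ICD10-like codes with their subcodes intact."""
    return ICD10_PATTERN.findall(text)


text = "Dx: R05.1 acute cough, R06.02 shortness of breath."
print(naive_sentence_split(text))  # ['Dx: R05', '1 acute cough, R06', '02 shortness of breath']
print(find_icd10_codes(text))      # ['R05.1', 'R06.02']
```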
