
oncologywg's People

Contributors

ansi90, crabcakeworld, cukarthik, dsonnetiqvia, fdefalco, github-actions[bot], golozara, jam560, marchakov-ody, meerapatelmd, mgurley, rimusia, rtmill, sratwani, tfalcs, xj2193


oncologywg's Issues

Provide Hemonc.org with examples of concepts and concept relationship assertions with timestamped begin/end dates.

  • Example 1: Name change; the stable identifier stays the same (ICD10CM).
    • Reason: A more explicit definition of the disorder. In this case we simply keep the latest version of the name.
    • Revise from: Panic disorder without agoraphobia
    • Revise to: Panic disorder [episodic paroxysmal anxiety]
    • Reason: Typo
    • Revise from: Trichiasis without entropian
    • Revise to: Trichiasis without entropion

Before:

concept_id | concept_name | concept_code | valid_start_date | valid_end_date | invalid_reason
1 | Panic disorder without agoraphobia | F41.0 | 1/1/2007 | 12/31/2099 |
2 | Trichiasis without entropian | H02.05 | 1/1/2012 | 12/31/2099 |

After:

concept_id | concept_name | concept_code | valid_start_date | valid_end_date | invalid_reason
1 | Panic disorder [episodic paroxysmal anxiety] | F41.0 | 1/1/2007 | 12/31/2099 |
2 | Trichiasis without entropion | H02.05 | 1/1/2012 | 12/31/2099 |

  • Example 2: Concept deprecation (ICD10CM).
    • Reason: The concepts were created by mistake. The thumb has only two phalanges, so a "proximal or distal interphalangeal joint" localization applies to the other fingers but not to the thumb. In this case we set valid_end_date to the date of concept withdrawal and mark invalid_reason as "D" (deprecated).
    • Deprecation: Subluxation and dislocation of proximal interphalangeal joint of thumb
    • Deprecation: Subluxation and dislocation of distal interphalangeal joint of thumb

Before:

concept_id | concept_name | concept_code | valid_start_date | valid_end_date | invalid_reason
3 | Subluxation and dislocation of proximal interphalangeal joint of thumb | S63.13 | 1/1/2012 | 12/31/2099 |
4 | Subluxation and dislocation of distal interphalangeal joint of thumb | S63.14 | 1/1/2012 | 12/31/2099 |

After:

concept_id | concept_name | concept_code | valid_start_date | valid_end_date | invalid_reason
3 | Subluxation and dislocation of proximal interphalangeal joint of thumb | S63.13 | 1/1/2012 | 9/30/2017 | D
4 | Subluxation and dislocation of distal interphalangeal joint of thumb | S63.14 | 1/1/2012 | 9/30/2017 | D

  • Example 3: A relationship assertion between two concepts is retired.
    • Reason: A better relationship was found (the finding site of "Neoplasm of uncertain behavior of larynx" is now "Laryngeal structure" instead of "Anatomical structure"). We set valid_end_date to the date of relationship withdrawal and mark invalid_reason as "D" (deprecated).
    • Deprecation: 22274 (Neoplasm of uncertain behavior of larynx) Has finding site 4240671 (Anatomical structure)

Before:

concept_id_1 | concept_id_2 | relationship_id | valid_start_date | valid_end_date | invalid_reason
22274 (Neoplasm of uncertain behavior of larynx) | 4240671 (Anatomical structure) | Has finding site | 7/31/2011 | 12/31/2099 |

After:

concept_id_1 | concept_id_2 | relationship_id | valid_start_date | valid_end_date | invalid_reason
22274 (Neoplasm of uncertain behavior of larynx) | 4240671 (Anatomical structure) | Has finding site | 7/31/2011 | 1/30/2013 | D
22274 (Neoplasm of uncertain behavior of larynx) | 4262229 (Laryngeal structure) | Has finding site | 1/31/2013 | 12/31/2099 |
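All three examples follow the same pattern. Rows are never deleted and identifiers are never reused. A retired concept or assertion keeps its row and receives a valid_end_date plus invalid_reason "D", and a replacement assertion is added as a new row. Below is a minimal Python sketch of that pattern. It assumes rows are held in memory as hypothetical dataclasses (this is not an OHDSI library), and that a replacement starts the day after the retired assertion ends, as in Example 3.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import List, Optional

OPEN_END = date(2099, 12, 31)  # end date used in the examples for rows that are still valid


@dataclass(frozen=True)
class Concept:  # hypothetical in-memory model of a CONCEPT row
    concept_id: int
    concept_name: str
    concept_code: str
    valid_start_date: date
    valid_end_date: date = OPEN_END
    invalid_reason: Optional[str] = None


@dataclass(frozen=True)
class ConceptRelationship:  # hypothetical in-memory model of a CONCEPT_RELATIONSHIP row
    concept_id_1: int
    concept_id_2: int
    relationship_id: str
    valid_start_date: date
    valid_end_date: date = OPEN_END
    invalid_reason: Optional[str] = None


def rename_concept(c: Concept, new_name: str) -> Concept:
    """Example 1: only the name changes; concept_id and concept_code stay stable."""
    return replace(c, concept_name=new_name)


def deprecate_concept(c: Concept, withdrawn_on: date) -> Concept:
    """Example 2: keep the row, stamp the withdrawal date and mark it deprecated."""
    return replace(c, valid_end_date=withdrawn_on, invalid_reason="D")


def supersede_relationship(
    rels: List[ConceptRelationship],
    old: ConceptRelationship,
    new_concept_id_2: int,
    new_start: date,
) -> List[ConceptRelationship]:
    """Example 3: end the old assertion the day before the new one starts, then append the new one."""
    retired = replace(old, valid_end_date=new_start - timedelta(days=1), invalid_reason="D")
    added = ConceptRelationship(old.concept_id_1, new_concept_id_2, old.relationship_id, new_start)
    return [retired if r == old else r for r in rels] + [added]


# Reproduce Example 3.
old = ConceptRelationship(22274, 4240671, "Has finding site", date(2011, 7, 31))
after = supersede_relationship([old], old, 4262229, date(2013, 1, 31))
```

Here `after` holds the retired row, which ends on 2013-01-30 with invalid_reason "D", followed by the new 4262229 row, which starts on 2013-01-31.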

Add 'Registry' type concept to all domains.

concept_id | concept_name | domain_id | vocabulary_id | concept_class_id | standard_concept | concept_code | valid_start_date | valid_end_date | invalid_reason
next concept id | Tumor Registry | Type Concept | Meas Type | Meas Type | S | OMOP generated | 1970-01-01 | 2099-12-31 |

Add formal ontology properties to OROT.

We are planning to ingest the Observational Research in Oncology Toolbox (OROT) into the OMOP vocabulary tables. This would provide a bridge from the low-level codes used in observational sources (EHRs/claims databases) to higher-level Procedure/Treatment concepts. Those concepts would be placed in the episode_object_concept_id column of the EPISODE table for 'Treatment Regimen'/'Treatment Cycle' episodes. See here: https://seer.cancer.gov/oncologytoolbox/

We are asking NCI/SEER to add some formal ontology properties to OROT to facilitate the ongoing ingestion of OROT into the OMOP tables.

Formal Ontology Properties

  • Use RxNorm codes as an anchor code for each entry.
  • Use a stable identifier for classification concepts ('SEER*Rx Category', 'Major Drug Class' and 'Minor Drug Class').
  • Timestamp the retirement of classification concepts (valid_start_date, valid_end_date).
  • Distribute the OROT classification concepts in a separate file from the classification assertions.
  • Timestamp the retirement of classification assertions. For example, if a HCPCS/NDC/RxNorm code moves from 'Chemotherapy' to 'Immunotherapy' in the SEER*Rx Category, please update the valid_end_date for the existing classification assertion and add another row with the new classification assertion with a valid_start_date.

Create documentation to support the Oncology Extension Proposal.

This documentation will be the basis for a public-facing website and content for tutorials.

  • Describe the overall purpose and mission of the project.
  • Describe major entities and concepts.
  • Instructions for implementing the oncology extension.
  • Include use cases on the public-facing website.

Decide on convention whether linking between EPISODE and clinical events should happen at all levels or only at the leaf.

At the CDM and Vocabulary Development Working Group meeting on 12/4/2018, the issue was raised whether the linkages between the EPISODE table and clinical event tables should be made at all levels of an episode hierarchy or only at the leaf. A convention should be decided upon and recommended in the EPISODE_EVENT table conventions documentation. Options:

  1. Only link at the leaf level of an episode hierarchy.
    For example, if an episode hierarchy includes a 'Treatment Regimen' episode as the parent of one or more 'Treatment Cycle' episodes, only link clinical events via the EPISODE_EVENT table to the 'Treatment Cycle' episode.
  2. Link all levels of an episode hierarchy.
    For example, if an episode hierarchy includes a 'Treatment Regimen' episode as the parent of one or more 'Treatment Cycle' episodes, link clinical events via the EPISODE_EVENT table to both the 'Treatment Cycle' and 'Treatment Regimen' episodes.
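The two options are related. If the hierarchy itself is stored (for example, as parent pointers between episodes), the all-levels links of option 2 can be derived from the leaf-only links of option 1. Below is a minimal Python sketch. It assumes a hypothetical in-memory map from each episode to its parent and an acyclic hierarchy. The ids are illustrative.

```python
from typing import Dict, List, Optional, Set, Tuple

Parents = Dict[int, Optional[int]]  # episode_id -> parent episode_id (None at the top)
Link = Tuple[int, int]              # (episode_id, event_id): one EPISODE_EVENT row


def ancestors(episode_id: int, parents: Parents) -> List[int]:
    """Walk up the hierarchy; assumes it contains no cycles."""
    chain: List[int] = []
    parent = parents.get(episode_id)
    while parent is not None:
        chain.append(parent)
        parent = parents.get(parent)
    return chain


def expand_to_all_levels(leaf_links: List[Link], parents: Parents) -> Set[Link]:
    """Turn option 1 (leaf-only) links into option 2 (every level) links."""
    links = set(leaf_links)
    for episode_id, event_id in leaf_links:
        for ancestor in ancestors(episode_id, parents):
            links.add((ancestor, event_id))
    return links


# Treatment Regimen 1 with Treatment Cycles 10 and 11 (illustrative ids).
parents: Parents = {1: None, 10: 1, 11: 1}
all_links = expand_to_all_levels([(10, 100), (11, 101)], parents)
```

Here `all_links` also contains (1, 100) and (1, 101). On this reading, option 1 avoids redundant rows but requires a recursive lookup at query time. Option 2 trades extra rows for simpler queries.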

Curate NAACCR Items/Item Codes and OMOP domains.

Dmytry's ingestion script needs to be frontloaded with a curated list of NAACCR items/NAACCR item codes. The curated list should be maintained in the OncologyWG repository. The curated NAACCR list should identify the following:

  • NAACCR items/NAACCR item codes to be ingested.
  • Numeric NAACCR items.
  • Date NAACCR items.
  • The OMOP vocabulary domain of each NAACCR item/NAACCR item code.
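One possible shape for the curated list is a CSV file with one row per NAACCR item or item code. The sketch below assumes hypothetical column names (item_number, item_code, ingest, numeric, date, domain_id) and Y/N flags. The sample rows are illustrative placeholders; they are not the curated list itself.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CuratedNaaccrEntry:
    item_number: int
    item_code: Optional[str]  # None when the row describes the item as a whole
    ingest: bool              # should the ingestion script load it?
    is_numeric: bool
    is_date: bool
    domain_id: str            # target OMOP vocabulary domain


def _flag(value: str) -> bool:
    return value.strip().upper() in {"Y", "YES", "TRUE", "1"}


def load_curated_list(text: str) -> List[CuratedNaaccrEntry]:
    return [
        CuratedNaaccrEntry(
            item_number=int(row["item_number"]),
            item_code=row["item_code"].strip() or None,
            ingest=_flag(row["ingest"]),
            is_numeric=_flag(row["numeric"]),
            is_date=_flag(row["date"]),
            domain_id=row["domain_id"].strip(),
        )
        for row in csv.DictReader(io.StringIO(text))
    ]


# Illustrative rows only; the domain assignments are placeholders.
SAMPLE = """item_number,item_code,ingest,numeric,date,domain_id
752,,Y,Y,N,Measurement
1390,01,Y,N,N,Procedure/Treatment
"""
entries = load_curated_list(SAMPLE)
to_ingest = [e for e in entries if e.ingest]
```

Keeping item codes as strings preserves leading zeros such as "01", which NAACCR codes rely on.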

How do we handle different versions of NAACCR?

We need to calculate the delta between different versions of NAACCR. It should be possible to use the SEER API to compare NAACCR versions, at least at the NAACCR item level and at the non-schema-specific NAACCR item code level. If NAACCR item numbers and NAACCR item codes are stable over time, then representing the NAACCR version within OMOP will be unnecessary.
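Once each NAACCR version has been pulled, the delta reduces to set differences at the item level and the item-code level. The versions could be pulled, for example, through the SEER API resources described in the ingestion task. Below is a minimal Python sketch. It assumes each version is held as a hypothetical mapping from item number to {item code: description}. Non-categorical items simply have an empty code mapping. The sample data is a toy example, not real NAACCR content.

```python
from typing import Dict, Set, Tuple

Version = Dict[int, Dict[str, str]]  # item_number -> {item_code: description}


def _pairs(version: Version) -> Set[Tuple[int, str]]:
    return {(item, code) for item, codes in version.items() for code in codes}


def naaccr_delta(old: Version, new: Version) -> Dict[str, set]:
    old_pairs, new_pairs = _pairs(old), _pairs(new)
    return {
        "items_added": set(new) - set(old),
        "items_removed": set(old) - set(new),
        "codes_added": new_pairs - old_pairs,
        "codes_removed": old_pairs - new_pairs,
        # Same item number and code, but the description changed.
        "codes_relabelled": {
            (item, code)
            for item, code in old_pairs & new_pairs
            if old[item][code] != new[item][code]
        },
    }


# Toy data, not real NAACCR content.
v1 = {752: {}, 1390: {"00": "A", "01": "B"}}
v2 = {752: {}, 1390: {"00": "A", "01": "B (revised)", "02": "C"}, 9000: {}}
delta = naaccr_delta(v1, v2)
```

Removed codes and relabelled codes are the cases that would undermine the stability assumption above. Newly added codes leave existing identifiers untouched.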

Map NAACCR modifiers to a standardized vocabulary, possibly the Nebraska Lexicon.

The idea is that, in the end, NAACCR will be a 'source' vocabulary and the Nebraska Lexicon will be the 'standard' vocabulary for condition/disease episodes. This task will serve as a placeholder for mapping all anatomic sites in NAACCR to all anatomic sites in the Nebraska Lexicon. Each anatomic site or group of anatomic sites will have a separate task.

Decide on where to house condition/episode modifiers.

Options: a modified MEASUREMENT table, a new MODIFIER table, or something else.

  • MEASUREMENT with polymorphic key
    • Pros
      • Does not add a new redundant structure.
      • Some modifiers can be recorded independent of a diagnosis.
    • Cons
      • Nullable foreign key.
  • MODIFIER table
    • Pros
      • Constrained by a domain to distinguish which measurements are modifiers.
      • No nullable foreign key.
    • Cons
      • Cannot accommodate modifiers that are recorded independently of a diagnosis.
      • Adds a new redundant structure.
  • Other option?

Decide on whether to create a new Treatment domain, reuse the Procedure and Drug domains, or create a new Procedure/Treatment domain.

Recommend creating a new 'Procedure/Treatment' domain. The 'Procedure' domain currently contains concepts that are more akin to a Treatment than a Procedure. These concepts most properly belong to a Treatment domain. However, we don't want to force users who just want to stick with Procedures to create Treatment Episodes.

Ingest concepts into the 'Procedure/Treatment' domain based on NAACCR treatment variables. Reuse standardized SNOMED concepts when possible. Will require moving some concepts from the 'Procedure' domain to the new 'Procedure/Treatment' domain.

CONCEPT

concept_id | concept_name | domain_id | vocabulary_id | concept_class_id | standard_concept | concept_code | valid_start_date | valid_end_date | invalid_reason
4273629 | Chemotherapy | Procedure/Treatment | SNOMED | Procedure | S | 367336001 | 1969-12-31 | 2099-12-30 |
4061650 | Hormone therapy | Procedure/Treatment | SNOMED | Procedure | S | 169413002 | 1969-12-31 | 2099-12-30 |
4295112 | Immunological therapy | Procedure/Treatment | SNOMED | Procedure | S | 76334006 | 1969-12-31 | 2099-12-30 |
4039581 | Combination therapy | Procedure/Treatment | SNOMED | Procedure | S | 229554006 | 1969-12-31 | 2099-12-30 |
? | External beam, NOS | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-01 | 1969-12-31 | 2099-12-30 |
? | External beam, photons | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-02 | 1969-12-31 | 2099-12-30 |
? | External beam, protons | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-03 | 1969-12-31 | 2099-12-30 |
? | External beam, electrons | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-04 | 1969-12-31 | 2099-12-30 |
? | External beam, neutrons | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-05 | 1969-12-31 | 2099-12-30 |
? | External beam, carbon ions | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-06 | 1969-12-31 | 2099-12-30 |
? | Brachytherapy, NOS | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-07 | 1969-12-31 | 2099-12-30 |
? | Brachytherapy, intracavitary, LDR | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-08 | 1969-12-31 | 2099-12-30 |
? | Brachytherapy, intracavitary, HDR | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-09 | 1969-12-31 | 2099-12-30 |
? | Brachytherapy, Interstitial, LDR | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-10 | 1969-12-31 | 2099-12-30 |
? | Brachytherapy, Interstitial, HDR | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-11 | 1969-12-31 | 2099-12-30 |
? | Brachytherapy, electronic | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-12 | 1969-12-31 | 2099-12-30 |
? | Radioisotopes, NOS | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-13 | 1969-12-31 | 2099-12-30 |
? | Radioisotopes, Radium-232 | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-14 | 1969-12-31 | 2099-12-30 |
? | Radioisotopes, Strontium-89 | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-15 | 1969-12-31 | 2099-12-30 |
? | Radioisotopes, Strontium-90 | Procedure/Treatment | NAACCR | NAACCR Code | S | 1506-16 | 1969-12-31 | 2099-12-30 |

How should we handle 'Numeric' NAACCR items that define numeric ranges?

For example, NAACCR item #752 'Tumor Size Clinical' has the following list of NAACCR item codes:

  000      No mass/tumor found
  001      1 mm or described as less than 1 mm
  002-988  Exact size in millimeters (2 mm to 988 mm)
  989      989 millimeters or larger
  990      Microscopic focus or foci only and no size of focus is given
  999      Unknown
           Size not stated
           Not documented in patient record
           Size of tumor cannot be assessed
           Not applicable

Rimma's proposed solution: the NAACCR item should have no list of possible values in the 'Meas Value' domain. Its value should be recorded in MEASUREMENT.value_as_number exactly as it appears in the source.
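Under this proposal, the MEASUREMENT row carries the NAACCR code itself. The special codes therefore keep their meaning only if downstream analyses interpret them. Below is a minimal Python sketch of that split, using the #752 code list above. The helper names are hypothetical.

```python
from typing import Optional

# Codes for NAACCR #752 'Tumor Size Clinical' that are not exact sizes (from the list above).
NON_EXACT_CODES = {
    0: "No mass/tumor found",
    1: "1 mm or described as less than 1 mm",
    989: "989 millimeters or larger",
    990: "Microscopic focus or foci only and no size of focus is given",
    999: "Unknown",
}


def to_value_as_number(source_code: str) -> float:
    """Store the code as it appears in the source, e.g. '023' -> 23.0."""
    return float(int(source_code))


def exact_size_mm(value_as_number: float) -> Optional[int]:
    """Analysis-time helper: only codes 002-988 are exact sizes in millimeters."""
    code = int(value_as_number)
    return code if 2 <= code <= 988 else None


def describe(value_as_number: float) -> str:
    size = exact_size_mm(value_as_number)
    if size is not None:
        return f"{size} mm"
    return NON_EXACT_CODES.get(int(value_as_number), "Invalid code")
```

The trade-off is that a naive aggregate over value_as_number, such as a mean tumor size, would treat 999 (Unknown) as 999 mm unless the special codes are filtered out first.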

How should we handle NAACCR provenance concepts?

There are two types of provenance concepts: one describing diagnostic source (e.g. pathologist, clinician) and another describing record source (e.g. pathology report, EHR). Which one should we use? Or should we simply import these as separate Measurement entries?

Decide on naming of polymorphic columns in EPISODE_EVENT table.

In the most current version of the Oncology/Episode of Care Combined Proposal #23, the polymorphic pair of columns that allow the EPISODE_EVENT table to reference any clinical event table are named 'event_id' and 'event_table_concept_id'. This diverges from the current OMOP naming convention for polymorphic column pairs:

  • NOTE.note_event_id, NOTE.note_event_field_concept_id
  • COST.cost_event_id, COST.cost_event_field_concept_id

Options:

  • EPISODE_EVENT.event_id, EPISODE_EVENT.event_table_concept_id
  • EPISODE_EVENT.event_id, EPISODE_EVENT.episode_event_field_concept_id
  • EPISODE_EVENT.episode_event_id, EPISODE_EVENT.episode_event_field_concept_id
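Whatever names are chosen, the pair works the same way as in NOTE and COST. The id column holds a primary key value, and the concept column says which table and field that value belongs to. Below is a minimal Python sketch. It assumes the naming of the second option and uses hypothetical placeholder concept_ids for the field concepts; real ids would come from the OMOP vocabulary.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical placeholder ids for field concepts such as "drug_exposure.drug_exposure_id".
FIELD_CONCEPTS: Dict[int, Tuple[str, str]] = {
    9000001: ("procedure_occurrence", "procedure_occurrence_id"),
    9000002: ("drug_exposure", "drug_exposure_id"),
    9000003: ("measurement", "measurement_id"),
}


@dataclass(frozen=True)
class EpisodeEvent:
    episode_id: int
    event_id: int
    episode_event_field_concept_id: int


def event_query(row: EpisodeEvent) -> str:
    """Build the lookup for the linked clinical event.

    Table and column names come only from the whitelist above and event_id is
    forced to int, so no free-form text reaches the SQL string.
    """
    table, key = FIELD_CONCEPTS[row.episode_event_field_concept_id]
    return f"SELECT * FROM {table} WHERE {key} = {int(row.event_id)}"
```

Because the concept identifies a field (table plus column) rather than only a table, the "_field_concept_id" suffix used by NOTE and COST describes it more precisely than "table_concept_id".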

Add concepts, domains and concept classes for EPISODE table.

DOMAIN

domain_id | domain_name | domain_concept_id
Episode | Episode | ?

VOCABULARY

vocabulary_id | vocabulary_name | vocabulary_reference | vocabulary_version | vocabulary_concept_id
Episode | OMOP Episode | OMOP generated | | ?

CONCEPT_CLASS

concept_class_id | concept_class_name | concept_class_concept_id
Disease Episode | Disease Episode | ?
Treatment Episode | Treatment Episode | ?
Episode of Care | Episode of Care | ?

CONCEPT

concept_id | concept_name | domain_id | vocabulary_id | concept_class_id | standard_concept | concept_code | valid_start_date | valid_end_date | invalid_reason
? | Episode | Metadata | Domain | Domain | | OMOP generated | 1970-01-01 | 2099-12-31 |
? | Disease Episode | Metadata | Concept Class | Concept Class | | OMOP generated | 1970-01-01 | 2099-12-31 |
? | Treatment Episode | Metadata | Concept Class | Concept Class | | OMOP generated | 1970-01-01 | 2099-12-31 |
? | Episode of Care | Metadata | Concept Class | Concept Class | | OMOP generated | 1970-01-01 | 2099-12-31 |
? | Disease First Occurrence | Episode | Episode | Disease Episode | S | OMOP generated | 1970-01-01 | 2099-12-31 |
? | Disease Recurrence | Episode | Episode | Disease Episode | S | OMOP generated | 1970-01-01 | 2099-12-31 |
? | Disease Remission | Episode | Episode | Disease Episode | S | OMOP generated | 1970-01-01 | 2099-12-31 |
? | Treatment Regimen | Episode | Episode | Treatment Episode | S | OMOP generated | 1970-01-01 | 2099-12-31 |
? | Treatment Cycle | Episode | Episode | Treatment Episode | S | OMOP generated | 1970-01-01 | 2099-12-31 |
? | Episode of Care | Episode | Episode | Episode of Care | S | OMOP generated | 1970-01-01 | 2099-12-31 |

How should we handle versions of staging variables? For both TNM and AJCC variables?

Currently, the SEER API only provides possible values for the Union for International Cancer Control (UICC) TNM 7th edition classification: TNM categories, stage groups, and definitions are based on UICC TNM 7th edition. UICC 7th edition and AJCC 7th edition TNM categories and stage groups are very similar; however, there are some differences.

Should we try to obtain possible values for other TNM versions? This would require ETL writers to scope the lookup of their staging variables by NAACCR item #1060 'TNM Edition Number'. See here: http://datadictionary.naaccr.org/default.aspx?c=10#1060
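Scoping the lookup means the mapping key includes the edition as well as the staging item and its source value. The same value (for example 'T1') may mean different things in different editions. Below is a minimal Python sketch. It assumes NAACCR #1060 is read as a two-digit edition string. The target ids are placeholders, not real concept_ids.

```python
from typing import Dict, Optional, Tuple

# (TNM edition from NAACCR #1060, staging item, source value) -> target concept_id.
# Placeholder ids for illustration only.
StagingKey = Tuple[str, str, str]
STAGING_LOOKUP: Dict[StagingKey, int] = {
    ("07", "Clinical T", "T1"): 9100001,
    ("08", "Clinical T", "T1"): 9100002,
}


def map_staging_value(tnm_edition: str, item: str, value: str) -> Optional[int]:
    """Return None when no mapping exists for that edition, rather than guessing."""
    return STAGING_LOOKUP.get((tnm_edition.strip().zfill(2), item, value.strip()))
```

Returning None for an unmapped edition makes the gap visible. Falling back to the 7th-edition values would silently misassign records staged under other editions.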

Add Episode Type concepts.

VOCABULARY

vocabulary_id | vocabulary_name | vocabulary_reference | vocabulary_version | vocabulary_concept_id
Episode Type | OMOP Episode Type | OMOP generated | | ?

CONCEPT_CLASS

concept_class_id | concept_class_name | concept_class_concept_id
Episode Type | Episode Type | ?

CONCEPT

concept_id | concept_name | domain_id | vocabulary_id | concept_class_id | standard_concept | concept_code | valid_start_date | valid_end_date | invalid_reason
? | Episode Type | Metadata | Concept Class | Concept Class | | OMOP generated | 1970-01-01 | 2099-12-31 |
? | Episode Type | Metadata | Vocabulary | Vocabulary | | OMOP generated | 1970-01-01 | 2099-12-31 |
? | Pre-made treatment abstraction with pre-made clinical events connections. | Type Concept | Episode Type | Episode Type | S | OMOP generated | 1970-01-01 | 2099-12-31 |
? | Pre-made treatment abstraction with algorithmically derived clinical events connections. | Type Concept | Episode Type | Episode Type | S | OMOP generated | 1970-01-01 | 2099-12-31 |
? | Pre-made treatment abstraction with no clinical events connections. | Type Concept | Episode Type | Episode Type | S | OMOP generated | 1970-01-01 | 2099-12-31 |
? | Algorithmically derived treatment abstraction and clinical event connections. | Type Concept | Episode Type | Episode Type | S | OMOP generated | 1970-01-01 | 2099-12-31 |

Ingest NAACCR into the OMOP vocabulary tables.

  • Call the SEER API to ingest NAACCR data items and their categorical values into the OHDSI vocabulary tables.
    • Get an account and API key from here: https://api.seer.cancer.gov/new_account
    • Return all NAACCR item numbers/item names for the latest NAACCR version from this REST resource: https://api.seer.cancer.gov/rest/naaccr/latest. This will return an array of hashes, one for each NAACCR data item, with the following JSON object structure: { "item": 10, "name": "Record Type" }
    • For each NAACCR data item, call the REST resource (replacing 10 with each NAACCR data item number): https://api.seer.cancer.gov/rest/naaccr/latest/item/10. This will return a hash with the following structure (the “documentation” key value can be seen in the separate file ‘documentation.html’): { "item": 1390, "name": "RX Summ--Chemo", "start_col": 2243, "end_col": 2244, "alignment": "RIGHT", "padding_char": "0", "documentation": "documentation.html" }
      • Parse the HTML value in the “documentation” key. These CSS selectors will help you find the right content:
        • ‘div.content.chap10-para .code-row’ to find each categorical value for the NAACCR data item.
        • ‘td.code-nbr’ to find the code value.
        • ‘td.code-dsc’ to find the code description.
        • ‘td.code-nbr’ to find the code value
        • ‘td.code-dsc’ to find the code description.
    • Not all NAACCR data items will have their list of categorical values within the ‘documentation’ key. Some NAACCR data items are non-categorical. Other NAACCR data items are what is called ‘site-specific’ (SSDI). SSDI means the data item’s categorical values depend on the anatomical site/topography being reported. For example:
      • NAACCR data item #1290 ‘RX SUMM--SURG PRIM SITE’ tracks the site-specific codes for the type of surgery to the primary site performed as part of the first course of treatment. This includes treatment given at all facilities as part of the first course of treatment. Here is the high-level grouping of site-specific surgery codes:
        • 00 None
        • 10-19 Site-specific code; tumor destruction
        • 20-80 Site-specific codes; resection
        • 90 Surgery, NOS
        • 98 Site specific codes; special
        • 99 Unknown
    • Return all NAACCR site-specific surgery code table titles from this REST resource: https://api.seer.cancer.gov/rest/surgery/latest/tables. This will return an array of strings, each representing the title of one site-specific surgery code table. Each table title is shorthand for a group of ICD-O anatomical site codes.
    • For each NAACCR site-specific surgery code table title, call the REST resource (replacing ‘Breast’ with each table title): https://api.seer.cancer.gov/rest/surgery/latest/table?title=Breast. This will return a hash with the following structure (the “row” key value can be seen in the separate file ‘row.json’): { "title": "Breast", "site_inclusions": "C500-C509", "hist_exclusions": "9727,9733,9741-9742,9764-9809,9832,9840-9931,9945-9946,9950-9967,9975-9992", "pre_note": "C500-C509\n(Except for M-9727, 9733, 9741-9742, 9764-9809, 9832, 9840-9931, 9945-9946, 9950-9967 and 9975-9992)", "row": [ {}, {} ] }.
      • The “site_inclusions” key gives the range of ICD-O topography/anatomical sites bound to this list of surgery codes.
      • The “row” key is an array of hashes that represents the site-specific surgery code hierarchy. Some of these hashes exist only for line-break formatting and instructional text. Ignore every entry in the “row” array that has a "line_break": true key/value pair or a "code": "" key/value pair.
      • The hierarchical structure of the “row” array is based on the order of the entries. The “level” key ranges from 0 to 3.
  • Screen scrape the possible values of the NAACCR ‘EOD Data’ SSDI data items. A whole set of SSDI NAACCR data items falls into the category ‘EOD Data’. See here for a list of EOD schemas (anatomical site/histology groupings): https://staging.seer.cancer.gov/eod_public/list/1.4/. Unfortunately, the SEER API does not yet appear to support listing the possible values for these data items, so we will need to scrape them.
    • Visit https://staging.seer.cancer.gov/eod_public/list/1.4/
    • Click each “schema”.
    • Scrape the list of possible ‘Primary Site’ and ‘Histology’ associated with the “schema”.
    • For each row in the table labeled “Data Items”, if the “Metadata” column indicates “SSDI”, click on the link in the “Name” column.
    • On the page for the SSDI NAACCR data item, scrape the NAACCR data item and collect the possible values from the table with the headers “Code” and “Description”.
  • Screen scrape the possible values of the NAACCR ‘TNM Data’ SSDI data items. A whole set of SSDI NAACCR data items falls into the category ‘TNM Data’. See here for a list of TNM schemas (anatomical site/histology groupings): https://staging.seer.cancer.gov/tnm/list/1.9/. Unfortunately, the SEER API does not yet appear to support listing the possible values for these data items, so we will need to scrape them.
    • Visit https://staging.seer.cancer.gov/tnm/list/1.9/
    • Click each “schema”.
    • Scrape the list of possible ‘Primary Site’ and ‘Histology’ associated with the “schema”.
    • For each row in the table labeled “Main Data Items”, click on the link in the “Name” column.
    • On the page for the SSDI NAACCR data item, scrape the NAACCR data item and collect the possible values from the table whose headers are specific to that SSDI data item, for example ‘Clinical T’ and ‘Clinical T Display’.
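The first two API steps and the selector-based parsing can be sketched with the Python standard library alone. This is a minimal sketch, not a tested client, and it rests on three assumptions:

  • The API key is passed in an X-SEERAPI-Key header (check the SEER API documentation).
  • Each ‘.code-row’ element is a table row.
  • An html.parser-based stand-in can replace the CSS selectors. This stand-in ignores the ‘div.content.chap10-para’ scope.

The HTML in the usage line is a toy fragment, not real NAACCR documentation.

```python
import json
import urllib.request
from html.parser import HTMLParser
from typing import Iterator, List, Optional, Tuple

BASE_URL = "https://api.seer.cancer.gov/rest/naaccr/latest"


def seer_request(path: str, api_key: str) -> urllib.request.Request:
    # Assumption: the SEER API reads the key from the X-SEERAPI-Key header.
    return urllib.request.Request(BASE_URL + path, headers={"X-SEERAPI-Key": api_key})


def fetch_json(path: str, api_key: str):
    with urllib.request.urlopen(seer_request(path, api_key)) as response:
        return json.load(response)


class CodeRowParser(HTMLParser):
    """Collect (code, description) pairs from td.code-nbr / td.code-dsc cells
    inside elements with class 'code-row' (assumed to be <tr> elements)."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[Tuple[str, str]] = []
        self._in_row = False
        self._cell: Optional[str] = None  # "nbr" or "dsc" while inside a target cell
        self._code = ""
        self._desc = ""

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "code-row" in classes:
            self._in_row, self._code, self._desc = True, "", ""
        elif self._in_row and tag == "td":
            if "code-nbr" in classes:
                self._cell = "nbr"
            elif "code-dsc" in classes:
                self._cell = "dsc"

    def handle_endtag(self, tag):
        if tag == "td":
            self._cell = None
        elif tag == "tr" and self._in_row:
            self.rows.append((self._code.strip(), " ".join(self._desc.split())))
            self._in_row = False

    def handle_data(self, data):
        if self._cell == "nbr":
            self._code += data
        elif self._cell == "dsc":
            self._desc += data


def categorical_values(api_key: str) -> Iterator[Tuple[int, str, List[Tuple[str, str]]]]:
    """Yield (item number, item name, [(code, description), ...]) for every item.
    Items whose documentation has no code rows (e.g. non-categorical items) yield []."""
    for entry in fetch_json("", api_key):
        detail = fetch_json(f"/item/{entry['item']}", api_key)
        parser = CodeRowParser()
        parser.feed(detail.get("documentation") or "")
        yield entry["item"], entry["name"], parser.rows
```

Usage without network access: `p = CodeRowParser(); p.feed('<table><tr class="code-row"><td class="code-nbr">00</td><td class="code-dsc">First</td></tr></table>')` leaves `p.rows == [("00", "First")]`.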

row.json
documentation.html
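For the surgery tables, the filtering and the order-based hierarchy described above can be sketched as a single pass with a stack. This is a reading of the description rather than a confirmed rule. It assumes that each kept entry has "code" and "level" keys, and that an entry's parent is the nearest preceding kept entry with a lower level. The sample rows are toy data, not the contents of row.json.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

SURGERY_BASE = "https://api.seer.cancer.gov/rest/surgery/latest"


def surgery_table_url(title: str) -> str:
    """URL for one site-specific surgery table; titles are query-string encoded."""
    return f"{SURGERY_BASE}/table?" + urlencode({"title": title})


def surgery_hierarchy(rows: List[dict]) -> List[Tuple[str, Optional[str]]]:
    """Return (code, parent_code) pairs in table order."""
    stack: List[Tuple[int, str]] = []  # (level, code) of the current ancestor chain
    result: List[Tuple[str, Optional[str]]] = []
    for row in rows:
        if row.get("line_break") is True or not row.get("code"):
            continue  # line-break formatting or instructional text
        level, code = int(row["level"]), str(row["code"])
        while stack and stack[-1][0] >= level:
            stack.pop()  # close siblings and deeper entries
        result.append((code, stack[-1][1] if stack else None))
        stack.append((level, code))
    return result


# Toy rows shaped like the description above, not the contents of row.json.
toy_rows = [
    {"code": "00", "level": 0},
    {"line_break": True, "code": "", "level": 0},
    {"code": "20", "level": 0},
    {"code": "21", "level": 1},
    {"code": "22", "level": 1},
    {"code": "", "level": 1},
    {"code": "23", "level": 2},
]
pairs = surgery_hierarchy(toy_rows)
```

With these rows, `pairs` links 21 and 22 to 20 and links 23 to 22, while 00 and 20 are top-level codes.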
