Git Product home page Git Product logo

laat's Introduction

A Label Attention Model for ICD Coding from Clinical Text Twitter

GitHub top language GitHub issues GitHub repo size GitHub last commit GitHub forks GitHub stars

This project provides the code for our JICAI 2020 A Label Attention Model for ICD Coding from Clinical Text paper.

The general architecture and experimental results can be found in our paper:

  @inproceedings{ijcai2020-461-vu,
      title     = {A Label Attention Model for ICD Coding from Clinical Text},
      author    = {Vu, Thanh and Nguyen, Dat Quoc and Nguyen, Anthony},
      booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, {IJCAI-20}},             
      pages     = {3335--3341},
      year      = {2020},
      month     = {7},
      note      = {Main track}
      doi       = {10.24963/ijcai.2020/461},
      url       = {https://doi.org/10.24963/ijcai.2020/461},
   }

Please CITE our paper when this code is used to produce published results or incorporated into other software.

Requirements

  • python>=3.6
  • torch==1.4.0
  • scikit-learn==0.23.1
  • numpy==1.16.3
  • scipy==1.2.1
  • pandas==0.24.2
  • tqdm==4.31.1
  • nltk>=3.4.5
  • psycopg2==2.7.7
  • gensim==3.6.0
  • transformers==2.11.0

Run pip install -r requirements.txt to install the required libraries

Run python3 and run import nltk and nltk.download('punkt') for tokenization

Data preparation

MIMIC-III-full and MIMIC-III-50 experiments

data/mimicdata/mimic3

  • The id files are from caml-mimic
  • Install the MIMIC-III database with PostgreSQL following this instruction
  • Generate the train/valid/test sets using src/util/mimiciii_data_processing.py. (Configure the connection to PostgreSQL at Line 139)

MIMIC-II-full experiment

data/mimicdata/mimic2

  • Place the MIMIC-II file (MIMIC_RAW_DSUMS) to data/mimicdata/mimic2
  • Generate the train/valid/test sets using src/util/mimicii_data_processing.py.

Note that: The code will generate 3 files (train.csv, valid.csv, and test.csv) for each experiment.

Pretrained word embeddings

data/embeddings

We used gensim to train the embeddings (word2vec model) using the entire MIMIC-III discharge summary data.

Our code also supports subword embeddings (fastText) which helps produce better performances (see src/args_parser.py).

How to run

The problem and associated configurations are defined in configuration/config.json. Note that there are 3 files in each data folder (train.csv, valid.csv and test.csv)

There are common hyperparameters for all the models and the model-specific hyperparameters. See src/args_parser.py for more detail

Here is an example of using the framework on MIMIC-III dataset (full codes) with hierarchical join learning

python -m src.run \
    --problem_name mimic-iii_2_full \
    --max_seq_length 4000 \
    --n_epoch 50 \
    --patience 5 \
    --batch_size 8 \
    --optimiser adamw \
    --lr 0.001 \
    --dropout 0.3 \
    --level_projection_size 128 \
    --main_metric micro_f1 \
    --embedding_mode word2vec \
    --embedding_file data/embeddings/word2vec_sg0_100.model \
    --attention_mode label \
    --d_a 512 \
    RNN  \
    --rnn_model LSTM \
    --n_layers 1 \
    --bidirectional 1 \
    --hidden_size 512 

laat's People

Contributors

tienthanhdhcn avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.