Git Product home page Git Product logo

deep-similarity-using-ehr's Introduction

Deep Similarity Using EHR

This is an attempt to reproduce the paper "A Novel Deep Similarity Learning Approach to Electronic Health Records Data"

V. Gupta, S. Sachdeva and S. Bhalla, ”A Novel Deep Similarity Learning Approach
to Electronic Health Records Data,” in IEEEAccess, vol. 8, pp. 209278-209295, 2020, 
doi:10.1109/ACCESS.2020.3037710.

Setup

All the code is in Jupyter notebooks. It is important to setup a virtal environement and jupyter kernel

1. Create a Virtual Environemnt

This project requires Python 3.7+

Setup a virtual environment using the following commands

python3 -m venv ~/.venv

2. Activate virtual environment

source ~/.venv/bin/activate pip install --upgrade pip

2. Install dependencies

All the dependencies are specified in requirements.txt file. Install dependencies using

pip install -r requirements.txt

3. Create Jupyter kernel to use with jupyter notebooks

python -m ipykernel install --name=my-kernel

4. Start jupyter notebook

jupyter notebook

Open the notebooks and choose kernel "my-kernel" to run the notebooks.

Files

Data

Raw Data

The paper uses OpenEHR data which is a benchmark data available at http://www.lampada.uerj.br/en/orbda/

Data can be imported to PostgreSQL database using the scripts provided on the webiste. For this work, nephrology data was downloaded from the database and saved to csv. A sample of saved data is available under 'raw-data' folder.

  • data-nephrology-s.csv.gz - Sample of ORBDA Nephrology data. This is just a 25% of data. Full data file is too big to store here. Complete data can be downloaded from orbda website. For this work, pre-processed data is available under "processed-data".

Preprocessed Data

  • For reusing data with different models, preprocessed data is saved under prcocessed-data folder.
    • data-interpolated.csv.gz - Pre-processed and interpolated data
    • data-normlized.csv.gz - Normalized data
    • p_x.pt.gz - Patient data with multiple visits grouped. Size (18432, 40, 42). Visits are clipped to 40 with 42 features. Uncompress this before using with the model
    • p_y.pt.gz - Target label for the patient. Visits are sorted and last label is used. Patient can have more than one label across all the visits. The dataset is specifically for chronic kidney failures. So, visits are sorted by date and picking the last one to represent the patient. Uncompress this before using with the model
    • p_y_oh.pt.gz - One hot encoded data for the target lables. Uncompress this before using with the model

Baseline models

  • baseline-ml.ipynb - Baseline ml models using the following algorithms
    • Logistic Regression
    • Naive Bayes
    • KNN
    • Decision Trees
  • baseline-cnn.ipynb - Baseline CNN model for muti-class classification
  • baseline-rnn.ipynb - Baseline RNN model for muti-class classification
  • baseline-mlp.ipynb - Baseline MLP model for muti-class classification
  • baseline-cnn-cosine-euclidean-manhattan.ipynb - Baseline cnn model for patient similarity using cosine similarity, euclidean and manhattan distance.

All the baseline CNN models are using 2d convolution

CNN Softmax

  • cnn-softmax.ipynb - Patient similarity using CNN softmax.
  • cnn-softmax-balanced-data.ipynb - Patient similarity using CNN softmax using a balanced dataset.

Results

Baseline ML Models

Method Accuracy Recall Precision F1 Score
Logsitic Regression 0.9825 0.9825 0.9653 0.9738
Naïve Bayes 0.9825 0.9825 0.9653 0.9738
KNN 0.9817 0.9817 0.9696 0.9742
Decision Trees 0.9762 0.9815 0.9717 0.9737

Baseline DNN Models

Method Accuracy Loss
RNN 0.9000 1.9711
CNN 0.9937 1.4911
MLP 0.9829 1.4788

Baseline deep similairy Models

Method Accuracy Recall Precision F1 Score
CNN Manhattan 0.9533 0.4993 0.4773 0.4881
CNN Euclidean 0.9533 0.5013 0.5305 0.4921
CNN Cosine 0.2833 0.5750 0.5174 0.2527

CNN Softmax

Method Accuracy Recall Precision F1 Score
CNN Softmax 0.9945 0.9942 0.9750 0.9844

CNN Softmax using balanced data

Method Accuracy Recall Precision F1 Score
CNN Softmax 0.9361 0.8727 0.7337 0.7830

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.