DAI-Net

Dual Adaptive Interaction Network for Coordinated Drug Recommendation
This is an implementation of our model DAI-Net and the baselines in the paper.


Requirements

torch == 1.8.0+cu111
torch-geometric == 1.0.3
torch-scatter == 2.0.9
torch-sparse == 0.6.12

Package Dependency

  • First, create a conda environment with RDKit (RDKit: Open-Source Cheminformatics Software) installed.
conda create -c conda-forge -n DAI-Net  rdkit
conda activate DAI-Net

# alternatively, install RDKit into your current environment
pip install rdkit-pypi
  • Then, in the DAI-Net environment, install the following packages:
pip install scikit-learn dill dnc

Note that the torch setup may vary with your GPU hardware. In general, run the following:

pip install torch

If you are using an RTX 3090, please use the following instead, which installs the CUDA 11.1 build of torch:

python3 -m pip install --user torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
  • Finally, install any other packages as needed
pip install [xxx] # any other required package; the version can usually be left unpinned, but it must be compatible with rdkit

Here is a list of reference versions for all packages:

pandas: 1.3.0
dill: 0.3.4
torch: 1.8.0+cu111
rdkit: 2021.03.4
scikit-learn: 0.24.2
numpy: 1.21.1

Let us know about any package dependency issues. Please pay special attention to pandas: some users report that a newer version of pandas raises an error when loading files with dill.
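To compare an existing environment against the reference versions above, a small check like the following may help. It is only a sketch: the distribution names passed to `importlib.metadata` are assumptions (for example, a pip install of RDKit may register as `rdkit-pypi`), and the `+cu111` build tag on torch is ignored during comparison.

```python
from importlib import metadata

# Reference versions copied from the list above. The "+cu111" local tag on
# torch is omitted because it depends on the CUDA build that was installed.
REFERENCE_VERSIONS = {
    "pandas": "1.3.0",
    "dill": "0.3.4",
    "torch": "1.8.0",
    "rdkit": "2021.03.4",
    "scikit-learn": "0.24.2",
    "numpy": "1.21.1",
}


def compare_versions(reference):
    """Return {package: (installed_or_None, reference)} for every mismatch."""
    mismatches = {}
    for name, wanted in reference.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed under this name
        # Drop a local build tag such as "+cu111" before comparing.
        base = installed.split("+")[0] if installed else None
        if base != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


if __name__ == "__main__":
    for name, (installed, wanted) in compare_versions(REFERENCE_VERSIONS).items():
        print(f"{name}: installed={installed}, reference={wanted}")
```

The comparison is a plain string match, so a differently formatted but equivalent version string is reported as a mismatch. Treat the output as a hint rather than a verdict.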

Processing

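  • Download the MIMIC-III v1.4 dataset from PhysioNet (this requires a PhysioNet account with access to the dataset)
  wget -r -N -c -np --user [account] --ask-password https://physionet.org/files/mimiciii/1.4/
  • Go into the downloaded folder, unzip the three required files, and copy them to the ~/cs598dl4h-project/data/input/ folder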
  cd ~/physionet.org/files/mimiciii/1.4
  gzip -d PROCEDURES_ICD.csv.gz # procedure information
  gzip -d PRESCRIPTIONS.csv.gz  # prescription information
  gzip -d DIAGNOSES_ICD.csv.gz  # diagnosis information
  cp PROCEDURES_ICD.csv PRESCRIPTIONS.csv DIAGNOSES_ICD.csv ~/cs598dl4h-project/data/input/
  • Download the additional files into the ~/cs598dl4h-project/data/input/ folder
  cd ~/cs598dl4h-project/data/input/
  ./get_additional_files.sh
  • Process the data to produce the complete records_final.pkl

    cd ~/cs598dl4h-project/data
    python processing.py
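Before running processing.py, it can be useful to confirm that the three MIMIC-III files from the steps above were actually copied into the input folder. The helper below is a hypothetical convenience, not part of the repository, and it does not check the files fetched by get_additional_files.sh.

```python
from pathlib import Path

# File names taken from the copy step above.
REQUIRED_MIMIC_FILES = (
    "PROCEDURES_ICD.csv",
    "PRESCRIPTIONS.csv",
    "DIAGNOSES_ICD.csv",
)


def missing_inputs(input_dir, required=REQUIRED_MIMIC_FILES):
    """Return the required file names that are absent from input_dir."""
    input_dir = Path(input_dir).expanduser()
    return [name for name in required if not (input_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs("~/cs598dl4h-project/data/input")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All three MIMIC-III files are in place.")
```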

Project Structure

  • data/
    • processing.py: the data preprocessing file
    • input/
      • PRESCRIPTIONS.csv: the prescription file from the MIMIC-III raw dataset
      • DIAGNOSES_ICD.csv: the diagnosis file from the MIMIC-III raw dataset
      • PROCEDURES_ICD.csv: the procedure file from the MIMIC-III raw dataset
      • RXCUI2atc4.csv: an NDC-RXCUI-ATC4 mapping file, of which we only need the RXCUI to ATC4 mapping. This file is obtained from https://github.com/ycq091044/SafeDrug.
      • drug-atc.csv: a CID-ATC file, which gives the mapping from CID code to detailed ATC code (we use the prefix of the ATC code later for aggregation). This file is obtained from https://github.com/ycq091044/SafeDrug.
      • rxnorm2RXCUI.txt: rxnorm to RXCUI mapping file. This file is obtained from https://github.com/ycq091044/SafeDrug.
      • drugbank_drugs_info.csv: drug information table downloaded from drugbank here https://www.dropbox.com/s/angoirabxurjljh/drugbank_drugs_info.csv?dl=0, which is used to map drug name to drug SMILES string.
      • drug-DDI.csv: a large file containing the drug DDI information, coded by CID. The file can be downloaded from https://drive.google.com/file/d/1mnPc0O0ztz0fkv3HF-dpmBb8PLWsEoDz/view?usp=sharing
    • output/
      • atc3toSMILES.pkl: drug ID (we use ATC-3 level code to represent drug ID) to drug SMILES string dict
      • ddi_A_final.pkl: DDI adjacency matrix
      • ehr_adj_final.pkl: used in the GAMENet baseline (if two drugs appear in one set, then they are connected)
      • records_final.pkl: the final diagnosis-procedure-medication EHR records of each patient, used for the train/val/test split
      • voc_final.pkl: diagnosis/procedure/medication index-to-code dictionary
  • src/
    • SafeDrug.py: our model
    • baseline models:
      • GAMENet.py
      • DMNC.py
      • Leap.py
      • Retain.py
      • ECC.py
      • LR.py
    • setting file
      • model.py
      • util.py
      • layer.py
    • analysis file
      • Result-Analysis.ipynb
  • dependency.sh
  • requirements.txt
  • README.md
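The description of ehr_adj_final.pkl above ("if two drugs appear in one set, then they are connected") can be illustrated with a short sketch. It assumes a record layout that this README does not spell out: a list of patients, each a list of visits, each visit a list `[diagnoses, procedures, medications]` of integer indices. The function name and this layout are illustrative assumptions, not the repository's actual code.

```python
from itertools import combinations


def build_ehr_adjacency(records, num_meds):
    """Build a symmetric 0/1 co-occurrence matrix over medication indices.

    Assumed layout (hypothetical): records[patient][visit][2] is the list
    of medication indices prescribed in that visit.
    """
    adj = [[0] * num_meds for _ in range(num_meds)]
    for patient in records:
        for visit in patient:
            meds = sorted(set(visit[2]))
            # Connect every pair of distinct drugs that share a visit.
            for i, j in combinations(meds, 2):
                adj[i][j] = 1
                adj[j][i] = 1
    return adj


if __name__ == "__main__":
    toy_records = [
        [[[0], [0], [0, 2]]],  # patient 1: one visit with drugs 0 and 2
        [[[1], [0], [1]]],     # patient 2: one visit with drug 1 only
    ]
    for row in build_ehr_adjacency(toy_records, num_meds=3):
        print(row)
```

A nested list keeps the sketch dependency-free; in practice a NumPy array would be the more natural container for a matrix of this kind.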

After processing completes, we obtain the following statistics:

# patients  6350
# clinical events  15032
# diagnosis  1958
# med  112
# procedure 1430
# avg of diagnoses  10.5089143161256
# avg of medicines  11.647751463544438
# avg of procedures  3.8436668440659925
# avg of vists  2.367244094488189
# max of diagnoses  128
# max of medicines  64
# max of procedures  50
# max of visit  29
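The averages above are consistent with per-visit means, and the visit average with events per patient (15032 / 6350 ≈ 2.367). As a rough sketch, similar statistics could be recomputed from nested records, assuming each patient is a list of visits and each visit is `[diagnoses, procedures, medications]`. That layout and the function name are assumptions made for illustration.

```python
def summarize_records(records):
    """Compute dataset statistics from nested EHR records.

    Assumed layout (hypothetical): records[patient][visit] is a list
    [diagnosis_codes, procedure_codes, medication_codes].
    """
    visits = [visit for patient in records for visit in patient]
    if not visits:
        raise ValueError("records contains no visits")

    stats = {
        "patients": len(records),
        "clinical_events": len(visits),
        "avg_visits": len(visits) / len(records),
        "max_visits": max(len(patient) for patient in records),
    }
    # Position of each code list inside a visit (assumed order).
    for name, idx in (("diagnoses", 0), ("procedures", 1), ("medicines", 2)):
        sizes = [len(visit[idx]) for visit in visits]
        stats[f"unique_{name}"] = len({c for visit in visits for c in visit[idx]})
        stats[f"avg_{name}"] = sum(sizes) / len(visits)  # mean per visit
        stats[f"max_{name}"] = max(sizes)
    return stats
```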

Process Data

The processed data is in the path

\DAI-Net\data
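The processed .pkl files can be inspected with dill, which this README lists as a dependency. The loader below is a sketch under assumptions: the directory passed in and the default file names are taken from the project structure described earlier, and the fallback to the standard `pickle` module exists only so the example runs where dill is not installed.

```python
from pathlib import Path

try:
    import dill as pickle_backend  # listed as a dependency in this README
except ImportError:
    import pickle as pickle_backend  # fallback so the sketch still runs

# Default names come from the output/ listing; adjust as needed.
DEFAULT_FILES = ("records_final.pkl", "voc_final.pkl", "ddi_A_final.pkl")


def load_processed(data_dir, names=DEFAULT_FILES):
    """Load the named pickle files from data_dir into a dict keyed by name."""
    data_dir = Path(data_dir)
    loaded = {}
    for name in names:
        with open(data_dir / name, "rb") as f:
            loaded[name] = pickle_backend.load(f)
    return loaded
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.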

You can also process the data yourself with:

  • MIMIC-III
python processing.py
  • MIMIC-IV
python processing_4.py

Run the code

python DAI-Net.py

The available arguments are:

usage: DAI-Net.py [-h] [--Test] [--model_name MODEL_NAME]
               [--resume_path RESUME_PATH] [--lr LR]
               [--target_ddi TARGET_DDI] [--kp KP] [--dim DIM]

optional arguments:
  -h, --help            show this help message and exit
  --Test                test mode
  --model_name MODEL_NAME
                        model name
  --resume_path RESUME_PATH
                        resume path
  --lr LR               learning rate
  --target_ddi TARGET_DDI
                        target ddi
  --kp KP               coefficient of P signal
  --dim DIM             dimension
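For readers who want to script around these options, the usage text above corresponds roughly to an argparse parser like the one below. This is a reconstruction from the help output only: the argument types are assumptions, and no defaults are set because the help text does not show them.

```python
import argparse


def build_parser():
    """Rebuild a parser matching the usage text (types are assumed)."""
    parser = argparse.ArgumentParser(prog="DAI-Net.py")
    parser.add_argument("--Test", action="store_true", help="test mode")
    parser.add_argument("--model_name", type=str, help="model name")
    parser.add_argument("--resume_path", type=str, help="resume path")
    parser.add_argument("--lr", type=float, help="learning rate")
    parser.add_argument("--target_ddi", type=float, help="target ddi")
    parser.add_argument("--kp", type=float, help="coefficient of P signal")
    parser.add_argument("--dim", type=int, help="dimension")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

With this parser, a flag that is not given on the command line is `None` (or `False` for `--Test`), which makes it easy to tell which options the caller actually supplied.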

Credits

Our work builds on the original code at https://github.com/ycq091044/SafeDrug.

Contributors

obananas
