pssam-gan's Introduction

Description

Repository for Propensity Score Synthetic Augmentation Matching using Generative Adversarial Networks (PSSAM-GAN). The accompanying paper was published in Computer Methods and Programs in Biomedicine Update (2021); see the Citation section.

Abstract

Understanding causality is of crucial importance in biomedical sciences, where developing prediction models is insufficient because the models need to be actionable. However, data sources, such as electronic health records, are observational and often plagued with various types of biases, e.g. confounding. Although randomized controlled trials are the gold standard to estimate the causal effects of treatment interventions on health outcomes, they are not always possible. Propensity score matching (PSM) is a popular statistical technique for observational data that aims at balancing the characteristics of the population assigned either to a treatment or to a control group, making treatment assignment and outcome independent upon these characteristics. However, matching subjects can reduce the sample size. Inverse probability weighting (IPW) maintains the sample size, but extreme values can lead to instability. While PSM and IPW have been historically used in conjunction with linear regression, machine learning methods –including deep learning with propensity dropout– have been proposed to account for nonlinear treatment assignments. In this work, we propose a novel deep learning approach –the Propensity Score Synthetic Augmentation Matching using Generative Adversarial Networks (PSSAM-GAN)– that aims at keeping the sample size, without IPW, by generating synthetic matches. PSSAM-GAN can be used in conjunction with any other prediction method to estimate treatment effects. Experiments performed on both semi-synthetic (perinatal interventions) and real-world observational data (antibiotic treatments, and job interventions) show that the PSSAM-GAN approach effectively creates balanced datasets, relaxing the weighting/dropout needs for downstream methods, and providing competitive performance in effects estimation as compared to simple GAN and in conjunction with other deep counterfactual learning architectures, e.g. TARNet.
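
The two baseline techniques the abstract contrasts, PSM and IPW, can be illustrated with a small sketch. This is generic textbook code, not the PSSAM-GAN method and not code from this repository. The function names and the toy data are made up for illustration, and the propensity scores are assumed to be given rather than estimated from covariates.

```python
from typing import List, Sequence, Tuple


def ipw_weights(treatment: Sequence[int], propensity: Sequence[float]) -> List[float]:
    """Inverse probability weights: 1/e for treated units, 1/(1-e) for controls."""
    return [1.0 / e if t == 1 else 1.0 / (1.0 - e)
            for t, e in zip(treatment, propensity)]


def nearest_neighbor_match(treatment: Sequence[int],
                           propensity: Sequence[float]) -> List[Tuple[int, int]]:
    """Pair each treated unit with the control whose propensity score is closest
    (matching with replacement). Returns (treated_index, control_index) pairs."""
    treated = [i for i, t in enumerate(treatment) if t == 1]
    controls = [i for i, t in enumerate(treatment) if t == 0]
    return [(i, min(controls, key=lambda j: abs(propensity[i] - propensity[j])))
            for i in treated]


# Toy data (hypothetical): treatment flags and assumed propensity scores.
treatment = [1, 1, 0, 0, 0]
propensity = [0.9, 0.02, 0.5, 0.2, 0.05]

pairs = nearest_neighbor_match(treatment, propensity)
weights = ipw_weights(treatment, propensity)

print("matched pairs:", pairs)
print("IPW weights:", [round(w, 2) for w in weights])
```

In this toy example the control at index 3 is never matched, so the matched sample is smaller than the original one, and the treated unit with propensity 0.02 receives a weight of about 50, which shows how extreme scores can dominate a weighted estimate.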

Requirements and versions

pytorch - 1.3.1
numpy - 1.17.2
pandas - 0.25.1
scikit-learn - 0.21.3
matplotlib - 3.1.1
python - 3.7.4
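
A quick way to compare an environment against the versions listed above is sketched below. The helper names are hypothetical and not part of this repository. It assumes the packages are importable as `torch`, `numpy`, `pandas`, `sklearn`, and `matplotlib`, and it compares version strings exactly, so a build suffix in the installed version would be reported as a mismatch.

```python
import importlib
from typing import Dict, Optional, Tuple

# Pinned versions copied from the list above, keyed by import name.
# Mapping "pytorch" to torch and "scikit" to sklearn is an assumption.
PINNED = {
    "torch": "1.3.1",
    "numpy": "1.17.2",
    "pandas": "0.25.1",
    "sklearn": "0.21.3",
    "matplotlib": "3.1.1",
}


def installed_version(module_name: str) -> Optional[str]:
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(
    installed: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each module whose version differs from PINNED to (pinned, found)."""
    return {name: (pinned, installed.get(name))
            for name, pinned in PINNED.items()
            if installed.get(name) != pinned}


def report() -> None:
    """Print one line per package that is missing or differs from PINNED."""
    found = {name: installed_version(name) for name in PINNED}
    for name, (pinned, actual) in find_mismatches(found).items():
        print(f"{name}: pinned {pinned}, found {actual or 'not installed'}")
```

Calling `report()` prints nothing when every listed package matches its pinned version.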

Citation

@article{
  ghosh2021propensity,
  title={Propensity Score Synthetic Augmentation Matching using Generative Adversarial Networks (PSSAM-GAN)},
  author={Ghosh, Shantanu and Boucher, Christina and Bian, Jiang and Prosperi, Mattia},
  journal={Computer Methods and Programs in Biomedicine Update},
  pages={100020},
  year={2021},
  publisher={Elsevier}
}

Keywords

causal AI; causal inference; deep learning; biomedical informatics; generative adversarial networks; propensity score; treatment effect; electronic health record; big data

Contributors

Shantanu Ghosh

Christina Boucher

Jiang Bian

Mattia Prosperi

Dependencies

python 3.7.7

pytorch 1.3.1

How to run

To reproduce the paper's experiments on the IHDP dataset, run:
cd IHDP
python3 main_PM_GAN.py

To reproduce the paper's experiments on the Jobs dataset, run:
cd Jobs
python3 main_PM_GAN.py

By default, this runs 1000 iterations for the IHDP dataset and 10 iterations for the Jobs dataset.
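
The two commands above can also be driven from one script. The following is a minimal sketch, not a file shipped with this repository: the helper names are hypothetical, it assumes it is called with the repository root as `repo_root`, and it launches the script with the current interpreter (`sys.executable`) instead of `python3`.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple

DATASETS = ("IHDP", "Jobs")   # directory names from the commands above
SCRIPT = "main_PM_GAN.py"     # entry-point script from the commands above


def build_runs(repo_root: Path) -> List[Tuple[Path, List[str]]]:
    """Return (working_directory, command) pairs, one per dataset."""
    return [(repo_root / name, [sys.executable, SCRIPT]) for name in DATASETS]


def run_all(repo_root: Path) -> None:
    """Run each experiment from inside its own directory, stopping on failure."""
    for workdir, command in build_runs(repo_root):
        print(f"Running {' '.join(command)} in {workdir}")
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(command, cwd=workdir, check=True)
```

From the repository root, `run_all(Path("."))` runs IHDP first and then Jobs. Each script runs inside its own directory, mirroring the `cd` step in the commands above.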

Results

The default results reported in the paper are available at the following locations:

IHDP

Jobs

Hyperparameters

The codebase is set up with the default hyperparameters described in the paper. To change them, edit the following files for IHDP and Jobs respectively:

IHDP

Jobs

Output

After a run, the outputs are written to the following locations:

IHDP

Jobs

Consolidated results are written to the text files /IHDP/Details_original.txt and /Jobs/Details_original.txt.

The details of each run are available as CSV files at the following locations:

  1. IHDP - /IHDP/MSE/Results_consolidated_NN.csv

  2. Jobs - /Jobs/MSE/Results_consolidated_NN.csv
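
To summarize one of the per-run CSV files listed above, something like the following sketch may help. The column layout of `Results_consolidated_NN.csv` is not documented here, so the helper (a hypothetical name, not part of this repository) averages whichever columns parse as numbers, and the sample data uses made-up column names.

```python
import csv
import io
from statistics import mean
from typing import Dict, TextIO


def column_means(handle: TextIO) -> Dict[str, float]:
    """Average every column whose values all parse as floats; skip the rest."""
    rows = list(csv.DictReader(handle))
    means: Dict[str, float] = {}
    for column in (rows[0].keys() if rows else []):
        try:
            means[column] = mean(float(row[column]) for row in rows)
        except (TypeError, ValueError):
            continue  # non-numeric or missing values: leave this column out
    return means


# Stand-in data with made-up column names; the real file's header may differ.
sample = io.StringIO("iter,metric_a,label\n1,0.5,x\n2,1.5,y\n")
print(column_means(sample))
```

For a real run, open the file from the repository root, for example `open("IHDP/MSE/Results_consolidated_NN.csv", newline="")`, and pass the handle to `column_means`.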

Plots

The plots for each run are saved at the following locations:

  1. IHDP - /IHDP/Plots

  2. Jobs - /Jobs/Plots

License & copyright

© DISL, University of Florida

Licensed under the MIT License
