opentox_project_classification's Introduction

VAERS | Classification Problem

1. Project Understanding

  • Objective
  • Description

2. Data Understanding

  • Import Libraries
  • Load data
  • 2.1. Data Cleaning (how to get high-quality data)
  • Missing values (imputation, masking)
  • Dealing with duplicate records, errors, inconsistencies, misspellings, outliers, etc.
  • Categorical data: high and low cardinality
  • 2.2. Exploratory Data Analysis
  • Feature Engineering (binning, encoding)
  • Statistical summaries and visualisations
  • Features correlation
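The cleaning and feature-engineering steps listed above can be sketched roughly as follows. This is a minimal illustration on a made-up toy table, not the project's actual pipeline: the column names (`AGE_YRS`, `SEX`, `VAX_MANU`, `RECOVD`), the bin edges, and the choice of median and "U" (unknown) imputation are all assumptions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a VAERS extract; column names and values are hypothetical.
df = pd.DataFrame({
    "AGE_YRS": [34.0, np.nan, 71.0, 34.0, 52.0],
    "SEX": ["F", "M", None, "F", "M"],
    "VAX_MANU": ["PFIZER", "MODERNA", "JANSSEN", "PFIZER", "MODERNA"],
    "RECOVD": ["Y", "N", "Y", "Y", "N"],
})

# Duplicates: drop exact duplicate records.
df = df.drop_duplicates().reset_index(drop=True)

# Missing values: median for the numeric column,
# an explicit "U" (unknown) level for the categorical one.
df["AGE_YRS"] = df["AGE_YRS"].fillna(df["AGE_YRS"].median())
df["SEX"] = df["SEX"].fillna("U")

# Cardinality: count distinct levels per categorical column.
cardinality = df[["SEX", "VAX_MANU"]].nunique()

# Binning: turn age into an ageGroup feature (bin edges are an assumption).
df["ageGroup"] = pd.cut(
    df["AGE_YRS"],
    bins=[0, 18, 50, 65, 120],
    labels=["0-17", "18-49", "50-64", "65+"],
    right=False,
)

# Encoding: one-hot encode the low-cardinality categorical columns.
encoded = pd.get_dummies(df, columns=["SEX", "VAX_MANU", "ageGroup"])
print(cardinality)
print(encoded.head())
```

One-hot encoding suits low-cardinality columns such as sex or manufacturer. A high-cardinality column such as vaccine lot would likely need a different treatment, for example grouping rare levels together before encoding.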

3. Predictive Modeling

  • Classification models:
  • Logistic Regression (LR),
  • Support Vector Machine (SVM),
  • K-Nearest Neighbors (KNN),
  • Ensemble methods:
      . Random Forest (RF), Extra Trees (ET) 
      . Boosting: AdaBoost (AB), Extreme Gradient Boosting (XGB)
      . Stacking: Decision Tree (DT) + AdaBoost (AB)
  • Models Validation:
  • GridSearch Cross-Validation
  • Models Evaluation:
  • Error metrics: Accuracy, Precision, Recall, F-score, Confusion Matrix
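As a rough sketch of the validation and evaluation steps above, the snippet below tunes a Random Forest with grid-search cross-validation and reports the listed error metrics. The synthetic data from `make_classification`, the parameter grid, and the F1 scoring choice are assumptions made for illustration; the project's real features and grids are not shown in this outline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary data standing in for the prepared VAERS features.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical hyperparameter grid; 3-fold cross-validation scored by F1.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="f1"
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out split.
y_pred = search.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

print("Best parameters:", search.best_params_)
print("Refit time (s):", search.refit_time_)
print(cm)
print(classification_report(y_test, y_pred))
```

The same pattern applies to the other listed classifiers by swapping the estimator and grid. Comparing `best_score_` and `refit_time_` across them is one way to weigh performance against runtime.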

4. Conclusion

  • Best model in terms of performance and runtime
  • Alternative model
  • Model Flaws

Objective

  • Find the best classification model and hyperparameters for predicting post-vaccine recovery, judged by predictive performance and runtime.

  • Detect early warning adverse event (AE) signals and assess possible safety concerns for a given vaccine, which may generate hypotheses and prompt further investigation.

Description

  • What is VAERS?

    • Vaccine Adverse Event Reporting System: a passive surveillance system that monitors vaccine safety in the U.S. beyond clinical trials.
  • Who reports there?

  • How can we make use of this data?


2.2. Exploratory Data Analysis Framework:

Levels of Analysis: Stats, Visualization


Part 1. Basic analysis:

  • At a glance: categorical values and missing values; what is the best imputation method?

  • Feature Distribution

    • Post-covid19-vaccination Deaths by
    • Vaccination doses counts in 2020 and 2021.
    • Vaccine manufacturer and vaccine lots.
    • Recovery counts by each State.
    • Gender, ageGroup, vaccine brand and vaccine lot distribution in
    • Hospitalization, Emergency visit, life threatening, disability, prolonged hospitalized days
    • Myocarditis/Pericarditis, Guillain–Barré Syndrome, Thrombosis with Thrombocytopenia Syndrome (TTS) (rare but high-risk health conditions attributed to vaccine administration)

Part 2. Correlation signals:

Q. Does population health status play a significant role in experiencing a post-vaccine AE or in recovering?

  • Status of the vaccine recipients: (ageGroup, gender, medical history, current illness, current medication, allergy)

    with:

  • Recovery outcomes: (yes, no) after Vaccine administration
  • Serious outcomes: (yes, no) after Vaccine administration (Death, life threatening, disability, emergency visit, prolonged hospitalized days)
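One common way to check for an association between a categorical recipient attribute and a yes/no outcome is a chi-square test of independence on their contingency table. The sketch below assumes that approach; the outline does not say which test the project used, and the counts are invented for illustration rather than taken from VAERS.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up counts for illustration only; these are not VAERS results.
df = pd.DataFrame({
    "ageGroup": ["18-49"] * 40 + ["65+"] * 40,
    "recovered": ["yes"] * 30 + ["no"] * 10 + ["yes"] * 15 + ["no"] * 25,
})

# Contingency table: rows are age groups, columns are recovery outcomes.
table = pd.crosstab(df["ageGroup"], df["recovered"])

# Chi-square test of independence between ageGroup and recovery.
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}")
```

A small p-value suggests the two variables are not independent in the sample. Because VAERS is a passive reporting system, such a signal would be a prompt for further investigation rather than evidence of causation.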

4. Conclusion

  • The RF classifier was proposed as the best model in terms of performance and runtime.
  • XGB is proposed as an alternative model.
  • Model Flaws:
  • Engineer additional features from the independent variables that may correlate better with the dependent variable, and check whether they improve model performance and feature importance.
  • Keep iterating on the subsample, learning rate, and number of estimators for the stacking model to further improve the results.
  • Track updates to the database; more data may improve the prediction outcome.
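The tuning suggestion for the stacking model could look roughly like the sketch below. It assumes the stack combines a Decision Tree and AdaBoost as base learners under a logistic-regression meta-learner, which is one possible reading of "Stacking: DT + AB" and not necessarily the project's setup. The data is synthetic and the grid values are hypothetical. `subsample` is left out because scikit-learn's AdaBoost has no such parameter; it belongs to gradient-boosting models such as XGB.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the prepared features.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Assumed stack: DT and AdaBoost base learners, logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("ab", AdaBoostClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=3,
)

# Iterate over learning rate and number of estimators of the AdaBoost member.
param_grid = {
    "ab__n_estimators": [25, 50],
    "ab__learning_rate": [0.5, 1.0],
}
search = GridSearchCV(stack, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```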
