Logistic Regression Heart Disease Model

Authors: Andrew Bridglall, Brendan Tong, Darren Jian, Youngjun Oh

Description

We set out to build a logistic regression model that predicts heart disease in patients, using data from the Cleveland Clinic Foundation. We first fit a logistic regression on multiple candidate predictor variables, then used a model-selection process to distill them into the “best” set of predictors. The revised logistic regression uses only the following predictors: cp (chest pain), sex, trestbps (resting blood pressure), slope (the slope of the peak exercise ST segment), ca (the number of major blood vessels, 0-3, colored by fluoroscopy), and thal (thallium stress test result).
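The description does not say which criterion drove the model-selection process. One common choice is the Akaike information criterion (AIC), which penalizes the model's log-likelihood by its number of parameters. The sketch below is an assumption for illustration, not the authors' procedure; the function names are hypothetical.

```python
import math

def log_likelihood(y_true, p_hat):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    eps = 1e-12  # clip probabilities so log() never sees exactly 0 or 1
    total = 0.0
    for y, p in zip(y_true, p_hat):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def aic(y_true, p_hat, n_params):
    """AIC = 2k - 2 * log-likelihood; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood(y_true, p_hat)
```

Under this criterion, a candidate model with fewer predictors is preferred when dropping them barely lowers the log-likelihood, because the penalty term 2k shrinks.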

Both our original and revised logistic regression models failed the Hosmer-Lemeshow goodness-of-fit test, suggesting that neither model fits the data adequately. Comparing the deviance residuals of the two models, we found that the second model’s median deviance residual was closer to 0 and that its minimum and maximum were more symmetric than those of the first model. The misclassification error rate of the second model (14.3%) was also lower than that of the first (16.1%), which would have pointed to a more accurate model were it not for the lack of fit. Ultimately, while our revised model has reasonable predictive power (>85% prediction accuracy), its reliability and predictions should be taken with a grain of salt.
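For readers unfamiliar with the two diagnostics mentioned above, here is a minimal Python sketch of how each can be computed from observed 0/1 outcomes and predicted probabilities. It is not the project's code: the function names, the default of 10 groups, and the equal-size grouping are assumptions made for illustration.

```python
def misclassification_rate(y_true, p_hat, threshold=0.5):
    """Fraction of cases whose thresholded prediction differs from the outcome."""
    preds = [1 if p >= threshold else 0 for p in p_hat]
    wrong = sum(1 for y, y_hat in zip(y_true, preds) if y != y_hat)
    return wrong / len(y_true)

def hosmer_lemeshow_statistic(y_true, p_hat, groups=10):
    """Sum over groups of (observed - expected)^2 / (n * mean_p * (1 - mean_p)).

    Cases are sorted by predicted probability and split into roughly
    equal-size groups. Only the statistic is returned, not a p-value.
    """
    pairs = sorted(zip(p_hat, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # more groups than cases: skip empty groups
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed events in the group
        expected = sum(p for p, _ in chunk)   # expected events in the group
        mean_p = expected / size
        denom = size * mean_p * (1 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat
```

The statistic is conventionally compared against a chi-square distribution with `groups - 2` degrees of freedom; a large value (small p-value) signals poor calibration, which is the sense in which a model "fails" the test.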

We then propose several changes that might have increased the reliability of our logistic models. First, a larger heart disease dataset, along with more variables directly linked to heart disease (e.g., patient smoking history, obesity, and diabetes), would likely have helped us build more accurate models. Second, partitioning our dataset left the testing set much smaller than the training set, which might explain the higher error rates on the testing data than on the training data. In addition, adjusting the heart disease prediction threshold (currently 50%) might have produced more accurate predictions. Implementing these improvements may increase the predictive power of future heart disease models.
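The threshold adjustment suggested above can be explored by sweeping candidate cutoffs and recording the error rate at each one. The sketch below is illustrative only: the function names are hypothetical, and the data in the usage example is a toy set, not the Cleveland data.

```python
def confusion_counts(y_true, p_hat, threshold):
    """Return (tp, fp, tn, fn) when predicting disease if p >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_hat):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def sweep_thresholds(y_true, p_hat, thresholds):
    """For each threshold, report (threshold, error rate, false negatives)."""
    rows = []
    for t in thresholds:
        tp, fp, tn, fn = confusion_counts(y_true, p_hat, t)
        rows.append((t, (fp + fn) / len(y_true), fn))
    return rows

# Toy outcomes and predicted probabilities, for illustration only.
toy_y = [1, 0, 1, 0]
toy_p = [0.9, 0.2, 0.4, 0.6]
for row in sweep_thresholds(toy_y, toy_p, [0.3, 0.5, 0.7]):
    print(row)
```

Reporting false negatives alongside the error rate is a deliberate choice here: for a disease screen, two thresholds with the same overall error rate can differ in how many true cases they miss.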

We were in part inspired by Josh Starmer's helpful video series on using logistic regression in R.
