Credit_Risk_Analysis

Utilized several machine learning models to predict credit risk using Python's imbalanced-learn and scikit-learn libraries.

Supervised Machine Learning Overview

Credit risk is an inherently unbalanced classification problem, because good loans easily outnumber risky loans. Different techniques are therefore needed to train and evaluate models on unbalanced classes. We then evaluate the performance of these models and recommend whether they should be used to predict credit risk.
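To see why the imbalance matters, consider a hypothetical test set (made-up counts, not the LoanStats data) with 990 good loans and 10 risky ones, and a "model" that approves every applicant. The plain-Python sketch below assumes the labels `"low_risk"` and `"high_risk"` for illustration; it shows that such a model reaches 99% accuracy while missing every risky loan.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical test set (not the LoanStats data): 990 good loans, 10 risky ones.
y_true = ["low_risk"] * 990 + ["high_risk"] * 10
# A useless "model" that approves every applicant.
y_pred = ["low_risk"] * len(y_true)

# Risky loans the model failed to flag.
missed = sum(t == "high_risk" and p != "high_risk" for t, p in zip(y_true, y_pred))
print(accuracy(y_true, y_pred))  # 0.99
print(missed)                    # 10
```

This is why the results below focus on per-class precision, recall, and balanced accuracy rather than plain accuracy.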

Resources

  • Dataset

    • LoanStats_2019Q1.csv
  • Software

    • Python
    • Jupyter Notebook
    • NumPy, scikit-learn, & imbalanced-learn libraries
    • Logistic Regression
    • Random Forest Models
    • Ensemble and resampling techniques

Results

Below are the results of all six machine learning algorithms.

| ML Model | Balanced Accuracy Score | Precision (low-risk) | Recall (low-risk) | F1 (low-risk) |
|---|---|---|---|---|
| Naive Random Oversampling | 0.64 | 0.99 | 0.68 | 0.80 |
| SMOTE Oversampling | 0.64 | 0.99 | 0.63 | 0.77 |
| Cluster Centroid Undersampling | 0.51 | 0.99 | 0.44 | 0.60 |
| SMOTEENN Combination Sampling | 0.63 | 0.99 | 0.57 | 0.73 |
| Balanced Random Forest Classifier | 0.78 | 0.99 | 0.91 | 0.95 |
| Easy Ensemble AdaBoost Classifier | 0.93 | 0.99 | 0.94 | 0.97 |

Naive Random Oversampling

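Naive random oversampling duplicates randomly chosen rows of the minority class until both classes are the same size. imbalanced-learn implements this as `RandomOverSampler` in `imblearn.over_sampling`; the stdlib-only sketch below just illustrates the idea and is not the code used in this project. Rows are assumed to be tuples of numeric features, and the `"low_risk"`/`"high_risk"` labels are assumptions.

```python
import random

def random_oversample(X, y, seed=0):
    """Naive random oversampling: duplicate randomly chosen rows of every
    smaller class until each class has as many rows as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())

    X_res, y_res = [], []
    for label, rows in by_class.items():
        # Keep all original rows, then add duplicates drawn with replacement.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_res.extend(rows + extra)
        y_res.extend([label] * target)
    return X_res, y_res

# Toy data (assumed labels): 2 high-risk rows vs 4 low-risk rows.
X = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (6.0, 9.0), (5.5, 8.5), (6.5, 7.5)]
y = ["high_risk", "high_risk", "low_risk", "low_risk", "low_risk", "low_risk"]
X_res, y_res = random_oversample(X, y)
print(y_res.count("high_risk"), y_res.count("low_risk"))  # 4 4
```

Because the new rows are exact copies, the model sees no new information about risky loans, only more weight on the existing ones.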

SMOTE Oversampling

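SMOTE (Synthetic Minority Oversampling Technique) creates new minority rows instead of copying existing ones: it picks a minority row, picks one of its k nearest minority neighbours, and places a synthetic row at a random point on the segment between them. imbalanced-learn provides this as `SMOTE` in `imblearn.over_sampling`. The stdlib-only sketch below assumes Euclidean distance on tuples of numeric features; it illustrates the technique and is not this project's implementation.

```python
import math
import random

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority rows, SMOTE-style.

    X_min must contain at least two rows (tuples of numbers)."""
    rng = random.Random(seed)
    k = min(k, len(X_min) - 1)  # a row cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        base = X_min[i]
        # The k nearest other minority rows, by Euclidean distance.
        neighbours = sorted(
            (row for j, row in enumerate(X_min) if j != i),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # New point on the segment from `base` towards `other`.
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

X_min = [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0)]  # toy minority (high-risk) rows
new_rows = smote(X_min, n_new=4, k=2)
print(len(new_rows))  # 4
```

Every synthetic row lies between two real high-risk rows, so SMOTE fills in the minority region rather than repeating points.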

Cluster Centroid Undersampling

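Cluster centroid undersampling shrinks the majority class instead: it runs k-means on the majority (low-risk) rows with k equal to the number of minority rows, then replaces the majority rows with the k centroids. imbalanced-learn offers this as `ClusterCentroids` in `imblearn.under_sampling`. The sketch below is a stdlib-only illustration for the binary case, with a hand-written Lloyd's k-means; the labels and toy data are assumptions.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples of numbers; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen rows
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[c]
            for c, cl in enumerate(clusters)
        ]
    return centroids

def cluster_centroid_undersample(X, y, majority, seed=0):
    """Binary case: replace the majority rows with as many k-means centroids
    as there are minority rows."""
    minority = [(row, label) for row, label in zip(X, y) if label != majority]
    X_maj = [row for row, label in zip(X, y) if label == majority]
    centroids = kmeans(X_maj, k=len(minority), seed=seed)
    X_res = centroids + [row for row, _ in minority]
    y_res = [majority] * len(centroids) + [label for _, label in minority]
    return X_res, y_res

X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
     (5.0, 5.0), (6.0, 5.0)]
y = ["low_risk"] * 6 + ["high_risk"] * 2
X_res, y_res = cluster_centroid_undersample(X, y, majority="low_risk")
print(y_res.count("low_risk"), y_res.count("high_risk"))  # 2 2
```

Discarding most of the low-risk rows loses information, which is consistent with this method having the lowest scores in the table.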

SMOTEENN

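SMOTEENN combines oversampling and undersampling: SMOTE first adds synthetic minority rows, then Edited Nearest Neighbours (ENN) removes rows whose label disagrees with their nearest neighbours, cleaning up the overlap between classes. imbalanced-learn provides this as `SMOTEENN` in `imblearn.combine`. The sketch below covers only the ENN cleaning step, assumes the data has already been oversampled, and uses a simple majority vote over k = 3 Euclidean neighbours; imbalanced-learn's own ENN offers other selection rules.

```python
import math
from collections import Counter

def edited_nearest_neighbours(X, y, k=3):
    """Keep only rows whose label matches the majority label of their k
    nearest neighbours (Euclidean distance, the row itself excluded)."""
    keep_X, keep_y = [], []
    for i, (row, label) in enumerate(zip(X, y)):
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(row, X[j]),
        )[:k]
        vote, _ = Counter(y[j] for j in neighbours).most_common(1)[0]
        if vote == label:
            keep_X.append(row)
            keep_y.append(label)
    return keep_X, keep_y

# Toy data, imagined as already SMOTE-oversampled: the high-risk point at
# (0.5, 0.5) sits inside a low-risk cluster and gets cleaned away.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
     (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
y = ["low_risk"] * 4 + ["high_risk"] * 4
X_clean, y_clean = edited_nearest_neighbours(X, y)
print(len(X_clean), (0.5, 0.5) in X_clean)  # 7 False
```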

Balanced Random Forest Classifier

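A balanced random forest trains each tree on its own bootstrap sample in which the classes have been balanced by randomly undersampling the majority class. imbalanced-learn provides this as `BalancedRandomForestClassifier` in `imblearn.ensemble`. The stdlib-only sketch below shows only how the per-tree training sets could be drawn (each class sampled with replacement down to the minority size); fitting the trees themselves, for example with scikit-learn's `DecisionTreeClassifier`, is not shown. Labels and toy data are assumptions.

```python
import random

def balanced_bootstrap(X, y, rng):
    """One balanced bootstrap: draw, with replacement, as many rows from each
    class as the minority class has."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n = min(len(rows) for rows in by_class.values())
    X_s, y_s = [], []
    for label, rows in by_class.items():
        X_s.extend(rng.choice(rows) for _ in range(n))
        y_s.extend([label] * n)
    return X_s, y_s

def forest_training_sets(X, y, n_trees, seed=0):
    """A separate balanced bootstrap for every tree in the forest."""
    rng = random.Random(seed)
    return [balanced_bootstrap(X, y, rng) for _ in range(n_trees)]

X = [(float(i), float(i % 3)) for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2
for X_s, y_s in forest_training_sets(X, y, n_trees=3):
    print(y_s.count("high_risk"), y_s.count("low_risk"))  # 2 2 on every line
```

Each tree sees a different random slice of the low-risk rows, so the forest as a whole still uses much of the majority data.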

Easy Ensemble AdaBoost Classifier

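The Easy Ensemble classifier trains several boosted learners, each on a different balanced subset made of the minority rows plus a random sample of majority rows, and combines their predictions. imbalanced-learn provides this as `EasyEnsembleClassifier` in `imblearn.ensemble`, which uses AdaBoost learners. To stay short and stdlib-only, the sketch below swaps AdaBoost for a single decision stump per subset and combines the stumps by majority vote; both choices are simplifying assumptions, and imbalanced-learn's own combination rule may differ.

```python
import random
from collections import Counter

def balanced_subsets(X, y, majority, n_subsets, seed=0):
    """Each subset keeps all minority rows plus an equal-sized random sample
    (without replacement) of majority rows."""
    rng = random.Random(seed)
    maj = [(r, l) for r, l in zip(X, y) if l == majority]
    mino = [(r, l) for r, l in zip(X, y) if l != majority]
    subsets = []
    for _ in range(n_subsets):
        chosen = mino + rng.sample(maj, len(mino))
        subsets.append(([r for r, _ in chosen], [l for _, l in chosen]))
    return subsets

def fit_stump(X, y):
    """Stand-in weak learner (a one-split decision stump, NOT AdaBoost):
    choose the feature/threshold split with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            sides = {True: Counter(), False: Counter()}
            for row, label in zip(X, y):
                sides[row[f] <= thr][label] += 1
            # Each side predicts its most common label (or the overall one if empty).
            fallback = Counter(y).most_common(1)[0][0]
            pred = {s: c.most_common(1)[0][0] if c else fallback for s, c in sides.items()}
            errors = sum(label != pred[row[f] <= thr] for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, f, thr, pred)
    _, f, thr, pred = best
    return lambda row: pred[row[f] <= thr]

def easy_ensemble_predict(models, row):
    """Majority vote across the models trained on the balanced subsets."""
    return Counter(model(row) for model in models).most_common(1)[0][0]

X = [(float(i),) for i in range(10)] + [(20.0,), (21.0,), (22.0,)]
y = ["low_risk"] * 10 + ["high_risk"] * 3
models = [fit_stump(Xs, ys) for Xs, ys in balanced_subsets(X, y, "low_risk", n_subsets=5)]
print(easy_ensemble_predict(models, (0.0,)), easy_ensemble_predict(models, (30.0,)))
# low_risk high_risk
```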

Summary

In the financial industry, sensitivity (recall) is a more valuable metric than precision when analyzing risk and default rates for loan applicants. Banks need to identify as many high-risk applicants as possible so that they do not approve those loans and then suffer defaults.

In terms of precision, all six algorithms had a very low precision for high-risk individuals. The highest was the Easy Ensemble AdaBoost Classifier at 7%, which is still low for identifying the high-risk applicants who should not receive loans: of all the customers the model flagged as high-risk, only 7% actually were high-risk. In contrast, every model had near-perfect precision (0.99) for low-risk individuals, meaning that 99% of the customers predicted to be low-risk really were low-risk.
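Precision for a class is the number of correct predictions of that class divided by the total number of predictions of that class. The sketch below uses hypothetical counts, chosen only to mirror the 7% and 0.99 rates above (they are not the project's confusion-matrix values), and the assumed labels `"low_risk"`/`"high_risk"`.

```python
def precision(y_true, y_pred, positive):
    """Of the rows predicted as `positive`, the fraction that truly are."""
    flagged = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in flagged) / len(flagged)

# Hypothetical counts chosen to mirror the rates above (not the real results):
# 100 applicants flagged high-risk, only 7 of them truly high-risk;
# 900 predicted low-risk, 891 of them truly low-risk.
y_pred = ["high_risk"] * 100 + ["low_risk"] * 900
y_true = (["high_risk"] * 7 + ["low_risk"] * 93
          + ["high_risk"] * 9 + ["low_risk"] * 891)
print(precision(y_true, y_pred, "high_risk"))  # 0.07
print(precision(y_true, y_pred, "low_risk"))   # 0.99
```

Low high-risk precision means many good applicants get flagged; it says nothing about how many risky applicants slip through, which is what recall measures.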

With this in mind, precision does not tell us much for comparing the algorithms, so we should look at sensitivity (recall). The model with the highest sensitivity was the Easy Ensemble AdaBoost Classifier (91% for high-risk and 94% for low-risk individuals), meaning that it correctly flagged 91% of the applicants who actually were high-risk. The next-highest high-risk recall came from SMOTEENN combination sampling (70%) and the Balanced Random Forest Classifier (67%).
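Recall (sensitivity) for a class is the number of rows of that class the model caught divided by the number of rows that truly belong to it. The sketch below uses hypothetical counts chosen to mirror the Easy Ensemble's 91% and 94% recalls (not the project's actual confusion matrix) and the assumed `"low_risk"`/`"high_risk"` labels.

```python
def recall(y_true, y_pred, positive):
    """Sensitivity: of the rows that truly are `positive`, the fraction the model caught."""
    actual = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in actual) / len(actual)

# Hypothetical counts mirroring the Easy Ensemble rates (not the real results):
# 100 truly high-risk applicants, 91 flagged; 1,000 truly low-risk, 940 recognised.
y_true = ["high_risk"] * 100 + ["low_risk"] * 1000
y_pred = (["high_risk"] * 91 + ["low_risk"] * 9
          + ["low_risk"] * 940 + ["high_risk"] * 60)
print(recall(y_true, y_pred, "high_risk"))  # 0.91
print(recall(y_true, y_pred, "low_risk"))   # 0.94
```

Under these counts, the 9 high-risk applicants the model misses are the loans that would be approved and could default.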

Finally, we look at the balanced accuracy score to make the final decision on which model to use. Balanced accuracy is the average of the recall obtained on each class, so unlike plain accuracy it cannot be inflated by simply predicting the majority (low-risk) class for everyone. The model with by far the highest balanced accuracy score was the Easy Ensemble AdaBoost Classifier (0.93). It is therefore the model we recommend, because it has the highest balanced accuracy, the highest high-risk precision, and the highest sensitivity.
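As a worked check, assuming the table's first score column is balanced accuracy (the unweighted mean of per-class recall, which scikit-learn computes as `sklearn.metrics.balanced_accuracy_score`), the recalls quoted in the summary reproduce the Easy Ensemble's score. The helper name below is hypothetical.

```python
def balanced_accuracy(per_class_recall):
    """Balanced accuracy: the unweighted mean of the per-class recalls."""
    return sum(per_class_recall) / len(per_class_recall)

# Per-class recalls quoted above for the Easy Ensemble AdaBoost Classifier.
eec = balanced_accuracy([0.91, 0.94])  # high-risk, low-risk
print(round(eec, 3))  # 0.925, which rounds to the 0.93 in the results table
```

The same arithmetic on SMOTEENN's recalls, (0.70 + 0.57) / 2 ≈ 0.64, lands within rounding of its 0.63 in the table. An approve-everyone model would score only (0.0 + 1.0) / 2 = 0.5.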
