Git Product home page Git Product logo

credit_risk_analysis's Introduction

Credit_Risk_Analysis

Purpose

This machine learning project used scikit-learn and imbalanced-learn to train and evaluate models to determine credit card risk using a credit card dataset. Six different techniques were used to train and evaluate models to determine credit risk: Oversampling with the RandomOverSampler and SMOTE algorithms, Undersampling with the ClusterCentroids algorithm, a combination of over and under sampling with the SMOTEENN algorithm, and two machine learning models that reduce bias - the BalancedRandomForestClassifer and EasyEnsembleClassifier.

Results

Naive Random Oversampling

naive_random_sampling
-The balanced accuracy score was 0.6496, meaning the model predicted the credit risk accurately 64.96% of the time.
-The precision scores for this model are very skewed toward the low-risk loans as all of the low-risk loans were correctly predicted, but nearly none of the high-risk loans were accurately predicted. This model is not great for identifying high-risk loans.
-The recall scores for this model show that the model is better at identifying positive low-risk loans (0.68) and decent at positively identifying high-risk loans (0.62)

SMOTE Oversampling

SMOTE
-The balanced accuracy score was 0.6444, meaning the model predicted the credit risk accurately 64.44% of the time.
-The precision scores for this model are very skewed toward the low-risk loans as all of the low-risk loans were correctly predicted, but nearly none of the high-risk loans were accurately predicted. This model is not great for identifying high-risk loans.
-The recall scores for this model show that the model is better at identifying positive low-risk loans (0.66) and decent at positively identifying high-risk loans (0.63)

Undersampling

undersampling
-The balanced accuracy score was 0.5292, meaning the model predicted the credit risk accurately 52.92% of the time.
-The precision scores for this model are very skewed toward the low-risk loans as all of the low-risk loans were correctly predicted, but nearly none of the high-risk loans were accurately predicted. This model is not great for identifying high-risk loans.
-The recall scores for the high-risk loans were around 0.61 and the recall scores for low-risk loans were around 0.45. While the high-risk recall is much better, both recall scores are low and show that many of the positives were not accurately predicted.

Combination (Over and Under) Sampling (SMOTEENN)

combination_sampling
-The balanced accuracy score for the undersampling technique was around .6245, meaning that only 62.45% of the testing data was accurately classified.
-The precision scores for this model are very skewed toward the low-risk loans as all of the low-risk loans were correctly predicted, but nearly none of the high-risk loans were accurately predicted.
-The high-risk recall score for this model is fairly high at 0.68 and the low-risk recall score is average at 0.57. Compared to the previous techniques, this model is much better at identifying true high-risk loans.

Balanced Random Forest Classifier

random_forest
-The balanced accuracy score for this model is comparatively high at .7878, meaning that 78.78% of the testing credit data was accurately classified.
-The precision scores for this model are very skewed toward the low-risk loans as all of the low-risk loans were correctly predicted, but nearly none of the high-risk loans were accurately predicted.
-The recall score for low-risk loans is very high at 0.91 and the recall score for high-risk loans is fairly high at 0.67. This shows that the classifier is good at predicting true positives for low-risk loans.

Easy Ensemble AdaBoost Classifier

easy_ensemble
-The balanced accuracy score for this model was 0.9254, meaning the model was accurate 92.54% of the time.
-The precision scores for this model are very skewed toward the low-risk loans as all of the low-risk loans were correctly predicted, but nearly none of the high-risk loans were accurately predicted.
-The recall score was high for both low and high risk loans: 0.91 for high risk and 0.94 for low risk.

Summary

All of the machine learning models had low precision scores for the high-risk loans in accurately predicting positives. The balanced accuracy score for the models I completed varied with the lowest score for the undersampling method and a high score with Easy Ensemble Classifier. Recall scores also varied between models with the lowest scores for undersampling and highest with the classifying methods. Of the models created, the Easy Ensemble Classifier would be the best model to use to predict credit risk due to the high recall scores for both high and low risk loans, as well as an accuracy score of 92.54%. The precision for this model is still very off, indicating that the positives are not necessarily accurate, and so this model could be much improved before being put into use.

credit_risk_analysis's People

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.