This project tests my understanding of building machine-learning models to predict credit risk, as taught in the Rice University FinTech Bootcamp.[1]
- General Information
- Summaries
- Technologies
- Installation Guide
- Code Examples
- Usage
- Sources
- Status
- Contributors
Consumers seek many kinds of loans online, including automotive loans, mortgages, student loans, and debt consolidation loans. Peer-to-peer lending services such as LendingClub or Prosper allow investors to lend money to other people without going through a bank. However, investors always want to mitigate risk, so this repository was created to help a client use machine learning techniques to predict credit risk.[1]
- Which model had the best balanced accuracy score?
- Among the resampling models, the Synthetic Minority Oversampling Technique (SMOTE) model produced the best balanced accuracy score, 79.67%.
- Which model had the best recall score?
- The SMOTE model produced the best recall scores: 71% for high-risk loans and 88% for low-risk loans.
- Which model had the best geometric mean score?
- The SMOTE model produced the best geometric mean score, 79%.
- Which model had the best balanced accuracy score?
- Among the ensemble models, the Easy Ensemble Classifier produced the best balanced accuracy score, 92.55%.
- Which model had the best recall score?
- The Easy Ensemble Classifier produced the best recall scores: 91% for high-risk loans and 94% for low-risk loans.
- Which model had the best geometric mean score?
- The Easy Ensemble Classifier produced the best geometric mean score, 93%.
- What are the top three features?
- The top three features are:
- Principal Received to Date (total_rec_prncp): 0.0738
- Interest Received to Date (total_rec_int): 0.0639
- Payments Received to Date for Portion of Total Amount Funded by Investors (total_pymnt_inv): 0.0607
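The three scores reported in the summaries above are closely related. For a two-class problem, balanced accuracy is the arithmetic mean of the per-class recalls, and the geometric mean score is the square root of their product. The following is a minimal sketch, not taken from the notebooks, that recomputes both from the rounded recall figures quoted above. The helper names are hypothetical, and small differences from the reported scores are expected because the inputs are rounded.

```python
import math


def balanced_accuracy(recalls):
    """Arithmetic mean of the per-class recall values."""
    return sum(recalls) / len(recalls)


def geometric_mean(recalls):
    """n-th root of the product of the per-class recall values."""
    return math.prod(recalls) ** (1 / len(recalls))


# Rounded per-class recalls (high risk, low risk) quoted in the summaries.
reported_recalls = {
    "SMOTE": (0.71, 0.88),
    "Easy Ensemble Classifier": (0.91, 0.94),
}

for model, recalls in reported_recalls.items():
    print(
        f"{model}: balanced accuracy ~ {balanced_accuracy(recalls):.3f}, "
        f"geometric mean ~ {geometric_mean(recalls):.3f}"
    )
```

The recomputed values land close to the reported 79.67% and 92.55% balanced accuracy scores, which is a quick sanity check that the summary figures are mutually consistent.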
- See requirements.txt for a list of libraries to create a machine learning environment.
- Download the entire repository
- Open Git Terminal
- Navigate into the repository file path where you stored the files during the download.
- The notebook files should be visible to run.
- Make sure to create a separate virtual environment for the machine-learning libraries.
- Use requirements.txt in the repository to install the libraries using the following command:
pip install -r requirements.txt
*See the Usage section below for instructions on how to run notebooks.
- Creating a standard scaler instance and training the model
# Create the StandardScaler instance
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit the Standard Scaler with the training data
# When fitting scaling functions, only train on the training dataset
X_scaler = scaler.fit(X_train)
# Scale the training and testing data
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
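The comment about fitting only on the training data matters because the scaler's mean and standard deviation should not be influenced by the test set. As a rough illustration of what `fit` and `transform` do for a single feature column, here is a standard-library sketch. It approximates the idea behind `StandardScaler` (which divides by the population standard deviation) rather than reproducing its implementation, and the sample values and helper names are made up for this example.

```python
from statistics import fmean, pstdev


def fit_column(train_values):
    """Learn the mean and population standard deviation from training data only."""
    return fmean(train_values), pstdev(train_values)


def transform_column(values, mean, std):
    """Apply z-score scaling using statistics learned from the training data.

    Assumes std is non-zero; a constant column would need special handling.
    """
    return [(v - mean) / std for v in values]


# Hypothetical loan amounts, already split into training and testing sets.
train_loan_amounts = [5000.0, 10000.0, 15000.0, 20000.0]
test_loan_amounts = [12500.0, 30000.0]

# Fit on the training column only, then reuse the same statistics for both sets.
mean, std = fit_column(train_loan_amounts)
train_scaled = transform_column(train_loan_amounts, mean, std)
test_scaled = transform_column(test_loan_amounts, mean, std)

print(train_scaled)
print(test_scaled)
```

Because the test values are scaled with the training statistics, a test value far outside the training range (30000.0 here) maps to a z-score larger than any seen during training, which is the expected behavior.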
- Showing the imbalanced classification report for the Easy Ensemble Classifier
# Print the imbalanced classification report
from imblearn.metrics import classification_report_imbalanced
print(classification_report_imbalanced(y_test, y_eec_predictions))
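For each class, the imbalanced classification report lists precision (`pre`), recall (`rec`), specificity (`spe`), F1 (`f1`), geometric mean (`geo`), index balanced accuracy (`iba`), and support (`sup`). To make the `rec`, `spe`, and `geo` columns more concrete, the sketch below computes them by hand for one class at a time. It is an illustration only: the label strings and predictions are made up stand-ins for `y_test` and `y_eec_predictions`, and the helper name is hypothetical.

```python
import math


def recall_specificity_geo(y_true, y_pred, positive_label):
    """Return (recall, specificity, geometric mean) treating one label as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive_label and p == positive_label for t, p in pairs)
    fn = sum(t == positive_label and p != positive_label for t, p in pairs)
    tn = sum(t != positive_label and p != positive_label for t, p in pairs)
    fp = sum(t != positive_label and p == positive_label for t, p in pairs)
    recall = tp / (tp + fn)        # share of actual positives that were found
    specificity = tn / (tn + fp)   # share of actual negatives that were found
    return recall, specificity, math.sqrt(recall * specificity)


# Made-up labels standing in for y_test and y_eec_predictions.
y_true = ["high_risk", "high_risk", "low_risk", "low_risk", "low_risk", "low_risk"]
y_pred = ["high_risk", "low_risk", "low_risk", "low_risk", "low_risk", "high_risk"]

for label in ("high_risk", "low_risk"):
    rec, spe, geo = recall_specificity_geo(y_true, y_pred, label)
    print(f"{label}: rec={rec:.2f} spe={spe:.2f} geo={geo:.2f}")
```

In a two-class problem, the recall of one class equals the specificity of the other, so the per-class `geo` values match.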
1. To run the analysis, use Git Terminal to navigate to the risky_business directory, which contains credit_risk_resampling.ipynb.
2. Execute the command 'code .' in the terminal to open VS Code.
3. VS Code opens. Select the credit_risk_resampling.ipynb file in the left-side navigation pane.
4. Click the Run All Cells button (double arrows) at the top of the main workspace to run all cells in the Jupyter Notebook file.
5. All cells in the notebook run.
- Follow steps 1 - 5 to run the credit_risk_ensemble.ipynb notebook.
Project is: finished
- Jonathan Owens
- LinkedIn: www.linkedin.com/in/jonowens