This is a capstone project in the BFSI (banking, financial services and insurance) domain, submitted as the first phase.
Problem Description:
Given the demographic data and credit bureau data of CredX applicants, the task is to evaluate CredX customers using a scorecard.
Methodology:
The project follows the CRISP-DM framework. Its components are listed below, together with the approach followed and the results achieved in each.
-
Business Objective:
CredX is a leading credit card provider that receives thousands of credit card applications every year. Over the past few years, however, it has experienced an increase in credit loss. The company believes that the best strategy to mitigate credit risk is to ‘acquire the right customers’.
In this project, we help CredX identify the right customers using predictive models. Using past data on the bank’s applicants, we need to determine the factors affecting credit risk and create strategies to mitigate the acquisition risk.
-
Data understanding:
Exploratory Data Analysis (EDA) is performed at several levels (univariate, bivariate and multivariate) to understand the data and the factors that drive ‘PerformenceTag’. Below is a high-level snapshot of the given data.
Demographic/applicant data: This is obtained from the information provided by the applicants at the time of credit card application. It contains customer-level information on age, gender, income, marital status, etc.
Credit Bureau Data: This is taken from the credit bureau and contains variables such as ‘number of times 30 DPD or worse in last 3/6/12 months’, ‘outstanding balance’, ‘number of trades’, etc.
-
Data preparation:
The following data quality issues were found and treated:
- Credit bureau data had 3 duplicate application IDs; the duplicates were removed.
- An applicant age of ‘-3’ was treated as an incorrect age.
- Gender is missing for 2 rows.
- Marital Status is missing for 6 applicants.
- Number of Dependents is missing for 3 applicants, one of whom has an age of 0.
- One applicant’s Income is recorded as -0.5.
- Education details are missing for 119 applicants.
- Profession information is missing for 14 applicants.
- Type of Residence is missing for 8 applicants.
- Performance Tag is missing for 2% of the total applicants.
- All missing values except Performance Tag were replaced with WoE values.
- Applicants with a missing Performance Tag were treated as rejected applicants.
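The clean-up steps listed above can be sketched roughly as follows. This is only an illustration in plain Python, not the project's actual code: the field names (`application_id`, `age`, `income`, `performance_tag`) are hypothetical, keeping the first copy of a duplicated ID is an assumed policy, and treating an age of zero or below and a negative income as invalid is likewise an assumption.

```python
from __future__ import annotations

# Hypothetical field names; the real CredX column names may differ.
ID, AGE, INCOME, TAG = "application_id", "age", "income", "performance_tag"


def drop_duplicate_ids(rows: list[dict]) -> list[dict]:
    """Keep the first row seen for each application ID (assumed policy)."""
    seen = set()
    unique = []
    for row in rows:
        if row[ID] not in seen:
            seen.add(row[ID])
            unique.append(row)
    return unique


def blank_invalid_values(row: dict) -> dict:
    """Copy the row, setting impossible values to None (treated as missing)."""
    cleaned = dict(row)
    # Assumption: an age of zero or below, or a negative income, is invalid.
    if cleaned[AGE] is not None and cleaned[AGE] <= 0:
        cleaned[AGE] = None
    if cleaned[INCOME] is not None and cleaned[INCOME] < 0:
        cleaned[INCOME] = None
    return cleaned


def split_rejected(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate rows with a performance tag from rows without one (rejected)."""
    tagged = [r for r in rows if r[TAG] is not None]
    rejected = [r for r in rows if r[TAG] is None]
    return tagged, rejected


# Made-up rows, for illustration only.
rows = [
    {ID: 1, AGE: 35, INCOME: 40.0, TAG: 0},
    {ID: 1, AGE: 35, INCOME: 40.0, TAG: 0},     # duplicate application ID
    {ID: 2, AGE: -3, INCOME: 25.0, TAG: 1},     # invalid age
    {ID: 3, AGE: 41, INCOME: -0.5, TAG: 0},     # invalid income
    {ID: 4, AGE: 29, INCOME: 18.0, TAG: None},  # no tag: rejected applicant
]
cleaned = [blank_invalid_values(r) for r in drop_duplicate_ids(rows)]
tagged, rejected = split_rejected(cleaned)
print(len(tagged), "tagged,", len(rejected), "rejected")
```

Setting invalid values to None rather than dropping the rows keeps them available for the WoE step, where missing values get their own bin.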
WoE Analysis:
WoE plots were analysed to identify how each attribute in the given data relates to PerformenceTag. Weight of Evidence (WoE) values were created for each attribute on the merged data (Demographic Data + Credit Bureau Data), and these WoE values were populated into the data for further model building.
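As a rough illustration of the WoE step, the sketch below computes WoE per category for one attribute and then replaces the raw values with their WoE. It makes several assumptions that the write-up does not confirm: a target of 1 means a bad (defaulting) customer, WoE is defined as ln(share of goods / share of bads) (some references use the opposite sign), missing values are given their own "Missing" category, and a small additive smoothing term is used to avoid taking the logarithm of zero. The function names are hypothetical.

```python
import math
from collections import defaultdict


def woe_table(values, targets, smoothing=0.5):
    """Return {category: WoE} for one attribute; target 1 = bad, 0 = good."""
    counts = defaultdict(lambda: [0, 0])  # category -> [goods, bads]
    for value, target in zip(values, targets):
        key = "Missing" if value is None else value
        counts[key][target] += 1
    total_good = sum(c[0] for c in counts.values())
    total_bad = sum(c[1] for c in counts.values())
    n_categories = len(counts)
    table = {}
    for key, (good, bad) in counts.items():
        # Additive smoothing keeps both shares strictly positive.
        good_share = (good + smoothing) / (total_good + smoothing * n_categories)
        bad_share = (bad + smoothing) / (total_bad + smoothing * n_categories)
        table[key] = math.log(good_share / bad_share)
    return table


def apply_woe(values, table):
    """Replace each raw value (including None) with its WoE."""
    return [table["Missing" if v is None else v] for v in values]


# Made-up example: education level against a 0/1 performance tag.
education = ["Masters", "Masters", "Bachelor", "Bachelor", None, "Masters"]
tag = [0, 0, 1, 0, 1, 1]
table = woe_table(education, tag)
print(apply_woe(education, table))
```

With this sign convention, a positive WoE marks a category with relatively more good customers than the population as a whole.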
Information Value Analysis:
An Information Value (IV) was computed for each attribute to measure how significant the attribute is for ‘PerformenceTag’.
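A minimal sketch of an Information Value calculation is given here for illustration. It assumes the usual definition, IV = sum over categories of (share of goods - share of bads) * ln(share of goods / share of bads), with a target of 1 meaning a bad customer. The strength bands in `iv_strength` are a commonly quoted rule of thumb and are an assumption; the write-up does not state which thresholds were actually used. Both function names are hypothetical.

```python
import math
from collections import Counter


def information_value(values, targets):
    """IV of one attribute; target 1 = bad, 0 = good."""
    goods = Counter(v for v, t in zip(values, targets) if t == 0)
    bads = Counter(v for v, t in zip(values, targets) if t == 1)
    total_good, total_bad = sum(goods.values()), sum(bads.values())
    iv = 0.0
    for category in set(goods) | set(bads):
        good_share = goods[category] / total_good
        bad_share = bads[category] / total_bad
        if good_share == 0 or bad_share == 0:
            continue  # WoE is undefined here; skipped in this sketch
        iv += (good_share - bad_share) * math.log(good_share / bad_share)
    return iv


def iv_strength(iv):
    """Rule-of-thumb bands (assumed, not taken from the project)."""
    if iv < 0.02:
        return "not useful"
    if iv < 0.1:
        return "weak"
    if iv < 0.3:
        return "medium"
    if iv <= 0.5:
        return "strong"
    return "suspiciously strong"


# Made-up example: category A is mostly good, category B mostly bad.
values = ["A", "A", "A", "A", "B", "B", "B", "B"]
targets = [0, 0, 0, 1, 0, 1, 1, 1]
iv = information_value(values, targets)
print(round(iv, 4), iv_strength(iv))
```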
Followed the benchmark convention to identify the significance of attributes.
-
Modeling:
-
Evaluation:
The merged data (Demographic Data + Credit Bureau Data) was split 70:30 into training and test sets, as recommended. A Bayesian logistic regression model was built on the training data, to better understand the string attributes. It achieved 90% accuracy, 88% sensitivity and 90% specificity. The performance measures of the logistic regression model are provided. The resulting model is then used to compute the scorecard of each customer. Predictions were made on the test data.
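To make the evaluation measures and the scorecard step concrete, here is a small sketch in plain Python. It is not the project's implementation. The metric function assumes that class 1 is the positive class. The score function uses the common "points to double the odds" scaling, score = offset + factor * ln(odds), and its parameters (a base score of 400 at good:bad odds of 10:1, and 20 points to double the odds) are illustrative assumptions, as are the function names.

```python
import math


def confusion_metrics(actual, predicted):
    """Accuracy, sensitivity and specificity, with 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of actual positives found
        "specificity": tn / (tn + fp),  # share of actual negatives found
    }


def scorecard_points(p_good, base_score=400, base_odds=10, pdo=20):
    """Map a predicted probability of being good to scorecard points.

    base_score, base_odds and pdo are assumed values, not the project's.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds = p_good / (1 - p_good)  # good:bad odds
    return offset + factor * math.log(odds)


# Made-up labels and predictions, for illustration only.
metrics = confusion_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
print(metrics)
print(round(scorecard_points(10 / 11)), round(scorecard_points(20 / 21)))
```

Under this scaling, every doubling of the good:bad odds adds `pdo` points, so two applicants can be compared by score difference alone.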
-
Deployment: