creditR

A Credit Risk Scoring and Validation Package

This package provides a number of R functions useful in applying credit risk scoring methods. It aims to facilitate the application of methods for variable analysis, variable selection, model development, model calibration, rating scale development and model validation. Through the functions provided, these methodologies can be applied quickly, either to an entire modeling data set or to a specific variable. The package was developed for credit risk professionals; basic knowledge of credit risk scoring methodologies is required to use it.

Prerequisites

In order to install the creditR package, the devtools package must be installed first. The devtools package can be installed by running the following code.

install.packages("devtools", dependencies = TRUE) 

Getting Started

The creditR package can be installed using the "install_github" function from the devtools package.

library(devtools)
devtools::install_github("ayhandis/creditR")
library(creditR)

A List of Functions

The functions available in the package are listed below.

ls("package:creditR")

 [1] "Adjusted.Binomial.test"              "Adjusted.Herfindahl.Hirschman.Index" "Anchor.point"                       
 [4] "bayesian.calibration"                "Binomial.test"                       "chisquare.test"                     
 [7] "correlation.cluster"                 "Gini.univariate"                     "Gini.univariate.data"               
[10] "Gini_elimination"                    "Herfindahl.Hirschman.Index"          "IV.calc"                            
[13] "IV.calc.data"                        "IV_elimination"                      "k.fold.cross.validation.glm"        
[16] "Kolmogorov.Smirnov"                  "master.scale"                        "missing_elimination"                
[19] "missing_ratio"                       "na_checker"                          "na_filler_contvar"                  
[22] "PSI.calc"                            "PSI.calc.data"                       "regression.calibration"             
[25] "scaled.score"                        "SSI.calc"                            "SSI.calc.data"                      
[28] "train_test_balanced_split"           "train_test_split"                    "variable.clustering"                
[31] "variable.clustering.gini"            "vif.calc"                            "woe.get.clear.data"                 
[34] "woe.glm.feature.importance"          "woe.table.calc"                     

An Application of the Package

An example application of the package is shared below, showing how some common steps in credit risk scoring can be carried out with the functions provided by the package.

#This R script is designed to make the creditR package easier to understand.
#Obtaining a high accuracy model is not within the scope of this study.

#Attaching the library
library(creditR)

#Model data and data structure
data("germancredit")
str(germancredit)

#Preparing a sample data set.
sample_data <- germancredit[,c("duration.in.month","credit.amount","installment.rate.in.percentage.of.disposable.income",
                               "age.in.years","creditability")]

#Converting the 'creditability' (default flag) variable into numeric type.
sample_data$creditability <- ifelse(sample_data$creditability == "bad",1,0)

#Calculating the missing ratios.
missing_ratio(sample_data)
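
#The German credit data contains no missing values, so no treatment is needed here; for data that
#does, the package also provides missing_elimination() and na_filler_contvar() (see the package
#help for their arguments).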

#Splitting the data into train and test sets.
traintest <- train_test_split(sample_data,123,0.70)
train <- traintest$train
test <- traintest$test

#Applying WOE transformation on the variables.
woerules <- woe.binning(df = train,target.var = "creditability",pred.var = train,event.class = 1)
train_woe <- woe.binning.deploy(train, woerules, add.woe.or.dum.var='woe')
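
#Note: woe.binning() and woe.binning.deploy() are provided by the woeBinning package (a creditR
#dependency); run library(woeBinning) if they are not found in your session. As a reminder, the
#WOE of a bin is the log of the ratio between the bin's share of one outcome class and its share
#of the other (the exact sign convention differs between implementations).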

#Creating a dataset with the transformed variables and default flag.
train_woe <- woe.get.clear.data(train_woe,default_flag = "creditability",prefix = "woe")

#Applying the WOE rules used on the train data to the test data.
test_woe <- woe.binning.deploy(test, woerules, add.woe.or.dum.var='woe')
test_woe <- woe.get.clear.data(test_woe,default_flag = "creditability",prefix = "woe")

#Performing the IV and Gini calculations for the whole data set.
IV.calc.data(train_woe,"creditability")
Gini.univariate.data(train_woe,"creditability")
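
#A commonly quoted rule of thumb for IV: below 0.02 a variable is considered unpredictive,
#0.02-0.1 weak, 0.1-0.3 medium and above 0.3 strong; these cut-offs are conventions, not
#requirements of the package.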

#Creating a new dataset by Gini elimination. IV elimination is also possible.
eliminated_data <- Gini_elimination(train_woe,"creditability",0.10)
str(eliminated_data)
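
#The third argument of Gini_elimination() is the minimum acceptable univariate Gini; variables
#below this threshold (10% here) are dropped from the returned data set.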

#A demonstration of the functions useful in performing variable clustering.
clustering_data <- variable.clustering(eliminated_data,"creditability", 2)
variable.clustering.gini(eliminated_data,"creditability", 2) # Returns the variables that have the maximum Gini value in each cluster.
correlation.cluster(eliminated_data,clustering_data,variables = "variable",clusters = "Group")

#Creating a logistic regression model of the data.
model <- glm(formula = creditability ~ .,
             family = binomial(link = "logit"),
             data = eliminated_data)
summary(model)

#Calculating variable weights. 
woe.glm.feature.importance(eliminated_data,model,"creditability")

#Generating the PD values for the train and test data.
ms_train_data <- cbind(eliminated_data,model$fitted.values)
ms_test_data <- cbind(test_woe[,colnames(eliminated_data)], predict(model,type = "response",newdata = test_woe))
colnames(ms_train_data) <- c("woe.duration.in.month.binned","woe.age.in.years.binned","woe.installment.rate.in.percentage.of.disposable.income.binned","creditability","PD")
colnames(ms_test_data) <- c("woe.duration.in.month.binned","woe.age.in.years.binned","woe.installment.rate.in.percentage.of.disposable.income.binned","creditability","PD")
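
#The hard-coded names above assume that exactly these three variables survive the Gini
#elimination; a more defensive alternative (not part of the original example) is to rename only
#the appended prediction column:
# colnames(ms_train_data)[ncol(ms_train_data)] <- "PD"
# colnames(ms_test_data)[ncol(ms_test_data)] <- "PD"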

#An example application of the Regression calibration method. The model is calibrated to the test_woe data.
regression.calibration(model,test_woe,"creditability")

#Creating a master scale.
master_scale <- master.scale(ms_train_data,"creditability","PD")
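
#The master scale groups observations into rating grades; the columns used by the validation
#tests below (Total.Observations, Bad.Count and the grade-level PD) come from this object.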


#Calibrating the master scale and the modeling data to a default rate of 5% using the Bayesian calibration method.
ms_train_data$Score <- log(ms_train_data$PD/(1-ms_train_data$PD))
ms_test_data$Score <- log(ms_test_data$PD/(1-ms_test_data$PD))
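
#The score defined above is the log-odds (logit) of the PD, i.e. log(PD/(1-PD)), which the
#bayesian.calibration() call below uses as the calibration score.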
bayesian_method <- bayesian.calibration(data = master_scale, average_score = "Score",
                                        total_observations = "Total.Observations", PD = "PD",
                                        central_tendency = 0.05, calibration_data = ms_train_data,
                                        calibration_data_score = "Score")

#After calibration, the information and data related to the calibration process can be obtained as follows.
bayesian_method$Calibration.model
bayesian_method$Calibration.formula
bayesian_method$data
bayesian_method$calibration_data

#The Scaled score can be created using the following function.
scaled.score(bayesian_method$calibration_data, "calibrated_pd", 3000, 15)

#Note: Model calibration is performed for illustration only. The model validation tests below proceed through the original master scale.

#Calculating the VIF values of the variables.
vif.calc(model)
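
#As a common rule of thumb, VIF values above about 5 (or, more loosely, 10) are read as a sign
#of problematic multicollinearity among the model variables.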

#Calculating the Gini coefficient for the model. Gini() is not part of creditR; it is available from, for example, the MLmetrics package.
Gini(model$fitted.values,ms_train_data$creditability)

#Performing 5-fold cross-validation.
k.fold.cross.validation.glm(ms_train_data,"creditability",5,1)

#The KS test is performed on the distributions of the estimates for good and bad observations.
Kolmogorov.Smirnov(ms_train_data,"creditability","PD")
Kolmogorov.Smirnov(ms_test_data,"creditability","PD")
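
#The KS statistic is the maximum distance between the cumulative distributions of the estimated
#PDs for good and bad observations; larger values indicate stronger discrimination.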

#Variable stabilities are measured.
SSI.calc.data(train_woe,test_woe,"creditability")
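
#The System Stability Index is commonly computed like the PSI, i.e. the sum over bins of
#(%train - %test) * ln(%train / %test) for each variable; values above roughly 0.25 are often
#treated as a significant shift between the train and test distributions.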

#The HHI test is performed to measure the concentration of the master scale.
Herfindahl.Hirschman.Index(master_scale,"Total.Observations")
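
#The Herfindahl-Hirschman Index is based on the sum of the squared shares of observations across
#the rating grades; higher values indicate that observations are concentrated in few grades.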

#Performing the Anchor point test.
Anchor.point(master_scale,"PD","Total.Observations",0.30)

#The Chi-square test is applied on the master scale.
chisquare.test(master_scale,"PD","Bad.Count","Total.Observations",0.90)

#The Binomial test is applied on the master scale.
master_scale$DR <- master_scale$Bad.Count/master_scale$Total.Observations
Binomial.test(master_scale,"Total.Observations","PD","DR",0.90,"one")
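
#The binomial test checks, grade by grade, whether the observed default rate (DR) is consistent
#with the assigned PD at the chosen confidence level (90% here); "one" requests a one-sided test.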

Bug Fixes

Please report any errors you encounter while using the package via the e-mail address shared in the Author section.

Author

License

This project is licensed under GPL-2. See the LICENSE.md file for details.

Built With
