This is an add-on package to the monobin package that simplifies its use. The goal of monobin is to perform monotonic binning of numeric risk factors in the development of credit rating models (PD, LGD, EAD). All functions handle both binary and continuous target variables. Missing values and other possible special values are treated separately from so-called complete cases.
monobinShiny provides a shiny-based user interface (UI) to the monobin package. It can be especially handy for less experienced R users, as well as for those who want to quickly scan numeric risk factors when building credit rating models. The additional functions implemented in monobinShiny that do not exist in the monobin package are: descriptive statistics, special case imputation, and outlier imputation. The descriptive statistics function is exported and can be used in R sessions independently of the user interface, while the special case and outlier imputation functions are written to be used with the shiny UI.
Users can install the released version of monobinShiny from CRAN by executing the following line of code:
install.packages("monobinShiny")
Additionally, the development version can be installed using:
library(devtools)
install_github("andrija-djurovic/monobinShiny")
After installation, to start the monobinShiny application, just type:
suppressMessages(library(monobinShiny))
monobinShinyApp()
If the application is installed and started properly, the following should appear in the web browser:
The application consists of three modules:
- data manager;
- descriptive statistics and imputation;
- monotonic binning.
The following sections provide a short description of each module.
ℹ️ Almost all reactive elements of the application result in a notification in the lower right corner.
This module handles data import: either manual import by browsing for a file (only .csv files are accepted) or automatic import of the gcd data set from the monobin package (Import dummy data).
During the manual data import, a set of checks is performed, such as: file extension, appropriateness of the .csv file, and the number of identified numeric variables. If the data are imported successfully, the Data Import log output presents an overview of the data structure along with information about the identified numeric and categorical variables.
⚠️ Be aware that only variables identified as numeric will be processed in the other two modules.
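The import checks described above can be approximated in base R. The following is only a minimal sketch, not the package's implementation: the function name `check_import`, its return structure, and the messages are hypothetical, and the threshold of two numeric variables is taken from the list of data checks at the end of this document.

```r
# Hypothetical helper; not part of monobinShiny or monobin.
check_import <- function(path, min_numeric = 2) {
  # Check 1: the file extension must be .csv (case-insensitive).
  if (!grepl("\\.csv$", path, ignore.case = TRUE)) {
    return(list(ok = FALSE, msg = "Only .csv files are accepted."))
  }
  # Check 2: the file must be readable as a comma-separated table.
  db <- tryCatch(
    read.csv(path, header = TRUE, check.names = FALSE),
    error = function(e) NULL
  )
  if (is.null(db)) {
    return(list(ok = FALSE, msg = "The file cannot be read as .csv."))
  }
  # Check 3: enough numeric variables must be identified.
  is_num <- vapply(db, is.numeric, logical(1))
  if (sum(is_num) < min_numeric) {
    return(list(
      ok = FALSE,
      msg = sprintf("Fewer than %d numeric variables identified.", min_numeric)
    ))
  }
  list(
    ok = TRUE,
    msg = "Data imported successfully.",
    numeric = names(db)[is_num],
    categorical = names(db)[!is_num]
  )
}
```

Calling `check_import()` on a file path returns a list whose `ok` element indicates whether all three checks passed, and whose `numeric` and `categorical` elements mirror the variable overview shown in the Data Import log.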
This module covers the standard steps of univariate and part of bivariate analysis in model development, supplemented by simple options (mean or median) for imputation of special case values and by outlier imputation based on selected percentile thresholds.
Before running any of the imputation procedures, the target variable needs to be selected:
After selecting the target variable, the imputation procedures are usually run. If imputation is not needed, the user can skip this step and move to the next module.
⚠️ Be aware that the imputation procedures create a new risk factor and add it to the imported data set. The special case value imputation adds a new risk factor named as selected risk factor + _sc_ + selected imputation method. Example: if the user selects risk factor age and mean as the imputation method, the new risk factor will be added as age_sc_mean. The outlier imputation follows the same procedure, adding a new risk factor named as selected risk factor + _out_ + selected upper percentile + _ + selected lower percentile (e.g. age_out_0.99_0.01). Special attention should be paid when the data set contains many risk factors, because imputation can significantly increase the final number of risk factors.
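To make the naming convention concrete, here is a minimal base R sketch. The helpers `impute_sc` and `impute_out` are hypothetical and are not the functions used by the application. The sketch also assumes that special case values are replaced by the mean or median of the remaining values, and that outliers are capped at the selected percentiles; the application's exact rules may differ.

```r
# Hypothetical helpers illustrating the naming convention only.

# Replace special case values by the mean or median of the other values
# and add the result as <risk factor>_sc_<method>.
impute_sc <- function(db, rf, sc = NA, method = c("mean", "median")) {
  method <- match.arg(method)
  x <- db[[rf]]
  is_sc <- x %in% sc  # %in% also matches NA
  fill <- if (method == "mean") mean(x[!is_sc]) else median(x[!is_sc])
  x[is_sc] <- fill
  db[[paste0(rf, "_sc_", method)]] <- x
  db
}

# Cap values at the lower and upper percentiles (assumed rule) and add the
# result as <risk factor>_out_<upper>_<lower>. Missing values stay missing.
impute_out <- function(db, rf, upper = 0.99, lower = 0.01) {
  x <- db[[rf]]
  bounds <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  x <- pmin(pmax(x, bounds[1]), bounds[2])
  db[[paste0(rf, "_out_", upper, "_", lower)]] <- x
  db
}

db <- data.frame(age = c(25, NA, 40, 31))
db <- impute_sc(db, "age", sc = NA, method = "mean")
db <- impute_out(db, "age", upper = 0.99, lower = 0.01)
names(db)
```

Each call adds one column, so running both procedures on every risk factor can triple the number of risk factors in the data set.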
ℹ️ If the imputation values cannot be calculated, download buttons appear that let the user download and check for which risk factors the inputs are not properly defined (all special case values and/or special case values to be imputed). Both fields, all special case values and special case values to be imputed, should be defined as a list of numeric values (or values that can be coerced to numeric, including NA) separated by commas (,).
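The expected input format for the two special case fields can be illustrated with a short base R sketch. The function `parse_sc` is hypothetical and only shows one way such a comma-separated field could be validated; it is not the application's own parser.

```r
# Hypothetical helper; shows one way to validate a comma-separated field.
parse_sc <- function(txt) {
  # Split on commas and drop surrounding whitespace.
  parts <- trimws(strsplit(txt, ",", fixed = TRUE)[[1]])
  # Coerce to numeric; entries that cannot be coerced become NA.
  vals <- suppressWarnings(as.numeric(parts))
  # An entry is valid if it coerced to a number or was literally NA or NaN.
  valid <- !is.na(vals) | parts %in% c("NA", "NaN")
  list(values = vals, ok = length(parts) > 0 && all(valid))
}

parse_sc("NA, 999, -1")$ok
parse_sc("NA, abc")$ok
```

The first call is accepted because every entry is numeric or NA; the second is rejected because `abc` cannot be coerced to numeric.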
The ultimate goal of this module is to create a report of descriptive statistics. The image below presents an example of the descriptive report. Details on the calculated metrics can be found in the help page of the function desc.stat (?desc.stat).
As can be seen, the user can download the descriptive statistics report as well as the data set used for its creation (.csv files). If the imputation procedures were run, the data set will contain the added risk factors.
The monotonic binning module reflects the main purpose of this package: an interface to the monobin package. Similar to the previous module, the user should first select the target variable, then the risk factors ready for binning (if imputation was performed, the list of available risk factors will contain the newly created risk factors), and finally define the arguments of the selected binning algorithm. The available binning algorithms are those implemented in the monobin package.
Running the binning procedure results in a summary table of the processed risk factors and a transformed data set. Both outputs can be downloaded as .csv files.
As already stated, almost every reactive element of the application produces a notification (lower right corner). An example of the error notification shown when trying to import a file other than .csv is presented in the following image:
Below is the list of implemented data checks:
- if imported file has at least two numeric variables;
- if target variable is selected when running the imputation and report procedures;
- if risk factors are selected when running imputation and report procedures;
- if special case values are defined properly;
- if percentile bounds (upper and lower) for outlier imputation are defined properly;
- if numeric inputs for binning algorithms are defined properly;
- if a binary target is specified as a 0/1 variable.
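Two of the checks above can be sketched in base R. The helpers `is_binary_01` and `check_percentiles` are hypothetical, and the precise meaning of "defined properly" is an assumption here: the percentile bounds are taken to be single numbers in [0, 1] with the lower bound strictly below the upper one.

```r
# Hypothetical helpers; the exact rules used by the application may differ.

# A binary target must contain only the values 0 and 1 (ignoring NA),
# with both values present.
is_binary_01 <- function(y) {
  y <- y[!is.na(y)]
  length(y) > 0 && all(y %in% c(0, 1)) && length(unique(y)) == 2
}

# Percentile bounds must be single numbers in [0, 1] with lower < upper.
check_percentiles <- function(upper, lower) {
  is.numeric(upper) && is.numeric(lower) &&
    length(upper) == 1 && length(lower) == 1 &&
    !is.na(upper) && !is.na(lower) &&
    lower >= 0 && upper <= 1 && lower < upper
}

is_binary_01(c(0, 1, 1, 0, NA))
check_percentiles(upper = 0.99, lower = 0.01)
```

In the application, a failed check would surface as a notification in the lower right corner rather than as a logical value.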