We propose a new oversampling algorithm, GB-SMOTE, which circumvents a deficiency of WK-SMOTE [1] (and of SMOTE and its variants): the random selection of minority-class samples. To design this oversampling method, we first devise a simple grouping scheme that divides the minority class into different groups by using slack variables. This grouping scheme establishes a theoretical basis for selecting valuable minority-class samples, and it also offers a new explanation for the poor performance of SVM on imbalanced data sets. Second, we design a reasonable sample selection scheme that avoids generating new samples in the overlapping region, together with an effective sample generation scheme that produces high-quality new samples. Building on these components, we propose the oversampling method GB-SMOTE. The core idea of GB-SMOTE (selectively choosing valuable samples) can also be applied to preprocess biased learners, or to modify the input distribution when only a limited amount of data is available in semi-supervised learning. The experimental results confirm the effectiveness of the sample selection and sample generation schemes in GB-SMOTE. Moreover, experiments on real-world datasets show that GB-SMOTE outperforms all of the benchmark algorithms in both the homologous and heterologous groups, especially on data sets with a high imbalance ratio.
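For intuition, here is a self-contained sketch of the grouping idea (illustrative only, not the repository's implementation; all names are ours). For a trained soft-margin SVM with decision function f, the slack variable of a sample is ξ_i = max(0, 1 − y_i f(x_i)); under the standard reading, ξ_i = 0 marks correctly classified samples outside the margin ("safe"), 0 < ξ_i ≤ 1 marks samples inside the margin, and ξ_i > 1 marks misclassified samples ("error").

```python
import numpy as np
from sklearn.svm import SVC

def slack_variables(svm, X, y):
    """xi_i = max(0, 1 - y_i * f(x_i)) for a fitted SVM; labels y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * svm.decision_function(X))

def partition_by_slack(xi):
    """Standard soft-margin reading of the slack values:
    xi == 0     -> safe   (correct, outside the margin)
    0 < xi <= 1 -> margin (correct, inside the margin)
    xi > 1      -> error  (misclassified)"""
    safe = np.where(xi == 0.0)[0]
    margin = np.where((xi > 0.0) & (xi <= 1.0))[0]
    error = np.where(xi > 1.0)[0]
    return safe, margin, error

# Toy data: 40 majority (-1) vs. 10 minority (+1) samples.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(40, 2) - 1.0, rng.randn(10, 2) + 1.0])
y = np.hstack([-np.ones(40), np.ones(10)])
svm = SVC(kernel="rbf", C=100).fit(X, y)

xi_pos = slack_variables(svm, X, y)[y == 1]   # minority-class slacks
safe, margin, error = partition_by_slack(xi_pos)
```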
If you find this repository helpful in your work or research, we would greatly appreciate citations to the following paper:
```bibtex
@article{REN20220822,
  title = {Grouping-based Oversampling in Kernel Space for Imbalanced Data Classification},
  journal = {Pattern Recognition},
  volume = {133},
  pages = {108992},
  year = {2023},
  issn = {0031-3203},
  doi = {https://doi.org/10.1016/j.patcog.2022.108992},
  url = {https://www.sciencedirect.com/science/article/pii/S0031320322004721},
  author = {Jinjun Ren and Yuping Wang and Yiu-ming Cheung and Xiao-Zhi Gao and Xiaofang Guo}
}
```
Our GB-SMOTE implementation requires the following dependencies (an example install command follows the list):
- python (>=3.7)
- numpy (>=1.11)
- scipy (>=0.17)
- scikit-learn (>=0.21)
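For example, assuming a pip-based environment, the Python packages above can be installed with:

```
pip install "numpy>=1.11" "scipy>=0.17" "scikit-learn>=0.21"
```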
Then clone the repository:

```
git clone https://github.com/JinJunRen/GB-SMOTE
```
## GB-SMOTE.py
Parameters | Description
--- | ---
`clf` | object, optional (default=`sklearn.svm.SVC()`). Built-in `fit()`, `predict()`, and `predict_proba()` methods are required.
`C` | float, optional (default=100). Regularization parameter; the strength of the regularization is inversely proportional to `C`. Must be strictly positive. The penalty is a squared l2 penalty.
Methods | Description
--- | ---
`fit(self, X, y)` | Build a GB-SMOTE classifier on the training set (X, y).
`predict(self, X)` | Predict the class for X.
`predict_proba(self, X)` | Predict class probabilities for X.
`score(self, X, y)` | Return the average precision score on the given test data and labels.
`calcKxi(self, X, y)` | Calculate the slack variables of the samples in X.
`partitionInstance(self, X, y)` | Divide each class into three sets, i.e., the error set, the margin set, and the safe set.
`selectInstances(self, X, in_margin, pos_safe, n)` | Select n sample pairs from the positive margin set M^+ and the positive safe set S^+, respectively, and create a synthesized sample set. Note that the elements of `in_margin` and `pos_safe` are the indices of samples in X.
`augmentKernelMatrix(self, X_len, dim, kernelmatrix, n)` | Augment `kernelmatrix` based on `dim` (the dimension of the dataset, i.e., the sum of the lengths of the training and test sets) and `n` (the number of generated samples).
## demorun.py
This Python script provides an example of how to use our GB-SMOTE implementation to perform classification.
Parameters | Description
--- | ---
`data` | string. Specify a dataset file.
`ker` | string, optional (default=`rbf`). Specify the kernel type of the SVM.
`n` | integer, optional (default=5). Specify the number of folds for n-fold cross-validation.
```
python demorun.py -data ./dataset/moon_1000_100_2.csv -n 5
```

or

```
python demorun.py -data ./dataset/moon_1000_200_4.csv -n 5
```
## Dataset links
Knowledge Extraction Based on Evolutionary Learning (KEEL).
## References
- [1] Mathew J, Pang C K, Luo M, et al. Classification of imbalanced data by oversampling in kernel space of support vector machines. IEEE Transactions on Neural Networks and Learning Systems, 2017, 29(9): 4065-4076.