Comments (3)
Hi, only a handful of oversampling techniques consider categorical variables, and even those are not implemented in the smote-variants package. Most oversampling techniques operate in Euclidean space, treating all attributes as continuous. A common way to use oversampling techniques with categorical variables is to encode them, for example using one-hot encoding. The oversampling techniques might then produce fractional feature values, but from the regression point of view this is not a problem, as it just expresses that the sample might be somewhere between the two categories.
Alternatively, once the one-hot encoding is done and the oversampling is applied, you might convert the oversampled fractional values to crisp binary ones to keep the categorical nature.
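One way to do that conversion is to set the largest value in each one-hot group to 1 and the rest to 0. The following is only a minimal sketch under some assumptions: the one-hot columns of each categorical variable are contiguous and you know their index ranges. `harden_one_hot` and `groups` are hypothetical names for illustration, not part of smote_variants.

```python
def harden_one_hot(rows, groups):
    """Snap each one-hot group in every row to a crisp 0/1 vector.

    rows   : list of feature lists (oversampled, may hold fractional values)
    groups : list of (start, stop) column ranges, one per categorical variable
             (hypothetical layout, assumed known from the encoding step)
    """
    hardened = []
    for row in rows:
        new_row = list(row)
        for start, stop in groups:
            block = row[start:stop]
            # Index of the largest value; ties go to the first category.
            winner = max(range(len(block)), key=block.__getitem__)
            for offset in range(len(block)):
                new_row[start + offset] = 1.0 if offset == winner else 0.0
        hardened.append(new_row)
    return hardened


# Columns: [age, color=red, color=green, color=blue]
oversampled = [
    [31.4, 0.7, 0.3, 0.0],
    [45.0, 0.0, 0.2, 0.8],
]
crisp = harden_one_hot(oversampled, groups=[(1, 4)])
print(crisp)
```

Continuous columns (here the first one) are left untouched; only the listed groups are rounded.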
from smote_variants.
SMOTENC is just a hack to apply SMOTE to categorical data. If you encode your categorical features with one-hot encoding and standardize the continuous features to have standard deviation 1, vanilla SMOTE and all other SMOTE variants (including DEAGO) will operate in the same metric space as SMOTENC. So there is no need for special arguments to pass categorical features; you just need to encode them properly.
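As a rough illustration of that encoding step, here is a minimal pure-Python sketch that one-hot encodes the categorical columns and scales the continuous ones to standard deviation 1. `encode_for_oversampling` and `categorical_cols` are hypothetical names, and the toy data is made up for the example; in practice you would likely use your usual preprocessing tools instead.

```python
from statistics import mean, pstdev


def encode_for_oversampling(rows, categorical_cols):
    """One-hot encode categorical columns and scale continuous ones to unit std.

    rows             : list of records (lists) mixing numbers and category labels
    categorical_cols : set of column indices holding categories (assumed known)
    """
    n_cols = len(rows[0])
    encoded = [[] for _ in rows]
    for col in range(n_cols):
        values = [row[col] for row in rows]
        if col in categorical_cols:
            # One output column per category, in sorted order.
            categories = sorted(set(values))
            for out, value in zip(encoded, values):
                out.extend(1.0 if value == cat else 0.0 for cat in categories)
        else:
            mu = mean(values)
            sigma = pstdev(values) or 1.0  # constant column: avoid dividing by zero
            for out, value in zip(encoded, values):
                out.append((value - mu) / sigma)
    return encoded


rows = [
    [20.0, "red"],
    [30.0, "blue"],
    [40.0, "red"],
]
X = encode_for_oversampling(rows, categorical_cols={1})
print(X)
```

The resulting all-numeric `X` could then be handed to any oversampler together with the class labels, and the one-hot columns rounded back afterwards if crisp categories are needed.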
Since I found SMOTENC in the imbalanced-learn library, which can take the categorical feature indices as input, I thought this library might also have some attribute for specifying the categorical features.