How does smote_variants work with an incremental classifier on large amounts of data?

arjunpuri7 commented on June 22, 2024

Dear, I am presently working with large, high-dimensional datasets (1459 features…

Comments (3)

gykovacs commented on June 22, 2024

Hi @arjunpuri7,

In my impression, 20 billion instances of ~1500 features (altogether 30 trillion numbers, ~120 terabytes) is far beyond the capabilities of sklearn-related techniques. partial_fit could be used, but as a matter of fact, smote_variants is not prepared for this load of data. Imbalanced datasets are usually much smaller, and SMOTE techniques were developed for such relatively small datasets.
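The size estimate follows from simple arithmetic. This sketch assumes every value is stored as a 4-byte float32 (the dtype is not stated in the thread); with 8-byte float64 the total doubles. Labels and any storage overhead are ignored.

```python
def dataset_size_bytes(n_instances: int, n_features: int, bytes_per_value: int = 4) -> int:
    """Raw size of a dense feature matrix (float32 assumed by default)."""
    return n_instances * n_features * bytes_per_value


n_values = 20_000_000_000 * 1_500                  # 30 trillion numbers
size = dataset_size_bytes(20_000_000_000, 1_500)   # 4 bytes per value assumed
print(f"{n_values:.1e} values, {size / 1e12:.0f} TB")  # 3.0e+13 values, 120 TB
```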

What is the imbalance rate (#negative/#positive) in your dataset? I would guess that many of your records are redundant and do not add much information to the classification process. Subsampling would make the data easier to handle without a significant loss of information.
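When the label vector does not fit in memory at once, the imbalance rate can still be computed by counting classes one chunk at a time. A minimal sketch, assuming binary labels where the positive (minority) class is `1`; the function name is hypothetical, and the chunks could come from any loader (for instance one dask partition at a time).

```python
from typing import Iterable, Tuple

import numpy as np


def imbalance_ratio(label_chunks: Iterable[np.ndarray], positive_label=1) -> Tuple[int, int, float]:
    """Return (#negative, #positive, #negative / #positive), counted chunk by chunk."""
    n_pos = 0
    n_neg = 0
    for y in label_chunks:
        y = np.asarray(y)
        pos = int(np.count_nonzero(y == positive_label))
        n_pos += pos
        n_neg += y.size - pos
    if n_pos == 0:
        raise ValueError("no positive (minority) instances found")
    return n_neg, n_pos, n_neg / n_pos


chunks = [np.array([0, 0, 0, 1]), np.array([0, 0, 0, 0, 0, 1])]
print(imbalance_ratio(chunks))  # (8, 2, 4.0)
```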

arjunpuri7 commented on June 22, 2024

Sir,
I am trying to work with the dask library and want to use smote_variants. The data is about drugs, and I am trying to work with the imbalance ratio. The whole dataset cannot be loaded into memory at once, so I am loading it as a dask DataFrame and want to apply smote_variants to small chunks of the main dataset. If I reduce the number of instances in my dataset, it will affect my study. Please help me out.
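The chunk-wise idea can be sketched as follows. This is a simplified SMOTE written from scratch for illustration, not smote_variants' implementation or API: each synthetic point lies on the segment between a random minority sample and one of its `k` nearest minority neighbours, searched only *within the current chunk*, so results differ from SMOTE run on the full dataset. The minority label `1` and all function names are assumptions. With dask, `(X, y)` chunks could for instance be obtained by computing one partition at a time (`ddf.partitions[i].compute()`).

```python
import numpy as np


def smote_chunk(X, y, minority_label=1, k=5, n_new=None, rng=None):
    """Hypothetical helper: oversample the minority class of one chunk."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_min = X_min.shape[0]
    if n_new is None:
        # default: generate enough points to balance this chunk
        n_new = int(np.sum(y != minority_label)) - n_min
    if n_min < 2 or n_new <= 0:
        return X, y  # nothing to interpolate, or already balanced
    k = min(k, n_min - 1)
    # squared Euclidean distances via |a|^2 + |b|^2 - 2ab (avoids a 3-D array)
    sq = np.sum(X_min ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X_min @ X_min.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n_min, size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    X_new = X_min[base] + gap * (X_min[nb] - X_min[base])
    y_new = np.full(n_new, minority_label, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


def oversampled_chunks(chunks, **kwargs):
    """Yield each (X, y) chunk after oversampling it independently."""
    for X, y in chunks:
        yield smote_chunk(X, y, **kwargs)
```

Each yielded chunk could then be passed to the `partial_fit` method of an incremental scikit-learn estimator such as `SGDClassifier`, which requires the full `classes=` list on the first call. Note that oversampling every chunk makes the total data volume larger still.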

gykovacs commented on June 22, 2024

Hi @arjunpuri7, I hope you managed to overcome the problem. Personally, I do not think that oversampling is meaningful to apply to your huge amount of data; I think some reliable downsampling is what you need. Can we close this issue?
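One simple form of downsampling that works chunk by chunk is plain random undersampling of the majority class: keep every minority instance and each majority instance with a fixed probability. This is a sketch of that one technique, not necessarily the downsampling the maintainer had in mind; the majority label `0` and the function names are hypothetical. Memory use is bounded by the chunk size.

```python
import numpy as np


def keep_probability(n_neg, n_pos, target_ratio=1.0):
    """Probability leaving about target_ratio majority instances per minority instance."""
    return min(1.0, target_ratio * n_pos / n_neg)


def downsample_chunks(chunks, keep_prob, majority_label=0, seed=None):
    """Yield (X, y) chunks with the majority class randomly thinned out."""
    rng = np.random.default_rng(seed)
    for X, y in chunks:
        X, y = np.asarray(X), np.asarray(y)
        # minority rows are always kept; majority rows survive with prob. keep_prob
        keep = (y != majority_label) | (rng.random(y.size) < keep_prob)
        yield X[keep], y[keep]
```

The class counts `n_neg` and `n_pos` can be gathered in a first counting pass over the labels; a second pass then streams the thinned chunks to the classifier.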
