Comments (10)
Hi @arjunpuri7, sure,
- gacc refers to the geometric mean of the accuracies computed for the individual classes. Given TP, FP, TN, FN true positives, false positives, true negatives, and false negatives, the accuracy on the positive class alone (also called sensitivity, SENS) is SENS = TP/(TP + FN), while the accuracy on the negative class alone (also called specificity, SPEC) is SPEC = TN/(TN + FP). GACC is then the geometric mean of these class-specific accuracies: GACC = SQRT(SENS*SPEC). This score accounts for class imbalance, since the "accuracy" of predicting positive samples (SENS) is weighted equally with the "accuracy" of predicting negative samples (SPEC). For more, see https://stats.stackexchange.com/questions/235710/auc-geometric-mean-for-classifying-imbalanced-classes
- The Brier score is essentially the mean squared error of probability predictions: the difference between the predicted positive-class probability and the observed outcome (0/1) is taken, squared, and averaged. For more, see https://en.wikipedia.org/wiki/Brier_score
- You can also take a look at this paper, which gives an overview of the performance measures commonly accepted in imbalanced learning: https://www.researchgate.net/publication/267671515_Learning_from_Imbalanced_Data_Evaluation_Matters
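The two formulas above can be sketched in a few lines of Python; the confusion-matrix counts and the probability vector below are made-up illustrative numbers:

```python
import math

# Hypothetical confusion-matrix counts for an imbalanced problem
tp, fn = 40, 10   # positives: 40 correctly predicted, 10 missed
tn, fp = 900, 50  # negatives: 900 correctly predicted, 50 false alarms

sens = tp / (tp + fn)          # accuracy on the positive class (sensitivity)
spec = tn / (tn + fp)          # accuracy on the negative class (specificity)
gacc = math.sqrt(sens * spec)  # geometric mean of the class-specific accuracies

# Brier score: mean squared difference between predicted positive-class
# probabilities and the observed 0/1 outcomes
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.1]
brier = sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)

print(round(gacc, 3), round(brier, 3))  # → 0.871 0.055
```

Note how GACC stays low if either class is predicted poorly, even when the plain accuracy looks high because of the majority class.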
from smote_variants.
I want to know a little more about evaluate_oversamplers,
as it performs oversampling using stratified cross-validation:
is oversampling applied only to the training part of the dataset in each cross-validation round, or may it be applied to the whole dataset as a preprocessing step?
Hi @arjunpuri7, this is a crucial question: in each round of cross-validation, oversampling is applied ONLY to the training set. The test set (which is, say, 1/8th of the entire dataset in a given split) is NOT affected by oversampling. This way we avoid data leakage: no information from the test set is used to influence the oversampling of the training set in any cross-validation round.
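The scheme described above can be sketched as follows. This is not the library's internal code; the `naive_oversample` helper is a hypothetical stand-in for a SMOTE variant, used here only to show that the test fold keeps its original distribution:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced dataset (sizes are illustrative): 64 majority, 16 minority
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
y = np.array([0] * 64 + [1] * 16)

def naive_oversample(X_tr, y_tr, rng):
    """Stand-in for SMOTE: duplicate minority samples until classes balance."""
    minority = np.flatnonzero(y_tr == 1)
    need = np.sum(y_tr == 0) - minority.size
    extra = rng.choice(minority, size=need, replace=True)
    return np.vstack([X_tr, X_tr[extra]]), np.concatenate([y_tr, y_tr[extra]])

for train_idx, test_idx in StratifiedKFold(n_splits=8).split(X, y):
    # Oversampling is applied to the training fold ONLY ...
    X_tr, y_tr = naive_oversample(X[train_idx], y[train_idx], rng)
    # ... while the test fold keeps its original, imbalanced distribution
    X_te, y_te = X[test_idx], y[test_idx]
    # fit a classifier on (X_tr, y_tr), evaluate on (X_te, y_te)
```

In every round the training fold ends up balanced while each 10-sample test fold still contains the original 8:2 class ratio, so the evaluation reflects the real imbalance.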
Hello sir,
I am facing another problem with this code: when I run evaluate_oversamplers with different types of oversampling methods on different datasets, it gives me the same result as for my initial dataset and does not work on the other datasets. Please help me. The results for two different datasets are attached below.
New folder.zip
Hi @arjunpuri7, could you please send over the code too?
Hello sir,
sorry for the delay in replying; I was away. I have solved that problem.
But another question: how can I set up the evaluate_oversamplers cache in Colab? If you have any idea, please share it with me.
Hi @arjunpuri7, that's fine. I have limited experience with Colab. The caching mechanism should work as long as some path is available to the caching system. As far as I know, Google Drive can be attached to Colab as a folder, and you can then use that folder for caching. For more details, please take a look at https://gist.github.com/Joshua1989/dc7e60aa487430ea704a8cb3f2c5d6a6
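A minimal sketch of the Drive-mount approach described above, to be run in a Colab notebook. The `cache_path` parameter name is an assumption here; please check the current smote_variants documentation for the exact argument of evaluate_oversamplers:

```python
# Colab-only setup: mount Google Drive and point the cache at a Drive folder
from google.colab import drive
drive.mount('/content/drive')

# Any writable folder on the mounted Drive can serve as the cache location
cache_path = '/content/drive/MyDrive/sv_cache'

# results = evaluate_oversamplers(datasets, oversamplers, classifiers,
#                                 cache_path=cache_path)  # assumed parameter name
```

Results cached on Drive survive Colab runtime restarts, which is the main point of caching there.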
Thank you sir,
the issue is finally resolved.
Related Issues (20)
- Minimum number of rows in a class HOT 1
- when use SOMO,Why did the two types of samples not reach a balance and the number did not change HOT 2
- provided out is the wrong size for the reduction
- Categorical Variables HOT 1
- How to vary the "proportion" parameter - MulticlassOversampling class
- Why I get this error when I use smote_variants? HOT 9
- Could I apply this package to the time-series raw data?
- Question HOT 2
- Question: Combining these with Undersampling HOT 3
- Question: Regarding time complexity of Oversamplers and "Noise Filters" HOT 1
- GridSearchCV classifier parameters: int vs list HOT 3
- Implement 'verbose' parameter (feature request) HOT 2
- sv.MulticlassOversampling error for getattr() function HOT 2
- Error: Dimension of X_train and y_train is not the same ! HOT 2
- OversamplingClassifier does not work with probability-based metrics HOT 3
- Support for python 3.11 HOT 1
- Remove warnings
- Can smote_variants deal with 3_class data?
- I got this error when I used polynom_fit_SMOTE.
- model hyperparameters be adjusted before and after oversampling?