
zhiningliu1998 / imbalanced-ensemble


🛠️ Class-imbalanced Ensemble Learning Toolbox. | A library for class-imbalanced / long-tailed machine learning

Home Page: https://imbalanced-ensemble.readthedocs.io

License: MIT License

Python 100.00%
class-imbalance classification data-mining data-science ensemble ensemble-imbalanced-learning ensemble-learning ensemble-model imbalanced-classification imbalanced-data imbalanced-learning long-tail machine-learning multi-class-classification python python3 scikit-learn sklearn

imbalanced-ensemble's Introduction

Greetings! 👋 I'm Zhi-ning LIU (刘芷宁 in Chinese)

I'm a Ph.D. candidate at the Department of Computer Science, University of Illinois at Urbana-Champaign. I'm interested in doing research and developing open-source software for unbiased, efficient, and robust learning from skewed data in real-world applications. My recent interests lie in graph data mining (ICML'24), class-imbalanced learning (ICML'24, NeurIPS'20, ICDE'20), and fairness-aware machine learning (KDD'24, FAccT'24).


🛸Featured Projects
⚒️IMBENS: class-imbalanced ensemble learning in Python [Python Library]
[PDF] [Documentation] [Gallery] [PyPI] [Changelog] [Zhihu/知乎]

🚀BAT: Boost Class-imbalanced Node Classification with <10 lines of Code [ICML'24]
[PDF] [arXiv] [Github] [Zhihu/知乎]

⚖️MESA: Boost Ensemble Imbalanced Learning with MEta-SAmpler [NeurIPS'20]
[PDF] [arXiv] [Video] [Zhihu/知乎]

⚖️Self-paced Ensemble for Highly Imbalanced Massive Data Classification [ICDE'20]
[PDF] [arXiv] [Video] [Slides] [Zhihu/知乎]

😎AwesomeIL: a curated list across all imbalanced/long-tailed learning topics [Awesome]
[English] [Chinese/中文] [Zhihu/知乎]

😎Awesome Machine Learning Resources [Awesome]
[English] [Chinese/中文] [Zhihu/知乎]


imbalanced-ensemble's People

Contributors

allcontributors[bot], zhiningliu1998


imbalanced-ensemble's Issues

Bug: AttributeError: can't set attribute

Hello, when I run the following code, which uses EasyEnsembleClassifier, I get an error:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from imbalanced_ensemble.ensemble import EasyEnsembleClassifier
from collections import Counter

X, y = make_classification(n_classes=2, class_sep=2,
    weights=[0.1, 0.9], n_informative=3, n_redundant=1, flip_y=0,
    n_features=20, n_clusters_per_class=1, n_samples=1000, random_state=10)
print('Original dataset shape %s' % Counter(y))
# Output: Original dataset shape Counter({1: 900, 0: 100})

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
bbc = EasyEnsembleClassifier(random_state=42)
bbc.fit(X_train, y_train)  # this call raises the AttributeError
y_pred = bbc.predict(X_test)
print(y_pred)

Traceback (most recent call last):
  File "C:/Users/Administrator/PycharmProjects/pythonProject5/test-easy.py", line 16, in <module>
    bbc.fit(X_train, y_train)
  File "C:\Users\Administrator\PycharmProjects\pythonProject5\venv\lib\site-packages\imbalanced_ensemble\utils_validation.py", line 602, in inner_f
    return f(**kwargs)
  File "C:\Users\Administrator\PycharmProjects\pythonProject5\venv\lib\site-packages\imbalanced_ensemble\ensemble\under_sampling\easy_ensemble.py", line 275, in fit
    return self._fit(X, y,
  File "C:\Users\Administrator\PycharmProjects\pythonProject5\venv\lib\site-packages\imbalanced_ensemble\utils_validation.py", line 602, in inner_f
    return f(**kwargs)
  File "C:\Users\Administrator\PycharmProjects\pythonProject5\venv\lib\site-packages\imbalanced_ensemble\ensemble_bagging.py", line 359, in fit
    n_samples, self.n_features_ = X.shape
AttributeError: can't set attribute
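
For context, one common way Python produces this kind of error is an assignment to a property that defines no setter. Whether that is what happens inside the library here is an assumption; the report does not confirm it. The class and helper names below are hypothetical and exist only to show the mechanism.

```python
class Ensemble:
    """Hypothetical class; not part of imbalanced-ensemble or scikit-learn."""

    def __init__(self):
        self._n_features = None

    @property
    def n_features_(self):
        # Read-only: no setter is defined for this property.
        return self._n_features


def try_assign(obj, value):
    """Attempt the assignment and return the exception type name, if any."""
    try:
        obj.n_features_ = value
    except AttributeError as exc:
        return type(exc).__name__
    return None


result = try_assign(Ensemble(), 20)
print(result)
```

The exact wording of the message differs between Python versions, so the sketch reports only the exception type.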

Cannot conda env export due to missing comma in INSTALL_REQUIRES list

You have a missing comma in the INSTALL_REQUIRES list which breaks conda env export.


Context

I'm trying to export my conda environment, but it fails.

conda env export > environment.yaml

InvalidVersionSpec: Invalid version '1.1.3joblib>=0.11': invalid character(s)

After some time digging around, I found the cause using the command:

# grep -rl "1.1.3joblib>=0.11" /opt/conda/envs/mt-base
/opt/conda/envs/mt-base/lib/python3.10/site-packages/imbalanced_ensemble-0.1.7.dist-info/METADATA

and I ended up here :-)

Your package imbalanced-ensemble is a dependency of autoviml which is in my environment.yaml file.
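
The fused version string in the error is what Python's implicit concatenation of adjacent string literals produces when a comma is missing inside a list. A minimal sketch follows; `somepkg` is a placeholder name, since the report does not say which requirement precedes `joblib`.

```python
# Missing comma: the two adjacent literals are joined into one string.
broken_requires = [
    "somepkg>=1.1.3"   # <- no comma here ("somepkg" is a placeholder)
    "joblib>=0.11",
]

# With the comma restored, the list holds two separate requirements.
fixed_requires = [
    "somepkg>=1.1.3",
    "joblib>=0.11",
]

print(broken_requires)
print(fixed_requires)
```

The broken list has a single element whose version part reads `1.1.3joblib>=0.11`, matching the invalid version reported by conda.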

Does the model support running on GPU?

Hello, I must say that IMBENS is an exceptional machine learning library. I am currently using it for classification on a very large dataset. However, it appears that IMBENS does not support moving a model to the GPU; I get the error: 'SelfPacedEnsembleClassifier' object has no attribute 'to'.

ENH add early_termination control for boosting-based methods

The early termination in sklearn.ensemble.AdaBoostClassifier may be too strict under certain scenarios (only 1 base classifier is trained), which greatly hinders the performance of boosting-based ensemble imbalanced learning methods.

It should make more sense to add a parameter that allows the user to decide whether to enable strict early termination.
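
A toy model of the proposed switch is sketched below. It is not the scikit-learn or IMBENS implementation: the function name, the `early_termination` flag, and the stopping rule (stop when a round's weighted error is zero or no better than chance) are assumptions made for illustration.

```python
def count_trained_rounds(round_errors, n_classes=2, early_termination=True):
    """Return how many boosting rounds run before the loop stops.

    round_errors: weighted training error of each successive base classifier.
    early_termination: hypothetical flag; when True, the loop stops as soon
    as a round has zero error or an error no better than random guessing.
    """
    chance_error = 1.0 - 1.0 / n_classes
    trained = 0
    for err in round_errors:
        trained += 1
        if early_termination and (err == 0.0 or err >= chance_error):
            break
    return trained


# A first base classifier that fits its training sample perfectly.
errors = [0.0, 0.2, 0.1]
strict = count_trained_rounds(errors, early_termination=True)
relaxed = count_trained_rounds(errors, early_termination=False)
print(strict, relaxed)
```

Under these assumptions the strict setting trains a single base classifier, while the relaxed setting runs all three rounds.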

AttributeError: 'Pipeline' object has no attribute '_check_fit_params'

Zhining,
Sorry to bother you, but when I use EasyEnsembleClassifier from imbens.ensemble, I get the error "AttributeError: 'Pipeline' object has no attribute '_check_fit_params'". I can't figure out why this happens. Could you please help me with it? The code is as follows:


from imbens.ensemble import EasyEnsembleClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_classes=2, class_sep=2, n_features=100,
    weights=[0.3,0.7], n_informative=2, n_redundant=1, flip_y=0.01,
    n_clusters_per_class=1, n_samples=10000, random_state=10)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.5, random_state=42)

init_kwargs1 = {'estimator': DecisionTreeClassifier(),}

easyens = EasyEnsembleClassifier(**init_kwargs1).fit(X_train, y_train)
pred = easyens.predict(X_valid)


The errors are as follows:

Traceback (most recent call last):
  File "f:\MYT\EEGP\smote_compare.py", line 176, in <module>
    easyens = EasyEnsembleClassifier(**init_kwargs1).fit(X_train, y_train)
  File "D:\ProgramData\anaconda3\envs\EEG\lib\site-packages\imbens\utils_validation.py", line 604, in inner_f
    return f(**kwargs)
  File "D:\ProgramData\anaconda3\envs\EEG\lib\site-packages\imbens\ensemble_under_sampling\easy_ensemble.py", line 285, in fit
    return self._fit(
  File "D:\ProgramData\anaconda3\envs\EEG\lib\site-packages\imbens\utils_validation.py", line 604, in inner_f
    return f(**kwargs)
  File "D:\ProgramData\anaconda3\envs\EEG\lib\site-packages\imbens\ensemble_bagging.py", line 480, in _fit
    all_results = Parallel(
  File "D:\ProgramData\anaconda3\envs\EEG\lib\site-packages\sklearn\utils\parallel.py", line 67, in __call__
    return super().__call__(iterable_with_config)
  File "D:\ProgramData\anaconda3\envs\EEG\lib\site-packages\joblib\parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
  File "D:\ProgramData\anaconda3\envs\EEG\lib\site-packages\joblib\parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
  File "D:\ProgramData\anaconda3\envs\EEG\lib\site-packages\sklearn\utils\parallel.py", line 129, in __call__
    return self.function(*args, **kwargs)
  File "D:\ProgramData\anaconda3\envs\EEG\lib\site-packages\imbens\ensemble_bagging.py", line 163, in _parallel_build_estimators
    estimator.fit((X[indices])[:, features], y[indices])
  File "D:\ProgramData\anaconda3\envs\EEG\lib\site-packages\imbens\pipeline.py", line 273, in fit
    fit_params_steps = self._check_fit_params(**fit_params)
AttributeError: 'Pipeline' object has no attribute '_check_fit_params'

Package versions:
scikit-learn 1.4.0
imbalanced-ensemble 0.2.1
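
Since errors like this one often depend on the combination of installed versions, a small standard-library helper can collect them for a bug report. This is only a sketch: the function name `installed_versions` is made up for illustration, and the package list is taken from the report above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            report[name] = None
    return report


versions = installed_versions(["scikit-learn", "imbalanced-ensemble", "joblib"])
for name, ver in versions.items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```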

Supports the generation of PMML files

Hi, IMBENS is a great machine learning library, but I have a problem: I want to deploy the self_paced_ensemble algorithm as an online service in a production environment. How can I generate PMML files for it? I have tried Nyoka and sklearn2pmml so far, but both failed.

[major] access all samplers directly from `imbens.sampler`

Access all samplers directly from the imbens.sampler module.

  • rename imbens.sampler.over_sampling -> imbens.sampler._over_sampling
  • rename imbens.sampler.under_sampling -> imbens.sampler._under_sampling

This will cause incompatibility with imbalanced-ensemble<=0.1.7.
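
Code that must run against both layouts could guard its import along these lines. This is a sketch under assumptions: that `RandomUnderSampler` is among the samplers exposed by `imbens.sampler`, and that releases up to 0.1.7 exposed it under `imbalanced_ensemble.sampler.under_sampling`. Neither path is confirmed by this issue.

```python
# Try the new flat location first, then the assumed pre-rename location.
try:
    from imbens.sampler import RandomUnderSampler
except ImportError:
    try:
        # Assumed old path for imbalanced-ensemble<=0.1.7 (not confirmed here).
        from imbalanced_ensemble.sampler.under_sampling import RandomUnderSampler
    except ImportError:
        # Neither layout is available in this environment.
        RandomUnderSampler = None

print("sampler available:", RandomUnderSampler is not None)
```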

secondary development

Hi, if I want to implement my own undersampling idea in the Bagging classifiers of this package, where should I add my undersampling strategy? Thanks!
