
Comments (8)

Wuuzzaa commented on August 28, 2024

At the moment, BorutaPy tries to set random_state on every estimator, but cuML's RF classifier does not have this parameter.

You could try a fix similar to the one used for LightGBM. Something like this, placed before the else branch, might help:

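```python
# make sure we start with a new tree in each iteration
if self._is_lightgbm:
    self.estimator.set_params(random_state=self.random_state.randint(0, 10000))
elif isinstance(self.estimator, cuml_type_here):  # placeholder for cuML's RF classifier type
    # cuML's RF classifier has no random_state parameter, so leave it untouched
    pass
else:
    self.estimator.set_params(random_state=self.random_state)
```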
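Instead of hard-coding a cuML type check, one option is to ask the estimator whether it has a random_state parameter at all. This is a minimal sketch assuming the estimator follows the scikit-learn get_params()/set_params() convention; set_random_state_if_supported, SeededModel and UnseededModel are hypothetical names used only for illustration. It also draws a plain integer seed, as the LightGBM branch does, on the assumption that some non-scikit-learn estimators may not accept a RandomState object.

```python
import random


def set_random_state_if_supported(estimator, rng):
    """Hypothetical helper: seed the estimator only if it has random_state.

    Assumes the estimator follows the scikit-learn get_params()/set_params()
    convention. Returns True if a seed was set, False otherwise.
    """
    if "random_state" not in estimator.get_params():
        # No random_state parameter: leave the estimator untouched.
        return False
    # Draw a plain int seed, like the LightGBM branch does, rather than
    # passing the RandomState object itself.
    estimator.set_params(random_state=rng.randint(0, 10000))
    return True


class SeededModel:
    """Stand-in for an estimator that accepts random_state."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def get_params(self, deep=True):
        return {"random_state": self.random_state}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class UnseededModel:
    """Stand-in for an estimator without a random_state parameter."""

    def get_params(self, deep=True):
        return {}

    def set_params(self, **params):
        raise ValueError(f"Invalid parameters: {sorted(params)}")


# Stands in for the RandomState-like object the snippet above calls randint on.
rng = random.Random(42)
seeded = SeededModel()
print(set_random_state_if_supported(seeded, rng), isinstance(seeded.random_state, int))  # True True
print(set_random_state_if_supported(UnseededModel(), rng))  # False
```

Checking get_params() keeps the logic independent of which library the estimator comes from, at the cost of relying on that convention being implemented.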

from boruta_py.

curtisraymond commented on August 28, 2024

Thanks @Wuuzzaa.

I made the adjustment you recommended, but now I'm receiving this error: "ValueError: Only methods with feature_importance_ attribute are currently supported in BorutaPy."

Any recommendations on this issue?
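For context, an error like this comes from a fit-then-read pattern: the estimator is fitted, and then a feature_importances_ attribute is read from it. The following is a simplified, hypothetical sketch of that pattern, not BorutaPy's actual source; get_importances, ImportanceModel and NoImportanceModel are made-up names, and the messages are modeled on the errors quoted in this thread.

```python
def get_importances(estimator, X, y):
    """Hypothetical sketch: fit the estimator, then read its importances."""
    try:
        estimator.fit(X, y)
    except Exception as e:
        # Any failure inside fit() is re-raised as a ValueError.
        raise ValueError(
            "Please check your X and y variable. The provided "
            "estimator cannot be fitted to your data.\n" + str(e)
        )
    # Boruta-style selection needs one importance score per feature.
    importances = getattr(estimator, "feature_importances_", None)
    if importances is None:
        raise ValueError(
            "Only methods with feature_importance_ attribute "
            "are currently supported in BorutaPy."
        )
    return importances


class NoImportanceModel:
    """Stand-in for a model that fits but exposes no feature_importances_."""

    def fit(self, X, y):
        return self


class ImportanceModel:
    """Stand-in for a model that assigns equal importance to every feature."""

    def fit(self, X, y):
        self.feature_importances_ = [1.0 / len(X[0])] * len(X[0])
        return self


X, y = [[0, 1], [1, 0]], [0, 1]
print(get_importances(ImportanceModel(), X, y))  # [0.5, 0.5]
try:
    get_importances(NoImportanceModel(), X, y)
except ValueError as err:
    print(err)  # Only methods with feature_importance_ attribute are currently supported in BorutaPy.
```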


Wuuzzaa commented on August 28, 2024

It seems cuML's random forest implementation differs quite a lot from scikit-learn's. I took a look at the docs and did not find anything similar to feature importances.
cuML Random Forest

Some kind of feature importance is necessary for boruta to determine which features are useful. I think there is no easy way to work around this issue.
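To illustrate why an importance score is required: Boruta compares each real feature with randomly shuffled "shadow" copies of the features and only credits a feature (a "hit") when its importance beats the best shadow. The pure-Python sketch below shows a single such comparison. The function names are hypothetical, and the importances in the usage example are made-up numbers standing in for whatever feature_importances_ a fitted model would return. BorutaPy repeats this over many iterations and applies a statistical test to the hit counts, which this sketch omits.

```python
import random


def add_shadow_features(X, rng):
    """Append a shuffled copy of every column to each row of X (a list of rows)."""
    n_features = len(X[0])
    shadows = []
    for j in range(n_features):
        column = [row[j] for row in X]
        rng.shuffle(column)  # breaks any link between this copy and the target
        shadows.append(column)
    return [list(row) + [shadows[j][i] for j in range(n_features)]
            for i, row in enumerate(X)]


def count_hits(importances, n_features):
    """Mark real features whose importance beats the best shadow feature.

    `importances` holds one score per column of the extended matrix:
    the first n_features are real features, the rest are shadows.
    """
    real = importances[:n_features]
    shadow_max = max(importances[n_features:])
    return [score > shadow_max for score in real]


X = [[1, 10], [2, 20], [3, 30]]
X_ext = add_shadow_features(X, random.Random(0))
print(len(X_ext[0]))  # 4
# Made-up importances for [f0, f1, shadow_f0, shadow_f1]:
print(count_hits([0.5, 0.1, 0.15, 0.25], n_features=2))  # [True, False]
```

Without some per-feature importance from the model, there is nothing to compare against the shadows, which is why an estimator lacking feature_importances_ cannot be plugged in.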


lindeberg25 commented on August 28, 2024

@curtisraymond and @Wuuzzaa Hi, is there any solution for this?

I'm running into the same problem, although I'm getting a different error: "an integer is required".

Error:

```
TypeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/boruta/boruta_py.py in _get_imp(self, X, y)
383 try:
--> 384
385 self.estimator.fit(X, y)
randomforestclassifier.pyx in cuml.ensemble.randomforestclassifier.RandomForestClassifier.fit()

TypeError: an integer is required

ValueError: Please check your X and y variable. The providedestimator cannot be fitted to your data.
an integer is required
```


Wuuzzaa commented on August 28, 2024

My blind guess would be a problem with your y data: y must be integers. Did you check that X and y have compatible data types?
For the expected types, see the docs.
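A quick way to rule out the dtype question is to cast the data explicitly before passing it to BorutaPy. This sketch assumes cuML's RF classifier works best with float32 features and int32 labels; that is an assumption, so check the cuML documentation for the version you use. prepare_for_cuml is a hypothetical helper name, and whether BorutaPy preserves these dtypes internally is not verified here.

```python
import numpy as np


def prepare_for_cuml(X, y):
    """Hypothetical helper: cast features to float32 and labels to int32.

    These target dtypes are an assumption about what cuML's RF classifier
    expects; check the cuML documentation for the version you use.
    """
    X = np.asarray(X, dtype=np.float32)
    y = np.asarray(y)
    # Float labels such as 1.0 are safe to cast; 0.5 would be silently
    # truncated, so refuse them instead.
    if y.dtype.kind == "f" and not np.all(y == np.round(y)):
        raise ValueError("y contains non-integer labels")
    return X, y.astype(np.int32)


X, y = prepare_for_cuml([[0.1, 2], [0.3, 4]], [0.0, 1.0])
print(X.dtype, y.dtype)  # float32 int32
```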


lindeberg25 commented on August 28, 2024

Hi @Wuuzzaa,

Thank you for the quick reply.

y contains integers. It works fine when I use scikit-learn's RF classifier, but I get this error when I use cuML's RF classifier.

My guess is that there might be an incompatibility between cuML and BorutaPy.


Wuuzzaa commented on August 28, 2024

BorutaPy was never designed to be used with cuML, and it seems it still does not work. As beckernick mentioned, there is still an open issue on cuML for implementing feature importance, which Boruta needs in order to work.


beckernick commented on August 28, 2024

Thanks for linking that issue, @Wuuzzaa!

@lindeberg25, we'd love to learn more about your use case and the performance impact of using cuML's Random Forest vs. scikit-learn's RF. Let's continue the discussion on the linked issue.
