scikit-learn-contrib,boruta_py

Comments (7)

tagomatech commented on July 25, 2024

Can you provide a bit more background on this issue, such as the function used, parameters, dataset, etc.?

from boruta_py.

tagomatech commented on July 25, 2024

OK, I ran into the same message myself. My understanding is that NumPy emits this warning when a comparison involves NaN. Looking at the code in boruta_py.py, this suggests that, because of your data and/or your parameters, no feature is better than the best shadow.
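To make the NaN behaviour concrete, here is a minimal sketch of the comparison from the warning line. The array `feature_imp` stands in for `cur_imp[0]`, and all values are hypothetical, not taken from a real Boruta run. Whether NumPy actually prints the warning depends on the NumPy version, so the sketch only shows the result of the comparison.

```python
import numpy as np

# Hypothetical importances standing in for cur_imp[0]; feature 1 is NaN.
feature_imp = np.array([0.40, np.nan, 0.05, 0.30])
imp_sha_max = 0.20  # hypothetical importance of the best shadow feature

# errstate silences "invalid value encountered" warnings on NumPy versions
# that emit them for NaN comparisons; it does not change the result.
with np.errstate(invalid="ignore"):
    mask = feature_imp > imp_sha_max

# Any comparison with NaN evaluates to False, so feature 1 is never a hit.
hits = np.where(mask)[0]
print(mask)
print(hits)
```

So the warning by itself does not change which features count as hits: a NaN importance simply never beats the shadow.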

Can you confirm that the attribute n_features_ of your Boruta object returned 0?


flaviozamponi commented on July 25, 2024

Dear all,
I get the same warning with this simple script (synthetic data from sklearn):

`from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
from boruta import BorutaPy

# Synthetic regression data: 50 features, 10 of them informative
Xdum, ydum = make_regression(n_samples=100, n_features=50,
                             n_informative=10, bias=150.0,
                             noise=30, random_state=0)
rf = RandomForestRegressor(n_jobs=-1, max_depth=10)

feat_selector = BorutaPy(rf, n_estimators=1000, perc=100, max_iter=20, verbose=2)
feat_selector.fit(Xdum, ydum)`

Note that I set n_informative to 10, and in the end Boruta does find 7 relevant features: feat_selector.n_features_ returns 7. The warning also appears with max_iter=100.
I'm using sklearn 0.19.1 and numpy 1.13.3.


tagomatech commented on July 25, 2024

Note that the higher max_iter is, the more likely you are to get this warning.
BTW, I fixed the problem; please see my pull request. With this small code change, your snippet runs with no warning message.


flaviozamponi commented on July 25, 2024

Thanks a lot!


danielhomola commented on July 25, 2024

Thanks @tagomatech, I accepted the PR.


Saravji commented on July 25, 2024

Please re-open this issue, as the proposed (and implemented) solution introduces an error.
As soon as NaNs are encountered, the solution behaves as shown below (the array printed is hits, right after the assignment in question).
Note: I am working with the Madelon example notebook in the package.

Referred to as code variant A:
[ 28 48 64 105 128 153 204 241 281 318 336 338 378 433 442 451 453 472
475 493]
Iteration: 7 / 100
Confirmed: 0
Tentative: 499
Rejected: 0
[ 28 48 64 105 128 153 241 281 318 336 338 378 433 442 451 453 472 475
493]
Iteration: 8 / 100
Confirmed: 0
Tentative: 21
Rejected: 478
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20]
Iteration: 9 / 100
Confirmed: 0
Tentative: 21
Rejected: 478
[ 0 1 2 3 4 5 7 8 9 10 11 12 13 14 15 16 17 18 19 20]
Iteration: 10 / 100
Confirmed: 0
Tentative: 21
Rejected: 478

Instead of the actual feature indices, hits now holds positions counted among the n features that have not been rejected.
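The sketch below shows one way such an index shift can arise. It assumes, purely for illustration, a fix that drops NaN entries before the comparison; the thread does not show the merged change, so this is not a reproduction of it. The array values and the name `feature_imp` (standing in for `cur_imp[0]`) are hypothetical. It also shows an index-preserving alternative that replaces NaN with `-inf` instead of removing it.

```python
import numpy as np

# Hypothetical importances for 8 features; NaN marks features that were
# not scored in this iteration (for example, already rejected ones).
feature_imp = np.array([np.nan, 0.5, np.nan, 0.6, 0.1, np.nan, 0.7, np.nan])
imp_sha_max = 0.3  # hypothetical importance of the best shadow feature

# Assumed variant: drop NaNs first, then compare. The resulting indices
# refer to positions in the compacted array, not to the original features.
compact = feature_imp[~np.isnan(feature_imp)]
shifted_hits = np.where(compact > imp_sha_max)[0]

# Index-preserving variant: keep the array length and replace NaN with -inf,
# so those entries can never exceed the shadow importance.
safe_imp = np.where(np.isnan(feature_imp), -np.inf, feature_imp)
true_hits = np.where(safe_imp > imp_sha_max)[0]

print(shifted_hits)  # positions within the compacted array
print(true_hits)     # original feature indices
```

With these made-up values the two variants disagree on which features are hits, which is the same kind of mismatch as the 0, 1, 2, ... indices in the output above.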
For contrast, here is the same pipeline using the code that the fix replaced:

Referred to as code variant B:
[ 28 48 64 105 128 153 204 241 281 318 336 338 378 433 442 451 453 472
475 493]
Iteration: 7 / 100
Confirmed: 0
Tentative: 499
Rejected: 0
[ 28 48 64 105 128 153 241 281 318 336 338 378 433 442 451 453 472 475
493]
Iteration: 8 / 100
Confirmed: 0
Tentative: 21
Rejected: 478

  print(hits)
[ 28  48  64 105 128 153 204 241 281 318 336 338 378 433 442 451 453 455
 472 475 493]
Iteration: 	9 / 100
Confirmed: 	19
Tentative: 	2
Rejected: 	478
~~~/boruta_py.py:421: RuntimeWarning: invalid value encountered in greater
  print(hits)
[ 28  48  64 105 128 153 241 281 318 336 338 378 433 442 451 453 455 472
 475 493]
Iteration: 	10 / 100
Confirmed: 	19
Tentative: 	2
Rejected: 	478

The line number in the warning is off because I have both code snippets in the file.

The results of the two runs are significantly different:
Variant A repeatably terminates after 34 iterations; results vary between 1 and 2 accepted features, with the remainder rejected.
Variant B terminates after 100 iterations with 21 accepted and 1 or 2 tentative features.


d:\Anaconda3\lib\site-packages\boruta\boruta_py.py:418: RuntimeWarning: invalid value encountered in greater hits = np.where(cur_imp[0] > imp_sha_max)[0]
