
Comments (2)

danielhomola commented on July 25, 2024

Hi,

What are the dimensions of X and y? Are you sure they're both numpy arrays?
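A quick way to answer both questions before calling fit() is sketched below. The helper name check_inputs is made up for illustration and is not part of boruta_py; it only reports the type and shape of whatever is passed in.

```python
import numpy as np


def check_inputs(X, y):
    """Report basic type and shape facts about X and y (hypothetical helper)."""
    X_shape, y_shape = np.shape(X), np.shape(y)
    return {
        "X_is_ndarray": isinstance(X, np.ndarray),
        "y_is_ndarray": isinstance(y, np.ndarray),
        "X_shape": X_shape,
        "y_shape": y_shape,
        "X_is_2d": len(X_shape) == 2,
        "y_is_1d": len(y_shape) == 1,
        # Only compare sample counts when both inputs have at least one axis.
        "same_n_samples": (
            len(X_shape) > 0 and len(y_shape) > 0 and X_shape[0] == y_shape[0]
        ),
    }


# Example with randomly generated data of the kind used in this thread.
report = check_inputs(np.random.random((1000, 210)), np.zeros(1000, dtype=int))
for key, value in report.items():
    print(key, value)
```

Note that calling .values on a single-column pandas DataFrame returns an array of shape (n, 1) rather than (n,), so y_is_1d would be False in that case; np.ravel(y) flattens it.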


mavillan commented on July 25, 2024

Hi Daniel,

I have the same problem as @robinbing. Here is my test code:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from boruta_py import BorutaPy

# load X and y
# NOTE BorutaPy accepts numpy arrays only, hence the .values attribute
#X = pd.read_csv('my_X_table.csv', index_col=0).values
#y = pd.read_csv('my_y_vector.csv', index_col=0).values
X = 10*np.random.random((1000,210))
y = np.zeros(1000, dtype=int)
y[np.random.random(1000) >= 0.5] = 1 


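# define random forest classifier, using all cores and
# weighting classes in proportion to the y labels
# ('balanced' replaces the 'auto' option, which newer scikit-learn releases no longer accept)
rf = RandomForestClassifier(n_jobs=-1, class_weight='balanced', max_depth=5)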

# define Boruta feature selection method
feat_selector = BorutaPy(rf, n_estimators='auto', verbose=2, max_iter=1000)

# find all relevant features
feat_selector.fit(X, y)

# check selected features
feat_selector.support_

# check ranking of features
feat_selector.ranking_

# call transform() on X to filter it down to selected features
X_filtered = feat_selector.transform(X)

It's essentially the same as your example code, but with randomly generated data. Here is the error:

Traceback (most recent call last):
  File "boruta_example.py", line 23, in <module>
    feat_selector.fit(X, y)
  File "/home/martin/Repositories/svm/lib/boruta_py.py", line 191, in fit
    return self._fit(X, y)
  File "/home/martin/Repositories/svm/lib/boruta_py.py", line 325, in _fit
    iter_ranks = self._nanrankdata(imp_history_rejected, axis=1)
  File "/home/martin/Repositories/svm/lib/boruta_py.py", line 493, in _nanrankdata
    ranks = sp.stats.mstats.rankdata(np.ma.masked_invalid(X), axis=axis)
  File "/home/martin/miniconda2/envs/python3/lib/python3.5/site-packages/scipy/stats/mstats_basic.py", line 260, in rankdata
    return ma.apply_along_axis(_rank1d,axis,data,use_missing).view(ndarray)
  File "/home/martin/miniconda2/envs/python3/lib/python3.5/site-packages/numpy/ma/extras.py", line 394, in apply_along_axis
    res = func1d(arr[tuple(i.tolist())], *args, **kwargs)
  File "/home/martin/miniconda2/envs/python3/lib/python3.5/site-packages/scipy/stats/mstats_basic.py", line 248, in _rank1d
    for r in repeats[0]:
TypeError: iteration over a 0-d array

It seems to be an error in SciPy's rankdata function (the masked-array version in scipy.stats.mstats, according to the traceback).

Note: I tested this on Anaconda's Python 2 and Python 3.
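To check whether the masked-array ranking step is really the culprit, one option is to compare it against a NaN-aware row ranking that avoids scipy.stats.mstats altogether. The function below is only a sketch under that assumption: nanrank_rows is a made-up name, it is not the fix used by boruta_py, and it handles only the axis=1 case seen in the traceback.

```python
import numpy as np
from scipy.stats import rankdata


def nanrank_rows(a):
    """Rank each row of a 2-D array, leaving NaN entries as NaN.

    Hypothetical stand-in for the row-wise ranking step in the traceback.
    Ties receive average ranks (the default behaviour of scipy.stats.rankdata).
    """
    a = np.asarray(a, dtype=float)
    ranks = np.full(a.shape, np.nan)
    for i, row in enumerate(a):
        valid = ~np.isnan(row)
        # Skip rows that contain no valid values at all.
        if valid.any():
            ranks[i, valid] = rankdata(row[valid])
    return ranks


example = np.array([[3.0, np.nan, 1.0],
                    [np.nan, np.nan, np.nan],
                    [2.0, 2.0, 5.0]])
print(nanrank_rows(example))
```

If this runs cleanly on the same importance history that makes _nanrankdata fail, that would point at the mstats routine (or the SciPy/NumPy version combination) rather than at the data.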

