Comments (7)

AutoViML commented on July 28, 2024

Please note that you need to send in dataframes as input, not numpy arrays. You may have sent in arrays when you split the data into train and test. Please double-check. This code snippet works for me.

from featurewiz import FeatureWiz
features = FeatureWiz(corr_limit=0.70, feature_engg='', category_encoders='', dask_xgboost_flag=False, nrows=None, verbose=2)
X_train_selected = features.fit_transform(ord_train_t, y_train)
X_test_selected = features.transform(ord_test_t) # error is encountered here
features.features  ### provides the list of selected features ###

Please post your sample ord_train_t and ord_test_t data sets here for me to test out. Thanks for trying out featurewiz.
Thanks
AutoViML

from featurewiz.

SSMK-wq commented on July 28, 2024

@AutoViML - My ord_train_t is a dataframe, my y_train is a dataframe, and my ord_test_t is also a dataframe. Please help me. The issue happens only when I try to transform ord_test_t, which contains only the input columns.

Please find below the sample data. I can't share original data due to confidentiality reasons. But I can say that most of my columns are ordinally encoded (except 4 columns).

ord_train_t

Feat1 Feat2 Feat3 Feat4 Feat5 Feat6 Feat7 Feat8 Feat9 Feat10 Feat11 Feat12 Feat13 Feat14 Feat15
2 5 2 1200000 3 1 13 110 16 0 3 12 0 0 4
1 2 0 70000 5 0 56 0 33 2 5 2 1 0 1
4 2 0 200000 5 0 59 1 40 2 3 5 1 0 2
3 5 3 40000 5 1 31 0 0 2 5 12 1 0 8
2 5 2 200000 3 1 37 99999 100 0 5 16 0 0 9

ord_test_t

Feat1 Feat2 Feat3 Feat4 Feat5 Feat6 Feat7 Feat8 Feat9 Feat10 Feat11 Feat12 Feat13 Feat14 Feat15
7 6 3 20000 3 1 60.5 52 0 2 3 24 1 0 14
4 6 2 50000 5 1 32.88888889 0 0 2 2 7 1 0 0
8 5 2 1500000 6 1 26.83333333 12 76.92307692 1 1 24 0 0 1
4 6 1 280000 6 2 63 0 0 2 6 12 1 0 1
8 6 3 20000 3 1 32.5 0 0 2 3 11 0 0 7

My y_train is a dataframe with discrete values 0 and 1. I hope this helps.


SSMK-wq commented on July 28, 2024

@AutoViML - I hope this sample data is useful for your testing. Neither ord_train_t nor ord_test_t contains the output (label) column.


AutoViML commented on July 28, 2024

Hi @SSMK-wq 👍
I have attached your dataset and the notebook I used to test it. There was no error at all. Could you please download the archive and run it on your laptop?
All_Main.zip
Thanks
AutoViML


SSMK-wq commented on July 28, 2024

@AutoViML - I am sorry. For some reason I still get the same key error when I pass in ordinally encoded dataframes as input. My fit function works well; only the transform on the test data fails with a key error, even though my test data has the same columns as the train data. I understand that it works fine for you, and I am not sure how this can be reproduced so that it can be solved. Thanks for your help anyway. I will go back to manual feature selection. Also, when I run the fit function, I don't see the XGBoost feature graphs that you have in your sample code, and there are no SULOV graphs either. I don't know why it behaves differently in my case. Perhaps there is no correlation.

To be specific, this is the line where I get the key error:

X_test_selected = features.transform(ord_test_t) #ord_test_t is test data (ordinally encoded) with no target column
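
One way to narrow down a key error like this is to compare the names the selector expects with the columns actually present in the test dataframe. The following is only a minimal sketch: `missing_columns` is a hypothetical helper, not part of featurewiz, and the two-column frame is made-up illustration data. In a real session, the `expected` list could come from `features.features`, which the snippet earlier in this thread says holds the selected feature names.

```python
import pandas as pd

def missing_columns(expected, df):
    """Return the names in `expected` that are not columns of `df`, in order."""
    present = set(df.columns)
    return [name for name in expected if name not in present]

# Hypothetical example: the expected names use an underscore,
# while the test dataframe still has a space in the column name.
selected = ["product_group", "Feat4"]
ord_test_t = pd.DataFrame({"product group": [1, 2], "Feat4": [20000, 50000]})

# Any name printed here would raise a KeyError if used to index ord_test_t.
print(missing_columns(selected, ord_test_t))
```

An empty list means every expected name exists in the test dataframe, so the cause of the error would have to be looked for elsewhere.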


SSMK-wq commented on July 28, 2024

@AutoViML Oh, I think I have found the issue now. It was subtle. I think the package replaces the spaces in column names with underscores. That is, if my original column name is product group, the package converts it to product_group. This results in the key error and prevents the transformation. I guess this is handled automatically during fit. Apologies for overlooking this.
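
On versions affected by this behavior, a possible workaround is to rename the columns yourself, identically on train and test, before calling fit and transform. The sketch below assumes, as described in this comment, that only spaces are replaced with underscores; the library's actual renaming rule may differ. `normalize_columns` is a hypothetical helper and the dataframes are made-up illustration data.

```python
import pandas as pd

def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with spaces in column names replaced by underscores."""
    out = df.copy()
    # Assumption: spaces are the only characters that need replacing.
    out.columns = [str(col).strip().replace(" ", "_") for col in out.columns]
    return out

# Hypothetical frames with a space in one column name.
train = pd.DataFrame({"product group": [1, 2], "Feat1": [3, 4]})
test = pd.DataFrame({"product group": [5, 6], "Feat1": [7, 8]})

train_n = normalize_columns(train)
test_n = normalize_columns(test)

# Both frames now share the same underscore-style names.
print(list(train_n.columns))
print(list(train_n.columns) == list(test_n.columns))
```

Passing `train_n` to fit and `test_n` to transform keeps the names consistent regardless of what the package does internally. The original dataframes are left unchanged because the helper works on a copy.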


AutoViML commented on July 28, 2024

Hi @SSMK-wq 👍
I think you caught a nice bug! Thanks for bringing it to my attention. I will be uploading a new version, 0.1.01.

Please upgrade to it via:

pip install featurewiz --upgrade

You should be able to see that fit and transform work correctly now even if the column names are changed.

Thanks
AutoViML
