Comments (7)
Please note that you need to pass dataframes as input, not numpy arrays. You may have passed arrays when you split the data into train and test sets. Please double-check. This code snippet works for me:
from featurewiz import FeatureWiz
features = FeatureWiz(corr_limit=0.70, feature_engg='', category_encoders='', dask_xgboost_flag=False, nrows=None, verbose=2)
X_train_selected = features.fit_transform(ord_train_t, y_train)
X_test_selected = features.transform(ord_test_t) # error is encountered here
features.features ### provides the list of selected features ###
Please post your sample ord_train_t and ord_test_t data sets here for me to test out. Thanks for trying out featurewiz.
Thanks
AutoViML
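As a minimal sketch of the dataframe-versus-array point above: if a splitting step has returned numpy arrays, they can be wrapped back into dataframes before being passed on. The helper `to_frame` and the sample values are hypothetical and only for illustration; nothing here calls featurewiz itself.

```python
import numpy as np
import pandas as pd


def to_frame(data, columns):
    """Return `data` as a DataFrame, wrapping it only if it is not one already."""
    if isinstance(data, pd.DataFrame):
        return data
    return pd.DataFrame(np.asarray(data), columns=list(columns))


# Hypothetical column names and values, loosely modelled on the thread's sample data
cols = ["Feat1", "Feat2", "Feat3"]
X_train_arr = np.array([[2, 5, 2], [1, 2, 0], [4, 2, 0]])

X_train_df = to_frame(X_train_arr, cols)
print(type(X_train_df).__name__, list(X_train_df.columns))
```

Passing the original column names explicitly keeps the train and test frames aligned, which matters for any later step that selects columns by name.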
from featurewiz.
@AutoViML - My ord_train_t is a dataframe, my y_train is also a dataframe, and my ord_test_t is also a dataframe. Please help me. The issue happens only when I try to transform ord_test_t. And yes, ord_test_t contains only the input columns.
Please find below the sample data. I can't share original data due to confidentiality reasons. But I can say that most of my columns are ordinally encoded (except 4 columns).
ord_train_t
Feat1 | Feat2 | Feat3 | Feat4 | Feat5 | Feat6 | Feat7 | Feat8 | Feat9 | Feat10 | Feat11 | Feat12 | Feat13 | Feat14 | Feat15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 5 | 2 | 1200000 | 3 | 1 | 13 | 110 | 16 | 0 | 3 | 12 | 0 | 0 | 4 |
1 | 2 | 0 | 70000 | 5 | 0 | 56 | 0 | 33 | 2 | 5 | 2 | 1 | 0 | 1 |
4 | 2 | 0 | 200000 | 5 | 0 | 59 | 1 | 40 | 2 | 3 | 5 | 1 | 0 | 2 |
3 | 5 | 3 | 40000 | 5 | 1 | 31 | 0 | 0 | 2 | 5 | 12 | 1 | 0 | 8 |
2 | 5 | 2 | 200000 | 3 | 1 | 37 | 99999 | 100 | 0 | 5 | 16 | 0 | 0 | 9 |
ord_test_t
Feat1 | Feat2 | Feat3 | Feat4 | Feat5 | Feat6 | Feat7 | Feat8 | Feat9 | Feat10 | Feat11 | Feat12 | Feat13 | Feat14 | Feat15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7 | 6 | 3 | 20000 | 3 | 1 | 60.5 | 52 | 0 | 2 | 3 | 24 | 1 | 0 | 14 |
4 | 6 | 2 | 50000 | 5 | 1 | 32.88888889 | 0 | 0 | 2 | 2 | 7 | 1 | 0 | 0 |
8 | 5 | 2 | 1500000 | 6 | 1 | 26.83333333 | 12 | 76.92307692 | 1 | 1 | 24 | 0 | 0 | 1 |
4 | 6 | 1 | 280000 | 6 | 2 | 63 | 0 | 0 | 2 | 6 | 12 | 1 | 0 | 1 |
8 | 6 | 3 | 20000 | 3 | 1 | 32.5 | 0 | 0 | 2 | 3 | 11 | 0 | 0 | 7 |
My y_train is a dataframe with discrete values 0 and 1. Hope this helps.
@AutoViML - I hope this sample data is useful for you to test. Neither ord_train_t nor ord_test_t contains the output column (label column).
Hi @SSMK-wq 👍
I have attached your dataset and the notebook I used to test it. There was no error at all. Please download the attachment and run it on your laptop.
All_Main.zip
Thanks
AutoViML
@AutoViML - I am sorry. For some reason I still get the same KeyError when I pass in ordinally encoded dataframes as input. That is, my fit function works well; only transform on the test data fails with a KeyError, even though my test data has the same columns as the train data. I understand that it works fine for you, and I am not sure how this can be reproduced in order to solve it. Thanks for your help anyway. I will go back to manual feature selection. Moreover, when I run the fit function, I don't see the XGBoost feature graphs that you have in your sample code. I don't know why it behaves differently in my case: there are no XGBoost graphs and no SULOV graphs either. Perhaps there is no correlation.
To be specific, this is where I get the KeyError:
X_test_selected = features.transform(ord_test_t)
#ord_test_t is test data (ordinally encoded) with no target column
@AutoViML Oh, I think I found the issue now. It was subtle. I think the package replaces the spaces in column names with underscores. That is, if my original column name is product group, the package converts it to product_group (with an underscore in between). This results in a KeyError and prevents the transformation. I guess it is handled automatically during fit. Apologies for overlooking this.
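A minimal sketch of a workaround consistent with the diagnosis above, assuming the only change is that spaces in column names become underscores: rename the columns in both the train and test dataframes up front, so that any later name-based selection finds the same names in both. The helper `normalize_columns`, the column names, and the values are hypothetical; this uses plain pandas and does not reproduce featurewiz's internal renaming.

```python
import pandas as pd


def normalize_columns(df):
    """Return a copy of `df` with spaces in column names replaced by underscores."""
    out = df.copy()
    out.columns = [str(c).strip().replace(" ", "_") for c in out.columns]
    return out


# Hypothetical data with spaces in the column names
train = pd.DataFrame({"product group": [1, 2], "unit price": [10.0, 12.5]})
test = pd.DataFrame({"product group": [3, 1], "unit price": [9.0, 11.0]})

# A selected-feature list as it would look once spaces have become underscores
selected = ["product_group"]

try:
    test[selected]  # the raw test frame still has "product group"
except KeyError as exc:
    print("KeyError on raw test columns:", exc)

# Renaming both frames the same way removes the mismatch
train_clean = normalize_columns(train)
test_clean = normalize_columns(test)
print(test_clean[selected])
```

Renaming before fitting, rather than after, means the fitted feature list and the test columns are derived from the same names.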
Hi @SSMK-wq 👍
I think you caught a nice bug! Thanks for bringing it to my attention. I will be uploading a new version, 0.1.01.
Please upgrade to it via:
pip install featurewiz --upgrade
You should be able to see that fit and transform work correctly now even if the column names are changed.
Thanks
AutoViML