Comments (8)
Hi @fjpa121197 👍
There are two mistakes you are making:
- You don't need to transform the variables yourself. For feature selection, featurewiz automatically transforms categorical variables internally before feeding them to XGBoost, so you can simply remove the OrdinalEncoder step from your input. That should solve your first problem.
- You can try solving the second problem with another model once you first let featurewiz select the best variables.
Hope this helps,
AutoViML
from featurewiz.
Hi @AutoViML,
But the last part, using XGBoost, gives the following output:
[15:33:59] C:/Users/Administrator/workspace/xgboost-win64_release_1.5.1/src/objective/multiclass_obj.cu:120: SoftmaxMultiClassObj: label must be in [0, num_class).
Regular XGBoost is crashing. Returning with currently selected features...
And outputs[0] is giving the target variable only (as a dataframe).
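For context, XGBoost's multiclass objective raises that error whenever the integer labels are not in the range [0, num_class), e.g. classes labelled 1..3 or 10/20/30. A minimal sketch of the remapping that avoids it:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Labels like these trigger "label must be in [0, num_class)":
y_raw = np.array([10, 20, 30, 20, 10])

# LabelEncoder remaps them onto consecutive integers 0..n_classes-1.
y = LabelEncoder().fit_transform(y_raw)
print(y)  # [0 1 2 1 0]
```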
The first suggestion solved my problem, but looking at the transformed dataset (or the dataset with selected features), I was surprised to find my categorical variables encoded using OrdinalEncoder. Is this the default way the XGBoost part finds the most important features? I'm not sure that assuming an ordinal relationship is appropriate for all categorical columns.
Hi @fjpa121197 👍
There is one quick and easy way to resolve this. Just change your target variable to float before feeding it to Featurewiz. If it is float, it will treat it as a Regression problem. That should work.
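Per the suggestion above (featurewiz treats a float target as a regression problem), the cast is a one-liner on a pandas dataframe; the column names here are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"feature": [1, 2, 3], "target": [0, 1, 2]})

# Cast the target to float before feeding the dataframe to featurewiz,
# so the task is treated as regression rather than classification.
df["target"] = df["target"].astype(float)
print(df["target"].dtype)  # float64
```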
If you still have a problem, just cut and paste the first 10 rows of your dataset here or attach a zip file with a sample dataset and I will try to troubleshoot it.
AutoViML
Hi @AutoViML,
That did solve my problem, and I was able to run the last part without problems, thanks!
I do still have questions about this:
"Looking at the transformed dataset (or the dataset with selected features), I found my categorical variables encoded using OrdinalEncoder. Is this the default way the XGBoost part finds the most important features? I'm not sure that assuming an ordinal relationship is appropriate for all categorical columns."
Is there any way to see if the results are different when using one-hot encoding, and to see the actual features after encoding? For example:
Let's say I have a categorical column type_transportation with the following unique values: ['car', 'boat', 'bike', 'plane']. After one-hot encoding, it will create the following columns: ['type_transportation_car', 'type_transportation_boat', 'type_transportation_bike'].
However, after using featurewiz, the selected features are returned like this:
['OneHotEncoder_property_type_1', 'OneHotEncoder_property_type_6', ...]
Is there any way to know the actual value or category each one refers to?
Hi @fjpa121197 👍
I will look into it. In the meantime, as I said earlier, you can one-hot encode categorical variables in your dataframe before you send it to featurewiz. The other option is to remove one-hot encoding from your featurewiz calling statement, since featurewiz automatically transforms variables, detects which ones are important, and returns the list of features untransformed.
Check out both options.
Thanks for trying out featurewiz.
AutoViML
Hi @AutoViML,
The first option sounds good to me! I can handle the inverse transformation of the columns from the featurewiz output myself, and avoid assuming an ordinal relationship for my categorical features.
Sorry for another question, but I'm really interested and amazed by the automation part.
Is there any way to know the performance of the XGBoost estimator at the different stages where it reduces features?
I think it would be good to know, since the feature importance is also impacted by the estimator's performance.
Hi @fjpa121197 👍
Great question.
Is there any way to know the performance of the XGBoost estimator at the different stages where it reduces features?
I think it would be good to know, since the feature importance is also impacted by the estimator's performance.
You should not worry too much about performance at each stage, since Recursive XGBoost uses fewer and fewer features in its modeling. That means the actual performance in each round might be falling, but that is not what matters. What matters is knowing which of the remaining variables stands out as the most important. That's why I don't show the performance: it would give a misleading picture. If you don't believe this method will work for you, the best thing to do is to compare featurewiz with other methods and see which one does feature selection better. That is one way to find out.
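For readers who still want to watch the score at each round, here is a rough sketch of recursive importance-based elimination with a cross-validation score logged per round. A RandomForest stands in for XGBoost, and this is not featurewiz's internal implementation, just the general idea:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data: 12 features, only a few informative.
X, y = make_classification(n_samples=300, n_features=12,
                           n_informative=4, random_state=0)
cols = list(range(X.shape[1]))

while len(cols) > 4:
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    # Score with the current feature subset before shrinking it.
    score = cross_val_score(model, X[:, cols], y, cv=3).mean()
    model.fit(X[:, cols], y)
    print(f"{len(cols)} features -> CV accuracy {score:.3f}")
    # Drop the least important feature this round.
    drop = cols[int(np.argmin(model.feature_importances_))]
    cols.remove(drop)

print("selected:", cols)
```

As the comment above notes, the per-round score can drop as features are removed; the point of logging it is only to see the trade-off, not to pick the round with the best score.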
If this answers your question, please consider closing this issue.
Hope this helps,
AutoViML
That is understandable; I think I will compare results with other techniques.
But overall, great tool. Thanks for the help and answering these questions!
Closing this.