Comments (8)
Hi @fjpa121197 👍
There are two mistakes you are making:
- You don't need to transform the variables yourself. For feature selection, featurewiz automatically transforms categorical variables internally before feeding them to XGBoost, so you can simply remove the OrdinalEncoder step from your input. That should solve your first problem.
- You can try solving the second problem with another model once you first let featurewiz select the best variables.
Hope this helps,
AutoViML
from featurewiz.
Hi @AutoViML,
But the last part, using XGBoost, gives the following output:
[15:33:59] C:/Users/Administrator/workspace/xgboost-win64_release_1.5.1/src/objective/multiclass_obj.cu:120: SoftmaxMultiClassObj: label must be in [0, num_class).
Regular XGBoost is crashing. Returning with currently selected features...
And outputs[0] is giving the target variable only (as a dataframe).
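For context, XGBoost's multiclass objective raises that error whenever the integer labels are not in the range [0, num_class), e.g. classes labelled 1..3 or 10/20/30. A minimal sketch of the remapping that avoids it:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Labels like these trigger "label must be in [0, num_class)":
y_raw = np.array([10, 20, 30, 20, 10])

# LabelEncoder remaps them onto consecutive integers 0..n_classes-1.
y = LabelEncoder().fit_transform(y_raw)
print(y)  # [0 1 2 1 0]
```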
The first suggestion solved my problem, but looking at the transformed dataset (or the dataset with selected features), I was surprised to find my categorical variables encoded using OrdinalEncoder. Is this the default way the XGBoost part finds the most important features? I'm not sure that assuming an ordinal relationship is appropriate for all categorical columns.
Hi @fjpa121197 👍
There is one quick and easy way to resolve this. Just change your target variable to float before feeding it to Featurewiz. If it is float, it will treat it as a Regression problem. That should work.
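Per the suggestion above (featurewiz treats a float target as a regression problem), the cast is a one-liner on a pandas dataframe; the column names here are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"feature": [1, 2, 3], "target": [0, 1, 2]})

# Cast the target to float before feeding the dataframe to featurewiz,
# so the task is treated as regression rather than classification.
df["target"] = df["target"].astype(float)
print(df["target"].dtype)  # float64
```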
If you still have a problem, just cut and paste the first 10 rows of your dataset here or attach a zip file with a sample dataset and I will try to troubleshoot it.
AutoViML
Hi @AutoViML,
That did solve my problem, and I was able to run the last part without problems, thanks!
I do still have questions about this:
"Looking at the transformed dataset (or the dataset with selected features), I found my categorical variables encoded using OrdinalEncoder. Is this the default way the XGBoost part finds the most important features? I'm not sure that assuming an ordinal relationship is appropriate for all categorical columns."
Is there any way to see if the results are different when using one-hot encoding, and to see the actual features after encoding? For example:
Let's say I have a categorical column type_transportation with the following unique values: ['car', 'boat', 'bike', 'plane']. After one-hot encoding, it will create the following columns: ['type_transportation_car', 'type_transportation_boat', 'type_transportation_bike'].
However, after using featurewiz, the selected features are returned like this:
['OneHotEncoder_property_type_1', 'OneHotEncoder_property_type_6', ...]
Is there any way to know the actual value or category each one refers to?
Hi @fjpa121197 👍
I will look into it. In the meantime, as I said earlier, you can one-hot encode categorical variables in your dataframe before you send it to featurewiz. The other option is to remove one-hot encoding from your featurewiz calling statement, since featurewiz automatically transforms variables, detects which ones are important, and returns the list of features untransformed.
Check out both options.
Thanks for trying out featurewiz.
AutoViML
Hi @AutoViML,
The first option sounds good to me! I can handle the inverse transformation of the columns from the featurewiz output myself, and avoid assuming an ordinal relationship for my categorical features.
Sorry for another question, but I'm really interested and amazed by the automation part.
Is there any way to know the performance of the XGBoost estimator at the different stages where it reduces features?
I think it would be good to know, since the feature importance is also impacted by the estimator's performance.
Hi @fjpa121197 👍
Great question.
Is there any way to know the performance of the XGBoost estimator at the different stages where it reduces features?
I think it would be good to know, since the feature importance is also impacted by the estimator's performance.
You should not worry too much about performance at each stage, since Recursive XGBoost uses fewer and fewer features in its modeling. That means the actual performance in each round might be falling, but that is not what matters. What matters is knowing which of the remaining variables stands out as the most important. That's why I don't show the performance: it would give a misleading picture. If you don't believe this method will work for you, the best thing to do is to compare featurewiz with other methods and see which one does feature selection better. That is one way to find out.
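For readers who still want to watch the score at each round, here is a rough sketch of recursive importance-based elimination with a cross-validation score logged per round. A RandomForest stands in for XGBoost, and this is not featurewiz's internal implementation, just the general idea:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data: 12 features, only a few informative.
X, y = make_classification(n_samples=300, n_features=12,
                           n_informative=4, random_state=0)
cols = list(range(X.shape[1]))

while len(cols) > 4:
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    # Score with the current feature subset before shrinking it.
    score = cross_val_score(model, X[:, cols], y, cv=3).mean()
    model.fit(X[:, cols], y)
    print(f"{len(cols)} features -> CV accuracy {score:.3f}")
    # Drop the least important feature this round.
    drop = cols[int(np.argmin(model.feature_importances_))]
    cols.remove(drop)

print("selected:", cols)
```

As the comment above notes, the per-round score can drop as features are removed; the point of logging it is only to see the trade-off, not to pick the round with the best score.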
If this answers your question, please consider closing this issue.
Hope this helps,
AutoViML
That is understandable; I think I will compare results with other techniques.
But overall, great tool. Thanks for the help and answering these questions!
Closing this.