
Comments (8)

AutoViML commented on July 28, 2024

Hi @fjpa121197 👍
There are two mistakes you are making:

  1. You don't need to transform the variables. For feature selection, featurewiz automatically transforms categorical variables internally before feeding them to XGBoost, so you can simply remove OrdinalEncoder from your input. That should solve your first problem.
  2. You can try solving the second problem with another model once featurewiz has selected the best variables.

Hope this helps,
AutoViML
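
A minimal sketch of point 1, assuming a pandas dataframe with toy column names (the featurewiz call itself is shown commented out, since its exact signature may vary by version):

```python
import pandas as pd

# Toy dataframe with a raw (unencoded) categorical column
df = pd.DataFrame({
    "color": ["red", "blue", "green", "red"],  # left as strings, no OrdinalEncoder
    "size": [1.0, 2.5, 3.1, 0.7],
    "target": [0, 1, 1, 0],
})

# Pass the dataframe as-is; featurewiz encodes categoricals internally
# before feeding them to XGBoost (hypothetical call, signature may differ):
# from featurewiz import featurewiz
# features, train = featurewiz(df, target="target", corr_limit=0.70, verbose=0)

print(df["color"].dtype)  # still an object/string column
```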

from featurewiz.

fjpa121197 commented on July 28, 2024

Hi @AutoViML,

However, the last part, which uses XGBoost, gives the following output:

[15:33:59] C:/Users/Administrator/workspace/xgboost-win64_release_1.5.1/src/objective/multiclass_obj.cu:120: SoftmaxMultiClassObj: label must be in [0, num_class).
Regular XGBoost is crashing. Returning with currently selected features...

And outputs[0] is giving the target variable only (as a dataframe).


The first suggestion solved my problem, but looking at the transformed dataset (the dataset with selected features), I see my categorical variables encoded with OrdinalEncoder. Is that the default way the XGBoost part finds the most important features? I'm not sure assuming an ordinal relationship is appropriate for all categorical columns.


AutoViML commented on July 28, 2024

Hi @fjpa121197 👍
There is one quick and easy way to resolve this: change your target variable to float before feeding it to featurewiz. If the target is float, featurewiz will treat the problem as regression. That should work.
If you still have a problem, just paste the first 10 rows of your dataset here or attach a zip file with a sample dataset and I will try to troubleshoot it.
AutoViML
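
The quick fix above can be sketched as follows (pandas assumed; the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "feature": [0.1, 0.5, 0.9, 0.3],
    "target": [1, 2, 3, 2],  # integer class labels not starting at 0
})

# Cast the target to float so featurewiz treats the task as regression,
# sidestepping XGBoost's "label must be in [0, num_class)" multiclass check.
df["target"] = df["target"].astype(float)

print(df["target"].dtype)  # float64
```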


fjpa121197 commented on July 28, 2024

Hi @AutoViML,

That did solve my problem, and I was able to run the last part without issues, thanks!

I do still have questions about this:
"Looking at the transformed dataset (the dataset with selected features), I see my categorical variables encoded with OrdinalEncoder. Is that the default way the XGBoost part finds the most important features? I'm not sure assuming an ordinal relationship is appropriate for all categorical columns."

Is there any way to see whether the results differ when using one-hot encoding, and to see the actual features after encoding? For example:

Let's say I have a categorical column type_transportation with the unique values ['car', 'boat', 'bike', 'plane']. After one-hot encoding, it will create columns such as ['type_transportation_car', 'type_transportation_boat', 'type_transportation_bike'].

However, after using featurewiz, the selected features are returned like this:

['OneHotEncoder_property_type_1',
 'OneHotEncoder_property_type_6' ...

Is there any way to know the actual value or category each of these encoded features refers to?


AutoViML commented on July 28, 2024

Hi @fjpa121197 👍
I will look into it. In the meantime, as I said earlier, you can one-hot encode the categorical variables in your dataframe before you send it to featurewiz. The other option is to remove one-hot encoding from your featurewiz call: featurewiz automatically transforms variables, detects which ones are important, and returns the list of features untransformed.
Check out both options.
Thanks for trying out featurewiz.
AutoViML
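
The first option (one-hot encoding the dataframe yourself before calling featurewiz) also answers the traceability question, since pandas keeps the category value in the generated column name. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"type_transportation": ["car", "boat", "bike", "plane"]})

# get_dummies creates one column per category, named <column>_<value>,
# so each dummy column is traceable back to its original category
encoded = pd.get_dummies(df, columns=["type_transportation"])
print(sorted(encoded.columns))
# ['type_transportation_bike', 'type_transportation_boat',
#  'type_transportation_car', 'type_transportation_plane']
```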


fjpa121197 commented on July 28, 2024

Hi @AutoViML,

The first option sounds good to me! I can handle the inverse transformation of the columns from the featurewiz output, and avoid assuming an ordinal relationship for my categorical features.

Sorry for another question, but I'm really interested and amazed by the automation part.

Is there any way to know the performance of the XGBoost estimator at the different stages where it reduces features?
I think it would be good to know, since the feature importances are also affected by the estimator's performance.


AutoViML commented on July 28, 2024

Hi @fjpa121197 👍
Great question.

Is there any way to know the performance of the XGBoost estimator at the different stages where it reduces features?
I think it would be good to know, since the feature importances are also affected by the estimator's performance.

You should not worry too much about performance at each stage, since Recursive XGBoost uses fewer and fewer features in its modeling. The actual performance in each round might therefore be falling, but that is not what matters. What matters is knowing which of the remaining variables stands out as the most important. That's why I don't show the performance: it would give a misleading picture. If you don't believe this method will work for you, the best thing to do is to compare featurewiz with other feature-selection methods and see which one does the job better.
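
The round-by-round idea can be illustrated with a toy sketch (this is not featurewiz's actual implementation; the scoring function is a hypothetical stand-in for a model's feature importances):

```python
def toy_importance(features):
    # Hypothetical stand-in for XGBoost feature importances:
    # a deterministic dummy score so the example is runnable.
    return {f: len(f) for f in features}

def recursive_select(features, min_keep=2):
    """Each round, rank the remaining features by importance and keep
    the top half, until only min_keep features remain. Per-round model
    performance may drop as features shrink; only the ranking matters."""
    while len(features) > min_keep:
        scores = toy_importance(features)
        ranked = sorted(features, key=lambda f: scores[f], reverse=True)
        features = ranked[: max(min_keep, len(features) // 2)]
    return features

print(recursive_select(["a", "bb", "ccc", "dddd", "eeeee"]))
# ['eeeee', 'dddd']
```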

If this answers your question, please consider closing this issue.
Hope this helps,
AutoViML


fjpa121197 commented on July 28, 2024

That is understandable; I think I will compare the results with other techniques.

But overall, great tool. Thanks for the help and answering these questions!

Closing this.

