
Comments (9)

rsesha avatar rsesha commented on July 28, 2024 2

from featurewiz.

dougnicholson avatar dougnicholson commented on July 28, 2024 1

Nifty package! I second nemar3's suggestion of having a separate transform method.

I'd love to use this package in a production workflow, but the current limitation means that I would have to either a) retrain every time I wanted to score new data or b) reverse engineer the code that transforms my raw features into featurewiz features.
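To make the request concrete, here is a minimal sketch of the fit-once, transform-many pattern being asked for. It does not use featurewiz at all: `ColumnSelector` and its variance rule are hypothetical stand-ins, chosen only to show that whatever is learned at fit time can be stored and reapplied to new data without retraining.

```python
import json
from statistics import pvariance


class ColumnSelector:
    """Toy stand-in selector: keeps columns whose training variance exceeds a threshold."""

    def __init__(self, min_variance=0.0):
        self.min_variance = min_variance
        self.selected_ = None  # column indices learned in fit()

    def fit(self, rows):
        # Transpose the row-major data and learn which column indices to keep.
        columns = list(zip(*rows))
        self.selected_ = [
            i for i, col in enumerate(columns) if pvariance(col) > self.min_variance
        ]
        return self

    def transform(self, rows):
        # Apply the stored selection; nothing is re-learned here.
        if self.selected_ is None:
            raise RuntimeError("call fit() before transform()")
        return [[row[i] for i in self.selected_] for row in rows]


train = [[1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [3.0, 5.0, 0.0]]
new_data = [[10.0, 7.0, 1.0]]

selector = ColumnSelector().fit(train)

# Persist only the learned state, then restore it at scoring time.
state = json.dumps({"selected": selector.selected_})
restored = ColumnSelector()
restored.selected_ = json.loads(state)["selected"]

scored = restored.transform(new_data)
```

The middle column is constant in the training data, so it is dropped at fit time and the same choice is reused on `new_data`. A real selector would learn something far richer, but the contract is the same.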


rsesha avatar rsesha commented on July 28, 2024

Hi Nemar3:
These are very good suggestions. Let me see what I can do.
Thanks
Ram


rankwe avatar rankwe commented on July 28, 2024

Hi @rsesha, any progress on a fit_transform(train) and transform(test) class for the featurewiz library? When should we expect it?


rsesha avatar rsesha commented on July 28, 2024


chrico-bu-uab avatar chrico-bu-uab commented on July 28, 2024

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from featurewiz import featurewiz


class FeatureWiz(BaseEstimator, TransformerMixin):
    def __init__(self):
        # Integer column indices chosen by featurewiz; set in fit()
        self.features = None

    def fit(self, X, y):
        # Build a dataframe. The column names are numbers represented as strings
        df = pd.DataFrame(X, columns=[str(i) for i in range(X.shape[1])])

        # Add the labels as an extra column named after the next free index
        target_col = str(X.shape[1])
        df[target_col] = y

        # Select features using featurewiz
        features, _ = featurewiz(df, target_col)

        # Convert the remaining column names back to integers and drop the
        # column of labels
        self.features = [int(s) for s in np.squeeze(features)][:-1]

        return self

    def transform(self, x):
        # x is expected to be a 2-D NumPy array; keep only the selected columns
        return x[:, self.features]


AutoViML avatar AutoViML commented on July 28, 2024

Hi @chrico-bu-uab 👍

Hahaha. I love your thoughtfulness 💯 I would have created this kind of class long ago if it were possible to simply turn the hundreds of things that featurewiz does into a TransformerMixin.

There are 2 ways to solve the problem you all have mentioned here:

1. Keep featurewiz as rich and full featured as it is: You can't just throw featurewiz into an existing TransformerMixin class. It will blow up... The reason is that you would need to containerize the entire featurewiz library to do what you are describing here. But if you think that is not a problem for many folks like yourself, I can try to offer this version.

2. Create a featurewiz-lite version: I am working on a new version of featurewiz based on the TransformerMixin above that takes in a dataset, completely transforms it into numeric variables, and then selects the best features from it based on recursive XGBoost. It is a completely scikit-learn compatible Pipeline object. I call it featurewiz_lite. It will work in any Python data pipeline you create and won't need a special container. However, this version cannot do SULOV, since SULOV uses the networkx library (graph networks), which is an extremely complicated piece of code to make work in this setting. Do you think that featurewiz-lite will be noteworthy and helpful to data and ML engineers?
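The two-step flow described in option 2 (make everything numeric, then keep the best columns) might look roughly like the sketch below. This is not featurewiz_lite code: every function name is hypothetical, and a plain correlation ranking stands in for the recursive XGBoost selection, purely to keep the example dependency-free.

```python
from math import sqrt


def ordinal_encode_fit(values):
    """Learn a category -> integer mapping from training values."""
    return {cat: idx for idx, cat in enumerate(sorted(set(values)))}


def ordinal_encode_apply(values, mapping, unknown=-1):
    """Encode values with a learned mapping; unseen categories get `unknown`."""
    return [mapping.get(v, unknown) for v in values]


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 when either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def select_top_k(columns, target, k):
    """Rank columns by |correlation with target| and keep the k best names."""
    ranked = sorted(
        columns, key=lambda name: abs(pearson(columns[name], target)), reverse=True
    )
    return ranked[:k]


# Hypothetical toy data: one categorical column, two numeric ones.
raw = {
    "color": ["red", "blue", "red", "green"],
    "size": [1.0, 2.0, 3.0, 4.0],
    "noise": [5.0, 5.0, 5.0, 5.0],
}
target = [1.0, 2.0, 3.0, 4.0]

# Step 1: turn every column into numbers.
color_map = ordinal_encode_fit(raw["color"])
numeric = dict(raw, color=ordinal_encode_apply(raw["color"], color_map))

# Step 2: keep the columns most related to the target (stand-in for XGBoost).
selected = select_top_k(numeric, target, k=2)
```

Because both the encoding map and the selected column names are learned once and stored, the same two steps can be replayed on test data, which is what makes this shape fit a scikit-learn style pipeline.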

Looking for comments and feedback on above.
AutoViML


chrico-bu-uab avatar chrico-bu-uab commented on July 28, 2024

Thanks, and point taken! I had just hacked something together (have only lightly tested it) and thought I would share. I would definitely be in favor of a featurewiz_lite version. The full version has so many "bells and whistles" (which is a great thing) that it wouldn't be a huge loss if a little of that functionality were sacrificed in order to have a scikit-learn compatible version.

As they say, "A designer knows he has achieved perfection, not when there is nothing left to add, but when there is nothing left to take away."


AutoViML avatar AutoViML commented on July 28, 2024

Hi @chrico-bu-uab, @rankwe, @nemar3:
👍

You are in luck! I was finally able to create a Transformer class out of featurewiz called FeatureWiz. You can now use it to perform feature selection using the fit and transform syntax of scikit-learn as follows:

from featurewiz import FeatureWiz
features = FeatureWiz(corr_limit=0.70, feature_engg='', category_encoders='', dask_xgboost_flag=False, nrows=None, verbose=2)
X_train_selected = features.fit_transform(X_train, y_train)
X_test_selected = features.transform(X_test)
features.features  ### provides the list of selected features ###

You will get a Transformer that can select the top variables from your dataset.

You must first upgrade your featurewiz version to 0.0.90 or higher via:

pip install featurewiz --upgrade

Hope you like it. Please provide your feedback and comments here.
AutoViML
