Comments (9)
from featurewiz.
Nifty package! I second nemar3's suggestion of having a separate transform method.
I'd love to use this package in a production workflow, but the current limitation means that I would have to either a) retrain every time I wanted to score new data or b) reverse engineer the code that transforms my raw features into featurewiz features.
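To make the limitation concrete, here is a minimal sketch of the fit/transform split being asked for, using only the standard library. `VarianceSelector` and its variance rule are hypothetical stand-ins, not featurewiz code; the only point is that the selection is learned once in `fit` and then replayed on new rows in `transform`, with no retraining.

```python
from statistics import pvariance


class VarianceSelector:
    """Hypothetical stand-in for a feature selector (not featurewiz code)."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold
        self.features = None  # column indices chosen during fit

    def fit(self, rows):
        # Learn which columns to keep from the training rows only.
        columns = list(zip(*rows))
        self.features = [
            i for i, col in enumerate(columns) if pvariance(col) > self.threshold
        ]
        return self

    def transform(self, rows):
        # Replay the stored selection on any rows, including unseen ones.
        if self.features is None:
            raise RuntimeError("call fit() before transform()")
        return [[row[i] for i in self.features] for row in rows]


train = [[1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [3.0, 5.0, 0.0]]
selector = VarianceSelector().fit(train)

# New data arriving later is scored with the already-fitted selector.
new_rows = [[9.0, 5.0, 1.0]]
scored_input = selector.transform(new_rows)
```

With a split like this, the fitted object can be kept alongside the model and reused at scoring time.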
from featurewiz.
Hi Nemar3:
These are very good suggestions. Let me see what I can do.
Thanks
Ram
from featurewiz.
Hi @rsesha, any progress on a fit_transform(train) and transform(test) class for the featurewiz library? When should we expect it?
from featurewiz.
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from featurewiz import featurewiz

class FeatureWiz(BaseEstimator, TransformerMixin):
    def __init__(self):
        self.features = None

    def fit(self, X, y):
        # Build dataframe. The column names are numbers represented as strings
        df = pd.DataFrame(X, columns=[str(i) for i in range(X.shape[1])])
        # Add a column with binarized labels
        target_col = str(X.shape[1])
        df[target_col] = y
        # Select features using featurewiz
        features, _ = featurewiz(df, target_col)
        # Convert the remaining column names back to integers and drop the
        # column of labels
        self.features = [int(s) for s in np.squeeze(features)][:-1]
        return self

    def transform(self, x):
        # Keep only the columns selected during fit
        return x[:, self.features]
from featurewiz.
Hi @chrico-bu-uab 👍
Hahaha. I love your thoughtfulness 💯 I would have created this kind of code (class) long, long ago if one could just turn the hundreds of things that featurewiz does into a transformer mixin.
There are 2 ways to solve the problem you all have mentioned here:
1. Keep featurewiz as rich and full-featured as it is: You can't just throw featurewiz into an existing TransformerMixin class. It will blow up... The reason is that you would need to containerize the entire featurewiz library to do what you are suggesting here. But if you think that is not a problem for many folks like yourself, I can try to offer this version.
2. Create a featurewiz-lite version: I am working on a new version of featurewiz, based on the TransformerMixin above, that takes in a dataset, transforms it entirely into numeric variables, and then selects the best features from it using recursive XGBoost. It is a fully scikit-learn compatible Pipeline object, which I call featurewiz_lite, so it will work in any Python data pipeline you create and won't need a special container. However, this version cannot do SULOV, since SULOV relies on the networkx library (graph networks), and that is an extremely complicated piece of code to make work here. Do you think that featurewiz-lite will be noteworthy and helpful to data and ML engineers?
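As a rough illustration of the shape option 2 describes (encode everything to numbers, then select features recursively inside a scikit-learn Pipeline), here is a sketch under stated assumptions. It is not the featurewiz_lite implementation: scikit-learn's `RFE` with a `GradientBoostingClassifier` is swapped in for recursive XGBoost, SULOV is not included, and the column names and data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Synthetic data (an assumption for illustration): only "signal" drives the target.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "city": rng.choice(["a", "b", "c"], size=n),
    "signal": rng.normal(size=n),
    "noise_1": rng.normal(size=n),
    "noise_2": rng.normal(size=n),
})
y = (X["signal"] > 0).astype(int)

# Step 1: turn the categorical column into numbers; numeric columns pass through.
# Output column order is: city, signal, noise_1, noise_2.
encode = ColumnTransformer(
    [("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), ["city"])],
    remainder="passthrough",
)

# Step 2: recursive feature elimination with a gradient-boosted model
# (a stand-in for recursive XGBoost).
select = RFE(
    GradientBoostingClassifier(n_estimators=25, random_state=0),
    n_features_to_select=2,
)

lite = Pipeline([("encode", encode), ("select", select)])

X_train, X_test = X.iloc[:150], X.iloc[150:]
y_train = y.iloc[:150]

train_selected = lite.fit_transform(X_train, y_train)  # learn encoding + selection
test_selected = lite.transform(X_test)                 # reuse both on new data
kept_mask = lite.named_steps["select"].support_        # which encoded columns survived
```

Because the whole thing is a single Pipeline, it can be fitted once and then applied to test or production data with `transform`, which is the behaviour requested earlier in this thread.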
Looking for comments and feedback on the above.
AutoViML
from featurewiz.
Thanks, and point taken! I had just hacked something together (have only lightly tested it) and thought I would share. I would definitely be in favor of a featurewiz_lite version. The full version has so many "bells and whistles" (which is a great thing) that it wouldn't be a huge loss if a little of that functionality were sacrificed in order to have a scikit-learn compatible version.
As they say, "A designer knows he has achieved perfection, not when there is nothing left to add, but when there is nothing left to take away."
from featurewiz.
Hi @chrico-bu-uab, @rankwe, @nemar3:
👍
You are in luck! I was finally able to create a Transformer class out of featurewiz, called FeatureWiz. You can now use it to perform feature selection with the fit and transform syntax of scikit-learn, as follows:
from featurewiz import FeatureWiz
features = FeatureWiz(corr_limit=0.70, feature_engg='', category_encoders='', dask_xgboost_flag=False, nrows=None, verbose=2)
X_train_selected = features.fit_transform(X_train, y_train)
X_test_selected = features.transform(X_test)
features.features ### provides the list of selected features ###
You will get a Transformer that can select the top variables from your dataset.
You must first upgrade your featurewiz version to 0.0.90 or higher via:
pip install featurewiz --upgrade
Hope you like it. Please provide your feedback and comments here.
AutoViML
from featurewiz.