Is it possible to use a "fitted" transformer and evaluate a new (however, similar data

DOC: Reusing generated model specifications about formulaic HOT 11 CLOSED

matthewwardrop commented on July 19, 2024

DOC: Reusing generated model specifications

from formulaic.

Comments (11)

matthewwardrop commented on July 19, 2024

@TELSER1 This has been fixed and pushed out in v0.5.0.

frederik-plum-hauschultz commented on July 19, 2024 1

Thank you for this! I was btw drawn to this package not so much due to performance (which seems to be 10x faster than patsy on my setup) but the fact that it can be pickled.

matthewwardrop commented on July 19, 2024

Hi @petrhrobar ,

Yes... this is easily done with the current state of formulaic.

You can just do:

import pandas
from formulaic import Formula

df = pandas.DataFrame({
    'y': [0,1,2],
    'x': ['A', 'B', 'C'],
    'z': [0.3, 0.1, 0.2],
})

trans = Formula('y ~ x + z')

mm1 = trans.get_model_matrix(df)

df2 = pandas.DataFrame({
    'y': [3, 3, 3],
    'x': ['A', 'B', 'B'],
    'z': [0.3, 0.1, 0.222222222],
})

mm2 = mm1.model_spec.get_model_matrix(df2)

Or, using the sugar method model_matrix in 0.3.4+:

from formulaic import model_matrix

mm2 = model_matrix(mm1, df2)

Hope that helps. I'll leave this open for documentation purposes until the docsite is updated.

petrhrobar commented on July 19, 2024

Thanks, Matthew!

I was actually able to figure it out myself yesterday from a previous issue about this topic. So I guess it kind of boils down to the documentation :/.

If I may make a recommendation, here is how I want to use this, as an sklearn component:

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from formulaic import Formula, model_matrix

class FormulaicTransformer(TransformerMixin, BaseEstimator):

    def __init__(self, formula):
        self.formula = formula

    def fit(self, X, y=None):
        """Build the model matrix once and keep the right-hand-side model spec."""
        self._trans = model_matrix(self.formula, X).model_spec.rhs
        return self

    def transform(self, X, y=None):
        """Apply the stored model spec to new data."""
        X_ = self._trans.get_model_matrix(X)
        return X_


pipe = Pipeline([
    ("formula", FormulaicTransformer("bs(yday, df=12) + wday + num_date")),
    ("scale", StandardScaler()),
    ("model", LinearRegression())
])

Because this persists the design info and can be pickled, it can be used as a proper sklearn component!
This is a badass feature!

matthewwardrop commented on July 19, 2024

Nice! And yes... documentation will come... eventually!

This is a really cool use of Formulaic :). Maybe something like this makes sense to bring into formulaic itself at some point; or perhaps even better, upstream into sklearn.

When used in libraries, though, I do recommend using Formula(...).get_model_matrix(...) since that way the compute context is explicitly established. When using model_matrix the default behaviour is to make the entire locals() and globals() context available to use in formulae. For local use, that's fine... for libraries, not so much.

frederik-plum-hauschultz commented on July 19, 2024

I have been attempting to use Formulaic and ran into the same issue. The above example does not seem to work anymore. Is there a different way of doing this now? Thanks in advance!

matthewwardrop commented on July 19, 2024

Hi @frederik-plum-hauschultz !

Since a while back (not sure if it was the case when I wrote this or not), the output of get_model_matrix() is a Structured instance that reflects the structure of the formula.

For example:

>>> import pandas
>>> from formulaic import Formula

>>> df = pandas.DataFrame({
    'y': [0,1,2],
    'x': ['A', 'B', 'C'],
    'z': [0.3, 0.1, 0.2],
})

>>> trans = Formula('y ~ x + z')

>>> mm1 = trans.get_model_matrix(df)
>>> mm1
.lhs:
       y
    0  0
    1  1
    2  2
.rhs:
       Intercept  x[T.B]  x[T.C]    z
    0        1.0       0       0  0.3
    1        1.0       1       0  0.1
    2        1.0       0       1  0.2

>>> mm1.model_spec
.lhs:
    <formulaic.model_spec.ModelSpec object at 0x7f0f43f6df10>
.rhs:
    <formulaic.model_spec.ModelSpec object at 0x7f0f43f6df40>

It isn't possible in 0.3.x to call methods of the nested structure directly. I might add that in 0.4.x, but am not yet 100% convinced it is a good idea (maybe about 95% at the moment, leaning toward doing it, in which case it will appear in 0.4.0 shortly; feel free to nudge me if you like the idea).

In the meantime you can do:

df2 = pandas.DataFrame({
    'y': [3, 3, 3],
    'x': ['A', 'B', 'B'],
    'z': [0.3, 0.1, 0.222222222],
})

mm2 = mm1.model_spec.rhs.get_model_matrix(df2)

or, if you want both the lhs and rhs bits done in one step:

>>> mm1.model_spec._map(lambda spec: spec.get_model_matrix(df2))
.lhs:
       y
    0  3
    1  3
    2  3
.rhs:
       Intercept  x[T.B]  x[T.C]         z
    0        1.0       0       0  0.300000
    1        1.0       1       0  0.100000
    2        1.0       1       0  0.222222

Hope that helps.

TELSER1 commented on July 19, 2024

How would this work if I wanted a sparse output?

If I set
mm1 = trans.get_model_matrix(df, output='sparse')

I get back the expected sparse matrices, and the model specs are

.lhs:
    ModelSpec(formula=y, materializer='pandas', ensure_full_rank=True, na_action=<NAAction.DROP: 'drop'>, output='sparse', structure=[EncodedTermStructure(term=y, scoped_terms=[y], columns=['y'])], transform_state={}, encoder_state={'y': (<Kind.NUMERICAL: 'numerical'>, {})})
.rhs:
    ModelSpec(formula=1 + x + z, materializer='pandas', ensure_full_rank=True, na_action=<NAAction.DROP: 'drop'>, output='sparse', structure=[EncodedTermStructure(term=1, scoped_terms=[1], columns=['Intercept']), EncodedTermStructure(term=x, scoped_terms=[x-], columns=['x[T.B]', 'x[T.C]']), EncodedTermStructure(term=z, scoped_terms=[z], columns=['z'])], transform_state={}, encoder_state={'x': (<Kind.CATEGORICAL: 'categorical'>, {'categories': ['A', 'B', 'C']}), 'z': (<Kind.NUMERICAL: 'numerical'>, {})})

However, when reusing the model spec:

mm1.model_spec.rhs.get_model_matrix(df2)

A dataframe is returned, and I can't pass output='sparse'.

matthewwardrop commented on July 19, 2024

Hi @TELSER1 !

Thanks for reaching out! There was a regression introduced in 0.4.0, whereby ModelSpec.output is not respected, that I will be fixing shortly (hopefully tonight; I added #102 to track it). You can work around this by rolling back to 0.3.x, or by using:

from formulaic import model_matrix

model_matrix(model_spec, <new data>, output='sparse')

or:

from formulaic.materializers.pandas import PandasMaterializer

PandasMaterializer(<new data>).get_model_matrix(model_spec, output='sparse')

Hope that helps!

TELSER1 commented on July 19, 2024

Thanks for the help! In the spirit of Frederik's comment, I'm particularly interested in the serializability and sparse output functionality; I am trying to estimate some large, sparse regression models.

matthewwardrop commented on July 19, 2024

@TELSER1 Thanks for the context! That's largely why I wrote formulaic too :).
