
Is it possible to define custom operators? (formulaic, open issue)

s3alfisc commented on July 27, 2024

Is it possible to define custom operators?


Comments (2)

matthewwardrop commented on July 27, 2024

Hi @s3alfisc,

Thanks for looping me in. Currently this is not directly possible in stock Formulaic :(. The closest you can do today is something like:

import pandas
import re

from formulaic import Formula
from formulaic.utils.stateful_transforms import stateful_transform

@stateful_transform
def varlist(pattern, _context=None):
    pattern = re.compile(pattern)
    return {
        variable: values
        for variable, values in _context.named_layers.get("data", {}).items()
        if pattern.match(variable)
    }

data = pandas.DataFrame({"X1": [1, 2, 3], "X2": [1, 2, 3]})
Formula("varlist('X.*')").get_model_matrix(data, context={"varlist": varlist})

which would output:

This is equivalent to the additive terms you demonstrate above (with ugly naming), but it would not work so well for interactions: varlist('X.*') multiplied by itself would collapse into a single term, and even if you aliased varlist so that you had two versions of it, the materialized product would include duplicate X* features and cross-products like X*:X*.
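To make the collapsing behaviour concrete, here is a toy model in plain Python. It assumes a term is an unordered set of distinct factors and that a * b means a + b + a:b; the helpers interact and star are hypothetical and are not Formulaic's actual code.

```python
from itertools import product


def interact(left, right):
    """Cross two sets of terms; each term is a frozenset of factor names."""
    return {a | b for a, b in product(left, right)}


def star(left, right):
    """Toy version of `a * b`, i.e. a + b + a:b, with duplicate terms merged."""
    return left | right | interact(left, right)


# The parser only ever sees one opaque factor, so the product collapses.
varlist = {frozenset({"varlist('X.*')"})}
collapsed = star(varlist, varlist)

# If the column names were known up front, the product would keep its cross term.
expanded = {frozenset({"X1"}), frozenset({"X2"})}
crossed = star(expanded, expanded)

print(len(collapsed))  # a single term: the varlist factor on its own
print(sorted(sorted(term) for term in crossed))
```

In this model the self-product of the single varlist factor is just that factor again, whereas the explicit columns yield X1, X2 and X1:X2.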

Thinking through how additions to Formulaic could improve this: we are limited by the fact that the formula parser intentionally has no awareness of the dataset for which model matrices will later be generated. So what we would need is support for rewriting formulae during materialization. Since we evaluate all of the factors prior to substituting them, a transform could, for example, return a new nested Formula as its output, which we would then expand, evaluating its factors recursively until everything resolves. To avoid the collapsing issue we would need to add new syntax like y ~ !varlist('X*') * !varlist('X*'), where the ! indicates that the factors should not be collapsed during formula parsing. These nested formulae would be "expanded" into the parent formula, so naming would be a lot cleaner. The generated model matrices and specs would have no idea that the expansion happened (we would just hard-code the new formula into the specs).

So... that would work, and wouldn't be too hard to implement, but it does increase the complexity. Next question: is it worth it? Would it be helpful?
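Until something like that exists, the rewriting step can be approximated outside Formulaic by expanding the formula string before it is parsed. The sketch below is a pre-processing workaround, not the nested-Formula mechanism described above. The function expand_varlists and the varlist('<regex>') marker it looks for are hypothetical, and it assumes the column names are valid identifiers that need no quoting.

```python
import re

# Hypothetical marker: varlist('<regex>') written inside the formula string.
VARLIST = re.compile(r"varlist\('([^']*)'\)")


def expand_varlists(formula, columns):
    """Replace each varlist('<regex>') with a parenthesised sum of matching columns."""

    def substitute(match):
        pattern = re.compile(match.group(1))
        # Same matching rule as the transform above: match from the start of the name.
        names = [name for name in columns if pattern.match(name)]
        if not names:
            raise ValueError(f"no columns match {match.group(1)!r}")
        return "(" + " + ".join(names) + ")"

    return VARLIST.sub(substitute, formula)


expanded = expand_varlists(
    "y ~ varlist('X.*') * varlist('X.*')",
    ["y", "X1", "X2", "Z"],
)
print(expanded)  # y ~ (X1 + X2) * (X1 + X2)
```

The resulting string can then be passed to Formula(...) as usual. The trade-off is that the expansion is tied to one particular set of columns, which is exactly the dataset awareness the parser deliberately avoids.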


matthewwardrop commented on July 27, 2024

A slightly less general variant of the above is to add specific syntax for this kind of operation. Something like:

y ~ {:X.*}

Here we would leverage the existing Python code quoting and special-case "Python" snippets that start with !, in much the same way as described above, but specifically for this expansion purpose. This is somewhat thematically aligned with #175, where the request is to add the . operator that expands to all unused features in the data.

