
Comments (2)

matthewwardrop commented on July 27, 2024

Hi @s3alfisc,

Thanks for looping me in. Currently this is not directly possible in stock Formulaic :(. The closest you can do today is something like:

import re

import pandas
from formulaic import Formula
from formulaic.utils.stateful_transforms import stateful_transform


@stateful_transform
def varlist(pattern, _context=None):
    # Return every variable in the "data" layer whose name matches `pattern`.
    pattern = re.compile(pattern)
    return {
        variable: values
        for variable, values in _context.named_layers.get("data", {}).items()
        if pattern.match(variable)
    }


data = pandas.DataFrame({"X1": [1, 2, 3], "X2": [1, 2, 3]})
Formula("varlist('X.*')").get_model_matrix(data, context={"varlist": varlist})

which would output:

[image: the resulting model matrix]

This is equivalent to the additive terms you demonstrate above (with ugly naming), but it would not work so well for interactions: varlist('X.*') multiplied by itself would collapse into a single term, and even if you aliased varlist so that you had two versions of it, the materialized product would include duplicate X* features and cross-products like X*:X*.
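To make the collapsing problem concrete, here is a toy model of the formula algebra. It is only a sketch and not Formulaic's actual parser: it assumes a term is a set of factor names, and that `a * b` expands to `a + b + a:b`. The `multiply` helper is hypothetical.

```python
from itertools import product


def multiply(left, right):
    """Toy version of the `*` operator: a * b = a + b + a:b.

    Each term is a frozenset of factor names, so a factor interacted
    with itself collapses back into the same term.
    """
    interactions = {l | r for l, r in product(left, right)}
    return set(left) | set(right) | interactions


# A single opaque factor multiplied by itself collapses to one term.
opaque = {frozenset({"varlist('X.*')"})}
collapsed = multiply(opaque, opaque)

# Explicit column names keep the cross term X1:X2.
explicit = {frozenset({"X1"}), frozenset({"X2"})}
expanded = multiply(explicit, explicit)
```

Under this model the parser only ever sees one factor named `varlist('X.*')`, so there is nothing left to interact by the time the data is known.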

Thinking through how this could be improved by additions to Formulaic: we are limited by the fact that the formula parser intentionally has no awareness of the dataset for which model matrices will be generated later on. So what we would need is support for rewriting formulae during materialization. Since we evaluate all of the factors prior to substituting them, we could, for example, return a new nested Formula as the output of a transform, which we would then expand, evaluating the factors recursively until things resolve. To avoid the collapsing issue we would need to add a new syntax like y ~ !varlist('X.*') * !varlist('X.*'), where the ! indicates that the factors should not be collapsed during formula parsing. These nested formulae would be "expanded" into the parent formulae, and so naming would be a lot cleaner. The generated model matrices and specs would have no idea that the expansion happened (we would just hard-code the new formula into the specs).

So... that would work, and wouldn't be too hard to implement, but it does increase the complexity. Next question: is it worth it? Would it be helpful?
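Until something like this exists, the expansion can be approximated outside Formulaic by rewriting the formula string once the column names are known. The sketch below is an assumption-laden illustration, not a Formulaic feature: `expand_markers` is a hypothetical helper, the `!varlist('...')` marker is the proposed syntax rather than something the parser accepts today, and column names are assumed to be plain identifiers that need no quoting.

```python
import re

# Hypothetical marker from the proposal: !varlist('<regex>')
_MARKER = re.compile(r"!varlist\('([^']*)'\)")


def expand_markers(formula, columns):
    """Replace each !varlist('<regex>') with a parenthesised sum of
    the column names that the regex matches."""

    def replace(match):
        pattern = re.compile(match.group(1))
        matched = [name for name in columns if pattern.match(name)]
        if not matched:
            raise ValueError(f"no columns match {match.group(1)!r}")
        return "(" + " + ".join(matched) + ")"

    return _MARKER.sub(replace, formula)


rewritten = expand_markers(
    "y ~ !varlist('X.*') * !varlist('X.*')",
    ["X1", "X2", "y"],
)
# rewritten == "y ~ (X1 + X2) * (X1 + X2)"
```

The rewritten string contains only ordinary column names, so it could then be passed to Formula as usual, and the resulting spec would carry no trace of the expansion.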


matthewwardrop commented on July 27, 2024

A slightly less general variant of the above is to add specific syntax for this kind of operation. Something like:

y ~ {:X.*}

Here we would leverage the existing Python code quoting and special-case "Python" snippets that start with !, in much the way we describe above, but specifically for this expansion purpose. This is somewhat thematically aligned with #175, where it is desired that we add the . operator that expands to all unused features in the data.
