
Comments (9)

matthewwardrop commented on July 19, 2024

@johndburger As a pretty big hack, in case this is blocking you, you can do the following (which will register numpy as something to be looked up globally):

import numpy
import formulaic.materializers.transforms

formulaic.materializers.transforms.TRANSFORMS['np'] = numpy

This is currently considered an internal API, but perhaps it should be promoted to a public-facing one.

Also note that this will break in 0.3.0, because the transforms module is moving to the top-level of the formulaic package.
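Since the registry is expected to move, one way to keep the workaround alive across versions is to probe both module paths. This is only a sketch: `find_attr` is a hypothetical helper, and `formulaic.transforms` is an assumed post-0.3.0 location inferred from "top-level of the formulaic package", not a confirmed path.

```python
import importlib


def find_attr(module_names, attr):
    """Return `attr` from the first importable module in `module_names` that defines it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # this module path does not exist in the installed version
        if hasattr(module, attr):
            return getattr(module, attr)
    return None


# The second path is the one named in this thread; the first is an assumption
# about where the module lands after the move.
registry = find_attr(
    ["formulaic.transforms", "formulaic.materializers.transforms"],
    "TRANSFORMS",
)
if registry is not None:
    import numpy

    registry["np"] = numpy
```

If neither path resolves, `registry` is `None` and nothing is patched, so the snippet degrades quietly instead of raising at import time.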

from formulaic.

matthewwardrop commented on July 19, 2024

NumPy is now always available as np (along with a few other helpful transforms like log and exp); see e8173f9 for more details (or the grammar docs).

And once PR #63 merges, you'll have the helper function you were looking for, @CamDavidsonPilon. Apologies for the lead time on this.


matthewwardrop commented on July 19, 2024

Hi @CamDavidsonPilon!

That's a great question. I'm also curious how things worked for you in the past (this should also have been a problem with patsy, no?).

In Formulaic, we have it easy: model_matrix is syntactic sugar that is never used internally, so it can always just capture the calling environment and do what people intuitively expect. Inside a library, though, the calling environment would be the library's own frame rather than the user's, so it's better not to use model_matrix there; your approach is the recommended one (directly call Formula().get_model_matrix()).

As for how to get the right context in lifelines... that's a little trickier, especially if your classes are themselves nested in other libraries. If we can make the assumption that the right environment to capture is the one in which the constructor of your model class is called (e.g. CoxPHFitter()), then you should probably add some code that captures the environment there, and pass it down to formulaic. I'm happy to add some utility code in Formulaic to make that bit easier (and to help others facing the same problem). Something like:

class CoxPHFitter:

    def __init__(self, ..., formula_context=None):
        if formula_context is None:
            formula_context = capture_caller_context(offset=...)
        # maybe add this as an instance attribute?
....
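To make the idea concrete, here is a minimal, dependency-free sketch of what such a helper could look like. `capture_caller_context` below is a hypothetical stand-in written with the standard `inspect` module, not an existing Formulaic function; the meaning of `offset` (extra frames to skip) and the `Model` class are assumptions for illustration.

```python
import inspect


def capture_caller_context(offset=0):
    """Return the names visible to the caller of the function that called this helper.

    Hypothetical helper: `offset` skips that many additional frames.
    """
    frame = inspect.currentframe()
    try:
        # One step reaches the function that called this helper; the second
        # step (plus `offset`) reaches the code that called that function.
        for _ in range(2 + offset):
            if frame.f_back is None:
                break
            frame = frame.f_back
        context = dict(frame.f_globals)
        context.update(frame.f_locals)  # locals shadow globals
        return context
    finally:
        del frame  # avoid keeping a reference cycle through the frame


class Model:
    """Stand-in for a class such as CoxPHFitter."""

    def __init__(self, formula_context=None):
        if formula_context is None:
            formula_context = capture_caller_context()
        self.formula_context = formula_context


def user_code():
    my_transform = abs  # only visible inside this function
    return Model()


model = user_code()
print("my_transform" in model.formula_context)
```

An explicitly passed `formula_context` always wins, which keeps the implicit capture as a convenience rather than the only path.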

There are other much hackier solutions, including:

  • Always parse the stack to find the first frame outside of lifelines or beyond some tagged frame (perhaps with a variable "LIFELINE_TAG") [this would likely work... but it is quite magic]
  • Same as above but looking for a specific tag. This requires users to add a variable or marker to their frame. [ick]
  • Always using the ipython namespace if it exists: get_ipython().user_ns (assumes people use IPython/Jupyter) [ick]
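For reference, the first stack-parsing option could be sketched roughly as follows. This is an illustration only: `context_outside` is a hypothetical name, `sys._getframe` is CPython-specific, and the package here is a fake one created with `exec` so the example runs in a single file.

```python
import sys


def context_outside(package):
    """Return the locals of the nearest calling frame defined outside `package`."""
    frame = sys._getframe(1)  # start at our immediate caller
    while frame is not None:
        module_name = frame.f_globals.get("__name__", "")
        inside = module_name == package or module_name.startswith(package + ".")
        if not inside:
            return dict(frame.f_locals)
        frame = frame.f_back
    return {}


# Simulate a function that lives in a module of the package "fakelib".
library_globals = {"__name__": "fakelib.fitters"}
exec("def fit(finder):\n    return finder('fakelib')\n", library_globals)


def user_code():
    marker = "user frame"
    return library_globals["fit"](context_outside)


found = user_code()
print(found.get("marker"))
```

The walk skips the `fit` frame because its module name starts with the package prefix, which is exactly the kind of implicit behaviour that makes this approach feel like magic.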

Otherwise, I'm unaware of a generic solution to this; and it does seem cleaner to pass things down explicitly rather than magically.


matthewwardrop commented on July 19, 2024

We could also allow Formula to capture an environment, and then upgrade incoming strings to Formula instances in the constructor above. My only reservation is that we'd then be mixing config and state (which we'd need to drop during serialisation).
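A rough sketch of that trade-off, with hypothetical names throughout (`ContextualFormula` and `as_formula` are not part of Formulaic): the formula string is treated as config and persisted, while the captured context is treated as state and dropped.

```python
import json


class ContextualFormula:
    """Hypothetical wrapper pairing a formula string with a captured context."""

    def __init__(self, formula, context=None):
        self.formula = formula        # config: safe to persist
        self.context = context or {}  # state: captured names, not persisted

    def to_json(self):
        # Only the config is serialised; contexts may hold modules or
        # lambdas that cannot be written out.
        return json.dumps({"formula": self.formula})

    @classmethod
    def from_json(cls, payload, context=None):
        return cls(json.loads(payload)["formula"], context=context)


def as_formula(formula, context):
    """Upgrade a plain string to a ContextualFormula; pass instances through."""
    if isinstance(formula, ContextualFormula):
        return formula
    return ContextualFormula(formula, context=context)


original = as_formula("y ~ scale(x)", {"scale": lambda v: v})
restored = ContextualFormula.from_json(original.to_json())
print(restored.formula, restored.context)
```

After a round trip the context is empty, so whoever restores the object has to supply a fresh one, which is the cost of mixing the two concerns in one class.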


CamDavidsonPilon commented on July 19, 2024

> I'm also curious how things worked for you in the past (this should also have been a problem with patsy, no?).

It's possible! I don't recall testing custom functions; maybe this bug has always been around.

> If we can make the assumption that the right environment to capture is the one in which the constructor of your model class is called (e.g. CoxPHFitter()),

I think this is a correct assumption - it's certainly the most common use case.

Given your proposed solutions, I like capture_caller_context best. I think pushing the responsibility of defining contexts onto formulaic makes the most sense, since you may change what a context means in the future (e.g. moving away from LayeredMappings).


johndburger commented on July 19, 2024

> I'm also curious how things worked for you in the past (this should also have been a problem with patsy, no?).

FWIW, I had import numpy as np in the script where I was calling CoxPHFitter.fit(formula="... + np.log10(x) ...") and it magically worked with patsy. Maybe patsy walked up the stack looking for the right context?


matthewwardrop commented on July 19, 2024

Hmm.... that's really odd. I don't see any logic for that in Patsy. Moreover, this code demonstrates that it has the same behaviour as formulaic in this respect:

import patsy

def get_matrix(formula, data):
    return patsy.dmatrix(formula, data)

def wrapped():
    import numpy as np
    return get_matrix('np.log(x)', {'x': [1,2,3]})

wrapped()  # NameError: name 'np' is not defined

Here we see that patsy's evaluation context does not include np, because it is just inheriting from its local/global context (rather than that of the caller).

Maybe in the past lifelines had imported numpy as np in the context that the design matrix was constructed?


CamDavidsonPilon commented on July 19, 2024

@matthewwardrop do you think it's silly to always add np to TRANSFORMS? Like make the below code part of formulaic:

formulaic.materializers.transforms.TRANSFORMS['np'] = numpy

(doesn't solve my original problem, but could make bugs like mine less likely)


matthewwardrop commented on July 19, 2024

@CamDavidsonPilon No, it's not silly. I'd idly contemplated doing something like this in the past, but decided it was "magic" and moved on. Given that we insert certain transforms anyway ('C', 'poly', 'bs', etc.), it doesn't seem unreasonable to always include np as well; perhaps even certain numpy functions directly, like 'log', 'exp', etc. I'll give it some thought, but I think it is a good idea; it just hasn't been a priority so far compared to sorting out the framework. I'll add some of these transforms by default for 0.3. #49
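One way to picture default transforms coexisting with user-supplied names is a layered lookup. The sketch below uses `collections.ChainMap` and the `math` module purely to stay dependency-free; it is not how Formulaic implements this, and both the `evaluate` helper and the lookup order (data, then user context, then defaults) are assumptions.

```python
import math
from collections import ChainMap

# Stand-ins for built-in transforms; the thread proposes numpy versions.
DEFAULT_TRANSFORMS = {"log": math.log, "exp": math.exp}


def evaluate(expression, data, user_context=None):
    """Evaluate `expression`, resolving names from data, user context, then defaults."""
    namespace = ChainMap(data, user_context or {}, DEFAULT_TRANSFORMS)
    # Builtins are emptied so only the layered namespace is consulted.
    return eval(expression, {"__builtins__": {}}, namespace)


print(evaluate("log(x)", {"x": 1.0}))                         # default transform
print(evaluate("log(x)", {"x": 8.0}, {"log": math.log2}))     # user override wins
```

Putting the defaults in the last layer means a user who defines their own `log` is never surprised by the built-in one.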

