Comments (9)
@johndburger As a pretty big hack, in case this is blocking you, you can do the following (which will register numpy as something to be looked up globally):
import numpy
import formulaic.materializers.transforms
formulaic.materializers.transforms.TRANSFORMS['np'] = numpy
This is currently considered an internal API feature, but perhaps it should be promoted to a public facing API.
Also note that this will break in 0.3.0, because the transforms module is moving to the top level of the formulaic package.
from formulaic.
NumPy is now always available as np (along with a few other helpful transforms like log and exp); see e8173f9 for more details (or the grammar docs).
And once PR #63 merges, you'll have the helper function you were looking for, @CamDavidsonPilon. Apologies for the lead time on this.
That's a great question. I'm also curious how things worked for you in the past (this should also have been a problem with patsy, no?).
In Formulaic, we have it easy, since model_matrix is syntactic sugar that is never used internally, so it can always just capture the calling environment and do what people intuitively expect. For obvious reasons it's better not to use model_matrix in a library, so your approach is the recommended one (directly call Formula().get_model_matrix()).
As for how to get the right context in lifelines... that's a little trickier, especially if your classes are themselves nested in other libraries. If we can make the assumption that the right environment to capture is the one in which the constructor of your model class is called (e.g. CoxPHFitter()), then you should probably add some code that captures the environment there, and pass it down to formulaic. I'm happy to add some utility code in Formulaic to make that bit easier (and to help others facing the same problem). Something like:
class CoxPHFitter:
    def __init__(self, ..., formula_context=None):
        if formula_context is None:
            formula_context = capture_caller_context(offset=...)
        # maybe add this as an instance attribute?
        ....
There are other much hackier solutions, including:
- Always parse the stack to find the first frame outside of lifelines or beyond some tagged frame (perhaps with a variable "LIFELINE_TAG") [this would likely work... but it is quite magic]
- Same as above but looking for a specific tag. This requires users to add a variable or marker to their frame. [ick]
- Always using the IPython namespace if it exists: get_ipython().user_ns (assumes people use IPython/Jupyter) [ick]
Otherwise, I'm unaware of a generic solution to this; and it does seem cleaner to pass things down explicitly rather than magically.
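For concreteness, here is a minimal sketch of what such a helper could look like. The name capture_caller_context and its offset parameter come from the pseudocode above; the body, the Model class and user_code are assumptions made purely for illustration and are not Formulaic's or lifelines' actual API. It relies on sys._getframe, which is CPython-specific.

```python
import sys


def capture_caller_context(offset=0):
    """Hypothetical helper (not Formulaic's API): snapshot a caller's namespace.

    offset=0 captures the frame that called this function; each extra step
    moves one frame further up the stack.
    """
    # sys._getframe is CPython-specific; index 0 is this function's own frame.
    frame = sys._getframe(offset + 1)
    try:
        context = dict(frame.f_globals)
        context.update(frame.f_locals)  # locals shadow globals, as in normal lookup
        return context
    finally:
        del frame  # do not keep the frame alive longer than needed


class Model:
    """Stand-in for a library class such as a fitter."""

    def __init__(self, formula_context=None):
        if formula_context is None:
            # offset=1 skips __init__ itself and lands on whoever built the model
            formula_context = capture_caller_context(offset=1)
        self.formula_context = formula_context


def user_code():
    import math as m  # only visible inside this function
    scale = 10
    return Model()


model = user_code()
print(sorted(k for k in ("m", "scale") if k in model.formula_context))
```

The captured mapping is a snapshot taken at construction time, so names defined later in the caller's scope would not be visible unless the context is passed in explicitly.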
We could also allow Formula to capture an environment, and then convert incoming strings to Formula instances in the constructor above. My only reservation is that we'd then be mixing config and state (which we'd need to drop during serialisation).
I'm also curious how things worked for you in the past (this should also have been a problem with patsy, no?).
It's possible! I don't recall testing custom functions, maybe this bug has always been around.
If we can make the assumption that the right environment to capture is the one in which the constructor of your model class is called (e.g. CoxPHFitter()),
I think this is a correct assumption - it's certainly the most common use case.
Given your proposed solutions, I like capture_caller_context best. I think pushing the responsibility of defining contexts onto formulaic makes the most sense, since you may change what a context means in the future (e.g. moving away from LayeredMappings).
I'm also curious how things worked for you in the past (this should also have been a problem with patsy, no?).
FWIW, I had import numpy as np in the script where I was calling CoxPHFitter.fit(formula="... + np.log10(x) ...") and it magically worked with patsy. Maybe patsy walked up the stack looking for the right context?
Hmm.... that's really odd. I don't see any logic for that in Patsy. Moreover, this code demonstrates that it has the same behaviour as formulaic in this respect:
import patsy

def get_matrix(formula, data):
    return patsy.dmatrix(formula, data)

def wrapped():
    import numpy as np
    return get_matrix('np.log(x)', {'x': [1,2,3]})

wrapped()  # NameError: name 'np' is not defined
Here we see that patsy's evaluation context does not include np, because it is just inheriting from its local/global context (rather than that of the caller).
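The same scoping behaviour can be reproduced without either library. The following is only a toy sketch: evaluate_term is a hypothetical stand-in for a formula engine, and math stands in for numpy so the example needs nothing beyond the standard library. It shows the implicit lookup failing and the explicit pass-down working.

```python
def evaluate_term(expr, data, context=None):
    """Toy stand-in for a formula engine evaluating one term."""
    # Names resolve against `data` first, then `context`; the caller's local
    # variables are not consulted unless they are passed in explicitly.
    namespace = dict(context or {})
    namespace.update(data)
    return eval(expr, {}, namespace)


def wrapped_implicit():
    import math as m  # local to this function, invisible to evaluate_term
    return evaluate_term("m.sqrt(x)", {"x": 9})


def wrapped_explicit():
    import math as m
    return evaluate_term("m.sqrt(x)", {"x": 9}, context={"m": m})


try:
    wrapped_implicit()
except NameError as exc:
    print("implicit lookup failed:", exc)

print("explicit context:", wrapped_explicit())
```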
Maybe in the past lifelines had imported numpy as np in the context in which the design matrix was constructed?
@matthewwardrop do you think it's silly to always add np to TRANSFORMS? Like, make the code below part of formulaic:
formulaic.materializers.transforms.TRANSFORMS['np'] = numpy
(doesn't solve my original problem, but could make bugs like mine less likely)
@CamDavidsonPilon No, it's not silly. I'd idly contemplated doing something like this in the past without much resolve, but decided it was "magic" and moved on. Given that we insert certain transforms anyway ('C', 'poly', 'bs', etc.), it doesn't seem unreasonable to always include np as well, and perhaps even some numpy functions directly, like 'log', 'exp', etc. I'll give it some thought, but I think it is a good idea; it's just never been a priority until now compared to sorting out the framework. I'll add some of these transforms by default for 0.3. #49
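As a rough illustration of how default transforms could sit underneath user-supplied names, here is a sketch using collections.ChainMap. DEFAULT_TRANSFORMS and build_namespace are hypothetical names, not Formulaic's implementation, and math stands in for numpy to keep the example free of third-party dependencies.

```python
import math
from collections import ChainMap

# Hypothetical defaults; `math` stands in for numpy in this sketch.
DEFAULT_TRANSFORMS = {"np": math, "log": math.log, "exp": math.exp}


def build_namespace(data, user_context=None):
    # Earlier mappings win: data columns, then the user's context, then defaults.
    return ChainMap(dict(data), dict(user_context or {}), DEFAULT_TRANSFORMS)


# With no user context, the defaults resolve `np`, `log` and `exp`.
ns = build_namespace({"x": 1.0})
print(eval("np.log(x) + exp(0)", {}, ns))

# A user-supplied name shadows the default of the same name.
custom = build_namespace({"x": 1.0}, {"log": lambda v: "user-defined"})
print(eval("log(x)", {}, custom))
```

Keeping the defaults in the lowest layer means anything a user defines still takes precedence, which limits how "magic" the built-ins feel.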