Comments (7)
Hi @teucer! Actually, this is behaving as expected. In Wilkinson formulae, -1 and +0 both have the effect of removing the intercept. Is there something I'm missing?
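To make the convention concrete, here is a minimal pure-Python sketch of how -1 and +0 both switch off the implicit intercept. The `split_terms` helper is hypothetical and only understands simple additive right-hand sides; it is an illustration of the rule, not how formulaic actually parses formulae.

```python
import re


def split_terms(rhs: str) -> tuple[bool, list[str]]:
    """Toy reading of an additive right-hand side such as "x + z - 1".

    Returns (has_intercept, terms). Illustration only: this is not
    how formulaic parses formulae.
    """
    has_intercept = True  # the intercept is implicit unless removed
    terms: list[str] = []
    # Split into (sign, token) pairs; a missing sign counts as "+".
    for sign, token in re.findall(r"([+-]?)\s*([A-Za-z_]\w*|\d+)", rhs):
        adding = sign != "-"
        if token == "1":
            has_intercept = adding  # "+ 1" keeps it, "- 1" removes it
        elif token == "0":
            if not adding:
                raise ValueError("'- 0' is outside this sketch")
            has_intercept = False  # "+ 0" removes it
        elif token.isdigit():
            raise ValueError(f"unsupported numeric term: {token}")
        elif adding and token not in terms:
            terms.append(token)
        elif not adding and token in terms:
            terms.remove(token)
    return has_intercept, terms


print(split_terms("x"))      # (True, ['x'])
print(split_terms("x - 1"))  # (False, ['x'])
print(split_terms("x + 0"))  # (False, ['x'])
```

Both spellings yield the same term list with the intercept flag cleared, which is why the two formulae produce the same model matrix.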
from formulaic.
We want to be able to say model_matrix(rhs). But this would add back the intercept for the 2nd and 3rd cases.
Maybe model_matrix should handle these cases:
1. y~x
2. ~x
3. x
There could be a difference between 2 and 3. Right now they are handled the same way.
Hmmmm... I'm a bit confused as to what you are trying to achieve here.
There was a bug in formulaic.model_matrix related to ingesting pre-processed formulae that is now fixed, and will go out in the next point release, but basically you should be able to do this:
>>> import pandas
>>> from formulaic import Formula, model_matrix
>>> f = Formula("y ~ x - 1")
>>> f.rhs
x
>>> model_matrix(f.rhs, pandas.DataFrame({"x": [1,2,3]}))
   x
0  1
1  2
2  3
Why are you re-processing string representations of the right hand side?
The issue is that in the current version
>>> model_matrix(f.rhs, pandas.DataFrame({"x": [1,2,3]}))
# -> FormulaInvalidError
>>> model_matrix("x", pandas.DataFrame({"x": [1,2,3]}))
   Intercept  x
0        1.0  1
1        1.0  2
2        1.0  3
Given the bug, we were trying to do it with the string instead.
The more fundamental question is whether there should be a difference:
1. between Formula("~x") and Formula("x")
2. if f = Formula("y ~ x - 1"), between model_matrix("x", df) and model_matrix(f.rhs, df)
My opinion:
1. Yes -> the first one is with intercept, the second one without
2. No -> see point 1.
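The two options under discussion can be sketched in plain Python. The `classify` and `implicit_intercept` helpers and the `distinguish_bare` flag are hypothetical names used only to illustrate the proposal; they are not part of formulaic.

```python
def classify(formula: str) -> str:
    """Label a formula string as 'two-sided', 'one-sided' or 'bare'."""
    lhs, tilde, _rhs = formula.partition("~")
    if not tilde:
        return "bare"  # e.g. "x"
    return "two-sided" if lhs.strip() else "one-sided"  # "y~x" vs "~x"


def implicit_intercept(formula: str, distinguish_bare: bool) -> bool:
    """Decide whether an intercept is added implicitly.

    distinguish_bare=True models the proposal above (a bare "x" gets
    no intercept); distinguish_bare=False treats "~x" and "x" alike.
    """
    if distinguish_bare and classify(formula) == "bare":
        return False
    return True


for formula in ("y~x", "~x", "x"):
    print(
        formula,
        classify(formula),
        implicit_intercept(formula, distinguish_bare=True),
        implicit_intercept(formula, distinguish_bare=False),
    )
```

Under the proposal only the bare form loses its intercept; with the flag off, all three forms behave the same way.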
Hi again @teucer,
The bug is now fixed, and I can roll out a new version for you if this is blocking you; otherwise I'll wait a few more days so it can also include some categorical encoding enhancements.
Your proposal is not unreasonable, but for consistency with expectations set by patsy and R, I'm going to make the executive choice not to treat ~x and x differently.
However, formulaic does (under the hood) support suppression of the addition of the intercept. For example, you could do:
>>> from formulaic import Formula
>>> from formulaic.parser import FormulaParser
>>> Formula(FormulaParser().get_terms("x", include_intercept=False))
x
It might be sensible to just add the include_intercept option in the Formula constructor, if that would be helpful to you.
Having include_intercept in the constructor would be really helpful.
Hi again!
As we move toward 1.0.0 I'm going back through all the issues and deciding things one way or another. Here, I'm deciding not to include this in the Formula constructor; however, since this issue was created, the include_intercept option has moved to the parser constructor. So you can now do:
from formulaic import Formula
from formulaic.parser import DefaultFormulaParser
Formula("x", _parser=DefaultFormulaParser(include_intercept=False))
Due to modularity, and conflation with the Structured nature of Formula, adding this to the Formula constructor is no longer something I think would be a good idea.
Thanks again for sharing your thoughts and helping make Formulaic great!
I'll close this one out for now.