Comments (5)
Hi @bashtage ! Thanks for reaching out!
Formulaic made a design decision way-back-when to always sort terms and factors such that equivalent formulae behave identically and always generate the same results. You can read more about this here: https://matthewwardrop.github.io/formulaic/guides/formulae/#formula .
I can definitely see why this might be a bit annoying, though, if you are manually staring at regression reports, and are used to being able to search for features in the order you wrote them rather than lexically (though arguably it is easier to find lexically).
I'm willing to add support for disabling this, and perhaps even default to disabling this feature. I presume that the factors within a term could still be sorted?
from formulaic.
Hi Matthew,
First of all, I want to express my gratitude for developing the 'formulaic' package. It has been incredibly convenient to use in conjunction with the linearmodels package developed by Kevin.
I do have one suggestion for your package. Currently, when running linearmodels with the 'formulaic' package, the independent variables are ordered alphabetically by default. This differs from the 'statsmodels' package, which orders the variables as specified by the user in the formula. It would be much more user-friendly if the 'formulaic' package could order the variables the same way as 'statsmodels', since alphabetical ordering is unintuitive and makes it confusing to loop over and extract specific parameters and statistics.
I hope you find this suggestion helpful, and I appreciate your consideration in implementing this improvement in your package.
Thank you again for your hard work in developing the 'formulaic' package.
Best,
Dong Gil Kim
Hi @matthewwardrop,
I do think this would be a useful enhancement. For example, it is common to write a specification in which:
(a) The first few variables are actually of interest.
(b) The remaining variables are controls that are not of specific interest.
I also think that expanding a specification can be tricky when the order is not preserved, since one has to figure out the position in the output to get the relevant coefficients.
Hi @bashtage and @DongGilKim ,
Thanks for your patience. I never have as much time as I'd like to work on my projects :).
I'm sold. I think this is just a case of me having a momentary idea years ago and never revisiting it. Keeping the terms in the same order as input (grouped by interaction order, like R and patsy) makes sense to me, and there is basically no benefit to having a guaranteed order to the terms. I'll fix this shortly, just in time for the 1.0.0 release. Thanks for catching this in time :).
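To make the ordering rule concrete, here is a minimal pure-Python sketch of "input order, grouped by interaction order". It is only an illustration and not formulaic's actual implementation: terms are represented as plain strings with `:` separating factors, and `order_terms` and `fully_sorted` are hypothetical helper names. The second helper shows a fully sorted ordering for contrast.

```python
def degree(term):
    """Interaction order of a term: number of ':'-separated factors."""
    return len(term.split(":"))


def order_terms(terms):
    """Group terms by interaction order, keeping input order within each group.

    Python's sorted() is stable, so terms with the same degree stay in the
    order in which they were written.
    """
    return sorted(terms, key=degree)


def fully_sorted(terms):
    """For contrast: sort factors within each term, then sort the terms."""
    normalized = [":".join(sorted(term.split(":"))) for term in terms]
    return sorted(normalized, key=lambda term: (degree(term), term))


terms = ["z", "b:a", "a", "c:a:b", "c"]
print(order_terms(terms))   # ['z', 'a', 'c', 'b:a', 'c:a:b']
print(fully_sorted(terms))  # ['a', 'c', 'z', 'a:b', 'a:b:c']
```

With the first rule, the variables of interest written at the start of the formula stay at the start of their group; with the second, their position depends on their names.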
fwiw: When looking up indices in a library, I recommend using the model spec rather than relying on input order, since terms may expand to multiple columns (which in turn may be reduced in cardinality to keep things full rank).
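As a rough illustration of why name-based lookup is more robust than positional lookup, the sketch below uses plain Python data structures. The column names, estimates, and the `term_columns` mapping are all made up for the example; in practice this information would come from the model spec and the fitted model, and `params_for_term` is a hypothetical helper, not a formulaic or linearmodels function.

```python
# Hypothetical output columns for a formula like "y ~ x1 + group + x2",
# where the categorical term "group" expands to two columns (one level
# is dropped to keep the matrix full rank).
column_names = ["Intercept", "x1", "group[T.b]", "group[T.c]", "x2"]
params = [0.5, 1.2, -0.3, 0.8, 2.0]

# Hypothetical term -> columns mapping, standing in for what a model
# spec records about how each term was expanded.
term_columns = {
    "1": ["Intercept"],
    "x1": ["x1"],
    "group": ["group[T.b]", "group[T.c]"],
    "x2": ["x2"],
}


def params_for_term(term, term_columns, column_names, params):
    """Return {column name: estimate} for every column generated by a term."""
    position = {name: i for i, name in enumerate(column_names)}
    return {name: params[position[name]] for name in term_columns[term]}


print(params_for_term("group", term_columns, column_names, params))
# {'group[T.b]': -0.3, 'group[T.c]': 0.8}
print(params_for_term("x2", term_columns, column_names, params))
# {'x2': 2.0}
```

Here "x2" is the third variable in the formula but sits at column index 4, so counting positions from the formula would pick the wrong coefficient.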
I should have this done in a week or so.
Okay... so it wasn't too hard, and I had a bit of time to work on it... so I did. I'd love your thoughts about the remaining differences between patsy and formulaic ordering in #139 .