Comments (5)

matthewwardrop avatar matthewwardrop commented on July 21, 2024

Hi @bashtage ! Thanks for reaching out!

Formulaic made a design decision way-back-when to always sort terms and factors such that equivalent formulae behave identically and always generate the same results. You can read more about this here: https://matthewwardrop.github.io/formulaic/guides/formulae/#formula .
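To illustrate the idea of sorting terms and factors so that equivalent formulae give identical results, here is a minimal sketch in plain Python. It is not formulaic's parser or its actual sorting rule: the helpers `parse_terms` and `canonical_terms` are hypothetical, they only understand `+` and `:`, and the sort key (interaction degree, then factor names) is an assumption made for the example.

```python
def parse_terms(formula_rhs):
    """Split a right-hand side such as 'b + a:c' into terms (tuples of factor names)."""
    return [
        tuple(factor.strip() for factor in term.split(":"))
        for term in formula_rhs.split("+")
    ]


def canonical_terms(formula_rhs):
    """Return terms in a canonical order that does not depend on how they were written."""
    # Sort the factors inside each term so that 'c:a' and 'a:c' become the same term;
    # the set also drops duplicate terms.
    unique_terms = {tuple(sorted(term)) for term in parse_terms(formula_rhs)}
    # Assumed ordering for this sketch: interaction degree first, then lexical order.
    return sorted(unique_terms, key=lambda term: (len(term), term))


# Two spellings of the same model yield the same term list.
same = canonical_terms("b + a + c:a") == canonical_terms("a:c + a + b")
print(same)
print(canonical_terms("b + a + c:a"))
```

The benefit is reproducibility; the cost, as discussed in this thread, is that the output order no longer matches what the user typed.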

I can definitely see why this might be a bit annoying, though, if you are manually staring at regression reports, and are used to being able to search for features in the order you wrote them rather than lexically (though arguably it is easier to find lexically).

I'm willing to add support for disabling this, and perhaps even default to disabling this feature. I presume that the factors within a term could still be sorted?

from formulaic.

DongGilKim avatar DongGilKim commented on July 21, 2024

Hi Matthew,

First of all, I want to express my gratitude for developing the 'formulaic' package. It has been incredibly convenient to use in conjunction with the 'linearmodels' package developed by Kevin.

I do have one suggestion for your package. Currently, when running linearmodels with the 'formulaic' package, the independent variables are ordered alphabetically by default. This differs from the 'statsmodels' package, which orders the variables as specified by the user in the formula. It would be much more user-friendly if the 'formulaic' package could also order the variables in the same way as 'statsmodels', since alphabetical ordering is not intuitive and can make it confusing to loop over the results and extract specific parameters and statistics.

I hope you find this suggestion helpful, and I appreciate your consideration in implementing this improvement in your package.

Thank you again for your hard work in developing the 'formulaic' package.

Best,
Dong Gil Kim


bashtage avatar bashtage commented on July 21, 2024

Hi @matthewwardrop,

I do think this would be a useful enhancement. For example, it is common to write a specification in which:

(a) The first few variables are actually of interest.
(b) The remaining variables are controls that are not of specific interest.

I also think that expanding a specification can be tricky when the order is not preserved, since one has to figure out the position in the output to get the relevant coefficients.


matthewwardrop avatar matthewwardrop commented on July 21, 2024

Hi @bashtage and @DongGilKim ,

Thanks for your patience. I never have as much time as I'd like to work on my projects :).

I'm sold. I think this is just a case of me having a momentary idea years ago and never revisiting it. Keeping the terms in the same order as input (grouped by interaction order, like R and patsy) makes sense to me, and there is basically no benefit to having a guaranteed order to the terms. I'll fix this shortly, just in time for the 1.0.0 release. Thanks for catching this in time :).
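As a rough illustration of "same order as input, grouped by interaction order", the sketch below relies on Python's stable sort. It is not formulaic's implementation: `ordered_terms` is a hypothetical helper that only understands `+` and `:`, and it leaves the factors inside each term exactly as written.

```python
def ordered_terms(formula_rhs):
    """Keep terms in the order written, but group them by interaction degree."""
    terms = []
    for term in formula_rhs.split("+"):
        factors = tuple(factor.strip() for factor in term.split(":"))
        if factors not in terms:  # drop exact duplicates, keeping the first occurrence
            terms.append(factors)
    # sorted() is stable, so terms of equal degree keep their input order.
    return sorted(terms, key=len)


result = ordered_terms("z + b:a + x")
print(result)  # [('z',), ('x',), ('b', 'a')]
```

Here `z` stays ahead of `x` because it was written first, while the interaction `b:a` moves after all the main effects.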

fwiw: When looking up indices in a library, I recommend using the model spec rather than relying on input order, since terms may expand to multiple columns (which in turn may be reduced in cardinality to keep things full rank).
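A small sketch of why a lookup by term is more robust than a lookup by position. Everything here is hypothetical illustration data in plain Python, not formulaic's model spec API: the column names, the `term_of_column` mapping, and the `term_indices` helper are all assumptions made for the example.

```python
# Hypothetical design-matrix layout: the categorical term "group" expands to two
# columns (one level dropped to keep the matrix full rank).
columns = ["Intercept", "x", "group[T.b]", "group[T.c]", "z"]
term_of_column = ["1", "x", "group", "group", "z"]


def term_indices(term_of_column):
    """Map each term to the list of column positions it produced."""
    indices = {}
    for position, term in enumerate(term_of_column):
        indices.setdefault(term, []).append(position)
    return indices


# Hypothetical fitted coefficients, one per column.
params = [0.5, 1.2, -0.3, 0.8, 2.0]

idx = term_indices(term_of_column)
group_coefs = [params[i] for i in idx["group"]]
z_coef = params[idx["z"][0]]
print(group_coefs, z_coef)
```

Note that `z` is the fourth term written but sits in the fifth column, so counting positions from the formula text would pick up the wrong coefficient.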

I should have this done in a week or so.


matthewwardrop avatar matthewwardrop commented on July 21, 2024

Okay... so it wasn't too hard, and I had a bit of time to work on it... so I did. I'd love your thoughts about the remaining differences between patsy and formulaic ordering in #139 .

