Git Product home page Git Product logo

Comments (7)

matthewwardrop avatar matthewwardrop commented on July 19, 2024

Hi @teucer ! Actually this is behaving as expected. In Wilkinson formulae, -1 and +0 both have the effect of removing the intercept. Is there something I'm missing?

from formulaic.

teucer avatar teucer commented on July 19, 2024

We want to be able to say model_matrix(rhs). But this would add back the intercept for 2nd and 3rd cases.

Maybe model_matrix should handle it:

  1. y~x
  2. ~x
  3. x

There could be a difference between 2 and 3. Right now they are handled the same way.

from formulaic.

matthewwardrop avatar matthewwardrop commented on July 19, 2024

Hmmmm... I'm a bit confused as to what you are trying to achieve here.

There was a bug in formulaic.model_matrix related to ingesting pre-processed formula that is now fixed, and will go out in the next point release, but basically you should be able to do this:

>>> import pandas
>>> from formulaic import Formula, model_matrix
>>> f = Formula("y ~ x - 1")
>>> f.rhs
x
>>> model_matrix(f.rhs, pandas.DataFrame({"x": [1,2,3]}))
	x
0	1
1	2
2	3

Why are you re-processing string representations of the right hand side?

from formulaic.

teucer avatar teucer commented on July 19, 2024

The issue is that in the current version

>>> model_matrix(f.rhs, pandas.DataFrame({"x": [1,2,3]}))
# -> FormulaInvalidError

>>> model_matrix("x", pandas.DataFrame({"x": [1,2,3]}))
        Intercept  x
0	      1.0  1
1	      1.0  2
2	      1.0  3

Given the bug, we were trying to do with the string.

The more fundamental questions are, should there be a difference:

  1. between Formula("~x") and Formula("x")
  2. if f = Formula("y ~ x - 1"), between model_matrix("x", df) and model_matrix(f.rhs, df)

My opinion:

  1. Yes -> the first one is with intercept, the second one without
  2. No -> see point 1.

from formulaic.

matthewwardrop avatar matthewwardrop commented on July 19, 2024

Hi again @teucer ,

The bug is now fixed, and I can roll out a new version for you if this is blocking you; otherwise I'll wait a few more days so it can also include some categorical encoding enhancements.

Your proposal is not unreasonable, but for consistency with expectations set by patsy and R, I'm going to make the executive choice not to treat ~x and x differently.

However, formulaic does (under the hood) support suppression of the addition of the intercept. For example, you could do:

>>> from formulaic import Formula
>>> from formulaic.parser import FormulaParser
>>> Formula(FormulaParser().get_terms("x", include_intercept=False))
x

It might be sensible to just add the include_intercept option in the Formula constructor, if that would be helpful to you.

from formulaic.

teucer avatar teucer commented on July 19, 2024

Having include_intercept in the constructor would be really helpful.

from formulaic.

matthewwardrop avatar matthewwardrop commented on July 19, 2024

Hi again!

As we move toward 1.0.0 I'm going back through all the issues and deciding things one way or another. Here, I'm deciding not to include this in the Formula constructor; however, since this issue was created it was moved to the parser constructor. So you can now do:

from formulaic import Formula
from formulaic.parser import DefaultFormulaParser
Formula("x", _parser=DefaultFormulaParser(include_intercept=False))

Due to modularity, and conflation with the Structured nature of Formula, this is no longer something I think would be a good idea.

Thanks again for sharing your thoughts and helping make Formulaic great!

I'll close this one out for now.

from formulaic.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.