
Comments (3)

matthewwardrop commented on July 21, 2024

Hi Martin! Thanks for taking the time to report this bug.

This ought not to be possible because terms are ordered based on the formula, and the algorithm should in principle be deterministic after this. And I don't seem to be able to reproduce it. The following works fine for me:

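import formulaic
import numpy
import pandas

# Try several dataset sizes, each filled with randomly generated categorical data.
for n in (10, 100, 1000):
    df = pandas.DataFrame({
        "frog_type": numpy.random.choice(["stiff", "movable"], n),
        "is_cargo": numpy.random.choice([True, False], n),
    })
    # Build the model matrix, then re-apply its model spec to the same data, many times over.
    for i in range(1000):
        formulaic.model_matrix("C(frog_type):C(is_cargo)", df).model_spec.get_model_matrix(df)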

Do you have an example you can share of when this breaks?


martiningram commented on July 21, 2024

Thanks Matthew for replying so quickly, I really appreciate it!

I'll have to work on an example; it's slightly tricky because I won't be able to share the actual data I'm working with.

One question: the error originally appeared with formulaic 0.5.2. I save a pickle file, restore it, and then do these computations. The error also persisted with the latest formulaic version and this pickle file, but I am now wondering whether that is because the pickle file was saved with the old version. Basically, my question is: could this have been possible in 0.5.2 but fixed since?


matthewwardrop commented on July 21, 2024

Ah... yes. Version 0.6.0 changed the ordering of terms, as described in the release notes, so if you are using older model specs that will be problematic.

Also note that (as described in the docs here), reusing model specs from older versions of formulaic is not supported. In most cases, it will work fine, but there will be occasional issues which we do not plan to support at this time.
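One way to catch this kind of mismatch early is to store the library version next to the pickled object and check it on load. The following is only a minimal sketch using the standard library: the helper names (`minor_series`, `dump_with_version`, `load_checked`) are hypothetical and not part of formulaic, and treating a matching major.minor series as "compatible" is an assumption made for illustration, not a guarantee from the project.

```python
import pickle


def minor_series(version: str) -> tuple:
    """Return the (major, minor) part of a version string such as '0.5.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def dump_with_version(obj, library_version: str) -> bytes:
    """Pickle `obj` together with the library version that produced it."""
    return pickle.dumps({"library_version": library_version, "payload": obj})


def load_checked(blob: bytes, current_version: str):
    """Unpickle `blob`, refusing payloads saved under a different major.minor series.

    Assumption (illustrative only): objects are treated as reusable when the
    major.minor series matches, e.g. 0.5.2 and 0.5.3, but not 0.5.2 and 0.6.0.
    """
    record = pickle.loads(blob)
    saved_version = record["library_version"]
    if minor_series(saved_version) != minor_series(current_version):
        raise RuntimeError(
            f"Object was saved with version {saved_version}, "
            f"but version {current_version} is installed; rebuild it instead."
        )
    return record["payload"]
```

In practice the payload would be the saved model spec, and the version string could come from `importlib.metadata.version("formulaic")` at both save and load time. On a mismatch, the safer route is to regenerate the model spec from the original formula and data rather than reuse the old pickle.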

If things consistently fail for the same formula/model spec, that is expected here. If it stochastically fails, then this is more concerning, since both 0.5.x and 0.6.x have stable (but different) term ordering strategies.

I'll close this one out for now, but feel free to reopen if you think there are any remaining issues not dealt with above!

And thanks again!
