
Comments (3)

kbattocchi commented on July 28, 2024

The only relevant change I can think of is that the default first-stage propensity and regression models now perform model selection between linear and forest models instead of always using a linear model.

We made this change because the accuracy of the CATE estimate depends strongly on having good first-stage models, and for many datasets we'd expect forest models to fit the data much better. In general, this has not resulted in large slowdowns in our own internal testing, but perhaps you have a much larger number of rows or columns than we've been testing on. What are the shapes of your Y, T, X, and W inputs?

If fitting forest models is the cause of the slowdown, you can explicitly pass first-stage models of your choice instead. However, as I mentioned, it is important to use models that can actually fit your data well if you want accurate CATE estimates, so I would only fall back on linear models if you are confident that they have good predictive power in your setting.
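One way to check whether linear first-stage models are good enough before falling back to them is to compare candidates on held-out predictive performance and fit time. The following is a minimal sketch using scikit-learn only, not EconML's own selection logic. The data is synthetic, and the names `XW`, `candidates`, and `evaluate` are hypothetical. The treatment is assumed to be binary.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real data (assumption): XW plays the role of
# the X and W columns stacked together, T is a binary treatment.
rng = np.random.default_rng(0)
n, d = 2000, 10
XW = rng.normal(size=(n, d))
logits = XW[:, 0] - 0.5 * XW[:, 1] * XW[:, 2]
T = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

# Candidate propensity models; hyperparameters are illustrative only.
candidates = {
    "linear": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(
        n_estimators=50, min_samples_leaf=20, random_state=0
    ),
}


def evaluate(model, features, target, cv=3):
    """Return (mean cross-validated log-loss, elapsed seconds) for one model."""
    start = time.perf_counter()
    scores = cross_val_score(
        model, features, target, cv=cv, scoring="neg_log_loss"
    )
    return -scores.mean(), time.perf_counter() - start


results = {name: evaluate(model, XW, T) for name, model in candidates.items()}
for name, (loss, seconds) in results.items():
    print(f"{name}: log-loss={loss:.3f}, time={seconds:.2f}s")
```

If the linear model's held-out loss is close to the forest's, passing it explicitly is probably a reasonable trade for speed; the same comparison can be repeated for the outcome model with a regressor and a regression metric. Check your estimator's API documentation for what it accepts as first-stage model arguments.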

As a side note, we released v0.15.1 yesterday, which contains some bugfixes, so you may want to upgrade to that, but I don't expect it to affect your performance issues if the cause is what I've outlined above.

from econml.

Jiaqi-ads commented on July 28, 2024

Thanks for your prompt response! @kbattocchi

The dataset I was testing on contains about 500,000 rows and about 50 columns in X and W combined, most of which are one-hot encoded categorical variables. So maybe the slowdown is caused by the change in the default first-stage models?

On the accuracy of the first-stage models: I agree that forest models tend to be more accurate and that more accurate first-stage models lead to better CATE estimation. However, I'm aware of arguments that forest models tend to generate more extreme probability scores in classification tasks. This could affect the output of the propensity model, and also of the "regression model" if the outcome variable is binary, which would ultimately affect the performance of the final CATE model. May I ask what your thoughts are on this? Thanks in advance.
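Whether a given model produces extreme scores on a particular dataset can be checked directly by looking at out-of-fold predicted probabilities. The sketch below is one way to do that with scikit-learn, under the assumption of a binary treatment; the synthetic data and the helper names `out_of_fold_propensity` and `summarize` are hypothetical, and the threshold `eps=0.01` is arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the real covariates and binary treatment (assumption).
rng = np.random.default_rng(1)
n, d = 2000, 8
X = rng.normal(size=(n, d))
T = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))


def out_of_fold_propensity(model, features, treatment, cv=3):
    """Out-of-fold estimates of P(T=1 | features)."""
    proba = cross_val_predict(
        model, features, treatment, cv=cv, method="predict_proba"
    )
    return proba[:, 1]


def summarize(p, eps=0.01):
    """Return (min, max, share of scores within eps of 0 or 1)."""
    extreme = np.mean((p < eps) | (p > 1 - eps))
    return float(p.min()), float(p.max()), float(extreme)


models = {
    "linear": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
summary = {
    name: summarize(out_of_fold_propensity(model, X, T))
    for name, model in models.items()
}
for name, (lo, hi, share) in summary.items():
    print(f"{name}: min={lo:.3f}, max={hi:.3f}, share_extreme={share:.3%}")
```

If a large share of scores sits near 0 or 1, options worth trying include a larger `min_samples_leaf` for the forest or a calibration wrapper such as scikit-learn's `CalibratedClassifierCV`. Whether either is needed depends on the data.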


Jiaqi-ads commented on July 28, 2024

Hi, just wanted to follow up on the speed issue. I've upgraded the package to v0.15.1 and tried setting both model_propensity and model_regression to 'linear'. Training on the dataset still took hours, whereas it took only four minutes with v0.14.0. The execution time was also the same as with those parameters set to 'auto', and changing them to 'forest' doesn't affect the execution time much either. So I wonder if there could be some other issue?
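Since the first-stage model choice does not seem to change the run time, profiling a fit on a subsample may help locate where the time actually goes. Below is a minimal sketch using only Python's standard `cProfile` and `pstats` modules. `fit_on_subsample` is a hypothetical placeholder whose body should be replaced with the real fit call on, say, a few thousand rows, and `profile_call` is likewise an illustrative helper.

```python
import cProfile
import io
import pstats


def fit_on_subsample():
    """Hypothetical stand-in: replace the body with the real fit call
    on a small subsample of the data."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def profile_call(func, top=10):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(fit_on_subsample)
print(report)
```

Running the same profile under both library versions and comparing the top entries should show which calls account for the difference.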
