Hi EconML team, I've just upgraded my EconML package to V0.15.0 and

V0.15.0 runs hours longer than V0.14.0

Jiaqi-ads commented on July 28, 2024

V0.15.0 runs hours longer than V0.14.0

Comments (3)

kbattocchi commented on July 28, 2024

The only change that I can think of is that we have changed the default first-stage propensity and regression models to do model selection between linear and forest models instead of always just using a linear model.

We made this change because the accuracy of the CATE estimate depends strongly on having good models, and for many datasets we'd expect forest models to fit the data much better. In general, this has not resulted in large slowdowns in our own internal testing, but perhaps you have a much larger number of rows or columns than we've been testing on. What are the shapes of your Y, T, X, and W inputs?

If fitting forest models is the cause of the slowdown, you can explicitly pass first-stage models of your choice instead. However, as I mentioned, it is important to use models that can actually fit your data well if you want accurate CATE estimates, so I would only fall back on linear models if you are confident that they have good predictive power in your setting.

As a side note, we released v0.15.1 yesterday, which contains some bugfixes, so you may want to upgrade to that, but I don't expect it to affect your performance issues if the cause is what I've outlined above.

Jiaqi-ads commented on July 28, 2024

Thanks for your prompt response! @kbattocchi

The dataset I was testing on contains about 500,000 rows and has about 50 columns in X and W combined, which consist mostly of one-hot encoded categorical variables. So maybe it is because of the changes in the default first-stage models?

As for the accuracy of the first-stage models: I agree that forest models tend to be more accurate and that more accurate first-stage models lead to better CATE estimation. However, I'm aware of arguments that forest models tend to produce more extreme probability scores in classification tasks. This could affect the output of the propensity model, and of the "regression model" as well if the outcome variable is binary, which would ultimately affect the performance of the final CATE model. May I ask what your thoughts are on this? Thanks in advance.
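To make the concern about extreme scores concrete, here is a small sketch. It assumes a doubly robust style estimator in which a unit's residual is weighted by one over its estimated propensity; the function name and the clipping threshold are hypothetical and chosen only for illustration, not taken from the library.

```python
def inverse_propensity_weights(propensities, min_propensity=None):
    """Return 1/p for each score, optionally clipping p from below first."""
    weights = []
    for p in propensities:
        if min_propensity is not None:
            p = max(p, min_propensity)  # clip very small scores
        weights.append(1.0 / p)
    return weights

# Hypothetical propensity scores, from moderate to extreme.
scores = [0.5, 0.1, 0.01, 0.001]
raw = inverse_propensity_weights(scores)
clipped = inverse_propensity_weights(scores, min_propensity=0.05)

for p, r, c in zip(scores, raw, clipped):
    print(f"p={p}: raw weight={r:.1f}, clipped weight={c:.1f}")
```

The weight grows without bound as the score approaches zero, so a few units with extreme scores can dominate the estimate unless the scores are clipped or the classifier is well calibrated.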

Jiaqi-ads commented on July 28, 2024

Hi, just wanted to follow up on the speed issue. I've upgraded the package to v0.15.1 and tried setting both model_propensity and model_regression to 'linear'. Training on the dataset still took hours, whereas it took only four minutes with v0.14.0. Moreover, the execution time was the same as with those parameters set to 'auto', and changing them to 'forest' didn't affect the execution time much either. So I wonder if there could be some other issue?
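A comparison like the one described here can be made repeatable with a small timing harness. The sketch below uses only the standard library; `time_fit` and `_DummyEstimator` are hypothetical names, and the dummy stands in for a real estimator so the snippet runs on its own.

```python
import time

def time_fit(label, make_estimator, fit_args, fit_kwargs=None):
    """Build an estimator, time its fit() call, and return elapsed seconds."""
    est = make_estimator()
    start = time.perf_counter()
    est.fit(*fit_args, **(fit_kwargs or {}))
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f} s")
    return elapsed

class _DummyEstimator:
    """Stand-in for a real estimator; fit() just sleeps for a fixed delay."""
    def __init__(self, delay):
        self.delay = delay

    def fit(self, *args, **kwargs):
        time.sleep(self.delay)
        return self

# Replace the dummy factories with real estimator configurations to compare.
timings = {}
for label, delay in [("linear", 0.01), ("forest", 0.1)]:
    timings[label] = time_fit(
        label, lambda d=delay: _DummyEstimator(d), (None, None)
    )
```

Running each configuration on the same row subsample under both library versions would help show whether the time is spent in the first-stage models or elsewhere.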
