The only relevant change I can think of is that the default first-stage propensity and regression models now perform model selection between linear and forest models, instead of always using a linear model.
We made this change because the accuracy of the CATE estimate depends strongly on having good first-stage models, and for many datasets we expect forest models to fit the data much better. In our internal testing this has not caused large slowdowns, but perhaps you have many more rows or columns than we have tested with. What are the shapes of your Y, T, X, and W inputs?
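To make the trade-off concrete, here is a minimal sketch of first-stage model selection by cross-validated error, using only NumPy. It is not econml's implementation: the candidate set, the mean-squared-error score and the fold count are assumptions, and a k-nearest-neighbors regressor (the hypothetical helper fit_knn) stands in for a forest so the example has no extra dependencies. It illustrates why selection costs more than always using a linear model: every candidate is fitted on every fold.

```python
import numpy as np

def fit_linear(X, y):
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def fit_knn(X, y, k=10):
    # Hypothetical stand-in for a flexible learner (NOT a random forest).
    def predict(Xn):
        d = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        idx = np.argsort(d, axis=1)[:, :k]
        return y[idx].mean(axis=1)
    return predict

def select_model(X, y, candidates, n_folds=3, seed=0):
    """Return (best name, CV MSE per candidate, number of model fits)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    scores, n_fits = {}, 0
    for name, fit in candidates.items():
        errs = []
        for i, test_idx in enumerate(folds):
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            model = fit(X[train_idx], y[train_idx])  # one fit per candidate per fold
            n_fits += 1
            errs.append(np.mean((model(X[test_idx]) - y[test_idx]) ** 2))
        scores[name] = float(np.mean(errs))
    return min(scores, key=scores.get), scores, n_fits

# Synthetic data: one linear outcome, one nonlinear outcome.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(600, 2))
y_linear = X @ np.array([1.5, -0.5]) + rng.normal(scale=0.1, size=600)
y_nonlinear = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=600)
candidates = {"linear": fit_linear, "knn": fit_knn}
print(select_model(X, y_linear, candidates)[0])
print(select_model(X, y_nonlinear, candidates)[0])
```

On the linear outcome the sketch picks the linear model, and on the nonlinear one the flexible stand-in wins. With hundreds of thousands of rows, the flexible candidate's fits would dominate the runtime, and this cost repeats for each first-stage model.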
If fitting forest models is the cause of the slowdown, you can explicitly pass first-stage models of your choice instead. However, as I mentioned, it is important to use models that actually fit your data well if you want accurate CATE estimates, so I would only fall back on linear models if you are confident that they have good predictive power in your setting.
As a side note, we released v0.15.1 yesterday with some bug fixes, so you may want to upgrade; however, I don't expect it to affect your performance issue if the cause is what I've outlined above.
Thanks for your prompt response! @kbattocchi
The dataset I was testing on has about 500,000 rows and about 50 columns in X and W combined, consisting mostly of one-hot encoded categorical variables. So maybe the slowdown is due to the change in the default first-stage models?
On the accuracy of the first-stage models: I agree that forest models tend to be more accurate and that more accurate first-stage models lead to better CATE estimates, but I'm aware of arguments that forest models tend to produce more extreme probability scores in classification tasks. This could affect the output of the propensity model, and also of the "regression model" if the outcome variable is binary, which would ultimately affect the performance of the final CATE model. What are your thoughts on this? Thanks in advance.
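The concern about extreme propensities matters most for doubly robust estimators, because the correction terms divide by the estimated propensity p and by 1 - p. The sketch below computes the standard AIPW (doubly robust) pseudo-outcome for a binary treatment with NumPy and clips the propensities first. The function name and thresholds are illustrative assumptions; econml's DRLearner exposes a min_propensity argument for the same purpose, but check its documentation for the exact behavior in your version.

```python
import numpy as np

def dr_pseudo_outcome(y, t, mu0, mu1, p, min_propensity=1e-6):
    """Doubly robust (AIPW) pseudo-outcome for a binary treatment.

    y: observed outcome; t: 0/1 treatment; mu0, mu1: predicted outcomes
    under control and treatment; p: predicted propensity P(T=1 | X, W).
    Propensities are clipped to [min_propensity, 1 - min_propensity].
    """
    p = np.clip(p, min_propensity, 1 - min_propensity)
    return (mu1 - mu0
            + t * (y - mu1) / p
            - (1 - t) * (y - mu0) / (1 - p))

# One treated unit whose propensity model is (over)confident it would not be treated.
y, t = np.array([1.0]), np.array([1.0])
mu0, mu1, p = np.array([0.5]), np.array([0.5]), np.array([0.001])
print(dr_pseudo_outcome(y, t, mu0, mu1, p))                       # about 500
print(dr_pseudo_outcome(y, t, mu0, mu1, p, min_propensity=0.05))  # about 10
```

A single unit with p = 0.001 contributes a pseudo-outcome of about 500, versus about 10 once p is clipped at 0.05, which is why poorly calibrated propensity scores can inflate the variance of the final CATE model. Clipping trades some of that variance for bias.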
Hi, just wanted to follow up on the speed issue. I upgraded to v0.15.1 and set both model_propensity and model_regression to 'linear', but training on the dataset still took hours, whereas it took only four minutes with v0.14.0. The execution time was also the same as with 'auto', and setting both parameters to 'forest' did not change it much either. Could something else be causing the slowdown?
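Since the choice of first-stage model did not change the runtime, profiling a single fit could help locate where the time goes. The helper below uses only the standard library (cProfile, pstats); profile_call is a hypothetical name. Running it on a subsample under both v0.14.0 and v0.15.1 and comparing the top entries by cumulative time is one way to narrow down the slowdown.

```python
import cProfile
import io
import pstats
import time

def profile_call(fn, *args, top=15, **kwargs):
    """Run fn(*args, **kwargs) under cProfile.

    Returns (result, elapsed_seconds, report), where report lists the
    `top` functions sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.enable()
    try:
        result = fn(*args, **kwargs)
    finally:
        profiler.disable()
    elapsed = time.perf_counter() - start
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, elapsed, buf.getvalue()

# Hypothetical usage: `est` is your already-configured estimator and
# Y, T, X, W are a subsample of your data (e.g. 50,000 rows).
# _, secs, report = profile_call(est.fit, Y, T, X=X, W=W)
# print(f"{secs:.1f}s")
# print(report)
```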
Related Issues
- Reproducible error: SHAP ExplainerError: Additivity check failed in TreeExplainer
- Questions regarding DRPolicyForest results
- DRtester does not work for binary treatment AND binary outcome
- Confounder adjusting before applying the ITE model to observational data
- Calculation of confidence intervals in NormalInferenceResults becomes very slow when passing big dataframes
- DML discrete outcome
- High memory footprint for big dataframes in CausalForest model
- Questions about econml and CausalForestDML
- Reduce residual confounding in time series
- How can I calculate the treatment effect function in a double machine learning model?
- Why Shape of Y in Causal Forest notebook is 1000*1000
- Migrate DeepIV to new TensorFlow API or PyTorch
- Support numpy 2.0
- Support scikit-learn 1.5.0
- Causal Forest DML has very wide confidence interval
- oob_predict_interval: request to add functionality for prediction of out-of-bag confidence intervals
- DeepIV.fit with Inference='bootstrap' throws error
- ModuleNotFoundError: No module named 'econml.dynamic'
- Error of crossfit folds splits with DynamicDML