Comments (3)
One simple comparison that would be useful is how the memory consumption of standard sklearn RandomForests compares on dataframes of the same size, since much of the EconML tree code was forked from sklearn (version 0.24, I believe).
Although 180GB does seem excessive, I don't think it is really exponential: if your input has 40M floating point values, the raw data alone is 320MB (assuming 8 bytes per value), so 180GB is roughly 560 times the size of your dataset. Certainly if we can easily optimize things to bring this down we should, but it's not even quadratic in the number of elements.
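For reference, the arithmetic behind those figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes 8-byte (float64) values and decimal units (1 GB = 10^9 bytes), and the variable names are illustrative.

```python
# Back-of-the-envelope check of the figures quoted above.
n_values = 40_000_000           # floating point values in the input
bytes_per_value = 8             # assumption: float64
raw_bytes = n_values * bytes_per_value
observed_bytes = 180 * 10**9    # reported usage, decimal GB

raw_mb = raw_bytes / 10**6
ratio = observed_bytes / raw_bytes

# What truly quadratic growth in the number of elements would cost.
quadratic_bytes = n_values**2 * bytes_per_value
quadratic_pb = quadratic_bytes / 10**15

print(f"raw data: {raw_mb:.0f} MB")
print(f"observed / raw: {ratio:.1f}x")
print(f"quadratic in n_values: {quadratic_pb:.1f} PB")
```

Under these assumptions the ratio is a large constant factor (about 560x), while quadratic growth would land in the petabyte range.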
You mention that memory is high for both fit and effect: do you mean that while running those methods memory usage spikes but then comes back down to a more reasonable amount when the method calls complete?
from econml.
> You mention that memory is high for both fit and effect: do you mean that while running those methods memory usage spikes but then comes back down to a more reasonable amount when the method calls complete?
Yes, memory usage spikes, but then comes back down.
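One way to tell a transient spike from memory that stays allocated is to compare the peak and the retained usage around a call. Below is a minimal sketch using the standard library's `tracemalloc`. The `measure` helper and the `spiky` function are hypothetical stand-ins, not EconML code. `tracemalloc` only sees allocations made in the current process through Python's allocator hooks, so memory used by separate worker processes would not show up here.

```python
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn and return (result, retained_bytes, peak_bytes) as seen by tracemalloc."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        result = fn(*args, **kwargs)
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, current - before, peak - before


def spiky(n):
    # Hypothetical stand-in for fit/effect: a large temporary, a small return value.
    temp = [float(i) for i in range(n)]
    return sum(temp) / n


result, retained, peak = measure(spiky, 200_000)
print(f"retained: {retained / 1e6:.2f} MB, peak: {peak / 1e6:.2f} MB")
```

A peak much larger than the retained figure matches the "spikes, then comes back down" pattern described here.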
I'm still investigating fit, but in predict_point_and_var I found that the spike comes after the second Parallel call inside the var condition, so I think the memory spike probably originates in these lines:
EconML/econml/grf/_base_grf.py
Lines 703 to 763 in db1e254
Another important detail: I was using a treatment dataframe with a featurizer, which gave me 6 columns in T. Inspecting the code, I noticed that many steps use a cross product of T with T. I think this is contributing to the memory spike too.
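To get a rough sense of how much a T-by-T cross product could add, here is a small sketch. It assumes, without having confirmed it in the EconML source, that one dense d_t x d_t float64 matrix is materialized per row. The `outer_product_bytes` helper and the row count are hypothetical.

```python
def outer_product_bytes(n_rows, d_t, bytes_per_value=8):
    """Bytes for T itself and for one d_t x d_t matrix per row (assumed dense float64)."""
    t_bytes = n_rows * d_t * bytes_per_value
    tt_bytes = n_rows * d_t * d_t * bytes_per_value
    return t_bytes, tt_bytes


n_rows = 1_000_000  # hypothetical row count, not taken from this thread
for d_t in (1, 6):
    t_bytes, tt_bytes = outer_product_bytes(n_rows, d_t)
    print(f"d_t={d_t}: T = {t_bytes / 1e6:.0f} MB, T x T = {tt_bytes / 1e6:.0f} MB")
```

Under that assumption, going from 1 to 6 treatment columns multiplies the per-row cross-product storage by 36, and any copies held by parallel workers would scale the total further.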