Comments (8)
Thank you for the information. An implementation based on statsmodels seems to be a way for us to get a fast Causal Impact.
Thank you for the link to the older pycausalimpact library.
from tfcausalimpact.
Hi @mc-karsa-tech ,
I like your idea. This is technically possible, but implementing it might not be that easy. The main issue is that a pre-trained model is represented not only by the model itself but also by the posterior samples of each of its parameters, which so far can be obtained with two algorithms (variational inference and Hamiltonian Monte Carlo).
So we'd need to think of a way to accept not only a customized model but also the resulting posterior samples.
I'll keep this open and if I find an easy way to implement it I'll try to allocate some time to add this feature (PRs with full unit tests are also welcome).
Thanks,
Will
from tfcausalimpact.
Hi Will,
thank you for your response.
I confess that I don't fully understand it, since I don't have sufficient knowledge of statistics, but I do understand that it is not easy to do.
One thing I do know is that we use HMC (Hamiltonian Monte Carlo) instead of VI (variational inference) because HMC seems to be more stable when we repeat the evaluation on the same data. Unlike HMC, VI gives us very different results on each run.
Have a nice day!
from tfcausalimpact.
Hi @mc-karsa-tech ,
I'd like to confirm one thing: when you say "incremental training", do you mean that the pre-trained model should be trained again with a few new rows? At first I thought that the same pre-trained model would be used as is and only the post-intervention rows would change.
Let me know which interpretation is correct :)
from tfcausalimpact.
Hi Will,
it would help us if we were able to train on the pre-intervention rows once and reuse the trained model for different sets of post-intervention data, so that evaluating each new set of post-intervention rows would be fast. This way we would get better speed for as long as the pre-intervention data stay unchanged.
The second option (reusing only the first part of the pre-intervention data and appending new rows to it, as well as changing the post-intervention rows) would give us an even better speed-up.
Have a nice day!
from tfcausalimpact.
Hi @mc-karsa-tech ,
I've been thinking about ways to cache the model for your use case, but unfortunately so far I couldn't find anything useful. The main issue is how TFP implemented the linear regression, as you can see here. The post-intervention data affects the training of the model by setting the design matrix of the linear regression.
This prevents us from putting any viable caching system in place.
Also, the second option would not be possible either. If we change the training data even a bit, the whole training has to run again. The only thing that might help in this case is setting the priors from the fitted posteriors of the model, but performance-wise that wouldn't change anything.
So the bad news is that for now I don't see any viable solution for this problem, as far as caching goes.
Another option, though, is to run your job in parallel. Have you tried multiprocessing already? I suspect that for now that's the only technique at our disposal that will help with this issue (notice that multi-threading won't work, as the job is CPU bound, not IO bound).
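For illustration, here is a minimal sketch of the multiprocessing idea using only the standard library. The function `estimate_impact` is a hypothetical stand-in for one Causal Impact run (it is not part of tfcausalimpact), and the toy datasets are made up; in practice the worker would build and fit the real model for one dataset.

```python
from concurrent.futures import ProcessPoolExecutor
from statistics import fmean


def estimate_impact(series, n_pre):
    """Hypothetical placeholder for one Causal Impact run.

    It only returns the post-period mean minus the pre-period mean;
    a real worker would fit the model for this dataset instead.
    """
    pre, post = series[:n_pre], series[n_pre:]
    return fmean(post) - fmean(pre)


def run_all(datasets, n_pre, max_workers=None):
    """Evaluate every dataset in a separate process (CPU-bound work)."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(estimate_impact, s, n_pre) for s in datasets]
        # Collect results in submission order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Toy data: 10 pre-intervention points, 5 post-intervention points.
    datasets = [[1.0] * 10 + [1.0 + k] * 5 for k in range(4)]
    print(run_all(datasets, n_pre=10))
```

Processes rather than threads are used here because each run is CPU bound; the speed-up is limited by the number of available cores.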
Let me know what you think.
Best,
Will
from tfcausalimpact.
Hi Will,
thank you for your answer. We already run the Causal Impact computation in multiple processes, but that only gives us a speed-up of several times. It still doesn't seem to be enough for us.
Do you think that we could train a Neural Network (NN) to predict Causal Impact and use this trained NN to predict the impact instead of running the actual Causal Impact computation?
from tfcausalimpact.
I see. As for the NN I think it's possible (maybe using LSTMs). Not sure if that would help much performance-wise and also not sure if you can extrapolate confidence intervals with this technique.
Another thing that can be done is to use statsmodels UnobservedComponents, which was the first algorithm used in pycausalimpact (the package unfortunately has been deleted by Dafiti, so it may be a bit out of date). Still, it's dozens of times faster than the TFP implementation (but it doesn't follow a Bayesian approach, so you can't really manipulate priors).
A cool thing to do would be to update tfcausalimpact to also run on top of statsmodels but that would take lots of time as well.
from tfcausalimpact.