dvbuntu / barmpy
Python module for Bayesian Additive Regression Models
Home Page: https://dvbuntu.github.io/barmpy
License: MIT License
PyMC is a Python library that fits Bayesian models with Markov Chain Monte Carlo. BARN essentially fits that mold, so it would be instructive and potentially useful to port barmpy to that ecosystem. PyMC is a different approach from sklearn, however, so there may be a bit of a learning curve. Some good first steps:

- Port BARN to PyMC, using PyMC-BART as a starting point
- Wrap the PyMC version of BARN, keeping sklearn compatibility

It'd be great to have a Python implementation of BART in barmpy! Note that BARTPy exists, but it hasn't been updated in several years. It would still serve as an excellent starting place.
This issue should also include some refactoring of barmpy.barn so generic routines can be used in both BARN and BART. That will help future features like BAR-Support Vector Machines and the like.
BARN models currently only return a single ensemble from the posterior distribution (i.e. a single MCMC replicate). BART, however, allows returning an average over multiple MCMC iterations. Doing such averaging means the final model approximates the expected value of the posterior distribution, not just a single sample from it. This may improve modeling results in some contexts, especially if the variance in the posterior is relatively large (measured by the model sigma estimate).
Practically, there are a few considerations. First, because successive MCMC iterations are correlated, we only want to sample every K-th step (anecdotally, the integrated autocorrelation time is about 7 steps, but that depends on the problem, ensemble size, and other parameters). From a computational perspective, we can save some effort when a model within the ensemble stays the same (i.e. declines to transition) between two samples in the average. In that case, we can simply double-weight that model. This requires some additional bookkeeping beyond saving every K-th ensemble separately.
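The thinning-plus-double-weighting bookkeeping above can be sketched as follows. All names here (the function, the chain format, the constants) are illustrative, not barmpy's actual API; the point is only that an unchanged model gains weight rather than a duplicate copy.

```python
from collections import Counter

def collect_posterior_samples(chain, thin=7, n_samples=5):
    """Thin an MCMC chain of ensembles, double-weighting repeated models.

    `chain` yields, at each MCMC step, a tuple of hashable model states.
    Hypothetical sketch, not barmpy's real interface.
    """
    weights = Counter()
    for step, ensemble in enumerate(chain):
        if step % thin:          # keep only every `thin`-th step
            continue
        for model in ensemble:   # a model that declined to transition
            weights[model] += 1  # simply gains weight, no extra copy
        if sum(weights.values()) >= n_samples * len(ensemble):
            break
    return weights

# Toy chain: the first model never transitions, the second changes every 7 steps.
chain = [("a", f"b{i // 7}") for i in range(35)]
print(collect_posterior_samples(chain, thin=7, n_samples=5))
```

With this bookkeeping, the never-transitioning model ends up with weight 5 while each distinct second-slot model appears once, instead of storing five full ensemble copies.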
The actual output should probably be saved as a new ensemble model (even a barmpy.barn.BARN object itself), just with num_nets*M total networks, where M is the number of samples from the posterior to average over. The final output should also divide by M to ensure it's an average; alternatively, we can adjust the weights of the final NN layer to scale similarly (i.e. divide those weights by M instead and sum over the various ensembles).
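Because predictions are linear in the final-layer weights, dividing those weights by M and summing is equivalent to averaging the M ensembles' predictions. A minimal sketch of the simpler predict-then-average route (class and method names are hypothetical, not barmpy's actual API):

```python
import numpy as np

class PosteriorAverage:
    """Average predictions over M posterior ensemble samples.

    Illustrative sketch; each element of `ensembles` just needs a
    .predict(X) method, as a barmpy.barn.BARN object would have.
    """
    def __init__(self, ensembles):
        self.ensembles = list(ensembles)
        self.M = len(self.ensembles)

    def predict(self, X):
        # Same result as scaling each final NN layer by 1/M and summing.
        return sum(e.predict(X) for e in self.ensembles) / self.M

# Stand-in "ensembles" whose predictions are fixed linear maps.
class Stub:
    def __init__(self, w):
        self.w = w
    def predict(self, X):
        return X @ self.w

X = np.ones((4, 2))
avg = PosteriorAverage([Stub(np.array([1.0, 0.0])),
                        Stub(np.array([0.0, 3.0]))])
print(avg.predict(X))  # each row averages to (1 + 3) / 2 = 2
```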
To better encourage and manage user-submitted contributions like new methods and custom callbacks, we should add both a contribution guide walking through the process and a code of conduct to set expectations.
The contribution guide can be a markdown file with a small example, say a custom callback. It should walk a user through the steps of integrating such a feature into barmpy, from development through review by the barmpy maintainers. As a practical matter, features added by the primary developers (i.e. Dr. Van Boxel) will likely continue on the main branch directly for now.
The code of conduct can be a short statement, likely as part of the contribution guide, asserting how to engage with the barmpy community. We can pull some examples from https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-code-of-conduct-to-your-project, but the short answer will be treating people with respect, understanding that different opinions can exist, and keeping discussion within barmpy focused on the development of this project (i.e. not wider mathematical discussion, however fun that may be).
BART and BARN exist, but Support Vector Machines (SVMs) are another machine learning method that might be useful to ensemble this way, giving us Bayesian Additive Regression SVMs (BARS).
BARS will most likely define its state space as the hyperparameters of the kernels (e.g. polynomial degree or Gaussian bandwidth).
Practically, we can use SVMs from sklearn. We'll need an extra argument for the kernel, and we can have the kernel choice affect the prior and transition function as needed. So maybe start with a polynomial kernel, then try a Gaussian kernel. Slowly add more kernels, generalizing as we go.
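A rough sketch of what a BARS-style transition over sklearn SVR hyperparameters could look like. The state space (C and polynomial degree), the proposal distribution, and the function names are all assumptions for illustration; a real implementation would add the accept/reject step against the posterior.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=50)

def propose(params):
    """Random-walk proposal on hypothetical SVR hyperparameters:
    log-normal step on C, +/-1 step on polynomial degree."""
    new = dict(params)
    new["C"] = max(1e-3, params["C"] * np.exp(rng.normal(scale=0.2)))
    new["degree"] = max(1, params["degree"] + int(rng.integers(-1, 2)))
    return new

params = {"C": 1.0, "degree": 2}
for _ in range(3):
    params = propose(params)
    model = SVR(kernel="poly", C=params["C"],
                degree=params["degree"]).fit(X, y)
    # In BARS proper, an accept/reject comparison of posteriors goes here.
    print(params, model.score(X, y))
```

Swapping `kernel="poly"` for `kernel="rbf"` (with a `gamma` proposal instead of `degree`) is how the Gaussian-kernel variant would slot into the same loop.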
After implementing, we should do some extensive analysis of how well BARS does on benchmark data. Is it better than BARN? Maybe it's faster? This could make a great paper.