
bayesianbandits's Introduction

bayesianbandits


Bayesian Multi-Armed Bandits for Python

Problem: Despite their conceptually simple interface, multi-armed bandits are daunting to put together in Python.

Solution: bayesianbandits is a Python package that provides a simple interface for creating and running Bayesian multi-armed bandits. It is built on top of scikit-learn and scipy, taking advantage of conjugate priors to provide fast and accurate inference.

While the API is still evolving, this library is already being used in production for marketing optimization, dynamic pricing, and other applications. Are you using bayesianbandits in your project? Let us know!

Features

  • Simple API: bayesianbandits provides a simple interface - most users will only need to call pull and update to get started.
  • Fast: bayesianbandits is built on top of already fast scientific Python libraries, but, if installed, will also use SuiteSparse to further speed up matrix operations on sparse matrices. Handling tens or even hundreds of thousands of features in a sparse model is no problem.
  • scikit-learn compatible: Use sklearn pipelines and transformers to preprocess data before feeding it into your bandit (see the sketch after this list).
  • Flexible: Pick from a variety of policy algorithms, including Thompson sampling, upper confidence bound, and epsilon-greedy. Pick from a variety of prior distributions, including beta, gamma, normal, and normal-inverse-gamma.
  • Extensible: bayesianbandits provides simple interfaces for creating custom policies and priors.
  • Well-tested: bayesianbandits is well-tested, with nearly 100% test coverage.
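
For example, here is a minimal sketch of the scikit-learn integration, assuming raw features arrive as a pandas DataFrame (the column name mirrors the Getting Started example; the data itself is made up):

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

raw = pd.DataFrame({"article_number": ["a", "b", "a"]})
encoder = OneHotEncoder(sparse_output=False)

# Any sklearn transformer (or a full Pipeline) can produce the context matrix
# that is later passed to agent.pull and agent.update.
context = encoder.fit_transform(raw)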

Compatibility

bayesianbandits is tested with Python 3.9, 3.10, 3.11, and 3.12 with scikit-learn 1.2.2, 1.3.2, and 1.4.2.

Getting Started

Install this package from PyPI.

pip install -U bayesianbandits

Define a LinearUCB contextual bandit with a normal-inverse-gamma prior.

import numpy as np
from bayesianbandits import (
    Arm,
    NormalInverseGammaRegressor,
    ContextualAgent,
    UpperConfidenceBound,
)

arms = [
    Arm(1, learner=NormalInverseGammaRegressor()),
    Arm(2, learner=NormalInverseGammaRegressor()),
    Arm(3, learner=NormalInverseGammaRegressor()),
    Arm(4, learner=NormalInverseGammaRegressor()),
]

policy = UpperConfidenceBound(alpha=0.84)

Instantiate the agent and pull an arm with context.

agent = ContextualAgent(arms, policy)

context = np.array([[1, 0, 0, 0]])

# Can be constructed with sklearn, formulaic, patsy, etc...
# context = formulaic.Formula("1 + article_number").get_model_matrix(data)
# context = sklearn.preprocessing.OneHotEncoder().fit_transform(data)

agent.pull(context)

Update the bandit with the reward.

agent.update(context, np.array([15.0]))
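
Putting the steps together, a minimal end-to-end loop might look like the sketch below. The reward draw is purely illustrative, and it assumes, as in the walk-through above, that update applies the reward to the arm chosen by the most recent pull.

rng = np.random.default_rng(0)

for _ in range(100):
    context = np.array([[1, 0, 0, 0]])
    agent.pull(context)                        # choose an arm for this context
    reward = rng.normal(loc=1.0, scale=1.0)    # stand-in for the observed reward
    agent.update(context, np.array([reward]))  # update the arm that was just pulled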

That's it! Check out the documentation for more examples.

bayesianbandits's People

Contributors

dependabot[bot], rishi-kulkarni


Forkers

jaedukseo

bayesianbandits's Issues

Add sparse matrix support to linear estimators

Users should be able to specify in the constructor whether the covariance matrix is stored as a dense or a sparse matrix, and it should be handled smoothly from there.

Potentially also emit warnings if the stored matrix is dense.
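
A rough sketch of the requested interface; the sparse keyword argument here is the proposal, not an existing parameter:

from bayesianbandits import NormalInverseGammaRegressor

dense_learner = NormalInverseGammaRegressor()              # dense covariance (current behaviour)
sparse_learner = NormalInverseGammaRegressor(sparse=True)  # sparse covariance (proposed flag)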

Better handling of prior setting for one-hot encoded data when using Bayesian linear regression

Currently, if the user chooses to set priors for each coefficient separately, they'd have to manually set a value for each coefficient generated by one-hot encoding. This is a bit inconvenient, as it requires the user to check how many categories are in each of their categorical columns to set up the prior.

Instead, we could assume the prior is the same for each category, allowing the user to just provide a prior vector that has the same length as the number of columns in their data, which seems more intuitive.
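
A small numpy sketch of the proposed behaviour, assuming one numeric column and one categorical column with three levels (all names and numbers are illustrative):

import numpy as np

per_column_prior = np.array([1.0, 5.0])  # one prior value per original column
columns_generated = [1, 3]               # the categorical column one-hot encodes into 3 columns

# Proposed: broadcast each prior across the model columns its source column
# generates, instead of asking the user to spell out all four values by hand.
expanded_prior = np.repeat(per_column_prior, columns_generated)
# expanded_prior == array([1., 5., 5., 5.])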

Support masking of arms when pulling

For continuum-arm bandits being treated with fixed discretization, it can be useful to "activate" or "deactivate" arms in certain scenarios. Additionally, there may be business reasons to want to never pull an arm in certain contexts. Providing an interface for masking arms for a given pull would give users a lot of flexibility to handle these situations.
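
One hypothetical calling pattern for such an interface; the mask keyword argument does not exist today, and its name and shape are assumptions:

import numpy as np

mask = np.array([True, True, False, True])  # deactivate the third arm for this pull only
agent.pull(context, mask=mask)              # proposed signature, not the current API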

Better error handling in `update`

  1. Raise a custom exception if the key is not found in the cache for delayed_reward bandits
  2. Raise a warning if the key is not found in the cache for batch updates
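
A rough sketch of the two behaviours described above; the exception name and the helper function are assumptions, not existing library code:

import warnings


class DelayedRewardKeyError(KeyError):
    """Raised when a unique_id is missing from the delayed-reward cache."""


def lookup_arm(cache, unique_id, batch=False):
    if unique_id not in cache:
        if batch:
            warnings.warn(f"unique_id {unique_id!r} not found in cache; skipping.")
            return None
        raise DelayedRewardKeyError(unique_id)
    return cache[unique_id]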

Potential bug with agent.decay() when not all arms are updated

Context: trying to update the arms of an agent using batches of data (weeks at a time), where not all arms necessarily appear in every given week.
agent.decay() is applied after each weekly update.

However, this creates an issue after the first update and first decay, apparently involving the covariance matrix.
If the Sparse = False option is used, the error returned is: LinAlgError: Matrix is singular.
If Sparse = True, the error returned is: ValueError: inconsistent shapes

Both errors seem to be triggered at the step commented # Update the inverse covariance matrix.

Potential cause: for the arms that were not present in a given week, the covariance matrix was never initialized.
Potential fix: only decay an arm if check_if_fitted_ returns true (see the reproduction sketch below).
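
A rough reproduction sketch based on the report above; the decay call and its exact signature are assumptions:

import numpy as np
from bayesianbandits import (
    Arm,
    ContextualAgent,
    NormalInverseGammaRegressor,
    UpperConfidenceBound,
)

arms = [Arm(k, learner=NormalInverseGammaRegressor()) for k in range(4)]
agent = ContextualAgent(arms, UpperConfidenceBound(alpha=0.84))

X = np.array([[1.0, 0.0, 0.0, 0.0]])

# Week 1: only the arm that happens to be pulled gets fitted; the remaining
# arms' learners never see any data.
agent.pull(X)
agent.update(X, np.array([1.0]))

# Decaying every arm, including the never-fitted ones, appears to be what
# triggers the errors described above.
agent.decay(X)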

Bug in Sparse LU decomposition

When computing the LU decomposition of a sparse, square matrix, splu may permute the columns for fill reduction, so the resulting LU factors are not guaranteed to match the Cholesky decomposition.

Potential fix: set permc_spec="NATURAL" in splu, to keep the natural order of the columns
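
A quick check of the proposed fix using scipy directly; diag_pivot_thresh=0.0 is an extra assumption here, added so that the rows also stay in their natural order:

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

# Small symmetric positive definite test matrix.
A = sparse.csc_matrix(np.array([[4.0, 1.0, 0.0],
                                [1.0, 3.0, 1.0],
                                [0.0, 1.0, 2.0]]))

# Keep the natural column order instead of a fill-reducing permutation.
lu = splu(A, permc_spec="NATURAL", diag_pivot_thresh=0.0)

# With no row or column permutation, L @ U reconstructs A directly, so the
# factors can be used where a Cholesky-like factorization is expected.
print(np.allclose((lu.L @ lu.U).toarray(), A.toarray()))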

Allow users to `decay` bandit on their own

In delayed_reward situations, calling decay after each update doesn't make sense, as updates may not come in the same order as they were pulled. To account for this, the user should be able to decide when to decay the bandit, just as they decide when to pull and update.

Add integration tests

Checking that the docs run properly is okay, but tests that exercise the entire user-facing API directly would be better.

Batch updates for `delayed_reward` bandits to reduce calls to `arm.update`

arm.update calls can be very expensive, especially when the learner involves a LAPACK call. Currently, a batch of updates for a delayed_reward bandit looks up the arm associated with each unique_id and then calls arm.update once per unique_id. Grouping all unique_ids that resulted in pulling the same arm would bound the number of arm.update calls per bandit.update by the total number of arms, which is far fewer than at present (see the sketch below).
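
A rough internals sketch of the grouping idea; the cache structure and the helper function are assumptions, not existing library code:

from collections import defaultdict

def batched_update(cache, X, y, unique_ids):
    """cache maps unique_id -> arm; X, y and unique_ids are row-aligned."""
    rows_by_arm = defaultdict(list)
    for row, uid in enumerate(unique_ids):
        rows_by_arm[cache[uid]].append(row)

    # One (potentially expensive) arm.update call per arm, instead of one per unique_id.
    for arm, rows in rows_by_arm.items():
        arm.update(X[rows], y[rows])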

Deprecate `delayed_reward` requiring an additional `unique_id` argument to `pull` and `update`

delayed_reward makes the API hard to statically type check. In most real-world scenarios (including my own experience putting these models into production), the calling code should be keeping track of arms pulled in a delayed-reward scenario. Perhaps the cleanest way to implement this is to have delayed_reward add an arm_to_update property with a setter that takes an action token. Then, users could do something like agent.arm_to_update(19.5).update(X, y).
