banditsbook's Introduction

Code to Accompany the Book "Bandit Algorithms for Website Optimization"

This repo contains code in several languages implementing standard algorithms for solving the multi-armed bandit problem, including:

  • epsilon-Greedy
  • Softmax (Boltzmann)
  • UCB1
  • UCB2
  • Hedge
  • Exp3

It also contains code that provides a testing framework for bandit algorithms based around simple Monte Carlo simulations.
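
Schematically, that framework boils down to a Monte Carlo loop like the one below, written against the select_arm()/update()/initialize() and draw() interfaces described under "Adding New Algorithms" further down. The function and variable names here are illustrative rather than the exact API in this repo:

def run_monte_carlo(algo, arms, num_sims, horizon):
    # Each simulation resets the algorithm, then plays `horizon` rounds
    # against a fixed set of simulated arms, recording every reward.
    rewards = [[0.0] * horizon for _ in range(num_sims)]
    for sim in range(num_sims):
        algo.initialize()  # wipe counts/values between simulations
        for t in range(horizon):
            chosen_arm = algo.select_arm()    # pick an arm
            reward = arms[chosen_arm].draw()  # sample its reward
            algo.update(chosen_arm, reward)   # learn from the outcome
            rewards[sim][t] = reward
    return rewards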

Languages

This codebase is split up by language. In most languages, there are parallel implementations of the core algorithms and infrastructure for testing the algorithms:

  • Python
  • Julia
  • Ruby

In R, there is a body of code for visualizing the results of simulations and analyzing those results. The R code would benefit from some refactoring to make it DRYer.

If you're interested in seeing how some of these algorithms would be implemented in JavaScript, you should try out Mark Reid's code: http://mark.reid.name/code/bandits/

If you're looking for Java code, try Dani Sola's work: https://github.com/danisola/bandit

If you're looking for Scala code, try everpeace (Shingo Omura)'s work: https://github.com/everpeace/banditsbook-scala

If you're looking for Go code, try Rany Keddo's work: https://github.com/purzelrakete/bandit

If you're looking for Clojure code, try Paul Ingles's work: https://github.com/pingles/clj-bandit

If you're looking for Swift code, see https://github.com/crenwick/Swiper

For a Flask implementation, see https://github.com/DeaconDesperado/flask_mab

Getting Started

To try out this code, you can go into the Python or Julia directories and then run the demo script.

In Python, that looks like:

python demo.py

In Julia, that looks like:

julia demo.jl

You should step through that code line-by-line to understand what the functions are doing. The book provides more in-depth explanations of how the algorithms work.

The Ruby code was contributed by Kashif Rasul. If you're interested in translating the code into another language, please submit a pull request. I will merge any new implementations as soon as I can.

Adding New Algorithms: API Expectations

As described in the book, a Bandit algorithm should implement two methods:

  • select_arm(): A method that returns the index of the Arm that the Bandit object selects on the current play. No arguments are required.
  • update(): A method that updates the internal state of the Bandit object in response to its most recently selected arm's reward. The index of the chosen arm and the amount of reward received must be passed as arguments. (A sketch implementing the full Bandit interface appears below, after initialize() is introduced.)

As described in the book, an Arm simulator should implement:

  • draw(): A method that returns a single instance of reward from the arm that was pulled. No arguments are required.
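
For concreteness, here is a minimal sketch of such an Arm, along the lines of the BernoulliArm used in the simulations (the exact implementation in this repo may differ in details):

import random

class BernoulliArm:
    def __init__(self, p):
        self.p = p  # probability of paying out a unit reward

    def draw(self):
        # Return a single sample of reward; no arguments required.
        return 1.0 if random.random() < self.p else 0.0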

In addition, the Bandit algorithms are designed to implement one additional method used in simulations:

  • initialize(): A method that returns nothing. Instead, this method resets all of the data-driven variables in a Bandit object. For most objects, this resets the counts and values fields to their initial states. No arguments are required.
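
Putting those three methods together, here is a minimal epsilon-Greedy sketch that satisfies the Bandit interface. It is illustrative and may differ in details from the implementations in this repo:

import random

class EpsilonGreedy:
    def __init__(self, epsilon, n_arms):
        self.epsilon = epsilon
        self.n_arms = n_arms
        self.initialize()

    def initialize(self):
        # Reset the data-driven fields to their initial states.
        self.counts = [0] * self.n_arms    # plays per arm
        self.values = [0.0] * self.n_arms  # running mean reward per arm

    def select_arm(self):
        # Explore a random arm with probability epsilon; otherwise exploit.
        if random.random() < self.epsilon:
            return random.randrange(self.n_arms)
        # Note: index(max(...)) breaks ties toward the lowest index; see
        # the "Ties not broken randomly" issue below.
        return self.values.index(max(self.values))

    def update(self, chosen_arm, reward):
        # Incrementally update the running mean reward for the chosen arm.
        self.counts[chosen_arm] += 1
        n = self.counts[chosen_arm]
        self.values[chosen_arm] += (reward - self.values[chosen_arm]) / n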

Beyond the testing framework described in the book, I am currently providing an additional system built around the concept of an Environment. Environment objects encapsulate not only a set of Arms, but also a mechanism for having those Arms change over time. This allows you to simulate complex scenarios that aren't well described by a constant set of arms.

If you would like to implement your own Environment, you will need to provide a very simple interface. The Environment interface requires you to implement two methods:

  • arms(): A method that returns the array of arms that exist at time T. You must pass T as an argument.
  • n_arms(): A method that returns the number of arms that the environment will return with each call to arms(). While the arms may change over time, the number of arms should not. No arguments are required.
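
As an illustration, here is a hypothetical Environment in which the full set of arms is re-drawn at fixed intervals. The DriftingEnvironment name and its drift rule are invented for this example and are not part of this repo:

import random

class BernoulliArm:
    def __init__(self, p):
        self.p = p

    def draw(self):
        return 1.0 if random.random() < self.p else 0.0

class DriftingEnvironment:
    def __init__(self, n_arms, period):
        self._n_arms = n_arms
        self.period = period  # how many plays each set of arms survives
        self._epoch = -1      # forces a fresh draw on the first call

    def arms(self, t):
        # Return the array of arms that exist at time t; t must be passed in.
        epoch = t // self.period
        if epoch != self._epoch:
            self._epoch = epoch
            self._arms = [BernoulliArm(random.random())
                          for _ in range(self._n_arms)]
        return self._arms

    def n_arms(self):
        # The arms may change over time, but their number never does.
        return self._n_arms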

banditsbook's People

Contributors

crenwick, everpeace, jcborras, johnmyleswhite, kashif


banditsbook's Issues

Visualizations with matplotlib for python

Hello John,

First, thank you for writing one of the most enjoyable books in the domain!

One thing that puzzled me was why, since most of the code was in Python, the charts and visualizations were not in Python as well. Since I did not know R, I spent quite some time recreating them with matplotlib while I was working through the book (example here: https://github.com/niazangels/bandits/blob/master/01-epsilon-greedy.ipynb).

I know there haven't been any commits to this repo for a while, but I was wondering if you would be interested in adding these. If so, what would be a good way to structure them? (My plots are in Jupyter notebooks, but I can split them into files.)

I could raise a pull request in case you're interested, but feel free to close this issue if this isn't something you'd like to pursue.

Bug in epsilon_greedy/plot_standard.R?

The right arm can't be arm number 5, since arm numbers run from 0 to 4. Also, after shuffling the values for the means, the maximum value is no longer at the end of the list but in second place (seed(1) is to blame for this).

Ties not broken randomly

We are currently working on an R package ("contextual") that aims to facilitate the implementation and simulation of both context-free and contextual Multi-Armed Bandit policies in R.

As "Bandit Algorithms for Website Optimization" offers a comprehensive entry-level introduction to context-free bandits policy evaluation, we decided to replicate the book's simulations.

In doing so, we found that the book's source code in this repository deterministically chooses the first arm (in other words, the arm with the lowest index) when rewards between arms are tied:

def ind_max(x):
  m = max(x)
  return x.index(m)

As can be seen in our replication vignette, this introduces a bias that adds up over time, changing the simulations' results and plots. To illustrate: on the left, our replication of Figure 4-2 without breaking ties randomly; on the right, the same with ties correctly broken at random:

[Figure: side-by-side comparison of the two replications of Figure 4-2]

A patch along the following lines would resolve this issue by breaking ties randomly:

import random

def ind_max(x):
  max_value = max(x)
  max_keys = [k for k, v in enumerate(x) if v == max_value]
  return random.choice(max_keys)

(I presume that the now closed but unresolved #10 also alluded to this particular issue)

License?

I wouldn't mind using the Exp3 code. Had you settled on a license?

Epsilon Greedy: graphs for accuracy and performance sensitive to the declaration order of the arms?

While coding the performance curves shown in the section "Analyzing Results from a Monte Carlo Study" (chapter 4), using Approach 1 (probability of selecting the best arm) and Approach 2 (average reward), I noticed that changing the order of the arms may change the curves dramatically.
For instance, if I declare the means of my five arms as follows:

means = [0.8, 0.9, 0.1, 0.5, 0.5]
n_arms = len(means)
random.shuffle(means)
arms = [BernoulliArm(mu) for mu in means]

The shuffle changes the order within the means list, and therefore the order of the arms, and the performance curve may end up very different.

For instance, with [0.9, 0.5, 0.1, 0.5, 0.8] as the order after shuffling, I get this curve (average reward per time step):

[Figure: average reward curve for arm order [0.9, 0.5, 0.1, 0.5, 0.8]]

whereas with [0.1, 0.5, 0.9, 0.5, 0.8], I get:

[Figure: average reward curve for arm order [0.1, 0.5, 0.9, 0.5, 0.8]]

Do you have an explanation? The same phenomenon occurs for the probability of selecting the best arm (though not as strongly).

Parameters:
num_sims=1000
horizon=250
epsilon=0.1

Change description to show code exists for multiple languages

Right now only R is shown beside this repository's name when we click to show all your repos. If possible, change the description to show that code in other languages exists in this repo, maybe something like "code in Julia, R, Python, and Ruby for my book on Multi-Armed Bandit Algorithms".
