banditsbook's Introduction

Code to Accompany the Book "Bandit Algorithms for Website Optimization"

This repo contains code in several languages implementing standard algorithms for solving the multi-armed bandit problem, including:

  • epsilon-Greedy
  • Softmax (Boltzmann)
  • UCB1
  • UCB2
  • Hedge
  • Exp3

It also contains code that provides a testing framework for bandit algorithms based around simple Monte Carlo simulations.
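
Schematically, that framework boils down to a Monte Carlo loop like the one below, written against the select_arm()/update()/initialize() and draw() interfaces described under "Adding New Algorithms" further down. The function and variable names here are illustrative rather than the exact API in this repo:

def run_monte_carlo(algo, arms, num_sims, horizon):
    # Each simulation resets the algorithm, then plays `horizon` rounds
    # against a fixed set of simulated arms, recording every reward.
    rewards = [[0.0] * horizon for _ in range(num_sims)]
    for sim in range(num_sims):
        algo.initialize()  # wipe counts/values between simulations
        for t in range(horizon):
            chosen_arm = algo.select_arm()    # pick an arm
            reward = arms[chosen_arm].draw()  # sample its reward
            algo.update(chosen_arm, reward)   # learn from the outcome
            rewards[sim][t] = reward
    return rewards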

Languages

This codebase is split up by language. In most languages, there are parallel implementations of the core algorithms and infrastructure for testing the algorithms:

  • Python
  • Julia
  • Ruby

In R, there is a body of code for visualizing the results of simulations and analyzing those results. The R code would benefit from some refactoring to make it DRYer.

If you're interested in seeing how some of these algorithms would be implemented in JavaScript, you should try out Mark Reid's code: http://mark.reid.name/code/bandits/

If you're looking for Java code, try Dani Sola's work: https://github.com/danisola/bandit

If you're looking for Scala code, try everpeace (Shingo Omura)'s work: https://github.com/everpeace/banditsbook-scala

If you're looking for Go code, try Rany Keddo's work: https://github.com/purzelrakete/bandit

If you're looking for Clojure code, try Paul Ingles's work: https://github.com/pingles/clj-bandit

If you're looking for Swift code, see https://github.com/crenwick/Swiper

For a Flask implementation, see https://github.com/DeaconDesperado/flask_mab

Getting Started

To try out this code, you can go into the Python or Julia directories and then run the demo script.

In Python, that looks like:

python demo.py

In Julia, that looks like:

julia demo.jl

You should step through that code line-by-line to understand what the functions are doing. The book provides more in-depth explanations of how the algorithms work.

The Ruby code was contributed by Kashif Rasul. If you're interested in translating the code into another language, please submit a pull request. I will merge any new implementations as soon as I can.

Adding New Algorithms: API Expectations

As described in the book, a Bandit algorithm should implement two methods:

  • select_arm(): A method that returns the index of the Arm that the Bandit object selects on the current play. No arguments are required.
  • update(): A method that updates the internal state of the Bandit object in response to its most recently selected arm's reward. The index of the chosen arm and the amount of reward received must be passed as arguments. (A sketch implementing the full Bandit interface appears below, after initialize() is introduced.)

As described in the book, an Arm simulator should implement:

  • draw(): A method that returns a single instance of reward from the arm that was pulled. No arguments are required.
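
For concreteness, here is a minimal sketch of such an Arm, along the lines of the BernoulliArm used in the simulations (the exact implementation in this repo may differ in details):

import random

class BernoulliArm:
    def __init__(self, p):
        self.p = p  # probability of paying out a unit reward

    def draw(self):
        # Return a single sample of reward; no arguments required.
        return 1.0 if random.random() < self.p else 0.0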

In addition, the Bandit algorithms are designed to implement one additional method used in simulations:

  • initialize(): A method that returns nothing. Instead, this method resets all of the data-driven variables in a Bandit object. For most objects, this resets the counts and values fields to their initial states. No arguments are required.
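
Putting those three methods together, here is a minimal epsilon-Greedy sketch that satisfies the Bandit interface. It is illustrative and may differ in details from the implementations in this repo:

import random

class EpsilonGreedy:
    def __init__(self, epsilon, n_arms):
        self.epsilon = epsilon
        self.n_arms = n_arms
        self.initialize()

    def initialize(self):
        # Reset the data-driven fields to their initial states.
        self.counts = [0] * self.n_arms    # plays per arm
        self.values = [0.0] * self.n_arms  # running mean reward per arm

    def select_arm(self):
        # Explore a random arm with probability epsilon; otherwise exploit.
        if random.random() < self.epsilon:
            return random.randrange(self.n_arms)
        # Note: index(max(...)) breaks ties toward the lowest index; see
        # the "Ties not broken randomly" issue below.
        return self.values.index(max(self.values))

    def update(self, chosen_arm, reward):
        # Incrementally update the running mean reward for the chosen arm.
        self.counts[chosen_arm] += 1
        n = self.counts[chosen_arm]
        self.values[chosen_arm] += (reward - self.values[chosen_arm]) / n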

Beyond the testing framework described in the book, I am currently providing an additional system built around the concept of an Environment. Environment objects encapsulate not only a set of Arms, but also a mechanism for having those Arms change over time. This allows you to simulate complex scenarios that aren't well described by a constant set of arms.

If you would like to implement your own Environment, you will need to provide a very simple interface. The Environment interface requires you to implement two methods:

  • arms(): A method that returns the array of arms that exist at time T. You must pass T as an argument.
  • n_arms(): A method that returns the number of arms that the environment will return with each call to arms(). While the arms may change over time, the number of arms should not. No arguments are required.
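
As an illustration, here is a hypothetical Environment in which the full set of arms is re-drawn at fixed intervals. The DriftingEnvironment name and its drift rule are invented for this example and are not part of this repo:

import random

class BernoulliArm:
    def __init__(self, p):
        self.p = p

    def draw(self):
        return 1.0 if random.random() < self.p else 0.0

class DriftingEnvironment:
    def __init__(self, n_arms, period):
        self._n_arms = n_arms
        self.period = period  # how many plays each set of arms survives
        self._epoch = -1      # forces a fresh draw on the first call

    def arms(self, t):
        # Return the array of arms that exist at time t; t must be passed in.
        epoch = t // self.period
        if epoch != self._epoch:
            self._epoch = epoch
            self._arms = [BernoulliArm(random.random())
                          for _ in range(self._n_arms)]
        return self._arms

    def n_arms(self):
        # The arms may change over time, but their number never does.
        return self._n_arms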

banditsbook's People

Contributors

crenwick, everpeace, jcborras, johnmyleswhite, kashif


banditsbook's Issues

Visualizations with matplotlib for python

Hello John,

First, thank you for writing one of the most enjoyable books in the domain!

One thing that puzzled me was why, since most of the code was in Python, the charts and visualizations were not in Python as well. Since I did not know R, I spent quite some time recreating them with matplotlib while I was working through the book (example here: https://github.com/niazangels/bandits/blob/master/01-epsilon-greedy.ipynb).

I know there haven't been any commits to this repo for a while, but I was wondering if you would be interested in adding these. If so, what would be a good way to structure them? (My plots are in Jupyter notebooks, but I can split them into files.)

I could raise a pull request in case you're interested, but feel free to close this issue if this isn't something you'd like to pursue.

Bug in epsilon_greedy/plot_standard.R?

The right arm can't be arm number 5, since arm numbers run from 0 to 4. Also, after shuffling the values for the means, the maximum value is no longer at the end of the list but in second place (seed(1) is to blame for this).

Ties not broken randomly

We are currently working on an R package ("contextual") that aims to facilitate the implementation and simulation of both context-free and contextual Multi-Armed Bandit policies in R.

As "Bandit Algorithms for Website Optimization" offers a comprehensive entry-level introduction to context-free bandits policy evaluation, we decided to replicate the book's simulations.

In doing so, we found that the book's source code in this repository deterministically chooses the first arm (in other words, the arm with the lowest index) when rewards between arms are tied:

def ind_max(x):
  m = max(x)
  return x.index(m)

As can be seen in our replication vignette, this introduces a bias that adds up over time, changing the simulations' results and plots. To illustrate: on the left, our replication of Figure 4-2 without breaking ties randomly; on the right, the same with ties correctly broken at random:

[Figure: side-by-side comparison of the two replications of Figure 4-2]

A patch along the following lines would resolve this issue by breaking ties randomly:

import random

def ind_max(x):
  max_value = max(x)
  max_keys = [k for k, v in enumerate(x) if v == max_value]
  return random.choice(max_keys)

(I presume that the now closed but unresolved #10 also alluded to this particular issue)

License?

I wouldn't mind using the Exp3 code. Had you settled on a license?

Epsilon Greedy: graphs for accuracy and performance sensitive to the declaration order of the arms?

While coding the performance curves shown in the section "Analyzing Results from a Monte Carlo Study" (chapter 4), using Approach 1 (probability of selecting the best arm) and Approach 2 (average reward), I noticed that changing the order of the arms may change the curves dramatically.
For instance, if I declare the means of my five arms as follows:

means = [0.8, 0.9, 0.1, 0.5, 0.5]
n_arms = len(means)
random.shuffle(means)
arms = [BernoulliArm(mu) for mu in means]

The shuffle changes the order within the means list, and therefore the order of the arms, and the performance curve may end up very different.

For instance, with [0.9, 0.5, 0.1, 0.5, 0.8] as the order after shuffling, I get this curve (average reward per time step):

[Figure: average reward curve for arm order [0.9, 0.5, 0.1, 0.5, 0.8]]

whereas with [0.1, 0.5, 0.9, 0.5, 0.8], I get:

[Figure: average reward curve for arm order [0.1, 0.5, 0.9, 0.5, 0.8]]

Do you have an explanation? The same phenomenon occurs for the probability of selecting the best arm (though not as strongly).

Parameters:
num_sims=1000
horizon=250
epsilon=0.1

Change description to show code exists for multiple languages

Right now only R is shown beside this repository's name when we click to show all your repos. If possible, change the description to show that code in other languages exists in this repo, maybe something like "code in Julia, R, Python, and Ruby for my book on Multi-Armed Bandit Algorithms".
