This repository contains a Python implementation of popular multi-armed bandit algorithms as described in the book:
Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second Edition.
It includes epsilon-greedy, softmax action selection, UCB1, and Median Elimination.
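As a rough illustration of three of these selection rules, here is a minimal sketch. It is not the code from this repository: the function names, the `temperature` parameter, and the exploration bonus `sqrt(2 ln t / n)` (the classic UCB1 form) are assumptions made for the example. Median Elimination is left out for brevity.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random arm, else the greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature):
    """Boltzmann distribution over arms; a higher temperature means more exploration."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_select(q_values, temperature, rng=random):
    """Sample an arm according to the softmax probabilities."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def ucb1(q_values, counts, t):
    """Pick the arm with the highest upper confidence bound at time step t."""
    # Play every arm once before applying the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(q_values)),
        key=lambda a: q_values[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )
```

Each function takes the current value estimates (`q_values`) and returns the index of the arm to pull, so the three rules can be swapped inside the same agent loop.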
To test the algorithms, run the main.py script as is; it produces all the graphs shown in the report and saves them in the plots directory.
To run the agents on custom multi-armed bandit problems, change the values of the following variables:
num_arms = 10
plays = 2000
iterations = 1000
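The sketch below shows one plausible way these three variables could drive an experiment. It assumes that `plays` is the number of arm pulls per run and `iterations` is the number of independent runs averaged together, in the style of the 10-armed testbed from the book; the actual meaning in main.py may differ. The functions `run_bandit` and `average_reward_curve` and the Gaussian reward model are hypothetical and are not taken from this repository.

```python
import random

num_arms = 10      # number of arms in each bandit problem
plays = 2000       # assumed: arm pulls per run
iterations = 1000  # assumed: independent runs to average over


def run_bandit(num_arms, plays, epsilon=0.1, seed=None):
    """Run one epsilon-greedy agent on a random Gaussian bandit; return its rewards."""
    rng = random.Random(seed)
    true_means = [rng.gauss(0.0, 1.0) for _ in range(num_arms)]
    q = [0.0] * num_arms     # sample-average value estimates
    counts = [0] * num_arms  # number of pulls per arm
    rewards = []
    for _ in range(plays):
        if rng.random() < epsilon:
            arm = rng.randrange(num_arms)
        else:
            arm = max(range(num_arms), key=lambda a: q[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]  # incremental mean update
        rewards.append(reward)
    return rewards


def average_reward_curve(num_arms, plays, iterations, epsilon=0.1, seed=0):
    """Average the per-step reward over `iterations` independent runs."""
    totals = [0.0] * plays
    for i in range(iterations):
        for t, r in enumerate(run_bandit(num_arms, plays, epsilon, seed + i)):
            totals[t] += r
    return [x / iterations for x in totals]


# Smaller values than the defaults above keep this demonstration quick.
curve = average_reward_curve(num_arms, plays=100, iterations=10)
print(len(curve))
```

Under this reading, raising `iterations` smooths the averaged curve, while raising `plays` extends it along the time axis.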
- Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second Edition.
- GitHub link: Shangtong Zhang, Python implementation of Reinforcement Learning: An Introduction.
- Eyal Even-Dar, Shie Mannor, and Yishay Mansour (2003). Action Elimination and Stopping Conditions for Reinforcement Learning. pp. 162-169.