bandit's Introduction

Bandit

This project was written while reading Sutton and Barto's book Reinforcement Learning. It is based on the book's early chapter on multi-armed bandit problems.

The module contains code for two different bandit processes. The first is stationary: the algorithm's task is to learn which lever is best to pull. The second is non-stationary: the rewards a lever produces can become more or less favourable over time, so the algorithm must discount information from earlier pulls. The module also includes two action-selection algorithms: softmax and epsilon-greedy.
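As a rough illustration of the ideas above, here is a minimal sketch of the two selection rules and of the difference between a sample-average update (suited to the stationary case) and a constant step-size update (which discounts older pulls, suited to the non-stationary case). The function names, parameters and defaults are assumptions made for this example and are not taken from the module itself.

```python
import math
import random


def epsilon_greedy(estimates, epsilon, rng):
    """With probability epsilon pick a random lever, otherwise the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])


def softmax(estimates, temperature, rng):
    """Sample a lever with probability proportional to exp(estimate / temperature)."""
    top = max(estimates)  # subtracting the max keeps exp() numerically stable
    weights = [math.exp((q - top) / temperature) for q in estimates]
    return rng.choices(range(len(estimates)), weights=weights)[0]


def update(estimates, counts, lever, reward, step_size=None):
    """Move the estimate for `lever` towards `reward`, in place.

    step_size=None uses 1/n, a plain sample average (stationary bandit).
    A constant step_size weights recent rewards more heavily, so old
    pulls are gradually discounted (non-stationary bandit).
    """
    counts[lever] += 1
    alpha = 1.0 / counts[lever] if step_size is None else step_size
    estimates[lever] += alpha * (reward - estimates[lever])


def play(true_means, steps, epsilon=0.1, step_size=None, seed=0):
    """Run epsilon-greedy on a Gaussian bandit (hypothetical reward model)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        lever = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[lever], 1.0)
        update(estimates, counts, lever, reward, step_size)
    return estimates, counts
```

Calling `play([0.2, 0.5, 1.0], steps=2000)` returns the learned estimates and the pull counts per lever. Passing a constant such as `step_size=0.1` switches to the discounting update.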

The other two files are scripts that plot each algorithm's performance averaged over many iterations.
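The averaging behind such plots might look something like the sketch below. It is not the repository's script: the names, the Gaussian reward model and the defaults are all assumptions, and the plotting step is left out. It runs epsilon-greedy on a stationary bandit many times and averages the reward obtained at each step across the runs.

```python
import random


def run_once(true_means, steps, epsilon, rng):
    """One epsilon-greedy run; returns the reward received at each step."""
    n = len(true_means)
    estimates, counts, rewards = [0.0] * n, [0] * n, []
    for _ in range(steps):
        if rng.random() < epsilon:
            lever = rng.randrange(n)  # explore
        else:
            lever = max(range(n), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[lever], 1.0)  # assumed reward model
        counts[lever] += 1
        estimates[lever] += (reward - estimates[lever]) / counts[lever]
        rewards.append(reward)
    return rewards


def average_rewards(true_means, steps=300, runs=200, epsilon=0.1, seed=0):
    """Average the reward at each step over many independent runs."""
    rng = random.Random(seed)
    totals = [0.0] * steps
    for _ in range(runs):
        for t, reward in enumerate(run_once(true_means, steps, epsilon, rng)):
            totals[t] += reward
    return [total / runs for total in totals]
```

The list returned by `average_rewards([0.0, 0.5, 1.0])` has one averaged value per step and can be passed to any plotting library to draw a learning curve.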

jamied157 / bandit
