- Python Mode for Processing 3 by Jonathan Feinberg
It simulates an agent moving through the board, measuring the utility and policy of every state it moves onto.
It has two modes: a manual mode, in which the user moves the agent, and an automatic mode, in which the agent seeks the highest utility value in order to reach the terminal state. Both modes can be used simultaneously.
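
As an illustration of the automatic mode, the sketch below picks the neighbouring cell with the highest utility. It is only a minimal example under stated assumptions: the helper name `greedy_step`, the nested-list `utility` table, and the four-direction move set are hypothetical and are not taken from the project's code.

```python
# Hypothetical helper (not from the project): choose the in-bounds
# neighbour with the highest utility, assuming four-direction movement.
def greedy_step(utility, i, j):
    rows, cols = len(utility), len(utility[0])
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    best = None
    for di, dj in moves:
        ni, nj = i + di, j + dj
        # Skip moves that would leave the board.
        if 0 <= ni < rows and 0 <= nj < cols:
            if best is None or utility[ni][nj] > utility[best[0]][best[1]]:
                best = (ni, nj)
    return best


# Example utility table for a 3 x 2 board (values are made up).
example_utility = [[0.0, 0.5],
                   [0.1, 1.0],
                   [0.0, 0.2]]
next_cell = greedy_step(example_utility, 0, 0)
```

Ties are resolved in favour of the first move tried, which is an arbitrary choice for this sketch.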
- r : The immediate reward for each move
- d : The penalty applied when the agent hits a wall
- ui_width : The width of the area where the board will be drawn
- ui_height : The height of the area where the board will be drawn
- dI : The initial "i" position of the agent; it cannot be greater than the number of rows
- dJ : The initial "j" position of the agent; it cannot be greater than the number of columns
- rows : The number of rows of the board; the default is 3
- cols : The number of columns of the board; the default is 2
- a : Alpha, the learning rate; the default is 0.5
- g : Gamma, the discount factor; the default is 0.8
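
To show how these parameters might interact, here is a minimal sketch that assumes a standard temporal-difference update, `U(s) <- U(s) + a * (r + g * U(s') - U(s))`. The project's actual update rule may differ. The values chosen for `r` and `d`, and the function names `td_update` and `move`, are assumptions made for illustration only.

```python
# Defaults stated above.
rows, cols = 3, 2
a = 0.5   # alpha, learning rate
g = 0.8   # gamma, discount factor

# Assumed values: the text gives no defaults for r and d.
r = -0.04  # immediate reward for a move
d = -1.0   # penalty for hitting a wall


def td_update(utility, state, next_state, reward):
    """Assumed TD(0) update of the utility of `state`."""
    i, j = state
    ni, nj = next_state
    utility[i][j] += a * (reward + g * utility[ni][nj] - utility[i][j])
    return utility[i][j]


def move(utility, state, delta):
    """Try to move by `delta`; hitting a wall keeps the agent in place."""
    ni, nj = state[0] + delta[0], state[1] + delta[1]
    if 0 <= ni < rows and 0 <= nj < cols:
        next_state, reward = (ni, nj), r
    else:
        next_state, reward = state, d  # wall hit: stay put, apply penalty
    td_update(utility, state, next_state, reward)
    return next_state


# Usage: start at dI = 0, dJ = 0 and step one cell to the right.
utility = [[0.0] * cols for _ in range(rows)]
position = move(utility, (0, 0), (0, 1))
```

With a higher `a`, each move overwrites more of the old utility estimate; with a higher `g`, the utility of the next state weighs more against the immediate reward.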