Comments (6)
I just found this problem as well.
Here is another approach: we can use np.where or np.argwhere instead of writing our own code.
max_actions = np.argwhere(values==np.amax(values))
action = np.random.choice(max_actions.flatten())
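For reference, the two lines above can be wrapped into a self-contained sketch. This assumes `values` is a 1-D array of action-value estimates. The helper name `random_argmax` and the `rng` parameter are hypothetical, for illustration only, and are not taken from the repo.

```python
import numpy as np

def random_argmax(values, rng=None):
    """Return the index of a maximal entry, breaking ties uniformly at random."""
    # Hypothetical helper: fall back to a fresh Generator if none is passed in.
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    # Indices of all entries equal to the maximum (there may be several).
    max_actions = np.argwhere(values == np.amax(values)).flatten()
    return int(rng.choice(max_actions))

# Arms 1 and 2 are tied for the maximum, so either may be selected.
values = np.array([0.5, 1.0, 1.0, 0.2])
action = random_argmax(values, rng=np.random.default_rng(0))
```

Passing a seeded generator keeps runs reproducible. With a unique maximum the helper behaves like np.argmax.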
from reinforcement-learning-an-introduction.
@zwfcrazy It looks cool! Did you find any other bugs in the repo, or have I fixed them all?
@ShangtongZhang Not yet. I am still reading the book, slowly. I am glad to see that Prof. Sutton has finished the draft! BTW, perhaps we could try to improve the efficiency of the simulations. Exercise 2.9 of chapter 2 requires running the parameter study for 200k steps, which takes days to complete.
@ShangtongZhang Hi, I just found that the argmax problem in chapter 2 has not been fixed.
@zwfcrazy Did it lead to any bugs? I didn't mean to replace every np.argmax call.
@ShangtongZhang The simulation results show no apparent difference, but the simple bandit algorithm requires breaking ties randomly (see p. 24 of the complete draft). Since np.argmax always returns the first maximal index, I think it may slow down exploration at the beginning, because we assume the initial estimates Q(a) are the same for all actions. This is less critical later on, as ties rarely happen.
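A minimal sketch of the effect on the very first greedy step, assuming 10 arms with all initial estimates set to zero. The arm count, run count, and variable names are illustrative assumptions, not taken from the repo.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10                    # number of arms (assumed)
q_init = np.zeros(k)      # identical initial estimates Q(a), so every arm is tied
runs = 1000               # number of independent first steps to simulate (assumed)

first_argmax = np.zeros(k, dtype=int)   # first-action counts using np.argmax
first_random = np.zeros(k, dtype=int)   # first-action counts with random tie-breaking

for _ in range(runs):
    # np.argmax returns the first index among tied maxima, so arm 0 is always chosen.
    first_argmax[np.argmax(q_init)] += 1
    # Random tie-breaking picks uniformly among all maximal arms.
    ties = np.flatnonzero(q_init == q_init.max())
    first_random[rng.choice(ties)] += 1

print("np.argmax:          ", first_argmax)
print("random tie-breaking:", first_random)
```

With np.argmax every run starts on arm 0, while random tie-breaking spreads the first pulls across the arms. Once the estimates diverge, ties become rare and the two selection rules mostly agree.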