One issue with np.argmax is that it always returns the f

np.argmax may lead to unexpected behavior about reinforcement-learning-an-introduction HOT 6 CLOSED

shangtongzhang commented on May 12, 2024

np.argmax may lead to unexpected behavior

from reinforcement-learning-an-introduction.

Comments (6)

zwfcrazy commented on May 12, 2024

I just found this problem as well.
Here is another approach: we can use np.where or np.argwhere instead of writing our own code.

max_actions = np.argwhere(values==np.amax(values))
action = np.random.choice(max_actions.flatten())
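
The same idea can be wrapped in a small helper. This is only a sketch: the name `argmax_random_tie` and the `rng` parameter are illustrative and do not come from the repo, the input is assumed to be a 1-D array of action values, and `np.flatnonzero` stands in for `np.argwhere(...).flatten()`.

```python
import numpy as np

def argmax_random_tie(values, rng=None):
    """Return the index of a maximal entry, chosen uniformly among ties.

    Hypothetical helper for illustration; assumes a 1-D array of values.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    # Indices of every entry equal to the maximum.
    max_actions = np.flatnonzero(values == values.max())
    return int(rng.choice(max_actions))

q = np.array([0.5, 1.2, 1.2, 0.3])
action = argmax_random_tie(q, rng=np.random.default_rng(0))
# action is either 1 or 2, whereas np.argmax(q) always returns 1
```

Passing a seeded generator keeps runs reproducible, which may matter when comparing simulation results.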

ShangtongZhang commented on May 12, 2024

@zwfcrazy It looks cool! Did you find any bugs in the repo, or have I fixed them all?

zwfcrazy commented on May 12, 2024

@ShangtongZhang Not yet. I am still reading the book... slowly... I am glad to see that Prof. Sutton has finished the draft! BTW, perhaps we could try to improve the efficiency of the simulations. Exercise 2.9 of chapter 2 requires running the parameter study for 200k steps, which takes days to complete...

zwfcrazy commented on May 12, 2024

@ShangtongZhang Hi, I just found that you didn't fix the argmax problem in chapter 2.

ShangtongZhang commented on May 12, 2024

@zwfcrazy Did it lead to any bugs? I didn't mean to replace every np.argmax call.

zwfcrazy commented on May 12, 2024

@ShangtongZhang The simulation results seem no different, but the simple bandit algorithm requires breaking ties randomly (see p24 of the complete draft). I think it may slow down exploration at the beginning, since we assume the initial estimates of Q(a) are the same for all actions. This won't be critical later on, as ties rarely happen.
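
To illustrate the tie case described above, here is a minimal sketch comparing the two selection rules when every initial estimate is equal. The 10-action setup, the zero initial estimates, and the variable names are assumptions made for the example, not taken from the repo.

```python
import numpy as np

rng = np.random.default_rng(0)
q_init = np.zeros(10)  # assumed: 10 actions, all initial estimates equal

# np.argmax returns the first maximal index, so every greedy pick is action 0.
first_picks = [int(np.argmax(q_init)) for _ in range(1000)]

# Random tie-breaking spreads the greedy picks over all tied actions.
tied = np.flatnonzero(q_init == q_init.max())
random_picks = rng.choice(tied, size=1000)

print(set(first_picks))              # {0}
print(len(np.unique(random_picks)))  # number of distinct actions selected
```

Once the estimates have been updated and differ from one another, ties become rare and the two rules mostly agree.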
