Hello, I'm trying to create a strong Mancala bot. I chose Q-learning: `# Let's

Q-learning is a loser? (open_spiel, closed)

StepHaze commented on July 18, 2024

Q-learning is a loser?

from open_spiel.

Comments (6)

lanctot commented on July 18, 2024

Hi @StepHaze,

It's probably due to the size of Mancala. I would guess something similar would happen with chess: the state space is too large, so most states are probably visited only once (or a very small number of times). Tabular methods only work on small games (e.g. Tic-Tac-Toe has ~4500 states in total).
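To make the "visited only once" point concrete, here is a rough sketch in plain Python. It assumes states are drawn uniformly at random, which real self-play does not do, and the figure of 10**12 states is an arbitrary stand-in for "a large game", not a count of Mancala positions. The helper name `visit_profile` is made up for this illustration.

```python
import random
from collections import Counter


def visit_profile(num_states, num_visits, seed=0):
    """Sample `num_visits` states uniformly and summarise how often each was seen."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(num_states) for _ in range(num_visits))
    seen_once = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct_states": len(counts),
        "fraction_seen_once": seen_once / len(counts),
    }


# Small game: every state is revisited many times, so a table can average returns.
small = visit_profile(num_states=4_500, num_visits=200_000)

# Large game (size is an assumption): almost every visited state is seen exactly once,
# so each table entry is built from a single noisy sample.
large = visit_profile(num_states=10**12, num_visits=200_000)

print("small:", small)
print("large:", large)
```

With the same training budget, the small table gets dozens of updates per entry while the large one gets about one, which is why a tabular learner barely improves there.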

Try self-play DQN instead? It's basically Q-learning with function approximation and a few extra things (experience replay, target network, etc.)
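As a rough illustration of the "extra things" mentioned above, the sketch below shows a replay buffer and a periodically synced target network around an ordinary Q-learning update. It is not OpenSpiel's DQN: the class `LinearQ`, the function `dqn_update`, the two-state toy problem and all hyperparameters are assumptions chosen for brevity, and the linear model over one-hot features is effectively a table here, whereas a real DQN would use a neural network.

```python
import random
from collections import deque


class LinearQ:
    """Q(s, a) = w[a] . features(s); a stand-in for the neural network."""

    def __init__(self, num_features, num_actions):
        self.w = [[0.0] * num_features for _ in range(num_actions)]

    def values(self, features):
        return [sum(wi * fi for wi, fi in zip(row, features)) for row in self.w]

    def copy_from(self, other):
        self.w = [row[:] for row in other.w]


def dqn_update(online, target, batch, lr=0.1, gamma=0.9):
    """One semi-gradient Q-learning step per sampled transition."""
    for features, action, reward, next_features, done in batch:
        # The bootstrap value comes from the frozen target network, not the online one.
        bootstrap = 0.0 if done else gamma * max(target.values(next_features))
        td_error = reward + bootstrap - online.values(features)[action]
        for i, f in enumerate(features):
            online.w[action][i] += lr * td_error * f


# Toy problem (hypothetical): A --action 0--> B --action 0--> reward 1, episode ends.
A, B = [1.0, 0.0], [0.0, 1.0]
transitions = [
    (A, 0, 0.0, B, False),
    (A, 1, 0.0, A, True),
    (B, 0, 1.0, B, True),
    (B, 1, 0.0, B, True),
]

rng = random.Random(0)
buffer = deque(maxlen=1000)               # experience replay
online, target = LinearQ(2, 2), LinearQ(2, 2)

for step in range(2000):
    buffer.append(rng.choice(transitions))            # "act" and store the transition
    batch = rng.sample(list(buffer), min(len(buffer), 8))
    dqn_update(online, target, batch)                 # learn from replayed experience
    if step % 50 == 0:
        target.copy_from(online)                      # periodic target-network sync

print("Q(A, .) =", [round(v, 3) for v in online.values(A)])
print("Q(B, .) =", [round(v, 3) for v in online.values(B)])
```

Sampling from the buffer decorrelates consecutive updates, and the frozen target keeps the bootstrap value from shifting on every step.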


DmitriMedved commented on July 18, 2024

> Try self-play DQN instead? It's basically Q-learning with function approximation and a few extra things (experience replay, target network, etc.)

I'm interested in this too. Could you please provide an example for DQN?


lanctot commented on July 18, 2024

Sure. In OpenSpiel all the agents follow the same API (they are subclasses of the agent base class). So 99% of the code above can be reused; you just have to change the instantiation of the agent to DQN instead of tabular_qlearner.QLearner.

There is a full example here.
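The swap described above might look roughly like the sketch below. The original training code is not shown in this thread, so this assumes the usual `rl_environment` self-play loop. It also assumes OpenSpiel and PyTorch are installed, that the game is registered as `"mancala"`, and that it exposes an observation or information-state tensor. The function names `make_dqn_agents` and `train` and the hidden-layer sizes are made up for illustration.

```python
def make_dqn_agents(game_name="mancala", hidden_layers_sizes=(64, 64)):
    """Build an environment and one PyTorch DQN agent per player."""
    # Imported lazily so the sketch can be read and loaded without OpenSpiel installed.
    from open_spiel.python import rl_environment
    from open_spiel.python.pytorch import dqn

    env = rl_environment.Environment(game_name)
    info_state_size = env.observation_spec()["info_state"][0]
    num_actions = env.action_spec()["num_actions"]

    # Only this instantiation differs from a tabular Q-learning setup.
    agents = [
        dqn.DQN(
            player_id=idx,
            state_representation_size=info_state_size,
            num_actions=num_actions,
            hidden_layers_sizes=list(hidden_layers_sizes),
        )
        for idx in range(env.num_players)
    ]
    return env, agents


def train(env, agents, num_episodes):
    """Self-play loop; it works for any agents exposing the common step() API."""
    for _ in range(num_episodes):
        time_step = env.reset()
        while not time_step.last():
            player_id = time_step.observations["current_player"]
            agent_output = agents[player_id].step(time_step)
            time_step = env.step([agent_output.action])
        # Let every agent observe the terminal state so it can learn from the result.
        for agent in agents:
            agent.step(time_step)
```

A run would then be `env, agents = make_dqn_agents()` followed by `train(env, agents, num_episodes=10_000)`, where the episode count is again only a placeholder.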


lanctot commented on July 18, 2024

Note the example above is a bit old and still uses the TF1-based DQN.

It can be easily swapped for the PyTorch DQN or JAX DQN if you prefer.


StepHaze commented on July 18, 2024

Hi @lanctot
Thank you Marc.
I always thought that a trained neural network (its weights) should be stored in an external file and loaded every time the bot starts. Am I correct?


lanctot commented on July 18, 2024

Sounds like you are talking about checkpoints. The DQN supports saving and loading the networks themselves, but not the replay buffer. But it's certainly not enabled by default. It's something you have to do manually in your training script (check the save + load methods in dqn.py). Or if you use JAX/PyTorch you can just serialize the agent via pickle and save/load that (which would then include the replay buffer).
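A minimal sketch of the pickle route mentioned above follows. The helpers `save_agent` and `load_agent` are hypothetical names, and a `SimpleNamespace` stands in for the trained agent so the snippet runs without OpenSpiel. Whether a given agent class pickles cleanly depends on the implementation and version you use, so treat that as an assumption to verify.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_agent(agent, path):
    """Serialize the whole agent object (weights, replay buffer, counters) to disk."""
    with Path(path).open("wb") as f:
        pickle.dump(agent, f)


def load_agent(path):
    """Load an agent saved by save_agent(). Only unpickle files you trust."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Stand-in for a trained agent (hypothetical fields, not OpenSpiel's attributes).
trained = SimpleNamespace(
    weights=[0.1, -0.4, 0.7],
    replay_buffer=[("state", 2, 1.0, "next_state", False)],
)

with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "agent.pkl"
    save_agent(trained, checkpoint)     # at the end of (or periodically during) training
    restored = load_agent(checkpoint)   # when the bot starts

print(restored.weights == trained.weights)
```

In a bot, the `load_agent` call would run once at startup, and the restored agent would then be stepped in evaluation mode rather than trained further.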
