Yahtzotron

State your prime directive! - "... to ... roll ..." 🤖 🎲

Yahtzotron is a bot for Yahtzee and Yatzy, trained via advantage actor-critic (A2C) through self-play. Yahtzotron is implemented through the JAX library ecosystem (JAX + Haiku + optax + rlax).
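For readers unfamiliar with A2C, the loss for a single decision can be sketched roughly as below. This is a minimal illustration in plain Python, not Yahtzotron's actual implementation (which builds on JAX and rlax); the function name `a2c_loss` and the `entropy_cost` parameter are hypothetical names chosen for this example.

```python
import math


def a2c_loss(probs, action, value, target_return, entropy_cost=0.01):
    """Return (actor_loss, critic_loss, entropy_loss) for one decision.

    probs:         action probabilities from the policy head (sum to 1)
    action:        index of the action that was actually taken
    value:         the critic's estimate of the expected return
    target_return: the observed (or bootstrapped) return for this step
    entropy_cost:  hypothetical weight of the entropy bonus
    """
    # Advantage: how much better the outcome was than the critic expected.
    # In an autodiff framework this is treated as a constant in the actor term.
    advantage = target_return - value

    # Actor: raise the log-probability of actions with positive advantage.
    actor_loss = -math.log(probs[action]) * advantage

    # Critic: squared error between predicted and observed return.
    critic_loss = advantage ** 2

    # Entropy bonus: a negative loss term that rewards exploratory policies.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    entropy_loss = -entropy_cost * entropy

    return actor_loss, critic_loss, entropy_loss


actor, critic, entropy = a2c_loss([0.5, 0.5], action=0, value=1.0, target_return=3.0)
total_loss = actor + critic + entropy
```

The three terms are simply added here; how they are weighted against each other is a tuning choice.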

Yahtzee is a game of chance played with 5 dice in which you have to make strategic decisions, based on the outcome of your rolls, from early in the game. This makes it a surprisingly challenging task for reinforcement learning.

The pre-trained agents are close to perfect play (average scores are around 240 for both Yahtzee and Yatzy, just 5-10 points below perfect play).

Read my blog post about the making of Yahtzotron here.

Usage

Just clone the repository and run

$ pip install .

Then, you can use the Yahtzotron command-line interface:

$ yahtzotron --help
Usage: yahtzotron [OPTIONS] COMMAND [ARGS]...

  This is Yahtzotron, the friendly robot that beats you in Yahtzee.

Options:
  --version                       Show the version and exit.
  -v, --loglevel [debug|info|warning|error]
  --help                          Show this message and exit.

Commands:
  evaluate  Evaluate performance of trained agents.
  origin    Show Yahtzotron's origin story.
  play      Play a game against Yahtzotron.
  train     Train a new model through self-play.

Why don't you try a game against one of the pre-trained agents?

$ yahtzotron play pretrained/yahtzee-score.pkl

Bonus

When you play Yahtzotron, it is going to tell you what its current strategy is before every action (to teach us puny humans how to play):

> My turn!
> Roll #1: [3, 3, 3, 5, 6].
> I think I should go for Threes, so I'm keeping [3, 3, 3].
> Roll #2: [3, 3, 3, 3, 4].
> I think I should go for Threes or Yatzy, so I'm keeping [3, 3, 3, 3].
> Roll #3: [1, 3, 3, 3, 3].
> I'll pick the "Threes" category for that.
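The turn above follows the usual pattern of one initial roll plus up to two rerolls of the dice that were not kept. A minimal sketch of that mechanic, with a fixed "keep all threes" rule standing in for the agent's learned strategy, could look like this. The names `roll_turn`, `score_upper`, and `keep_threes` are hypothetical and not part of Yahtzotron's code.

```python
import random


def roll_turn(choose_keep, rng=random):
    """Play one turn: an initial roll plus two rerolls of the non-kept dice."""
    dice = sorted(rng.randint(1, 6) for _ in range(5))
    for _ in range(2):
        kept = choose_keep(dice)
        # Reroll only the dice that were not kept.
        rerolled = [rng.randint(1, 6) for _ in range(5 - len(kept))]
        dice = sorted(kept + rerolled)
    return dice


def score_upper(dice, face):
    """Upper-section score: the sum of all dice showing `face`."""
    return sum(d for d in dice if d == face)


def keep_threes(dice):
    """Stand-in strategy: always keep every die showing a three."""
    return [d for d in dice if d == 3]


final_dice = roll_turn(keep_threes, random.Random(0))
print(final_dice, score_upper(final_dice, 3))
```

For the final roll in the transcript, `score_upper([1, 3, 3, 3, 3], 3)` gives 12 points in the "Threes" category.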

train error

Hello. I tried experimenting with the rules so that I could train Yahtzotron for a slightly different game. Unfortunately, the call
python3 yahtzotron/cli.py train -o custom.pkl --ruleset yatzy_modified
fails with

  0%|                                                             | 1/20000 [00:01<7:01:30,  1.26s/it, actor_loss=8.04, critic_loss=77, entropy_loss=-0.000279, loss=85.1, score=173]
Traceback (most recent call last):
  File "/home/user/storage/lab/yahtzotron/yahtzotron/cli.py", line 171, in <module>
    cli()
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/yahtzotron/cli.py", line 70, in train
    yzt = train_a2c(yzt, num_epochs=20_000, pretraining=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/yahtzotron/training.py", line 197, in train_a2c
    weights, opt_state = sgd_step(
                         ^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/yahtzotron/training.py", line 90, in sgd_step
    updates, opt_state = optimizer.update(gradients, opt_state)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/optax/transforms/_accumulation.py", line 380, in update
    new_updates, new_state = lax.cond(
                             ^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/optax/transforms/_accumulation.py", line 336, in _do_update
    acc_grads = jtu.tree_map(
                ^^^^^^^^^^^^^
ValueError: Custom node type mismatch: expected type: <class 'haiku._src.data_structures.FlatMap'>, value: {'linear': {'b': Traced<ShapedArray(float32[128])>with<DynamicJaxprTrace(level=2/0)>, 'w': Traced<ShapedArray(float32[25,128])>with<DynamicJaxprTrace(level=2/0)>}, 'linear_1': {'b': Traced<ShapedArray(float32[256])>with<DynamicJaxprTrace(level=2/0)>, 'w': Traced<ShapedArray(float32[128,256])>with<DynamicJaxprTrace(level=2/0)>}, 'linear_2': {'b': Traced<ShapedArray(float32[128])>with<DynamicJaxprTrace(level=2/0)>, 'w': Traced<ShapedArray(float32[256,128])>with<DynamicJaxprTrace(level=2/0)>}, 'linear_3': {'b': Traced<ShapedArray(float32[1])>with<DynamicJaxprTrace(level=2/0)>, 'w': Traced<ShapedArray(float32[128,1])>with<DynamicJaxprTrace(level=2/0)>}, 'linear_4': {'b': Traced<ShapedArray(float32[32])>with<DynamicJaxprTrace(level=2/0)>, 'w': Traced<ShapedArray(float32[128,32])>with<DynamicJaxprTrace(level=2/0)>}, 'linear_5': {'b': Traced<ShapedArray(float32[15])>with<DynamicJaxprTrace(level=2/0)>, 'w': Traced<ShapedArray(float32[128,15])>with<DynamicJaxprTrace(level=2/0)>}}.
--------------------
For simplicity, JAX has removed its internal frames from the traceback of the following exception. Set JAX_TRACEBACK_FILTERING=off to include these.

The same happens with the unmodified rules.
My guess is that the cause is the loosely specified setup.py, which does not pin package versions, so newer releases of the dependencies are no longer compatible with your code.
Is there any chance you could recover the environment in which Yahtzotron was created and share it?
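To compare environments, a small script along these lines could report the installed versions of the relevant dependencies in a form that can be pasted into a requirements file. It is only a sketch: the package list is an assumption based on the libraries named in the README (with `dm-haiku` being the PyPI distribution that provides `haiku`), and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Assumed list of relevant distributions, using their PyPI names.
PACKAGES = ("jax", "jaxlib", "dm-haiku", "optax", "rlax")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        if version is None:
            print(f"# {name} is not installed")
        else:
            print(f"{name}=={version}")
```

Running it in a working environment and in the failing one would show which packages differ.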

dionhaefner / yahtzotron
