yahtzotron's Introduction

Yahtzotron

State your prime directive! - "... to ... roll ..." 🤖 🎲

Yahtzotron is a bot for Yahtzee and Yatzy, trained via advantage actor-critic (A2C) through self-play. It is implemented with the JAX library ecosystem (JAX + Haiku + optax + rlax).
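
As a rough illustration of what an A2C objective looks like, here is a minimal single-step sketch in plain Python. It is not Yahtzotron's actual training code: the function name `a2c_loss`, its parameters, the unweighted sum of the terms, and the default `entropy_cost` are all assumptions made for this example. Only the term names (actor loss, critic loss, entropy loss) mirror what the training progress bar reports.

```python
import math


def a2c_loss(logits, value, action, observed_return, entropy_cost=0.01):
    """Single-step A2C loss terms (hypothetical sketch, not Yahtzotron's code)."""
    # Numerically stable log-softmax over the action logits.
    max_logit = max(logits)
    log_norm = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    log_probs = [x - log_norm for x in logits]

    # Advantage: how much better the outcome was than the critic predicted.
    # In a real implementation it is treated as a constant in the actor term.
    advantage = observed_return - value

    # Actor: raise the log-probability of actions with positive advantage.
    actor_loss = -log_probs[action] * advantage
    # Critic: squared error between predicted value and observed return.
    critic_loss = advantage ** 2
    # Entropy term: a negative loss that rewards keeping the policy spread out.
    entropy = -sum(math.exp(lp) * lp for lp in log_probs)
    entropy_loss = -entropy_cost * entropy

    return {
        "actor_loss": actor_loss,
        "critic_loss": critic_loss,
        "entropy_loss": entropy_loss,
        "loss": actor_loss + critic_loss + entropy_loss,
    }


if __name__ == "__main__":
    # Two equally likely actions; the critic underestimated the return by 2.
    print(a2c_loss([0.0, 0.0], value=1.0, action=0, observed_return=3.0))
```

A JAX version would compute the same quantities over batches and differentiate the total with `jax.grad`; the scalar form above is only meant to show how the three terms relate.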

Yahtzee is a game of chance played with 5 dice and involves making strategic decisions based on the outcome of your rolls early in the game. This makes for a surprisingly challenging task for reinforcement learning.

The pre-trained agents are close to perfect play (average scores are around 240 for both Yahtzee and Yatzy, just 5-10 points below perfect play).

Read my blog post about the making of Yahtzotron here.

Usage

Just clone the repository and run

$ pip install .

Then, you can use the Yahtzotron command-line interface:

$ yahtzotron --help
Usage: yahtzotron [OPTIONS] COMMAND [ARGS]...

  This is Yahtzotron, the friendly robot that beats you in Yahtzee.

Options:
  --version                       Show the version and exit.
  -v, --loglevel [debug|info|warning|error]
  --help                          Show this message and exit.

Commands:
  evaluate  Evaluate performance of trained agents.
  origin    Show Yahtzotron's origin story.
  play      Play a game against Yahtzotron.
  train     Train a new model through self-play.

Why don't you try a game against one of the pre-trained agents?

$ yahtzotron play pretrained/yahtzee-score.pkl

Bonus

When you play Yahtzotron, it is going to tell you what its current strategy is before every action (to teach us puny humans how to play):

> My turn!
> Roll #1: [3, 3, 3, 5, 6].
> I think I should go for Threes, so I'm keeping [3, 3, 3].
> Roll #2: [3, 3, 3, 3, 4].
> I think I should go for Threes or Yatzy, so I'm keeping [3, 3, 3, 3].
> Roll #3: [1, 3, 3, 3, 3].
> I'll pick the "Threes" category for that.
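
To put numbers on the turn above, the following sketch computes the exact chance of completing a five of a kind when you keep every matching die and reroll the rest. It is only a worked example of the dice arithmetic, not how Yahtzotron chooses its actions; the function name `chance_to_complete` and its parameters are made up for illustration.

```python
from fractions import Fraction
from math import comb


def chance_to_complete(dice_needed, rerolls_left):
    """Exact probability that all `dice_needed` dice end up showing the target
    face within `rerolls_left` rerolls, keeping every die that hits."""
    if dice_needed == 0:
        return Fraction(1)
    if rerolls_left == 0:
        return Fraction(0)
    hit = Fraction(1, 6)  # chance that a single die shows the target face
    total = Fraction(0)
    for hits in range(dice_needed + 1):
        # Binomial probability that exactly `hits` of the rerolled dice match.
        p_hits = comb(dice_needed, hits) * hit**hits * (1 - hit) ** (dice_needed - hits)
        total += p_hits * chance_to_complete(dice_needed - hits, rerolls_left - 1)
    return total


if __name__ == "__main__":
    # After roll #1: [3, 3, 3] kept, two dice missing, two rerolls left.
    print(chance_to_complete(2, 2))  # 121/1296, roughly 9.3%
    # After roll #2: [3, 3, 3, 3] kept, one die missing, one reroll left.
    print(chance_to_complete(1, 1))  # 1/6
```

So the Yatzy in the example was always a long shot, which is why "Threes" is named as the primary target.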

yahtzotron's People

Contributors

dionhaefner

yahtzotron's Issues

train error

Hello. I tried experimenting with the rules so I could train Yahtzotron for a slightly different game. Unfortunately, calling
python3 yahtzotron/cli.py train -o custom.pkl --ruleset yatzy_modified
fails with

  0%|                                                             | 1/20000 [00:01<7:01:30,  1.26s/it, actor_loss=8.04, critic_loss=77, entropy_loss=-0.000279, loss=85.1, score=173]
Traceback (most recent call last):
  File "/home/user/storage/lab/yahtzotron/yahtzotron/cli.py", line 171, in <module>
    cli()
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/yahtzotron/cli.py", line 70, in train
    yzt = train_a2c(yzt, num_epochs=20_000, pretraining=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/yahtzotron/training.py", line 197, in train_a2c
    weights, opt_state = sgd_step(
                         ^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/yahtzotron/training.py", line 90, in sgd_step
    updates, opt_state = optimizer.update(gradients, opt_state)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/optax/transforms/_accumulation.py", line 380, in update
    new_updates, new_state = lax.cond(
                             ^^^^^^^^^
  File "/home/user/storage/lab/yahtzotron/venv/lib/python3.12/site-packages/optax/transforms/_accumulation.py", line 336, in _do_update
    acc_grads = jtu.tree_map(
                ^^^^^^^^^^^^^
ValueError: Custom node type mismatch: expected type: <class 'haiku._src.data_structures.FlatMap'>, value: {'linear': {'b': Traced<ShapedArray(float32[128])>with<DynamicJaxprTrace(level=2/0)>, 'w': Traced<ShapedArray(float32[25,128])>with<DynamicJaxprTrace(level=2/0)>}, 'linear_1': {'b': Traced<ShapedArray(float32[256])>with<DynamicJaxprTrace(level=2/0)>, 'w': Traced<ShapedArray(float32[128,256])>with<DynamicJaxprTrace(level=2/0)>}, 'linear_2': {'b': Traced<ShapedArray(float32[128])>with<DynamicJaxprTrace(level=2/0)>, 'w': Traced<ShapedArray(float32[256,128])>with<DynamicJaxprTrace(level=2/0)>}, 'linear_3': {'b': Traced<ShapedArray(float32[1])>with<DynamicJaxprTrace(level=2/0)>, 'w': Traced<ShapedArray(float32[128,1])>with<DynamicJaxprTrace(level=2/0)>}, 'linear_4': {'b': Traced<ShapedArray(float32[32])>with<DynamicJaxprTrace(level=2/0)>, 'w': Traced<ShapedArray(float32[128,32])>with<DynamicJaxprTrace(level=2/0)>}, 'linear_5': {'b': Traced<ShapedArray(float32[15])>with<DynamicJaxprTrace(level=2/0)>, 'w': Traced<ShapedArray(float32[128,15])>with<DynamicJaxprTrace(level=2/0)>}}.
--------------------
For simplicity, JAX has removed its internal frames from the traceback of the following exception. Set JAX_TRACEBACK_FILTERING=off to include these.

The same happens with the unmodified rules.
My guess is that the cause is the setup.py file, which does not pin package versions, so newer releases of the dependencies are no longer compatible with your code.
Is there any chance you could recover and share the environment in which Yahtzotron was created?
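
For anyone comparing environments, a small sketch like the following lists the installed versions of the libraries named in the README. The distribution names are assumptions on my part (in particular `dm-haiku` as the PyPI name for Haiku, and `jaxlib` alongside `jax`), and the helper `installed_versions` is hypothetical; it only reports what is installed and says nothing about which versions are compatible.

```python
from importlib import metadata

# Assumed PyPI distribution names for the JAX + Haiku + optax + rlax stack.
PACKAGES = ("jax", "jaxlib", "dm-haiku", "optax", "rlax")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        if version is None:
            print(f"# {name} is not installed")
        else:
            print(f"{name}=={version}")
```

The output uses the `name==version` form, so a working environment could be recorded in a requirements file.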

No issue here, just wanted to ask a couple of questions

Heyy,

First of all, great project, I'm glad to see it succeeded! I'm working on a similar project, training an A2C model to play Yamb (a variation of Yahtzee played with 6 dice, of which you keep 5, and with different categories). Since we're dealing with a pretty similar problem, I was wondering if you could find a couple of minutes to answer some of my questions about how you got the agent to learn.

Sorry for opening an issue, I just had no idea how to message you other than this :)

Regards,
Aleksandar

Would it be possible for the bot to act as a coach?

Absolutely marvellous program, I really enjoyed playing against the bot.

Would it be possible for the bot to act as a coach? Meaning that it would tell you what goals to aim for, what dice to keep, and what the probability of success for a specific strategy is?
