
alpha-zero-general's Issues

A few questions about the Splendor AI

Hi, this looks like a fantastic implementation of AlphaZero for Splendor—thanks for making it. I had a few questions:

  1. Splendor has a few mechanics that chess, Go, and Shogi don't seem to have. How do you handle them? In particular I'm thinking of:
    a. Hidden information: you can take a face-down card from the pile and it remains hidden until you play it.
    b. Chance: the cards are shuffled.
    c. Multiplayer: you can have more than two players.

Is there any new theory involved, or does the same old MCTS + neural network approach work just fine? If the latter, is there anything special you have to do to handle these different gameplay elements? (One common way to handle the multiplayer case is sketched at the end of this post.)

  2. How good is the best bot you've trained? And how do you know how good it is?

Thank you!
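For illustration only (this is not necessarily how this repository handles it): one common way AlphaZero-style code supports more than two players is to make the value head predict a vector with one expected score per player, and have MCTS back up the component belonging to the player to move at each node. A minimal sketch with a made-up three-player scoring:

import numpy as np

NUM_PLAYERS = 3  # hypothetical three-player game

def terminal_scores(winner: int) -> np.ndarray:
    # Toy terminal value vector: +1 for the winner, -1 for everyone else.
    scores = -np.ones(NUM_PLAYERS)
    scores[winner] = 1.0
    return scores

def backup_value(node_values: np.ndarray, player_to_move: int) -> float:
    # During MCTS backup, each node accumulates the score of its own player.
    return float(node_values[player_to_move])

print(terminal_scores(winner=1))            # [-1.  1. -1.]
print(backup_value(terminal_scores(1), 0))  # -1.0 from player 0's point of view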

Would you share model.pt?

I have implemented token exchange (all 406 ways except gold return) based on this repository and modified the environment to be more similar to the actual Splendor.
https://github.com/kuboyoo/alpha-zero-general

I would like to compare the strength of this repository's model (model.onnx, with cpuct=1.0, fpu=0.1, numMCTSSims=6400) against my model retrained in the modified environment.
Would you be willing to share the .pt file before conversion to .onnx?
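For reference, exporting a PyTorch checkpoint to ONNX usually follows the generic pattern below. This is only a sketch with a toy network and a made-up input shape; it is not the repository's GenericNNetWrapper export code, and the real SplendorNNet checkpoint layout may differ.

import torch
import torch.nn as nn

# Minimal sketch of a PyTorch -> ONNX export. TinyNet and the input shape are
# placeholders, not this repository's SplendorNNet or its checkpoint format.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        return self.fc(x)

model = TinyNet()
# model.load_state_dict(torch.load("model.pt", map_location="cpu"))  # real weights would go here
model.eval()

dummy_input = torch.zeros(1, 16)  # placeholder board encoding
torch.onnx.export(
    model, dummy_input, "model.onnx",
    input_names=["board"], output_names=["policy"],
    dynamic_axes={"board": {0: "batch"}},  # allow a variable batch dimension
)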

Errors running/retraining Splendor using commands from tutorial

I tried to play Splendor using the command from the tutorial (I first changed the package imports):

python ./pit.py splendor/pretrained_2players.pt human -n 1

But I got the following error:

File "D:\programs\Python\Python311\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (MatMulBnFusion_Gemm) Op (Gemm) [ShapeInferenceError] First input does not have rank 2

I figured it might be due to the mentioned issue "Ongoing code/features rework, some pretrained networks won't work anymore", so I reverted to the version from 30/1/2024, to no avail. I then decided to first run the training myself, using the example from the tutorial (I had to add -V 85, though, otherwise it complained about version 1 not existing):

python main.py -m 800 -e 1000 -i 5 -F -c 2.5 -f 0.1 -T 10 -b 32 -l 0.0003 -p 1 -D 0.3 -C ../results/mytest -V 85

But now I got the following error:

  File "D:\programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1688, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'SplendorNNet' object has no attribute 'first_layer'
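When an old checkpoint no longer matches the current network class, listing the parameter names stored in the .pt file can help confirm the mismatch. Below is a rough sketch that assumes the file is an ordinary torch.save'd dict or state dict (possibly nested under a key); the repository's actual checkpoint layout may differ.

import torch

# Sketch: list the parameter names stored in a checkpoint to spot architecture
# mismatches (e.g. a layer such as 'first_layer' referenced by the code but
# absent from the file). Assumes a plain dict / state dict saved with torch.save.
checkpoint = torch.load("splendor/pretrained_2players.pt", map_location="cpu")
state_dict = checkpoint.get("state_dict", checkpoint)  # unwrap if nested

for name in sorted(state_dict):
    print(name)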

ONNXRuntimeErrors trying to run or train Splendor

Heya, thank you for your awesome additions to alpha-zero!

I tried to run your code, but unfortunately ran into some errors that look similar to the ones in #3.
First, I updated all the dependencies:

pip3 install -U onnxruntime numba tqdm colorama coloredlogs
pip3 install -U torch torchvision --extra-index-url https://download.pytorch.org/whl/cpu

In particular, I'm using:

colorama             0.4.6
coloredlogs          15.0.1
numba                0.59.1
onnxruntime          1.17.3
torch                2.3.0+cpu
torchvision          0.18.0+cpu
tqdm                 4.66.4

Then I tried the commands from the readme:

python ./pit.py splendor splendor/pretrained_2players.pt human -n 1

which still printed the initial game-board, but then threw:

Error log
Turn 1 Player 0: Traceback (most recent call last):
  File "D:\alpha-zero-general\pit.py", line 252, in <module>
    main()
  File "D:\alpha-zero-general\pit.py", line 246, in main
    play(args)
  File "D:\alpha-zero-general\pit.py", line 71, in play
    result = arena.playGames(args.num_games, initial_state=args.state, verbose=args.display or human)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\Arena.py", line 123, in playGames
    gameResult = self.playGame(verbose=verbose, initial_state=initial_state, other_way=not one_vs_two)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\Arena.py", line 74, in playGame
    action = players[curPlayer](canonical_board, it)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\pit.py", line 59, in <lambda>
    player = lambda x, n: np.argmax(mcts.getActionProb(x, temp=(0.5 if n <= 6 else 0.), force_full_search=True)[0])
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\MCTS.py", line 65, in getActionProb
    self.search(canonicalBoard, dirichlet_noise=dir_noise, forced_playouts=forced_playouts)
  File "D:\alpha-zero-general\MCTS.py", line 144, in search
    Ps, v = self.nnet.predict(canonicalBoard, Vs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 100, in predict
    self.switch_target('inference')
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 290, in switch_target
    self.export_and_load_onnx()
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 338, in export_and_load_onnx
    self.ort_session = ort.InferenceSession(temporary_file, sess_options=opts, providers=['CPUExecutionProvider'])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\Python312\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Python\Python312\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (MatMulBnFusion_Gemm) Op (Gemm) [ShapeInferenceError] First input does not have rank 2

And running:

python main.py splendor -m 800 -f 0.1 -l 0.0003 -D 0.3 -C ../results/mytest -V 74

Yielded:

Error log
Traceback (most recent call last):
  File "C:\Python\Python312\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "C:\Python\Python312\Lib\threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 142, in predict_server
    self.switch_target('inference')
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 290, in switch_target
    self.export_and_load_onnx()
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 338, in export_and_load_onnx
    self.ort_session = ort.InferenceSession(temporary_file, sess_options=opts, providers=['CPUExecutionProvider'])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\Python312\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Python\Python312\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (MatMulBnFusion_Gemm) Op (Gemm) [ShapeInferenceError] First input does not have rank 2
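Since the failing node name (MatMulBnFusion_Gemm) comes from an ONNX Runtime graph-optimization fusion pass, one way to check whether that pass is what breaks shape inference is to load the exported model with graph optimizations disabled. This is a diagnostic sketch using the standard onnxruntime API, not a confirmed fix, and "model.onnx" is a placeholder path.

import onnxruntime as ort

# Sketch: load an exported ONNX model with all graph optimizations disabled, to
# test whether the MatMulBnFusion pass triggers the shape-inference failure.
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

session = ort.InferenceSession(
    "model.onnx",                      # placeholder path to the exported network
    sess_options=opts,
    providers=["CPUExecutionProvider"],
)
print([inp.name for inp in session.get_inputs()])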

Unrelated, but I think the game argument splendor is missing from line 108 of the readme.

Usage of OneCycleLR restarting at every iteration

Hi, I have recently stumbled upon this repository and am going through the code to better understand Alpha Zero.

One odd thing I noticed is that the OneCycleLR scheduler is created anew each time the model's training function is called. Since this happens at every iteration, the learning rate schedule restarts each time and probably ends up very bumpy. OneCycleLR was designed with supervised learning in mind, where training is a single continuous run.

At the same time, the models seem to learn to play very well, so it cannot be all that harmful.

Do you have any insights on why it works? Or maybe you have a graph of the learning rates throughout training to illustrate what happens?
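To make the restart behavior concrete, here is a small, self-contained sketch (a toy model, not this repository's training loop) that recreates a OneCycleLR scheduler for each "iteration" and records the learning rate; the logged values repeat the warm-up/anneal cycle once per iteration, which is the bumpiness described above.

import torch
import torch.nn as nn

# Toy illustration: recreating OneCycleLR at every iteration restarts its
# schedule, so the learning rate cycles repeatedly instead of annealing once.
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

lrs = []
for iteration in range(3):                        # three self-play "iterations"
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=3e-4, total_steps=10)   # recreated each iteration
    for step in range(10):                        # training steps per iteration
        optimizer.step()                          # no gradients: a no-op here
        scheduler.step()
        lrs.append(scheduler.get_last_lr()[0])

print(lrs)  # the warm-up + anneal pattern repeats once per iteration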
