
alpha-zero-general's Issues

A few questions about the Splendor AI

Hi, this looks like a fantastic implementation of AlphaZero for Splendor—thanks for making it. I had a few questions:

  1. Splendor has a few mechanics that chess, Go, and Shogi don't seem to have. How do you handle them? In particular I'm thinking of:
    a. Hidden information: you can take a face-down card from the pile and it remains hidden until you play it.
    b. Chance: the cards are shuffled.
    c. Multiplayer: you can have more than two players.

Is there any new theory involved, or does the same old MCTS + neural network approach work just fine? If the latter, is there anything special you have to do to handle these different gameplay elements? (One common way to handle the multiplayer case is sketched at the end of this post.)

  2. How good is the best bot you've trained? And how do you know how good it is?

Thank you!
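For illustration only (this is not necessarily how this repository handles it): one common way AlphaZero-style code supports more than two players is to make the value head predict a vector with one expected score per player, and have MCTS back up the component belonging to the player to move at each node. A minimal sketch with a made-up three-player scoring:

import numpy as np

NUM_PLAYERS = 3  # hypothetical three-player game

def terminal_scores(winner: int) -> np.ndarray:
    # Toy terminal value vector: +1 for the winner, -1 for everyone else.
    scores = -np.ones(NUM_PLAYERS)
    scores[winner] = 1.0
    return scores

def backup_value(node_values: np.ndarray, player_to_move: int) -> float:
    # During MCTS backup, each node accumulates the score of its own player.
    return float(node_values[player_to_move])

print(terminal_scores(winner=1))            # [-1.  1. -1.]
print(backup_value(terminal_scores(1), 0))  # -1.0 from player 0's point of view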

Would you share model.pt?

I have implemented token exchange (all 406 ways except gold return) based on this repository and modified the environment to be more similar to the actual Splendor.
https://github.com/kuboyoo/alpha-zero-general

I would like to compare the strength of this repository's model (model.onnx, with cpuct=1.0, fpu=0.1, numMCTSSims=6400) against my model retrained in the modified environment.
Would you be willing to share the .pt file before conversion to .onnx?
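For reference, exporting a PyTorch checkpoint to ONNX usually follows the generic pattern below. This is only a sketch with a toy network and a made-up input shape; it is not the repository's GenericNNetWrapper export code, and the real SplendorNNet checkpoint layout may differ.

import torch
import torch.nn as nn

# Minimal sketch of a PyTorch -> ONNX export. TinyNet and the input shape are
# placeholders, not this repository's SplendorNNet or its checkpoint format.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        return self.fc(x)

model = TinyNet()
# model.load_state_dict(torch.load("model.pt", map_location="cpu"))  # real weights would go here
model.eval()

dummy_input = torch.zeros(1, 16)  # placeholder board encoding
torch.onnx.export(
    model, dummy_input, "model.onnx",
    input_names=["board"], output_names=["policy"],
    dynamic_axes={"board": {0: "batch"}},  # allow a variable batch dimension
)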

Errors running/retraining Splendor using commands from tutorial

I tried to play Splendor using the command from the tutorial (I first changed the package imports):

python ./pit.py splendor/pretrained_2players.pt human -n 1

But I got the following error:

File "D:\programs\Python\Python311\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (MatMulBnFusion_Gemm) Op (Gemm) [ShapeInferenceError] First input does not have rank 2

I figured it might be due to the mentioned issue "Ongoing code/features rework, some pretrained networks won't work anymore", so I reverted to the version from 30/1/2024, to no avail. I then decided to first run the training myself, using the example from the tutorial (I had to add -V 85, though, otherwise it complained about version 1 not existing):

python main.py -m 800 -e 1000 -i 5 -F -c 2.5 -f 0.1 -T 10 -b 32 -l 0.0003 -p 1 -D 0.3 -C ../results/mytest -V 85

But now I got the following error:

  File "D:\programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1688, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'SplendorNNet' object has no attribute 'first_layer'
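When an old checkpoint no longer matches the current network class, listing the parameter names stored in the .pt file can help confirm the mismatch. Below is a rough sketch that assumes the file is an ordinary torch.save'd dict or state dict (possibly nested under a key); the repository's actual checkpoint layout may differ.

import torch

# Sketch: list the parameter names stored in a checkpoint to spot architecture
# mismatches (e.g. a layer such as 'first_layer' referenced by the code but
# absent from the file). Assumes a plain dict / state dict saved with torch.save.
checkpoint = torch.load("splendor/pretrained_2players.pt", map_location="cpu")
state_dict = checkpoint.get("state_dict", checkpoint)  # unwrap if nested

for name in sorted(state_dict):
    print(name)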

ONNXRuntimeErrors trying to run or train Splendor

Heya, thank you for your awesome additions to alpha-zero!

I tried to run your code, but unfortunately ran into some errors that look similar to the ones in #3.
First, I updated all the dependencies:

pip3 install -U onnxruntime numba tqdm colorama coloredlogs
pip3 install -U torch torchvision --extra-index-url https://download.pytorch.org/whl/cpu

In particular, I'm using:

colorama             0.4.6
coloredlogs          15.0.1
numba                0.59.1
onnxruntime          1.17.3
torch                2.3.0+cpu
torchvision          0.18.0+cpu
tqdm                 4.66.4

Then I tried the commands from the readme:

python ./pit.py splendor splendor/pretrained_2players.pt human -n 1

which still printed the initial game-board, but then threw:

Error log
Turn 1 Player 0: Traceback (most recent call last):
  File "D:\alpha-zero-general\pit.py", line 252, in <module>
    main()
  File "D:\alpha-zero-general\pit.py", line 246, in main
    play(args)
  File "D:\alpha-zero-general\pit.py", line 71, in play
    result = arena.playGames(args.num_games, initial_state=args.state, verbose=args.display or human)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\Arena.py", line 123, in playGames
    gameResult = self.playGame(verbose=verbose, initial_state=initial_state, other_way=not one_vs_two)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\Arena.py", line 74, in playGame
    action = players[curPlayer](canonical_board, it)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\pit.py", line 59, in <lambda>
    player = lambda x, n: np.argmax(mcts.getActionProb(x, temp=(0.5 if n <= 6 else 0.), force_full_search=True)[0])
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\MCTS.py", line 65, in getActionProb
    self.search(canonicalBoard, dirichlet_noise=dir_noise, forced_playouts=forced_playouts)
  File "D:\alpha-zero-general\MCTS.py", line 144, in search
    Ps, v = self.nnet.predict(canonicalBoard, Vs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 100, in predict
    self.switch_target('inference')
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 290, in switch_target
    self.export_and_load_onnx()
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 338, in export_and_load_onnx
    self.ort_session = ort.InferenceSession(temporary_file, sess_options=opts, providers=['CPUExecutionProvider'])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\Python312\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Python\Python312\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (MatMulBnFusion_Gemm) Op (Gemm) [ShapeInferenceError] First input does not have rank 2

And running:

python main.py splendor -m 800 -f 0.1 -l 0.0003 -D 0.3 -C ../results/mytest -V 74

Yielded:

Error log
Traceback (most recent call last):
  File "C:\Python\Python312\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "C:\Python\Python312\Lib\threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 142, in predict_server
    self.switch_target('inference')
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 290, in switch_target
    self.export_and_load_onnx()
  File "D:\alpha-zero-general\GenericNNetWrapper.py", line 338, in export_and_load_onnx
    self.ort_session = ort.InferenceSession(temporary_file, sess_options=opts, providers=['CPUExecutionProvider'])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\Python312\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Python\Python312\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (MatMulBnFusion_Gemm) Op (Gemm) [ShapeInferenceError] First input does not have rank 2
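Since the failing node name (MatMulBnFusion_Gemm) comes from an ONNX Runtime graph-optimization fusion pass, one way to check whether that pass is what breaks shape inference is to load the exported model with graph optimizations disabled. This is a diagnostic sketch using the standard onnxruntime API, not a confirmed fix, and "model.onnx" is a placeholder path.

import onnxruntime as ort

# Sketch: load an exported ONNX model with all graph optimizations disabled, to
# test whether the MatMulBnFusion pass triggers the shape-inference failure.
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

session = ort.InferenceSession(
    "model.onnx",                      # placeholder path to the exported network
    sess_options=opts,
    providers=["CPUExecutionProvider"],
)
print([inp.name for inp in session.get_inputs()])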

Unrelated, but I think the game argument splendor is missing from line 108 of the readme.

Usage of OneCycleLR restarting at every iteration

Hi, I have recently stumbled upon this repository and am going through the code to better understand Alpha Zero.

One odd thing I noticed is that the OneCycleLR scheduler is created anew each time the model's training function is called. Since this happens at every iteration, the learning rate schedule restarts each time and probably ends up very bumpy. OneCycleLR was designed with supervised learning in mind, where training is a single continuous run.

At the same time, the models seem to learn to play very well, so it cannot be all that harmful.

Do you have any insights on why it works? Or maybe you have a graph of the learning rates throughout training to illustrate what happens?
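To make the restart behavior concrete, here is a small, self-contained sketch (a toy model, not this repository's training loop) that recreates a OneCycleLR scheduler for each "iteration" and records the learning rate; the logged values repeat the warm-up/anneal cycle once per iteration, which is the bumpiness described above.

import torch
import torch.nn as nn

# Toy illustration: recreating OneCycleLR at every iteration restarts its
# schedule, so the learning rate cycles repeatedly instead of annealing once.
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

lrs = []
for iteration in range(3):                        # three self-play "iterations"
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=3e-4, total_steps=10)   # recreated each iteration
    for step in range(10):                        # training steps per iteration
        optimizer.step()                          # no gradients: a no-op here
        scheduler.step()
        lrs.append(scheduler.get_last_lr()[0])

print(lrs)  # the warm-up + anneal pattern repeats once per iteration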
