
elf's Introduction

Copyright © 2018-present, Facebook, Inc. All rights reserved.

ELF

ELF is an Extensive, Lightweight, and Flexible platform for game research. We have used it to build our Go-playing bot, ELF OpenGo, which defeated four global top-30 professional players in April 2018 with a final score of 20-0 (each professional played 5 games).

Please refer to our website for a full overview of ELF OpenGo-related resources, including pretrained models, numerous datasets, and a comprehensive visualization of human Go games throughout history leveraging ELF OpenGo's analysis capabilities.

This version is a successor to the original ELF platform.

DISCLAIMER: this code is early research code. What this means is:

  • It may not work reliably (or at all) on your system.
  • The code quality and documentation are quite lacking, and much of the code might still feel "in-progress".
  • There are quite a few hacks made specifically for our systems and infrastructure.


License

ELF is released under the BSD-style license found in the LICENSE file.

Citing ELF

If you use ELF in your research, please consider citing the original NIPS paper as follows:

@inproceedings{tian2017elf,
  author = {Yuandong Tian and Qucheng Gong and Wenling Shang and Yuxin Wu and C. Lawrence Zitnick},
  title = {ELF: An extensive, lightweight and flexible research platform for real-time strategy games},
  booktitle = {Advances in Neural Information Processing Systems},
  pages = {2656--2666},
  year = {2017}
}

If you use ELF OpenGo or OpenGo-like functionality, please consider citing the technical report as follows:

@inproceedings{tian2019opengo,
  author    = {Yuandong Tian and
               Jerry Ma and
               Qucheng Gong and
               Shubho Sengupta and
               Zhuoyuan Chen and
               James Pinkerton and
               Larry Zitnick},
  title     = {{ELF} OpenGo: an analysis and open reimplementation of AlphaZero},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning,
               {ICML} 2019, 9-15 June 2019, Long Beach, California, {USA}},
  pages     = {6244--6253},
  year      = {2019},
  url       = {http://proceedings.mlr.press/v97/tian19a.html}
}

* Jerry Ma, Qucheng Gong, and Shubho Sengupta contributed equally.

** We also thank Yuxin Wu for his help on this project.

Dependencies

We run ELF using:

  • Ubuntu 18.04
  • Python 3.7
  • GCC 7.3
  • CUDA 10.0
  • CUDNN 7.3
  • NCCL 2.1.2

At the moment, this is the only supported environment. Other environments may also work, but we unfortunately do not have the manpower to investigate compatibility issues.

Here are the dependency installation commands for Ubuntu 18.04 and conda:

sudo apt-get install cmake g++ gcc libboost-all-dev libzmq3-dev
conda install numpy zeromq pyzmq

# From the project root
git submodule sync && git submodule update --init --recursive

You also need to install PyTorch 1.0.0 or later:

conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

A Dockerfile has been provided if you wish to build ELF using Docker.

Building

cd to the project root and run make to build.

Testing

After building, cd to the project root and run make test to test.

Using ELF

Currently, ELF must be run straight from source. You'll need to run source scripts/devmode_set_pythonpath.sh to augment $PYTHONPATH appropriately.
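For reference, the script prepends the ELF Python sources and the compiled extension modules to $PYTHONPATH. A rough Python-side equivalent is sketched below; the directory names are taken from a $PYTHONPATH echo reported by users, so treat the exact paths as illustrative rather than guaranteed:

```python
import os
import sys

# Illustrative sketch: mirror what scripts/devmode_set_pythonpath.sh appears
# to do, based on a reported $PYTHONPATH. The project root "~/src/elf" is an
# assumption; use the location of your own checkout.
root = os.path.expanduser("~/src/elf")
for sub in ("src_py", "build/elf", "build/elfgames/go"):
    path = os.path.join(root, sub)
    if path not in sys.path:
        sys.path.insert(0, path)
```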

Training a Go bot

To train a model, please follow these steps:

  1. Build ELF and run source scripts/devmode_set_pythonpath.sh as described above.
  2. Change directory to scripts/elfgames/go/
  3. Edit server_addrs.py to specify the server's IP address. This is the machine that will train the neural network.
  4. Create the directory where the server will write models. This defaults to myserver.
  5. Run start_server.sh to start the server. We have tested this on a machine with 8 GPUs.
  6. Run start_client.sh to start the clients. The clients should be able to read the model written by the server, so the clients and the server need to mount the same directory via NFS. We have tested this on 2000 clients, each running exclusively on one GPU.

Running a Go bot

Here is a basic set of commands to run and play the bot via the GTP protocol:

  1. Build ELF and run source scripts/devmode_set_pythonpath.sh as described above.
  2. Train a model, or grab a pretrained model.
  3. Change directory to scripts/elfgames/go/
  4. Run ./gtp.sh path/to/modelfile.bin --verbose --gpu 0 --num_block 20 --dim 256 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 8192 --resign_thres 0.05 --mcts_virtual_loss 1

We've found that the above settings work well for playing the bot. You may change mcts_rollout_per_thread to tune the thinking time per move.

After the environment is set up and the model is loaded, you can start typing GTP commands to get responses from the engine.
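For scripting rather than interactive play, note that GTP is a line-oriented text protocol: each command you send (e.g. genmove b) gets a reply line beginning with = on success or ? on failure. A minimal reply parser is sketched below; the function name is my own, and real GTP responses can span multiple lines and are terminated by a blank line (see the GTP specification for the full framing rules):

```python
def parse_gtp_reply(reply):
    """Parse a single GTP response line such as '= D4' or '? unknown command'.
    Returns (ok, payload). A minimal sketch: real GTP responses may span
    multiple lines and end with a blank line."""
    reply = reply.strip()
    if reply.startswith("="):
        return True, reply[1:].strip()
    if reply.startswith("?"):
        return False, reply[1:].strip()
    raise ValueError("not a GTP response line: %r" % reply)

print(parse_gtp_reply("= D4"))        # (True, 'D4')
print(parse_gtp_reply("? unknown"))   # (False, 'unknown')
```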

Analysis mode

Here is the command to analyze an existing sgf file:

  1. Build ELF and run source scripts/devmode_set_pythonpath.sh as described above.
  2. Train a model, or grab a pretrained model.
  3. Change directory to scripts/elfgames/go/
  4. Run ./analysis.sh /path/to/model --preload_sgf /path/to/sgf --preload_sgf_move_to [move_number] --dump_record_prefix [tree] --verbose --gpu 0 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 8192 --resign_thres 0.0 --mcts_virtual_loss 1 --num_games 1

The rollout settings are similar to those above. The process runs automatically after loading the environment, model, and previous moves. After each move, you should see the move suggested by the AI, along with its value and prior. The process also generates many tree files, prefixed with tree (configurable via the --dump_record_prefix option above). The tree files contain the full search at each move, along with priors and values. To abort the process, simply kill it; the current implementation runs to the end of the game.
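If the flag names mean what they suggest (an assumption on my part; the exact semantics live in the ELF source), the per-move search budget in the commands above works out as follows:

```python
# Assumed interpretation of the MCTS flags used in the gtp.sh/analysis.sh
# commands above; the flag semantics are my reading, not documented behavior.
mcts_threads = 2
mcts_rollout_per_thread = 8192

total_rollouts_per_move = mcts_threads * mcts_rollout_per_thread
print(total_rollouts_per_move)  # 16384

# Under this reading, halving mcts_rollout_per_thread roughly halves the
# thinking time per move, which matches the tuning advice above.
```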

Ladder tests

We provide a collection of just over 100 ladder scenarios in the ladder_suite/ directory.

elf's People

Contributors

jma127, ppwwyyxx, soumith, yuandong-tian, zchen0211


elf's Issues

cannot make a move with GeForce GTX 650

Trying to run ./gtp.sh ./v0.bin --verbose --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 512 --resign_thres 0.05 --mcts_virtual_loss 1
under a supported environment. I try to play with the command:

genmove B

and get this error:
...
/root/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py:116: UserWarning: Found GPU0 GeForce GTX 650 which is of cuda capability 3.0. PyTorch no longer supports this GPU because it is too old.
...
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch-nightly_1525389156111/work/aten/src/THCUNN/generic/Threshold.cu line=34 error=48 : no kernel image is available for execution on the device
Traceback (most recent call last):
  File "df_console.py", line 78, in <module>
    GC.run()
  File "/root/ELF/src_py/elf/utils_elf.py", line 435, in run
    self._call(smem, *args, **kwargs)
  File "/root/ELF/src_py/elf/utils_elf.py", line 398, in _call
    reply = self._cb[idx](picked, *args, **kwargs)
  File "df_console.py", line 60, in actor
    return console.actor(batch)
  File "/root/ELF/scripts/elfgames/go/console_lib.py", line 302, in actor
    reply = self.evaluator.actor(batch)
  File "/root/ELF/src_py/rlpytorch/trainer/trainer.py", line 97, in actor
    state_curr = m.forward(batch)
  File "/root/ELF/src_py/elfgames/go/df_model3.py", line 274, in forward
    s = self.init_conv(s)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/modules/activation.py", line 46, in forward
    return F.threshold(input, self.threshold, self.value, self.inplace)
  File "/root/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 603, in threshold
    return torch._C._nn.threshold(input, threshold, value)
RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at /opt/conda/conda-bld/pytorch-nightly_1525389156111/work/aten/src/THCUNN/generic/Threshold.cu:34

Is there no way to run ELF with CUDA compute capability 3.0? Which compute capability is sufficient: 3.5, 5.2, 6.1, or 7.0? It would be nice to mention these requirements in the prerequisites.
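As a rough check before launching ELF, a helper like the one below can flag GPUs that the PyTorch binaries of that era no longer support. The 3.5 cutoff is an assumption drawn from the warning message above, not an official ELF requirement, and the function name is hypothetical:

```python
# Hypothetical helper: flag GPUs whose CUDA compute capability is below the
# minimum that prebuilt PyTorch binaries of this era supported. The (3, 5)
# cutoff is assumed from the warning above, not an official requirement.
MIN_CAPABILITY = (3, 5)

def is_supported(capability, minimum=MIN_CAPABILITY):
    """capability is a (major, minor) tuple, as returned by
    torch.cuda.get_device_capability(device)."""
    return tuple(capability) >= tuple(minimum)

# A GTX 650 reports capability (3, 0), which fails this check.
print(is_supported((3, 0)))  # False
print(is_supported((6, 1)))  # True
```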

which config used to defeat leela zero?

Hi, I have successfully run the OpenGo bot with the following command, which uses 4096*2=8192 rollouts:

./gtp.sh ./pretrained-go-19x19-v0.bin --verbose --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 4096 --resign_thres 0.05 --mcts_virtual_loss 1

On the other hand, I also started a Leela bot with model 158603eb, using the following command, which also uses 8192 rollouts:
src/leelaz -p 8192 -v 0 -r 5 --timemanage off -t 1 -d --noponder --gpu 1 --weights 158603eb61a1e5e9dcd1aee157d813063292ae68fbc8fcd24502ae7daf4d7948 --gtp

I put both bots on my private CGOS and found that OpenGo could not defeat Leela Zero in the last 3 games, which seems odd to me, because the author of Leela Zero reported that leelaz-elf (a Leela bot using the ELF weights) beat leelaz-zero by 167:18:

leelaz-18e6 v leelaz-elf (185/1000 games)
board size: 19   komi: 7.5
              wins              black         white       avg cpu
leelaz-18e6     18  9.73%       9   9.68%     9   9.78%     81.09
leelaz-elf     167 90.27%       83 90.22%     84 90.32%    127.66

So I am curious whether my OpenGo config is wrong. How should I set the config so that it can defeat Leela Zero?

--mcts_puct tuning

How did you tune the --mcts_puct values? Is it true that different values are used for generating self-play games for training vs. match play?

I think self-play for training uses --mcts_puct 0.85

--mcts_puct 0.85 --mcts_rollout_per_thread 200 \

And match play uses --mcts_puct 1.50
https://github.com/pytorch/ELF/blob/a4edc96e8bf94aa1a84134431ce3758a6ade27c7/README.rst#running-a-go-bot

Edit: BTW I think this is the relevant part of the AGZ paper:

"AlphaGo Zero tuned the hyper-parameter of its search by Bayesian optimisation. In AlphaZero we reuse the same hyper-parameters for all games without game-specific tuning."

It doesn't really clarify if this tuning is done for self-play only, or something more expensive involving the entire training feedback loop.
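For context, --mcts_puct scales the exploration term of the standard PUCT selection rule used in AlphaZero-style search. A minimal sketch of that rule follows; the variable names are mine, and this is the textbook formula rather than ELF's actual implementation:

```python
import math

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """Standard PUCT: Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a)).
    A larger c_puct weights the prior-driven exploration term more heavily."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + exploration

# With few visits the prior dominates; as child_visits grows, Q takes over.
print(puct_score(0.0, 1.0, 100, 0, c_puct=1.5))  # 15.0
```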

Unexpected key(s) in state_dict: "init_conv.1.num_batches_tracked"

I have successfully installed PyTorch (python -c "import torch" works).
make and make test both run successfully.
I have run the path fixer (source scripts/devmode_set_pythonpath.sh).

echo $PYTHONPATH
$HOME/src/elf/src_py/:$HOME/src/elf/build/elf/:$HOME/src/elf/build/elfgames/go/

But when I try to run the gtp.sh command (after downloading the pretrained model):

Traceback (most recent call last):
  File "df_console.py", line 40, in <module>
    model = model_loader.load_model(GC.params)
  File "$HOME/src/elf/src_py/rlpytorch/model_loader.py", line 164, in load_model
    check_loaded_options=self.options.check_loaded_options)
  File "$HOME/src/elf/src_py/rlpytorch/model_base.py", line 139, in load
    self.load_state_dict(sd)
  File "$HOME/src/elf/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Model_PolicyValue:
	Unexpected key(s) in state_dict: "init_conv.1.num_batches_tracked", "pi_final_conv.1.num_batches_tracked", "value_final_conv.1.num_batches_tracked", "resnet.resnet.0.conv_lower.1.num_batches_tracked", "resnet.resnet.0.conv_upper.1.num_batches_tracked", "resnet.resnet.1.conv_lower.1.num_batches_tracked", "resnet.resnet.1.conv_upper.1.num_batches_tracked", "resnet.resnet.2.conv_lower.1.num_batches_tracked", "resnet.resnet.2.conv_upper.1.num_batches_tracked", "resnet.resnet.3.conv_lower.1.num_batches_tracked", "resnet.resnet.3.conv_upper.1.num_batches_tracked", "resnet.resnet.4.conv_lower.1.num_batches_tracked", "resnet.resnet.4.conv_upper.1.num_batches_tracked", "resnet.resnet.5.conv_lower.1.num_batches_tracked", "resnet.resnet.5.conv_upper.1.num_batches_tracked", "resnet.resnet.6.conv_lower.1.num_batches_tracked", "resnet.resnet.6.conv_upper.1.num_batches_tracked", "resnet.resnet.7.conv_lower.1.num_batches_tracked", "resnet.resnet.7.conv_upper.1.num_batches_tracked", "resnet.resnet.8.conv_lower.1.num_batches_tracked", "resnet.resnet.8.conv_upper.1.num_batches_tracked", "resnet.resnet.9.conv_lower.1.num_batches_tracked", "resnet.resnet.9.conv_upper.1.num_batches_tracked", "resnet.resnet.10.conv_lower.1.num_batches_tracked", "resnet.resnet.10.conv_upper.1.num_batches_tracked", "resnet.resnet.11.conv_lower.1.num_batches_tracked", "resnet.resnet.11.conv_upper.1.num_batches_tracked", "resnet.resnet.12.conv_lower.1.num_batches_tracked", "resnet.resnet.12.conv_upper.1.num_batches_tracked", "resnet.resnet.13.conv_lower.1.num_batches_tracked", "resnet.resnet.13.conv_upper.1.num_batches_tracked", "resnet.resnet.14.conv_lower.1.num_batches_tracked", "resnet.resnet.14.conv_upper.1.num_batches_tracked", "resnet.resnet.15.conv_lower.1.num_batches_tracked", "resnet.resnet.15.conv_upper.1.num_batches_tracked", "resnet.resnet.16.conv_lower.1.num_batches_tracked", "resnet.resnet.16.conv_upper.1.num_batches_tracked", "resnet.resnet.17.conv_lower.1.num_batches_tracked", 
"resnet.resnet.17.conv_upper.1.num_batches_tracked", "resnet.resnet.18.conv_lower.1.num_batches_tracked", "resnet.resnet.18.conv_upper.1.num_batches_tracked", "resnet.resnet.19.conv_lower.1.num_batches_tracked", "resnet.resnet.19.conv_upper.1.num_batches_tracked".

I tried redownloading the model with no effect. I am up to date. Could it be linked to using PyTorch in a virtualenv?
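This error typically appears when a checkpoint and the running PyTorch disagree about the num_batches_tracked buffers that PyTorch 0.4.1 added to BatchNorm layers. A common workaround, offered here as a sketch rather than an official ELF fix, is to strip those keys from the state dict before loading:

```python
# Sketch of a common workaround (not an official ELF fix): drop the
# "num_batches_tracked" buffers, added to BatchNorm in PyTorch 0.4.1, so a
# checkpoint saved under one PyTorch version loads under another.
def strip_batches_tracked(state_dict):
    return {k: v for k, v in state_dict.items()
            if not k.endswith("num_batches_tracked")}

# Usage (hypothetical variable names):
#   sd = strip_batches_tracked(torch.load(path))
#   model.load_state_dict(sd)
# Alternatively, load_state_dict(sd, strict=False) ignores key mismatches.
```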

runtime error

Hello, I successfully compiled the source code on Ubuntu 18.04, but
got an error message. Am I missing something?

Traceback (most recent call last):
  File "df_console.py", line 12, in <module>
    from rlpytorch import Evaluator, load_env
  File "/home/jehee/ELF/src_py/rlpytorch/__init__.py", line 8, in <module>
    from .model_loader import ModelLoader, load_env
  File "/home/jehee/ELF/src_py/rlpytorch/model_loader.py", line 13, in <module>
    from elf.options import import_options, PyOptionSpec
  File "/home/jehee/ELF/src_py/elf/__init__.py", line 8, in <module>
    from _elf import *
ImportError: dynamic module does not define module export function (PyInit__elf)

Facebook has changed Go forever

Okay, so I'm trying to think this out logically. Facebook releases ELF OpenGo, instantly becoming by far the strongest public open-source/open-weights Go AI program, and overnight it gets adopted as the baseline by many other programs. So going forward, most programs are going to be more or less the same strength, if not simply identical white-label builds, so what is left? It seems the "engine" part is now more or less solved and, I would say, undifferentiable. Superhuman on a GTX 970 is basically the end of the road.

The remaining things of innovation are GUI, analysis, high handicap, different komi, teaching tools, etc etc

And marketing/branding/mindshare/PR for the Go bots (including Leela Zero) will perhaps be more important than ever before. We gonna see a consolidation of Go AI bots and my guess is only one or two will survive this, and that is if they are lucky.

An immediate implication is that this has essentially killed commercial Go, at least from the standpoint of selling Go engines. We can't compare to Chess: not only does Chess have an order of magnitude larger userbase, especially in the West, it also enjoyed a good two decades in which classical algorithms and programming sustained a healthy ecosystem of different engines competing with one another for top listings. With the advent of the "zero" method, all zero programs converge to the same ultimate state and it's just a matter of compute. There is really nothing left to do, more or less.

This also means there is little point left in Go AI engine competitions and matches. We already see that CGOS is defunct and its benchmark is less and less useful, the UEC Cup has ended, Zen pulled the plug and called it quits, and I seriously doubt we'll see another version of CrazyStone. Now, with so many engines adopting the Facebook weights, what's the point? I see this portending the demise of Go AI competitions and engine-vs-engine games as well. Think about it: LZ beat DolBaram in that last competition match, now DolBaram adopts the ELF weights, and ELF is stronger than both PhoenixGo and FineArt... it doesn't take much to put two and two together and see where this is headed. Didn't Golaxy just beat Ke Jie last week? I'll bet that was the shortest triumph ever. And whatever air of exclusivity FineArt enjoyed before the Facebook release has now been obliterated; top pros in China no longer need FineArt for a competitive edge in training when everyone in the world with half a decent graphics card can run the same or better. The implications are indeed far-reaching.

Let's examine the distributed, community-based crowd-computing angle. It took the public six months to get LZ to top pro level from scratch, yet Facebook needed only two weeks and arguably far surpassed top pro level, going deep into superhuman territory. Not that I know it will happen, but there is nothing to prevent Facebook from doing it again: say, a couple of months down the road it could suddenly drop a new weight file that becomes the new state of the art, far surpassing anything any community effort could have hoped to produce in that time. Who knows, maybe Google will see all this and publish the AGZ weights, or maybe in a few months the second round of weights Facebook puts out will far surpass AGZ altogether! In light of recent developments, these are all realistic possibilities now! But none of these possibilities fosters morale for community initiatives.

I'm thankful that, before Facebook dropped ELF onto the world, LZ had already reached and, IMHO, surpassed top pro level with its latest network, 131 (I see 132 came out just hours after the Haylee game 2 and is 60% stronger!), and that the LZ project was able to convert the ELF weights into the native LZ format so they can be used like any other weight file; it now even works great in Lizzie.

I hope that Leela Zero project finds a way to position itself to best take advantage of this new and changing landscape. By far it enjoys the most mindshare in the community of Go at large right now and I hope it continues to evolve and find ways of remaining relevant and bringing value to people's lives.

When will it start to train?

After 29000 games, it doesn't start to train.

=== Record Stats (0) ====
B/W/A: 14683/14317/29000 (50.631%). B #Resign: 11904 (41.0483%), W #Resign: 11770 (40.5862%), #NoResign: 5326 (18.3655%)
Dynamic resign threshold: 0.01
Move: [0, 100): 0, [100, 200): 0, [200, 300): 0, [300, up): 29000
=== End Record Stats ====

The script of server is like:

save=./myserver game=elfgames.go.game model=df_kl model_file=elfgames.go.df_model3 \
stdbuf -o 0 -e 0 python -u ./train.py \
    --mode train --batchsize 2048 \
    --num_games 64 --keys_in_reply V \
    --T 1 --use_data_parallel \
    --num_minibatch 1000 --num_episode 1000000 \
    --mcts_threads 8 --mcts_rollout_per_thread 100 \
    --keep_prev_selfplay --keep_prev_selfplay \
    --use_mcts --use_mcts_ai2 \
    --mcts_persistent_tree --mcts_use_prior \
    --mcts_virtual_loss 5 --mcts_epsilon 0.25 \
    --mcts_alpha 0.03 --mcts_puct 0.85 \
    --resign_thres 0.01 --gpu 0 \
    --server_id myserver --eval_num_games 400 \
    --eval_winrate_thres 0.55 --port 1234 \
    --q_min_size 200 --q_max_size 4000 \
    --save_first \
    --num_block 5 --dim 64 \
    --weight_decay 0.0002 --opt_method sgd \
    --bn_momentum=0 --num_cooldown=50 \
    --expected_num_client 496 \
    --selfplay_init_num 0 --selfplay_update_num 0 \
    --eval_num_games 0 --selfplay_async \
    --lr 0.01 --momentum 0.9 1>> log.log 2>&1 &

I want to know: when will it start to train?
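One plausible reading, based only on the --q_min_size and --q_max_size flags in the script above and offered as an assumption rather than a description of the ELF source: the server buffers self-play records in a bounded queue and only begins optimization once the queue holds at least q_min_size entries. A toy sketch of that gating logic:

```python
from collections import deque

class ReplayQueue:
    """Toy sketch of min/max-size gating, assuming --q_min_size / --q_max_size
    bound a replay queue; an illustration, not ELF's actual code."""
    def __init__(self, q_min_size=200, q_max_size=4000):
        self.q_min_size = q_min_size
        self.records = deque(maxlen=q_max_size)  # oldest records drop off

    def add(self, record):
        self.records.append(record)

    def ready_to_train(self):
        return len(self.records) >= self.q_min_size

q = ReplayQueue(q_min_size=3, q_max_size=5)
q.add("game-0"); q.add("game-1")
print(q.ready_to_train())  # False: fewer than q_min_size records
q.add("game-2"); q.add("game-3")
print(q.ready_to_train())  # True
```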

ELF?

ELF is already a thing (the Executable and Linkable Format binary format), and this acronym has nothing to do with that usage.

Not trying to offend, just asking for a justification

Multi-GPU for training on the server side?

Thanks for releasing Open Go! I was just wondering if the server could support training of a model with multiple GPUs. It appears from start_server.sh that 8 threads are supported but there is only one gpu specified in the command line options.

How to read the source code

Emmmm. Actually, I want to know where I can start reading the source code. Can someone give me some intuition or a guide?

time_settings and time_left support; and playout limit?

I tested this version, and it only supports the following GTP commands:

list_commands

= boardsize
clear_board
exit
final_score
genmove
komi
list_commands
name
play
protocol_version
quit
showboard
version

Is there any plan to add time control commands, or a playout limit parameter?
Thanks

Unused parameter fails build

Hi all, I'm trying to build ELF and I got this:

ELF/src_cpp/elf/ai/tree_search/tree_search_base.h:65:24: error: unused parameter 's' [-Werror,-Wunused-parameter]
  moves_since(const S& s, size_t* next_move_number, std::vector<A>* moves) {
                       ^

ELF/src_cpp/elf/ai/tree_search/tree_search_base.h:65:35: error: unused parameter 'next_move_number' [-Werror,-Wunused-parameter]
  moves_since(const S& s, size_t* next_move_number, std::vector<A>* moves) {
                                  ^

ELF/src_cpp/elf/ai/tree_search/tree_search_base.h:65:69: error: unused parameter 'moves' [-Werror,-Wunused-parameter]
  moves_since(const S& s, size_t* next_move_number, std::vector<A>* moves) {
                                                                    ^

ELF/src_cpp/elf/ai/tree_search/tree_search_base.h:87:45: error: unused parameter 'a' [-Werror,-Wunused-parameter]
  static std::string to_string(const Actor& a) {

Thanks for the help, and good luck with the development.

Something may be missing in CMakeLists.txt

When I cd to the project root and run make to build, an error occurs, and CMakeError.log is as follows:

CMakeFiles/cmTC_d0017.dir/CheckSymbolExists.c.o: In function `main':
CheckSymbolExists.c:(.text+0x16): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status

Then I added -lpthread to set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -lpthread -Werror -Wextra -Wno-register -fPIC -march=native") to fix it.

The only differences from the supported environment are my Ubuntu 16.04 and cuDNN 7.1.

Hangs when running the Go bot and sending "genmove b"

Linux: 16.04
Python: 3.6
GCC: 7.3
GPU: No
pytorch: pytorch-nightly
conda: anaconda

I compile an executable, and run the Go bot.

./gtp.sh ./pretrained-go-19x19-v0.bin --verbose --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 8192 --resign_thres 0.05 --mcts_virtual_loss 1

It seems successful.
When I enter "list_commands", it returns values correctly.

clear_board
exit
final_score
genmove
komi
list_commands
name
play
protocol_version
quit
showboard
version

But after I enter "genmove b", the Go bot hangs with no response.
What's wrong? What can I do?

ModuleNotFoundError: No module named 'rlpytorch'

I installed the latest version of Miniconda for python 3.6. Then I followed all steps in the readme including installing pytorch from source. I'm running Ubuntu 18.04, fresh install. I got the following error when I tried to run the program with the following command:

./gtp.sh network.bin --verbose --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 8192 --resign_thres 0.05 --mcts_virtual_loss 1

Traceback (most recent call last):
  File "df_console.py", line 12, in <module>
    from rlpytorch import Evaluator, load_env
ModuleNotFoundError: No module named 'rlpytorch'

can you do shogi?

How about shogi next?

We need an OpenChess
and an OpenShogi,

and there is a Chinese game called five-star chess or something like that, I can't recall.

thx

ImportError: dynamic module does not define module export function (PyInit__elf)

EDIT: I spoke too soon; I ran make clean && make and the issue was gone.

I have successfully installed PyTorch (python -c "import torch" works).
make and make test both run successfully.
I have run the path fixer (source scripts/devmode_set_pythonpath.sh).

echo $PYTHONPATH
$HOME/src/elf/src_py/:$HOME/src/elf/build/elf/:$HOME/src/elf/build/elfgames/go/

But when I try to run the gtp.sh command (after downloading the pretrained model):

./gtp.sh ~/src/elf/pretrained-go-19x19-v0.bin --verbose --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 8192 --resign_thres 0.05 --mcts_virtual_loss 1

Traceback (most recent call last):
  File "df_console.py", line 12, in <module>
    from rlpytorch import Evaluator, load_env
  File "$HOME/src/elf/src_py/rlpytorch/__init__.py", line 8, in <module>
    from .model_loader import ModelLoader, load_env
  File "$HOME/src/elf/src_py/rlpytorch/model_loader.py", line 13, in <module>
    from elf.options import import_options, PyOptionSpec
  File "$HOME/src/elf/src_py/elf/__init__.py", line 8, in <module>
    from _elf import *
ImportError: dynamic module does not define module export function (PyInit__elf)

Any idea what might be the cause?

Strange move of Open Go

I compiled OpenGo and used the weights provided in this repo. But when I tested the strength of the program, it played some strange moves. I am not sure if this is its style, but I think these are not the optimal moves in such situations.
BTW, I tested it with 6400 playouts.

Please do a 400-game match: PhoenixGo vs ELF OpenGo

I looked at the CGOS chart: Cronus stopped playing in March, whilst Monroe did in April. The two never played one another. So I don't believe that Cronus is stronger than the current ELF, since some tests have already shown that ELF seems to be stronger.

Can your team do a 400-game match between PhoenixGo and ELF and publish the results?
Tencent, who made Cronus/PhoenixGo, stated that their next AI competition prohibits ELF from playing. I think it is because they don't want to see ELF win.

Possibility of a 40B ELF Go?

Firstly thanks to ELF Go team for your great work.

Any chance, in the future, of continuing ELF Go up to maybe 40 blocks?

(considering how successful and popular it has been so far.)

Training failed with AttributeError

Hi, I have deployed ELF Go on my 2-GPU machine and tried to train a Go bot (server on one GPU, client on the other). The client successfully ran self-play and sent the records to the server, but the server failed to train with the following error:

Traceback (most recent call last):
  File "./train.py", line 131, in <module>
    runner.run()
  File "/root/maxim/ELF/src_py/rlpytorch/runner/single_process.py", line 113, in run
    self.GC.printSummary()
  File "/root/maxim/ELF/src_py/elf/utils_elf.py", line 463, in printSummary
    self.GC.printSummary()
AttributeError: '_elfgames_go.GameContext' object has no attribute 'printSummary'

When I investigated the code, I found that the GCWrapper defined in utils_elf.py ultimately calls the GameContext in the C++ code, but there is no method named printSummary in src_cpp.

I hope the developers can check whether this is a bug or has some other cause. Thank you very much.

Segmentation fault (core dumped) on Ubuntu 17.10

I compiled PyTorch and ELF Go on an Ubuntu 17.10 machine, and when I try to launch the program I get this error:

./gtp.sh: line 18: 24480 Segmentation fault      (core dumped) game=elfgames.go.game model=df_pred model_file=elfgames.go.df_model3 python3 df_console.py --mode online --keys_in_reply V rv --use_mcts --mcts_verbose_time --mcts_use_prior --mcts_persistent_tree --load $MODEL --server_addr localhost --port 1234 --replace_prefix resnet.module,resnet --no_check_loaded_options --no_parameter_print --leaky_relu "$@"

I tested PyTorch, and it works fine. Any idea where the error comes from?

Answer some questions about batchsize.

Recently we have seen a lot of questions about ELF OpenGo in many forums. Here I try to answer some of them.

Chinese version here

First, we sincerely thank the LeelaZero team for converting our pre-trained v0 model to a LeelaZero-compatible format, so that the Go community could verify its strength immediately via LeelaZero by playing against it interactively. This shows that our experiments are reproducible and can truly benefit the community. We are truly happy about it.

One issue the LeelaZero team found is that OpenGo-v0 might not perform that well when the number of rollouts is small (e.g., 800 or 1600). This is because we use batching in MCTS: the network only receives a batch of rollouts (e.g., 8 or 16) before feed-forwarding. This substantially improves GPU efficiency (on an M40, roughly 5.5s -> 1.2s per 1600 rollouts), at the price of weakening the strength of the bot, particularly when the number of rollouts is small. MCTS is intrinsically a sequential algorithm: to maximize its strength, each rollout should be played only after all previous rollouts have been played and the Q values in each node have been updated. Batching, on the other hand, introduces parallel evaluation and reduces the effective number of sequential rollouts.

The solution is simple: reduce the batch size when the number of rollouts is small. We suggest batchsize=4 when the total number of rollouts is 800 or 1600, which makes the thinking time longer. The default batchsize=16 is good only when the total number of rollouts is large (e.g., 80k). Note that a larger batch size might not help. The batch size can be modified via the --mcts_rollout_per_batch and --batchsize switches. Currently, please specify the same number for both switches (this is research code, after all).


Some people might wonder what happens for self-play in our setting. Indeed, there seems to be a dilemma if we only use 1.6k rollouts per self-play move: a small batchsize leads to GPU inefficiency, while a large batchsize weakens the moves. We solve this with an ELF-specific design. Each self-play process spawns 32 concurrent games with a maximal batchsize of 128. Each concurrent game runs its own MCTS without any batching. When a rollout reaches a leaf, it sends the current game situation to ELF, which dynamically batches situations from multiple games together and hands the batch to PyTorch for the network forward pass. This makes the batchsize variable; during self-play the average batch size is around 90, which gives good overall GPU utilization.
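The dynamic-batching idea above can be sketched in plain Python (this is a minimal illustration, not ELF's actual C++ implementation; the queue names, the stand-in "evaluation", and the per-game leaf count are all invented for the example — only the 32-game / max-128-batch numbers come from the text):

```python
import queue
import threading

NUM_GAMES = 32   # concurrent self-play games per process (from the text)
MAX_BATCH = 128  # maximal dynamic batch size (from the text)

def batcher(requests, results, stop):
    """Drain leaf-evaluation requests from many games into one batch,
    then answer each requester -- the role ELF plays before PyTorch."""
    while not stop.is_set():
        batch = []
        try:
            batch.append(requests.get(timeout=0.01))
        except queue.Empty:
            continue
        while len(batch) < MAX_BATCH:  # grab whatever else is pending
            try:
                batch.append(requests.get_nowait())
            except queue.Empty:
                break
        # A real system would run one network forward pass over the whole
        # batch here; this stand-in just echoes a dummy value per request.
        for game_id, _position in batch:
            results[game_id].put(0.0)

def play_game(game_id, requests, results, n_leaves, done):
    """Each game runs its own (unbatched) search loop: submit a leaf,
    block until its evaluation comes back."""
    for pos in range(n_leaves):
        requests.put((game_id, pos))
        results[game_id].get()
    done[game_id] = True

requests = queue.Queue()
results = [queue.Queue() for _ in range(NUM_GAMES)]
done = [False] * NUM_GAMES
stop = threading.Event()

server = threading.Thread(target=batcher, args=(requests, results, stop))
server.start()
games = [threading.Thread(target=play_game,
                          args=(i, requests, results, 10, done))
         for i in range(NUM_GAMES)]
for g in games:
    g.start()
for g in games:
    g.join()
stop.set()
server.join()
```

Because each game blocks on its own result queue while others keep submitting, the batch the server sees grows and shrinks with demand — the "variable batchsize" behavior the paragraph describes.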

ImportError: dynamic module does not define module export function (PyInit__elf)

make test
(cd build/elf && GTEST_COLOR=1 ctest --output-on-failure)
Test project /mnt/ken-volume/ai/ELF/build/elf
Start 1: test_cpp_elf_options_OptionMapTest
1/2 Test #1: test_cpp_elf_options_OptionMapTest .... Passed 0.01 sec
Start 2: test_cpp_elf_options_OptionSpecTest
2/2 Test #2: test_cpp_elf_options_OptionSpecTest ... Passed 0.00 sec

100% tests passed, 0 tests failed out of 2

Total Test time (real) = 0.02 sec
(cd build/elfgames/go && GTEST_COLOR=1 ctest --output-on-failure)
Test project /mnt/ken-volume/ai/ELF/build/elfgames/go
Start 1: test_cpp_elfgames_go_base_coord_test
1/6 Test #1: test_cpp_elfgames_go_base_coord_test ........... Passed 0.01 sec
Start 2: test_cpp_elfgames_go_base_go_test
2/6 Test #2: test_cpp_elfgames_go_base_go_test .............. Passed 0.00 sec
Start 3: test_cpp_elfgames_go_base_board_feature_test
3/6 Test #3: test_cpp_elfgames_go_base_board_feature_test ... Passed 0.00 sec
Start 4: test_cpp_elfgames_go_base_symmetry_test
4/6 Test #4: test_cpp_elfgames_go_base_symmetry_test ........ Passed 0.01 sec
Start 5: test_cpp_elfgames_go_sgf_sgf_test
5/6 Test #5: test_cpp_elfgames_go_sgf_sgf_test .............. Passed 0.01 sec
Start 6: test_cpp_elfgames_go_mcts_mcts_test
6/6 Test #6: test_cpp_elfgames_go_mcts_mcts_test ............ Passed 0.01 sec

100% tests passed, 0 tests failed out of 6

Total Test time (real) = 0.05 sec
ken@ken-server1:/XXX/ELF$ source scripts/devmode_set_pythonpath.sh
ken@ken-server1:/XXX/ELF$ cd scripts/elfgames/go/
ken@ken-server1:/XXX/ELF/scripts/elfgames/go$ ./gtp.sh /mnt/ken-volume/ai/ELF/pretrained-go-19x19-v0.bin --verbose --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 8192 --resign_thres 0.05 --mcts_virtual_loss 1
Traceback (most recent call last):
  File "df_console.py", line 12, in <module>
    from rlpytorch import Evaluator, load_env
  File "/mnt/ken-volume/ai/ELF/src_py/rlpytorch/__init__.py", line 8, in <module>
    from .model_loader import ModelLoader, load_env
  File "/mnt/ken-volume/ai/ELF/src_py/rlpytorch/model_loader.py", line 13, in <module>
    from elf.options import import_options, PyOptionSpec
  File "/mnt/ken-volume/ai/ELF/src_py/elf/__init__.py", line 11, in <module>
    from .context_utils import ContextArgs
  File "/mnt/ken-volume/ai/ELF/src_py/elf/context_utils.py", line 7, in <module>
    from elf.options import auto_import_options, PyOptionSpec
  File "/mnt/ken-volume/ai/ELF/src_py/elf/options/__init__.py", line 8, in <module>
    from .py_option_map import PyOptionMap
  File "/mnt/ken-volume/ai/ELF/src_py/elf/options/py_option_map.py", line 10, in <module>
    from _elf import _options
ImportError: dynamic module does not define module export function (PyInit__elf)

Hangs after running the Go bot

Hi, I have built ELF and successfully run the following command:

./gtp.sh ./pretrained-go-19x19-v0.bin --verbose --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 8192 --resign_thres 0.05 --mcts_virtual_loss 1

But I found the process hangs forever with the output:

[2018-05-03 15:28:29.535] [rlpytorch.model_loader.load_env] [info] Loading env
[2018-05-03 15:28:29.550] [rlpytorch.model_loader.load_env] [info] Parsed options: {'T': 1,
 'actor_only': False,
 'adam_eps': 0.001,
 'additional_labels': ['aug_code', 'move_idx'],
...
 'white_puct': -1.0,
 'white_use_policy_network_only': False}
[2018-05-03 15:28:29.551] [rlpytorch.model_loader.load_env] [info] Finished loading env
Wait all games[1] to register their mailbox

What's wrong with it? How can I play with the bot via GTP commands?

Chess version?

@yuandong-tian @jma127 can you please tell us whether you are thinking about adapting ELF into a chess version (ELF OpenChess?).

As you probably know, there is already an adaptation of LeelaZero for chess named "LeelaChess Zero (LCZero)": https://github.com/glinscott/leela-chess (http://lczero.org). They are making great progress (https://docs.google.com/spreadsheets/d/1zcXqNzLNBT8RjTHO_AppL6WN0j8TGmOIh6osLPmaB6E/edit#gid=0), but they probably do not have enough machine power to catch up to SF for another three months (or more). It would be great if you could quickly (in one or two weeks) reach the milestone DeepMind achieved by defeating Stockfish.

Successfully installed and played, but cannot play with GoGui

Thanks to the Facebook Go team, now anyone can play Go against a top player.

I have successfully installed ELF and played with the pretrained network, strictly following the instructions here. But I have some further questions:

  1. cuDNN 7.0 is required in the build instructions, but the prebuilt pytorch-nightly was built on cuDNN 7.1. Why?

  2. The bot played well in the bash command shell, but when I use GoGui 1.4.9 as the graphical interface, the following messages appear:
    "Text lines before the status character of the first response line are not allowed by the GTP standard"
    "The Go program is not responding to the command 'name'."
    The program gets stuck there, repeatedly showing these error messages.
    Can anyone help me? Does the program's debug output cause this problem?

Thanks a lot.

Module Not Found Error: No module named '_elf'

When I run the gtp.sh script I get the following error:

./gtp.sh ../../../pretrained-go-19x19-v0.bin --verbose --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.50 --batchsize 16 --mcts_rollout_per_batch 16 --mcts_threads 2 --mcts_rollout_per_thread 8192 --resign_thres 0.05 --mcts_virtual_loss 1
Traceback (most recent call last):
  File "df_console.py", line 12, in <module>
    from rlpytorch import Evaluator, load_env
  File "/home/farhad/drive/research/ELF/src_py/rlpytorch/__init__.py", line 8, in <module>
    from .model_loader import ModelLoader, load_env
  File "/home/farhad/drive/research/ELF/src_py/rlpytorch/model_loader.py", line 13, in <module>
    from elf.options import import_options, PyOptionSpec
  File "/home/farhad/drive/research/ELF/src_py/elf/__init__.py", line 11, in <module>
    from .context_utils import ContextArgs
  File "/home/farhad/drive/research/ELF/src_py/elf/context_utils.py", line 7, in <module>
    from elf.options import auto_import_options, PyOptionSpec
  File "/home/farhad/drive/research/ELF/src_py/elf/options/__init__.py", line 8, in <module>
    from .py_option_map import PyOptionMap
  File "/home/farhad/drive/research/ELF/src_py/elf/options/py_option_map.py", line 10, in <module>
    from _elf import _options
ModuleNotFoundError: No module named '_elf'

Some questions about training a bot

I changed the network to 64x5 and started 2 clients. One client runs start_server.sh and start_client.sh simultaneously, and the other runs start_client.sh. From the log file I see over 4000 self-play games after running for 40 hours. However, I cannot find any training information. I want to know when training will start.

AttributeError: Can't get attribute '_rebuild_tensor_v2'

This error happens when trying to read the weights file with an older version of PyTorch. I assume this is why you say PyTorch needs to be built from source. However, I've tried that all evening and can't find a way to navigate all of the nvcc/gcc/CUDA incompatibilities to get it to compile. There are many errors, all of which are common when I google them, with lots of workarounds, but all of them only partially work. Fundamentally it seems to be some sort of std::tuple issue with CUDA/nvcc, which Nvidia acknowledges but says it won't fix until the next CUDA release.

Is there any chance you could save your weights file out in an older PyTorch format? Then I could just install python-pytorch-cuda-0.3.1-2 for my version of Linux and be up and running in moments. ELF itself compiled fine, and it runs with python-pytorch-cuda-0.3.1-2; it just can't read the weights file. pytorch/pytorch#5729 states this is because of the newer file format.

Thanks!

What does it mean that it won 200 games against LZ on "default settings"?

ELF OpenGo has been successful playing against both other open source bots and human Go players. We played and won 200 games against LeelaZero (158603eb, Apr. 25, 2018), the strongest publicly available bot, using its default settings and no pondering.

Does this mean that only 3200 visits were used? By "default settings", I assume this refers to LZ's own match-game settings? Can we get more details on the exact specifications and hardware used by both sides in these matches? And when you state that it played and won 200 matches, does that mean it won all 200 matches it played, or that it simply won a total of 200 matches out of an unknown number played? Can you publish the SGFs of these 200 games?

Can you advise how strong the raw network is on a single playout? (And can you release binary executables for Windows and Linux?)

Edit: Has the raw training data of the self-played games been published?

The release article mentioned that one of the objectives of this openness is to help community projects such as Leela Zero. I see no better way of doing that than releasing the raw played games and letting LZ immediately train on them to get stronger (assuming it was a 200:0 match under objectively equal conditions, and not like the DeepMind AlphaZero vs. Stockfish chess shenanigans).
