Comments (2)
OK, I managed to get it to train.
TLDR: It seems NN training starts only after the clients have generated q_min_size * num_reader games. So the solution was to reduce q_min_size and num_reader, shrink the NN model, and recompile for a 9x9 board to speed up game generation.
NOTE: the hyperparameters below are chosen to force neural network training updates to start fairly quickly. They are probably useless for anything other than debugging.
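As a rough illustration of the threshold described above, here is a minimal Python sketch. It assumes the product rule (q_min_size * num_reader) really is what gates training, which is only my observation and not something confirmed from the code. The function name is made up for this example, and the num_reader value used for the stock q_min_size is an assumption, since the stock script does not set it explicitly.

```python
def games_before_training(q_min_size: int, num_reader: int) -> int:
    """Estimate how many self-play games must arrive before training starts.

    Assumption: training is gated on q_min_size * num_reader games,
    as observed in this thread (not verified against the ELF source).
    """
    return q_min_size * num_reader


# Debug settings from the modified start_server.sh.
debug_games = games_before_training(q_min_size=20, num_reader=4)

# Stock q_min_size; num_reader=4 is assumed here purely for comparison.
stock_games = games_before_training(q_min_size=200, num_reader=4)

print(f"debug settings: {debug_games} games before training")
print(f"stock q_min_size: {stock_games} games before training")
```

With these numbers, the debug settings need ten times fewer games before the first update, on top of each 9x9 game being much faster to generate.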
- Obviously, make sure the code base compiles and runs without any modifications first. Start the server, start a client, and confirm that they connect and generate games. To run the server and clients on the same machine, set "myserver": "[127.0.0.1]" in server_adddr.py.
- Compile the code base for 9x9 Go, e.g. add set(BOARD9x9 TRUE) in CMakeLists.txt, then rebuild everything. You should see "Use 9x9 board" appear when compilation starts.
- Change start_server.sh as follows:
diff --git a/scripts/elfgames/go/start_server.sh b/scripts/elfgames/go/start_server.sh
index 7f14334..8078bd8 100755
--- a/scripts/elfgames/go/start_server.sh
+++ b/scripts/elfgames/go/start_server.sh
@@ -21,12 +21,12 @@ save=./myserver game=elfgames.go.game model=df_kl model_file=elfgames.go.df_mode
--resign_thres 0.01 --gpu 0 \
--server_id myserver --eval_num_games 400 \
--eval_winrate_thres 0.55 --port 1234 \
- --q_min_size 200 --q_max_size 4000 \
+ --q_min_size 20 --q_max_size 400 --num_reader 4 \
--save_first \
- --num_block 20 --dim 256 \
+ --num_block 2 --dim 16 \
--weight_decay 0.0002 --opt_method sgd \
- --bn_momentum=0 --num_cooldown=50 \
+ --bn_momentum=0 --num_cooldown=2 \
--expected_num_client 496 \
--selfplay_init_num 0 --selfplay_update_num 0 \
--eval_num_games 0 --selfplay_async \
- --lr 0.01 --momentum 0.9 1>> log.log 2>&1 &
+ --lr 0.01 --momentum 0.9
- Change start_client.sh as follows:
diff --git a/scripts/elfgames/go/start_client.sh b/scripts/elfgames/go/start_client.sh
index a716443..8bb2437 100755
--- a/scripts/elfgames/go/start_client.sh
+++ b/scripts/elfgames/go/start_client.sh
@@ -11,13 +11,13 @@ echo $PYTHONPATH $SLURMD_NODENAME $CUDA_VISIBLE_DEVICES
root=./myserver game=elfgames.go.game model=df_pred model_file=elfgames.go.df_model3 \
stdbuf -o 0 -e 0 python ./selfplay.py \
--T 1 --batchsize 128 \
- --dim0 256 --dim1 256 --gpu 0 \
+ --dim0 16 --dim1 16 --gpu 0 \
--keys_in_reply V rv --mcts_alpha 0.03 \
--mcts_epsilon 0.25 --mcts_persistent_tree \
--mcts_puct 0.85 --mcts_rollout_per_thread 200 \
--mcts_threads 8 --mcts_use_prior \
--mcts_virtual_loss 5 --mode selfplay \
- --num_block0 20 --num_block1 20 \
+ --num_block0 2 --num_block1 2 \
--num_games 32 --ply_pass_enabled 160 \
--policy_distri_cutoff 30 --policy_distri_training_for_all \
--port 1234 \
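To get a feel for how much the --num_block / --dim changes in the two diffs shrink the network, here is a back-of-the-envelope Python sketch. It assumes an AlphaGo Zero style residual tower with two 3x3 convolutions per block and dim channels in and out. That architecture is an assumption for illustration, not something read from df_model3, and the count ignores biases, batch norm, and the input/policy/value heads.

```python
def tower_conv_weights(num_block: int, dim: int, kernel: int = 3) -> int:
    """Approximate weight count of the residual tower's convolutions.

    Assumption: each residual block holds two kernel x kernel convolutions
    mapping dim channels to dim channels. Biases, batch norm parameters,
    and the network heads are ignored.
    """
    weights_per_conv = kernel * kernel * dim * dim
    return num_block * 2 * weights_per_conv


original = tower_conv_weights(num_block=20, dim=256)  # stock settings
debug = tower_conv_weights(num_block=2, dim=16)       # settings from the diffs

print(f"original tower: {original:,} conv weights")
print(f"debug tower:    {debug:,} conv weights")
print(f"reduction:      {original // debug}x")
```

Under these assumptions the debug tower is thousands of times smaller, which is why both self-play inference and training steps become cheap enough for a quick end-to-end check.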
- I got it to work as follows:
- 1x server: ./start_server.sh
- 6x clients: ./start_client.sh (might work with fewer clients if you're short on RAM)
- After approximately 1h the server shows Stats: 159/0/0 and my breakpoint in MCTSPrediction.update() triggered.
- My setup: i9 3.8GHz 6-core, single 2080ti, 48GB of RAM. All topped up.
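The "1x server, 6x clients" launch above could be scripted along these lines. This is only a sketch: the helper names and the startup delay are made up for illustration, and it assumes it is run from the directory containing start_server.sh and start_client.sh. As written, the block only prints the commands it would run.

```python
import subprocess
import time
from typing import List


def build_commands(num_clients: int = 6) -> List[List[str]]:
    """Return the server command followed by one command per client."""
    server = [["./start_server.sh"]]
    clients = [["./start_client.sh"] for _ in range(num_clients)]
    return server + clients


def launch(commands: List[List[str]], startup_delay: float = 5.0) -> List[subprocess.Popen]:
    """Start each command as a child process, server first.

    startup_delay is an assumed grace period that lets the server come up
    before the clients try to connect.
    """
    procs = []
    for index, cmd in enumerate(commands):
        procs.append(subprocess.Popen(cmd))
        if index == 0:
            time.sleep(startup_delay)
    return procs


# Dry run: show what would be started without launching anything.
for cmd in build_commands():
    print(" ".join(cmd))
```

To actually start everything, call launch(build_commands()) and then wait on the returned processes. Pass a smaller num_clients if RAM is tight.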
nice