Comments (8)
I agree that we only ever look at 0.01% of the SGFs, but the problem is that we don't really know ahead of time which 0.01% we'll look at... so we end up printing out debug info on all of them. Maybe we only really need debug on 1% of them. @amj thoughts?
from minigo.
I'm happy to turn off the per-move debug logs and log only e.g. root-Q, optionally with a random.random() < 0.05 check or something similar to turn them back on for only a few percent of games.
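A minimal sketch of that sampling check, assuming the decision is made once per game rather than once per move. The helper name `should_log_debug` and its parameters are hypothetical and not part of minigo:

```python
import random

def should_log_debug(sample_rate=0.05, rng=random):
    """Return True for roughly `sample_rate` of the calls (hypothetical helper)."""
    return rng.random() < sample_rate

# Decide once per game so a sampled game keeps the comments for all its moves.
rng = random.Random(0)  # seeded only to make this demo repeatable
flags = [should_log_debug(0.05, rng) for _ in range(10_000)]
print(f"fraction of games with debug on: {sum(flags) / len(flags):.3f}")
```

Passing the generator in as `rng` keeps the sampling separate from whatever randomness the engine itself uses.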
I think realistically, we had them on to check the re-use/c_puct/d-noise effect. At this point, I don't think the individual self-play games are mined through that much. That said, a few hundred GB is not much in the grand scheme of things.
Possibly a better way to approach the problem is to streamline our process for stripping the comments before hosting the data, or potentially to save two copies: one with and one without the debugging info.
Incidentally, gzipping the files w/ debug info cut the size by ~7/8ths.
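A rough sketch of both ideas, stripping comments and measuring the gzipped size, assuming the debug info lives in SGF `C[...]` comment properties. The regex is a shortcut rather than a full SGF parser (it could misfire on a literal `C[` inside another property's value), and `strip_comments` and `gzipped_size` are hypothetical names:

```python
import gzip
import re

# Matches a C[...] property value, honouring SGF backslash escapes such as "\]".
# The lookbehind skips longer property names that end in C, e.g. PC[...].
_COMMENT_RE = re.compile(r"(?<![A-Z])C\[(?:\\.|[^\\\]])*\]", re.DOTALL)

def strip_comments(sgf_text):
    """Return the SGF text with all C[...] comment properties removed."""
    return _COMMENT_RE.sub("", sgf_text)

def gzipped_size(text):
    """Size in bytes of the UTF-8 encoded text after gzip compression."""
    return len(gzip.compress(text.encode("utf-8")))

sample = "(;GM[1]SZ[19]PC[minigo];B[pd]C[Q=0.12 \\] visits=800];W[dd]C[Q=-0.03])"
stripped = strip_comments(sample)
print(stripped)
print(len(sample), len(stripped), gzipped_size(sample), gzipped_size(stripped))
```

On a file this small the gzip header dominates, so the sizes only become meaningful on real self-play games.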
I think that the SGFs are actually the data that Go-people are most likely to consume (especially for 19x19s), so making them small will help cut down on our network costs and also make them more user-friendly. After all, who really wants to download 100 GB of SGFs?
@amj / @brilee -- are you guys actually using the debug info?
Yes, infrequently ;) I think at the point that you find an interesting game, you really want to know what the computer thought. We did all that work already, and bytes are (relatively) cheap, so we might as well store it.
Now bandwidth is less cheap, on all sides of the table, so providing stripped down versions for download makes a lot of sense.
And for the person who wants 100 GB of SGFs with the full data -- and I'm sure a couple will! -- that'll be possible too.
FWIW LZ handles this by storing the random seed in the SGF file. Games run with 1 thread are reproducible, so you can rerun them with debug on afterwards.
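To illustrate the seed idea in isolation, here is a toy sketch in which all randomness comes from one stored seed. `play_toy_game` is a hypothetical stand-in for single-threaded self-play and is not LZ's or minigo's actual engine:

```python
import random

def play_toy_game(seed, num_moves=5, debug=False):
    """Toy stand-in for self-play: every random choice is drawn from `seed`."""
    rng = random.Random(seed)
    moves, notes = [], []
    for _ in range(num_moves):
        move = rng.randrange(361)  # index of a point on a 19x19 board
        moves.append(move)
        if debug:
            # Debug output must not consume randomness, or the replay diverges.
            notes.append(f"picked point {move}")
    return moves, notes

seed = 1234  # would be stored alongside the game record
moves, _ = play_toy_game(seed)                      # original run, debug off
replayed, notes = play_toy_game(seed, debug=True)   # later rerun, debug on
print(moves == replayed, len(notes))
```

The replay only matches if nothing else, such as thread scheduling, influences the order of random draws, which is why the single-thread condition matters.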
LZ also stores the training and debug data in two different files, and autogtp uploads both the SGF and the training data to a database. You could do the same: write two files, one SGF with comments and one without, store them both, and let the user decide which one they want.
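One way the two-file approach could look, as a sketch only. The `clean`/`full` directory layout, the helper names, and the minimal SGF header are assumptions for illustration, not minigo's actual writer:

```python
import os
import tempfile

def to_sgf(moves, comments=None):
    """Build a minimal SGF string; `moves` is a list of (color, coord) pairs."""
    nodes = []
    for i, (color, coord) in enumerate(moves):
        node = f";{color}[{coord}]"
        if comments is not None:
            # Escape backslashes and closing brackets, as SGF text values require.
            text = comments[i].replace("\\", "\\\\").replace("]", "\\]")
            node += f"C[{text}]"
        nodes.append(node)
    return "(;GM[1]FF[4]SZ[19]" + "".join(nodes) + ")"

def write_both(out_dir, name, moves, comments):
    """Write one SGF without comments and one with them (hypothetical layout)."""
    paths = {}
    for kind, text in (("clean", to_sgf(moves)), ("full", to_sgf(moves, comments))):
        kind_dir = os.path.join(out_dir, kind)
        os.makedirs(kind_dir, exist_ok=True)
        paths[kind] = os.path.join(kind_dir, name + ".sgf")
        with open(paths[kind], "w") as f:
            f.write(text)
    return paths["clean"], paths["full"]

moves = [("B", "pd"), ("W", "dd")]
comments = ["Q=0.12", "Q=-0.03"]
with tempfile.TemporaryDirectory() as tmp:
    clean_path, full_path = write_both(tmp, "game0", moves, comments)
    with open(clean_path) as f:
        clean_text = f.read()
    with open(full_path) as f:
        full_text = f.read()
print(clean_text)
print(full_text)
```

Writing both versions at game end avoids a separate stripping pass later, at the cost of storing the moves twice.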
I like the solution with two versions of the SGF.
OK, I'll set it up to write both versions.