
Comments (8)

brilee avatar brilee commented on July 19, 2024

I agree that we only ever look at 0.01% of the SGFs, but the problem is that we don't really know ahead of time which 0.01% we'll look at... so we end up printing out debug info on all of them. Maybe we only really need debug on 1% of them. @amj thoughts?

from minigo.

amj avatar amj commented on July 19, 2024

i'm happy turning off the per-move debug logs and only logging e.g. root-Q, and optionally doing a random.random() < 0.05 check or something to turn them on for only a few percent.
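A minimal sketch of that sampling idea, assuming the decision is made once per game and that root Q is always recorded. The names `DEBUG_FRACTION`, `should_log_debug`, and `move_comment` are hypothetical and not taken from the minigo codebase:

```python
import random

# Assumed value, taken from the "random.random() < 0.05" suggestion.
DEBUG_FRACTION = 0.05


def should_log_debug(rng: random.Random, fraction: float = DEBUG_FRACTION) -> bool:
    """Decide once per game whether to keep the verbose per-move comments."""
    return rng.random() < fraction


def move_comment(root_q: float, verbose_stats: str, debug: bool) -> str:
    """Always record root Q; append the verbose stats only for sampled games."""
    comment = f"Q={root_q:.4f}"
    if debug:
        comment += "\n" + verbose_stats
    return comment
```

Deciding per game rather than per move keeps each sampled SGF fully annotated, which matters if someone later wants to read through a whole game.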

I think realistically, we had them on to check the re-use/c_puct/d-noise effect. At this point, i don't think the individual self-play games are mined through that much. That said, a few hundred GB is not much in the grand scheme of things.

Possibly a better way to approach the problem is to streamline our process for stripping the comments before hosting the data, or potentially to save two copies: one with and one without the debugging info.

Incidentally, gzipping the files w/ debug info cut the size by ~7/8ths.
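As a rough sketch of the strip-and-save-two-copies idea, assuming the debug info lives in SGF `C[...]` comment properties. The helper names `strip_comments` and `write_both` are hypothetical, and the regex is a simplification rather than a full SGF parser (it would also match a literal `C[` that happens to appear inside another property's value):

```python
import gzip
import re

# Matches a C[...] comment property, honouring backslash escapes such as "\]".
# The lookbehind skips other properties whose name ends in "C" (GC, PC, ...).
_COMMENT_RE = re.compile(r"(?<![A-Z])C\[(?:\\.|[^\]\\])*\]", re.DOTALL)


def strip_comments(sgf: str) -> str:
    """Return the SGF text with every C[...] comment property removed."""
    return _COMMENT_RE.sub("", sgf)


def write_both(sgf: str, full_path: str, clean_path: str) -> None:
    """Write a gzipped full copy and a gzipped comment-free copy."""
    with gzip.open(full_path, "wt", encoding="utf-8") as f:
        f.write(sgf)
    with gzip.open(clean_path, "wt", encoding="utf-8") as f:
        f.write(strip_comments(sgf))
```

Keeping the full copy compressed and hosting only the stripped copy for download would address both the storage and the bandwidth concerns raised here.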


artasparks avatar artasparks commented on July 19, 2024

I think that the SGFs are actually the data that Go-people are most likely to consume (especially for 19x19s), so making it small will help cut down on our network costs and also make it more user-friendly, since who really wants to DL 100GB of SGFs.

@amj / @brilee -- are you guys actually using the debug info?


amj avatar amj commented on July 19, 2024

Yes, infrequently ;) I think at the point that you find an interesting game, you really want to know what the computer thought. We did all that work already, and bytes are (relatively) cheap, so we might as well store it.

Now bandwidth is less cheap, on all sides of the table, so providing stripped down versions for download makes a lot of sense.

And for the person who wants 100 GBs of SGFs with the full data -- and i'm sure a couple will! -- that'll be possible too.


killerducky avatar killerducky commented on July 19, 2024

FWIW LZ handles this by storing the random seed in the sgf file. Games run with 1 thread are reproducible so you can run with debug on afterwards.
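To illustrate the seed-replay idea (this is not LZ's actual file format, which isn't shown in this thread), here is a toy sketch. It assumes all randomness in a single-threaded game flows from one seed, and it stores that seed in the root `GC` property purely as a hypothetical convention. `play_game`, `to_sgf`, and `seed_from_sgf` are made-up names:

```python
import random
import re

_COORDS = "abcdefghijklmnopqrs"  # 19x19 SGF coordinate letters


def play_game(seed: int, num_moves: int = 10, debug: bool = False) -> list[str]:
    """Toy stand-in for single-threaded self-play driven entirely by `seed`."""
    rng = random.Random(seed)
    moves = []
    for i in range(num_moves):
        move = rng.choice(_COORDS) + rng.choice(_COORDS)
        if debug:
            # Debug output can be produced on the replay, not the original run.
            print(f"move {i}: {move}")
        moves.append(move)
    return moves


def to_sgf(seed: int, moves: list[str]) -> str:
    """Serialize moves, recording the seed in the root game-comment property."""
    body = "".join(f";{'BW'[i % 2]}[{m}]" for i, m in enumerate(moves))
    return f"(;GM[1]SZ[19]GC[seed={seed}]{body})"


def seed_from_sgf(sgf: str) -> int:
    """Recover the seed written by to_sgf."""
    match = re.search(r"GC\[seed=(\d+)\]", sgf)
    if match is None:
        raise ValueError("no seed recorded in SGF")
    return int(match.group(1))
```

Replaying with `play_game(seed_from_sgf(sgf), debug=True)` reproduces the same moves while printing the debug output. A real engine would additionally need deterministic network inference and the same model weights for this to hold.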


marcocalignano avatar marcocalignano commented on July 19, 2024

LZ also stores the training and debug data in two different files, and autogtp uploads both files (the SGF and the training data) to a database. You could do the same: write two files, one SGF with comments and one without, store them both, and let the user decide which one they want.


brilee avatar brilee commented on July 19, 2024

I like the solution with two versions of the SGF.


amj avatar amj commented on July 19, 2024

ok i'll set it up to write both versions.

