Comments (7)

Wuziyi616 avatar Wuziyi616 commented on May 25, 2024 1

I did it in a bad way (see this line). I should probably switch to the built-in function, but the current code should also work.

from slotformer.

Wuziyi616 avatar Wuziyi616 commented on May 25, 2024

Hey, thank you so much for pointing this out. Indeed, there seem to be some issues with the gradient sync in DDP training. Your solution of wrapping the model so that the loss computation happens within the forward pass sounds very reasonable to me. Have you tried it, and if so, did you see, e.g., faster convergence for the DDP-trained model?
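For readers following along, here is a minimal sketch of what "loss computation within the forward pass" could look like. The original proposal is not reproduced in this thread, so the wrapper name ModelWithLoss, the toy nn.Linear model, and the MSE loss are all hypothetical placeholders, not the code from the repo.

```python
import torch
import torch.nn as nn


class ModelWithLoss(nn.Module):
    """Hypothetical wrapper: forward() returns the loss, not the predictions."""

    def __init__(self, model: nn.Module, loss_fn: nn.Module):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn

    def forward(self, inputs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Prediction and loss both happen inside forward(), so once this
        # module is wrapped in DDP, the whole computation goes through the
        # DDP wrapper's forward call.
        preds = self.model(inputs)
        return self.loss_fn(preds, targets)


# Toy stand-ins (assumptions for illustration only).
model = nn.Linear(4, 2)
wrapped = ModelWithLoss(model, nn.MSELoss())

# In real DDP training, after init_process_group, one would wrap it as:
#   ddp_model = nn.parallel.DistributedDataParallel(wrapped, device_ids=[local_rank])
#   loss = ddp_model(inputs, targets)
loss = wrapped(torch.randn(8, 4), torch.randn(8, 2))
loss.backward()
```

The training loop then calls the wrapped module directly and backpropagates the returned scalar, instead of calling a separate loss method on the underlying model.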

Wuziyi616 avatar Wuziyi616 commented on May 25, 2024

OK, I believe I have fixed the second issue with this commit. I checked it with your function check_sampler_index_consistency, and it doesn't show any overlapping indices. Thanks for pointing that out!
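The check_sampler_index_consistency function itself is not shown in this thread. As a rough illustration of the kind of check described (no index handed to more than one rank), a sketch might look like the following; find_overlapping_indices is a hypothetical helper, not the function from the issue.

```python
from collections import Counter


def find_overlapping_indices(per_rank_indices):
    """Return the sorted dataset indices that appear on more than one rank.

    per_rank_indices: one list of sampled indices per rank (hypothetical input
    format; gather these however your setup allows).
    """
    seen = Counter()
    for indices in per_rank_indices:
        # Use a set so an index repeated within one rank counts only once.
        seen.update(set(indices))
    return sorted(idx for idx, count in seen.items() if count > 1)


# Disjoint shards across two ranks: nothing overlaps.
no_overlap = find_overlapping_indices([[0, 2, 4], [1, 3, 5]])

# Indices 0 and 2 were handed to both ranks.
overlap = find_overlapping_indices([[0, 1, 2], [2, 3, 0]])
```

An empty result means every rank sees a distinct part of the dataset in that epoch.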

I might need some thought before fixing the first issue.

Wuziyi616 avatar Wuziyi616 commented on May 25, 2024

The second issue is likely fixed in this commit. I tested with your util function and got consistent == True.

The fix is a bit hacky and ugly tho...

Wuziyi616 avatar Wuziyi616 commented on May 25, 2024

I am training a model in this repo as a sanity check. Unfortunately, I don't have easy access to the dataset used in SlotFormer right now, so I may not be able to test whether it converges faster. @Shamdan17 will you be able to give it a try? Just pull the newest code from my nerv package.

Shamdan17 avatar Shamdan17 commented on May 25, 2024

Hello, I will be able to test it in a couple of days, but one thing I noticed is that the self.epoch value is never updated anywhere. Typically, set_epoch is called on the sampler at the beginning of each epoch to update this value; otherwise the shuffling is the same across epochs. The warning about this can be found in this doc.
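To make the effect concrete, here is a small sketch using PyTorch's built-in DistributedSampler (not the custom sampler from this repo). The toy dataset, the helper epoch_orders, and the explicit num_replicas/rank values are assumptions chosen so the snippet runs in a single process without initializing a process group.

```python
from torch.utils.data import DistributedSampler

# Toy dataset: only its length matters to the sampler.
dataset = list(range(100))

# Passing num_replicas and rank explicitly avoids needing torch.distributed
# to be initialized for this illustration.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True, seed=0)


def epoch_orders(sampler, num_epochs, call_set_epoch):
    """Collect the index order this rank sees in each epoch."""
    orders = []
    for epoch in range(num_epochs):
        if call_set_epoch:
            # Changes the shuffling seed used for this epoch.
            sampler.set_epoch(epoch)
        orders.append(list(sampler))
    return orders


stale = epoch_orders(sampler, 2, call_set_epoch=False)
fresh = epoch_orders(sampler, 2, call_set_epoch=True)

print("same order without set_epoch:", stale[0] == stale[1])
print("same order with set_epoch:", fresh[0] == fresh[1])
```

Without the call, every epoch replays the same permutation on each rank; with it, the permutation changes from epoch to epoch.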

Wuziyi616 avatar Wuziyi616 commented on May 25, 2024

I got the results of that run. It at least doesn't harm the performance, tho the improvement is also not clear. Let me know if you have any results @Shamdan17. Thanks so much for the help!
