Comments (7)
I did it in a bad way (see this line). I should probably switch to the built-in function, but the current code should also work.
from slotformer.
Hey, thank you so much for pointing this out. Indeed, there seem to be some issues with the gradient sync in DDP training. Your solution of wrapping the model so that the loss is computed within the forward pass sounds very reasonable to me. Have you tried that, and if so, did you see any benefit, e.g. faster convergence of the DDP-trained model?
OK, I believe I have fixed the second issue with this commit. I checked it with your function check_sampler_index_consistency and it doesn't show any overlapping indices. Thanks for pointing that out!
I might need some thought before fixing the first issue.
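For readers following along, here is a rough sketch of what an overlap check of this kind could look like. It is not the actual check_sampler_index_consistency from this thread (its code isn't shown here); rank_indices and indices_are_disjoint are hypothetical names, and the "buggy" case is just an illustrative failure mode where every rank ends up with the same slice.

```python
import random


def rank_indices(dataset_len, num_replicas, rank, seed=0, epoch=0):
    """Hypothetical helper: the indices one rank draws in one epoch.

    Every rank shuffles with the same seed, then takes a strided slice,
    so the slices partition the dataset.
    """
    order = list(range(dataset_len))
    random.Random(seed + epoch).shuffle(order)
    return order[rank::num_replicas]


def indices_are_disjoint(per_rank):
    """Return True when no index shows up on more than one rank."""
    seen = set()
    for indices in per_rank:
        current = set(indices)
        if seen & current:
            return False
        seen |= current
    return True


# Correct setup: each rank passes its own rank id.
good = [rank_indices(10, num_replicas=2, rank=r) for r in range(2)]

# Illustrative bug (an assumption, not the bug from this issue):
# both processes behave as rank 0 and therefore read identical indices.
bad = [rank_indices(10, num_replicas=2, rank=0) for _ in range(2)]

print(indices_are_disjoint(good))
print(indices_are_disjoint(bad))
```

In a real DDP run the per-rank index lists would have to be gathered across processes before comparing them; the sketch skips that and builds them in one process.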
The second issue is likely fixed in this commit. I tested with your util function and got consistent == True.
The fix is a bit hacky and ugly though...
I am training a model in this repo as a sanity check. Unfortunately, I don't have easy access to the dataset used in SlotFormer right now, so I may not be able to test whether it converges faster. @Shamdan17 will you be able to give it a try? Just pull the newest code from my nerv package.
Hello, I will be able to test it in a couple of days, but one thing I noticed is that the self.epoch value is never updated anywhere. Typically, set_epoch is called on the sampler at the beginning of each epoch to update this value; otherwise the shuffling order is the same in every epoch. The warning about this can be found in this doc.
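To make the set_epoch point concrete, here is a minimal pure-Python sketch. EpochShuffleSampler is a toy stand-in written for this comment, not the sampler in nerv or PyTorch's DistributedSampler; it only assumes the common pattern of seeding the shuffle with seed + epoch.

```python
import random


class EpochShuffleSampler:
    """Toy sampler (hypothetical) whose shuffle is seeded by seed + epoch."""

    def __init__(self, dataset_len, seed=0):
        self.dataset_len = dataset_len
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Must be called at the start of every epoch to change the order.
        self.epoch = epoch

    def __iter__(self):
        order = list(range(self.dataset_len))
        random.Random(self.seed + self.epoch).shuffle(order)
        return iter(order)

    def __len__(self):
        return self.dataset_len


sampler = EpochShuffleSampler(100)

# Without set_epoch: self.epoch stays 0, so both epochs see the same order.
stale = [list(sampler) for _ in range(2)]

# With set_epoch: the seed changes per epoch, so the order changes too.
fresh = []
for epoch in range(2):
    sampler.set_epoch(epoch)
    fresh.append(list(sampler))

print(stale[0] == stale[1])
print(fresh[0] == fresh[1])
```

With a real training loop the same call would sit at the top of the epoch loop, before iterating over the DataLoader.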
I got the results of that run. It at least doesn't harm the performance, though the improvement is also not clear. Let me know if you have any results @Shamdan17. Thanks so much for the help!