Comments (13)
It simply shares the .output buffer between alternating modules, so only two .output buffers are needed for the whole network. This works in the inference phase, where we don't need to keep each .output around for the backward computation.
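A minimal sketch of the alternating-buffer idea, not the actual Torch implementation. Plain Python lists stand in for tensors, every layer is assumed to keep the input size (a real network would resize the buffers), and `scale`, `add`, and `forward_double_buffered` are hypothetical names made up for illustration.

```python
# Illustrative only: lists stand in for tensors; none of these names are Torch APIs.

def scale(factor):
    """Return a layer that writes factor * src[i] into dst[i]."""
    def layer(src, dst):
        for i in range(len(src)):
            dst[i] = factor * src[i]
    return layer

def add(offset):
    """Return a layer that writes src[i] + offset into dst[i]."""
    def layer(src, dst):
        for i in range(len(src)):
            dst[i] = src[i] + offset
    return layer

def forward_double_buffered(layers, x):
    """Run the layers in order using only two preallocated output buffers."""
    buffers = [[0.0] * len(x), [0.0] * len(x)]
    src = x
    for k, layer in enumerate(layers):
        dst = buffers[k % 2]  # even layers write buffer 0, odd layers buffer 1
        layer(src, dst)
        src = dst             # this output is the next layer's input
    return list(src)          # copy, since the buffer will be reused

result = forward_double_buffered([scale(2.0), add(1.0), scale(3.0)], [1.0, 2.0, 3.0])
print(result)
```

Intermediate outputs are overwritten as soon as they are two layers old, which is why this only suits inference.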
from dcgan.torch.
From what I understand, we need the .output during backward computation because the output of the current layer acts as input to the next layer.
In that case, why exactly do we need to share between alternating modules (two buffers) instead of just having one buffer?
from dcgan.torch.
Because while you are computing a layer, you can't have its input and output be the same buffer. As you write to the output, you can't read from it and expect to still see the input.
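To make the hazard concrete, here is a small sketch using a hypothetical layer, `neighbor_sum`, in which each output element depends on more than one input element. It is a toy stand-in chosen for illustration, not a module from nn. Running it with the input and output aliased to the same list gives a different answer than running it with separate buffers.

```python
# Illustrative only: a toy layer where dst[i] depends on src[i] and its neighbor.

def neighbor_sum(src, dst):
    """Write src[i] + src[(i + 1) % n] into dst[i]."""
    n = len(src)
    for i in range(n):
        dst[i] = src[i] + src[(i + 1) % n]

# Separate input and output buffers: every read sees the original input.
x = [1.0, 2.0, 3.0]
separate = [0.0] * len(x)
neighbor_sum(x, separate)

# One shared buffer: the last element reads position 0 after it was overwritten.
aliased = [1.0, 2.0, 3.0]
neighbor_sum(aliased, aliased)

print(separate, aliased)
```

Purely elementwise layers can tolerate aliasing, but a general scheme cannot assume that, hence the second buffer.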
from dcgan.torch.
So, is it possible to set up double buffering for training? Would the required changes simply involve having buffers for gradInput instead of output?
from dcgan.torch.
Yeah, you can (for gradInput), but it requires a small change to our method dispatch in backward.
In the current backward, updateGradInput is called on all modules, and then accGradParameters is called on all modules.
This has to change so that gradInput dependencies are better scoped.
We have internal patches for this; we will open PRs slowly and patch things.
from dcgan.torch.
@soumith I think that currently no module in nn/cunn uses gradInput in accGradParameters (and that should always be the case, except for rare optimizations), so optimizing gradInput for memory is already possible. Even the newly added SpatialBatchNormalization (which implements only a backward in C) would work.
Or am I missing something?
from dcgan.torch.
cc: @sgross see fmassa's comment wdyt
from dcgan.torch.
@soumith did you mean @colesbury ?
from dcgan.torch.
oops, yea. sgross is his internal username.
from dcgan.torch.
@fmassa, it also requires all nn.Containers to properly override 'backward' so that they call 'backward' on their submodules. Some, like nn.Sequential, already do this, but others, like nn.ConcatTable, do not.
from dcgan.torch.
@colesbury that's needed if we want to optimize backward for speed (by avoiding redundant calls in updateGradInput and accGradParameters).
If what we want is to optimize for space, then I think this refactoring to allow sharing the gradInputs is not needed, because gradInput is not used in accGradParameters (but care must be taken with parallel containers).
from dcgan.torch.
@fmassa accGradParameters uses gradOutput, even though it doesn't use gradInput. If we share gradInput among all alternating layers, do you see the problem now?
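The problem can be reproduced in a toy model. The sketch below is an assumption-laden illustration, not nn's code: each hypothetical layer computes y = w * x on a single number held in a one-element list, and the gradOutput of layer k is taken to be the gradInput buffer of layer k + 1. With two shared gradInput buffers and the "all updateGradInput, then all accGradParameters" order, some gradOutput values are overwritten before they are used; calling both methods per module avoids that.

```python
# Illustrative only: scalar layers y = w * x; all names here are hypothetical.

def backward(weights, inputs, grad_out, share_buffers, interleave):
    """Return the weight gradients of a chain of y = w * x layers."""
    n = len(weights)
    if share_buffers:
        two = [[0.0], [0.0]]
        grad_in = [two[k % 2] for k in range(n)]  # alternating layers share
    else:
        grad_in = [[0.0] for _ in range(n)]       # one buffer per layer
    # gradOutput of layer k is gradInput of layer k + 1 (or the loss gradient).
    grad_out_of = [grad_in[k + 1] if k + 1 < n else [grad_out] for k in range(n)]
    grad_w = [0.0] * n

    def update_grad_input(k):
        grad_in[k][0] = weights[k] * grad_out_of[k][0]

    def acc_grad_parameters(k):
        grad_w[k] += inputs[k] * grad_out_of[k][0]

    if interleave:
        # Finish each module before moving on, so its gradOutput is still intact.
        for k in reversed(range(n)):
            update_grad_input(k)
            acc_grad_parameters(k)
    else:
        # Two full passes: by the second pass, shared buffers were overwritten.
        for k in reversed(range(n)):
            update_grad_input(k)
        for k in reversed(range(n)):
            acc_grad_parameters(k)
    return grad_w

weights = [2.0, 3.0, 5.0, 7.0]
inputs = [1.0]                      # input of layer 0
for w in weights[:-1]:
    inputs.append(w * inputs[-1])   # input of each later layer

reference = backward(weights, inputs, 1.0, share_buffers=False, interleave=False)
broken = backward(weights, inputs, 1.0, share_buffers=True, interleave=False)
fixed = backward(weights, inputs, 1.0, share_buffers=True, interleave=True)
print(reference, broken, fixed)
```

In this toy chain the two-pass order with shared buffers gets the middle layers' gradients wrong, while the per-module order matches the unshared reference.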
from dcgan.torch.
Oops... you are right. My bad.
from dcgan.torch.