Comments (4)
Hi @johannespitz, in general you can use the adjoint=True flag to manually invoke the backward version of a kernel. This is what the wp.Tape object does, but it also takes care of a few other complications like tracking launches, zeroing gradients, etc.
When you run a backward pass it accumulates gradients (it always adds to existing arrays). This is similar to PyTorch, but it means that you do indeed need to make sure they are zeroed somewhere between optimization steps.
I don't think you should need to call wp.synchronize() explicitly. @nvlukasz can you confirm?
Thanks for the reply @mmacklin! And I would be very interested to hear if/where we really need to call wp.synchronize(), @nvlukasz.
Regarding the accumulation of gradients: when we use wp.from_torch() directly, as it is used in the example code, instead of creating a new PyTorch tensor with .clone(), the gradient of leaf nodes in the computation graph will be 2x the true gradient, even when we clear all gradients before the call. That is because torch expects a torch.autograd.Function to return the gradient and not write it directly into the buffer. Therefore, torch then adds the returned gradient to the gradient that Warp already wrote into the buffer (for leaves in the computation graph). Note that for intermediate nodes it works only because usually (if retain_graph=False) the gradient buffers of those tensors are not used at all.
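The doubling effect can be reproduced in plain PyTorch. In the sketch below (a hypothetical stand-in, all names invented), the backward of a custom Function writes the input gradient directly into the leaf's .grad buffer, the way a Warp adjoint kernel sharing memory via wp.from_torch() would, and also returns it, so autograd's own accumulation doubles the result:

```python
import torch

class SquareWriteThrough(torch.autograd.Function):
    """Hypothetical stand-in for a Warp-backed op whose adjoint kernel
    writes gradients straight into memory shared with the torch leaf
    (as wp.from_torch() without a .clone() would)."""

    @staticmethod
    def forward(ctx, x):
        ctx.x_leaf = x  # keep a handle to the leaf buffer for the demo
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        x = ctx.x_leaf
        grad = 2.0 * x.detach() * grad_out
        # simulate the adjoint kernel writing into the shared buffer ...
        if x.grad is None:
            x.grad = torch.zeros_like(x)
        x.grad += grad
        # ... while autograd also accumulates the returned value
        return grad

x = torch.tensor([3.0], requires_grad=True)
SquareWriteThrough.apply(x).sum().backward()
print(x.grad)  # tensor([12.]) -- 2x the true gradient of 6
```

Wrapping a .clone() of the tensor instead (so the adjoint kernel writes into a separate buffer that is only returned, never shared with the leaf's .grad) avoids the double accumulation.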
CUDA synchronization can be a little tricky, especially when launching work using multiple frameworks that use different scheduling mechanisms under the hood.
Short answer: If you're not explicitly creating and using custom CUDA streams in PyTorch or Warp, and both are targeting the same device, then synchronization is not necessary.
Long answer: By default, PyTorch uses the legacy default stream on each device. This stream is synchronous with respect to other blocking streams on the device, so no explicit synchronization is needed. Warp, by default, uses a blocking stream on each device, so Warp operations will automatically synchronize with PyTorch operations on the same device.
The picture changes if you start using custom streams in PyTorch. Those streams will not automatically synchronize with Warp streams, so manual synchronization will be required. This can be done using wp.synchronize(), wp.synchronize_device(), or wp.synchronize_stream(). These functions synchronize the host with outstanding GPU work, so launching new work will be done after prior work completes. We also support event-based device-side synchronization, which is generally faster because it doesn't sync the host and only ensures that the operations are synchronized on the device. This includes wp.wait_stream() and wp.wait_event(), as well as interop utilities like wp.stream_from_torch() and wp.stream_to_torch().
Note that when capturing CUDA graphs using PyTorch, a non-default stream is used, so synchronization becomes important.
Things can get a little complicated with multi-stream usage and graph capture, so we're working on extended documentation in this area! But in your simple example, the explicit synchronization shouldn't be necessary.
Thank you for the detailed answer regarding synchronization, @nvlukasz! Though, could either of you comment again on the accumulation of the gradients, @mmacklin? Am I missing something, or is the example code incorrect at the moment?
(Note: optimization with 2x the gradient will likely work just fine, but if someone wants to extend the code they might run into problems.)