Comments (9)
I believe it's because we use non_blocking=True but don't synchronize after the tensordict creation; I can fix that!
It won't be the case in the other direction (cpu -> cuda), as cuda syncs itself.
Thanks for reporting
from tensordict.
In the meantime you can call `torch.cuda.synchronize()` after instantiating the td!
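A minimal sketch of that workaround. The dict-of-tensors and the helper name are illustrative, not tensordict API; the point is simply to call `torch.cuda.synchronize()` once the (possibly asynchronous) copies have been issued, before the data is read.

```python
import torch

def make_td_like(tensors, device):
    # Mirror what an async construction does: issue non-blocking copies...
    moved = {k: v.to(device, non_blocking=True) for k, v in tensors.items()}
    # ...then synchronize so the data is actually valid before it is read.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return moved

device = "cuda" if torch.cuda.is_available() else "cpu"
td_like = make_td_like({"a": torch.ones(3)}, device)
```

On a CPU-only machine the `.to()` calls are no-ops and the synchronize is skipped, so the sketch runs either way.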
Looking at it, this has already been solved: `non_blocking` is now an argument of the tensordict constructor. If you set it to `True`, it will be your responsibility to call synchronize if it's needed.
Hope that makes sense!
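A hedged usage sketch of that constructor argument. The `TensorDict` import and the `non_blocking` keyword are taken from the comment above; the import is guarded here in case tensordict is not installed.

```python
import torch

try:
    from tensordict import TensorDict

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # We asked for async transfers, so synchronizing is now on us.
    td = TensorDict({"obs": torch.zeros(4, 3)}, batch_size=[4],
                    device=device, non_blocking=True)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    batch = td.batch_size[0]
except ImportError:
    batch = None  # tensordict unavailable; nothing to demonstrate
```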
Is the use of non_blocking=True also your own responsibility with regular tensors? I was under the impression that torch would synchronize for you if it was necessary, but maybe I'm wrong.
I'd say it's pretty dangerous to have hidden responsibilities enabled by default. If indeed this is a user's own responsibility then the default setting should be non_blocking=False in my opinion.
> Is the use of non_blocking=True also your own responsibility with regular tensors? I was under the impression that torch would synchronize for you if it was necessary, but maybe I'm wrong.

For cpu -> cuda yes, for anything else no.
> I'd say it's pretty dangerous to have hidden responsibilities enabled by default. If indeed this is a user's own responsibility then the default setting should be non_blocking=False in my opinion.

I agree. I can put a synchronize in the tensordict. The only thing blocking me right now is that we should do it only at the root tensordict, not at every node, and I'm not sure how to achieve that without adding yet another keyword argument to `__init__`.
After further discussion with @albanD we agreed on the following plan:
- tensordict will always use non_blocking=True during creation
- if non_blocking=True is passed by the user, we don't call synchronize at the end
- if non_blocking=False (default) we do call synchronize if device is cpu / mps or any non-cuda
Your calls will be fixed, but you will have the same error if non_blocking is True.
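The plan above can be sketched as a small decision rule (function name and structure are illustrative, not the actual tensordict internals; mps would be handled the same way as cuda here):

```python
import torch

def finalize_copy(user_non_blocking: bool) -> str:
    # Under the hood the copies are always issued with non_blocking=True.
    if user_non_blocking:
        # User opted in: synchronizing becomes their responsibility.
        return "caller must synchronize"
    # Default path: tensordict synchronizes for the user when an
    # accelerator is around.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return "synchronized by tensordict"
```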
Sounds like a well thought-out resolution. Thanks a lot!
> After further discussion with @albanD we agreed on the following plan:
> - tensordict will always use non_blocking=True during creation
> - if non_blocking=True is passed by the user, we don't call synchronize at the end
> - if non_blocking=False (default) we do call synchronize if device is cpu / mps or any non-cuda
> Your calls will be fixed, but you will have the same error if non_blocking is True.

Double checking: did you mean to say tensordict will always use `non_blocking=False` during creation?
I meant that under the hood we'll always use `non_blocking=True`.
From a UX perspective, the only difference when calling `TensorDict(..., non_blocking=True)` will be that a call to `synchronize` will not be made after sending the tensors asynchronously to the required device. If `False`, we call `synchronize` on cuda (thereby synchronizing all cuda devices) whenever cuda is available, and the same for MPS.
Using `non_blocking=True` will be useful if you want to keep control over which stream is synchronized. For instance, you may have some tensors on cuda:0 that you want to bring to cpu while cuda:1 is busy doing something else. In that case, `non_blocking=False` will dispatch things cuda:0 -> cpu asynchronously but then call a synchronize over both cuda:0 and cuda:1, which can be annoying. With `non_blocking=True`, you can avoid that and call synchronize over the first stream only.