pytorch / tensordict Goto Github PK

View Code? Open in Web Editor NEW

803.0 803.0 65.0 28.55 MB

TensorDict is a pytorch dedicated tensor container.

License: MIT License

Shell 0.65% Python 99.21% C++ 0.14%

tensordict's Introduction

PyTorch is a Python package that provides two high-level features:

Tensor computation (like NumPy) with strong GPU acceleration
Deep neural networks built on a tape-based autograd system

You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed.

Our trunk health (Continuous Integration signals) can be found at hud.pytorch.org.

More About PyTorch
Installation
Getting Started
Resources
Communication
Releases and Contributing
The Team
License

More About PyTorch

Learn the basics of PyTorch

At a granular level, PyTorch is a library that consists of the following components:

Component	Description
torch	A Tensor library like NumPy, with strong GPU support
torch.autograd	A tape-based automatic differentiation library that supports all differentiable Tensor operations in torch
torch.jit	A compilation stack (TorchScript) to create serializable and optimizable models from PyTorch code
torch.nn	A neural networks library deeply integrated with autograd designed for maximum flexibility
torch.multiprocessing	Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training
torch.utils	DataLoader and other utility functions for convenience

Usually, PyTorch is used either as:

A replacement for NumPy to use the power of GPUs.
A deep learning research platform that provides maximum flexibility and speed.

Elaborating Further:

A GPU-Ready Tensor Library

If you use NumPy, then you have used Tensors (a.k.a. ndarray).

PyTorch provides Tensors that can live either on the CPU or the GPU and accelerates the computation by a huge amount.

We provide a wide variety of tensor routines to accelerate and fit your scientific computation needs such as slicing, indexing, mathematical operations, linear algebra, reductions. And they are fast!

Dynamic Neural Networks: Tape-Based Autograd

PyTorch has a unique way of building neural networks: using and replaying a tape recorder.

Most frameworks such as TensorFlow, Theano, Caffe, and CNTK have a static view of the world. One has to build a neural network and reuse the same structure again and again. Changing the way the network behaves means that one has to start from scratch.

With PyTorch, we use a technique called reverse-mode auto-differentiation, which allows you to change the way your network behaves arbitrarily with zero lag or overhead. Our inspiration comes from several research papers on this topic, as well as current and past work such as torch-autograd, autograd, Chainer, etc.

While this technique is not unique to PyTorch, it's one of the fastest implementations of it to date. You get the best of speed and flexibility for your crazy research.

Python First

PyTorch is not a Python binding into a monolithic C++ framework. It is built to be deeply integrated into Python. You can use it naturally like you would use NumPy / SciPy / scikit-learn etc. You can write your new neural network layers in Python itself, using your favorite libraries and use packages such as Cython and Numba. Our goal is to not reinvent the wheel where appropriate.

Imperative Experiences

PyTorch is designed to be intuitive, linear in thought, and easy to use. When you execute a line of code, it gets executed. There isn't an asynchronous view of the world. When you drop into a debugger or receive error messages and stack traces, understanding them is straightforward. The stack trace points to exactly where your code was defined. We hope you never spend hours debugging your code because of bad stack traces or asynchronous and opaque execution engines.

Fast and Lean

PyTorch has minimal framework overhead. We integrate acceleration libraries such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed. At the core, its CPU and GPU Tensor and neural network backends are mature and have been tested for years.

Hence, PyTorch is quite fast — whether you run small or large neural networks.

The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives. We've written custom memory allocators for the GPU to make sure that your deep learning models are maximally memory efficient. This enables you to train bigger deep learning models than before.

Extensions Without Pain

Writing new neural network modules, or interfacing with PyTorch's Tensor API was designed to be straightforward and with minimal abstractions.

You can write new neural network layers in Python using the torch API or your favorite NumPy-based libraries such as SciPy.

If you want to write your layers in C/C++, we provide a convenient extension API that is efficient and with minimal boilerplate. No wrapper code needs to be written. You can see a tutorial here and an example here.

Installation

Binaries

Commands to install binaries via Conda or pip wheels are on our website: https://pytorch.org/get-started/locally/

NVIDIA Jetson Platforms

Python wheels for NVIDIA's Jetson Nano, Jetson TX1/TX2, Jetson Xavier NX/AGX, and Jetson AGX Orin are provided here and the L4T container is published here

They require JetPack 4.2 and above, and @dusty-nv and @ptrblck are maintaining them.

From Source

Prerequisites

If you are installing from source, you will need:

Python 3.8 or later (for Linux, Python 3.8.1+ is needed)
A compiler that fully supports C++17, such as clang or gcc (gcc 9.4.0 or newer is required)

We highly recommend installing an Anaconda environment. You will get a high-quality BLAS library (MKL) and you get controlled dependency versions regardless of your Linux distro.

NVIDIA CUDA Support

If you want to compile with CUDA support, select a supported version of CUDA from our support matrix, then install the following:

NVIDIA CUDA
NVIDIA cuDNN v8.5 or above
Compiler compatible with CUDA

Note: You could refer to the cuDNN Support Matrix for cuDNN versions with the various supported CUDA, CUDA driver and NVIDIA hardware

If you want to disable CUDA support, export the environment variable USE_CUDA=0. Other potentially useful environment variables may be found in setup.py.

If you are building for NVIDIA's Jetson platforms (Jetson Nano, TX1, TX2, AGX Xavier), Instructions to install PyTorch for Jetson Nano are available here

AMD ROCm Support

If you want to compile with ROCm support, install

AMD ROCm 4.0 and above installation
ROCm is currently supported only for Linux systems.

If you want to disable ROCm support, export the environment variable USE_ROCM=0. Other potentially useful environment variables may be found in setup.py.

Intel GPU Support

If you want to compile with Intel GPU support, follow these

PyTorch Prerequisites for Intel GPUs instructions.
Intel GPU is supported for Linux and Windows.

If you want to disable Intel GPU support, export the environment variable USE_XPU=0. Other potentially useful environment variables may be found in setup.py.

Install Dependencies

Common

conda install cmake ninja
# Run this command from the PyTorch directory after cloning the source code using the “Get the PyTorch Source“ section below
pip install -r requirements.txt

On Linux

pip install mkl-static mkl-include
# CUDA only: Add LAPACK support for the GPU if needed
conda install -c pytorch magma-cuda121  # or the magma-cuda* that matches your CUDA version from https://anaconda.org/pytorch/repo

# (optional) If using torch.compile with inductor/triton, install the matching version of triton
# Run from the pytorch directory after cloning
# For Intel GPU support, please explicitly `export USE_XPU=1` before running command.
make triton

On MacOS

# Add this package on intel x86 processor machines only
pip install mkl-static mkl-include
# Add these packages if torch.distributed is needed
conda install pkg-config libuv

On Windows

pip install mkl-static mkl-include
# Add these packages if torch.distributed is needed.
# Distributed package support on Windows is a prototype feature and is subject to changes.
conda install -c conda-forge libuv=1.39

Get the PyTorch Source

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive

Install PyTorch

On Linux

If you would like to compile PyTorch with new C++ ABI enabled, then first run this command:

export _GLIBCXX_USE_CXX11_ABI=1

Please note that starting from PyTorch 2.5, the PyTorch build with XPU supports both new and old C++ ABIs. Previously, XPU only supported the new C++ ABI. If you want to compile with Intel GPU support, please follow Intel GPU Support.

If you're compiling for AMD ROCm then first run this command:

# Only run this if you're compiling for ROCm
python tools/amd_build/build_amd.py

Install PyTorch

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py develop

Aside: If you are using Anaconda, you may experience an error caused by the linker:
build/temp.linux-x86_64-3.7/torch/csrc/stub.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
error: command 'g++' failed with exit status 1
This is caused by ld from the Conda environment shadowing the system ld. You should use a newer version of Python that fixes this issue. The recommended Python version is 3.8.1+.

On macOS

python3 setup.py develop

On Windows

Choose Correct Visual Studio Version.

PyTorch CI uses Visual C++ BuildTools, which come with Visual Studio Enterprise, Professional, or Community Editions. You can also install the build tools from https://visualstudio.microsoft.com/visual-cpp-build-tools/. The build tools do not come with Visual Studio Code by default.

If you want to build legacy python code, please refer to Building on legacy code and CUDA

CPU-only builds

In this mode PyTorch computations will run on your CPU, not your GPU

conda activate
python setup.py develop

Note on OpenMP: The desired OpenMP implementation is Intel OpenMP (iomp). In order to link against iomp, you'll need to manually download the library and set up the building environment by tweaking CMAKE_INCLUDE_PATH and LIB. The instruction here is an example for setting up both MKL and Intel OpenMP. Without these configurations for CMake, Microsoft Visual C OpenMP runtime (vcomp) will be used.

CUDA based build

In this mode PyTorch computations will leverage your GPU via CUDA for faster number crunching

NVTX is needed to build Pytorch with CUDA. NVTX is a part of CUDA distributive, where it is called "Nsight Compute". To install it onto an already installed CUDA run CUDA installation once again and check the corresponding checkbox. Make sure that CUDA with Nsight Compute is installed after Visual Studio.

Currently, VS 2017 / 2019, and Ninja are supported as the generator of CMake. If ninja.exe is detected in PATH, then Ninja will be used as the default generator, otherwise, it will use VS 2017 / 2019.
If Ninja is selected as the generator, the latest MSVC will get selected as the underlying toolchain.

Additional libraries such as Magma, oneDNN, a.k.a. MKLDNN or DNNL, and Sccache are often needed. Please refer to the installation-helper to install them.

You can refer to the build_pytorch.bat script for some other environment variables configurations

cmd

:: Set the environment variables after you have downloaded and unzipped the mkl package,
:: else CMake would throw an error as `Could NOT find OpenMP`.
set CMAKE_INCLUDE_PATH={Your directory}\mkl\include
set LIB={Your directory}\mkl\lib;%LIB%

:: Read the content in the previous section carefully before you proceed.
:: [Optional] If you want to override the underlying toolset used by Ninja and Visual Studio with CUDA, please run the following script block.
:: "Visual Studio 2019 Developer Command Prompt" will be run automatically.
:: Make sure you have CMake >= 3.12 before you do this when you use the Visual Studio generator.
set CMAKE_GENERATOR_TOOLSET_VERSION=14.27
set DISTUTILS_USE_SDK=1
for /f "usebackq tokens=*" %i in (`"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe" -version [15^,17^) -products * -latest -property installationPath`) do call "%i\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%CMAKE_GENERATOR_TOOLSET_VERSION%

:: [Optional] If you want to override the CUDA host compiler
set CUDAHOSTCXX=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\bin\HostX64\x64\cl.exe

python setup.py develop

Adjust Build Options (Optional)

You can adjust the configuration of cmake variables optionally (without building first), by doing the following. For example, adjusting the pre-detected directories for CuDNN or BLAS can be done with such a step.

On Linux

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py build --cmake-only
ccmake build  # or cmake-gui build

On macOS

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build --cmake-only
ccmake build  # or cmake-gui build

Docker Image

Using pre-built images

You can also pull a pre-built docker image from Docker Hub and run with docker v19.03+

docker run --gpus all --rm -ti --ipc=host pytorch/pytorch:latest

Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.

Building the image yourself

NOTE: Must be built with a docker version > 18.06

The Dockerfile is supplied to build images with CUDA 11.1 support and cuDNN v8. You can pass PYTHON_VERSION=x.y make variable to specify which Python version is to be used by Miniconda, or leave it unset to use the default.

make -f docker.Makefile
# images are tagged as docker.io/${your_docker_username}/pytorch

You can also pass the CMAKE_VARS="..." environment variable to specify additional CMake variables to be passed to CMake during the build. See setup.py for the list of available variables.

make -f docker.Makefile

Building the Documentation

To build documentation in various formats, you will need Sphinx and the readthedocs theme.

cd docs/
pip install -r requirements.txt

You can then build the documentation by running make <format> from the docs/ folder. Run make to get a list of all available output formats.

If you get a katex error run npm install katex. If it persists, try npm install -g katex

Note: if you installed nodejs with a different package manager (e.g., conda) then npm will probably install a version of katex that is not compatible with your version of nodejs and doc builds will fail. A combination of versions that is known to work is [email protected] and [email protected]. To install the latter with npm you can run npm install -g [email protected]

Previous Versions

Installation instructions and binaries for previous PyTorch versions may be found on our website.

Getting Started

Three-pointers to get you started:

Resources

Communication

Forums: Discuss implementations, research, etc. https://discuss.pytorch.org
GitHub Issues: Bug reports, feature requests, install issues, RFCs, thoughts, etc.
Slack: The PyTorch Slack hosts a primary audience of moderate to experienced PyTorch users and developers for general chat, online discussions, collaboration, etc. If you are a beginner looking for help, the primary medium is PyTorch Forums. If you need a slack invite, please fill this form: https://goo.gl/forms/PP1AGvNHpSaJP8to1
Newsletter: No-noise, a one-way email newsletter with important announcements about PyTorch. You can sign-up here: https://eepurl.com/cbG0rv
Facebook Page: Important announcements about PyTorch. https://www.facebook.com/pytorch
For brand guidelines, please visit our website at pytorch.org

Releases and Contributing

Typically, PyTorch has three minor releases a year. Please let us know if you encounter a bug by filing an issue.

We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions, or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might end up resulting in a rejected PR because we might be taking the core in a different direction than you might be aware of.

To learn more about making a contribution to Pytorch, please see our Contribution page. For more information about PyTorch releases, see Release page.

The Team

PyTorch is a community-driven project with several skillful engineers and researchers contributing to it.

PyTorch is currently maintained by Soumith Chintala, Gregory Chanan, Dmytro Dzhulgakov, Edward Yang, and Nikita Shulga with major contributions coming from hundreds of talented individuals in various forms and means. A non-exhaustive but growing list needs to mention: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Koepf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein, Christian Sarofeen, Martin Raison, Edward Yang, Zachary Devito.

Note: This project is unrelated to hughperkins/pytorch with the same name. Hugh is a valuable contributor to the Torch community and has helped with many things Torch and PyTorch.

License

PyTorch has a BSD-style license, as found in the LICENSE file.

tensordict's People

Contributors

Stargazers

Watchers

tensordict's Issues

[Feature Request] Create a `TensorDict.set_default` method

Motivation

We're missing a set_default method for TensorDict.
This would be a nice-to-have.

Solution

The set_default should work with all TensorDictBase subclasses. It should be properly tested too.

Default batch_size + TensorDictModule can be surprising

TensordictModule allows to use a tensordict/class as input to the forward module. But it also support the nn.Module style call with tensors.
Under the hood it creates a new tensordict that then forwards to the call method. In doing so, however, it uses the default batch_size.

I would say that using the maximum size is probably not the most common case. Can we reconsider the default or find a way to get the right size from the model itself?

[Feature Request] Deprecate `_run_checks` in favour of `TensorDict._checked`

Motivation

As mentioned here, we could get rid of _run_checks in the TD constructor in favour of a TensorDict._checked attribute.

This would likely break things in torchrl but it's torchrl's maintainers problem since they're using a private feature of TensorDict.

cc @apbard

[Feature Request] Nested keys compatibility improvement

Motivation

We need to improve the compatibility with nested keys. This should include:

TensorDict.get(nested_key)
TensorDict.set(nested_key, value) (+ creation of the sub-td if the original key is missing)
TensorDict.set_(nested_key, value)
TensorDict.keys()
TensorDict.select(*keys)
TensorDict.exclude(*keys)
TensorDict.set_default(key, value)
TensorDictModule
TensorDictSequential

We will cover once it's implemented

TensorDict.pop(key)

Contentious methods / unresolved issues:

items: we could return the first level by default and return nested ones if a flag is set to true?
values: same reason
meta-tensor: building the features in parallel for meta-tensor will double the amount of work (?)
empty sub-tensordict behaviour: default behaviour would return only leaves, but one could ask for every level OR to return leaf-tensordicts.

cc @tcbegley

[BUG] Error in sequential indexing of TensorDict with MemmapTensor

Describe the bug

Sampling two dimensions in a consecutive way produces an error when using MemmapTensor. If a standard Tensor is used, no error is raised.

To Reproduce

Steps to reproduce the behavior.

import tensordict, numpy, sys, torch
from tensordict import TensorDict
torch.manual_seed(4584371022706121143)

A, B = 10, 2
SAMPLE_SIZE = 1
tensordict = TensorDict({"a": torch.rand(A, B, 2), "b": torch.rand(A, B, 1)}, [A, B])
print("b: ", tensordict["b"])
tensordict.memmap_()
print(tensordict)

# sample first dimension
idx = torch.randint(0, A, (SAMPLE_SIZE,))
tensordict1 = tensordict[idx]
print(tensordict1)

# sample second dimension
step_idx = torch.randint(0, B, (SAMPLE_SIZE,))
tensordict2 = tensordict1[range(SAMPLE_SIZE), step_idx]
print(tensordict2)

tensordict2['b'] * 0.9

Output and stack traces

b:  tensor([[[0.4750],
         [0.1718]],

        [[0.0098],
         [0.7934]],

        [[0.2763],
         [0.5519]],

        [[0.2940],
         [0.8162]],

        [[0.3873],
         [0.9035]],

        [[0.6649],
         [0.1111]],

        [[0.4237],
         [0.0228]],

        [[0.2010],
         [0.4994]],

        [[0.4820],
         [0.4220]],

        [[0.0147],
         [0.4110]]])
TensorDict(
    fields={
        a: MemmapTensor(shape=torch.Size([10, 2, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        b: MemmapTensor(shape=torch.Size([10, 2, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([10, 2]),
    device=None,
    is_shared=False)
TensorDict(
    fields={
        a: MemmapTensor(shape=torch.Size([1, 2, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        b: MemmapTensor(shape=torch.Size([1, 2, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([1, 2]),
    device=None,
    is_shared=False)
TensorDict(
    fields={
        a: MemmapTensor(shape=torch.Size([1, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        b: MemmapTensor(shape=torch.Size([1, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([1]),
    device=None,
    is_shared=False)
Traceback (most recent call last):
  File "td_test.py", line 27, in <module>
    tensordict2['b'] * 0.9
  File ".local/lib/python3.8/site-packages/tensordict/memmap.py", line 610, in __mul__
    return torch.mul(self, other)
  File ".local/lib/python3.8/site-packages/tensordict/memmap.py", line 418, in __torch_function__
    args = tuple(a._tensor if hasattr(a, "_tensor") else a for a in args)
  File ".local/lib/python3.8/site-packages/tensordict/memmap.py", line 418, in <genexpr>
    args = tuple(a._tensor if hasattr(a, "_tensor") else a for a in args)
  File ".local/lib/python3.8/site-packages/tensordict/memmap.py", line 431, in _tensor
    return self._load_item(self._index)
  File ".local/lib/python3.8/site-packages/tensordict/memmap.py", line 390, in _load_item
    memmap_array = self._get_item(_idx, memmap_array)
  File ".local/lib/python3.8/site-packages/tensordict/memmap.py", line 375, in _get_item
    memmap_array = memmap_array[idx]
  File ".conda/envs/testtd/lib/python3.8/site-packages/numpy/core/memmap.py", line 334, in __getitem__
    res = super().__getitem__(index)
IndexError: index 1 is out of bounds for axis 1 with size 1

Expected behavior

If the line tensordict.memmap_() is commented, the code does not produces any error.

System info

conda create -n testtd pip python==3.8
pip install torchrl

tensordict.__version__:  0.1.2
numpy.__version__:  1.24.3
sys.version:  3.8.0 (default, Nov  6 2019, 21:49:08)
[GCC 7.3.0]
sys.platform:  linux
torch.__version__:  2.0.1+cu117

Checklist

I have checked that there is no similar issue in the repo (required)
I have read the documentation (required)
I have provided a minimal working example to reproduce the bug (required)

[Feature Request] Caching keys

Motivation

Getting the list of keys can be expensive, especially when we ask for the nested keys.
We could perhaps cache them.

Updating the tensordict (either using set, update, setdefault, rename, select, exclude) would trigger an update of the keys.

The major blocker for this is that modifying a nested tensordict should impact the parent tensordict, which implementation may open the door to a pandora box that we may want to keep close (ie having a parenting mechanism similar to replay buffers or transforms):

subtd = TensorDIct({"b": [3]}, [])
tensordict = TensorDict({"a": subtd}, [])
subtd["c"] = [1]  # tensordict should be informed about this too

cc @tcbegley

[BUG] Wrong batch_size

Describe the bug

The shape of a batch is the full length of the dataset

To Reproduce

I am using version =0.0.3

import torch
from tensordict import MemmapTensor
from tensordict.prototype import tensorclass
from torch.utils.data import DataLoader

@tensorclass
class Data:
    images: torch.Tensor
    targets: torch.Tensor

    @classmethod
    def from_tensors(cls,  device: torch.device):
        len_dataset = 1000
        images, targets = torch.randn((len_dataset, 28, 28)), torch.randn((len_dataset, 1))
        data = cls(
            images=MemmapTensor.from_tensor(images, filename="images.dat"),
            targets=MemmapTensor.from_tensor(targets, filename="targets.dat"),
            batch_size=[len_dataset],
            device=device,
        )

        data.memmap_()

        return data
    
data = Data.from_tensors(device="cuda")

dl = DataLoader(data, batch_size=64, collate_fn=lambda x: x)

for batch in dl:
    print(batch.images.shape)

The output is, should be 64 as first shape (the batch_size)

torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])
torch.Size([1000, 28, 28])

[BUG] functorch.dim breaks split of big tensors

Describe the bug

For some reason (probably unrealted to tensordict), functorch dims fails with big tensors.

import torch
from tensordict import TensorDict

# build splits
total = 0
l = []
while total < 6400:
    l.append(torch.randint(2, 10, (1,)).item())
    total += l[-1]

# build td
tensordict = TensorDict({"x": torch.randn(6401, 1), "y": torch.randn(6401, 1)}, [])

# split
for k, x in tensordict.items():
    x.split(l, 0)

This should result in an error like this:

libc++abi: terminating with uncaught exception of type c10::Error: allocated_ <= ARENA_MAX_SIZE INTERNAL ASSERT FAILED at "/stuff/pytorch/pytorch/pytorch/functorch/csrc/dim/arena.h":227, please report a bug to PyTorch.

[Feature Request] `tensordict.entry_dtype(entry)`, `tensordict.entry_device(entry)`

Motivation

In #168 we introduce tensordict._entry_class. Maybe it would make sense to have the same for dtype and device to avoid doing things like if tensordict.get(key).dtype is torch.float32.
These methods should be made public and documented.
We should find a proper way to do it with lazy tensordicts (especially stack). One option would be to take the first tensordict of the list and query that method on it, implicitely assuming that homogeneous tensordicts have been stacked. This is prone bug if one stacks tensordicts with varying dtypes but that's a behaviour that is not supported anyway (as usual: do we want to apply expensive checks or just tell the users to be cautious because we don't?)

cc @tcbegley

cc @matteobettini fyi since this implies lazy-stacked tds

[Feature Request] Operations between TensorDicts

Motivation

It would be really useful if there would be the possibility to perform operations such as sum, multiplication etc... between two TensorDicts ( key-wise). Is this a feature that you have in mind to introduce in future?

Right now I noticed that is not possible.

Checklist

[ X] I have checked that there is no similar issue in the repo (required)

[Feature Request] `get` and `set` tuple support

Motivation

Like for __getindex__ and __setindex__, we should cover get and set usage with tuples. If the key is not present, set should create an empty TensorDict with same batch size and same device and populate it with the desired value.

Cons:

This may cause some overhead. I would not use a try/except which is expensive, just a regular isinstance.

Cc @tcbegley

[Bug] Indexing i:i+1 not handled correctly

from tensordict import tensorclass
import torch

@tensorclass
class MyData:
    x: torch.Tensor
a = MyData(x=torch.ones((3,10)), batch_size=[3])
b = MyData(x=torch.zeros((10)), batch_size=[])
# this works
a[1] = b
# this raises but should not
a[1:2] = b

[DOC] add this comment to code

          This is to replicate the logic above, if `self` is a `SubTensorDict` we want to process the input, hence we call `set` rather than `_set`.

Originally posted by @tcbegley in #259 (comment)

[Feature Request] Support for nested tensors

Motivation

Support for nested tensors should work as torch.stack(tensordicts, 0)
(we can't currently override torch.nested.as_nested_tensor so we'll need to write a custom op)

We don't need to subclass TensorDict for this but there are a few caveats:

Reshaping won't be allowed
Shape can't be accessed
Indexing can only be done on the first dim

For instance, populating a tensordict with a nested tensor won't work as we can't access the shape.

If nested tensors are found in a tensordict, all tensors should have that feature.
Indexing such a tensordict along the second dimension will require:

splitting the tensors that are nested
indexing those tensors
re-nesting them

[Feature Request] Benchmarks in GH actions

Motivation

We could create a benchmarking workflow in gh action and log these after very merge.

This would probably involve turning the benchmarks into pytest?

The alert is super cool too!

Ideally we'd like to have one workflow on CPU and one on GPU.

Resources:

@tcbegley pinging you for context. I will create a BC task with this, but feel free to add anything you think is relevant.

review and update getitem docstring

Actually reading this makes me think it is not true for torch.Tensor indices. I think now you need to call get_subtensordict explicitely to get a tensor that shares the same data pointer.

Originally posted by @vmoens in #226 (comment)

[Feature Request] tensordict.split

Motivation

It would be nice to have a tensordict.split method that would work as tensor.split.
We would also need to overload torch.split as we do for torch.stack if possible.

Example:

tensordict = TensorDict({}. [10])
tensordict.split(5, 0)  # results in 2 tensordict of batch_size [5] and [5]
tensordict.split([5, 3, 2], 0)  # results in 3 tensordict of batch_size [5], [3] and [2]
torch.split(tensordict, [5, 3, 2], 0)  # same as above

[BUG] 'distribution_kwargs' are not correctly passed to dist in 'ProbabilisticTensorDictModule'

Describe the bug

Hi All,

It seems to me that if I set custom arguments distribution_kwargs for a distribution with ProbabilisticTensorDictModule, these arguments only get assigned to the object (see tensordict/tensordict/nn/probabilistic.py line 215):

self.distribution_kwargs = (
            distribution_kwargs if distribution_kwargs is not None else {}
        )

but later on, when the dist is fetched def get_dist(self, tensordict: TensorDictBase) -> d.Distribution:, they are not used (see tensordict/tensordict/nn/probabilistic.py line 225):

dist_kwargs = {
    dist_key: tensordict[td_key]
    for dist_key, td_key in self.in_keys.items()
}
dist = self.distribution_class(**dist_kwargs)

In my case, this led to the issue that a custom min/max range for the dist didn't get set correctly as can be seen in the following small example.

To Reproduce

import torch
from tensordict.nn import ProbabilisticTensorDictModule
from tensordict import TensorDict
from torchrl.modules.distributions.continuous import TanhNormal

min_dist = torch.Tensor([-10, -10])
prob_module = ProbabilisticTensorDictModule(
    in_keys=["loc", "scale"],
    out_keys=["action"],
    distribution_class=TanhNormal,
    distribution_kwargs={
        "min": min_dist,
        "max": min_dist * -1,
    },
    return_log_prob=True,
)
print(f'prob_module.distribution_kwargs[\'min\']: {prob_module.distribution_kwargs["min"]}')

td = TensorDict(
    {
        'loc': torch.Tensor([0, 0]),
        'scale': torch.Tensor([0.1, 0.1]),

    }, [])

dist = prob_module.get_dist(td)
print(f'dist.min {dist.min}')

assert all(dist.min == min_dist)

Expected behavior

distribution_kwargs should be used when calling ProbabilisticTensorDictModule.get_dist().

System info

tensordict.version == 2023.01.09
numpy.version == 1.23.2
sys.version == 3.8.15 (default, Oct 12 2022, 19:15:16) \n[GCC 11.2.0]
sys.platform == linux
torch.version == 2.0.0.dev20230108+cu116

Reason and Possible fixes

In tensordict/tensordict/nn/probabilistic.py line 229, just adding **self.distribution_kwargs should fix it:

dist = self.distribution_class(**dist_kwargs, **self.distribution_kwargs)

Checklist

I have checked that there is no similar issue in the repo (required)
I have read the documentation (required)
I have provided a minimal working example to reproduce the bug (required)

[Feature Request] Improve first-class dimensions support

Motivation

#5 added support for first-class dimensions. There is however at least one missing feature and some ways we could consider improving this:

Missing Feature

It should be possible to index / order with tuples of first-class dimensions for convenient flattening / reshaping operations. Consider the following example from the torchdim repo.

i, j, k = dims(3)
j.size = 2
A = torch.rand(6, 4)
a = A[(i, j), k] # split dim 0 into i,j
print(i.size, j.size, k.size)
> 3 2 4

r = a.order(i, (j, k)) # flatten j and k
print(r.shape)
> torch.Size([3, 8])

Currently this will lead to an error if attempted on a TensorDict.

Possible improvement

One possible enhancement would be to allow the user to specify first-class dimensions in the batch_size argument when instantiating a TensorDict. E.g.

d1, d2 = dims(2)
td = TensorDict({}, batch_size=[d1, d2, 3, 4])

# roughly equivalent to
td = TensorDict({}, batch_size=[1, 2, 3, 4])[d1, d2]

The difficulty here is likely to be that if the tensordict is empty, we do not know the size of the first class dimensions and they will be unbound until something is added to the tensordict. Assuming that can be worked around it might be a nice convenience?

[Feature Request] allow infering the batch size from a dict when creating a tensordict instance

Motivation

Suppose I have a dictionary as follows:

x = {'a':torch.randn(4,3), 'b': torch.randn(4,5)}

If I want to convert this to a tensordict, I will get an error if I don't specify the batch size:

y = TensorDict(x)  # won't run

In this case, can we support inferring the batch size from the first common dimension?

This is quite useful when integrating tensordict with other libraries. For example, in HuggingFace Accelerate, it converts the network output to the right precision by using ConvertOutputsToFp32. But it won't work with Tensordict currently because Tensordict does not allow creating a tensordict instance without explicitly specifying the batch size. The specific line that errors out is this.

[Feature Request] More informative shape in `LazyStackedTensorDict` with heterogenous tensors

Motivation

Currently, when creating a stack of tensordicts with different shapes, the shape of the heterogenous componenets is reporetd as *.

It would be nice to have a more informative shape printed out, in the following way:

td1 = TensorDict({"a": torch.randn(3, 4, 3, 255, 256)}, [3, 4])
td2 = TensorDict({"a": torch.randn(3, 4, 3, 254, 256)}, [3, 4])
td = torch.stack([td1, td2], 0)

Before:

print(td)
LazyStackedTensorDict(
    fields={
        a: Tensor(*, dtype=torch.float32)},
    batch_size=torch.Size([2, 3, 4]),
    device=None,
    is_shared=False)

proposed change

print(td)
LazyStackedTensorDict(
    fields={
        a: Tensor([2,3,4,*,256], dtype=torch.float32)},
    batch_size=torch.Size([2, 3, 4]),
    device=None,
    is_shared=False)

As mentioned in #135

Fastest way to load TensorDict data?

Hi there! First of all thanks for your work, I really enjoy tensordicts!

I have a question about performance. What is the best way to load TensorDicts?
I ran some test on possible combinations of dataloaders / collate_fn:

Simple benchmarking code

import torch
import torch.nn as nn
from torch.utils.data import Dataset
from torch.utils.data.dataloader import DataLoader
from tensordict import TensorDict

a = torch.rand(1000, 50, 2)
td = TensorDict({"a": a}, batch_size=1000)

Case 1: store data as tensors, create TensorDicts on the run

class SimpleDataset(Dataset):
    def __init__(self, data):
        # We split into a list since it is faster to dataload (fair comparison vs others)
        self.data = [d for d in data]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
    
dataset = SimpleDataset(td['a'])
dataloader = DataLoader(dataset, batch_size=32, collate_fn=torch.stack)
x = TensorDict({'a': next(iter(dataloader))}, batch_size=32)

%timeit for x in dataloader: TensorDict({'a': x}, batch_size=x.shape[0])

520 µs ± 833 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Case 2: store data as TensorDicts and directly load them

class TensorDictDataset(Dataset):
    def __init__(self, data):
        self.data = [d for d in data]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
    
data  = TensorDictDataset(td)
# use collate_fn=torch.stack to avoid StopIteration error
dataloader = DataLoader(data, batch_size=32, collate_fn=torch.stack)
x = next(iter(dataloader))

%timeit for x in dataloader: pass

1.72 ms ± 5.57 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Case 3: store TensorDict data as dictionaries and create TensorDicts on the run with collate_fn

class CustomTensorDictDataset(Dataset):
    def __init__(self, data):
        self.data = [
            {key: value[i] for key, value in data.items()}
            for i in range(data.shape[0])
        ]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


class CustomTensorDictCollate(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, batch):
        return TensorDict(
            {key: torch.stack([b[key] for b in batch]) for key in batch[0].keys()},
            batch_size=len(batch),
        )
    
data = CustomTensorDictDataset(td)
dataloader = DataLoader(data, batch_size=32, collate_fn=CustomTensorDictCollate())
x = next(iter(dataloader))

%timeit for x in dataloader: pass

567 µs ± 924 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Apparently, splitting data into dictionaries and creating TensorDicts on the run is the fastest way to load data... but why is it not faster to just index TensorDicts instead? And is there a better way?

[Feature Request] Support for TensorDictBase.masked_select inplace

Motivation

Giving the ability of .masked_select() inplace for the TensorDictBase.

Solution

Giving the ability of .masked_select() like but modification-inplace for the TensorDictBase, by a method named .masked_select_().

Main steps to achieve this:

Iterate key-values and collect masked tensors for values with type leaf tensor
Iterate key-values with type of nested TensorDict, and call recursively .masked_select_()
Modify the batch_size to the correct

Examples:

td = TensorDict(source={'a': torch.zeros(3, 4)},
    batch_size=[3])
mask = torch.tensor([True, False, False])
td.masked_select_(mask)
td.get("a")
#output: tensor([[0., 0., 0., 0.]])

[Feature Request] Create an explicit error when calling `tensordict.dtype(smth)`

Motivation

Since dtype is not a tensordict attribute by design, we should raise an error when the corresponding method is being called.

[BUG] Performance drops after long time

Hello,

thank you for creating TensorDict, it helps me a lot.

I use TensorDict as rolling replay buffer in a Reinforcement Learning application. I need lots of storage since my data involves images. The memmap feature is very handy in this case. Recently, I started seeing sharp performance drops during training after running for quite some time. This usually indicates that there are some tensors leaking and are not being collected.

After days I was able to track this issue down to my replay buffer implementation, which you can find below. The problem occurs when running many iterations over time. The performance drop seems to be caused more by buffer.get() rather than buffer.put(), even though buffer.put() also causes rather small performance drops over time. This happens for both small and large dicts.

There does not seem to be any issue when writing static data once (on iteration 0) and reading afterwards. Results seem to be cached and reading is super fast with no drop (at least I do not observe any).

import os
import shutil
import sys
import uuid

import numpy as np
import psutil
import torch
from tensordict import TensorDict, MemmapTensor
from tqdm import tqdm


class TensorDictReplayBuffer:
    def __init__(self, max_size, prefix):
        super().__init__()
        self.max_size = max_size
        self.current_index = 0
        self.current_size = 0
        self.prefix = prefix

        self.cache = None
        self.separator = "."

    @property
    def size(self):
        return self.current_size

    def setup_cache(self, data: TensorDict):
        os.makedirs(self.prefix, exist_ok=True)
        template = {
            k: MemmapTensor(
                self.max_size, *v.shape,
                dtype=v.dtype, filename=os.path.join(self.prefix, k)
            )
            for k, v in data[0].to_tensordict().items()
        }
        self.cache = TensorDict(template, batch_size=[self.max_size])

    def put(self, data: TensorDict) -> None:
        data = data.flatten_keys(self.separator).clear_device_()
        if self.cache is None:
            self.setup_cache(data)

        n = len(data)
        indices = torch.arange(self.current_index, self.current_index + n) % self.max_size
        self.cache[indices] = data
        self.current_size = min(self.max_size, self.current_size + n)

    def get(self, batch_size: int) -> TensorDict:
        indices = np.random.choice(self.current_size, batch_size, replace=False)
        return self.cache[indices].unflatten_keys(self.separator)


def data():
    n = 10
    return TensorDict(
        dict(
            dummy=torch.ones(n, n)
        ),
        batch_size=[]
    )

shutil.rmtree("data", ignore_errors=True)
os.makedirs("data", exist_ok=True)
buffer = TensorDictReplayBuffer(max_size=1_000_000, prefix=f"data/{uuid.uuid4()}")
n_reads = 10000
n_per_iteration = 5
batch_size = 128

for i in range(1000):
    with tqdm(total=n_reads, file=sys.stdout, desc=f"Iteration {i}") as pbar:
        for _ in range(n_reads // n_per_iteration):

            trajectory = torch.stack([data() for _ in range(n_per_iteration)])

            # if i == 0:
            buffer.put(trajectory)

            for _ in range(n_per_iteration):
                if buffer.size >= batch_size:
                    batch = buffer.get(batch_size=batch_size)

                pbar.update(1)

        pbar.set_postfix(dict(
            gpu_mem_mb=torch.cuda.memory_allocated() / 1e6,
            cpu_mem_mb=psutil.Process().memory_info().rss / 1e6
        ))

Is this something that can be explained given my implementation? Is this even an intended use-case for mem-mapped dicts? Or is there any other way to accomplish this?

Any pointers are appreciated. Thanks a lot.

[Feature Request] Make the last sync with torchrl

Motivation

TorchRL's version of tensordict and modules has changed a bit since we started this work.

From now on, TensorDict will only evolve in this repo. However we still need to do a last manual sync with the library.

[BUG] Global "indexing" of tensorclasses types leads to type conflicts

debugging test failures from #293 we found out that the solution based on CLASSES_DICT
leads to weird behaviour when classes with the same name are being defined (even if locally)

getting the type directly from the instatiated object instead of using the annotation might solve the problem

[Feature Request] data class like behaviour

Motivation

Some users have pointed out that data classes offer advantages over dict-like classes:

predictability: one knows in advance what fields to expect, the code is transparent although less generic
autocomplete: IDEs can keep track of the attributes through type hinting
dedicated behaviour: a class can have a set of dedicated methods that act on its attributes.

Proposed API

We could have a mixed approach where a data class would build a tensordict and interface with it. Attributes would point to keys. The methods would be shared across classes.

@tensordict_data  # better name?
class MyData:
   float_tensor
   sparse_tensor

   def stuff(self):
       return self.float_tensor + self.sparse_tensor 

data = MyData(float_tensor=a, sparse_tensor=b, batch_size=[3, 4])
data.reshape(-1) # returns a new MyData with shape/ batch_size of [12]
data.batch_size # picks the batch size of the tensordict
data.missing_key # returns a dedicated error since this key is not expected

Challenges

Any such dataclass methods (such as reshape, split etc) should return a new instance of the same class.
Upon creation (at loading time) one should check that the data class attributes do not collide with the tensordict ones.
Nesting with data classes will be slightly harder than with tensordict (see this thread)

[BUG] repr of SubTensorDict with nested items seems wrong

Describe the bug

When a SubTensorDict contains nested tensordicts, the representation of those tensordicts doesn't take into account the slicing done by SubTensorDict so the shape information is a bit confusing.

To Reproduce

import torch
from tensordict import TensorDict

td = TensorDict({"a": TensorDict({"b": torch.rand(1, 2)}, [1, 2])}, [1])
std = td.get_sub_tensordict(0)

Now compare

print(std)
# SubTensorDict(
#     fields={
#         a: TensorDict(
#             fields={
#                 b: Tensor(torch.Size([1, 2, 1]), dtype=torch.float32)},
#             batch_size=torch.Size([1, 2]),
#             device=None,
#             is_shared=False)},
#     batch_size=torch.Size([]),
#     device=None,
#     is_shared=False)

whereas

print(std["a"])
# TensorDict(
#     fields={
#         b: Tensor(torch.Size([2, 1]), dtype=torch.float32)},
#     batch_size=torch.Size([2]),
#     device=None,
#     is_shared=False)

which is different to what is listed under the fields of std above.

[Feature Request] Refactor _iter_helper to use metatensors instead of real tensors

Motivation

The current implementation of _iter_helper gathers all the values to assess they type and act accordingly given the recursivity requirements.
This is suboptimal as for lazy tensordict classes it involves executing the transformation to get the type of the key.
We should use meta-tensors instead.

[BUG] Auto-nested tensordict bugs

Describe the bug

Auto-nesting may be a desirable feature (e.g. to build graphs), but currently it is broken for multiple functions, e.g.

tensordict = TensorDict({}, [])
tensordict["self"] = tensordict
print(tensordict)  # fails
tensordict.flatten_keys()  # fails
list(tensordict.keys(include_nested=True))  # fails

Consideration

This is something that should be included in the tests. We could design a special test case in TestTensorDictsBase with a nested self.

## Solution

IMO there is not a single solution to this problem. For repr, could find a way of representing a nested tensordict, something like

TensorDict(fields={
   "self": ...
}, 
batch_size=[])

For keys, we could avoid returning a key if it a key pointing to the same value has already been returned (same for values and items).
For flatten_keys, it should be prohibited for TensorDict. The options are (1) leave it as it is since the maximum recursion already takes care of it or (2) build a wrapper around flatten_keys() to detect if the same method (i.e. the same call to the same method from the same class) is occurring twice, something like

def detect_self_nesting(fun):
    def new_fun(self, *args, **kwargs):
        if fun in self._being_called:
             raise RuntimeError
        self._being_called.append(fun)
        out = fun(self, *args, **kwargs)
        self._being_called.pop()
        return out
    return new_fun

There are probably other properties that I'm missing, but i'd expect them to be covered by the tests if we can design the dedicated test pipeline mentioned earlier.

[BUG] Partially instantiated sub-tensordict silently collide with flatten ones when unflattened

Describe the bug

When a tensordict is created with mixed nested and flattened keys, a call to unflatten_keys silently squashes them against each other.

To Reproduce

from tensordict import TensorDict
tensordict = TensorDict({"a": [1, 2], "c.a": [1, 2], "c": TensorDict({"b": [1, 2]}, [])}, [])
print(tensordict.unflatten_keys("."))

This produces the following output

TensorDict(
    fields={
        a: Tensor(torch.Size([2]), dtype=torch.int64),
        c: TensorDict(
            fields={
                a: Tensor(torch.Size([2]), dtype=torch.int64)},
            batch_size=torch.Size([]),
            device=None,
            is_shared=False)},
    batch_size=torch.Size([]),
    device=None,
    is_shared=False)

where the "c.b" key has disappeared.

Expected behavior

Both "c.b" and "c.a" should be there.

[Feature Request] tensorclass with non-tensor data

Motivation

@tensorclass is currently restricted to input types: Tensor, MemmapTensor, KeyedJaggedTensor, TensorDictBase or other tensorclass
Some users have expressed interest for a more generic type of content.

Solution

We could store any other data but restrict the tensor-behaviours to the tensor data.

API decisions

returning a new instance

Any operation that return a tensorclass would just copy the non-tensor data:

>>> @tensorclass
... class MyClass:
...     images: torch.Tensor
...     path: str
... 
>>> data = MyClass(torch.randn(2, 3, 64, 64), "path_to_data", batch_size=[2])
>>> print(data[0])
MyData(images=torch.Tensor(...), path="path_to_data", batch_size=[])

The same would do for .contiguous(), .clone(), .unbind(), etc
For stack or cat we would need to check that these attributes match. If not, an error is thrown.

### Indexing

In general, I think every non-tensor field should be treated in the same way, ie a list would not be indexed

>>> @tensorclass
... class MyClass:
...     images: torch.Tensor
...     captions: List[str]
... 
>>> data = MyClass(torch.randn(2, 3, 64, 64), [“a”, “b”], batch_size=[2])
>>> print(data[0])
MyData(images=torch.Tensor(...), captions=["a", "b"], batch_size=[])

If the number of captions is limited this could work

@tensorclass
class MyClass:
    images: torch.Tensor
    _captions: torch.Tensor
    caption_classes: List[str]

    @property 
    def captions(self):
        return [self.caption_classes[idx] for idx in self._captions]

data = MyClass(images=torch.randn(5, 3, 64, 64), _captions=torch.randint(2, (5,)), caption_classes=["a", "b"], batch_size=[5])

Otherwise, if the number of captions is much larger, it would be up to the user to implement indexing

@tensorclass
class MyClass:
    images: torch.Tensor
    captions: List[str]

    def __getitem__(self, item):
        c = super().__getitem__(item)
        c.captions = self.captions[item]
        return c

setitem

With tensorclass, we can do

>>> @tensorclass
... class MyClass:
...     images: torch.Tensor
>>> data = MyClass(torch.randn(3, 4, 5), batch_size=[3, 4])
>>> data[1] = MyClass(torch.randn(4, 5), batch_size=[4])

With a non-tensor field, I would think about solving it that way

>>> @tensorclass
... class MyClass:
...     images: torch.Tensor
...     meta_data: Optional[str] = None
>>> data = MyClass(torch.randn(3, 4, 5), "stuff", batch_size=[3, 4])
>>> data[1] = MyClass(torch.randn(4, 5), "stuff", batch_size=[4])  # match - no error
>>> data[1] = MyClass(torch.randn(4, 5), batch_size=[4])  # ignored - no error
>>> data[1] = MyClass(torch.randn(4, 5), "I am a pig, and as a pig, I have always stood out.", batch_size=[4])  # don't match - raise exception
>>> MyClass(torch.randn(3, 4, 5), batch_size=[3, 4])
>>> data[1] = MyClass(torch.randn(4, 5), "I am a pig, and as a pig, I have always stood out.", batch_size=[4])  # don't match - raise exception

Construction

One important question is also whether we want to allow any field to be a non-tensor, e.g.

>>> @tensorclass
... class MyClass:
...     images: torch.Tensor  # do we want to raise an exception if a string is given?
...     meta_data: Any  # will tensors be considered as tensors?
...     meta_data: Optional[str] = None  # do we want to raise an exception if a tensor is given?

cc @tcbegley @keyboardAnt

[Feature Request] Saving and loading memmap tensordicts / tensorclasses

Motivation

We can easily save tensordicts using torchsnapshot. However, as MemmapTensors store a file on disk, it would be fairly easy to save a tensordict that contains only MemmapTensors (is_memmap() returns True).
Ideally, the saved tensordict structure would follow the one of the original tensordict (+ metadata)

tensordict = TensorDict({"a": torch.randn(3, 4), {"b": torch.randn(3, 4)}}, [3, 4])
tensordict.memmap_()
save_memmap_tensordict(tensordict, "/path/to/save")

would result in

/path/to/save/metadata.pt
/path/to/save/a.memmap
/path/to/save/b/metadata.pt
/path/to/save/b/c.memmap

(we'd need metadata for each subtensordict since they may have a different device / batch_size).

Loading from such file would also be easy (and would not create a copy):

tensordict_loaded = load_from_memmap("/path/to/save/", mode="r+")
assert tensordict_loaded.is_memmap()
assert (tensordict_loaded == tensordict).all()

This should work for tensordict and tensorclass, but a little extra work may be needed for the latter.

TensorDict
tensorclass

@sreevasthav

[Feature Request] More arguments for `td.memmap_()`

Motivation

Since TensorDict.memmap_() creates a copy on physical mem of the content of a tensordict, it would be nice to have some more options:

MemmapTensors are temporary files. It could be nice to disable this when calling memmap_() (like temporary=False). In that way, one would know that the content of the memmap will be kept in memory after the process exits.
We should also be able to pass the path to where the files have to be saved (`path="/path/to/my_tensordict"). This would work nicely with #176
Similarly, it could be neat to be able to control if a memmap_ tensor can be written or not (by passing mode="r+" for instance). Of course, memmap_() would always write the content but then modify the permissions accordingly to make sure that the content is somewhat locked for in-place modifs.

[BUG] Unexpected shape change during pre-allocation

Describe the bug

The use case is to implement a lazily initialized rollout buffer for RL. However, when setting a nested TensorDict with finer-grained batch_size at index 0, the allocated buffer tensors get truncated batch_size while also changing the shape of the original TensorDict.

To Reproduce

import torch
from tensordict import TensorDict

buffer = TensorDict({}, batch_size=[500, 1024])

td_0 = TensorDict({
    "env.time": torch.rand(1024, 1),
    "agent.obs": TensorDict({ # assuming 3 agents in a multi-agent setting
        "image": torch.rand(1024, 3, 64),
        "state": torch.rand(1024, 3, 3, 32, 32)
    }, batch_size=[1024, 3])
}, batch_size=[1024])

td_1 = td_0.clone()
buffer[0] = td_0
buffer[1] = td_1

This would trigger

RuntimeError: indexed destination TensorDict batch size is torch.Size([1024]) (batch_size = torch.Size([500, 1024]), index=1), which differs from the source batch size torch.Size([1024, 3])

at buffer[1] = td_1. And the batch sizes of both buffer["b"] and td_0["b"] become [1024].

Expected behavior

The batch sizes of both buffer["b"] and td_0["b"] being [1024, 3], and buffer[1] = td_1 being successful.

System info

tensordict: 0.0.1c

Reason and Possible fixes

The cause looks to be around https://github.com/pytorch-labs/tensordict/blob/main/tensordict/tensordict.py#L1973 and https://github.com/pytorch-labs/tensordict/blob/main/tensordict/tensordict.py#L3393 where the target shape is unexpectedly considered to be incongruent:

# L1973, TensorDictBase.__setitem__
def __setitem__(
    self, index: INDEX_TYPING, value: Union[TensorDictBase, dict]
) -> None:
    ...
    if value.batch_size != indexed_bs:
        raise RuntimeError(
            f"indexed destination TensorDict batch size is {indexed_bs} "
            f"(batch_size = {self.batch_size}, index={index}), "
            f"which differs from the source batch size {value.batch_size}"
        )
    ...

#L3393, SubTensorDict.set
...
def set(
    self,
    key: NESTED_KEY,
    tensor: Union[dict, COMPATIBLE_TYPES],
    inplace: bool = False,
    _run_checks: bool = True,
) -> TensorDictBase:
    ...
    if isinstance(tensor, TensorDictBase) and tensor.batch_size != self.batch_size:
        tensor.batch_size = self.batch_size
    ...

A possible fix would be changing to if value.batch_size[: len(indexed_bs)] != indexed_bs and if isinstance(tensor, TensorDictBase) and tensor.batch_size[: len(self.batch_size)] != self.batch_size, respectively. All existing tests still pass after the modification.

It's really a minor problem and later I found torchrl.data.TensorDictReplayBuffer with LazyTensorStorage to be a working alternative. But it's also interesting to see they are implemented differently.

I'm glad to fix it if you would like.

Checklist

I have checked that there is no similar issue in the repo (required)
I have read the documentation (required)
I have provided a minimal working example to reproduce the bug (required)

[Feature Request] key-level granularity in `skip_existing`

Currently skip_existing operates on all keys without any granularity.

This is a problem in RL when in a loss module for example you may want to skip existing "values" but you definitely never want to skip existing "memory" in a memory based model (RNN). AKA if you use skip_existing on memory keys you will never update your memory.

This is needed to support rnns in torch rl (issue pytorch/rl#1060)

We need a solution to make skip_existing more granular.

This is really simple and consists in feeding to the set_skip_existing funtion the keys we actually want to skip.

with set_skip_existing(["value", "value_target]):
     loss(td) # Will use existing values but not existing hidden memory

by default, if no keys are passed, the beahvior remains the same as the current set_skip_existing=True

[Feature Request] Array of dicts of tensors structure

Storing them in "columnar format" can be more compact in some circumstances (e.g. they are copy-on-write safe in multiprocessing context as they are stored as a very small number of tensors not depending on "dataset" size): https://gist.github.com/vadimkantorov/86c3a46bf25bed3ad45d043ae86fff57

[Feature Request] Better benchmarking

Motivation

Some benchmarks use timeit.timeit when repeat would give a better estimate of the time spec.
We should make that more consistent.

[Feature Request] clone instead of to_tensordict

Motivation

The behaviour of clone is rather poorly defined. We should use to_tensordict instead.
Some clone methods rely on copy, which is messed up for tensordict classes as it does keeps pointers to previous metadata.
Having a clone that only returns pure TensorDict would be cleaner.

[BUG] Error in `repr` of heterogeneous tensordicts

Describe the bug

An error is thrown in __repr__ of a tensordict when rolling out an env that returns heterogeneous lazy stacks.

To Reproduce

   from torchrl.envs.libs.vmas import VmasEnv
   print(VmasEnv("simple_crypto", num_envs=32).rollout(10, return_contiguous=False))

Traceback (most recent call last):
  File "/Users/Matteo/PycharmProjects/VectorizedMultiAgentSimulator/vmas/examples/torch_rl.py", line 59, in <module>
    print(env.rollout(10, return_contiguous=False))
  File "/Users/Matteo/PycharmProjects/tensordict/tensordict/tensordict.py", line 1725, in __repr__
    fields = _td_fields(self)
  File "/Users/Matteo/PycharmProjects/tensordict/tensordict/tensordict.py", line 5699, in _td_fields
    sorted([_make_repr(key, item, td) for key, item in td.items_meta()])
  File "/Users/Matteo/PycharmProjects/tensordict/tensordict/tensordict.py", line 5699, in <listcomp>
    sorted([_make_repr(key, item, td) for key, item in td.items_meta()])
  File "/Users/Matteo/PycharmProjects/tensordict/tensordict/tensordict.py", line 5691, in _make_repr
    return f"{key}: {repr(tensordict.get(key))}"
  File "/Users/Matteo/PycharmProjects/tensordict/tensordict/tensordict.py", line 1725, in __repr__
    fields = _td_fields(self)
  File "/Users/Matteo/PycharmProjects/tensordict/tensordict/tensordict.py", line 5699, in _td_fields
    sorted([_make_repr(key, item, td) for key, item in td.items_meta()])
  File "/Users/Matteo/PycharmProjects/tensordict/tensordict/tensordict.py", line 5699, in <listcomp>
    sorted([_make_repr(key, item, td) for key, item in td.items_meta()])
  File "/Users/Matteo/PycharmProjects/tensordict/tensordict/tensordict.py", line 859, in items_meta
    yield k, self._get_meta(k)
  File "/Users/Matteo/PycharmProjects/tensordict/tensordict/tensordict.py", line 570, in _get_meta
    return self._dict_meta[key]
  File "/Users/Matteo/PycharmProjects/tensordict/tensordict/utils.py", line 267, in __missing__
    value = self.fun(key)
  File "/Users/Matteo/PycharmProjects/tensordict/tensordict/tensordict.py", line 4261, in _make_meta
    return torch.stack(
  File "/Users/Matteo/PycharmProjects/tensordict/tensordict/metatensor.py", line 327, in __torch_function__
    return META_HANDLED_FUNCTIONS[func](*args, **kwargs)
  File "/Users/Matteo/PycharmProjects/tensordict/tensordict/metatensor.py", line 496, in stack_meta
    return _stack_meta(
  File "/Users/Matteo/PycharmProjects/tensordict/tensordict/metatensor.py", line 464, in _stack_meta
    shape = list(shape)
TypeError: 'NoneType' object is not iterable

[BUG] Indexing discrepancies with numpy

Describe the bug

TensorDict indexing behavior differs from numpy. In most cases, the numpy behavior seems to make more sense.

Steps to reproduce and expected behavior

import numpy as np
from tensordict import TensorDict

nd = np.random.random((3,4))
nd_5d = np.random.random((5, 3, 4, 6, 7))
td = TensorDict({}, batch_size=(3, 4))
td_5d = TensorDict({}, batch_size=(5, 3, 4, 6, 7))

# None following ellipsis are used as the first dimension instead of the last
nd[1, ..., None].shape  # (4, 1)
td[1, ..., None].shape  # torch.Size([1, 4])

# Same issue
nd[None, 1, ..., None].shape  # (1, 4, 1)
td[None, 1, ..., None].shape  # RuntimeError: Not enough dimensions in TensorDict for index provided.

# Multiple None following ellipsis raise RuntimeError
nd[..., :2, None, None].shape  # (3, 2, 1, 1)
td[..., :2, None, None].shape  # RuntimeError: Not enough dimensions in TensorDict for index provided.

# Tuples within tuples are handled as lists by numpy, while we raise a NotImplementedError. 
# Already documented in https://github.com/pytorch-labs/tensordict/issues/99.
nd[(0, 1), (0, 1)].shape  # (2,)
td[(0, 1), (0, 1)].shape  # NotImplementedError: batch dim cannot be computed for type <class 'tuple'>. 

# Very specific edge case. 
nd_5d[2:, [[[0, 1]]], :3, 0].shape  # (1, 1, 2, 3, 3, 7)
td_5d[2:, [[[0, 1]]], :3, 0].shape  # torch.Size([3, 1, 1, 2, 3, 7])
# However, we get the same output when using a list index rather than an integer
nd_5d[2:, [[[0, 1]]], :3, [0]].shape  # (1, 1, 2, 3, 3, 7)
td_5d[2:, [[[0, 1]]], :3, [0]].shape  # torch.Size([1, 1, 2, 3, 3, 7])

System info

Describe the characteristic of your environment:

Describe how the library was installed (pip, source, ...)
Python version
Versions of any other relevant libraries

import tensordict, numpy, sys, torch
print(tensordict.__version__, numpy.__version__, sys.version, sys.platform, torch.__version__)

0.1.0+c03956a 1.24.2 3.10.8 (main, Nov 24 2022, 08:08:27) [Clang 14.0.6 ] darwin 1.13.1

Checklist

I have checked that there is no similar issue in the repo (required)
I have read the documentation (required)
I have provided a minimal working example to reproduce the bug (required)

[Feature Request] Implement `tensordict.pop(key, default)`

Motivation

dict.pop() is a handy feature that is missing from tensordict.

[Feature Request] Allow contributors to self-assign issues with GitHub workflow

Commenting #self-assign in any issue will let repository members to self-assign that issue to themselves.

👉 This reduces chores for repo-maintainer(s).

[BUG] Improvements on MemmapTensor

Describe the bug

We need to do some woodwork on MemmapTensor:

Remove the ownership and replace it by a check over the number of times the array is being opened (for temporary files only)
Solve a bug where calling MemmapTensor(memmap_tensor[idx]) returns the wrong view of the MemmapTensor #193
Improve MemmapTensor creation args, including: creating a Memmap on a specific location (#189)
Improve the way we copy a Memmap from one place to another, if we want and if we don't want to change the filename
simplify the logic of MemmapTensor so that extracting a tensor is less confusing (i.e. check as_tensor, _tensor, _load_item and how they interact).
the device attribute is confusing, as the data is never really on another device but gets loaded there if needed. The usage is that one can put the device attr as a destination, then move the Memmap from process to process and when accessing the data it'll be on the right device. However, this may give the false illusion that the content of the tensor is stored on the device...

cc @tcbegley

[Feature Request] Add a `conda` install option for `tensordict` on conda-forge channel

A conda installation option could be very helpful. I have already started working on this, to add tensordict to conda-forge.

Conda-forge PR:

conda-forge/staged-recipes#22307

Once the conda-forge PR is merged, you will be able to install the library with conda as follows:

conda install -c conda-forge tensordict

💡 I will push a PR to update the docs once the package is available on conda-forge.

[BUG] Conflicting keys silently collide in `tensordict.unflatten_keys`

Describe the bug

If two keys have the same resulting index in flatten_keys and unflatten_keys, they will be overwritten without an error being raised. This may lead to unexpected behaviours down the line.

To Reproduce

We would expect this code to fail:

from tensordict import TensorDict
import torch
t = TensorDict({'a.a': torch.randn(3), 'a': {'a': torch.randn(3)}}, [])
print(t.unflatten_keys('.'))

as ("a", "a") already exists.
Similarly,

from tensordict import TensorDict
import torch
t = TensorDict({'a.a': torch.randn(3), 'a': {'a': torch.randn(3)}}, [])
print(t.flatten_keys('.'))

should also fail as "a.a" is also an existing key.

[Feature Request] TensorDict to support data classes

Motivation

Keeping track of what keys go into the tensordict can be challenging across a large codebase. Especially given the tensordicts interior key mutability where different modules could return tensordicts with different key naming conventions.

Solution

Statically typed tensordict for a dataclass where every member of the data class must either be a tensordict or a PyTorch tensor.
Being able to create tensordicts with lists of named tuples/dataclasses.
Being able to index into the tensordict and get back the data class you created the tensordict with.

Alternatives

I haven't thought too much about alternatives.

Additional context

This is heavily inspired by Jax's data class style approach for functional programming. It seems to me that at a high level, tensordicts are trying to achieve a similar thing to Jax's usage of pytrees except specifically with tensors.

Checklist

[ x] I have checked that there is no similar issue in the repo (required)

[BUG] Incorrect output batch_size with functorch.vmap

Describe the bug

When used with functorch.vmap, TensorDictModule does not give the correct batch_size when in_dims and out_dims has non-zero items.

To Reproduce

A minimal example to reproduce:

import torch
import torch.nn as nn
import functorch
from tensordict import TensorDict
from tensordict.nn import TensorDictModule, make_functional

a = TensorDict({
    "a": torch.rand(1024, 3, 64),
    "b": torch.rand(1024, 3, 32),
}, batch_size=[1024, 3])

class Model(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.a = nn.Linear(64, 128)
        self.b = nn.Linear(32, 64)
    
    def forward(self, a, b):
        return self.a(a), self.b(b)

model = TensorDictModule(Model(), in_keys=["a", "b"], out_keys=["out.a", "out.b"])

# option 1
fmodel, params = functorch.make_functional(model)
functorch.vmap(fmodel, in_dims=(None, 1), out_dims=1)(params, a)

# option 2
params = make_functional(model)
functorch.vmap(model, in_dims=(1, None), out_dims=1)(a, params)

# option 3
functorch.vmap(model, 1, 1)(a)

Expected behavior

The expected output batch_size is [1024, 3]. But all three options give a batch_size of [3, 1024] (however, the entries' shapes are correct):

TensorDict(
    fields={
        a: Tensor(torch.Size([1024, 3, 64]), dtype=torch.float32),
        b: Tensor(torch.Size([1024, 3, 32]), dtype=torch.float32),
        out.a: Tensor(torch.Size([1024, 3, 128]), dtype=torch.float32),
        out.b: Tensor(torch.Size([1024, 3, 64]), dtype=torch.float32)},
    batch_size=torch.Size([3, 1024]),
    device=None,
    is_shared=False)

System info

tensordict: 0.0.1c
torch: 1.13.0

Checklist

I have checked that there is no similar issue in the repo (required)
I have read the documentation (required)
I have provided a minimal working example to reproduce the bug (required)

ImportError: cannot import name 'TensorDict' from partially initialized module 'tensordict'

Describe the bug

You have a circular import somewhere

To Reproduce

from tensordict import TensorDict
import torch

 from tensordict import TensorDict
ImportError: cannot import name 'TensorDict' from partially initialized module 'tensordict' (most likely due to a circular import) (/home/ubuntu/francesco/playground-with-torchdata/tensordict.py)

Describe the characteristic of your environment:

I am using pip install tensordict-nightly

Collecting environment information...
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.8.10 (default, Nov 14 2022, 12:59:47)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-1026-aws-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 10.1.243
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla V100-SXM2-16GB
Nvidia driver version: 470.161.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.24.1
[pip3] torch==1.13.1
[pip3] torchdata==0.5.1
[pip3] torchvision==0.14.1
[conda] Could not collect

pytorch / tensordict Goto Github PK

tensordict's Introduction

More About PyTorch

A GPU-Ready Tensor Library

Dynamic Neural Networks: Tape-Based Autograd

Python First

Imperative Experiences

Fast and Lean

Extensions Without Pain

Installation

Binaries

NVIDIA Jetson Platforms

From Source

Prerequisites

NVIDIA CUDA Support

AMD ROCm Support

Intel GPU Support

Install Dependencies

Get the PyTorch Source

Install PyTorch

Adjust Build Options (Optional)

Docker Image

Using pre-built images

Building the image yourself

Building the Documentation

Previous Versions

Getting Started

Resources

Communication

Releases and Contributing

The Team

License

tensordict's People

Contributors

Stargazers

Watchers

Forkers

tensordict's Issues

Motivation

Solution

Motivation

Motivation

Describe the bug

To Reproduce

Expected behavior

System info

Checklist

Motivation

Describe the bug

To Reproduce

Describe the bug

Motivation

Motivation

Checklist

Motivation

Cons:

Motivation

Motivation

Motivation

Describe the bug

To Reproduce

Expected behavior

System info

Reason and Possible fixes

Checklist

Motivation

Missing Feature

Possible improvement

Motivation

Motivation

Simple benchmarking code

Case 1: store data as tensors, create TensorDicts on the run

Case 2: store data as TensorDicts and directly load them

Case 3: store TensorDict data as dictionaries and create TensorDicts on the run with collate_fn

Motivation

Solution

Motivation

Motivation

Motivation

Proposed API