laurentmazare / tch-rs Goto Github PK

Rust bindings for the C++ api of PyTorch.

License: Apache License 2.0

Rust 63.78% C++ 18.78% C 16.56% OCaml 0.76% CMake 0.01% Python 0.11% Makefile 0.01%

deep-learning machine-learning neural-network pytorch rust

tch-rs's Introduction

tch-rs

Rust bindings for the C++ api of PyTorch. The goal of the tch crate is to provide some thin wrappers around the C++ PyTorch api (a.k.a. libtorch). It aims at staying as close as possible to the original C++ api. More idiomatic rust bindings could then be developed on top of this. The documentation can be found on docs.rs.

changelog

The code generation part for the C api on top of libtorch comes from ocaml-torch.

Getting Started

This crate requires the C++ PyTorch library (libtorch) in version v2.3.0 to be available on your system. You can either:

Use the system-wide libtorch installation (default).
Install libtorch manually and let the build script know about it via the LIBTORCH environment variable.
Use a Python PyTorch install, to do this set LIBTORCH_USE_PYTORCH=1.
When a system-wide libtorch can't be found and LIBTORCH is not set, the build script can download a pre-built binary version of libtorch by using the download-libtorch feature. By default a CPU version is used. The TORCH_CUDA_VERSION environment variable can be set to cu117 in order to get a pre-built binary using CUDA 11.7.

System-wide Libtorch

On linux platforms, the build script will look for a system-wide libtorch library in /usr/lib/libtorch.so.

Python PyTorch Install

If the LIBTORCH_USE_PYTORCH environment variable is set, the active python interpreter is called to retrieve information about the torch python package. This version is then linked against.

Libtorch Manual Install

Get libtorch from the PyTorch website download section and extract the content of the zip file.
For Linux and macOS users, add the following to your .bashrc or equivalent, where /path/to/libtorch is the path to the directory that was created when unzipping the file.

export LIBTORCH=/path/to/libtorch

The header files location can also be specified separately from the shared library via the following:

# LIBTORCH_INCLUDE must contain `include` directory.
export LIBTORCH_INCLUDE=/path/to/libtorch/
# LIBTORCH_LIB must contain `lib` directory.
export LIBTORCH_LIB=/path/to/libtorch/

For Windows users, assuming that X:\path\to\libtorch is the unzipped libtorch directory.
- Navigate to Control Panel -> View advanced system settings -> Environment variables.
- Create the LIBTORCH variable and set it to X:\path\to\libtorch.
- Append X:\path\to\libtorch\lib to the Path variable.
If you prefer to temporarily set environment variables, in PowerShell you can run

$Env:LIBTORCH = "X:\path\to\libtorch"
$Env:Path += ";X:\path\to\libtorch\lib"

You should now be able to run some examples, e.g. cargo run --example basics.

Windows Specific Notes

As per the pytorch docs the Windows debug and release builds are not ABI-compatible. This could lead to some segfaults if the incorrect version of libtorch is used.

It is recommended to use the MSVC Rust toolchain (e.g. by installing stable-x86_64-pc-windows-msvc via rustup) rather than a MinGW based one as PyTorch has compatibilities issues with MinGW.

Static Linking

When setting environment variable LIBTORCH_STATIC=1, libtorch is statically linked rather than using the dynamic libraries. The pre-compiled artifacts don't seem to include libtorch.a by default so this would have to be compiled manually, e.g. via the following:

git clone -b v2.3.0 --recurse-submodule https://github.com/pytorch/pytorch.git pytorch-static --depth 1
cd pytorch-static
USE_CUDA=OFF BUILD_SHARED_LIBS=OFF python setup.py build
# export LIBTORCH to point at the build directory in pytorch-static.

Examples

Basic Tensor Operations

This crate provides a tensor type which wraps PyTorch tensors. Here is a minimal example of how to perform some tensor operations.

use tch::Tensor;

fn main() {
    let t = Tensor::from_slice(&[3, 1, 4, 1, 5]);
    let t = t * 2;
    t.print();
}

Training a Model via Gradient Descent

PyTorch provides automatic differentiation for most tensor operations it supports. This is commonly used to train models using gradient descent. The optimization is performed over variables which are created via a nn::VarStore by defining their shapes and initializations.

In the example below my_module uses two variables x1 and x2 which initial values are 0. The forward pass applied to tensor xs returns xs * x1 + exp(xs) * x2.

Once the model has been generated, a nn::Sgd optimizer is created. Then on each step of the training loop:

The forward pass is applied to a mini-batch of data.
A loss is computed as the mean square error between the model output and the mini-batch ground truth.
Finally an optimization step is performed: gradients are computed and variables from the VarStore are modified accordingly.

use tch::nn::{Module, OptimizerConfig};
use tch::{kind, nn, Device, Tensor};

fn my_module(p: nn::Path, dim: i64) -> impl nn::Module {
    let x1 = p.zeros("x1", &[dim]);
    let x2 = p.zeros("x2", &[dim]);
    nn::func(move |xs| xs * &x1 + xs.exp() * &x2)
}

fn gradient_descent() {
    let vs = nn::VarStore::new(Device::Cpu);
    let my_module = my_module(vs.root(), 7);
    let mut opt = nn::Sgd::default().build(&vs, 1e-2).unwrap();
    for _idx in 1..50 {
        // Dummy mini-batches made of zeros.
        let xs = Tensor::zeros(&[7], kind::FLOAT_CPU);
        let ys = Tensor::zeros(&[7], kind::FLOAT_CPU);
        let loss = (my_module.forward(&xs) - ys).pow_tensor_scalar(2).sum(kind::Kind::Float);
        opt.backward_step(&loss);
    }
}

Writing a Simple Neural Network

The nn api can be used to create neural network architectures, e.g. the following code defines a simple model with one hidden layer and trains it on the MNIST dataset using the Adam optimizer.

use anyhow::Result;
use tch::{nn, nn::Module, nn::OptimizerConfig, Device};

const IMAGE_DIM: i64 = 784;
const HIDDEN_NODES: i64 = 128;
const LABELS: i64 = 10;

fn net(vs: &nn::Path) -> impl Module {
    nn::seq()
        .add(nn::linear(
            vs / "layer1",
            IMAGE_DIM,
            HIDDEN_NODES,
            Default::default(),
        ))
        .add_fn(|xs| xs.relu())
        .add(nn::linear(vs, HIDDEN_NODES, LABELS, Default::default()))
}

pub fn run() -> Result<()> {
    let m = tch::vision::mnist::load_dir("data")?;
    let vs = nn::VarStore::new(Device::Cpu);
    let net = net(&vs.root());
    let mut opt = nn::Adam::default().build(&vs, 1e-3)?;
    for epoch in 1..200 {
        let loss = net
            .forward(&m.train_images)
            .cross_entropy_for_logits(&m.train_labels);
        opt.backward_step(&loss);
        let test_accuracy = net
            .forward(&m.test_images)
            .accuracy_for_logits(&m.test_labels);
        println!(
            "epoch: {:4} train loss: {:8.5} test acc: {:5.2}%",
            epoch,
            f64::from(&loss),
            100. * f64::from(&test_accuracy),
        );
    }
    Ok(())
}

More details on the training loop can be found in the detailed tutorial.

Using some Pre-Trained Model

The pretrained-models example illustrates how to use some pre-trained computer vision model on an image. The weights - which have been extracted from the PyTorch implementation - can be downloaded here resnet18.ot and here resnet34.ot.

The example can then be run via the following command:

cargo run --example pretrained-models -- resnet18.ot tiger.jpg

This should print the top 5 imagenet categories for the image. The code for this example is pretty simple.

    // First the image is loaded and resized to 224x224.
    let image = imagenet::load_image_and_resize(image_file)?;

    // A variable store is created to hold the model parameters.
    let vs = tch::nn::VarStore::new(tch::Device::Cpu);

    // Then the model is built on this variable store, and the weights are loaded.
    let resnet18 = tch::vision::resnet::resnet18(vs.root(), imagenet::CLASS_COUNT);
    vs.load(weight_file)?;

    // Apply the forward pass of the model to get the logits and convert them
    // to probabilities via a softmax.
    let output = resnet18
        .forward_t(&image.unsqueeze(0), /*train=*/ false)
        .softmax(-1);

    // Finally print the top 5 categories and their associated probabilities.
    for (probability, class) in imagenet::top(&output, 5).iter() {
        println!("{:50} {:5.2}%", class, 100.0 * probability)
    }

Importing Pre-Trained Weights from PyTorch Using SafeTensors

safetensors is a new simple format by HuggingFace for storing tensors. It does not rely on Python's pickle module, and therefore the tensors are not bound to the specific classes and the exact directory structure used when the model is saved. It is also zero-copy, which means that reading the file will require no more memory than the original file.

For more information on safetensors, please check out https://github.com/huggingface/safetensors

Installing `safetensors`

You can install safetensors via the pip manager:

pip install safetensors

Exporting weights in PyTorch

import torchvision
from safetensors import torch as stt

model = torchvision.models.resnet18(pretrained=True)
stt.save_file(model.state_dict(), 'resnet18.safetensors')

Note: the filename of the export must be named with a .safetensors suffix for it to be properly decoded by tch.

Importing weights in `tch`

use anyhow::Result;
use tch::{
	Device,
	Kind,
	nn::VarStore,
	vision::{
		imagenet,
		resnet::resnet18,
	}
};

fn main() -> Result<()> {
	// Create the model and load the pre-trained weights
	let mut vs = VarStore::new(Device::cuda_if_available());
	let model = resnet18(&vs.root(), 1000);
	vs.load("resnet18.safetensors")?;
	
	// Load the image file and resize it to the usual imagenet dimension of 224x224.
	let image = imagenet::load_image_and_resize224("dog.jpg")?
		.to_device(vs.device());

	// Apply the forward pass of the model to get the logits
	let output = image
		.unsqueeze(0)
		.apply_t(&model, false)
		.softmax(-1, Kind::Float);
	
	// Print the top 5 categories for this image.
    for (probability, class) in imagenet::top(&output, 5).iter() {
        println!("{:50} {:5.2}%", class, 100.0 * probability)
    }
    
    Ok(())
}

Further examples include:

A simplified version of char-rnn illustrating character level language modeling using Recurrent Neural Networks.
Neural style transfer uses a pre-trained VGG-16 model to compose an image in the style of another image (pre-trained weights: vgg16.ot).
Some ResNet examples on CIFAR-10.
A tutorial showing how to deploy/run some Python trained models using TorchScript JIT.
Some Reinforcement Learning examples using the OpenAI Gym environment. This includes a policy gradient example as well as an A2C implementation that can run on Atari games.
A Transfer Learning Tutorial shows how to finetune a pre-trained ResNet model on a very small dataset.
A simplified version of GPT similar to minGPT.
A Stable Diffusion implementation following the lines of hugginface's diffusers library.

External material:

A tutorial showing how to use Torch to compute option prices and greeks.
tchrs-opencv-webcam-inference uses tch-rs and opencv to run inference on a webcam feed for some Python trained model based on mobilenet v3.

FAQ

What are the best practices for Python to Rust model translations?

See some details in this thread.

How to get this to work on a M1/M2 mac?

Check this issue.

Compilation is slow, torch-sys seems to be rebuilt every time cargo gets run.

See this issue, this could be caused by rust-analyzer not knowing about the proper environment variables like LIBTORCH and LD_LIBRARY_PATH.

Using Rust/tch code from Python.

It is possible to call Rust/tch code from Python via PyO3, tch-ext provides an example of such a Python extension.

Error loading shared libraries.

If you get an error about not finding some shared libraries when running the generated binaries (e.g. error while loading shared libraries: libtorch_cpu.so: cannot open shared object file: No such file or directory). You can try adding the following to your .bashrc where /path/to/libtorch is the path to your libtorch install.

# For Linux
export LD_LIBRARY_PATH=/path/to/libtorch/lib:$LD_LIBRARY_PATH
# For macOS
export DYLD_LIBRARY_PATH=/path/to/libtorch/lib:$DYLD_LIBRARY_PATH

License

tch-rs is distributed under the terms of both the MIT license and the Apache license (version 2.0), at your option.

See LICENSE-APACHE, LICENSE-MIT for more details.

tch-rs's People

Contributors

Stargazers

Watchers

Forkers

beingflo shareeff mwilliammyers cksac jbowles jerry73204 oldtree2008 infinite-joy9l infinite-joy pratyush taku-y r-wheeler green-s jean-airoldie atul9 safai-labs armenag vrminds another-s347 lyn-liyuan ajsyp jonboylecoding sd2k sailfish009 terjebra xiaoniu-578fa6bff964d005 robohouse-delft rozgo tiberiusferreira jamesoneill12 guillaume-be pkubik bytesnake danieldk andreytkachenko da505819 zeta1999 mbijon danielbank mbaneshi kindlychung papayura pbridger grtlr xuxue1 josrzn hscspring maxco2 mjs2600 epwalsh michaelburge nilslice spebern sundoge sbeckeriv rchavezj hoangpq moshg dhpollack awesomemachinelearning bumblepie toddwyl evgerher elaye fgsoap omerbenamram surdus nobles5e hilalisadev calmdown13 igordzreyev hakuyume vasanthakumarv songww stephenra f41gh7 akolishchak icodein chemicalheadsstudios ggoossen rmsc psilvestre adamnemecek isgasho rookboom liby99 edlanglois cyb0124 cruelu muhamedkamil arilou lostmsu aeklant alphastrata andrewrademacher thecog19 davidko3 metavai loongy ai-and-ml

tch-rs's Issues

Tensor::device() does not respect Tensor::to_device(...)

I ran into this issue when I'm working with multi-GPU training. Surprisingly, this test failed unexpectedly. The device() returns Cuda(1) after to_device(Cuda(0)). Suppose it an off-by-1 bug?

use tch::{Tensor, Device};

#[test]
fn test_device() {
    let x = Tensor::from(1).to_device(Device::Cuda(0));
    assert_eq!(x.device(), Device::Cuda(0)); // actually returns Device::Cuda(1)
}

convert CModule into Module

I have loaded a python saved model using the code

let model = tch::CModule::load(model_file)?;

how can i convert this to a model so that i can run the below code.

let train_images = tch::no_grad(|| image_dataset.train_images.apply_t(&model, false));

Currently the error is this

error[E0277]: the trait bound `tch::CModule: tch::nn::Module` is not satisfied
  --> src/main.rs:38:65
   |
38 |     let test_images = tch::no_grad(|| image_dataset.test_images.apply_t(&model, false));
   |                                                                 ^^^^^^^ the trait `tch::nn::Module` is not implemented for `tch::CModule`
   |
   = note: required because of the requirements on the impl of `tch::nn::ModuleT` for `tch::CModule`

Add some basic tests

Add some automated tests for tensor operations (including cat).
Also add some tests for gradient computation.

enforce fail at CPUAllocator.cpp:56

When trying to find the loss getting an interesting error. This is when running cross_entropy_for_logits on the labels.

thread 'main' panicked at 'called `Result::unwrap()` on an `Err`
 value: TorchError { c_error: "[enforce fail at CPUAllocator.cpp:56] posix_memalign(&data, gAlignment, 
nbytes) == 0. 12 vs 0\nframe #0: std::function<std::string ()>::operator()() const + 0x11 (0x7ffbfd74c441 
inn/target/debug/build/torch-sys-8ab344225acfe8de/out/libtorch/libtorch/lib/libc10.so)\nframe #1: 
c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x49 
(0x7ffbfd74c259 inn/target/debug/build/torch-sys-8ab344225acfe8de/out/libtorch/libtorch
/lib/libc10.so)\nframe #2: c10::alloc_cpu(unsigned long) + 0x65e (0x7ffbfd73546e inn/target/debug
/build/torch-sys-8ab344225acfe8de/out/libtorch/libtorch/lib/libc10.so)\nframe #3: <unknown function> + 
0x13dca (0x7ffbfd736dca inn/target/debug/build/torch-sys-8ab344225acfe8de/out/libtorch/libtorch
/lib/libc10.so)\nframe #4: THStorage_resize + 0x76 (0x7ffbf40aa8b6 inn/target/debug/build/torch-
sys-8ab344225acfe8de/out/libtorch/libtorch/lib/libcaffe2.so)\nframe #5: 
at::native::resize_cpu_(at::Tensor&, c10::ArrayRef<long>) + 0x38f (0x7ffbf3d014df inn/target/debug
/build/torch-sys-8ab344225acfe8de/out/libtorch/libtorch/lib/libcaffe2.so)\nframe #6: <unknown function> 
+ 0xb2cb6e (0x7ffbf3e5ab6e inn/target/debug/build/torch-sys-8ab344225acfe8de/out/libtorch/libtorch
/lib/libcaffe2.so)\nframe #7: at::Tensor::resize_(c10::ArrayRef<long>) + 0x4d (0x55777ea3df2b in 
target/debug/pytorch-image-classification)\nframe #8: <unknown function> + 0xb3d18c 
(0x7ffbf3e6b18c inn/target/debug/build/torch-sys-8ab344225acfe8de/out/libtorch/libtorch
/lib/libcaffe2.so)\nframe #9: at::native::mm(at::Tensor const&, at::Tensor const&) + 0x65 
(0x7ffbf3c7a485 inn/target/debug/build/torch-sys-8ab344225acfe8de/out/libtorch/libtorch
/lib/libcaffe2.so)\nframe #10: at::TypeDefault::mm(at::Tensor const&, at::Tensor const&) const + 0x5d 
(0x7ffbf4013a8d inn/target/debug/build/torch-sys-8ab344225acfe8de/out/libtorch/libtorch
/lib/libcaffe2.so)\nframe #11: torch::autograd::VariableType::mm(at::Tensor const&, at::Tensor const&) 
const + 0x6ea (0x7ffbf27059fa inn/target/debug/build/torch-sys-8ab344225acfe8de/out/libtorch/libtorch
/lib/libtorch.so.1)\nframe #12: <unknown function> + 0x3238ea (0x7ffbf22488ea in target/debug/build
/torch-sys-8ab344225acfe8de/out/libtorch/libtorch/lib/libtorch.so.1)\nframe #13: 
torch::autograd::generated::MmBackward::apply(std::vector<torch::autograd::Variable, 
std::allocator<torch::autograd::Variable> >&&) + 0x170 (0x7ffbf227b750 in target/debug/build/torch-
sys-8ab344225acfe8de/out/libtorch/libtorch/lib/libtorch.so.1)\nframe #14: <unknown function> + 
0x30cd5a (0x7ffbf2231d5a in target/debug/build/torch-sys-8ab344225acfe8de/out/libtorch/libtorch
/lib/libtorch.so.1)\nframe #15: 
torch::autograd::Engine::evaluate_function(torch::autograd::FunctionTask&) + 0x385 (0x7ffbf222ae25 
in /target/debug/build/torch-sys-8ab344225acfe8de/out/libtorch/libtorch/lib/libtorch.so.1)\nframe #16: 
torch::autograd::Engine::thread_main(torch::autograd::GraphTask*) + 0xc0 (0x7ffbf222ce20 in 
target/debug/build/torch-sys-8ab344225acfe8de/out/libtorch/libtorch/lib/libtorch.so.1)\nframe #17: 
torch::autograd::Engine::thread_init(int) + 0x136 (0x7ffbf222a1f6 inn/target/debug/build/torch-
sys-8ab344225acfe8de/out/libtorch/libtorch/lib/libtorch.so.1)\nframe #18: <unknown function> + 
0xbd9e0 (0x7ffbfda229e0 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)\nframe #19: <unknown function> 
+ 0x76db (0x7ffbf19016db in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #20: clone + 0x3f 
(0x7ffbf141288f in /lib/x86_64-linux-gnu/libc.so.6)\n" }', src/libcore/result.rs:997:5

Add yolo as an example

The ocaml-torch bindings now have a yolo-v3 implementation. Port this to tch-rs as the two frameworks are quite similar.
https://github.com/LaurentMazare/ocaml-torch/blob/master/examples/yolo/yolo.ml

Possible UB in Tensor

Hi,

I'm not super familiar with the design of the library, but I've been trying to use it in a project of mine. In doing so I found that Tensor::f_copy mutates a Tensor that's passed as a immutable reference, which is UB in Rust. I think one of the arguments should be changed to &mut to fix this. Sorry if I've missed some context.

Thanks!

Safety of {int64,double}_value

The Tensor methods double_value() and int64_value() can possibly segfault if idx parameter is nonsense. For example:

let t: Tensor = (0.0).into(); // t is a scalar tensor
t.double_value(&[0]); // segfault!

The code segfaults due to irrelevant dimension index. The idx should be checked beforehand, or double_value() should be marked unsafe. It would help that we can identify errors by message rather than helpless segfaults.

Request for torch.distributions primitives

It seems current tch-rs lacks distribution primitives like torch.distributions and tf.distributions. Though it already have probability generators, there's still rooms for probability arithmetic and deductions.

I see a promising crate rv that may fit my needs. However, its interface is not general enough. We cannot pass a mean/std tensor to rv's Gaussion new() function.

Porting torch.distributions may solve the problem. It requires some patience to make it work, or I could go ahead to improve rv. I'd like to know if author has any plan to port the module, or leaves it to other crates.

If you're asking which place needs this feature, I'm implementing ELBO loss for GQN (paper).

Incremental compilation of the C code

It seems that the C wrappers get recompiled on pretty much any change. Using ccache makes it a bit better but we should try to fix this as it makes the development loop a bit slow.
A possibility could be to have a subcrate torch-sys which contains the C wrappers so that the C code only gets recompiled when parts of this new crate are modified.

Variable re-using on VarStore

It's mostly about my thoughts on VarStore.

First it's about the way current impl handles duplicated names. As you see this line, it puts a HashMap size suffix when collision. However, it leads to inconsistent namings depending the order you add the tensors.

Suppose we want to add three tensors named foo, and other three named bar.
If we start from foo tensors and then bar, it results in

foo, foo__1, foo__2, boo, boo__4, boo__5

If we add bar tensors first instead, we have

bar, bar__1, bar__2, foo, foo__4, foo__5

I'm not sure it's expected or not. It's fine in most cases. I observed this when I wrote tensor logging. Imagine you're developing a model and change the order of model building functions. It could break the log reader. We can take TensorFlow's naming as reference. At least, we can make it order independent.

Another scenario is variable reusing. The current impl always append a new tensor whenever you call methods on Path. Also, I see no way to obtain saved tensors from VarStore. That's inconvenient if we want to build two models on shared variables. We may provide an additional "try_reuse" parameter for this.

Decouple code generated from the implementation

It seems that changing something like Device implementation that is used in c_wrapper_generated.rs cannot be made automatic due to none existent code generation setup in the crate, so it's a greater barrier for further developments and improvements.

Could you remove the generated codes from the crate and make them available only when the crate is built as artifacts? (This is possible with torch-sys though)

Obsolete example in README.md

Hi I'm new to tch, and the Rust implementation is amazing. I took some time to write tch code, and found the example in "Writing a Simple Neural Network" section cannot compile. Simply the argument list of nn::linear is wrong. It expects additional LinearConfig type parameter, and first argument should have nn::Path type instead of nn::VarStore.

Another mild suggestion is that please make the example complete. At least include the extern crate tch; and a very simple main function. That would be a great help for newbies.

Building the basic example fails

First of all, thank you for the binding initiations. Rust needs to have it eventually :)

Running cargo run --example basics fails using stable(1.0) cpu libtorch

--- stderr
CMake Error at /home/ehsan/LibTorch/libtorch/share/cmake/Caffe2/public/utils.cmake:17 (add_dependencies):
  add_dependencies Cannot add target-level dependencies to INTERFACE library
  target "caffe2_library".

Call Stack (most recent call first):
  /home/ehsan/LibTorch/libtorch/share/cmake/Caffe2/Caffe2Config.cmake:121 (caffe2_interface_library)
  /home/ehsan/LibTorch/libtorch/share/cmake/Torch/TorchConfig.cmake:39 (find_package)
  CMakeLists.txt:4 (find_package)

I think, we should consider adding a CI.

macos support

This may work out of the box but we should probably test this, fix if necessary, and add some specific CI integration if this starts becoming important.

trait for 'custom function' in tch-rs ?

I'm trying to translate https://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_custom_function.html#sphx-glr-beginner-examples-autograd-two-layer-net-custom-function-py

In particular, this example reimplements MyReLU by providing it's own implementation for forward/backward passes.

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
    @staticmethod
    def backward(ctx, grad_output):v

Now, we need to find the tch-rs trait that corresponds to torch.autograd.Function .

However, when I grep for "pub trait", I get:

grep "pub trait" . -R
./wrappers/kind.rs:pub trait T {
./nn/optimizer.rs:pub trait OptimizerConfig
./nn/rnn.rs:pub trait RNN {
./nn/module.rs:pub trait Module: std::fmt::Debug {
./nn/module.rs:pub trait ModuleT: std::fmt::Debug {
./tensor/index.rs:pub trait IndexOp<T> {

Question: how do we implement custom functions (with it's own backward/forward) in tch-rs ?

Take very long time to build torch-sys with CUDA

I build the my package with tch dep and pass TORCH_CUDA_VERSION env. I observe that it stucks at compiling torch-sys for more than 10 minutes.

env TORCH_CUDA_VERSION=9.0 cargo build

I made a quick diagnosis by lsof and strace. The build script took very long time to unzip the libtorch zip file. The I/O rate only peaks at 5 MiB/s. The build takes place on an SSD disk so disk I/O should not be bottleneck. In contrast, the command unzip v1.1.0.zip completes in seconds. May we consider replace the zip crate with zip-sys, or faster implementations?

Suggesting a roadmap for v0.1

Hi Laurent

First of all, I wanted to thank you again for making this happen. Given the pace of the developments and I would love to see an amazing NN crate for Rust, below are my suggestions for v0.1 release.

Since you've put a lot of efforts so far and I guess functionality-wise you want to make this crate mimic your other similar projects, please let us know of any other plans to be on the same page.

multiply a scalar with a tensor

I have a tensor T and I would like to make changes in the tensor. One way I can do that is by multiplying the tensor with a scalar. Is there a way to do that?

Add a translation example

This example could be based on the following Python tutorial, e.g. by using an implementation close to the ocaml one.
Once this is done, it may be worth writing some tutorial around the example.

Request a min/max method accepting a scalar type

The current max1(&self, other: &Tensor) requires a Tensor argument. It leads to some inconvenience on scalar types. It gets worse if the input tensor varies in devices. To make sure max runs without errors, I have to write:

let max_val = Tensor::from(6_f32).to_device(xs.device());
xs.max1(&max_val)

I rather expect a method allowing arbitrary scalar like maxN<S: Into<Scalar>>(&self, other: S). In fact, I discovered #62 issue when I'm trying to implement this feature.

MIT license

It's customary for Rust crates to include both MIT and Apache v2.0. Would it be possible to have MIT license so that a user can choose one depending on their use cases?

Add a continuous integration

As different contributors are showing interests, having a CI becomes critical.

Memory allocation best practice

Here is my scenario: I'm working on a huge custom data set. The iterator code loads images sequentially and convert to tensor by Tensor::of_slice(). However, I observed obvious memory growing and got OOM eventually.

The code can be as simple as:

for array in data_source {
    let tensor = Tensor::of_slice(array.as_slice()).to_device(gpu_device);
    let output = model.forward_t(&tensor);
}

Although I comment out my model code and leave of_slice() alone, the memory still grows even the tensor goes out of scope. Hence, I got some wondering:

What's is canonical way to feed custom data set?
When and which action triggers the deallocation of tensor? It is done automatically or manually?
Does the second forward_t() double the memory?

Enforce `cargo fmt` using Travis CI

We could use TravisCI to ensure that all commit comply with the formatting rules in rustfmt.toml. This is just a matter of adding the following lines to .travis.yml:

before_script:
  - rustup component add rustfmt

script:
- cargo fmt -- --check

However, currently there are also some warnings when running cargo fmt (some code is not formatted accordingly and it seems like some nightly features are used).

Support CUDA in torch-sys build script?

What do you think about supporting CUDA, either via an environment variable TORCH_USE_CUDA, TORCH_CUDA_VERSION, TORCH_DEVICE or similar or having it automatically detect and use CUDA when available in torch-sys's build.rs ?

I would be happy to open a PR for it...

Saving and loading VarStores of Sequential models

I am having some trouble saving and loading VarStores of Sequential models. I have adapted the mnist_nn example in the following way:

extern crate tch;
use tch::{nn, nn::Module, nn::OptimizerConfig, Device};

const IMAGE_DIM: i64 = 784;
const HIDDEN_NODES: i64 = 128;
const LABELS: i64 = 10;

fn net(vs: &nn::Path) -> impl Module {
    nn::seq()
        .add(nn::linear(
            vs / "layer1",
            IMAGE_DIM,
            HIDDEN_NODES,
            Default::default(),
        ))
        .add_fn(|xs| xs.relu())
        .add(nn::linear(vs, HIDDEN_NODES, LABELS, Default::default()))
}

pub fn train(vs: &mut nn::VarStore) -> failure::Fallible<()> {
    let m = tch::vision::mnist::load_dir("data")?;
    let net = net(&vs.root());
    let opt = nn::Adam::default().build(&vs, 1e-3)?;
    for epoch in 1..200 {
        let loss = net
            .forward(&m.train_images)
            .cross_entropy_for_logits(&m.train_labels);
        opt.backward_step(&loss);
        let test_accuracy = net
            .forward(&m.test_images)
            .accuracy_for_logits(&m.test_labels);
        println!(
            "epoch: {:4} train loss: {:8.5} test acc: {:5.2}%",
            epoch,
            f64::from(&loss),
            100. * f64::from(&test_accuracy),
        );
    }
    Ok(())
}

fn main() -> failure::Fallible<()> {
    let args: Vec<String> = std::env::args().collect();
    let mut vs = nn::VarStore::new(Device::Cpu);
    if args.len() < 2 {
      train(&mut vs)?;
      vs.save("weights.pt")?;
    } else {
      vs.load(args[1].as_str())?;
    }

    println!("{:#?}", vs.root());
    Ok(())
}

For cargo run the output is:

[...]
epoch:  197 train loss:  0.19554 test acc: 94.28%
epoch:  198 train loss:  0.19488 test acc: 94.32%
epoch:  199 train loss:  0.19422 test acc: 94.32%
Path {
    path: [],
    var_store: VarStore {
        variables: Mutex {
            data: {
                "weight": Variable {
                    tensor: Tensor[[10, 128], Float],
                    trainable: true,
                },
                "bias": Variable {
                    tensor: Tensor[[10], Float],
                    trainable: true,
                },
                "layer1|bias": Variable {
                    tensor: Tensor[[128], Float],
                    trainable: true,
                },
                "layer1|weight": Variable {
                    tensor: Tensor[[128, 784], Float],
                    trainable: true,
                },
            },
        },
        device: Cpu,
    },
}

However, for a subsequentcargo run weights.pt the output is just:

Path {
    path: [],
    var_store: VarStore {
        variables: Mutex {
            data: {},
        },
        device: Cpu,
    },
}

Shouldn't the output of both be the same?

Refer or integrate mobilenet-v3-rs into examples

It happens that I had to implement MobileNetV3 (repo link) for some purpose. I think it's small enough to fit into examples. It provides an example program and library interface. I have plan A and B to publish this work.

A: Integrate into the examples. It would be better to let users import the model in the code.
B: Keep it a standalone repo, and I will publish it to crates.io.

If plan A, we would need someone else to verify the correctness of the impl. So far I'm only certain that the precision reaches paper's claim in minutes.

purpose of '.add_fn(|x| x * 0.125)' in cifar example

At https://github.com/LaurentMazare/tch-rs/blob/master/examples/cifar/main.rs#L45 we have a line of code where we take a Tensor, then multiply every element by 0.125 .

What is the purpose of this line and why do we do this? (Scaling everything by 1/8th seems pointless). Is this some type of magical constant that causes the algorithm to converge faster? If so, where can I read up more on this?

Windows Support

Any objections to a PR adding windows support?

Shapes expect i64 instead of usize

Maybe this comes from generating the bindings automatically, but is there a use case for expecting &[i64] in Tensor::reshape and similar methods? Maybe it would make sense to change this to &[usize]? This change would also bring Tensor more in line with ndarray, which uses usize to represent shapes.

element_size_inbytes is not present anymore

While trying to build this from a manually downloaded torch library, I got the below error.

cargo:warning=In file included from libtch/torch_api.cpp:6:0:
cargo:warning=libtch/torch_api.cpp: In function ‘at::Tensor* at_tensor_of_data(void*, int64_t*, int, int, int)’:
cargo:warning=libtch/torch_api.cpp:48:48: error: ‘class at::DeprecatedTypeProperties’ has no member named ‘elementSizeInBytes’
cargo:warning=     if (element_size_in_bytes != tensor.type().elementSizeInBytes())
cargo:warning=                                                ^

Now I can see that this method element_size_in_bytes has been removed recently. pytorch/pytorch#17785, which might be the reason for this failure in the build process

Implement Tensor::chunk

tch-rs does not provide torch.chunk. Look like it's missing in f_* functions. The backend seems to be generated by automated means and I would like to follow the convention, so I don't know how to make proper patches for this. Can you implement it or tell me how to make contributions?

LSTM options

Hi Laurent!

In libtorch it is possible to provide options to an LSTM object, such as with_bias.

However, it is not clear how to do the same in tch: constructor for LSTM seems to have a config parameter, but the config type is generic and doesn't seem to support setting RNN-specific options.

Fix the reinforcement learning examples.

These examples have been broken when switching to PyTorch v1.1.0.

Allow multiple input on forward() function

The forward function only allows single tensor. It cannot fit the need of multiple input models. I looked PyTorch example and it allows to pass a list of tensors. For example, we can provide forward(&[&Tensor]) for this purpose.

Basic examples fail to run OSX: Library not loaded libmklml.dylib

Hi,

I have tried running the basic examples with: cargo run --example basics on MacOSX and here is the error message that I am getting:

dyld: Library not loaded: @rpath/libmklml.dylib
Referenced from: /Users/vegapit/dev/libtorch/lib/libcaffe2.dylib
Reason: image not found
Abort trap: 6

It seems to be looking for a library that is not available. I downloaded the libtorch 1.1.0 zip file at the following link:

libtorch 1.1.0 for MacOSX

Have I misunderstood the installation process?

Replace ptr_option with unwrap_or

In the ops generated code, ptr_option is used to get a C pointer from an option or null. Rather than pattern matching use unwrap_or.

Convenient indexing methods

I'm wondering if a convenient slicing function that automatically select(), narrow(), masked_index() or index_select() tensors. Just like that in PyTorch. For the sake of limitations of Index and IndexMut, we could name a polymorphic method tensor.i(), which impl depends on input type. This snipplet illustrates the idea.

trait TensorIndex<T> {
    fn i(&self, index: T) -> Tensor;
}

impl TensorIndex<Range> for Tensor {...}

I looked into how PyTorch handles slice indexes of distinct types, and summarize them into these categories

type	impl
tuple of {integer, range, list of {integer, range}}	Each tuple component corresponds to one dimension. For example, `tensor[0, :2, [1, 3, 5]]` results in selecting 0th row on first dim, up to 2nd row on second dim, and `index_select()` on third dim.
integer or range	I treat is as degenerate case of above.
tensor	basically `masked_index()`

I think Rust is capable of providing above semantics. However, unlike Python, we cannot have mixed typed slices. We need to play with macros to cope with explosive combinations of mixed-type tuples. So I leave the thought here and seek if anyone knows the best way.

creating first tensor takes 4 seconds

Consider the following code:

extern crate tch;
use tch::{Cuda, Tensor};

pub fn main() {
    println!("cuda: {:?}", Cuda::is_available());

    let opts = (tch::Kind::Float, tch::Device::Cuda(1));

    let start = std::time::Instant::now();
    let x_empty = Tensor::empty(&[5, 3], opts);
    let mid = std::time::Instant::now();

    let x_rand = Tensor::rand(&[5, 3], opts);
    let x_zeros = Tensor::zeros(&[5, 3], opts);
    let t = Tensor::of_slice(&[5, 3]);

    let end = std::time::Instant::now();

    println!("time to create 1st tensor: {:?}", mid - start);
    println!("time to create next 3 tensor: {:?}", end - mid);

    println!("start: {:?}", start);
    println!("mid: {:?}", mid);
    println!("end: {:?}", end);
}

I get results of:

cuda: true
time to create 1st tensor: 4.124049426s
time to create next 3 tensor: 907.468µs
start: Instant { tv_sec: 28481, tv_nsec: 825629454 }
mid: Instant { tv_sec: 28485, tv_nsec: 949678880 }
end: Instant { tv_sec: 28485, tv_nsec: 950586348 }

Clearly I am doing something wrong, as it should not take 4 seconds to initialize CUDA. What am I doing wrong?

tensor issues when loading images to dataset.

When going through the code found that imagenet has a handy function for loading images from a directory provided they are in a good structure.

use tch::vision::imagenet::load_from_dir;

I have a code in the below format.

let image_dataset = load_from_dir(DATASET_FOLDER).unwrap();

where the dataset folder is in this format

dataset
├── train
│   ├── accordion
│   │   ├── image_0001.jpg
│   │   ├── image_0002.jpg
│   ├── airplanes
│   │   ├── image_0001.jpg
│       └── image_0060.jpg
	...
└── val
    ├── accordion
    │   └── image_0036.jpg
    ├── airplanes
    │   └── image_0685.jpg
	...

204 directories, 9060 files

Running this code gets me the below error

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: TorchError { c_error: "invalid argument 0: Sizes of tensors must match except in dimension 0. Got 225 and 224 in dimension 2 at /pytorch/aten/src/TH/generic/THTensor.cpp:711" }

Not sure if this is a mistake from my end or a bug.

Significant of struct repr

Could example the significance of using the following pattern over ffi in your code vs. empty struct despite having the same size and alignment?

#[repr(C)]
pub struct MyStruct {
    _private: [u8; 0],
}

vs.

#[repr(C)]
pub struct MyStruct;

Use meaningful reduction parameter in Tensor API

Several Tensor methods, for example mse_loss(), has i64 typed reduction parameter. It's less intuitive for users is suggested to replace it with existing enum.

It seems this patch requires a patch on auto-generated source. Suppose we can separate the sys API and wrapper api? For now I find the mapping in this Torch source.

whats the difference between the ot files and the pth files

in this example for loading pretrained files, .ot files are used while the general pytorch convention is to save in .pth files. Whats the difference between the two?

Copy VarStore over devices

I raised this question for multi-GPU. In my scenario, I would run forward step on multiple GPU devices. Then, after computing gradient on one device, I would like to copy the model parameters over.

I seems there's no easy way to copy the model parameters. VarStore::trainable_variables() returns vec of tensors does not help much. Also, I did not see demanding method of tch::nn::Path to iterate over known variables.

So far the only solution I came up with is to save the VarStore to a file, and load them back into multiple devices. However, it implies overhead. Does anyone have suggestion for this?

Semantic padding parameter for tch::nn::ConvConfigND

I raised this question from my experience in Keras/TensorFlow. Keras allows users to specify 'same'/'valid' padding besides integers. That's a great convenience. I see that PyTorch does not provide this feature, and thus it's asked on their forum.

I would propose a solution that may get the best of two. We could provide a method set_padding() on tch::nn::ConvConfigND. Following Rust's convention, we could write:

let config = ConvConfig { /* ... */ }.set_padding("same");

We can let set_padding() to accept an enum over str or int. Likely bias, strides and other fields may apply for this change.

translating 'grad_h[h<0] = 0'

I am trying to translate https://pytorch.org/tutorials/beginner/examples_tensor/two_layer_net_tensor.html#sphx-glr-beginner-examples-tensor-two-layer-net-tensor-py to tch-rs.

It's not clear to me how to translate the following line:

grad_h[h<0] = 0

My understanding is that h<0 is supposed to return a vec of bool, which we use as a mask for which elements of grad_h to modify.

It's not clear to me how to do this in a 'parallel' (i.e. not a single for loop) in tch-rs.

Interop with ndarray

Continue discussion from #51.

Interoperability with the ndarray crate would be very nice. As a first step I think we should be able to convert to/from ndarray using std::convert::Into. I'm thinking something along these lines:

use ndarray::{Array, ArrayD, Dimension, IxDyn};

pub fn to_tensor<D>(arr: Array<f64, D>) -> tch::Tensor
where
    D: Dimension,
{
    let tn = tch::Tensor::of_slice(arr.as_slice().unwrap());
    let shape: Vec<i64> = arr.shape().iter().map(|s| *s as i64).collect();
    tn.reshape(&shape)
}

pub fn from_tensor(tensor: &tch::Tensor) -> ArrayD<f64> {
    let v: Vec<f64> = tensor.into();
    let shape: Vec<usize> = tensor.size().iter().map(|s| *s as usize).collect();
    ArrayD::from_shape_vec(IxDyn(&shape), v).unwrap()
}

Segfault on functions that take as input a list of tensors.

The generated code for functions that use a list of tensors, e.g. cat, use the ptr_list function which creates a vec and converts it to a pointer via as_ptr. However the vec is deallocated at the end of ptr_list resulting in a dangling pointer.
The corresponding rust type should also be &[&Tensor] rather than &[Tensor].

cargo run --example reinforcement-learning --features=python a2c FAILS

cargo run --example reinforcement-learning --features=python pg

cargo run --example reinforcement-learning --features=python pg
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/examples/reinforcement-learning pg`
action space: 2
observation space: [4]
epoch: 0   episodes: 242   avg reward per episode: 20.73
epoch: 1   episodes: 202   avg reward per episode: 24.76
epoch: 2   episodes: 202   avg reward per episode: 24.77
epoch: 3   episodes: 181   avg reward per episode: 27.86
epoch: 4   episodes: 163   avg reward per episode: 30.69

cargo run --example reinforcement-learning --features=python a2c

cargo run --example reinforcement-learning --features=python a2c
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/examples/reinforcement-learning a2c`
action space: 6
observation space: [1, 1, 84, 84]
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "examples/reinforcement-learning/atari_wrappers.py", line 245, in worker
    ob = env.reset()
  File "/home/x/.local/lib/python3.7/site-packages/gym/core.py", line 257, in reset
    observation = self.env.reset(**kwargs)
  File "/home/x/.local/lib/python3.7/site-packages/gym/core.py", line 271, in reset
    return self.env.reset(**kwargs)
  File "/home/x/.local/lib/python3.7/site-packages/gym/core.py", line 258, in reset
    return self.observation(observation)
  File "/home/x/.local/lib/python3.7/site-packages/gym/core.py", line 266, in observation
    raise NotImplementedError
NotImplementedError

(This only shows one thread, as I change the 16 to 1).

it appears to work running directly from Python:

cat test.py ; echo "==="; python3 test.py 
import gym


env = gym.make('SpaceInvaders-v0')
env.reset()
actions = env.action_space.n
===

Question: what am I doing wrong & how do we fix this?

How does gen work?

Do you mind explaining how tch-rs/gen work? and when does it kick in?

Why not binding against c-api in $LIBTORCH/include/torch/csrc/api/include/torch/ directly?

Thanks