torch / cutorch Goto Github PK

View Code? Open in Web Editor NEW

337.0 337.0 211.0 2.06 MB

A CUDA backend for Torch7

License: Other

CMake 1.91% Cuda 53.18% C 20.78% Lua 21.64% Objective-C 0.14% Shell 0.06% C++ 2.28%

cutorch's Introduction

THIS REPOSITORY IS DEPRECEATED.

Please use https://github.com/torch/torch7

For install scripts, please look at: https://github.com/torch/ezinstall

Torch7 Library.

Torch7 provides a Matlab-like environment for state-of-the-art machine learning algorithms. It is easy to use and provides a very efficient implementation, thanks to an easy and fast scripting language (Lua) and a underlying C implementation.

In order to install Torch7 you can follow these simple instructions, but we suggest reading the detailed manual at http://www.torch.ch/manual/install/index

Requirements

C/C++ compiler
cmake
gnuplot
git

Optional

Readline
QT (QT4.8 is now supported)
CBLAS
LAPACK

Installation

$ git clone git://github.com/andresy/torch.git
$ cd torch
$ mkdir build
$ cd build

$ cmake .. 
OR
$ cmake .. -DCMAKE_INSTALL_PREFIX=/my/install/path

$make install

Running

$torch
Type help() for more info
Torch 7.0  Copyright (C) 2001-2011 Idiap, NEC Labs, NYU
Lua 5.1  Copyright (C) 1994-2008 Lua.org, PUC-Rio
t7>

3rd Party Packages

Torch7 comes with a package manager based on Luarocks. With it it's easy to install new packages:

$ torch-rocks install image
$ torch-rocks list
$ torch-rocks search --all

Documentation

The full documentation is installed in /my/install/path/share/torch/html/index.html

Also, http://www.torch.ch/manual/index points to the latest documentation of Torch7.

cutorch's People

Contributors

Stargazers

Watchers

Forkers

sigmike paidi akfidjeland ajtulloch ioannisantonoglou nicholas-leonard steveorsomethin hycis sergomezcol sgalawneh oliver-batchelor dominikgrewe jonathantompson narayana1208 clementfarabet bpiwowar uikit0 pflaquerre fmassa flybass stencilman rotmanmi willwilliams ezhangle soumith kashif chagge adamlerer leotam moodstocks colesbury noa subramanyam86 cc272309126 zakattacktwitter stachenfeld d11 mys007 hughperkins diz-vara samehkhamis sxq2004123 linusu 0wu gopal-m zhen-hao wickedfoo alband zhj12388 harshavardhanp milestonesvn pold87 kingofoz caomw vefaliahmet jacobmenick ibcn-cloudlet varunnagpaal jedy1986 galv asappinc egonina cvml meir770 georgostrovski borisfom davidsaxton guillembelloc misko huntpig ngimel ajabri ebetica joostvdoorn patiencett apaszke caldweln qucheng fagg lukealonso houda-lam bshillingford htwaijry alululululululu josejuanmijares grgvineet renbozqin peratham genekogan elikosan bartvm marsbzp zomeelee brainpicture gfl699468 jiapei100 mdaiter deeplearningsprint comenix colastaralex

cutorch's Issues

Tegra K1

Hi guys, we are trying to run some Torch code on the Tegra K1 board:
http://www.nvidia.com/object/tegra-k1-processor.html
We are running into problems with cunn allocating too many resources, possibly because it is targeted to larger GPUs. The Tegra only has 192 CUDA cores.

error in SpatialMaxSampling.updateOutput: too many resourced requested for launch

I was wondering if you can give me some guidance on how to lower the resources.

Note: SpatialConvolutionMM works, possibly because BLAS is adjusted to the GPU specifications, but SpatialMaxPooling (not SpatialMaxPoolingCUDA) gives the error.

Possibly there needs to be a check for GPU resources in that function.

Exposing the state of the random number generator

In torch it's possible to get and set the state of the random number generator, which is very useful when experiments can be interrupted and we want to be able to restart in exactly the same state.
The RNG in cutorch doesn't provide this facility, because the cuRAND host API only exposes curandGenerator_t which is a pointer to an opaque type.
The device API on the other hand does expose its state (see curand_mtgp32.h). Could we re-write the cutorch RNG to use the device API and then expose its state? The only potential problem I can see is that the structs storing the state may change from one version of CUDA to the next. I haven't checked if that's the case for the last few versions.

Or is there a better way to save and restore the RNG state?

getDeviceProperties(gpuid).freeGlobalMem doesn't take the GPU id into account

The field freeGlobalMem in function getDeviceProperties calls cudaMemGetInfo, which outputs the available memory in the current device. Thus, the gpuid is not taken into account in this field, unless one manually calls cutorch.setDevice(gpuid) before (or putting it inside getDeviceProperties and then restoring to the previous gpu after this call). Is there a better way to solve this problem, without needing to set the device in each call to getDeviceProperties ?

addmm test fails with CUDA 7

Tested on OS X and Ubuntu with latest cutorch and CUDA 7 driver 346.6, another machine which is on CUDA 6.5 and 340.29 is fine

addmm
 Function call failed
/usr/local/share/lua/5.1/cutorch/test.lua:112: /opt/rocks/cutorch/lib/THC/THCBlas.cu(249) : cublas runtime error : the GPU program failed to execute
stack traceback:
    [C]: at 0x123b2bb0
    /usr/local/share/lua/5.1/cutorch/test.lua:112: in function 'compareFloatAndCudaTensorArgs'
    /usr/local/share/lua/5.1/cutorch/test.lua:976: in function </usr/local/share/lua/5.1/cutorch/test.lua:960>
    [C]: in function 'xpcall'
    /usr/local/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
    /usr/local/share/lua/5.1/torch/Tester.lua:172: in function 'run'
    /usr/local/share/lua/5.1/cutorch/test.lua:1555: in function 'f'
    [string "local f = function() return cutorch.test() en..."]:1: in main chunk
    [C]: in function 'xpcall'
    /usr/local/share/lua/5.1/trepl/init.lua:608: in function 'repl'
    /usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
    [C]: at 0x010f502400

isTensor

check that isTensor works for cuda tensors
torch/torch7#84

+ - / * operators not working

- / * operators dont work with cuda tensors!!!

cublas_v2 handles (and torch contexts)

Can we find a way to make use of cublas_v2 handles (and carry these around)?

I propose we add a global singleton to the torch package. Making it global means that we don't need to require the user to carry it around. We can make it thread-safe by having the singleton encapsulate THContext structs (for multi-threaded environments, each thread would need to call torch.setContext(contextId). For the user, it would be similar to using cutorch.setDevice().

The singleton could be implemented using something like this:

#define MAXCONTEXT 100
THContext* getContext(contextId)
{
   static context contexts[MAXCONTEXT];
   context* c = contexts[contextId];
   if (c == NULL) {
      c = malloc(sizeof(THContext));
   }
  return context;
}

We could add a our cublas handle to the context and other such goodies.
This context should also be thread-safe (as long as each thread creates its
own context, or these are created beforehand).

I am not C expert, but the context could look something like this:

typedef struct THContext
{
  // torchstuff;
  // otherstuff;
  void *cutorchStuff;   // cutorch could allocate this when loaded
}

If we don't see any reason for including it in torch, we could limit it to cutorch.

missing API against TH

The following math functions are missing in THC but present in TH:

When these are implemented, cwrap entries can be added that would make cutorch completely API compatible with torch

New indexCopy not working with non-contiguous tensor

Simple example to demonstrate:

require 'cutorch'
function testCopy(tensor)
ones = tensor {1}:expand(3)
indices = torch.LongTensor {1, 3, 5}

copy = tensor (5):zero()
copy:indexCopy(1, indices, ones)

return copy
end

print( testCopy(torch.Tensor), testCopy(torch.CudaTensor) )

Output should be:
1
0
1
0
1
[torch.DoubleTensor of dimension 5]

But for CudaTensor it gives:

1.0000
0.0000
0.0023
0.0000
0.0833
[torch.CudaTensor of dimension 5]

(presumably the 0.0023 and 0.0833 are uninitialized memory)

Build error in TensorMath.lua: "attempt to call method 'registerDefaultArgument' (a nil value)"

Hi there,

I'm currently struggling to build cutorch on Ubuntu 14.04 (CUDA 6.5, GCC-4.8.2). I'm hitting a build error in TensorMath.lua:

/home/alistair/torch/install/bin/luajit: /tmp/luarocks_cutorch-scm-1-9849/cutorch/TensorMath.lua:184: attempt to call method 'registerDefaultArgument' (a nil value)
stack traceback:
        /tmp/luarocks_cutorch-scm-1-9849/cutorch/TensorMath.lua:184: in main chunk
        [C]: at 0x00406170
make[2]: *** [TensorMath.c] Error 1
make[1]: *** [CMakeFiles/cutorch.dir/all] Error 2
make: *** [all] Error 2

Error: Build error: Failed building.

The full build log is below. Any thoughts on what might cause this?

Cheers,
Alistair

~$ luarocks install cutorch                                                                                                                          
Installing https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec... switching to 'build' mode
Cloning into 'cutorch'...
remote: Counting objects: 52, done.
remote: Compressing objects: 100% (45/45), done.
remote: Total 52 (delta 4), reused 25 (delta 4)
Receiving objects: 100% (52/52), 62.89 KiB | 0 bytes/s, done.
Resolving deltas: 100% (4/4), done.
Checking connectivity... done.
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/alistair/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/alistair/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make

-- The C compiler identification is GNU 4.8.2
-- The CXX compiler identification is GNU 4.8.2
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Torch7 in /home/alistair/torch/install
-- Found CUDA: /usr/local/cuda (found suitable version "6.5", minimum required is "5.5") 
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_cutorch-scm-1-9849/cutorch/build
[  7%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir//./THC_generated_THC.cu.o
CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THStorage.h


CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THStorageCopy.h


CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensor.h


CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorCopy.h


CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorRandom.h


CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorMath.h


CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorConv.h


CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorLapack.h


Scanning dependencies of target THC
[ 15%] Building C object lib/THC/CMakeFiles/THC.dir/THCGeneral.c.o
[ 23%] Building C object lib/THC/CMakeFiles/THC.dir/THCStorage.c.o
[ 30%] Building C object lib/THC/CMakeFiles/THC.dir/THCStorageCopy.c.o
[ 38%] Building C object lib/THC/CMakeFiles/THC.dir/THCTensor.c.o
[ 46%] Building C object lib/THC/CMakeFiles/THC.dir/THCTensorCopy.c.o
Linking CXX shared library libTHC.so
[ 46%] Built target THC
[ 53%] Generating TensorMath.c
/home/alistair/torch/install/bin/luajit: /tmp/luarocks_cutorch-scm-1-9849/cutorch/TensorMath.lua:184: attempt to call method 'registerDefaultArgument' (a nil value)
stack traceback:
        /tmp/luarocks_cutorch-scm-1-9849/cutorch/TensorMath.lua:184: in main chunk
        [C]: at 0x00406170
make[2]: *** [TensorMath.c] Error 1
make[1]: *** [CMakeFiles/cutorch.dir/all] Error 2
make: *** [all] Error 2

Error: Build error: Failed building.

Multi-GPU support

MultiGPU support has been implemented in cutorch (and by extension all torch cuda libraries like cunn, cudnn etc.).

Switch the device on the fly with cutorch.setDevice(devID)
All cuda calls are asynchronous, and can be synchronized with cutorch.synchronize()

Example usage for tensors:

-- Let us do matrix addition for matrices sitting on two different GPUs
cutorch.setDevice(1)
matrix1 = torch.CudaTensor(10):fill(1)
print(matrix1) -- printing is a synchronous call, so you dont have to explicitly call cutorch.synchronize()
cutorch.setDevice(2)
matrix2 = torch.CudaTensor(10):fill(2)
print(matrix2) 
matrix2:add(matrix1) -- matrix1 is seamlessly copied onto GPU2 and added to matrix2
print(matrix2)

if you want to do data-parallel training of neural nets (including convnets), your training loop can run like this:

For each mini-batch:

1. load data (preferably using multiple threads, for example using [threads-ffi](https://github.com/torch/threads-ffi))
2. loop over GPUs (the loop below will be completely anynchronous, so will run parallely)
  2.1. model[gpuX]:forward
  2.2. criterion[gpuX]:forward
  2.3. criterion[gpuX]:backward
  2.4. model[gpuX]:backward
3. cutorch.synchronize()
4. accumulate GPUx's gradParameters to GPU1's gradParameters
5. do SGD on GPU1
6. copy back GPU1's parameters to GPUx
7. cutorch.synchronize() and print accuracy etc.

Loop back to 1 for next mini-batch

Also, to train ConvNets using multiple GPUs, I recommend using CuDNN for the convolution layers, as I've tested that they are completely asynchronous (meaning that the processing runs parallely on multiple GPUs)

Comments below describe the technical details of changes made. If you just want to use Multi-GPU, you can stop reading now.

CudaTensor:max() not compatible with FloatTensor:max()

FloatTensor / CudaTensor are not drop-in replacements for each other. So moving a model from the CPU to GPU can break supporting code.

Specifically:

FloatTensor:max(2) returns both the max element and its index.
CudaTensor:max(2) does not return the index.

This breaks my linear classifier, which needs both the max element and its index.

Workaround: manually cast the CudaTensor back to a FloatTensor.

Yosemite issue: malformed libraries

When I compile cutorch on OS X Yosemite 10.10, I get

malformed object (load command ## cmdsize is zero)

And code does not run:

/usr/local/bin/luajit: /usr/local/share/lua/5.1/luarocks/loader.lua:117: error loading module 'libcutorch' from file '/usr/local/lib/lua/5.1/libcutorch.so':
    dlopen(/usr/local/lib/lua/5.1/libcutorch.so, 6): no suitable image found.  Did find:
    /usr/local/lib/lua/5.1/libcutorch.so: malformed mach-o image: load command #24 length (0) too small in /usr/local/lib/lua/5.1/libcutorch.so
stack traceback:
    [C]: in function 'a_loader'
    /usr/local/share/lua/5.1/luarocks/loader.lua:117: in function </usr/local/share/lua/5.1/luarocks/loader.lua:114>
    [C]: in function 'require'
    /usr/local/share/lua/5.1/cutorch/init.lua:2: in main chunk
    [C]: in function 'require'
    /usr/local/share/lua/5.1/cunn/init.lua:1: in main chunk
    [C]: in function 'require'
    ./src/buildModel.lua:125: in function 'buildModel'
    general-profiler.lua:28: in main chunk
    [C]: in function 'dofile'
    /usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:129: in main chunk
    [C]: at 0x010479ebc0

torch.random and torch.randn dont work

when set default tensor type to CudaTensor, these two functions dont work

:csub does not exist

subtraction

Troublies with installation (with Cuda 6)

I am trying to install cutorch but I have the following error I am unable to fix.

Error log

elab@elab-GPU2 ~ $ luarocks install cutorch                                                                                             
Installing https://raw.github.com/torch/rocks/master/cutorch-scm-1.rockspec...                                                          
Using https://raw.github.com/torch/rocks/master/cutorch-scm-1.rockspec... switching to 'build' mode                                     
Cloning into 'cutorch'...                                                                                                               
remote: Counting objects: 48, done.                                                                                                     
remote: Compressing objects: 100% (46/46), done.                                                                                        
remote: Total 48 (delta 1), reused 34 (delta 0)                                                                                         
Receiving objects: 100% (48/48), 59.48 KiB, done.                                                                                       
Resolving deltas: 100% (1/1), done.                                                                                                     
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/usr/local/bin/.." -DCMAKE_INSTALL_PREFIX="/usr/local/lib/luarocks/rocks/cutorch/scm-1" && make                                                                           

-- The C compiler identification is GNU                                                                                                 
-- The CXX compiler identification is GNU                                                                                               
-- Check for working C compiler: /usr/bin/gcc                                                                                           
-- Check for working C compiler: /usr/bin/gcc -- works                                                                                  
-- Detecting C compiler ABI info                                                                                                        
-- Detecting C compiler ABI info - done                                                                                                 
-- Check for working CXX compiler: /usr/bin/c++                                                                                         
-- Check for working CXX compiler: /usr/bin/c++ -- works                                                                                
-- Detecting CXX compiler ABI info                                                                                                      
-- Detecting CXX compiler ABI info - done                                                                                               
-- Found Torch7 in /usr/local                                                                                                           
-- Found CUDA: /usr/local/cuda-6.0 (Required is at least version "4.0")                                                                 
-- Configuring done                                                                                                                     
-- Generating done                                                                                                                      
-- Build files have been written to: /tmp/luarocks_cutorch-scm-1-29/cutorch/build                                                       
[  9%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir//./THC_generated_THC.cu.o                                               
gcc: error trying to exec 'cc1plus': execvp: No such file or directory                                                                  
CMake Error at THC_generated_THC.cu.o.cmake:198 (message):                                                                              
  Error generating                                                                                                                      
  /tmp/luarocks_cutorch-scm-1-29/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THC.cu.o                                     


make[2]: *** [lib/THC/CMakeFiles/THC.dir/./THC_generated_THC.cu.o] Error 1                                                              
make[1]: *** [lib/THC/CMakeFiles/THC.dir/all] Error 2                                                                                   
make: *** [all] Error 2                                                                                                                 

Error: Build error: Failed building.

gcc

  elab@elab-GPU2 ~ $ gcc -v                                                                                                               
Using built-in specs.                                                                                                                   
COLLECT_GCC=gcc                                                                                                                         
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper                                                                       
Target: x86_64-linux-gnu                                                                                                                
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.1-2ubuntu1~12.04' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.
Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-i
d --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib 
--enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --ena
ble-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0
-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/
java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch 
--disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64 --with-tune=generic --enable-checking=release --build=x
86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu                                                                       
Thread model: posix                                                                                                                     
gcc version 4.8.1 (Ubuntu 4.8.1-2ubuntu1~12.04)

nvcc

nvcc: NVIDIA (R) Cuda compiler driver       
Copyright (c) 2005-2013 NVIDIA Corporation  
Built on Thu_Mar_13_11:58:58_PDT_2014       
Cuda compilation tools, release 6.0, V6.0.1

CUDA 6 OS X 10.9 build

I had a hard time compiling cutorch on OS X 10.9 with CUDA 6. As it supports clang now, I had to disable some lines in CMakeLists.txt and add the following:
LIST(APPEND CUDA_NVCC_FLAGS "-Xcompiler -stdlib=libstdc++ -Xlinker -stdlib=libstdc++")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libstdc++")

and also I copied FindCUDA.cmake files from the latest OpenCV. Overall I don't know what worked, it was random after calling luarocks make multiple times. The biggest problem was @rpath/libcudart.dylib appearing in the install and cmake telling something about MACOSX_RPATH not set.

I think someone could face the same problem in the future.

Half of cutorch tests fail with CUDA 7

This is bad, the latest streams PR doesn't work with CUDA 7.
Errors:
https://gist.github.com/szagoruyko/31d59abf06ee32a396d4

Malformed mach-o image error

I am still facing the build problem as described in https://groups.google.com/forum/#!topic/torch7/wZnFS5HYu8o on OS X 10.10. Cutorch installs without any complaints but attempting to use it spits the following error:

th> require 'cutorch'
...trivedigaurav/torch/install/share/lua/5.1/trepl/init.lua:354: ...trivedigaurav/torch/install/share/lua/5.1/trepl/init.lua:354: ...digaurav/torch/install/share/lua/5.1/luarocks/loader.lua:117: error loading module 'libcutorch' from file '/Users/trivedigaurav/torch/install/lib/lua/5.1/libcutorch.so':
    dlopen(/Users/trivedigaurav/torch/install/lib/lua/5.1/libcutorch.so, 6): no suitable image found.  Did find:
    /Users/trivedigaurav/torch/install/lib/lua/5.1/libcutorch.so: malformed mach-o image: load command #20 length (0) too small in /Users/trivedigaurav/torch/install/lib/lua/5.1/libcutorch.so
stack traceback:
    [C]: in function 'error'
    ...trivedigaurav/torch/install/share/lua/5.1/trepl/init.lua:354: in function 'f'
    [string "local f = function() return require 'cutorch'..."]:1: in main chunk
    [C]: in function 'xpcall'
    ...trivedigaurav/torch/install/share/lua/5.1/trepl/init.lua:620: in function 'repl'
    ...urav/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
    [C]: at 0x0106a4f780

torch.CudaTensor:cmul(src) doesn't resize result

Hi,

in torch7, torch.Tensor:cmul() automatically resizes the result for the user. While in cutorch we leave this to the user.

I was wondering if this was a design decision or if we could harmonize this behaviour with torch7?

The effect of this different behaviour is that some pure-lua Modules may work okay when unit tested with torch.DoubleTensors, but fail when the Module is cast to cuda.

--Nick

Cutorch, RNG is undefined

I am compiling cutorch on Ubuntu 12.04, CUDA v 6.0.

When I call: "luarocks install cutorch" it results in this error:

[ 9%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir//./THC_generated_THC.cu.o
/home/greg/Documents/cutorch/lib/THC/THCTensorRandom.cu(27): error: identifier "CURAND_RNG_PSEUDO_MTGP32" is undefined

1 error detected in the compilation of "/tmp/tmpxft_00000bb8_00000000-4_THC.cpp1.ii".
CMake Error at THC_generated_THC.cu.o.cmake:262 (message):
Error generating file
/home/greg/Documents/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THC.cu.o

make[2]: *** [lib/THC/CMakeFiles/THC.dir/./THC_generated_THC.cu.o] Error 1
make[1]: *** [lib/THC/CMakeFiles/THC.dir/all] Error 2
make: *** [all] Error 2

Error: Build error: Failed building.

It appears to be a linking issue, but I am unable to get it working with cmake. The env variable during cmake is "CUDA_curand_LIBRARY=/usr/lib/x86_64-linux-gnu/libcurand.so", which appears correct.

One random number generator per GPU

Currently there's a single random number generator shared between all GPUs. The way it is set up seems to indicate that we really want one generator per GPU, but that's not what's happening:
In luaopen_libcutorch we call THCudaInit which itself calls THRandom_manualSeed for each GPU, each time creating a new generator and replacing the old one. We then call THCRandom_seed which again replaces the previously created generator with a new one.

Before #45 we'd only call THCRandom_manualSeed in cutorch_setDevice but that seems equally wrong.

If we want one generator per GPU we should create N generators at initialization and then pick the one for the currently chosen device when generating random numbers. Any thoughts?

Moving static cuBlas state into THCudaState

Follow-up from @clementfarabet's patch. @soumith, did you say you wanted to do this? If not, I'm happy to write a patch. Just wanted to check that you haven't got one already.

cutorch test segfaults if ran 2 times

Doing this

cutorch.test()
cutorch.test()

produces a segfault (both with CUDA 6.5 and CUDA 7)

th> cutorch.test()
Running 104 tests
_____|__________________________________________________________________________________________________  ==> multi_gpu_copy_noncontig/opt/rocks/cutorch/lib/THC/THCStorage.c(111) : cuda runtime error : an illegal memory access was encountered
stack traceback:
    [C]: at 0x7f24ee186300
    [C]: in function 'collectgarbage'
    /opt/rocks/distro/install/share/lua/5.1/torch/Tester.lua:187: in function 'run'
    /opt/rocks/distro/install/share/lua/5.1/cutorch/test.lua:1555: in function 'f'
    [string "local f = function() return cutorch.test() en..."]:1: in main chunk
    [C]: in function 'xpcall'
    /opt/rocks/distro/install/share/lua/5.1/trepl/init.lua:620: in function 'repl'
    ...cks/distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
    [C]: at 0x00406170
Segmentation fault (core dumped)

build error with rocks

I get the following error when trying to install cutorch. How to fix it?

torch-rocks install cutorch

Installing https://raw.github.com/torch/rocks/master/cutorch-scm-1.rockspec...
Using https://raw.github.com/torch/rocks/master/cutorch-scm-1.rockspec... switching to 'build' mode
Cloning into 'cutorch'...
remote: Counting objects: 51, done.
remote: Compressing objects: 100% (48/48), done.
remote: Total 51 (delta 2), reused 30 (delta 1)
Receiving objects: 100% (51/51), 65.13 KiB | 0 bytes/s, done.
Resolving deltas: 100% (2/2), done.
Checking connectivity... done.
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/usr/local/bin/.." -DCMAKE_INSTALL_PREFIX="/usr/local/lib/torchrocks/rocks/cutorch/scm-1" && make

-- The C compiler identification is GNU 4.8.2
-- The CXX compiler identification is GNU 4.8.2
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Torch7 in /usr/local
-- Found CUDA: /usr/local/cuda (Required is at least version "4.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_cutorch-scm-1-347/cutorch/build
[ 9%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir//./THC_generated_THC.cu.o
/tmp/luarocks_cutorch-scm-1-347/cutorch/lib/THC/THCTensor.h(31): error: identifier "cudaTextureObject_t" is undefined

/tmp/luarocks_cutorch-scm-1-347/cutorch/lib/THC/THCTensor.cu(3): error: identifier "cudaTextureObject_t" is undefined

/tmp/luarocks_cutorch-scm-1-347/cutorch/lib/THC/THCTensor.cu(5): error: identifier "cudaTextureObject_t" is undefined

/tmp/luarocks_cutorch-scm-1-347/cutorch/lib/THC/THCTensor.cu(6): error: incomplete type is not allowed

/tmp/luarocks_cutorch-scm-1-347/cutorch/lib/THC/THCTensor.cu(8): error: identifier "cudaResourceTypeLinear" is undefined

/tmp/luarocks_cutorch-scm-1-347/cutorch/lib/THC/THCTensor.cu(13): error: incomplete type is not allowed

/tmp/luarocks_cutorch-scm-1-347/cutorch/lib/THC/THCTensor.cu(15): error: identifier "cudaCreateTextureObject" is undefined

7 errors detected in the compilation of "/tmp/tmpxft_0000234e_00000000-4_THC.cpp1.ii".
CMake Error at THC_generated_THC.cu.o.cmake:262 (message):
Error generating file
/tmp/luarocks_cutorch-scm-1-347/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THC.cu.o

make[2]: *** [lib/THC/CMakeFiles/THC.dir/./THC_generated_THC.cu.o] Error 1
make[1]: *** [lib/THC/CMakeFiles/THC.dir/all] Error 2
make: *** [all] Error 2

Error: Build error: Failed building.

Troubles with comparison operator

There are some troubles with nan, -inf and +inf.

Basically, I would expect

print(torch.CudaTensor{0/0}:max())

to return nan instead of -inf as the following instruction does

print(torch.DoubleTensor{0/0}:max())

Here you have my testing script. Furthermore, the last two examples, b and c, showcase a bug also in case of Double.

C++ exception with nn.Tanh layer

I managed to install cutorch with OS X 10.9.4, CUDA 6.5 and cmake 3.0.1, with the small changes described in ticket #27.

However, I now have some problems running the forward method when my neural network includes a nn.Tanh layer. It works with a nn.Linear layer, but if I add the hyperbolic tangent I then get a C++ Exception.

Is there a way to see more details about what was the root cause of the exception? In the "th" interactive tool I don´t get further details, just a "C++ Exception" error message.

Did this happen to anyone else?

Thanks

Update - I now realise that I should probably have opened this ticket in the "cunn" repository instead. Going to ask it there. Feel free to close this ticket here if it's not the right place.

CUDA 7 runtime compilation integration

Has anyone looked into libnvrtc already? It would be awesome to have functions to which you give kernel strings, for example cutorch apply. I do it with opencl now, but it of course lacks the cutorch power.

Annoying non-existent dependency file warnings

I've got these annoying warnings on build:

CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THStorage.h


CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THStorageCopy.h


CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensor.h


CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorCopy.h


CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorRandom.h


CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorMath.h


CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorConv.h


CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorLapack.h

I comment out the line in make2cmake file, but is it possible to do something better with it? These warnings show up whenever you build something with THC.

(Sorry accidentally posted an empty issue in the beginning)

New operations

Hi guys,

I see a few new operations on CudaTensors were recently implemented, such as indexCopy, which is really great. I noticed they could be easily extended by adding very little code to things like indexAdd (example use case: summing up labelled points to compute the mean).

Of course there's no standard torch tensor equivalent, but I could implement them both.

Would torch be open to include things like this or prefer I add them to a separate library?

Other operations I've missed (and implemented separately) are things like element-wise min/max/clamp.

Cheers,
Oliver

cublas fails for sgemv with zero strides

th> require 'cutorch'
{
  seed : function: 0x41863958
  getDeviceCount : function: 0x418636d0
  getDeviceProperties : function: 0x41863838
  deviceReset : function: 0x41863758
  test : function: 0x40f6e8b8
  getDevice : function: 0x41863708
  synchronize : function: 0x418636a8
  manualSeed : function: 0x418639a8
  initialSeed : function: 0x41863980
  setDevice : function: 0x418637a8
}
                                                                      [3.4736s] 
th> torch.CudaTensor(2):addmv(torch.CudaTensor(2,1), torch.CudaTensor(1))
 ** On entry to SGEMV  parameter number 6 had an illegal value
0
0
[torch.CudaTensor of dimension 2]

torch.{xxx} doesn't work with CudaTensor

you can do
x=torch.CudaTensor(10)

x:norm()
but you cant do

torch.norm(x)

reshape, view, resize differences

Could someone please clear the differences between THCudaTensor_resize(n)d and :view, :reshape? In which cases resize doesn't copy memory, and is there an equivalent of view in TH? Does reshape call resize? For example this is surprising:

THCudaTensor_resize2d(tensor, oH, oW);
THCudaTensor_resize3d(tensor, 1, oH, oW); // no copy
THCudaTensor_resize2d(tensor, 1, oH*oW); // copy!

I've tried looking into TH and THC source code but it is still unclear to me. Thanks!

sum operation over dimension dim doesn't work on cudatensor of dimension larger than 4

use case:

m=torch.CudaTensor(2,3,4,5,6):fill(1)
reduce = torch.sum(m,dim) --fails [torch.CudaTensor with no dimension]

where dim can be any integer in [1..5].

reduce=torch.sum(m) --/does/ work

Cloning an empty CudaTensor results in a segfault

Torch 7.0 Copyright (C) 2001-2011 Idiap, NEC Labs, NYU
Lua 5.1 Copyright (C) 1994-2008 Lua.org, PUC-Rio
t7> require 'cutorch'
t7> x = torch.FloatTensor(5):clone() -- ok
t7> x = torch.FloatTensor(0):clone() -- ok
t7> x = torch.CudaTensor(5):clone() -- ok
t7> x = torch.CudaTensor(0):clone() -- results in a segfault:
Segmentation fault (core dumped)

torch.cmul inconsistent behavior, mutates first argument?

Maybe im missing something, but in vanilla Torch z=torch.cmul(x,y) does not mutate x, while in cutorch it does:

require 'cutorch'

cuda = true

local x = torch.Tensor(4):fill(1)
local y = torch.Tensor(4):zero()

print(x)
if cuda then
    x = x:cuda()
    y = y:cuda()
end
local z = torch.cmul(x, y)
print(x)
>>>

 1
 1
 1
 1
[torch.FloatTensor of dimension 4]

0
0
0
0
[torch.CudaTensor of dimension 4]

cutorch var() and std() don't allow dimension argument

Incompatibility with torch. To reproduce:

r = torch.rand(8,2)
print(r:var(1))  -- OK
print(r:cuda():var(1))    -- not OK
print(r:cuda():std(1))    -- not OK
print(r:cuda():mean(1))  -- OK!

Repeated calls to uniform() produce non-uniform output

Repeated calls to :uniform(0, 1) produce non-uniform output. This only happens with CudaTensors and only across many calls to uniform, filling a large tensor with a single call seems okay.

-- this produces uniform output
-- local block = 10000

-- this does not
local block = 1

local r = torch.zeros(block):cuda()
for j = 1, 10000/block do
    r:uniform(0, 1)
    -- this is also broken
    -- r:rand(r:size())

    for i = 1, r:size(1) do
        print(r[i])
    end
end

Very many warnings

Every time I install cutorch and cunn I get the following warnings

CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THStorage.h

CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THStorageCopy.h

CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensor.h

CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorCopy.h

CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorRandom.h

CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorMath.h

CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorConv.h

CMake Warning at /usr/share/cmake-2.8/Modules/FindCUDA/make2cmake.cmake:66 (message):
   Removing non-existent dependency file: generic/THTensorLapack.h

Can we simply remove this line 66 instead?
I am not even sure where this file comes from...

sum is buggy

There were never unit tests for sum. I just added them, and it doesn't work for anything of a good size (like 600,600 for example).

copy gives wrong results with non-contiguous tensors

need to add makeContiguous at places!

a=torch.randn(2,2,2)
print(a)
(1,.,.) =
-1.7695 -1.1897
-1.1269 -1.1290

(2,.,.) =
0.2287 -1.0936
-0.1886 1.5330
[torch.DoubleTensor of dimension 2x2x2]
b=a.new(2,2):copy(a[{{},{},{2}}])
print(b)
-1.1897 -1.1290
-1.0936 1.5330
[torch.DoubleTensor of dimension 2x2]

a=a:cuda()
b=a.new(2,2):copy(a[{{},{},{2}}])
print(b)
-1.1897 -1.1269
-1.0936 -0.1886
[torch.CudaTensor of dimension 2x2]
WRONG ANSWER!!!
b=a.new(2,2):copy(a[{{},{},{2}}]:contiguous())
print(b)
-1.1897 -1.1290
-1.0936 1.5330
[torch.CudaTensor of dimension 2x2]
CORRECT ANSWER!!!

conv2 wraps SpatialConvolutionMM ops maybe?

@fmassa pointed out that it would be a good idea to wrap the convolutions using Toeplitz matrix as cutorch's conv2 op.

An issue to track that.

remove synchronous mem copies everywhere

@clementfarabet pointed out that there's synchronous memory copies sprinkled everywhere, that can be a pain when doing Multi-GPU. In the short term (before our work gets open-sourced with the fancy copy kernels), a fast plug would be to fix all cudaMemCpy calls to cudaMemCpyAsync

http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__MEMORY_g732efed5ab5cb184c920a21eb36e8ce4.html

Segmentation fault when cutorch.deviceReset() was called earlier

The code below produces a segmentation fault when deviceReset is called. When it is not called, it seems to work.

a.lua

require 'cutorch'

cutorch.deviceReset()
cutorch.setDevice(1)
cutorch.manualSeed(1)

print('Here1')
local a = torch.CudaTensor(1,106):fill(1)
local b = torch.CudaTensor(106,2):fill(3)
local c = torch.CudaTensor(1,2):fill(7)
print('Here2')
c:addmm(a,b)
print('Here3')

print('Here1')
local a = torch.CudaTensor(1,10006):fill(1)
local b = torch.CudaTensor(10006,2):fill(3)
local c = torch.CudaTensor(1,2):fill(7)
print('Here2')
c:addmm(a,b)
print('Here3')

Output:

# CUDA_VISIBLE_DEVICES="1" th a.lua 
Here1   
Here2   
Here3   
Here1   
Here2   
Segmentation fault (core dumped)

Output without the deviceReset:

# CUDA_VISIBLE_DEVICES="1" th a.lua 
Here1   
Here2   
Here3   
Here1   
Here2   
Here3

illegal memory access asserted

Two weeks ago, I installed cudnn and changed convolution operations from cunn to cudnn.
while the first time I typped require 'cutorch' in th there comes message like:

th> require 'cudnn'
Error in CuDNN. Status Code:    6   
true

Well, till this morning my code keeps running even with 6 status,
however, when I start new code this evening, this errrors assert at test() function

The code errors with

Error in CuDNN. Status Code:       7
Error in CuDNN. Status Code:       8

@soumith , I've checked the torch7 group, someone met something like this before,
but I still have no idea how to handle this, would you help me figure out this?
Also welcome for help from who's currently using this lib.

inconsistent API against TH

The following function signatures are inconsistent between the TH API and the THC API:

cadd (in THC it is split into cadd, cadd_tst)
addcmul (simple fix of adding a result tensor)
addcdiv (simple fix of adding a result tensor)
min (THC doesn't have the indices part)
max (THC doesn't have the indices part)

Using RNG with CUDA default type leads to error

luajit -lcutorch
torch.setdefaulttensortype('torch.CudaTensor')
a=torch.randn(10)

leads to:

input = torch.randn(ninput)
[string "input = torch.randn(ninput)..."]:1: internal error: the default tensor type does not seem to be an actual tensor

Cannot run multiple CUDA scripts simutaneously

After I updated to the most recent version of cutorch, I find I can no longer run multiple torch7 CUDA scripts in parallel ( I use setDevice to assign gpus to different scripts) . The first script will run normally, but the second and later scripts will fail at the " require 'cutorch' line" with error message "unable to initialize cublas"

/home/jzjz/bin/luajit: unable to initialize cublas
stack traceback:
[C]: at 0x2aaaaea338f0
[C]: in function 'require'
/home/jzjz/share/lua/5.1/cutorch/init.lua:2: in main chunk

non-contiguous copy still buggy

Test case:

require 'cutorch'

val = 1
ps = torch.LongStorage({4, 4, 4})
cube = torch.Tensor(ps):apply(function()
          val = val + 1
          return val
         end):cuda()

ps = torch.LongStorage({4, 12})
x = torch.CudaTensor(ps):fill(-1)

l = 2
h = 1
w = 2

print(cube)
print(cube[l][{{h,h+2},{w,w+2}}])

x[{{1},{1,9}}]:copy(cube[l][{{h,h+2},{w,w+2}}]:contiguous())
print(x)

x[{{1},{1,9}}]:copy(cube[l][{{h,h+2},{w,w+2}}])
print(x)

x[{{1,1},{1,9}}]:copy(cube[l][{{h,h+2},{w,w+2}}])
print(x)


x[{1,{1,9}}]:copy(cube[l][{{h,h+2},{w,w+2}}])
print(x)

Output:

(1,.,.) =
   2   3   4   5
   6   7   8   9
  10  11  12  13
  14  15  16  17

(2,.,.) =
  18  19  20  21
  22  23  24  25
  26  27  28  29
  30  31  32  33

(3,.,.) =
  34  35  36  37
  38  39  40  41
  42  43  44  45
  46  47  48  49

(4,.,.) =
  50  51  52  53
  54  55  56  57
  58  59  60  61
  62  63  64  65
[torch.CudaTensor of dimension 4x4x4]

 19  20  21
 23  24  25
 27  28  29
[torch.CudaTensor of dimension 3x3]

 19  20  21  23  24  25  27  28  29  -1  -1  -1
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
[torch.CudaTensor of dimension 4x12]

 19  20  21  22  23  24  25  26  27  -1  -1  -1
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
[torch.CudaTensor of dimension 4x12]

 19  20  21  22  23  24  25  26  27  -1  -1  -1
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
[torch.CudaTensor of dimension 4x12]

 19  20  21  22  23  24  25  26  27  -1  -1  -1
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
 -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
[torch.CudaTensor of dimension 4x12]

cutorch fails seeding on deviceReset

luajit -lcutorch -e "cutorch.deviceReset(1); cutorch.setDevice(1); cutorch.manualSeed(1)"

luajit: (command line):1: Creating MTGP kernel state failed.
stack traceback:
[C]: in function 'manualSeed'
(command line):1: in main chunk
[C]: at 0x00406170

@dominikgrewe any idea what's going on?
I can consistently reproduce this in luajit (both interpreter and -e),
however when I drop to the th shell in interpreted mode, I dont see the error, but very weird behavior

$ luajit
LuaJIT 2.1.0-alpha -- Copyright (C) 2005-2014 Mike Pall. http://luajit.org/
th> require 'cutorch'
th> cutorch.deviceReset(1); cutorch.setDevice(1); cutorch.manualSeed(1)
stdin:1: Creating MTGP kernel state failed.
stack traceback:
[C]: in function 'manualSeed'
stdin:1: in main chunk
[C]: at 0x00406170

$th
th> require 'cutorch';
[0.3256s]
th> cutorch.deviceReset(1); cutorch.setDevice(1); cutorch.manualSeed(1); cutorch.test()
WEIRD BEHAVIOR (see for yourself)