
docker-torch-rnn's People

Contributors

crisbal, lord-alfred

docker-torch-rnn's Issues

Noob Question: How do I use this with my own files?

Hi,
I'm using the Docker image and it works great with the example Shakespeare text.
I'm trying to run:
docker run -it -v ~/data2:/data2 crisbal/torch-rnn:base bash
to mount my data folder and use it, but it doesn't show up. Mounting works with other Docker images, so are there any permissions I need to change? Or is there another way to run it with my own data?
I'm new to this and would appreciate any help. Thank you!
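A likely explanation, assuming the container's shell starts in /root/torch-rnn (as the root@...:~/torch-rnn# prompts in other issues on this page suggest): with -v ~/data2:/data2 the folder is mounted at /data2 at the container root, not under ~/torch-rnn/data, so it is easy to miss. The sketch below mounts the host folder inside the torch-rnn checkout instead and preprocesses a file from it. The helper name preprocess_host_corpus, the mydata mount point, and the DOCKER override (set DOCKER=echo to print the command instead of running it) are hypothetical conveniences, not part of the image.

```shell
# Hypothetical helper: preprocess a text file that lives in a host folder.
# Usage: preprocess_host_corpus <absolute_host_dir> <file.txt>
# Set DOCKER=echo to print the docker command instead of running it.
preprocess_host_corpus() {
  local host_dir="$1" corpus="$2"
  local base="${corpus%.txt}"   # output names reuse the file name without .txt
  # Mount the host folder inside the torch-rnn checkout (mount point is an
  # illustrative choice) and start in /root/torch-rnn so relative paths work.
  ${DOCKER:-docker} run --rm -it \
    -v "$host_dir":/root/torch-rnn/mydata \
    -w /root/torch-rnn \
    crisbal/torch-rnn:base \
    python scripts/preprocess.py \
      --input_txt "mydata/$corpus" \
      --output_h5 "mydata/$base.h5" \
      --output_json "mydata/$base.json"
}
```

For example, preprocess_host_corpus "$HOME/data2" my_corpus.txt should leave my_corpus.h5 and my_corpus.json in ~/data2 on the host, because the folder is a bind mount. Docker expects an absolute host path for -v.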

train.lua cannot find cutorch

When I try to run training in the base image, I get the following error:

/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/trepl/init.lua:384: module 'cutorch' not found:No LuaRocks module found for cutorch
no field package.preload['cutorch']
no file '/root/.luarocks/share/lua/5.1/cutorch.lua'
no file '/root/.luarocks/share/lua/5.1/cutorch/init.lua'
no file '/root/torch/install/share/lua/5.1/cutorch.lua'
no file '/root/torch/install/share/lua/5.1/cutorch/init.lua'
no file './cutorch.lua'
no file '/root/torch/install/share/luajit-2.1.0-beta1/cutorch.lua'
no file '/usr/local/share/lua/5.1/cutorch.lua'
no file '/usr/local/share/lua/5.1/cutorch/init.lua'
no file '/root/.luarocks/lib/lua/5.1/cutorch.so'
no file '/root/torch/install/lib/lua/5.1/cutorch.so'
no file '/root/torch/install/lib/cutorch.so'
no file './cutorch.so'
no file '/usr/local/lib/lua/5.1/cutorch.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
train.lua:55: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
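The trace shows train.lua failing on require 'cutorch', which the base image does not appear to include. In torch-rnn, train.lua only loads cutorch when its -gpu option is 0 or higher with the CUDA backend, and -gpu defaults to 0, so passing -gpu -1 should select CPU mode. A minimal sketch, where the helper name train_on_cpu and the DOCKER override are hypothetical:

```shell
# Hypothetical helper: train on the CPU inside the base (non-CUDA) image.
# -gpu -1 tells torch-rnn's train.lua not to load cutorch/cunn.
# Set DOCKER=echo to print the docker command instead of running it.
train_on_cpu() {
  local h5="$1" json="$2"
  ${DOCKER:-docker} run --rm -it -w /root/torch-rnn crisbal/torch-rnn:base \
    th train.lua -input_h5 "$h5" -input_json "$json" -gpu -1
}
```

Usage: train_on_cpu data/tiny-shakespeare.h5 data/tiny-shakespeare.json. From a shell already inside the container, the equivalent is simply adding -gpu -1 to the th train.lua command.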

'THCudaCheck FAIL' Using Cuda7.5 Docker Image

After installing nvidia-docker and starting the torch-rnn container with:

nvidia-docker run --rm -ti crisbal/torch-rnn:cuda7.5 bash

and preprocessing with:

root@3da15ad69af8:~/torch-rnn# python scripts/preprocess.py --input_txt data/library.txt --output_h5 data/library.h5 --output_json data/library.json

attempting to train the model results in the following:

root@3da15ad69af8:~/torch-rnn# th train.lua -input_h5 data/library.h5 -input_json data/library.json
Running with CUDA on GPU 0
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-9234/cutorch/lib/THC/THCGeneral.c line=608 error=8 : invalid device function
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/nn/Container.lua:67:
In 2 module of nn.Sequential:
./LSTM.lua:128: cuda runtime error (8) : invalid device function at /tmp/luarocks_cutorch-scm-1-9234/cutorch/lib/THC/THCGeneral.c:608
stack traceback:
[C]: in function 'resize'
./LSTM.lua:128: in function <./LSTM.lua:118>
[C]: in function 'xpcall'
/root/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:130: in function 'opfunc'
/root/torch/install/share/lua/5.1/optim/adam.lua:33: in function 'adam'
train.lua:187: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:130: in function 'opfunc'
/root/torch/install/share/lua/5.1/optim/adam.lua:33: in function 'adam'
train.lua:187: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

How to view training progress when running the container in detached mode?

I currently start training by entering the container with docker exec -it <container_name> bash, launching the training script, and then pressing Ctrl+P followed by Ctrl+Q to detach.

docker logs <container_name> only shows the log once training has finished, so I can't track training progress while it runs, which I need in order to plan other tasks on the machine.

Is there a way to inspect training progress with Docker?

Separately, it would be great to have the time-per-batch printout that the char-rnn repo provides.
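docker logs only captures the output of the container's main process; a script launched from a docker exec shell writes to that exec session instead, so its progress never reaches the container log. One approach, sketched under assumptions: run train.lua as the container's main command in detached mode and follow the log with docker logs -f. The -t flag is included on the assumption that stdout is block-buffered when it is not attached to a terminal, which would explain output only appearing at the end. The helper names and the DOCKER override are hypothetical.

```shell
# Hypothetical helpers: run training as the container's main process so that
# `docker logs` sees its output, then stream the log while it runs.
# Set DOCKER=echo to print the docker commands instead of running them.
start_training_detached() {
  local name="$1"; shift
  # -d detaches immediately; -t allocates a pseudo-TTY, which usually makes
  # stdout line-buffered (assumption: buffering is what delays the output).
  ${DOCKER:-docker} run -d -t --name "$name" -w /root/torch-rnn \
    crisbal/torch-rnn:base th train.lua "$@"
}

follow_training() {
  # Streams new log lines; Ctrl+C stops following but leaves training running.
  ${DOCKER:-docker} logs -f "$1"
}
```

For example, start_training_detached rnn1 -input_h5 data/tiny-shakespeare.h5 -input_json data/tiny-shakespeare.json, then follow_training rnn1. The sketch assumes the .h5/.json files already exist inside the image; training on host data would need an extra -v option.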

train.lua stops printing information when the input file is larger than 500MB?

Hi, my input file is very large, about 500MB, and train.lua stops printing information after running for a while. The last output it prints is:

Epoch 1.02 / 50, i = 994 / 3145800, loss = 4.990288
Epoch 1.02 / 50, i = 995 / 3145800, loss = 5.104537
Epoch 1.02 / 50, i = 996 / 3145800, loss = 4.961758
Epoch 1.02 / 50, i = 997 / 3145800, loss = 4.969568
Epoch 1.02 / 50, i = 998 / 3145800, loss = 5.046015
Epoch 1.02 / 50, i = 999 / 3145800, loss = 4.955519
Epoch 1.02 / 50, i = 1000 / 3145800, loss = 4.886581

and the process is running at 100% CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31256 root 20 0 2333920 2.0g 1916 R 108.0 54.5 172:40.70 luajit

Is this normal? Should I just wait?
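One likely explanation, assuming the image uses torch-rnn's default options: -checkpoint_every defaults to 1000, and at each checkpoint train.lua computes the loss over the entire validation split before it prints val_loss and continues. preprocess.py reserves 10% of the input for validation by default (--val_frac 0.1), so a ~500MB corpus gives roughly 50MB of validation text, which could keep a CPU-only luajit process busy for a long time right after i = 1000 with no new output. A sketch of two mitigations, run from ~/torch-rnn inside the container: shrink the held-out splits and checkpoint less often. The helper names, the 0.01 fractions, and the 10000 interval are illustrative values; PYTHON and TH overrides exist only so the commands can be printed.

```shell
# Hypothetical helpers, run inside the container from ~/torch-rnn.
# Set PYTHON=echo / TH=echo to print the commands instead of running them.
preprocess_small_val() {
  local corpus="$1" base="${1%.txt}"
  # Hold out 1% each for validation and test instead of the default 10%.
  ${PYTHON:-python} scripts/preprocess.py --input_txt "$corpus" \
    --output_h5 "$base.h5" --output_json "$base.json" \
    --val_frac 0.01 --test_frac 0.01
}

train_sparse_checkpoints() {
  local base="$1"; shift
  # Validate and checkpoint every 10000 iterations instead of every 1000.
  ${TH:-th} train.lua -input_h5 "$base.h5" -input_json "$base.json" \
    -checkpoint_every 10000 "$@"
}
```

For example, preprocess_small_val data/big.txt followed by train_sparse_checkpoints data/big -gpu -1. If the process is still at 100% CPU, waiting for the val_loss line is also a reasonable first check.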

Error when training

I just tested your Docker image (I'm running Docker 1.12.1), and when executing your example th train.lua -input_h5 data/tiny-shakespeare.h5 -input_json data/tiny-shakespeare.json I got this error:

/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/trepl/init.lua:384: module 'cutorch' not found:No LuaRocks module found for cutorch
    no field package.preload['cutorch']
    no file '/root/.luarocks/share/lua/5.1/cutorch.lua'
    no file '/root/.luarocks/share/lua/5.1/cutorch/init.lua'
    no file '/root/torch/install/share/lua/5.1/cutorch.lua'
    no file '/root/torch/install/share/lua/5.1/cutorch/init.lua'
    no file './cutorch.lua'
    no file '/root/torch/install/share/luajit-2.1.0-beta1/cutorch.lua'
    no file '/usr/local/share/lua/5.1/cutorch.lua'
    no file '/usr/local/share/lua/5.1/cutorch/init.lua'
    no file '/root/.luarocks/lib/lua/5.1/cutorch.so'
    no file '/root/torch/install/lib/lua/5.1/cutorch.so'
    no file '/root/torch/install/lib/cutorch.so'
    no file './cutorch.so'
    no file '/usr/local/lib/lua/5.1/cutorch.so'
    no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
    [C]: in function 'error'
    /root/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
    train.lua:55: in main chunk
    [C]: in function 'dofile'
    /root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

Manual Manipulation of Model Weights

Hi all!

I am wondering if there is an easy way to read the parameter weights from the .t7 files saved during checkpointing. I want to perform some purely mathematical operations on the weights for a research project, but I'm not sure how to access them in raw form. Please let me know!

-Matt
