facebookresearch / fairseq-lua
Facebook AI Research Sequence-to-Sequence Toolkit
License: Other
Any plans for a pre-trained model for English-to-Chinese translation?
I just read about it and I am interested in using it. I am curious to know whether it can be used on other seq2seq problems that RNNs (e.g. LSTM or GRU cells) are able to solve,
e.g. image captioning or document summarization.
Thank you, and amazing work!
Looking forward to your positive response.
Regards,
Is it possible to continue training from the pre-trained models you have released?
My understanding is that Torch needs both a model file and a state file in order to continue a training run, but you've only released model files. Is that right? Thanks!
Use case: domain adaptation experiments where your pre-trained models are used for initialization.
Hello, I am trying to use fairseq and I ran into this issue. Please advise:
micheles-mba:fairseq.git micheles$ luarocks make rocks/fairseq-scm-1.rockspec --local
Error: File not found: rocks/fairseq-scm-1.rockspec
micheles-mba:fairseq.git micheles$ ls
branches trunk
fairseq.git wmt14.en-fr.fconv-cuda
micheles-mba:fairseq.git micheles$
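(For reference, a minimal sketch of what a working invocation assumes; the error above suggests the command was run from a directory that does not contain the rocks/ folder of the cloned source tree:)
cd fairseq                                    # the cloned repository root, not its parent
ls rocks                                      # fairseq-scm-1.rockspec should be listed here
luarocks make rocks/fairseq-scm-1.rockspec --local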
Hi,
I am trying to train the fconv model for a summarization task. I get the following error when doing so. Can someone kindly let me know what the issue is?
root@c4cf4e23aba9:~/torch/fairseq# fairseq train -sourcelang comments -targetlang summaries -datadir data-bin/questions -model fconv -nenclayer 4 -nlayer 4 -dropout 0.2 -optim sgd -lr 0.9 -clip 0.5 -momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconv
| [summaries] Dictionary: 55314 types
| [comments] Dictionary: 129040 types
| IndexedDataset: loaded data-bin/questions with 77000 examples
| IndexedDataset: loaded data-bin/questions with 1600 examples
| IndexedDataset: loaded data-bin/questions with 110 examples
| IndexedDataset: loaded data-bin/questions with 1600 examples
| IndexedDataset: loaded data-bin/questions with 110 examples
*** Error in `/root/torch/install/bin/luajit': realloc(): invalid next size: 0x00007fb6fd7719f0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fb9464cc7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x82a5a)[0x7fb9464d7a5a]
/lib/x86_64-linux-gnu/libc.so.6(realloc+0x179)[0x7fb9464d8c89]
/root/torch/install/lib/libTH.so.0(THRealloc+0x3a)[0x7fb9454bde8a]
/root/torch/install/lib/libTH.so.0(THByteStorage_resize+0x33)[0x7fb9454bfe33]
/root/torch/install/lib/libTH.so.0(THByteTensor_newWithTensor+0x62)[0x7fb9454dcef2]
/root/torch/install/lib/lua/5.1/libtorch.so(+0x5b8af)[0x7fb945c858af]
/root/torch/install/bin/luajit[0x47dbaa]
/root/torch/install/lib/libluaT.so.0(+0x2aa6)[0x7fb945a22aa6]
/root/torch/install/bin/luajit[0x47dbaa]
/root/torch/install/bin/luajit[0x44068e]
/root/torch/install/bin/luajit[0x47df19]
/root/torch/install/bin/luajit(lua_close+0x90)[0x46d800]
/root/torch/install/lib/libthreadsmain.so(THThread_main+0x57)[0x7fb92b08f997]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fb946a3b6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb94655b82d]
======= Memory map: ========
00400000-0049a000 r-xp 00000000 00:2d 139896 /root/torch/install/bin/luajit
005d0000-00640000 r-xp 00000000 00:00 0
00699000-0069a000 r--p 00099000 00:2d 139896 /root/torch/install/bin/luajit
0069a000-0069b000 rw-p 0009a000 00:2d 139896 /root/torch/install/bin/luajit
01420000-01470000 r-xp 00000000 00:00 0
01aaf000-0bcdf000 rw-p 00000000 00:00 0 [heap]
0bce0000-0bd10000 r-xp 00000000 00:00 0
I did the following in a Docker container with Torch installed:
# Install CUDA libraries
RUN luarocks install torch && \
    luarocks install cutorch && \
    luarocks install cunn && \
    luarocks install cudnn
root@cf9f9a5a3f42:~/fairseq# fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 5
Tried loading libnccl.so.1 but got error /root/torch/install/share/lua/5.1/nccl/ffi.lua:192: libnccl.so.1: cannot open shared object file: No such file or directory
Tried loading libnccl.1.dylib but got error /root/torch/install/share/lua/5.1/nccl/ffi.lua:192: libnccl.1.dylib: cannot open shared object file: No such file or directory
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/rnnlib/cudnnutils.lua:13: You must have the torch Cudnn bindings.
stack traceback:
[C]: in function 'assert'
/root/torch/install/share/lua/5.1/rnnlib/cudnnutils.lua:13: in main chunk
[C]: in function 'require'
...orch/install/share/lua/5.1/rnnlib/nn/WrappedCudnnRnn.lua:14: in main chunk
[C]: in function 'require'
/root/torch/install/share/lua/5.1/rnnlib/init.lua:18: in main chunk
[C]: in function 'require'
...h/install/share/lua/5.1/fairseq/models/avgpool_model.lua:18: in main chunk
[C]: in function 'require'
/root/torch/install/share/lua/5.1/fairseq/models/init.lua:15: in main chunk
[C]: in function 'require'
/root/torch/install/share/lua/5.1/fairseq/init.lua:14: in main chunk
[C]: in function 'require'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:15: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00405d50
root@cf9f9a5a3f42:~/fairseq#
I have cuDNN installed:
root@cf9f9a5a3f42:~# luarocks install cudnn
Installing https://raw.githubusercontent.com/torch/rocks/master/cudnn-scm-1.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/cudnn-scm-1.rockspec... switching to 'build' mode
Cloning into 'cudnn.torch'...
remote: Counting objects: 60, done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 60 (delta 15), reused 16 (delta 3), pack-reused 0
Receiving objects: 100% (60/60), 67.93 KiB | 0 bytes/s, done.
Resolving deltas: 100% (15/15), done.
Checking connectivity... done.
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/root/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/root/torch/install/lib/luarocks/rocks/cudnn/scm-1" && make
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Torch7 in /root/torch/install
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found suitable version "8.0", minimum required is "7.0")
-- Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
-- CuDNN 5.1 not found at install-time. Please make sure it's in LD_LIBRARY_PATH at runtime
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_cudnn-scm-1-8695/cudnn.torch/build
cd build && make install
Install the project...
-- Install configuration: "Release"
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialCrossEntropyCriterion.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialConvolution.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialBatchNormalization.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/BLSTM.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/LSTM.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/TemporalConvolution.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/Pointwise.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/convert.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/ReLU.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/find.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialLogSoftMax.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialCrossMapLRN.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/Pooling.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/ClippedReLU.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/ffi.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/LogSoftMax.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialMaxPooling.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/Sigmoid.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialAveragePooling.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricLogSoftMax.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricBatchNormalization.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricMaxPooling.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/RNN.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricCrossEntropyCriterion.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/BGRU.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialDivisiveNormalization.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/RNNTanh.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/BatchNormalization.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SoftMax.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/env.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/functional.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialFullConvolution.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/Tanh.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/Pooling3D.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/RNNReLU.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricFullConvolution.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricConvolution.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/init.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialSoftMax.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricSoftMax.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricAveragePooling.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/GRU.lua
Updating manifest for /root/torch/install/lib/luarocks/rocks
cudnn scm-1 is now built and installed in /root/torch/install/ (license: BSD)
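The install log above contains the likely hint: "CuDNN 5.1 not found at install-time. Please make sure it's in LD_LIBRARY_PATH at runtime." A minimal sketch, assuming libcudnn.so.5 was unpacked under /usr/local/cuda/lib64 (the path is an assumption):
ls /usr/local/cuda/lib64/libcudnn.so.5        # verify the library actually exists here
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 5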
Is it possible to provide a Caffe2 model along with the Torch one?
Hi Jonas, just a curiosity: what are the correct formats and steps to test new models in other languages? Please advise. Thank you.
First I followed http://torch.ch/docs/getting-started.html, ran luarocks make rocks/fairseq-scm-1.rockspec,
and applied facebookresearch/fairseq#24.
Then I ran train.lua in ZeroBrane Studio, but got an error at the line require 'fairseq'
(https://github.com/facebookresearch/fairseq/blob/master/train.lua#L17):
Tried loading libnccl.1.dylib but got error /home/gt/torch/install/share/lua/5.1/nccl/ffi.lua:192: libnccl.1.dylib: cannot open shared object file: No such file or directory
Tried loading libnccl.so.1 but got error /home/gt/torch/install/share/lua/5.1/nccl/ffi.lua:192: libnccl.so.1: cannot open shared object file: No such file or directory
The program continued to run until:
| IndexedDataset: loaded /home/gt/fairseq/data-bin/iwslt14.tokenized.de-en with 6750 examples
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:281: attempt to index field 'nccl' (a nil value)
stack traceback:
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua: in function 'doTrain'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:189: in function 'train'
train.lua:404: in main chunk
Are there plans to add a PyTorch implementation of "Convolutional Sequence to Sequence Learning" to this repo?
Is this architecture compatible with zero shot translation?
If yes: Are there plans to implement it?
Hi,
is it possible to give a list of tensors as input to the fairseq Torch code?
E.g. input -> list of tensors,
output -> list of tensors.
Will I be able to accomplish the above task using fairseq? If so, can you please point me to an example?
Thanks
Add an assert that checks whether T is empty and provides a sensible error message. Currently, this produces a trace as described in #46. An additional check could be added for pre-processing.
Data: downloaded the De-En pair from opensubtitles.org. About 13M sentences.
Train: same as provided in the README.md:
fairseq train -sourcelang de -targetlang en -datadir data-bin/iwslt14.tokenized.de-en
-model fconv -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1
-momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconv
Machine: Amazon EC2 instance with 8 GPUs, 12 GB memory each.
Error: memory error.
Which parameters can I change without losing accuracy?
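As a sketch of the usual first knob (an assumption, not tuned advice): the -batchsize flag that appears in other commands in these threads reduces peak GPU memory roughly in proportion, usually at the cost of training speed rather than final accuracy:
fairseq train -sourcelang de -targetlang en -datadir data-bin/iwslt14.tokenized.de-en \
  -model fconv -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1 \
  -momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconv -batchsize 16   # smaller batches, lower peak memory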
When I use:
fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 5
to decode the example model, I hit the following problem:
| [target] Dictionary: 44666 types
| [source] Dictionary: 44409 types
hello
/usr/local/torch/install/bin/luajit: /usr/local/torch/install/share/lua/5.1/nn/Container.lua:67:
In 3 module of nn.Sequential:
In 2 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 2 module of nn.Sequential:
In 2 module of nn.Sequential:
...local/torch/install/share/lua/5.1/nn/GatedLinearUnit.lua:10: attempt to call field 'GatedLinear_updateOutput' (a nil value)
stack traceback:
...local/torch/install/share/lua/5.1/nn/GatedLinearUnit.lua:10: in function <...local/torch/install/share/lua/5.1/nn/GatedLinearUnit.lua:8>
[C]: in function 'xpcall'
/usr/local/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/usr/local/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </usr/local/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
/usr/local/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/usr/local/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </usr/local/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
/usr/local/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/usr/local/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function </usr/local/torch/install/share/lua/5.1/nn/ConcatTable.lua:9>
...
/usr/local/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/usr/local/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'func'
/usr/local/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
/usr/local/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
...h/install/share/lua/5.1/fairseq/models/avgpool_model.lua:355: in function 'encode'
.../install/share/lua/5.1/fairseq/models/ensemble_model.lua:108: in function 'generate'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:218: in main chunk
[C]: in function 'require'
...install/lib/luarocks/rocks/fairseq-cpu/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x004064f0
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/usr/local/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/usr/local/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'func'
/usr/local/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
/usr/local/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
...h/install/share/lua/5.1/fairseq/models/avgpool_model.lua:355: in function 'encode'
.../install/share/lua/5.1/fairseq/models/ensemble_model.lua:108: in function 'generate'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:218: in main chunk
[C]: in function 'require'
...install/lib/luarocks/rocks/fairseq-cpu/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x004064f0
I have updated my Torch and nn versions, and there is no problem on CPU.
Also, when I train a new model, the standard bi-directional LSTM model and the convolutional encoder with LSTM decoder are OK, but the fully convolutional sequence-to-sequence model has an error as follows:
I am trying to reproduce the experimental results reported in the paper on a K40 (12 GB), but I find I can't set the batch size as large as the paper reports (e.g. 48 for WMT14 En-De).
When launching the install with luarocks make rocks/fairseq-scm-1.rockspec, I get:
Missing dependencies for fairseq:
rnnlib
torchnet
visdom
torchnet-sequential
nccl
tbc
tds
Will the install script add those dependencies automatically?
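For what it's worth, luarocks make normally fetches missing dependencies from the configured rocks servers on its own; a sketch of installing one explicitly beforehand (torchnet and tds are served from the torch rocks repository, as the transcripts below show):
luarocks install torchnet
luarocks install tds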
Hi,
I tried to install fairseq but failed with the error message below. It seems that the server I am using is missing OpenBLAS, which is required to install the torch/tds library. However, I don't have sudo rights to install OpenBLAS under /opt/OpenBLAS. Is there any way to install OpenBLAS locally and link it to luarocks? Thanks!
Missing dependencies for fairseq:
torchnet
visdom
torchnet-sequential
tbc
nccl
tds
Using https://raw.githubusercontent.com/torch/rocks/master/torchnet-scm-1.rockspec... switching to 'build' mode
Missing dependencies for torchnet:
tds >= 1.0
Using https://raw.githubusercontent.com/torch/rocks/master/tds-scm-1.rockspec... switching to 'build' mode
Cloning into 'tds'...
remote: Counting objects: 32, done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 32 (delta 5), reused 10 (delta 1), pack-reused 0
Receiving objects: 100% (32/32), 23.14 KiB | 0 bytes/s, done.
Resolving deltas: 100% (5/5), done.
Checking connectivity... done.
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Torch7 in /home/vhoang2/torch/install
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_tds-scm-1-9010/tds/build.luarocks
Scanning dependencies of target tds
[ 16%] Building C object CMakeFiles/tds.dir/tds_utils.c.o
[ 33%] Building C object CMakeFiles/tds.dir/tds_elem.c.o
[ 50%] Building C object CMakeFiles/tds.dir/tds_hash.c.o
[ 66%] Building C object CMakeFiles/tds.dir/tds_vec.c.o
[ 83%] Building C object CMakeFiles/tds.dir/tds_atomic_counter.c.o
make[2]: *** No rule to make target '/opt/OpenBLAS/lib/libopenblas.so', needed by 'libtds.so'. Stop.
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/tds.dir/all' failed
make[1]: *** [CMakeFiles/tds.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2
Error: Failed installing dependency: https://raw.githubusercontent.com/torch/rocks/master/torchnet-scm-1.rockspec - Failed installing dependency: https://raw.githubusercontent.com/torch/rocks/master/tds-scm-1.rockspec - Build error: Failed building.
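On the local-OpenBLAS question: a sketch of a non-root build, assuming $HOME/opt/OpenBLAS as the install prefix (the hard-coded /opt/OpenBLAS path in the failing make rule may additionally require rebuilding Torch so it is configured against the local copy):
git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS && make && make install PREFIX=$HOME/opt/OpenBLAS
export CMAKE_PREFIX_PATH=$HOME/opt/OpenBLAS:$CMAKE_PREFIX_PATH   # lets CMake-based rocks find it
export LD_LIBRARY_PATH=$HOME/opt/OpenBLAS/lib:$LD_LIBRARY_PATH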
I want to use dropout in training. I read the file train.lua and found that I can add "-dropout 0.25" to this command line:
fairseq train -sourcelang de -targetlang en -datadir data-bin/iwslt14.tokenized.de-en
-model fconv -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1
-momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconv -dropout 0.25
Is this the right way?
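For what it's worth, the command above now passes -dropout twice (0.2 earlier, 0.25 at the end), which is at best ambiguous. A cleaner sketch sets the flag once:
fairseq train -sourcelang de -targetlang en -datadir data-bin/iwslt14.tokenized.de-en \
  -model fconv -nenclayer 4 -nlayer 3 -dropout 0.25 -optim nag -lr 0.25 -clip 0.1 \
  -momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconv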
I'm training a set of translation models using the suggested fconv parameters (but with the model switched to blstm):
fairseq train -sourcelang en -targetlang fr -datadir data/fairseq/en-fr -model blstm -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1 -momentum 0.99 -timeavg -bptt 0 -savedir data/fairseq/en-fr.blstm -batchsize 16 | tee train..blstm.log
I'm seeing loss and perplexity become nan after a few epochs:
| epoch 000 | 0001000 updates | words/s 4328| trainloss 8.72 | train ppl 420.34
| epoch 000 | 0002000 updates | words/s 4559| trainloss 6.91 | train ppl 120.29
| checkpoint 001 | epoch 001 | 0002645 updates | s/checkpnt 767 | words/s 4461 | lr 0.250000
| checkpoint 001 | epoch 001 | 0002645 updates | trainloss 7.40 | train ppl 169.38
| checkpoint 001 | epoch 001 | 0002645 updates | validloss 5.87 | valid ppl 58.37 | testloss 5.82 | test ppl 56.55
| epoch 001 | 0003645 updates | words/s 4371| trainloss 5.85 | train ppl 57.84
| epoch 001 | 0004645 updates | words/s 4373| trainloss 5.58 | train ppl 47.91
| checkpoint 002 | epoch 002 | 0005290 updates | s/checkpnt 783 | words/s 4373 | lr 0.250000
| checkpoint 002 | epoch 002 | 0005290 updates | trainloss 5.65 | train ppl 50.15
| checkpoint 002 | epoch 002 | 0005290 updates | validloss 5.25 | valid ppl 38.13 | testloss 5.21 | test ppl 36.96
| epoch 002 | 0006290 updates | words/s 4327| trainloss 5.33 | train ppl 40.15
| epoch 002 | 0007290 updates | words/s 4274| trainloss 5.24 | train ppl 37.82
| checkpoint 003 | epoch 003 | 0007935 updates | s/checkpnt 800 | words/s 4281 | lr 0.250000
| checkpoint 003 | epoch 003 | 0007935 updates | trainloss 5.25 | train ppl 38.07
| checkpoint 003 | epoch 003 | 0007935 updates | validloss 4.99 | valid ppl 31.81 | testloss 4.95 | test ppl 30.86
| epoch 003 | 0008935 updates | words/s 4235| trainloss nan | train ppl nan
| epoch 003 | 0009935 updates | words/s 4341| trainloss nan | train ppl nan
| checkpoint 004 | epoch 004 | 0010580 updates | s/checkpnt 791 | words/s 4325 | lr 0.250000
| checkpoint 004 | epoch 004 | 0010580 updates | trainloss nan | train ppl nan
| checkpoint 004 | epoch 004 | 0010580 updates | validloss nan | valid ppl nan | testloss nan | test ppl nan
| epoch 004 | 0011580 updates | words/s 4341| trainloss nan | train ppl nan
| epoch 004 | 0012580 updates | words/s 4347| trainloss nan | train ppl nan
| checkpoint 005 | epoch 005 | 0013225 updates | s/checkpnt 791 | words/s 4328 | lr 0.250000
| checkpoint 005 | epoch 005 | 0013225 updates | trainloss nan | train ppl nan
| checkpoint 005 | epoch 005 | 0013225 updates | validloss nan | valid ppl nan | testloss nan | test ppl nan
Is this something I should expect? Would you guess this is a parameter configuration issue (e.g. the optimizer being too aggressive and overflowing), or does this suggest a bug (e.g. an overflow in the loss or perplexity code)?
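A sketch of the usual first experiment when loss turns nan, assuming an optimizer-aggressiveness problem rather than a code bug (the values are guesses, not tuned): lower the learning rate and tighten the gradient clipping in the same command:
fairseq train -sourcelang en -targetlang fr -datadir data/fairseq/en-fr -model blstm \
  -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.1 -clip 0.05 \
  -momentum 0.99 -timeavg -bptt 0 -savedir data/fairseq/en-fr.blstm -batchsize 16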
I don't want to train a whole model and predict; I only want to understand it. Is it possible to debug it using only a CPU, without a GPU?
Thank you!
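A sketch of a GPU-free setup, pieced together from other threads here (the CPU rockspec and the -float pre-trained models both appear below; this assumes they pair up as their names suggest):
luarocks make rocks/fairseq-cpu-scm-1.rockspec
fairseq generate-lines -path wmt14.en-fr.fconv-float/model.th7 \
  -sourcedict wmt14.en-fr.fconv-float/dict.en.th7 \
  -targetdict wmt14.en-fr.fconv-float/dict.fr.th7 -beam 5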
I have re-installed Torch and its dependency libraries, but I hit thread errors at the beginning of training. The command is as follows:
fairseq train -sourcelang de -targetlang en -datadir data-bin/iwslt14.tokenized.de-en \
-model fconv -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1
-momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconv
The error is as follows:
In 3 module of nn.Sequential:
In 2 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 2 module of nn.Sequential:
In 1 module of nn.Sequential:
...rch/install/share/lua/5.1/tbc/TemporalConvolutionTBC.lua:44: attempt to index field 'TBC' (a nil value)
stack traceback:
...rch/install/share/lua/5.1/tbc/TemporalConvolutionTBC.lua:44: in function 'updateOutput'
...din/public/torch/install/share/lua/5.1/nn/WeightNorm.lua:115: in function <...din/public/torch/install/share/lua/5.1/nn/WeightNorm.lua:111>
[C]: in function 'xpcall'
...odin/public/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...din/public/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <...din/public/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
...odin/public/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...din/public/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <...din/public/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
...odin/public/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...
...n/public/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'func'
...n/public/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...n/public/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:356: in function <...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:333>
[C]: in function 'xpcall'
...n/public/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
...din/public/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...din/public/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
...din/public/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
...odin/public/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
...din/public/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'func'
...n/public/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...n/public/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'func'
...n/public/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...n/public/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:356: in function <...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:333>
[C]: in function 'xpcall'
...n/public/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
...din/public/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...din/public/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
...din/public/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
[C]: in function 'error'
...n/public/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
...n/public/torch/install/share/lua/5.1/threads/threads.lua:264: in function 'synchronize'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:371: in function 'doTrain'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:189: in function 'train'
...ic/torch/install/share/lua/5.1/fairseq/scripts/train.lua:405: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x004064a0
When I run:
fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 5
I get this error:
Tried loading libnccl.so.1 but got error /home/actl/torch/install/share/lua/5.1/nccl/ffi.lua:192: libnccl.so.1: cannot open shared object file: No such file or directory
Tried loading libnccl.1.dylib but got error /home/actl/torch/install/share/lua/5.1/nccl/ffi.lua:192: libnccl.1.dylib: cannot open shared object file: No such file or directory
Why?
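Those two lines look like warnings about the optional NCCL library rather than fatal errors. If they need to go away, a sketch of building NCCL v1 from source and making libnccl.so.1 visible (the paths and CUDA_HOME are assumptions):
git clone https://github.com/NVIDIA/nccl.git
cd nccl && make CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$PWD/build/lib:$LD_LIBRARY_PATH   # libnccl.so.1 should land here after the build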
lenovo1601@lenovo1601-Lenovo:~/fairseq$ fairseq preprocess -sourcelang de -targetlang en -trainpref $TEXT/train -validpref $TEXT/valid -testpref $TEXT/test -thresholdsrc 3 -thresholdtgt 3 -destdir data-bin/iwslt14.tokenized.de-en
/home/lenovo1601/torch/install/bin/lua: unable to convert argument 2 from cdata<int ()(struct tds_elem_, struct tds_elem_)> to cdata<int ()(const struct tds_elem_, const struct tds_elem_)>
stack traceback:
[C]: in function 'tds_vec_sort'
/home/lenovo1601/torch/install/share/lua/5.2/tds/vec.lua:98: in function 'sort'
.../torch/install/share/lua/5.2/fairseq/text/Dictionary.lua:79: in function 'finalize'
...1/torch/install/share/lua/5.2/fairseq/text/tokenizer.lua:79: in function <...1/torch/install/share/lua/5.2/fairseq/text/tokenizer.lua:74>
(...tail calls...)
...rch/install/share/lua/5.2/fairseq/scripts/preprocess.lua:110: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: in ?
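One detail that stands out in this trace: the paths say install/bin/lua and lua/5.2, while every other transcript in these threads runs under luajit, and the failing cdata conversion is FFI behavior that differs between LuaJIT and plain Lua 5.2. A sketch of rebuilding the Torch distro with LuaJIT, using the installer's own switch (this assumes ~/torch is the distro checkout):
cd ~/torch
./clean.sh                                # wipe the previous Lua 5.2 build
TORCH_LUA_VERSION=LUAJIT21 ./install.sh   # reinstall with LuaJIT 2.1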
When I install Torch following http://torch.ch/docs/getting-started.html, install fairseq, and run the fairseq generate-lines command, I get this error:
fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 5
/home/torch/install/bin/luajit: /home/torch/install/share/lua/5.1/torch/init.lua:102: class nn.ZipTable has been already assigned a parent class
stack traceback:
[C]: in function 'newmetatable'
/home/torch/install/share/lua/5.1/torch/init.lua:102: in function 'class'
.../torch/install/share/lua/5.1/rnnlib/nn/ZipTable.lua:12: in main chunk
[C]: in function 'require'
/home/torch/install/share/lua/5.1/rnnlib/init.lua:34: in main chunk
[C]: in function 'require'
...h/install/share/lua/5.1/fairseq/models/avgpool_model.lua:18: in main chunk
[C]: in function 'require'
.../torch/install/share/lua/5.1/fairseq/models/init.lua:15: in main chunk
[C]: in function 'require'
/home/torch/install/share/lua/5.1/fairseq/init.lua:14: in main chunk
[C]: in function 'require'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:15: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00406670
How could I solve this problem?
root@7ab936c99d66:~/fairseq# fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 5
| [target] Dictionary: 44666 types
| [source] Dictionary: 44409 types
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-1095/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/torch/File.lua:351: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-1095/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
[C]: in function 'read'
/root/torch/install/share/lua/5.1/torch/File.lua:351: in function </root/torch/install/share/lua/5.1/torch/File.lua:245>
[C]: in function 'read'
/root/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/root/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
/root/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/root/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
/root/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
...
/root/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:353: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
.../install/share/lua/5.1/fairseq/models/ensemble_model.lua:32: in function '__init'
/root/torch/install/share/lua/5.1/torch/init.lua:91: in function 'new'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:153: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00405d50
Not sure, but from the THCStorage error it seems Torch is trying to allocate more GPU memory than I have here:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57 Driver Version: 367.57 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GRID K520 Off | 0000:00:03.0 Off | N/A |
| N/A 38C P8 17W / 125W | 3977MiB / 4036MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 18259 C /root/torch/install/bin/luajit 3975MiB |
+-----------------------------------------------------------------------------+
How do I set a maximum GPU memory allocation, then?
I tried running the example for translating English to French:
fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 5
However, the following error was given:
| [target] Dictionary: 44666 types
| [source] Dictionary: 44409 types
THCudaCheck FAIL file=/home/james/torch/extra/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/home/james/torch/install/bin/luajit: /home/james/torch/install/share/lua/5.1/torch/File.lua:351: cuda runtime error (2) : out of memory at /home/james/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
[C]: in function 'read'
/home/james/torch/install/share/lua/5.1/torch/File.lua:351: in function </home/james/torch/install/share/lua/5.1/torch/File.lua:245>
[C]: in function 'read'
/home/james/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/home/james/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/james/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
/home/james/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/home/james/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/james/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/james/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
/home/james/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
...
/home/james/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/home/james/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/james/torch/install/share/lua/5.1/torch/File.lua:353: in function 'readObject'
/home/james/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
.../install/share/lua/5.1/fairseq/models/ensemble_model.lua:32: in function '__init'
/home/james/torch/install/share/lua/5.1/torch/init.lua:91: in function 'new'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:153: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00405d50
I tried reinstalling Torch and fairseq, as well as making sure that I have the most recent version of nn (luarocks install nn). Unfortunately, none of these attempts fixed the problem.
The weird thing is that neither my 2 GB of GPU memory nor my 16 GB of RAM seemed to get remotely full before the error message appeared.
Any help would be appreciated.
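One workaround that appears elsewhere in these threads, sketched here on the assumption that 2 GB of GPU memory is simply too small for this model: run the CPU (-float) variant of the pre-trained model instead of the CUDA one:
fairseq generate-lines -path wmt14.en-fr.fconv-float/model.th7 \
  -sourcedict wmt14.en-fr.fconv-float/dict.en.th7 \
  -targetdict wmt14.en-fr.fconv-float/dict.fr.th7 -beam 5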
Hi,
I am training a model using WMT En-Fr data and I hit an error:
/home/work/liuhongyu/torch/torch/install/bin/luajit: ...yu/torch/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 1 callback] ...yu/torch/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 1 callback] ...ch/torch/install/share/lua/5.1/fairseq/torchnet/data.lua:41: bad argument #2 to 'narrow' (out of range at /home/work/liuhongyu/torch/torch/pkg/torch/lib/TH/generic/THTensor.c:368)
stack traceback:
[C]: in function 'narrow'
...ch/torch/install/share/lua/5.1/fairseq/torchnet/data.lua:41: in function 'makeInput'
...ch/torch/install/share/lua/5.1/fairseq/torchnet/data.lua:594: in function 'get'
.../install/share/lua/5.1/torchnet/dataset/batchdataset.lua:120: in function 'get'
...are/lua/5.1/torchnet/dataset/paralleldatasetiterator.lua:110: in function <...are/lua/5.1/torchnet/dataset/paralleldatasetiterator.lua:109>
[C]: in function 'xpcall'
...yu/torch/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
...ngyu/torch/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...ngyu/torch/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
...ngyu/torch/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:15: in main chunk
stack traceback:
[C]: in function 'error'
...yu/torch/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
...are/lua/5.1/torchnet/dataset/paralleldatasetiterator.lua:150: in function 'loop'
...stall/share/lua/5.1/torchnet/dataset/datasetiterator.lua:107: in function 'next_from_base'
...hare/lua/5.1/fairseq/torchnet/ShardedDatasetIterator.lua:102: in function 'loop'
...stall/share/lua/5.1/torchnet/dataset/datasetiterator.lua:107: in function 'gloop'
...hare/lua/5.1/fairseq/torchnet/SingleParallelIterator.lua:47: in function <...hare/lua/5.1/fairseq/torchnet/SingleParallelIterator.lua:43>
[C]: in function 'xpcall'
...yu/torch/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
...ngyu/torch/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...ngyu/torch/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
...ngyu/torch/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:15: in main chunk
stack traceback:
[C]: in function 'error'
...yu/torch/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
...hare/lua/5.1/fairseq/torchnet/SingleParallelIterator.lua:63: in function '(for generator)'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:320: in function 'doTrain'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:189: in function 'train'
...ch/torch/install/share/lua/5.1/fairseq/scripts/train.lua:405: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00406820
I can run training with the WMT En-De data. Is there something wrong with the data preprocessing, or something else?
If I try to run this example I get an exception:
fairseq generate-lines -path wmt14.en-de.fconv-cuda/model.th7 -sourcedict wmt14.en-de.fconv-cuda/dict.de.th7 -targetdict wmt14.en-de.fconv-cuda/dict.en.th7 -beam 5
| [target] Dictionary: 42242 types
| [source] Dictionary: 43675 types
Dieser Zug fährt vom München nach Paris
/home/xxx/tools/torch/install/bin/luajit: ...rch/install/share/lua/5.1/fairseq/models/fconv_model.lua:84: attempt to index field 'weight' (a nil value)
stack traceback:
...rch/install/share/lua/5.1/fairseq/models/fconv_model.lua:84: in function 'onEvaluate'
...install/share/lua/5.1/fairseq/modules/TrainTestLayer.lua:29: in function 'evaluate'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...rch/install/share/lua/5.1/fairseq/models/fconv_model.lua:652: in function 'setup'
.../install/share/lua/5.1/fairseq/models/ensemble_model.lua:102: in function 'generate'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:218: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00405d50
Any ideas?
I have re-installed Torch and its dependency libraries, but I hit thread errors when training on my own training data, although the beginning seems correct. I trained the model on 2 GPU cards. The training command is as follows:
fairseq train -sourcelang src -targetlang ref -datadir nist-bin/ -model blstm -nhid 512 -dropout 0.2 -dropout_hid 0 -optim adam -lr 0.0001 -savedir data/blstm -batchsize 128 -maxbatch 70 -validbleu -nembed 128
The error is:
| [ref] Dictionary: 40003 types
| [src] Dictionary: 40003 types
| IndexedDataset: loaded data/ with 127615 examples
| IndexedDataset: loaded data/ with 88 examples
| IndexedDataset: loaded data/ with 164 examples
| IndexedDataset: loaded data/ with 88 examples
| IndexedDataset: loaded data/ with 164 examples
| epoch 000 | 0001000 updates | words/s 685| trainloss 10.67 | train ppl 1627.17
| epoch 000 | 0002000 updates | words/s 879| trainloss 9.95 | train ppl 987.71
| epoch 000 | 0003000 updates | words/s 668| trainloss 9.68 | train ppl 821.81
/home/lixiang/library/dnn/torch/install/bin/luajit: ...rary/dnn/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 2 callback] ...rary/dnn/torch/install/share/lua/5.1/nngraph/gmodule.lua:348: split(2) cannot split 0 outputs
stack traceback:
[C]: in function 'error'
...rary/dnn/torch/install/share/lua/5.1/nngraph/gmodule.lua:348: in function 'neteval'
...rary/dnn/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
.../torch/install/share/lua/5.1/rnnlib/nn/SequenceTable.lua:150: in function 'forward'
.../torch/install/share/lua/5.1/rnnlib/nn/SequenceTable.lua:150: in function 'uo'
.../torch/install/share/lua/5.1/rnnlib/recurrentnetwork.lua:54: in function 'func'
...rary/dnn/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...rary/dnn/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'func'
...rary/dnn/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...rary/dnn/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:356: in function <...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:333>
[C]: in function 'xpcall'
...rary/dnn/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
...ibrary/dnn/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...ibrary/dnn/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
...ibrary/dnn/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
[C]: in function 'error'
...rary/dnn/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
...rary/dnn/torch/install/share/lua/5.1/threads/threads.lua:264: in function 'synchronize'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:371: in function 'doTrain'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:189: in function 'train'
...nn/torch/install/share/lua/5.1/fairseq/scripts/train.lua:405: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00405d50
It would be nice if there were any Spanish models available. Could you point me to them in case I missed them? Thanks.
For some text, the hypothesis H for the target language (FR) is in the source language (EN):
S <unk> <unk> for me ,
O nobody pray for me ,
H -1.7627129554749 I feel that my return is wrong .
A 1 1 1 2 1 6 6 6 6
The execution args were
[ 'generate-lines',
'-path',
'/root/wmt14.en-fr.fconv-float/model.th7',
'-sourcedict',
'/root/wmt14.en-fr.fconv-float/dict.en.th7',
'-targetdict',
'/root/wmt14.en-fr.fconv-float/dict.fr.th7',
'-beam',
'5',
'-input',
'-' ]
Does this support translation from English into Arabic? What tokenizer do you use?
Can I train a model using only a CPU, or is a GPU a must for using your toolkit?
Thanks,
Mohamed
When doing generate-lines with the pre-trained CPU models on my MacBook Pro (16 GB RAM, i7), I always get a "not enough memory" error after the 2nd or 3rd sentence sent via stdin:
| [target] Dictionary: 44666 types
| [source] Dictionary: 44409 types
S but now i <unk> m <unk> <unk> this
O but now i ' m countin ' this
H -0.7654139995575 mais maintenant je suis m � me d � crit � ce
A 1 2 4 4 6 6 6 6 6 6 6 7 9
S <unk> where my <unk> lives
O parmesan where my accountant lives
H -1.7098189592361 de la vie de mes enfants
A 1 1 1 1 4 4 4
exec:fairseq stderr:/Users/loretoparisi/torch/install/bin/luajit: not enough memory
My run command was:
fairseq generate-lines -path /root/wmt14.en-fr.fconv-float/model.th7 -sourcedict /root/wmt14.en-fr.fconv-float/dict.en.th7 -targetdict /root/wmt14.en-fr.fconv-float/dict.fr.th7 -beam 5 -input -
Note the pre-trained CPU model instead of the CUDA one.
@jgehring In my understanding, fairseq uses torch tds vectors, so in theory it does not rely on the LuaJIT memory allocator and should not hit an OOM due to Lua memory limits. If so, what causes the OOM?
Note: I do not hit this OOM issue when running on nvidia-docker with a 4 GB GPU and the pre-trained GPU model, of course.
I get an AccessDenied error when trying to download the pretrained networks.
I ran the following for Data Pre-processing:
fairseq preprocess -sourcelang de -targetlang en \
-trainpref $TEXT/train -validpref $TEXT/valid -testpref $TEXT/test \
-thresholdsrc 0 -thresholdtgt 0 -destdir data-bin/iwslt14.tokenized.de-en
As you can see, I set -thresholdsrc and -thresholdtgt to 0, yet it still replaces tokens in the validation and test sets with the unknown symbol.
Here's the output of the above command:
| [de] Dictionary: 113533 types
| [de] data/iwslt14.tokenized.de-en/train.de: 160215 sents, 3258560 tokens, 0.00% replaced by <unk>
| [de] data/iwslt14.tokenized.de-en/valid.de: 7282 sents, 149250 tokens, 1.91% replaced by <unk>
| [de] data/iwslt14.tokenized.de-en/test.de: 6750 sents, 132488 tokens, 2.45% replaced by <unk>
| [de] Wrote preprocessed data to data-bin/iwslt14.tokenized.de-en
| [en] Dictionary: 53330 types
| [en] data/iwslt14.tokenized.de-en/train.en: 160215 sents, 3433520 tokens, 0.00% replaced by <unk>
| [en] data/iwslt14.tokenized.de-en/valid.en: 7282 sents, 157272 tokens, 0.59% replaced by <unk>
| [en] data/iwslt14.tokenized.de-en/test.en: 6750 sents, 137891 tokens, 0.85% replaced by <unk>
| [en] Wrote preprocessed data to data-bin/iwslt14.tokenized.de-en
In the validation and test data the value should also be 0.00%, right? I am using this for another task where I don't want tokens replaced by the unknown symbol. Is there some option I am missing here?
package.path = package.path .. ';/path/to/fairseq/fairseq/?.lua'
--require 'fairseq'
does not work.
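A sketch of the more usual route, assuming the failure is because fairseq's compiled components are only visible after a proper install: build the rock once, after which require 'fairseq' resolves without any package.path edits:
cd /path/to/fairseq
luarocks make rocks/fairseq-scm-1.rockspec
# then, in Lua: require 'fairseq' -- no package.path manipulation needed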
I get this error when I am using one of the pre-trained models.
fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 3
| [target] Dictionary: 44666 types
| [source] Dictionary: 44409 types
> Why is it rare to discover new marine mam@@ mal species ?
/home/elab/Installations/torch/install/bin/luajit: ...rch/install/share/lua/5.1/fairseq/models/fconv_model.lua:84: attempt to index field 'weight' (a nil value)
stack traceback:
...rch/install/share/lua/5.1/fairseq/models/fconv_model.lua:84: in function 'onEvaluate'
...install/share/lua/5.1/fairseq/modules/TrainTestLayer.lua:29: in function 'evaluate'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...rch/install/share/lua/5.1/fairseq/models/fconv_model.lua:652: in function 'setup'
.../install/share/lua/5.1/fairseq/models/ensemble_model.lua:102: in function 'generate'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:218: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00405d50
Although fairseq installation on Ubuntu/Docker takes time, it is pretty straightforward; on macOS (CPU only) it can be very complicated through luarocks due to the many dependencies, so I'm putting up this gist with all the dependencies to install to get it working. Hope this helps!
Hi,
I am trying to test this model on a summarization task (en->en). I am trying to preprocess my articles and summaries using fairseq preprocess, and I get the following error.
The command I use for preprocessing is:
fairseq preprocess -sourcelang articles -targetlang summaries -trainpref train -validpref valid -testpref test -destdir data-bin/summarize
I have the following tokenized files in my directory :
train.articles, train.summaries, valid.articles, valid.summaries, test.articles, test.summaries -> each containing one sentence per line
Can someone kindly let me know what I am missing here?
Hi Jonas,
Please advise on the following. Note also that I have a GPU, but it is Intel rather than Nvidia, so I am running into issues. Thank you.
micheles-mba:fairseq micheles$ luarocks make rocks/fairseq-cpu-scm-1.rockspec --local
Missing dependencies for fairseq-cpu:
torch >= 7.0
lua-cjson
rnnlib
torchnet-sequential
visdom
torchnet
argcheck
tbc
threads
tds
nngraph
Error: Could not satisfy dependency: torch >= 7.0
micheles-mba:fairseq micheles$
"This model uses a Byte Pair Encoding (BPE) vocabulary, so we'll have to apply the encoding to the source text."
I couldn't find how to do this in the 'Data Pre-processing' section.
In one of the pre-trained models:
"- We use a BPE sub-word vocabulary with 40k tokens, trained jointly on the training data for both languages (see bpecodes)"
Do you use a similar approach to Google's seq2seq?
thanks.
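A sketch of applying BPE with an external tool, assuming the bpecodes file shipped with the model is compatible with Sennrich et al.'s subword-nmt (the release notes only mention the file itself, so this pairing is an assumption):
git clone https://github.com/rsennrich/subword-nmt.git
python subword-nmt/apply_bpe.py -c wmt14.en-fr.fconv-cuda/bpecodes < input.tok.en > input.bpe.en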
Hi,
My training jumped from epoch 20 to epoch 100 and stopped. The command launched to initiate the training was:
fairseq train -sourcelang tr -targetlang en -datadir ../data-bin/wmt17-tr-en \
-model blstm -nembed 200 -noutembed 200 -nhid 500 -dropout 0.3 -dropout_hid 0 -optim adam -lr 0.0003125 \
-savedir blstm -ngpus 1 -validbleu -log_interval 10 -batchsize 64
Here's the end of the log:
| epoch 019 | 0161656 updates | words/s 14482| trainloss 1.94 | train ppl 3.84
| epoch 019 | 0161666 updates | words/s 13245| trainloss 1.97 | train ppl 3.91
| epoch 019 | 0161676 updates | words/s 14300| trainloss 2.45 | train ppl 5.46
| checkpoint 020 | epoch 020 | 0161680 updates | s/checkpnt 632 | words/s 14126 | lr 0.000313
| checkpoint 020 | epoch 020 | 0161680 updates | trainloss 2.10 | train ppl 4.30
| checkpoint 020 | epoch 020 | 0161680 updates | validloss 4.36 | valid ppl 20.55 | testloss 4.40 | test ppl 21.16
| checkpoint 020 | epoch 020 | 0161680 updates | validbleu 17.67 | valid BP 0.90
| checkpoint 020 | epoch 020 | 0161680 updates | saved model to blstm/model_epoch20.th7
| epoch 100 | 0161680 updates | 0000000 epoch updates | path blstm/state_epoch100.th7
| epoch 100 | 0161680 updates | 0000000 epoch updates | path blstm/state_last.th7
| epoch 100 | 0161680 updates | 0000000 epoch updates | path blstm/state_last.th7
| Test with beam=1: BLEU4 = 19.06, 47.9/24.1/14.3/9.0 (BP=0.971, ratio=1.029, sys_len=85120, ref_len=87630)
| Test with beam=5: BLEU4 = 20.60, 51.5/27.1/16.5/10.5 (BP=0.929, ratio=1.074, sys_len=81595, ref_len=87630)
| Test with beam=10: BLEU4 = 20.91, 51.9/27.6/17.0/10.8 (BP=0.923, ratio=1.080, sys_len=81120, ref_len=87630)
| Test with beam=20: BLEU4 = 20.94, 52.0/27.7/17.1/10.9 (BP=0.919, ratio=1.084, sys_len=80841, ref_len=87630)
Another thing: although I passed -batchsize 64, the number of updates needed to complete an epoch was much larger than expected: ~8000 vs. ~5000 (training set size: 348K samples).
I was wondering whether it is possible to modify such an efficient seq2seq model for a classification task? If so, what would be a general guideline for that purpose? Thanks!