facebookresearch / fairseq-lua
Facebook AI Research Sequence-to-Sequence Toolkit
License: Other
Any plans for a pre-trained model for English-to-Chinese translation?
I just read about it and I am interested in using it. I am curious to know whether it can be used on other seq2seq problems that RNNs (e.g. LSTM or GRU cells) are able to solve,
e.g. image captioning or document summarization.
Thank you, and amazing work!
Looking forward to your positive response.
Regards,
Is it possible to continue training from the pre-trained models you have released?
My understanding is that Torch needs both a model file and a state file in order to continue a training run, but you've only released model files. Is that right? Thanks!
Use case: domain adaptation experiments where your pre-trained models are used for initialization.
Hello, I am trying to use fairseq and I ran into this issue. Please advise:
micheles-mba:fairseq.git micheles$ luarocks make rocks/fairseq-scm-1.rockspec --local
Error: File not found: rocks/fairseq-scm-1.rockspec
micheles-mba:fairseq.git micheles$ ls
branches trunk
fairseq.git wmt14.en-fr.fconv-cuda
micheles-mba:fairseq.git micheles$
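(For reference, a minimal sketch of what a working invocation assumes; the error above suggests the command was run from a directory that does not contain the rocks/ folder of the cloned source tree:)
cd fairseq                                    # the cloned repository root, not its parent
ls rocks                                      # fairseq-scm-1.rockspec should be listed here
luarocks make rocks/fairseq-scm-1.rockspec --local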
Hi,
I am trying to train the fconv model for a summarization task. I get the following error when doing so. Can someone kindly let me know what the issue is?
root@c4cf4e23aba9:~/torch/fairseq# fairseq train -sourcelang comments -targetlang summaries -datadir data-bin/questions -model fconv -nenclayer 4 -nlayer 4 -dropout 0.2 -optim sgd -lr 0.9 -clip 0.5 -momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconv
| [summaries] Dictionary: 55314 types
| [comments] Dictionary: 129040 types
| IndexedDataset: loaded data-bin/questions with 77000 examples
| IndexedDataset: loaded data-bin/questions with 1600 examples
| IndexedDataset: loaded data-bin/questions with 110 examples
| IndexedDataset: loaded data-bin/questions with 1600 examples
| IndexedDataset: loaded data-bin/questions with 110 examples
*** Error in `/root/torch/install/bin/luajit': realloc(): invalid next size: 0x00007fb6fd7719f0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fb9464cc7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x82a5a)[0x7fb9464d7a5a]
/lib/x86_64-linux-gnu/libc.so.6(realloc+0x179)[0x7fb9464d8c89]
/root/torch/install/lib/libTH.so.0(THRealloc+0x3a)[0x7fb9454bde8a]
/root/torch/install/lib/libTH.so.0(THByteStorage_resize+0x33)[0x7fb9454bfe33]
/root/torch/install/lib/libTH.so.0(THByteTensor_newWithTensor+0x62)[0x7fb9454dcef2]
/root/torch/install/lib/lua/5.1/libtorch.so(+0x5b8af)[0x7fb945c858af]
/root/torch/install/bin/luajit[0x47dbaa]
/root/torch/install/lib/libluaT.so.0(+0x2aa6)[0x7fb945a22aa6]
/root/torch/install/bin/luajit[0x47dbaa]
/root/torch/install/bin/luajit[0x44068e]
/root/torch/install/bin/luajit[0x47df19]
/root/torch/install/bin/luajit(lua_close+0x90)[0x46d800]
/root/torch/install/lib/libthreadsmain.so(THThread_main+0x57)[0x7fb92b08f997]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fb946a3b6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb94655b82d]
======= Memory map: ========
00400000-0049a000 r-xp 00000000 00:2d 139896 /root/torch/install/bin/luajit
005d0000-00640000 r-xp 00000000 00:00 0
00699000-0069a000 r--p 00099000 00:2d 139896 /root/torch/install/bin/luajit
0069a000-0069b000 rw-p 0009a000 00:2d 139896 /root/torch/install/bin/luajit
01420000-01470000 r-xp 00000000 00:00 0
01aaf000-0bcdf000 rw-p 00000000 00:00 0 [heap]
0bce0000-0bd10000 r-xp 00000000 00:00 0
I did the following in a Docker container with Torch installed:
# Install CUDA libraries
RUN luarocks install torch && \
    luarocks install cutorch && \
    luarocks install cunn && \
    luarocks install cudnn
root@cf9f9a5a3f42:~/fairseq# fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 5
Tried loading libnccl.so.1 but got error /root/torch/install/share/lua/5.1/nccl/ffi.lua:192: libnccl.so.1: cannot open shared object file: No such file or directory
Tried loading libnccl.1.dylib but got error /root/torch/install/share/lua/5.1/nccl/ffi.lua:192: libnccl.1.dylib: cannot open shared object file: No such file or directory
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/rnnlib/cudnnutils.lua:13: You must have the torch Cudnn bindings.
stack traceback:
[C]: in function 'assert'
/root/torch/install/share/lua/5.1/rnnlib/cudnnutils.lua:13: in main chunk
[C]: in function 'require'
...orch/install/share/lua/5.1/rnnlib/nn/WrappedCudnnRnn.lua:14: in main chunk
[C]: in function 'require'
/root/torch/install/share/lua/5.1/rnnlib/init.lua:18: in main chunk
[C]: in function 'require'
...h/install/share/lua/5.1/fairseq/models/avgpool_model.lua:18: in main chunk
[C]: in function 'require'
/root/torch/install/share/lua/5.1/fairseq/models/init.lua:15: in main chunk
[C]: in function 'require'
/root/torch/install/share/lua/5.1/fairseq/init.lua:14: in main chunk
[C]: in function 'require'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:15: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00405d50
root@cf9f9a5a3f42:~/fairseq#
I have cuDNN installed:
root@cf9f9a5a3f42:~# luarocks install cudnn
Installing https://raw.githubusercontent.com/torch/rocks/master/cudnn-scm-1.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/cudnn-scm-1.rockspec... switching to 'build' mode
Cloning into 'cudnn.torch'...
remote: Counting objects: 60, done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 60 (delta 15), reused 16 (delta 3), pack-reused 0
Receiving objects: 100% (60/60), 67.93 KiB | 0 bytes/s, done.
Resolving deltas: 100% (15/15), done.
Checking connectivity... done.
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/root/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/root/torch/install/lib/luarocks/rocks/cudnn/scm-1" && make
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Torch7 in /root/torch/install
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found suitable version "8.0", minimum required is "7.0")
-- Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
-- CuDNN 5.1 not found at install-time. Please make sure it's in LD_LIBRARY_PATH at runtime
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_cudnn-scm-1-8695/cudnn.torch/build
cd build && make install
Install the project...
-- Install configuration: "Release"
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialCrossEntropyCriterion.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialConvolution.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialBatchNormalization.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/BLSTM.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/LSTM.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/TemporalConvolution.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/Pointwise.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/convert.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/ReLU.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/find.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialLogSoftMax.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialCrossMapLRN.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/Pooling.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/ClippedReLU.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/ffi.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/LogSoftMax.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialMaxPooling.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/Sigmoid.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialAveragePooling.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricLogSoftMax.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricBatchNormalization.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricMaxPooling.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/RNN.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricCrossEntropyCriterion.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/BGRU.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialDivisiveNormalization.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/RNNTanh.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/BatchNormalization.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SoftMax.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/env.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/functional.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialFullConvolution.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/Tanh.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/Pooling3D.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/RNNReLU.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricFullConvolution.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricConvolution.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/init.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/SpatialSoftMax.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricSoftMax.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/VolumetricAveragePooling.lua
-- Installing: /root/torch/install/lib/luarocks/rocks/cudnn/scm-1/lua/cudnn/GRU.lua
Updating manifest for /root/torch/install/lib/luarocks/rocks
cudnn scm-1 is now built and installed in /root/torch/install/ (license: BSD)
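The install log above contains the likely hint: "CuDNN 5.1 not found at install-time. Please make sure it's in LD_LIBRARY_PATH at runtime." A minimal sketch, assuming libcudnn.so.5 was unpacked under /usr/local/cuda/lib64 (the path is an assumption):
ls /usr/local/cuda/lib64/libcudnn.so.5        # verify the library actually exists here
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 5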
Is it possible to provide a Caffe2 model along with the Torch one?
Hi Jonas, just a curiosity: what are the correct formats and steps to test new models in other languages? Please advise. Thank you.
First I followed http://torch.ch/docs/getting-started.html, ran luarocks make rocks/fairseq-scm-1.rockspec,
and applied facebookresearch/fairseq#24.
Then I ran train.lua in ZeroBrane Studio, but got an error at the line require 'fairseq'
(https://github.com/facebookresearch/fairseq/blob/master/train.lua#L17):
Tried loading libnccl.1.dylib but got error /home/gt/torch/install/share/lua/5.1/nccl/ffi.lua:192: libnccl.1.dylib: cannot open shared object file: No such file or directory
Tried loading libnccl.so.1 but got error /home/gt/torch/install/share/lua/5.1/nccl/ffi.lua:192: libnccl.so.1: cannot open shared object file: No such file or directory
The program continued to run until:
| IndexedDataset: loaded /home/gt/fairseq/data-bin/iwslt14.tokenized.de-en with 6750 examples
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:281: attempt to index field 'nccl' (a nil value)
stack traceback:
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua: in function 'doTrain'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:189: in function 'train'
train.lua:404: in main chunk
Are there plans to add a PyTorch implementation of "Convolutional Sequence to Sequence Learning" to this repo?
Is this architecture compatible with zero shot translation?
If yes: Are there plans to implement it?
Hi,
is it possible to give a list of tensors as input to the fairseq Torch code?
E.g. input -> list of tensors,
output -> list of tensors.
Will I be able to accomplish the above task using fairseq? If so, can you please point me to an example?
Thanks
Add an assert that checks whether T is empty and provides a sensible error message. Currently, this produces a trace as described in #46. An additional check could be added for pre-processing.
Data: downloaded the De-En pair from opensubtitles.org. About 13M sentences.
Train: same as provided in the README.md:
fairseq train -sourcelang de -targetlang en -datadir data-bin/iwslt14.tokenized.de-en
-model fconv -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1
-momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconv
Machine: Amazon EC2 instance with 8 GPUs, 12 GB memory each.
Error: memory error.
Which parameters can I change without losing accuracy?
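As a sketch of the usual first knob (an assumption, not tuned advice): the -batchsize flag that appears in other commands in these threads reduces peak GPU memory roughly in proportion, usually at the cost of training speed rather than final accuracy:
fairseq train -sourcelang de -targetlang en -datadir data-bin/iwslt14.tokenized.de-en \
  -model fconv -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1 \
  -momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconv -batchsize 16   # smaller batches, lower peak memory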
When I use:
fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 5
to decode the example model, I hit the following problem:
| [target] Dictionary: 44666 types
| [source] Dictionary: 44409 types
hello
/usr/local/torch/install/bin/luajit: /usr/local/torch/install/share/lua/5.1/nn/Container.lua:67:
In 3 module of nn.Sequential:
In 2 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 2 module of nn.Sequential:
In 2 module of nn.Sequential:
...local/torch/install/share/lua/5.1/nn/GatedLinearUnit.lua:10: attempt to call field 'GatedLinear_updateOutput' (a nil value)
stack traceback:
...local/torch/install/share/lua/5.1/nn/GatedLinearUnit.lua:10: in function <...local/torch/install/share/lua/5.1/nn/GatedLinearUnit.lua:8>
[C]: in function 'xpcall'
/usr/local/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/usr/local/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </usr/local/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
/usr/local/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/usr/local/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </usr/local/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
/usr/local/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/usr/local/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function </usr/local/torch/install/share/lua/5.1/nn/ConcatTable.lua:9>
...
/usr/local/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/usr/local/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'func'
/usr/local/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
/usr/local/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
...h/install/share/lua/5.1/fairseq/models/avgpool_model.lua:355: in function 'encode'
.../install/share/lua/5.1/fairseq/models/ensemble_model.lua:108: in function 'generate'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:218: in main chunk
[C]: in function 'require'
...install/lib/luarocks/rocks/fairseq-cpu/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x004064f0
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/usr/local/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/usr/local/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'func'
/usr/local/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
/usr/local/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
...h/install/share/lua/5.1/fairseq/models/avgpool_model.lua:355: in function 'encode'
.../install/share/lua/5.1/fairseq/models/ensemble_model.lua:108: in function 'generate'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:218: in main chunk
[C]: in function 'require'
...install/lib/luarocks/rocks/fairseq-cpu/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x004064f0
I have updated my Torch and nn versions, and there is no problem on CPU.
Also, when I train a new model, the standard bi-directional LSTM model and the convolutional encoder with LSTM decoder are OK, but the fully convolutional sequence-to-sequence model has an error as follows:
I am trying to reproduce the experimental results reported in the paper on a K40 (12 GB), but I find I can't set the batch size as large as the paper reports (e.g. 48 for WMT14 En-De).
When launching the install with luarocks make rocks/fairseq-scm-1.rockspec, I get:
Missing dependencies for fairseq:
rnnlib
torchnet
visdom
torchnet-sequential
nccl
tbc
tds
Will the install script add those dependencies automatically?
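For what it's worth, luarocks make normally fetches missing dependencies from the configured rocks servers on its own; a sketch of installing one explicitly beforehand (torchnet and tds are served from the torch rocks repository, as the transcripts below show):
luarocks install torchnet
luarocks install tds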
Hi,
I tried to install fairseq but failed with the error message below. It seems that the server I am using is missing OpenBLAS, which is required to install the torch/tds library. However, I don't have sudo rights to install OpenBLAS under /opt/OpenBLAS. Is there any way to install OpenBLAS locally and link it to luarocks? Thanks!
Missing dependencies for fairseq:
torchnet
visdom
torchnet-sequential
tbc
nccl
tds
Using https://raw.githubusercontent.com/torch/rocks/master/torchnet-scm-1.rockspec... switching to 'build' mode
Missing dependencies for torchnet:
tds >= 1.0
Using https://raw.githubusercontent.com/torch/rocks/master/tds-scm-1.rockspec... switching to 'build' mode
Cloning into 'tds'...
remote: Counting objects: 32, done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 32 (delta 5), reused 10 (delta 1), pack-reused 0
Receiving objects: 100% (32/32), 23.14 KiB | 0 bytes/s, done.
Resolving deltas: 100% (5/5), done.
Checking connectivity... done.
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Torch7 in /home/vhoang2/torch/install
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_tds-scm-1-9010/tds/build.luarocks
Scanning dependencies of target tds
[ 16%] Building C object CMakeFiles/tds.dir/tds_utils.c.o
[ 33%] Building C object CMakeFiles/tds.dir/tds_elem.c.o
[ 50%] Building C object CMakeFiles/tds.dir/tds_hash.c.o
[ 66%] Building C object CMakeFiles/tds.dir/tds_vec.c.o
[ 83%] Building C object CMakeFiles/tds.dir/tds_atomic_counter.c.o
make[2]: *** No rule to make target '/opt/OpenBLAS/lib/libopenblas.so', needed by 'libtds.so'. Stop.
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/tds.dir/all' failed
make[1]: *** [CMakeFiles/tds.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2
Error: Failed installing dependency: https://raw.githubusercontent.com/torch/rocks/master/torchnet-scm-1.rockspec - Failed installing dependency: https://raw.githubusercontent.com/torch/rocks/master/tds-scm-1.rockspec - Build error: Failed building.
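On the local-OpenBLAS question: a sketch of a non-root build, assuming $HOME/opt/OpenBLAS as the install prefix (the hard-coded /opt/OpenBLAS path in the failing make rule may additionally require rebuilding Torch so it is configured against the local copy):
git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS && make && make install PREFIX=$HOME/opt/OpenBLAS
export CMAKE_PREFIX_PATH=$HOME/opt/OpenBLAS:$CMAKE_PREFIX_PATH   # lets CMake-based rocks find it
export LD_LIBRARY_PATH=$HOME/opt/OpenBLAS/lib:$LD_LIBRARY_PATH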
I want to use dropout in training. I read the file train.lua and found that I can add "-dropout 0.25" to this command line:
fairseq train -sourcelang de -targetlang en -datadir data-bin/iwslt14.tokenized.de-en
-model fconv -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1
-momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconv -dropout 0.25
Is this the right way?
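For what it's worth, the command above now passes -dropout twice (0.2 earlier, 0.25 at the end), which is at best ambiguous. A cleaner sketch sets the flag once:
fairseq train -sourcelang de -targetlang en -datadir data-bin/iwslt14.tokenized.de-en \
  -model fconv -nenclayer 4 -nlayer 3 -dropout 0.25 -optim nag -lr 0.25 -clip 0.1 \
  -momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconv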
I'm training a set of translation models using the suggested fconv parameters (but with the model switched to blstm):
fairseq train -sourcelang en -targetlang fr -datadir data/fairseq/en-fr -model blstm -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1 -momentum 0.99 -timeavg -bptt 0 -savedir data/fairseq/en-fr.blstm -batchsize 16 | tee train..blstm.log
I'm seeing loss and perplexity become nan after a few epochs:
| epoch 000 | 0001000 updates | words/s 4328| trainloss 8.72 | train ppl 420.34
| epoch 000 | 0002000 updates | words/s 4559| trainloss 6.91 | train ppl 120.29
| checkpoint 001 | epoch 001 | 0002645 updates | s/checkpnt 767 | words/s 4461 | lr 0.250000
| checkpoint 001 | epoch 001 | 0002645 updates | trainloss 7.40 | train ppl 169.38
| checkpoint 001 | epoch 001 | 0002645 updates | validloss 5.87 | valid ppl 58.37 | testloss 5.82 | test ppl 56.55
| epoch 001 | 0003645 updates | words/s 4371| trainloss 5.85 | train ppl 57.84
| epoch 001 | 0004645 updates | words/s 4373| trainloss 5.58 | train ppl 47.91
| checkpoint 002 | epoch 002 | 0005290 updates | s/checkpnt 783 | words/s 4373 | lr 0.250000
| checkpoint 002 | epoch 002 | 0005290 updates | trainloss 5.65 | train ppl 50.15
| checkpoint 002 | epoch 002 | 0005290 updates | validloss 5.25 | valid ppl 38.13 | testloss 5.21 | test ppl 36.96
| epoch 002 | 0006290 updates | words/s 4327| trainloss 5.33 | train ppl 40.15
| epoch 002 | 0007290 updates | words/s 4274| trainloss 5.24 | train ppl 37.82
| checkpoint 003 | epoch 003 | 0007935 updates | s/checkpnt 800 | words/s 4281 | lr 0.250000
| checkpoint 003 | epoch 003 | 0007935 updates | trainloss 5.25 | train ppl 38.07
| checkpoint 003 | epoch 003 | 0007935 updates | validloss 4.99 | valid ppl 31.81 | testloss 4.95 | test ppl 30.86
| epoch 003 | 0008935 updates | words/s 4235| trainloss nan | train ppl nan
| epoch 003 | 0009935 updates | words/s 4341| trainloss nan | train ppl nan
| checkpoint 004 | epoch 004 | 0010580 updates | s/checkpnt 791 | words/s 4325 | lr 0.250000
| checkpoint 004 | epoch 004 | 0010580 updates | trainloss nan | train ppl nan
| checkpoint 004 | epoch 004 | 0010580 updates | validloss nan | valid ppl nan | testloss nan | test ppl nan
| epoch 004 | 0011580 updates | words/s 4341| trainloss nan | train ppl nan
| epoch 004 | 0012580 updates | words/s 4347| trainloss nan | train ppl nan
| checkpoint 005 | epoch 005 | 0013225 updates | s/checkpnt 791 | words/s 4328 | lr 0.250000
| checkpoint 005 | epoch 005 | 0013225 updates | trainloss nan | train ppl nan
| checkpoint 005 | epoch 005 | 0013225 updates | validloss nan | valid ppl nan | testloss nan | test ppl nan
Is this something I should expect? Would you guess this is a parameter configuration issue (e.g. the optimizer being too aggressive and overflowing), or does this suggest a bug (e.g. an overflow in the loss or perplexity code)?
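A sketch of the usual first experiment when loss turns nan, assuming an optimizer-aggressiveness problem rather than a code bug (the values are guesses, not tuned): lower the learning rate and tighten the gradient clipping in the same command:
fairseq train -sourcelang en -targetlang fr -datadir data/fairseq/en-fr -model blstm \
  -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.1 -clip 0.05 \
  -momentum 0.99 -timeavg -bptt 0 -savedir data/fairseq/en-fr.blstm -batchsize 16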
I don't want to train a whole model and predict; I only want to understand it. Is it possible to debug it using only a CPU, without a GPU?
Thank you!
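A sketch of a GPU-free setup, pieced together from other threads here (the CPU rockspec and the -float pre-trained models both appear below; this assumes they pair up as their names suggest):
luarocks make rocks/fairseq-cpu-scm-1.rockspec
fairseq generate-lines -path wmt14.en-fr.fconv-float/model.th7 \
  -sourcedict wmt14.en-fr.fconv-float/dict.en.th7 \
  -targetdict wmt14.en-fr.fconv-float/dict.fr.th7 -beam 5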
I have re-installed Torch and its dependency libraries, but I hit thread errors at the beginning of training. The command is as follows:
fairseq train -sourcelang de -targetlang en -datadir data-bin/iwslt14.tokenized.de-en \
-model fconv -nenclayer 4 -nlayer 3 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1
-momentum 0.99 -timeavg -bptt 0 -savedir trainings/fconv
The error is as follows:
In 3 module of nn.Sequential:
In 2 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 2 module of nn.Sequential:
In 1 module of nn.Sequential:
...rch/install/share/lua/5.1/tbc/TemporalConvolutionTBC.lua:44: attempt to index field 'TBC' (a nil value)
stack traceback:
...rch/install/share/lua/5.1/tbc/TemporalConvolutionTBC.lua:44: in function 'updateOutput'
...din/public/torch/install/share/lua/5.1/nn/WeightNorm.lua:115: in function <...din/public/torch/install/share/lua/5.1/nn/WeightNorm.lua:111>
[C]: in function 'xpcall'
...odin/public/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...din/public/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <...din/public/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
...odin/public/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...din/public/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <...din/public/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
...odin/public/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...
...n/public/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'func'
...n/public/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...n/public/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:356: in function <...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:333>
[C]: in function 'xpcall'
...n/public/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
...din/public/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...din/public/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
...din/public/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
...odin/public/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
...din/public/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'func'
...n/public/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...n/public/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'func'
...n/public/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...n/public/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:356: in function <...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:333>
[C]: in function 'xpcall'
...n/public/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
...din/public/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...din/public/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
...din/public/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
[C]: in function 'error'
...n/public/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
...n/public/torch/install/share/lua/5.1/threads/threads.lua:264: in function 'synchronize'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:371: in function 'doTrain'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:189: in function 'train'
...ic/torch/install/share/lua/5.1/fairseq/scripts/train.lua:405: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x004064a0
When I run:
fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 5
I get this error:
Tried loading libnccl.so.1 but got error /home/actl/torch/install/share/lua/5.1/nccl/ffi.lua:192: libnccl.so.1: cannot open shared object file: No such file or directory
Tried loading libnccl.1.dylib but got error /home/actl/torch/install/share/lua/5.1/nccl/ffi.lua:192: libnccl.1.dylib: cannot open shared object file: No such file or directory
Why?
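Those two lines look like warnings about the optional NCCL library rather than fatal errors. If they need to go away, a sketch of building NCCL v1 from source and making libnccl.so.1 visible (the paths and CUDA_HOME are assumptions):
git clone https://github.com/NVIDIA/nccl.git
cd nccl && make CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$PWD/build/lib:$LD_LIBRARY_PATH   # libnccl.so.1 should land here after the build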
lenovo1601@lenovo1601-Lenovo:~/fairseq$ fairseq preprocess -sourcelang de -targetlang en -trainpref $TEXT/train -validpref $TEXT/valid -testpref $TEXT/test -thresholdsrc 3 -thresholdtgt 3 -destdir data-bin/iwslt14.tokenized.de-en
/home/lenovo1601/torch/install/bin/lua: unable to convert argument 2 from cdata<int ()(struct tds_elem_, struct tds_elem_)> to cdata<int ()(const struct tds_elem_, const struct tds_elem_)>
stack traceback:
[C]: in function 'tds_vec_sort'
/home/lenovo1601/torch/install/share/lua/5.2/tds/vec.lua:98: in function 'sort'
.../torch/install/share/lua/5.2/fairseq/text/Dictionary.lua:79: in function 'finalize'
...1/torch/install/share/lua/5.2/fairseq/text/tokenizer.lua:79: in function <...1/torch/install/share/lua/5.2/fairseq/text/tokenizer.lua:74>
(...tail calls...)
...rch/install/share/lua/5.2/fairseq/scripts/preprocess.lua:110: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: in ?
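One detail that stands out in this trace: the paths say install/bin/lua and lua/5.2, while every other transcript in these threads runs under luajit, and the failing cdata conversion is FFI behavior that differs between LuaJIT and plain Lua 5.2. A sketch of rebuilding the Torch distro with LuaJIT, using the installer's own switch (this assumes ~/torch is the distro checkout):
cd ~/torch
./clean.sh                                # wipe the previous Lua 5.2 build
TORCH_LUA_VERSION=LUAJIT21 ./install.sh   # reinstall with LuaJIT 2.1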
When I install Torch following http://torch.ch/docs/getting-started.html, install fairseq, and run the fairseq generate-lines command, I get this error:
fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 5
/home/torch/install/bin/luajit: /home/torch/install/share/lua/5.1/torch/init.lua:102: class nn.ZipTable has been already assigned a parent class
stack traceback:
[C]: in function 'newmetatable'
/home/torch/install/share/lua/5.1/torch/init.lua:102: in function 'class'
.../torch/install/share/lua/5.1/rnnlib/nn/ZipTable.lua:12: in main chunk
[C]: in function 'require'
/home/torch/install/share/lua/5.1/rnnlib/init.lua:34: in main chunk
[C]: in function 'require'
...h/install/share/lua/5.1/fairseq/models/avgpool_model.lua:18: in main chunk
[C]: in function 'require'
.../torch/install/share/lua/5.1/fairseq/models/init.lua:15: in main chunk
[C]: in function 'require'
/home/torch/install/share/lua/5.1/fairseq/init.lua:14: in main chunk
[C]: in function 'require'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:15: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00406670
How could I solve this problem?
root@7ab936c99d66:~/fairseq# fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 5
| [target] Dictionary: 44666 types
| [source] Dictionary: 44409 types
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-1095/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/torch/File.lua:351: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-1095/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
[C]: in function 'read'
/root/torch/install/share/lua/5.1/torch/File.lua:351: in function </root/torch/install/share/lua/5.1/torch/File.lua:245>
[C]: in function 'read'
/root/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/root/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
/root/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/root/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
/root/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
...
/root/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:353: in function 'readObject'
/root/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
.../install/share/lua/5.1/fairseq/models/ensemble_model.lua:32: in function '__init'
/root/torch/install/share/lua/5.1/torch/init.lua:91: in function 'new'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:153: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00405d50
Not sure, but from the THCStorage error it seems Torch is trying to allocate more GPU memory than I have here:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57 Driver Version: 367.57 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GRID K520 Off | 0000:00:03.0 Off | N/A |
| N/A 38C P8 17W / 125W | 3977MiB / 4036MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 18259 C /root/torch/install/bin/luajit 3975MiB |
+-----------------------------------------------------------------------------+
How do I set a maximum GPU memory allocation, then?
I tried running the example for translating English to French:
fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 5
However, the following error was given:
| [target] Dictionary: 44666 types
| [source] Dictionary: 44409 types
THCudaCheck FAIL file=/home/james/torch/extra/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/home/james/torch/install/bin/luajit: /home/james/torch/install/share/lua/5.1/torch/File.lua:351: cuda runtime error (2) : out of memory at /home/james/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
[C]: in function 'read'
/home/james/torch/install/share/lua/5.1/torch/File.lua:351: in function </home/james/torch/install/share/lua/5.1/torch/File.lua:245>
[C]: in function 'read'
/home/james/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/home/james/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/james/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
/home/james/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/home/james/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/james/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/james/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
/home/james/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
...
/home/james/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/home/james/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/james/torch/install/share/lua/5.1/torch/File.lua:353: in function 'readObject'
/home/james/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
.../install/share/lua/5.1/fairseq/models/ensemble_model.lua:32: in function '__init'
/home/james/torch/install/share/lua/5.1/torch/init.lua:91: in function 'new'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:153: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00405d50
I tried reinstalling Torch and fairseq, as well as making sure that I have the most recent version of nn (luarocks install nn). Unfortunately, none of these attempts fixed the problem.
The weird thing is that neither my 2 GB of GPU memory nor my 16 GB of RAM seemed to get remotely full before the error message appeared.
Any help would be appreciated.
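One workaround that appears elsewhere in these threads, sketched here on the assumption that 2 GB of GPU memory is simply too small for this model: run the CPU (-float) variant of the pre-trained model instead of the CUDA one:
fairseq generate-lines -path wmt14.en-fr.fconv-float/model.th7 \
  -sourcedict wmt14.en-fr.fconv-float/dict.en.th7 \
  -targetdict wmt14.en-fr.fconv-float/dict.fr.th7 -beam 5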
Hi,
I am training a model using WMT En-Fr data and I hit an error:
/home/work/liuhongyu/torch/torch/install/bin/luajit: ...yu/torch/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 1 callback] ...yu/torch/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 1 callback] ...ch/torch/install/share/lua/5.1/fairseq/torchnet/data.lua:41: bad argument #2 to 'narrow' (out of range at /home/work/liuhongyu/torch/torch/pkg/torch/lib/TH/generic/THTensor.c:368)
stack traceback:
[C]: in function 'narrow'
...ch/torch/install/share/lua/5.1/fairseq/torchnet/data.lua:41: in function 'makeInput'
...ch/torch/install/share/lua/5.1/fairseq/torchnet/data.lua:594: in function 'get'
.../install/share/lua/5.1/torchnet/dataset/batchdataset.lua:120: in function 'get'
...are/lua/5.1/torchnet/dataset/paralleldatasetiterator.lua:110: in function <...are/lua/5.1/torchnet/dataset/paralleldatasetiterator.lua:109>
[C]: in function 'xpcall'
...yu/torch/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
...ngyu/torch/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...ngyu/torch/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
...ngyu/torch/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:15: in main chunk
stack traceback:
[C]: in function 'error'
...yu/torch/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
...are/lua/5.1/torchnet/dataset/paralleldatasetiterator.lua:150: in function 'loop'
...stall/share/lua/5.1/torchnet/dataset/datasetiterator.lua:107: in function 'next_from_base'
...hare/lua/5.1/fairseq/torchnet/ShardedDatasetIterator.lua:102: in function 'loop'
...stall/share/lua/5.1/torchnet/dataset/datasetiterator.lua:107: in function 'gloop'
...hare/lua/5.1/fairseq/torchnet/SingleParallelIterator.lua:47: in function <...hare/lua/5.1/fairseq/torchnet/SingleParallelIterator.lua:43>
[C]: in function 'xpcall'
...yu/torch/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
...ngyu/torch/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...ngyu/torch/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
...ngyu/torch/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:15: in main chunk
stack traceback:
[C]: in function 'error'
...yu/torch/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
...hare/lua/5.1/fairseq/torchnet/SingleParallelIterator.lua:63: in function '(for generator)'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:320: in function 'doTrain'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:189: in function 'train'
...ch/torch/install/share/lua/5.1/fairseq/scripts/train.lua:405: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00406820
I can run training with the WMT En-De data. Is there something wrong with the data preprocessing, or something else?
If I try to run this example I get an exception:
fairseq generate-lines -path wmt14.en-de.fconv-cuda/model.th7 -sourcedict wmt14.en-de.fconv-cuda/dict.de.th7 -targetdict wmt14.en-de.fconv-cuda/dict.en.th7 -beam 5
| [target] Dictionary: 42242 types
| [source] Dictionary: 43675 types
Dieser Zug fährt vom München nach Paris
/home/xxx/tools/torch/install/bin/luajit: ...rch/install/share/lua/5.1/fairseq/models/fconv_model.lua:84: attempt to index field 'weight' (a nil value)
stack traceback:
...rch/install/share/lua/5.1/fairseq/models/fconv_model.lua:84: in function 'onEvaluate'
...install/share/lua/5.1/fairseq/modules/TrainTestLayer.lua:29: in function 'evaluate'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...el/tools/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...rch/install/share/lua/5.1/fairseq/models/fconv_model.lua:652: in function 'setup'
.../install/share/lua/5.1/fairseq/models/ensemble_model.lua:102: in function 'generate'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:218: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00405d50
Any ideas?
I have re-installed Torch and its dependency libraries, but I hit thread errors when training on my own training data, although the beginning seems correct. I trained the model on 2 GPU cards. The training command is as follows:
fairseq train -sourcelang src -targetlang ref -datadir nist-bin/ -model blstm -nhid 512 -dropout 0.2 -dropout_hid 0 -optim adam -lr 0.0001 -savedir data/blstm -batchsize 128 -maxbatch 70 -validbleu -nembed 128
The error is:
| [ref] Dictionary: 40003 types
| [src] Dictionary: 40003 types
| IndexedDataset: loaded data/ with 127615 examples
| IndexedDataset: loaded data/ with 88 examples
| IndexedDataset: loaded data/ with 164 examples
| IndexedDataset: loaded data/ with 88 examples
| IndexedDataset: loaded data/ with 164 examples
| epoch 000 | 0001000 updates | words/s 685| trainloss 10.67 | train ppl 1627.17
| epoch 000 | 0002000 updates | words/s 879| trainloss 9.95 | train ppl 987.71
| epoch 000 | 0003000 updates | words/s 668| trainloss 9.68 | train ppl 821.81
/home/lixiang/library/dnn/torch/install/bin/luajit: ...rary/dnn/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 2 callback] ...rary/dnn/torch/install/share/lua/5.1/nngraph/gmodule.lua:348: split(2) cannot split 0 outputs
stack traceback:
[C]: in function 'error'
...rary/dnn/torch/install/share/lua/5.1/nngraph/gmodule.lua:348: in function 'neteval'
...rary/dnn/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
.../torch/install/share/lua/5.1/rnnlib/nn/SequenceTable.lua:150: in function 'forward'
.../torch/install/share/lua/5.1/rnnlib/nn/SequenceTable.lua:150: in function 'uo'
.../torch/install/share/lua/5.1/rnnlib/recurrentnetwork.lua:54: in function 'func'
...rary/dnn/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...rary/dnn/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'func'
...rary/dnn/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...rary/dnn/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:356: in function <...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:333>
[C]: in function 'xpcall'
...rary/dnn/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
...ibrary/dnn/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...ibrary/dnn/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
...ibrary/dnn/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
[C]: in function 'error'
...rary/dnn/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
...rary/dnn/torch/install/share/lua/5.1/threads/threads.lua:264: in function 'synchronize'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:371: in function 'doTrain'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:189: in function 'train'
...nn/torch/install/share/lua/5.1/fairseq/scripts/train.lua:405: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00405d50
It would be nice if there were any Spanish models available. Could you point me to them in case I missed them? Thanks.
For some text, the hypothesis H for the target language (FR) is in the source language (EN):
S <unk> <unk> for me ,
O nobody pray for me ,
H -1.7627129554749 I feel that my return is wrong .
A 1 1 1 2 1 6 6 6 6
The execution args were
[ 'generate-lines',
'-path',
'/root/wmt14.en-fr.fconv-float/model.th7',
'-sourcedict',
'/root/wmt14.en-fr.fconv-float/dict.en.th7',
'-targetdict',
'/root/wmt14.en-fr.fconv-float/dict.fr.th7',
'-beam',
'5',
'-input',
'-' ]
Does this support translation from English into Arabic? What tokenizer do you use?
Can I train a model using only a CPU, or is a GPU a must for using your toolkit?
Thanks,
Mohamed
When doing generate-lines with the pre-trained CPU models on my MacBook Pro (16 GB RAM, i7), I always get a "not enough memory" error after the 2nd or 3rd sentence sent via stdin:
| [target] Dictionary: 44666 types
| [source] Dictionary: 44409 types
S but now i <unk> m <unk> <unk> this
O but now i ' m countin ' this
H -0.7654139995575 mais maintenant je suis m � me d � crit � ce
A 1 2 4 4 6 6 6 6 6 6 6 7 9
S <unk> where my <unk> lives
O parmesan where my accountant lives
H -1.7098189592361 de la vie de mes enfants
A 1 1 1 1 4 4 4
exec:fairseq stderr:/Users/loretoparisi/torch/install/bin/luajit: not enough memory
My run command was:
fairseq generate-lines -path /root/wmt14.en-fr.fconv-float/model.th7 -sourcedict /root/wmt14.en-fr.fconv-float/dict.en.th7 -targetdict /root/wmt14.en-fr.fconv-float/dict.fr.th7 -beam 5 -input -
Note the pre-trained CPU model instead of the CUDA one.
@jgehring In my understanding, fairseq uses torch tds vectors, so in theory it does not rely on the LuaJIT memory allocator and should not hit an OOM due to Lua memory limits. If so, what causes the OOM?
Note: I do not hit this OOM issue when running on nvidia-docker with a 4 GB GPU and the pre-trained GPU model, of course.
I get an AccessDenied error when trying to download the pretrained networks.
I ran the following for Data Pre-processing:
fairseq preprocess -sourcelang de -targetlang en \
-trainpref $TEXT/train -validpref $TEXT/valid -testpref $TEXT/test \
-thresholdsrc 0 -thresholdtgt 0 -destdir data-bin/iwslt14.tokenized.de-en
As you can see, I set -thresholdsrc and -thresholdtgt to 0, yet it still replaces tokens in the validation and test sets with the unknown symbol.
Here's the output of the above command:
| [de] Dictionary: 113533 types
| [de] data/iwslt14.tokenized.de-en/train.de: 160215 sents, 3258560 tokens, 0.00% replaced by <unk>
| [de] data/iwslt14.tokenized.de-en/valid.de: 7282 sents, 149250 tokens, 1.91% replaced by <unk>
| [de] data/iwslt14.tokenized.de-en/test.de: 6750 sents, 132488 tokens, 2.45% replaced by <unk>
| [de] Wrote preprocessed data to data-bin/iwslt14.tokenized.de-en
| [en] Dictionary: 53330 types
| [en] data/iwslt14.tokenized.de-en/train.en: 160215 sents, 3433520 tokens, 0.00% replaced by <unk>
| [en] data/iwslt14.tokenized.de-en/valid.en: 7282 sents, 157272 tokens, 0.59% replaced by <unk>
| [en] data/iwslt14.tokenized.de-en/test.en: 6750 sents, 137891 tokens, 0.85% replaced by <unk>
| [en] Wrote preprocessed data to data-bin/iwslt14.tokenized.de-en
In the validation and test data the value should also be 0.00%, right? I am using this for another task where I don't want tokens replaced by the unknown symbol. Is there some option I am missing here?
package.path = package.path .. ';/path/to/fairseq/fairseq/?.lua'
--require 'fairseq'
does not work.
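A sketch of the more usual route, assuming the failure is because fairseq's compiled components are only visible after a proper install: build the rock once, after which require 'fairseq' resolves without any package.path edits:
cd /path/to/fairseq
luarocks make rocks/fairseq-scm-1.rockspec
# then, in Lua: require 'fairseq' -- no package.path manipulation needed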
I get this error when I am using one of the pre-trained models.
fairseq generate-lines -path wmt14.en-fr.fconv-cuda/model.th7 -sourcedict wmt14.en-fr.fconv-cuda/dict.en.th7 -targetdict wmt14.en-fr.fconv-cuda/dict.fr.th7 -beam 3
| [target] Dictionary: 44666 types
| [source] Dictionary: 44409 types
> Why is it rare to discover new marine mam@@ mal species ?
/home/elab/Installations/torch/install/bin/luajit: ...rch/install/share/lua/5.1/fairseq/models/fconv_model.lua:84: attempt to index field 'weight' (a nil value)
stack traceback:
...rch/install/share/lua/5.1/fairseq/models/fconv_model.lua:84: in function 'onEvaluate'
...install/share/lua/5.1/fairseq/modules/TrainTestLayer.lua:29: in function 'evaluate'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'func'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:73: in function 'applyToModules'
...stallations/torch/install/share/lua/5.1/nn/Container.lua:91: in function 'evaluate'
...rch/install/share/lua/5.1/fairseq/models/fconv_model.lua:652: in function 'setup'
.../install/share/lua/5.1/fairseq/models/ensemble_model.lua:102: in function 'generate'
...install/share/lua/5.1/fairseq/scripts/generate-lines.lua:218: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00405d50
Although fairseq installation on Ubuntu/Docker takes time, it is pretty straightforward; on macOS (CPU only) it can be very complicated through luarocks due to the many dependencies, so I'm putting up this gist with all the dependencies to install to get it working. Hope this helps!
Hi,
I am trying to test this model on a summarization task (en->en). I am trying to preprocess my articles and summaries using fairseq preprocess, and I get the following error.
The command I use for preprocessing is:
fairseq preprocess -sourcelang articles -targetlang summaries -trainpref train -validpref valid -testpref test -destdir data-bin/summarize
I have the following tokenized files in my directory :
train.articles, train.summaries, valid.articles, valid.summaries, test.articles, test.summaries -> each containing one sentence per line
Can someone kindly let me know what I am missing here?
Hi Jonas,
Please advise on the following. Note also that I have a GPU, but it is Intel rather than Nvidia, so I am running into issues. Thank you.
micheles-mba:fairseq micheles$ luarocks make rocks/fairseq-cpu-scm-1.rockspec --local
Missing dependencies for fairseq-cpu:
torch >= 7.0
lua-cjson
rnnlib
torchnet-sequential
visdom
torchnet
argcheck
tbc
threads
tds
nngraph
Error: Could not satisfy dependency: torch >= 7.0
micheles-mba:fairseq micheles$
"This model uses a Byte Pair Encoding (BPE) vocabulary, so we'll have to apply the encoding to the source text."
I couldn't find how to do this in the 'Data Pre-processing' section.
In one of the pre-trained models:
"- We use a BPE sub-word vocabulary with 40k tokens, trained jointly on the training data for both languages (see bpecodes)"
Do you use a similar approach to Google's seq2seq?
thanks.
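A sketch of applying BPE with an external tool, assuming the bpecodes file shipped with the model is compatible with Sennrich et al.'s subword-nmt (the release notes only mention the file itself, so this pairing is an assumption):
git clone https://github.com/rsennrich/subword-nmt.git
python subword-nmt/apply_bpe.py -c wmt14.en-fr.fconv-cuda/bpecodes < input.tok.en > input.bpe.en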
Hi,
My training jumped from epoch 20 to epoch 100 and stopped. The command launched to initiate the training was:
fairseq train -sourcelang tr -targetlang en -datadir ../data-bin/wmt17-tr-en \
-model blstm -nembed 200 -noutembed 200 -nhid 500 -dropout 0.3 -dropout_hid 0 -optim adam -lr 0.0003125 \
-savedir blstm -ngpus 1 -validbleu -log_interval 10 -batchsize 64
Here's the end of the log:
| epoch 019 | 0161656 updates | words/s 14482| trainloss 1.94 | train ppl 3.84
| epoch 019 | 0161666 updates | words/s 13245| trainloss 1.97 | train ppl 3.91
| epoch 019 | 0161676 updates | words/s 14300| trainloss 2.45 | train ppl 5.46
| checkpoint 020 | epoch 020 | 0161680 updates | s/checkpnt 632 | words/s 14126 | lr 0.000313
| checkpoint 020 | epoch 020 | 0161680 updates | trainloss 2.10 | train ppl 4.30
| checkpoint 020 | epoch 020 | 0161680 updates | validloss 4.36 | valid ppl 20.55 | testloss 4.40 | test ppl 21.16
| checkpoint 020 | epoch 020 | 0161680 updates | validbleu 17.67 | valid BP 0.90
| checkpoint 020 | epoch 020 | 0161680 updates | saved model to blstm/model_epoch20.th7
| epoch 100 | 0161680 updates | 0000000 epoch updates | path blstm/state_epoch100.th7
| epoch 100 | 0161680 updates | 0000000 epoch updates | path blstm/state_last.th7
| epoch 100 | 0161680 updates | 0000000 epoch updates | path blstm/state_last.th7
| Test with beam=1: BLEU4 = 19.06, 47.9/24.1/14.3/9.0 (BP=0.971, ratio=1.029, sys_len=85120, ref_len=87630)
| Test with beam=5: BLEU4 = 20.60, 51.5/27.1/16.5/10.5 (BP=0.929, ratio=1.074, sys_len=81595, ref_len=87630)
| Test with beam=10: BLEU4 = 20.91, 51.9/27.6/17.0/10.8 (BP=0.923, ratio=1.080, sys_len=81120, ref_len=87630)
| Test with beam=20: BLEU4 = 20.94, 52.0/27.7/17.1/10.9 (BP=0.919, ratio=1.084, sys_len=80841, ref_len=87630)
Another thing: although I passed -batchsize 64, the number of updates needed to complete an epoch was much larger than expected: ~8000 vs. ~5000 (training set size: 348K samples).
I was wondering whether it is possible to modify such an efficient seq2seq model for a classification task? If so, what would be a general guideline for that purpose? Thanks!