Attention-Based Summarization

This project contains the ABS neural abstractive summarization system from the paper

 A Neural Attention Model for Abstractive Summarization.
 Alexander M. Rush, Sumit Chopra, Jason Weston.

The release includes code for:

  • Extracting the summarization data set
  • Training the neural summarization model
  • Constructing evaluation sets with ROUGE
  • Tuning extractive features

Setup

To run the system, you will need Torch7 installed. You will also need Python 2.7, NLTK, and GNU Parallel to run the data processing scripts. Additionally, the code currently requires a CUDA GPU for training and decoding.

Finally, the scripts require that you set the $ABS environment variable:

> export ABS=$PWD
> export LUA_PATH="$LUA_PATH;$ABS/?.lua"
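
Since training and decoding require a CUDA GPU, it can be worth confirming that Torch can actually see one before going any further. This is just a generic sanity check, not part of the release scripts:

> th -e "require 'cutorch'; print(cutorch.getDeviceCount())"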

Constructing the Data Set

The model is trained to perform title generation from the first line of newspaper articles. Since the system is completely data-driven, it requires a large set of aligned input-title pairs for training.

To provide these pairs we use the Annotated Gigaword corpus as our main data set. The corpus is available from the LDC, but it requires membership. Once the Annotated Gigaword is obtained, you can simply run the provided script to extract the data set in text format.

Generating the data

To construct the data set, run the following script to produce working_dir/, where working_dir/ is the path to the directory where you want to store the processed data. The script construct_data.sh makes use of the parallel utility, so please make sure that it is in your path. WARNING: this may take a couple of hours to run.

 > ./construct_data.sh agiga/ working_dir/
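
Because the script depends on GNU Parallel and NLTK, it is worth confirming that both are available before starting the long run. These are generic checks, not part of the release; substitute your own Python 2.7 interpreter if python2.7 is not on your path:

> which parallel
> python2.7 -c "import nltk; print nltk.__version__"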

Format of the data files

The above command builds aligned files of the form split.type.txt, where split is train/valid/test and type is title/article.

The output of the script is several aligned plain-text files. Each has one title or article per line.

 > head train.title.txt
 australian current account deficit narrows sharply
 at least two dead in southern philippines blast
 australian stocks close down #.# percent
 envoy urges north korea to restart nuclear disablement
 skorea announces tax cuts to stimulate economy

These files can be used to train the ABS system or be used by other baseline models.
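
A quick sanity check, run from inside working_dir/, is to confirm that each aligned pair of files has the same number of lines (the file names here follow the split.type.txt pattern above):

> wc -l train.title.txt train.article.txt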

Training the Model

Once the data set has been constructed, we provide a simple script to train the model.

./train_model.sh working_dir/ model.th

The training process consists of two stages. First we convert the text files into generic input-title matrices and then we train a conditional NNLM on this representation.

Once the model has been fully trained (this may require 3-4 days), you can use the test script to produce summaries of any plain-text file.

./test_model.sh working_dir/valid.article.filter.txt model.th length_of_summary
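
For example, to produce 14-word summaries of the validation articles (14 is only an illustrative length; the DUC tuning configuration below happens to use a title length of 14 as well):

> ./test_model.sh working_dir/valid.article.filter.txt model.th 14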

Training options

These scripts utilize the Torch code available in $ABS/summary/.

There are two main Torch entry points: one for training the model from data matrices and the other for evaluating the model on plain text.

 > th summary/train.lua -help

 Train a summarization model.

   -articleDir      Directory containing article training matrices. []
   -titleDir        Directory containing title training matrices. []
   -validArticleDir Directory containing article matrices for validation. []
   -validTitleDir   Directory containing title matrices for validation. []
   -auxModel        The encoder model to use. [bow]
   -bowDim          Article embedding size. [50]
   -attenPool       Attention model pooling size. [5]
   -hiddenUnits     Conv net encoder hidden units. [1000]
   -kernelWidth     Conv net encoder kernel width. [5]
   -epochs          Number of epochs to train. [5]
   -miniBatchSize   Size of training minibatch. [64]
   -printEvery      How often to print during training. [1000]
   -modelFilename   File for saving/loading the model. []
   -window          Size of NNLM window. [5]
   -embeddingDim    Size of NNLM embeddings. [50]
   -hiddenSize      Size of NNLM hidden layer. [100]
   -learningRate    SGD learning rate. [0.1]
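
For reference, a direct invocation of the trainer might look like the following. The matrix directory paths are placeholders for whatever train_model.sh produced in your working directory, and the flag values simply echo the defaults listed above:

> th summary/train.lua \
    -articleDir working_dir/train/article/ \
    -titleDir working_dir/train/title/ \
    -validArticleDir working_dir/valid.filter/article/ \
    -validTitleDir working_dir/valid.filter/title/ \
    -auxModel bow -bowDim 50 -epochs 5 -miniBatchSize 64 \
    -modelFilename working_dir/models/model.th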

Testing options

The run script is used for beam-search decoding with a trained model. See the paper for a description of the extractive features used at decoding time.

> th summary/run.lua -help

-blockRepeatWords Disallow generating a repeated word. [false]
-allowUNK         Allow generating <unk>. [false]
-fixedLength      Produce exactly -length words. [true]
-beamSize         Size of the beam. [100]
-extractive       Force fully extractive summary. [false]
-lmWeight         Feature weight for the neural model. [1]
-unigramBonus     Feature weight for unigram extraction. [0]
-bigramBonus      Feature weight for bigram extraction. [0]
-trigramBonus     Feature weight for trigram extraction. [0]
-lengthBonus      Feature weight for length. [0]
-unorderBonus     Feature weight for out-of-order extraction. [0]
-modelFilename    Model to test. []
-inputf           Input article files.  []
-nbest            Write out the nbest list in ZMert format. [false]
-length           Maximum length of summary. [5]
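
For reference, a direct decoding call might look like this; the model and input paths are placeholders, and the flag values are only examples:

> th summary/run.lua -modelFilename model.th \
    -inputf DUC_data/clean_2003/input.txt \
    -length 14 -beamSize 100 -lmWeight 1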

Evaluation Data Sets

We evaluate the ABS model using the shared task from the Document Understanding Conference (DUC).

This release also includes code for interacting with the DUC shared task on headline generation. The scripts for processing and evaluating on this data set are in the DUC/ directory.

The DUC data set is available online; unfortunately, you must manually fill out a form to request the data from NIST. Send the request to Angela Ellis.

Processing DUC

After receiving credentials you should obtain a series of tar files containing the data used as part of this shared task.

  1. Make a directory DUC_data/ which should contain the given files:

    DUC2003_Summarization_Documents.tgz
    DUC2004_Summarization_Documents.tgz
    duc2004_results.tgz
    detagged.duc2003.abstracts.tar.gz
    
  2. Run the setup script (this requires python and NLTK for tokenization)

    ./DUC/setup.sh DUC_data/

After running the script there should be two directories:

   DUC_data/clean_2003/
   DUC_data/clean_2004/

Each contains a file input.txt, where each line is the tokenized first line of an article.

 > head DUC_data/clean_2003/input.txt
 schizophrenia patients whose medication could n't stop the imaginary voices in their heads gained some relief after researchers repeatedly sent a magnetic field into a small area of their brains .
 scientists trying to fathom the mystery of schizophrenia say they have found the strongest evidence to date that the disabling psychiatric disorder is caused by gene abnormalities , according to a researcher at two state universities .
 a yale school of medicine study is expanding upon what scientists know  about the link between schizophrenia and nicotine addiction .
 exploring chaos in a search for order , scientists who study the reality-shattering mental disease schizophrenia are becoming fascinated by the chemical environment of areas of the brain where perception is regulated .

As well as a set of references:

> head DUC_data/clean_2003/references/task1_ref0.txt
Magnetic treatment may ease or lessen occurrence of schizophrenic voices.
Evidence shows schizophrenia caused by gene abnormalities of Chromosome 1.
Researchers examining evidence of link between schizophrenia and nicotine addiction.
Scientists focusing on chemical environment of brain to understand schizophrenia.
Schizophrenia study shows disparity between what's known and what's provided to patients.

System output should be added to the system/ directory as task1_{name}.txt. For instance, a baseline PREFIX system is included:

DUC_data/clean_2003/system/task1_prefix.txt
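
As a rough illustration only (not the included script), a PREFIX-style baseline that copies the first few words of each input line could be produced with a one-liner like this; the 8-word cutoff and the output file name are arbitrary:

> awk '{n=(NF<8?NF:8); line=""; for(i=1;i<=n;i++) line=line (i>1?" ":"") $i; print line}' \
    DUC_data/clean_2003/input.txt > DUC_data/clean_2003/system/task1_myprefix.txt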

ROUGE for Eval

To evaluate the summaries you will need the ROUGE eval system.

The ROUGE script requires output in a rather complex HTML format. To simplify this process, we include a script to convert the simple output to one that ROUGE can handle.

Export the ROUGE directory and then run the eval script:

> export ROUGE={path_to_rouge}

> ./DUC/eval.sh DUC_data/clean_2003/
FULL LENGTH
   ---------------------------------------------
   prefix ROUGE-1 Average_R: 0.17831 (95%-conf.int. 0.16916 - 0.18736)
   prefix ROUGE-1 Average_P: 0.15445 (95%-conf.int. 0.14683 - 0.16220)
   prefix ROUGE-1 Average_F: 0.16482 (95%-conf.int. 0.15662 - 0.17318)
   ---------------------------------------------
   prefix ROUGE-2 Average_R: 0.04936 (95%-conf.int. 0.04420 - 0.05452)
   prefix ROUGE-2 Average_P: 0.04257 (95%-conf.int. 0.03794 - 0.04710)
   prefix ROUGE-2 Average_F: 0.04550 (95%-conf.int. 0.04060 - 0.05026)

Tuning Feature Weights

For our system ABS+ we additionally tune extractive features on the DUC summarization data. The final feature weights we obtained are distributed with the system as tuning/params.best.txt.

The MERT tuning code itself is located in the tuning/ directory. Our setup uses ZMert for this process.

It should be straightforward to tune the system on any development summarization data. Take the following steps to run tuning on the DUC-2003 data set described above.

First copy the reference files over to the tuning directory. For instance, to tune on DUC-2003:

ln -s DUC_data/clean_2003/references/task1_ref0.txt tuning/ref.0
ln -s DUC_data/clean_2003/references/task1_ref1.txt tuning/ref.1
ln -s DUC_data/clean_2003/references/task1_ref2.txt tuning/ref.2
ln -s DUC_data/clean_2003/references/task1_ref3.txt tuning/ref.3

Next copy the SDecoder template, cp SDecoder_cmd.tpl SDecoder_cmd.py, and modify SDecoder_cmd.py to point to the model and input text.

{"model" : "model.th",
 "src" : "/data/users/sashar/DUC_data/clean_2003/input.txt",
 "title_len" : 14}

Now you should be able to run Z-MERT and let it do its thing.

> cd tuning/; java -cp zmert/lib/zmert.jar ZMERT ZMERT_cfg.txt

When Z-MERT has finished, you can run on new data using the command:

> python SDecoder_test.py input.txt model.th

namas's Issues

Problem with the convolutional encoder

Can't validate the model using the convolutional encoder. Here is the log:

    input view (80x128) and desired view (264) do not match
    stack traceback:
        [C]: in function 'error'
        /root/torch/install/share/lua/5.1/nn/View.lua:47: in function 'batchsize'
        /root/torch/install/share/lua/5.1/nn/View.lua:79: in function 'func'
        /root/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
        /root/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
        ./summary/nnlm.lua:98: in function 'validation'
        ./summary/nnlm.lua:147: in function 'run_valid'
        ./summary/nnlm.lua:167: in function 'train'
        /home/ubuntu/NAMAS/summary/train.lua:51: in function 'main'
        /home/ubuntu/NAMAS/summary/train.lua:54: in main chunk
        [C]: in function 'dofile'
        /root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00405d50

ROUGE MERT

From ZMERT_cfg.txt it seems that MERT is applied using BLEU? However, my understanding of the paper was that ROUGE was used and not BLEU. Is it possible to access the ZMERT ROUGE code?

About fblualib: fb.util

Hi, thanks for sharing this first of all; it's really interesting.

Also, when I try to run it on my Ubuntu 14.04 system, I get the following errors. I would really appreciate it if you could tell me what's going on:

ayana@nlplinuxgpu:/data/disk1/private/ayana/NAMAS-master$ ./train_model.sh working_dir/ model.th
/usr/local/bin/luajit: /usr/local/share/lua/5.1/trepl/init.lua:363: /usr/local/share/lua/5.1/trepl/init.lua:363: /usr/local/share/lua/5.1/trepl/init.lua:363: module 'fb.util' not found:No LuaRocks module found for fb.util
no field package.preload['fb.util']
no file '/global-hadoop/home/ayana/.luarocks/share/lua/5.1/fb/util.lua'
no file '/global-hadoop/home/ayana/.luarocks/share/lua/5.1/fb/util/init.lua'
no file '/usr/local/share/lua/5.1/fb/util.lua'
no file '/usr/local/share/lua/5.1/fb/util/init.lua'
no file '/data/disk1/private/ayana/NAMAS-master/fb/util.lua'
no file '/global-hadoop/home/ayana/.luarocks/lib/lua/5.1/fb/util.so'
no file '/usr/local/lib/lua/5.1/fb/util.so'
no file './fb/util.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
no file '/global-hadoop/home/ayana/.luarocks/lib/lua/5.1/fb.so'
no file '/usr/local/lib/lua/5.1/fb.so'
no file './fb.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/trepl/init.lua:363: in function 'require'
/data/disk1/private/ayana/NAMAS-master/summary/train.lua:17: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:133: in main chunk
[C]: at 0x00406260

It seems to me like fblualib is missing. I tried to install it several times, but it always fails at the folly and fbthrift part.
I read the source code and I did not see where any of them are required...
I am very new to all of this...
Thanks for your time and patience!

Hyper-parameter Settings In Paper vs Code

Hi, I have a quick question regarding the hyper-parameter settings mentioned in the paper vs. those set in train_model.sh.

In the paper:
D = 200, H = 400, C = 5, L = 3, and Q = 2

In train_model.sh:
-miniBatchSize 64
-embeddingDim 64
-bowDim 200
-hiddenSize 64
-epochs 20
-learningRate 0.1
-validArticleDir $OUT_DIR/valid.filter/article/
-validTitleDir $OUT_DIR/valid.filter/title/
-window $WINDOW
-printEvery 100
-encoderModel "attenbow"
-attenPool 5

Are these two sets of hyper-parameters the same? I thought embeddingDim (decoder embedding dimension) has the same size as bowDim (encoder embedding dimension). Also, shouldn't hiddenSize be 400 (H=400)? I am also a little confused as to what C, L and Q map to in the code. Any clarification would be appreciated. Cheers.

FBCUNN package required?

Hi,

Thanks a lot for your work and the wonderful open-source repo. Just a quick question: is the 'fbcunn' package required by your tool, as indicated in the README?

I ran a search and saw that the line "require 'fbcunn'" is commented out in the encoder.lua file, so does that mean your tool does not need the 'fbcunn' package? The 'fbcunn' package seems to only work on Ubuntu, which may be limiting... that is why I am asking. Thanks!

How to use a different dataset

Hello, I am interested in implementing your project. I have 7M+ (headline, article) pairs stored locally (in a SQL database). I would like to know which steps I have to follow to use them instead of the Gigaword dataset. That is, I would like to see an example of the training model's input format so I can write a Python script to convert my dataset into that format.

Thanks a lot

FBNN package required?

Hi,

First of all thanks for open sourcing this library and thanks for the clear readme. I appreciate it lots and lots.

I noticed that you require the fbnn package in summary/nnlm.lua. I hear that it is not well maintained these days, and moreover I do not see it used anywhere in the script. Can I just comment it out? Without it, it runs like a charm!

Thanks again and hope to hear from you soon.

Problem with preprocessing DUC data script

I tried to use setup.sh to preprocess the DUC data:
./DUC/setup.sh DUC_data/
I fixed the file-path-related bugs.
However, the following error is still reported:

    File "DUC/make_DUC.py", line 102, in <module>
        sys.exit(main(sys.argv[1:]))
    File "DUC/make_DUC.py", line 67, in main
        assert len(matches) == 4, matches

Could you please help me? Thank you!

AWS Support

Hi - I think this may have issues running on AWS as AWS offers instances with GRID K520 with compute capability of only 3. Here's the output of the device query on g2.8xlarge instance type:

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 4, Device0 = GRID K520, Device1 = GRID K520, Device2 = GRID K520, Device3 = GRID K520

I am able to train the model, but not test it as I run into an issue similar to the one described here:

https://groups.google.com/forum/#!msg/torch7/aO7Q6A2RxHs/x5CtTh_mCQAJ

I get an error "invalid device function", which according to the thread results from only compute capability 3.5 and up being supported. Just wanted to know if there was a workaround. Thanks!

module 'fb.torch.async_rng' not found

Hi, I'm trying to train the model on AWS instance (Bitfusion Boost Ubuntu 14 Torch 7), but getting this error:

$ ./test_model.sh workind_dir/ model.th
/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/torch/install/share/lua/5.1/trepl/init.lua:384: /home/ubuntu/torch/install/share/lua/5.1/trepl/init.lua:384: /home/ubuntu/torch/install/share/lua/5.1/trepl/init.lua:384: module 'fb.torch.async_rng' not found:No LuaRocks module found for fb.torch.async_rng
    no field package.preload['fb.torch.async_rng']
    no file '/home/ubuntu/.luarocks/share/lua/5.1/fb/torch/async_rng.lua'
    no file '/home/ubuntu/.luarocks/share/lua/5.1/fb/torch/async_rng/init.lua'
    no file '/home/ubuntu/torch/install/share/lua/5.1/fb/torch/async_rng.lua'
    no file '/home/ubuntu/torch/install/share/lua/5.1/fb/torch/async_rng/init.lua'
    no file './fb/torch/async_rng.lua'
    no file '/home/ubuntu/torch/install/share/luajit-2.1.0-beta1/fb/torch/async_rng.lua'
    no file '/usr/local/share/lua/5.1/fb/torch/async_rng.lua'
    no file '/usr/local/share/lua/5.1/fb/torch/async_rng/init.lua'
    no file '/home/ubuntu/code/NAMAS/fb/torch/async_rng.lua'
    no file 'fb/torch/async_rng.lua'
    no file '/home/ubuntu/.luarocks/lib/lua/5.1/fb/torch/async_rng.so'
    no file '/home/ubuntu/torch/install/lib/lua/5.1/fb/torch/async_rng.so'
    no file '/home/ubuntu/torch/install/lib/fb/torch/async_rng.so'
    no file './fb/torch/async_rng.so'
    no file '/usr/local/lib/lua/5.1/fb/torch/async_rng.so'
    no file '/usr/local/lib/lua/5.1/loadall.so'
    no file '/home/ubuntu/.luarocks/lib/lua/5.1/fb.so'
    no file '/home/ubuntu/torch/install/lib/lua/5.1/fb.so'
    no file '/home/ubuntu/torch/install/lib/fb.so'
    no file './fb.so'
    no file '/usr/local/lib/lua/5.1/fb.so'
    no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
    [C]: in function 'error'
    /home/ubuntu/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
    summary/run.lua:17: in main chunk
    [C]: in function 'dofile'
    ...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

I installed fbcunn and all the deps, as described in https://github.com/facebook/fbcunn/blob/master/INSTALL.md. I expected the missing module to be installed as part of fblualib (which I did install with fblualib's build.sh), but looking into fblualib/build/Makefile (the makefile produced by build.sh), there is nothing about fblualib/torch/AsyncRNG.cpp, so it probably does not get built.

I tried to run cmake . under fblualib/torch, but got this error:

$ cmake .
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Torch7 in /home/ubuntu/torch/install
-- Found Folly: /usr/local/include
-- Found Torch7 in /home/ubuntu/torch/install
CMake Error at CMakeLists.txt:45 (install):
  install TARGETS given no LIBRARY DESTINATION for module target "async_rng".


-- Configuring incomplete, errors occurred!
See also "/home/ubuntu/code/fblualib/fblualib/torch/CMakeFiles/CMakeOutput.log".

How should I install fb.torch.async_rng?
Thank you for any help.

Why does the test set contain 340,000 sentences?

I used the command cat $SPLITS/test.splits | xargs cat > test.data.txt to build the test set. My test set file contains about 340,000 sentences; however, the test set mentioned in your paper only includes 2,000 sentences.
What could be the reason?

Bug when filtering bad words

There is a bug in the logic when removing titles with 'bad words'. Instead of searching for a bad word in the tokenized title, the whole title string is used.
This leads to the removal of all instances where the title contains the bigram 'ap' (or any other bad word as a subsequence). So all titles with e.g. the word 'Japan' are removed from the training set (approximately 700K totally fine instances).
Then source and target vocabularies are created based on the training set. Because the target vocabulary doesn't contain 'Japan' and bad word filtering is not applied to the test set, there are instances in the test set where the input includes 'Japan' and the output has it as 'UNK'. Just so that you know why there are so many UNKs in the test summaries.

    if any((bad in title.lower()

prepare4rouge-simple.pl is not found

The file prepare4rouge-simple.pl, which is used in the ./DUC/eval.sh script, is not found.
At line 11, eval.sh says:

perl $ABS/DUC/prepare4rouge-simple.pl tmp_SYSTEM tmp_GOLD tmp_OUTPUT

Alternative dataset to Gigaword

Hi, I would like to try the experiment in your paper. However, I couldn't afford the annotated Gigaword dataset. Is there any similar corpus that I can use to train the model? Thank you!

beam_search.lua: nn.TemporalKMaxPooling throws error

Line 103 references TemporalKMaxPooling, which I guess is not part of the nn package.
Any workaround?

While running the test script I got this error:

./summary/beam_search.lua:103: attempt to call field 'TemporalKMaxPooling' (a nil value)
stack traceback:
./summary/beam_search.lua:103: in function 'generate'
summary/run.lua:86: in function 'main'
summary/run.lua:121: in main chunk
[C]: in function 'dofile'
...dave/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

invalid device ordinal problem

Hello,
I came here because of the invalid device ordinal problem that appeared during the execution of the test.sh file.

Below are the console error messages I received:

THCudaCheck FAIL file=/root/torch/extra/cutorch/init.c line=698 error=10 : invalid device ordinal
/root/torch/install/bin/luajit: summary/run.lua:26: cuda runtime error (10) : invalid device ordinal at /root/torch/extra/cutorch/init.c:698
stack traceback:
[C]: in function 'setDevice'
summary/run.lua:26: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

I think the line below is the most important message.
/root/torch/extra/cutorch/init.c line=698 error=10 : invalid device ordinal

My system is Ubuntu 14.04, and I installed and tested samples of CUDA 7.5 with no problems.
I tried to reinstall the NVIDIA drivers and CUDA, to no avail.
It is weird: the usual (Google-searched) invalid device ordinal problem is solved by blacklisting nouveau; however, in my case, that does not work.

Any help please?

about fbnn in nnlm.lua

Hi there,

Last time I posted a question about fblualib, which I can't install on my system.
Then I found out that only the fbnn module is related to it, so I just deleted it.
Now the program is running, like this:

ayana@nlplinuxgpu:/data/disk1/private/ayana/NAMAS-master$ tail -f train.log
Encoder model: BoW + Attention
64 200

108838
64
[torch.LongStorage of size 2]

torch.CudaTensor
[Running Validation]
[perp: 113252.112532 validation: 11.637372 total: 117626]
[saving mlp: working_dir//models/model.th]
[Loss: inf Epoch: 1 Position: 64 Rate: 0.100000 Time: 0.089436]
[Loss: 10.816063 Epoch: 1 Position: 6464 Rate: 0.100000 Time: 10.614009]
[Loss: 11.075415 Epoch: 1 Position: 12864 Rate: 0.100000 Time: 10.651449]
[Loss: 10.293554 Epoch: 1 Position: 19264 Rate: 0.100000 Time: 10.650220]
[Loss: 9.971358 Epoch: 1 Position: 25664 Rate: 0.100000 Time: 10.652008]
[Loss: 9.690861 Epoch: 1 Position: 32064 Rate: 0.100000 Time: 10.649381]
[Loss: 9.491403 Epoch: 1 Position: 38464 Rate: 0.100000 Time: 10.650344]
[Loss: 9.084007 Epoch: 1 Position: 44864 Rate: 0.100000 Time: 10.652305]
[Loss: 8.958983 Epoch: 1 Position: 51264 Rate: 0.100000 Time: 10.651511]
[Loss: 8.976383 Epoch: 1 Position: 57664 Rate: 0.100000 Time: 10.653925]
[Loss: 8.875211 Epoch: 1 Position: 64064 Rate: 0.100000 Time: 10.653495]
[Loss: 8.884078 Epoch: 1 Position: 70464 Rate: 0.100000 Time: 10.655499]
[Loss: 8.820546 Epoch: 1 Position: 76864 Rate: 0.100000 Time: 10.658177]
[Loss: 8.710104 Epoch: 1 Position: 83264 Rate: 0.100000 Time: 10.661504]
[Loss: 8.732778 Epoch: 1 Position: 89664 Rate: 0.100000 Time: 10.656001]
[Loss: 8.655664 Epoch: 1 Position: 96064 Rate: 0.100000 Time: 10.667066]
.....

Is it working well?

Will there be a PyTorch version?

I wonder if there is or will be a PyTorch version of this project. That would be very helpful, since many people like me are not familiar with Lua.

err becomes NaN

Hi, thanks for this helpful framework.
I am training this on longer document and summary pairs. After 2 epochs, err becomes NaN in this code:

    local input, target = valid_data:next_batch(offset)
    local out = self.mlp:forward(input)
    local err = self.criterion:forward(out, target) * target:size(1)

After 2 epochs it displays this:
err nan
NaN
{
1 : CudaTensor - size: 64x291
2 : CudaTensor - size: 64x291
3 : CudaTensor - size: 64x5
}
err nan
NaN
{
1 : CudaTensor - size: 64x291
2 : CudaTensor - size: 64x291
3 : CudaTensor - size: 64x5
}
err nan
NaN
{
1 : CudaTensor - size: 64x291
2 : CudaTensor - size: 64x291
3 : CudaTensor - size: 64x5
}
[EPOCH : 20 LOSS: 0.000000 TOTAL: 850461 BATCHES: 13683]

Any help would be great.
Thanks.
