kenchan0226 / keyphrase-generation-rl


Code for the ACL 19 paper "Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards"

Home Page: https://arxiv.org/abs/1906.04106

License: MIT License

Python 100.00%

keyphrase-generation reinforcement-learning pytorch-implementation

keyphrase-generation-rl's Introduction

Keyphrase Generation via Reinforcement Learning

This repository contains the code for our ACL 19 paper "Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards".

Our implementation is built on the source code from seq2seq-keyphrase-pytorch. Some code is adapted from this repository. The code for beam search is mainly adapted from OpenNMT-py.

If you use this code, please cite our paper:

@inproceedings{conf/acl/chan19keyphraseRL,
  title={Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards},
  author={Hou Pong Chan and Wang Chen and Lu Wang and Irwin King},
  booktitle={Proceedings of ACL},
  year={2019}
}

Dependencies

python 3.5+
pytorch 0.4

Dataset

The datasets can be downloaded from here, which are the tokenized version of the datasets provided by Rui Meng. Please unzip the files to the ./data directory. The kp20k_sorted directory contains the kp20k dataset, which consists of three pairs of source-target files: train_src.txt, train_trg.txt, valid_src.txt, valid_trg.txt, test_src.txt, test_trg.txt. We removed the duplicated documents in the KP20k training set according to the instructions in Rui Meng's Github. For each document, we sort all the present keyphrase labels according to the order of their first occurrence in the document. The absent keyphrase labels are then appended after the present keyphrase labels. Thanks to Mr. Wang Chen for his help on data preprocessing.

For the training of our reinforced models, we use an additional token <peos> to mark the end of present keyphrases in the target files. The kp20k dataset with the <peos> token on the target files is located in the kp20k_separated directory.
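The label ordering described above can be sketched in a few lines. This is only an illustration, not the preprocessing script used for the released data: the function names are hypothetical, and it assumes that a keyphrase is "present" when its tokens appear as an exact contiguous match in the source (the actual pipeline may apply stemming or other normalization first).

```python
def find_first_occurrence(doc_tokens, phrase_tokens):
    """Return the index of the first exact token-level match, or -1 if none."""
    n = len(phrase_tokens)
    if n == 0:
        return -1
    for start in range(len(doc_tokens) - n + 1):
        if doc_tokens[start:start + n] == phrase_tokens:
            return start
    return -1


def order_keyphrases(source, keyphrases, add_peos=False):
    """Sort present keyphrases by first occurrence, then append absent ones.

    With add_peos=True, a <peos> entry is placed after the present keyphrases,
    as in the kp20k_separated target files.
    """
    doc_tokens = source.split()
    present, absent = [], []
    for kp in keyphrases:
        pos = find_first_occurrence(doc_tokens, kp.split())
        if pos >= 0:
            present.append((pos, kp))
        else:
            absent.append(kp)
    present.sort(key=lambda item: item[0])  # stable sort keeps ties in input order
    ordered = [kp for _, kp in present]
    if add_peos:
        ordered.append("<peos>")
    return ordered + absent


source = ("neural keyphrase generation <eos> we study keyphrase generation "
          "with reinforcement learning")
labels = ["reinforcement learning", "text mining", "keyphrase generation"]
print(";".join(order_keyphrases(source, labels, add_peos=True)))
```

Calling the same function with add_peos=False gives the ordering used in the kp20k_sorted style of target file.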

The cross_domain_sorted directory contains the test-only datasets (inspec, krapivin, nus, and semeval). For example, the source and target files of the nus dataset are cross_domain_sorted/word_nus_testing_context.txt and cross_domain_sorted/word_nus_testing_allkeywords.txt.

Formats

All the text should be tokenized and all the tokens should be separated by a space character.
All digits should be replaced by a <digit> token.
In source files, the title and the main body are separated by an <eos> token.
In target files, the keyphrases are separated by a ; character. There is no space before or after the semicolon, e.g., keyphrase one;keyphrase two. For the training of the reinforced model, <peos> is used to mark the end of present ground-truth keyphrases, e.g., present keyphrase one;present keyphrase two;<peos>;absent keyphrase one;absent keyphrase two.
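To make the file formats concrete, here is a minimal parsing sketch. The helper names parse_source and parse_target are hypothetical and are not functions from this repository; the sketch only assumes the conventions listed above (space-separated tokens, <eos> between title and body, ; between keyphrases, optional <peos>).

```python
def parse_source(line):
    """Split a source line into (title_tokens, body_tokens) at the first <eos>."""
    tokens = line.strip().split(" ")
    if "<eos>" in tokens:
        idx = tokens.index("<eos>")
        return tokens[:idx], tokens[idx + 1:]
    return [], tokens  # no <eos> marker: treat everything as the main body


def parse_target(line):
    """Split a target line into (before_peos, after_peos) keyphrase lists.

    If the line has no <peos> entry, all keyphrases are returned in the
    first list and the second list is empty.
    """
    phrases = [p for p in line.strip().split(";") if p]
    if "<peos>" in phrases:
        idx = phrases.index("<peos>")
        return phrases[:idx], phrases[idx + 1:]
    return phrases, []


title, body = parse_source("deep keyphrase generation <eos> we propose a model")
present, absent = parse_target(
    "present keyphrase one;present keyphrase two;<peos>;absent keyphrase one"
)
print(title, body)
print(present, absent)
```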

Training

Train a baseline model

Please download and unzip the datasets in the ./data directory.

Numericalize data.

The preprocess.py script numericalizes the three pairs of source-target files and produces the following files: train.one2one.pt, train.one2many.pt, valid.one2one.pt, valid.one2many.pt, test.one2one.pt, test.one2many.pt, vocab.pt. The *.one2one.pt files split a sample (source, {kp1, kp2, ...}) into multiple training samples (source, kp1), (source, kp2), ... The *.one2many.pt files do not split the samples.
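The difference between the two layouts can be illustrated with plain Python tuples. This is a sketch of the idea only: the real .pt files store numericalized data, and the function names here are hypothetical.

```python
def to_one2many(source, keyphrases):
    """Keep one sample per document: (source, [kp1, kp2, ...])."""
    return [(source, list(keyphrases))]


def to_one2one(source, keyphrases):
    """Emit one sample per keyphrase: (source, kp1), (source, kp2), ..."""
    return [(source, kp) for kp in keyphrases]


doc = "title <eos> body text"
kps = ["kp1", "kp2", "kp3"]
print(to_one2many(doc, kps))
print(to_one2one(doc, kps))
```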

Command: python3 preprocess.py -data_dir data/kp20k_sorted -remove_eos -include_peos

To use the TG-Net model, you need to copy the directory data/kp20k_sorted to data/kp20k_tg_sorted and run the following preprocessing script. python3 preprocess.py -data_dir data/kp20k_tg_sorted -remove_eos -include_peos -title_guided

Train a baseline model using maximum-likelihood loss

catSeq: python3 train.py -data data/kp20k_sorted/ -vocab data/kp20k_sorted/ -exp_path exp/%s.%s -exp kp20k -epochs 20 -copy_attention -train_ml -one2many -one2many_mode 1 -batch_size 12 -seed 9527
catSeqD: python3 train.py -data data/kp20k_sorted/ -vocab data/kp20k_sorted/ -exp_path exp/%s.%s -exp kp20k -epochs 20 -copy_attention -orthogonal_loss -lambda_orthogonal 0.03 -train_ml -one2many -one2many_mode 1 -use_target_encoder -batch_size 12 -seed 9527
catSeqCorr: python3 train.py -data data/kp20k_sorted/ -vocab data/kp20k_sorted/ -exp_path exp/%s.%s -exp kp20k -epochs 20 -copy_attention -coverage_attn -review_attn -train_ml -one2many -one2many_mode 1 -batch_size 12 -seed 9527
catSeqTG: python3 train.py -data data/kp20k_tg_sorted/ -vocab data/kp20k_tg_sorted/ -exp_path exp/%s.%s -exp kp20k -epochs 20 -copy_attention -title_guided -train_ml -one2many -one2many_mode 1 -batch_size 12 -batch_workers 3 -seed 9527

Train a reinforced model

Different from the baseline models, we use an additional token <peos> to mark the end of present keyphrases. See Section 3.2 of our paper.

Numericalize data.

Command: python3 preprocess.py -data_dir data/kp20k_separated -remove_eos -include_peos

To use the TG-Net model, you need to copy the directory data/kp20k_separated to data/kp20k_tg_separated and run the following preprocessing script. python3 preprocess.py -data_dir data/kp20k_tg_separated -remove_eos -include_peos -title_guided

Train ML

catSeq: python3 train.py -data data/kp20k_separated/ -vocab data/kp20k_separated/ -exp_path exp/%s.%s -exp kp20k -epochs 20 -copy_attention -train_ml -one2many -one2many_mode 1 -batch_size 12 -separate_present_absent -seed 9527
catSeqD: python3 train.py -data data/kp20k_separated/ -vocab data/kp20k_separated/ -exp_path exp/%s.%s -exp kp20k -epochs 20 -copy_attention -orthogonal_loss -lambda_orthogonal 0.03 -train_ml -one2many -one2many_mode 1 -use_target_encoder -batch_size 12 -separate_present_absent -seed 9527
catSeqCorr: python3 train.py -data data/kp20k_separated/ -vocab data/kp20k_separated/ -exp_path exp/%s.%s -exp kp20k -epochs 20 -copy_attention -coverage_attn -review_attn -train_ml -one2many -one2many_mode 1 -batch_size 12 -separate_present_absent -seed 9527
catSeqTG: python3 train.py -data data/kp20k_tg_separated/ -vocab data/kp20k_tg_separated/ -exp_path exp/%s.%s -exp kp20k -epochs 20 -copy_attention -title_guided -train_ml -one2many -one2many_mode 1 -batch_size 12 -batch_workers 3 -separate_present_absent -seed 9527

Train RL

catSeq-2RF1: python3 train.py -data data/kp20k_separated/ -vocab data/kp20k_separated/ -exp_path exp/%s.%s -exp kp20k -epochs 20 -copy_attention -train_rl -one2many -one2many_mode 1 -batch_size 32 -separate_present_absent -pretrained_model [path_to_ml_pretrained_model] -max_length 60 -baseline self -reward_type 7 -replace_unk -topk G -seed 9527
catSeqD-2RF1: python3 train.py -data data/kp20k_separated/ -vocab data/kp20k_separated/ -exp_path exp/%s.%s -exp kp20k -epochs 20 -copy_attention -use_target_encoder -train_rl -one2many -one2many_mode 1 -batch_size 32 -separate_present_absent -pretrained_model [path_to_ml_pretrained_model] -max_length 60 -baseline self -reward_type 7 -replace_unk -topk G -seed 9527
catSeqCorr-2RF1: python3 train.py -data data/kp20k_separated/ -vocab data/kp20k_separated/ -exp_path exp/%s.%s -exp kp20k -epochs 20 -copy_attention -coverage_attn -review_attn -train_rl -one2many -one2many_mode 1 -batch_size 32 -separate_present_absent -pretrained_model [path_to_ml_pretrained_model] -max_length 60 -baseline self -reward_type 7 -replace_unk -topk G -seed 9527
catSeqTG-2RF1: python3 train.py -data data/kp20k_tg_separated/ -vocab data/kp20k_tg_separated/ -exp_path exp/%s.%s -exp kp20k -epochs 20 -copy_attention -title_guided -train_rl -one2many -one2many_mode 1 -batch_size 32 -separate_present_absent -pretrained_model [path_to_ml_pretrained_model] -max_length 60 -baseline self -reward_type 7 -replace_unk -topk G -batch_workers 3 -seed 9527

Decode from a pretrained model

Following Yuan et al. 2018, we use greedy search to decode the keyphrases from a pre-trained model, but you can increase the beam size by specifying the -beam_size option.

catSeq on inspec dataset: python3 interactive_predict.py -vocab data/kp20k_sorted/ -src_file data/cross_domain_sorted/word_inspec_testing_context.txt -pred_path pred/%s.%s -copy_attention -one2many -one2many_mode 1 -model [path_to_model] -max_length 60 -remove_title_eos -n_best 1 -max_eos_per_output_seq 1 -beam_size 1 -batch_size 20 -replace_unk
catSeq-2RF1 on inspec dataset: python3 interactive_predict.py -vocab data/kp20k_separated/ -src_file data/cross_domain_separated/word_inspec_testing_context.txt -pred_path pred/%s.%s -copy_attention -one2many -one2many_mode 1 -model [path_to_model] -max_length 60 -remove_title_eos -n_best 1 -max_eos_per_output_seq 1 -beam_size 1 -batch_size 20 -replace_unk -separate_present_absent

For catSeqCorr and catSeqCorr-2RF1, you need to add the options -coverage_attn -review_attn. For catSeqD and catSeqD-2RF1, you need to add the option -use_target_encoder. For catSeqTG, you need to add the option -title_guided and change the vocab from kp20k_sorted to kp20k_tg_sorted (for catSeqTG-2RF1, from kp20k_separated to kp20k_tg_separated). For other datasets, you need to change the -src_file option to the path of the source file of the other test dataset, but you do not need to change the -vocab option.

Once decoding has finished, it creates a predictions.txt file in the path specified by -pred_path, e.g., pred/predict.kp20k.bi-directional.20180914-095220/predictions.txt. Each line of predictions.txt contains all the predicted keyphrases for one source.

Compute evaluation score on prediction files

Command for computing the evaluation scores of a prediction file from a baseline model.

python3 evaluate_prediction.py -pred_file_path [path_to_predictions.txt] -src_file_path [path_to_test_set_src_file] -trg_file_path [path_to_test_set_trg_file] -exp kp20k -export_filtered_pred -disable_extra_one_word_filter -invalidate_unk -all_ks 5 M -present_ks 5 M -absent_ks 5 M

Since the prediction files of reinforced models contain the special token <peos>, add the -prediction_separated option to the command above when evaluating a reinforced model.

Enriched evaluation set

Please download the enriched trg file on the kp20k testing set from here and extract it to ./data/kp20k_enriched. We use the token | to separate name variations, e.g., name variation 1 of keyphrase 1|name variation 2 of keyphrase 1;name variation 1 of keyphrase 2|name variation 2 of keyphrase 2.
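A line of the enriched target file can be read as a list of keyphrases, each with a list of name variations. The sketch below is illustrative: the function names are hypothetical, and the matching rule (a prediction counts as correct if it equals any variation of a keyphrase) is an assumption about how the variations are meant to be used, not code taken from evaluate_prediction.py.

```python
def parse_enriched_target(line):
    """Return a list of keyphrases, each given as a list of name variations."""
    return [
        [v for v in kp.split("|") if v]
        for kp in line.strip().split(";")
        if kp
    ]


def matches_any_variation(prediction, variations):
    """Assumed rule: a prediction is correct if it equals any name variation."""
    return prediction in variations


line = "svm|support vector machine;nlp|natural language processing"
targets = parse_enriched_target(line)
print(targets)
print(matches_any_variation("support vector machine", targets[0]))
```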

Command for evaluating a baseline model: python3 evaluate_prediction.py -pred_file_path [path_to_predictions.txt] -src_file_path [path_to_kp20k_test_set_src_file] -trg_file_path data/kp20k_enriched/test_trg.txt -exp kp20k -export_filtered_pred -disable_extra_one_word_filter -invalidate_unk -all_ks 5 M -present_ks 5 M -absent_ks 5 M -use_name_variations

Command for evaluating a reinforced model: python3 evaluate_prediction.py -pred_file_path [path_to_predictions.txt] -src_file_path [path_to_kp20k_test_set_src_file] -trg_file_path data/kp20k_enriched/test_trg.txt -exp kp20k -export_filtered_pred -disable_extra_one_word_filter -invalidate_unk -all_ks 5 M -present_ks 5 M -absent_ks 5 M -prediction_separated -use_name_variations

Test set output

The output files of our catSeqTG-2RF1 model are available here.

Options

This section describes some common options for the different Python scripts. Please read config.py for more details about the options.

The options for the training script:

-data []: path prefix of the "train.one2one.pt" and "train.one2many.pt" files produced by preprocess.py, e.g., -data data/kp20k_filtered/
-vocab []: path prefix of the "vocab.pt" file produced by preprocess.py, e.g., -vocab data/kp20k_filtered/
-exp []: name of the experiment for logging, e.g., kp20k
-exp_path []: path of the experiment log/plot, e.g., -exp_path exp/%s.%s; the two %s are filled with the value of -exp and a timestamp
-copy_attention: a flag for training a model with copy attention, we follow the copy attention in [See et al. 2017]
-coverage_attn: a flag for training a model with coverage attention layer, we follow the coverage attention in [See et al. 2017]
-coverage_loss: a flag for training a model with coverage loss in [See et al. 2017]
-lambda_coverage [1]: Coefficient of coverage loss, a coefficient to control the importance of coverage loss.
-review_attn: use the review attention in Chen et al. 2018a
-orthogonal_loss: a flag to include orthogonal loss
-lambda_orthogonal []: Lambda value for the orthogonal loss by Yuan et al.
-use_target_encoder: Use the target encoder by Yuan et al.
-lambda_target_encoder []: Lambda value for the target encoder loss by Yuan et al.
-train_ml: a flag for training a model using maximum likelihood in a supervised learning setting.
-one2many: a flag for training a model using one2many mode.
-one2many_mode [0]: 1 means concatenating the keyphrases with <sep>; 2 means following Chen et al. 2018a; 3 means resetting the hidden state whenever the decoder emits an <EOS> token.
-delimiter_type [0]: only effective in one2many mode. If delimiter_type = 0, SEP_WORD=<sep>; if delimiter_type = 1, SEP_WORD=<eos>.
-separate_present_absent: whether to separate present keyphrase predictions and absent keyphrase predictions by a <peos> token.
-goal_vector_mode [0]: only effective when using -separate_present_absent. 0: do not use a goal vector, 1: the goal vector acts as an input to the decoder, 2: the goal vector acts as an extra input to p_gen
-goal_vector_size [16]: size of the goal vector
-manger_mode [1]: only effective when using -separate_present_absent. 1: two trainable vectors as the goal vectors. May support different types of manager in the future.
-replace_unk: Replace the unk token with the token of highest attention score.
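The idea behind -replace_unk can be sketched without any deep learning library. The function below is a hypothetical illustration rather than the repository's implementation: it assumes one attention distribution over the source tokens per decoding step and an unknown-word symbol written as <unk>.

```python
def replace_unk(pred_tokens, attn_per_step, src_tokens, unk="<unk>"):
    """Replace each <unk> with the source token that got the highest attention.

    pred_tokens:   list of predicted tokens, one per decoding step
    attn_per_step: list of attention weight lists, one list per decoding step,
                   each with one weight per source token
    src_tokens:    list of source tokens
    """
    output = []
    for token, attn in zip(pred_tokens, attn_per_step):
        if token == unk:
            best = max(range(len(src_tokens)), key=lambda i: attn[i])
            output.append(src_tokens[best])
        else:
            output.append(token)
    return output


src = ["deep", "keyphrase", "model"]
pred = ["<unk>", "model"]
attn = [[0.1, 0.7, 0.2], [0.2, 0.2, 0.6]]
print(replace_unk(pred, attn, src))
```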

The options for evaluate_prediction.py:

-pred_file_path []: path of the file exported by predict.py
-src_file_path []: path of the source file in the dataset, e.g., data/kp20k_filtered/test_src.txt
-trg_file_path []: path of the target file in the dataset, e.g., data/kp20k_filtered/test_trg.txt
-exp_path []: path for experiment log, which includes all the evaluation results
-exp []: name of the experiment for logging, e.g., kp20k
-export_filtered_pred: a flag for exporting all the filtered keyphrases to a file
-filtered_pred_path []: path of the file that store the filtered keyphrases
-invalidate_unk: filter out all the unk words in predictions before computing the scores
-disable_extra_one_word_filter: if you do not specify this option, only the first one-word prediction is considered. Please use this option when using the kp20k testing set.
-num_preds []: only the first -num_preds keyphrases in each line of the prediction file are considered.
-replace_unk: replace the unk token with the token that received the highest attention score.
-prediction_separated: the predictions contain a special <peos> token. Use it for the evaluation of the reinforced model.
-use_name_variations: the target file contains a set of name variations for each keyphrase.
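The filtering options above can be pictured as a small pipeline over the predictions of one document. The sketch below is a guess at the behaviour described by the option texts, not the code of evaluate_prediction.py: the function name, the order of the steps, and the reading of -invalidate_unk as "drop any prediction that contains <unk>" are all assumptions.

```python
def filter_predictions(preds, num_preds=None, invalidate_unk=False,
                       extra_one_word_filter=True, unk="<unk>"):
    """Filter one document's predicted keyphrases (each a space-separated string)."""
    if num_preds is not None:
        preds = preds[:num_preds]  # -num_preds: keep only the first predictions
    kept = []
    seen_one_word = False
    for kp in preds:
        tokens = kp.split()
        if not tokens:
            continue  # skip empty predictions
        if invalidate_unk and unk in tokens:
            continue  # -invalidate_unk (assumed meaning)
        if extra_one_word_filter and len(tokens) == 1:
            if seen_one_word:
                continue  # keep only the first one-word prediction
            seen_one_word = True
        kept.append(kp)
    return kept


preds = ["graph", "neural network", "<unk> model", "tree", "deep learning"]
print(filter_predictions(preds, invalidate_unk=True))
print(filter_predictions(preds, invalidate_unk=True, extra_one_word_filter=False))
```

Passing extra_one_word_filter=False plays the role of -disable_extra_one_word_filter in this sketch.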

Some common options for rl training:

-train_rl: a flag for training a model using reward in a reinforcement learning setting.
-baseline []: specify the baseline for the policy gradient algorithm, choices=["none", "self"], "self" means we use self-critical as the baseline
-reward_type []: 0: f1, 1: recall, 2: ndcg, 3: accuracy, 4: alpha-ndcg, 5: alpha-dcg, 6: AP, 7: F1 (all duplicates are considered as incorrect)
-topk []: only pick the top-k predictions when computing the reward. M means using all the predictions to compute the reward. G specifies the RF1 reward: if the number of predictions is less than the number of ground-truth keyphrases, k is set to the number of ground-truth keyphrases; otherwise, k is set to the number of predictions. The option `-reward_type 7 -topk G` yields the RF1 reward in our paper.
-pretrained_model []: path of the MLE pre-trained model
-replace_unk: replace the unk token with the token that received the highest attention score.
-max_length []: max length of the output sequence
-num_predictions []: only effective when one2many_mode=2 or 3, control the number of predicted keyphrases.
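The combination -reward_type 7 -topk G can be sketched as follows. This is a simplified reading of the option descriptions above, not the reward code of this repository: the function name is hypothetical, keyphrases are compared as exact strings (any stemming or other normalization is left out), and a repeated prediction is counted as incorrect after its first occurrence.

```python
def rf1_reward(predictions, targets):
    """F1 with k = max(#predictions, #targets); duplicate predictions are incorrect."""
    target_set = set(targets)
    if not predictions or not target_set:
        return 0.0
    seen = set()
    num_correct = 0
    for kp in predictions:
        if kp in target_set and kp not in seen:
            num_correct += 1  # only the first copy of a correct keyphrase counts
        seen.add(kp)
    # -topk G: k is the number of targets when there are too few predictions,
    # otherwise the number of predictions.
    k = max(len(predictions), len(target_set))
    precision = num_correct / k
    recall = num_correct / len(target_set)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Too few predictions: precision equals recall, so the reward equals recall.
print(rf1_reward(["a", "b"], ["a", "b", "c", "d"]))
# Enough predictions, with one duplicate and two wrong keyphrases.
print(rf1_reward(["a", "a", "x", "b", "y"], ["a", "b"]))
```

In the first call the model under-generates, so padding k up to the number of ground-truth keyphrases penalizes the missing predictions; in the second call the reward is the ordinary F1 over all five predictions.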

We can also add a Gaussian noise vector to perturb the hidden state of the GRU after generating each keyphrase to encourage exploration. I tried it, but the performance is not good. The following are the options for the perturbation.

-init_perturb_std [0]: initial std of gaussian noise vector
-final_perturb_std [0]: terminal std of gaussian noise vector, only effective when perturb_decay=1.
-perturb_decay [0]: mode of decay for the std of gaussian noise, 0 means no decay, 1 means exponential decay, 2 means stepwise decay.
-perturb_decay_factor [0]: factor for the std decay, the effect depends on the value of -perturb_decay.
-perturb_baseline: a flag for perturbing the hidden state of the baseline in policy gradient training.

We can also regularize the reward using the following two options. The baseline reward is not affected by the regularization. I tried it, but the performance is not good.

-regularization_type []: 0 means no regularization, 1 means using the percentage of unique keyphrases as regularization, 2 means using the entropy of the policy as regularization
-regularization_factor []: factor of regularization, regularized reward = (1-regularization_factor)*reward + regularization_factor*regularization
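As a worked example of the formula above, the sketch below combines a reward with the "percentage of unique keyphrases" regularizer (regularization_type 1). The function names are hypothetical and the numbers are made up for illustration; only the interpolation formula comes from the option description.

```python
def unique_ratio(predictions):
    """Fraction of predicted keyphrases that are unique (regularization_type 1)."""
    if not predictions:
        return 0.0
    return len(set(predictions)) / len(predictions)


def regularized_reward(reward, regularization, regularization_factor):
    """(1 - factor) * reward + factor * regularization, as in the option text."""
    return ((1 - regularization_factor) * reward
            + regularization_factor * regularization)


sampled = ["deep learning", "deep learning", "keyphrase generation", "rl"]
reg = unique_ratio(sampled)  # 3 unique keyphrases out of 4 predictions
print(reg)
print(regularized_reward(reward=0.5, regularization=reg, regularization_factor=0.2))
```

Since the baseline reward is not regularized, only the reward of the sampled sequence would go through regularized_reward before the baseline is subtracted.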

References

Abigail See, Peter J. Liu, Christopher D. Manning: Get To The Point: Summarization with Pointer-Generator Networks. ACL (1) 2017: 1073-1083

Rui Meng, Sanqiang Zhao, Shuguang Han, Daqing He, Peter Brusilovsky, Yu Chi: Deep Keyphrase Generation. ACL (1) 2017: 582-592

Hai Ye, Lu Wang: Semi-Supervised Learning for Neural Keyphrase Generation. EMNLP 2018: 4142-4153

Jun Chen, Xiaoming Zhang, Yu Wu, Zhao Yan, Zhoujun Li: Keyphrase Generation with Correlation Constraints. EMNLP 2018a: 4057-4066

Wang Chen, Yifan Gao, Jiani Zhang, Irwin King, Michael R. Lyu: Title-Guided Encoding for Keyphrase Generation. CoRR abs/1808.08575 (2018b)

Xingdi Yuan, Tong Wang, Rui Meng, Khushboo Thaker, Daqing He, Adam Trischler: Generating Diverse Numbers of Diverse Keyphrases. CoRR abs/1810.05241 (2018)


keyphrase-generation-rl's Issues

RuntimeError: expected a non-empty list of Tensors

seq_lens here is the length of delimiter_target_encoder_states_2dlist. This causes an error when every target in a batch is a single sentence (no <sep>).
In that case seq_lens is all zeros and tensor_2d_list is empty, so this line raises RuntimeError: expected a non-empty list of Tensors

Running predict.py raises RuntimeError: expected scalar type Long but found Float

Which model to use for decoding/evaluation?

Once training is finished, I see a lot of model files but I am not sure which one to use as the final one. For example, if I have the following two models, which one should I use?

openkp.ml.one2many.cat.copy.bi-directional.epoch=2.batch=3569.total_batch=12000.model
openkp.ml.one2many.cat.copy.bi-directional.epoch=2.batch=7569.total_batch=16000.model

Also, I am curious what the batch and total_batch values mean here. I assume we need to use the model that results from the latest epoch, so I would use a model with epoch=2 rather than epoch=1.
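For comparing such files programmatically, the counters can be pulled out of the file names with a regular expression. This sketch does not say which checkpoint performs best; it only parses the names shown above and picks the one with the largest total_batch. Reading batch as the position within the current epoch and total_batch as the cumulative count is an assumption, although the training log quoted in the cublas issue on this page (20900 batches per epoch, then "Epoch 2; batch: 3100; total batch: 24000") is consistent with it.

```python
import re

# Pattern for names such as "...epoch=2.batch=3569.total_batch=12000.model"
CKPT_PATTERN = re.compile(r"epoch=(\d+)\.batch=(\d+)\.total_batch=(\d+)\.model$")


def parse_checkpoint_name(name):
    """Extract the epoch, batch and total_batch counters from a checkpoint name."""
    match = CKPT_PATTERN.search(name)
    if match is None:
        raise ValueError("unrecognized checkpoint name: " + name)
    epoch, batch, total_batch = (int(group) for group in match.groups())
    return {"epoch": epoch, "batch": batch, "total_batch": total_batch}


def latest_checkpoint(names):
    """Return the name with the largest total_batch (the most recently saved)."""
    return max(names, key=lambda n: parse_checkpoint_name(n)["total_batch"])


names = [
    "openkp.ml.one2many.cat.copy.bi-directional.epoch=2.batch=3569.total_batch=12000.model",
    "openkp.ml.one2many.cat.copy.bi-directional.epoch=2.batch=7569.total_batch=16000.model",
]
print(parse_checkpoint_name(names[0]))
print(latest_checkpoint(names))
```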

Why do I predict that it's all pad

About integrated_data_preprocess.py

Would you please release the script to use integrated_data_preprocess.py?

The script for training catSeqTG-2RF1 freezes with no CPU or GPU utilization

Every time I try to train the catSeqTG-2RF1 model from catSeqTG, the script suddenly stops at a random point. The last time, I waited for a day before force-stopping the script. While frozen, the script holds on to its main memory and GPU memory, but there is no processing or disk activity. I am using PyTorch 0.4 on Windows.
Is it due to some deadlock? I don't understand.
The following is my output, up to the point where I pressed ctrl+break to get out of the frozen state.
OUTPUT:
INFO:root:====================== Model Parameters =========================
12/06/2020 13:50:17 [INFO] train: ====================== Model Parameters =========================
INFO:root:Training a seq2seq model with copy mechanism
12/06/2020 13:50:17 [INFO] train: Training a seq2seq model with copy mechanism
C:\Users\Arpan\anaconda3\envs\py36\lib\site-packages\torch\nn\modules\rnn.py:60: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.1 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Epoch 1; batch: 0; total batch: 0
INFO:root:Time for training: 863.4
^C

about the semeval dataset

Hi, kenchan, I notice that the semeval test file is different from the others (e.g., it is already stemmed). How can I get the unstemmed version?

about training rl model

I have trained the catSeq model and its performance matches what you reported. When I use
python3 train.py -data data/kp20k/kp20k_separated/rl/ -vocab data/kp20k/kp20k_separated/rl/ -exp_path=exp -exp catSeq_rl_kp20k -epochs 20 -model_path=model/catSeq_rl_9527 -copy_attention -train_rl -one2many -one2many_mode 1 -batch_size 32 -separate_present_absent -pretrained_model=model/catSeq_9527/catSeq_kp20k.ml.one2many.cat.copy.bi-directional.epoch=3.batch=38098.total_batch=120000.model -max_length 60 -baseline self -reward_type 7 -replace_unk -topk G -seed=9527
to train an RL model, the loss is wrong from the beginning: it is always -0.000x. What do you think might be the problem?

load data

Hello

I would like to ask about the possibility of training using only the absent KPs.
Can you please let me know how I can change the function load_data_and_vocab(opt, load_train=True), in the "one to many" case, to load only absent KPs?
Thanks

Why is the result of catSeqTG much lower than TG-net?

The authors of TG-net report their results as follows:

But catSeqTG's results are much lower than what they report:

There are so many duplicates in my prediction

I trained a catSeq model and used it to generate keyphrases, but the results contain many duplicates. I don't know why. Can you give me some suggestions? The results are shown below.

The folder 'data/cross_domain_separated/' cannot be found

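In the "Decode from a pretrained model" section, when I try to evaluate catSeq-2RF1 on the inspec dataset, I get a FileNotFoundError. I checked the README carefully but still could not figure out where this folder comes from. Is this folder missing from the provided datasets?

Error message:
FileNotFoundError: [Errno 2] No such file or directory: 'data/cross_domain_separated/word_inspec_testing_context.txt'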

how to generate present keyphrase only?

I want the model to generate present keyphrases only, without generating absent keyphrases.
Is there a config option for this, or how can I change the code?

Inquiry about experiment results

Hi,
As mentioned in your paper, you used macro-averaged scores, and the reported result for present keyphrase prediction of the catSeqD model on the kp20k dataset is 0.285 (F1@5).

When I ran the catSeqD model with your code, I got a similar macro-averaged F1@5 score of 0.286, and a micro-averaged score of 0.270.

But according to the results reported in the paper One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases https://arxiv.org/abs/1810.05241
it seems that they used the micro-averaged score.

They report a score of 0.348 for the catSeqD model on F1@5.

I am confused about the different results of the same model on the same dataset.
Is there anything wrong with this comparison?
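For readers comparing the two averaging schemes mentioned here, the sketch below shows the standard definitions on made-up counts: macro-averaging computes F1 per document and then averages, while micro-averaging pools the counts over all documents before computing a single F1. The function names and the numbers are illustrative and are not taken from either paper's evaluation code.

```python
def f1_from_counts(num_correct, num_pred, num_target):
    """F1 for one set of counts; returns 0.0 when nothing is correct."""
    if num_correct == 0:
        return 0.0
    precision = num_correct / num_pred
    recall = num_correct / num_target
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_doc_counts):
    """Average of the per-document F1 scores."""
    scores = [f1_from_counts(*counts) for counts in per_doc_counts]
    return sum(scores) / len(scores)


def micro_f1(per_doc_counts):
    """F1 computed from counts summed over all documents."""
    total_correct = sum(c[0] for c in per_doc_counts)
    total_pred = sum(c[1] for c in per_doc_counts)
    total_target = sum(c[2] for c in per_doc_counts)
    return f1_from_counts(total_correct, total_pred, total_target)


# (num_correct, num_predictions, num_targets) for two hypothetical documents
counts = [(1, 5, 1), (2, 5, 10)]
print(macro_f1(counts))
print(micro_f1(counts))
```

The two schemes give different numbers on the same predictions, so scores are only comparable when the averaging (and the rest of the evaluation protocol) is the same.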

RuntimeError: cublas runtime error

Hi kenchan,

I tried to run the ML training code on my own dataset, whose vocab size is 13911, as shown below. In the command for ML training, I specified vocab_size=28000. However, after training on some batches, it crashes and reports RuntimeError: cublas runtime error. Could you help me out? Thanks in advance. The traceback information is:

01/18/2021 21:22:01 [INFO] train: device    :    cuda:0
INFO:root:Loading vocab from disk: /users/tr.xiaow/kpRL/keyphrase-generation-rl-master/data/case73/
01/18/2021 21:22:01 [INFO] data_loader: Loading vocab from disk: /users/tr.xiaow/kpRL/keyphrase-generation-rl-master/data/case73/
INFO:root:#(vocab)=13911
01/18/2021 21:22:01 [INFO] data_loader: #(vocab)=13911
INFO:root:#(vocab used)=28000
01/18/2021 21:22:01 [INFO] data_loader: #(vocab used)=28000
INFO:root:Loading train and validate data from '/users/tr.xiaow/kpRL/keyphrase-generation-rl-master/data/case73/'
01/18/2021 21:22:01 [INFO] data_loader: Loading train and validate data from '/users/tr.xiaow/kpRL/keyphrase-generation-rl-master/data/case73/'
INFO:root:#(train data size: #(batch)=20900
01/18/2021 21:22:03 [INFO] data_loader: #(train data size: #(batch)=20900
INFO:root:#(valid data size: #(batch)=2359
01/18/2021 21:22:03 [INFO] data_loader: #(valid data size: #(batch)=2359
INFO:root:Time for loading the data: 1.9
01/18/2021 21:22:03 [INFO] train: Time for loading the data: 1.9
INFO:root:======================  Model Parameters  =========================
01/18/2021 21:22:03 [INFO] train: ======================  Model Parameters  =========================
INFO:root:Training a seq2seq model with copy mechanism
01/18/2021 21:22:03 [INFO] train: Training a seq2seq model with copy mechanism
/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/site-packages/torch/nn/modules/rnn.py:46: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.1 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
INFO:root:======================  Start Training  =========================
01/18/2021 21:22:07 [INFO] train_ml: ======================  Start Training  =========================
Epoch 1; batch: 0; total batch: 0
Epoch 1; batch: 4000; total batch: 4000
Epoch 1; batch: 8000; total batch: 8000
Epoch 1; batch: 12000; total batch: 12000
Epoch 1; batch: 16000; total batch: 16000
Epoch 1; batch: 20000; total batch: 20000
Traceback (most recent call last):
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
    finalizer()
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 486, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 444, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 442, in _rmtree_safe_fd
    os.unlink(name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs00000000173a0ed90000030e'
Traceback (most recent call last):
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
    finalizer()
Traceback (most recent call last):
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 486, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 444, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 442, in _rmtree_safe_fd
    os.unlink(name, dir_fd=topfd)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
    finalizer()
Traceback (most recent call last):
OSError: [Errno 16] Device or resource busy: '.nfs00000000173a0ed70000030c'
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 486, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 444, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
    finalizer()
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 442, in _rmtree_safe_fd
    os.unlink(name, dir_fd=topfd)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 486, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 444, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
OSError: [Errno 16] Device or resource busy: '.nfs00000000173a0eda0000030b'
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 442, in _rmtree_safe_fd
    os.unlink(name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs00000000173a0ed80000030d'
Epoch 2; batch: 3100; total batch: 24000
Traceback (most recent call last):
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
    finalizer()
Traceback (most recent call last):
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 486, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 444, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 442, in _rmtree_safe_fd
    os.unlink(name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs00000000173a0edc00000310'
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
    finalizer()
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 486, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 444, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 442, in _rmtree_safe_fd
    os.unlink(name, dir_fd=topfd)
Traceback (most recent call last):
OSError: [Errno 16] Device or resource busy: '.nfs00000000173a0edd00000312'
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
    finalizer()
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 486, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 444, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 442, in _rmtree_safe_fd
    os.unlink(name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs00000000173a0edb0000030f'
Traceback (most recent call last):
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
    finalizer()
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 486, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 444, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 442, in _rmtree_safe_fd
    os.unlink(name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs00000000173a0ede00000311'
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:151: void THCudaTensor_scatterAddKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [101,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:151: void THCudaTensor_scatterAddKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [71,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:151: void THCudaTensor_scatterAddKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [81,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:151: void THCudaTensor_scatterAddKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [91,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:151: void THCudaTensor_scatterAddKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [51,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:151: void THCudaTensor_scatterAddKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [61,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
Traceback (most recent call last):
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
    finalizer()
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 486, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 444, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 442, in _rmtree_safe_fd
    os.unlink(name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs00000000173a0ed700000313'
Traceback (most recent call last):
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
    finalizer()
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 486, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 444, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
Traceback (most recent call last):
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 442, in _rmtree_safe_fd
    os.unlink(name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs00000000173a0ed800000314'
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
    finalizer()
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 486, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 444, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 442, in _rmtree_safe_fd
    os.unlink(name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs00000000173a0ed900000315'
Traceback (most recent call last):
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
    finalizer()
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 486, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 444, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/shutil.py", line 442, in _rmtree_safe_fd
    os.unlink(name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs00000000173a0eda00000316'
ERROR:root:message
Traceback (most recent call last):
  File "train.py", line 164, in main
    train_ml.train_model(model, optimizer_ml, optimizer_rl, criterion, train_data_loader, valid_data_loader, opt)
  File "/users/tr.xiaow/kpRL/keyphrase-generation-rl-master/train_ml.py", line 91, in train_model
    valid_loss_stat = evaluate_loss(valid_data_loader, model, opt)
  File "/users/tr.xiaow/kpRL/keyphrase-generation-rl-master/evaluate.py", line 57, in evaluate_loss
    decoder_dist, h_t, attention_dist, encoder_final_state, coverage, _, _, _ = model(src, src_lens, trg, src_oov, max_num_oov, src_mask, title=title, title_lens=title_lens, title_mask=title_mask)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/users/tr.xiaow/kpRL/keyphrase-generation-rl-master/pykp/model.py", line 392, in forward
    self.decoder(y_t, h_t, memory_bank, src_mask, max_num_oov, src_oov, coverage, decoder_memory_bank, h_te_t, g_t)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/users/tr.xiaow/kpRL/keyphrase-generation-rl-master/pykp/rnn_decoder.py", line 121, in forward
    context, attn_dist, coverage = self.attention_layer(last_layer_h_next, memory_bank, src_mask, coverage)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/users/tr.xiaow/kpRL/keyphrase-generation-rl-master/pykp/attention.py", line 132, in forward
    scores = self.score(memory_bank, decoder_state, coverage)  # [batch_size, max_input_seq_len]
  File "/users/tr.xiaow/kpRL/keyphrase-generation-rl-master/pykp/attention.py", line 63, in score
    encoder_feature = self.memory_project(memory_bank_)  # [batch_size*max_input_seq_len, decoder size]
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 67, in forward
    return F.linear(input, self.weight, self.bias)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/site-packages/torch/nn/functional.py", line 1354, in linear
    output = input.matmul(weight.t())
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCBlas.cu:258
01/18/2021 21:29:43 [ERROR] train: message
Traceback (most recent call last):
  File "train.py", line 164, in main
    train_ml.train_model(model, optimizer_ml, optimizer_rl, criterion, train_data_loader, valid_data_loader, opt)
  File "/users/tr.xiaow/kpRL/keyphrase-generation-rl-master/train_ml.py", line 91, in train_model
    valid_loss_stat = evaluate_loss(valid_data_loader, model, opt)
  File "/users/tr.xiaow/kpRL/keyphrase-generation-rl-master/evaluate.py", line 57, in evaluate_loss
    decoder_dist, h_t, attention_dist, encoder_final_state, coverage, _, _, _ = model(src, src_lens, trg, src_oov, max_num_oov, src_mask, title=title, title_lens=title_lens, title_mask=title_mask)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/users/tr.xiaow/kpRL/keyphrase-generation-rl-master/pykp/model.py", line 392, in forward
    self.decoder(y_t, h_t, memory_bank, src_mask, max_num_oov, src_oov, coverage, decoder_memory_bank, h_te_t, g_t)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/users/tr.xiaow/kpRL/keyphrase-generation-rl-master/pykp/rnn_decoder.py", line 121, in forward
    context, attn_dist, coverage = self.attention_layer(last_layer_h_next, memory_bank, src_mask, coverage)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/users/tr.xiaow/kpRL/keyphrase-generation-rl-master/pykp/attention.py", line 132, in forward
    scores = self.score(memory_bank, decoder_state, coverage)  # [batch_size, max_input_seq_len]
  File "/users/tr.xiaow/kpRL/keyphrase-generation-rl-master/pykp/attention.py", line 63, in score
    encoder_feature = self.memory_project(memory_bank_)  # [batch_size*max_input_seq_len, decoder size]
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 67, in forward
    return F.linear(input, self.weight, self.bias)
  File "/users/tr.xiaow/anaconda3/envs/kpRL/lib/python3.6/site-packages/torch/nn/functional.py", line 1354, in linear
    output = input.matmul(weight.t())
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCBlas.cu:258

Xiao

ndcg_array = dcg_array / dcg_max_array

Hello,
I would like to know whether there is an explanation for the warnings that appear after running the command that computes the evaluation scores on the prediction files.
I got the following warnings:
RuntimeWarning: invalid value encountered in true_divide
ndcg_array = dcg_array / dcg_max_array
RuntimeWarning: invalid value encountered in true_divide
alpha_ndcg_array = alpha_dcg_array / alpha_dcg_max_array

Thanks
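
A possible explanation, offered as a guess rather than a confirmed diagnosis: NumPy emits "invalid value encountered in true_divide" when an element-wise division hits 0/0 and produces nan. That would happen here if dcg_max_array (or alpha_dcg_max_array) contains a zero, for instance for an example whose ideal ranking has no gain at all. The sketch below reproduces the arithmetic in plain Python. The helpers dcg and ndcg are hypothetical and are not the repository's functions, and returning 0.0 for the undefined case is an assumption about what the score should be.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG divided by the ideal DCG; 0.0 when the ideal DCG is zero (assumption)."""
    ideal = dcg(sorted(relevances, reverse=True))
    if ideal == 0.0:
        # No gain at any rank: the ratio would be 0/0, which is undefined.
        return 0.0
    return dcg(relevances) / ideal


print(ndcg([0, 1, 1]))  # some value strictly between 0 and 1
print(ndcg([0, 0, 0]))  # guarded case, no division is attempted
```

If this is the cause, the warning is harmless as long as the resulting nan entries are handled before the scores are averaged.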

Cross Domain Data Target Data

Hi,

I am currently reproducing this experiment, and I can now generate results for the kp20k dataset.
For the cross-domain test sets, do you also have the ground-truth targets? Or am I misunderstanding how they should be evaluated?
For the kp20k dataset, I use valid_src.txt as the input file for interactive_predict.py. I then use its output as the prediction file, valid_src.txt as the test source file, and valid_trg.txt as the target file.
For the cross-domain test sets, however, the provided files contain no valid_trg.txt.

Confusion regarding the function, `check_present_keyphrases`

I was looking at the function check_present_keyphrases. I cannot see the difference between match_by_word and match_by_string, given that both the source and the keyphrases are stemmed.

As I understand it, both matching conditions check whether a keyphrase appears as a contiguous span in the source text. Please correct me if I am wrong.
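
One way the two modes could differ, sketched under the assumption that word matching compares whole tokens while string matching tests a substring of the space-joined text. The two functions below are hypothetical stand-ins written for illustration, not the repository's implementation. Under that assumption, string matching can succeed across a partial word boundary where word matching fails.

```python
def match_by_word(src_tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run of whole tokens."""
    n = len(phrase_tokens)
    return any(src_tokens[i:i + n] == phrase_tokens
               for i in range(len(src_tokens) - n + 1))


def match_by_string(src_tokens, phrase_tokens):
    """True if the space-joined phrase is a substring of the space-joined source."""
    return " ".join(phrase_tokens) in " ".join(src_tokens)


src = ["smart", "grid", "control"]

# A genuine contiguous span: both modes agree.
print(match_by_word(src, ["grid", "control"]), match_by_string(src, ["grid", "control"]))  # True True

# "art grid" is a substring of "smart grid control" but not a token span.
print(match_by_word(src, ["art", "grid"]), match_by_string(src, ["art", "grid"]))  # False True
```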

RuntimeError: rnn: hx is not contiguous

The default value of enc_layers and dec_layers is 1. I get the following error when I set them to 2 or larger.

File "/home/xxl/project/keyphrase-generation-rl/pykp/model.py", line 392, in forward
    self.decoder(y_t, h_t, memory_bank, src_mask, max_num_oov, src_oov, coverage, decoder_memory_bank, h_te_t, g_t)
  File "/home/xx/model_serving_python/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xx/project/keyphrase-generation-rl/pykp/rnn_decoder.py", line 111, in forward
    _, h_next = self.rnn(rnn_input, h)
  File "/home/xx/model_serving_python/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xx/model_serving_python/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 211, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: rnn: hx is not contiguous

How to use the adaptive reward in RL training?

Hi, kenchan, thanks for your code! I am confused about how to use the adaptive reward in RL training as described in your paper.
I find that using -reward_type 7 is equivalent to using F1 all the time and is not adaptive. Am I misunderstanding something?
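
For reference, here is a minimal sketch of what "adaptive" means as I read the paper's description: the reward is recall while the model predicts fewer keyphrases than the ground truth contains, and F1 once it predicts enough. The function rf1_reward is a hypothetical helper, not code from this repository, and it says nothing about which -reward_type value selects this behaviour. Comparing deduplicated sets of phrases is a simplifying assumption.

```python
def rf1_reward(predictions, targets):
    """Recall if too few phrases are predicted, otherwise F1 (illustrative only)."""
    pred_set = set(predictions)  # assumption: duplicates are collapsed
    target_set = set(targets)
    if not pred_set or not target_set:
        return 0.0

    num_correct = len(pred_set & target_set)
    recall = num_correct / len(target_set)

    # Insufficient predictions: reward recall to encourage generating more.
    if len(pred_set) < len(target_set):
        return recall

    # Enough predictions: reward F1 to penalise inaccurate extras.
    precision = num_correct / len(pred_set)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(rf1_reward(["a"], ["a", "b"]))                 # recall branch
print(rf1_reward(["a", "b", "x", "y"], ["a", "b"]))  # F1 branch
```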

RuntimeError: Expected object of scalar type Byte but got scalar type Bool for sequence element

When I trained a reinforcement learning model, I ran into the error in the title. I don't know which variable is wrong. Can you give me some suggestions?

Issue about RL training

Hi,
I have completed ML training on my own dataset using the catSeq and catSeqD models. When I start RL training from the trained ML model, it reports the following error. Could you help me out? What might be causing this problem? Thanks a lot.

04/13/2020 23:09:17 [ERROR] train: message
Traceback (most recent call last):
  File "train.py", line 166, in main
    train_rl.train_model(model, optimizer_ml, optimizer_rl, criterion, train_data_loader, valid_data_loader, opt)
  File "/keyphrase-generation-rl-master/train_rl.py", line 69, in train_model
    batch_reward_stat, log_selected_token_dist = train_one_batch(batch, generator, optimizer_rl, opt, perturb_std)
  File "/keyphrase-generation-rl-master/train_rl.py", line 200, in train_one_batch
    src_str_list, opt.separate_present_absent, pykp.io.PEOS_WORD)
  File "/keyphrase-generation-rl-master/pykp/reward.py", line 13, in sample_list_to_str_2dlist
    word_list = prediction_to_sentence(sample['prediction'], idx2word, vocab_size, oov, eos_idx, unk_idx, replace_unk, src_word_list, sample['attention'])
  File "/keyphrase-generation-rl-master/utils/string_helper.py", line 28, in prediction_to_sentence
    word = idx2word[_pred]
KeyError: 40284
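
One possible cause, stated as an assumption because it has not been checked against the code: if the vocabulary size used at RL time is larger than the vocabulary actually built from the custom dataset, an index such as 40284 is treated as in-vocabulary but has no entry in idx2word. The sketch below illustrates that failure mode and a quick consistency check. Both function names are hypothetical and only mirror the arguments visible in the traceback.

```python
def indices_to_words(pred_indices, idx2word, vocab_size, oov_words):
    """Map predicted ids to words; ids >= vocab_size index the per-example OOV list."""
    words = []
    for idx in pred_indices:
        if idx < vocab_size:
            # Raises KeyError if vocab_size claims more ids than idx2word holds.
            words.append(idx2word[idx])
        else:
            words.append(oov_words[idx - vocab_size])
    return words


def missing_vocab_ids(idx2word, vocab_size):
    """Return the ids below vocab_size that have no entry in idx2word."""
    return [i for i in range(vocab_size) if i not in idx2word]


idx2word = {0: "<pad>", 1: "neural", 2: "network"}

# Consistent setting: id 3 falls through to the OOV list.
print(indices_to_words([1, 2, 3], idx2word, 3, ["keyphrase"]))

# Inconsistent setting: vocab_size = 5 leaves ids 3 and 4 unmapped.
print(missing_vocab_ids(idx2word, 5))
```

If the check reports missing ids on your data, comparing the vocabulary size option with the size of the saved vocabulary would be a reasonable first step.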

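Why are the results of catSeq and catSeqTG so close, with or without 2RF_1?

In the TG-Net paper, TG-Net scores clearly higher than CopyRNN, and the ablation study reports a gain of more than one point. Is TG-Net perhaps unsuited to an architecture like catSeq?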

Trained models

Hi Ken Chan,

Thanks for the awesome repository. I know that it has been a long time since the paper was out, but do you have any of the trained models by any chance?

-- Krishna

About number of generated keyphrases

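I have some questions about the experimental results in your paper on the number of keyphrases the models generate.

For the oracle row, is the Avg #present of 2.837 computed over the whole dataset, or over the training, validation, or test set? If it is the whole dataset, including the training set seems unreasonable, since the model would be trained on the training set and then generate results on that same set.
Also, are the results of the models in the table based on the validation set or the test set? This includes both the MAE and the average numbers. And are duplicate keyphrases removed when computing the statistics, or not?
In addition, an ACL 2020 paper, "Exclusive Hierarchical Decoding for Deep Keyphrase Generation", reports an oracle Avg #present of 3.32, but it also does not say whether this is computed over the whole dataset or only over the test or validation set. I computed the statistics myself, and Avg #present is about 3.4 on both the test and validation sets, which is close to the figure in the ACL 2020 paper.

I hope to get your advice.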

pretrained_model

Hello, I get the following error when I run the code: No such file or directory: '[path_to_ml_pretrained_model]'. I don't have a pretrained model. Should I train this model myself, or can it be downloaded from somewhere? Thank you very much!

Using Pre-trained embedding

Hi, Kenchan,
I used the following command to train the model with pre-trained embeddings. Could you check whether it is correct?
I generated the GloVe pre-trained embeddings and then trained a catSeq ML model with this command:

python3 train.py -data data/kp20k/ -vocab data/kp20k/ -pre_word_vecs_enc "data/kp20k/embeddings.enc.pt" -pre_word_vecs_dec "data/kp20k/embeddings.dec.pt" 
-exp_path exp/%s.%s -exp kp20k -epochs 20 -copy_attention -train_ml -one2many -one2many_mode 1 -batch_size 12 -separate_present_absent -seed 9527

Question 1: Since we have specified -pre_word_vecs_enc and -pre_word_vecs_dec, why do we still need to specify the dataset path with -data?

Question 2: I am not sure whether RL training based on the ML pre-trained model also needs the options -pre_word_vecs_enc and -pre_word_vecs_dec.

Thanks in advance.
Xiao