nlpyang / BertSum
Code for the paper "Fine-tune BERT for Extractive Summarization"
License: Apache License 2.0
Hi,
The program is running, but the results directory doesn't contain any results. Is something wrong?
My command is:
python train.py -mode validate -bert_data_path /home/test/WangHN/BertSum-master/bert_data/cnndm -model_path /home/test/WangHN/BertSum-master/models/bert_classifier -visible_gpus 0 -gpu_ranks 0 -batch_size 30000 -log_file /home/test/WangHN/BertSum-master/logs/Evaluation/bert_classifier -result_path /home/test/WangHN/BertSum-master/results/classifier/cnndm -test_all -block_trigram true
The information is:
[2019-05-08 20:19:55,407 INFO] Loading checkpoint from /home/test/WangHN/BertSum-master/models/bert_classifier/model_step_3000.pt
Namespace(accum_count=1, batch_size=30000, bert_config_path='../bert_config_uncased_base.json', bert_data_path='/home/test/WangHN/BertSum-master/bert_data/cnndm', beta1=0.9, beta2=0.999, block_trigram=True, dataset='', decay_method='', dropout=0.1, encoder='classifier', ff_size=512, gpu_ranks=[0], heads=4, hidden_size=128, inter_layers=2, log_file='/home/test/WangHN/BertSum-master/logs/Evaluation/bert_classifier', lr=1, max_grad_norm=0, mode='validate', model_path='/home/test/WangHN/BertSum-master/models/bert_classifier', optim='adam', param_init=0, param_init_glorot=True, recall_eval=False, report_every=1, report_rouge=True, result_path='/home/test/WangHN/BertSum-master/results/classifier/cnndm', rnn_size=512, save_checkpoint_steps=5, seed=666, temp_dir='../temp', test_all=True, test_from='', train_from='', train_steps=1000, use_interval=True, visible_gpus='0', warmup_steps=8000, world_size=1)
[2019-05-08 20:20:04,796 INFO] Loading valid dataset from /home/test/WangHN/BertSum-master/bert_data/cnndm.valid.0.bert.pt, number of examples: 2001
gpu_rank 0
[2019-05-08 20:20:04,799 INFO] * number of parameters: 109483009
[2019-05-08 20:20:42,455 INFO] Loading valid dataset from /home/test/WangHN/BertSum-master/bert_data/cnndm.valid.1.bert.pt, number of examples: 2001
[2019-05-08 20:21:21,366 INFO] Loading valid dataset from /home/test/WangHN/BertSum-master/bert_data/cnndm.valid.2.bert.pt, number of examples: 2001
[2019-05-08 20:22:00,217 INFO] Loading valid dataset from /home/test/WangHN/BertSum-master/bert_data/cnndm.valid.3.bert.pt, number of examples: 2001
[2019-05-08 20:22:39,093 INFO] Loading valid dataset from /home/test/WangHN/BertSum-master/bert_data/cnndm.valid.4.bert.pt, number of examples: 2001
[2019-05-08 20:23:17,807 INFO] Loading valid dataset from /home/test/WangHN/BertSum-master/bert_data/cnndm.valid.5.bert.pt, number of examples: 2000
[2019-05-08 20:23:56,463 INFO] Loading valid dataset from /home/test/WangHN/BertSum-master/bert_data/cnndm.valid.6.bert.pt, number of examples: 1362
[2019-05-08 20:24:22,847 INFO] Validation xent: 0.125946 at step 3000
Should I wait for this command to finish running?
Hi,
I have some problems in preprocessing.
I downloaded the cnn_stories_tokenized and dm_stories_tokenized data, but it consists entirely of *.story files. Your preprocess.py requires *.json as input; can you provide the code that transforms *.story into *.json? I also ran into problems reading the sample JSON with the load_json function in data_builder.py. Appreciated!
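Not the author, but in case it helps: a minimal sketch of turning a tokenized *.story file into the src/tgt JSON shape that format_to_bert seems to expect (assumptions on my side: article sentences come first, one per line, summary sentences follow "@highlight" markers, and each sentence is already tokenized so it can be split on whitespace; the file names below are only examples).

import json

def story_to_json(story_path):
    # article sentences come first; everything after the first "@highlight"
    # marker is treated as summary ("tgt") sentences
    src, tgt, in_highlight = [], [], False
    with open(story_path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line == '@highlight':
                in_highlight = True
                continue
            tokens = line.lower().split()
            (tgt if in_highlight else src).append(tokens)
    return {'src': src, 'tgt': tgt}

# one shard = a JSON list of such dicts
with open('cnndm_sample.train.0.json', 'w', encoding='utf-8') as f:
    json.dump([story_to_json('example_article.story')], f)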
Hi,
Could you share your ROUGE-1.5.5 files or a download link?
My Lead-3 and other results are better than those reported in your paper.
Is that normal?
Thanks.
Hi,
There are 50 checkpoints in the models folder. I need the generated summaries of the articles for comparison, but the results folder contains nothing.
The training and evaluation scripts only seem to show "xent" (cross-entropy loss, I assume). Where can I find the ROUGE scores that were published in your paper? Thanks.
I am trying to mimic the code in Google Colab using a GPU; however, after preprocessing, when I try to run the BERT+Classifier model for the first time (with -visible_gpus 0 etc.), I get an error that the '/logs/bert_classifier' file/directory doesn't exist, as the logs folder is empty. Should there be anything there? Or is the issue that the code hasn't downloaded the BERT model?
Thanks
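One guess (an assumption on my side, since the logger appears to append to the path given by -log_file rather than create missing parent directories): create the logs folder, wherever your -log_file path points, before launching training, e.g.
mkdir -p ../logs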
Hi,
This link does not work:
https://drive.google.com/open?id=1lqQmKflLisi-JBzFY0DL4sxqKtzZMrxt
Is it safe to assume that BERTSUM will perform without issues on input text tokenized using a tokenizer other than Core-NLP as long as we implement sentence splitting?
Hi, nlpyang!
I'm really looking forward to a TensorFlow version; I know you can handle it easily.
Dear,
Thanks for your great work.
I have a question about the final vocabulary.
Should I maintain a vocabulary for the corpus, or use the exact vocabulary provided by BertTokenizer.from_pretrained('bert-base-uncased').vocab?
In other words, should I use the exact vocab from BertTokenizer when fine-tuning BERT?
I found that there are only 27615 English words in the vocab of BertTokenizer.
Best regards.
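For anyone comparing: a quick way to inspect the pretrained WordPiece vocabulary (a minimal sketch, assuming the pytorch_pretrained_bert package this repo depends on). The vocab is fixed at 30,522 entries, and unseen words are split into sub-word pieces rather than mapped to UNK, which is why fine-tuning normally keeps the tokenizer's vocabulary as-is.

from pytorch_pretrained_bert import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
print(len(tokenizer.vocab))                             # 30522 WordPiece entries in total
print(tokenizer.tokenize('extractive summarization'))   # unseen words become '##' sub-word pieces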
I read in the paper that for extractive summarization you only take the sentences with the top 3 scores. There's no explicit mention of this in the code; is there any way I could modify it to get more sentences and experiment a little?
I read the paper, and the gold label Y_i troubles me. I hope you can explain it. Thanks.
Hi,
The current link for downloading the processed data seems to be dead and returns a 404
2. In the paper's Table 1 (test set results on the CNN/DailyMail dataset using ROUGE F1), you show the ROUGE scores of the Oracle and other models. I want to know how you calculate the Oracle scores (52.59; 31.24; 48.87) and what you take as the reference, and likewise how you calculate the BERTSUM+Transformer scores (43.25; 20.24; 39.63) and what you take as the reference.
Hello,
About the trigram blocking: is the candidate c (as per your paper) a group of 3-grams randomly selected from each source sentence?
Is this the code for the candidate?
for i, idx in enumerate(selected_ids):
    _pred = []
    if (len(batch.src_str[i]) == 0):
        continue
    for j in selected_ids[i][:len(batch.src_str[i])]:
        if (j >= len(batch.src_str[i])):
            continue
        candidate = batch.src_str[i][j].strip()
        if (self.args.block_trigram):
            if (not _block_tri(candidate, _pred)):
                _pred.append(candidate)
        else:
            _pred.append(candidate)
        if ((not cal_oracle) and (not self.args.recall_eval) and len(_pred) == 3):
            break
I'm trying to understand this code. Is 'j' in the for loop a character index or a word index? I was trying to match this with the preprocessing step and trigram blocking.
def _block_tri(c, p):
If you could help me understand this code a bit, that would be great.
Is this code saying: take the characters of each source string and find repeating trigrams? Is '_pred' appending each word or each character here?
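For reference, a reconstruction of what the trigram-blocking helper does (a sketch based on the paper's description, not copied from the repo, so details may differ). Here src_str[i] is a list of sentence strings, so j indexes sentences, and _pred collects already-selected sentences; a candidate sentence is skipped if it shares any word trigram with one of them.

def _get_ngrams(n, words):
    # all n-grams (as tuples) of a list of word tokens
    return set(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def _block_tri(candidate, selected):
    # True if the candidate sentence shares a trigram with any selected sentence
    tri_c = _get_ngrams(3, candidate.split())
    for s in selected:
        if tri_c & _get_ngrams(3, s.split()):
            return True
    return False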
Hi Yang, I wanted to see the pretrained model results. Could you please provide access to the pretrained model? Permission to download is not open on the link mentioned in the README file.
Hi,
I cloned the repo, downloaded the processed data, and ran this command:
----------- cmd-----------
python train.py -mode train -encoder rnn -dropout 0.1 -bert_data_path ../bert_data/cnndm -model_path ../models/bert_classifier -lr 2e-3 -visible_gpus 0 -gpu_ranks 0 -world_size 1 -report_every 50 -save_checkpoint_steps 1000 -batch_size 64 -decay_method noam -train_steps 5120 -accum_count 2 -log_file ../logs/bert_classifier -use_interval true -warmup_steps 256
----------- cmd-----------
I got this error:
------------error------------
[2019-05-14 03:21:58,963 INFO] Start training...
[2019-05-14 03:21:59,150 INFO] Loading train dataset from ../bert_data_tmp/cnndm.train.0.bert.pt, number of examples: 2001
Traceback (most recent call last):
File "train.py", line 340, in <module>
train(args, device_id)
File "train.py", line 272, in train
trainer.train(train_iter_fct, args.train_steps)
File "/notebooks/workspace/git/BertSum/src/models/trainer.py", line 158, in train
report_stats)
File "/notebooks/workspace/git/BertSum/src/models/trainer.py", line 323, in _gradient_accumulation
sent_scores, mask = self.model(src, segs, clss, mask, mask_cls)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/notebooks/workspace/git/BertSum/src/models/model_builder.py", line 96, in forward
sent_scores = self.encoder(sents_vec, mask_cls).squeeze(-1)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/notebooks/workspace/git/BertSum/src/models/encoder.py", line 129, in forward
memory_bank = self.dropout(memory_bank) + x
RuntimeError: The size of tensor a (512) must match the size of tensor b (768) at non-singleton dimension 2
------------error------------
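A hedged guess at the cause, not verified against the repo: the RNN encoder's hidden width defaults to rnn_size=512, while BERT-base emits 768-dimensional vectors, so the residual addition in encoder.py fails. Adding a matching -rnn_size (an assumption on my part, not a confirmed fix) would make the command:
python train.py -mode train -encoder rnn -dropout 0.1 -bert_data_path ../bert_data/cnndm -model_path ../models/bert_classifier -lr 2e-3 -visible_gpus 0 -gpu_ranks 0 -world_size 1 -report_every 50 -save_checkpoint_steps 1000 -batch_size 64 -decay_method noam -train_steps 5120 -accum_count 2 -log_file ../logs/bert_classifier -use_interval true -warmup_steps 256 -rnn_size 768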
Hello, can you tell me how to test? How should I set -test_from?
Can you show me the command?
Hi:
In the paper, is the reported result the best result among the checkpoints, or the result averaged over the top-3 checkpoints? Because in my test, for BERTSUM+Transformer the best result is 43.23 and the averaged result is 43.1466.
For Option 1 (downloading the processed data), was the data built with combination_selection or greedy_selection?
Because using the Option 1 data gives a better ROUGE result than Option 2.
Thanks.
Traceback (most recent call last):
File "train.py", line 340, in <module>
train(args, device_id)
File "train.py", line 272, in train
trainer.train(train_iter_fct, args.train_steps)
File "/home/wsy/xry/BertSum-master/src/models/trainer.py", line 142, in train
for i, batch in enumerate(train_iter):
File "/home/wsy/xry/BertSum-master/src/models/data_loader.py", line 131, in __iter__
for batch in self.cur_iter:
File "/home/wsy/xry/BertSum-master/src/models/data_loader.py", line 235, in __iter__
batch = Batch(minibatch, self.device, self.is_test)
File "/home/wsy/xry/BertSum-master/src/models/data_loader.py", line 27, in __init__
src = torch.tensor(self._pad(pre_src, 0))
File "/home/wsy/xry/BertSum-master/src/models/data_loader.py", line 14, in _pad
width = max(len(d) for d in data)
ValueError: max() arg is an empty sequence
First of all, many thanks for the code!
I am trying to convert the sample JSON file in the ../json_data directory to a PyTorch file in the ../bert_data directory (testing this out so I can use my own text in JSON format):
python preprocess.py -mode format_to_bert -oracle_mode greedy -n_cpus 4 -log_file ../logs/preprocess.log
However, the code doesn't seem to do much. I get the following back:
[('../json_data\\cnndm_sample.train.0.json', Namespace(dataset='', log_file='../logs/preprocess.log', lower=True, map_path='../data/', max_nsents=100, max_src_ntokens=200, min_nsents=3, min_src_ntokens=5, mode='format_to_bert', n_cpus=4, oracle_mode='greedy', raw_path='../json_data/', save_path='../bert_data/', shard_size=2000), '../bert_data/bert.pt_data\\cnndm_sample.train.0.bert.pt')]
And it has been stuck on this for the last 2-3 hours. I wouldn't expect converting the .json files to .pt to take this long (also, my CPUs are not being utilized at all). Have you encountered this?
Since the original BERT model restricts the maximum sequence length to 512 tokens during training, did you apply any hand-crafted scheme in summarization to restrict the article length? Or do you just feed all of the article tokens into the pre-trained BERT model?
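For context, a minimal illustration of the kind of truncation the preprocessing has to apply (a sketch, not the repo's exact code; the real logic lives in data_builder.py and also caps the number of sentences and tokens per sentence):

MAX_POS = 512  # BERT-base position-embedding limit

def truncate_to_bert_limit(subtoken_ids, cls_ids, labels, max_pos=MAX_POS):
    # keep only the sub-tokens that fit into BERT's position embeddings,
    # then drop the [CLS] positions (and their oracle labels) that fall
    # beyond the truncation point
    subtoken_ids = subtoken_ids[:max_pos]
    kept_cls = [i for i in cls_ids if i < max_pos]
    labels = labels[:len(kept_cls)]
    return subtoken_ids, kept_cls, labels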
Under model_builder, BERT is initialized and the vectors from BERT are fed into the encoder. The encoder itself has positional embeddings of the vectors (under encoder.py); there are no positional embeddings prior to this. Am I right to assume that the BERT model adds positional embeddings to each sentence, so we are not required to add them before vectorization with BERT?
File "D:/untitled/BertSum-master/src/train.py", line 349, in <module>
step = int(cp.split('.')[-2].split('_')[-1])
IndexError: list index out of range
What should I do?
Thanks...
Hi, I'm currently trying to preprocess my own news articles so that I can use them with the pre-trained model. I'm currently trying to use Stanford NLP to preprocess the data and am then looking to use preprocess.py. Am I on the right track, and is this something I need to do if I want to generate summaries of my own articles?
Hi, sorry to trouble you:
How can I get the original articles?
For example, for the reference summary ref.83.txt and the candidate can.83.txt, I cannot find the original article.
Also, the results I get are different from yours, even though my pyrouge configuration is correct. What am I doing wrong? Can you help me? Thanks!
This is the classifier log file.
May I ask why you use the Stanford Core-NLP tokenizer instead of the default BERT WordPiece tokenizer?
What issues, if any, would occur if BERT-large were used? For example, GPU requirements and training time; would it be too costly? Is there any reason why BERT-base was used instead of BERT-large?
Hello Yang. First, thank you very much for sharing your method and code. I have run into a problem when doing the model evaluation. Could you please help me figure it out?
I have run this code for testing the model on test set:
python train.py -mode test -bert_data_path ../bert_data/cnndm -model_path MODEL_PATH -visible_gpus -1 -gpu_ranks 0 -batch_size 30000 -log_file LOG_FILE -result_path RESULT_PATH -test_all -block_trigram true -test_from /Users/admin/Desktop/XXX/model_step_50000.pt
An error is raised:
FileNotFoundError: [Error 2] No such file or directory: 'XXX/.pyrouge/settings.ini'
Have you met this kind of problem before? Or can you suggest another way to calculate ROUGE for the model? Thank you very much!
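Not the author, but that error usually means pyrouge has not been pointed at a local ROUGE-1.5.5 installation, so its settings.ini was never created. Assuming the standard pyrouge package, something like the following sets it up (the ROUGE-1.5.5 path below is only a placeholder):
pip install pyrouge
pyrouge_set_rouge_path /absolute/path/to/ROUGE-1.5.5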
Hello,
I was wondering how long it will take to train the model with 3 GPUs? I'm trying to calculate the cost and whether it is affordable for me to use an AWS p3.16xlarge to train the model.
I see that load_pretrained_bert=True was used during training but not for validation or testing. Is there a particular reason for this? I'm assuming load_pretrained_bert=True loads the pretrained BERT weights; why not use it for testing and validation as well?
Can you explain what role the labels play in the BERT tokenizer?
Especially this code:
labels = labels[:len(cls_ids)]
I don't understand what the above code does, and how the labels play a part in tokenization for BERT.
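My understanding (an interpretation, not an authoritative answer): the labels are the per-sentence oracle targets rather than part of tokenization. cls_ids holds the position of each sentence's [CLS] token, and sentences dropped by truncation lose their [CLS], so the label list is trimmed to the sentences that survived. A toy illustration with made-up values:

cls_ids = [0, 35, 78]             # [CLS] positions of the sentences that fit within the token limit
labels = [1, 0, 0, 1, 0]          # oracle 0/1 label for every original sentence
labels = labels[:len(cls_ids)]    # keep labels only for the sentences that survived truncation
print(labels)                     # [1, 0, 0]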
Hi,
My commands:
python3 preprocess.py -mode tokenize -raw_path ../raw_stories/ -save_path ../merged_stories_tokenized -log_file ../logs/cnndmtoken.log -n_cpus 50 -log_file ../logs/cnndmtoken.log
python3 preprocess.py -mode format_to_lines -raw_path ../merged_stories_tokenized -save_path ../json_data/cnndm -map_path ../urls -lower -log_file ../logs/cnndmtoken.log
But after format_to_lines, the tgt field in my files is an empty list.
A part of cnndm.train.0.json:
portedly", "treated", "for", "appendicitis", "but", "was", "well", "enough", "to", "walk", "out", "of", "the", "clinic", "on", "his", "own", "@highlight", "took", "a", "private", "jet", "to", "the", "argentinian", "capital", "and", "is", "now", "under", "`", "observation", "'"]], "tgt": []},
Appreciated!
Hi,
In your paper, or in the Option 1 processed data: did you remove sentences shorter than args.min_nsents (5 by default) for Lead-3 or for any other mode on CNN/DailyMail?
Thank you very much.
Hi,
I am currently in the process of testing some recent approaches to extractive summarization. I just want to run the models on a collection of texts that I have, but I still could not work out what I should do just to summarize a new piece of text using your codebase. Any pointers?
Thanks!
Are you planning to publish your saved checkpoints for the models (particularly BERTSUM+Transformer)?
Thanks in advance :)
Dear author,
May I ask why the number of data points in the processed training data is only 287084, while the paper says it uses 287227 training points?
Looking forward to your reply,
Thank you,
Following the README, I downloaded the processed data and tried to train the model myself.
I used a single GPU for downloading the data. After that I reran the code with multiple GPUs. The program starts executing, but after a while, whether on a single GPU or multiple GPUs, the following problem always occurs. The step at which the interruption happens may differ between runs.
[2019-04-04 00:46:20,545 INFO] loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at ../temp/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
[2019-04-04 00:46:23,405 INFO] Step 400/50000; xent: 3.30; lr: 0.0000008; 20 docs/s; 368 sec
[2019-04-04 00:47:08,724 INFO] Step 450/50000; xent: 3.33; lr: 0.0000009; 22 docs/s; 413 sec
[2019-04-04 00:47:51,472 INFO] Step 500/50000; xent: 3.19; lr: 0.0000010; 23 docs/s; 456 sec
[2019-04-04 00:48:35,173 INFO] Step 550/50000; xent: 3.22; lr: 0.0000011; 23 docs/s; 500 sec
[2019-04-04 00:49:16,433 INFO] Loading train dataset from ../bert_data/cnndm.train.6.bert.pt, number of examples: 2001
[2019-04-04 00:49:41,427 ERROR] Model name 'bert-base-uncased' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese). We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt' was a path or url but couldn't find any file associated to this path or url.
Traceback (most recent call last):
File "train.py", line 335, in <module>
train(args, device_id)
File "train.py", line 267, in train
trainer.train(train_iter_fct, args.train_steps)
File "/users4/zyfeng/gitcodes/BertSum/src/models/trainer.py", line 142, in train
for i, batch in enumerate(train_iter):
File "/users4/zyfeng/gitcodes/BertSum/src/models/data_loader.py", line 141, in __iter__
self.cur_iter = self._next_dataset_iterator(dataset_iter)
File "/users4/zyfeng/gitcodes/BertSum/src/models/data_loader.py", line 159, in _next_dataset_iterator
device=self.device, shuffle=self.shuffle, is_test=self.is_test)
File "/users4/zyfeng/gitcodes/BertSum/src/models/data_loader.py", line 175, in __init__
self.bert_data = BertData(args)
File "/users4/zyfeng/gitcodes/BertSum/src/models/data_loader.py", line 15, in __init__
self.sep_vid = self.tokenizer.vocab['[SEP]']
AttributeError: 'NoneType' object has no attribute 'vocab'
The following files were downloaded into the temp dir:
26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084.json
9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba.json
I am curious about the reason for this. What should I do?
Hi, my command line is "python train.py -mode train -encoder transformer -dropout 0.1 -bert_data_path ../news_bert/cnndm -model_path ../bert_model/bert_transformer -lr 2e-3 -visible_gpus 1 -gpu_ranks 1 -world_size 1 -report_every 50 -save_checkpoint_steps 1 -batch_size 3000 -decay_method noam -train_steps 1 -accum_count 2 -log_file ../logs/bert_transformer -use_interval true -warmup_steps 10000 -ff_size 2048 -inter_layers 2 -heads 8",
then I encountered a bug: the dataset during the training phase is loaded an infinite number of times.
[2019-06-12 16:39:12,359 INFO] Loading train dataset from ../news_bert/cnndm.train.1.bert.pt, number of examples: 1961
[2019-06-12 16:39:13,338 INFO] Loading train dataset from ../news_bert/cnndm.train.6.bert.pt, number of examples: 1970
[2019-06-12 16:39:14,270 INFO] Loading train dataset from ../news_bert/cnndm.train.15.bert.pt, number of examples: 1564
[2019-06-12 16:39:15,057 INFO] Loading train dataset from ../news_bert/cnndm.train.7.bert.pt, number of examples: 1962
[2019-06-12 16:39:16,033 INFO] Loading train dataset from ../news_bert/cnndm.train.3.bert.pt, number of examples: 1971
[2019-06-12 16:39:17,077 INFO] Loading train dataset from ../news_bert/cnndm.train.11.bert.pt, number of examples: 1959
[2019-06-12 16:39:18,029 INFO] Loading train dataset from ../news_bert/cnndm.train.13.bert.pt, number of examples: 1972
[2019-06-12 16:39:18,870 INFO] Loading train dataset from ../news_bert/cnndm.train.8.bert.pt, number of examples: 1967
[2019-06-12 16:39:19,761 INFO] Loading train dataset from ../news_bert/cnndm.train.14.bert.pt, number of examples: 1973
[2019-06-12 16:39:20,687 INFO] Loading train dataset from ../news_bert/cnndm.train.2.bert.pt, number of examples: 1970
[2019-06-12 16:39:21,623 INFO] Loading train dataset from ../news_bert/cnndm.train.9.bert.pt, number of examples: 1971
[2019-06-12 16:39:22,526 INFO] Loading train dataset from ../news_bert/cnndm.train.9.bert.pt, number of examples: 1971
[2019-06-12 16:39:23,377 INFO] Loading train dataset from ../news_bert/cnndm.train.14.bert.pt, number of examples: 1973
[2019-06-12 16:39:24,187 INFO] Loading train dataset from ../news_bert/cnndm.train.4.bert.pt, number of examples: 1963
[2019-06-12 16:39:25,108 INFO] Loading train dataset from ../news_bert/cnndm.train.7.bert.pt, number of examples: 1962
[2019-06-12 16:39:26,105 INFO] Loading train dataset from ../news_bert/cnndm.train.10.bert.pt, number of examples: 1970
[2019-06-12 16:39:27,056 INFO] Loading train dataset from ../news_bert/cnndm.train.11.bert.pt, number of examples: 1959
[2019-06-12 16:39:28,027 INFO] Loading train dataset from ../news_bert/cnndm.train.0.bert.pt, number of examples: 1960
[2019-06-12 16:39:28,954 INFO] Loading train dataset from ../news_bert/cnndm.train.5.bert.pt, number of examples: 1982
[2019-06-12 16:39:29,930 INFO] Loading train dataset from ../news_bert/cnndm.train.13.bert.pt, number of examples: 1972
[2019-06-12 16:39:30,946 INFO] Loading train dataset from ../news_bert/cnndm.train.3.bert.pt, number of examples: 1971
What can I do? Thanks!
Hi, I have downloaded the processed data and I am trying to run the command
python train.py -mode train -encoder classifier -dropout 0.1 -bert_data_path ../bert_data/cnndm -model_path ../models/bert_classifier -lr 2e-3 -visible_gpus 0,1,2 -gpu_ranks 0,1,2 -world_size 3 -report_every 50 -save_checkpoint_steps 1000 -batch_size 3000 -decay_method noam -train_steps 50000 -accum_count 2 -log_file ../logs/bert_classifier -use_interval true -warmup_steps 10000
However, I get the error:
RuntimeError: the distributed NCCL backend is not available; try to recompile the THD package with CUDA and NCCL 2+ support at /Users/distiller/project/conda/conda-bld/pytorch_1556653464916/work/torch/lib/THD/process_group/General.cpp:20
My machine does not have a GPU. Is there any way I can run this only on a CPU? Thanks!
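Not the author, but an earlier command in this thread runs evaluation with -visible_gpus -1, which appears to select CPU. A CPU-only training attempt might therefore look like the following (a guess on my part, and training on CPU will be extremely slow):
python train.py -mode train -encoder classifier -dropout 0.1 -bert_data_path ../bert_data/cnndm -model_path ../models/bert_classifier -lr 2e-3 -visible_gpus -1 -gpu_ranks 0 -world_size 1 -report_every 50 -save_checkpoint_steps 1000 -batch_size 3000 -decay_method noam -train_steps 50000 -accum_count 2 -log_file ../logs/bert_classifier -use_interval true -warmup_steps 10000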
Sorry for opening an issue on this; the question is sort of trivial.
What oracle mode was used for the processed data, combination or greedy?
According to your paper:
As you said, it follows "Attention Is All You Need".
But in your case, what is the reason you chose 2e-3 as the initial learning rate?
If we follow the formula of Vaswani et al., we have:
d_model = 768 (because we use BERT-base)
d_model^-0.5 ≈ 0.036
So where does this 2e-3 come from?
Also, why choose 10,000 warmup steps?
The original Transformer paper used 4,000 warmup steps with 100,000 total steps.
The BERT paper used 10,000 warmup steps with 1,000,000 total steps, but they didn't use the noam decay method, just linear decay.
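For reference, my reading of the noam schedule as typically implemented in OpenNMT-style code (a sketch; I have not verified whether this repo also folds a d_model^-0.5 factor into the -lr constant or leaves it out entirely):

def noam_lr(step, lr_factor=2e-3, warmup_steps=10000):
    # learning rate = factor * min(step^-0.5, step * warmup^-1.5)
    return lr_factor * min(step ** -0.5, step * warmup_steps ** -1.5)

# the schedule peaks at step == warmup_steps: 2e-3 * 10000 ** -0.5 = 2e-5
print(noam_lr(10000))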
Thank you for sharing such a great codebase :)
I have a question about truncated article.
As mentioned in #14, articles are truncated at 512 tokens. In some cases, if the oracle sentences are located at the end of the article, this produces samples with no gold labels.
So for these "empty" samples, the network is trained to classify all of the article's sentences as not salient.
This raises several questions:
Is it useful for performance to keep such "empty" samples?
Did you compare the performance of the current network with a network trained without "empty" samples (even empirically)?
It seems similar to SQuAD 2.0: teaching the network that there are not always 3 salient sentences in the input (sometimes there are 2, sometimes 1, sometimes 0).
Yet at test time, you invariably pick the 3 best sentences, no matter their score (i.e. no matter whether the network decided that only 2/1/0 sentences were really salient).
This seems to be an important difference between training and inference. Is my intuition wrong?
If I'm wrong, can you (briefly) explain where I misunderstood?
If I'm right, isn't this going to hurt performance (or maybe there are too few of these empty samples to really matter)?
Thank you for your answer!