atulkum / pointer_summarizer Goto Github PK

View Code? Open in Web Editor NEW

896.0 896.0 245.0 1.1 MB

pytorch implementation of "Get To The Point: Summarization with Pointer-Generator Networks"

License: Apache License 2.0

Python 99.42% Shell 0.58%

attention-mechanism deep-learning nlp pointer-networks pytorch-implmention seq2seq-attn summarization

pointer_summarizer's People

Contributors

Stargazers

Watchers

Forkers

zhhengcs anmol21 die1100 poojithansl vidhishanair monireh2 tomgun132 leekwanmeng burkun datblue ymym3412 tianforks hyzcn tedrepo chmille3 charrnander zfang shubhampachori12110095 yourdady zhujunnan kushalarora ashishbaghudana yaminig zhongluwang sangho-vision grace-yinghuang katsumata420 congson1293 ilyagusev yongjie-lin xw0078 leejayyoon wp0517 lzw-pku bohanli aarti9 ryanai3 hrlinlp joyeel stephen-adams zsun1029 regressionist codebaseli wyu-du schen149 quietwoods tsuiwwwayne deppdepp aslongas manuhg darsh10 buger2333 zh215021 chenyangh zzzzxciid namisan t-web fajri91 xcfcode derenlei panjj1990 yiyang7 ninedaywang soupstandstop misoknisky shuandemorian tcrapse syyunn fw339wj jiaruipeng1994 hong8c stepgazaille antoinesimoulin alphanlp louiscastricato shy01010001 geminifox2019 thorphan shenjiawei19 rxma1805 t-ouchi htfhxx lazysjb xrc10 lichao88 senchfu soulbliss jbmanasa jbot19k-fork janciswang xidiandp martin6336 k-burner hellomaxwell akhazane vivid-k hanseokoh shenlanyilang ammieqi bxclib2

pointer_summarizer's Issues

Any pretrained models available

Could anyone provide with a pretrained model please

Why is the model outputting UNK tokens? Shouldn't it be able to point to unkown words from the input?

From: https://github.com/abisee/pointer-generator/blob/master/beam_search.py#L111

In decoding words, we change the token id to the unkown id if t<vocab.size(). So if the decoder is pointing to that particular token it produces [UNK] in output. Is it correct? Following the paper it seems that the decoder should be able to point to that token and copy it, instead of copying the unknown token. I think it's the whole purpose of the pointer-generator model to handle oovs. But from some experiments in decoding I see that the models often outputs some unknown tokens.
I tried replacing the 50k vocabulary to the full vocabolary but I get cuda device asserted errors.

run the train

Thanks for your code, but I have a question: when I run the train.py , one error : AttributeError: 'generator' object has no attribute 'next', i dont understand it , the system show it at the batcher.py 209

pointer generator model starts overfitting

I am trying to train pointer generator network . After training for 10k iterations it starts overfitting.
Any suggestions on why this might be happening.
Note- 1==50 iterations

strange view() operation in ReduceState module

We wanna reduce the forward and backward state, so we need to concatenate the forward state and backward state, then go through a nn.Linear module to let 2 * hidden_dim become hidden_dim.

In your code, h is 2 x B x hidden_dim, and you use view() operation directly on h and c. The results should be concatenating first example forward state and second example forward state, not concatenating first example forward state and first example backward state.

In my opinion, we should use h.transpose(0, 1).contiguous().view(-1, config.hidden_dim*2).

pointer_summarizer/training_ptr_gen/model.py

Line 75 in 454a2f6

hidden_reduced_h = F.relu(self.reduce_h(h.view(-1, config.hidden_dim * 2)))

Illegal division by zero at /My-path-to-RELEASE-1.5.5/ROUGE-1.5.5.pl line 2450.

I was training by chinese weibo data ,but when running decode.py,it failed to calculate rouge score .This is the issue which have the same problem but still remain to be sovled.
andersjo/pyrouge#5

Decoder has finished reading dataset for single_pass.
Now starting ROUGE eval...
Illegal division by zero at /usr/local/ROUGE-1.5.5/ROUGE-1.5.5.pl line 2450.
Traceback (most recent call last):
File "training_ptr_gen/decode.py", line 208, in
beam_Search_processor.decode()
File "training_ptr_gen/decode.py", line 106, in decode
results_dict = rouge_eval(self._rouge_ref_dir, self._rouge_dec_dir)
File "/mnt/jml/nlp/summarize/pointer_summarizer/data_util/utils.py", line 28, in rouge_eval
rouge_results = r.convert_and_evaluate()
File "/usr/local/lib/python3.5/dist-packages/pyrouge-0.1.3-py3.5.egg/pyrouge/Rouge155.py", line 367, in convert_and_evaluate
rouge_output = self.evaluate(system_id, rouge_args)
File "/usr/local/lib/python3.5/dist-packages/pyrouge-0.1.3-py3.5.egg/pyrouge/Rouge155.py", line 342, in evaluate
rouge_output = check_output(command, env=env).decode("UTF-8")
File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "/usr/lib/python3.5/subprocess.py", line 708, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/local/ROUGE-1.5.5/ROUGE-1.5.5.pl', '-e', '/usr/local/ROUGE-1.5.5/data', '-c', '95', '-2', '-1', '-U', '-r', '1000', '-n', '4', '-w', '1.2', '-a', '-m', '/tmp/tmpw5iw2_0n/rouge_conf.xml']' returned non-zero exit status 255

TypeError: a bytes-like object is required, not 'str'

i use py3 + pytorch0.4, i got this error of batcher:

Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/work/anaconda3/envs/susht/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/work/anaconda3/envs/susht/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/work/sushuting/pointer_summarizer/data_util/batcher.py", line 223, in fill_example_queue
abstract_sentences = [sent.strip() for sent in data.abstract2sents(abstract)] # Use the ~~and~~ tags in abstract to get a list of sentences.
File "/home/work/sushuting/pointer_summarizer/data_util/data.py", line 151, in abstract2sents
start_p = abstract.index(SENTENCE_START, cur)
TypeError: a bytes-like object is required, not 'str'

During the training and verification process, when "step = 0", the "coverage" is initialized differently. During training, the coverage is an all-zero tensor, but this is not the case during prediction.

What is the reason for this?

https://github.com/atulkum/pointer_summarizer/blob/a41b69ba4f7eb6ffdeaf99a35bf9c6607ca5db56/training_ptr_gen/model.py#L152

Meaning of "iterations" in readme

What do you mean by "100k iterations"? Is an iteration a batch update? How is that number related to the concept of epoch?

why using 2hidden_dim x 2hidden_dim linear layer in encoder?

I found no this layer in the tensorflow implement by the author.

When training around 8400 iters, got loss=nan

First thanks a lot for releasing the pytorch implementation of pg-network.
When I started training, the output is:

steps 100, seconds for 100 batch: 45.91 , loss: 8.439692
steps 200, seconds for 100 batch: 53.07 , loss: 8.152159
steps 300, seconds for 100 batch: 55.09 , loss: 7.645596
steps 400, seconds for 100 batch: 51.75 , loss: 7.824278
steps 500, seconds for 100 batch: 45.38 , loss: 7.567791

However, when it was about 8400 iters, the output is:

steps 8200, seconds for 100 batch: 57.81 , loss: 7.064022
steps 8300, seconds for 100 batch: 61.35 , loss: 6.479928
steps 8400, seconds for 100 batch: 56.97 , loss: nan
steps 8500, seconds for 100 batch: 61.95 , loss: nan
steps 8600, seconds for 100 batch: 59.06 , loss: nan

And after that it's all nan. I didn't modify the model or data or preprocess.
Could you give me some guide or hints to solve this problem?

TypeError: init() got an unexpected keyword argument 'initial_accumulator_value'

training log:
max_size of vocab was specified as 50000; we now have 50000 words. Stopping reading.
Finished constructing vocabulary of 50000 total words. Last word added: chaudhary
Traceback (most recent call last):
File "training_ptr_gen/train.py", line 150, in
train_processor.trainIters(config.max_iterations, args.model_file_path)
File "training_ptr_gen/train.py", line 121, in trainIters
iter, running_avg_loss = self.setup_train(model_file_path)
File "training_ptr_gen/train.py", line 57, in setup_train
self.optimizer = Adagrad(params, lr=initial_lr, initial_accumulator_value=config.adagrad_init_acc)
TypeError: init() got an unexpected keyword argument 'initial_accumulator_value'
i use pytorch0.3, tensorflow1.2.
Please help.
Thanks.

What is the version of tensorflow？

AttributeError: 'module' object has no attribute 'FileWriter'
AttributeError: 'module' object has no attribute 'logging'

Decode generating same summery

Hi,
Thanks for providing Pytorch implementation for summerization. I trained the model on custom data set for ~15000 iterations. On decoding, same summery is generated for all the stories. I am not sure where exactly is the problem in code, still debugging. Any help/suggestion is appreciated.
In beam_search code, there is a comment '#batch should have only one example' where as in BeamSearch constructor, bacther takes batch size same as beam_size(4). Is it correct ?

Thanks

transfomer_model

transfomer_model这个模型是没用吗

Out of Memory error when trying to load the model

So when I trained the model initially with coverage off for 500k iterations, now what i would like to do is use this same model and further fine tune it on the model for 500k iterations. What i've seen tho is the moment I mention the model_file_path the model crashes and gives an OOM error even though I have a tonne of memory. The model works just fine if i dont specify the model file path

The low rouge scole

Hello,
Thanks for your valuable code. I have a question. When I run the code for 500k iterations. I get the rouge result:
ROUGE-1:
rouge_1_f_score: 0.2648 with confidence interval (0.2574, 0.2725)
rouge_1_recall: 0.3672 with confidence interval (0.3572, 0.3771)
rouge_1_precision: 0.2162 with confidence interval (0.2095, 0.2229)

ROUGE-2:
rouge_2_f_score: 0.0944 with confidence interval (0.0881, 0.1005)
rouge_2_recall: 0.1315 with confidence interval (0.1227, 0.1396)
rouge_2_precision: 0.0772 with confidence interval (0.0717, 0.0824)

ROUGE-l:
rouge_l_f_score: 0.2348 with confidence interval (0.2277, 0.2419)
rouge_l_recall: 0.3249 with confidence interval (0.3149, 0.3346)
rouge_l_precision: 0.1920 with confidence interval (0.1854, 0.1980)

So, Can you help me ? Thanks a lot.

pre-trained word embedding

Hello, i would like to use pre-trained word embedding, which code will i change. Thank you

How to train the model from last checkpoint saved

Thanks for your contribution. Sir I have trained the model till 2,70,000 epoch. But due to some reason I have to stop the training there. When saw the loss it was still decreasing. So now I want to train the model again from last checkpoint i.e. 2,70,000. How can I do?

Need help for retraining and cross validation

Need help for retraining and cross validation and see if the ROUGE score matches exactly (or better) with the numbers reported in the paper.
I just train for 500k iteration (with batch size 8) with pointer generation enabled + coverage loss disabled and next 100k iteration (with batch size 8) with pointer generation enabled + coverage loss enabled.

It would be great if someone can help re-running these experiments and try to see if we can improve the result and match it with the paper.

You might need a better GPU though. (my current one is gtx 1070 8 gb)

Index of Out Bound Error in doing scatter_add when we have OOV words

pointer_summarizer/training_ptr_gen/model.py

Line 192 in 454a2f6

final_dist = vocab_dist_.scatter_add(1, enc_batch_extend_vocab, attn_dist_)

I am using another dataset to train the model, I observed that if we have OOV words, the value in enc_batch_extend_vocab could be larger than the length of vocab_dist_, which causing like index out of bound error.

Is it better to patch those wordID to zero if it is larger than the size of Vocab?

transformer_pointer

Hi, Thanks for providing implementation for pointer_summarizer. and I would like to ask, when will the “transformer+pointer” be completed?because I have encountered some problems in the process of implementation.

Training saturates early?

I'm using the same hypers but seeing this for my training curve. Why would this happen? Looks like the LR is too high but your curve with the same lr seems fine.

Decode custom texts with a pretrained model

I have some texts that I would like to summarize using this model. Each text is stored in a json file with the following format. It has an empty id and abstract field and the article field corresponds to an array of tokenized strings/sentences.

{"id": "", "abstract": [], "article": ["Hyperconvergence has come a long way in a relatively short time , and enterprises are taking advantage of the new capabilities .", "Hyperconverged infrastructure ( HCI ) combines storage , computing and networking into a single system ; hyperconverged platforms include a hypervisor for virtualized computing , software-defined storage , and virtualized networking .", "HCI platforms were initially aimed at virtual desktop infrastructure ( VDI ) , video storage , and other discrete workloads with predictable resource requirements .", "Over time , they have advanced to become suitable platforms for enterprise applications , databases , private clouds , and edge computing deployments .", "Learn more about hyperconvergence A couple of key developments have made HCI more appealing for more workloads .", "One is the ability to independently scale compute and storage capacity , via a disaggregated model .", "The other is the ability to create a hyperconverged solution using NVMe an open logical device interface specification for accessing non-volatile storage media attached via a PCI Express bus over fabrics .", "In general , there is a greater understanding of the value proposition of HCI , specifically for smaller enterprises that may not need [ or ] want a full-scale data center infrastructure , but want to retain some control over their environments , says Sebastian Lagana , research manager , infrastructure platforms and technologies , at research firm IDC .", "The increasing use of hybrid cloud environments by enterprises also lines up nicely with the software-defined data center story , which HCI is certainly a large part of , Lagana says .", "HCI has become a suitable platform for broader use due to a lot of the underlying improvements in the technology , Lagana says .", "At the same time , many enterprises have gone through an IT refresh cycle and HCI seems like a natural transition .", "Weve spoken with some HCI adopters and , in some cases , folks were talking to are upgrading multiple generation-old infrastructure running on old , sometimes now unsupported software , Lagana says .", "At that point , if the old server and/or storage technology theyre using is that far behind whats now available , it becomes a matter of the level of complexity theyre seeking in their new environment .", "HCI has the required horsepower while providing a user-friendly management interface , Lagana says .", "Could you run faster with a highly customized solution ?", "he says .", "Sure , but in many cases its not worth the extra effort when the HCI solution will suffice and provides good long-term scalability .", "Among the key benefits organizations can see from deploying HCI more broadly are greater consolidation and simplification of the IT infrastructure , which allows IT teams to better support business objectives , Lagana says .", "Other possible benefits include faster helpdesk response times , proactive understanding of potential hardware failures , the ability to quickly spin up new servers or test environments , faster disaster recovery and easier backup features .", "There are also the more mechanical benefits , Lagana says .", "Hardware consolidation provides power , cooling and facilities cost savings , which is easy to measure and is an easy sell to less tech-savvy budget holders , he says .", "Also , HCI and the underlying software makes it easier to maximize utilization of existing resources , which reduces longer-term storage and server expenses as well .", "HCI deployment scales as business expands Celtic Manor Collection , a resort hotel and conference center operator , has been using two clusters of Dell EMCs VXrail HCI appliance , beginning in September 2017 .", "Among the initial business drivers for deploying HCI was that Celtic Manor was embarking on a new joint venture to build an international convention center in Wales , says Chris Stanley , IT manager .", "The project required the flexibility to scale systems quickly , the ability to easily manage and maintain data center capacity with a small team , the ability to respond quickly to any outages in service , and resiliency to avoid any downtime for large-scale events at the convention center .", "Celtic Manor previously had an environment that included storage-area networks ( SAN ) and VMware ESXi servers , but it was taking a lot of resources to maintain , upgrade , and troubleshoot , Stanley says .", "The business was growingand still israpidly and bursting at the seams with data , he says .", "We needed a complete rethink to prepare the data center for the future and simplify management .", "Initially the company was deploying the clusters as separate data centers for different business entities .", "When we deployed our second cluster we quickly realized we could do more if the two were able to connect over the network together , Stanley says .", "As of today , we now have our core business systems split between the two clusters , with all off these having a recover point copy on the opposite cluster .", "So we now have full cluster failover if required , [ which ] gives us a lot of peace of mind as a business .", "HCI has become the core tech in our business , Stanley says .", "With our planned business expansion of several new hotels in the next two years , we have a template with predictive costs and scalability .", "The company uses HCI for its main enterprise applications , which run on large Oracle and SQL databases .", "These are using less resources than when they were in their previous environment , and we regularly monitor these to see if any servers are over provisioned , Stanley says .", "Celtic Manor is preparing to roll out VDI , with up to 450 endpoints added over the next 12 to 18 months .", "With our business growing , we are looking to potentially use the HCI clusters for cloud and remote deployment for our new hotels , Stanley says .", "VXrail has given us a solid flexible platform to grow our business .", "What has enabled an expanded role for HCI are developments in NVMe over fabrics , with CPUs having a smaller workload intensity , and greater amounts of input/output operations per second ( IOPS ) being achieved on a regular basis , Stanley says .", "With demands on data center performance growing to process and store vast amounts of data every second , it is great timing for the hyperconverged market to make its mark , Stanley says .", "Among the key benefits of HCI thus far are less time spent by the IT team on upgrading and maintaining the data center ; improved application performance ; and a 10 % reduction in data center power consumption .", "HCI powers county 's core apps and services Also expanding its use of HCI is the County of San Mateo , Calif. , which began using Nutanixs HCI platform in 2014 .", "We originally looked at the HCI solution to solve performance issues with our VDI deployment on VMwares Horizon platform , says Jon Walton , CIO .", "We had unsuccessfully tried to use EMC , Dell , and NetApp storage on blade servers , but kept running into high latency issues , especially as users logged into their sessions .", "After initial successes with VDI , county officials began to consider using the Nutanix HCI platform for all of its virtual workloads .", "The timing was perfect , as we were starting to virtualize more and more workloads , Walton says .", "In the last two years , the county has moved all its heavier workloads running Microsoft SQL and Oracle to dedicated Nutanix clusters .", "Most recently , it moved its countywide voice-over-IP implementation to two dedicated Nutanix clusters running Avaya Aura on VMware ESXi .", "There have been constant improvements on every level with HCI , Walton says .", "Shortly after we adopted Nutanix , they came out with one-click software upgrades .", "Through their HTML5 interface , we can upgrade every element of our virtual stackdisk firmware , BIOS , Nutanix AOS , Nutanix health check and VMware ESXiwith zero downtime and almost zero interaction .", "San Mateo has already converted 99 % of its Oracle and MS SQL applications to the HCI environment .", "It is also leveraging Nutanixs Protection Domain replication service for remote sites to provide high availability within county data centers , Walton says .", "With HCI , instead of spending all our time reacting to problems and resource constraints , we now have the time to research smart technology choices for the county , Walton says .", "Additionally , we no longer must rely on a small group of SMEs [ subject matter experts ] to provide expertise around storage and servers , as Nutanix takes care of it for us .", "County residents who rely on a variety of services have also seen benefits .", "They dont know or care what we run on , they just know it is fast and has had almost zero downtime in five-plus years , Walton says .", "Hyperconvergence market trends Demand for HCI and for data center convergence in general is on the rise .", "A recent report by research firm IDC shows that worldwide converged systems market revenue increased 10 % year over year to $ 3.5 billion during the second quarter of 2018 .", "HCI products helped to drive second quarter market expansion , the study said , thanks in part to their ability to reduce infrastructure complexity , promote consolidation , and allow IT teams to support an organization 's business objectives .", "Revenue from hyperconverged systems sales grew 78 % year over year during the second quarter , generating $ 1.5 billion worth of sales .", "This amounted to 41 % of the total converged systems market , the report said .", "IDC provides two ways to rank technology suppliers within the hyperconverged systems market , in terms of market share .", "One is by the brand of the hyperconverged platform and the other is by the owner of the software providing the core hyperconverged capabilities .", "For brand , those with the highest share are Dell , Nutanix , Cisco , and HPE .", "In terms of HCI software , the leaders are Nutanix , VMware , Dell , Cisco , and HPE .", "As for future developments in the hyperconvergence market , one of the growing trends is NVMe-based HCI , Lagana says .", "Were seeing flash as a major adoption driver , not just in HCI but in broader converged infrastructure and storage markets , and NVMe is the next step in that evolution , he says .", "Join the newsletter !", "Error : Please check your email address ."]}

To adjust the code so that my new format can be used, I tried to do the follow.

I modified example_generator in data_util/data.py as follows so that my all my json files in the data_path directory can be read.

def example_generator(data_path, single_pass):
  while True:
    print "Starting to generate examples"
    files = find_all_files(data_path)
    assert files, ('Error: No file at %s' % data_path)

    if single_pass:
      files = sorted(files)
    else:
      random.shuffle(files)

    for f in files:
      print "Example generated"
      with io.open(f, 'r', encoding='utf-8') as fp:
        content = json.load(fp)
        if 'article' not in content.keys():
          continue
        abstract = ''
        article = ' '.join(content['article'])
        yield abstract.lower(), article.lower()

    if single_pass:
      print "example_generator completed reading all datafiles. No more data."
      break


def find_all_files(data_path):
  print "Finding all files"
  os.chdir(data_path)
  files = []
  for f in glob.glob('*.json'):
    files.append(data_path + f)

  print "There are in total %d files" % len(files)
  return files

I modified text_generator in data_util/batcher.py so that it utilizes the tuple my example_generator yields instead of the tensorflow example object.

    def text_generator(self, example_generator):
      while True:
        example = example_generator.next()
        try:
          article_text = example[1]
          abstract_text = example[0]
        except ValueError:
          tf.logging.error('Failed to get article or abstract from example')
          continue

        if len(article_text) == 0:
          tf.logging.warning('Found an example with empty article text. Skipping it.')
          continue
        else:
          yield article_text, abstract_text

After the above two steps, I modified data_util/config.py so that the data paths are set correctly.

I utilized the vocab file located in the finished_files directory that came with the cnn daily mail dataset.

Lastly, I run start_decode.py with a model after 500k iterations and expected it would work.

However, I got this error with pointer generator turned off.

Traceback (most recent call last):
  File "/root/miniconda3/envs/py27/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/root/miniconda3/envs/py27/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/gluster/projects/pointer_summarizer_decode/training_ptr_gen/decode.py", line 210, in <module>
    beam_Search_processor.decode()
  File "/gluster/projects/pointer_summarizer_decode/training_ptr_gen/decode.py", line 84, in decode
    (batch.art_oovs[0] if config.pointer_gen else None))
  File "data_util/data.py", line 178, in outputids2words
AssertionError: Error: model produced a word ID that isn't in the vocabulary. This should not happen in baselin
e (no pointer-generator) mode

I tried to print out the variable output_ids in the function decode in training_ptr_gen/decode.py and got the following.

[9223372034707292159, 0, 9223372034707292159, 2, 9223372034707292159, 2, 9223372034707292159, 2, 9223372034707292159, 2, 9223372034707292159, 2, 9223372034707292159, 2, 9223372034707292159, 2, 9223372034707292159, 2, 9223372034707292159, 2, 9223372034707292159, 0, 9223372034707292159, 9223372034707292159, 2, 9223372034707292159, 2, 0, 9223372034707292159, 9223372034707292159, 9223372034707292159, 0, 9223372034707292159, 0, 9223372034707292159, 3]

As you can see, there are many repetitive 9223372034707292159's and I have no clue how they came about.

Is it possible at all to use a model trained on cnn-daily mail dataset to summarize some other third party texts stored in files with different formats?

If so, did I do something wrong to prepare for the examples?

Thank you so much for your help!

Decoder output

I am getting same output for all the batches

What does the variable (training) mean?

pointer_summarizer/training_ptr_gen/model.py

Line 152 in a6620ef

if not self.training and step == 0:

Hi!
I can not found the definition of the variable self.training.
Could you give some explanations.
Thanks a lot.

beam search log prob

I think final_dist from decoder is not log scale value, because it comes from the softmax. So before instantiate new Beam in beam_search() at decode.py, log(e.g torch.log ) needs to be applied.

Reason for Accepting Repeated Identical Examples in Decoding Phase

pointer_summarizer/data_util/batcher.py

Line 228 in 5eb298a

b = [ex for _ in xrange(self.batch_size)]

Is there a particular reason why we send a repeated examples as a batch in the decoding phase?
But not a batch of different examples?

Is it because for pack_padded_sequence

pointer_summarizer/training_ptr_gen/model.py

Line 55 in 5eb298a

packed = pack_padded_sequence(embedded, seq_lens, batch_first=True)

it only accepts sorted sequences?

Can I modify the "num_layers" parameter?

The num_layers in this code is just 1,which I think is too small.How about let this parameter larger?

Python3 support?

Any chance we'll get a Python 3 version? Thanks

url correction

Thanks for mentioning our paper "Automatic Fact-guided Sentence Modification" in the README

If you can just correct the URL: https://arxiv.org/abs/1909.13838

Thanks again for your clear code and implementation 😃

Error: -bash: log/training_log: Is a directory

In the start_train.sh, there is python training_ptr_gen/train.py >& log/training_log &
When I bash run start_train.sh, the terminal shows -bash: log/training_log: Is a directory.
So, I want to ask what's the meaning of >& log/training_log &
In your config.py. the log directory should exist in ptr_nw/log.
Thx.

Training time

any idea about the training time if trained in google Collab ?

@atulkum how long it took to train?

Motivation for x_context

I am curious about the motivation for this step, as I couldn't anything about it in See et al., though I might have just missed it.

what is '$MODEL' in 'start_decode.sh'?

Hello, I am running start_decode.sh, and I got the error for this line:
model_filename = sys.argv[1]
The error is list out of range, so I went and checked the sys.argv. by calling print sys.argv and got : ['training_ptr_gen/decode.py']. Nothing for sys.argv[1].
In the shell script :
MODEL= $1
python training_ptr_gen/decode.py $MODEL
But when I type echo $1, nothing is returned.
So I am wondering what is it supposed to be and why did I get this error?
Thanks!

question about eval

Hello, first of all, thank you for the code you wrote. After adding the coverage mechanism, after 500,000 iterations, the result of Rouges-1 rouges-2 Rouges-L was 0.5 lower than the original paper. May I ask why?
And iteration 70000 was better than iteration 500000.
500000: rouge_1_fscore: 0.3890, rouge_2_f_score: 0.1705, rouge_l_fscore: 0.3579.
70000: rouge_1_fscore: 0.3912, rouge_2_f_score: 0.1705, rouge_l_fscore: 0.3567

how to use the 'model_transformer'?

how to predict single line after training

I have trained model for chinese weibo data ,how could i predict single line?

Test time custom decoding!!

Thank you for your contribution sir.
I want to know few things

During test time, how to feed the .story file and get the summary?
At decoding step, how to know which .story file's summary is generated?
Can we load custom pretrained embedding to train the system?

Single word is repeated in entire summary while coverage is enable

hi,
first of all thanks for your pytorch implementation. i m getting a pretty good results when i put is_coverage=False in config. But while putting it true all the summaries just contain the single word which is repeated (atleast minimum number of decoder times). i am using the same dataset and i trained it for 250000 iteration for is_coverage=True and 200000 for is_coverage=False.

Most of the summaries are abstractive in nature they are just copying complete lines from the story . why as per model there needs to be some kind of abstraction.???
while use coverage as true all decoder outputs the same word means there is only single word repeated in entire summary.
Any kind of help is very helpful......thanks in advance.

RuntimeError: CUDA error: out of memory

RuntimeError: CUDA error: out of memory
Running out of memory have 2gb graphics card 940mx with cuda 9.2 is it not sufficient to train the model.
I tried decreasing the batch size to even 1 but then too its not working.
Please help.

Rouge Error

I am getting following error while I am running

'Rouge155' object has no attribute 'convert_and_evaluate'

what's function of the eval.py when i check the train.py ,it does't call the eval.py , save the model directly?

I can't understander what means, can you help me?
Question:

what's function of the eval.py?
In train.py, it seems not check valid loss ,so what's means about eval.py?
what does mean about the
""
calc_running_avg_loss()
""
if I remove it , can I also get the correct answer.

Ask about loading data

Can I ask why you didn't use dataloader in pytorch for handling data (examples and batch) with num_worker > 0?

Why do you need to explicitly start threads in class Batcher to load the queue?

Thank you for your help,

Adam optimiser starts giving NaN after few epochs after turning on coverage.

Did anyone else face this issue or has a theory behind it?

Training on prefixes

Hi!

Thanks for the excellent code base. I had a question about this line. Since you have a for-loop here, does that mean you are training to generate each prefix of the output. Let me know if I misunderstood something. Thanks!

Vector encode input extend vocab

Hi, I think there is a problem in your code at creating the enc_input_extend_vocab for each Example.
Let's say we have 200,000 words in our vocabulary.
In your code, each word in the articles which does not appear in the vocabulary (oov), you will assign this word with new index, which is index = len(vocabulary) + number_of_new_word_in_this_article
But you doesn't add this word to the vocabulary. This is the problem. Because let's say in the first example, we have 1 article_oov word is 'mother', you will assign it by 200,001. In the second example, you have 'father' is the article_oov, you will assign 'father' by 200,001 too. If somehow, 2 above examples are in the same batch, then index of 'father' and 'mother' are the same??

Rouge scores mismatch

The work you have put in is quite appealing.

We have used the model provided here under the section "Train with pointer generation + coverage loss enabled " to decode.
The ROUGE scores we obtained slightly vary from that posted here.

Our ROUGE scores
ROUGE-1:
rouge_1_f_score: 0.3680 with confidence interval (0.3658, 0.3701)
rouge_1_recall: 0.4234 with confidence interval (0.4208, 0.4261)
rouge_1_precision: 0.3471 with confidence interval (0.3446, 0.3496)

ROUGE-2:
rouge_2_f_score: 0.1485 with confidence interval (0.1464, 0.1507)
rouge_2_recall: 0.1706 with confidence interval (0.1682, 0.1731)
rouge_2_precision: 0.1407 with confidence interval (0.1385, 0.1429)

ROUGE-l:
rouge_l_f_score: 0.3327 with confidence interval (0.3306, 0.3349)
rouge_l_recall: 0.3827 with confidence interval (0.3802, 0.3853)
rouge_l_precision: 0.3139 with confidence interval (0.3116, 0.3164)

To get the expected scores in the README what could be the config parameters?

c_0 should be is a zero vector

In Abigail See's paper, it says that c_0 is a zero vector, but based on the following line, c_0 in your code is equivalent to a_0 or maybe I'm missing something.

pointer_summarizer/training_ptr_gen/model.py

Line 122 in 454a2f6

coverage = coverage + attn_dist

RAM issue

The program crashed after 380k epochs because it ran out allocated RAM (16GB).
I don't see why this happened. Any pointers?

atulkum / pointer_summarizer Goto Github PK

pointer_summarizer's People

Contributors

Stargazers

Watchers

Forkers

pointer_summarizer's Issues

Recommend Projects

Recommend Topics

Recommend Org