Comments (8)

dennybritz commented on April 28, 2024

> But for decoding, TensorFlow seems to lack something like what is mentioned in "On Using Very Large Target Vocabulary for Neural Machine Translation": you have to calculate for every word in the vocabulary.

I think the vocabulary issue has kind of been "solved" by using subword units / word pieces, for example.

You typically only need 16k-32k word pieces to cover almost all of the vocabulary.
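
For illustration, here is a tiny greedy longest-match wordpiece tokenizer; the toy vocabulary below is made up, while real systems learn their 16k-32k pieces from data (e.g. with BPE or the wordpiece algorithm):

```python
# Toy greedy longest-match wordpiece tokenizer; "##" marks a piece that
# continues a word. The vocabulary here is hypothetical.
VOCAB = {"un", "##break", "##able", "[UNK]"}

def wordpiece(word, vocab=VOCAB):
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:                      # longest match first
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                cur = piece
                break
            end -= 1
        if cur is None:
            return ["[UNK]"]                    # no piece matched
        pieces.append(cur)
        start = end
    return pieces

print(wordpiece("unbreakable"))  # -> ['un', '##break', '##able']
```

A small shared piece inventory like this keeps the softmax small while still covering an open vocabulary.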

As for the original question: it's an interesting feature, but it doesn't seem very common to me, as it needs a set of predefined responses, which you have for very few tasks. It seems very specific to response retrieval (not generation). I think a large refactoring of the beam search may be necessary to support this.

chenghuige commented on April 28, 2024

Agree with @amirj, this is an interesting feature, and we might add more features to improve decoding speed.
I think trie-based beam search will not improve speed much for general-purpose generation: it is mostly used for selecting responses from a candidate set, rather than generating a sequence by choosing one word from the full vocabulary at each step. This ensures response quality, and the performance improvement is relative to scoring every candidate and then sorting to select the highest-probability responses.
So it is used in specific scenarios, and it is hard to implement within in-graph beam search (maybe one would need to write a C++ op for generating candidate words at each step?); an out-of-graph approach like the one im2txt uses might be simpler.
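
A minimal sketch of that trie, in plain Python; the candidate responses and the TrieNode API are made up for illustration:

```python
# Sketch: a trie over a fixed set of candidate responses. During beam
# search, only tokens that can still complete some candidate are scored.
class TrieNode:
    def __init__(self):
        self.children = {}   # token -> TrieNode
        self.is_end = False  # True if a full candidate ends here

def build_trie(candidates):
    root = TrieNode()
    for response in candidates:
        node = root
        for token in response:
            node = node.children.setdefault(token, TrieNode())
        node.is_end = True
    return root

root = build_trie([["how", "are", "you"], ["how", "can", "i", "help"]])
# After emitting "how", only two words need scoring at the next step:
print(list(root.children["how"].children))  # ['are', 'can']
```

This is why the speedup is specific to retrieval: the trie prunes the vocabulary to the few words consistent with some remaining candidate, instead of requiring the full softmax.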
For improving decoding and beam-search speed, vocabulary size is the key factor, since something like the following is costly for a large vocabulary, especially the xw_plus_b:

```python
logits = tf.nn.xw_plus_b(output, self.w, self.v)
logprobs = tf.nn.log_softmax(logits)
```

You have to compute the output for every word in the vocabulary to get the final probabilities.
For training, TensorFlow provides sampled_softmax_loss, which improves performance a lot.
But for decoding, TensorFlow seems to lack something like what is mentioned in "On Using Very Large Target Vocabulary for Neural Machine Translation": you have to calculate for every word in the vocabulary.
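
For reference, a hedged sketch of the training-time workaround using TF 1.x's sampled_softmax_loss; the shapes and variable names are hypothetical, and note that sampled_softmax_loss expects weights of shape [num_classes, dim], so the full decode-time projection uses the transpose:

```python
import tensorflow as tf  # TF 1.x style, matching the snippet above

vocab_size, dim, batch = 50000, 512, 32          # hypothetical sizes
w = tf.get_variable("w", [vocab_size, dim])      # [num_classes, dim]
v = tf.get_variable("v", [vocab_size])
output = tf.placeholder(tf.float32, [batch, dim])
labels = tf.placeholder(tf.int64, [batch, 1])

# Training: score only num_sampled negatives instead of all 50k classes.
train_loss = tf.nn.sampled_softmax_loss(
    weights=w, biases=v, labels=labels, inputs=output,
    num_sampled=512, num_classes=vocab_size)

# Decoding still needs the full projection -- the costly part:
logits = tf.nn.xw_plus_b(output, tf.transpose(w), v)
logprobs = tf.nn.log_softmax(logits)
```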
Maybe with a trie-based method you could consider only a small vocabulary at each step, but then you cannot get the exact probability at each step.
Another approach is to use self-normalization to avoid the costly calculation for each word in the vocabulary, but you need to change the training cost function:
http://sebastianruder.com/word-embeddings-softmax/index.html#selfnormalisation
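
Roughly, self-normalization adds a penalty pushing log Z toward zero during training, so that at decode time raw logits can be read as approximate log-probabilities without summing over the vocabulary. A hedged sketch, where alpha and the shapes are hypothetical:

```python
import tensorflow as tf

logits = tf.placeholder(tf.float32, [None, 50000])  # hypothetical shape
labels = tf.placeholder(tf.int64, [None])
alpha = 0.1                                          # penalty weight

log_z = tf.reduce_logsumexp(logits, axis=-1)         # log partition term
xent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
# Penalize (log Z)^2 so that log Z ~= 0 after training; then at decode
# time a word's raw logit approximates its log-probability, and only the
# beam's candidate rows need to be computed.
loss = tf.reduce_mean(xent + alpha * tf.square(log_z))
```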

amirj commented on April 28, 2024

@dennybritz Thank you.

amirj commented on April 28, 2024

> So it is used in specific scenarios, and it is hard to implement within in-graph beam search (maybe one would need to write a C++ op for generating candidate words at each step?); an out-of-graph approach like the one im2txt uses might be simpler.

@chenghuige would you please elaborate on how to develop this out of graph?
Does im2txt use an out-of-graph approach?

chenghuige commented on April 28, 2024

@amirj See im2txt\inference_utils\caption_generator.py: im2txt does out-of-graph beam search, running each step via sess.run():

```python
softmax, new_states, metadata = self.model.inference_step(sess,
                                                          input_feed,
                                                          state_feed)
```

But I think transferring the softmax (a large tensor) out of the graph each time might make inference really slow. In any case, this way you fully control the process and can do beam search with a trie.
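
To make the out-of-graph idea concrete, here is a hedged sketch of a trie-constrained beam-search loop in the style of im2txt's caption_generator; model.inference_step, model.initial_state, START, and the TrieNode API from the sketch above are hypothetical stand-ins, and trie tokens are assumed to be vocabulary ids:

```python
import heapq
import numpy as np

def beam_search(sess, model, trie_root, beam_size=3, max_len=20):
    # Each beam entry: (cumulative log-prob, tokens so far, decoder state,
    # current trie node). Only trie-allowed words are ever scored.
    beams = [(0.0, [START], model.initial_state(sess), trie_root)]
    for _ in range(max_len):
        candidates = []
        for logprob, tokens, state, node in beams:
            if node.is_end:  # a full candidate response has been matched
                candidates.append((logprob, tokens, state, node))
                continue
            softmax, new_state, _ = model.inference_step(
                sess, input_feed=np.array([tokens[-1]]), state_feed=state)
            for word, child in node.children.items():
                candidates.append((logprob + np.log(softmax[0][word]),
                                   tokens + [word], new_state, child))
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return beams
```

Since only node.children is consulted at each step, the big softmax transfer could also be avoided by fetching just those rows, which would address the slowness mentioned above.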

nikita-solvvy commented on April 28, 2024

I have a similar feature request (tensorflow/tensorflow#11602). At first glance it might seem very specific, but I think it's very useful for determining scores for various output sequences.

whiskyboy commented on April 28, 2024

Have you solved this problem? I have run into the same problem and have no idea how to put a trie data structure into TensorFlow.

yuanzhigang10 commented on April 28, 2024

Have you solved this problem? I have also run into the same problem and have no idea how to put a trie data structure into TensorFlow.
