Comments (8)

dennybritz commented on April 28, 2024

> But for decoding, TensorFlow seems to lack something like what is mentioned in "On Using Very Large Target Vocabulary for Neural Machine Translation": you have to calculate for every word in the vocabulary.

I think the vocabulary issue has kind of been "solved" by using subword units / word pieces, for example.

You typically only need 16k-32k word pieces to cover almost all of the vocabulary.
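
For illustration, here is a tiny greedy longest-match wordpiece tokenizer; the toy vocabulary below is made up, while real systems learn their 16k-32k pieces from data (e.g. with BPE or the wordpiece algorithm):

```python
# Toy greedy longest-match wordpiece tokenizer; "##" marks a piece that
# continues a word. The vocabulary here is hypothetical.
VOCAB = {"un", "##break", "##able", "[UNK]"}

def wordpiece(word, vocab=VOCAB):
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:                      # longest match first
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                cur = piece
                break
            end -= 1
        if cur is None:
            return ["[UNK]"]                    # no piece matched
        pieces.append(cur)
        start = end
    return pieces

print(wordpiece("unbreakable"))  # -> ['un', '##break', '##able']
```

A small shared piece inventory like this keeps the softmax small while still covering an open vocabulary.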

As for the original question: it's an interesting feature, but it doesn't seem very common to me, as it needs a set of predefined responses, which you have for very few tasks. It seems very specific to response retrieval (not generation). I think a large refactoring of the beam search may be necessary to support this.

chenghuige commented on April 28, 2024

Agree with @amirj, this is an interesting feature, and we might add more features to improve decoding speed.
I think trie-based beam search will not improve speed much for general-purpose generation: it is mostly used for selecting responses from a candidate set, rather than generating a sequence by choosing one word from the full vocabulary at each step. This ensures response quality, and the performance improvement is relative to scoring every candidate and then sorting to select the highest-probability responses.
So it is used in specific scenarios, and it is hard to implement within in-graph beam search (maybe one would need to write a C++ op for generating candidate words at each step?); an out-of-graph approach like the one im2txt uses might be simpler.
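
A minimal sketch of that trie, in plain Python; the candidate responses and the TrieNode API are made up for illustration:

```python
# Sketch: a trie over a fixed set of candidate responses. During beam
# search, only tokens that can still complete some candidate are scored.
class TrieNode:
    def __init__(self):
        self.children = {}   # token -> TrieNode
        self.is_end = False  # True if a full candidate ends here

def build_trie(candidates):
    root = TrieNode()
    for response in candidates:
        node = root
        for token in response:
            node = node.children.setdefault(token, TrieNode())
        node.is_end = True
    return root

root = build_trie([["how", "are", "you"], ["how", "can", "i", "help"]])
# After emitting "how", only two words need scoring at the next step:
print(list(root.children["how"].children))  # ['are', 'can']
```

This is why the speedup is specific to retrieval: the trie prunes the vocabulary to the few words consistent with some remaining candidate, instead of requiring the full softmax.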
For improving decoding and beam-search speed, vocabulary size is the key factor, since something like the following is costly for a large vocabulary, especially the xw_plus_b:

```python
logits = tf.nn.xw_plus_b(output, self.w, self.v)
logprobs = tf.nn.log_softmax(logits)
```

You have to compute the output for every word in the vocabulary to get the final probabilities.
For training, TensorFlow provides sampled_softmax_loss, which improves performance a lot.
But for decoding, TensorFlow seems to lack something like what is mentioned in "On Using Very Large Target Vocabulary for Neural Machine Translation": you have to calculate for every word in the vocabulary.
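
For reference, a hedged sketch of the training-time workaround using TF 1.x's sampled_softmax_loss; the shapes and variable names are hypothetical, and note that sampled_softmax_loss expects weights of shape [num_classes, dim], so the full decode-time projection uses the transpose:

```python
import tensorflow as tf  # TF 1.x style, matching the snippet above

vocab_size, dim, batch = 50000, 512, 32          # hypothetical sizes
w = tf.get_variable("w", [vocab_size, dim])      # [num_classes, dim]
v = tf.get_variable("v", [vocab_size])
output = tf.placeholder(tf.float32, [batch, dim])
labels = tf.placeholder(tf.int64, [batch, 1])

# Training: score only num_sampled negatives instead of all 50k classes.
train_loss = tf.nn.sampled_softmax_loss(
    weights=w, biases=v, labels=labels, inputs=output,
    num_sampled=512, num_classes=vocab_size)

# Decoding still needs the full projection -- the costly part:
logits = tf.nn.xw_plus_b(output, tf.transpose(w), v)
logprobs = tf.nn.log_softmax(logits)
```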
Maybe with a trie-based method you could consider only a small vocabulary at each step, but then you cannot get the exact probability at each step.
Another approach is to use self-normalization to avoid the costly calculation for each word in the vocabulary, but you need to change the training cost function:
http://sebastianruder.com/word-embeddings-softmax/index.html#selfnormalisation
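
Roughly, self-normalization adds a penalty pushing log Z toward zero during training, so that at decode time raw logits can be read as approximate log-probabilities without summing over the vocabulary. A hedged sketch, where alpha and the shapes are hypothetical:

```python
import tensorflow as tf

logits = tf.placeholder(tf.float32, [None, 50000])  # hypothetical shape
labels = tf.placeholder(tf.int64, [None])
alpha = 0.1                                          # penalty weight

log_z = tf.reduce_logsumexp(logits, axis=-1)         # log partition term
xent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
# Penalize (log Z)^2 so that log Z ~= 0 after training; then at decode
# time a word's raw logit approximates its log-probability, and only the
# beam's candidate rows need to be computed.
loss = tf.reduce_mean(xent + alpha * tf.square(log_z))
```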

amirj commented on April 28, 2024

@dennybritz Thank you.

amirj commented on April 28, 2024

> So it is used in specific scenarios, and it is hard to implement within in-graph beam search (maybe one would need to write a C++ op for generating candidate words at each step?); an out-of-graph approach like the one im2txt uses might be simpler.

@chenghuige would you please elaborate on how to develop this out of graph?
Does im2txt use an out-of-graph approach?

chenghuige commented on April 28, 2024

@amirj See im2txt\inference_utils\caption_generator.py: im2txt does out-of-graph beam search, running each step via sess.run():

```python
softmax, new_states, metadata = self.model.inference_step(sess,
                                                          input_feed,
                                                          state_feed)
```

But I think transferring the softmax (a large tensor) out of the graph each time might make inference really slow. In any case, this way you fully control the process and can do beam search with a trie.
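
To make the out-of-graph idea concrete, here is a hedged sketch of a trie-constrained beam-search loop in the style of im2txt's caption_generator; model.inference_step, model.initial_state, START, and the TrieNode API from the sketch above are hypothetical stand-ins, and trie tokens are assumed to be vocabulary ids:

```python
import heapq
import numpy as np

def beam_search(sess, model, trie_root, beam_size=3, max_len=20):
    # Each beam entry: (cumulative log-prob, tokens so far, decoder state,
    # current trie node). Only trie-allowed words are ever scored.
    beams = [(0.0, [START], model.initial_state(sess), trie_root)]
    for _ in range(max_len):
        candidates = []
        for logprob, tokens, state, node in beams:
            if node.is_end:  # a full candidate response has been matched
                candidates.append((logprob, tokens, state, node))
                continue
            softmax, new_state, _ = model.inference_step(
                sess, input_feed=np.array([tokens[-1]]), state_feed=state)
            for word, child in node.children.items():
                candidates.append((logprob + np.log(softmax[0][word]),
                                   tokens + [word], new_state, child))
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return beams
```

Since only node.children is consulted at each step, the big softmax transfer could also be avoided by fetching just those rows, which would address the slowness mentioned above.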

nikita-solvvy commented on April 28, 2024

I have a similar feature request (tensorflow/tensorflow#11602). At first glance it might seem very specific, but I think it's very useful for determining scores for various output sequences.

whiskyboy commented on April 28, 2024

Have you solved this problem? I have run into the same problem and have no idea how to put a trie data structure into TensorFlow.

yuanzhigang10 commented on April 28, 2024

Have you solved this problem? I have also run into the same problem and have no idea how to put a trie data structure into TensorFlow.
