Comments (3)

alumae avatar alumae commented on August 19, 2024
  1. Currently, the server sends out intermediate hypotheses every 0.5 seconds (changeable using the traceback-period-in-secs decoder parameter). I think it's a good idea not to send out a non-final hypothesis if it hasn't changed; I'll look into it.

  2. Yes, it breaks up speech based on silence when the do-endpointing decoder parameter is set to True (see sample_english_nnet2.yaml). Many parameters control exactly how endpointing is done; check the endpoint* parameters of the decoder (I'm assuming you use the new DNN-based decoder).

  3. If you don't send EOS, the decoder assumes that more audio is coming and waits (until silence-timeout seconds pass, at which point the server closes the connection).
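Until the server itself skips unchanged intermediate results (point 1), a client can filter them out. What follows is only a minimal client-side sketch, not the server's code: it assumes each result has already been parsed into a hypothetical `(transcript, is_final)` pair, and the function name is invented for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple


def drop_unchanged_partials(
    hyps: Iterable[Tuple[str, bool]],
) -> Iterator[Tuple[str, bool]]:
    """Yield (transcript, is_final) pairs, skipping repeated non-final ones.

    The pair layout is an assumption made for this sketch; adapt it to
    however your client parses the server's messages.
    """
    last_partial: Optional[str] = None
    for transcript, is_final in hyps:
        if is_final:
            # A final result closes the segment, so forget the last partial.
            last_partial = None
            yield transcript, is_final
        elif transcript != last_partial:
            # Only pass on a non-final hypothesis when its text has changed.
            last_partial = transcript
            yield transcript, is_final
```

Final results are always passed through, even when their text matches the preceding partial, because the client usually needs to know that the segment has ended.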

from kaldi-gstreamer-server.

mosherayman avatar mosherayman commented on August 19, 2024

thanks

I don't see the endpoint* parameters in the decoder. Which file should I look in?

alumae avatar alumae commented on August 19, 2024

If you use the Kaldi DNN-based decoder (https://github.com/alumae/gst-kaldi-nnet2-online), then the properties specified in the configuration YAML file, nested under decoder, will be forwarded to the plugin. To see which properties are available, use gst-inspect-1.0 kaldinnet2onlinedecoder.
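As a rough illustration of that forwarding, the sketch below copies every key nested under `decoder` onto an element via `set_property`. It is an assumption-laden stand-in, not the server's actual code: `FakeDecoder` and `forward_decoder_properties` are hypothetical names, the config is written as a Python dict instead of being loaded from YAML, and the two property values are simply the ones mentioned earlier in this thread.

```python
from typing import Any, Dict, List, Tuple


class FakeDecoder:
    """Hypothetical stand-in for the GStreamer element; records property calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def set_property(self, name: str, value: Any) -> None:
        self.calls.append((name, value))


def forward_decoder_properties(config: Dict[str, Any], element: Any) -> None:
    """Pass each key/value found under 'decoder' to the element as a property."""
    for name, value in config.get("decoder", {}).items():
        element.set_property(name, value)


# Dict equivalent of a YAML file with a nested 'decoder' section (illustrative).
config = {
    "decoder": {
        "do-endpointing": True,
        "traceback-period-in-secs": 0.5,
    }
}

decoder = FakeDecoder()
forward_decoder_properties(config, decoder)
```

With a real pipeline the same loop would target the decoder element itself; any endpoint* property listed by gst-inspect-1.0 could be added to the `decoder` section in the same way.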

The properties that change the way endpointing is done are:

do-endpointing: If true, apply endpoint detection, and split the audio at endpoints.
endpoint-silence-phones: List of phones that are considered to be silence phones by the endpointing code.
endpoint-rule1-must-contain-nonsilence: If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback.
endpoint-rule1-min-trailing-silence: This endpointing rule requires duration of trailing silence to be >= this value.
endpoint-rule1-max-relative-cost: This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is).
endpoint-rule1-min-utterance-length: This endpointing rule requires utterance-length (in seconds) to be >= this value.
endpoint-rule2-must-contain-nonsilence: If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback.
endpoint-rule2-min-trailing-silence: This endpointing rule requires duration of trailing silence to be >= this value.
endpoint-rule2-max-relative-cost: This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is).
endpoint-rule2-min-utterance-length: This endpointing rule requires utterance-length (in seconds) to be >= this value.
endpoint-rule3-must-contain-nonsilence: If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback.
endpoint-rule3-min-trailing-silence: This endpointing rule requires duration of trailing silence to be >= this value.
endpoint-rule3-max-relative-cost: This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is).
endpoint-rule3-min-utterance-length: This endpointing rule requires utterance-length (in seconds) to be >= this value.
endpoint-rule4-must-contain-nonsilence: If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback.
endpoint-rule4-min-trailing-silence: This endpointing rule requires duration of trailing silence to be >= this value.
endpoint-rule4-max-relative-cost: This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is).
endpoint-rule4-min-utterance-length: This endpointing rule requires utterance-length (in seconds) to be >= this value.
endpoint-rule5-must-contain-nonsilence: If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback.
endpoint-rule5-min-trailing-silence: This endpointing rule requires duration of trailing silence to be >= this value.
endpoint-rule5-max-relative-cost: This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is).
endpoint-rule5-min-utterance-length: This endpointing rule requires utterance-length (in seconds) to be >= this value.

You might have to dig into the Kaldi sources to understand exactly what the different properties do. The defaults, however, are pretty good.
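To make the property descriptions a bit more concrete, here is a rough sketch of how one rule's four conditions could be combined. It follows the descriptions above and is not copied from Kaldi. Two things are assumptions to verify against the Kaldi sources: that an endpoint is declared as soon as any one rule is satisfied, and that "contains nonsilence" can be passed in as a ready-made flag. All names and the threshold values are illustrative, not the plugin's defaults.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EndpointRule:
    """One endpoint-ruleN-* group; field names mirror the property suffixes."""
    must_contain_nonsilence: bool
    min_trailing_silence: float   # seconds
    max_relative_cost: float
    min_utterance_length: float   # seconds


def rule_fires(rule: EndpointRule, trailing_silence: float, relative_cost: float,
               utterance_length: float, contains_nonsilence: bool) -> bool:
    """A rule applies only when all of its conditions hold at once."""
    return ((contains_nonsilence or not rule.must_contain_nonsilence)
            and trailing_silence >= rule.min_trailing_silence
            and relative_cost <= rule.max_relative_cost
            and utterance_length >= rule.min_utterance_length)


def endpoint_detected(rules: Iterable[EndpointRule], trailing_silence: float,
                      relative_cost: float, utterance_length: float,
                      contains_nonsilence: bool) -> bool:
    """Assumption: the utterance ends as soon as any single rule fires."""
    return any(rule_fires(r, trailing_silence, relative_cost,
                          utterance_length, contains_nonsilence) for r in rules)


# Illustrative thresholds only (not the plugin's defaults).
rules = [
    EndpointRule(False, 5.0, math.inf, 0.0),  # long silence, speech not required
    EndpointRule(True, 0.5, 2.0, 0.0),        # short pause after confident speech
]
```

Read this way, raising a rule's min-trailing-silence makes the decoder wait longer before splitting, while lowering max-relative-cost makes that rule apply only when the decoder is fairly sure the utterance has reached a plausible end.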
