
Comments (7)

alumae avatar alumae commented on June 23, 2024

You are probably using a chain model and are missing the attribute frame-subsampling-factor: 3 under the decoder section in the YAML file.

from kaldi-gstreamer-server.

Umar17 avatar Umar17 commented on June 23, 2024

Yes, I am using a chain model, but the frame-subsampling-factor option is in place. Attached is my YAML file.


```yaml
use-nnet2: True
decoder:
    use-threaded-decoder: True
    nnet-mode: 3
    model: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/final.mdl
    word-syms: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/words.txt
    fst: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/HCLG.fst
    mfcc-config: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/conf/mfcc.conf
    ivector-extraction-config: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/conf/ivector_extractor.conf
    max-active: 10000
    beam: 10.0
    lattice-beam: 6.0
    acoustic-scale: 1.0
    do-endpointing: true
    endpoint-silence-phones: "1:2:3:4:5:6:7:8:9:10"
    traceback-period-in-secs: 0.01
    chunk-length-in-secs: 0.25
    frame-subsampling-factor: 3
    num-nbest: 10
    # Additional functionality that you can play with:
    #lm-fst: test/models/english/librispeech_nnet_a_online/G.fst
    #big-lm-const-arpa: test/models/english/librispeech_nnet_a_online/G.carpa
    phone-syms: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/phones.txt
    #word-boundary-file: test/models/english/librispeech_nnet_a_online/word_boundary.int
    #do-phone-alignment: true
out-dir: tmp/urdu

use-vad: False
silence-timeout: 60

post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'

logging:
    version: 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]
```


And the client command is this:
python kaldigstserver/client.py -r 32000 c2a.wav
where the sample wave file is sampled at 16 kHz.


Umar17 avatar Umar17 commented on June 23, 2024

I have tweaked frame-subsampling-factor, but oddly it has no effect on latency.


alumae avatar alumae commented on June 23, 2024

Can you give some numbers -- the actual difference in decoding time that you are seeing?

I assume you understand that the -r 32000 option in client.py means that the audio is sent to the server at this byte rate. If the wav is indeed using 16 kHz 16-bit encoding, then decoding cannot complete faster than realtime, because the audio is sent to the server at a rate that simulates realtime recording from a mic.
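To see why -r 32000 pins the minimum latency to the clip length, here is a small back-of-the-envelope sketch (the byte-rate arithmetic only; `upload_seconds` is a hypothetical helper, not part of client.py):

```python
# For 16 kHz, 16-bit mono PCM the true byte rate is 16000 * 2 = 32000 bytes/s,
# so -r 32000 paces the upload at exactly realtime speed.

def upload_seconds(wav_bytes: int, byterate: int) -> float:
    """Time spent just transmitting the audio at the given byte rate."""
    return wav_bytes / byterate

# A 4.923 s clip at 16 kHz / 16-bit is 4.923 * 16000 * 2 = 157536 bytes of PCM:
pcm_bytes = int(4.923 * 16000 * 2)
print(round(upload_seconds(pcm_bytes, 32000), 2))   # realtime pacing: 4.92
print(round(upload_seconds(pcm_bytes, 256000), 2))  # 8x faster upload: 0.62
```

So with -r 32000 the server cannot even finish receiving the clip in under ~4.9 s, regardless of how fast the decoder is.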


Umar17 avatar Umar17 commented on June 23, 2024

Numbers (in milliseconds)
Audio length: 4923
Latency (with -r 32000): 5801
Latency (with -r 256000): 2965
Latency (online2-tcp-nnet3-decode-faster): 1343

Yes, I understand the byte rate, and I experimented with -r 256000 as well, which should send the whole audio within the first second (the idea is to imitate the client for online2-tcp-nnet3-decode-faster, which feeds the whole audio and then half-closes the socket connection). It doesn't affect accuracy and improves efficiency a bit.
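For comparison, the real-time factors implied by the numbers above (plain arithmetic on the reported milliseconds, nothing measured from the server itself):

```python
# Latency divided by audio length gives the effective real-time factor (RTF);
# RTF > 1 means slower than realtime.
audio_ms = 4923
latencies_ms = {
    "-r 32000": 5801,
    "-r 256000": 2965,
    "online2-tcp-nnet3-decode-faster": 1343,
}
for name, ms in latencies_ms.items():
    print(f"{name}: RTF = {ms / audio_ms:.2f}")
```

This makes the gap explicit: roughly 1.18 for realtime-paced streaming, 0.60 with the fast upload, and 0.27 for the TCP decoder.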


alumae avatar alumae commented on June 23, 2024

Try changing to traceback-period-in-secs: 0.25.


Umar17 avatar Umar17 commented on June 23, 2024

Tried it, but no effect. However, averaging over multiple experiments gives a difference of ~1 second in latency between -r 256000 and the TCP decoder.
I think the latency is higher in the gstreamer case because of the server-worker-decoder architecture, whose inter-process communication is slower than the direct online2-tcp-nnet3-decode-faster server.
If that is so, this issue can be closed.

