
Comments (7)

alumae avatar alumae commented on June 23, 2024

You are probably using a chain model and are missing the attribute frame-subsampling-factor: 3 under the decoder section in the YAML file.

from kaldi-gstreamer-server.

Umar17 avatar Umar17 commented on June 23, 2024

Yes, I am using a chain model, but the frame-subsampling-factor option is in place. Attached is my YAML file.


```yaml
use-nnet2: True
decoder:
    use-threaded-decoder: True
    nnet-mode: 3
    model: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/final.mdl
    word-syms: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/words.txt
    fst: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/HCLG.fst
    mfcc-config: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/conf/mfcc.conf
    ivector-extraction-config: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/conf/ivector_extractor.conf
    max-active: 10000
    beam: 10.0
    lattice-beam: 6.0
    acoustic-scale: 1.0
    do-endpointing: true
    endpoint-silence-phones: "1:2:3:4:5:6:7:8:9:10"
    traceback-period-in-secs: 0.01
    chunk-length-in-secs: 0.25
    frame-subsampling-factor: 3
    num-nbest: 10
    # Additional functionality that you can play with:
    #lm-fst: test/models/english/librispeech_nnet_a_online/G.fst
    #big-lm-const-arpa: test/models/english/librispeech_nnet_a_online/G.carpa
    phone-syms: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/phones.txt
    #word-boundary-file: test/models/english/librispeech_nnet_a_online/word_boundary.int
    #do-phone-alignment: true
out-dir: tmp/urdu

use-vad: False
silence-timeout: 60

post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'

logging:
    version: 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]
```


And the client command is this:
python kaldigstserver/client.py -r 32000 c2a.wav
where the sample wave file is sampled at 16 kHz.


Umar17 avatar Umar17 commented on June 23, 2024

I have tweaked frame-subsampling-factor, but oddly it has no effect on latency.


alumae avatar alumae commented on June 23, 2024

Can you give some numbers -- the actual difference in decoding time that you are seeing?

I assume you understand that the -r 32000 option in client.py means that the audio is sent to the server at this byte rate. If the wav is indeed using 16 kHz 16-bit encoding, then decoding cannot complete faster than realtime, because the audio is sent to the server at a rate that simulates realtime recording from a mic.
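To see why -r 32000 pins the minimum latency to the clip length, here is a small back-of-the-envelope sketch (the byte-rate arithmetic only; `upload_seconds` is a hypothetical helper, not part of client.py):

```python
# For 16 kHz, 16-bit mono PCM the true byte rate is 16000 * 2 = 32000 bytes/s,
# so -r 32000 paces the upload at exactly realtime speed.

def upload_seconds(wav_bytes: int, byterate: int) -> float:
    """Time spent just transmitting the audio at the given byte rate."""
    return wav_bytes / byterate

# A 4.923 s clip at 16 kHz / 16-bit is 4.923 * 16000 * 2 = 157536 bytes of PCM:
pcm_bytes = int(4.923 * 16000 * 2)
print(round(upload_seconds(pcm_bytes, 32000), 2))   # realtime pacing: 4.92
print(round(upload_seconds(pcm_bytes, 256000), 2))  # 8x faster upload: 0.62
```

So with -r 32000 the server cannot even finish receiving the clip in under ~4.9 s, regardless of how fast the decoder is.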


Umar17 avatar Umar17 commented on June 23, 2024

Numbers (in milliseconds)
Audio length: 4923
Latency (with -r 32000): 5801
Latency (with -r 256000): 2965
Latency (online2-tcp-nnet3-decode-faster): 1343

Yes, I understand the byte rate, and I experimented with -r 256000 as well, which should send the whole audio within the first second (the idea is to imitate the client for online2-tcp-nnet3-decode-faster, which feeds the whole audio and then half-closes the socket connection). It doesn't affect accuracy and improves efficiency a bit.
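For comparison, the real-time factors implied by the numbers above (plain arithmetic on the reported milliseconds, nothing measured from the server itself):

```python
# Latency divided by audio length gives the effective real-time factor (RTF);
# RTF > 1 means slower than realtime.
audio_ms = 4923
latencies_ms = {
    "-r 32000": 5801,
    "-r 256000": 2965,
    "online2-tcp-nnet3-decode-faster": 1343,
}
for name, ms in latencies_ms.items():
    print(f"{name}: RTF = {ms / audio_ms:.2f}")
```

This makes the gap explicit: roughly 1.18 for realtime-paced streaming, 0.60 with the fast upload, and 0.27 for the TCP decoder.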


alumae avatar alumae commented on June 23, 2024

Try changing to traceback-period-in-secs: 0.25.


Umar17 avatar Umar17 commented on June 23, 2024

Tried it, but no effect. However, averaging over multiple experiments gives a difference of ~1 second in latency between -r 256000 and the TCP decoder.
I think the latency is higher in the gstreamer case because of the server-worker-decoder architecture, whose inter-process communication is slower than the direct online2-tcp-nnet3-decode-faster server.
If that is so, this issue can be closed.

