Comments (7)
You are probably using a chain model and are missing the attribute frame-subsampling-factor: 3 under the decoder section of the YAML file.
from kaldi-gstreamer-server.
Yes, I am using a chain model, but the frame-subsampling-factor option is already in place. Here is my YAML file:
use-nnet2: True
decoder:
    use-threaded-decoder: True
    nnet-mode: 3
    model: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/final.mdl
    word-syms: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/words.txt
    fst: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/HCLG.fst
    mfcc-config: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/conf/mfcc.conf
    ivector-extraction-config: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/conf/ivector_extractor.conf
    max-active: 10000
    beam: 10.0
    lattice-beam: 6.0
    acoustic-scale: 1.0
    do-endpointing: true
    endpoint-silence-phones: "1:2:3:4:5:6:7:8:9:10"
    traceback-period-in-secs: 0.01
    chunk-length-in-secs: 0.25
    frame-subsampling-factor: 3
    num-nbest: 10
    # Additional functionality that you can play with:
    #lm-fst: test/models/english/librispeech_nnet_a_online/G.fst
    #big-lm-const-arpa: test/models/english/librispeech_nnet_a_online/G.carpa
    phone-syms: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/phones.txt
    #word-boundary-file: test/models/english/librispeech_nnet_a_online/word_boundary.int
    #do-phone-alignment: true
out-dir: tmp/urdu
use-vad: False
silence-timeout: 60
post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'
logging:
    version: 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]
The client command is:
python kaldigstserver/client.py -r 32000 c2a.wav
where the sample wave file is sampled at 16 kHz.
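As a sanity check, the byte rate of raw 16 kHz, 16-bit mono PCM works out to exactly 32000 bytes per second, which is why -r 32000 corresponds to realtime streaming (a quick illustrative calculation; the variable names are not from client.py):

```python
# Byte rate of raw PCM audio = sample rate * bytes per sample * channels.
# For 16 kHz, 16-bit mono audio this is the 32000 passed to -r above.
sample_rate_hz = 16000
bytes_per_sample = 2  # 16-bit samples
channels = 1

byte_rate = sample_rate_hz * bytes_per_sample * channels
print(byte_rate)  # 32000
```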
I have tweaked frame-subsampling-factor, but ironically it has no effect on latency.
Can you give some numbers -- the actual difference in decoding time that you are seeing?
I assume you understand that the -r 32000 option in client.py means the audio is sent to the server at that byte rate. If the WAV is indeed 16 kHz, 16-bit, then decoding cannot complete faster than realtime, because the audio is sent to the server at a rate that simulates realtime recording from a microphone.
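A minimal sketch of what rate-limited sending looks like (an illustration of the idea, not client.py's actual code; `send` is a stand-in for the WebSocket write):

```python
import time

def stream_at_byte_rate(data, byte_rate, send, chunk_size=8000):
    """Send audio in fixed-size chunks, sleeping between them so the
    average throughput matches byte_rate (simulating a live mic)."""
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        send(chunk)
        time.sleep(len(chunk) / byte_rate)
```

With byte_rate=32000, a roughly 5-second clip takes roughly 5 seconds just to transmit, so end-to-end latency can never drop below the audio duration.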
Numbers (in milliseconds):
Audio length: 4923
Latency (with -r 32000): 5801
Latency (with -r 256000): 2965
Latency (online2-tcp-nnet3-decode-faster): 1343
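For comparison, the real-time factors (latency divided by audio length) implied by these measurements can be computed directly:

```python
# Real-time factor = latency / audio length, from the numbers above
# (all values in milliseconds).
audio_ms = 4923
latencies = {
    "-r 32000": 5801,
    "-r 256000": 2965,
    "online2-tcp-nnet3-decode-faster": 1343,
}
for label, latency_ms in latencies.items():
    print(f"{label}: RTF = {latency_ms / audio_ms:.2f}")
# -r 32000: RTF = 1.18
# -r 256000: RTF = 0.60
# online2-tcp-nnet3-decode-faster: RTF = 0.27
```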
Yes, I understand the byte rate, and I experimented with -r 256000 as well, which should send the whole audio within the first second (the intuition is to imitate the client for online2-tcp-nnet3-decode-faster, which feeds the whole audio and then half-shuts down the socket connection). It doesn't affect accuracy and improves efficiency a bit.
Try changing to traceback-period-in-secs: 0.25.
Tried, but it had no effect. However, averaged over multiple experiments, there is a difference of ~1 second in latency between -r 256000 and the TCP decoder.
I think latency increases in the gstreamer case because of the server-worker-decoder architecture: the communication is slower than with the online2-tcp-nnet3-decode-faster server.
If that is the case, this issue can be closed.