mycrazycracy / tf-kaldi-speaker
Neural speaker recognition/verification system based on Kaldi and TensorFlow
License: Apache License 2.0
Hi,
Thanks for your great work. Does the newest version support multi-GPU training? I see the multi-GPU setup in the file "run_train_nnet.sh".
Why do you use PLDA for AMSoftmax/ArcSoftmax/ASoftmax? Did you try simple cosine similarity?
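For reference, cosine scoring between two embeddings needs no trained back-end at all; a minimal sketch in plain Python (`cosine_score` is a hypothetical helper, not part of this repo):

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two speaker embeddings (plain lists).
    Hypothetical helper for illustration, not part of this repo."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

A score near 1 means the two embeddings point in the same direction; verification then reduces to comparing the score against a threshold.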
Hi,
Can you please give a brief idea of how to configure angular softmax as the loss function in the script?
I am not able to understand the training steps.
It's strange that in run.sh the following lines are commented out:
# # Make a version with reverberated speech
# rvb_opts=()
# rvb_opts+=(--rir-set-parameters "0.5, RIRS_NOISES/simulated_rirs/smallroom/rir_list")
# rvb_opts+=(--rir-set-parameters "0.5, RIRS_NOISES/simulated_rirs/mediumroom/rir_list")
However, --rir-set-parameters is a required argument of steps/data/reverberate_data_dir.py, so commenting out these lines causes an error.
Can I ask why they are commented out, and whether your experiments included the reverberation-augmented training data? I am having trouble reproducing your results, so I want to make sure our training data is the same. Thanks!
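For comparison, the stock Kaldi voxceleb/v2 recipe passes --rir-set-parameters roughly like this (a sketch assuming the RIRS_NOISES corpus has already been downloaded into the recipe directory; exact option values vary by recipe):

```shell
# Sketch of the reverberation step from the standard Kaldi voxceleb/v2
# recipe; paths and option values are assumptions, adjust to your setup.
rvb_opts=()
rvb_opts+=(--rir-set-parameters "0.5, RIRS_NOISES/simulated_rirs/smallroom/rir_list")
rvb_opts+=(--rir-set-parameters "0.5, RIRS_NOISES/simulated_rirs/mediumroom/rir_list")

python steps/data/reverberate_data_dir.py \
  "${rvb_opts[@]}" \
  --speech-rvb-probability 1 \
  --pointsource-noise-addition-probability 0 \
  --isotropic-noise-addition-probability 0 \
  --num-replications 1 \
  --source-sampling-rate 16000 \
  data/train data/train_reverb
```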
I didn't find the enrollment and testing process. How can I distinguish between these two parts? I want to separate the enrollment utterances from the testing utterances. What should I do?
The commit history shows that NetVLAD/GhostVLAD pooling was added to the experiments. What are the results of these pooling strategies with a TDNN?
Hi,
thanks for the great work.
When I run the sre/v1 egs, I get errors in stages 2-8 such as:
python: can't open file 'python steps/data/augment_data_dir_new.py': [Errno 2] No such file or directory
python: can't open file 'utils/sample_validset_spk2utt.py': [Errno 2] No such file or directory
nnet/run_train_nnet.sh: line 63: /tf_gpu/bin/activate: No such file or directory
I checked for these files in the original path kaldi/egs/wsj/s5/, but none of them exist there.
Does that mean we need to write the missing files ourselves to get the related functionality?
Thanks
Cheers
Thank you for sharing this great project.
What additional features would need to be added to support text-dependent speaker verification?
Thanks.
When I extracted embeddings (stage=8), I encountered a problem: when the utterance length is larger than the chunk size, extraction stalls. To continue extracting, I have to set a larger chunk size to avoid segmentation. Is this a bug, and how can I deal with it?
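One way to handle utterances longer than the chunk size, instead of raising the chunk size, is to split the utterance into chunks and average the per-chunk embeddings. A sketch of the splitting logic (hypothetical strategy, not necessarily what the repo's extract.py does):

```python
def chunk_ranges(num_frames, chunk_size, min_chunk_size=25):
    """Split num_frames into (start, end) frame ranges of roughly
    chunk_size frames each; a tail shorter than min_chunk_size is merged
    into the previous chunk so no tiny chunk is fed to the network.
    Hypothetical strategy for illustration, not the repo's code."""
    if num_frames <= chunk_size:
        return [(0, num_frames)]
    ranges = []
    start = 0
    while start + chunk_size < num_frames:
        ranges.append((start, start + chunk_size))
        start += chunk_size
    tail = num_frames - start
    if tail >= min_chunk_size:
        ranges.append((start, num_frames))
    else:
        # merge the short tail into the last chunk
        prev_start, _ = ranges.pop()
        ranges.append((prev_start, num_frames))
    return ranges
```

Each range would be fed through the network separately and the resulting embeddings averaged (optionally weighted by chunk length).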
I saw code for the GE2E loss which has been commented out. Did you try GE2E loss in your experiments? If so, how did it perform?
Hi Dr. Liu,
Thank you very much for your sharing. I have seen that your EER result (EER = 0.02) is state of the art, but I have a few questions: (1) I don't see the prediction code; I just want to try inference. (2) How many days did you train on the VoxCeleb dataset?
Looking forward to your reply. Thank you!
How can I request the pretrained SRE models?
I am running run.sh. It worked well up to stage 7, where I got the error below:
File "nnet/lib/train.py", line 8, in
from misc.utils import ValidLoss, load_lr, load_valid_loss, save_codes_and_config, compute_cos_pairwise_eer
ModuleNotFoundError: No module named 'misc'
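A likely cause (an assumption, since the exact layout depends on your checkout) is that the tf-kaldi-speaker repo root, which contains the misc/ package, is not on the Python path when train.py is launched from the egs directory:

```shell
# Hypothetical fix: put the tf-kaldi-speaker checkout on PYTHONPATH so
# "from misc.utils import ..." can be resolved. Adjust the path to your setup.
export TF_KALDI_ROOT=/path/to/tf-kaldi-speaker
export PYTHONPATH=$TF_KALDI_ROOT:$PYTHONPATH
```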
Hello Yi Liu,
Thank you very much for your solution!
I trained on the VoxCeleb 1 & 2 datasets with xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_tdnn4_att. Everything works as expected: training, extracting embeddings, and evaluation all run well.
But when I tried to use your pre-trained models on the same dataset for extracting embeddings (stage=8), I got this error:
ValueError: Cannot feed value of shape (1, 859, 24) for Tensor u'pred_features:0', which has shape '(?, ?, 30)'
Environment:
tensorflow-gpu==1.12
cuda==9.0.0
net = xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_mhe0.01_2
How can I fix this error? Thanks in advance!
Full log:
# nnet/wrap/extract_wrapper.sh --gpuid -1 --env tf_cpu --min-chunk-size 25 --chunk-size 10000 --normalize false --node tdnn6_dense /home/psadmin/projects/voxceleb/exp/xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_mhe0.01_2 "ark:apply-cmvn-sliding --norm-vars=false --center=true --cmn-window=300 scp:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/feats.scp ark:- | select-voiced-frames ark:- scp,s,cs:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/vad.scp ark:- |" "ark:| copy-vector ark:- ark,scp:/home/psadmin/projects/voxceleb/exp/xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_mhe0.01_2/xvectors_voxceleb_train/xvector.1.ark,/home/psadmin/projects/voxceleb/exp/xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_mhe0.01_2/xvectors_voxceleb_train/xvector.1.scp"
# Started at Tue Aug 4 13:11:43 MSK 2020
#
INFO:tensorflow:Extract embedding from tdnn6_dense
2020-08-04 13:11:46.819647: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-08-04 13:11:48.224681: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-08-04 13:11:48.224811: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] retrieving CUDA diagnostic information for host: softs-server-07
2020-08-04 13:11:48.224871: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:170] hostname: softs-server-07
2020-08-04 13:11:48.225023: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:194] libcuda reported version is: 440.64.0
2020-08-04 13:11:48.225144: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:198] kernel reported version is: 440.64.0
2020-08-04 13:11:48.225172: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:305] kernel version seems to match DSO: 440.64.0
INFO:tensorflow:Extract embedding from node tdnn6_dense
WARNING:tensorflow:From /home/psadmin/projects/kaldi-tf/tf-kaldi-speaker/model/pooling.py:23: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
copy-vector ark:- ark,scp:/home/psadmin/projects/voxceleb/exp/xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_mhe0.01_2/xvectors_voxceleb_train/xvector.1.ark,/home/psadmin/projects/voxceleb/exp/xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_mhe0.01_2/xvectors_voxceleb_train/xvector.1.scp
apply-cmvn-sliding --norm-vars=false --center=true --cmn-window=300 scp:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/feats.scp ark:-
select-voiced-frames ark:- scp,s,cs:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/vad.scp ark:-
INFO:tensorflow:[INFO] Key id00012-21Uxsk56VDQ-00001 length 859.
INFO:tensorflow:Reading checkpoints...
INFO:tensorflow:Restoring parameters from /home/psadmin/projects/voxceleb/exp/xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_mhe0.01_2/nnet/model-2610000
INFO:tensorflow:Succeed to load checkpoint model-2610000
Traceback (most recent call last):
File "nnet/lib/extract.py", line 90, in <module>
embedding = trainer.predict(feature)
File "/home/psadmin/projects/kaldi-tf/tf-kaldi-speaker/model/trainer.py", line 724, in predict
ERROR (select-voiced-frames[5.5.762~1-0062]:Write():kaldi-matrix.cc:1404) Failed to write matrix to stream
[ Stack-Trace: ]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x8b7) [0x7fed417d3d1d]
select-voiced-frames(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x40e76d]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-matrix.so(kaldi::MatrixBase<float>::Write(std::ostream&, bool) const+0x1a7) [0x7fed41a174ad]
select-voiced-frames(kaldi::TableWriterArchiveImpl<kaldi::KaldiObjectHolder<kaldi::MatrixBase<float> > >::Write(std::string const&, kaldi::MatrixBase<float> const&)+0x1d6) [0x40ef40]
select-voiced-frames(main+0x580) [0x40cf50]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fed39fe8555]
select-voiced-frames() [0x40c909]
embeddings = self.sess.run(self.embeddings, feed_dict={self.pred_features: features})
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
WARNING (select-voiced-frames[5.5.762~1-0062]:Write():util/kaldi-holder-inl.h:57) Exception caught writing Table object. kaldi::KaldiFatalError
WARNING (select-voiced-frames[5.5.762~1-0062]:Write():util/kaldi-table-inl.h:1057) Write failure to standard output
ERROR (select-voiced-frames[5.5.762~1-0062]:Write():util/kaldi-table-inl.h:1515) Error in TableWriter::Write
[ Stack-Trace: ]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x8b7) [0x7fed417d3d1d]
select-voiced-frames(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x40e76d]
select-voiced-frames(main+0x5d3) [0x40cfa3]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fed39fe8555]
select-voiced-frames() [0x40c909]
run_metadata_ptr)
WARNING (select-voiced-frames[5.5.762~1-0062]:Close():util/kaldi-table-inl.h:1089) Error closing stream: wspecifier is ark:-
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1128, in _run
ERROR (select-voiced-frames[5.5.762~1-0062]:~TableWriter():util/kaldi-table-inl.h:1539) Error closing TableWriter [in destructor].
[ Stack-Trace: ]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x8b7) [0x7fed417d3d1d]
select-voiced-frames(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x40e76d]
select-voiced-frames(kaldi::TableWriter<kaldi::KaldiObjectHolder<kaldi::MatrixBase<float> > >::~TableWriter()+0x59) [0x412893]
select-voiced-frames(main+0x82b) [0x40d1fb]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fed39fe8555]
select-voiced-frames() [0x40c909]
terminate called after throwing an instance of 'kaldi::KaldiFatalError'
what(): kaldi::KaldiFatalError
str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 859, 24) for Tensor u'pred_features:0', which has shape '(?, ?, 30)'
ERROR (apply-cmvn-sliding[5.5.762~1-0062]:Write():kaldi-matrix.cc:1404) Failed to write matrix to stream
[ Stack-Trace: ]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x8b7) [0x7f618cf15d1d]
apply-cmvn-sliding(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x40a4a9]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-matrix.so(kaldi::MatrixBase<float>::Write(std::ostream&, bool) const+0x1a7) [0x7f618d1594ad]
apply-cmvn-sliding(kaldi::TableWriterArchiveImpl<kaldi::KaldiObjectHolder<kaldi::MatrixBase<float> > >::Write(std::string const&, kaldi::MatrixBase<float> const&)+0x29e) [0x40ad70]
apply-cmvn-sliding(main+0x335) [0x4091c2]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f618572a555]
apply-cmvn-sliding() [0x408dc9]
WARNING (apply-cmvn-sliding[5.5.762~1-0062]:Write():util/kaldi-holder-inl.h:57) Exception caught writing Table object. kaldi::KaldiFatalError
WARNING (apply-cmvn-sliding[5.5.762~1-0062]:Write():util/kaldi-table-inl.h:1057) Write failure to standard output
ERROR (apply-cmvn-sliding[5.5.762~1-0062]:Write():util/kaldi-table-inl.h:1515) Error in TableWriter::Write
[ Stack-Trace: ]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x8b7) [0x7f618cf15d1d]
apply-cmvn-sliding(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x40a4a9]
apply-cmvn-sliding(main+0x388) [0x409215]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f618572a555]
apply-cmvn-sliding() [0x408dc9]
WARNING (apply-cmvn-sliding[5.5.762~1-0062]:Close():util/kaldi-table-inl.h:1089) Error closing stream: wspecifier is ark:-
ERROR (apply-cmvn-sliding[5.5.762~1-0062]:~TableWriter():util/kaldi-table-inl.h:1539) Error closing TableWriter [in destructor].
[ Stack-Trace: ]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x8b7) [0x7f618cf15d1d]
apply-cmvn-sliding(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x40a4a9]
apply-cmvn-sliding(kaldi::TableWriter<kaldi::KaldiObjectHolder<kaldi::MatrixBase<float> > >::~TableWriter()+0x59) [0x412971]
apply-cmvn-sliding(main+0x5e0) [0x40946d]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f618572a555]
apply-cmvn-sliding() [0x408dc9]
terminate called after throwing an instance of 'kaldi::KaldiFatalError'
what(): kaldi::KaldiFatalError
/bin/sh: line 1: 6753 Aborted apply-cmvn-sliding --norm-vars=false --center=true --cmn-window=300 scp:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/feats.scp ark:-
6754 | select-voiced-frames ark:- scp,s,cs:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/vad.scp ark:-
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 765, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/psadmin/projects/kaldi-tf/tf-kaldi-speaker/dataset/kaldi_io.py", line 387, in cleanup
raise SubprocessFailed('cmd %s returned %d !' % (cmd,ret))
SubprocessFailed: cmd apply-cmvn-sliding --norm-vars=false --center=true --cmn-window=300 scp:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/feats.scp ark:- | select-voiced-frames ark:- scp,s,cs:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/vad.scp ark:- returned 134 !
Exception KeyboardInterrupt in <module 'threading' from '/usr/lib64/python2.7/threading.pyc'> ignored
# Accounting: time=873 threads=1
# Ended (code 1) at Tue Aug 4 13:26:16 MSK 2020, elapsed time 873 seconds
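The ValueError in the log above says the features being fed have 24 dimensions while the pretrained graph expects 30, i.e. the acoustic features were extracted with a different config (e.g. filterbank/MFCC dimension) than the one the pretrained model was trained on. A tiny guard that makes the mismatch explicit before the TensorFlow feed fails (a sketch; `check_feature_dim` is a hypothetical helper):

```python
def check_feature_dim(features_shape, expected_dim):
    """Raise a clear error when the acoustic feature dimension does not
    match the dimension the model graph was built with. Hypothetical
    helper for illustration, not part of this repo."""
    actual = features_shape[-1]
    if actual != expected_dim:
        raise ValueError(
            "feature dim %d != model input dim %d; re-extract features "
            "with the config the pretrained model was trained on"
            % (actual, expected_dim))
    return True
```

The actual fix is on the data side: regenerate the features with the same extraction config (number of mel bins / cepstral coefficients) as the pretrained model.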
Hello,
thanks for the great work, it is really useful!
I have a question about how to set the number of steps per epoch for KaldiDataRandomQueue.
As far as I know, an epoch means training the neural network on all of the training data for one cycle; every example is used at least once. An epoch consists of many steps, and each step processes batch_size examples.
But I don't see how the code for KaldiDataRandomQueue makes sure all training data is used at least once per epoch, so I'm having trouble setting the number of steps.
Can you explain how I can make sure the whole training set is seen, and how to set the number of steps?
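If the queue samples batches at random, one pragmatic convention is to define an "epoch" as the number of steps after which, in expectation, every example has been drawn once. A sketch (an assumption about how to count, not the repo's definition):

```python
import math

def steps_per_epoch(num_train_examples, batch_size):
    """Steps after which, in expectation, each example has been sampled
    once when batches are drawn uniformly at random. Only an expectation:
    random sampling gives no guarantee that every example is seen."""
    return math.ceil(num_train_examples / batch_size)
```

With random sampling there is no hard guarantee of full coverage in any finite number of steps; if exact once-per-epoch coverage matters, a shuffled-list sampler is the usual alternative.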
Thank you in advance.
Thanks for releasing this useful implementation; it greatly helps my work.
However, I notice there are GhostVLAD pooling experiments in the RESULTS.md file, but I did not find the relevant function in pooling.py. I currently need to run some tests on this popular pooling method.
Could you provide the GhostVLAD pooling function? I truly appreciate your help.
The jumpahead function of the random module was removed in Python 3. I removed it from the code and found that the EER degraded. Is this function actually useful in model training? I think os.urandom already gives us good randomness.
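As a Python 3 replacement for random.jumpahead(), one common pattern is to give each data-loading worker its own random.Random instance seeded from os.urandom mixed with the worker id, so workers stay decorrelated even after forking. A sketch (`worker_rng` is a hypothetical name):

```python
import os
import random

def worker_rng(worker_id):
    """Independent RNG per data-loading worker: seed from os.urandom so
    forked workers do not share generator state, and mix in worker_id so
    the seeds differ even if the entropy bytes happened to coincide."""
    seed = int.from_bytes(os.urandom(8), "little") ^ worker_id
    return random.Random(seed)
```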