alibaba-mit-speech's Introduction

Alibaba-MIT-Speech

This is a patch file containing the DFSMN-related code and example scripts for the LibriSpeech task.

Apply Patch

The patch is built on the Kaldi speech recognition toolkit at commit "04b1f7d6658bc035df93d53cb424edc127fab819".

You can apply this patch to your own Kaldi branch with the following commands. (Instead of applying the patch file, you can also clone the project directly from "https://github.com/tramphero/kaldi".)

##Take a look at what changes are in the patch

git apply --stat Alibaba_MIT_Speech_DFSMN.patch

##Test the patch before you actually apply it

git apply --check Alibaba_MIT_Speech_DFSMN.patch

##If you don’t get any errors, the patch can be applied cleanly.

git am --signoff < Alibaba_MIT_Speech_DFSMN.patch
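
The steps above can be collected into one helper. This is a minimal sketch, not something shipped with the patch: the names `apply_dfsmn_patch` and `same_commit` are hypothetical, and the patch path is assumed to be absolute. It warns if the Kaldi checkout is not at the commit named above, does the dry run, and applies the patch only if the dry run succeeds.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the patch): dry-run, then apply.
# Commit that the README says the patch was built against.
DFSMN_BASE_COMMIT="04b1f7d6658bc035df93d53cb424edc127fab819"

# Succeeds only if the two commit hashes are identical.
same_commit() {
  [ "$1" = "$2" ]
}

# Usage: apply_dfsmn_patch <kaldi-dir> <absolute-path-to-patch>
apply_dfsmn_patch() {
  local kaldi_dir="$1" patch="$2"
  (
    cd "$kaldi_dir" || exit 1
    head_commit="$(git rev-parse HEAD)" || exit 1
    if ! same_commit "$head_commit" "$DFSMN_BASE_COMMIT"; then
      echo "warning: HEAD is $head_commit, patch was built on $DFSMN_BASE_COMMIT" >&2
    fi
    # Dry run first; stop here if the patch would not apply cleanly.
    git apply --check "$patch" || exit 1
    git am --signoff < "$patch"
  )
}

# Example: apply_dfsmn_patch "$HOME/kaldi" "$PWD/Alibaba_MIT_Speech_DFSMN.patch"
```

Running the helper on a branch that has moved past the base commit may still work, but a failed dry run leaves the tree untouched.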

Run Example Scripts:

The training scripts and experimental results for the LibriSpeech task are available at kaldi/egs/librispeech/s5.

There are three DFSMN configurations of different model sizes: DFSMN_S, DFSMN_M, and DFSMN_L.


#Training FSMN models on the cleaned-up data

#Three configurations of DFSMN with different model size: DFSMN_S, DFSMN_M, DFSMN_L

local/nnet/run_fsmn_ivector.sh DFSMN_S

local/nnet/run_fsmn_ivector.sh DFSMN_M

local/nnet/run_fsmn_ivector.sh DFSMN_L
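
The three runs above can also be driven from a loop. This is a minimal sketch that assumes it is run from kaldi/egs/librispeech/s5; the function name `run_dfsmn_configs` and the `TRAIN_SCRIPT` variable are illustrative and not part of the recipe. The loop stops at the first configuration that fails.

```shell
#!/usr/bin/env bash
# Illustrative wrapper (not part of the recipe): train each DFSMN size in turn.
# TRAIN_SCRIPT can be overridden, e.g. to point at a stub while testing.
TRAIN_SCRIPT="${TRAIN_SCRIPT:-local/nnet/run_fsmn_ivector.sh}"

# Usage: run_dfsmn_configs DFSMN_S DFSMN_M DFSMN_L
run_dfsmn_configs() {
  local cfg
  for cfg in "$@"; do
    echo "=== training $cfg ==="
    # Stop at the first failure so later runs do not hide the error.
    "$TRAIN_SCRIPT" "$cfg" || { echo "training $cfg failed" >&2; return 1; }
  done
}
```

Because each configuration trains on a single GPU, the runs are sequential here; nothing in this sketch schedules them in parallel.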


DFSMN_S is a small DFSMN with six DFSMN components, while DFSMN_L is a large DFSMN consisting of 10 DFSMN components.

For the 960-hour setting, training DFSMN_S takes about 2-3 days on a single M40 GPU.

Detailed experimental results are listed in the RESULTS file.

alibaba-mit-speech's People

Contributors

leiming99, tramphero

alibaba-mit-speech's Issues

Error: in data_fbank/train_960_cleaned, recording-ids extracted from wav.scp and reco2dur file differ

I ran all the steps in run.sh over several days and finally got 'train_960_cleaned' for training the deep FSMN. But when I start training by running 'local/nnet/run_fsmn.sh DFSMN_S', it fails with this error:

`steps/online/nnet2/extract_ivectors_online.sh: done extracting (online) iVectors to exp/nnet3_cleaned/ivectors_dev_other_hires using the extractor in exp/nnet3_cleaned/extractor.
steps/make_fbank.sh --nj 30 --cmd run.pl --fbank-config conf/fbank.conf data_fbank/train_960_cleaned exp/make_fbank/train_960_cleaned fbank/train_960_cleaned
steps/make_fbank.sh: moving data_fbank/train_960_cleaned/feats.scp to data_fbank/train_960_cleaned/.backup
utils/validate_data_dir.sh: Error: in data_fbank/train_960_cleaned, recording-ids extracted from wav.scp and reco2dur file
utils/validate_data_dir.sh: differ, partial diff is:
1,301545c1,281081
< 100-121669-0000-1
< 100-121669-0001-1
< 100-121669-0002-1
< 100-121669-0003-1
< 100-121669-0004-1
...

986-129388-0107
986-129388-0108
986-129388-0109
986-129388-0110
986-129388-0111
986-129388-0112
[Lengths are /tmp/kaldi.rudy/utts=301545 versus /tmp/kaldi.rudy/recordings.reco2dur=281081]`

It seems the number of records in the utts file and in recordings.reco2dur differs, but validate_data_dir.sh expects them to be the same. Does anyone know how to fix this? Any advice would be appreciated. Thanks!
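
The failing check compares the IDs taken from wav.scp with those in reco2dur. The following is a minimal diagnostic sketch, assuming the data directory has no segments file, so the first column of wav.scp serves as the recording ID. The function name `compare_reco_ids` is hypothetical and not part of Kaldi. It only counts the mismatches and changes nothing on disk.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (not part of Kaldi): count IDs that appear in only
# one of wav.scp and reco2dur inside a data directory.
# Usage: compare_reco_ids data_fbank/train_960_cleaned
compare_reco_ids() {
  local dir="$1" tmp
  tmp="$(mktemp -d)" || return 1
  # The first whitespace-separated field of each line is the ID.
  awk '{print $1}' "$dir/wav.scp"  | LC_ALL=C sort -u > "$tmp/wav_ids"
  awk '{print $1}' "$dir/reco2dur" | LC_ALL=C sort -u > "$tmp/dur_ids"
  # comm -23 keeps lines only in the first file; comm -13 only in the second.
  echo "only_in_wav_scp=$(LC_ALL=C comm -23 "$tmp/wav_ids" "$tmp/dur_ids" | wc -l | tr -d ' ')"
  echo "only_in_reco2dur=$(LC_ALL=C comm -13 "$tmp/wav_ids" "$tmp/dur_ids" | wc -l | tr -d ' ')"
  rm -rf "$tmp"
}
```

In the log above, the IDs on one side carry a "-1" suffix and those on the other side do not, which suggests that reco2dur predates the clean-up step. That is an inference from the diff, not something the thread confirms.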

Where is fbank.cfg?

When extracting the fbank features, I cannot find fbank.cfg in the conf directory. How can I get it?

Error during CE training when running local/nnet/run_fsmn.sh DFSMN_L

The console output looks like this:
5777
gmm-info ./exp/tri6b_cleaned/final.mdl
5776
run.pl: job failed, log is in exp/tri7b_DFSMN_L/_train_nnet.log

The end of the log file looks like this:

RUNNING THE NN-TRAINING SCHEDULER

steps/nnet/train_faster_scheduler.sh --train-tool nnet-train-fsmn-streams --train-tool-opts --minibatch-size=4096 --feature-transform exp/tri7b_DFSMN_L/final.feature_transform --learn-rate 0.00001 --momentum 0.9
--start_half_lr 5 exp/tri7b_DFSMN_L/nnet.init ark:copy-feats scp:exp/tri7b_DFSMN_L/train.scp ark:- | apply-cmvn --norm-means=true --norm-vars=false
--utt2spk=ark:data_fbank/train_960_cleaned/utt2spk scp:data_fbank/train_960_cleaned/cmvn.scp ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- | ark:copy-feats scp:exp/tri7b_DFSMN_L/cv.scp ark:- | apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data_fbank/dev_clean/utt2spk scp:data_fbank/dev_clean/cmvn.scp ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- | ark:ali-to-pdf exp/tri6b_cleaned_ali_train_960_cleaned/final.mdl "ark:gunzip -c exp/tri6b_cleaned_ali_train_960_cleaned/ali..gz |" ark:- | ali-to-post ark:- ark:- | ark:ali-to-pdf exp/tri6b_cleaned_ali_train_960_cleaned/final.mdl "ark:gunzip -c exp/tri6b_cleaned_ali_dev_clean/ali..gz |" ark:- | ali-to-post ark:- ark:- | exp/tri7b_DFSMN_L
CROSSVAL PRERUN AVG.LOSS 8.6614 (Xent),
ITERATION 01: TRAIN AVG.LOSS 1.2125, (lrate1e-05), CROSSVAL AVG.LOSS 0.7093, nnet accepted (nnet_iter01_learnrate0.00001_tr1.2125_cv0.7093)
ITERATION 02: steps/nnet/train_faster_scheduler.sh: line 104: 37799 Aborted $train_tool --cross-validate=false --randomize=true --verbose=$verbose $train_tool_opts --learn-rate=$learn_rate --momentum=$momentum --l1-penalty=$l1_penalty --l2-penalty=$l2_penalty ${feature_transform:+ --feature-transform=$feature_transform} ${frame_weights:+ "--frame-weights=$frame_weights"} ${utt_weights:+ "--utt-weights=$utt_weights"} "$feats_tr_portion" "$labels_tr" $mlp_best $mlp_next 2>> $log

Accounting: time=47244 threads=1

Ended (code 1) at Thu Dec 5 05:13:12 CST 2019, elapsed time 47244 seconds

Has anyone run into this problem? How should it be fixed? Thanks.

How to export the trained model

I have completed the training, but I am not sure how to check that the model was produced. If I want to export the model, which files should be included?

Thanks in advance.

git am --signoff < /data/glusterfs_speech_04/11085090/Alibaba-MIT-Speech/Alibaba_MIT_Speech_DFSMN.patch

When I run the command, I get this log:
Applying: add DFSMN related codes
/data/glusterfs_speech_04/11085090/kaldi/.git/rebase-apply/patch:300: trailing whitespace.

/data/glusterfs_speech_04/11085090/kaldi/.git/rebase-apply/patch:328: space before tab in indent.
steps/nnet/train_faster.sh --learn-rate $lrate --nnet-proto $proto
/data/glusterfs_speech_04/11085090/kaldi/.git/rebase-apply/patch:331: space before tab in indent.
--feat-type plain --splice 1
/data/glusterfs_speech_04/11085090/kaldi/.git/rebase-apply/patch:336: space before tab in indent.
$data_fbk/train_960_cleaned $data_fbk/dev_clean data/lang exp/tri6b_cleaned_ali_train_960_cleaned exp/tri6b_cleaned_ali_dev_clean $dir
/data/glusterfs_speech_04/11085090/kaldi/.git/rebase-apply/patch:343: space before tab in indent.
for set in $dataset
warning: squelched 115 whitespace errors
warning: 120 lines add whitespace errors.
There are many warnings. Do they matter?

nnet-train-fsmn-streams: command not found

Hi, when running run_fsmn_ivector.sh, log/iter00.initial.log shows "steps/nnet/train_faster_scheduler.sh: line 89: nnet-train-fsmn-streams: command not found". How can I solve it?
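
One way to confirm whether the binary is visible to the shell is to look it up on PATH. This is a minimal sketch; the function name `require_tool` is hypothetical. It assumes the usual Kaldi layout in which the recipe's path.sh adds the compiled binaries to PATH, so it should be run after sourcing path.sh in the recipe directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a command can be found on PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $(command -v "$1")"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Example (from egs/librispeech/s5, after sourcing path.sh):
#   . ./path.sh && require_tool nnet-train-fsmn-streams
```

If the tool is reported missing, a plausible cause is that Kaldi's src/ was not rebuilt after the patch was applied. That is an assumption; the thread does not state the cause.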

What if I have no GPU? How long will it take to train this model in Kaldi?

This script is intended to be used with GPUs but you have not compiled Kaldi with CUDA
If you want to use GPUs (and have them), go to src/, and configure and make on a machine
where "nvcc" is installed.

I see this warning. It may not block the training, but I would like to know how to shorten the training time without a GPU. I think my machine is well configured: it has 256 GB of memory and 26 processors. Yet after two weeks of training, only half of the run.sh script has completed. Could anybody help?

run.sh error

run.sh error:
...
local/chain/run_tdnn.sh
local/nnet3/run_ivector_common.sh: preparing directory for low-resolution speed-perturbed data (for alignment)
utils/data/perturb_data_dir_speed_3way.sh: data/train_960_cleaned_sp/feats.scp already exists: refusing to run this (please delete data/train_960_cleaned_sp/feats.scp if you want this to run)

After I deleted this file, the error happened again.
Could anybody help?

Update patch error

Updating the patch fails with this error: Failed to patch: src/cudamatrix/cu-matrix.h:693
Error: src/cudamatrix/cu-matrix.h: Patch not applied
What could be the reason for this? Thanks.

When will DFSMN support multiple GPUs?

DFSMN currently uses a single GPU. When will Alibaba provide a multi-GPU version?
I tried the BMUF method for multi-GPU training, but the WER is no better than with a single GPU.
