aishoot / lstm_pit_speech_separation
Two-talker Speech Separation with LSTM/BLSTM by Permutation Invariant Training method.
Hello, you have shown three different separation methods (No. 3, 4, 5). But which of these is responsible for the result in 7-separated_result_LSTM?
Hi,
I was going through your repository, but I could not find the LSTM and BLSTM results on the two-speaker .wav files you generated. Can you please add them?
Also, have you tried this algorithm with multiple speakers and added noise? If so, can you share the results?
Can you tell me whether the io_funcs.kaldi_io package imported in run_lstm.py is your own code? If so, where is it in the zip file?
Hi, when I run run.sh there is an error saying there is no module called io_funcs. What could be the reason?
All dependencies had been prepared.
Hello, I'm interested in speech separation and have been following your related work. Would it be convenient to add you on WeChat?
Hi, I have created the dataset and extracted features, but when I execute run_lstm.py it shows this error:
Traceback (most recent call last):
File "run_lstm.py", line 454, in
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\ProgramData\Anaconda3\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\ProgramData\Anaconda3\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "run_lstm.py", line 295, in main
train()
File "run_lstm.py", line 166, in train
tr_tfrecords_lst, tr_num_batches = read_list_file("tr_tf", FLAGS.batch_size)
File "run_lstm.py", line 40, in read_list_file
utt_id = line.strip().split()[0]
IndexError: list index out of range
How can I resolve this error?
Can you please explain the procedure, or the different steps, to preprocess data before feeding it to the LSTM? I am working on the paper by Zhuo Chen, "Speaker-Independent Speech Separation With Deep Attractor Network", but I am not able to create batches because each audio file has a different number of frames. How do you handle variable-length input to an LSTM? I know techniques like sequence padding, but I don't think that would be effective when the difference in the number of frames is large.
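A common way to handle the variable-length issue described above (not necessarily what this repo does) is to sort or bucket utterances by frame count and zero-pad only within each minibatch, passing the true lengths to the RNN so padded frames are masked out of the loss. A hedged numpy sketch of the padding step:

```python
import numpy as np

def make_padded_batch(features):
    """features: list of [T_i, D] arrays with varying frame counts T_i.
    Returns (batch [B, T_max, D], lengths [B]) suitable for dynamic
    RNNs that accept a sequence_length argument for masking."""
    lengths = np.array([f.shape[0] for f in features])
    t_max, dim = lengths.max(), features[0].shape[1]
    batch = np.zeros((len(features), t_max, dim), dtype=np.float32)
    for i, f in enumerate(features):
        batch[i, :f.shape[0]] = f   # copy real frames; rest stays zero
    return batch, lengths
```

Sorting utterances by length before batching keeps the amount of padding small even when frame counts vary widely.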
Hi,
can you explain what the actual structure of the dataset should be? I'm not sure I got it.
Is it like in the MATLAB file you attached, i.e. cv/tr/tt folders, each containing mixed/s1/s2 folders?
Second, which file should I run? Where is the run.sh file located, and does it include the tfrecord process, i.e. does it take wav files and perform the processing internally? (Sorry if these are elementary questions.)
Thanks
Have you tried it in Keras? It is a little difficult for me to understand the TensorFlow code.
(Sorry to bother you.)
I am a newbie. Since I have no wsj0 dataset, I have to create mixtures from my own data.
In mix_2_spk_tr/cv/tt.txt, each wav is assigned a weight, like:
wsj0/si_tr_s/01t/01to030v.wav 0.76421 wsj0/si_tr_s/20g/20ga010m.wav -0.76421
wsj0/si_tr_s/40e/40ec020o.wav 1.3218 wsj0/si_tr_s/20c/20ca010n.wav -1.3218
....
I have two questions about these files:
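For reference, each line of those mix lists holds two (path, level) pairs. A hedged sketch of a parser for one line — the interpretation of the numeric fields as per-utterance SNR levels in dB, as consumed by create_wav_2speakers.m in the wsj0-mix recipe, is an assumption, and the function name is hypothetical:

```python
def parse_mix_line(line):
    """Parse one line of mix_2_spk_*.txt into two (wav_path, level) pairs.
    The level fields are assumed to be SNR values in dB, as used by
    create_wav_2speakers.m in the wsj0-mix recipe."""
    wav1, lvl1, wav2, lvl2 = line.strip().split()
    return (wav1, float(lvl1)), (wav2, float(lvl2))
```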
Sorry to bother you again. Is your run.sh similar to snsun's?
python -u local/gen_tfreords.py --gender_list local/wsj0-train-spkrinfo.txt data/wav/wav8k/min/$x/ lists/${x}_wav.lst data/tfrecords/${x}_psm/ &
This is the code in snsun's run.sh, but I don't know where the _wav.lst file comes from. Could you please tell me about it or send your run.sh to me? Thank you very much. My email: [email protected]
Mate, when calling the pesq function I ran into this problem:
"This P-code file E:\Matlab R2016a Workspace\composite\pesq.p was generated before MATLAB 7.5 (R2007b) and is no longer supported. Please regenerate the file with pcode in MATLAB R2007b or later."
But the author at that link does not provide the source code. How did you solve this? Could you send me a copy of the relevant code? ([email protected])
Hello my friend, I have two important questions.
Finally I could run your amazing code. As far as I know, to do that we need four kinds of lists:
1) dataset lists (mix_2_spk_tr etc.)
2) gender lists
3) wav lists
4) tfrecords lists
For a small-scale run I generated these lists by hand and with the man_wav_list.py script, but here are my two big problems:
1- How can I produce the above lists, especially the dataset lists, with a script? Do you have any script to do that?
2- Does the mixing code automatically scale the wav files to the target SNRs, or do we have to do that ourselves before making the list? For example, the first line of mix_2_spk_tr is:
/home/disk1/snsun/Workspace/tensorflow/kaldi/data/wsj0/tr/40na010x.wav 1.9857 /home/disk1/snsun/Workspace/tensorflow/kaldi/data/wsj0/tr/01xo031a.wav -1.9857
Does the script create_wav_2speakers.m automatically produce wavs at these SNRs (1.9857 and -1.9857) and then mix them, or do we have to produce such wavs first and then run that script to make the dataset?
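On the mixing question: in the standard wsj0-mix recipe, create_wav_2speakers.m reads the original wavs and applies the listed dB levels itself, so no pre-scaling is needed. A hedged numpy sketch of the core scaling step (simplified; the real MATLAB script also resamples and normalizes, and this is not its exact code):

```python
import numpy as np

def mix_at_snr(s1, s2, snr_db):
    """Scale s2 so that s1 is snr_db louder than s2, then sum.
    Simplified sketch of the level scaling inside create_wav_2speakers.m."""
    p1 = np.mean(s1 ** 2)                     # power of speaker 1
    p2 = np.mean(s2 ** 2)                     # power of speaker 2
    gain = np.sqrt(p1 / (p2 * 10 ** (snr_db / 10)))
    return s1 + gain * s2
```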
Are io_funcs, local, and model packages you wrote yourself? If so, how should I download them?
I noticed that the archive already contains several of the files from those packages, but I could not find the kaldi_io file.
How can I use the VCTK dataset to train the model? Should I alter the structure of the VCTK dataset downloaded from the original webpage? Thanks for your reply.
I've trained the model using the VCTK dataset and decoded it. How do I use the trained model to separate a mixed audio file? Do we need to write new code or tweak the existing code? Can you provide pointers for the same? TIA.
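On using a trained model for separation: the usual mask-based pipeline is to STFT the mixture, run the network on the magnitude to get one mask per speaker, multiply each mask with the mixture spectrogram, and inverse-STFT reusing the mixture phase. A hedged numpy sketch of the masking step — the `model_masks` callable is a placeholder for the trained LSTM, and this is not the repo's actual decode code:

```python
import numpy as np

def separate(mix_stft, model_masks):
    """mix_stft: complex [T, F] STFT of the mixed audio.
    model_masks: callable returning two real masks [T, F] from |STFT|.
    Returns the two estimated source STFTs (mixture phase reused)."""
    mag = np.abs(mix_stft)
    mask1, mask2 = model_masks(mag)          # placeholder for the LSTM
    phase = np.exp(1j * np.angle(mix_stft))  # reuse mixture phase
    return mask1 * mag * phase, mask2 * mag * phase
```

Each returned STFT would then go through an inverse STFT (with the same window and hop as the analysis) to produce the separated waveforms.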
Hello, big brother, can this model be used to separate Chinese speech? Do we need to change the dataset? Thanks!
There are two ways to create the mixtures, version 1 and version 2. Are there any differences? Which version should I follow?
Hello. I have a blind spot about uPIT that I have never been able to clear up and would like to ask you about. uPIT operates on whole utterances; does that mean one utterance is fed into the network as one training sample? Or is my understanding wrong? Thank you.
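On the uPIT question above: utterance-level PIT computes the pairwise loss over the whole utterance and picks the speaker permutation that minimizes the total error, so the output-to-speaker assignment is fixed for the entire utterance (one utterance per training sample is the usual setup). A hedged numpy sketch of the two-speaker permutation search, using MSE as a stand-in for whatever loss the repo actually uses:

```python
import numpy as np

def upit_mse(est1, est2, ref1, ref2):
    """Utterance-level PIT loss for two speakers.
    All inputs are [T, F] arrays; the loss is summed over the whole
    utterance and the cheaper of the two permutations is chosen."""
    mse = lambda a, b: np.mean((a - b) ** 2)
    perm_a = mse(est1, ref1) + mse(est2, ref2)   # (1->1, 2->2)
    perm_b = mse(est1, ref2) + mse(est2, ref1)   # (1->2, 2->1)
    return min(perm_a, perm_b)
```

Because the minimum is taken once per utterance rather than per frame, the network cannot swap speakers midway through an utterance, which is the point of the utterance-level variant.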
Hi, I have the wv version of wsj0. How can I convert this format to wav without any information loss? As you know, wv is a specific compressed format, and we have to convert it to plain wav. Do you have any suggestions?