aishoot / lstm_pit_speech_separation
Two-talker Speech Separation with LSTM/BLSTM by Permutation Invariant Training method.
Hello, you have shown three different separation methods (No. 3, 4, 5). But which of these is responsible for the result in 7-separated_result_LSTM?
Hi,
I was going through your repository, but I could not find the LSTM and BLSTM results on the two-speaker .wav files you generated. Can you please add them?
Also, have you tried this algorithm with multiple speakers and added noise? If so, can you share the results?
Can you tell me whether the io_funcs.kaldi_io package imported in run_lstm.py is your own code? If so, where is it in the zip file?
Hi, when I run run.sh there is an error saying there is no module called io_funcs. What could be the reason?
All dependencies had been prepared.
Hello, I'm interested in speech separation and have been following your related work. Would it be convenient to add you on WeChat?
Hi, I have created the dataset and extracted features, but when I execute run_lstm.py it shows this error:
Traceback (most recent call last):
File "run_lstm.py", line 454, in
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\ProgramData\Anaconda3\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\ProgramData\Anaconda3\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "run_lstm.py", line 295, in main
train()
File "run_lstm.py", line 166, in train
tr_tfrecords_lst, tr_num_batches = read_list_file("tr_tf", FLAGS.batch_size)
File "run_lstm.py", line 40, in read_list_file
utt_id = line.strip().split()[0]
IndexError: list index out of range
How can I resolve this error?
Can you please explain the procedure, or the different steps, to preprocess data before feeding it to the LSTM? I am working on the paper by Zhuo Chen, "Speaker-Independent Speech Separation With Deep Attractor Network", but I am not able to create batches because each audio file has a different number of frames. How do you handle variable-length input to an LSTM? I know techniques like sequence padding, but I don't think that would be effective when the difference in the number of frames is large.
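A common way to handle the variable-length issue described above (not necessarily what this repo does) is to sort or bucket utterances by frame count and zero-pad only within each minibatch, passing the true lengths to the RNN so padded frames are masked out of the loss. A hedged numpy sketch of the padding step:

```python
import numpy as np

def make_padded_batch(features):
    """features: list of [T_i, D] arrays with varying frame counts T_i.
    Returns (batch [B, T_max, D], lengths [B]) suitable for dynamic
    RNNs that accept a sequence_length argument for masking."""
    lengths = np.array([f.shape[0] for f in features])
    t_max, dim = lengths.max(), features[0].shape[1]
    batch = np.zeros((len(features), t_max, dim), dtype=np.float32)
    for i, f in enumerate(features):
        batch[i, :f.shape[0]] = f   # copy real frames; rest stays zero
    return batch, lengths
```

Sorting utterances by length before batching keeps the amount of padding small even when frame counts vary widely.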
Hi,
can you explain what the actual structure of the dataset should be? I'm not sure I got it.
Is it like in the MATLAB file you attached, i.e. cv/tr/tt folders, each containing mixed/s1/s2 folders?
Second, which file should I run? Where is the run.sh file located, and does it include the tfrecord process, i.e. does it take wav files and perform the processing internally? (Sorry if these are elementary questions.)
Thanks
Have you tried it in Keras? It is a little difficult for me to understand the TensorFlow code.
(Sorry to bother you.)
I am a newbie. Since I have no wsj0 dataset, I have to create mixtures from my own data.
In mix_2_spk_tr/cv/tt.txt, each wav is assigned a weight, like:
wsj0/si_tr_s/01t/01to030v.wav 0.76421 wsj0/si_tr_s/20g/20ga010m.wav -0.76421
wsj0/si_tr_s/40e/40ec020o.wav 1.3218 wsj0/si_tr_s/20c/20ca010n.wav -1.3218
....
I have two questions about these files:
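For reference, each line of those mix lists holds two (path, level) pairs. A hedged sketch of a parser for one line — the interpretation of the numeric fields as per-utterance SNR levels in dB, as consumed by create_wav_2speakers.m in the wsj0-mix recipe, is an assumption, and the function name is hypothetical:

```python
def parse_mix_line(line):
    """Parse one line of mix_2_spk_*.txt into two (wav_path, level) pairs.
    The level fields are assumed to be SNR values in dB, as used by
    create_wav_2speakers.m in the wsj0-mix recipe."""
    wav1, lvl1, wav2, lvl2 = line.strip().split()
    return (wav1, float(lvl1)), (wav2, float(lvl2))
```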
Sorry to bother you again. Is your run.sh similar to snsun's?
python -u local/gen_tfreords.py --gender_list local/wsj0-train-spkrinfo.txt data/wav/wav8k/min/$x/ lists/${x}_wav.lst data/tfrecords/${x}_psm/ &
This is the code in snsun's run.sh, but I don't know where the _wav.lst file comes from. Could you please tell me about it or send your run.sh to me? Thank you very much. My email: [email protected]
Mate, when calling the pesq function I ran into this problem:
"This P-code file E:\Matlab R2016a Workspace\composite\pesq.p was generated before MATLAB 7.5 (R2007b) and is no longer supported. Please regenerate the file with pcode in MATLAB R2007b or later."
But the author at that link does not provide the source code. How did you solve this? Could you send me a copy of the relevant code? ([email protected])
Hello my friend, I have two important questions.
Finally I could run your amazing code. As far as I know, to do that we need four kinds of lists:
1) dataset lists (mix_2_spk_tr etc.)
2) gender lists
3) wav lists
4) tfrecords lists
For a small-scale run I generated these lists by hand and with the man_wav_list.py script, but here are my two big problems:
1- How can I produce the above lists, especially the dataset lists, with a script? Do you have any script to do that?
2- Does the mixing code automatically scale the wav files to the target SNRs, or do we have to do that ourselves before making the list? For example, the first line of mix_2_spk_tr is:
/home/disk1/snsun/Workspace/tensorflow/kaldi/data/wsj0/tr/40na010x.wav 1.9857 /home/disk1/snsun/Workspace/tensorflow/kaldi/data/wsj0/tr/01xo031a.wav -1.9857
Does the script create_wav_2speakers.m automatically produce wavs at these SNRs (1.9857 and -1.9857) and then mix them, or do we have to produce such wavs first and then run that script to make the dataset?
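On the mixing question: in the standard wsj0-mix recipe, create_wav_2speakers.m reads the original wavs and applies the listed dB levels itself, so no pre-scaling is needed. A hedged numpy sketch of the core scaling step (simplified; the real MATLAB script also resamples and normalizes, and this is not its exact code):

```python
import numpy as np

def mix_at_snr(s1, s2, snr_db):
    """Scale s2 so that s1 is snr_db louder than s2, then sum.
    Simplified sketch of the level scaling inside create_wav_2speakers.m."""
    p1 = np.mean(s1 ** 2)                     # power of speaker 1
    p2 = np.mean(s2 ** 2)                     # power of speaker 2
    gain = np.sqrt(p1 / (p2 * 10 ** (snr_db / 10)))
    return s1 + gain * s2
```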
Are io_funcs, local, and model packages you wrote yourself? If so, how should I download them?
I noticed that the archive already contains several of the files from those packages, but I could not find the kaldi_io file.
How can I use the VCTK dataset to train the model? Should I alter the structure of the VCTK dataset downloaded from the original webpage? Thanks for your reply.
I've trained the model using the VCTK dataset and decoded it. How do I use the trained model to separate a mixed audio file? Do we need to write new code or tweak the existing code? Can you provide pointers for the same? TIA.
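On using a trained model for separation: the usual mask-based pipeline is to STFT the mixture, run the network on the magnitude to get one mask per speaker, multiply each mask with the mixture spectrogram, and inverse-STFT reusing the mixture phase. A hedged numpy sketch of the masking step — the `model_masks` callable is a placeholder for the trained LSTM, and this is not the repo's actual decode code:

```python
import numpy as np

def separate(mix_stft, model_masks):
    """mix_stft: complex [T, F] STFT of the mixed audio.
    model_masks: callable returning two real masks [T, F] from |STFT|.
    Returns the two estimated source STFTs (mixture phase reused)."""
    mag = np.abs(mix_stft)
    mask1, mask2 = model_masks(mag)          # placeholder for the LSTM
    phase = np.exp(1j * np.angle(mix_stft))  # reuse mixture phase
    return mask1 * mag * phase, mask2 * mag * phase
```

Each returned STFT would then go through an inverse STFT (with the same window and hop as the analysis) to produce the separated waveforms.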
Hello, big brother, can this model be used to separate Chinese speech? Do we need to change the dataset? Thanks!
There are two ways to create the mixtures, version 1 and version 2. Are there any differences? Which version should I follow?
Hello. I have a blind spot about uPIT that I have never been able to clear up and would like to ask you about. uPIT operates on whole utterances; does that mean one utterance is fed into the network as one training sample? Or is my understanding wrong? Thank you.
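On the uPIT question above: utterance-level PIT computes the pairwise loss over the whole utterance and picks the speaker permutation that minimizes the total error, so the output-to-speaker assignment is fixed for the entire utterance (one utterance per training sample is the usual setup). A hedged numpy sketch of the two-speaker permutation search, using MSE as a stand-in for whatever loss the repo actually uses:

```python
import numpy as np

def upit_mse(est1, est2, ref1, ref2):
    """Utterance-level PIT loss for two speakers.
    All inputs are [T, F] arrays; the loss is summed over the whole
    utterance and the cheaper of the two permutations is chosen."""
    mse = lambda a, b: np.mean((a - b) ** 2)
    perm_a = mse(est1, ref1) + mse(est2, ref2)   # (1->1, 2->2)
    perm_b = mse(est1, ref2) + mse(est2, ref1)   # (1->2, 2->1)
    return min(perm_a, perm_b)
```

Because the minimum is taken once per utterance rather than per frame, the network cannot swap speakers midway through an utterance, which is the point of the utterance-level variant.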
Hi, I have the wv version of wsj0. How can I convert this format to wav without any information loss? As you know, wv is a specific compressed format, and we have to convert it to plain wav. Do you have any suggestions?