lstm_pit_speech_separation's People

Contributors

aishoot


lstm_pit_speech_separation's Issues

Could not find results for multi-speaker audio files

Hi,
I was going through your repository, but I could not find the results of the LSTM and BLSTM on the 2-speaker .wav (audio) files you generated. Can you please add them?

Also, have you tried this algorithm with multiple speakers and added noise? If so, can you share the results?

run_lstm.sh

Can you tell me whether the io_funcs.kaldi_io package imported in run_lstm.py is something you wrote yourself? If so, where is this function in the zip file?

io_funcs

Hi, when I run the run.sh file, there is an error saying there is no module called io_funcs. What could be the reason?
All dependencies have been prepared.
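A common cause of a "no module named io_funcs" error is launching run_lstm.py from a directory where the repo's io_funcs package is not on Python's module search path. A minimal sketch of a workaround, assuming the script sits in the repo root next to the io_funcs/ folder:

```python
import os
import sys

# Assumption: run_lstm.py lives in the repository root, i.e. the directory
# that contains the io_funcs/ package. Put that directory on sys.path
# before the io_funcs import runs.
repo_root = os.path.dirname(os.path.abspath(sys.argv[0]))
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)

# `import io_funcs.kaldi_io` should now resolve, provided the checkout
# actually contains io_funcs/__init__.py and io_funcs/kaldi_io.py.
```

Alternatively, exporting `PYTHONPATH=.` before running run.sh achieves the same thing without editing the script.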

more effective contact

Hello, I'm interested in speech separation and have been following your related work. Would it be convenient to add you on WeChat?

list index out of range

Hi, I have created the dataset and extracted the features, but when I execute run_lstm.py it shows this error:

Traceback (most recent call last):
  File "run_lstm.py", line 454, in
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\platform\app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "C:\ProgramData\Anaconda3\lib\site-packages\absl\app.py", line 299, in run
    _run_main(main, args)
  File "C:\ProgramData\Anaconda3\lib\site-packages\absl\app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "run_lstm.py", line 295, in main
    train()
  File "run_lstm.py", line 166, in train
    tr_tfrecords_lst, tr_num_batches = read_list_file("tr_tf", FLAGS.batch_size)
  File "run_lstm.py", line 40, in read_list_file
    utt_id = line.strip().split()[0]
IndexError: list index out of range

how can I resolve this error?
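This IndexError usually means the list file read by read_list_file contains a blank (or whitespace-only) line, so `line.strip().split()` returns an empty list and indexing `[0]` fails. A defensive sketch of the parsing loop; the function name mirrors the helper in run_lstm.py, but the signature here is simplified and the skip-blank logic is my addition:

```python
def read_list_file(path):
    """Read a Kaldi-style list file, skipping blank lines.

    Each non-empty line is expected to start with an utterance id.
    """
    utt_ids = []
    with open(path) as f:
        for line in f:
            fields = line.strip().split()
            if not fields:   # blank line -> [0] would raise IndexError
                continue
            utt_ids.append(fields[0])
    return utt_ids
```

Checking the generated tr_tf list for trailing empty lines (a very common artifact of shell redirection) should also resolve it without code changes.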

Preprocessing of Dataset to feed into LSTM

Can you please explain the procedure, or the different steps, for preprocessing the data before feeding it to the LSTM? I am working on the paper by Zhuo Chen, "Speaker-Independent Speech Separation With Deep Attractor Network", but I am not able to create batches because each audio file has a different number of frames. So how do you handle variable-length input to the LSTM? I know techniques like sequence padding, but I don't think that would be effective when the difference in the number of frames is large.
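One common approach (not specific to this repo) is to sort utterances by length and pad only within each minibatch, so the padding overhead stays small even when utterance lengths vary widely; a per-utterance length vector then lets the LSTM's sequence-length masking ignore the padded frames in the loss. A minimal numpy sketch with made-up feature matrices:

```python
import numpy as np

def make_batches(feats, batch_size):
    """Bucket variable-length utterances: sort by length, pad per batch.

    feats: list of (frames, feat_dim) arrays. Returns a list of
    (padded_batch, lengths) pairs; `lengths` is what you would feed to an
    LSTM's sequence_length argument so padded frames are masked out.
    """
    order = sorted(range(len(feats)), key=lambda i: feats[i].shape[0])
    batches = []
    for start in range(0, len(order), batch_size):
        group = [feats[i] for i in order[start:start + batch_size]]
        lengths = np.array([f.shape[0] for f in group])
        max_len = lengths.max()
        padded = np.zeros((len(group), max_len, group[0].shape[1]))
        for j, f in enumerate(group):
            padded[j, :f.shape[0]] = f  # zero-pad the tail
        batches.append((padded, lengths))
    return batches
```

Because similar-length utterances land in the same bucket, each batch is padded only to the longest member of that batch, not to the longest utterance in the corpus.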

Dataset structure

Hi,
Can you explain what the actual structure of the dataset should be? I'm not sure I got it.
Is it like in the MATLAB file you attached, i.e. cv/tr/tt folders, each containing mixed/s1/s2 folders?
Second, which file should I run? Where is the run.sh file located? Does it include the tfrecord process, i.e. does it take wav files and perform the processing inside? (Sorry if these are elementary questions.)
Thanks

Keras

Have you tried it in Keras? It is a little difficult for me to understand the TensorFlow code.
(Sorry to bother you.)

How to set the weight of each wav in the mix_2_spk_tr.txt, mix_2_spk_cv.txt...

I am a newbie. Since I don't have the wsj0 dataset, I have to create mixtures from my own dataset.
In mix_2_spk_tr/cv/tt.txt, each wav is assigned a weight, like this:
wsj0/si_tr_s/01t/01to030v.wav 0.76421 wsj0/si_tr_s/20g/20ga010m.wav -0.76421
wsj0/si_tr_s/40e/40ec020o.wav 1.3218 wsj0/si_tr_s/20c/20ca010n.wav -1.3218
....
I have two questions about these files:

  1. Are these files randomly selected?
  2. Are these weights randomly set?

Thanks!
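For reference, in the MERL wsj0-2mix recipe the file pairs are sampled randomly, and the number next to each path is (to my understanding) a per-speaker gain in dB; the two values on a line sum to zero, so each pair realizes a randomly drawn signal-to-signal ratio between the speakers. A hedged sketch of how such a dB weight would be applied when mixing, assuming two equal-length, level-normalized signals:

```python
import numpy as np

def mix_at_snr(s1, s2, db1, db2):
    """Mix two level-normalized signals with per-speaker gains in dB.

    db1/db2 mirror the two numbers on each line of mix_2_spk_tr.txt
    (e.g. 0.76421 and -0.76421); a dB gain is applied as 10**(db/20).
    """
    g1 = 10.0 ** (db1 / 20.0)
    g2 = 10.0 ** (db2 / 20.0)
    mix = g1 * s1 + g2 * s2
    # return the mix plus the scaled references used as training targets
    return mix, g1 * s1, g2 * s2
```

The exact normalization (the original create_wav_2speakers.m uses activlev from the voicebox toolbox before scaling) matters for reproducing published numbers, so treat this as an illustration of the weight semantics only.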

run.sh

Sorry to bother you again. Is your run.sh similar to snsun's?
python -u local/gen_tfreords.py --gender_list local/wsj0-train-spkrinfo.txt data/wav/wav8k/min/$x/ lists/${x}_wav.lst data/tfrecords/${x}_psm/ &
This is the code in snsun's run.sh, but I don't know where the _wav.lst file comes from. Could you please tell me about it or send me your run.sh? Thank you very much. My email: [email protected]
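For what it's worth, a `${x}_wav.lst` file in recipes like this is typically just a listing of the mixture wav files for one split, generated from the wav directory itself. A small hypothetical generator, assuming one path per line (snsun's scripts may instead expect an utterance-id column, so adjust the line layout to whatever gen_tfrecords expects):

```python
import os

def write_wav_list(wav_dir, out_path):
    """Write every .wav found under wav_dir to out_path, one path per line.

    This is a guess at the .lst format; verify against how the tfrecords
    generator script actually parses each line before relying on it.
    """
    with open(out_path, "w") as out:
        for root, _, files in os.walk(wav_dir):
            for name in sorted(files):
                if name.endswith(".wav"):
                    out.write(os.path.join(root, name) + "\n")
```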

PESQ Problem

Hey, when calling the pesq function I ran into this problem:
"This P-code file E:\Matlab R2016a Workspace\composite\pesq.p was generated before MATLAB 7.5 (R2007b) and is no longer supported. Please regenerate the file with pcode in MATLAB R2007b or later."

But the author at that link did not provide the source code. How did you solve this? Could you send me the related code? ([email protected])

dataset

Hello my friend, I have two important questions.
Finally I could run your amazing code. As far as I know, to do that we need 4 kinds of lists:

1) dataset lists (mix_2_spk_tr etc.)
2) gender lists
3) wav lists
4) tfrecords lists

For a small scale, just to run the code, I generated these lists by hand and with the man_wav_list.py script, but here are my two big problems:

1. How can I produce the above lists, especially the dataset lists, by script? Do you have any script to do that?

2. In mix_2_spk_tr we have quite a lot of lines mixing different wav files at different SNRs to generate the mixed training dataset. My question is: does the mixing code automatically convert the wav files to the target SNRs, or do we have to do that beforehand to make the list? For example, the first line of mix_2_spk_tr is:
/home/disk1/snsun/Workspace/tensorflow/kaldi/data/wsj0/tr/40na010x.wav 1.9857 /home/disk1/snsun/Workspace/tensorflow/kaldi/data/wsj0/tr/01xo031a.wav -1.9857

Does the script create_wav_2speakers.m automatically produce wavs at these SNRs (1.9857 and -1.9857) and then mix them, or do we have to produce such wavs first and then run that script to make the dataset?

python package

May I ask whether io_funcs, local, and model are packages you wrote yourself? If so, how should I download them?
I noticed the zip file already contains several of the files from those packages, but I could not find the kaldi_io file.

Using VCTK-dataset

How can I use the VCTK dataset to train the model? Should I alter the structure of the VCTK dataset downloaded from the original webpage? Thanks for your reply.

Using the trained model

I've trained the model using the VCTK dataset and decoded it. How do I use the trained model to separate a mixed audio file? Do we need to write new code or tweak the existing code? Can you provide pointers for this? TIA.
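At inference time, mask-based separation generally means: STFT the mixture, run the network to get one magnitude mask per speaker, multiply each mask by the mixture magnitude, and inverse-STFT using the mixture phase. A model-agnostic sketch with scipy; `model` here is a hypothetical wrapper around the trained LSTM, not an API from this repo:

```python
import numpy as np
from scipy.signal import stft, istft

def separate(mixture, model, fs=8000, nperseg=256):
    """Separate a 2-speaker mixture with a trained mask estimator.

    `model(mag) -> (mask1, mask2)` is an assumed interface: it takes the
    mixture magnitude spectrogram and returns one mask per speaker,
    each with the same shape as the spectrogram.
    """
    _, _, spec = stft(mixture, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(spec), np.angle(spec)
    mask1, mask2 = model(mag)
    sources = []
    for mask in (mask1, mask2):
        est = (mask * mag) * np.exp(1j * phase)  # reuse the mixture phase
        _, wav = istft(est, fs=fs, nperseg=nperseg)
        sources.append(wav)
    return sources
```

The real work is wrapping the trained checkpoint so it produces the masks; the STFT framing (window, hop, FFT size) must match whatever the feature-extraction step used during training.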

Model suitability

Hello, can this model be used to separate Chinese speech? Do we need to change the dataset? Thanks!

data path set

Hello, I have problems when I try to train the model. Could you tell me the exact meanings of the following variables, and what they mean? Thanks! I am looking forward to your reply!
[screenshots of the variables attached in the original issue]

uPIT

Hello. I have a blind spot about uPIT that I've never been able to clear up and would like to ask you about. uPIT operates on whole utterances, so does it feed one entire utterance into the network as a single training sample? Or is my understanding wrong? Thank you.
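To the best of my understanding, yes: utterance-level PIT computes the loss over the whole utterance for every speaker permutation and backpropagates only the minimum, so one training sample is one full utterance (or a fixed-length segment of one), with the output-to-speaker assignment held constant across all its frames. A numpy sketch of the permutation-invariant loss:

```python
import itertools
import numpy as np

def upit_loss(estimates, targets):
    """Utterance-level PIT loss: MSE over the whole utterance,
    minimized over speaker permutations.

    estimates/targets: arrays of shape (num_spk, frames, feat_dim).
    The permutation is chosen once per utterance, not per frame.
    """
    num_spk = estimates.shape[0]
    best = np.inf
    for perm in itertools.permutations(range(num_spk)):
        # total per-speaker MSE under this output-to-target assignment
        loss = sum(np.mean((estimates[i] - targets[p]) ** 2)
                   for i, p in enumerate(perm))
        best = min(best, loss)
    return best
```

This is what distinguishes uPIT from frame-level PIT, where the permutation may flip from frame to frame and cause speaker-tracking ambiguity.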

wv version of wsj0

Hi, I have the .wv version of wsj0. How can I convert this format to .wav without any information loss? As you know, .wv is a specific compressed format and we have to convert it to plain .wav. Do you have any suggestions?
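WSJ0's .wv1/.wv2 files are NIST SPHERE files with shorten compression, and the standard lossless converter is LDC's sph2pipe tool (`sph2pipe -f wav in.wv1 out.wav`); Kaldi's WSJ recipes use the same tool. A small wrapper, assuming sph2pipe is installed and on your PATH:

```python
import os
import subprocess

def sph2pipe_cmd(src, dst):
    """Build the sph2pipe command for one file ('-f wav' selects RIFF output)."""
    return ["sph2pipe", "-f", "wav", src, dst]

def convert_dir(in_dir, out_dir):
    """Losslessly convert every .wv1 file under in_dir to .wav in out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for name in sorted(os.listdir(in_dir)):
        if name.endswith(".wv1"):
            dst = os.path.join(out_dir, name[:-4] + ".wav")
            subprocess.run(sph2pipe_cmd(os.path.join(in_dir, name), dst),
                           check=True)
```

Because shorten is a lossless codec, the decoded samples are bit-identical to the originals; avoid generic audio converters that resample or requantize.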
