khaotik / danet-tensorflow
Tensorflow implementation of "Speaker-independent Speech Separation with Deep Attractor Network"
License: MIT License
We are using this implementation to train a model on our custom dataset with the default network configuration, but the training loss stays constant even after 1600 epochs.
Could you provide download links for the TIMIT and WSJ0 datasets? Thank you!
Hello, sorry to bother you. I trained on my TIMIT dataset and found that the trained model could not separate mixed speech. Do you have a trained model for reference? Thank you very much!
Thank you for your nicely implemented DaNet!
However, I ran into a couple of questions when testing your code. Would you please kindly help me figure them out?
After installing the TIMIT dataset, I ran the timit_1.sh script, but the result on the demo drawn from the test set did not seem very good. The model I used is anchor and bilstm-orig. So I guess timit_1.sh is not meant to be used with such settings?
When you read raw files from TIMIT using scipy.io.wavfile, the format is 16-bit PCM. If you cast the data to float, do some processing, and then write the WAV back, scipy.io.wavfile treats it as 32-bit floating point, and most of the sample values end up far outside the expected range (magnitude bigger than one). There will also be a mismatch between training and testing when WAV files in other formats are used, for the same reason (wavefile.read also has such problems). I am not sure whether the problem depends on the scipy version; I have tested on both scipy 0.9 and 1.0.
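One way to avoid the range mismatch described above is to scale explicitly when converting between 16-bit PCM and float. The following is only a minimal sketch using NumPy; the helper names `pcm16_to_float` and `float_to_pcm16` are hypothetical and are not part of this repository or of scipy.

```python
import numpy as np

def pcm16_to_float(samples):
    """Map int16 PCM samples to float32 in [-1.0, 1.0). Hypothetical helper."""
    return samples.astype(np.float32) / 32768.0

def float_to_pcm16(samples):
    """Clip to [-1.0, 1.0] and map back to int16 PCM. Hypothetical helper."""
    clipped = np.clip(samples, -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)

# Stand-in for samples as they would come back from reading a 16-bit PCM file.
raw = np.array([0, 16384, -32768, 32767], dtype=np.int16)

as_float = pcm16_to_float(raw)       # processing would happen on this array
restored = float_to_pcm16(as_float)  # int16 again, suitable for a PCM WAV
```

Writing `restored` (int16) rather than the float array should produce a 16-bit PCM file again, and applying the same scaling at training and test time keeps the two consistent.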
As far as I can see, your implementation differs from the original paper in the following ways:
Your input data to the encoder have a variable number of time steps, which depends on the length of the raw signals. The original paper uses chunks of 100 frames, much shorter than the typical input lengths in your implementation, which might help the LSTM remember things better.
Your data generator may mix signals from the same speaker, which could undermine the network's ability to separate the signals based on the speakers' tones.
The embedding encoder in the original paper applies a tanh activation function before outputting the embedding vectors, whereas your implementation uses a linear activation.
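The three points above could be tried out roughly as follows. This is a sketch under assumptions, not the repository's code: the function names, the `[time, freq]` array layout, and the choice to drop the trailing remainder are all hypothetical.

```python
import random
import numpy as np

def split_into_chunks(spectrogram, chunk_len=100):
    """Split a [time, freq] array into non-overlapping chunks of chunk_len
    frames, dropping any trailing remainder. Hypothetical helper."""
    n_chunks = spectrogram.shape[0] // chunk_len
    trimmed = spectrogram[:n_chunks * chunk_len]
    return trimmed.reshape(n_chunks, chunk_len, spectrogram.shape[1])

def pick_distinct_speakers(speaker_ids, n_speakers=2, rng=random):
    """Sample n_speakers different speaker IDs, so that a mixture never
    combines two utterances from the same speaker. Hypothetical helper."""
    return rng.sample(sorted(set(speaker_ids)), n_speakers)

def bounded_embedding(linear_output):
    """Apply tanh so that every embedding value lies in (-1, 1)."""
    return np.tanh(linear_output)

# Example with made-up shapes: 250 frames, 129 frequency bins.
chunks = split_into_chunks(np.zeros((250, 129)))
speakers = pick_distinct_speakers(["a", "b", "a", "c"])
embedding = bounded_embedding(np.array([-5.0, 0.0, 5.0]))
```

In a TensorFlow model the tanh would be set as the activation of the embedding layer instead of being applied with NumPy; the version here only illustrates the effect on the value range.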
Your implementation really helps me a lot. Looking forward to your reply!!!
Thanks very much for your code! It really helps me a lot. But when I use my own dataset, my batch loss does not decrease, and the demo cannot separate the speech at all.
I hope you can reply. Thanks very much for your time!
I tried with TIMIT and with my own dataset, and the code runs fine (TIMIT needs some modifications on OSX). So far there is no convincing result after a hundred epochs. What validation SNR value should be reached to get good results?
Hello, it's very nice to see such a good program. I am a beginner and I am trying to reproduce the experiments in this paper, too. But I ran into some problems when running your program. I want to start with the TIMIT dataset, but when I follow the steps in the README, the TIMIT dataset does not get installed at all. I don't know which step I got wrong. Could you please help me? If possible, could you give me an e-mail address? Thank you so much!
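For reference when comparing numbers, a common definition of SNR in dB between a clean reference and a separated estimate is sketched below. This is an assumption about the metric: the function `snr_db` is hypothetical, and the SNR reported by this repository during validation may be computed differently.

```python
import numpy as np

def snr_db(reference, estimate, eps=1e-10):
    """Signal-to-noise ratio in dB, treating (reference - estimate) as noise.
    Hypothetical helper; eps guards against division by zero."""
    noise = reference - estimate
    signal_power = np.sum(reference ** 2) + eps
    noise_power = np.sum(noise ** 2) + eps
    return 10.0 * np.log10(signal_power / noise_power)

# An estimate that is off by 10% in amplitude gives roughly 20 dB.
reference = np.ones(4)
estimate = 0.9 * reference
value = snr_db(reference, estimate)
```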