yongxuustc / dnn-for-speech-enhancement
DNN-for-speech-enhancement
Dear Dr. Xu:
Could you share the source code of "Wav2LogSpec.exe"? It would help us a lot to understand how you convert the .raw file to a .lsp file. If possible, could you also share the source code of "LogSpec2raw_16bit_withoutXF.exe" used in the decoding process?
Any help from you would be greatly appreciated!
Hi Yong,
When I try to run the .exe, I get this error: "The code execution cannot proceed because mclmcrt710.dll was not found."
OS: Win 10 Pro
If I want to train and test on 44.1 kHz WAV files, how should I modify the code? It works for 16 kHz, but I am interested in higher sampling rates for my applications. Thanks!
I cannot download zip2.
When I run step1_DNNenh_for16kHz.exe, it reports an error. My OS is Windows 7.
Hi Dr.Xu,
Would it be possible for you to share the pretrained model with me?
Hi Dr. Xu, I ran into a problem when using the training demo. Could you please give me some help?
Following "README.md", I used the Quicknet tools to generate the pfiles "fea.pfile" and "targ.pfile" and the norm file "fea.norm". I used only two pairs of WAV files for the test, like this:
feacalc sa1.wav sa2.wav -output targ.pfile
feacalc sa1_car_snr1.wav sa2_car_snr1.wav -output fea.pfile
pfile_norm -i fea.pfile -o fea.norm
Then I used these three files to configure "finetune_DNN_speech_enhancement_dropout_NAT_linux.pl" and ran it on Linux. However, every log file shows the message "pfile tail is Not correct.".
Did I prepare the pfiles the wrong way? How should I prepare them? I also don't quite understand the tips in "how_to_get_pfile.txt", which say that two files, ".len" and ".scp", should be prepared first. What exactly is the format of the pfiles? Could you please give a small example?
I would be very grateful.
Hi Dr.Xu,
I am reimplementing your code in Python, using random initialization for the layers. I have also read your papers. From them I found that training runs for 50 epochs, with a learning rate of 0.1 for the first 10 epochs that is then decreased by 10% after each epoch.
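For reference, the schedule described above could be sketched as follows. This is only one reading of "decreased by 10% after each epoch" (multiply the rate by 0.9 per epoch once the first 10 epochs are done); the function name and parameters are hypothetical and are not taken from the repository.

```python
def learning_rate(epoch, base_lr=0.1, hold_epochs=10, decay=0.9):
    """Learning rate for a 0-based epoch index.

    Assumption: the rate stays at base_lr for the first hold_epochs epochs,
    then is multiplied by `decay` (0.9 = "10% lower") after each epoch.
    """
    if epoch < hold_epochs:
        return base_lr
    return base_lr * decay ** (epoch - hold_epochs + 1)


# Rates for a 50-epoch run, as mentioned in the post.
schedule = [learning_rate(e) for e in range(50)]
```

If the papers mean something else by the 10% decrease (for example a linear step), only the last line of `learning_rate` would need to change.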
Hi Dr.Xu,
How do I generate the corresponding my_own_noisy_speech.lsp file from my own noisy speech data for testing?
Hi Dr. Xu,
Considering the HTK file format, I think the code in "le2be_for_all_files_func.m" should be modified like this:
function []=le2be_for_all_files_func(infile, outfile)
% infile='clean_FBI_22123A.08';
% outfile='clean_FBI_22123A.08_be';
fn = fopen(infile, 'r','ieee-le');
fid = fopen(outfile,'wb','ieee-be');
Y = fread(fn, 2, 'long');
fwrite(fid,Y,'long');
Y = fread(fn, 2, 'short');
fwrite(fid,Y,'short');
Y = fread(fn, inf, 'float');
fwrite(fid,Y,'float');
fclose(fn); %%% Close the current file handle; otherwise an error about too many open files eventually appears
fclose(fid);
end
Following the steps you provided, I can generate the .pfile now.
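For anyone working in Python, the same byte swap could be sketched as below. It assumes the standard 12-byte HTK header (two 32-bit integers followed by two 16-bit integers) and a payload of 32-bit floats, mirroring the MATLAB function above; the function name `htk_le_to_be` is hypothetical.

```python
import struct


def htk_le_to_be(infile, outfile):
    """Rewrite a little-endian HTK feature file as big-endian.

    Assumes a 12-byte header (nSamples, sampPeriod as int32;
    sampSize, parmKind as int16) followed by float32 data.
    """
    with open(infile, "rb") as f:
        raw = f.read()
    if len(raw) < 12:
        raise ValueError("file too short for a 12-byte HTK header")

    header = struct.unpack("<iihh", raw[:12])
    payload = raw[12:]
    n_floats = len(payload) // 4  # ignore any trailing partial value
    values = struct.unpack("<%df" % n_floats, payload[:4 * n_floats])

    # The with-block closes the file handle even if writing fails.
    with open(outfile, "wb") as f:
        f.write(struct.pack(">iihh", *header))
        f.write(struct.pack(">%df" % n_floats, *values))
```

Files holding compressed or non-float HTK parameter kinds would need different handling; this sketch does not cover them.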
Hi Dr. Xu,
Your paper mentions that the input and target files should be normalized to zero mean and unit variance.
I now use the qnnorm command to prepare the ".fea_norm" file required by the .pl file. Is that right?
However, in the Interface.cc file, I found that you use the following statements to normalize the input file:
dataori[2+j +i*(para->fea_dim +2)] -= mean[j];
dataori[2+j +i*(para->fea_dim +2)] *= dVar[j];
and the following statements to normalize the target file:
targori[2+j +i*(para->layersizes[numlayers-1] +2)] -= 0;
targori[2+j +i*(para->layersizes[numlayers-1] +2)] *= 1;
It seems the target file is not actually normalized by these two statements.
Could you tell me how to normalize the target file?
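To make the observation concrete: subtracting 0 and multiplying by 1 leaves the targets unchanged, while zero-mean, unit-variance normalization would need per-dimension statistics. The sketch below illustrates both cases in plain Python. All names are hypothetical, and it assumes the scale factor is a reciprocal standard deviation, which the source does not confirm.

```python
import math


def mean_and_inv_std(frames):
    """Per-dimension mean and 1/std over a list of equal-length frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(dim)]
    var = [sum((f[j] - mean[j]) ** 2 for f in frames) / n for j in range(dim)]
    # Guard against constant dimensions (zero variance).
    inv_std = [1.0 / math.sqrt(v) if v > 0 else 1.0 for v in var]
    return mean, inv_std


def normalize(frames, mean, inv_std):
    """Apply (x - mean) * inv_std to every dimension of every frame."""
    return [[(x - m) * s for x, m, s in zip(f, mean, inv_std)] for f in frames]


targets = [[1.0, 10.0], [3.0, 30.0]]

# Equivalent of "-= 0; *= 1": the targets come back unchanged.
identity = normalize(targets, [0.0, 0.0], [1.0, 1.0])

# Zero-mean, unit-variance targets computed from the data itself.
mean, inv_std = mean_and_inv_std(targets)
normalized = normalize(targets, mean, inv_std)
```

If the targets are normalized this way for training, the same mean and scale would have to be undone on the network output at decoding time.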
I checked the makefile for this project and found that CUDA is a dependency. I don't have a GPU, only a CPU. Is there a more general version of this project?
Dear Dr. Xu:
I am training your model on my own data. However, framesBeforeSent[] does not seem to hold the correct numbers. In my understanding, each entry of framesBeforeSent[] should be the number of frames before the corresponding sentence and should agree with the running sum of the numbers in the lens file. However, I get huge values such as 1101260349 in framesBeforeSent[] even though I have only 6379 frames. Because of that, the program runs into an endless loop in:
while(cur_chunk_frames >= para->traincache ){
next_st = cur_frame_id -(cur_chunk_frames - para->traincache);
if(next_st < total_frames){
chunk_frame_st[count_chunk] = next_st;
count_chunk++;
cur_chunk_frames = (cur_frame_id - next_st > para->fea_context -1)?(cur_frame_id - next_st - para->fea_context +1):0;
}
The reason I believe framesBeforeSent was misread is that it causes the error "tails in target pfile and data pfile is not consistent". This error appears when I train with my noisy.pfile and clean.pfile. However, when I inspect my noisy.pfile and clean.pfile, the tails are the same.
I tried to fix this bug by changing how the offset is calculated in read_tail(fp_data, offset, total_sents, framesBeforeSent), but I failed because I am not very familiar with the data structure of the pfile. Could you help fix this bug?
Any help from you would be greatly appreciated!!
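A small diagnostic sketch may help narrow this down. It computes the offsets expected under the reading given in the post (frames preceding each sentence, as a running sum of the lens values) and reinterprets a suspicious integer as a 32-bit float. The helper names are hypothetical. That the large value comes from feature data being read at the wrong offset is only one possible explanation, not something the source confirms.

```python
import struct
from itertools import accumulate


def frames_before_sent(lens):
    """Frames preceding each sentence: [0, l0, l0 + l1, ...]."""
    return list(accumulate([0] + list(lens)))[:-1]


def int32_as_float32(value):
    """Reinterpret the bits of a 32-bit integer as an IEEE-754 float."""
    return struct.unpack("<f", struct.pack("<i", value))[0]


# Hypothetical sentence lengths, standing in for the contents of a lens file.
expected = frames_before_sent([3, 2, 4])

# The value reported in the post, viewed as a float instead of an int.
as_float = int32_as_float32(1101260349)
```

Here `as_float` comes out at roughly 20.49, a plausible magnitude for a feature value. That would be consistent with the tail being read from inside the feature data, for example because of a wrong offset, but it does not prove it.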
'perl' is not recognized as an internal or external command,
operable program or batch file.
Error using fread
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in readhtk_new (line 12)
nframes = fread(fid,1,'int32');
Error in step1_DNNenh_for16kHz (line 61)
[htkdata,nframes,sampPeriod,sampSize,paramKind]=readhtk_new(tline,'le');
I cleared the wav_lsp folder, kept only the timit_noisy_SNR5.wav file for testing, and then got the error above. Please help. Thanks.
Hi Dr.Xu,
In the step1_DNNenh_for16kHz.m file, I found that you use the ReLU function for decoding; however, I cannot find a ReLU function in the training code. If I want to use ReLU as the activation function, should I add it myself?
Hi Dr Xu,
I now use two wave files, "clean1.wav" and "clean2.wav", plus a pink noise file "pink.wav", to synthesize two training files, "train1.wav" and "train2.wav". Then I follow the steps you provided to generate "train.pfile" and "clean.pfile", and use pfile_norm -i train.pfile -o train.norm to generate the .norm file.
Then I used these three files to configure "finetune_DNN_speech_enhancement_dropout_NAT_linux.pl" and ran it on Linux. However, every log file shows the error message "tails in target pfile and data pfile is not consistent.".
Could you quickly point me in the right direction to fix this error?
Your paper mentions that "The type of the hidden units is sigmoid, and the output unit is linear." So I think I should change the following code in the BP_GPU.cu file
else{
DevSigmoid(streams[0],cur_layer_size, cur_layer_x, cur_layer_y);
}
to
else{
DevLinearOutCopy(streams[0],n_frames, cur_layer_units, cur_layer_x, cur_layer_y);
cudaMemcpy(dev[0].out,cur_layer_y,n_frames*cur_layer_units*sizeof(float),cudaMemcpyDeviceToDevice);
}
Am I right?
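As a language-neutral illustration of the architecture quoted from the paper (sigmoid hidden units, linear output unit), a forward pass could be sketched in plain Python as below. This is not the repository's CUDA code; the function names and the tiny example network are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, x):
    """Compute W @ x + b for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward(layers, x):
    """Sigmoid on every hidden layer, identity (linear) on the last layer."""
    last = len(layers) - 1
    for idx, (weights, bias) in enumerate(layers):
        x = affine(weights, bias, x)
        if idx != last:
            x = [sigmoid(v) for v in x]
    return x


# Tiny hypothetical network: 2 inputs -> 2 sigmoid units -> 1 linear output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[2.0, -3.0]], [1.0]),
]
output = forward(layers, [0.0, 0.0])
```

Because the last layer skips the sigmoid, the output is not confined to (0, 1), which is what a regression target such as a normalized log-spectrum would require.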