seanwood / gcc-nmf
Real-time GCC-NMF Blind Speech Separation and Enhancement
License: MIT License
Mentioned in discussion of #2.
Hi,
Did you measure the CPU and memory usage of this application?
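There is no measurement in this thread to quote. However, the configuration printed further down (sampleRate 44100, blockSize 512, numChannels 2) implies a real-time budget of 512 / 44100 ≈ 11.6 ms per block. One rough check is to time a block-processing function against that budget. This is a minimal sketch that assumes the processing step can be called as a function, and `process_block` is a hypothetical name. It has some limits: `time.process_time` counts only the current process's CPU time, so it misses the separate processes the real-time app starts and any GPU work. Likewise, `tracemalloc` only sees Python and NumPy allocations made while tracing is active.

```python
import time
import tracemalloc

import numpy as np


def profile_blocks(process_block, num_blocks=200, block_size=512,
                   num_channels=2, sample_rate=44100):
    """Time a (hypothetical) block-processing callable and record peak traced memory.

    Returns (mean CPU seconds per block, real-time budget per block in seconds,
    peak bytes allocated while processing, as seen by tracemalloc).
    """
    rng = np.random.default_rng(0)
    # Synthetic input, allocated before tracing starts so it is not counted
    blocks = rng.standard_normal((num_blocks, num_channels, block_size)).astype(np.float32)

    tracemalloc.start()
    start = time.process_time()  # CPU time of this process only
    for block in blocks:
        process_block(block)
    cpu_per_block = (time.process_time() - start) / num_blocks
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    budget_per_block = block_size / sample_rate
    return cpu_per_block, budget_per_block, peak_bytes


if __name__ == '__main__':
    # Placeholder processing step: a per-channel FFT
    cpu, budget, peak = profile_blocks(lambda b: np.fft.rfft(b, axis=1))
    print(f'CPU {cpu * 1e3:.3f} ms/block vs budget {budget * 1e3:.2f} ms/block, '
          f'real-time factor {cpu / budget:.3f}, peak {peak / 1024:.1f} KiB')
```

A real-time factor well below 1 leaves headroom. Values near or above 1 would mean blocks cannot be processed as fast as they arrive.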
I am trying to get the real-time GCC-NMF demo working. The data paths are correct and the script created the pretrained files, but the following error appears after starting the script:
C:\gccNMF>python demo5.py
INFO:root:GCCNMFConfig: loading configuration params...
INFO:root:TDOA
INFO:root: targetTDOAEpsilon: 5.0
INFO:root: targetTDOANoiseFloor: 0.0
INFO:root: numSpectrogramHistory: 128
INFO:root: microphoneSeparationInMetres: 0.1
INFO:root: numTDOAs: 64
INFO:root: numTDOAHistory: 128
INFO:root: targetTDOABeta: 2.0
INFO:root: gccPHATNLAlpha: 2.0
INFO:root: gccPHATNLEnabled: False
INFO:root:NMF
INFO:root: dictionarySize: 64
INFO:root: dictionaryType: Pretrained
INFO:root: numHUpdates: 0
INFO:root: dictionarySizes: [64, 128, 256, 512, 1024]
INFO:root:Audio
INFO:root: deviceIndex: None
INFO:root: sampleRate: 44100
INFO:root: numChannels: 2
INFO:root:STFT
INFO:root: blockSize: 512
INFO:root: windowSize: 1024
INFO:root: hopSize: 512
INFO:root:GCCNMFPretraining: Loading pretrained W (size 64): ./pretrainedW\W_64.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 128): ./pretrainedW\W_128.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 256): ./pretrainedW\W_256.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 512): ./pretrainedW\W_512.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 1024): ./pretrainedW\W_1024.npy
INFO:root:RealtimeGCCNMF: Starting with audio path: ./test.wav
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
WARNING:theano.sandbox.cuda:The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
Using gpu device 0: GeForce GTX 860M (CNMeM is enabled with initial size: 70.0% of memory, cuDNN 5005)
INFO:root:GCCNMFConfig: loading configuration params...
INFO:root:TDOA
INFO:root: targetTDOAEpsilon: 5.0
INFO:root: targetTDOANoiseFloor: 0.0
INFO:root: numSpectrogramHistory: 128
INFO:root: microphoneSeparationInMetres: 0.1
INFO:root: numTDOAs: 64
INFO:root: numTDOAHistory: 128
INFO:root: targetTDOABeta: 2.0
INFO:root: gccPHATNLAlpha: 2.0
INFO:root: gccPHATNLEnabled: False
INFO:root:NMF
INFO:root: dictionarySize: 64
INFO:root: dictionaryType: Pretrained
INFO:root: numHUpdates: 0
INFO:root: dictionarySizes: [64, 128, 256, 512, 1024]
INFO:root:Audio
INFO:root: deviceIndex: None
INFO:root: sampleRate: 44100
INFO:root: numChannels: 2
INFO:root:STFT
INFO:root: blockSize: 512
INFO:root: windowSize: 1024
INFO:root: hopSize: 512
INFO:root:GCCNMFPretraining: Loading pretrained W (size 64): ./pretrainedW\W_64.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 128): ./pretrainedW\W_128.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 256): ./pretrainedW\W_256.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 512): ./pretrainedW\W_512.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 1024): ./pretrainedW\W_1024.npy
INFO:root:RealtimeGCCNMF: Starting with audio path: ./test.wav
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
WARNING:theano.sandbox.cuda:The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
Using gpu device 0: GeForce GTX 860M (CNMeM is enabled with initial size: 70.0% of memory, cuDNN 5005)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python27\lib\multiprocessing\forking.py", line 380, in main
    prepare(preparation_data)
  File "C:\Python27\lib\multiprocessing\forking.py", line 509, in prepare
    '__parents_main__', file, path_name, etc
  File "C:\gccNMF\demo5.py", line 5, in <module>
    RealtimeGCCNMF()
  File "C:\gccNMF\runRealtimeGCCNMF.py", line 50, in __init__
    self.initProcesses(params)
  File "C:\gccNMF\runRealtimeGCCNMF.py", line 91, in initProcesses
    self.audioProcess.start()
  File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Python27\lib\multiprocessing\forking.py", line 258, in __init__
    cmd = get_command_line() + [rhandle]
  File "C:\Python27\lib\multiprocessing\forking.py", line 358, in get_command_line
    is not going to be frozen to produce a Windows executable.''')
RuntimeError:
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.
            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:
                if __name__ == '__main__':
                    freeze_support()
                    ...
            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce a Windows executable.
Traceback (most recent call last):
  File "demo5.py", line 5, in <module>
    RealtimeGCCNMF()
  File "C:\gccNMF\runRealtimeGCCNMF.py", line 50, in __init__
    self.initProcesses(params)
  File "C:\gccNMF\runRealtimeGCCNMF.py", line 91, in initProcesses
    self.audioProcess.start()
  File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Python27\lib\multiprocessing\forking.py", line 277, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Python27\lib\multiprocessing\forking.py", line 199, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 568, in save_tuple
    save(element)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 492, in save_string
    self.write(BINSTRING + pack("<i", n) + obj)
IOError: [Errno 22] Invalid argument
What is wrong? Thanks!
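The RuntimeError says a child process was started while the main module was still being imported. On Windows, multiprocessing starts a child by re-importing the main script. As a result, the top-level `RealtimeGCCNMF()` call on line 5 of demo5.py runs again inside the child, which is likely why the configuration log appears twice. Below is a minimal sketch of the idiom the error message asks for. It uses a hypothetical `audio_worker` in place of the real audio process. Applied to demo5.py, the `RealtimeGCCNMF()` call would move under the `if __name__ == '__main__':` guard. Whether this also resolves the later `IOError` raised while pickling the process object is not confirmed here.

```python
import multiprocessing


def audio_worker(queue):
    """Hypothetical stand-in for the audio process that RealtimeGCCNMF starts."""
    queue.put('audio process started')


def main():
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=audio_worker, args=(queue,))
    process.start()  # on Windows the child re-imports this module first
    message = queue.get()
    process.join()
    return message


if __name__ == '__main__':
    # Code under this guard does not run again when the child re-imports the module
    multiprocessing.freeze_support()  # only matters for frozen Windows executables
    print(main())
```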
Hi,
Is there a reason why the synthesis window is not applied?
See also the attached sketch based on the lowLatencySpeechEnhancement.ipynb example.
(Attached plots: results with fix = False and with fix = True.)
from matplotlib.pyplot import *
from numpy import *
from numpy.fft import rfft, irfft

# Apply the fix
fix = False

# Preprocessing params
fftSize = 1024

# Asymmetric windowing params
analysisWindowSize = fftSize
synthesisWindowSize = 128
asymmetricHopSize = synthesisWindowSize // 4 if fix else (synthesisWindowSize * 3) // 4
m = synthesisWindowSize // 2
k = analysisWindowSize
d = 0

# Symmetric windowing params
symmetricWindowSize = fftSize
symmetricHopSize = asymmetricHopSize  # to better compare results

# Generate test signal (a constant, single-channel signal)
stereoSamples = ones((1, fftSize*10))
numChannels, numSamples = stereoSamples.shape

def getAsymmetricAnalysisWindow(k, m, d):
    risingSqrtHann = sqrt( hanning(2*(k-m-d)+1)[:2*(k-m-d)] )
    fallingSqrtHann = sqrt( hanning(2*m+1)[:2*m] )
    window = zeros(k)
    window[:d] = 0
    window[d:k-m] = risingSqrtHann[:k-m-d]
    window[k-m:] = fallingSqrtHann[-m:]
    return window

def getAsymmetricSynthesisWindow(k, m, d):
    risingSqrtHannAnalysis = sqrt( hanning(2*(k-m-d)+1)[:2*(k-m-d)] )
    risingNormalizedHann = hanning(2*m+1)[:m] / risingSqrtHannAnalysis[k-2*m-d:k-m-d]
    fallingSqrtHann = sqrt( hanning(2*m+1)[:2*m] )
    window = zeros(k)
    window[:-2*m] = 0
    window[-2*m:-m] = risingNormalizedHann
    window[-m:] = fallingSqrtHann[-m:]
    return window

def performOnlineSpeechEnhancement(analysisWindow, synthesisWindow, hopSize):
    # Setup variables to save speech enhancement results
    numFrequencies = len(rfft(zeros(len(analysisWindow))))
    numFrames = (numSamples-len(synthesisWindow)) // hopSize
    if fix:
        gainFactor = hopSize / sum(analysisWindow * synthesisWindow)
    else:
        gainFactor = hopSize / float(len(synthesisWindow)) * 2
    targetEstimateSamplesOLA = zeros_like(stereoSamples)
    inputSpectrogram = zeros( (2, numFrequencies, numFrames), 'complex64')
    outputSpectrogram = zeros( (2, numFrequencies, numFrames), 'complex64')
    for frameIndex in range(numFrames):
        # compute FFT
        frameStart = frameIndex * hopSize
        frameEnd = frameStart + analysisWindowSize
        stereoSTFTFrame = rfft( stereoSamples[:, frameStart:frameEnd] * analysisWindow )
        inputSpectrogram[..., frameIndex] = stereoSTFTFrame
        outputSpectrogram[..., frameIndex] = stereoSTFTFrame
        # reconstruct time domain samples
        recStereoSTFTFrame = irfft(stereoSTFTFrame)
        if fix:
            # apply synthesis window as well
            recStereoSTFTFrame *= synthesisWindow
        # overlap-add to output samples
        targetEstimateSamplesOLA[:, frameStart:frameEnd] += recStereoSTFTFrame
    targetEstimateSamplesOLA *= gainFactor
    return inputSpectrogram, outputSpectrogram, targetEstimateSamplesOLA

analysisWindow = getAsymmetricAnalysisWindow(k, m, d)
synthesisWindow = getAsymmetricSynthesisWindow(k, m, d)
symmetricWindow = sqrt(hanning(symmetricWindowSize))
symmetricResults = performOnlineSpeechEnhancement(symmetricWindow, symmetricWindow, symmetricHopSize)
asymmetricResults = performOnlineSpeechEnhancement(analysisWindow, synthesisWindow, asymmetricHopSize)

title('fixed' if fix else 'orig')
plot(symmetricResults[-1][-1], label='symmetric', color='b', alpha=0.5)
plot(asymmetricResults[-1][-1], label='asymmetric', color='r', alpha=0.5)
legend()
show()
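The fixed gain factor, `gainFactor = hopSize / sum(analysisWindow * synthesisWindow)`, follows from the overlap-add condition. Suppose the shifted products of analysis and synthesis windows sum to a constant. Summing that constant over one hop shows that it equals sum(analysis * synthesis) / hop, so multiplying by the gain factor restores unit gain. The small numeric check below tests that relationship. It assumes symmetric periodic sqrt-Hann windows with hop = windowSize / 4 and does not reproduce the asymmetric windows in the sketch above. The helper name `overlap_add_window_sum` is hypothetical.

```python
import numpy as np


def overlap_add_window_sum(analysis, synthesis, hop, num_frames):
    """Overlap-add the per-frame window product analysis * synthesis at the given hop."""
    product = analysis * synthesis
    total = np.zeros(hop * (num_frames - 1) + len(product))
    for i in range(num_frames):
        start = i * hop
        total[start:start + len(product)] += product
    return total


window_size = 1024
hop = 256
# Periodic sqrt-Hann window (an assumption for this check)
sqrt_hann = np.sqrt(np.hanning(window_size + 1)[:window_size])

total = overlap_add_window_sum(sqrt_hann, sqrt_hann, hop, num_frames=20)
steady = total[window_size:-window_size]  # skip the partially overlapped edges
gain_factor = hop / np.sum(sqrt_hann * sqrt_hann)  # same form as the fixed gainFactor
print(steady.min() * gain_factor, steady.max() * gain_factor)
```

Both printed values come out at 1 to within floating-point error. The steady-state overlap-add sum is 2 and the gain factor is 256 / 512 = 0.5.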
I am interested in making trainSet.npy for 'onlineSpeechEnhancement'.
How is the training set for pre-learning the dictionary made?
How can I make a training set from other WAV files?
Thank you.
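The layout that 'onlineSpeechEnhancement' expects for trainSet.npy is not described in this thread. Check how the notebook loads the file before relying on the sketch below. The sketch assumes the training set is a 2-D array of magnitude spectra (frequency bins × frames) pooled from many speech files, which is a typical input for learning an NMF dictionary. All function names are hypothetical. The window size of 1024 and hop of 512 mirror the windowSize and hopSize values printed in the configuration log earlier on this page. Only 16-bit PCM WAV files are handled, using the standard-library `wave` module.

```python
import wave

import numpy as np


def read_wav_mono(path):
    """Read a 16-bit PCM WAV file as float samples in [-1, 1), averaged to mono."""
    with wave.open(path, 'rb') as f:
        if f.getsampwidth() != 2:
            raise ValueError('this sketch only handles 16-bit PCM')
        num_channels = f.getnchannels()
        raw = f.readframes(f.getnframes())
    samples = np.frombuffer(raw, dtype='<i2').astype(np.float32) / 32768.0
    return samples.reshape(-1, num_channels).mean(axis=1)


def magnitude_spectrogram(samples, window_size=1024, hop_size=512):
    """Magnitude STFT with a periodic sqrt-Hann window, shape (numFrequencies, numFrames)."""
    window = np.sqrt(np.hanning(window_size + 1)[:window_size])
    num_frames = 1 + (len(samples) - window_size) // hop_size
    frames = np.stack([samples[i * hop_size:i * hop_size + window_size] * window
                       for i in range(num_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def build_train_set(wav_paths, output_path='trainSet.npy', window_size=1024, hop_size=512):
    """Pool magnitude spectra from several WAV files into one array and save it."""
    spectrograms = []
    for path in wav_paths:
        samples = read_wav_mono(path)
        if len(samples) < window_size:
            continue  # too short for a single frame
        spectrograms.append(magnitude_spectrogram(samples, window_size, hop_size))
    train_set = np.concatenate(spectrograms, axis=1)
    np.save(output_path, train_set)
    return train_set
```

For example, `build_train_set(['a.wav', 'b.wav'])` would write a 513 × N array to trainSet.npy, where N is the total number of frames across both files.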
“Due to differences in TDOA estimation for the 1m and 5cm microphone separation settings, including increased spatial aliasing and lower spatial resolution in the 5cm case, we also compared results averaged over these two settings separately. While we found somewhat decreased scores and increased variance in the 5cm case, the results were generally comparable.”
For offline speech separation, do I need to change the code for the 5 cm and 1 m recordings? When I ran your code, the 1 m data was separated correctly, but none of the 5 cm recordings gave correct results: only one or two peaks are found, whereas there should be three sources.
My English is not very good and this text comes from Google Translate, so please forgive me.
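For a far-field source, the delay between two microphones cannot exceed d / c, the microphone separation divided by the speed of sound. The sketch below compares the two spacings under several assumptions: c = 343 m/s, a 16 kHz sample rate, and a linearly spaced grid of numTDOAs = 64 candidates (the value in the configuration log above). How gcc-nmf actually builds its TDOA grid is not shown here, and both helper names are hypothetical. At 5 cm the whole physical range is only about ±2.3 samples, compared with about ±46.6 samples at 1 m. If a TDOA range suited to the 1 m recordings were also used for the 5 cm ones, all sources would fall into a few grid points near zero and their peaks could merge. That is one possible explanation for seeing fewer peaks, not a confirmed one.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C


def max_tdoa_samples(mic_separation_m, sample_rate, speed_of_sound=SPEED_OF_SOUND):
    """Largest physically possible inter-microphone delay, in samples."""
    return mic_separation_m / speed_of_sound * sample_rate


def tdoa_grid(mic_separation_m, sample_rate, num_tdoas=64):
    """Evenly spaced candidate TDOAs (in samples) covering the physically possible range."""
    limit = max_tdoa_samples(mic_separation_m, sample_rate)
    return np.linspace(-limit, limit, num_tdoas)


for separation in (1.0, 0.05):
    grid = tdoa_grid(separation, sample_rate=16000)
    print(f'{separation} m: +/-{grid[-1]:.2f} samples, grid step {grid[1] - grid[0]:.3f} samples')
```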
I am running Python 2.7 (64-bit) with the latest SciPy and NumPy.
The first demo works with multiple speakers, but the speech enhancement task fails with this error:
argMaxGCCNMF = argmax(gccNMF, axis=1)
NameError: name 'argmax' is not defined
How can I fix this? Thank you!
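The `NameError` means `argmax` is not defined in the module where that line runs. The call signature matches NumPy's `argmax`, so importing it into that module is a likely fix, for example by adding it to the module's existing `from numpy import ...` line. The following minimal check of the call uses a made-up 2-D array in place of the real `gccNMF` values:

```python
from numpy import argmax, array

# Made-up 2-D array standing in for gccNMF; argmax along axis 1 returns,
# for each row, the column index of its largest value
gccNMF = array([[0.1, 0.7, 0.2],
                [0.5, 0.3, 0.9]])
argMaxGCCNMF = argmax(gccNMF, axis=1)
print(argMaxGCCNMF)  # [1 2]
```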
In the processFrames method of the GCCNMFProcessor class, targetTDOAIndex is updated after getTFMask is called. This means the mask is computed from the target TDOA estimated over the previous 6 frames, instead of the latest 6 frames including the current frame.
My English is not very good; please forgive me.
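Below is a toy sketch of the ordering issue described above. It does not reproduce the GCCNMFProcessor code. `estimate_target`, the rounded-mean estimator it uses, and `masks_target_sequence` are all hypothetical, and appending to `used` stands in for calling getTFMask. When the target is updated after the mask is computed, each mask uses the estimate from the previous frame, which amounts to a one-frame lag.

```python
from collections import deque

NUM_HISTORY_FRAMES = 6  # history length mentioned in the report


def estimate_target(history):
    """Hypothetical target-TDOA estimator: rounded mean of recent per-frame TDOA indexes."""
    return round(sum(history) / len(history))


def masks_target_sequence(frame_tdoas, update_before_mask):
    """Return the target TDOA index that each frame's mask would be built from."""
    history = deque(maxlen=NUM_HISTORY_FRAMES)
    target = None
    used = []
    for tdoa in frame_tdoas:
        if update_before_mask:
            history.append(tdoa)
            target = estimate_target(history)
        used.append(target)  # stands in for getTFMask using the current target
        if not update_before_mask:
            history.append(tdoa)
            target = estimate_target(history)
    return used


frames = [10, 10, 10, 40, 40, 40, 40, 40, 40]
print(masks_target_sequence(frames, update_before_mask=False))
print(masks_target_sequence(frames, update_before_mask=True))
```

The first sequence equals the second shifted right by one frame, with `None` for the very first frame.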
Hi,
I tried to run the file "runGccNMF.py", but it shows "No module named 'gccNMF'".
I'm not sure how to fix this problem; if you can, please tell me how to solve it.
Best
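Python resolves `import gccNMF` by searching `sys.path`, whose first entry is the directory of the script being run. The error suggests that the folder containing the gccNMF package is not on that path when runGccNMF.py is executed. The sketch below assumes you know which folder contains the gccNMF package directory (called `repository_root` here, a hypothetical name) and prepends it before the import. Setting the standard `PYTHONPATH` environment variable to that folder has the same effect without code changes.

```python
import os
import sys


def add_repository_to_path(repository_root):
    """Prepend repository_root to sys.path so package folders inside it can be imported.

    Assumes the gccNMF package folder sits directly inside repository_root.
    """
    repository_root = os.path.abspath(repository_root)
    if repository_root not in sys.path:
        sys.path.insert(0, repository_root)
    return repository_root
```

Usage at the top of the script would be `add_repository_to_path('path/to/folder/containing/gccNMF')` followed by `import gccNMF`.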
Hi,
I'm new to audio enhancement and there seems to be a lot to learn. Are there any audio samples or pretrained models available for a quick evaluation?
Thanks
Hi @seanwood, I'm a beginner in BSS, and thank you for your useful and effective open-source GCC-NMF. I came across an unexplained parameter when applying the ISTFT to reconstruct the waveform:
def getTargetSignalEstimates(targetSpectrogramEstimates, windowSize, hopSize, windowFunction, numSamples):
    numTargets, numChannels, numFreq, numTime = targetSpectrogramEstimates.shape
    stftGainFactor = hopSize / float(windowSize) * 2
    ......
    return array(targetSignalEstimates) * stftGainFactor
The results are good, but I really don't understand why the output is multiplied by 2*hop_size/n_fft. Could you please explain? Thank you.
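Below is one derivation that yields exactly this factor. It is an interpretation, not the author's stated reasoning, because the elided part of getTargetSignalEstimates is not shown. It assumes the overall window applied per frame is a periodic Hann window of length N = windowSize. That overall window is the analysis window times the synthesis window, or the analysis window alone if no synthesis window is applied. It also assumes the hop H = hopSize divides N into R ≥ 2 equal parts. Overlap-adding such windows gives a constant N / (2H) in steady state, and dividing that out gives `stftGainFactor`.

```latex
w[n] = \frac{1}{2}\left(1 - \cos\frac{2\pi n}{N}\right), \qquad 0 \le n < N, \qquad R = \frac{N}{H} \in \{2, 3, \dots\}

\sum_{k} w[n - kH]
  = \sum_{r=0}^{R-1} \frac{1}{2}\left(1 - \cos\left(\frac{2\pi n}{N} - \frac{2\pi r}{R}\right)\right)
  = \frac{R}{2} - \frac{1}{2}\underbrace{\sum_{r=0}^{R-1} \cos\left(\frac{2\pi n}{N} - \frac{2\pi r}{R}\right)}_{=\,0}
  = \frac{N}{2H}

g = \left(\frac{N}{2H}\right)^{-1} = \frac{2H}{N} = \texttt{hopSize / float(windowSize) * 2}
```

The cosine terms cancel because they are R equally spaced phases. With windowSize = 1024 and hopSize = 512, as in the configuration log above, g = 1.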
Hello,
I have a problem I need help with. When I run 'runRealtimeGCCNMF.py' and press the play button, the variable 'realGCC' in the 'processFrames' method of 'gccNMFProcessor.py' contains only NaN values for the default input, 'dev_Sq1_Co_A_mix.wav'.
How can I fix it?
Thank you
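The following is a generic GCC-PHAT sketch, not the computation in gccNMFProcessor.py. PHAT weighting divides the cross-spectrum by its magnitude. If either channel of a frame is all zeros, for example digital silence or an input buffer that has not been filled yet, that division becomes 0/0 = NaN, and NaNs then propagate through everything computed from them. Whether this is the cause here is not confirmed. Checking the input frames for all-zero blocks would tell. Flooring the magnitude at a small `eps` (an assumed value) keeps the output finite.

```python
import numpy as np


def gcc_phat(frame_left, frame_right, eps=1e-12):
    """GCC-PHAT of two equal-length frames.

    PHAT divides the cross-spectrum by its magnitude; for an all-zero (silent)
    frame that would be 0/0 = NaN, so the magnitude is floored at eps first.
    """
    num_samples = len(frame_left)
    cross_spectrum = np.fft.rfft(frame_left) * np.conj(np.fft.rfft(frame_right))
    cross_spectrum /= np.maximum(np.abs(cross_spectrum), eps)
    return np.fft.irfft(cross_spectrum, n=num_samples)


silent = np.zeros(1024)
print(np.isnan(gcc_phat(silent, silent)).any())  # False
```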
Add the ability to run the system at full speed and save to a file, with no audio playback
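Below is a minimal sketch of the requested mode. It assumes the per-block enhancement step can be wrapped as a function. `run_offline` and `process_block` are hypothetical names, not part of gcc-nmf, and the block size of 512 matches blockSize in the configuration log above. No audio device is opened and nothing waits on a real-time clock, so the loop runs as fast as the processing allows.

```python
import numpy as np


def run_offline(samples, process_block, block_size=512):
    """Feed (numChannels, numSamples) audio through process_block one block at a time.

    There is no playback and no real-time pacing. A trailing partial block is dropped.
    """
    num_channels, num_samples = samples.shape
    num_blocks = num_samples // block_size
    output = np.zeros((num_channels, num_blocks * block_size), dtype=np.float32)
    for block_index in range(num_blocks):
        start = block_index * block_size
        output[:, start:start + block_size] = process_block(samples[:, start:start + block_size])
    return output
```

The returned array could then be saved with `numpy.save` or passed to a WAV writer.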
First problem: the low-latency algorithm runs, but the output of both the symmetric and asymmetric versions is silent. I copied the exact code from the Python notebooks.
Second problem: the first two algorithms output the correct WAV format, but the low-latency and online speech enhancement algorithms, while otherwise fine, double the bit depth. The input is signed 16-bit WAV, but the output is 32-bit float WAV. How can I fix this?
Thanks!
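`scipy.io.wavfile.write` chooses the WAV sample format from the array's dtype, so a float32 array is written as a 32-bit floating-point file. If the notebooks write their float results that way, that would explain the doubled bit depth, although this is not confirmed here. One option is to convert back to 16-bit PCM before writing. The stdlib-only sketch below assumes the output is a float array in roughly [-1, 1] with shape (numChannels, numSamples). `write_wav_int16` is a hypothetical helper, not a gcc-nmf function. If the output uses a different scale, normalize it first.

```python
import wave

import numpy as np


def write_wav_int16(path, samples, sample_rate):
    """Write float samples in [-1, 1], shape (numChannels, numSamples), as 16-bit PCM WAV.

    Values are scaled by 32767, rounded and clipped, so out-of-range samples
    saturate instead of wrapping around.
    """
    pcm = np.clip(np.round(np.asarray(samples, dtype=np.float64) * 32767.0),
                  -32768, 32767).astype('<i2')
    with wave.open(path, 'wb') as f:
        f.setnchannels(pcm.shape[0])
        f.setsampwidth(2)  # 2 bytes = 16 bits per sample
        f.setframerate(sample_rate)
        f.writeframes(pcm.T.tobytes())  # interleave channels frame by frame
```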