seanwood / gcc-nmf

Real-time GCC-NMF Blind Speech Separation and Enhancement

License: MIT License

Python 100.00%

speech-separation speech-enhancement gcc-nmf nmf real-time real-time-processing speech speech-processing cross-correlation generalized-cross-correlation

gcc-nmf's Issues

Offline speech enhancement notebook very slow

Mentioned in discussion of #2.

cpu and memory usage

Hi,
Did you measure the CPU and memory usage of this application?
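
No measurements are given in this thread. A rough way to take them yourself is sketched below, using only the Python 3 standard library. `process_frame_stub` is a hypothetical stand-in for whatever processing step you want to profile, and `tracemalloc` only sees allocations made through Python's allocator, so treat the numbers as indicative.

```python
import time
import tracemalloc


def measure(function, *args):
    """Return (result, CPU seconds, peak traced bytes) for one call."""
    tracemalloc.start()
    cpu_start = time.process_time()
    result = function(*args)
    cpu_seconds = time.process_time() - cpu_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, cpu_seconds, peak_bytes


def process_frame_stub(num_values):
    # Hypothetical stand-in for one processing step of the application
    return sum(value * value for value in range(num_values))


result, cpu_seconds, peak_bytes = measure(process_frame_stub, 100000)
print("CPU time: %.4f s, peak traced memory: %d bytes" % (cpu_seconds, peak_bytes))
```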

real-time gcc-nmf error?

I am trying to get the real-time GCC-NMF demo working. The data paths are correct and the script created the pretrained files, but the following error appears after starting the script:

C:\gccNMF>python demo5.py
INFO:root:GCCNMFConfig: loading configuration params...
INFO:root:TDOA
INFO:root: targetTDOAEpsilon: 5.0
INFO:root: targetTDOANoiseFloor: 0.0
INFO:root: numSpectrogramHistory: 128
INFO:root: microphoneSeparationInMetres: 0.1
INFO:root: numTDOAs: 64
INFO:root: numTDOAHistory: 128
INFO:root: targetTDOABeta: 2.0
INFO:root: gccPHATNLAlpha: 2.0
INFO:root: gccPHATNLEnabled: False
INFO:root:NMF
INFO:root: dictionarySize: 64
INFO:root: dictionaryType: Pretrained
INFO:root: numHUpdates: 0
INFO:root: dictionarySizes: [64, 128, 256, 512, 1024]
INFO:root:Audio
INFO:root: deviceIndex: None
INFO:root: sampleRate: 44100
INFO:root: numChannels: 2
INFO:root:STFT
INFO:root: blockSize: 512
INFO:root: windowSize: 1024
INFO:root: hopSize: 512
INFO:root:GCCNMFPretraining: Loading pretrained W (size 64): ./pretrainedW\W_64.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 128): ./pretrainedW\W_128.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 256): ./pretrainedW\W_256.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 512): ./pretrainedW\W_512.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 1024): ./pretrainedW\W_1024.npy
INFO:root:RealtimeGCCNMF: Starting with audio path: ./test.wav
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

WARNING:theano.sandbox.cuda:The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GTX 860M (CNMeM is enabled with initial size: 70.0% of memory, cuDNN 5005)
INFO:root:GCCNMFConfig: loading configuration params...
INFO:root:TDOA
INFO:root: targetTDOAEpsilon: 5.0
INFO:root: targetTDOANoiseFloor: 0.0
INFO:root: numSpectrogramHistory: 128
INFO:root: microphoneSeparationInMetres: 0.1
INFO:root: numTDOAs: 64
INFO:root: numTDOAHistory: 128
INFO:root: targetTDOABeta: 2.0
INFO:root: gccPHATNLAlpha: 2.0
INFO:root: gccPHATNLEnabled: False
INFO:root:NMF
INFO:root: dictionarySize: 64
INFO:root: dictionaryType: Pretrained
INFO:root: numHUpdates: 0
INFO:root: dictionarySizes: [64, 128, 256, 512, 1024]
INFO:root:Audio
INFO:root: deviceIndex: None
INFO:root: sampleRate: 44100
INFO:root: numChannels: 2
INFO:root:STFT
INFO:root: blockSize: 512
INFO:root: windowSize: 1024
INFO:root: hopSize: 512
INFO:root:GCCNMFPretraining: Loading pretrained W (size 64): ./pretrainedW\W_64.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 128): ./pretrainedW\W_128.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 256): ./pretrainedW\W_256.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 512): ./pretrainedW\W_512.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 1024): ./pretrainedW\W_1024.npy
INFO:root:RealtimeGCCNMF: Starting with audio path: ./test.wav
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GTX 860M (CNMeM is enabled with initial size: 70.0% of memory, cuDNN 5005)
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Python27\lib\multiprocessing\forking.py", line 380, in main
prepare(preparation_data)
File "C:\Python27\lib\multiprocessing\forking.py", line 509, in prepare
'__parents_main__', file, path_name, etc
File "C:\gccNMF\demo5.py", line 5, in <module>
RealtimeGCCNMF()
File "C:\gccNMF\runRealtimeGCCNMF.py", line 50, in __init__
self.initProcesses(params)
File "C:\gccNMF\runRealtimeGCCNMF.py", line 91, in initProcesses
self.audioProcess.start()
File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
self._popen = Popen(self)
File "C:\Python27\lib\multiprocessing\forking.py", line 258, in __init__
cmd = get_command_line() + [rhandle]
File "C:\Python27\lib\multiprocessing\forking.py", line 358, in get_command_line
is not going to be frozen to produce a Windows executable.''')
RuntimeError:
Attempt to start a new process before the current process
has finished its bootstrapping phase.

        This probably means that you are on Windows and you have
        forgotten to use the proper idiom in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce a Windows executable.

Traceback (most recent call last):
File "demo5.py", line 5, in <module>
RealtimeGCCNMF()
File "C:\gccNMF\runRealtimeGCCNMF.py", line 50, in __init__
self.initProcesses(params)
File "C:\gccNMF\runRealtimeGCCNMF.py", line 91, in initProcesses
self.audioProcess.start()
File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
self._popen = Popen(self)
File "C:\Python27\lib\multiprocessing\forking.py", line 277, in __init__
dump(process_obj, to_child, HIGHEST_PROTOCOL)
File "C:\Python27\lib\multiprocessing\forking.py", line 199, in dump
ForkingPickler(file, protocol).dump(obj)
File "C:\Python27\lib\pickle.py", line 224, in dump
self.save(obj)
File "C:\Python27\lib\pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "C:\Python27\lib\pickle.py", line 425, in save_reduce
save(state)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 655, in save_dict
self._batch_setitems(obj.iteritems())
File "C:\Python27\lib\pickle.py", line 687, in _batch_setitems
save(v)
File "C:\Python27\lib\pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "C:\Python27\lib\pickle.py", line 425, in save_reduce
save(state)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 568, in save_tuple
save(element)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 492, in save_string
self.write(BINSTRING + pack("<i", n) + obj)
IOError: [Errno 22] Invalid argument

What is wrong? Thanks!
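
The RuntimeError in the log points at the usual Windows idiom: on Windows the child process re-imports the main module, so any code that starts a process must sit behind an `if __name__ == '__main__':` guard. That re-import would also explain why the configuration log appears twice. A minimal sketch of the idiom follows; `audio_worker` and `main` are hypothetical stand-ins, not the repository's code, and it assumes demo5.py currently calls `RealtimeGCCNMF()` at module level, as the traceback suggests.

```python
import multiprocessing


def audio_worker(queue):
    # Hypothetical stand-in for the real audio process
    queue.put("started")


def main():
    # In demo5.py this is where RealtimeGCCNMF() would be constructed
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=audio_worker, args=(queue,))
    process.start()
    message = queue.get(timeout=10)
    process.join()
    return message


if __name__ == '__main__':
    # Only needed when freezing to a Windows executable; harmless otherwise
    multiprocessing.freeze_support()
    print(main())
```

The second traceback (the IOError raised while pickling) is probably a follow-on failure from the child process dying during start-up, though that is a guess from the log alone.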

Synthesis window in lowLatencySpeechEnhancement.ipynb

Hi,

is there a reason why the synthesis window is not applied?

See also the attached sketch based on the lowLatencySpeechEnhancement.ipynb example.

fix = False:

- your version
- clearly visible modulation in the OLA output
- output gain != 1

fix = True:

- adequate hop size
- unity gain
- synthesis window applied after irfft

from matplotlib.pyplot import *
from numpy import *
from numpy.fft import rfft, irfft

# Apply the fix
fix = False

# Preprocessing params
fftSize = 1024

# Asymmetric windowing params
analysisWindowSize = fftSize
synthesisWindowSize = 128

asymmetricHopSize = synthesisWindowSize // 4 if fix else (synthesisWindowSize * 3) // 4
m = synthesisWindowSize // 2
k = analysisWindowSize
d = 0

# Symmetric windowing params
symmetricWindowSize = fftSize
symmetricHopSize = asymmetricHopSize # to better compare results

# Generate test signal
stereoSamples = ones((1, fftSize*10))
numChannels, numSamples = stereoSamples.shape

def getAsymmetricAnalysisWindow(k, m, d):
    risingSqrtHann = sqrt( hanning(2*(k-m-d)+1)[:2*(k-m-d)] )
    fallingSqrtHann = sqrt( hanning(2*m+1)[:2*m] )

    window = zeros(k)
    window[:d] = 0
    window[d:k-m] = risingSqrtHann[:k-m-d]
    window[k-m:] = fallingSqrtHann[-m:]

    return window

def getAsymmetricSynthesisWindow(k, m, d):
    risingSqrtHannAnalysis = sqrt( hanning(2*(k-m-d)+1)[:2*(k-m-d)] )
    risingNormalizedHann = hanning(2*m+1)[:m] / risingSqrtHannAnalysis[k-2*m-d:k-m-d]
    fallingSqrtHann = sqrt( hanning(2*m+1)[:2*m] )

    window = zeros(k)
    window[:-2*m] = 0
    window[-2*m:-m] = risingNormalizedHann
    window[-m:] = fallingSqrtHann[-m:]

    return window

def performOnlineSpeechEnhancement(analysisWindow, synthesisWindow, hopSize):
    # Setup variables to save speech enhancement results
    numFrequencies = len(rfft(zeros(len(analysisWindow))))
    numFrames = (numSamples-len(synthesisWindow)) // hopSize

    if fix:
        gainFactor = hopSize / sum(analysisWindow * synthesisWindow)
    else:
        gainFactor = hopSize / float(len(synthesisWindow)) * 2

    targetEstimateSamplesOLA = zeros_like(stereoSamples)
    inputSpectrogram = zeros( (2, numFrequencies, numFrames), 'complex64')
    outputSpectrogram = zeros( (2, numFrequencies, numFrames), 'complex64')

    for frameIndex in range(numFrames):
        # compute FFT
        frameStart = frameIndex * hopSize
        frameEnd = frameStart + analysisWindowSize
        stereoSTFTFrame = rfft( stereoSamples[:, frameStart:frameEnd] * analysisWindow )
        inputSpectrogram[..., frameIndex] = stereoSTFTFrame
        outputSpectrogram[..., frameIndex] = stereoSTFTFrame

        # reconstruct time domain samples
        recStereoSTFTFrame = irfft(stereoSTFTFrame)

        if fix:
            # apply synthesis window as well
            recStereoSTFTFrame *= synthesisWindow

        # overlap-add to output samples
        targetEstimateSamplesOLA[:, frameStart:frameEnd] += recStereoSTFTFrame

    targetEstimateSamplesOLA *= gainFactor

    return inputSpectrogram, outputSpectrogram, targetEstimateSamplesOLA

analysisWindow = getAsymmetricAnalysisWindow(k, m, d)
synthesisWindow = getAsymmetricSynthesisWindow(k, m, d)

symmetricWindow = sqrt(hanning(symmetricWindowSize))

symmetricResults = performOnlineSpeechEnhancement(symmetricWindow, symmetricWindow, symmetricHopSize)
asymmetricResults = performOnlineSpeechEnhancement(analysisWindow, synthesisWindow, asymmetricHopSize)

title('fixed' if fix else 'orig')
plot(symmetricResults[-1][-1], label='symmetric', color='b', alpha=0.5)
plot(asymmetricResults[-1][-1], label='asymmetric', color='r', alpha=0.5)
legend()
show()

Preprocessing for chimeTrainSet.npy

I am interested in building trainSet.npy for 'onlineSpeechEnhancement'.

How is the training set for pre-learning the dictionary created?

How can I build a training set from other wav files?

Thank you.

about offline speech separation

“Due to differences in TDOA estimation for the 1m and 5cm microphone separation settings, including increased spatial aliasing and lower spatial resolution in the 5cm case, we also compared results averaged over these two settings separately. While we found somewhat decreased scores and increased variance in the 5cm case, the results were generally comparable.”
For offline speech separation, does the code need to be changed between the 5cm and 1m recordings? When I run your code, the 1m data is separated correctly, but none of the 5cm recordings give correct results: only one or two peaks are found, when there should be three sources.
My English is not very good; this text is from Google Translate, so please forgive me.
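
To put the "lower spatial resolution" in the quote into numbers: the largest possible TDOA is the microphone separation divided by the speed of sound. The sketch below converts that to samples. The 16 kHz sample rate and the 343 m/s speed of sound are assumptions for illustration, not values taken from the repository or the dataset.

```python
def max_tdoa_in_samples(mic_separation_m, sample_rate_hz, speed_of_sound=343.0):
    """Largest physically possible delay between the two microphones, in samples."""
    return mic_separation_m / speed_of_sound * sample_rate_hz


# Assumed sample rate of 16 kHz, for illustration only
for separation in (0.05, 1.0):
    delay = max_tdoa_in_samples(separation, 16000)
    print("%.2f m separation: +/- %.2f samples" % (separation, delay))
```

Under these assumptions the 5cm case spans only a few samples of delay in each direction, so nearby sources can fall into the same TDOA peak, which would be consistent with seeing fewer peaks than sources.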

Add save functionality to Real-time GCC-NMF demo.

argmax error

I am running 64-bit Python 2.7 with the latest scipy and numpy.

The first demo works with multiple speakers, but for the speech enhancement task I get this error:
argMaxGCCNMF = argmax(gccNMF, axis=1)
NameError: name 'argmax' is not defined

How can I fix this? Thank you!
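
A NameError like this usually means the name was never imported in the module or notebook cell where it is used. `argmax` is a NumPy function, so importing it, or calling it through the `numpy` namespace, should resolve it. A minimal sketch follows; the `gcc_nmf` array is made-up data, and which file in the repository is missing the import is not stated in this thread.

```python
import numpy as np

# Made-up data standing in for the gccNMF array from the error message
gcc_nmf = np.array([[0.1, 0.7, 0.2],
                    [0.5, 0.3, 0.9]])

# Equivalent to `from numpy import argmax` followed by argmax(gcc_nmf, axis=1)
arg_max_gcc_nmf = np.argmax(gcc_nmf, axis=1)
print(arg_max_gcc_nmf)
```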

Is there a logical error?

In the processFrames method of the GCCNMFProcessor class, targetTDOAIndex is set after the call to getTFMask, which means the mask is computed using the target TDOA from the previous 6 frames rather than the latest 6 frames including the current one.
My English is not very good, please forgive me.

gccNMF model

Hi,
When I try to run "runGccNMF.py", it fails with "No module named 'gccNMF'".
I'm not sure how to fix this. If you can, please tell me how to solve it.

Best
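
"No module named ..." means Python cannot find the package on its import path. One common workaround, sketched below, is to put the directory that contains the package on `sys.path` before importing it. It is an assumption here that the `gccNMF` package directory sits at the root of the cloned repository, and the path used is a hypothetical example.

```python
import os
import sys


def add_repo_to_path(repo_root):
    """Make packages inside repo_root importable from this process."""
    repo_root = os.path.abspath(repo_root)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root


# Hypothetical location of the cloned repository; adjust to your setup
add_repo_to_path(os.path.join(os.path.expanduser("~"), "gcc-nmf"))
```

Running the script from the repository root, so that the current directory is already on the import path, may have the same effect.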

Any audio demo?

Hi,
I'm new to audio enhancement, and there seems to be a lot to learn. Is there any audio sample or pretrained model for quick evaluation?
Thanks

Why gain factor when reconstructing the signal?

Hi @seanwood, I'm a beginner in BSS; thank you for your useful and effective open-source GCC-NMF. I came across a parameter I don't understand when applying the ISTFT and reconstructing the waveform:

def getTargetSignalEstimates(targetSpectrogramEstimates, windowSize, hopSize, windowFunction,numSamples):
    numTargets, numChannels, numFreq, numTime = targetSpectrogramEstimates.shape
    stftGainFactor = hopSize / float(windowSize) * 2
......
    return array(targetSignalEstimates) * stftGainFactor

Of course the result is good, but I really don't understand why the output is multiplied by 2*hop_size/n_fft. Could you please explain? Thank you.
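
One plausible reading, assuming a Hann analysis window and no synthesis window (neither is confirmed in this thread): overlap-adding hop-shifted Hann windows sums to the constant windowSize / (2 * hopSize), so multiplying by 2 * hopSize / windowSize brings the reconstruction back to unity gain. The sketch below checks this numerically; `overlap_add_gain` is a helper written for this illustration, not a function from the repository.

```python
import numpy as np


def overlap_add_gain(window, hop_size):
    """Sum hop-shifted copies of the window; return the steady-state gain over one hop."""
    gain = np.zeros(hop_size)
    for start in range(0, len(window), hop_size):
        segment = window[start:start + hop_size]
        gain[:len(segment)] += segment
    return gain


window_size, hop_size = 1024, 256
hann = np.hanning(window_size + 1)[:window_size]  # periodic Hann window
gain = overlap_add_gain(hann, hop_size)

# Same expression as stftGainFactor in the snippet quoted above
stft_gain_factor = hop_size / float(window_size) * 2
print(gain.min(), gain.max(), stft_gain_factor)
```

With these values the overlap-add gain is flat and its product with `stft_gain_factor` is 1. With a different window, or with a synthesis window applied as well, the factor would need to change accordingly.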

Add ability to set SNR dynamically

runRealtimeGCCNMF

Hello,

I have a question about a problem I ran into. When I run 'runRealtimeGCCNMF.py' and hit the play button, the variable 'realGCC' in 'processFrames' in 'gccNMFProcessor.py' contains only NaN values for the default input,
'dev_Sq1_Co_A_mix.wav'.

How can I fix it?

Thank you
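
One possible cause, offered as a guess rather than a diagnosis of the repository code, is a division by zero in the PHAT weighting: if a frame's cross-spectrum has zero magnitude (digital silence, for example), 0/0 produces NaN, which then propagates. The sketch below shows a guarded version of the weighting. The function name and the `eps` value are hypothetical and do not come from 'gccNMFProcessor.py'.

```python
import numpy as np


def phat_weighted_cross_spectrum(left_spectrum, right_spectrum, eps=1e-12):
    """Cross-spectrum normalized to unit magnitude, guarded against division by zero."""
    cross_spectrum = left_spectrum * np.conj(right_spectrum)
    magnitude = np.maximum(np.abs(cross_spectrum), eps)
    return cross_spectrum / magnitude


# A silent frame no longer yields NaN
silent_frame = np.zeros(4, dtype=complex)
print(phat_weighted_cross_spectrum(silent_frame, silent_frame))
```

Checking the input with `numpy.isnan` at each stage of 'processFrames' should show where the first NaN appears.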

Command line interface: add full speed mode

Add the ability to run the system at full speed, saving to file, with no audio playback.

two small problems?

The first problem is that the low latency algorithm runs, but the output for both the symmetric and asymmetric cases is silence. I copied the exact code from the Python notebooks.

The second problem concerns the low latency and online speech enhancement algorithms. The first two algorithms output the correct WAV format, but these last two, while otherwise fine, double the bit depth: instead of the signed 16-bit WAV of the input, I get a 32-bit float WAV as output. How can I fix this?

thanks!
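
On the bit depth: `scipy.io.wavfile.write` stores whatever dtype the array has, so a float32 array gives a 32-bit float WAV. Converting to int16 before writing yields a 16-bit file. A minimal sketch follows, assuming the samples are floats in the range [-1, 1] and that the notebooks write their output through SciPy; `float_to_int16` is a helper written for this illustration.

```python
import numpy as np


def float_to_int16(samples):
    """Scale float samples in [-1, 1] to signed 16-bit integers, clipping overshoots."""
    clipped = np.clip(samples, -1.0, 1.0)
    return (clipped * 32767).astype(np.int16)


example = float_to_int16(np.array([0.0, 1.0, -1.0, 2.0]))
print(example, example.dtype)
```

The silence in the first problem could be related: casting small float samples to an integer type without scaling truncates them all to zero. That is only a guess without seeing the modified code.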