Git Product home page Git Product logo

gcc-nmf's People

Contributors

seanwood avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

gcc-nmf's Issues

real-time gcc-nmf error?

I am trying to make the real-time gcc-nmf work, I have the correct data paths, etc, the script created the pretrained files. The following error appears after starting the script after:

C:\gccNMF>python demo5.py
INFO:root:GCCNMFConfig: loading configuration params...
INFO:root:TDOA
INFO:root: targetTDOAEpsilon: 5.0
INFO:root: targetTDOANoiseFloor: 0.0
INFO:root: numSpectrogramHistory: 128
INFO:root: microphoneSeparationInMetres: 0.1
INFO:root: numTDOAs: 64
INFO:root: numTDOAHistory: 128
INFO:root: targetTDOABeta: 2.0
INFO:root: gccPHATNLAlpha: 2.0
INFO:root: gccPHATNLEnabled: False
INFO:root:NMF
INFO:root: dictionarySize: 64
INFO:root: dictionaryType: Pretrained
INFO:root: numHUpdates: 0
INFO:root: dictionarySizes: [64, 128, 256, 512, 1024]
INFO:root:Audio
INFO:root: deviceIndex: None
INFO:root: sampleRate: 44100
INFO:root: numChannels: 2
INFO:root:STFT
INFO:root: blockSize: 512
INFO:root: windowSize: 1024
INFO:root: hopSize: 512
INFO:root:GCCNMFPretraining: Loading pretrained W (size 64): ./pretrainedW\W_64.
npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 128): ./pretrainedW\W_12
8.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 256): ./pretrainedW\W_25
6.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 512): ./pretrainedW\W_51
2.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 1024): ./pretrainedW\W_1
024.npy
INFO:root:RealtimeGCCNMF: Starting with audio path: ./test.wav
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be remove
d in the next release (v0.10). Please switch to the gpuarray backend. You can g
et more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpua
rray%29

WARNING:theano.sandbox.cuda:The cuda backend is deprecated and will be removed i
n the next release (v0.10). Please switch to the gpuarray backend. You can get
more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpua
rray%29

Using gpu device 0: GeForce GTX 860M (CNMeM is enabled with initial size: 70.0%
of memory, cuDNN 5005)
INFO:root:GCCNMFConfig: loading configuration params...
INFO:root:TDOA
INFO:root: targetTDOAEpsilon: 5.0
INFO:root: targetTDOANoiseFloor: 0.0
INFO:root: numSpectrogramHistory: 128
INFO:root: microphoneSeparationInMetres: 0.1
INFO:root: numTDOAs: 64
INFO:root: numTDOAHistory: 128
INFO:root: targetTDOABeta: 2.0
INFO:root: gccPHATNLAlpha: 2.0
INFO:root: gccPHATNLEnabled: False
INFO:root:NMF
INFO:root: dictionarySize: 64
INFO:root: dictionaryType: Pretrained
INFO:root: numHUpdates: 0
INFO:root: dictionarySizes: [64, 128, 256, 512, 1024]
INFO:root:Audio
INFO:root: deviceIndex: None
INFO:root: sampleRate: 44100
INFO:root: numChannels: 2
INFO:root:STFT
INFO:root: blockSize: 512
INFO:root: windowSize: 1024
INFO:root: hopSize: 512
INFO:root:GCCNMFPretraining: Loading pretrained W (size 64): ./pretrainedW\W_64.
npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 128): ./pretrainedW\W_12
8.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 256): ./pretrainedW\W_25
6.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 512): ./pretrainedW\W_51
2.npy
INFO:root:GCCNMFPretraining: Loading pretrained W (size 1024): ./pretrainedW\W_1
024.npy
INFO:root:RealtimeGCCNMF: Starting with audio path: ./test.wav
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be remove
d in the next release (v0.10). Please switch to the gpuarray backend. You can g
et more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpua
rray%29

WARNING:theano.sandbox.cuda:The cuda backend is deprecated and will be removed i
n the next release (v0.10). Please switch to the gpuarray backend. You can get
more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpua
rray%29

Using gpu device 0: GeForce GTX 860M (CNMeM is enabled with initial size: 70.0%
of memory, cuDNN 5005)
Traceback (most recent call last):
File "", line 1, in
File "C:\Python27\lib\multiprocessing\forking.py", line 380, in main
prepare(preparation_data)
File "C:\Python27\lib\multiprocessing\forking.py", line 509, in prepare
'parents_main', file, path_name, etc
File "C:\gccNMF\demo5.py", line 5, in
RealtimeGCCNMF()
File "C:\gccNMF\runRealtimeGCCNMF.py", line 50, in init
self.initProcesses(params)
File "C:\gccNMF\runRealtimeGCCNMF.py", line 91, in initProcesses
self.audioProcess.start()
File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
self._popen = Popen(self)
File "C:\Python27\lib\multiprocessing\forking.py", line 258, in init
cmd = get_command_line() + [rhandle]
File "C:\Python27\lib\multiprocessing\forking.py", line 358, in get_command_li
ne
is not going to be frozen to produce a Windows executable.''')
RuntimeError:
Attempt to start a new process before the current process
has finished its bootstrapping phase.

        This probably means that you are on Windows and you have
        forgotten to use the proper idiom in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce a Windows executable.

Traceback (most recent call last):
File "demo5.py", line 5, in
RealtimeGCCNMF()
File "C:\gccNMF\runRealtimeGCCNMF.py", line 50, in init
self.initProcesses(params)
File "C:\gccNMF\runRealtimeGCCNMF.py", line 91, in initProcesses
self.audioProcess.start()
File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
self._popen = Popen(self)
File "C:\Python27\lib\multiprocessing\forking.py", line 277, in init
dump(process_obj, to_child, HIGHEST_PROTOCOL)
File "C:\Python27\lib\multiprocessing\forking.py", line 199, in dump
ForkingPickler(file, protocol).dump(obj)
File "C:\Python27\lib\pickle.py", line 224, in dump
self.save(obj)
File "C:\Python27\lib\pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "C:\Python27\lib\pickle.py", line 425, in save_reduce
save(state)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 655, in save_dict
self._batch_setitems(obj.iteritems())
File "C:\Python27\lib\pickle.py", line 687, in _batch_setitems
save(v)
File "C:\Python27\lib\pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "C:\Python27\lib\pickle.py", line 425, in save_reduce
save(state)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 568, in save_tuple
save(element)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 492, in save_string
self.write(BINSTRING + pack("<i", n) + obj)
IOError: [Errno 22] Invalid argument

what is wrong? thanks!

Synthesis window in lowLatencySpeechEnhancement.ipynb

Hi,

is there a reason why the synthesis window is not applied?

See also the attached sketch based on the lowLatencySpeechEnhancement.ipynb example.

fix = False

  • your version
  • clearly visible modulation in the OLA output
  • output gain != 1

fix = True

  • adequate hop size
  • unity gain
  • synthesis window after irfft

orig

fixed

from matplotlib.pyplot import *
from numpy import *
from numpy.fft import rfft, irfft

# Apply the fix
fix = False

# Preprocessing params
fftSize = 1024

# Asymmetric windowing params
analysisWindowSize = fftSize
synthesisWindowSize = 128

asymmetricHopSize = synthesisWindowSize // 4 if fix else (synthesisWindowSize * 3) // 4
m = synthesisWindowSize // 2
k = analysisWindowSize
d = 0

# Symmetric windowing params
symmetricWindowSize = fftSize
symmetricHopSize = asymmetricHopSize # to better compare results

# Generate test signal
stereoSamples = ones((1, fftSize*10))
numChannels, numSamples = stereoSamples.shape

def getAsymmetricAnalysisWindow(k, m, d):
    risingSqrtHann = sqrt( hanning(2*(k-m-d)+1)[:2*(k-m-d)] )
    fallingSqrtHann = sqrt( hanning(2*m+1)[:2*m] )

    window = zeros(k)
    window[:d] = 0
    window[d:k-m] = risingSqrtHann[:k-m-d]
    window[k-m:] = fallingSqrtHann[-m:]

    return window

def getAsymmetricSynthesisWindow(k, m, d):
    risingSqrtHannAnalysis = sqrt( hanning(2*(k-m-d)+1)[:2*(k-m-d)] )
    risingNoramlizedHann = hanning(2*m+1)[:m] / risingSqrtHannAnalysis[k-2*m-d:k-m-d]
    fallingSqrtHann = sqrt( hanning(2*m+1)[:2*m] )

    window = zeros(k)
    window[:-2*m] = 0
    window[-2*m:-m] = risingNoramlizedHann
    window[-m:] = fallingSqrtHann[-m:]

    return window

def performOnlineSpeechEnhancement(analysisWindow, synthesisWindow, hopSize):
    # Setup variables to save speech enhancement results
    numFrequencies = len(rfft(zeros(len(analysisWindow))))
    numFrames = (numSamples-len(synthesisWindow)) // hopSize

    if fix:
        gainFactor = hopSize / sum(analysisWindow * synthesisWindow)
    else:
        gainFactor = hopSize / float(len(synthesisWindow)) * 2

    targetEstimateSamplesOLA = zeros_like(stereoSamples)
    inputSpectrogram = zeros( (2, numFrequencies, numFrames), 'complex64')
    outputSpectrogram = zeros( (2, numFrequencies, numFrames), 'complex64')

    for frameIndex in range(numFrames):
        # compute FFT
        frameStart = frameIndex * hopSize
        frameEnd = frameStart + analysisWindowSize
        stereoSTFTFrame = rfft( stereoSamples[:, frameStart:frameEnd] * analysisWindow )
        inputSpectrogram[..., frameIndex] = stereoSTFTFrame
        outputSpectrogram[..., frameIndex] = stereoSTFTFrame

        # reconstruct time domain samples
        recStereoSTFTFrame = irfft(stereoSTFTFrame)

        if fix:
            # apply synthesis window as well
            recStereoSTFTFrame *= synthesisWindow

        # overlap-add to output samples
        targetEstimateSamplesOLA[:, frameStart:frameEnd] += recStereoSTFTFrame

    targetEstimateSamplesOLA *= gainFactor

    return inputSpectrogram, outputSpectrogram, targetEstimateSamplesOLA

analysisWindow = getAsymmetricAnalysisWindow(k, m, d)
synthesisWindow = getAsymmetricSynthesisWindow(k, m, d)

symmetricWindow = sqrt(hanning(symmetricWindowSize))

symmetricResults = performOnlineSpeechEnhancement(symmetricWindow, symmetricWindow, symmetricHopSize)
asymmetricResults = performOnlineSpeechEnhancement(analysisWindow, synthesisWindow, asymmetricHopSize)

title('fixed' if fix else 'orig')
plot(symmetricResults[-1][-1], label='symmetric', color='b', alpha=0.5)
plot(asymmetricResults[-1][-1], label='asymmetric', color='r', alpha=0.5)
legend()
show()

Preprocessing for chimeTrainSet.npy

I am interested in making trainSet.npy for 'onlineSpeechEnhancement'

What is the way to make trainSet for prelearning Dictionary?

How can i make trainset with other wav files?

Thank you.

about offline speech separation

”Due to differences in TDOA estimation for the 1m and 5cm microphone separation settings, including increased spatial aliasing and lower spatial resolution in the 5cm case,we also compared results averaged over these two settings separately. While we found somewhat decreased scores and increased variance in the 5cm case, the results were generally comparable.“
In the task of offline voice separation, do you need to make changes in the code for 5cm and 1m voices? Why I ran your code and the 1m data can be separated correctly, but all 5cm voices cannot get the correct results. Only one or two peak points can be obtained (the truth is that there should be three sources).
My English is not very good, these English are from Google Translate, please forgive me

argmax error

I am running python 2.7 64-bit with latest scipy and numpy.

The first demo works with the multiple speakers, but for the speech enhancement task, I get error:
argMaxGCCNMF = argmax(gccNMF, axis=1)
NameError: name 'argmax' is not defined

how to fix this? Thank you!

Is there a logical error?

In the processFrames method of GCCNMFProcessor class, the targetTDOAIndex is set after the calling of getTFMask method, which means that the mask is got using the target TDOA of last 6 frames, instead of the latest 6 frames including the current frame.
My English is not very good, please forgive me.

gccNMF model

hi
I try to run the file "runGccNMF.py", it shows "No module named 'gccNMF'".
I'm not sure how to fix this problem, if you can, please tell me the method to solve this problem.

best

Any Audio demo?

Hi,
I'm new to audio enhancement, and there seems a lot to learn, any Audio sample or pre trained model for fast evaluation?
Thanks

Why gain factor when reconstructing the signal?

Hi seanwood @seanwood , I'm a junior in BSS and thank you for your useful and effective open-source GCC_NMF. I happened to came across an unknown parameter when applying istft and reconstructing waveform:

def getTargetSignalEstimates(targetSpectrogramEstimates, windowSize, hopSize, windowFunction,numSamples):
    numTargets, numChannels, numFreq, numTime = targetSpectrogramEstimates.shape
    stftGainFactor = hopSize / float(windowSize) * 2
......
    return array(targetSignalEstimates) * stftGainFactor

Of course the outcome is good, but I really don't know why multiplying 2*hop_size/n_fft. Could you please give an explanation? Thank you.

runRealtimeGCCNMF

Hello,

I have a question to fix my problem. When I run 'runRealtimeGCCNMF.py' and hit the play button, a variable 'realGCC' in 'gccNMFProcessor.py' at 'processFrames' has only NaN value for default input.
'dev_Sq1_Co_A_mix.wav'

How can I fix it?

Thank you

two small problems?

First problem is that the low latency algorithm works in terms of working, but the output for both symmetric and asymmetric is silence, no sound? I copied the exact code from the python books.

Second problem is with the low latency and online speech enhancement algorithms, compared to the first two algorithms, that output the correct WAV format, these 2 last ones output all good, except that the bit depth is doubled for some reason? So instead of input signed 16-bit WAV, I get float 32bit WAV output? how to fix this?

thanks!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.