covarep / covarep

A Cooperative Voice Analysis Repository for Speech Technologies

Home Page: http://covarep.github.io/covarep

License: Other

MATLAB 96.37% C 1.51% Python 2.13%

covarep's Introduction

                                Covarep
        A Cooperative Voice Analysis Repository for Speech Technologies
                            Version (after 1.4.1)
                    http://covarep.github.io/covarep



Covarep is an open-source repository of advanced speech processing algorithms
and is stored as a GitHub project (https://github.com/covarep/covarep) where
researchers in speech processing can store original implementations of published
algorithms.

Over the past few decades a vast array of advanced speech processing algorithms
has been developed, often offering significant improvements over the existing
state-of-the-art. Such algorithms can have a reasonably high degree of
complexity and, hence, can be difficult to accurately re-implement based on
article descriptions. Another issue is the so-called 'bug magnet effect' with
re-implementations frequently having significant differences from the original
ones. The consequence of all this has been that many promising developments
have been under-exploited or discarded, with researchers tending to stick to
conventional analysis methods.

By developing Covarep we are hoping to address this by encouraging authors to
include original implementations of their algorithms, thus resulting in a
single de facto version for the speech community to refer to.

We envisage a range of benefits to the repository:

1) Reproducible research: Covarep will allow fairer comparison of algorithms
in published articles.

2) Encouraged usage: the free availability of these algorithms will encourage
researchers from a wide range of speech-related disciplines to exploit them
for their own applications.

3) Feedback: because Covarep is a GitHub project, users will be able to offer
comments on algorithms, report bugs, suggest improvements, etc.

Scope
    We welcome contributions from a wide range of speech processing areas,
    including (but not limited to): Speech analysis, synthesis, conversion,
    transformation, enhancement, glottal source/voice quality analysis, etc.

Contribute!
    We believe that the Covarep repository has a great potential benefit to the
    speech research community and we hope that you will consider contributing
    your published algorithms to it. If you have any questions, comments,
    issues, etc. regarding Covarep, please contact us at one of the email
    addresses below. Please forward this message to others who may be interested.

Please also have a look at the website http://covarep.github.io/covarep and the
Covarep.pdf document in the documentation directory for more information.

Octave
    Most of the functions in Covarep are Octave compatible.
    However, it is necessary to install the following packages:
        tsa, optimization, signal.
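
    As a rough sketch (not part of Covarep itself), the packages listed above
    could be loaded at the start of an Octave session as follows. It assumes
    that "optimization" refers to the Octave Forge package named optim; the
    variable names are illustrative only. The guard makes the snippet a no-op
    under MATLAB.

```matlab
% Package names as passed to Octave's pkg command. 'optim' is an assumption:
% it is taken to be the package the README calls "optimization".
required = {'tsa', 'optim', 'signal'};
not_loaded = {};                   % packages that could not be loaded

if exist('OCTAVE_VERSION', 'builtin') ~= 0    % nonzero only under Octave
    for k = 1:numel(required)
        try
            pkg('load', required{k});
        catch
            % pkg raises an error when the package is not installed
            not_loaded{end+1} = required{k};
        end
    end
end

if ~isempty(not_loaded)
    fprintf('Could not load: %s\n', strjoin(not_loaded, ', '));
end
```

    Any package reported as not loaded would still need to be installed
    before the Octave-compatible functions can be expected to run.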

How to cite
    If you publish experiment results obtained by using Covarep, please cite
    the repository using the following publication:
      G. Degottex, J. Kane, T. Drugman, T. Raitio and S. Scherer, "COVAREP - A
      collaborative voice analysis repository for speech technologies", In
      Proc. IEEE International Conference on Acoustics, Speech and Signal
      Processing (ICASSP), Florence, Italy 2014.

    Also, within the text of your paper, please mention the version used.
    E.g. "... we compared with methods X, Y, Z available in [Covarep](v1.0.1)."


Maintainers
    Gilles Degottex <[email protected]>
        University of Crete, Heraklion, Greece

    John Kane <[email protected]>
        Trinity College Dublin, Dublin, Ireland
        
    Thomas Drugman <[email protected]>
        University of Mons, Mons, Belgium
        
    Tuomo Raitio <[email protected]>
        Aalto University, Espoo, Finland

    Stefan Scherer <[email protected]>
        University of Southern California, Los Angeles, USA


covarep's Issues

Update Covarep project name in all functions

Add code for conventional glottal parameterisation (e.g., NAQ, QOQ, H1-H2, PSP etc.)

SRH_pitch.m breaks if audio has lower sample rate than 16kHz

Currently the frequency check only tests whether the sample rate is above 16 kHz, in which case the audio is down-sampled to 16 kHz. I suggest changing the check so that it catches any sample rate that is not equal to 16000 and resamples the audio accordingly.

Line 77: if fs~=16000
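
A minimal sketch of what the suggested check could look like, written as a standalone script rather than as the actual code of the function. The synthetic input and the variable names are assumptions; resample() comes from the Signal Processing Toolbox in MATLAB and from the signal package in Octave.

```matlab
% Under Octave, resample() is provided by the 'signal' package.
if exist('OCTAVE_VERSION', 'builtin') ~= 0
    pkg('load', 'signal');
end

fs_target = 16000;                   % rate the existing check converts to
fs = 8000;                           % example input rate (assumed)
x = sin(2*pi*100*(0:fs-1)'/fs);      % 1 s synthetic 100 Hz tone

% Resample whenever the rate differs, not only when it is higher.
if fs ~= fs_target
    x = resample(x, fs_target, fs);  % rational-rate resampling by p/q
    fs = fs_target;
end
```

Note that up-sampling a low-rate recording does not restore content above its original Nyquist frequency; it only brings the signal to the rate the rest of the code assumes.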

Add Cepstral Peak Prominence feature

Add CPP and its time/quefrency smoothing variants to the glottalsource code

Add LF modelling algorithm (dyProg-LF)

Need the GitHub wiki to add more documentation?

Add option in sin_analysis for using zero-padding at boundaries (and not dropping analysis instants)

Add DCE-MFA envelope analysis (Y. Shiga, S. King)

Add Functions of Phase Distortion computation

formant_CGDZP breaks if fed with constant input at beginning of wav-file

I recently found an issue with formant_CGDZP: it breaks at

120 formantPeaks=[formantPeaks ; peakIndex];

and

170 while(possibleValues(1)==0) %discard zero entries

At line 120 the crash is caused by a dimension mismatch when concatenating.
At line 170 the crash is caused by an attempt to access an empty array.

I made a fix at work and will commit it soon, once I have double-checked that it works.
Cheers

To reproduce: add zeros (about 500 may be sufficient) at the beginning of the file and it will break.
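
The two failure modes described above can be guarded against along the following lines. This is only an illustrative sketch, not the fix that was committed: the fixed row width numFormants and the example values are assumptions, since the surrounding code of formant_CGDZP is not shown here.

```matlab
numFormants = 5;                          % assumed fixed row width

% Guard 1: pad or truncate each frame's peaks to a fixed width so that
% vertical concatenation never sees mismatched column counts.
formantPeaks = zeros(0, numFormants);
framePeaks = {[12 40 77 90 130], [15 42], []};   % made-up peak indices
for k = 1:numel(framePeaks)
    peakIndex = framePeaks{k};
    row = zeros(1, numFormants);          % zeros mark missing peaks
    n = min(numel(peakIndex), numFormants);
    row(1:n) = peakIndex(1:n);
    formantPeaks = [formantPeaks; row];
end

% Guard 2: test for emptiness before indexing, so that an all-zero
% vector is consumed safely instead of raising an index error.
possibleValues = [0 0 0];
while ~isempty(possibleValues) && possibleValues(1) == 0
    possibleValues(1) = [];               % discard leading zero entries
end
```

Any code that consumes possibleValues afterwards would also need to handle the case where nothing is left.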

Add spec2mfcc and mfcc2spec

env_te.m dftlen, order and winlen can be inconsistent

From Chin Chii Yeh: "I think that line 99 and 100 should be placed above line 93, so that the size of win and ccp and cc will be the same when min(order, dftlen/2-1)=dftlen/2-1. If not there might be error in line 128."

Add Peakdet (A. Michaud)

Add FChT

Add Phase Distortion Variance

Make code more `vectorised'

In particular in the glottalsource files, a lot of the code could be made more efficient by making it more vectorised.

Add DECOM and Oq estim (N. Henrich)

Need the wiki integrated in the GitHub website in order to provide more documentation?

Instructions page for contribution on the website

Pitch SRH: Returned values can be higher than the given F0max

@ThomasDrugman, can we clip the values directly inside the pitch_srh.m code?
Or is this behavior expected?
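
If clipping turns out to be the desired behavior, it could be as simple as the sketch below. The range limits and the F0 values are made-up examples, and the snippet is a standalone illustration rather than the code of pitch_srh.m.

```matlab
f0min = 50;                          % example search range in Hz (assumed)
f0max = 500;
F0s = [120 480 510 640 45];          % made-up estimates, some out of range

% Clamp every estimate into [f0min, f0max].
F0s_clipped = min(max(F0s, f0min), f0max);
```

Clipping hides the fact that an estimate fell outside the search range, so marking such frames as unreliable instead may be preferable in some applications.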

Fix the names of spec2mfcc and mfcc2hspec, because these are not the usual MFCCs.

See discussion at fb3a151.

Octave: Error in loading system_net_creak

When loading system_net_creak in Octave, the following warnings pop up:

warning: load: can't find the file C:\Program Files\MATLAB\R2011a\toolbox\nnet\nnet\nnweight\dotprod.m
warning: skipping over ''
warning: load: can't find the file C:\Program Files\MATLAB\R2011a\toolbox\nnet\nnet\nnweight\dotprod.m
warning: skipping over ''
warning: load: can't find the file C:\Program Files\MATLAB\R2011a\toolbox\nnet\nnet\nnweight\dotprod.m
warning: skipping over ''
warning: load: can't find the file C:\Program Files\MATLAB\R2011a\toolbox\nnet\nnet\nnnetinput\netsum.m
warning: skipping over ''
warning: load: can't find the file C:\Program Files\MATLAB\R2011a\toolbox\nnet\nnet\nnnetinput\netsum.m
warning: skipping over ''
warning: load: can't find the file C:\Program Files\MATLAB\R2011a\toolbox\nnet\nnet\nntransfer\tansig.m
warning: skipping over ''
warning: load: can't find the file C:\Program Files\MATLAB\R2011a\toolbox\nnet\nnet\nntransfer\logsig.m
warning: skipping over ''
warning: load: can't find the file C:\Program Files\MATLAB\R2011a\toolbox\nnet\nnet\nntransfer\tansig.m
warning: skipping over ''
warning: load: can't find the file C:\Program Files\MATLAB\R2011a\toolbox\nnet\nnet\nntransfer\logsig.m
warning: skipping over ''
warning: no constructor for class network
warning: load: element has been converted to a structure

Since system_net_creak isn't successfully loaded, creaky_voice_do_detection fails at line 87

> creak_pp=sim(net,X);
error: Structure doesn't seem to be a neural network

Problems running COVAREP_feature_extraction

I'm trying to extract the COVAREP features from a database sampled at 8 kHz. The files were all collected in exactly the same way; however, I'm facing some problems running the pitch_srh and gci_sedreams methods for a few of them (less than 5%, I would say).

In pitch_srh I get the following error:
Attempted to access Spec(4005); index out of bounds because numel(Spec)=4000.

Error in pitch_srh>SRH_EstimatePitch (line 154)
SRHs(freq)=(Spec(freq)+Spec(2*freq)+Spec(3*freq)+Spec(4*freq)+Spec(5*freq))-(Spec(round(1.5*freq))+Spec(round(2.5*freq))+Spec(round(3.5*freq))+Spec(round(4.5*freq)));

Error in pitch_srh (line 92)
[F0s,SRHVal,time] = SRH_EstimatePitch(res',fs,f0min,f0max,hopsize);

Error in COVAREP_feature_extraction (line 107)
[srh_f0,srh_vuv,~,srh_time] = pitch_srh(x,fs,F0min,F0max, ...

In gci_sedreams I get the following error:
Improper assignment with rectangular empty matrix.

Error in gci_sedreams (line 179)
gci(Ind)=start+posi-1;

Error in COVAREP_feature_extraction (line 117)
GCI = gci_sedreams(x,fs,F0med,1); % SEDREAMS GCI detection

Do you think this could have anything to do with the 8K sampling frequency?

Thanks!
José
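
The pitch_srh error above is consistent with the spectrum having only 4000 bins at 8 kHz, so that Spec(5*freq) is out of range as soon as freq exceeds 800. One possible guard, sketched below as a standalone script, is to ignore harmonics that fall beyond the end of the spectrum. The flat stand-in spectrum and the variable names harm and inter are assumptions, and this is not necessarily how the issue was resolved in the repository.

```matlab
fs = 8000;
Spec = ones(1, fs/2);               % stand-in for a 4000-bin spectrum
freq = 801;                         % candidate F0 for which 5*freq > 4000

harm  = (1:5) * freq;               % harmonic bins: freq, 2*freq, ..., 5*freq
inter = round((1.5:1:4.5) * freq);  % inter-harmonic bins: 1.5*freq, ..., 4.5*freq

% Keep only the bins that exist in the spectrum.
harm  = harm(harm <= numel(Spec));
inter = inter(inter <= numel(Spec));

% Summation of residual harmonics restricted to the valid bins.
srh = sum(Spec(harm)) - sum(Spec(inter));
```

Dropping terms changes the balance between harmonic and inter-harmonic sums near the upper limit, so restricting the F0 search range for low sample rates may be a cleaner alternative.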

Add HMPD vocoder

Fix HOWTO_formant.m

Add Phase Distortion Deviation (PDD) computation

SRH_pitch.m has serious issues with long files

Memory usage grows beyond reasonable bounds on long files.
I will fix it with a temporary workaround that computes the spectrum one frame at a time:

specMat = zeros(fs, size(frameMatWinMean,2)); % preallocate: one fs-point spectrum per frame
for i = 1:size(frameMatWinMean,2)
    specMat(:,i) = abs( fft(frameMatWinMean(:,i),fs) ); % fs-point FFT magnitude of frame i
end

In addition, I will add clear commands to delete unnecessary matrices whenever possible.

Do we want to include basic baselines for comparison, such as FFT, LP, etc?

findpeaks.m function overloaded by VOICEBOX

There is a built-in findpeaks function in MATLAB and also one in VOICEBOX. Both functions output the peak amplitude and the index location, but in a different order. I propose renaming the VOICEBOX one to findpeaks_VB.m to clear this up.

COVAREP_feature_extraction.m - spec2mfcc not defined

Function COVAREP_feature_extraction.m, line 160:
MCEP(m,:) = spec2mfcc(hspec2spec(Ete), fs, MCEP_ord)';
The function 'spec2mfcc' is not defined anymore. I think I solved the problem using:
MCEP(m,:) = hspec2fwcep(Ete, fs, MCEP_ord)';

Specify "Only published works can be added in Covarep"

Add regression and integrity tests

To test that the methods' results do not vary from one version to the next.
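
A regression test of this kind could follow the pattern sketched below: store a reference output once, then compare later runs against it within a tolerance. The function handle feature_fn is a hypothetical stand-in for any Covarep method, and the temporary file stands in for a reference file that would normally be produced by an earlier version and kept with the tests.

```matlab
% Hypothetical stand-in for the method under test.
feature_fn = @(x) cumsum(x) / numel(x);
x = (1:10)';                           % fixed, deterministic test input

% Step 1 (done once, with a trusted version): store the reference output.
ref_file = [tempname() '.mat'];
ref = feature_fn(x);
save(ref_file, 'ref');

% Step 2 (done on every new version): recompute and compare.
stored = load(ref_file);
out = feature_fn(x);
max_err = max(abs(out - stored.ref));
tol = 1e-12;                           % assumed numerical tolerance
regression_ok = max_err <= tol;

delete(ref_file);                      % clean up the temporary file
fprintf('max abs error = %g, pass = %d\n', max_err, regression_ok);
```

A tolerance is used instead of exact equality because results can differ in the last digits between MATLAB and Octave or between platforms.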

Check English of the basic texts: Readme, Disclaimer, etc.

Add Rd estimation using Mean Squared Phase

Fix license of IAIF

Write DISCLAIMER

Octave: Make HOWTO_spectra Octave-compatible: spectrogram doesn't exist in Octave

Need the wiki integrated in the GitHub website in order to provide more documentation?

Add a single script to compute all features at once and gather them in a single structure

pitch_srh breaks on short audio

I've been feeding get_kd_creak_features() individual audio files of vowels, segmented out of a larger recording. I've found that with some audio files with duration less than 100 ms, pitch_srh() returns an index error. I've put up a sample audio file that has this problem: http://jofrhwld.github.io/assets/test_11000.wav

> [x,fs] = wavread("test_11000.wav");
> [H2H1,res_p,ZCR,F0,F0mean,enerN,pow_std,creakF0] = get_kd_creak_features(x,fs);
error: SRH_EstimatePitch: A(I): index out of bounds; value 1 out of bound 0
error: called from:
error:   covarep/glottalsource/pitch_srh.m at line 171, column 10
error:   covarep/glottalsource/pitch_srh.m at line 87, column 24
error:   covarep/glottalsource/creaky_voice_detection/private/get_kd_creak_features.m at line 73, column 11

I wasn't sure if this was an issue with duration or samples, and I got the same error on the same bit of audio sampled at 16000.
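
Until the function handles short inputs itself, a caller could work around the problem by zero-padding segments up to a minimum duration before analysis, as in the sketch below. The 100 ms threshold is an assumption taken from the observation above, not a documented requirement of pitch_srh, and the input signal is synthetic.

```matlab
fs = 11000;
x = sin(2*pi*150*(0:round(0.05*fs)-1)'/fs);   % 50 ms synthetic vowel-like tone

min_dur = 0.1;                        % assumed minimum duration in seconds
min_len = ceil(min_dur * fs);         % minimum number of samples

% Append zeros when the segment is shorter than the assumed minimum.
if numel(x) < min_len
    x = [x; zeros(min_len - numel(x), 1)];
end
```

Estimates obtained over the padded region carry no information, so frames beyond the original segment length should be discarded afterwards.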

Octave: Check that all Octave-compatible functions work when the 'signal' and 'tsa' packages are installed on one's local machine. Once this is checked, remove the Octave_fcns directory, as it is now obsolete.

Do we want to include any machine learning algorithms commonly used in speech processing?

Add a list of available methods

I suggest adding one to the website in order to advertise the project.
A text-version duplicate of this list could also be kept in the doc directory.