schmiph2 / pysepm

Python implementation of performance metrics in Loizou's Speech Enhancement book

License: GNU General Public License v3.0

Python 100.00%

speech-enhancement speech-quality speech-intelligibility pesq stoi fwsnrseg snrseg llr wss performance-measures python

pysepm's Introduction

pysepm - Python Speech Enhancement Performance Measures (Quality and Intelligibility)

Python implementations of the objective quality and intelligibility measures described in Philipos C. Loizou's great Speech Enhancement book. The Python implementations are checked against the MATLAB implementations that accompany the book (see Link).

Install with pip

Install pysepm:

pip3 install https://github.com/schmiph2/pysepm/archive/master.zip

Examples

Please find a Jupyter Notebook with examples for all implemented measures in the examples folder.

Implemented Measures

Speech Quality Measures

Segmental Signal-to-Noise Ratio (SNRseg)
Frequency-weighted Segmental SNR (fwSNRseg)
Log-likelihood Ratio (LLR)
Weighted Spectral Slope (WSS)
Perceptual Evaluation of Speech Quality (PESQ), (python-pesq implementation by ludlows)
Composite Objective Speech Quality (composite)
Cepstrum Distance Objective Speech Quality Measure (CD)
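
Of these, segmental SNR is the simplest to state: compute the SNR of each frame in dB, clip it to a fixed range, and average over frames. The sketch below is a minimal pure-Python version under stated assumptions (non-overlapping rectangular frames of 480 samples, i.e. 30 ms at 16 kHz, and the [-10, 35] dB clipping range from Loizou's MATLAB code). `snr_seg` is a hypothetical name; pysepm's `SNRseg` uses its own framing and windowing, so its values can differ.

```python
import math


def snr_seg(clean, processed, frame_len=480, min_snr=-10.0, max_snr=35.0):
    """Segmental SNR in dB (sketch, not pysepm's implementation).

    Assumptions: non-overlapping rectangular frames of frame_len samples;
    trailing samples that do not fill a whole frame are ignored; each
    frame's SNR is clipped to [min_snr, max_snr] before averaging.
    """
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    eps = 1e-12  # keeps log10 finite for silent or error-free frames
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        noise_energy = sum((x - y) ** 2 for x, y in zip(c, p))
        snr = 10.0 * math.log10(signal_energy / (noise_energy + eps) + eps)
        snrs.append(min(max(snr, min_snr), max_snr))  # clip per frame
    if not snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(snrs) / len(snrs)
```

Clipping matters in practice: without it, a single near-perfect frame (or a silent one) would dominate the average.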

Speech Intelligibility Measures

Short-time objective intelligibility (STOI), (pystoi implementation by mpariente)
Coherence and speech intelligibility index (CSII)
Normalized-covariance measure (NCM)

Dereverberation Measures (TODO)

Bark spectral distortion (BSD)
Scale-invariant signal to distortion ratio (SI-SDR)
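
SI-SDR is still on the TODO list, but its definition is short: scale the reference by the projection coefficient alpha = <s_hat, s> / ||s||^2, treat what remains of the estimate as distortion, and report the energy ratio in dB. Rescaling the estimate rescales alpha, the target and the distortion by the same factor, so the score does not depend on the estimate's gain. The pure-Python sketch below follows that definition after removing the mean of both signals; `si_sdr` is a hypothetical name, not a pysepm function.

```python
import math


def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant SDR in dB (hypothetical helper, not part of pysepm)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Remove the mean of both signals, as in the usual SI-SDR definition.
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    s = [x - ref_mean for x in reference]
    s_hat = [y - est_mean for y in estimate]
    # Optimal scaling of the reference: alpha = <s_hat, s> / ||s||^2.
    alpha = sum(a * b for a, b in zip(s_hat, s)) / (sum(a * a for a in s) + eps)
    target = [alpha * a for a in s]
    distortion = [b - t for b, t in zip(s_hat, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10.0 * math.log10((target_energy + eps) / (distortion_energy + eps))
```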


pysepm's Issues

I had a problem when reading the wav file

Log:
/Users/wangshaoce/test_project/pysepm/test.py:6: WavFileWarning: Chunk (non-data) not understood, skipping it.
fs, clean_speech = scipy.io.wavfile.read('1_speech_16000_Hz.wav')
/Users/wangshaoce/test_project/pysepm/test.py:7: WavFileWarning: Chunk (non-data) not understood, skipping it.
fs, noisy_speech = scipy.io.wavfile.read('1_noisySpeech_16000_Hz.wav')
This is my code:
import pysepm
import scipy.io.wavfile
import wavio
import sys
sys.path.append("../")
fs, clean_speech = scipy.io.wavfile.read('1_speech_16000_Hz.wav')
fs, noisy_speech = scipy.io.wavfile.read('1_noisySpeech_16000_Hz.wav')

fs, enhanced_speech = scipy.io.wavfile.read('1_processed_16000_Hz.wav')

pysepm.pesq(clean_speech, noisy_speech, fs)

pysepm.pesq(clean_speech, enhanced_speech, fs)

Searching Google didn't solve my problem.
I look forward to your reply.
Best wishes
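
The lines in the log are `WavFileWarning`s, not errors: scipy skips the chunk it does not recognize (typically extra metadata written by a recording or editing tool) and still returns the samples, so the `pesq` calls can proceed. If the warnings are unwanted, one alternative, sketched below as an assumption rather than anything pysepm requires, is to read 16-bit PCM files with the standard-library `wave` module, which skips chunks other than `fmt ` and `data` without warning. `read_pcm16_wav` is a hypothetical helper name.

```python
import struct
import wave


def read_pcm16_wav(path):
    """Read a mono 16-bit PCM WAV file (hypothetical helper).

    Returns (fs, samples) with samples scaled to floats in [-1.0, 1.0).
    The wave module silently skips chunks other than 'fmt ' and 'data'.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        fs = wf.getframerate()
        n_frames = wf.getnframes()
        raw = wf.readframes(n_frames)
    # '<h' = little-endian signed 16-bit, the sample format of PCM WAV files.
    samples = struct.unpack("<%dh" % n_frames, raw)
    return fs, [s / 32768.0 for s in samples]
```

Usage: `fs, clean_speech = read_pcm16_wav('1_speech_16000_Hz.wav')`, then convert with `numpy.asarray(clean_speech)` before calling the pysepm measures, which operate on NumPy arrays.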

BSD

Hi,

thank you for the library.
I am rewriting some of the measures in C#. I would like to ask whether the BSD measure already works. I saw it on the TODO list, but I found the code in the repository.

Thx
pauquail

nan values for most metrics

I am looking at the DNS-2020 test dataset:

https://github.com/microsoft/DNS-Challenge/tree/interspeech2020/master/datasets/test_set/synthetic/no_reverb

And I want to see the results of as many metrics as possible:

def get_evaluation(clean_speech, noisy_speech, sr = 16000):
    
    Y0 = pysepm.fwSNRseg(clean_speech, noisy_speech, sr)
    Y1 = pysepm.SNRseg(clean_speech, noisy_speech, sr)
    Y2 = pysepm.llr(clean_speech, noisy_speech, sr)
    Y3 = pysepm.wss(clean_speech, noisy_speech, sr)
    Y4 = pysepm.cepstrum_distance(clean_speech, noisy_speech, sr)
    Y5 = pysepm.stoi(clean_speech, noisy_speech, sr)
    Y6, Y7, Y8 = pysepm.csii(clean_speech, noisy_speech, sr)
    _, Y9 = pysepm.pesq(clean_speech, noisy_speech, sr)
    Y10, Y11, Y12 = pysepm.composite(clean_speech, noisy_speech, sr)
    Y13 = pysepm.ncm(clean_speech, noisy_speech, sr)
    
    return [Y0,  Y1,  Y2,  Y3,  Y4,
            Y5,  Y6,  Y7,  Y8,  Y9,
            Y10, Y11, Y12, Y13]

But my output looks like this:

[nan,
  8.621018369535282,
  0.1498153037618391,
  nan,
  1.9050031356095016,
  0.9807254867015377,
  nan,
  nan,
  nan,
  2.3495934009552,
  nan,
  nan,
  nan,
  nan]

What may be the issue?
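
One possible cause worth ruling out, offered as an assumption rather than a confirmed diagnosis, is frames of exact digital silence: frame-based measures take logarithms of frame or band energies, and a frame whose energy is exactly zero can turn a log into `-inf` and an average into `nan`. It is also worth checking whether the arrays are integer-typed (e.g. int16), since squaring int16 values in NumPy can overflow; converting with `.astype(float)` avoids that. The hypothetical helper below lists the start indices of all-zero frames; the frame length and hop (30 ms with 75 % overlap at 16 kHz) are assumptions to adjust to the measure being debugged.

```python
def silent_frames(signal, frame_len=480, hop=120):
    """Return start indices of frames whose energy is exactly zero (hypothetical helper).

    A frame has zero energy exactly when every sample in it is zero.
    Assumed defaults: 480-sample frames with a 120-sample hop
    (30 ms frames, 75 % overlap at 16 kHz).
    """
    starts = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        if all(x == 0 for x in signal[start:start + frame_len]):
            starts.append(start)
    return starts
```

If `silent_frames(clean_speech)` returns a non-empty list for a file that produces `nan`, adding a tiny dither (or trimming the silent edges) before evaluation is a quick way to test the hypothesis.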

The two things I measure are different from what I think

Hello, I am studying speech enhancement.

In your notebook, you compare clean with noisy, and clean with enhanced.

But I have always compared noisy (noise + speech) with enhanced (denoised). (Of course, I analyzed only PESQ.)

I wonder whether your comparison is the correct one. I think I may have made a mistake and only noticed it after conducting the research.
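
For full-reference measures such as PESQ, the clean signal is the reference and the signal under test is the second argument, which is why the notebook scores both noisy and enhanced against clean. A common way to report the benefit of enhancement is the difference between those two scores. The sketch below shows that bookkeeping with a plain global SNR as a stand-in metric; `snr_db` and `improvement` are hypothetical names, not pysepm functions.

```python
import math


def snr_db(reference, test):
    """Global SNR in dB, used here only as a simple stand-in metric."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def improvement(metric, clean, noisy, enhanced):
    """Score noisy and enhanced against the same clean reference (hypothetical helper).

    Returns (noisy_score, enhanced_score, difference). For higher-is-better
    measures, a positive difference means the enhancement helped.
    """
    noisy_score = metric(clean, noisy)
    enhanced_score = metric(clean, enhanced)
    return noisy_score, enhanced_score, enhanced_score - noisy_score
```

With pysepm, the metric argument could be, for example, `lambda ref, test: pysepm.SNRseg(ref, test, fs)`. For distance-type measures where lower is better (LLR, WSS and cepstral distance), a negative difference indicates improvement.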

Problem when using

Here is a part of my code.

And this is the error.

Here is the data detail.

Composite function output

Hi, great package.
I just have one question: I assume the composite function outputs the SIG, BAK and OVL measures, but in what order?

Problems about Composite Objective Quality Measure

Thanks for this convenient python implementation for quality measures.
However, I am confused about the composite measures. Could you point out where I can find more information about the linear coefficients of CSIG, CBAK and COVL? The coefficients do not correspond to those reported in this paper: https://ieeexplore.ieee.org/document/4389058 (Table 7).

Confusion about the meaning of the evaluation results

In the example Jupyter notebook, some metrics give a smaller value for the enhanced signal than for the noisy one, while others give a larger value. For example, a higher PESQ should mean better quality, but that is not the case in the demo.

Also, the scale/range of each result isn't mentioned in the README. I know some lie between 0 and 5, but I am not familiar with the others.

Problem with installation

I installed pysepm as described in the README. However, when I import pysepm, Python reports that the module cannot be found. I am worried about it; could you help me?
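
A frequent cause of this symptom, offered as an assumption since the setup details are not given, is that `pip3` installed the package into a different Python interpreter than the one running `import pysepm`. Running the installer as `python3 -m pip install https://github.com/schmiph2/pysepm/archive/master.zip` ties the installation to that specific interpreter. The hypothetical helper below reports which interpreter is running and whether a module can be found by it, without importing the module.

```python
import importlib.util
import sys


def check_module(name):
    """Report whether `name` is importable by the running interpreter (hypothetical helper)."""
    spec = importlib.util.find_spec(name)  # None if the module cannot be found
    return {
        "interpreter": sys.executable,
        "importable": spec is not None,
        "location": spec.origin if spec is not None else None,
    }
```

Usage: `print(check_module("pysepm"))`. If `importable` is `False`, compare `interpreter` with the Python that `pip3 --version` reports.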

The unit of Frequency-weighted Segmental SNR

May I ask whether "dB" is the unit of the frequency-weighted segmental SNR?
Thanks
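
For reference, the frequency-weighted segmental SNR in the form used by Hu and Loizou (and in Loizou's book) averages, over M frames, a weighted mean of band-wise SNRs taken across K frequency bands. Each band term is 10 log10 of a power ratio, and a weighted average keeps the unit, so the result is in dB. Here X(j,m) and \hat{X}(j,m) are the clean and processed magnitude spectra in band j of frame m, and W(j,m) is a band weight; the particular weight (a power of the clean band magnitude in the MATLAB code) and the per-band clipping to a fixed dB range are details to confirm against the code rather than take as given.

```latex
\mathrm{fwSNRseg} = \frac{10}{M} \sum_{m=0}^{M-1}
  \frac{\sum_{j=1}^{K} W(j,m)\, \log_{10} \dfrac{X(j,m)^{2}}{\bigl(X(j,m) - \hat{X}(j,m)\bigr)^{2}}}
       {\sum_{j=1}^{K} W(j,m)}
```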

Can PESQ be used as a loss?

Hi, can PESQ be used as a loss? That is, is PESQ differentiable, so that gradients can flow through it?
For example, SI-SNR implemented in PyTorch can be used as a loss while also serving to evaluate performance.

LLR calculation error

Thank you for making a useful library.
During my experiments, I found an error in the LLR calculation of the composite function.
I have attached samples:
https://drive.google.com/drive/folders/1FqArlwPXnLHQ53Kn99oP24IS7XoaBCNl?usp=sharing

For 238.wav, CSIG, CBAK and COVL are calculated correctly.
However, for 101.wav, CSIG and COVL are not calculated properly (both output 1.0).
On analysis, I found that llr_mean in the composite function was 'inf'.

So I made some corrections by comparing the code with the MATLAB composite function.

I modified line 181 in qualityMeasures.py as follows:
clean_speech_framed=extract_overlapped_windows(clean_speech+eps,winlength,winlength-skiprate,hannWin)
processed_speech_framed=extract_overlapped_windows(processed_speech+eps,winlength,winlength-skiprate,hannWin)

This did not solve the problem, so I also modified lines 151-154 as follows:
if np.abs(E[i]) < eps:
    rcoeff[i]=(R[i+1] - sum_term) / (E[i])
else:
    rcoeff[i]=(R[i+1] - sum_term) / (E[i])

With these changes, the values are almost the same as those obtained with MATLAB, but they still do not match exactly to two decimal places.

Since my modifications have not been verified, I think further modifications will be necessary.
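
A likely trigger for the `inf`, consistent with the fix of adding `eps` to the signals, is a frame of exact silence: its autocorrelation and LPC prediction error are zero, so both the Levinson-Durbin division and the LLR ratio log(a_p R_c a_p^T / a_c R_c a_c^T) degenerate. The pure-Python sketch below shows the per-frame computation with explicit guards; all function names are hypothetical, windowing and frame extraction are omitted, and the guard strategy (a zero reflection coefficient when the error is at or below eps, and flooring both quadratic forms at eps) is an assumption, not what pysepm or the MATLAB code does.

```python
import math

EPS = 2.220446049250313e-16  # double-precision machine epsilon


def autocorrelation(frame, order):
    """Biased autocorrelation r[0..order] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[k] * frame[k + lag] for k in range(n - lag)) for lag in range(order + 1)]


def lpc_levinson(r, order, eps=EPS):
    """Levinson-Durbin recursion (sketch).

    Returns (a, err): a = [1, -a1, ..., -ap] are the coefficients of
    A(z) = 1 - sum_k a_k z^-k, and err is the final prediction error.
    Assumed guard: if the error is at or below eps (e.g. an all-zero frame),
    the reflection coefficient is set to 0 instead of dividing by it.
    """
    pred = [0.0] * (order + 1)  # pred[k] = a_k; pred[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(pred[j] * r[i - j] for j in range(1, i))
        k = acc / err if err > eps else 0.0
        new_pred = pred[:]
        new_pred[i] = k
        for j in range(1, i):
            new_pred[j] = pred[j] - k * pred[i - j]
        pred = new_pred
        err *= 1.0 - k * k
    return [1.0] + [-c for c in pred[1:]], err


def quad_form(a, r):
    """a R a^T for the symmetric Toeplitz matrix R built from autocorrelation r."""
    p = len(a)
    return sum(a[i] * r[abs(i - j)] * a[j] for i in range(p) for j in range(p))


def llr_frame(r_clean, a_clean, a_processed, eps=EPS):
    """Per-frame LLR: log(a_p R_c a_p^T / a_c R_c a_c^T), floored to avoid 0/0 and log(0)."""
    num = max(quad_form(a_processed, r_clean), eps)
    den = max(quad_form(a_clean, r_clean), eps)
    return math.log(num / den)
```

Because the clean LPC vector minimizes a R_c a^T among vectors with a leading 1, the per-frame LLR is never negative, and it is exactly 0 when the processed LPC vector equals the clean one; a frame that violates this is a quick sign of a numerical problem.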