I compared the mfcc of librosa with


inconsistency with librosa about python_speech_features (closed)

jameslyons commented on June 26, 2024

inconsistency with librosa

from python_speech_features.

Comments (3)

jameslyons commented on June 26, 2024

If you put both sets of features in a classifier, you'll find that you get almost identical results. This is because the values of the MFCCs are not intrinsically meaningful. By scaling and shifting one of the feature sets you should get the other, though there may also be some small nonlinear transformations. Asking which one is correct doesn't make sense: if the features work in a classifier, then they are correct.
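To illustrate the scaling-and-shifting argument, here is a minimal sketch using only the standard library. The coefficient values and the `scale`/`shift` numbers are made up for illustration, and the relation between the two libraries is assumed to be purely affine (the small nonlinear differences mentioned above are ignored). After per-feature standardization, which many classifier pipelines apply anyway, the two versions become numerically indistinguishable.

```python
from statistics import mean, pstdev


def standardize(column):
    """Z-score one feature column: subtract the mean, divide by the std."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(x - mu) / sigma for x in column]


# Hypothetical values of one cepstral coefficient over five frames.
coeff_a = [-3.2, 0.5, 1.7, -0.4, 2.9]

# The "same" coefficient from a second library, modelled here as
# scale * x + shift with scale > 0 (an assumption, not a measured relation).
scale, shift = 12.5, -40.0
coeff_b = [scale * x + shift for x in coeff_a]

z_a = standardize(coeff_a)
z_b = standardize(coeff_b)

# Largest difference between the two standardized versions.
max_gap = max(abs(p - q) for p, q in zip(z_a, z_b))
print(max_gap)
```

The printed gap is at the level of floating-point rounding, which is why a classifier trained on normalized features would not care which library produced them.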

See also #29

yxma2015 commented on June 26, 2024

@chananshgong
I think so too:

If you put both sets of features in a classifier, you'll find that you get almost identical results. This is because the values of the MFCCs are not intrinsically meaningful. By scaling and shifting one of the feature sets you should get the other, though there may also be some small nonlinear transformations. Asking which one is correct doesn't make sense: if the features work in a classifier, then they are correct.

Moreover, the librosa implementation is center-aligned: each frame is centered on its timestamp rather than starting at it. I guess you overlooked this.
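One visible effect of the centering is a different number of frames for the same signal. The sketch below reflects my reading of how the two libraries frame a signal (librosa pads `n_fft // 2` samples on each side when centering; python_speech_features rounds the last partial frame up). The helper names `frames_librosa` and `frames_psf` are hypothetical, not part of either library.

```python
import math


def frames_librosa(n_samples, n_fft, hop_length, center=True):
    """Frame count for librosa-style framing (assumed formula)."""
    if center:
        # Centering pads n_fft // 2 samples on both ends of the signal.
        n_samples += 2 * (n_fft // 2)
    return 1 + (n_samples - n_fft) // hop_length


def frames_psf(n_samples, frame_len, frame_step):
    """Frame count for python_speech_features-style framing (assumed formula)."""
    if n_samples <= frame_len:
        return 1
    # The trailing partial frame is zero-padded, hence the ceil.
    return 1 + math.ceil((n_samples - frame_len) / frame_step)


# 5 seconds at 16 kHz with the window and hop used in this thread.
n_samples = 5 * 16000
n_centered = frames_librosa(n_samples, n_fft=512, hop_length=160)
n_uncentered = frames_librosa(n_samples, n_fft=512, hop_length=160, center=False)
n_psf = frames_psf(n_samples, frame_len=512, frame_step=160)
print(n_centered, n_uncentered, n_psf)
```

Besides the differing counts, frame `t` covers a different stretch of audio in each library, so a frame-by-frame comparison is only approximate unless centering is turned off or compensated for.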

h-jia commented on June 26, 2024

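It is consistent. You need to swap the dimensions, though: librosa returns (coefficients, frames), while python_speech_features returns (frames, coefficients).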

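import librosa
import python_speech_features
from scipy.signal.windows import hann

# Shared analysis parameters so both libraries use the same framing and filterbank.
n_mfcc = 13
n_mels = 40
n_fft = 512
hop_length = 160
fmin = 0
fmax = None  # None lets each library fall back to the Nyquist frequency (sr / 2)
sr = 16000

# Note: example_audio_file() comes from older librosa releases; with a newer
# release, substitute the path of any local audio file.
y, sr = librosa.load(librosa.util.example_audio_file(), sr=sr, duration=5, offset=30)

# librosa returns an array of shape (n_mfcc, n_frames).
mfcc_librosa = librosa.feature.mfcc(y=y, sr=sr, n_fft=n_fft,
                                    n_mfcc=n_mfcc, n_mels=n_mels,
                                    hop_length=hop_length,
                                    fmin=fmin, fmax=fmax, htk=False)

# python_speech_features returns (n_frames, numcep), i.e. the transpose.
# Pre-emphasis, liftering and energy replacement are disabled, and a Hann
# window is passed explicitly, to match librosa's processing.
mfcc_speech = python_speech_features.mfcc(signal=y, samplerate=sr, winlen=n_fft / sr, winstep=hop_length / sr,
                                          numcep=n_mfcc, nfilt=n_mels, nfft=n_fft, lowfreq=fmin, highfreq=fmax,
                                          preemph=0.0, ceplifter=0, appendEnergy=False, winfunc=hann)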

see: https://stackoverflow.com/questions/60492462/mfcc-python-completely-different-result-from-librosa-vs-python-speech-features
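For the dimension swap, here is a small sketch with toy nested lists standing in for the two outputs (the numbers are invented). With real NumPy arrays the same thing is just `mfcc_librosa.T`. The helper `to_frames_first` is hypothetical, and trimming to the common number of frames only makes the shapes match; it does not correct for the time offset introduced by centering.

```python
def to_frames_first(coeffs_by_frames):
    """Transpose a (n_coeffs, n_frames) nested list to (n_frames, n_coeffs)."""
    return [list(row) for row in zip(*coeffs_by_frames)]


# Toy stand-in for the librosa layout: 2 coefficients x 4 frames.
librosa_like = [[1.0, 2.0, 3.0, 4.0],
                [5.0, 6.0, 7.0, 8.0]]

# Toy stand-in for the python_speech_features layout: 3 frames x 2 coefficients.
psf_like = [[1.1, 5.1],
            [2.1, 6.1],
            [3.1, 7.1]]

# Bring both into (frames, coefficients) order.
aligned = to_frames_first(librosa_like)

# The frame counts can differ, so compare only the frames both outputs have.
n_common = min(len(aligned), len(psf_like))
pairs = list(zip(aligned[:n_common], psf_like[:n_common]))

for lib_frame, psf_frame in pairs:
    print(lib_frame, psf_frame)
```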
