Hi, I'm doing speech recognition on a microcontroller. I'm new to this and I'm trying to modify code written for Acoustic Scene Classification, where the dataset was 30 s WAV clips.
Now I need to use a 1 s dataset for speech recognition, but I'm not getting sensible values after feature extraction.
Below is the code I'm using for the log mel spectrogram. Can anyone help, please?
import numpy as np
import librosa
from scipy import fft

SR = 16000     # sample rate of the 1 s clips (assumption -- use your dataset's rate)
N_FFT = 1024   # window / FFT length
N_MELS = 30    # number of mel bands

def create_col(y):
    # Create the time-series window (Hann)
    fft_window = librosa.filters.get_window('hann', N_FFT, fftbins=True)
    assert fft_window.shape == (1024,), fft_window.shape
    # Apply the window to one 1024-sample frame
    y_windowed = fft_window * y
    assert y_windowed.shape == (1024,), y_windowed.shape
    # FFT -- keep only the non-negative frequencies (N_FFT // 2 + 1 = 513 bins)
    fft_out = fft.fft(y_windowed, axis=0)[:513]
    assert fft_out.shape == (513,), fft_out.shape
    # Power spectrum
    S_pwr = np.abs(fft_out) ** 2
    assert S_pwr.shape == (513,)
    # Generate the mel filter bank (pass sr as a keyword; newer librosa requires it)
    mel_basis = librosa.filters.mel(sr=SR, n_fft=N_FFT, n_mels=N_MELS, htk=False)
    assert mel_basis.shape == (30, 513)
    # Apply the mel filter bank
    S_mel = np.dot(mel_basis, S_pwr)
    # astype returns a new array rather than converting in place, so assign it
    S_mel = S_mel.astype(np.float32)
    assert S_mel.shape == (30,)
    return S_mel
def create_log_mel(y):  # wrapper added so the trailing return is valid; name is illustrative
    # y is expected to be 32 frames of 1024 samples each
    S_mel = np.empty((30, 32), dtype=np.float32, order='C')
    for col_index in range(32):
        S_mel[:, col_index] = create_col(y[col_index])
    # Scale according to reference power
    S_mel = S_mel / S_mel.max()
    # Convert to dB
    S_log_mel = librosa.power_to_db(S_mel, top_db=80.0)
    assert S_log_mel.shape == (30, 32)
    return S_log_mel