environmental-sound-classification's Introduction

Environmental-Sound-Classification

Environmental-Sound-Classification using ESC-10 dataset

Dependencies

Python
Keras
Librosa
sounddevice
SoundFile
scikit-learn

Dataset

Uses ESC-10 dataset for sound classification. It is a labeled set of 400 environmental recordings (10 classes, 40 clips per class, 5 seconds per clip). It is a subset of the larger ESC-50 dataset.

Setup

In this repository, I trained Convolution Neural Network, Multi Layer Perceptron and SVM for sound classification. I achieved classification accuracy of approx ~80%. MFCC (mel-frequency cepstrum) feature is used to train models. Other features like short term fourier transform, chroma, melspectrogram can also be extracted.

The dataset is downloaded and is kept inside "dataset" folder. It has 10 different classes each containing 10 .ogg files. You can visualize the dataset by running visualize_data.py. This script takes a .ogg file as input and converts it into .wav form. The waveform is visualized in the form of a plot.

python visualize_data.py -o "dataset/001 - Dog bark/1-30226-A.ogg"

A sample wav file for each class has been generated and kept within sample_wav folder for reference.

To train and classify, execute main.py as -

python main.py cnn  // for training CNN
python main.py mlp  // for training MLP
python main.py svm  // for training SVM

Internally main.py uses extract_features.py and nn.py (or svm.py) to create and train model.

Once training is done, the trained models are automatically saved in h5 format.

Wave plot for class - baby

Wave plot for class - dog

Recommend Projects

zoltanszalontay / environmental-sound-classification Goto Github PK