1st Place Public Leaderboard Solution for ERC2019

Overview

Although the competition is for recognizing emotion from audio data. You can treat this codebase for a baseline for audio classification in general, I did not make any assumption about the provided data. This is is pipeline

Key features

Preprocessing by converting the audio to Mel Spectrogram.

I used librosa with this config:

sampling_rate = 16000
duration = 2 # sec
hop_length = 125 * duration
n_mels = 128

Basically, each 128x128 image represents 2 second of audio.

Fully CNN for audio classification

Recently we won a gold medal in Kaggle's Freesound Audio Tagging 2019 and thus most of the architectures were borrowed from there.

The final submission was an ensemble of 4 models. 3 of them were Classifier_M0, Classifier_M2 and Classifier_M3 from our technical report [1]:

Here's what Classifier_M3 looks like:

The other model came from the 7th place solution

Mixup + SpecAugment (SpecMix)

The most important part of this solution is the augmentation method, as the dataset is very small and pretraining is not allowed.

Augmenting options spectrogram are very limited due to the nature of the data (they are not ordinary images e.g rotating a spectrogram makes no sense). In this work I ultilized Mixup [2] and SpecAugment [3]. This repo by Eric Bouteillon showed a nice explantion of the method:

Training

Preprocessing

To reproduce the Mels data, run the following command:

python preprocess.py --train_df_path <path-to>/train_label.csv --train_dir <path-to>/Train --test_dir <path-to>/Public_Test --train_output_path ./data/mels_train.pkl --test_output_path ./data/mels_test.pkl

Training

To reproduce the models, run the following commands:

python train_full.py --train_df_path <path-to>/train_label.csv --test_dir <path-to>/Public_Test/ --model m0 --logdir models_m0 --output_name preds_m0.npy

python train_full.py --train_df_path <path-to>/train_label.csv --test_dir <path-to>/Public_Test/ --model m2 --logdir models_m2 --output_name preds_m2.npy

python train_full.py --train_df_path <path-to>/train_label.csv --test_dir <path-to>/Public_Test/ --model m3 --logdir models_m3 --output_name preds_m3.npy

python train_full.py --train_df_path <path-to>/train_label.csv --test_dir <path-to>/Public_Test/ --model dcase --logdir models_dcase --output_name preds_dcase.npy

nlebang / pytorch-audio-emotion-recognition Goto Github PK

pytorch-audio-emotion-recognition's Introduction

1st Place Public Leaderboard Solution for ERC2019

Overview

Key features

Preprocessing by converting the audio to Mel Spectrogram.

Fully CNN for audio classification

Mixup + SpecAugment (SpecMix)

Training

Preprocessing

Training

pytorch-audio-emotion-recognition's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent