
pytorch-audio-emotion-recognition's Introduction

1st Place Public Leaderboard Solution for ERC2019

Overview

Although the competition is about recognizing emotion from audio data, you can treat this codebase as a baseline for audio classification in general: I did not make any assumptions about the provided data. The key features of the pipeline are described below.

Key features

Preprocessing by converting the audio to mel spectrograms

I used librosa with this config:

  • sampling_rate = 16000
  • duration = 2 # sec
  • hop_length = 125 * duration
  • n_mels = 128

Basically, each 128x128 image represents 2 seconds of audio.
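To make the arithmetic behind these settings concrete, here is a minimal sketch. The four settings are taken from the list above; everything else is an assumption for illustration: the helper names `expected_frames` and `audio_to_mel`, the pad-or-crop step, and the log scaling. It is not necessarily what preprocess.py does.

```python
# Settings copied from the list above.
sampling_rate = 16000               # samples per second
duration = 2                        # seconds per clip
hop_length = 125 * duration         # 250 samples between frames
n_mels = 128                        # mel bands (image height)
samples = sampling_rate * duration  # 32000 samples per clip


def expected_frames(n_samples, hop):
    """Frame count produced by librosa's default centered framing."""
    return 1 + n_samples // hop


def audio_to_mel(path):
    """Load one file and return a log-mel spectrogram (hypothetical helper)."""
    import numpy as np
    import librosa  # imported lazily so the arithmetic above runs without it

    y, _ = librosa.load(path, sr=sampling_rate)
    # Assumption: pad or crop to `samples` so every clip has the same width.
    y = librosa.util.fix_length(y, size=samples)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sampling_rate, n_mels=n_mels, hop_length=hop_length
    )
    # Assumption: convert power to decibels, a common choice for CNN input.
    return librosa.power_to_db(mel, ref=np.max)


print(hop_length, samples // hop_length, expected_frames(samples, hop_length))
```

A 2 second clip holds 32000 samples, which is 128 hops of 250 samples, so the image is about 128 columns wide and 128 mel bands tall. With librosa's default centered framing the column count is 129, so an exact width of 128 presumably involves a crop or a different framing that is not stated here.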

Fully convolutional networks for audio classification

We recently won a gold medal in Kaggle's Freesound Audio Tagging 2019 competition, so most of the architectures were borrowed from there.

The final submission was an ensemble of 4 models. Three of them were Classifier_M0, Classifier_M2 and Classifier_M3 from our technical report [1].

Here's what Classifier_M3 looks like:

The fourth model came from the 7th place solution.

Mixup + SpecAugment (SpecMix)

The most important part of this solution is the augmentation method, as the dataset is very small and pretraining is not allowed.

Augmentation options for spectrograms are very limited due to the nature of the data (they are not ordinary images, e.g. rotating a spectrogram makes no sense). In this work I utilized Mixup [2] and SpecAugment [3]. A repo by Eric Bouteillon gives a nice explanation of the method.
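As a rough illustration of the two ingredients, the sketch below implements plain Mixup and a SpecAugment-style band mask with NumPy. It is not the SpecMix code used in this repo. The function names, `alpha=0.4`, the mask widths, and the zero fill value are all assumptions chosen for the example.

```python
import numpy as np


def mixup(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Blend two spectrograms and their one-hot labels with one weight."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam


def mask_bands(spec, max_freq=16, max_time=16, rng=None):
    """Zero one random frequency band and one random time band."""
    if rng is None:
        rng = np.random.default_rng()
    out = spec.copy()
    n_mels, n_frames = out.shape
    f = int(rng.integers(0, max_freq + 1))     # band height, may be 0
    f0 = int(rng.integers(0, n_mels - f + 1))  # first masked mel bin
    out[f0:f0 + f, :] = 0.0
    t = int(rng.integers(0, max_time + 1))     # band width, may be 0
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out


# Toy usage with random "spectrograms" and two made-up classes.
rng = np.random.default_rng(0)
a, b = rng.random((128, 128)), rng.random((128, 128))
ya, yb = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x, y, lam = mixup(a, ya, b, yb, rng=rng)
masked = mask_bands(x, rng=rng)
```

Mixup keeps the label consistent with the input by applying the same weight to both, while the masking step leaves the label untouched. Unlike rotation or flipping, both operations respect the time and frequency axes of the spectrogram.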

Training

Preprocessing

To reproduce the mel spectrogram data, run the following command:

python preprocess.py --train_df_path <path-to>/train_label.csv --train_dir <path-to>/Train --test_dir <path-to>/Public_Test --train_output_path ./data/mels_train.pkl --test_output_path ./data/mels_test.pkl

Training

To reproduce the models, run the following commands:

python train_full.py --train_df_path <path-to>/train_label.csv --test_dir <path-to>/Public_Test/ --model m0 --logdir models_m0 --output_name preds_m0.npy

python train_full.py --train_df_path <path-to>/train_label.csv --test_dir <path-to>/Public_Test/ --model m2 --logdir models_m2 --output_name preds_m2.npy

python train_full.py --train_df_path <path-to>/train_label.csv --test_dir <path-to>/Public_Test/ --model m3 --logdir models_m3 --output_name preds_m3.npy

python train_full.py --train_df_path <path-to>/train_label.csv --test_dir <path-to>/Public_Test/ --model dcase --logdir models_dcase --output_name preds_dcase.npy
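The four commands above each write a predictions file, and the final submission was an ensemble of the four models. How the predictions were combined is not stated here, so the following is only a sketch under two assumptions: each `preds_*.npy` file holds an array of shape (n_clips, n_classes), and the models are combined by an unweighted mean. The helper names are hypothetical.

```python
import numpy as np


def average_predictions(pred_list):
    """Average per-model score arrays of shape (n_clips, n_classes)."""
    stacked = np.stack(pred_list, axis=0)
    return stacked.mean(axis=0)


def load_and_ensemble(paths):
    """Load each .npy file and return averaged scores and class indices."""
    scores = average_predictions([np.load(p) for p in paths])
    return scores, scores.argmax(axis=1)


# Toy check with made-up scores for 2 clips and 3 classes.
toy = [
    np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]),
    np.array([[0.4, 0.5, 0.1], [0.1, 0.2, 0.7]]),
]
toy_scores = average_predictions(toy)
toy_labels = toy_scores.argmax(axis=1)
```

Under those assumptions, the real files would be combined with `load_and_ensemble(["preds_m0.npy", "preds_m2.npy", "preds_m3.npy", "preds_dcase.npy"])`.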
