music_genre_classification_pytorch

Musical Genre Classification

We discuss the application of convolutional neural networks (CNNs) to music genre classification. We focus on the small-dataset case and on how to design a CNN architecture for it. We start with a data augmentation strategy for the music domain, and compare well-known 1D, 2D and Sample CNN architectures trained on raw data and on augmented data. We then propose the best-performing CNN architecture for small-data music genre classification. Next, we compare normalization methods and optimizers, and discuss how to obtain the model that best fits the music genre classification task. Finally, we evaluate its performance on the GTZAN dataset, which is used in many prior works, in order to compare our approach with the state of the art.

Dataset

We use the GTZAN dataset, which is the most widely used dataset for music genre classification. It contains 30-second audio clips in 10 genres: reggae, classical, country, jazz, metal, pop, disco, hiphop, rock and blues. For this homework, we use a subset of GTZAN with only 8 genres. You can download the subset from this link.

Once you have downloaded the dataset, unzip it and move it to your home folder. After you have done this, you should have the following content in the dataset folder.

Data augmentation

Data augmentation is the process of creating new synthetic training samples by applying small perturbations to the original training set. The objective is to make the model invariant to those perturbations and to improve its ability to generalize. For this to work, each perturbation must preserve the label of the original training sample.

  • Add Noise
  • Shift
  • Speed Change
  • Pitch Shift
  • Pitch and Speed
  • Multiply Value
  • Percussive
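Several of the listed augmentations can be written with NumPy alone. The functions below are a minimal sketch, not the repository's audio_augmentation.py: the function names and the default parameters (noise_factor, max_fraction, the gain range) are hypothetical choices. "Pitch and Speed" is implemented here by linear-interpolation resampling, which changes tempo and pitch together. Speed Change, Pitch Shift and Percussive are commonly done with librosa (librosa.effects.time_stretch, librosa.effects.pitch_shift, librosa.effects.percussive) and are left out of this sketch.

```python
import numpy as np


def fix_length(y: np.ndarray, n: int) -> np.ndarray:
    """Crop or zero-pad a 1-D signal to exactly n samples."""
    if len(y) >= n:
        return y[:n]
    return np.pad(y, (0, n - len(y)))


def add_noise(y, noise_factor=0.005, rng=None):
    """Add white Gaussian noise scaled by noise_factor (hypothetical default)."""
    rng = np.random.default_rng() if rng is None else rng
    return y + noise_factor * rng.standard_normal(len(y))


def shift(y, max_fraction=0.2, rng=None):
    """Circularly shift the waveform by up to max_fraction of its length."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(rng.uniform(-max_fraction, max_fraction) * len(y))
    return np.roll(y, n)


def pitch_and_speed(y, factor):
    """Resample by linear interpolation.

    factor > 1 plays the clip faster and higher (the shorter result is
    zero-padded); factor < 1 plays it slower and lower (the result is cropped).
    """
    n_out = int(round(len(y) / factor))
    positions = np.linspace(0, len(y) - 1, n_out)
    resampled = np.interp(positions, np.arange(len(y)), y)
    return fix_length(resampled, len(y))


def multiply_value(y, low=0.5, high=1.5, rng=None):
    """Scale the amplitude by one random gain drawn from [low, high)."""
    rng = np.random.default_rng() if rng is None else rng
    return y * rng.uniform(low, high)


# Usage on a 1-second 440 Hz test tone
rng = np.random.default_rng(0)
sr = 22050
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
augmented = [
    add_noise(y, rng=rng),
    shift(y, rng=rng),
    pitch_and_speed(y, 1.1),
    multiply_value(y, rng=rng),
]
```

Every function returns a signal with the same length as its input, so augmented clips produce features of the same size as the originals. Each function changes only the signal, so the clip keeps its genre label.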

Result

The model with the best validation accuracy is the 4-layer 2D CNN (4L-2D CNN), which reaches 77% validation accuracy with augmented data and 83.39% test accuracy. Feature extraction used a sample rate of 22050 Hz, an FFT size of 1024, a window size of 1024, a hop size of 512, 128 mel bins and a feature length of 1024. Training stopped after 26 epochs according to the early-stopping criterion. Stochastic gradient descent with a learning rate of 0.01, momentum 0.9, weight decay 1e-6 and Nesterov momentum gave the best performance.
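The feature settings above fix how much audio one input covers. The sketch below collects them in a hypothetical HParams class (it is not the repository's hparams.py, whose contents are not shown here) and does the frame arithmetic, assuming a centered STFT as in librosa's default (center=True), which yields 1 + num_samples // hop_size frames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HParams:
    """Hypothetical container for the feature settings reported above."""
    sample_rate: int = 22050
    fft_size: int = 1024
    win_size: int = 1024
    hop_size: int = 512
    num_mels: int = 128
    feature_length: int = 1024


def num_frames(num_samples: int, hop_size: int) -> int:
    """Frame count of a centered STFT (librosa's default center=True)."""
    return 1 + num_samples // hop_size


def frames_to_seconds(frames: int, hop_size: int, sample_rate: int) -> float:
    """Approximate audio duration spanned by a number of STFT frames."""
    return frames * hop_size / sample_rate


hp = HParams()
clip_frames = num_frames(30 * hp.sample_rate, hp.hop_size)  # one full 30 s clip
window_seconds = frames_to_seconds(hp.feature_length, hp.hop_size, hp.sample_rate)
print(clip_frames, round(window_seconds, 2))  # 1292 23.78
```

Under these assumptions, a 30-second clip gives about 1292 frames, so if feature_length counts frames, a 1024-frame input covers roughly 23.8 s of the clip. The README does not say whether the remaining frames are cropped or used as extra segments.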

Model                   Train Acc  Valid Acc  Train Acc (Augmented)  Valid Acc (Augmented)  Test Acc (%)
5L-1D CNN               0.97       0.55       0.99                   0.70                   -
AlexNet                 0.98       0.63       0.99                   0.72                   -
VGG11                   0.99       0.68       0.99                   0.76                   -
VGG13                   0.97       0.68       0.99                   0.74                   -
VGG16                   0.99       0.69       0.99                   0.75                   -
VGG19                   0.98       0.67       0.99                   0.74                   -
GoogLeNet               0.75       0.57       0.99                   0.65                   -
ResNet34                0.99       0.63       0.99                   0.70                   -
ResNet50                0.99       0.61       0.99                   0.69                   -
DenseNet                0.98       0.66       0.99                   0.76                   -
Sample CNN Basic Block  0.13       0.13       0.15                   0.13                   -
4L-2D CNN               0.93       0.62       0.95                   0.77                   83.39
4L-2D CNN + GRU         0.92       0.64       0.99                   0.76                   81.55

Experiments

Requirements

Before you run the baseline code, you will need PyTorch. Please install PyTorch from this link. We use PyTorch 1.0 because it is the first official release.

  • Python 3.7 (recommended)
  • NumPy
  • Librosa
  • PyTorch 1.0

Training code

First, augment the data:

$ python audio_augmentation.py

Second, extract mel-spectrogram features:

$ python feature_extraction.py
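feature_extraction.py presumably relies on librosa, which is listed in the requirements. To show what a mel-spectrogram with the reported settings (sample rate 22050, FFT and window size 1024, hop 512, 128 mels) involves, here is a NumPy-only sketch. It is an assumption-laden illustration rather than the repository's code: it uses the HTK mel formula and unnormalized triangular filters, whereas librosa.feature.melspectrogram defaults to the Slaney scale with area-normalized filters, so the values will not match librosa exactly.

```python
import numpy as np


def hz_to_mel(f):
    """HTK mel scale."""
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    """Inverse of the HTK mel scale."""
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters on the HTK mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mel_spectrogram(y, sr=22050, n_fft=1024, win_size=1024, hop_size=512, n_mels=128):
    """Power mel spectrogram of shape (n_mels, 1 + len(y) // hop_size)."""
    # Center each frame on its sample index, like librosa's center=True default
    y = np.pad(y, n_fft // 2, mode="reflect")
    # Periodic Hann window, zero-padded to n_fft if win_size < n_fft
    window = np.hanning(win_size + 1)[:-1]
    lpad = (n_fft - win_size) // 2
    window = np.pad(window, (lpad, n_fft - win_size - lpad))
    n_frames = 1 + (len(y) - n_fft) // hop_size
    frames = np.stack([y[i * hop_size: i * hop_size + n_fft] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2  # (n_frames, n_fft//2+1)
    return mel_filterbank(sr, n_fft, n_mels) @ power.T
```

CNNs are usually fed log-compressed values, for example np.log(S + 1e-6), rather than raw power.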

Check hparams.py, adjust the parameters as needed, and then run training and testing:

$ python train_test.py
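The Result section reports SGD with learning rate 0.01, momentum 0.9, weight decay 1e-6 and Nesterov momentum, plus early stopping. In PyTorch that optimizer corresponds to torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-6, nesterov=True). The README does not specify the early-stopping rule, so the class below is a hypothetical sketch that stops once validation accuracy has not improved for `patience` consecutive epochs. The patience value and the accuracy sequence in the usage lines are made-up illustrations, not the repository's settings or results.

```python
class EarlyStopping:
    """Stop when validation accuracy has not improved for `patience` epochs.

    Hypothetical helper; the patience and min_delta defaults are assumptions.
    """

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch: int, valid_acc: float) -> bool:
        """Record one epoch's validation accuracy; return True when training should stop."""
        if valid_acc > self.best + self.min_delta:
            self.best = valid_acc
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Usage with a made-up accuracy curve
stopper = EarlyStopping(patience=3)
for epoch, acc in enumerate([0.40, 0.55, 0.62, 0.61, 0.62, 0.60], start=1):
    if stopper.step(epoch, acc):
        print(f"stopped at epoch {epoch}; best {stopper.best:.2f} at epoch {stopper.best_epoch}")
        break
```

In a real loop you would also save the model weights whenever `best` improves, so that the checkpoint from `best_epoch` is the one evaluated on the test set.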
