Identifying Record Producers from Audio Data

Motivation
Data Understanding
Future Improvements
Built With
Acknowledgements
Contact Me

Motivation

Record producers can strongly influence the sound of an album. A model that identifies a producer from audio alone can be used for two main purposes:

Music Discovery: Services like Pandora and Spotify depend on their ability to find music users will like. Production "sound" is another dimension that users may enjoy exploring when searching for new music.
Music Publishing: Creating and maintaining a database of music production and ownership credits has historically been difficult. When streaming services pay royalties to record labels, creative collaborators often do not get paid properly because of missing documentation. This model is a step toward "fingerprinting" the creators of a song.

Data Understanding

Data Sources

Spotify API - Contains audio files and song metadata.
Wikipedia - Record producer labeling.

Audio Processing

A record producer's signature lies largely in the timbre of a recording. Timbre can be thought of as the "quality" or "identity" of a sound: it is what lets us tell a flute from a trumpet even when they play the same notes. Much of a sound's timbre is carried by its higher-frequency overtones.

For 1,000 songs (10 producers, 100 songs each), 30-second MP3 clips were converted to WAV files and run through a high-pass filter to accentuate the frequencies that carry timbre. The Mel-Frequency Cepstral Coefficients (MFCCs) were then calculated for each clip.
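The README does not say which high-pass filter or cutoff was used. As a minimal sketch, assuming a first-order (single-pole) filter and a hypothetical 2 kHz cutoff, the filtering step might look like the following in plain Python. A real pipeline would more likely rely on a signal-processing library.

```python
import math


def highpass(samples, sample_rate, cutoff_hz):
    """First-order RC high-pass filter (an assumed design, not the author's)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = prev_out = 0.0
    for i, x in enumerate(samples):
        # y[0] = x[0]; afterwards y[i] = alpha * (y[i-1] + x[i] - x[i-1])
        y = x if i == 0 else alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def rms(samples):
    """Root-mean-square level, used here only to compare loudness."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


# Hypothetical test tones: a 100 Hz bass tone and an 8 kHz overtone.
sample_rate = 22050
bass = [math.sin(2 * math.pi * 100 * i / sample_rate) for i in range(sample_rate)]
overtone = [math.sin(2 * math.pi * 8000 * i / sample_rate) for i in range(sample_rate)]

print("bass kept:", rms(highpass(bass, sample_rate, 2000.0)) / rms(bass))
print("overtone kept:", rms(highpass(overtone, sample_rate, 2000.0)) / rms(overtone))
```

The bass tone should come out far quieter than the overtone, which is the effect the filtering step relies on.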

MFCCs, very generally, are a set of values that correspond to the timbre of a sound.

More technically, MFCCs are calculated by first splitting the waveform into short frames and taking the Fast Fourier Transform (FFT) of each frame, which converts the signal from amplitude-time space to frequency-time space. The power spectrum of each frame is then mapped onto the mel scale, which approximates how humans perceive pitch, and log-compressed. The result is treated as a signal in its own right and decomposed further using the Discrete Cosine Transform (DCT). The resulting values are the Mel-Frequency Cepstral Coefficients. The figure below shows an example of the audio processing.
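To make the steps concrete, here is a rough sketch of the MFCC calculation for a single frame, written with only the Python standard library. It is not the code used in this project: the Hann window, the 26 mel filters, the 512-sample frame, and the naive O(N^2) transform (standing in for a real FFT) are all assumptions chosen to keep the example small. Audio libraries such as librosa provide tuned implementations.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power of the discrete Fourier transform for bins 0..N/2."""
    n = len(frame)
    # A Hann window reduces spectral leakage at the frame edges.
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points: each filter spans three consecutive points.
    edges = [mel_to_hz(max_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=20):
    """MFCCs of one frame: power spectrum -> mel filterbank -> log -> DCT-II."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # The small constant avoids log(0) for an empty band.
    log_energy = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                  for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energy))
            for c in range(n_coeffs)]


sample_rate = 22050
frame = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(512)]
coeffs = mfcc_frame(frame, sample_rate)
print(len(coeffs))  # 20 coefficients for this frame
```

Repeating this for every frame of a clip yields the coefficient-by-time matrix that the modeling step consumes. One useful property of this formulation is that making the frame louder shifts only the first coefficient, so the remaining coefficients describe spectral shape rather than volume.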

Modeling

After processing, each song is described by about 24,000 MFCC values (20 coefficients in the frequency dimension by 1,200 frames in the time dimension). Principal Component Analysis (PCA) was used to reduce this to 12 dimensions, the projections onto 12 "sonic eigenvectors".
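The idea behind the PCA step can be sketched on toy data. The example below is an illustration only, not the project's code: it finds principal components by power iteration with deflation using the standard library, and it reduces hypothetical 3-dimensional points to 2 components rather than 24,000 values to 12. In practice a library routine (for example, scikit-learn's PCA) would be the usual choice.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pca(rows, n_components, n_iter=200, seed=0):
    """Return (projected rows, components) via power iteration and deflation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    residual = [row[:] for row in centered]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            # Multiply v by X^T X without building the d x d covariance matrix.
            scores = [dot(row, v) for row in residual]
            w = [sum(s * row[j] for s, row in zip(scores, residual)) for j in range(d)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        components.append(v)
        # Deflate: remove the direction just found so the next one is orthogonal.
        residual = [[x - dot(row, v) * vj for x, vj in zip(row, v)] for row in residual]
    projected = [[dot(row, c) for c in components] for row in centered]
    return projected, components


# Hypothetical toy data: points spread mostly along the (1, 1, 0) direction.
rng = random.Random(1)
rows = []
for _ in range(50):
    t = rng.gauss(0.0, 5.0)
    rows.append([t + rng.gauss(0.0, 0.1), t + rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)])

projected, components = pca(rows, n_components=2)
print(components[0])  # close to +/-(0.707, 0.707, 0) for this data
```

Each song's 12 coordinates in the project correspond to one row of `projected` here.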

A K-Nearest Neighbors (KNN) algorithm was used to identify the most likely producers for any new song. The figure below shows an example of how the KNN algorithm works.
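As a small illustration of the KNN step, the sketch below ranks producers by how many of a new song's nearest neighbors they produced. The value of k, the Euclidean distance metric, the 2-dimensional points, and the producer names are all assumptions made for the example; the README does not state the settings used.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Rank labels by their vote count among the k nearest training points."""
    by_distance = sorted(zip(train_points, train_labels),
                         key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    # most_common() returns (label, votes) pairs, most likely producer first.
    return votes.most_common()


# Hypothetical reduced features for six songs by two producers.
train_points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
train_labels = ["Producer A"] * 3 + ["Producer B"] * 3

print(knn_predict(train_points, train_labels, [0.3, 0.3], k=3))
# [('Producer A', 3)]
```

Returning the full ranking rather than a single label matches the goal of listing the most likely producers for a song.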

Evaluation

The model was evaluated on a 300-song test set. Multiclass accuracy across 10 balanced producer classes was 44%, compared to a chance baseline of 10%.
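For reference, the two figures quoted above can be computed as follows. This is a sketch with made-up labels, not the project's evaluation code; the only numbers taken from the README are the 300 test songs, the 10 classes, and the 44% accuracy, which corresponds to roughly 132 correctly labeled songs.

```python
def accuracy(y_true, y_pred):
    """Fraction of songs whose predicted producer matches the true one."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def chance_baseline(n_classes):
    """Expected accuracy of uniform random guessing over balanced classes."""
    return 1.0 / n_classes


# Hypothetical labels: 300 songs, 10 producers, 132 predictions made correct.
y_true = [i % 10 for i in range(300)]
y_pred = [t if i < 132 else (t + 1) % 10 for i, t in enumerate(y_true)]

print(accuracy(y_true, y_pred))   # 0.44
print(chance_baseline(10))        # 0.1
```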

The songs were then projected into two dimensions with t-SNE and plotted to show how they cluster relative to one another.

An interactive t-SNE plot can be found here.

Future Improvements

Deconvolution of Variables:
- Artist/Album/Instrumentation
- More accurate labeling
Scale:
- More songs/producers
- Parallelize and deploy on AWS/Spark
Feature Engineering:
- More Audio Processing/Reverse Engineering
- Remove music structure by breaking songs into beats
Modeling:
- Neural Networks with TensorFlow/Keras
