Git Product home page Git Product logo

record-producers's Introduction

Identifying Record Producers from Audio Data

Table of Contents

Motivation

Music producers can have a big influence over the sound of an album. This model can be used for two main purposes:

  • Music Discovery: Services like Pandora and Spotify leverage their ability to find music users will like. Production "sound" is another dimension that users may enjoy exploring when searching for new music.
  • Music Publishing: The creation and maintenance of a database of music production and ownership credits has been an historically difficult task. When streaming services pay royalties to record labels often creative collaborators do not get paid properly because of missing documentation. This is a step toward "fingerprinting" the creators of a song.

Data Understanding

Data Sources

Audio Processing

Identifying a record producer lies in the timbre of a sound. Timbre can be thought of as the "quality" or "identity" of a sound. It's what allows us to tell a flute from a trumpet even if they are playing the same notes. Timbre can be found in the higher-frequency overtones of a sound.

Audio mp3 clips 30-seconds long from 1000 songs (10 producers, 100 songs each) were converted to .WAV files and run through a highpass filter to accentuate the timbre frequencies. For each clip, the Mel-Frequency Cepstral Coefficients (MFCCs) were calculated.

MFCCs, very generally, are a set of values that correspond to the timbre of a sound.

More technically, MFCCs are calculated by first taking the Fast Fourier Transform (FFT) of a waveform to convert from amplitude-time space to frequency-time space. Then, each frequency power spectrum of the FFT is treated as its own wavelet and is decomposed further using the Discrete Cosine Transform (DCT). The resulting values are the Mel-Frequency Cepstral Coefficients. The figure below shows an example of the audio processing.

Modeling

After processing, each song has about 24,000 MFCCs (20 in the frequency dimension, 1200 in the time dimension). Principal Component Analysis (PCA) was used to reduce the dimensionality to 12 sonic eigenvectors.

A K-Nearest Neighbors (KNN) algorithm was used to identify the most likely producers for any new song. The figure below shows how an example of how the KNN algorithm works.

Evaluation

The model was tested on a 300-song testing set. The multiclass accuracy for 10 balanced classes of producers was 44% compared to a baseline of 10%.

The data were then plotted on a 2D t-SNE plot to show the relative clustering of songs.

An interactive t-SNE plot can be found here.

Future Improvements

  • Deconvolution of Variables:
    • Artist/Album/Instrumentation
    • More accurate labeling
  • Scale:
    • More songs/producers
    • Parallelize and deploy on AWS/Spark
  • Feature Engineering:
    • More Audio Processing/Reverse Engineering
    • Remove music structure by breaking songs into beats
  • Modeling:
    • Neural Networks with Tensorflow/Keras

Built With

record-producers's People

Contributors

veevargas avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.