
vocal-music-separation's Introduction

Vocal and music separation using a CNN

Description

This CNN attempts to separate the vocals from the music. It does so by training on the amplitude data of the audio file and trying to estimate where the voiced parts are. Vocal separation is done by generating a binary mask over the time-frequency bins that the network thinks contain vocals and applying it to the original file.
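
As a rough, hypothetical illustration of the masking step described above (not the project's actual code), a binary time-frequency mask can be applied to the mixture's STFT like this, assuming librosa and soundfile are available:

    import librosa
    import numpy as np
    import soundfile as sf

    # Load the mixture and compute its STFT.
    audio, sr = librosa.load("mixture.wav", sr=22050, mono=True)
    stft = librosa.stft(audio)

    # Suppose `mask` is the network's binary prediction over time-frequency bins
    # (1 = vocal, 0 = accompaniment); the all-ones array is only a placeholder.
    mask = np.ones(stft.shape, dtype=np.float32)

    # Keep only the bins flagged as vocal, invert the STFT and write the result.
    vocals = librosa.istft(stft * mask)
    sf.write("vocals.wav", vocals, sr)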

Requirements

  • Python 3
  • Tensorflow (Tested with tensorflow-gpu), Keras
  • And a few other Python libraries that you can install by running pip install -r requirements.txt in the root directory

Dataset

  • The script was only tested with .wav files (16-bit and 24-bit signed WAVs should work, 32-bit float doesn't). Other formats might work if your version of librosa is capable of opening them.
  • The training data folder should contain an individual folder for each song. Each song folder should have two files - mixture.wav (the full song) and vocals.wav (the original vocals). See below for a list of data sets that you could potentially use to train this network.
  • To see an example of how the directory structure should look, refer to structure.md.
  • To make things faster, all songs should have the same sampling rate as configured (I only tested 22050 Hz, but other sample rates should work) and should be in mono (if they aren't, the script will convert them, but the result isn't saved anywhere and the conversion takes a while). See the loading sketch below.
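
A minimal sketch of loading one song the way the dataset description above implies (the data/songname1 path is illustrative; see structure.md for the real layout):

    import librosa

    # Load both files at the configured sample rate, downmixed to mono.
    # 22050 Hz is the only rate the README reports testing; adjust to your config.
    mixture, sr = librosa.load("data/songname1/mixture.wav", sr=22050, mono=True)
    vocals, _ = librosa.load("data/songname1/vocals.wav", sr=22050, mono=True)
    assert len(mixture) == len(vocals), "mixture and vocals should be the same length"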

Example data sets

Setting up

  1. pip install -r requirements.txt
  2. py main.py

Running

  1. python main.py -h to see all arguments
  2. python main.py will train the network with the default options
  3. python main.py --mode=separate --file=audio.wav will attempt source separation on audio.wav and will output vocals.wav
  4. python main.py --mode=evaluate will evaluate the effectiveness of audio source separation. More information below.

Configuring

All relevant settings are located in the config.ini file. The file doesn't exist in the repository and will be automatically created and prepopulated with the default values on first run. For information on what each option does, see config.py.
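
As a hypothetical sketch of that first-run behaviour, a default config.ini can be created with Python's configparser; the option names below are placeholders, the real ones are defined in config.py:

    import configparser
    import os

    config = configparser.ConfigParser()
    if os.path.exists("config.ini"):
        config.read("config.ini")
    else:
        # Prepopulate with defaults on first run (placeholder option names).
        config["DEFAULT"] = {"sample_rate": "22050", "data_dir": "data"}
        with open("config.ini", "w") as configfile:
            config.write(configfile)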

Evaluating

This program also includes a simple wrapper around BSS-Eval which can be used to determine how effective the audio source separation is. To use it you need the original vocals (vocals.wav), the original accompaniment (accompaniment.wav), the estimated vocals (estimated-vocals.wav) and the estimated accompaniment (estimated-accompaniment.wav). If you don't have the accompaniment but have a mixture and vocals, you can use the apply_vocal_mask.py script in the misc folder. To get the estimated accompaniment, perform separation with the --save_accompaniment flag set to true. Once you have all the files, create a data directory that contains a directory with the name of the song and copy all 4 files into it.
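
For illustration, a BSS-Eval run over those four files could look like the sketch below, which uses mir_eval's bss_eval_sources; the project's own wrapper may differ:

    import librosa
    import numpy as np
    import mir_eval

    def load(path, sr=22050):
        audio, _ = librosa.load(path, sr=sr, mono=True)
        return audio

    reference = np.vstack([load("vocals.wav"), load("accompaniment.wav")])
    estimate = np.vstack([load("estimated-vocals.wav"), load("estimated-accompaniment.wav")])

    # Truncate both to the shorter length (see the Bug section below).
    n = min(reference.shape[1], estimate.shape[1])
    sdr, sir, sar, _ = mir_eval.separation.bss_eval_sources(reference[:, :n], estimate[:, :n])
    print("SDR:", sdr, "SIR:", sir, "SAR:", sar)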

Note that librosa by default outputs a 32-bit float wav file which it can't load back without ffmpeg, so you either need to add an extra conversion step between separation and evaluation or install ffmpeg and add it to your PATH. Both files also need to have the same format and bit depth for evaluation to succeed.
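
A minimal sketch of that extra conversion step, rewriting the 32-bit float output as 16-bit PCM with soundfile (file names are illustrative):

    import soundfile as sf

    # Read librosa's 32-bit float output and write it back as 16-bit PCM
    # so the evaluation step can load it without ffmpeg.
    audio, sr = sf.read("estimated-vocals.wav")
    sf.write("estimated-vocals-16bit.wav", audio, sr, subtype="PCM_16")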

Bug

For a reason I haven't had the time to determine yet, the neural network's output sometimes has slightly fewer samples than the original (~76 samples to be exact, which is around 0.004 s worth of audio). The evaluation script will account for this, but be advised that some samples are lost during evaluation.

Weights files when training

While training, the network will save its weights every 5 epochs to avoid data loss should you have a power failure or a similar issue. These files may be deleted after training.
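
For illustration, this kind of periodic checkpointing can be done in Keras with a callback like the one below; the project's own saving logic may differ, and the file name pattern is a placeholder:

    from keras.callbacks import ModelCheckpoint

    # Save weights every 5 epochs. `period` is the older Keras argument;
    # newer versions use save_freq instead.
    checkpoint = ModelCheckpoint("weights-{epoch:02d}.h5",
                                 save_weights_only=True, period=5)
    # model.fit(features, labels, epochs=50, callbacks=[checkpoint])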

Misc

The misc directory contains a few scripts that might be useful but aren't required to run the neural net.

vocal-music-separation's People

Contributors

zingmars


vocal-music-separation's Issues

AttributeError: 'Dataset' object has no attribute 'mixture'

Using TensorFlow backend.
WARNING: Logging before flag parsing goes to stderr.
E0820 21:22:32.377477 16124 main.py:37] CRITICAL - Script started.
Traceback (most recent call last):
  File "main.py", line 41, in <module>
    dataset.load(args.datadir)
  File "C:\Users\gersi\Desktop\vocal-music-separation-master\dataset.py", line 40, in load
    if(len(self.mixture) != len(self.vocals)):
AttributeError: 'Dataset' object has no attribute 'mixture'

I followed the structure example and also renamed the song folder to songname1 (just to be sure).
https://github.com/zingmars/vocal-music-separation/blob/master/structure.md
My folder: https://i.imgur.com/CLlt2T2.png

I'm using the MedleyDB_Sample.tar.gz (I have the 43 GB one but can't decompress it as I don't have any space left, for now).

Command line used: py main.py --mode=train

I'm kinda lost right now and I'm not quite sure what I should do.

I'm dyslexic so I might have typed something wrong.

My accuracy is low. Which dataset did you use?

I am using DSD100, but it seems I cannot get the accuracy higher than 0.01. Did you use a bigger dataset?

The outcome is that the vocals.wav file is completely empty. (At some point, when I messed with some parameters, I had even lower accuracy and the vocals.wav contained recognizable song parts.)

Also, newer versions of librosa don't support librosa.output.write_wav anymore, it seems. I used soundfile as a replacement.
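
For illustration, the removed librosa.output.write_wav call can be replaced with soundfile like this (the librosa.output module was dropped in librosa 0.8.0; the variables below are placeholders for the network's output):

    import numpy as np
    import soundfile as sf

    sr = 22050
    audio = np.zeros(sr, dtype=np.float32)  # placeholder for the separated signal

    # Old: librosa.output.write_wav("vocals.wav", audio, sr)
    sf.write("vocals.wav", audio, sr)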
