
mtg / deepconvsep

465 stars · 34 watchers · 109 forks · 37.15 MB

Deep Convolutional Neural Networks for Musical Source Separation

License: GNU Affero General Public License v3.0

Python 74.81% MATLAB 23.98% Shell 1.21%
signal-processing deep-learning source-separation theano convolutional-neural-networks sample-querying data-augmentation data-generation score-synthesis audio-synthesis

deepconvsep's People

Contributors

gerruz, hmartelb, nkundiushuti


deepconvsep's Issues

a cool idea

idea1

feeding a WaveNet implementation in TensorFlow, simultaneously with the following, to do advanced musical gesture recognition:

  • ElasticFusion dense SLAM
  • audio data + advanced audio gesture recognition / spectral classifiers

plus:

  • doing training in real time

idea2

implementing the following artificial intelligence model in TensorFlow:

  • a deep convolutional recursive swarm of hybrid BDI and artificial neural networks

idea3

implementing the following artificial intelligence model in TensorFlow:

  • a deep convolutional recursive swarm of hybrid BDI and artificial neural networks, fed with the following to do advanced musical gesture recognition:
    • ElasticFusion dense SLAM
    • audio data + advanced audio gesture recognition / spectral classifiers

plus:

  • doing training in real time

idea 4

creating a C++ framework for live electronics and algorithmic composition using some of these:

- use next-generation, state-of-the-art machine learning algorithms implemented in TensorFlow, such as deep convolutional recursive swarms of hybrid BDI and artificial neural networks;
- using ElasticFusion / ORB-SLAM2 as an input for gesture recognition through computer vision;
- using GPGPU-driven FFT for audio digital signal processing, gesture recognition, and audio feature extraction;
- computing audio in non-real time using complex GPGPU transformations;
- using Raya as a GPGPU sound spatialization engine

Using examples/dsd100/separate_dsd.py

Hello,

I am getting an error when running the above script. I used the fft_1024.pkl model file and a mixture.wav file from the DSD100 dataset.

The error is:

File "separate_dsd.py", line 336, in <module>
    main(sys.argv[1:])
File "separate_dsd.py", line 333, in main
    train_auto(inputfile,outdir,model,0.3,30,25,32,513)
File "separate_dsd.py", line 251, in train_auto
    lasagne.layers.set_all_param_values(network2,params)
File "C:\Users\path\lasagne\layers\helper.py", line 516, in set_all_param_values
    (len(values), len(params)))
ValueError: mismatch: got 13 values to set 15 parameters

another cool idea

implementing the following artificial intelligence model in TensorFlow:

  • a deep convolutional recursive swarm of hybrid BDI and artificial neural networks

iKala is amplifying my results to the point of distortion

Hello. I'm running the iKala script on recordings and I like the results. However, it amplifies the output considerably; the processed files clip badly. I've tried normalizing to -4 dB in Audacity prior to processing, and that doesn't fix it. It also complains about Theano but seems to work regardless.
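One workaround on the user side is to peak-normalize the separated stems before writing them to disk, so the amplified output cannot clip. A minimal sketch (not part of DeepConvSep; the function name and the -1 dBFS default are illustrative assumptions):

```python
import numpy as np

def peak_normalize(audio, peak_db=-1.0):
    """Scale an audio array so its absolute peak sits at peak_db dBFS,
    preventing clipping when the separated stem is written to disk."""
    peak = np.max(np.abs(audio))
    if peak == 0:
        return audio  # silence: nothing to scale
    target = 10.0 ** (peak_db / 20.0)  # dBFS -> linear amplitude
    return audio * (target / peak)
```

Applied to each stem array right before the wav write, this guarantees the peak never exceeds the target regardless of how much the network amplified the signal.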

parameters in a file?

Maybe I am missing something here, but if I want to use, say, a different FFT size or hop size, I have to update these values throughout the framework.

I was thinking of a separate script or file that could update the other scripts that use these parameters; it would only contain a couple of main parameters:

FFT = x
hopsize = y
scale factor = z
etc.

Thanks!
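A sketch of what such a central parameter file could look like (a hypothetical module, not part of the framework; the values shown are illustrative, and the example scripts would import it instead of hard-coding numbers):

```python
# settings.py -- hypothetical central parameter module for the framework.
# Scripts would do `from settings import FFT_SIZE, HOP_SIZE, SCALE_FACTOR`
# instead of repeating literals.
FFT_SIZE = 1024      # FFT window length in samples (illustrative)
HOP_SIZE = 512       # hop between consecutive analysis frames (illustrative)
SCALE_FACTOR = 0.3   # scaling applied to the spectrograms (illustrative)

def as_dict():
    """Expose all parameters as a dict, e.g. for logging a run's config."""
    return {"fft_size": FFT_SIZE, "hop_size": HOP_SIZE,
            "scale_factor": SCALE_FACTOR}
```

Changing an analysis parameter then means editing one file rather than every script that builds a transform.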

lasagne.layers.Conv2DLayer adds a bias by default

So there is no need to use lasagne.layers.BiasLayer.

I found this when I printed the shapes of the arrays in the file fft_1024.pkl (iKala dataset).

These are the shapes; a bias array of shape (30,) occurs twice for each convolutional layer.
(30, 1, 1, 30)
(30,)
(30,)
(30, 30, 10, 20)
(30,)
(30,)
(13230, 256)
(256,)
(256, 13230)
(13230,)
(256, 13230)
(13230,)
(2,)

I also checked the code of lasagne.layers.Conv2DLayer; it confirms my assumption.
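A quick way to reproduce this inspection is a small helper that returns the shape of every array in the pkl (hypothetical helper, not in the repository; it assumes the pkl holds a flat list of numpy arrays, as the shapes above suggest):

```python
import pickle

def param_shapes(pkl_path):
    """Return the shape of every parameter array stored in a model pkl.

    Printing these for fft_1024.pkl is how the duplicated (30,) bias
    vectors per convolutional layer show up: one from Conv2DLayer's
    built-in bias and one from the extra BiasLayer.
    """
    with open(pkl_path, 'rb') as f:
        params = pickle.load(f)
    # getattr guards against non-array entries in the list
    return [getattr(p, 'shape', None) for p in params]
```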

how to improve separation between sources?

No matter what mixture I try to separate, even when the separation gives me nice results, I wonder what the main parameters and tweaks are that can improve the separation, so that the vocals track really contains only the vocals, and the bass, drums, and other tracks have fewer vocal artifacts.

Is this just a matter of how much training material is used? And/or is there a way to tweak the framework further to improve the results?

Thanks!

Theano 0.9.0 API change

Traceback (most recent call last):
File "/homedtic/rgong/DeepConvSep/trainCNN.py", line 42, in <module>
    import lasagne
File "/homedtic/rgong/keras_env/lib/python2.7/site-packages/lasagne/__init__.py", line 24, in <module>
    from . import layers
File "/homedtic/rgong/keras_env/lib/python2.7/site-packages/lasagne/layers/__init__.py", line 7, in <module>
    from .pool import *
File "/homedtic/rgong/keras_env/lib/python2.7/site-packages/lasagne/layers/pool.py", line 6, in <module>
    from theano.tensor.signal import downsample
ImportError: cannot import name downsample

Theano changed its API in version 0.9.0, and Lasagne hasn't been updated yet: the max_pool_2d function no longer exists in ``downsample".

I found the solution for this issue in Theano/Theano#4337

another way of generating the pkl models?

Since I am still having trouble with my issue here:
#1

I was wondering whether, instead, there is another way to generate the required pkl model after a successful compute_features run, for DSD100 separation.

I am just trying to avoid the trainCNN problem, since no one has a fix for it yet.
If not, I will just wait until someone figures out my initial problem.

thanks!

other separation tasks with this framework? force stereo?

Now that I have the training and separation finally working, I was wondering about the limits of this framework. For example, can it be modified somehow to separate speech (dialogue) from background music, or is it only built for singing voice?

Also, the training material is stereo, and the input can be stereo or mono, so why is the output mono when the input was stereo? Is there a way to force stereo output with this framework, or is that a project for the future?

Thanks!

TypeError when running trainCNN, please help

OK, I have Windows 7 Ultimate 64-bit with Service Pack 1 installed.
I have Visual Studio 2013 Community with Update 5 installed.
I have every requirement you list in your readme (even though you didn't specify the exact version of each requirement, I assumed at least Theano 0.8.2 and Lasagne 0.2dev1); numpy, scipy, climate, etc. are all standard installs.

My Theano installation works, I tried it by itself, so the problem is not there; the nvcc compiler also works, so everything is linked and working.

In terms of your framework, I can manage to separate a mixture using the pre-trained pkl you provided, without any errors. I can also run the compute_features option for dsd100 (I'm using the 120 MB dsd100subset package, not the full DSD100).

compute_features generates a warning about non-data chunks in the wav files (so I'm not sure how the wav files were generated; I'm just mentioning this in case it turns out to be a problem), but it works: I get .data and .shape files in the transform folder.

The only thing I can't get to work is the dsd100 trainCNN. I get the following error:
Using gpu device 0: GeForce GTX 770 (CNMeM is enabled with initial size: 70.0% of memory, cuDNN 5005)
I 2017-02-21 21:49:32 trainer:433 Maximum: 0.634328
I 2017-02-21 21:49:32 trainer:434 Mean: 0.003356
I 2017-02-21 21:49:32 trainer:435 Standard dev: 0.013143
I 2017-02-21 21:49:32 trainer:163 Building Autoencoder
Traceback (most recent call last):
File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 444, in <module>
    train_errs=train_auto(train=ld1,fun=build_ca,transform=tt,outdir=db+'output/'+model+"/",testdir=db+'Mixtures/',model=db+"models/"+"model_"+model+".pkl",num_epochs=nepochs,scale_factor=scale_factor)
File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 173, in train_auto
    network2 = fun(input_var=input_var2,batch_size=train.batch_size,time_context=train.time_context,feat_size=train.input_size)
File "C:\DeepConvSep\examples\dsd100\trainCNN.py", line 93, in build_ca
    l_conv1 = lasagne.layers.Conv2DLayer(l_in_1, num_filters=50, filter_size=(1,feat_size),stride=(1,1), pad='valid', nonlinearity=None)
File "C:\Python27\lib\site-packages\lasagne\layers\conv.py", line 599, in __init__
    **kwargs)
File "C:\Python27\lib\site-packages\lasagne\layers\conv.py", line 282, in __init__
    self.filter_size = as_tuple(filter_size, n, int)
File "C:\Python27\lib\site-packages\lasagne\utils.py", line 196, in as_tuple
    "of {0}, got {1} instead".format(t.name, x))
TypeError: expected a single value or an iterable of int, got (1, 513L) instead

I am really not sure what that means; it seems to be either a problem in your code or something else on my end, but what could it be? Thanks a lot for the amazing source code, guys, I hope you can help me with my problem :)
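The `513L` in the message suggests the feature size reaches Lasagne as a Python 2 `long` rather than a plain `int` (a 64-bit Windows quirk). A common workaround, offered here as an assumption rather than a confirmed maintainer fix, is to coerce the tuple elements before they reach `Conv2DLayer`:

```python
def as_int_tuple(filter_size):
    """Coerce every element of a filter_size spec to a plain int.

    On 64-bit Windows under Python 2, array shapes can come back as
    `long` (e.g. 513L), which Lasagne's as_tuple(filter_size, n, int)
    check rejects with a TypeError.
    """
    return tuple(int(v) for v in filter_size)

# Hypothetical use inside build_ca (trainCNN.py line 93):
#   l_conv1 = lasagne.layers.Conv2DLayer(
#       l_in_1, num_filters=50,
#       filter_size=as_int_tuple((1, feat_size)),
#       stride=(1, 1), pad='valid', nonlinearity=None)
```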

dear MTG

I have a problem when running your code. My command line is: "python separate_dsd.py -i /home/hjz/test/1.wav -o /home/hjz/test/ -m /home/hjz/test/model_dsd_fft_1024.pkl". The 1.wav is a music file converted from an .mp3 file. Screenshot: 2018-03-23 21-38-41.
The version of Theano installed on my computer is 0.8.2 and Lasagne is 0.1. I run this code on Linux 17.04.
Could you tell me why?

DSD Compute Features - ImportError transform

Which pip install do I need to overcome this error:

from transform import transformFFT

gives the error:

python -m compute_features --db 'D:\DSD100\DSD100'
Traceback (most recent call last):
File "C:\Program Files\Python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
File "C:\Program Files\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
File "D:\DeepConvSep-master\DeepConvSep-master\examples\dsd100\compute_features.py", line 21, in <module>
    import transform
ModuleNotFoundError: No module named 'transform'
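`transform.py` is not a pip package; it lives in the DeepConvSep repository root, two directory levels above `examples/dsd100/`, so the import only resolves when that root is on the module search path. One sketch of a workaround (hypothetical helper; running the script from the repository root, or adding the root to PYTHONPATH, works just as well):

```python
import os
import sys

def add_repo_root(script_path):
    """Insert the repository root (two levels above an examples/dsd100
    script) at the front of sys.path so `import transform` resolves."""
    root = os.path.abspath(os.path.join(os.path.dirname(script_path), "..", ".."))
    if root not in sys.path:
        sys.path.insert(0, root)
    return root

# Hypothetical use at the top of compute_features.py:
#   add_repo_root(__file__)
#   import transform
```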

The DSD model, where is it from?

hello, and thanks for posting this project, fascinating!

Can you share where the DSD100 model is coming from? Who (or what team) was the author of that model? In this project, it is simply a link to a Google Drive download.

Thanks!

Trying to get a version of DeepConvSep in Python3

Trying to make DeepConvSep work with Python 3 on my Mac seemed like a pretty simple task at the beginning. However, I reached an impasse, described below:

When I run the program with this command...

python3 separate_dsd.py -i ./../../Ricotti\ \&\ Alburquerque\ -\ Dont\ You\ Believe\ Me.mp3 -o ./ -m ./../../model1.pkl

...I get the error NameError: name 'file' is not defined. file has been replaced with open in Python 3.

Then I changed my code to:

def load_model(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)

However, I got the error:

Traceback (most recent call last):
  File "separate_dsd.py", line 336, in <module>
    main(sys.argv[1:])
  File "separate_dsd.py", line 333, in main
    train_auto(inputfile,outdir,model,0.3,30,25,32,513)
  File "separate_dsd.py", line 250, in train_auto
    params=load_model(model)
  File "separate_dsd.py", line 19, in load_model
    params=pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbb in position 2: ordinal not in range(128)

Is there something up with your pickler? Can you please take a look at this? I want to help make a Python 3 version of this code and would be glad to help with this task.
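Pickles written under Python 2 contain raw byte strings, which is what trips the ASCII decode under Python 3. A commonly used workaround, assuming the file holds numpy arrays pickled by Python 2, is to pass `encoding='latin1'` so each byte maps straight to a code point:

```python
import pickle

def load_model(filename):
    """Load a model pickled under Python 2 from Python 3.

    encoding='latin1' decodes the raw Python 2 byte strings one byte per
    code point, the usual choice for unpickling numpy arrays written by
    Python 2.
    """
    with open(filename, 'rb') as f:
        return pickle.load(f, encoding='latin1')
```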

Cannot convert model_dsd_fft_1024.pkl Model from PKL to JSON

I'm trying to convert the above pkl file to JSON, so as to import the model into MATLAB as a Keras model, but it gives an error before converting.

I am using this code in Python:

'''
Convert a pkl file into a json file
'''
import sys
import os
import pickle as pkl
import json

def convert_dict_to_json(file_path):
    with open(file_path, 'rb') as fpkl, open('%s.json' % file_path, 'w') as fjson:
        data = pkl.load(fpkl)
        json.dump(data, fjson, ensure_ascii=False, sort_keys=True, indent=4)
And the error is:

File "C:\Python27\lib\json\encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: array([[[[ 1.4630563e+00, 1.2371855e+00, 9.0325326e-01, ...,
    2.0356092e-03, -9.0812740e-04, -8.2676094e-03]]],

Can you provide a converted .json file?
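The error occurs because numpy arrays are not JSON serializable. One sketch of a recursive converter (a hypothetical helper, not a confirmed project utility) that turns the arrays into plain lists before dumping:

```python
import json
import numpy as np

def to_jsonable(obj):
    """Recursively convert numpy arrays and scalars into plain Python
    lists and numbers so that json.dump can serialize them."""
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, (np.integer, np.floating)):
        return obj.item()
    if isinstance(obj, dict):
        return {k: to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    return obj
```

Calling `json.dump(to_jsonable(data), fjson, ...)` in place of the raw `data` should avoid the TypeError; note the resulting JSON loses dtype information, which may matter when rebuilding the model in MATLAB.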

OSError: [Errno 22] using examples/dsd100/trainCNN.py (Win 10 64bit)

When I run trainCNN.py, with the following "db" path:
db = "D:\\DSD100\\"
I get this error:

Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\gabri\Anaconda3\envs\CNN3.6Tesi\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
File "C:\Users\gabri\Anaconda3\envs\CNN3.6Tesi\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
MemoryError
Traceback (most recent call last):
File "C:/Users/gabri/Desktop/Tesi Segnali Audio/PyCharmTestingArea/CNNAdapted/examples/dsd100/trainCNN.py", line 332, in <module>
    mult_factor_out=scale_factor)
File "C:\Users\gabri\Desktop\Tesi Segnali Audio\PyCharmTestingArea\CNNAdapted\dataset.py", line 171, in __init__
    self.updatePath(self.path_transform_in,self.path_transform_out)
File "C:\Users\gabri\Desktop\Tesi Segnali Audio\PyCharmTestingArea\CNNAdapted\dataset.py", line 639, in updatePath
    self.initBatches()
File "C:\Users\gabri\Desktop\Tesi Segnali Audio\PyCharmTestingArea\CNNAdapted\dataset.py", line 676, in initBatches
    self.loadBatches()
File "C:\Users\gabri\Desktop\Tesi Segnali Audio\PyCharmTestingArea\CNNAdapted\dataset.py", line 255, in loadBatches
    self.genBatches()
File "C:\Users\gabri\Desktop\Tesi Segnali Audio\PyCharmTestingArea\CNNAdapted\dataset.py", line 294, in genBatches
    xall = parmap(self.loadFile, list(range(self.findex+1,self.nindex)),nprocs=self.nprocs)
File "C:\Users\gabri\Desktop\Tesi Segnali Audio\PyCharmTestingArea\CNNAdapted\dataset.py", line 58, in parmap
    p.start()
File "C:\Users\gabri\Anaconda3\envs\CNN3.6Tesi\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
File "C:\Users\gabri\Anaconda3\envs\CNN3.6Tesi\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\gabri\Anaconda3\envs\CNN3.6Tesi\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
File "C:\Users\gabri\Anaconda3\envs\CNN3.6Tesi\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
File "C:\Users\gabri\Anaconda3\envs\CNN3.6Tesi\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
OSError: [Errno 22] Invalid argument

I'm using python 3.6

Versions of relevant libraries:
numpy (1.16.4)
theano (1.0.4+unknown)
lasagne 0.2.dev1
tqdm (4.32.1)
scipy (1.2.1)
m2w64-toolchain (5.3.0)
mkl (2019.4)
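On Windows, `multiprocessing` uses the `spawn` start method: every worker re-imports the launching module and receives its arguments via pickle, so large objects pickled to children can trigger exactly this kind of MemoryError/OSError, and any top-level work must sit behind an `if __name__ == '__main__':` guard. A minimal sketch of the pattern (with `main()` as a stand-in for the real training setup; reducing `nprocs` in the dataset loader is another thing worth trying):

```python
import multiprocessing

def main():
    # All dataset construction and the train_auto() call would go here
    # (stand-in body; the real setup lives in trainCNN.py).
    pass

if __name__ == '__main__':
    # Required on Windows: 'spawn' re-imports this module in every worker
    # process, so nothing heavyweight may run at import time.
    multiprocessing.freeze_support()
    main()
```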
