Git Product home page Git Product logo

vocal-style-transfer's Introduction

vocal-style-transfer

Members

Requirement

  • librosa==0.6.2
  • numpy==1.14.3
  • pyworld==0.2.4
  • tensorflow==1.9.0
  • python==3.6.5

Abstract

GOAL: Transfer vocal's singing style

Whole architecture for changing singing style transfer is shown below [1]

Usage

Training Separation Model

If you have already seperate Song, then skip these step.
Before training, you need to prepare datasets in "./data".
we used CCMixter(50 songs : each consists mix, inst, vocal)

  • Train
$ python CCMixter_process.py 
$ python Training.py 
  • Test ("Singing-Voice-Separation/sample.wav")
$ python Test.py

Training Style transfer Model

  • Train

Preparing two Datasets and each must be located in "./data/train/A", "./data/train/B"

$ python train.py
  • Test

You can set direction and test_directory in config.py

$ python test.py

Data

First We downloaded songs from "Youtube" by using pytube library.(This might be illegal)

  • For the vocal data we downloaded Park Hyo Shin and BolBBalGan Sachungi's songs. (about 15 songs each)

  • Since our main model was used to convert voices, we tried "Yu Inna" and "Son Suk Hee"'s voice data.

For the separation of Singing Voice & Accompaniment we used deep U-net model.[2]

As data for seperation you need separated data like "iKala, MedleyDB, DSD100". We used ccmixter data for training U-Net.

Filnally we removed silence for the bigger receptive field on voices.

Data were downsampled to 16 kHz. For the separation normalized magnitude spectrogram were used and for the transfer 24 Mel-cepstral coefficients (MCEPs) were used.[2][3]

Models

1. Deep U-Net for vocal separation

Our separation model is shown below.


2. Cycle Consistency - Boundary Equilibrium GAN

Since the singers we want to change don't sing same songs(Unpaired Data) we used Cycle-Gan for the transferring singing style.[1] Main model of Cycle-Gan is from "Cycle Gan Voice Converter".[3]

Due to the differeces between voice converting and transferring singing style we expanded frames to 512. Which frames were 128 (about 0.5sec) from "Cycle Gan Voice Converter".

Also we modified adversarial Loss functions, Discriminator and added hyper-parameters to adjust BEGAN to cycle-gan for the stablizing training process. [1][4][5]

2-1. Generator & Discriminator Architectures

2-2. Loss Function

where







Future Work

More powerful separation for vocal separation.

Embed more information such as lyrics and adjust Tacotron. ex) Tacotron GAN "https://github.com/tmulc18/S2SCycleGAN"

References

[1] Cheng-Wei Wu, et al. Singing Style Transfer Using Cycle-Consistent Boundary Equilibrium Generative Adversarial Networks. 2018
paper: https://arxiv.org/abs/1807.02254

[2] Andreas Jansson, et al. SINGING VOICE SEPARATION WITH DEEP U-NET CONVOLUTIONAL NETWORKS. 2017.
paper: https://ismir2017.smcnus.org/wp-content/uploads/2017/10/171_Paper.pdf
code: https://github.com/Jeongseungwoo/Singing-Voice-Separation

[3] Takuhiro Kaneko, Hirokazu Kameoka. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks. 2017.
paper:https://arxiv.org/abs/1711.11293
code: https://github.com/Jeongseungwoo/Singing-Style-transfer

[4] David Berthelot, et al. BEGAN: Boundary Equilibrium Generative Adversarial Networks. 2017.
paper:https://arxiv.org/pdf/1703.10717.pdf
code: https://github.com/carpedm20/BEGAN-tensorflow

[5] CycleBE-VocalConverter code: https://github.com/NamSahng/SingingStyleTransfer

vocal-style-transfer's People

Contributors

hwangjohn avatar jeongseungwoo avatar namsahng avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.