- Hwang John
- Jeong Seung Woo
- Nam Yoon Sahng
- Choi Ji Seok
- Jang Woo Jong
- Lee Yu Jin
- Jang Ji Chang
- librosa==0.6.2
- numpy==1.14.3
- pyworld==0.2.4
- tensorflow==1.9.0
- python==3.6.5
GOAL: Transfer a vocal's singing style
The overall architecture for singing style transfer is shown below. [1]
If you have already separated the songs, skip this step.
Before training, you need to prepare the datasets in "./data".
We used CCMixter (50 songs; each consists of a mix, an instrumental, and a vocal track).
- Train
$ python CCMixter_process.py
$ python Training.py
- Test ("Singing-Voice-Separation/sample.wav")
$ python Test.py
- Train
Prepare two datasets, located in "./data/train/A" and "./data/train/B" respectively.
$ python train.py
- Test
You can set the direction and test_directory options in config.py.
$ python test.py
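For reference, a hypothetical sketch of the relevant config.py fields (only direction and test_directory are named in this README; the values shown are illustrative assumptions, not the repository's actual defaults):

```python
# Illustrative values only; check config.py in the repository for the real ones.
direction = "AtoB"              # conversion direction: style A -> style B
test_directory = "./data/test"  # directory holding the .wav files to convert
```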
First, we downloaded songs from YouTube using the pytube library. (This might be copyright-infringing.)
For the vocal data, we downloaded Park Hyo Shin's and BolBBalGan Sachungi's songs (about 15 songs each).
Since our main model was originally used for voice conversion, we also tried speech data from "Yu Inna" and "Son Suk Hee".
For the separation of singing voice and accompaniment, we used a deep U-Net model. [2]
Separation requires pre-separated data such as "iKala, MedleyDB, DSD100". We used the CCMixter data for training the U-Net.
Finally, we removed silence to give the model a bigger receptive field on voiced regions.
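The silence-removal step can be sketched with simple energy thresholding in numpy (a minimal sketch under assumed frame length and threshold; the repository may instead use a dB-based splitter such as librosa's):

```python
import numpy as np

def remove_silence(wav, frame_len=400, threshold=1e-3):
    """Drop frames whose mean energy falls below `threshold`.

    A numpy sketch of the silence-removal step; frame_len and
    threshold are illustrative, not the repository's values.
    """
    n = len(wav) // frame_len
    frames = wav[: n * frame_len].reshape(n, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    voiced = frames[energy >= threshold]
    return voiced.reshape(-1)

# Toy signal: one second of silence, then one second of a 440 Hz tone;
# only the tone should survive.
sr = 16000
time = np.arange(sr) / sr
sig = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 440 * time)])
trimmed = remove_silence(sig)
```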
Data were downsampled to 16 kHz. Normalized magnitude spectrograms were used for separation, and 24 Mel-cepstral coefficients (MCEPs) were used for transfer. [2][3]
Our separation model is shown below.
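The way a U-Net separator's output is typically applied can be sketched in numpy (an assumption based on [2], not this repository's exact code: the network predicts a soft mask that multiplies the mix magnitude spectrogram):

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy stand-ins: a mix magnitude spectrogram and a sigmoid mask in (0, 1),
# shaped (freq bins, frames). In the real model the mask is the U-Net output.
mix_mag = np.abs(rng.randn(512, 128))
mask = 1.0 / (1.0 + np.exp(-rng.randn(512, 128)))

vocal_mag = mask * mix_mag          # estimated vocal magnitude
accomp_mag = (1.0 - mask) * mix_mag # the remainder is accompaniment
```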
Since the singers we want to convert do not sing the same songs (unpaired data), we used CycleGAN for transferring the singing style. [1] The main model is from "Cycle Gan Voice Converter". [3]
Due to the differences between voice conversion and singing style transfer, we expanded the input to 512 frames, whereas "Cycle Gan Voice Converter" used 128 frames (about 0.5 s).
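The 512-frame setting amounts to sampling fixed-length training segments from each song's MCEP sequence; a small sketch (the function name and random cropping are illustrative assumptions):

```python
import numpy as np

def sample_segment(mceps, n_frames=512):
    """Randomly crop a fixed-length training segment (features x frames)."""
    start = np.random.randint(0, mceps.shape[1] - n_frames + 1)
    return mceps[:, start:start + n_frames]

song = np.random.randn(24, 2000)  # 24 MCEPs x 2000 frames of toy data
segment = sample_segment(song)    # shape (24, 512)
```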
We also modified the adversarial loss functions and the discriminator, and added hyper-parameters, adapting BEGAN to CycleGAN to stabilize the training process. [1][4][5]
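For reference, the BEGAN balance mechanism [4] that such an adaptation builds on updates a control variable k_t every step; a minimal sketch (the constants gamma and lambda_k are BEGAN's paper defaults, not necessarily this repository's values):

```python
def began_update_k(k, loss_real, loss_fake, gamma=0.5, lambda_k=0.001):
    """One BEGAN balance step: k_{t+1} = k_t + lambda_k * (gamma * L(x) - L(G(z))),
    clipped to [0, 1]. The discriminator loss then uses L(x) - k * L(G(z))."""
    k = k + lambda_k * (gamma * loss_real - loss_fake)
    return min(max(k, 0.0), 1.0)

# When the fake reconstruction loss is below gamma * real loss, k grows,
# pushing the discriminator to focus more on the generated samples.
k = began_update_k(0.0, loss_real=1.0, loss_fake=0.2)
```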
A more powerful model for vocal separation.
Embed more information such as lyrics, and adapt Tacotron. ex) Tacotron GAN "https://github.com/tmulc18/S2SCycleGAN"
[1] Cheng-Wei Wu, et al. Singing Style Transfer Using Cycle-Consistent Boundary Equilibrium Generative Adversarial Networks. 2018
paper: https://arxiv.org/abs/1807.02254
[2] Andreas Jansson, et al. Singing Voice Separation with Deep U-Net Convolutional Networks. 2017.
paper: https://ismir2017.smcnus.org/wp-content/uploads/2017/10/171_Paper.pdf
code: https://github.com/Jeongseungwoo/Singing-Voice-Separation
[3] Takuhiro Kaneko, Hirokazu Kameoka. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks. 2017.
paper: https://arxiv.org/abs/1711.11293
code: https://github.com/Jeongseungwoo/Singing-Style-transfer
[4] David Berthelot, et al. BEGAN: Boundary Equilibrium Generative Adversarial Networks. 2017.
paper: https://arxiv.org/pdf/1703.10717.pdf
code: https://github.com/carpedm20/BEGAN-tensorflow
[5] CycleBE-VocalConverter code: https://github.com/NamSahng/SingingStyleTransfer