
srd-vc

Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion

1. Dependencies

Install required python packages:

pip install -r requirements.txt

2. Quick Start

Download the pre-trained model from TsinghuaCloud or BaiduNetDisk and put it into My_model/my_demo.

Download the SpeechSplit pre-trained models (pitch decoder 640000-P.ckpt and vocoder checkpoint_step001000000_ema.pth) from here.

Then cd into My_model and edit the paths in demo.py to match your own setup.

Run python demo.py; the converted audio (.wav) will be written to my_demo, similar to the examples in test_result.

You can also choose the conversion conditions in demo.py.

3. Preparation, Training and Inference

Download the VCTK dataset.

Extract spectrograms and f0: make_spect_f0.py.
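The repository's make_spect_f0.py is not reproduced here. As a rough illustration of the f0 part of this step, the sketch below estimates per-frame f0 with a plain autocorrelation method and then normalizes log-f0 using per-speaker statistics, similar in spirit to the speaker-level normalization in SpeechSplit's preprocessing. The function names, the 60 to 400 Hz search range and the 0.3 voicing threshold are illustrative assumptions; the actual script may use a different pitch tracker and a different normalization.

```python
import math


def estimate_f0(frame, sr, fmin=60.0, fmax=400.0, voicing_threshold=0.3):
    """Estimate f0 (Hz) of one frame by autocorrelation; return 0.0 if unvoiced.

    Hypothetical helper: the real make_spect_f0.py may use another pitch tracker.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0  # silent frame
    lag_min = max(1, int(sr / fmax))  # shortest period considered
    lag_max = min(int(sr / fmin), n - 1)  # longest period considered
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r < voicing_threshold:
        return 0.0  # weak periodicity: treat as unvoiced
    return sr / best_lag


def speaker_log_f0_stats(f0_tracks):
    """Mean and std of log-f0 over all voiced frames of one speaker."""
    logs = [math.log(f) for track in f0_tracks for f in track if f > 0]
    if not logs:
        raise ValueError("no voiced frames for this speaker")
    mean = sum(logs) / len(logs)
    std = math.sqrt(sum((v - mean) ** 2 for v in logs) / len(logs))
    return mean, (std if std > 0 else 1.0)


def normalize_log_f0(f0_track, mean, std):
    """Z-normalize log-f0 with speaker stats; unvoiced frames become None."""
    return [(math.log(f) - mean) / std if f > 0 else None for f in f0_track]
```

Statistics are pooled over all utterances of a speaker before normalizing, so the normalized contour keeps intonation while removing the speaker's average pitch level and range.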

Divide the dataset: modify the paths to your own and run data_split.py.
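For illustration only, the sketch below shows one common way to divide VCTK-style files (named like p225_001, with the speaker ID before the underscore) into training and test sets: a fixed fraction of each seen speaker's utterances is held out, and any speakers listed as unseen go entirely to the test set, which is what a one-shot evaluation on new speakers needs. The function name, the 10% ratio and the unseen-speaker option are assumptions, not the behavior of data_split.py.

```python
import random
from collections import defaultdict


def split_utterances(filenames, test_ratio=0.1, unseen_speakers=(), seed=0):
    """Hypothetical split: per-speaker hold-out plus fully held-out speakers."""
    by_speaker = defaultdict(list)
    for name in filenames:
        by_speaker[name.split("_")[0]].append(name)  # VCTK: pXXX_YYY
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for spk in sorted(by_speaker):
        utts = sorted(by_speaker[spk])
        if spk in unseen_speakers:
            test.extend(utts)  # speaker never seen during training
            continue
        rng.shuffle(utts)
        n_test = max(1, int(len(utts) * test_ratio))
        test.extend(utts[:n_test])
        train.extend(utts[n_test:])
    return sorted(train), sorted(test)
```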

Generate training metadata: make_metadata.py.
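The exact format written by make_metadata.py is not described here. The sketch below assumes a SpeechSplit-like layout of one entry per speaker, [speaker_id, one_hot_speaker_vector, utterance_files], saved with pickle; treat this layout and the function names as hypothetical and check the script before relying on it.

```python
import pickle
from collections import defaultdict


def build_metadata(train_files, num_speakers=None):
    """Hypothetical metadata: [speaker_id, one-hot vector, sorted file list]."""
    by_speaker = defaultdict(list)
    for name in train_files:
        by_speaker[name.split("_")[0]].append(name)
    speakers = sorted(by_speaker)  # stable speaker-to-index mapping
    n = num_speakers or len(speakers)
    if n < len(speakers):
        raise ValueError("num_speakers is smaller than the number of speakers")
    metadata = []
    for idx, spk in enumerate(speakers):
        one_hot = [0.0] * n
        one_hot[idx] = 1.0
        metadata.append([spk, one_hot, sorted(by_speaker[spk])])
    return metadata


def save_metadata(metadata, path):
    """Write the metadata list to a pickle file (e.g. a hypothetical train.pkl)."""
    with open(path, "wb") as f:
        pickle.dump(metadata, f)
```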

Run the training script: main.py.

Generate testing metadata: make_test_metadata.py.
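In one-shot conversion, each test case pairs a source utterance (providing content) with a single reference utterance from a different target speaker. The sketch below builds such pairs for every ordered pair of distinct test speakers; the function name, the pairs_per_direction parameter and the random sampling are illustrative assumptions rather than what make_test_metadata.py actually does.

```python
import itertools
import random
from collections import defaultdict


def make_conversion_pairs(test_files, pairs_per_direction=1, seed=0):
    """Hypothetical: return (source_utterance, target_reference_utterance) pairs."""
    by_speaker = defaultdict(list)
    for name in test_files:
        by_speaker[name.split("_")[0]].append(name)
    for utts in by_speaker.values():
        utts.sort()  # deterministic order before random sampling
    rng = random.Random(seed)
    pairs = []
    # Every ordered (source, target) speaker combination with distinct speakers.
    for src, tgt in itertools.permutations(sorted(by_speaker), 2):
        for _ in range(pairs_per_direction):
            # One-shot setting: a single reference utterance of the target speaker.
            pairs.append((rng.choice(by_speaker[src]), rng.choice(by_speaker[tgt])))
    return pairs
```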

Run the inference script: inference.py.

4. Evaluation

You may refer to the following scripts: WER.py, mcd.py, f0_pcc.py, draw_f0_distributions.py, draw_speaker_embedding.py
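The three numeric metrics can be summarized with the standard definitions sketched below: word error rate as word-level edit distance divided by reference length, mel-cepstral distortion as $\frac{10}{\ln 10}\sqrt{2\sum_d (c_d - \hat{c}_d)^2}$ averaged over frames (here excluding $c_0$), and the Pearson correlation of f0 over frames voiced in both signals. This is a sketch, not the code in WER.py, mcd.py or f0_pcc.py; in particular it assumes the two cepstral sequences are already time-aligned (many MCD implementations align them with DTW first), and the function names are hypothetical.

```python
import math


def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != h))   # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


def mel_cepstral_distortion(ref_frames, conv_frames, skip_c0=True):
    """Mean MCD in dB over frames that are assumed to be already aligned."""
    if len(ref_frames) != len(conv_frames) or not ref_frames:
        raise ValueError("expected two non-empty, equally long frame sequences")
    start = 1 if skip_c0 else 0  # c0 mostly encodes energy
    k = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for r, c in zip(ref_frames, conv_frames):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(r[start:], c[start:])))
    return k * total / len(ref_frames)


def f0_pearson(f0_a, f0_b):
    """Pearson correlation of two f0 tracks over frames voiced (> 0) in both."""
    pairs = [(a, b) for a, b in zip(f0_a, f0_b) if a > 0 and b > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two jointly voiced frames")
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs)
    va = sum((a - ma) ** 2 for a, _ in pairs)
    vb = sum((b - mb) ** 2 for _, b in pairs)
    if va == 0 or vb == 0:
        raise ValueError("constant f0 track: correlation undefined")
    return cov / math.sqrt(va * vb)
```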

5. Acknowledgement and References

This work is supported by National Natural Science Foundation of China (NSFC) (62076144), National Social Science Foundation of China (NSSF) (13&ZD189) and Shenzhen Key Laboratory of next generation interactive media innovative technology (ZDSYS20210623092001004).

Our work is mainly inspired by:

(1) SpeechSplit:

K. Qian, Y. Zhang, S. Chang, M. Hasegawa-Johnson, and D. Cox, “Unsupervised speech decomposition via triple information bottleneck,” in International Conference on Machine Learning. PMLR, 2020, pp. 7836–7846.

(2) VQMIVC:

D. Wang, L. Deng, Y. T. Yeung, X. Chen, X. Liu, and H. Meng, “VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-Shot Voice Conversion,” in Interspeech, 2021, pp. 1344–1348.

(3) ClsVC:

H. Tang, X. Zhang, J. Wang, N. Cheng, and J. Xiao, “ClsVC: Learning speech representations with two different classification tasks,” OpenReview, 2021, https://openreview.net/forum?id=xp2D-1PtLc5.

6. Citation

If you find our work useful in your research, please consider citing:

@inproceedings{yang22f_interspeech,
  author={SiCheng Yang and Methawee Tantrawenith and Haolin Zhuang and Zhiyong Wu and Aolan Sun and Jianzong Wang and Ning Cheng and Huaizhen Tang and Xintao Zhao and Jie Wang and Helen Meng},
  title={{Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={2553--2557},
  doi={10.21437/Interspeech.2022-571}
}

Some results can be found here. Please feel free to contact us ([email protected]) with any questions or concerns.
