realtime-yukarin's Introduction

Realtime Yukarin: an application for real-time voice conversion

Realtime Yukarin is an application for real-time voice conversion with a single command. It requires trained deep-learning models and a computer with a GPU. The source code is open source under the MIT License, so you can modify it or use it in your own applications, whether commercial or non-commercial.

Japanese README

Supported environment

  • Windows
  • GeForce GTX 1060
  • 6GB GPU memory
  • Intel Core i7-7700 CPU @ 3.60GHz
  • Python 3.6

Preparation

Install required libraries

pip install -r requirements.txt

Prepare trained models

You need two trained models: a first-stage model responsible for voice conversion, and a second-stage model for enhancing the quality of the converted result. You can create the first-stage model with Yukarin and the second-stage model with Become Yukarin.

You also need frequency-statistics files, created with Yukarin, for voice pitch conversion.

Here, each filename is as follows:

| Content | Filename |
| --- | --- |
| Frequency statistics for input voice | ./sample/input_statistics.npy |
| Frequency statistics for target voice | ./sample/target_statistics.npy |
| First-stage model from Yukarin | ./sample/model_stage1/predictor.npz |
| First stage's config file | ./sample/model_stage1/config.json |
| Second-stage model from Become Yukarin | ./sample/model_stage2/predictor.npz |
| Second stage's config file | ./sample/model_stage2/config.json |
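Before moving on, you can confirm the files above are in place. A minimal sketch using only the Python standard library (the helper name `missing_files` is our own, not part of the project):

```python
from pathlib import Path

# The file layout described in the table above.
REQUIRED_FILES = [
    './sample/input_statistics.npy',
    './sample/target_statistics.npy',
    './sample/model_stage1/predictor.npz',
    './sample/model_stage1/config.json',
    './sample/model_stage2/predictor.npz',
    './sample/model_stage2/config.json',
]

def missing_files(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]

if __name__ == '__main__':
    missing = missing_files(REQUIRED_FILES)
    if missing:
        print('Missing files:', ', '.join(missing))
    else:
        print('All required files are in place.')
```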

Verification

You can verify the prepared files by executing ./check.py. The following example converts 5 seconds of voice data from input.wav and saves the result to output.wav.

python check.py \
    --input_path 'input.wav' \
    --input_time_length 5 \
    --output_path 'output.wav' \
    --input_statistics_path './sample/input_statistics.npy' \
    --target_statistics_path './sample/target_statistics.npy' \
    --stage1_model_path './sample/model_stage1/predictor.npz' \
    --stage1_config_path './sample/model_stage1/config.json' \
    --stage2_model_path './sample/model_stage2/predictor.npz' \
    --stage2_config_path './sample/model_stage2/config.json'

If you have problems, you can ask questions on GitHub Issues.

Run

To perform real-time voice conversion, create a config file config.yaml and run ./run.py.

python run.py ./config.yaml

Description of config file

# Name of input sound device. Partial Match. Details are below.
input_device_name: str

# Name of output sound device. Partial Match. Details are below.
output_device_name: str

# Input sampling rate
input_rate: int

# Output sampling rate
output_rate: int

# frame_period for the acoustic features
frame_period: int

# Length of voice to convert at one time (seconds).
# If it is too long, delay will increase, and if it is too short, processing will not catch up.
buffer_time: float

# Method to calculate the fundamental frequency: world or crepe.
# CREPE needs additional libraries; see requirements.txt for details.
extract_f0_mode: world

# Length of voice to be synthesized at one time (number of samples)
vocoder_buffer_size: int

# Amplitude scaling for input.
# When it is more than 1, the amplitude becomes large, and when it is less than 1, the amplitude becomes small.
input_scale: float

# Amplitude scaling for output.
# When it is more than 1, the amplitude becomes large, and when it is less than 1, the amplitude becomes small.
output_scale: float

# Silence threshold for input (dB).
# The smaller the value, the more easily the input is judged as silence.
input_silent_threshold: float

# Silence threshold for output (dB).
# The smaller the value, the more easily the output is judged as silence.
output_silent_threshold: float
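The silence thresholds compare the level of each audio chunk, in decibels, against the configured value. A rough sketch of how such a level could be computed — this is an illustration of the idea, not the project's exact formula:

```python
import math

def rms_db(samples):
    """Root-mean-square level of a chunk of samples, in dB relative to full scale (1.0)."""
    if not samples:
        return float('-inf')
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float('-inf')
    return 20.0 * math.log10(rms)

def is_silent(samples, threshold_db):
    """A chunk quieter than the threshold is treated as silence."""
    return rms_db(samples) < threshold_db

# A full-scale constant signal sits at 0 dB; a very quiet one falls below a -40 dB threshold.
print(is_silent([1.0] * 256, -40.0))    # False
print(is_silent([0.001] * 256, -40.0))  # True
```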

# Overlap for encoding (seconds)
encode_extra_time: float

# Overlap for converting (seconds)
convert_extra_time: float

# Overlap for decoding (seconds)
decode_extra_time: float

# Path of frequency statistics file
input_statistics_path: str
target_statistics_path: str

# Path of trained model file
stage1_model_path: str
stage1_config_path: str
stage2_model_path: str
stage2_config_path: str
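Putting the fields together, a config.yaml might look like the following. The device names and numeric values here are placeholders chosen for illustration; adjust them to your own environment and models:

```yaml
input_device_name: 'Microphone'
output_device_name: 'Speaker'
input_rate: 16000
output_rate: 24000
frame_period: 5
buffer_time: 0.5
extract_f0_mode: world
vocoder_buffer_size: 1024
input_scale: 1.0
output_scale: 1.0
input_silent_threshold: -80.0
output_silent_threshold: -80.0
encode_extra_time: 0.1
convert_extra_time: 0.1
decode_extra_time: 0.1
input_statistics_path: './sample/input_statistics.npy'
target_statistics_path: './sample/target_statistics.npy'
stage1_model_path: './sample/model_stage1/predictor.npz'
stage1_config_path: './sample/model_stage1/config.json'
stage2_model_path: './sample/model_stage2/predictor.npz'
stage2_config_path: './sample/model_stage2/config.json'
```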

(preliminary knowledge) Name of sound device

In the example below, Logitech Speaker is the name of the sound device.
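Because the device name fields are matched by partial match, a substring such as Logitech is enough. A small sketch of how such a selection could work — the function is illustrative, not the project's actual implementation:

```python
def find_device(devices, name_fragment):
    """Return the first device whose name contains the given fragment, else None."""
    for device in devices:
        if name_fragment in device:
            return device
    return None

# Hypothetical device list as an audio backend might report it.
devices = ['Microsoft Sound Mapper', 'Logitech Speaker', 'Realtek Microphone']
print(find_device(devices, 'Logitech'))  # Logitech Speaker
```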

License

MIT License

realtime-yukarin's People

Contributors

hiroshiba, wakiyamap


realtime-yukarin's Issues

Questions and documentation

  1. Which model does Yukarin use for its training?
  2. Are there any target voice training document specifications?
  3. Would public voice datasets help with training?
  4. Does this project work with English datasets?
  5. Why is the example page's voice so "robotic"/"compressed"?

An error occurs when running run.py or check.py

When I run check.py or run.py, the following is printed and the program fails:
OSError: /usr/local/lib/python3.6/dist-packages/world4py/libworld.so: cannot open shared object file: No such file or directory
(I searched for the file, and it does exist at that path.
 * I checked with os.path.exists(_WORLD_LIBRARY_PATH) and confirmed it returns True.)

I know that the recommended environment for realtime-yukarin is Windows, but since the same error also occurs with become-yukarin, I am posting it here.

If you know of a solution, I would be very grateful if you could share it.

OS: Ubuntu 18.04.5 LTS (Bionic Beaver)
Python 3.6.9

Full error:

python3 check.py --input_path 'input.wav' --input_time_length 5 --output_path 'output.wav' --input_statistics_path './sample/input_statistics.npy' --target_statistics_path './sample/target_statistics.npy' --stage1_model_path './sample/model_stage1/predictor_260000.npz' --stage1_config_path './sample/model_stage1/config.json' --stage2_model_path './sample/model_stage2/predictor_8000.npz' --stage2_config_path './sample/model_stage2/config.json'
Traceback (most recent call last):
  File "check.py", line 13, in <module>
    from realtime_voice_conversion.config import VocodeMode
  File "/home/****/ドキュメント/IkeboMaster/realtime-yukarin/realtime_voice_conversion/__init__.py", line 1, in <module>
    from . import stream
  File "/home/****/ドキュメント/IkeboMaster/realtime-yukarin/realtime_voice_conversion/stream/__init__.py", line 2, in <module>
    from .decode_stream import DecodeStream
  File "/home/****/ドキュメント/IkeboMaster/realtime-yukarin/realtime_voice_conversion/stream/decode_stream.py", line 7, in <module>
    from ..yukarin_wrapper.vocoder import Vocoder
  File "/home/****/ドキュメント/IkeboMaster/realtime-yukarin/realtime_voice_conversion/yukarin_wrapper/vocoder.py", line 5, in <module>
    from world4py.native import structures, apidefinitions, utils
  File "/usr/local/lib/python3.6/dist-packages/world4py/native/__init__.py", line 6, in <module>
    from world4py.native import apis, tools, utils, structures
  File "/usr/local/lib/python3.6/dist-packages/world4py/native/apis.py", line 6, in <module>
    from world4py.native import apidefinitions, structures, utils
  File "/usr/local/lib/python3.6/dist-packages/world4py/native/apidefinitions.py", line 7, in <module>
    from world4py.native import structures, instance
  File "/usr/local/lib/python3.6/dist-packages/world4py/native/instance.py", line 9, in <module>
    _WORLD = ctypes.cdll.LoadLibrary(_WORLD_LIBRARY_PATH)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 426, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.6/dist-packages/world4py/libworld.so: cannot open shared object file: No such file or directory

Can you share your thinking behind the real-time design?

Hi, I have a voice conversion model and I want to turn it into a real-time model.

  1. I am confused about what "start_time" and "extra_time" mean in your code.
  2. I want to record audio from the microphone, process the audio data, and play the processed audio at the same time. How can I design the code?

Thank you very much!
