Lip Sync - Neural Network Rhubarb Replication

This is a Python neural network replication of Rhubarb Lip Sync, designed to enable complex lip movement for real-time chatbots.

This project uses a simple neural network trained on pairs of speech audio and Rhubarb Lip Sync outputs. It reproduces Rhubarb's mouth shapes with around 75% accuracy, which is enough to give a sense of realism in most applications.

Note: if your application does not require real-time performance, consider using Rhubarb Lip Sync directly.

How to Use

Inference

For now, only inference is supported, as the training code is being heavily refactored. If you need to train your own model and can't wait, let me know. To run inference, use the following command:

python .\inference.py --wav_file_name .\001.wav --model_name model_full_dataset_2layers.pth
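To run the same command over many files, or on a platform that does not use Windows-style paths, the argument list can be built in Python. This is a minimal sketch: the helper names build_inference_command and commands_for_directory are hypothetical, and it assumes inference.py sits in the current directory and accepts only the two flags shown above.

```python
import sys
from pathlib import Path


def build_inference_command(wav_path, model_path, script="inference.py"):
    """Build the argument list for one inference.py run.

    Only the two flags from the README command are used; the script
    location is an assumption.
    """
    return [
        sys.executable,  # the same Python interpreter that runs this helper
        str(script),
        "--wav_file_name", str(Path(wav_path)),
        "--model_name", str(Path(model_path)),
    ]


def commands_for_directory(wav_dir, model_path):
    """Return one command per .wav file in wav_dir, in sorted filename order."""
    wav_files = sorted(Path(wav_dir).glob("*.wav"))
    return [build_inference_command(p, model_path) for p in wav_files]
```

Each returned list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting issues with filenames that contain spaces.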

Training

If you wish to train your own model, you can do so as well. The program looks for 41 kHz WAV files in the "wavs" directory, and for texts generated by Rhubarb (the command-line program) in the "texts" directory. Each WAV and its TXT should share the same filename ("001.wav" and "001.txt"). I had better luck without the extended mouth shapes (except for "X"), so the training program is set to not include them. If you wish to include them when training, set the OUTPUT_SIZE variable to 9. If you use only the extended mouth shape "X", find/replace it with "G" in the texts; if you use all of the extended mouth shapes, replace it with "I".
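Before training, it can help to check that every WAV has a matching text and to do the "X" find/replace in one pass. The sketch below is not part of the repository: find_unpaired and remap_shape are hypothetical helpers, and it assumes the texts are in Rhubarb's tab-separated form, one line per cue with a timestamp, a tab, and the mouth shape letter.

```python
from pathlib import Path


def find_unpaired(wav_dir, txt_dir):
    """Return (wavs_without_txt, txts_without_wav) as sorted lists of stems."""
    wav_stems = {p.stem for p in Path(wav_dir).glob("*.wav")}
    txt_stems = {p.stem for p in Path(txt_dir).glob("*.txt")}
    return sorted(wav_stems - txt_stems), sorted(txt_stems - wav_stems)


def remap_shape(text, old="X", new="G"):
    """Relabel one mouth shape in lines of the assumed form `<time><TAB><shape>`.

    Lines that do not have exactly two tab-separated fields are left as-is.
    """
    out = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1].strip() == old:
            parts[1] = new
        out.append("\t".join(parts))
    return "\n".join(out)
```

For example, find_unpaired("wavs", "texts") lists the stems that are missing a partner, and remap_shape(text, new="I") covers the case where all extended mouth shapes are in use. Matching on the shape column rather than doing a blind find/replace avoids touching an "X" that appears anywhere else in the file.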

To-Do List

Convert from using .pth to using SafeTensors.
Add video example to README.md.

Current Status

The code is currently being refactored, and users may encounter errors, particularly when attempting to train their own models. However, the provided model (model_full_dataset_2layers.pth) should be satisfactory for most purposes. It was trained on over 80 GB of WAV files from a variety of sources, which gives it a versatile foundation for lip-syncing tasks.

License and Use

This code is available under the MIT license and is free for anyone to use without obligation. However, I would be delighted if you'd drop me a line to let me know if and how you're using it!

Contributions and Feedback

Please feel free to contribute to this project or provide feedback by opening an issue or pull request on GitHub. Your insights are greatly appreciated!
