lipsynch-nn's Introduction

Lip Sync - Neural Network Rhubarb Replication

This is a Python neural network replication of Rhubarb Lip Sync, designed to enable complex lip movement for real-time chatbots.

This project utilizes a simple neural network trained on pairs of spoken texts and Rhubarb Lip Sync outputs, approximating lip movements with around 75% accuracy. This level of accuracy is sufficient for generating a sense of realism in most applications.

Please note, if your application does not require real-time performance, you might want to consider using Rhubarb Lip Sync directly.

How to Use

Inference

For now, only inference is fully supported, because the training code is being heavily refactored. If you need to train your own model and can't wait, let me know. To run inference, use the following command:

python .\inference.py --wav_file_name .\001.wav --model_name model_full_dataset_2layers.pth  
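The command above uses Windows-style paths. As a minimal sketch, the same call could be assembled from Python so that it works on any platform. The helper names `build_inference_command` and `run_inference` are hypothetical and not part of this repository; only the `--wav_file_name` and `--model_name` flags come from the command shown above.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_command(wav_file, model_file, script="inference.py"):
    """Return the argument list for the inference command shown above.

    Hypothetical helper: only the two flags are taken from the README.
    """
    return [
        sys.executable,                      # the Python interpreter in use
        str(Path(script)),                   # path to inference.py
        "--wav_file_name", str(Path(wav_file)),
        "--model_name", str(Path(model_file)),
    ]


def run_inference(wav_file, model_file, script="inference.py"):
    """Run inference.py as a subprocess and raise if it exits with an error."""
    cmd = build_inference_command(wav_file, model_file, script)
    return subprocess.run(cmd, check=True)
```

Calling `run_inference("001.wav", "model_full_dataset_2layers.pth")` from the repository root should be equivalent to the command line above, assuming `inference.py` is in the current directory.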

Training

If you wish to train your own model, you can do so as well. The program looks for 41 kHz WAV files in the "wavs" directory and for texts generated by Rhubarb (the command-line program) in the "texts" directory. Each WAV and its TXT should share the same base filename (for example, "001.wav" and "001.txt"). I had better luck not using the extended mouth shapes (except for "X"), so the training program is set to exclude them. If you wish to include them when training, set the OUTPUT_SIZE variable to 9. If you decide to use the extended mouth shape "X", find and replace it with "G" in the texts, or with "I" if you are using the other extended mouth shapes as well.
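The data preparation described above could be sketched roughly as follows. This is not the project's training code: `find_unpaired` and `remap_x` are hypothetical helpers, and the sketch assumes the Rhubarb texts are in Rhubarb's tab-separated format, with one `<time><TAB><shape>` entry per line.

```python
from pathlib import Path


def find_unpaired(wav_dir="wavs", text_dir="texts"):
    """Return (wavs_without_text, texts_without_wav) as sorted lists of stems.

    A stem is the filename without its extension, e.g. "001" for "001.wav".
    """
    wav_stems = {p.stem for p in Path(wav_dir).glob("*.wav")}
    text_stems = {p.stem for p in Path(text_dir).glob("*.txt")}
    return sorted(wav_stems - text_stems), sorted(text_stems - wav_stems)


def remap_x(text, replacement="G"):
    """Replace the "X" mouth shape in Rhubarb output with `replacement`.

    Assumes each entry is "<time><TAB><shape>"; other lines are left untouched.
    Use replacement="I" when training with the other extended mouth shapes.
    """
    out = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1].strip() == "X":
            parts[1] = replacement
        out.append("\t".join(parts))
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

Replacing only the shape column, rather than doing a blind find/replace, avoids touching any other "X" that might appear in a file. Running `find_unpaired()` before training is a quick way to catch a WAV that has no matching TXT, or the reverse.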

To-Do List

  1. Convert from using .pth to using SafeTensors.
  2. Add video example to README.md.

Current Status

The code is currently undergoing refactoring and users may encounter errors, particularly when attempting to train their own models. However, the provided model (model_full_dataset_2layers.pth) should be satisfactory for most purposes. It's been trained on over 80 GB of WAV files from a variety of sources, providing a comprehensive and versatile foundation for lip-syncing tasks.

License and Use

This code is available under the MIT license and is free for anyone to use without obligation. However, I would be delighted if you'd drop me a line to let me know if and how you're using it!

Contributions and Feedback

Please feel free to contribute to this project or provide feedback by opening an issue or pull request on GitHub. Your insights are greatly appreciated!

Contributors

cryptowooser, lhl
