HeyGenClone

Welcome to HeyGenClone, an open-source analogue of the HeyGen system.

I am a developer from Moscow 🇷🇺 who devotes my free time to studying new technologies. The project is in active development, but I hope it will help you achieve your goals!

Currently, translation is supported only from English 🇬🇧!

Installation

  • Clone this repo
  • Install requirements:
    pip install -r requirements.txt
    
  • In config.json, set HF_TOKEN to your HuggingFace token. Visit the speaker-diarization and segmentation pages and accept their user conditions
  • Download the weights from drive and unzip the downloaded file into the weights folder
  • Install ffmpeg
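
A minimal sketch of a setup check for the steps above, in case it helps. It assumes config.json is a flat JSON object with HF_TOKEN at the top level and that the weights were unzipped into a folder named weights. The helper names set_hf_token and missing_prerequisites are hypothetical and not part of the project.

```python
import json
import shutil
from pathlib import Path


def set_hf_token(config_path, token):
    """Write the token into the HF_TOKEN key and keep all other keys."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["HF_TOKEN"] = token  # assumes HF_TOKEN is a top-level key
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config


def missing_prerequisites(weights_dir="weights"):
    """Return a list of problems found; an empty list means nothing is missing."""
    problems = []
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    weights = Path(weights_dir)
    if not weights.is_dir() or not any(weights.iterdir()):
        problems.append(f"no files found in {weights_dir}/")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("Setup problem:", problem)
```

This only checks that ffmpeg is on PATH and that the weights folder is not empty. It does not verify that the weights are the right ones or that the token is valid.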

Configurations (config.json)

Key | Description | Can modify
LANGUAGES_URL | URL for getting available languages | ❌
DET_TRESH | Face detection threshold [0.0:1.0] | ✅
DIST_TRESH | Face embeddings distance threshold [0.0:1.0] | ✅
DB_NAME | Name of the database for data storage | ✅
HF_TOKEN | Your HuggingFace token (see Installation) | ✅
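
The ranges above can be checked before a long run. This is a rough sketch, assuming the keys live at the top level of config.json and that both thresholds are plain numbers. validate_config is a hypothetical helper, not something the project ships.

```python
def validate_config(config):
    """Return a list of error messages; an empty list means the values look sane."""
    errors = []
    for key in ("DET_TRESH", "DIST_TRESH"):
        value = config.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{key} must be a number")
        elif not 0.0 <= value <= 1.0:
            errors.append(f"{key} must be within [0.0, 1.0], got {value}")
    for key in ("DB_NAME", "HF_TOKEN"):
        value = config.get(key)
        if not isinstance(value, str) or not value:
            errors.append(f"{key} must be a non-empty string")
    return errors


if __name__ == "__main__":
    # Example values are made up for illustration.
    example = {"DET_TRESH": 0.3, "DIST_TRESH": 1.5, "DB_NAME": "db", "HF_TOKEN": "hf_example"}
    for error in validate_config(example):
        print(error)
```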

Usage

At the root of the project there is a translate.py script that translates the video you specify.

  • video_filename - the filename of your input video (.mp4)
  • output_language - the code of the language to be translated into
  • output_filename - the filename of output video (.mp4)
python translate.py video_filename output_language -o output_filename
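
If you want several output languages from one video, the command above can be wrapped in a small loop. This is only a sketch: it uses the positional arguments and the -o flag exactly as shown above, the helper names are hypothetical, and the language codes in the example are assumptions (check which codes the project actually accepts).

```python
import subprocess
import sys
from pathlib import Path


def build_translate_command(video_filename, output_language, output_filename):
    """Build the argument list for the translate.py call shown above."""
    return [sys.executable, "translate.py", video_filename, output_language,
            "-o", output_filename]


def translate_to_many(video_filename, languages, run=subprocess.run):
    """Run translate.py once per language and return the output filenames."""
    outputs = []
    stem = Path(video_filename).stem
    for language in languages:
        output_filename = f"{stem}_{language}.mp4"
        # check=True raises CalledProcessError if the script exits with an error.
        run(build_translate_command(video_filename, language, output_filename),
            check=True)
        outputs.append(output_filename)
    return outputs


if __name__ == "__main__":
    # "ru" is an assumed language code, used here for illustration only.
    print(build_translate_command("input.mp4", "ru", "input_ru.mp4"))
```

The run parameter is there so the loop can be tried with a stand-in function before spending time on real translations.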

I also added a script that overlays a voice on a video with lip sync, which lets you create a video of a person pronouncing your speech. Currently it works only for videos with one person.

  • voice_filename - the filename of your speech (.wav)
  • video_filename - the filename of your input video (.mp4)
  • output_filename - the filename of output video (.mp4)
python speech_changer.py voice_filename video_filename -o output_filename
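
Before running the command above, it can be worth checking the inputs. The sketch below uses only the Python standard library. It assumes the voice file is an uncompressed PCM .wav (the only kind the wave module reads), and the helper names are hypothetical.

```python
import wave


def extension_problems(voice_filename, video_filename):
    """Check the file extensions listed above (.wav voice, .mp4 video)."""
    problems = []
    if not voice_filename.lower().endswith(".wav"):
        problems.append(f"{voice_filename}: expected a .wav file")
    if not video_filename.lower().endswith(".mp4"):
        problems.append(f"{video_filename}: expected a .mp4 file")
    return problems


def wav_duration_seconds(voice_filename):
    """Return the length of a PCM .wav file in seconds."""
    with wave.open(voice_filename, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


if __name__ == "__main__":
    print(extension_problems("speech.mp3", "input.mp4"))
```

Comparing the speech duration with the video length is left to you, because I am not certain how the script handles a mismatch.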

How it works

  1. Detecting scenes (PySceneDetect)
  2. Face detection (yolov8-face)
  3. Reidentification (deepface)
  4. Speech enhancement (MDXNet)
  5. Speakers transcriptions and diarization (whisperX)
  6. Text translation (googletrans)
  7. Voice cloning (TTS)
  8. Lip sync (lipsync)
  9. Face restoration (GFPGAN)
  10. [Need to fix] Search for talking faces, determining what this person is saying
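
The steps above can be pictured as a chain of stages that pass a shared context along. This is a simplified sketch, not the project's actual code: it assumes the stages run in the listed order, the stage names are labels I made up, and the handlers are placeholders that call none of the listed libraries. Step 10 is left out because it is marked as needing a fix.

```python
# Stage names mirror steps 1-9 of the list above; they are labels only,
# not the project's real function or module names.
STAGES = [
    "scene_detection",                # PySceneDetect
    "face_detection",                 # yolov8-face
    "reidentification",               # deepface
    "speech_enhancement",             # MDXNet
    "transcription_and_diarization",  # whisperX
    "text_translation",               # googletrans
    "voice_cloning",                  # TTS
    "lip_sync",                       # lipsync
    "face_restoration",               # GFPGAN
]


def run_pipeline(video_filename, handlers, stages=STAGES):
    """Call each stage handler in order, passing a shared context dict."""
    context = {"video_filename": video_filename, "completed": []}
    for name in stages:
        context = handlers[name](context)
        context["completed"].append(name)
    return context


if __name__ == "__main__":
    # Placeholder handlers that do nothing; real ones would wrap each tool.
    noop_handlers = {name: (lambda ctx: ctx) for name in STAGES}
    result = run_pipeline("input.mp4", noop_handlers)
    print(" -> ".join(result["completed"]))
```

Keeping every stage behind the same handler signature would make it easy to skip one, for example face restoration, by passing a shorter stages list.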

Translation results

Note that this example was created without using GFPGAN!

Destination language | Source video | Output video
🇷🇺 (Russian) | Watch the video | Watch the video

To-Do List

  • Full GPU support
  • Multithreading support (optimizations)
  • Detecting talking faces (improvement)

Other

  • Tested on macOS
  • ⚠️ The project is under development!

Contributors

  • brasd99
