HeyGenClone

Welcome to HeyGenClone, an open-source analogue of the HeyGen system.

I am a developer from Moscow 🇷🇺 who devotes my free time to studying new technologies. The project is under active development, but I hope it will help you achieve your goals!

Currently, translation is supported only from English 🇬🇧!

Installation

Clone this repo
Install requirements:
```
pip install -r requirements.txt
```
In the config.json file, set the HF_TOKEN value to your HuggingFace token. Visit the speaker-diarization and segmentation pages and accept their user conditions
Download the weights from drive and unzip the downloaded file into the weights folder
Install ffmpeg

Configurations (config.json)

Key	Description	Can modify
LANGUAGES_URL	URL for getting available languages	❌
DET_TRESH	Face detection threshold (0.0 to 1.0)	✅
DIST_TRESH	Face embeddings distance threshold (0.0 to 1.0)	✅
DB_NAME	Name of the database for data storage	✅
HF_TOKEN	Your HuggingFace token (see Installation)	✅
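
A quick sanity check of config.json before a long run can save time. The sketch below uses only the key names from the table above; the validation rules (both thresholds must lie in 0.0 to 1.0, HF_TOKEN must not be empty) and the helper names `validate_config` and `load_config` are assumptions for illustration and are not part of the project.

```python
import json

# Key names come from the table above; the checks themselves are assumptions.
REQUIRED_KEYS = ("LANGUAGES_URL", "DET_TRESH", "DIST_TRESH", "DB_NAME", "HF_TOKEN")
THRESHOLD_KEYS = ("DET_TRESH", "DIST_TRESH")


def validate_config(config):
    """Return a list of problems found; an empty list means nothing looked wrong."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")
    for key in THRESHOLD_KEYS:
        if key not in config:
            continue
        value = config[key]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            problems.append(f"{key} must be a number in [0.0, 1.0], got {value!r}")
    if "HF_TOKEN" in config and not config["HF_TOKEN"]:
        problems.append("HF_TOKEN is empty")
    return problems


def load_config(path="config.json"):
    """Read the JSON file and raise ValueError if any check fails."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    problems = validate_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config


# Hypothetical values, used only to show the check in action.
example = {
    "LANGUAGES_URL": "https://example.invalid/languages",
    "DET_TRESH": 1.5,
    "DIST_TRESH": 0.2,
    "DB_NAME": "storage.db",
    "HF_TOKEN": "hf_your_token_here",
}
print(validate_config(example))
```

Here the check reports that DET_TRESH is out of range; with a value such as 0.3 the list would be empty.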

Usage

The translate script in the project root translates the video you specify.

video_filename - the filename of your input video (.mp4)
output_language - the code of the language to be translated into
output_filename - the filename of output video (.mp4)

python translate.py video_filename output_language -o output_filename
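
The command line above maps naturally onto an argparse definition. The following is a minimal sketch of such a parser, assuming two positional arguments and an optional `-o` flag as shown; the real translate.py may define its arguments differently, and the language code `ru` is only a placeholder.

```python
import argparse


def build_parser():
    """Sketch of a parser matching the documented command line (an assumption)."""
    parser = argparse.ArgumentParser(
        prog="translate.py",
        description="Translate the speech in a video into another language.",
    )
    parser.add_argument("video_filename", help="filename of the input video (.mp4)")
    parser.add_argument("output_language", help="code of the target language")
    parser.add_argument("-o", dest="output_filename", help="filename of the output video (.mp4)")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(["input.mp4", "ru", "-o", "output.mp4"])
print(args.video_filename, args.output_language, args.output_filename)
```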

I also added a script that overlays a voice on a video with lip sync, which lets you create a video of a person pronouncing your speech. Currently it works only for videos with one person.

voice_filename - the filename of your speech (.wav)
video_filename - the filename of your input video (.mp4)
output_filename - the filename of output video (.mp4)

python speech_changer.py voice_filename video_filename -o output_filename
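
If you want to call the script from your own Python code, one option is to build the argument list and hand it to `subprocess`. This is a sketch under assumptions: the helper names are hypothetical, and the extension checks simply mirror the file types listed above rather than anything the script itself enforces.

```python
import subprocess
import sys
from pathlib import Path


def speech_changer_command(voice_filename, video_filename, output_filename):
    """Build the argv list for speech_changer.py after checking file extensions."""
    expected = (
        (voice_filename, ".wav"),
        (video_filename, ".mp4"),
        (output_filename, ".mp4"),
    )
    for name, suffix in expected:
        if Path(name).suffix.lower() != suffix:
            raise ValueError(f"{name!r} should end with {suffix}")
    return [sys.executable, "speech_changer.py", voice_filename, video_filename, "-o", output_filename]


def run_speech_changer(voice_filename, video_filename, output_filename):
    """Run the script from the project root; raises CalledProcessError on failure."""
    cmd = speech_changer_command(voice_filename, video_filename, output_filename)
    subprocess.run(cmd, check=True)


# Only build and show the command here; call run_speech_changer(...) to execute it.
print(speech_changer_command("speech.wav", "input.mp4", "output.mp4")[1:])
```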

How it works

Detecting scenes (PySceneDetect)
Face detection (yolov8-face)
Reidentification (deepface)
Speech enhancement (MDXNet)
Speakers transcriptions and diarization (whisperX)
Text translation (googletrans)
Voice cloning (TTS)
Lip sync (lipsync)
Face restoration (GFPGAN)
[Needs fixing] Search for talking faces and determine what each person is saying
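
To make the reidentification step and the DIST_TRESH setting more concrete, here is a sketch of threshold-based matching of face embeddings. It assumes cosine distance and a greedy "closest known face within the threshold" rule; the project relies on deepface and may use a different metric or strategy, so treat the function names and the logic as illustrative only.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (0.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def assign_identities(embeddings, dist_thresh):
    """Label each embedding with the closest known identity within dist_thresh,
    or register it as a new identity if none is close enough."""
    known = []   # one reference embedding per identity
    labels = []
    for emb in embeddings:
        best_id, best_dist = None, dist_thresh
        for person_id, ref in enumerate(known):
            d = cosine_distance(emb, ref)
            if d <= best_dist:
                best_id, best_dist = person_id, d
        if best_id is None:
            known.append(emb)
            best_id = len(known) - 1
        labels.append(best_id)
    return labels


# Toy 2-D "embeddings": the first two point the same way, as do the last two.
faces = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(assign_identities(faces, dist_thresh=0.2))  # prints [0, 0, 1, 1]
```

A lower threshold splits faces into more identities, and a higher one merges them more aggressively, which is the trade-off DIST_TRESH appears to control.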

Translation results

Note that this example was created without GFPGAN!

Destination language	Source video	Output video
🇷🇺 (Russian)

To-Do List

Full GPU support
Multithreading support (optimizations)
Detecting talking faces (improvement)

Other

Tested on macOS
⚠️ The project is under development!
