
LIHQ

Long-Inference, High Quality Synthetic Speaker

LIHQ is not a new architecture; it is an application that stitches together several open-source deep learning models to generate an artificial speaker of your own design. It was built to run in Google Colab, taking advantage of free or cheap GPUs with essentially zero setup, and was designed to be as user-friendly as I could make it. It was not created for deepfake purposes, but you can give that a try if you want (see the LIHQ Examples Colab and the secondary Demo Video). Some voices and face images will not give your desired output, and getting things right takes a little trial and error. LIHQ works best with a StyleGAN2 face and a simple narrator voice: creating a speaker video from a StyleGAN2 face and a simple TorToiSe voice is straightforward and usually produces good output.

Update:

Bark (https://github.com/suno-ai/bark) now looks like the open-source state of the art for speech generation. Try it instead of TorToiSe.
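As a starting point, here is a minimal sketch of generating speech with Bark, following the quickstart in its README (this is not wired into the LIHQ Colab; the text and output filename are illustrative):

# Generate a short utterance with Bark and save it as a WAV file
# (a sketch based on Bark's documented quickstart, not part of LIHQ).
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()                  # downloads and loads all Bark models
audio_array = generate_audio("Hello, I am a synthetic speaker.")
write_wav("bark_out.wav", SAMPLE_RATE, audio_array)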

LIHQ Examples

How it works

Steps you need to take:

  1. Run Setup
  2. Create/Upload Audio (TorToiSe, VITS)
  3. Upload Speaker Face Image (StyleGAN2 preferred)
  4. (Optional) Add Reference Video
  5. (Optional) Replace Background

Steps the program takes:

  1. Face/Head Motion Transfer (FOMM)
  2. Mouth Motion Generation (Wav2Lip)
  3. Upscaling and Restoration (GFPGAN)
  4. Second Run-Through (FOMM and GFPGAN)
  5. (Optional) Frame Interpolation (QVI) (noticeable improvement, but long inference)
  6. (Optional) Background Matting (MODNet)

Pick out a forward-facing face image with a closed mouth (StyleGAN2 faces work well: https://github.com/NVlabs/stylegan2), then create or upload audio of anyone you want using TorToiSe (built into the LIHQ Colab): https://github.com/neonbjb/tortoise-tts
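For reference, a minimal sketch of generating speaker audio with TorToiSe (the imports match those the LIHQ Colab uses; the voice name, preset, and output handling here are illustrative assumptions):

# Create speaker audio with TorToiSe; 'train_dotrice' is one of the
# voices bundled with tortoise-tts, used here purely as an example.
import torchaudio
from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_voice

tts = TextToSpeech()
voice_samples, conditioning_latents = load_voice('train_dotrice')
gen = tts.tts_with_preset("Hello, I am a synthetic speaker.",
                          voice_samples=voice_samples,
                          conditioning_latents=conditioning_latents,
                          preset='fast')
torchaudio.save('speaker.wav', gen.squeeze(0).cpu(), 24000)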

LIHQ first transfers head and eye movement from the default reference video to your face image using a First Order Motion Model (FOMM). Wav2Lip then generates mouth movement from your audio and pastes it onto the FOMM output. Since that output is very low resolution (256x256), it is run through a face restoration and super-resolution model. Repeating the whole process a second time makes the video look even better, and for the highest quality you can add frame interpolation at the end to increase the FPS.
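Invoking the whole pipeline from the Colab is a single call; the snippet below is taken from the entry point exactly as it appears in a traceback in the issues section (the example face image ships with the repo):

# Run the full LIHQ pipeline on a single face image.
%cd /content/LIHQ
from runLIHQ import run

run(face='/content/LIHQ/input/face/examples/2372448.png')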

Demo Video

LIHQ Demo Video

Above is the primary LIHQ demo video, showing the software used as intended. Deepfakes are possible with LIHQ as well, but they take a bit more work; see the Deepfake Example Video for examples.

Colabs

The Google Colabs contain a lot of information, tips, and tricks that are not in this README. The LIHQ Colab has a cell for every feature you might want or need, each with an explanation. It is lengthy and a little cluttered, but I recommend reading through everything. LIHQ Examples contains four different examples, each trimmed down to only what is needed.

LIHQ

LIHQ Examples

Possible Future Work

  • Create more reference videos. Lip movement seems to work best at ref_vid_offset = 0, and I don't think that is a coincidence; it likely has to do with the FOMM movement transfer. I may create more reference videos of shorter length and with varying emotions.
  • Make Wav2Lip and FOMM optional, for when you want to use a reference video with correct lip movement, or a speaker video that already contains the target speaker.
  • Add a randomizer to the reference video offset.
  • Expand post-processing capabilities.
  • Revise for local use. I'm not sure what it would take; it's probably 95% there, I just haven't tried it outside of Colab.

lihq's People

Contributors

johnGettings

lihq's Issues

Windows setup & local execution

For changes related to setup and execution on Windows:

I have made some local modifications and would like to submit a PR; however, I lack write access. Could you assist with this, @johnGettings?

No module named 'einops'

ModuleNotFoundError                       Traceback (most recent call last)
in <cell line: 10>()
      8 import IPython
      9
---> 10 from tortoise.api import TextToSpeech
     11 from tortoise.utils.audio import load_audio, load_voice, load_voices
     12

3 frames
/content/LIHQ/tortoise_tts/tortoise/models/xtransformers.py in <module>
      6 import torch
      7 import torch.nn.functional as F
----> 8 from einops import rearrange, repeat
      9 from torch import nn, einsum
     10

ModuleNotFoundError: No module named 'einops'

I am facing this issue when I try to run it in Google Colab. Locally, I was not even able to install requirements.txt.

Which Python version have you used?
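If the missing dependency is the only problem, a minimal workaround is to install it into the Colab runtime before the import (an assumption; other pinned packages from requirements.txt may also be missing):

!pip install einops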

Wav2Lip could not generate at least one of your videos.

/content/LIHQ
Initializing
Moviepy - Building video ./first_order_model/input-ref-vid/Sister/Sister.mp4.
Moviepy - Writing video ./first_order_model/input-ref-vid/Sister/Sister.mp4

Moviepy - Done !
Moviepy - video ready ./first_order_model/input-ref-vid/Sister/Sister.mp4
Moviepy - Building video ./first_order_model/input-ref-vid/Sister2/Sister2.mp4.
Moviepy - Writing video ./first_order_model/input-ref-vid/Sister2/Sister2.mp4

Moviepy - Done !
Moviepy - video ready ./first_order_model/input-ref-vid/Sister2/Sister2.mp4
Running First Order Motion Model
100%
 176/176 [00:07<00:00, 24.59it/s]
100%
 104/104 [00:04<00:00, 24.54it/s]
FOMM Success!
Running Wav2Lip
Wav2Lip could not generate at least one of your videos.
Possibly bad audio, unrecognizable mouth, bad file paths, out of memory.
Run below command in a separate cell to get full traceback.
###########################################################
###########################################################
import os
adir = 'Folder1' # The audio folder that failed. See Wav2Lip output folder to see whats missing.

vid_path = f'{os.getcwd()}/output/FOMM/Round1/{adir}.mp4'
aud_path = f'{os.getcwd()}/input/audio/{adir}/{adir}.wav'
%cd /content/LIHQ/Wav2Lip
!python inference.py --checkpoint_path ./checkpoints/wav2lip.pth --face {vid_path} --audio {aud_path} --outfile /content/test.mp4 --pads 0 20 0 0

An exception has occurred, use %tb to see the full traceback.

SystemExit

Running into error in colab, when running main script

Hi, I am receiving this error when trying to run the main script:

/content/LIHQ
Initializing
Moviepy - Building video ./first_order_model/input-ref-vid/Folder1/Folder1.mp4.
Moviepy - Writing video ./first_order_model/input-ref-vid/Folder1/Folder1.mp4

Moviepy - Done !
Moviepy - video ready ./first_order_model/input-ref-vid/Folder1/Folder1.mp4
Running First Order Motion Model
100%
569/569 [00:21<00:00, 25.00it/s]
FOMM Success!
Running Wav2Lip
Wav2Lip could not generate at least one of your videos.
Possibly bad audio, unrecognizable mouth, bad file paths, out of memory.
Run below command in a separate cell to get full traceback.
###########################################################
###########################################################
import os
adir = 'Folder1' # The audio folder that failed. See Wav2Lip output folder to see whats missing.

vid_path = f'{os.getcwd()}/output/FOMM/Round1/{adir}.mp4'
aud_path = f'{os.getcwd()}/input/audio/{adir}/{adir}.wav'
%cd /content/LIHQ/Wav2Lip
!python inference.py --checkpoint_path ./checkpoints/wav2lip.pth --face {vid_path} --audio {aud_path} --outfile /content/test.mp4  --pads 0 20 0 0


An exception has occurred, use %tb to see the full traceback.

SystemExit

When trying to run the command as suggested in the comment, I get this:

The command I ran:

%cd /content/LIHQ
import os
adir = 'Folder1' # The audio folder that failed. See Wav2Lip output folder to see whats missing.

vid_path = f'{os.getcwd()}/output/FOMM/Round1/{adir}.mp4'
aud_path = f'{os.getcwd()}/input/audio/{adir}/{adir}.wav'
%cd /content/LIHQ/Wav2Lip
!python inference.py --checkpoint_path ./checkpoints/wav2lip.pth --face {vid_path} --audio {aud_path} --outfile /content/test.mp4  --pads 0 20 0 0

The error it shows:

/content/LIHQ
/content/LIHQ/Wav2Lip
Using cuda for inference.
Reading video frames...
Number of frames available for inference: 569
Traceback (most recent call last):
  File "/content/LIHQ/Wav2Lip/inference.py", line 280, in <module>
    main()
  File "/content/LIHQ/Wav2Lip/inference.py", line 225, in main
    mel = audio.melspectrogram(wav)
  File "/content/LIHQ/Wav2Lip/audio.py", line 47, in melspectrogram
    S = _amp_to_db(_linear_to_mel(np.abs(D))) - hp.ref_level_db
  File "/content/LIHQ/Wav2Lip/audio.py", line 95, in _linear_to_mel
    _mel_basis = _build_mel_basis()
  File "/content/LIHQ/Wav2Lip/audio.py", line 100, in _build_mel_basis
    return librosa.filters.mel(hp.sample_rate, hp.n_fft, n_mels=hp.num_mels,
TypeError: mel() takes 0 positional arguments but 2 positional arguments (and 3 keyword-only arguments) were given

Could you help me understand what the issue is and how to fix it?
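This particular TypeError usually means the installed librosa is 0.10 or newer, where librosa.filters.mel() takes keyword-only arguments. A hedged sketch of the corresponding patch to _build_mel_basis() in Wav2Lip/audio.py (naming the two positional arguments; any remaining keyword arguments from the original call stay as they are):

def _build_mel_basis():
    # Name sr= and n_fft= explicitly for librosa >= 0.10.
    return librosa.filters.mel(sr=hp.sample_rate, n_fft=hp.n_fft,
                               n_mels=hp.num_mels)

Alternatively, pinning an older librosa release (one from the 0.9.x line) in the Colab runtime should sidestep the API change altogether.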

error: unrecognized arguments: -f

Hello everyone,

I am trying to use the code, but it shows an error. It occurs in the cell called "Running main script"; after running that cell, I get this result:

`/content/tortoise-tts/LIHQ
usage: colab_kernel_launcher.py [-h] config
colab_kernel_launcher.py: error: unrecognized arguments: -f
An exception has occurred, use %tb to see the full traceback.

SystemExit: 2`

Can you help me please? Thanks in advance.
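That "unrecognized arguments: -f" error typically appears when an argparse-based script runs inside the notebook kernel itself, because the kernel launcher passes its own -f <connection file> flag. A hedged sketch of one common workaround, assuming the script builds a standard ArgumentParser named parser:

# Ignore unknown arguments (such as the notebook kernel's "-f" flag)
# instead of failing on them.
args, _unknown = parser.parse_known_args()   # instead of parser.parse_args()

Running the script as a subprocess (with !python script.py ...) rather than inside the notebook kernel avoids the problem as well.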

Replicate.com

It would be awesome to see this on replicate.com. The only one on there at present is the makeItTalk 256x256 model, and it's not great.

Import Error Iterable from Collections

Hi,

Whenever trying to run the main script, either from the main LIHQ Colab or using the first example, I get the following import error.
Searching for an answer, it appears that Iterable has been removed from collections in Python 3.10.

/content/LIHQ
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-14-ee2446352053> in <cell line: 2>()
      1 get_ipython().run_line_magic('cd', '/content/LIHQ')
----> 2 from runLIHQ import run
      3 
      4 run(face='/content/LIHQ/input/face/examples/2372448.png')

2 frames
/content/LIHQ/QVI/utils/config.py in <module>
      2 import sys
      3 from argparse import ArgumentParser
----> 4 from collections import Iterable
      5 from importlib import import_module
      6 from easydict import EasyDict as edict

ImportError: cannot import name 'Iterable' from 'collections' (/usr/lib/python3.10/collections/__init__.py)

To fix this, apparently you can modify the code to read:

from collections.abc import Iterable

However, I am unsure how to do this in a pre-written Colab script?
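One way to apply that one-line change to a pre-written Colab is to patch the file in place from a cell before running the main script; a sketch using sed on the path shown in the traceback above:

# Rewrite the deprecated import in QVI's config.py from a Colab cell.
!sed -i 's/from collections import Iterable/from collections.abc import Iterable/' /content/LIHQ/QVI/utils/config.py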

Generation Time Limit

@johnGettings Hi, thanks for sharing this wonderful code base. I have a few queries, listed below:

  1. The whole pipeline generates videos of roughly under one minute. Can it be extended beyond one minute? If so, what constraints would we face?
  2. Instead of using text-to-audio, can we pass an audio file directly to the module, so that one step of the process is skipped?

Thanks in advance
