
facer's Introduction

FACER

Face-related toolkit. This repo is still under construction; more models will be added.

Updates

  • [14/05/2023] Face attribute recognition model trained on CelebA is available, check it out here.
  • [04/05/2023] Face alignment models trained on the IBUG300W, AFLW19, and WFLW datasets are available, check them out here.
  • [27/04/2023] Face parsing model trained on the CelebM dataset is available, check it out here.

Install

The easiest way to install it is using pip:

pip install git+https://github.com/FacePerceiver/facer.git@main

No extra setup is needed; pretrained weights will be downloaded automatically.

If you have trouble installing from source, you can try installing from PyPI:

pip install pyfacer

The PyPI version is not guaranteed to be the latest, but we will try to keep it up to date.
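
As a quick sanity check after installation, the following hedged sketch builds a detector, which also triggers the automatic weight download mentioned above:

# Minimal sanity-check sketch: importing facer and constructing a model
# triggers the automatic download of the pretrained weights.
import torch
import facer

device = 'cuda' if torch.cuda.is_available() else 'cpu'
detector = facer.face_detector('retinaface/mobilenet', device=device)
print('facer detector ready:', type(detector).__name__)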

Face Detection

We wrap a RetinaFace detector for easy usage.

import torch
import facer

device = 'cuda' if torch.cuda.is_available() else 'cpu'

image = facer.hwc2bchw(facer.read_hwc('data/twogirls.jpg')).to(device=device)  # image: 1 x 3 x h x w

face_detector = facer.face_detector('retinaface/mobilenet', device=device)
with torch.inference_mode():
    faces = face_detector(image)

facer.show_bchw(facer.draw_bchw(image, faces))
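
The detector returns a dict of tensors describing all detected faces. As a hedged sketch (the key names 'rects', 'points', and 'scores' are taken from the issue reports further down and may differ between facer versions), you can peek at it like this:

# Hedged sketch: inspect the detector output.
print(faces.keys())
print(faces['rects'])   # one bounding box per detected face
print(faces['scores'])  # detection confidence per face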

Check this notebook for a full example.

Please consider citing

@inproceedings{deng2020retinaface,
  title={Retinaface: Single-shot multi-level face localisation in the wild},
  author={Deng, Jiankang and Guo, Jia and Ververas, Evangelos and Kotsia, Irene and Zafeiriou, Stefanos},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5203--5212},
  year={2020}
}

Face Parsing

We wrap the FaRL models for face parsing.

import torch
import facer

device = 'cuda' if torch.cuda.is_available() else 'cpu'

image = facer.hwc2bchw(facer.read_hwc('data/twogirls.jpg')).to(device=device)  # image: 1 x 3 x h x w

face_detector = facer.face_detector('retinaface/mobilenet', device=device)
with torch.inference_mode():
    faces = face_detector(image)

face_parser = facer.face_parser('farl/lapa/448', device=device)  # optional: 'farl/celebm/448'

with torch.inference_mode():
    faces = face_parser(image, faces)

seg_logits = faces['seg']['logits']
seg_probs = seg_logits.softmax(dim=1)  # nfaces x nclasses x h x w
n_classes = seg_probs.size(1)
vis_seg_probs = seg_probs.argmax(dim=1).float() / n_classes * 255
vis_img = vis_seg_probs.sum(0, keepdim=True)
facer.show_bhw(vis_img)
facer.show_bchw(facer.draw_bchw(image, faces))
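
To turn the probabilities into a per-pixel label map and relate class indices to names, here is a hedged sketch. It assumes 'label_names' is present in faces['seg'] (as in the wrapper snippet quoted in the issues below) and that 'hair' is one of the labels of the chosen model:

# Hedged sketch: per-pixel class ids for the first face, plus a binary mask
# for one class. Adjust the label name to the label set of the chosen model.
label_names = faces['seg']['label_names']
print(dict(enumerate(label_names)))
seg_classes = seg_probs.argmax(dim=1)        # nfaces x h x w class ids
hair_idx = label_names.index('hair')
hair_mask = (seg_classes[0] == hair_idx)     # h x w boolean mask of the first face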

Check this notebook for a full example.

Please consider citing

@inproceedings{zheng2022farl,
  title={General facial representation learning in a visual-linguistic manner},
  author={Zheng, Yinglin and Yang, Hao and Zhang, Ting and Bao, Jianmin and Chen, Dongdong and Huang, Yangyu and Yuan, Lu and Chen, Dong and Zeng, Ming and Wen, Fang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18697--18709},
  year={2022}
}

Face Alignment

We wrap the FaRL models for face alignment.

import torch
import cv2
from matplotlib import pyplot as plt

device = 'cuda' if torch.cuda.is_available() else 'cpu'

import facer
img_file = 'data/twogirls.jpg'
# image: 1 x 3 x h x w
image = facer.hwc2bchw(facer.read_hwc(img_file)).to(device=device)  

face_detector = facer.face_detector('retinaface/mobilenet', device=device)
with torch.inference_mode():
    faces = face_detector(image)

face_aligner = facer.face_aligner('farl/ibug300w/448', device=device)  # optional: 'farl/wflw/448', 'farl/aflw19/448'

with torch.inference_mode():
    faces = face_aligner(image, faces)

img = cv2.imread(img_file)[..., ::-1]
vis_img = img.copy()
for pts in faces['alignment']:
    vis_img = facer.draw_landmarks(vis_img, None, pts.cpu().numpy())
plt.imshow(vis_img)
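
Each entry of faces['alignment'] holds the landmarks of one detected face; the number of points depends on the chosen config (e.g. 68 points for the IBUG300W model). A hedged sketch for inspecting them:

# Hedged sketch: inspect the predicted landmarks per face.
# Each pts tensor is (n_points x 2) in image coordinates.
for i, pts in enumerate(faces['alignment']):
    print(f'face {i}: {pts.shape[0]} landmarks, first point {pts[0].tolist()}')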

Check this notebook for a full example.

Please consider citing

@inproceedings{zheng2022farl,
  title={General facial representation learning in a visual-linguistic manner},
  author={Zheng, Yinglin and Yang, Hao and Zhang, Ting and Bao, Jianmin and Chen, Dongdong and Huang, Yangyu and Yuan, Lu and Chen, Dong and Zeng, Ming and Wen, Fang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18697--18709},
  year={2022}
}

Face Attribute Recognition

We wrap the FaRL models for face attribute recognition; the model achieves 92.06% accuracy on the CelebA dataset.

import sys
import torch
import facer

device = "cuda" if torch.cuda.is_available() else "cpu"

# image: 1 x 3 x h x w
image = facer.hwc2bchw(facer.read_hwc("data/girl.jpg")).to(device=device)

face_detector = facer.face_detector("retinaface/mobilenet", device=device)
with torch.inference_mode():
    faces = face_detector(image)

face_attr = facer.face_attr("farl/celeba/224", device=device)
with torch.inference_mode():
    faces = face_attr(image, faces)

labels = face_attr.labels
face1_attrs = faces["attrs"][0] # get the first face's attributes

print(labels)

for prob, label in zip(face1_attrs, labels):
    if prob > 0.5:
        print(label, prob.item())
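
To see the most confident attributes instead of thresholding at 0.5, a hedged sketch building on the variables above:

# Hedged sketch: rank all CelebA attributes of the first face by probability.
top = sorted(zip(labels, face1_attrs.tolist()), key=lambda x: x[1], reverse=True)
for label, p in top[:5]:
    print(f'{label}: {p:.3f}')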

Check this notebook for a full example.

Please consider citing

@inproceedings{zheng2022farl,
  title={General facial representation learning in a visual-linguistic manner},
  author={Zheng, Yinglin and Yang, Hao and Zhang, Ting and Bao, Jianmin and Chen, Dongdong and Huang, Yangyu and Yuan, Lu and Chen, Dong and Zeng, Ming and Wen, Fang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18697--18709},
  year={2022}
}

facer's People

Contributors

haya2333, ttayu, yang-h, yinglinzheng


facer's Issues

Questions about what labels mean

Can you tell me the full meanings of these labels.
'background', 'neck', 'face', 'cloth', 'rr', 'lr', 'rb', 'lb', 're', 'le', 'nose', 'imouth', 'llip', 'ulip', 'hair', 'eyeg', 'hat', 'earr', 'neck_l'

Abbreviations like rr give me a bit of trouble.

gpu memory and latency time

During model warm-up, especially in the initial steps, there is significant fluctuation in GPU memory, and the process takes a considerable amount of time, around twenty seconds. GPU memory usage and latency are not very stable. Are there any suggestions? Thanks.
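
Not an official answer, but a common mitigation is to warm the JIT-compiled models up with a few throw-away passes before measuring memory or latency; a hedged sketch:

# Hedged sketch: warm up the models with a few throw-away passes on a real
# image so JIT profiling/fusion and CUDA allocations happen before timing.
import torch
import facer

device = 'cuda' if torch.cuda.is_available() else 'cpu'
face_detector = facer.face_detector('retinaface/mobilenet', device=device)
face_parser = facer.face_parser('farl/lapa/448', device=device)

image = facer.hwc2bchw(facer.read_hwc('data/twogirls.jpg')).to(device=device)
with torch.inference_mode():
    for _ in range(3):              # a handful of warm-up iterations
        faces = face_detector(image)
        face_parser(image, faces)
if device == 'cuda':
    torch.cuda.synchronize()        # make sure warm-up work has finished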

A bug happened when I use face parser to calculate loss

I created a loss that uses the detector and parser to get the face segmentation. The loss function is used as a supervisory signal for a loop-based optimization generation process. The loss is this:

class SegLoss(nn.Module):
    def __init__(self, device):
        super(SegLoss, self).__init__()
        self.face_detector = facer.face_detector('retinaface/mobilenet', device=device)
        self.face_parser = facer.face_parser('farl/lapa/448', device=device)

    def forward(self, x: torch.Tensor, segments: torch.Tensor):
        # image = np.zeros(x.shape[1:]).astype(np.uint8)
        # image = np.ascontiguousarray(np.transpose(image, (1, 2, 0)))
        # save_path = os.path.join("/home/ssd2/ldx/workplace/GANInverter-dev/test_edit/e4e/edit1/kp183072", f"{i}.png")

        x = x.clone()
        x = (x + 1) / 2
        x = x.clamp(0., 1.)
        x = (x * 255).type(torch.uint8)

        faces = self.face_detector(x)
        if not faces:
            return 0
        faces = self.face_parser(x, faces)
        seg_probs = faces['seg']['logits'].softmax(dim=1)[0]

        loss = F.mse_loss(seg_probs, segments)
        return loss

But when the loop reaches the third step, the following error occurs:

  File "/home/ssd2/priv/miniconda3/envs/inversion/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/liudongxv/workplace/GANInverter-dev/criteria/seg_loss.py", line 34, in forward
    faces = self.face_parser(x, faces)
  File "/home/ssd2/priv/miniconda3/envs/inversion/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ssd2/priv/miniconda3/envs/inversion/lib/python3.9/site-packages/facer/face_parsing/farl.py", line 85, in forward
    w_seg_logits, _ = self.net(w_images)  # (b*n) x c x h x w
  File "/home/ssd2/priv/miniconda3/envs/inversion/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 3)

I tried to step into the code, but it happens inside the JIT, so I can't tell what caused the index to be so large. Do you have any information?

Artifacts

Hi, the segmentation seems better than the commonly used BiSeNet, especially since it is not limited to a crop, which is great. But I see artifacts (a wide vertical line) on some images like this:
[attached image: out]

The problem is that I ran inference on a video which I cannot share, and there those artifacts are quite frequent:

[attached image]

Is there any way to deal with them?

My code:

import sys
import torch
from PIL import Image
import numpy as np
#sys.path.append('..')

device = 'cuda' if torch.cuda.is_available() else 'cpu' 

import facer

image = facer.hwc2bchw(facer.read_hwc('girls.jpg')).to(device=device) # image: 1 x 3 x h x w
face_detector = facer.face_detector('retinaface/mobilenet', device=device)
faces = face_detector(image)


face_parser = facer.face_parser('farl/lapa/448', device=device)

with torch.inference_mode():
    faces = face_parser(image, faces)
    
seg_logits = faces['seg']['logits']
seg_probs = seg_logits.softmax(dim=1)  # nfaces x nclasses x h x w
print(seg_probs.shape)



from facer.util import bchw2hwc

out = facer.draw_bchw(image, faces)
print(out.shape)

image = bchw2hwc(out)

if image.dtype != torch.uint8:
    image = image.to(torch.uint8)
if image.size(2) == 1:
    image = image.repeat(1, 1, 3)
pimage = Image.fromarray(image.cpu().numpy())

pimage.save('out.png')

Downloads about the model

Hello, your program turned out very well. But there is no usable model for the CelebM dataset to run; can you provide one?

Hacky attempt at integrating this with FaRL LAION checkpoints

Hi there! First of all thank you for the work on this awesome project.

I'm looking for an easy way to run the FaRL LAION checkpoints on a directory of images. The code structure in the main FaRL repo is not user-friendly for inference use, so I tried to get it to work here. However, since the LAION-trained checkpoints are not JIT-compiled, I had to manually copy-paste the torch modules over and integrate them with the old inference code you had written in this repo.

My implementation is likely wrong because it does not output correct looking predictions. Maybe you can tell me what I'm doing wrong :) Or we can brainstorm a better solution to getting this repo working with the FaRL LAION checkpoints (maybe JIT compile them?). It could be very valuable for the research community!

https://gist.github.com/evancasey/eea749d7186e92670fca728ddb384212

Sample output:
[attached images: output_face, output]

lazy wrapper should be called at most once

When running face_parser with multi-threading, I get this error:

  File "/root/.local/lib/python3.10/site-packages/facer/face_parsing/farl.py", line 79, in forward
[2024-05-14 21:41:31,730][mif-ml09.marz.vfx][__main__:_segment:130][INFO] - process track 5
    grid = setting['get_grid_fn'](matrix=matrix, orig_shape=(h, w))
  File "/root/.local/lib/python3.10/site-packages/facer/transform.py", line 353, in make_tanh_warp_grid
    return _forge_grid(
  File "/root/.local/lib/python3.10/site-packages/facer/transform.py", line 225, in _forge_grid
    out_xxyy: torch.Tensor = fn(in_xxyy)  # (h x w) x 2
  File "/root/.local/lib/python3.10/site-packages/facer/transform.py", line 279, in inverted_tanh_warp_transform
    inv_matrix = torch.linalg.inv(matrix)  # b x 3 x 3
RuntimeError: lazy wrapper should be called at most once
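
Not confirmed by the maintainers, but the "lazy wrapper should be called at most once" error is a known PyTorch quirk when a linalg op is first invoked from multiple threads at once; calling the op once on the main thread before spawning workers often avoids it. A hedged sketch:

# Hedged workaround sketch: trigger the lazy initialization of
# torch.linalg.inv on the main thread before any worker threads call
# the parser (which uses it internally, per the traceback above).
import torch

torch.linalg.inv(torch.eye(3))
if torch.cuda.is_available():
    torch.linalg.inv(torch.eye(3, device='cuda'))
# ...now create the facer models and run them from the worker threads.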

Warning triggered whenever I'm using the forward pass the second time

Hi,

I'm getting this warning when I call the forward function of ---- the second time:

"""
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1194: UserWarning: operator() profile_node %385 : int[] = prim::profile_ivalue(%383)
does not have profile information (Triggered internally at ../torch/csrc/jit/codegen/cuda/graph_fuser.cpp:105.)
return forward_call(*input, **kwargs)
"""

Here is the code I'm using:

    # INIT
    self.face_detector = facer.face_detector('retinaface/mobilenet', device=device)
    conf_name = 'celebm/448'
    model_path = 'https://github.com/FacePerceiver/facer/releases/download/models-v1/face_parsing.farl.celebm.main_ema_181500_jit.pt'
    self.face_parser_for_source = FaRLFaceParser(conf_name, model_path=model_path, device=device).to(device)

    # HERE IS THE CODE THAT TRIGGERS THE WARNING THE SECOND TIME I CALL THE FUNCTION
    with torch.inference_mode():
        faces = self.face_detector(image)
        faces = self.face_parser_for_source(image, faces)

BTW, if I initialize the model again before calling it the second time, there are no warnings, though it doubles the running time )=

Thanks for your help

Error when loading jit model

When running face_parser = facer.face_parser('farl/lapa/448', device='cuda'), I get this error:

  File "facer/facer/util.py", line 79, in download_jit
    return torch.jit.load(cached_file, map_location=map_location)
  File "anaconda3/lib/python3.6/site-packages/torch/jit/_serialization.py", line 161, in load
    cpp_module = torch._C.import_ir_module(cu, f, map_location, _extra_files)
RuntimeError: 
Unknown type name 'NoneType':
Serialized   File "code/__torch__/farl/network/transformers.py", line 7
  image_std : Tensor
  training : bool
  _is_full_backward_hook : NoneType
                           ~~~~~~~~ <--- HERE
  num_extra_tokens : int
  input_resolution : int

Envs:
torch=1.7.1
torchvision=0.8.2
cuda=11.0
RTX 3090

Excess output of the FaRL model wrapper

Here is the part of the code from the inference of the FaRL model wrapper:

seg_logits = F.grid_sample(
    w_seg_logits, inv_grid, mode='bilinear', align_corners=False)
data['seg'] = {'logits': seg_logits,
               'label_names': setting['label_names']}
return data

Why do you return information such as label_names on every inference call, when it could be obtained, for example, from a model.label_names field outside of inference? The same goes for the data itself: it can be obtained after inference from the same variable you passed in. Why not return only the processed output of the model, seg_logits?

Could you please explain the motivation behind it?

Stable diffusion webui extension

Hi, thank you for your nice work.
I'm making a Stable Diffusion WebUI extension based on the facer project. (link)

When using Stable Diffusion and ControlNet, masking is really important.
A lot of people usually use SAM (Segment Anything Model), but SAM often detects the wrong regions.
So I made a face portrait mask extension using facer.

BTW, when I load the face_aligner model, an error occurred. (please see this code)
It is caused by the io.BytesIO function; unfortunately, I don't know why...
So I re-implemented the face_aligner function without io.BytesIO.
I hope this problem will be fixed.

Best regards.

Some questions about faceparsing

The current face parsing process is face detection + parsing. If I have a picture with the face already cropped, can I use face parsing directly without facer's face detection?

KeyError

Hello, and thank you for your excellent work. I'm facing a KeyError when I run the face attributes code over a whole dataset rather than a single image. More specifically, it runs flawlessly for some hundreds of iterations, but then at this point in the code:

with torch.inference_mode():
    faces = face_attr(image, faces)

it raises a KeyError: 'image_ids'.

I observed that at this point the data keys are equal to ([]), whereas before they were equal to (['rects', 'points', 'scores', 'image_ids']).

Any idea what might be wrong?
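
Not an official answer, but a likely cause is that the detector found no faces in some dataset images, leaving the dict empty (the same situation the SegLoss issue above guards with "if not faces"). A hedged sketch of skipping such images:

with torch.inference_mode():
    faces = face_detector(image)
    if not faces:    # no detections: dict is empty, so face_attr would raise KeyError
        continue     # assumes this runs inside the dataset loop
    faces = face_attr(image, faces)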

Will the Ear output be available

Hi!
Thank you for your work.
Will ear segmentation be available at some point?
As I understand, ear segmentation is needed for the Celeb-A-HQ-Mask dataset, but I cannot find it in the current "Facer" implementation.

Best regards,
Vadims.

Is the repo going to be updated?

Hey all.

This is a very performant repository.

I wanted masks in the style of CelebA (with neck, clothing, hats etc). The FaRL repo already has those models.

Are they going to be updated anytime soon, or can we consider this repo to be abandoned?

Best,
Wamiq

Instructions on converting to ONNX

How does one convert a specific module like face parsing to ONNX?
Not everyone has a CUDA-enabled Nvidia GPU, so ONNX may help. The network code includes some operations like download_jit and other utilities that don't make exporting to ONNX easy.
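
There are no official instructions here, but since the parsing wrapper stores its TorchScript network in self.net (see the farl.py traceback quoted above), one rough, untested sketch is to export that inner module directly; the input shape, file name, and tensor names below are assumptions:

# Hedged, untested sketch: export the inner TorchScript parsing network.
# face_parser.net is assumed from facer/face_parsing/farl.py; the warping
# done by the wrapper (tanh_warp, grid_sample) is NOT part of this export
# and would have to be reimplemented around the ONNX model.
import torch
import facer

face_parser = facer.face_parser('farl/lapa/448', device='cpu')
dummy = torch.randn(1, 3, 448, 448)

torch.onnx.export(
    face_parser.net,
    dummy,
    'face_parsing_lapa448.onnx',
    input_names=['image'],
    output_names=['seg_logits'],
    opset_version=13,
)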

The predicted semantic mask has a slight offset.

When I tested the newly published pretrained model, I found that the semantic segmentation results and the input images do not match perfectly.
I looked through the 'tanh_warp'-related processing and found that the coordinates used for 'grid_sample' might have a slight problem. In the current procedure, the 'align_corners' option is disabled by default, so pixel centers sit at half-integer positions and the sampling coordinates should be (0, n) instead of (0, n-1). So I changed the code a little and found that the result was significantly improved.

facer/facer/transform.py, lines 218 & 219:
yy = yy.unsqueeze(0).broadcast_to(batch_size, h, w).to(device)
xx = xx.unsqueeze(0).broadcast_to(batch_size, h, w).to(device)

change to

yy = yy.unsqueeze(0).broadcast_to(batch_size, h, w).to(device)+0.5
xx = xx.unsqueeze(0).broadcast_to(batch_size, h, w).to(device)+0.5

Are the above changes reasonable?
