
FaRL for Facial Representation Learning


This repo hosts the official implementation of our CVPR 2022 paper "General Facial Representation Learning in a Visual-Linguistic Manner".

Updates

  • [04/05/2023] State-of-the-art face alignment and face parsing models are ready for easy use in facer.
  • [21/06/2022] The LAION-Face dataset was released.
  • [10/03/2022] FaRL was accepted by CVPR 2022 as an Oral presentation.
  • [02/03/2022] facer was released. It is a face-related toolkit built upon FaRL.

Introduction

FaRL offers powerful pre-trained transformer backbones for face analysis tasks. Its pre-training combines image-text contrastive learning with masked image modeling.

(Figure: the FaRL pre-training framework.)
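
For intuition, here is a conceptual sketch of such a combined objective. This is our own illustration under stated assumptions, not the released pre-training code: an InfoNCE image-text contrastive term plus a masked-image-modeling (MIM) cross-entropy over the discrete visual tokens of the masked patches.

import torch
import torch.nn.functional as F

def combined_pretrain_loss(img_emb, txt_emb, mim_logits, visual_tokens, mask,
                           temperature=0.07):
    # Illustrative only. img_emb/txt_emb: [B, D] encoder outputs;
    # mim_logits: [B, P, V] per-patch token logits; visual_tokens: [B, P]
    # discrete targets; mask: [B, P] boolean mask of the masked-out patches.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature  # [B, B] pairwise scores
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric InfoNCE: match each image to its caption and vice versa.
    contrastive = (F.cross_entropy(logits, targets) +
                   F.cross_entropy(logits.T, targets)) / 2
    # MIM: predict the discrete visual token of every masked patch.
    mim = F.cross_entropy(mim_logits[mask], visual_tokens[mask])
    return contrastive + mim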

After the pre-training, the image encoder can be utilized for various downstream face tasks.

Pre-trained Backbones

We offer the following pre-trained transformer backbones.

| Model Name | Data | Epoch | Link |
| --- | --- | --- | --- |
| FaRL-Base-Patch16-LAIONFace20M-ep16 (used in paper) | LAION Face 20M | 16 | GitHub; Baidu (key: wu7p) |
| FaRL-Base-Patch16-LAIONFace20M-ep64 | LAION Face 20M | 64 | GitHub; Baidu (key: mgau) |

Use FaRL as FaceCLIP

We provide both the pretrained text encoder and the image encoder. Since FaRL shares the same network structure as CLIP, you can load the FaRL weights into the CLIP ViT-B/16 architecture and use it exactly like CLIP. Here is a code sample modified from CLIP.

import torch
import clip
from PIL import Image

device ="cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device="cpu")
model = model.to(device)
farl_state=torch.load("FaRL-Base-Patch16-LAIONFace20M-ep16.pth") # you can download from https://github.com/FacePerceiver/FaRL#pre-trained-backbones
model.load_state_dict(farl_state["state_dict"],strict=False)

image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  

Setup Downstream Training

We run all downstream trainings on 8 NVIDIA GPUs (32 GB). Our code supports other GPU configurations, but we do not guarantee the resulting performance on them. First install the prerequisite packages, then install the remaining dependencies with pip install -r ./requirement.txt.

Please refer to ./DS_DATA.md to prepare the training and testing data for downstream tasks.

Download the pre-trained backbones into ./blob/checkpoint/. You can then launch the downstream trainings and evaluations with the following command template.

python -m blueprint.run \
  farl/experiments/{task}/{train_config_file}.yaml \
  --exp_name farl --blob_root ./blob

The repo includes config files under ./farl/experiments/ that perform finetuning for face parsing and face alignment. For example, to launch a face parsing training on LaPa by finetuning our FaRL-Base-Patch16-LAIONFace20M-ep16 pre-training, simply run:

python -m blueprint.run \
  farl/experiments/face_parsing/train_lapa_farl-b-ep16_448_refinebb.yaml \
  --exp_name farl --blob_root ./blob

Or, to launch a face alignment training on 300W by finetuning the same FaRL-Base-Patch16-LAIONFace20M-ep16 pre-training, run:

python -m blueprint.run \
  farl/experiments/face_alignment/train_ibug300w_farl-b-ep16_448_refinebb.yaml \
  --exp_name farl --blob_root ./blob

It is also easy to create new config files for training and evaluation on your own. For example, you can customize your own face parsing task on CelebAMask-HQ by editing the values below (remember to remove the comments before running), then launching it as shown after the config.

package: farl.experiments.face_parsing

class: blueprint.ml.DistributedGPURun
local_run:
  $PARSE('./trainers/celebm_farl.yaml', 
    cfg_file=FILE,
    train_data_ratio=None, # The data ratio used for training. None means using 100% training data; 0.1 means using only 10% training data.
    batch_size=5, # The local batch size on each GPU.
    model_type='base', # The size of the pre-trained backbone. Supports 'base', 'large' or 'huge'.
    model_path=BLOB('checkpoint/FaRL-Base-Patch16-LAIONFace20M-ep16.pth'), # The path to the pre-trained backbone.
    input_resolution=448, # The input image resolution, e.g. 224 or 448.
    head_channel=768, # The channels of the head.
    optimizer_name='refine_backbone', # The optimization method. Should be 'refine_backbone' or 'freeze_backbone'.
    enable_amp=False) # Whether to enable float16 in downstream training.
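
If you save the customized file as, for example, farl/experiments/face_parsing/train_celebm_custom.yaml (a hypothetical name), it can be launched with the same command template:

python -m blueprint.run \
  farl/experiments/face_parsing/train_celebm_custom.yaml \
  --exp_name farl --blob_root ./blob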

Performance

The following table shows the performance of our FaRL-Base-Patch16-LAIONFace20M-ep16 pre-training, which is pre-trained for 16 epochs, both as reported in the paper (Paper) and as reproduced with this repo (Rep). The small differences between them are due to code refactoring.

| Name | Task | Benchmark | Metric | Score (Paper/Rep) | Logs (Paper/Rep) |
| --- | --- | --- | --- | --- | --- |
| face_parsing/train_celebm_farl-b-ep16-448_refinebb.yaml | Face Parsing | CelebAMask-HQ | F1-mean ⇑ | 89.56/89.65 | Paper, Rep |
| face_parsing/train_lapa_farl-b-ep16_448_refinebb.yaml | Face Parsing | LaPa | F1-mean ⇑ | 93.88/93.86 | Paper, Rep |
| face_alignment/train_aflw19_farl-b-ep16_448_refinebb.yaml | Face Alignment | AFLW-19 (Full) | NME_diag ⇓ | 0.943/0.943 | Paper, Rep |
| face_alignment/train_ibug300w_farl-b-ep16_448_refinebb.yaml | Face Alignment | 300W (Full) | NME_inter-ocular ⇓ | 2.93/2.92 | Paper, Rep |
| face_alignment/train_wflw_farl-b-ep16_448_refinebb.yaml | Face Alignment | WFLW (Full) | NME_inter-ocular ⇓ | 3.96/3.98 | Paper, Rep |

Below we also report results of our newer FaRL-Base-Patch16-LAIONFace20M-ep64, pre-trained for 64 epochs instead of 16, showing further improvements on most tasks.

| Name | Task | Benchmark | Metric | Score | Logs |
| --- | --- | --- | --- | --- | --- |
| face_parsing/train_celebm_farl-b-ep64-448_refinebb.yaml | Face Parsing | CelebAMask-HQ | F1-mean ⇑ | 89.57 | Rep |
| face_parsing/train_lapa_farl-b-ep64_448_refinebb.yaml | Face Parsing | LaPa | F1-mean ⇑ | 94.04 | Rep |
| face_alignment/train_aflw19_farl-b-ep64_448_refinebb.yaml | Face Alignment | AFLW-19 (Full) | NME_diag ⇓ | 0.938 | Rep |
| face_alignment/train_ibug300w_farl-b-ep64_448_refinebb.yaml | Face Alignment | 300W (Full) | NME_inter-ocular ⇓ | 2.88 | Rep |
| face_alignment/train_wflw_farl-b-ep64_448_refinebb.yaml | Face Alignment | WFLW (Full) | NME_inter-ocular ⇓ | 3.88 | Rep |
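
For readers unfamiliar with the alignment metric above, NME_inter-ocular is the mean per-landmark Euclidean error divided by the inter-ocular distance, averaged over the test set. A minimal sketch for one image follows; it assumes the 68-point 300W scheme where landmarks 36 and 45 are the outer eye corners, and it is not this repo's evaluation code:

import numpy as np

def nme_inter_ocular(pred, gt, left_eye=36, right_eye=45):
    # pred, gt: [N, 2] arrays of predicted and ground-truth landmarks.
    inter_ocular = np.linalg.norm(gt[left_eye] - gt[right_eye])
    per_landmark_error = np.linalg.norm(pred - gt, axis=1)
    return per_landmark_error.mean() / inter_ocular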

Pre-trained Downstream Models

We will continuously update the pre-trained downstream face models in our facer package.
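
As a pointer, the facer README (at the time of writing) shows usage along these lines for face detection plus FaRL-based parsing. Treat this as a sketch: the facer API and the model identifiers ("retinaface/mobilenet", "farl/lapa/448") may change, and the image path is a placeholder.

import torch
import facer

device = "cuda" if torch.cuda.is_available() else "cpu"
# Read an image (placeholder path) into a 1 x 3 x H x W batch tensor.
image = facer.hwc2bchw(facer.read_hwc("data/face.jpg")).to(device=device)

face_detector = facer.face_detector("retinaface/mobilenet", device=device)
with torch.inference_mode():
    faces = face_detector(image)

# FaRL-based face parser finetuned on LaPa at 448x448 resolution.
face_parser = facer.face_parser("farl/lapa/448", device=device)
with torch.inference_mode():
    faces = face_parser(image, faces)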

LAION-Face Dataset

We use the LAION-Face dataset to train the FaRL model. LAION-Face is the human-face subset of LAION-400M and consists of 50 million image-text pairs; we use a 20M subset for fast verification.

Contact

For help or issues concerning the code and the released models, feel free to submit a GitHub issue, or contact Hao Yang ([email protected]).

Citation

If you find our work helpful, please consider citing

@article{zheng2021farl,
  title={General Facial Representation Learning in a Visual-Linguistic Manner},
  author={Zheng, Yinglin and Yang, Hao and Zhang, Ting and Bao, Jianmin and Chen, Dongdong and Huang, Yangyu and Yuan, Lu and Chen, Dong and Zeng, Ming and Wen, Fang},
  journal={arXiv preprint arXiv:2112.03109},
  year={2021}
}

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.


farl's Issues

download pre-trained backbones leads to 404 Not Found

First of all, I would like to thank you for your fantastic project. It has been incredibly helpful and well-designed.

I've come across an issue while trying to access the pretrained backbone. The provided link results in a "404 Not Found" error. Could you please help me resolve this problem? It would be great if you could share an updated link or guide me on how to access the correct resource.

https://github.com/microsoft/FaRL#pre-trained-backbones

LAION-FACE Dataset

Hi,
Appreciate your work,

  1. Are you planning to release the LAION-Face subset (metadata)?
  2. You demonstrated that pretraining on LAION-Face improved three downstream tasks; have you also benchmarked the face recognition task?

face alignment task

Hello there, thanks for your contribution!
Would it be possible for you to add the landmark detection / face alignment task to this repo?
Thank you.

Question about input size

Hi, thank you very much for this great work. I see that the input to the CLIP model is 224x224, while the parsing and alignment models take 448x448 inputs. Could you please clarify this? Thank you.

Missing file

list_eval_partition.txt is missing when training with CelebAMask-HQ.

Text encoder

Thank you for your awesome work. Do you plan to release the pretrained text encoder?

Weird time behaviour for face parsing

I believe the JIT (just-in-time) load is causing some unusual behavior: the first batch takes around 2 seconds, the second batch around 20 seconds, but the third and subsequent batches take only 0.1 seconds.

Is there any information available about this issue?

Error downloading object

When running git clone on the repo, I encounter this problem:

Cloning into 'FaRL'...
remote: Enumerating objects: 587, done.
remote: Counting objects: 100% (39/39), done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 587 (delta 24), reused 4 (delta 2), pack-reused 548
Receiving objects: 100% (587/587), 556.46 KiB | 761.00 KiB/s, done.
Resolving deltas: 100% (302/302), done.
Downloading farl/network/ext/p2i_ops/sample.ipynb (2.3 KB)
Error downloading object: farl/network/ext/p2i_ops/sample.ipynb (f7d4c2c): Smudge error: Error downloading farl/network/ext/p2i_ops/sample.ipynb (f7d4c2c0c21613b6c6d6bad83a10723b21a3606c4156d39551d4adba13ef47e1): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Errors logged to /Users/dongdengke/Desktop/FaRL/.git/lfs/logs/20230930T170447.302127.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: farl/network/ext/p2i_ops/sample.ipynb: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

How can I solve it?

Some questions about the application

Hello, I'm a newcomer to facial representation learning. I want to see the results of your trained models, for example face parsing: I want to input a picture and get an output with each region labeled.

Some questions on training downstream face parsing tasks

Has anyone used the authors' pre-trained model to train a downstream face parsing task? I get "RuntimeError: CUDA error: device-side assert triggered" when using multi-GPU training; has anyone encountered this? Also, I want to convert the .pth model to an ONNX model; has anyone done that?

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

I tried to train face parsing with the following command, but got an error:

python -m blueprint.run \
  farl/experiments/face_parsing/train_lapa_farl-b-ep16_448_refinebb.yaml \
  --exp_name farl --blob_root ./blob

====== RUNNING farl/experiments/face_parsing/train_lapa_farl-b-ep16_448_refinebb.yaml ======
blueprint: Parsing farl/experiments/face_parsing/train_lapa_farl-b-ep16_448_refinebb.yaml
DistributedGPURun: init_process_group: 0/1
blueprint: Parsing farl/experiments/face_parsing/./trainers/lapa_farl.yaml
blueprint: Parsing farl/experiments/face_parsing/./trainers/../augmenters/lapa/train.yaml
blueprint: Parsing farl/experiments/face_parsing/./trainers/../augmenters/lapa/test.yaml
blueprint: Parsing farl/experiments/face_parsing/./trainers/../augmenters/lapa/test_post.yaml
blueprint: Parsing farl/experiments/face_parsing/./trainers/../networks/farl.yaml
blueprint: Parsing farl/experiments/face_parsing/./trainers/../scorers/lapa.yaml
blueprint: Parsing farl/experiments/face_parsing/./trainers/../optimizers/refine_backbone.yaml
Mon Apr 25 11:21:11 2022 - farl_0 - outputs_dir: ./blob/outputs/farl/face_parsing.train_lapa_farl-b-ep16_448_refinebb
Mon Apr 25 11:21:11 2022 - farl_0 - states_dir: ./blob/states/farl/face_parsing.train_lapa_farl-b-ep16_448_refinebb
Mon Apr 25 11:21:11 2022 - farl_0 - locating the latest loadable state ...
Mon Apr 25 11:21:11 2022 - farl_0 - no valid state files found in ./blob/states/farl/face_parsing.train_lapa_farl-b-ep16_448_refinebb
Mon Apr 25 11:21:11 2022 - farl_0 - There will be 6056 training steps in this epoch.
loss=2.4654557704925537
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/blueprint/run.py", line 69, in <module>
    _main()
  File "/usr/local/lib/python3.7/dist-packages/blueprint/run.py", line 65, in _main
    runnable()
  File "/usr/local/lib/python3.7/dist-packages/blueprint/ml/distributed.py", line 123, in __call__
    _single_thread_run, args=(num_gpus, self), nprocs=num_gpus, join=True)
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/usr/local/lib/python3.7/dist-packages/blueprint/ml/distributed.py", line 68, in _single_thread_run
    local_run()
  File "/usr/local/lib/python3.7/dist-packages/blueprint/ml/trainer.py", line 194, in __call__
    self._backward(loss)
  File "/usr/local/lib/python3.7/dist-packages/blueprint/ml/trainer.py", line 120, in _backward
    loss.backward()
  File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 768, 28, 28]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I'm running your project on Colab and changed the batch size to 3.

How to use a trained model in facer?

Hi, thanks for your work.
We trained your model on our own dataset, which contains the LaPa dataset plus new images, including faces wearing masks.
After training, we got checkpoints like these in the blob/states/farl/face_parsing.train_lapa_farl-b-ep64_448_refinebb directory:
2.5G 10902_2.pth
2.5G 21804_4.pth
2.5G 32706_6.pth
2.5G 43608_8.pth
2.5G 54510_10.pth
2.5G 65412_12.pth
2.5G 76314_14.pth
2.5G 87216_16.pth
4.0K _records.pth
Each of these contains model weights, optimizer state, etc.

The problem is: how can we use these weights in the facer project?
I checked facer's loading process, and it uses a TorchScript model.

Error

raise RuntimeError("Distributed package doesn't have NCCL "
RuntimeError: Distributed package doesn't have NCCL built in

Did you use the pretrained codebook provided by OpenAI in the paper "FaRL"?

Dear Authors,

I am Deyu Zhou, a Ph.D. student from HKUST (GZ). I am very interested in your work "FaRL". Well done!
I am curious about the codebook you used for masked image modeling.
I notice that DALL-E's codebook produces 1,024 tokens given an image as input,
but your work uses sequences of length 196 (tokens) + 1 ([CLS] token).
So did you train the codebook from scratch yourselves, or did you use another pretrained codebook?
Or could you release the codebook on your GitHub?

Thanks,
Deyu

Demo

Hi,
thanks for your amazing results.
Could you please provide some demo code that produces the output for any input image?
