
TFace's Introduction

Introduction

TFace: a trusty face analysis research platform developed by Tencent Youtu Lab. It provides a high-performance distributed training framework and releases implementations of our efficient methods. Some of the algorithms are self-developed, and we believe the released code will help researchers follow up on them.

This project consists of several modules: Face Recognition, Face Security, Face Quality and Facial Attribute.

Face Recognition

This module implements various state-of-the-art algorithms for face recognition.

Paper List:

2022.9: Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain accepted by ECCV2022. [paper]

2022.9: DuetFace: Collaborative Privacy-Preserving Face Recognition via Channel Splitting in the Frequency Domain accepted by ACMMM2022. [paper]

2022.6: Evaluation-oriented knowledge distillation for deep face recognition accepted by CVPR2022. [paper]

2021.3: Consistent Instance False Positive Improves Fairness in Face Recognition accepted by CVPR2021. [paper]

2021.3: Spherical Confidence Learning for Face Recognition accepted by CVPR2021. [paper]

2020.8: Improving Face Recognition from Hard Samples via Distribution Distillation Loss accepted by ECCV2020. [paper]

2020.3: CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition accepted by CVPR2020. [paper]

Face Security

This module implements various state-of-the-art algorithms for face security.

Paper List:

2023.09: Sibling-Attack: Rethinking Transferable Adversarial Attacks against Face Recognition accepted by CVPR2023.

2021.12: Dual Contrastive Learning for General Face Forgery Detection accepted by AAAI2022.

2021.12: Exploiting Fine-grained Face Forgery Clues via Progressive Enhancement Learning accepted by AAAI2022.

2021.12: Delving into the Local: Dynamic Inconsistency Learning for DeepFake Video Detection accepted by AAAI2022.

2021.12: Feature Generation and Hypothesis Verification for Reliable Face Anti-Spoofing accepted by AAAI2022.

2021.07: Spatiotemporal Inconsistency Learning for DeepFake Video Detection accepted by ACM MM2021. [paper] [Analysis]

2021.07: Adaptive Normalized Representation Learning for Generalizable Face Anti-Spoofing accepted by ACM MM2021. [paper]

2021.07: Structure Destruction and Content Combination for Face Anti-Spoofing accepted by IJCB2021. [paper]

2021.04: Adv-Makeup: A New Imperceptible and Transferable Attack on Face Recognition accepted by IJCAI2021. [paper]

2021.04: Dual Reweighting Domain Generalization for Face Presentation Attack Detection accepted by IJCAI2021. [paper]

2021.03: Delving into Data: Effectively Substitute Training for Black-box Attack accepted by CVPR2021. [paper]

2020.12: Generalizable Representation Learning for Mixture Domain Face Anti-Spoofing accepted by AAAI2021. [paper]

2020.12: Local Relation Learning for Face Forgery Detection accepted by AAAI2021. [paper]

2020.06: Face Anti-Spoofing via Disentangled Representation Learning accepted by ECCV2020. [paper]

Face Quality

This module implements the SDD-FIQA algorithm for face image quality assessment.

Paper List:

2021.3: SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance accepted by CVPR2021. [paper]

Facial Attribute

This module implements the M3DFEL algorithm for facial attribute analysis.

Paper List:

2023.6: Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition accepted by CVPR2023. [paper]

TFace's People

Contributors

chenshen03, dependabot[bot], eltociear, hitspring2015, huangyg123, sleepingzky, wjxzju, xkx0430


TFace's Issues

Why does AllGatherFunc's backward multiply the gradient by len(grad_list), i.e. world_size?

In AllGatherFunc's backward, why is the gradient finally multiplied by len(grad_list), i.e. world_size? I could not figure this out and hope someone can explain.
I previously did not understand how gradients propagate through all_gather, so I sorted it out over the past few days. There are two questions:
(1) Why sum over grad_out?
(2) Why multiply by world_size?
For (1), I consulted some material and derived the formulas, and I roughly understand it now; I write it out below, so please check whether my understanding is correct. (2) I still do not understand.

The large-scale face classification setup is described below (the original post included a schematic figure and numbered equations; they are summarized here).
B: batch size, d: feature dimension, K: number of GPUs, C: number of classes, $c_j$: number of classes on the j-th GPU.
(1) $F_j \in \mathbb{R}^{B \times d}$: the features on the j-th GPU.
(2)(3) $F_{total} \in \mathbb{R}^{KB \times d}$: the features of all K GPUs gathered together.
(4) The classification layer is simplified to a plain linear transform, so $logit_j = F_{total} W_j$ (in the formulas below, $y_j$ denotes $logit_j$).
(5) [the chain-rule gradient expression from the figure; not recoverable here]

As the derivation shows, the gradient each GPU produces for the full feature matrix has the same shape (necessarily so), and each GPU obtains its gradient through the chain rule above. That gradient is a product of two parts: the derivative with respect to the logits, and the derivative through the local classification weights W on that card; both parts differ from card to card. In other words, every GPU produces a gradient for the entire feature matrix $F_{total}$. The total loss is the sum of the per-GPU losses followed by a reduction, so the derivative with respect to the logits is also divided by the total sample count (KB), and the gradients with respect to $F_{total}$ then have to be summed at the all_gather layer.

That is my answer to point (1); please check whether the explanation is correct.

(2) But I still don't understand why the code multiplies by the number of GPUs, i.e. grad_out *= len(grad_list).
https://amsword.medium.com/gradient-backpropagation-with-torch-distributed-all-gather-9f3941a381f8 also mentions multiplying by world_size, but I still don't get why.
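
For reference, below is a minimal sketch (not TFace's exact code) of an all_gather op with a hand-written backward. The comments mark where the summation from point (1) happens and give one common explanation of the world_size scaling in point (2): if each rank's loss is already divided by the global batch (KB) and DDP later averages the backbone gradients across the K ranks again, the gradient ends up K times too small, and grad_out *= len(grad_list) cancels the double division.

import torch
import torch.distributed as dist

class AllGather(torch.autograd.Function):
    """Minimal sketch of all_gather with a custom backward; illustrative only."""

    @staticmethod
    def forward(ctx, tensor):
        world_size = dist.get_world_size()
        gathered = [torch.zeros_like(tensor) for _ in range(world_size)]
        dist.all_gather(gathered, tensor)
        return tuple(gathered)

    @staticmethod
    def backward(ctx, *grads):
        grad_list = list(grads)
        rank = dist.get_rank()
        # Point (1): every rank holds a partial gradient for the *full*
        # gathered features, so rank i's input gradient is the sum over
        # all ranks of the gradient for slice i.
        ops = [dist.reduce(g, dst=i, op=dist.ReduceOp.SUM, async_op=True)
               for i, g in enumerate(grad_list)]
        for op in ops:
            op.wait()
        # Point (2): pre-scaling by world_size cancels DDP's later
        # cross-rank averaging of the backbone gradients (one common
        # explanation; treat it as an assumption).
        return grad_list[rank] * len(grad_list)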

[SDD-FIQA] Inconsistent recognition model with quality label

Hello, thanks for sharing your excellent work!

Since the pseudo labels I generated are a bit far from intuition, I'd like to compare my generated pseudo labels with yours. I found that you have shared a recognition model in the repo, and a link to the trained quality model. But judging from the file name, the labels for the quality model were not generated from the shared recognition model? Could you share the recognition model used to generate the pseudo labels for the shared quality model (R50)? Thanks!
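
For anyone who wants to sanity-check their own pseudo labels, here is a minimal sketch of the labeling idea as I read the paper: the quality score of an image is the similarity-distribution distance (Wasserstein distance) between its similarities to same-identity and different-identity samples. The function and variable names are assumptions, not the repo's API.

import numpy as np
from scipy.stats import wasserstein_distance

def sdd_fiqa_pseudo_label(feat, pos_feats, neg_feats):
    """Quality pseudo-label of one image: the distance between its
    same-identity (pos) and different-identity (neg) cosine-similarity
    distributions. All features are assumed L2-normalized."""
    pos_sim = pos_feats @ feat   # intra-class similarity samples
    neg_sim = neg_feats @ feat   # inter-class similarity samples
    return wasserstein_distance(pos_sim, neg_sim)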

Model Download

Could you put the SDD-FIQA pretrained model on Baidu Disk? I can't download it from Google Drive. Thank you.

_make_uncer_net_conv not found

I ran the code for Spherical Confidence Learning for Face Recognition in tasks/scf,
but the function _make_uncer_net_conv has not been implemented yet. Could you add that function?

How to make dataset for DDL method

In the DDL YAML config:

DATASETS:  # the dataset index name
    - name: TFR-vggface2_easy_test
      batch_size: 32
      weight: 1.
    
    - name: TFR-vggface2_hard_test
      batch_size: 32
      weight: 1.

    - name: TFR-vggface2_pair_easy_test
      batch_size: 32
      weight: 1.
      IS_PAIR: True

    - name: TFR-vggface2_pair_hard_test
      batch_size: 32
      weight: 1.
      IS_PAIR: True

If we use MS1Mv2 as the easy set and MS1Mv2 + face masks as the hard set, how can we generate TFR-ms1m_pair_easy_test and TFR-ms1m_pair_hard_test?

Threshold parameter ru+ in CIFP work

Thanks for the CIFP work at CVPR2021; it's very impressive!
I'd like to check a hyper-parameter setting. As mentioned in the paper, ru+ is set to 1e-4 according to the experiments. However, I couldn't find its definition in the code of cifp.py. Could you please give some guidance on this?

About the AOC results in the paper

In the paper, the AOC result is 1 - sum(FNMR), but the paper does not say what margin of the ratio of unconsidered images was used. Comparing Table 1 and Figure 4, the results seem to differ, so how can I obtain the AOC results in Table 1?

Question about the order loss in DDL

Thanks for sharing. I have a question about the order loss in DDL: why is the optimization target to minimize the distance between the expectation of the negative-pair similarity distribution and the expectation of the positive-pair similarity distribution? Shouldn't the expectations be pushed further apart (larger distance between the distributions), so that their overlap becomes smaller?

Questions about CIFP implementation

  • 1) The cos_theta_neg_th computed at line 58 of the code should be the Tu in the paper's formula. As I understand it, Tu is the far_rank-th largest value among all non-target logits. So why does the code select it only from the remaining non-target logits that are smaller than the target prediction probability?
  • 2) Line 70 should correspond to Eq. (11), which computes ri+/ru+; since ru+ is set to 1/(n-1), the denominator in the formula cancels. But why does the code use a mean instead of the sum in the numerator?
  • 3) Line 71 adds the FP penalty to the target logit. Deriving from Eq. (10), the modified target logit should subtract alpha * ri+/ru+ (i.e. cos_theta_neg_topk) from the original value, as in the image below. But what the code subtracts is (1 + target_cos_theta_) * cos_theta_neg_topk; is (1 + target_cos_theta_) the alpha here?
  • [image]
  • 4) After the penalty is added, the target logits become very small, almost all negative. Is that normal?
  • 5) I'm still somewhat confused about the gradient backpropagation and the use of cos_theta_. If the penalty were added directly to the non-target logits, as in the paper, could this operation be avoided?

Originally posted by @milliema in #7 (comment)

CurricularFace_Backbone loading error: is there a bug in the source code?

I could not load the CurricularFace pretrained model with the original BasicBlockIR; one operator seems to be misplaced. After the modification below, it loads successfully.

from torch.nn import Module, MaxPool2d, Sequential, Conv2d, BatchNorm2d, PReLU  # imports implied by the snippet

class BasicBlockIR(Module):
    """ BasicBlock for IRNet
    """
    def __init__(self, in_channel, depth, stride):
        super(BasicBlockIR, self).__init__()
        if in_channel == depth:
            self.shortcut_layer = MaxPool2d(1, stride)
        else:
            self.shortcut_layer = Sequential(
                Conv2d(in_channel, depth, (1, 1), stride, bias=False),
                BatchNorm2d(depth))
        # self.res_layer = Sequential(
        #     BatchNorm2d(in_channel),
        #     Conv2d(in_channel, depth, (3, 3), (1, 1), 1, bias=False),
        #     BatchNorm2d(depth),
        #     PReLU(depth),
        #     Conv2d(depth, depth, (3, 3), stride, 1, bias=False),
        #     BatchNorm2d(depth))

        self.res_layer = Sequential(
            BatchNorm2d(in_channel),
            Conv2d(in_channel, depth, (3, 3), (1, 1), 1, bias=False),
            # BatchNorm2d(depth),  # this BatchNorm2d has to be removed
            PReLU(depth),
            Conv2d(depth, depth, (3, 3), stride, 1, bias=False),
            BatchNorm2d(depth))

    def forward(self, x):
        shortcut = self.shortcut_layer(x)
        res = self.res_layer(x)

        return res + shortcut

The log after the fix (loading and conversion succeed):

  %1081 = Gemm[alpha = 1, beta = 1, transB = 1](%1080, %output_layer.3.weight, %output_layer.3.bias)
  %1082 = Constant[value = <Tensor>]()
  %1083 = Constant[value = <Tensor>]()
  %embedding = BatchNormalization[epsilon = 9.99999974737875e-06, momentum = 0.899999976158142](%1081, %1082, %1083, %output_layer.4.running_mean, %output_layer.4.running_var)
  return %embedding
}
Checking 0/3...
Checking 1/3...
Checking 2/3...
Converted ./pretrained/CurricularFace_Backbone.pth to ./pretrained/CurricularFace_Backbone.onnx done!

Before the modification, the following error occurred:

    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Backbone:
        Missing key(s) in state_dict: "body.0.res_layer.2.bias", "body.0.res_layer.2.running_mean", "body.0.res_layer.2.running_var", "body.0.res_layer.5.weight", "body.0.res_layer.5.bias", "body.0.res_layer.5.running_mean", "body.0.res_layer.5.running_var", "body.1.res_layer.2.bias", "body.1.res_layer.2.running_mean", "body.1.res_layer.2.running_var", "body.1.res_layer.5.weight", "body.1.res_layer.5.bias", "body.1.res_layer.5.running_mean",  
...
       size mismatch for body.0.res_layer.3.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64]).
        size mismatch for body.0.res_layer.4.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
        size mismatch for body.1.res_layer.3.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64]).
        size mismatch for body.1.res_layer.4.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
        size mismatch for body.2.res_layer.3.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64]).
        size mismatch for body.2.res_layer.4.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
        size mismatch for body.3.res_layer.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for body.3.res_layer.4.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
        size mismatch for body.4.res_layer.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for body.4.res_layer.4.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
        size mismatch for body.5.res_layer.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128]).

From this log, it looks like an operator misalignment: it tries to copy a Conv weight onto a BatchNorm. After my modification, the pth loads successfully and converts to ONNX, so I believe this should not affect usage. My full test code follows:

import cv2
import onnx
import torch
import numpy as np

from torchkit.backbone import get_model


def convert_to_onnx(pretrained_path="./pretrained/BUPT_Balancedface_IR_34.pth",
                    backbone_type="IR_34", do_simplify=True,
                    output_path="./pretrained/BUPT_Balancedface_IR_34.onnx"):
    # assert backbone_type in ("IR_34", "IR_101", "IR_SE_101")
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    model = get_model(backbone_type)([112, 112])
    model.load_state_dict(torch.load(pretrained_path, map_location=device))
    model = model.to(device)
    model.eval()
    print(f"Load {pretrained_path} done! Device: {device}")

    test_path = "./test.png"
    img = cv2.imread(test_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (112, 112)).astype(np.float32)
    face = img / 255.0  # (0, 1)
    face = (face - 0.5) / 0.5  # (-1, 1)
    face = np.transpose(face, (2, 0, 1))  # (3,112,112)
    face = np.expand_dims(face, 0)  
    face_tensor = torch.from_numpy(face).to(device)  # move input to the same device as the model

    embeddings = model(face_tensor).detach().cpu().numpy()

    print("Pth Detect done.")
    print(embeddings.shape)
    print('Converting ...')
    torch.onnx.export(model, face_tensor, output_path,
                      input_names=["input"],
                      output_names=["embedding"],
                      keep_initializers_as_inputs=False,
                      verbose=False, opset_version=11)
    model_onnx = onnx.load(output_path)
    print(onnx.helper.printable_graph(model_onnx.graph))
    if do_simplify:
        from onnxsim import simplify
        model_onnx, check = simplify(model_onnx, check_n=3)
        assert check, "Simplified ONNX model could not be validated"
    onnx.save(model_onnx, output_path)
    print(f"Converted {pretrained_path} to {output_path} done!")


if __name__ == "__main__":
    # convert_to_onnx(pretrained_path="./pretrained/BUPT_Balancedface_IR_34.pth",
    #                 output_path="./pretrained/BUPT_Balancedface_IR_34.onnx",
    #                 backbone_type="IR_34")
    convert_to_onnx(pretrained_path="./pretrained/CurricularFace_Backbone.pth",
                    output_path="./pretrained/CurricularFace_Backbone.onnx",
                    backbone_type="IR_101")
    """
    PYTHONPATH=. python3 ./inference.py
    """

The above is the problem I ran into loading the IR_101-based CurricularFace. Beyond that, the model provided in cifp is IR_34-based, and strangely, IR_34 only loads with the BasicBlockIR as it was before my modification. In other words, the IR_101-based CurricularFace pretrained model and the IR_34-based cifp pretrained model must have been trained with two different versions of BasicBlockIR, and they cannot share the same BasicBlockIR module. The BasicBlockIR for the IR_34-based cifp pretrained model needs to look like this (imports as above):

class BasicBlockIR(Module):
    """ BasicBlock for IRNet
    """
    def __init__(self, in_channel, depth, stride):
        super(BasicBlockIR, self).__init__()
        if in_channel == depth:
            self.shortcut_layer = MaxPool2d(1, stride)
        else:
            self.shortcut_layer = Sequential(
                Conv2d(in_channel, depth, (1, 1), stride, bias=False),
                BatchNorm2d(depth))
      
        self.res_layer = Sequential(
            BatchNorm2d(in_channel),
            Conv2d(in_channel, depth, (3, 3), (1, 1), 1, bias=False),
            BatchNorm2d(depth),  # this BatchNorm2d is kept (not commented out)
            PReLU(depth),
            Conv2d(depth, depth, (3, 3), stride, 1, bias=False),
            BatchNorm2d(depth))

        print("BasicBlockIR")

    def forward(self, x):
        shortcut = self.shortcut_layer(x)
        res = self.res_layer(x)

        return res + shortcut

How to run example?

Thank you for the perfect repository <3

  • I see that you implemented many papers in the tasks directory. I can see the training code, but I cannot find any example code for running a model.
  • Could you show me how to run the pre-trained model in TFace/tasks/cifp/ (feature extraction + verification)? A sketch follows below.
  • Thank you so much.
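
Until official examples land, here is a minimal extraction-plus-verification sketch. It reuses the get_model entry point that appears in the conversion script later in this thread; the checkpoint filename is an assumption (the cifp model is reported elsewhere in these issues to be IR_34-based).

import cv2
import numpy as np
import torch

from torchkit.backbone import get_model

def extract_feature(model, img_path):
    """Embed one aligned 112x112 face and L2-normalize the result."""
    img = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (112, 112)).astype(np.float32)
    img = (img / 255.0 - 0.5) / 0.5              # normalize to (-1, 1)
    tensor = torch.from_numpy(img.transpose(2, 0, 1))[None]
    with torch.no_grad():
        feat = model(tensor).numpy()[0]
    return feat / np.linalg.norm(feat)

model = get_model("IR_34")([112, 112])
model.load_state_dict(torch.load("./pretrained/cifp_IR_34.pth", map_location="cpu"))  # hypothetical path
model.eval()

sim = float(np.dot(extract_feature(model, "a.png"), extract_feature(model, "b.png")))
print("cosine similarity:", sim)                 # the verification threshold is dataset-dependent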

How do I use this project?

Could someone add more detailed usage documentation? I'm not very experienced and just want to use this functionality. Thanks.

Single-machine multi-GPU training

Hi, I see the code uses multi-node multi-GPU parallelism. How should it be configured to run on a single machine with multiple GPUs? It turned out to be rather complicated when I tried to change it myself.

SDD-FIQA results understanding

Hi,
first of all, thanks for your work.

I decided to test it with the pre-trained model and some of my own real images. I ran the eval.py demo to check it out, but I am confused by the resulting quality scores.

  1. What ranges of quality scores can be considered low, medium, and high for the SDD_FIQA_checkpoints_r50.pth model?

Referring to Fig. 3 of the paper, I would guess the following values [low: 0 - 35, medium: 36 - 48, high: 49 - 100], so I wonder what your suggestion is.

  2. Do these ranges depend on the model? It seems that they do, but correct me if I am wrong.

  3. Did you perform experiments with face occlusion? How robust is the model to it?

IMHO, there are no hard thresholds separating the low, medium, and high sub-ranges; they overlap, which makes me feel the quality scores do not separate low-, medium-, and high-quality face images clearly enough.

Thanks~

Any detailed report about the CSIG-FAT-AI competition?

Congratulations to your team on winning second place in the CFAT competition. I'm very interested in your work on making the feature distributions of masked and non-masked faces consistent. In my experience, masked-face features have a very large gap from non-masked-face features, which can decrease accuracy when masked and non-masked faces are trained together. I tried DDL to reduce the gap between these two datasets, but the result was not very effective.
Can you provide more information about this task?

ImportError: attempted relative import with no known parent package

When I run local_train.sh, it returns the error below:

Traceback (most recent call last):
  File "./tasks/cifp/train_cifp.py", line 9, in <module>
    from ..localfc.train_localfc import TrainTask
ImportError: attempted relative import with no known parent package
Traceback (most recent call last):
  File "./tasks/cifp/train_cifp.py", line 9, in <module>
    from ..localfc.train_localfc import TrainTask
ImportError: attempted relative import with no known parent package
Traceback (most recent call last):
  File "./tasks/cifp/train_cifp.py", line 9, in <module>
    from ..localfc.train_localfc import TrainTask
ImportError: attempted relative import with no known parent package
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/root/miniconda3/bin/python', '-u', './tasks/cifp/train_cifp.py', '--local_rank=7']' returned non-zero exit status 1.

The content of my local_train.sh is:

#!/bin/bash

if [ ! -d "logs" ]; then
    mkdir logs
fi
export CUDA_VISIBLE_DEVICES='2'
nohup python -u -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 ./tasks/cifp/train_cifp.py > logs/$(date +%F-%H-%M-%S).log 2>&1 &
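
A possible workaround (my assumption, not an official fix): a script executed as a plain file has no parent package, so the relative import cannot resolve. Putting the repo root on the import path and switching to an absolute import avoids this:

# Hypothetical patch at the top of tasks/cifp/train_cifp.py.
import sys
sys.path.insert(0, ".")  # repo root on sys.path; same effect as PYTHONPATH=.

# Absolute import instead of `from ..localfc.train_localfc import TrainTask`:
from tasks.localfc.train_localfc import TrainTask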

Multi-GPU training gets unbalanced memory and redundant processes

I train models on multiple GPUs (4 here). But once I try to load a pretrained model, nvidia-smi shows another 4 redundant processes (8 in total) occupying GPU 0; when the pretrained model is not loaded, those processes disappear (4 in total). The number of redundant processes grows with the number of GPUs used, and they force a small total batch size. Could you let me know how to keep the redundant processes from appearing when training on multiple GPUs with a pretrained model loaded?
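
In my experience the extra processes on GPU 0 usually come from torch.load restoring CUDA tensors onto the device they were saved from (often cuda:0), which creates a CUDA context on GPU 0 in every rank. A hedged sketch of the usual fix; the checkpoint path and the tiny stand-in module are hypothetical:

import os

import torch
import torch.nn as nn

local_rank = int(os.environ.get("LOCAL_RANK", "0"))  # or the launcher's --local_rank
backbone = nn.Linear(512, 512)  # stand-in for the real backbone

# map_location="cpu" keeps torch.load from initializing a CUDA context on
# GPU 0 inside every worker, which is the usual source of the redundant
# processes described above.
state = torch.load("./pretrained/backbone.pth", map_location="cpu")
backbone.load_state_dict(state)
backbone = backbone.to(f"cuda:{local_rank}")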

About Data Augmentation

I see the data augmentation implemented in the repo code.
It looks like the usual data augmentation methods from image classification.
Have you found which kinds of data augmentation benefit model performance in face recognition?
Thanks.

RESUME error on distfc

Hi, I resumed training from epoch 21 with the settings below (tasks/distfc/train_config.yaml):

SEED: 1337 # random seed for reproduce results
DATA_ROOT: '/home/john123/TFace/' # the parent root where your train/val/test data are stored
INDEX_ROOT: '/home/john123/TFace/'
DATASETS:  # the dataset index name
    - name: TFR-MS1M-list
      batch_size: 96
      weight: 1.

BACKBONE_RESUME: "/home/john123/ckpt/Backbone_Epoch_22_checkpoint.pth"
HEAD_RESUME: "/home/john123/ckpt/HEAD_Epoch_22"
META_RESUME: "/home/john123/ckpt/Optimizer_Epoch_22_checkpoint.pth"

# BACKBONE_NAME: 'EfficientNetB0'
BACKBONE_NAME: 'IR_34'
DIST_FC: true
TFRRECORD: true
MODEL_ROOT: './ckpt/' # the root to buffer your checkpoints
LOG_ROOT: './tensorboard' # the root to log your train/val status
HEAD_NAME: "CurricularFace" # support:  ['ArcFace', 'CurricularFace', 'CosFace']
LOSS_NAME: 'DistCrossEntropy' # support: ['DistCrossEntropy', 'Softmax']
INPUT_SIZE: [112, 112] # support: [112, 112] and [224, 224]
RGB_MEAN: [0.5, 0.5, 0.5] # for normalize inputs to [-1, 1]
RGB_STD: [0.5, 0.5, 0.5]
EMBEDDING_SIZE: 512 # feature dimension
LR: 0.1 # initial LR
START_EPOCH: 0 # start epoch
WARMUP_STEP: -1
NUM_EPOCH: 26 # total epoch number
WEIGHT_DECAY: 0.0005 # do not apply to batch_norm parameters
MOMENTUM: 0.9
STAGES: [10, 18, 24] # epoch stages to decay learning rate
WORLD_SIZE: 1
RANK: 0
LOCAL_RANK: 0
DIST_BACKEND: 'nccl'
DIST_URL: 'env://'
NUM_WORKERS: 8
AMP: true # fp16 for backbone

However, I got the error below. Could you help me fix it?

Traceback (most recent call last):
  File "./tasks/distfc/train_distfc.py", line 150, in <module>
    main()
  File "./tasks/distfc/train_distfc.py", line 146, in main
    task.train()
  File "./tasks/distfc/train_distfc.py", line 138, in train
    self._loop_step(train_loaders, backbone, heads, loss, opt, scaler, epoch, class_splits)
  File "./tasks/distfc/train_distfc.py", line 88, in _loop_step
    scaler.step(opt)
  File "/home/john123/pytorch/lib/python3.7/site-packages/torch/cuda/amp/grad_scaler.py", line 338, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
  File "/home/john123/pytorch/lib/python3.7/site-packages/torch/cuda/amp/grad_scaler.py", line 285, in _maybe_opt_step
    retval = optimizer.step(*args, **kwargs)
  File "/home/john123/pytorch/lib/python3.7/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/john123/pytorch/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/john123/pytorch/lib/python3.7/site-packages/torch/optim/sgd.py", line 117, in step
    nesterov=nesterov)
  File "/home/john123/pytorch/lib/python3.7/site-packages/torch/optim/_functional.py", line 173, in sgd
    buf.mul_(momentum).add_(d_p, alpha=1 - dampening)
RuntimeError: The size of tensor a (257489) must match the size of tensor b (257488) at non-singleton dimension 1

Cannot make a tfrecords file for BUPT-* datasets

Dear authors, thank you for the great work and for providing the codebase. I'm having trouble reproducing the results of CIFP. To do that, I wanted to train it on BUPT-BalancedFace + BUPT-GlobalFace (as described in the paper). After downloading both, I can't work out how to make a tfrecords file out of the images.

The script tools/img2tfrecord.py asks for a --pts_list argument, but I'm not sure what I should pass there. If it's a file with 5 landmarks, I can't find any provided with these datasets.

Could you please help me with that?

Looking for the code for "Consistent Instance False Positive Improves Fairness in Face Recognition" (CVPR2021)

Hello,

I was redirected here by the code link from the paper "Consistent Instance False Positive Improves Fairness in Face Recognition" (CVPR2021) and am somewhat lost among the folders. So far, I think the code lives under TFace/recognition,
but I could not find further information about the paper and there are no instructions for it, so could anyone guide me through the files? Thanks in advance.

PartialFC with Curricular Face

First of all, thank you for your great codebase. It is really helpful.

Besides, I wonder whether you will release CurricularFace support for the PartialFC head?

Verification dataset

Could you please tell me how to acquire the verification datasets such as "lfw.bin"?

Thanks.
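
The usual source is the InsightFace dataset packages (same link as in the MS1M issue below); each *.bin there is a pickled (bins, issame_list) pair. A reader sketch under that assumption:

import pickle

import cv2
import numpy as np

# Assumed InsightFace-style verification file: a pickled (bins, issame_list)
# pair, where each bin is an encoded JPEG and issame marks genuine pairs.
with open("lfw.bin", "rb") as f:
    bins, issame_list = pickle.load(f, encoding="bytes")

img = cv2.imdecode(np.frombuffer(bins[0], dtype=np.uint8), cv2.IMREAD_COLOR)
print(len(bins), len(issame_list), img.shape)  # e.g. 12000 images, 6000 pair labels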

Should torch.distributed.all_reduce be inside the loop "for i in range(len(batch_sizes))"?

Lines 114-117 of tasks/localfc/train_localfc.py:

                for step_loss in step_losses:
                    torch.distributed.all_reduce(step_loss, ReduceOp.SUM)
                    step_loss /= self.cfg["WORLD_SIZE"]
                    avg_losses.append(step_loss)

I think this code should be located before the "for i in range(len(batch_sizes))" loop at line 104.
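
For illustration, a sketch of that relocation: stacking the per-dataset losses and issuing one collective outside the loop is numerically equivalent to the per-loss calls and needs only a single all_reduce. The helper itself is hypothetical, not repo code.

import torch
import torch.distributed as dist

def average_step_losses(step_losses, world_size):
    """Average per-dataset losses across ranks with one all_reduce,
    instead of one call per loss inside the loop."""
    stacked = torch.stack([loss.detach() for loss in step_losses])
    dist.all_reduce(stacked, op=dist.ReduceOp.SUM)
    return list(stacked / world_size)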

How to prepare the training data?

In img2tfrecord.py, can you explain the data format of --img_list=${img_list} and --pts_list=${pts_list}? What form of input data is required?

Pre-trained model loading error

Thank you very much for this work!

The loading fails because the parameter counts are inconsistent. Do you know why? Thanks~

  • tasks/distfc/README.md IR101 pretrained model: OrderedDict len is 672
  • torchkit/backbone/model_irse.py IR101:OrderedDict len is 917

Train quality model using MS1M dataset

Hi, I'm trying to train the quality model using:

MS1M-ArcFace (85K ids / 5.8M images) [5,7] from https://github.com/deepinsight/insightface/tree/master/recognition/_datasets_

But all the training data is in a .rec file. I'd like to know whether I can use this dataset with the quality model; I searched for a way to decode it, but couldn't manage it.

# (loader from InsightFace's arcface_torch training code; DataLoaderX,
# MXFaceDataset and dali_data_iter are helpers defined there)
def get_dataloader(
    root_dir: str,
    local_rank: int,
    batch_size: int,
    dali = False) -> Iterable:
    if dali and root_dir != "synthetic":
        rec = os.path.join(root_dir, 'train.rec')
        idx = os.path.join(root_dir, 'train.idx')
        return dali_data_iter(
            batch_size=batch_size, rec_file=rec,
            idx_file=idx, num_threads=2, local_rank=local_rank)
    else:
        if root_dir == "synthetic":
            train_set = SyntheticDataset()
        else:
            train_set = MXFaceDataset(root_dir=root_dir, local_rank=local_rank)
        train_sampler = torch.utils.data.distributed.DistributedSampler(train_set, shuffle=True)
        train_loader = DataLoaderX(
            local_rank=local_rank,
            dataset=train_set,
            batch_size=batch_size,
            sampler=train_sampler,
            num_workers=2,
            pin_memory=True,
            drop_last=True,
        )
        return train_loader

I already have train.rec and train.idx
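
If it helps, here is a hedged sketch for dumping an InsightFace-style train.rec/train.idx into loose JPEGs with mxnet. The paths are placeholders, and the meta-record layout (record 0 holding the index range) follows the usual InsightFace convention, so treat it as an assumption:

import os

import cv2
import mxnet as mx

root = "./ms1m-arcface"    # folder holding train.rec / train.idx
out_root = "./ms1m-images"
record = mx.recordio.MXIndexedRecordIO(
    os.path.join(root, "train.idx"), os.path.join(root, "train.rec"), "r")

# In the InsightFace layout, record 0 is a meta record whose label holds
# the index range of the actual image records.
header0, _ = mx.recordio.unpack(record.read_idx(0))
last = int(header0.label[0])

for i in range(1, last):
    header, img_bytes = mx.recordio.unpack(record.read_idx(i))
    label = int(header.label if isinstance(header.label, float) else header.label[0])
    img = mx.image.imdecode(img_bytes).asnumpy()  # decoded as RGB
    out_dir = os.path.join(out_root, str(label))
    os.makedirs(out_dir, exist_ok=True)
    cv2.imwrite(os.path.join(out_dir, f"{i}.jpg"),
                cv2.cvtColor(img, cv2.COLOR_RGB2BGR))  # cv2 writes BGR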
