zhongdao / gcn_clustering

Code for CVPR'19 paper Linkage-based Face Clustering via GCN

License: MIT License


face-clustering clustering gcn graph-convolutional-networks

gcn_clustering's Introduction

Linkage-based Face Clustering via Graph Convolution Network

This repository contains the code for our CVPR'19 paper Linkage-based Face Clustering via GCN, by Zhongdao Wang, Liang Zheng, Yali Li and Shengjin Wang, Tsinghua University and Australian National University.

Introduction

We present an accurate and scalable approach to the face clustering task. We aim at grouping a set of faces by their potential identities. We formulate this task as a link prediction problem: a link exists between two faces if they are of the same identity. The key idea is that the local context in the feature space around an instance (face) contains rich information about the linkage relationship between this instance and its neighbors. By constructing sub-graphs around each instance as input data, which depict the local context, we use a graph convolution network (GCN) to perform reasoning and infer the likelihood of linkage between pairs in the sub-graphs.

Requirements

PyTorch 0.4.0
Python 2.7
sklearn >= 0.19.1

Data Format

First, extract features for the IJB-B data and save them as an NxD .npy file, in which each row is the D-dimensional feature of one sample. Then, save the labels as an Nx1 .npy file, in which each row is an integer indicating the identity. Lastly, generate the KNN graph (either by brute force or by ANN). The KNN graph should be saved as an Nx(K+1) .npy file; in each row, the first element is the node index and the following K elements are the indices of its KNN nodes.
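The KNN graph layout described above can be sketched with a brute-force search. This is a minimal illustration, not the authors' preprocessing script: the function name build_knn_graph, the use of cosine similarity, the random features and the output path are all assumptions, and it needs Python 3 with a recent NumPy rather than the Python 2.7 environment listed under Requirements.

```python
import numpy as np

def build_knn_graph(features, k):
    """Return an N x (k+1) int array: column 0 is the node index,
    columns 1..k are the indices of its k nearest neighbours."""
    # L2-normalise so that the dot product equals cosine similarity (assumed metric).
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T
    # Exclude each node from its own neighbour list.
    np.fill_diagonal(sim, -np.inf)
    # Indices of the k most similar nodes, most similar first.
    nbrs = np.argsort(-sim, axis=1)[:, :k]
    node_ids = np.arange(len(feats))[:, None]
    return np.hstack([node_ids, nbrs])

# Random stand-in features; real features would be loaded with np.load.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 16)).astype(np.float32)
knn_graph = build_knn_graph(features, k=5)
# np.save("knn_graph.npy", knn_graph)  # hypothetical output path
```

The full N x N similarity matrix only fits in memory for small N; for larger sets an ANN library is the usual substitute, with the same output layout.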

For training, features+labels+knn_graphs are needed. For testing, only features+knn_graphs are needed, but the labels are also needed if you want to compute accuracy. We also provide the ArcFace features / labels / knn_graphs of the IJB-B/CASIA datasets at OneDrive and Baidu NetDisk (extraction code: 8wj1).

Testing

python test.py --val_feat_path path/to/features --val_knn_graph_path path/to/knn/graph --val_labels_path path/to/labels --checkpoint path/to/gcn_weights

During inference, the test script will dynamically output the pairwise precision/recall/accuracy. After all subgraphs have been processed, it will output the final B-Cubed precision/recall/F-score (note that these are not the same as the pairwise p/r/acc) and the NMI score.

Training

python train.py --feat_path path/to/features --knn_graph_path path/to/knn/graph --labels_path path/to/labels

We employ the CASIA dataset to train the GCN. Usually, 4 epochs are sufficient. We provide pre-trained model weights in logs/logs/best.ckpt

Citation

If you find GCN-Clustering helps your research, please cite our paper:

@inproceedings{wang2019gncclust,
  title={Linkage-based Face Clustering via Graph Convolution Network},
  author={Wang, Zhongdao and Zheng, Liang and Li, Yali and Wang, Shengjin},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}

Acknowledgement

I borrowed some code on pseudo label propagation from CDP, many thanks to Xiaohang Zhan!


gcn_clustering's Issues

How to generate the KNN graph?

Why is softmax not applied during training, while it is used at test time?
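One possible explanation, assuming the training loss is a cross-entropy computed on raw logits (which is how PyTorch's nn.CrossEntropyLoss behaves, since it applies log-softmax internally): the softmax is already part of the loss, so an explicit softmax is only needed at test time to read off probabilities. The NumPy sketch below illustrates that equivalence; the logits, targets and the meaning of class 1 are made up for the example.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy_from_logits(logits, targets):
    # The log-softmax is computed inside the loss, so the network
    # can output raw logits during training.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

logits = np.array([[2.0, -1.0], [0.5, 1.5], [-0.3, 0.2]])
targets = np.array([0, 1, 1])

loss = cross_entropy_from_logits(logits, targets)  # training: no explicit softmax
probs = softmax(logits)                            # testing: explicit softmax
link_likelihood = probs[:, 1]                      # hypothetical: class 1 means "linked"
```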

about training error.

Hello, when I try to train the program, I get some errors.
How can I solve this?

The details:
Current lr 0.01
Traceback (most recent call last):
File "train.py", line 165, in <module>
main(args)
File "train.py", line 64, in main
train(trainloader, net, criterion, opt, epoch)
File "train.py", line 81, in train
for i, ((feat, adj, cid, h1id), gtmat) in enumerate(loader):
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 280, in __next__
idx, batch = self._get_batch()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 259, in _get_batch
return self.data_queue.get()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/Queue.py", line 168, in get
self.not_empty.wait()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/threading.py", line 340, in wait
waiter.acquire()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 178, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 31687) is killed by signal: Killed.
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 71, in _worker_manager_loop
r = in_queue.get()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/multiprocessing/queues.py", line 378, in get
return recv()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
return pickle.loads(buf)
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/pickle.py", line 1388, in loads
return Unpickler(file).load()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/pickle.py", line 1139, in load_reduce
value = func(*args)
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 68, in rebuild_storage_fd
fd = multiprocessing.reduction.rebuild_handle(df)
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/multiprocessing/reduction.py", line 155, in rebuild_handle
conn = Client(address, authkey=current_process().authkey)
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/multiprocessing/connection.py", line 175, in Client
answer_challenge(c, authkey)
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/multiprocessing/connection.py", line 432, in answer_challenge
message = connection.recv_bytes(256) # reject large message
IOError: [Errno 104] Connection reset by peer

Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f77d1c11650>> ignored

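About gradient backpropagation

Hello, regarding gradient backpropagation: if the loss considered all nodes instead of only the 1-hop neighbors, would the accuracy drop? Intuitively, shouldn't considering more nodes give higher accuracy?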

pred will contain nan if k_at_hop is large

Thank you very much for your inspiring work!

As suggested in the paper, "In the testing phase, it is not necessary to keep the same configuration with the training phase. ", so setting k_at_hop=[20,5] in test.py is reasonable for fast testing. But pred and loss seem to become nan if k_at_hop=[200,10]. May I ask whether this phenomenon can be reproduced on your side, and why nan occurs?

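With a new dataset, the proportion of positive samples in each batch is only about 1%. I extract features with ResNet-50. Is there any way to address this?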

Images of features.zip

Could you please tell me where to find the images corresponding to features.zip?

Could u pls provide the data preprocessing code?

I am not familiar with this field. So could u pls provide the data preprocessing code? I mean the code for "features+labels+knn_graphs"

Using code in the new Gpu-s(Python Versions)

Hi ! I am trying to use this code in Google Colab and the latest problem that I am facing is the issue that the code is compiled in old cuda version. The error that I am getting is verbatim as follows :

"Found GPU0 Tesla T4 which requires CUDA_VERSION >= 9000 for
optimal performance and fast startup time, but your PyTorch was compiled
with CUDA_VERSION 8000. Please install the correct PyTorch binary
using instructions from http://pytorch.org"
Is there any structured way how to compile the code ? Can I try to change the code so it can work in the new Pytorch versions ? Will the current pretrained stuff work ? Please any advice or suggestion would be helpful ! Or at least if you could tell me how you have been able to run the code ? Thanks in advance!

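I have not changed anything in the code, but after training for 2 batches the precision and recall both drop to 0. How can I train a model that matches the performance of best.ckpt?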

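About the original dataset

Hello, I have a question I would like to ask.
In the paper you mention that the GCN is trained on the CASIA dataset. Looking at the source code, I found that your data contains 454590 samples, whereas the CASIA dataset contains 494414. Do you have the list of samples that remain after filtering?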

A small bug in testing

During testing, if the test set size and the batch size are such that the last batch contains only one image, an IndexError is raised: invalid index to scalar variable.
This is caused by line 154 of test.py:

node_list = node_list.long().squeeze().numpy()
bs = feat.size(0)

In this case the resulting node_list degenerates into a one-dimensional array, which causes the indexing error.

An extra check can be added:

node_list = node_list.long().squeeze().numpy()
bs = feat.size(0)
if bs == 1:
    node_list = np.array([node_list])
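The degenerate case can be reproduced without the model. The sketch below assumes node_list arrives with shape (bs, max_nodes); that shape, the helper name batch_node_lists and the padded size of 4 are assumptions for illustration. Reshaping to (bs, -1) is one alternative to the bs == 1 check, since it restores the batch axis in every case.

```python
import numpy as np

def batch_node_lists(node_list, bs):
    """Return node_list with shape (bs, -1), even when bs == 1."""
    return np.asarray(node_list).reshape(bs, -1)

single = np.arange(4).reshape(1, 4)   # last batch holding a single sub-graph
squeezed = single.squeeze()
print(squeezed.shape)                 # (4,): the batch axis is gone
print(batch_node_lists(squeezed, 1).shape)  # (1, 4)

pair = np.arange(8).reshape(2, 4)     # a regular batch is left unchanged
print(batch_node_lists(pair.squeeze(), 2).shape)  # (2, 4)
```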

Size of the matrices in the computation

How to generate "attention aggregation" feature?

Hello, I saw that you proposed a novel feature aggregation method, "Attention Aggregation" (Section 3.3), and that the elements of G are generated by a 2-layer MLP. How is this matrix trained? Can you provide more detailed information? Thank you. (I didn't find the related source code; did I miss something?)

How does the GCN use batch training? Does a batch here mean training on batches of nodes?

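Some batches are rather slow. Has anyone seen a similar problem?

I set batch_size=64. A batch normally finishes training in about 1 s, but sometimes it takes 1 minute, which does not seem normal.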

About the dataset

Hi, you've done excellent work! Could you please share the IJB-B dataset (including protocols)? Thanks a lot!

According to my verification, the precision and recall variables in the bcubed function may be swapped

In the confusion matrix returned by coo_matrix in the code, axis=0 indexes the predicted clusters.
Incorrect code:
precision = np.sum(cm_norm * (cm / cm.sum(axis=0)))
recall = np.sum(cm_norm * (cm / np.expand_dims(cm.sum(axis=1), 1)))
Correct code:
recall = np.sum(cm_norm * (cm / cm.sum(axis=0)))
precision = np.sum(cm_norm * (cm / np.expand_dims(cm.sum(axis=1), 1)))
The F1 score still happens to be correct, since it is symmetric in precision and recall.
Fortunately, the paper does not report these numbers.

Here is the analysis. I ran the example from the B-Cubed paper:

import numpy as np
from scipy.sparse import coo_matrix

pred = np.array([0,0,0,0,1,1,1,2,2,2,2,2,2,2])
label = np.array([0,0,0,0,0,1,1,2,1,3,4,1,1,1])
# Rows are predicted clusters, columns are ground-truth labels.
cm = coo_matrix(
    (np.ones(14), (pred, label)),
    shape=(3, 5),
    dtype=int).toarray()
"""
[[4, 0, 0, 0, 0],
 [1, 2, 0, 0, 0],
 [0, 4, 1, 1, 1]]
"""
np.expand_dims(cm.sum(axis=1), 1)
"""
[[4],
 [3],
 [7]]
"""
cm / np.expand_dims(cm.sum(axis=1), 1)
"""
[[1.        , 0.        , 0.        , 0.        , 0.        ],
 [0.33333333, 0.66666667, 0.        , 0.        , 0.        ],
 [0.        , 0.57142857, 0.14285714, 0.14285714, 0.14285714]]
"""
cm * cm / np.expand_dims(cm.sum(axis=1), 1)
"""
[[4.        , 0.        , 0.        , 0.        , 0.        ],
 [0.33333333, 1.33333333, 0.        , 0.        , 0.        ],
 [0.        , 2.28571429, 0.14285714, 0.14285714, 0.14285714]]
"""

np.sum(cm * cm / np.expand_dims(cm.sum(axis=1), 1)) / cm.sum()

0.5986394557823128

This is the same computation as in your code, but the variable name is wrong:
cm_norm = cm / cm.sum()
recall = np.sum(cm_norm * (cm / np.expand_dims(cm.sum(axis=1), 1)))
This should be the precision:
precision: (4*4/4 + 1*1/3 + 2*2/3 + 3*1*1/7 + 4*4/7)/14 = 0.5986394557823128

Clearly, each entry must be divided by the total size of its predicted cluster:
np.expand_dims(cm.sum(axis=1), 1)
"""
[[4],
 [3],
 [7]]
"""
axis=1 is the axis to sum over when computing precision.
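The analysis above can be condensed into one function. This is a sketch of the convention argued for in this issue (rows are predicted clusters, columns are ground-truth labels), not the repository's bcubed function; it uses only NumPy and relabels the ids so that no row or column of the table is empty.

```python
import numpy as np

def bcubed(pred, label):
    """B-Cubed precision, recall and F-score from integer cluster ids and labels."""
    # Compact the ids to 0..K-1 so every row and column is used at least once.
    pred = np.unique(np.asarray(pred), return_inverse=True)[1]
    label = np.unique(np.asarray(label), return_inverse=True)[1]
    # Contingency table: rows are predicted clusters, columns are ground-truth classes.
    cm = np.zeros((pred.max() + 1, label.max() + 1))
    np.add.at(cm, (pred, label), 1)
    n = cm.sum()
    # Precision divides by predicted-cluster sizes (row sums);
    # recall divides by ground-truth class sizes (column sums).
    precision = np.sum(cm * cm / cm.sum(axis=1, keepdims=True)) / n
    recall = np.sum(cm * cm / cm.sum(axis=0, keepdims=True)) / n
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

pred = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2]
label = [0, 0, 0, 0, 0, 1, 1, 2, 1, 3, 4, 1, 1, 1]
precision, recall, f_score = bcubed(pred, label)
```

Swapping the two arguments swaps precision and recall and leaves the F-score unchanged, which is why the F-score is unaffected by the naming question.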

When computing edge scores, score_dict[e[0], e[1]] = 0.5*(score_dict[e[0], e[1]] + score[i]) gives a larger weight to the scores that are visited later. Why not use a true average?
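A true average can be kept by storing a running sum and a count per edge. The sketch below is one way to do it, not the repository's code; the function name average_edge_scores and the choice to treat (u, v) and (v, u) as the same edge are assumptions.

```python
from collections import defaultdict

def average_edge_scores(edges, scores):
    """Average all scores predicted for the same undirected edge."""
    total = defaultdict(float)
    count = defaultdict(int)
    for (u, v), s in zip(edges, scores):
        key = (min(u, v), max(u, v))  # assumed: (u, v) and (v, u) are one edge
        total[key] += s
        count[key] += 1
    return {key: total[key] / count[key] for key in total}

edges = [(0, 1), (1, 0), (0, 1), (2, 3)]
scores = [0.2, 0.4, 0.9, 0.7]
mean_scores = average_edge_scores(edges, scores)

# Repeated halving on edge (0, 1) weights the three scores 1/4, 1/4 and 1/2.
halved = 0.5 * (0.5 * (0.2 + 0.4) + 0.9)
```

For edge (0, 1) the true mean is 0.5, while the repeated-halving update yields 0.6 because the last score counts for half of the result.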

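Model test results are not satisfactory

Hello, following the steps in the paper, I sampled 5000 classes from the CASIA dataset (220,000 samples in total) as the training set, used the 512, 1024 and 1845 sets as test sets, and trained the model with the default parameters of train.py. At test time it does not reach the performance of the best.ckpt model: the precision of my trained model is below 0.8. Do you have any suggestions?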

Why use the random subset of CASIA?

Hello, I want to know why you use the random subset instead of the whole CASIA?

Additionally, could you provide the list of the random subset?

Thank you very much.

Extracting features from ArcFace

Hi, I have a question regarding the feature extraction, as I cannot reproduce the results with my own preprocessed files. Given IJB-B-512, your checkpoint for CASIA, and pytorch implementation of ArcFace. I came up with the following code:

import numpy as np
from tqdm import tqdm

import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torchvision import datasets

from model import Backbone

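# Deterministic preprocessing: a random flip would make the extracted
# features differ from run to run, so it is left out at inference time.
transform = transforms.Compose([
    transforms.Resize(size=(112, 112)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    ])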

data_path = "../data/IJB-B-512/"
batch_size = 16
num_workers = 16

data = datasets.ImageFolder(data_path, transform=transform)
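# shuffle=False keeps the feature rows in the same order as data.samples,
# so they stay aligned with the labels and the KNN graph.
loader = torch.utils.data.DataLoader(data,
                                     batch_size=batch_size,
                                     num_workers=num_workers,
                                     shuffle=False,
                                     pin_memory=True)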

model = Backbone(50, 0.6, 'ir_se')
ckpt = torch.load("../pretrained/model_ir_se50.pth")
model.load_state_dict(ckpt)
model.cuda()
model.eval()

features = []
def hook(module, input, output):
    N, C, H, W = output.shape
    output = output.reshape(N, C, -1)
    features.append(output.mean(dim=2).cpu().detach().numpy())

handle = model._modules['body'][23].res_layer[5].fc2.register_forward_hook(hook)
for i_batch, inputs in tqdm(enumerate(loader), total=len(loader)):
    _ = model(inputs[0].cuda())

features = np.concatenate(features)
handle.remove()

Could you please let me know if my approach makes sense or how is it different from yours or could you kindly share your pre-processing module?

face images of features

Could you provide the face images of features?

features.zip

Hello, why does the Baidu NetDisk link you provide for downloading the features always show "download requesting" without the download ever starting?

IJB-B list

Hi! Could you share the list of 'IJB-B' dataset?

about the features.zip

Thanks for your work.

I would like to know whether the features in features.zip were extracted with ArcFace or with the ResNet-101 you trained yourself.

The dataset that I found is not in .npy format, which confuses me.

predicting edges are all zeros during training

Hello, when I was trying to train the model with your example, I found that the predicted edges all became zero (no edge is predicted to be true). Have you ever encountered this situation?

This usually occurs after about 100 batches of training, with the following arguments.

k_at_hop = [200, 10]
active_connection = 10
batch_size = 16
momentum = 0.9
weight_decay = 1e-5
lr = 1e-5

I have reproduced the kNN upper bound, but I think max_sz in graph_propagation is not very reasonable; the clusters could instead be split with a gradually decreasing th

k=5

k=80

What is the ideal number of neighbours for a given dataset?

Suppose I have 10000 images of 400 individuals. What would be the best way to choose the number of neighbours?

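About feature extraction

The features in your CASIA.feas.npy are 512-dimensional. You mention that the features are extracted with ResNet-101, but aren't ResNet-101 features 2048-dimensional? Thanks for your answer.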

About the idea

This work appears to use the insight from "Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition", which is especially evident in the kNN part, and it also uses their label propagation.

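What are the test results on Ms-Celeb-1M? Has the author tested on it separately?

Has the author tested on Ms-Celeb-1M separately? If so, what were the results and how long did it take?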

About labels

Does "identities" mean the number of classes? The first test set, 512.labels, contains more than 512 integers. What do the values in the labels file mean?

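About precision and recall: they appear to be computed the wrong way round in the code

This question confused me a little too, so I checked it repeatedly. In the bcubed function in the code, reference should be the ground truth (GT) and system should be the prediction. Could the author please double-check?
For example, with GT [0, 0, 0, 1, 1, 1] and prediction [0, 0, 0, 0, 0, 0], the precision should be 0.5 and the recall 1, but your code gives a recall of 0.5 and a precision of 1.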
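The expected values in this example can be checked directly against the item-wise definition of B-Cubed: for each item, precision is the fraction of its predicted cluster that shares its ground-truth label, and recall is the fraction of its ground-truth class found in its predicted cluster. The helper below, bcubed_itemwise, is a hypothetical O(n^2) reference written for this check and is not taken from the repository.

```python
def bcubed_itemwise(pred, gt):
    """B-Cubed precision and recall computed item by item from the definition."""
    n = len(pred)
    precision = recall = 0.0
    for i in range(n):
        same_cluster = {j for j in range(n) if pred[j] == pred[i]}
        same_class = {j for j in range(n) if gt[j] == gt[i]}
        correct = len(same_cluster & same_class)
        precision += correct / len(same_cluster)
        recall += correct / len(same_class)
    return precision / n, recall / n

p, r = bcubed_itemwise(pred=[0, 0, 0, 0, 0, 0], gt=[0, 0, 0, 1, 1, 1])
print(p, r)  # 0.5 1.0
```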

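About the adjacency matrix transformation

Hello, I have two questions about the adjacency matrix transformation in feeder:
1. GCNs usually add self-loops for renormalization, but your code does not seem to add them. What is the reason?
2. In the code, A is transformed by A=A.div(D), which is not equivalent to D^(-1/2)AD^(-1/2). Is there a particular reason for using A=A.div(D)? If I want to use the D^(-1/2)AD^(-1/2) transformation, is the following implementation correct?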

D = A.sum(1, keepdim=True)
D_ = torch.diagflat(torch.pow(D,-0.5))
A = torch.mm(D_,torch.mm(A,D_))
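The difference between the two normalisations can be seen on a small graph. The NumPy sketch below assumes A.div(D) divides each row of A by that node's degree (D of shape (N, 1)); the 4-node adjacency matrix is made up, and every degree is non-zero so that the inverse square root is defined.

```python
import numpy as np

A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])

deg = A.sum(axis=1)  # node degrees, all non-zero here

# Row normalisation D^-1 A: every row sums to 1, but the result is not symmetric.
A_row = A / deg[:, None]

# Symmetric normalisation D^-1/2 A D^-1/2: symmetric, rows generally do not sum to 1.
d_inv_sqrt = deg ** -0.5
A_sym = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# The same with self-loops added first, as in the renormalisation trick.
A_hat = A + np.eye(len(A))
deg_hat = A_hat.sum(axis=1)
A_hat_sym = A_hat / np.sqrt(np.outer(deg_hat, deg_hat))
```

The torch snippet in the question computes the same matrix as A_sym here, provided no node has degree zero; an isolated node would give an infinite entry in D^(-1/2).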

Why does the first iteration not use a threshold in the test code?

I wonder why the first iteration does not use a threshold in connected_components_constraint(vertex, max_sz) (graph_propagation(), graph.py).

Is the experiment in the paper also based on this setting?

I find that after the first iteration 'remain' may be empty, so the code does not run any further iteration, and the final clustering result then has nothing to do with the model's predictions.

For the first iteration:
'vertex' contains all the node pairs (links/edges) generated by KNN, which are also the input data of the model,
and the code directly uses all these links (neighbors = n.links, line 69) to create the groups/clusters, like a BFS, rather than using the scores predicted by the model to filter them. Is that right?

I would be very grateful if you could provide suggestion.
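For comparison, here is a sketch of a variant in which the score threshold is applied from the first pass onward. It is not the repository's graph.py: the function names connected_components and propagate, the union-find implementation, the step size and the toy scores are all assumptions made for illustration.

```python
def connected_components(nodes, edges):
    """Union-find connected components over the given edges."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def propagate(nodes, scored_edges, max_sz, th=0.0, step=0.05):
    """Split oversized components by raising the score threshold.

    scored_edges maps (u, v) to a predicted link score in [0, 1].
    """
    clusters, pending = [], [list(nodes)]
    while pending:
        next_pending = []
        for group in pending:
            members = set(group)
            # Only edges inside the group that pass the current threshold survive.
            kept = [e for e, s in scored_edges.items()
                    if s >= th and e[0] in members and e[1] in members]
            for comp in connected_components(group, kept):
                # Accept small components; stop splitting once th exceeds 1.
                if len(comp) <= max_sz or th > 1.0:
                    clusters.append(comp)
                else:
                    next_pending.append(comp)
        pending = next_pending
        th += step
    return clusters


scored_edges = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.3, (3, 4): 0.9, (4, 5): 0.85}
clusters = propagate(range(6), scored_edges, max_sz=3)
```

With these toy scores the weak edge (2, 3) is dropped once the threshold passes 0.3, which leaves two clusters of three nodes each. Starting with a higher th makes the very first pass depend on the predicted scores as well.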

About features of CASIA

Hello, I use another network to extract features from CASIA and use your GCN to train it.

I keep the same parameter setting and find that the accuracy is lower.

I want to ask if there is some point I need to pay attention when I use a new feature.

Thank you.

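Unsupervised or supervised?

The code uses labels to compute the loss, so the method should be supervised. Why does the title still use keywords such as "clustering"? And why do the comparison experiments include typical unsupervised methods such as K-means?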

Is this a clustering model or a classification model? Why is the code "self.labels = np.load(label_path)" in feeder.py?

Is this classification or clustering, and why does feeder.py contain "self.labels = np.load(label_path)"?
Where do the labels come from?

Index out of bounds!

Hello, when building the knn graph for testing, I used a statement similar to the following:
result, dists = flann.nn(dataset, testset, 201, algorithm="kmeans", branching=32, iterations=7, checks=16)
However, the dataset and testset I used have different sizes; dataset contains testset.
The following error occurred during testing:
IndexError: index 6209 is out of bounds for axis 0 with size 3368.
The line that raises the error is: hops[-1].update(set(self.knn_graph[h][1:self.k_at_hop[d]+1]))

Can this problem be solved? Thanks!
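A likely cause, given the sizes in the error message, is that the neighbour indices refer to rows of the larger dataset while the feeder indexes an array with only as many rows as testset; building the graph with the same array as both index and query set would keep every index below N. The check below is a small sketch for catching this before testing; the function name check_knn_graph is hypothetical, and the layout it verifies is the Nx(K+1) format from the README.

```python
import numpy as np

def check_knn_graph(knn_graph, num_features):
    """Raise ValueError if the graph cannot index a feature array with num_features rows."""
    knn_graph = np.asarray(knn_graph)
    if knn_graph.shape[0] != num_features:
        raise ValueError("expected %d rows, got %d" % (num_features, knn_graph.shape[0]))
    if knn_graph.min() < 0 or knn_graph.max() >= num_features:
        raise ValueError("neighbour index outside [0, %d)" % num_features)
    if not np.array_equal(knn_graph[:, 0], np.arange(num_features)):
        raise ValueError("column 0 must hold the row's own node index")

# Toy graph for 3 features with K = 2; a real graph would come from np.load.
good = np.array([[0, 1, 2], [1, 0, 2], [2, 1, 0]])
check_knn_graph(good, num_features=3)  # passes silently
```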

About acknowledgement

Hi, I'm pleased to find that you've used the code from my repo (https://github.com/XiaohangZhan/cdp/blob/master/source/graph.py). I will appreciate it if you could acknowledge us in README :)

Using mean aggregation cannot satisfy the requirement that G = g(X, A) is an aggregation matrix of size N × N and each row is summed up to 1

G = g(X, A) is an aggregation matrix of size N × N and each row is summed up to 1

but

so
using mean aggregation cannot satisfy the requirement that G = g(X, A) is an aggregation matrix of size N × N and each row is summed up to 1

looking forward to your reply

Can the author provide the data preprocessing script

Hello, I have some questions about the feature extraction process in the paper. I want to know the specific details of feature extraction. Can the author open source the preprocessing script?

the idea about GCN

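Hello, what exactly is the difference between the approach of this paper (or the GCN it uses) and GraphSAGE? Could you point out the specific novelty? Thanks.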