gcn_clustering's Introduction

Linkage-based Face Clustering via Graph Convolution Network

This repository contains the code for our CVPR'19 paper Linkage-based Face Clustering via GCN, by Zhongdao Wang, Liang Zheng, Yali Li and Shengjin Wang, Tsinghua University and Australian National University.

Introduction

We present an accurate and scalable approach to the face clustering task, in which a set of faces is grouped by their potential identities. We formulate this task as a link prediction problem: a link exists between two faces if they are of the same identity. The key idea is that the local context in the feature space around an instance (face) contains rich information about the linkage relationship between this instance and its neighbors. By constructing sub-graphs around each instance as input data, which depict the local context, we use a graph convolution network (GCN) to perform reasoning and infer the likelihood of linkage between pairs in the sub-graphs.
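As a rough illustration of "constructing sub-graphs around each instance", the sketch below collects the multi-hop neighbors of a pivot node from a KNN graph. It is an assumption-laden toy, not the repository's feeder: the function name `collect_hop_nodes`, the toy graph, and the choice to exclude the pivot from the node set are all hypothetical.

```python
import numpy as np


def collect_hop_nodes(knn_graph, pivot, k_at_hop):
    """Return the set of nodes within len(k_at_hop) hops of `pivot`.

    knn_graph[i] is [i, nn_1, ..., nn_K]; hop d keeps the k_at_hop[d]
    nearest neighbors of every node reached at hop d - 1.
    """
    hops = [set(int(n) for n in knn_graph[pivot][1:k_at_hop[0] + 1])]
    for d in range(1, len(k_at_hop)):
        hops.append(set())
        for h in hops[-2]:
            hops[-1].update(int(n) for n in knn_graph[h][1:k_at_hop[d] + 1])
    nodes = set().union(*hops)
    nodes.discard(pivot)  # assumption: the pivot is not part of its own node set
    return nodes


# Toy KNN graph with 5 nodes and K = 2 (column 0 is the node's own index).
toy_knn = np.array([[0, 1, 2],
                    [1, 0, 2],
                    [2, 3, 1],
                    [3, 4, 2],
                    [4, 3, 2]])
sub_graph_nodes = collect_hop_nodes(toy_knn, pivot=0, k_at_hop=[2, 1])
print(sorted(sub_graph_nodes))
```

Here the first hop keeps two neighbors of the pivot and the second hop keeps one neighbor of each first-hop node, which is the role a setting such as k_at_hop plays.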

Requirements

  • PyTorch 0.4.0
  • Python 2.7
  • sklearn >= 0.19.1

Data Format

First, extract features for the IJB-B data and save them as an NxD .npy file, in which each row is the D-dimensional feature of one sample. Then save the labels as an Nx1 .npy file, in which each row is an integer indicating the identity. Finally, generate the KNN graph, either by brute force or by approximate nearest neighbor (ANN) search. The KNN graph should be saved as an Nx(K+1) .npy file; in each row, the first element is the node index and the following K elements are the indices of its K nearest neighbors.

For training, features, labels and knn_graphs are all needed. For testing, only features and knn_graphs are needed, but the labels are also required if you want to compute accuracy. We also provide the ArcFace features / labels / knn_graphs of the IJB-B/CASIA datasets at OneDrive and Baidu NetDisk (extract code: 8wj1).
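The file layout described above can be sketched with NumPy alone. This is an illustrative brute-force version on random toy data, not the script used to produce the released files; the function name `build_knn_graph`, cosine similarity as the metric, and the output file names are assumptions.

```python
import os
import tempfile

import numpy as np


def build_knn_graph(features, k):
    """Brute-force cosine KNN graph of shape (N, k + 1).

    Column 0 holds the node's own index; columns 1..k hold its k nearest
    neighbors, most similar first.
    """
    # L2-normalize so that the dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.maximum(norms, 1e-12)
    sim = np.dot(normed, normed.T)
    # Exclude each node from its own neighbor list.
    np.fill_diagonal(sim, -np.inf)
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    node_ids = np.arange(features.shape[0]).reshape(-1, 1)
    return np.hstack([node_ids, neighbors])


# Toy data standing in for real face features (assumption: N = 6, D = 4).
rng = np.random.RandomState(0)
features = rng.randn(6, 4).astype(np.float32)          # N x D
labels = np.array([0, 0, 1, 1, 2, 2]).reshape(-1, 1)   # N x 1
knn_graph = build_knn_graph(features, k=3)             # N x (K + 1)

out_dir = tempfile.mkdtemp()
np.save(os.path.join(out_dir, "features.npy"), features)
np.save(os.path.join(out_dir, "labels.npy"), labels)
np.save(os.path.join(out_dir, "knn_graph.npy"), knn_graph)
```

Brute force needs an N x N similarity matrix, so for large N an ANN library is the more practical choice.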

Testing

python test.py --val_feat_path path/to/features --val_knn_graph_path path/to/knn/graph --val_labels_path path/to/labels --checkpoint path/to/gcn_weights

During inference, the test script will dynamically output the pairwise precision/recall/accuracy. After all subgraphs have been processed, the test script will output the final B-Cubed precision/recall/F-score (note that these are not the same as the pairwise precision/recall/accuracy) and the NMI score.

Training

python train.py --feat_path path/to/features --knn_graph_path path/to/knn/graph --labels_path path/to/labels

We employ the CASIA dataset to train the GCN. Usually, 4 epochs are sufficient. We provide pre-trained model weights in logs/logs/best.ckpt

Citation

If you find GCN-Clustering helps your research, please cite our paper:

@inproceedings{wang2019gncclust,
  title={Linkage-based Face Clustering via Graph Convolution Network},
  author={Zhongdao Wang and Liang Zheng and Yali Li and Shengjin Wang},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}

Acknowledgement

I borrowed some code on pseudo label propagation from CDP, many thanks to Xiaohang Zhan!

gcn_clustering's Issues

about training error.

Hello, when I try to train the program, I get some errors.
How can I solve them?

The details:
Current lr 0.01
Traceback (most recent call last):
File "train.py", line 165, in <module>
main(args)
File "train.py", line 64, in main
train(trainloader, net, criterion, opt, epoch)
File "train.py", line 81, in train
for i, ((feat, adj, cid, h1id), gtmat) in enumerate(loader):
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 280, in __next__
idx, batch = self._get_batch()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 259, in _get_batch
return self.data_queue.get()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/Queue.py", line 168, in get
self.not_empty.wait()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/threading.py", line 340, in wait
waiter.acquire()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 178, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 31687) is killed by signal: Killed.
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 71, in _worker_manager_loop
r = in_queue.get()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/multiprocessing/queues.py", line 378, in get
return recv()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
return pickle.loads(buf)
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/pickle.py", line 1388, in loads
return Unpickler(file).load()
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/pickle.py", line 1139, in load_reduce
value = func(*args)
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 68, in rebuild_storage_fd
fd = multiprocessing.reduction.rebuild_handle(df)
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/multiprocessing/reduction.py", line 155, in rebuild_handle
conn = Client(address, authkey=current_process().authkey)
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/multiprocessing/connection.py", line 175, in Client
answer_challenge(c, authkey)
File "/home/xiaozhenzhen/anaconda2/envs/gcn_pytorch/lib/python2.7/multiprocessing/connection.py", line 432, in answer_challenge
message = connection.recv_bytes(256) # reject large message
IOError: [Errno 104] Connection reset by peer

Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f77d1c11650>> ignored

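About gradient backpropagation

Hello, regarding gradient backpropagation: if all nodes are considered instead of only the 1-hop neighbors, will the accuracy drop? It seems that considering more nodes should give higher accuracy, shouldn't it?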

pred will contain nan if k_at_hop is large

Thank you very much for your inspiring work!

As suggested in the paper, "In the testing phase, it is not necessary to keep the same configuration with the training phase.", so setting k_at_hop=[20,5] in test.py is reasonable for fast testing. But the pred and loss seem to become nan if k_at_hop=[200,10]. May I ask whether this phenomenon can be reproduced on your side, and why nan occurs?

Using the code on newer GPUs (Python versions)

Hi! I am trying to use this code in Google Colab, and the latest problem I am facing is that PyTorch was compiled against an old CUDA version. The error I am getting, verbatim, is as follows:

"Found GPU0 Tesla T4 which requires CUDA_VERSION >= 9000 for
optimal performance and fast startup time, but your PyTorch was compiled
with CUDA_VERSION 8000. Please install the correct PyTorch binary
using instructions from http://pytorch.org"
Is there a structured way to compile the code? Can I change the code so that it works with newer PyTorch versions? Will the current pretrained weights still work? Any advice or suggestion would be helpful, or at least please tell me how you were able to run the code. Thanks in advance!

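About the original dataset

Hello, I have a question I would like to ask you.
In the paper you mention that the GCN is trained on the CASIA dataset. Looking at the source code, I found that your data contains 454590 samples, whereas the CASIA dataset contains 494414. Do you have the filtered list?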

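A small bug in testing

During testing, if the test set size and the batch size are such that the last batch contains only one image, an IndexError with the message "invalid index to scalar variable" is raised.
This is caused by line 154 of test.py: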

node_list = node_list.long().squeeze().numpy()
bs = feat.size(0)

In this case the resulting node_list degenerates into a one-dimensional array, which causes the indexing error.

An extra check can be added:

node_list = node_list.long().squeeze().numpy()
bs = feat.size(0)
if bs == 1:
    node_list = np.array([node_list])
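The failure mode can be reproduced with plain NumPy. The sketch below is only an illustration under assumptions: the toy array stands in for a last batch of one sample, each sub-graph is assumed to have more than one node, and `np.atleast_2d` is shown as one possible alternative to the explicit `bs == 1` check, not as the fix adopted in the repository.

```python
import numpy as np

# Toy stand-in for a last batch with one sample: shape (batch=1, nodes=4).
node_list = np.array([[10, 11, 12, 13]])

squeezed = node_list.squeeze()   # shape (4,): the batch axis is gone
row = squeezed[0]                # a NumPy scalar, not a row of node ids
try:
    row[2]
    failed = False
except IndexError:
    failed = True                # "invalid index to scalar variable"

# np.atleast_2d restores the batch axis only when it was squeezed away.
fixed = np.atleast_2d(squeezed)  # shape (1, 4)
print(fixed[0][2])
```

For batches with more than one sample, `squeeze()` leaves a two-dimensional array and `np.atleast_2d` returns it unchanged.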

How to generate "attention aggregation" feature?

Hello, I saw that you proposed a novel feature aggregation method, "Attention Aggregation" (Section 3.3), and you said that the elements of G are generated by a 2-layer MLP. How is this matrix trained? Can you provide more detailed information? Thank you. (I didn't find the related source code; did I miss something?)

About the dataset

Hi, you've done excellent work! Could you please share the IJB-B dataset (including protocols)? Thanks a lot!

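In the bcubed function, according to my verification, the precision and recall variables may be swapped

In the code, axis=0 of the confusion matrix returned by coo_matrix corresponds to the predicted classes.
Incorrect code:
precision = np.sum(cm_norm * (cm / cm.sum(axis=0)))
recall = np.sum(cm_norm * (cm / np.expand_dims(cm.sum(axis=1), 1)))
Correct code:
recall = np.sum(cm_norm * (cm / cm.sum(axis=0)))
precision = np.sum(cm_norm * (cm / np.expand_dims(cm.sum(axis=1), 1)))
The F1 score, however, happens to be correct.
Fortunately, the paper does not report these numbers.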

Here is the analysis. I ran an experiment on the example from the BCubed paper:
[image]
pred = np.array([0,0,0,0,1,1,1,2,2,2,2,2,2,2])
label = np.array([0,0,0,0,0,1,1,2,1,3,4,1,1,1])
cm = coo_matrix(
    (np.ones((14)), (pred, label)),
    shape=(3, 5),
    dtype=int).toarray()
"""
[[4, 0, 0, 0, 0],
[1, 2, 0, 0, 0],
[0, 4, 1, 1, 1]]
"""
np.expand_dims(cm.sum(axis=1), 1)
"""
[[4],
[3],
[7]]
"""
cm / np.expand_dims(cm.sum(axis=1),1)
"""
[[1. , 0. , 0. , 0. , 0. ],
[0.33333333, 0.66666667, 0. , 0. , 0. ],
[0. , 0.57142857, 0.14285714, 0.14285714, 0.14285714]]
"""
cm * cm / np.expand_dims(cm.sum(axis=1),1)
"""
[[4. , 0. , 0. , 0. , 0. ],
[0.33333333, 1.33333333, 0. , 0. , 0. ],
[0. , 2.28571429, 0.14285714, 0.14285714, 0.14285714]]
"""

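np.sum(cm * cm / np.expand_dims(cm.sum(axis=1),1))/cm.sum()

0.5986394557823128

This is the same computation as in your code, but the variable name is wrong:
cm_norm = cm / cm.sum()
recall = np.sum(cm_norm * (cm / np.expand_dims(cm.sum(axis=1), 1)))
This should be the precision.
precision: (4*4/4 + 1*1/3 + 2*2/3 + 3*1*1/7 + 4*4/7)/14 = 0.5986394557823128
[image]
Clearly, the division should be by the total size of each predicted cluster:
np.expand_dims(cm.sum(axis=1), 1)
"""
[[4],
[3],
[7]]
"""
axis=1 is the axis along which precision should be computed.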

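Model test results are not satisfactory

Hello, following the steps in the paper, I sampled 5000 classes from the CASIA dataset (220,000 samples in total) as the training set and used the 512, 1024 and 1845 datasets as test sets. With a model trained using the default parameters of train.py, I cannot reach the performance of the best.ckpt model: the precision of my trained model is below 0.8. Do you have any suggestions?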

Why use the random subset of CASIA?

Hello, I want to know why you use the random subset instead of the whole CASIA?

Additionally, could you provide the list of the random subset?

Thank you very much.

Extracting features from ArcFace

Hi, I have a question regarding the feature extraction, as I cannot reproduce the results with my own preprocessed files. Given IJB-B-512, your checkpoint for CASIA, and a PyTorch implementation of ArcFace, I came up with the following code:

import numpy as np
from tqdm import tqdm

import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torchvision import datasets

from model import Backbone

transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.Resize(size=(112, 112)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    ])

data_path = "../data/IJB-B-512/"
batch_size = 16
num_workers = 16

data = datasets.ImageFolder(data_path, transform=transform)
loader = torch.utils.data.DataLoader(data, 
                                     batch_size=batch_size, 
                                     num_workers=num_workers,
                                     shuffle=True,
                                     pin_memory=True)

model = Backbone(50, 0.6, 'ir_se')
ckpt = torch.load("../pretrained/model_ir_se50.pth")
model.load_state_dict(ckpt)
model.cuda()
model.eval()

features = []
def hook(module, input, output):
    N, C, H, W = output.shape
    output = output.reshape(N, C, -1)
    features.append(output.mean(dim=2).cpu().detach().numpy())

handle = model._modules['body'][23].res_layer[5].fc2.register_forward_hook(hook)
for i_batch, inputs in tqdm(enumerate(loader), total=len(loader)):
    _ = model(inputs[0].cuda())

features = np.concatenate(features)
handle.remove()

Could you please let me know whether my approach makes sense and how it differs from yours, or kindly share your pre-processing module?

features.zip

Hello, why does the Baidu NetDisk link you provided for downloading the features always show "download requested" but never actually start downloading?

IJB-B list

Hi! Could you share the list of 'IJB-B' dataset?

about the features.zip

Thanks for your work.

I would like to know whether the features in features.zip were extracted with ArcFace or with the ResNet-101 you trained yourself.

The dataset that I found was not in .npy format, which confuses me.

Predicted edges are all zeros during training

Hello, when I tried to train the model with your example, I found that the predicted edges all became zero (no edge is predicted to be true). Have you ever encountered this situation?

This usually occurs after about 100 batches of training, with the following arguments.

k_at_hop = [200, 10]
active_connection = 10
batch_size = 16
momentum = 0.9
weight_decay = 1e-5
lr = 1e-5

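About feature extraction

The features in your CASIA.feas.npy are 512-dimensional. You mention that ResNet-101 is used to extract features, but aren't the features extracted by ResNet-101 2048-dimensional? Thanks for your answer.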

About the idea

This work seems to use the insight from "Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition", doesn't it? This is especially evident in the kNN part, and it also uses their label propagation.

About the labels

Does "identities" refer to the number of classes? The first test set, 512.labels, contains more than 512 integers. What do the values in the labels mean?

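About precision and recall: the code seems to compute them the wrong way round

I was a bit confused by this too, so I double-checked it several times. In the bcubed function in the code, reference should be the GT and system should be the prediction. Could the author please check again?
For example, if the GT is [0, 0, 0, 1, 1, 1] and the prediction is [0, 0, 0, 0, 0, 0], the precision should be 0.5 and the recall should be 1, but your code yields a recall of 0.5 and a precision of 1.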
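The expected values in this example can be checked against the B-Cubed definition directly. The sketch below is an independent, deliberately simple implementation (quadratic in the number of samples) written for this sanity check; it is not the bcubed function from the repository, and the function name and argument order are assumptions.

```python
import numpy as np


def bcubed(gt_labels, pred_labels):
    """Return (precision, recall, f_score) following the B-Cubed definition."""
    gt = np.asarray(gt_labels)
    pred = np.asarray(pred_labels)
    # same_gt[i, j] is True when items i and j share a ground-truth identity.
    same_gt = gt[:, None] == gt[None, :]
    # same_pred[i, j] is True when items i and j are in the same predicted cluster.
    same_pred = pred[:, None] == pred[None, :]
    both = np.logical_and(same_gt, same_pred).sum(axis=1).astype(float)
    # Precision: share of an item's predicted cluster that has its identity.
    precision = np.mean(both / same_pred.sum(axis=1))
    # Recall: share of an item's identity that lands in its predicted cluster.
    recall = np.mean(both / same_gt.sum(axis=1))
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


precision, recall, f_score = bcubed([0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 0])
print(precision, recall)  # 0.5 1.0
```

Merging everything into one cluster can never lower recall but dilutes precision, which is why 0.5 and 1 are the values to expect here.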

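About the transformation of the adjacency matrix

Hello, I have two questions about the adjacency matrix transformation in the feeder:
1. GCNs usually add self-loops for renormalization, but your code does not seem to add self-loops. What is the reason for this?
2. In the code, A is transformed via A=A.div(D), but this transformation is not equivalent to D^(-1/2)AD^(-1/2). Is there a particular reason for using A=A.div(D)? If I want to use the D^(-1/2)AD^(-1/2) transformation, is the code below correct?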

D = A.sum(1, keepdim=True)
D_ = torch.diagflat(torch.pow(D,-0.5))
A = torch.mm(D_,torch.mm(A,D_))
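The difference between the two normalizations can be seen on a tiny graph. The sketch below uses NumPy rather than torch and assumes that D holds the row sums of A (so that dividing by D amounts to row normalization, D^(-1) A) and that no node is isolated; the variable names are illustrative.

```python
import numpy as np

# Small undirected toy graph (assumption: no isolated nodes, so degrees > 0).
A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
deg = A.sum(axis=1)

# Row normalization, D^(-1) A: every row is divided by its degree.
A_row = A / deg[:, None]

# Symmetric normalization, D^(-1/2) A D^(-1/2).
d_inv_sqrt = 1.0 / np.sqrt(deg)
A_sym = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

print(A_row.sum(axis=1))           # each row sums to 1
print(np.allclose(A_sym, A_sym.T)) # True: symmetry is preserved
```

Row normalization turns aggregation into a mean over neighbors but is generally not symmetric, while the symmetric form keeps A symmetric and its rows need not sum to 1.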

Why does the first iteration not use the threshold in the test code?

I wonder why the first iteration does not use the threshold in connected_components_constraint(vertex, max_sz) / graph_propagation() in graph.py.

Is the experiment in the paper also based on this setting?

I find that the 'remain' result of the first iteration may be empty ('null'), so the code does not run the next iteration and the final clustering result has nothing to do with the model's predictions.

For the first iteration:
'vertex' contains all the node pairs (links/edges) generated by KNN, which are also the input data of the model,
and the code directly uses all of these links (neighbors = n.links, line 69) to create groups/clusters, much like a BFS, rather than using the scores predicted by the model to filter them. Is that right?

I would be very grateful if you could provide suggestion.
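The concern can be illustrated with a generic connected-components routine. The sketch below is hypothetical and is not the code in graph.py: the function name, the edge and score lists, and the threshold value are made up to show how a BFS over all KNN links ignores the predicted scores, while a thresholded BFS does not.

```python
from collections import defaultdict, deque


def connected_components(edges, scores, threshold=None):
    """Group nodes linked by edges whose score is >= threshold.

    With threshold=None every edge is kept, so the result depends only on
    the links themselves and not on the predicted scores.
    """
    adjacency = defaultdict(set)
    nodes = set()
    for (u, v), s in zip(edges, scores):
        nodes.update((u, v))
        if threshold is None or s >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        queue, component = deque([start]), []
        seen.add(start)
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.append(node)
            for nxt in sorted(adjacency[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components


edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
scores = [0.9, 0.8, 0.1, 0.95]
unfiltered = connected_components(edges, scores)
filtered = connected_components(edges, scores, threshold=0.5)
print(unfiltered)  # [[0, 1, 2, 3, 4]]
print(filtered)    # [[0, 1, 2], [3, 4]]
```

Without a threshold the low-scoring edge (2, 3) still merges the two groups, which is the behavior the question asks about.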

About features of CASIA

Hello, I use another network to extract features from CASIA and use your GCN to train it.

I keep the same parameter setting and find that the accuracy is lower.

I want to ask whether there is anything I need to pay attention to when using new features.

Thank you.

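Unsupervised or supervised?

The code uses labels to compute the loss, so the method should be supervised. Why does the title still use keywords such as "cluster"? And why do the comparison experiments include typical unsupervised methods such as K-means?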

Index out of bounds!

Hello, when building the KNN graph for testing, I used a statement similar to the following:
result, dists = flann.nn(dataset, testset, 201, algorithm="kmeans", branching=32, iterations=7, checks=16), but the dataset and testset I use have different sizes; dataset contains testset.
The following problem occurred during testing:
IndexError: index 6209 is out of bounds for axis 0 with size 3368.
The line that raises the error is: hops[-1].update(set(self.knn_graph[h][1:self.k_at_hop[d]+1]))

Can this problem be solved? Thank you!
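One plausible reading of this error, offered as an assumption rather than a confirmed diagnosis, is that the neighbor indices refer to rows of the larger dataset while the feature file only has as many rows as the test set. A small check like the hypothetical `check_knn_graph` below can flag such rows before running the test script.

```python
import numpy as np


def check_knn_graph(knn_graph, num_features):
    """Return the row numbers whose indices fall outside the feature matrix."""
    knn_graph = np.asarray(knn_graph)
    bad = (knn_graph < 0) | (knn_graph >= num_features)
    return np.flatnonzero(bad.any(axis=1))


# Toy case: 4 test features, but neighbors were searched in a larger set.
knn_graph = np.array([[0, 1, 2],
                      [1, 0, 7],   # 7 does not exist among 4 features
                      [2, 3, 1],
                      [3, 2, 9]])  # neither does 9
bad_rows = check_knn_graph(knn_graph, num_features=4)
print(bad_rows.tolist())  # [1, 3]
```

If rows are flagged, building the KNN graph over exactly the same set of samples as the feature file keeps every index in range.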

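The idea about GCN

Hello, what exactly is the difference between the approach of this paper, or the GCN it uses, and GraphSAGE? Could you explain the specific points of novelty? Thank you.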
