zhengpeng7 / glcnet

[arXiv'21, ICASSP'23] Global-Local Context Network for Person Search.

License: MIT License

Python 99.09% Shell 0.91%

person-search person-reid

glcnet's Introduction

`Global-Local Context Network for Person Search`

This repo is the official implementation of "Global-Local Context Network for Person Search" (ICASSP 2023).

Authors: Jie Qin, Peng Zheng, Yichao Yan, Rong Quan, Xiaogang Cheng, and Bingbing Ni.

[arXiv] [code] [stuff]

Abstract:

Person search aims to jointly localize and identify a query person from natural, uncropped images, which has been actively studied over the past few years. In this paper, we delve into the rich context information globally and locally surrounding the target person, which we refer to as scene and group context, respectively. Unlike previous works that treat the two types of context individually, we exploit them in a unified global-local context network (GLCNet) with the intuitive aim of feature enhancement. Specifically, re-ID embeddings and context features are simultaneously learned in a multi-stage fashion, ultimately leading to enhanced, discriminative features for person search. We conduct the experiments on two person search benchmarks (i.e., CUHK-SYSU and PRW) as well as extend our approach to a more challenging setting (i.e., character search on MovieNet). Extensive experimental results demonstrate the consistent improvement of the proposed GLCNet over the state-of-the-art methods on all three datasets. Our source codes, pre-trained models, and the new dataset are publicly available at: this https URL.
Overall architecture of our GLCNet:

Performance

Method	CUHK-SYSU mAP	CUHK-SYSU top-1	PRW mAP	PRW top-1
OIM	75.5	78.7	21.3	49.4
NAE+	92.1	92.9	44.0	81.1
TCTS	93.9	95.1	46.8	87.5
AlignPS+	94.0	94.5	46.1	82.1
SeqNet	93.8	94.6	46.7	83.4
SeqNet+CBGM	94.8	95.7	47.6	87.6
GLCNet	95.5	96.1	46.7	84.9
GLCNet+CBGM	95.8	96.2	47.8	87.8

Different gallery size on CUHK-SYSU:

Qualitative Results:

Env

conda create -n glc python=3.8 -y && conda activate glc
pip install -r requirements.txt

Data

All relevant data is available in my Google Drive folder.
Set the variable SYS_HOME_DIR in defaults.py to the root path of all your projects. On my machine, the file system is organized as SYS_HOME_DIR/codes/[ps/...], SYS_HOME_DIR/datasets/[ps/...], and SYS_HOME_DIR/weights/[swin/pvt/...].
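
The layout described above can be sketched as follows. This is a minimal sketch, not code from the repo: the helper `project_dirs` and the example root `/data/me` are hypothetical, and only the three top-level folder names (`codes`, `datasets`, `weights`) come from the description above.

```python
from pathlib import PurePosixPath


def project_dirs(sys_home_dir: str) -> dict:
    """Return the three top-level folders described above (hypothetical helper)."""
    root = PurePosixPath(sys_home_dir)
    # Folder names follow the README; what sits below them (ps, swin, pvt, ...)
    # depends on which projects and backbones you use.
    return {name: root / name for name in ("codes", "datasets", "weights")}


dirs = project_dirs("/data/me")  # "/data/me" is a placeholder for SYS_HOME_DIR
for name, path in dirs.items():
    print(f"{name}: {path}")
```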

Train

sh ./run_${DATASET}.sh CUDA_DEVICE

Test

sh ./test_${DATASET}.sh CUDA_DEVICE

Inference

Run demo.py to perform inference on given images. GLCNet runs at 10.3 fps on a single Tesla V100 GPU with a batch size of 3.

Weights

You can download our trained GLCNet models, cuhk_957.pth and prw_469.pth, from my Google Drive folder.

MovieNet-PS

Download the whole MovieNet-PS dataset from our Google Drive or BaiduDisk (25.2 GB, including frames and annotations).
To extend the person search framework to a more challenging setting, i.e., movies, we borrow the character detection and ID annotations from the MovieNet dataset to build MovieNet-PS, and define several training-set levels and gallery sizes in the same way as CUHK-SYSU. MovieNet-PS is stored in exactly the same format and structure as CUHK-SYSU, which should make further research and experiments convenient. You can also download all the movie frames of MovieNet from its official website.

If your network connection is unstable, you can instead download the annotation files and the frame subsets (frames_CS-1.zip ~ frames_CS-6.zip) separately from this Google Drive folder and then combine them.
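
Combining the split archives amounts to extracting each of them into the same target folder. Below is a minimal sketch using only the Python standard library. The function name `extract_frame_archives` and both directory arguments are assumptions for illustration; only the archive naming pattern (`frames_CS-1.zip` ~ `frames_CS-6.zip`) comes from the text above. Whether the archives share a common top-level folder is not stated here, so compare the extracted layout with CUHK-SYSU's before training.

```python
import zipfile
from pathlib import Path


def extract_frame_archives(src_dir, dst_dir, count=6):
    """Extract frames_CS-1.zip ... frames_CS-<count>.zip into one folder."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    extracted = []
    for i in range(1, count + 1):
        archive = src / f"frames_CS-{i}.zip"
        if not archive.exists():
            # Report the gap instead of silently building a partial dataset.
            raise FileNotFoundError(f"missing archive: {archive}")
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dst)  # later archives add to the same folder
        extracted.append(archive.name)
    return extracted
```

A call might look like `extract_frame_archives("downloads", "datasets/ps/movienet")`, where both paths are placeholders for your own download and dataset folders.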

Acknowledgement

Thanks to the solid codebase from SeqNet.

Citation

@article{zheng2021glcnet,
  title={Global-local context network for person search},
  author={Zheng, Peng and Qin, Jie and Yan, Yichao and Liao, Shengcai and Ni, Bingbing and Cheng, Xiaogang and Shao, Ling},
  journal={arXiv preprint arXiv:2112.02500},
  volume={8},
  year={2021}
}

@inproceedings{qin2023movienet,
  title={MovieNet-PS: a large-scale person search dataset in the wild},
  author={Qin, Jie and Zheng, Peng and Yan, Yichao and Quan, Rong and Cheng, Xiaogang and Ni, Bingbing},
  booktitle={ICASSP},
  pages={1--5},
  year={2023},
  organization={IEEE}
}

glcnet's Issues

Difference between mvn.yaml and mvn_pretrain.yaml in config dir

What is the difference?

What should I run to get the results in the paper?

Multiclass classification

How can I change the code to handle multiclass detection? I have a 30-class problem.

Is it even possible?

http://movienet.site is inaccessible. Are there any other ways to access it?

About GLC method

Hello. Thank you for the great work.

I was wondering whether adding your GLC module also improves performance on the PRW and CUHK-SYSU datasets.

Does the performance only improve in MovieNet test dataset?

Pretrained weights

@ZhengPeng7 Hi, thanks for the wonderful work and the codebase. Could you please share the pretrained weight file on Google Drive or OneDrive?
Thanks in advance.

Network Details

Thank you for your excellent work! I have two questions about network details.
1. Scene Context:
For the different persons in one image, how does the Scene Context differ? Is the Scene Context the same for every person, namely the feature output by the last ResNet stage for that image, which becomes a 2048-dimensional vector after passing through the CE module? I assume so, but I would like to confirm it with you.

2. Group Context:
Before the Group CE, where does the 128-dimensional feature vector come from?
Do you merge the features of all positive samples into one 128-dimensional feature? If there are two persons in the image, there are two ROI regions, each with a 256-dimensional feature vector, and you turn the two 256-dimensional vectors into one 128-dimensional vector. With three persons, three 256-dimensional vectors would become one 128-dimensional vector. How exactly is this implemented? Do you concatenate them directly into a 256*N-dimensional feature and then apply a 1x1 convolution to obtain a 128-channel feature?

Problems with run demo.py

When I run demo.py using the following command:

python3 demo.py --cfg configs/prw.yaml --ckpt ckpts/prw_469.pth

a size mismatch error occurs:
Traceback (most recent call last):
  File "demo.py", line 87, in <module>
    main(args)
  File "demo.py", line 58, in main
    resume_from_ckpt(args.ckpt, model)
  File "/home/wbn/GLCNet/utils/utils.py", line 417, in resume_from_ckpt
    model.load_state_dict(my_state_dict, strict=False)
  File "/home/wbn/.conda/envs/glcnet2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SeqNet:
	size mismatch for roi_heads.embedding_head.projectors.feat_res4.0.weight: copying a param with shape torch.Size([256, 1024]) from checkpoint, the shape in current model is torch.Size([128, 2048]).
	size mismatch for roi_heads.embedding_head.projectors.feat_res4.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for roi_heads.embedding_head.projectors.feat_res4.1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for roi_heads.embedding_head.projectors.feat_res4.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for roi_heads.embedding_head.projectors.feat_res4.1.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for roi_heads.embedding_head.projectors.feat_res4.1.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).

I'm sure my PyTorch version is correct, and I don't know why this happened.
Could you help me? Thanks.
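
One way to narrow down an error like this is to list every parameter whose shape differs between the checkpoint and the freshly built model. The sketch below is hypothetical helper code, not part of GLCNet: `find_shape_mismatches` works on plain name-to-shape dictionaries, which in PyTorch can be built with `{k: tuple(v.shape) for k, v in state_dict.items()}`. The example shapes are copied from the traceback above. A mismatch such as `(256, 1024)` versus `(128, 2048)` may indicate that the config used to build the model differs from the one the checkpoint was trained with, but that is a guess and is not confirmed by the authors.

```python
def find_shape_mismatches(ckpt_shapes, model_shapes):
    """Return {name: (ckpt_shape, model_shape)} for keys present in both dicts."""
    return {
        name: (shape, model_shapes[name])
        for name, shape in ckpt_shapes.items()
        if name in model_shapes and model_shapes[name] != shape
    }


# Shapes taken from the error message; a real run would read them from
# the loaded checkpoint and from model.state_dict().
prefix = "roi_heads.embedding_head.projectors.feat_res4"
ckpt = {f"{prefix}.0.weight": (256, 1024), f"{prefix}.0.bias": (256,)}
model = {f"{prefix}.0.weight": (128, 2048), f"{prefix}.0.bias": (128,)}

for name, (have, want) in find_shape_mismatches(ckpt, model).items():
    print(f"{name}: checkpoint {have} vs model {want}")
```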

RuntimeError: Boolean value of Tensor with more than one value is ambiguous

Hello!
When I set self.psn_feat_labelledOnly to True, I get the error "RuntimeError: Boolean value of Tensor with more than one value is ambiguous". The dataset I am using is CUHK. What is causing this?