This project forked from microsoft/semantics-aligned-representation-learning-for-person-re-identification

This is an implementation of the AAAI'20 paper "Semantics-Aligned Representation Learning for Person Re-identification". We leverage dense semantics to address both the spatial misalignment and semantics misalignment challenges in person re-identification.

License: MIT License

Introduction

This repository holds the code and methods for the following AAAI-2020 paper: Semantics-Aligned Representation Learning for Person Re-identification.

Person re-identification (reID) aims to match person images to retrieve the ones with the same identity. Note that this work targets applications such as finding lost children and customer density analysis in retail stores. Person reID is a challenging task, as the images to be matched are generally semantically misaligned due to the diversity of human poses and capture viewpoints, the incompleteness of the visible bodies (due to occlusion), etc.

We propose a framework that drives the reID network to learn semantics-aligned feature representations through delicate supervision designs. Specifically, we build a Semantics Aligning Network (SAN), which consists of a base network as the encoder (SA-Enc) for reID and a decoder (SA-Dec) for reconstructing/regressing the densely semantics-aligned full texture image. We jointly train the SAN under the supervision of person re-identification and aligned texture generation. Moreover, at the decoder, besides the reconstruction loss, we add Triplet ReID constraints over the feature maps as perceptual losses. The decoder is discarded at inference, so our scheme is computationally efficient. Our design significantly outperforms the baseline and achieves state-of-the-art performance.
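The joint training objective described above can be written schematically as a weighted sum. This is only a sketch: the symbols and the weights $\lambda_{\mathrm{rec}}$ and $\lambda_{\mathrm{TR}}$ are notation assumed here for illustration, and their values are not stated in this README.

```latex
\mathcal{L}
  \;=\; \mathcal{L}_{\mathrm{reID}}
  \;+\; \lambda_{\mathrm{rec}} \, \mathcal{L}_{\mathrm{rec}}
  \;+\; \lambda_{\mathrm{TR}} \, \mathcal{L}_{\mathrm{TR}}
```

Here $\mathcal{L}_{\mathrm{reID}}$ stands for the reID supervision applied to the SA-Enc feature, $\mathcal{L}_{\mathrm{rec}}$ for the reconstruction loss between the SA-Dec output and the target texture image, and $\mathcal{L}_{\mathrm{TR}}$ for the Triplet ReID constraints on the decoder feature maps. Only the terms attached to the SA-Dec disappear at inference, since the decoder is discarded.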

Figure 1: Illustration of the proposed Semantics Aligning Network (SAN). It consists of a base network as the encoder (SA-Enc) and a decoder sub-network (SA-Dec). The reID feature vector is obtained by average pooling the feature map of the SA-Enc and is followed by the reID losses. To encourage the encoder to learn semantically aligned features, the SA-Dec follows the encoder and regresses the densely semantically aligned full texture image under pseudo groundtruth supervision. At inference, the SA-Dec is discarded.

Synthesized Paired-Image-Texture Dataset (PIT Dataset)

To train the SAN-PG (the simplified SAN described under Pseudo Groundtruth Texture Images Generation), we synthesize a Paired-Image-Texture dataset (PIT dataset) based on the SURREAL dataset. It provides image pairs, i.e., a person image and its texture image. The texture image stores the RGB texture of the full 3D surface of the person. In particular, we use the 929 raster-scanned texture maps provided by the SURREAL dataset to generate the image pairs. In SURREAL, all faces in the texture images are replaced by an average face of either a man or a woman. We generate 9,290 different meshes of diverse poses/shapes/viewpoints. We assign 10 different meshes to each texture map and render these 3D meshes with the texture image, which yields 9,290 different synthesized (person image, texture image) pairs in total. To simulate real-world scenes, the background images for rendering are randomly sampled from the COCO dataset. Each synthetic person image is centered on a person and has a resolution of 256x128. The resolution of the texture images is 256x256. The PIT dataset can be downloaded from here.
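The pairing arithmetic above (929 texture maps, 10 meshes each, 9,290 pairs) can be illustrated with a minimal sketch. The function name assign_meshes, the integer ids, and the random assignment are assumptions made for illustration; the README does not say how meshes are matched to textures, and no rendering is done here.

```python
import random

NUM_TEXTURES = 929        # texture maps taken from SURREAL (stated in the text)
MESHES_PER_TEXTURE = 10   # meshes assigned to each texture map (stated in the text)


def assign_meshes(num_textures, meshes_per_texture, seed=0):
    """Return (texture_id, mesh_id) pairs in which every mesh is used exactly once."""
    num_meshes = num_textures * meshes_per_texture
    mesh_ids = list(range(num_meshes))
    # Assumption: meshes are assigned to textures at random.
    random.Random(seed).shuffle(mesh_ids)
    pairs = []
    for texture_id in range(num_textures):
        start = texture_id * meshes_per_texture
        for mesh_id in mesh_ids[start:start + meshes_per_texture]:
            pairs.append((texture_id, mesh_id))
    return pairs


pairs = assign_meshes(NUM_TEXTURES, MESHES_PER_TEXTURE)
print(len(pairs))  # 9290
```

Each pair would then correspond to one rendered person image together with the texture image used to render it.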

Figure 2: Examples of texture images (first row) and the corresponding synthesized person images with different poses, viewpoints, and backgrounds (second row). A texture image represents the full texture of the 3D human surface in a surface-based canonical coordinate system (UV space). Each position (u,v) corresponds to a unique semantic identity. For person images of different persons/poses/viewpoints (in the second row), their corresponding texture images are densely semantically aligned.

Installation

  1. Git clone this repo.
  2. Install the dependencies with pip install -r requirements.txt (if necessary).
  3. To install the Cython-based evaluation toolbox, cd to torchreid/eval_cylib and run make. This generates eval_metrics_cy.so in the same folder. Run python test_cython.py to check that the toolbox was installed successfully. (credit to luzai)

ReID Dataset Preparation

Here we use the CUHK03 dataset as an example for description. See torchreid/datasets/__init__.py for details. The data managers of image reID are implemented in torchreid/data_manager.py.

  1. Create a folder named cuhk03/ under /YOUR_DATASET_PATH/.
  2. Download dataset to data/cuhk03/ from http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html and extract cuhk03_release.zip, so you will have data/cuhk03/cuhk03_release.
  3. Download the train/test split protocol from person-re-ranking. What you need are cuhk03_new_protocol_config_detected.mat and cuhk03_new_protocol_config_labeled.mat. Put the two mat files under data/cuhk03. Finally, the data structure would look like:
cuhk03/
    cuhk03_release/
    cuhk03_new_protocol_config_detected.mat
    cuhk03_new_protocol_config_labeled.mat
    ...
  4. Use -d cuhk03 when running the training code. In the default mode, we use the new split protocol (767/700). In addition, here we use the labeled mode: specify --cuhk03-labeled to train and test on labeled images.
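Before training, the folder layout from the steps above can be checked with a short script. This is a minimal sketch: the helper name missing_cuhk03_entries is hypothetical and not part of the repository, and it only tests that the three entries listed above exist.

```python
from pathlib import Path

# Entries expected under <dataset_root>/cuhk03/, as listed in the layout above.
EXPECTED = [
    "cuhk03_release",
    "cuhk03_new_protocol_config_detected.mat",
    "cuhk03_new_protocol_config_labeled.mat",
]


def missing_cuhk03_entries(dataset_root):
    """Return the expected entries that are absent from <dataset_root>/cuhk03/."""
    base = Path(dataset_root) / "cuhk03"
    return [name for name in EXPECTED if not (base / name).exists()]


if __name__ == "__main__":
    # Replace the placeholder with your own dataset path.
    missing = missing_cuhk03_entries("/YOUR_DATASET_PATH")
    print("missing:", missing if missing else "nothing")
```

An empty list means the three entries are in place; it does not validate the contents of the files.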

Pseudo Groundtruth Texture Images Generation

We train a network to generate pseudo groundtruth texture images for any given input person image. For simplicity, we reuse a simplified SAN (i.e., SAN-PG), which consists of the SA-Enc and SA-Dec but is trained with only the reconstruction loss. We train the SAN-PG on our synthesized PIT dataset. The SAN-PG model is then used to generate pseudo groundtruth texture images for the reID datasets.

Here we provide the pre-trained weights for SAN-PG and the corresponding pseudo texture image generation script, generate_texture.py. You can generate the pseudo texture images for your own person images by running:

python generate_texture.py -m /DOWNLOADED_SAN-PG_WEIGHTS -i example_results/input -o example_results/texture

For convenience, we also provide our generated pseudo groundtruth texture images for CUHK03 (Labeled), that is texture_cuhk03_labeled.

  • Place these generated pseudo groundtruth texture images of the CUHK03 dataset under /YOUR_DATASET_PATH/cuhk03/.
  • Finally, the data structure would look like
cuhk03/
    cuhk03_release/
    cuhk03_new_protocol_config_detected.mat
    cuhk03_new_protocol_config_labeled.mat
    texture_cuhk03_labeled
    ...

Train and Evaluation

python main.py \
--root DATASET_PATH \
-s cuhk03 \
-t cuhk03 \
--height 256 \
--width 128 \
--optim amsgrad \
--label-smooth \
--lr 8e-04 \
--max-epoch 300 \
--stepsize 40 80 120 160 200 240 280 \
--train-batch-size 64 \
--test-batch-size 100 \
-a resnet50_fc512 \
--save-dir SAVE_PATH \
--gpu-devices 0 \
--train-sampler RandomIdentitySampler \
--warm-up-epoch 20 \
--cuhk03-labeled \
--eval-freq 80
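
The same invocation can also be assembled as an argument list in Python, which avoids line-continuation characters altogether. This is a minimal sketch: the function name build_train_command is hypothetical, the flags and values are copied from the command above, and DATASET_PATH and SAVE_PATH remain placeholders to be replaced.

```python
import shlex


def build_train_command(root="DATASET_PATH", save_dir="SAVE_PATH"):
    """Return the training command above as a list of arguments."""
    return [
        "python", "main.py",
        "--root", root,
        "-s", "cuhk03",
        "-t", "cuhk03",
        "--height", "256",
        "--width", "128",
        "--optim", "amsgrad",
        "--label-smooth",
        "--lr", "8e-04",
        "--max-epoch", "300",
        "--stepsize", "40", "80", "120", "160", "200", "240", "280",
        "--train-batch-size", "64",
        "--test-batch-size", "100",
        "-a", "resnet50_fc512",
        "--save-dir", save_dir,
        "--gpu-devices", "0",
        "--train-sampler", "RandomIdentitySampler",
        "--warm-up-epoch", "20",
        "--cuhk03-labeled",
        "--eval-freq", "80",
    ]


command = build_train_command()
# Print a shell-quoted, single-line version of the command (Python 3.8+).
print(shlex.join(command))
```

To launch training from Python, the list can be passed to subprocess.run(command, check=True) from the repository root.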

Reference

If you find our technique and repo useful, please cite our paper. Thanks!

@article{jin2020semantics,
  title={Semantics-aligned representation learning for person re-identification},
  author={Jin, Xin and Lan, Cuiling and Zeng, Wenjun and Wei, Guoqiang and Chen, Zhibo},
  journal={AAAI},
  year={2020}
}

Microsoft Open Source Code of Conduct: https://opensource.microsoft.com/codeofconduct

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
