
SwapNet

Unofficial PyTorch reproduction of SwapNet.

SwapNet example SwapNet diagram

For more than a year, I've put all my efforts into reproducing SwapNet (Raj et al. 2018). Since an official codebase has not been released, by making my implementation public, I hope to contribute to transparency and openness in the Deep Learning community.

Installation

I have only tested this build with Linux! If anyone wants to contribute instructions for Windows/MacOS, be my guest :)

This repository is built with PyTorch. I recommend installing dependencies via conda.

With conda installed run:

cd SwapNet/
conda env create  # creates the conda environment from provided environment.yml
conda activate swapnet

Make sure this environment stays activated while you install the ROI library below!

Install ROI library (required)

I borrow the ROI (region of interest) library from jwyang. This must be installed for this project to run. Essentially we must 1) compile the library, and 2) create a symlink so our project can find the compiled files.

1) Build the ROI library

cd ..  # move out of the SwapNet project
git clone https://github.com/jwyang/faster-rcnn.pytorch.git # clone to a SEPARATE project directory
cd faster-rcnn.pytorch
git checkout pytorch-1.0
pip install -r requirements.txt
cd lib/pycocotools

Important: before continuing, complete the pycocotools build instructions in the faster-rcnn.pytorch repository!

cd ..  # go back to the lib folder
python setup.py build develop

2) Make a symlink back to this repository.

ln -s /path/to/faster-rcnn.pytorch/lib /path/to/swapnet-repo/lib

Note: symlinks on Linux tend to work best when you provide the full path.

Dataset

A dataset directory must initially contain the following:

  • texture/ folder containing the original images. Images may be directly under this folder or in sub directories.

The following must then be added by preprocessing (see the Preprocessing section below):

  • body/ folder containing preprocessed body segmentations
  • cloth/ folder containing preprocessed cloth segmentations
  • rois.csv which contains the regions of interest for texture pooling
  • norm_stats.json which contains mean and standard deviation statistics for normalization
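To sanity-check that a dataset directory is complete before training, a quick check like the following can help (a minimal sketch; the helper name and this check are mine, not part of the repository):

```python
import os

def check_dataset(root):
    """Return a list of missing entries in a SwapNet dataset directory."""
    required_dirs = ["texture", "body", "cloth"]
    required_files = ["rois.csv", "norm_stats.json"]
    missing = [d for d in required_dirs if not os.path.isdir(os.path.join(root, d))]
    missing += [f for f in required_files if not os.path.isfile(os.path.join(root, f))]
    return missing  # an empty list means the dataset is complete

missing = check_dataset("data/deep_fashion")
if missing:
    print("Missing entries:", missing)
```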

Deep Fashion

The dataset cited in the original paper is DeepFashion: In-shop Clothes Retrieval. If you plan to preprocess the images yourself, download the images zip and move the image files under data/deep_fashion/texture.

Alternatively, I've preprocessed the Deep Fashion image dataset already. The full preprocessed dataset can be downloaded here: https://drive.google.com/open?id=1oGE23DCy06zu1cLdzBc4siFPyg4CQrsj. If you want to use your own dataset, please follow the preprocessing instructions below while substituting "deep_fashion" for the name of your dataset.

Otherwise, jump ahead to the Training section.

(Optional) Create Your Own Dataset

If you'd like to take your own pictures, move the data into data/YOUR_DATASET/texture.

Preprocessing

The images must be preprocessed into BODY and CLOTH segmentation representations. These will be input for training and inference.

Body Preprocessing

The original paper cited Unite the People (UP) to obtain body segmentations; however, I ran into trouble installing Caffe to make UP work (probably due to its age). Therefore, I instead use Neural Body Fitting (NBF). My fork of NBF modifies the code to output body segmentations and ROIs in the format that SwapNet requires.

  1. Follow the instructions in my fork. You must follow the instructions under "Setup" and "How to run for SwapNet". Note NBF uses TensorFlow; I suggest using a separate conda environment for NBF's dependencies.

  2. Move the output under data/deep_fashion/body/, and the generated rois.csv file to data/deep_fashion/rois.csv.

Caveats: Neural Body Fitting appears to perform poorly on images that do not show the full body. In addition, the provided model seems to have been trained on only one body type. I'm open to finding better alternatives.

Cloth Preprocessing

The original paper used LIP_SSL. I instead use the implementation from the follow-up paper, LIP_JPPNet. Again, my fork of LIP_JPPNet outputs cloth segmentations in the format required for SwapNet.

  1. Follow the installation instructions in the repository. Then follow the instructions under the "For SwapNet" section.

  2. Move the output under data/deep_fashion/cloth/

Calculate Normalization Statistics

This calculates normalization statistics for the preprocessed body image segmentations, under body/, and original images, under texture/. The cloth segmentations do not need to be processed because they're read as 1-hot encoded labels.
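The label-map-to-one-hot expansion mentioned above can be sketched as follows (a NumPy illustration of the idea; the repository itself works with PyTorch tensors, and this helper is not its actual API):

```python
import numpy as np

def labels_to_onehot(label_map, num_classes):
    """Expand a 2D map of integer labels (H, W) into a one-hot
    encoded array of shape (num_classes, H, W)."""
    onehot = np.eye(num_classes, dtype=np.float32)[label_map]  # (H, W, C)
    return onehot.transpose(2, 0, 1)  # channels-first: (C, H, W)

# A 2x2 label map with 3 classes:
labels = np.array([[0, 1], [2, 1]])
print(labels_to_onehot(labels, 3).shape)  # (3, 2, 2)
```

Storing the flat label map and expanding at runtime trades a small amount of compute for a large reduction in disk usage compared to storing full per-class maps.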

Run the following: python util/calculate_imagedir_stats.py data/deep_fashion/body/ data/deep_fashion/texture/. The output should show up in data/deep_fashion/norm_stats.json.
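Roughly, the script computes per-channel means and standard deviations over the pixels of all images in the given directories. A simplified sketch of the idea (the key names and in-memory input here are assumptions, not the script's actual implementation):

```python
import numpy as np

def compute_norm_stats(images):
    """images: iterable of (H, W, 3) uint8 arrays.
    Returns per-channel mean/std with pixel values scaled to [0, 1]."""
    pixels = np.concatenate([img.reshape(-1, 3) / 255.0 for img in images])
    return {"mean": pixels.mean(axis=0).tolist(),
            "std": pixels.std(axis=0).tolist()}

# Demo: one all-black and one all-white image.
demo = [np.zeros((2, 2, 3), dtype=np.uint8), np.full((2, 2, 3), 255, dtype=np.uint8)]
print(compute_norm_stats(demo))  # mean and std are 0.5 per channel
```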

Training

Training progress can be viewed by opening localhost:8097 in your web browser.
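The dashboard is served by Visdom; if a server is not already running, start one in a separate terminal (standard Visdom invocation):

```shell
# install if needed: pip install visdom
python -m visdom.server  # serves the dashboard at http://localhost:8097
```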

  1. Train warp stage
python train.py --name deep_fashion/warp --model warp --dataroot data/deep_fashion

Sample visualization of warp stage:

warp example

  2. Train texture stage
python train.py --name deep_fashion/texture --model texture --dataroot data/deep_fashion

Below is an example of training progress visualization in Visdom. The texture stage draws the input texture with ROI boundaries (leftmost), the input cloth segmentation (second from left), the generated output (second from right), and the target texture (rightmost).

texture example texture example

Inference

Inference will run the warp stage and texture stage in series.

To run inference on deep fashion, run this command:

python inference.py --checkpoint checkpoints/deep_fashion \
  --dataroot data/deep_fashion \
  --shuffle_data True

--shuffle_data True ensures that bodies are matched with different clothing for the transfer. By default, only 50 images are used for inference; this can be increased by setting --max_dataset_size.

Alternatively, to translate clothes from a specific source to a specific target:

python inference.py --checkpoint checkpoints/deep_fashion \
  --cloth_dir [SOURCE] --texture_dir [SOURCE] --body_dir [TARGET]

Where SOURCE contains the clothing you want to transfer, and TARGET contains the person to place clothing on.

Comparisons to Original SwapNet

Similarities

  • Warp Stage
    • Per-channel random affine augmentation for cloth inputs
    • RGB images for body segmentations
    • Dual U-Net warp architecture
    • Warp loss (cross-entropy plus small adversarial loss)
  • Texture Stage
    • ROI pooling
    • Texture module architecture
    • Most everything else is the same

Differences

  • Warp Stage
    • Body segmentation: Neural Body Fitting instead of Unite the People (note NBF doesn't work well on cropped bodies)
    • I store cloth segmentations as a flat 2D map of numeric labels, then expand them into one-hot encoded tensors at runtime. The original SwapNet used probability maps, but these took up too much storage space (tens of GB) on my computer.
    • Option to train on video data. For video data, the different frames provide additional "augmentation" for input cloth in the warp stage. Use --data_mode video to enable this.
  • Texture Stage
    • Cloth segmentation: LIP_JPPNet instead of LIP_SSL
    • Currently the VGG feature loss prevents convergence; this still needs debugging
  • Overall
    • Hyperparameters, most likely: they were not listed in the original paper, so I had to experiment with these values.
    • Implemented random label smoothing for better GAN stability
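As an illustration of random label smoothing (a common GAN stabilization trick), discriminator targets can be drawn from narrow ranges instead of hard 0/1 values. The ranges and helper below are illustrative, not the repository's actual implementation:

```python
import numpy as np

def smooth_labels(batch_size, real, rng=np.random.default_rng()):
    """Randomly smoothed discriminator targets: 'real' targets are drawn
    from [0.8, 1.0) and 'fake' targets from [0.0, 0.2) instead of hard 0/1,
    which softens the discriminator's gradients and improves stability."""
    low, high = (0.8, 1.0) if real else (0.0, 0.2)
    return rng.uniform(low, high, size=batch_size)
```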

TODO:

  • Copy face data from target to generated output during inference ("we copy the face and hair pixels from B into the result")
  • Match texture quality produced in original paper (likely due to Feature Loss)
  • Test DRAGAN penalty and other advanced GAN losses

Credits

  • The layout of this repository is strongly influenced by Jun-Yan Zhu's pytorch-CycleGAN-and-pix2pix repository, though I've implemented significant changes. Many thanks to their team for open sourcing their code.
  • Many thanks to Amit Raj, the main author of SwapNet, for patiently responding to my questions throughout the year.
  • Many thanks to Khiem Pham for his helpful experiments on the warp stage and contribution to this repository.
  • Thank you Dr. Teng-Sheng Moh for advising this project.
