This code is a PyTorch implementation of RGB-D salient object detection using a cGAN. A two-stream generator network produces a pixel-wise saliency map, and a PatchGAN discriminator learns to determine whether the generated saliency map is real or fake.
- Python 3
- PyTorch 1.4
- Torchvision 0.5.0
- Pillow
- Ubuntu 18.04
- NVIDIA GeForce GTX 1080 Ti
- PyTorch 1.4 Docker image
Clone this repository

```bash
git clone https://github.com/wj1224/rgb-d_salient_object_detection.git
```
Prepare datasets. We use the NLPR and NJUDS2000 RGB-D saliency detection datasets to train the networks. Additionally, the DUT-OMRON, HKU-IS, and MSRA10K RGB saliency datasets are used with synthetic depth maps that were generated using pix2pix.
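As a sketch of the dataset-preparation step, the snippet below pairs RGB images with depth maps that share a filename stem. The `rgb/` and `depth/` folder layout and the function name `pair_rgb_depth` are illustrative assumptions; the actual loader in main.py may organize inputs differently.

```python
import os

def pair_rgb_depth(rgb_dir, depth_dir):
    """Pair RGB images with depth maps that share a filename stem.

    NOTE: this folder layout (rgb/*.jpg next to depth/*.png) is an
    assumption for illustration, not the repository's actual loader.
    """
    # Index depth maps by filename stem for O(1) lookup.
    depth_by_stem = {
        os.path.splitext(f)[0]: os.path.join(depth_dir, f)
        for f in os.listdir(depth_dir)
    }
    pairs = []
    for f in sorted(os.listdir(rgb_dir)):
        stem = os.path.splitext(f)[0]
        if stem in depth_by_stem:  # skip RGB images without a depth map
            pairs.append((os.path.join(rgb_dir, f), depth_by_stem[stem]))
    return pairs
```

For the RGB-only datasets (DUT-OMRON, HKU-IS, MSRA10K), the depth folder would hold the pix2pix-generated synthetic depth maps instead of sensor depth.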
Training

```bash
cd rgb-d_salient_object_detection
python main.py \
  --mode train \
  --input_dir path/to/trainset \
  --output_dir path/to/logs \
  --max_epochs 100 \
  --cuda \
  --[args]
```
See below for the full list of arguments.
Testing

```bash
python main.py \
  --mode test \
  --input_dir path/to/testset \
  --output_dir path/to/output_saliency_maps \
  --checkpoint path/to/saved_logs \
  --n_epochs 100 \
  --cuda
```
More details of args. There are several options when running main.py with --[args]:

- `--mode ["train", "test"]` : Train or test mode selection
- `--input_dir [path/to/imgs]` : Folder containing the input images
- `--output_dir [path/to/output]` : Folder to save logs (training) or output images (testing)
- `--checkpoint [path/to/logs]` : Folder used to resume training or to load a model for testing
- `--n_epochs [100]` : Load the checkpoint of the model trained for "n_epochs" epochs
- `--max_epochs [100]` : Number of epochs in the training step
- `--batch_size [16]` : Size of a mini-batch
- `--cuda` : Use the GPU
- `--threds` : Number of threads for data loading
- `--ngf [64]` : Number of filters in the first convolution layer of the generator
- `--ndf [16]` : Number of filters in the first convolution layer of the discriminator
- `--lr [0.0002]` : Learning rate of the Adam optimizer
- `--beta1 [0.9]` : Momentum of the Adam optimizer
- `--lambda_g [10.0]` : Weight of the CrossEntropyLoss term in the generator loss function
- `--lambda_gp [1.0]` : Weight of the gradient-penalty term in the discriminator loss function
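The options above could be wired up with `argparse` roughly as follows. This is a sketch whose defaults mirror the bracketed values in the list, not the actual parser in main.py.

```python
import argparse

def build_parser():
    # Sketch of an argument parser matching the documented flags;
    # defaults mirror the bracketed values above (an assumption,
    # not copied from the repository's main.py).
    p = argparse.ArgumentParser(description="RGB-D salient object detection")
    p.add_argument("--mode", choices=["train", "test"], required=True)
    p.add_argument("--input_dir", type=str)
    p.add_argument("--output_dir", type=str)
    p.add_argument("--checkpoint", type=str)
    p.add_argument("--n_epochs", type=int, default=100)
    p.add_argument("--max_epochs", type=int, default=100)
    p.add_argument("--batch_size", type=int, default=16)
    p.add_argument("--cuda", action="store_true")
    p.add_argument("--threds", type=int, default=4)  # data-loading threads
    p.add_argument("--ngf", type=int, default=64)
    p.add_argument("--ndf", type=int, default=16)
    p.add_argument("--lr", type=float, default=0.0002)
    p.add_argument("--beta1", type=float, default=0.9)
    p.add_argument("--lambda_g", type=float, default=10.0)
    p.add_argument("--lambda_gp", type=float, default=1.0)
    return p
```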
Pretrained model. If you want to test with the pretrained model, download this and put it in path/to/logs. The model was trained on the datasets described in step 2. You can test the model with the following command.
```bash
python main.py \
  --mode test \
  --input_dir path/to/testset \
  --output_dir path/to/output_saliency_maps \
  --checkpoint path/to/pretrained_model \
  --pretrained \
  --cuda
```
- NLPR testset
- NJUDS2000 testset
F-measure scores, compared against training with no depth maps at all and training without the synthetic depth maps.

| Dataset | Only RGB | RGB + real depth map | RGB + real and synthetic depth maps |
| --- | --- | --- | --- |
| NLPR | 0.7705 | 0.7780 | 0.8103 |
| NJUDS2000 | 0.8014 | 0.8405 | 0.8567 |
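For reference, a minimal F-measure computation in the form commonly used for salient object detection benchmarks (β² = 0.3). The fixed threshold of 0.5 is an assumption for illustration; the evaluation behind the table above may use a different thresholding scheme (e.g. an adaptive threshold).

```python
def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-measure between a saliency map and a binary ground truth.

    pred: flat sequence of saliency values in [0, 1]
    gt:   flat sequence of 0/1 ground-truth labels
    beta_sq = 0.3 is the weighting conventionally used in salient
    object detection; the fixed threshold is an assumption here.
    """
    binary = [1 if p >= threshold else 0 for p in pred]
    tp = sum(b and g for b, g in zip(binary, gt))          # true positives
    fp = sum(b and not g for b, g in zip(binary, gt))      # false positives
    fn = sum((not b) and g for b, g in zip(binary, gt))    # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```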
Please see this.