
pytorch-cyclegan-and-pix2pix's Introduction




CycleGAN and pix2pix in PyTorch

New: Please check out the img2img-turbo repo, which includes both pix2pix-turbo and CycleGAN-Turbo. Our new one-step image-to-image translation methods support both paired and unpaired training and produce better results by leveraging the pre-trained StableDiffusion-Turbo model. The inference time for a 512x512 image is 0.29 sec on an A6000 and 0.11 sec on an A100.

Please check out contrastive-unpaired-translation (CUT), our new unpaired image-to-image translation model that enables fast and memory-efficient training.

We provide PyTorch implementations for both unpaired and paired image-to-image translation.

The code was written by Jun-Yan Zhu and Taesung Park, and supported by Tongzhou Wang.

This PyTorch implementation produces results comparable to or better than our original Torch software. If you would like to reproduce the same results as in the papers, check out the original CycleGAN Torch and pix2pix Torch code in Lua/Torch.

Note: The current software works well with PyTorch 1.4. Check out the older branch that supports PyTorch 0.1-0.3.

You may find useful information in training/test tips and frequently asked questions. To implement custom models and datasets, check out our templates. To help users better understand and adapt our codebase, we provide an overview of the code structure of this repository.

CycleGAN: Project | Paper | Torch | Tensorflow Core Tutorial | PyTorch Colab

Pix2pix: Project | Paper | Torch | Tensorflow Core Tutorial | PyTorch Colab

EdgesCats Demo | pix2pix-tensorflow | by Christopher Hesse

If you use this code for your research, please cite:

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.
Jun-Yan Zhu*, Taesung Park*, Phillip Isola, Alexei A. Efros. In ICCV 2017. (* equal contributions) [Bibtex]

Image-to-Image Translation with Conditional Adversarial Networks.
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros. In CVPR 2017. [Bibtex]

Talks and Course

pix2pix slides: keynote | pdf, CycleGAN slides: pptx | pdf

CycleGAN course assignment code and handout designed by Prof. Roger Grosse for CSC321 "Intro to Neural Networks and Machine Learning" at the University of Toronto. Please contact the instructor if you would like to adopt it in your course.

Colab Notebook

TensorFlow Core CycleGAN Tutorial: Google Colab | Code

TensorFlow Core pix2pix Tutorial: Google Colab | Code

PyTorch Colab notebook: CycleGAN and pix2pix

ZeroCostDL4Mic Colab notebook: CycleGAN and pix2pix

Other implementations

CycleGAN

[Tensorflow] (by Harry Yang), [Tensorflow] (by Archit Rathore), [Tensorflow] (by Van Huy), [Tensorflow] (by Xiaowei Hu), [Tensorflow2] (by Zhenliang He), [TensorLayer1.0] (by luoxier), [TensorLayer2.0] (by zsdonghao), [Chainer] (by Yanghua Jin), [Minimal PyTorch] (by yunjey), [Mxnet] (by Ldpe2G), [lasagne/Keras] (by tjwei), [Keras] (by Simon Karlsson), [OneFlow] (by Ldpe2G)

pix2pix

[Tensorflow] (by Christopher Hesse), [Tensorflow] (by Eyyüb Sariu), [Tensorflow (face2face)] (by Dat Tran), [Tensorflow (film)] (by Arthur Juliani), [Tensorflow (zi2zi)] (by Yuchen Tian), [Chainer] (by mattya), [tf/torch/keras/lasagne] (by tjwei), [Pytorch] (by taey16)

Prerequisites

  • Linux or macOS
  • Python 3
  • CPU or NVIDIA GPU + CUDA CuDNN

Getting Started

Installation

  • Clone this repo:
git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
cd pytorch-CycleGAN-and-pix2pix
  • Install PyTorch 0.4+ and other dependencies (e.g., torchvision, visdom, and dominate).
    • For pip users, please type the command pip install -r requirements.txt.
    • For Conda users, you can create a new Conda environment using conda env create -f environment.yml.
    • For Docker users, we provide the pre-built Docker image and Dockerfile. Please refer to our Docker page.
    • For Repl users, please click Run on Repl.it.

CycleGAN train/test

  • Download a CycleGAN dataset (e.g. maps):
bash ./datasets/download_cyclegan_dataset.sh maps
  • To view training results and loss plots, run python -m visdom.server and click the URL http://localhost:8097.
  • To log training progress and test images to a W&B dashboard, set the --use_wandb flag with the train and test scripts.
  • Train a model:
#!./scripts/train_cyclegan.sh
python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan

To see more intermediate results, check out ./checkpoints/maps_cyclegan/web/index.html.

  • Test the model:
#!./scripts/test_cyclegan.sh
python test.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan
  • The test results will be saved to an HTML file here: ./results/maps_cyclegan/latest_test/index.html.

pix2pix train/test

  • Download a pix2pix dataset (e.g., facades):
bash ./datasets/download_pix2pix_dataset.sh facades
  • To view training results and loss plots, run python -m visdom.server and click the URL http://localhost:8097.
  • To log training progress and test images to a W&B dashboard, set the --use_wandb flag with the train and test scripts.
  • Train a model:
#!./scripts/train_pix2pix.sh
python train.py --dataroot ./datasets/facades --name facades_pix2pix --model pix2pix --direction BtoA

To see more intermediate results, check out ./checkpoints/facades_pix2pix/web/index.html.

  • Test the model (bash ./scripts/test_pix2pix.sh):
#!./scripts/test_pix2pix.sh
python test.py --dataroot ./datasets/facades --name facades_pix2pix --model pix2pix --direction BtoA
  • The test results will be saved to an HTML file here: ./results/facades_pix2pix/test_latest/index.html. You can find more scripts in the scripts directory.
  • To train and test pix2pix-based colorization models, please add --model colorization and --dataset_mode colorization. See our training tips for more details.

Apply a pre-trained model (CycleGAN)

  • You can download a pretrained model (e.g. horse2zebra) with the following script:
bash ./scripts/download_cyclegan_model.sh horse2zebra
  • The pretrained model is saved at ./checkpoints/{name}_pretrained/latest_net_G.pth. Check here for all the available CycleGAN models.
  • To test the model, you also need to download the horse2zebra dataset:
bash ./datasets/download_cyclegan_dataset.sh horse2zebra
  • Then generate the results using
python test.py --dataroot datasets/horse2zebra/testA --name horse2zebra_pretrained --model test --no_dropout
  • The option --model test is used for generating CycleGAN results on one side only. This option automatically sets --dataset_mode single, which loads images from only one set. In contrast, using --model cycle_gan requires loading and generating results in both directions, which is sometimes unnecessary. The results will be saved at ./results/. Use --results_dir {directory_path_to_save_result} to specify the results directory. A minimal loading sketch follows this list.

  • For pix2pix and your own models, you need to explicitly specify --netG, --norm, --no_dropout to match the generator architecture of the trained model. See this FAQ for more details.
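For reference, here is a minimal sketch of loading a downloaded generator and running it on a single image. This is an illustration under assumptions, not the supported path (test.py remains the official route); the define_G call below matches recent versions of models/networks.py, so verify the signature against your checkout:

import torch
import torchvision.transforms as transforms
from PIL import Image
from models import networks

# Assumed signature; check models/networks.py in your version of the repo.
netG = networks.define_G(3, 3, 64, 'resnet_9blocks', norm='instance', use_dropout=False)
netG.load_state_dict(torch.load('./checkpoints/horse2zebra_pretrained/latest_net_G.pth', map_location='cpu'))
netG.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # map pixels to [-1, 1], as in training
])
x = preprocess(Image.open('horse.jpg').convert('RGB')).unsqueeze(0)  # 'horse.jpg' is a placeholder
with torch.no_grad():
    fake = netG(x)                                  # generator ends in Tanh, so output is in [-1, 1]
fake = (fake.squeeze(0) * 0.5 + 0.5).clamp(0, 1)    # back to [0, 1] for saving or display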

Apply a pre-trained model (pix2pix)

Download a pre-trained model with ./scripts/download_pix2pix_model.sh.

  • Check here for all the available pix2pix models. For example, if you would like to download the label2photo model on the Facades dataset:
bash ./scripts/download_pix2pix_model.sh facades_label2photo
  • Download the pix2pix facades datasets:
bash ./datasets/download_pix2pix_dataset.sh facades
  • Then generate the results using
python test.py --dataroot ./datasets/facades/ --direction BtoA --model pix2pix --name facades_label2photo_pretrained
  • Note that we specified --direction BtoA, since the Facades dataset's A-to-B direction is photos to labels.

  • If you would like to apply a pre-trained model to a collection of input images (rather than image pairs), please use the --model test option. See ./scripts/test_single.sh for how to apply a model to facade label maps (stored in the directory facades/testB).

  • See a list of currently available models at ./scripts/download_pix2pix_model.sh

Docker

We provide the pre-built Docker image and Dockerfile that can run this code repo. See docker.

Datasets

Download pix2pix/CycleGAN datasets and create your own datasets.

Training/Test Tips

Best practice for training and testing your models.

Frequently Asked Questions

Before you post a new question, please first look at the above Q & A and existing GitHub issues.

Custom Model and Dataset

If you plan to implement custom models and datasets for your new applications, we provide a dataset template and a model template as a starting point.

To help users better understand and use our code, we briefly overview the functionality and implementation of each package and each module.

Pull Request

You are always welcome to contribute to this repository by sending a pull request. Please run flake8 --ignore E501 . and python ./scripts/test_before_push.py before you commit the code. Please also update the code structure overview accordingly if you add or remove files.

Citation

If you use this code for your research, please cite our papers.

@inproceedings{CycleGAN2017,
  title={Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
  author={Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
  booktitle={Computer Vision (ICCV), 2017 IEEE International Conference on},
  year={2017}
}


@inproceedings{isola2017image,
  title={Image-to-Image Translation with Conditional Adversarial Networks},
  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
  booktitle={Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on},
  year={2017}
}

Other Languages

Spanish

Related Projects

contrastive-unpaired-translation (CUT)
CycleGAN-Torch | pix2pix-Torch | pix2pixHD | BicycleGAN | vid2vid | SPADE/GauGAN
iGAN | GAN Dissection | GAN Paint

Cat Paper Collection

If you love cats, and love reading cool graphics, vision, and learning papers, please check out the Cat Paper Collection.

Acknowledgments

Our code is inspired by pytorch-DCGAN.


pytorch-cyclegan-and-pix2pix's Issues

RuntimeError: cuda runtime error (2) : out of memory at .. THCStorage.cu:66

I am trying to train the pix2pix model with the facades dataset, just like in the tutorial, but I am getting the out-of-memory error below.

My computer has a GTX Titan 6GB. Is that enough?

THCudaCheck FAIL file=/py/conda-bld/pytorch_1493676237139/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory

Traceback (most recent call last):
  File "train.py", line 14, in <module>
    model = create_model(opt)
  File "/users/taitien.doan/GANs/pix2pix/pytorch-CycleGAN-and-pix2pix/models/models.py", line 18, in create_model
    model.initialize(opt)
  File "/users/taitien.doan/GANs/pix2pix/pytorch-CycleGAN-and-pix2pix/models/pix2pix_model.py", line 26, in initialize
    opt.which_model_netG, opt.norm, opt.use_dropout, self.gpu_ids)
  File "/users/taitien.doan/GANs/pix2pix/pytorch-CycleGAN-and-pix2pix/models/networks.py", line 46, in define_G
    netG.cuda(device_id=gpu_ids[0])
  File "/users/taitien.doan/anaconda3/envs/pix2pixlua/lib/python2.7/site-packages/torch/nn/modules/module.py", line 147, in cuda
    return self._apply(lambda t: t.cuda(device_id))
  [repeated "module.py", line 118, in _apply frames omitted]
  File "/users/taitien.doan/anaconda3/envs/pix2pixlua/lib/python2.7/site-packages/torch/nn/modules/module.py", line 124, in _apply
    param.data = fn(param.data)
  File "/users/taitien.doan/anaconda3/envs/pix2pixlua/lib/python2.7/site-packages/torch/nn/modules/module.py", line 147, in <lambda>
    return self._apply(lambda t: t.cuda(device_id))
  File "/users/taitien.doan/anaconda3/envs/pix2pixlua/lib/python2.7/site-packages/torch/_utils.py", line 65, in _cuda
    return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (2) : out of memory at /py/conda-bld/pytorch_1493676237139/work/torch/lib/THC/generic/THCStorage.cu:66

Question: PatchGAN Discriminator

Hi there.
I was investigating your CycleGAN paper and code, and it looks like the discriminator you've implemented is just a conv net, not the PatchGAN mentioned in the paper.
Maybe I've missed something. Could you point me to where the processing of 70x70 patches happens?
Thanks in advance!

Viewing loss plot of previous result

Sorry if these questions are very basic; I am a student and new to this.

Is there a way to retrieve the loss plot or final training accuracy of previous runs?

Is Ctrl+C the correct way to stop training?

Additionally, we tried to continue a run right from where we left off, using --continue_train, but it did not seem to work. Is there something else we're missing?

CPU only

I went into the options folder and changed base_options to 1, but it is still looking for an NVIDIA chip I do not have. Is it possible to run CPU only?
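A note for readers: in recent versions of this repo, CPU mode is selected with a flag rather than by editing base_options; passing --gpu_ids -1 makes the GPU id list empty, so everything stays on the CPU. For example:

python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan --gpu_ids -1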

Question: Regarding the unsupported operand type error

Hi.
I've been using the Torch-based CycleGAN and pix2pix code for a while, and I think it's great work.
The paper is also astonishing.

Now I want to move to the PyTorch-based CycleGAN/pix2pix code,
but I hit an error in my environment.

My environment is:
  • Ubuntu 14.04
  • Python 2.7
  • CUDA 8

and the error message is attached below.

As far as I remember, the PyTorch-based code worked OK in my environment before.
I guess this problem might be caused by my environment settings and it might be fixed easily.
But I don't have any clue to fix this.

Could you give any tips for this issue?

<Error message>

python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan


...
...
...
model [CycleGANModel] was created
create web directory ./checkpoints/maps_cyclegan/web...
Traceback (most recent call last):
  File "train.py", line 20, in <module>
    for i, data in enumerate(dataset):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 212, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 239, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 41, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/media/illusion/ML_DATA_SSD_M550/pytorch-CycleGAN-and-pix2pix/data/unaligned_dataset.py", line 46, in __getitem__
    A_img = self.transform(A_img)
  File "/usr/local/lib/python2.7/dist-packages/torchvision/transforms.py", line 29, in __call__
    img = t(img)
  File "/usr/local/lib/python2.7/dist-packages/torchvision/transforms.py", line 139, in __call__
    ow = int(self.size * w / h)
TypeError: unsupported operand type(s) for /: 'list' and 'int'

multiple forward passes but one backward call for updating G?

I saw the following code in the cycle_gan_model backward_G method

        # GAN loss
        # D_A(G_A(A))
        self.fake_B = self.netG_A.forward(self.real_A)
        pred_fake = self.netD_A.forward(self.fake_B)
        self.loss_G_A = self.criterionGAN(pred_fake, True)
        # D_B(G_B(B))
        self.fake_A = self.netG_B.forward(self.real_B)
        pred_fake = self.netD_B.forward(self.fake_A)
        self.loss_G_B = self.criterionGAN(pred_fake, True)
        # Forward cycle loss
        self.rec_A = self.netG_B.forward(self.fake_B)
        self.loss_cycle_A = self.criterionCycle(self.rec_A, self.real_A) * lambda_A
        # Backward cycle loss
        self.rec_B = self.netG_A.forward(self.fake_A)
        self.loss_cycle_B = self.criterionCycle(self.rec_B, self.real_B) * lambda_B
        # combined loss
        self.loss_G = self.loss_G_A + self.loss_G_B + self.loss_cycle_A + self.loss_cycle_B + self.loss_idt_A + self.loss_idt_B
        self.loss_G.backward()

The way I see it, G_A and G_B each make three forward passes: twice accepting real data and once accepting fake data.
In TensorFlow (I think) the backward pass is always computed w.r.t. the last input data. In that case the backpropagation of loss_G would be wrong, and one should instead run the backward pass three times, each immediately following its corresponding forward pass.
I assume this is somehow taken care of in PyTorch. But how does the model know w.r.t. which input data it should compute the gradients?
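For readers with the same question, the answer lies in how autograd records computation: every forward pass builds its own graph, and all of the graphs reference the generator's parameters, so a single backward() on the summed loss differentiates each term w.r.t. the input that actually produced it and accumulates the gradients. A minimal standalone sketch (generic PyTorch, not repo code):

import torch

G = torch.nn.Linear(2, 2)
x1, x2 = torch.randn(1, 2), torch.randn(1, 2)

# Two forward passes record two separate graphs that share G's parameters.
y1, y2 = G(x1), G(x2)
loss = y1.sum() + y2.sum()

# One backward() walks both recorded graphs and accumulates gradients into
# G.weight.grad and G.bias.grad; nothing is overwritten by the later forward.
loss.backward()
print(G.weight.grad)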

AssertionError: Torch not compiled with CUDA enabled

When I trained a model, I got the error below. I'm using a MacBook Pro. Thanks.

  File "/usr/local/lib/python2.7/site-packages/torch/cuda/__init__.py", line 277, in __new__
    _lazy_init()
  File "/usr/local/lib/python2.7/site-packages/torch/cuda/__init__.py", line 89, in _lazy_init
    _check_driver()
  File "/usr/local/lib/python2.7/site-packages/torch/cuda/__init__.py", line 56, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

NaNs during CycleGAN training

I have run approximately a dozen test runs (using train.py) on 2 datasets (maps, and my own custom dataset trying to convert Synthia to Cityscapes). Every run so far gives NaNs after a couple of epochs, sometimes after more than 70 epochs, sometimes after only a handful. Until the NaNs appear, actual learning does seem to happen, as evidenced e.g. by looking at the transformed images over epoch number. I have also played with various learning rates, but even at a pretty low lr the NaNs eventually occur.

My question: Is this something others have also observed? Second, in case this is "normal", e.g. due to the difficulties of training GANs (min-max), what would be the critical params to vary to eventually keep training from breaking down?

Unable to get as good result as in torch

Hi,
I ran your code on horse2zebra, with loadSize, fineSize, and which_model_netG changed to match the Torch version. However, I couldn't get good results.

Your PyTorch model does have some differences from the Torch one, like different padding and a different training strategy.

I'm wondering if you see similar problems.

Scale transform

Just want to let people know that for me, the scale transform was making the program crash. I worked around it by replacing the list with a single value. I didn't open a pull request, as I might be the only one who experienced this problem.

https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/blob/master/data/unaligned_dataset.py

Line 26
Before: transform_list.append(transforms.Scale(osize, Image.BICUBIC))
After: transform_list.append(transforms.Scale(opt.loadSize, Image.BICUBIC))

Out of memory when training my own datasets

I want to train on my own dataset with ~4800 training images, each 512×512. No matter whether I set --loadSize (and --fineSize) to 512, 256, or 128, the program runs out of memory on an NVIDIA GTX 1080 (8GB GPU memory).

I'm new to PyTorch, and I wonder whether this is caused by PyTorch or whether my GPU memory is simply insufficient for your code.

blurred images with my dataset

Hi,
first of all, you did a very good job!
I am trying to get realistic airplane images from images of CAD models.
When I tried it with 2000 images of CAD models and 2000 images of real airplanes, these are the results:

CAD model: [image]
Real image: [image]
Result: [image]

Do you think I need more data? More iterations? Or are these the best results I should expect, given the blur effect?

a small mistake?

In data/unaligned_dataset.py, line 34, I found a small mistake:

  transform_list += [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]

I think the "+=" should be changed to "="?

Model parallelism

Hi,
I am trying to generate HD images, but the results are not so good; I guess it is because I am training the models using lower-resolution images (256x256).
Unfortunately, if I try to train these networks using higher-resolution images, I run out of memory. So I was wondering whether it is possible to split the networks over multiple GPUs and run the model across multiple devices.

Thanks

Out of memory?

I trained CycleGAN with an NVIDIA Tesla K80 GPU, Ubuntu, batchSize=1,
but I got an "out of memory" error.
Have I missed anything? How much memory does this model use?

Edit: I tested the same thing on another machine with an NVIDIA Titan X, Ubuntu, batchSize=1, and got the same error.

I ran:
python train.py --dataroot ./datasets/horse2zebra --name horse2zebra_cyclegan --model cycle_gan

The messages I got:

---------------Options------------------
batchSize: 1
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
dataroot: ./datasets/horse2zebra
dataset_mode: unaligned
display_freq: 100
display_id: 1
display_winsize: 256
fineSize: 256
gpu_ids: [0]
identity: 0.0
input_nc: 3
isTrain: True
lambda_A: 10.0
lambda_B: 10.0
loadSize: 286
lr: 0.0002
max_dataset_size: inf
model: cycle_gan
nThreads: 1
n_layers_D: 3
name: horse2zebra_cyclegan
ndf: 64
ngf: 64
niter: 100
niter_decay: 100
no_flip: False
no_html: False
no_lsgan: False
norm: instance
output_nc: 3
phase: train
pool_size: 50
print_freq: 100
resize_or_crop: resize_and_crop
save_epoch_freq: 5
save_latest_freq: 5000
serial_batches: False
use_dropout: False
which_direction: AtoB
which_epoch: latest
which_model_netD: basic
which_model_netG: resnet_9blocks
-------------- End ----------------
CustomDatasetDataLoader
dataset [UnalignedDataset] was created
#training images = 1067
cycle_gan
---------- Networks initialized -------------
ResnetGenerator (
  (model): Sequential (
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
    (1): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU (inplace)
    (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (4): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (5): ReLU (inplace)
    (6): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (7): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (8): ReLU (inplace)
    (9): ResnetBlock (
      (conv_block): Sequential (
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
        (2): ReLU (inplace)
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
      )
    )
    [(10)-(17): eight more identical ResnetBlocks, omitted here for brevity]
    (18): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
    (19): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (20): ReLU (inplace)
    (21): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
    (22): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (23): ReLU (inplace)
    (24): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
    (25): Tanh ()
  )
)
Total number of parameters: 11388675
[second ResnetGenerator: identical architecture, 11388675 parameters]
NLayerDiscriminator (
  (model): Sequential (
    (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU (0.2, inplace)
    (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (3): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (4): LeakyReLU (0.2, inplace)
    (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (7): LeakyReLU (0.2, inplace)
    (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
    (9): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (10): LeakyReLU (0.2, inplace)
    (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
  )
)
Total number of parameters: 2766529
[second NLayerDiscriminator: identical architecture, 2766529 parameters]
-----------------------------------------------
model [CycleGANModel] was created
create web directory ./checkpoints/horse2zebra_cyclegan/web...
THCudaCheck FAIL file=/home/liyh/pytorch/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 25, in <module>
    model.optimize_parameters()
  File "/home/liyh/projects/pytorch_implementation/CycleGAN/models/cycle_gan_model.py", line 158, in optimize_parameters
    self.backward_G()
  File "/home/liyh/projects/pytorch_implementation/CycleGAN/models/cycle_gan_model.py", line 144, in backward_G
    self.rec_A = self.netG_B.forward(self.fake_B)
  File "/home/liyh/projects/pytorch_implementation/CycleGAN/models/networks.py", line 170, in forward
    return nn.parallel.data_parallel(self.model, input, self.gpu_ids)
  File "/home/liyh/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 103, in data_parallel
    return module(*inputs[0], **module_kwargs[0])
  File "/home/liyh/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/liyh/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/nn/modules/container.py", line 64, in forward
    input = module(input)
  File "/home/liyh/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/liyh/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 237, in forward
    self.padding, self.dilation, self.groups)
  File "/home/liyh/anaconda2/envs/pytorch/lib/python2.7/site-packages/torch/nn/functional.py", line 41, in conv2d
    return f(input, weight, bias)
RuntimeError: cuda runtime error (2) : out of memory at /home/liyh/pytorch/torch/lib/THC/generic/THCStorage.cu:66

ConnectionError[Errno 111], during run 'train.py' of maps

When I run:
$ python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan

I get this error:
ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /events (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa2c1b19290>: Failed to establish a new connection: [Errno 111] Connection refused',))

How can I solve this?

Full error trace:

Traceback (most recent call last):
  File "/home/khryang/.local/lib/python2.7/site-packages/visdom/__init__.py", line 228, in _send
    data=json.dumps(msg),
  File "/home/khryang/.local/lib/python2.7/site-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/home/khryang/.local/lib/python2.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/khryang/.local/lib/python2.7/site-packages/requests/sessions.py", line 513, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/khryang/.local/lib/python2.7/site-packages/requests/sessions.py", line 623, in send
    r = adapter.send(request, **kwargs)
  File "/home/khryang/.local/lib/python2.7/site-packages/requests/adapters.py", line 504, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /events (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa2c1b19290>: Failed to establish a new connection: [Errno 111] Connection refused',))
(epoch: 1, iters: 700, time: 0.456) D_A: 0.215 G_A: 0.561 Cyc_A: 2.316 D_B: 0.163 G_B: 0.450 Cyc_B: 0.794

socket.error: [Errno 111] Connection refused

When I run pix2pix:
python train.py --dataroot ./datasets/facades --name facades_pix2pix --model pix2pix --which_model_netG unet_256 --which_direction BtoA --lambda_A 100 --dataset_mode aligned --use_dropout --no_lsgan

the following problem happens:

model [Pix2PixModel] was created
create web directory ./checkpoints/facades_pix2pix/web...
(epoch: 1, iters: 100, time: 5.015) G_GAN: 2.485 G_L1: 36.558 D_real: 0.151 D_fake: 0.257
(epoch: 1, iters: 200, time: 4.833) G_GAN: 3.015 G_L1: 43.858 D_real: 0.045 D_fake: 0.552
(epoch: 1, iters: 300, time: 4.797) G_GAN: 2.519 G_L1: 39.296 D_real: 0.039 D_fake: 0.149
(epoch: 1, iters: 400, time: 6.720) G_GAN: 2.393 G_L1: 25.259 D_real: 0.200 D_fake: 0.504
End of epoch 1 / 200 Time Taken: 5975 sec
Traceback (most recent call last):
  File "train.py", line 20, in <module>
    for i, data in enumerate(dataset):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 206, in __next__
    idx, batch = self.data_queue.get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
  File "/usr/local/lib/python2.7/dist-packages/torch/multiprocessing/queue.py", line 22, in recv
    return pickle.loads(buf)
  File "/usr/lib/python2.7/pickle.py", line 1382, in loads
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1133, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/torch/multiprocessing/reductions.py", line 68, in rebuild_storage_fd
    fd = multiprocessing.reduction.rebuild_handle(df)
  File "/usr/lib/python2.7/multiprocessing/reduction.py", line 155, in rebuild_handle
    conn = Client(address, authkey=current_process().authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 169, in Client
    c = SocketClient(address)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 304, in SocketClient
    s.connect(address)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 111] Connection refused

How can I solve this problem?

pix2pix which_direction seems to always be BtoA

Following the directions in the README gave me good results on the facades.

python train.py \
    --display_id 0 \
    --dataroot ./datasets/facades \
    --name facades_pix2pix \
    --model pix2pix \
    --which_model_netG unet_256 \
    --which_direction BtoA \
    --lambda_A 100 \
    --align_data \
    --use_dropout \
    --no_lsgan

[BtoA result image]

However, running it backwards in AtoB mode seems to not change the operation:

python train.py \
    --display_id 0 \
    --dataroot ./datasets/facades \
    --name facades_pix2pix_rev \
    --model pix2pix \
    --which_model_netG unet_256 \
    --which_direction AtoB \
    --lambda_A 100 \
    --align_data \
    --use_dropout \
    --no_lsgan

[AtoB result image]

For reference, here is an image from the dataroot:

[facade example image]

Not a huge deal in my case, as I can update my dataset to compensate, but I wanted to note this issue.

Multi GPUs not working

I'm trying to run the pix2pix network on 2 GPUs with the option
--gpu_ids 0,1
but only one GPU is running.
I have 2 GPUs, both Titan 6GB.

Pretrained models

Do you have any CycleGAN pretrained models available that use pytorch?

How to predict single image

This network seems to need a pair of images containing A and B. What if I simply want to feed an A image into the net and get a good B-style generation? How do I do this exactly, given that I trained the network and have saved checkpoints? Any snippet?
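A note for readers: this single-image case is what the --model test path in the README above covers. A command along these lines (names are placeholders; adjust to your experiment, and for a pix2pix-trained generator also pass --netG, --norm, and --no_dropout settings matching training, as noted in the README) should work:

python test.py --dataroot path/to/your_A_images --name your_experiment_name --model test --dataset_mode single --no_dropout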

Out Of Memory Error on Facades dataset

I'm trying to run pix2pix on the facades dataset, but I keep getting an out-of-memory error. I've tried various --loadSize and --fineSize params, along with installing different versions of PyTorch. I have only a GeForce GTX 650 Ti, so only ~1GB of GPU RAM. I tried installing the no-CUDA version of PyTorch, but I still get an out-of-memory error, even though my computer has 8GB of RAM.

I'm new to Torch and deep learning in general, so I'm likely just using it wrong. Any help is appreciated.

CycleGAN question

Hello! I rewrote the CycleGAN code based on your code. My code runs, but training doesn't produce good results!
Can you give me your email? I'll send my code to you; I hope you can help me find the cause of the problem.

Question: batch size

The paper indicates that training was done with batch size = 1.

Is there a reason not to use a slightly larger batch size to more fully occupy the GPUs? For example, are the results better with batch size = 1 than with larger batch sizes?

A question regarding dropout and latent vector x

Dear authors of pytorch-CycleGAN-and-pix2pix,

I have a question regarding latent vector x for CycleGAN.

A while ago, I read a paper related to pix2pix. According to the paper, random noise is applied to the GAN by using dropout.

I can also see that there's a dropout option for CycleGAN.
However, it seems that dropout is off by default (if it is not specified).

So I have a question.
Is the dropout option the way to provide randomness for CycleGAN?
And if the option is off, does the generator produce the same output every time?

Thanks for your great work.
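A standalone illustration of the determinism point (generic PyTorch, not repo code): with dropout layers in train mode the output is stochastic; in eval mode it is deterministic.

import torch

net = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Dropout(0.5))
x = torch.randn(1, 8)

net.train()  # dropout active: two passes over the same input almost surely differ
print(torch.equal(net(x), net(x)))   # typically False

net.eval()   # dropout disabled: the output is a deterministic function of the input
print(torch.equal(net(x), net(x)))   # True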

multiprocessing issue with nThreads>1

Hi,

Poking around the pix2pix code, I noticed that training sometimes stops with an error that seems related to multiprocessing, probably the threading used to process images in parallel. I've set nThreads=1, and that seems to have made the error go away. But I'm wondering if you've seen this in your experiments?

Full trace below:

Traceback (most recent call last):
  File "train.py", line 21, in <module>
    for i, data in enumerate(dataset):
  File "/usr/local/lib/python2.7/dist-packages/future/types/newobject.py", line 71, in __next__
    return type(self).next(self)
  File "/home/nbserver/urbanization-patterns/models/pytorch-CycleGAN-and-pix2pix/data/aligned_data_loader_csv.py", line 30, in __next__
    AB, labels, AB_paths = next(self.data_loader_iter)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 206, in __next__
    idx, batch = self.data_queue.get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 378, in get
    return recv()
  File "/usr/local/lib/python2.7/dist-packages/torch/multiprocessing/queue.py", line 22, in recv
    return pickle.loads(buf)
  File "/usr/lib/python2.7/pickle.py", line 1388, in loads
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1139, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/torch/multiprocessing/reductions.py", line 68, in rebuild_storage_fd
    fd = multiprocessing.reduction.rebuild_handle(df)
  File "/usr/lib/python2.7/multiprocessing/reduction.py", line 155, in rebuild_handle
    conn = Client(address, authkey=current_process().authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 175, in Client
    answer_challenge(c, authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 432, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
IOError: [Errno 104] Connection reset by peer

FCN score code

Junyan, great work! I cannot find the code to compute the FCN score. Did I miss anything?

Which loss should we monitor

Hi, I used CycleGAN to train on new data, but I don't know which loss or metric I should monitor.
I wanted to watch loss G_A, but it doesn't seem to decrease. I also used the horse2zebra data, and loss G_A still didn't decrease much over 200 epochs.

potential bug

In the current release, there is a potential bug when gpu_ids is not 0: some tensors are initialized with "torch.cuda.Tensor(*shape)" (self.tensor(*shape)), as in the implementation of GANLoss. I guess they are placed on gpu0 by default; that is to say, when the model is not on gpu0, errors like the following will happen:
Some of weight/gradient/input tensors are located on different GPUs

One simple fix might be to add "torch.cuda.device(self.opt.gpu_ids[0])" in the parse function of the base_options class.
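For reference, a sketch of that workaround (a hypothetical helper, not repo code; torch.cuda.set_device is the call that actually changes the default allocation device):

import torch

def set_default_gpu(gpu_ids):
    # Hypothetical helper for BaseOptions.parse(): make the first requested GPU
    # the default device, so bare torch.cuda.Tensor(...) allocations land on it
    # instead of on GPU 0.
    if len(gpu_ids) > 0:
        torch.cuda.set_device(gpu_ids[0])

set_default_gpu([1])  # e.g., run everything on GPU 1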

Question: monet2photo training loss

I'm trying to train the monet2photo. My command line was:

python train.py --dataroot ./datasets/monet2photo --name monet2photo --model cycle_gan --gpu_ids 0,1 --batchSize 8 --identity 0.5

The paper discussed using a batch size of 1, but I increased it to 8 to more fully occupy the GPUs. I think this is the only difference between what was described in the paper and my settings, but I may be wrong.

------------ Options -------------
align_data: False
batchSize: 8
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
dataroot: ./datasets/monet2photo
display_freq: 100
display_id: 1
display_winsize: 256
fineSize: 256
gpu_ids: [0, 1]
identity: 0.5
input_nc: 3
isTrain: True
lambda_A: 10.0
lambda_B: 10.0
loadSize: 286
lr: 0.0002
max_dataset_size: inf
model: cycle_gan
nThreads: 2
n_layers_D: 3
name: monet2photo
ndf: 64
ngf: 64
niter: 100
niter_decay: 100
no_flip: False
no_html: False
no_lsgan: False
norm: instance
output_nc: 3
phase: train
pool_size: 50
print_freq: 100
save_epoch_freq: 5
save_latest_freq: 5000
serial_batches: False
use_dropout: False
which_direction: AtoB
which_epoch: latest
which_model_netD: basic
which_model_netG: resnet_9blocks
-------------- End ----------------
UnalignedDataLoader
#training images = 6287
cycle_gan

I'm training on two GTX-1070s

I'm about 80 epochs in (~40 hours on my setup), and it seems like I'm oscillating between generated 'photos' that look okay-ish and 'photos' that look pretty 'meh', more like the original painting.

My loss declined pretty rapidly for the first 20 or so epochs, but now seems to be relatively stable with occasional crazy spikes:

[loss plot]

I think it's improving slightly with each epoch based on the images, and there seems to be a slight downward trend in the loss, but I also might just be kidding myself because I've been staring at it for a while. In other words, I'm not certain that what it's generating at epoch 80 is really that much better than at epoch 30. Here's the most recent detailed loss curve:

[detailed loss plot]

Question: Is this expected behavior (more or less), or should I be concerned that I've plateaued and/or used the wrong settings? At 100 epochs the learning rate starts to decrease under the default settings. Given that it's taking about 30 minutes per epoch, and thus about 61 more hours to complete 200 epochs, I'm wondering if I should "keep on going" or "abort" and fix some settings.

Got Connection Error

Here it goes:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/util/connection.py", line 83, in create_connection
    raise err
  File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/connectionpool.py", line 356, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 934, in _send_output
    self.send(msg)
  File "/usr/lib/python3.5/http/client.py", line 877, in send
    self.connect()
  File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/connection.py", line 166, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7fc52b9bd3c8>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 423, in send
    timeout=timeout
  File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/connectionpool.py", line 649, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.5/dist-packages/requests/packages/urllib3/util/retry.py", line 376, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /events (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fc52b9bd3c8>: Failed to establish a new connection: [Errno 111] Connection refused',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/visdom/__init__.py", line 228, in _send
    data=json.dumps(msg),
  File "/usr/local/lib/python3.5/dist-packages/requests/api.py", line 110, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /events (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fc52b9bd3c8>: Failed to establish a new connection: [Errno 111] Connection refused',))

I have no idea what causes this.

Aspect ratio flag in test.py

I am trying to use the --aspect_ratio flag in test.py

python test.py --dataroot ./datasets/text/testA/ --name text --model test --which_model_netG unet_256 --which_direction AtoB --dataset_mode single --aspect_ratio 2.0

In the results folder, though, the images are still 1:1.

pix2pix without B side image,how to test

Hi sir,
I am training a model on my own dataset with the pix2pix model. I know how to create the training set: provide an A image and a B image, and then combine the two into an AB image, like the facades image dataset. The model then learns to translate A to B and creates a fake image.
But I don't know how to run the test when I only have an A-side image. Should I combine the A image with an empty image or something? Help wanted, thank you!
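One workaround readers use is to pair each A image with a blank placeholder, so the aligned loader still finds an AB image; at test time the B half only feeds the "real_B" display. A sketch with placeholder paths (alternatively, --model test with --dataset_mode single avoids pairing entirely, as described in the README above):

from PIL import Image
import os

src, dst = 'datasets/mydata/A_only', 'datasets/mydata/test'  # placeholder paths
os.makedirs(dst, exist_ok=True)
for name in os.listdir(src):
    a = Image.open(os.path.join(src, name)).convert('RGB')
    ab = Image.new('RGB', (a.width * 2, a.height))  # A on the left, blank B on the right
    ab.paste(a, (0, 0))
    ab.save(os.path.join(dst, name))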

Does `continue_train` option work?

I trained edges2shoes dataset with following command:

python train.py --dataroot ./datasets/edges2shoes --name edges2shoes_pix2pix --model pix2pix --which_model_netG unet_256 --which_direction AtoB --lambda_A 100 --align_data --use_dropout --no_lsgan --batchSize 12 --niter 15 --niter_decay 15 

Then after 1 epoch (once it had already saved its own checkpoint), I interrupted with Ctrl+C. The day after, I wanted to continue training with the following command:

python train.py --dataroot ./datasets/edges2shoes --name edges2shoes_pix2pix --model pix2pix --which_model_netG unet_256 --which_direction AtoB --lambda_A 100 --align_data --use_dropout --no_lsgan --batchSize 12 --niter 15 --niter_decay 15 --continue_train

You can notice I pass in the --continue_train option (as I read in options/train_options.py).

I notice that the generated fake images are kept, but the epoch is reset back to 1, and the loss is also graphed from scratch.

I wonder whether this continued training or not. If not, how can I keep training my model after interrupting it?
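A note for readers: in current versions of the repo, --continue_train loads the latest checkpoint, while the separate --epoch_count flag controls the epoch numbering. So resuming with the counter intact looks like (flags abbreviated):

python train.py [same flags as before] --continue_train --epoch_count 2

Without --epoch_count, the weights are loaded but the displayed epoch restarts at 1, which matches the behavior observed above.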

undefined symbol: PySlice_Unpack

Cool that you got it all to PyTorch!

When running python train.py --dataroot ./datasets/facades --name facades_cyclegan --model cycle_gan
I get:
Traceback (most recent call last):
  File "train.py", line 2, in <module>
    from options.train_options import TrainOptions
  File "/home/ubuntu/pytorch-CycleGAN-and-pix2pix/options/train_options.py", line 1, in <module>
    from .base_options import BaseOptions
  File "/home/ubuntu/pytorch-CycleGAN-and-pix2pix/options/base_options.py", line 3, in <module>
    from util import util
  File "/home/ubuntu/pytorch-CycleGAN-and-pix2pix/util/util.py", line 4, in <module>
    from PIL import Image
  File "/home/ubuntu/miniconda3/lib/python3.6/site-packages/PIL/Image.py", line 56, in <module>
    from . import _imaging as core
ImportError: /home/ubuntu/miniconda3/lib/python3.6/site-packages/PIL/_imaging.cpython-36m-x86_64-linux-gnu.so: undefined symbol: PySlice_Unpack

What versions are you using?

CUDNN_STATUS_BAD_PARAM with output_nc=1

Hi,

I'm testing the CycleGAN code using 3-channel input data (A) and 1-channel output data (B). I always get the following error message:

RuntimeError: CUDNN_STATUS_BAD_PARAM

However, this works fine when I set output_nc to 3.

I can't find any place in the code where output_nc is hard-coded to 3, so I'm guessing this must be a cuDNN issue? Is there any reason you can think of why a 1-channel output would not work with the current CycleGAN architecture?

Thanks!

The full trace is below:

Traceback (most recent call last):
  File "train.py", line 26, in <module>
    model.optimize_parameters()
  File "/home/nbserver/urbanization-patterns/models/pytorch-CycleGAN-and-pix2pix/models/cycle_gan_model.py", line 159, in optimize_parameters
    self.backward_G()
  File "/home/nbserver/urbanization-patterns/models/pytorch-CycleGAN-and-pix2pix/models/cycle_gan_model.py", line 141, in backward_G
    self.fake_A = self.netG_B.forward(self.real_B)
  File "/home/nbserver/urbanization-patterns/models/pytorch-CycleGAN-and-pix2pix/models/networks.py", line 170, in forward
    return nn.parallel.data_parallel(self.model, input, self.gpu_ids)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 105, in data_parallel
    outputs = parallel_apply(replicas, inputs, module_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 46, in parallel_apply
    raise output
RuntimeError: CUDNN_STATUS_BAD_PARAM

B to A training?

I'm really enjoying the library. Maybe consider adding a flag for training from B->A, so the dataset doesn't have to be redone.

some problems about loadSize and fineSize

It seems that when using only loadSize and fineSize, the input image size is always square, so I tried to define loadSize_H and loadSize_W separately in order to give the input image a different H*W ratio.
But I got an error like this: ValueError: unknown resampling filter.

Is it possible to give the loadSize different height and width values with the current network? Can you give some advice? Thanks~ @junyanz
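For what it's worth, torchvision's resize transform does accept an (h, w) tuple, so non-square loading is possible in principle; note the ResNet generator downsamples twice, so both dimensions should stay multiples of 4 (the UNet generators have stricter requirements). A sketch with hypothetical loadSize_H/loadSize_W values (older torchvision API shown, matching the era of this repo):

import torchvision.transforms as transforms
from PIL import Image

loadSize_H, loadSize_W = 144, 256   # hypothetical non-square load size

transform = transforms.Compose([
    transforms.Resize((loadSize_H, loadSize_W), Image.BICUBIC),  # (h, w) tuple
    transforms.ToTensor(),
])
img = transform(Image.open('example.jpg').convert('RGB'))  # 'example.jpg' is a placeholder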

ntrain is ignored

I couldn't find ntrain anywhere else in the code, so I guess it's not taken into account.

Load on laptop CPUs

I have an HP Notebook 15-ac026tx with an Intel i5 (5th gen), 4GB RAM, and a 2GB AMD Radeon graphics card (full specifications).
I want to run your code on my laptop. Won't it harm my laptop battery and its hardware?
I read somewhere on the Internet that training high-computation models can severely harm laptops.
I just want to know: is it okay if I train it on my laptop?
