UNIT's Issues

RuntimeError in python3

Hi! Thank you very much for sharing your code. I encountered an error when training the CelebA blond-hair translation from scratch with Python 3, following the usage instructions. I changed itertools.izip (Python 2) to the built-in zip in Python 3:

Traceback (most recent call last):
  File "cocogan_train.py", line 88, in <module>
    main(sys.argv)
  File "cocogan_train.py", line 56, in main
    for it, (images_a, images_b) in enumerate(zip(train_loader_a,train_loader_b)):
  File "/home/fox/anaconda3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 301, in __iter__
    return DataLoaderIter(self)
  File "/home/fox/anaconda3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 171, in __init__
    self._put_indices()
  File "/home/fox/anaconda3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 210, in _put_indices
    indices = next(self.sample_iter, None)
  File "/home/fox/anaconda3/lib/python3.5/site-packages/torch/utils/data/sampler.py", line 115, in __iter__
    for idx in self.sampler:
  File "/home/fox/anaconda3/lib/python3.5/site-packages/torch/utils/data/sampler.py", line 50, in __iter__
    return iter(torch.randperm(len(self.data_source)).long())
RuntimeError: invalid argument 1: must be strictly positive at /opt/conda/conda-bld/pytorch_1503968623488/work/torch/lib/TH/generic/THTensorMath.c:2033
How can I fix this problem?
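For reference, the failing call is torch.randperm(len(self.data_source)), so this error usually means one of the datasets resolved to zero images (for example, a wrong root/folder/list path in the yaml). A quick sanity check, sketched with placeholder paths taken from the blond-hair config:

import os

root = "../datasets/celeba/"                  # assumed dataset root from the yaml
folder = "img_align_crop_resize_celeba/"      # assumed image folder
list_name = "lists/Blond_Hair_ON.txt"         # assumed image list

with open(os.path.join(root, list_name)) as f:
    names = [line.strip() for line in f if line.strip()]

missing = [n for n in names if not os.path.isfile(os.path.join(root, folder, n))]
print("listed: %d  missing: %d" % (len(names), len(missing)))
assert names, "empty image list -> the DataLoader has zero samples and randperm(0) fails"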

Discriminator update rule in domain adaptation and domain translation

Dear M. Liu,

I am studying your paper, which I found very interesting. Thank you for sharing your research!
I read the other issues/questions, where you pointed out that some explanations were not given in the paper. On that note, I would like to ask you for more details about the domain adaptation experiment.

Could you explain the discriminator update rule, and more precisely when you compute the following, please?
feature_loss_a = self._compute_ll_loss(fake_feat_ab - fake_feat_aa, dummy_variable)
feature_loss_b = self._compute_ll_loss(fake_feat_ba - fake_feat_bb, dummy_variable)
(cocogan_trainer_da.py lines 102-103)

Also, my experiments focus more on domain translation than domain adaptation.
For that purpose, is this update rule still relevant?

Thank you for your answer.
Adrien

What are the values in the normalized x and y coordinates in SVHN -> MNIST model?

In Appendix B, in the paragraph about SVHN -> MNIST, it is written:

For each input image, we created a 5-channel variant where the first three channels were the original RGB images and the last two channels were the normalized x and y coordinates.

It seems to me that this is implemented in cocogan_trainer_da.py:

  def _create_xy_image(self, width=32):
    coordinates = list(itertools.product(range(width), range(width)))
    arr = (np.reshape(np.asarray(coordinates), newshape=[width, width, 2]) - width/2 ) / (width/2)
    new_map = np.transpose(np.float32(arr), [2, 0, 1])
    xy = Variable(torch.from_numpy(new_map), requires_grad=False)
    return xy

The way the code is written, the contents of xy are just -1 or 0, because Python 2 performs integer division here. Did you mean to do this? I thought the values should be between -1 and 1.
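For comparison, a sketch of the same function with explicit float arithmetic, which yields coordinates spanning roughly -1 to 1 under Python 2 or 3 (this is only an illustration, not the repository's fix):

import itertools
import numpy as np
import torch

def create_xy_image(width=32):
    # Force float arithmetic so the normalized coordinates span roughly [-1, 1)
    coords = list(itertools.product(range(width), range(width)))
    arr = (np.asarray(coords, dtype=np.float32).reshape(width, width, 2) - width / 2.0) / (width / 2.0)
    xy = torch.from_numpy(np.ascontiguousarray(np.transpose(arr, [2, 0, 1])))  # shape (2, width, width)
    return xy

xy = create_xy_image()
print(xy.min().item(), xy.max().item())  # -1.0 and 0.9375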

lists for celeba

Hello and thanks for open-sourcing the code!

I was wondering if you could provide the other lists in datasets/celeba/lists to train with, in particular Sunglasses_ON.txt / Sunglasses_OFF.txt.

Thanks!

AttributeError: 'list' object has no attribute 'gen'

Hi,

I am trying to run the "Cat to Tiger Translation" test with

python cocogan_translate_one_image.py --config ../exps/unit/cat2tiger.yaml --a2b 1 --weights ../outputs/unit/cat2tiger/cat2tiger_gen_00500000.pkl --image_name ../images/cat001.jpg --output_image_name ../results/cat2tiger_cat001.jpg

and get the following error:

AttributeError: 'list' object has no attribute 'gen'

By inspecting the source code at cocogan_translate_one_image.py:42 (https://github.com/mingyuliutw/UNIT/blob/master/src/cocogan_translate_one_image.py#L42) I find that trainer is indeed first defined as a list. The exec command on the next line seems to turn this into a dict (?), so it seems that this is the source of the error.

What to do? (I am using Python3)
Thx
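For reference, on Python 3 exec cannot rebind a local variable inside a function, so the assignment in the exec'd string never reaches trainer and it stays a list. A minimal, self-contained demonstration of the behavior and a namespace-based workaround (the dict contents are placeholders):

def broken():
    trainer = []
    exec("trainer = dict(gen='COCOResGen2')")   # rebinding is silently lost for function locals on Python 3
    return trainer

def working():
    namespace = {}
    exec("trainer = dict(gen='COCOResGen2')", namespace)  # give exec an explicit namespace instead
    return namespace["trainer"]

print(broken())    # []  -> hence the AttributeError later
print(working())   # {'gen': 'COCOResGen2'}

Another option is to drop exec entirely and look the trainer class up by name, e.g. with getattr on the module that defines it, assuming the class is importable.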

Tensorboard summary writer error

I am running version_01, and I am facing problems with the summary writer.

  1. this line throws the following error:

AttributeError: 'module' object has no attribute 'FileWriter'

So, I replaced it with the following line:

train_writer = tf.summary.FileWriter("%s/%s" % (opts.log,os.path.splitext(os.path.basename(opts.config))[0]))

Upon doing so the program runs fine, until this line.

The error it throws is:

in add_summary for value in summary.value:
AttributeError: 'Tensor' object has no attribute 'value'

Can anybody please point out the change I need to make?
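For reference, in TF 1.x add_summary expects a Summary protobuf (or its serialized string), not a Tensor, so the scalar has to be wrapped first. A minimal sketch, with the log directory, tag name, and value as placeholders:

import tensorflow as tf

train_writer = tf.summary.FileWriter("../logs/example_run")

step = 100
loss_value = 0.25  # a plain Python float, e.g. loss.data[0] in old PyTorch
summary = tf.Summary(value=[tf.Summary.Value(tag="gen/total_loss", simple_value=loss_value)])
train_writer.add_summary(summary, step)
train_writer.flush()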

Could it work with face rotation?

Hi Ming Yu,

I see the cat2tiger samples have cats and tigers facing left / right. Does that mean this project handles the face rotation by itself, with no need to annotate or list which cat image faces left and which faces right? Thanks!

Best Wishes,
Chi Kiu SO

error training test

Thank you for open-sourcing the code. I followed the instructions in the readme and ran into this error:

Traceback (most recent call last):
  File "train.py", line 232, in <module>
    main(sys.argv)
  File "train.py", line 182, in main
    for it, (images_a, images_b, images_a2, images_b2) in enumerate(itertools.izip(train_loader_a, train_loader_b, train_loader_a2, train_loader_b2)):
  File "/home/lz/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 179, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/lz/UNIT-master/src/datasets/dataset_image.py", line 37, in __getitem__
    crop_img = self._load_one_image(self.images[index])
  File "/home/lz/UNIT-master/src/datasets/dataset_image.py", line 62, in _load_one_image
    crop_img = img[y_offset:(y_offset + self.image_size), x_offset:(x_offset + self.image_size), :]
TypeError: only integer scalar arrays can be converted to a scalar index

Can you give me a hint about where I made this mistake? Thanks a lot.
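For reference, this TypeError usually means the slice bounds are not plain Python ints (for example, the crop offsets come back as NumPy floats). A minimal sketch of the crop with explicit int casts; the variable names mirror dataset_image.py but the values are placeholders:

import numpy as np

img = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for a loaded image
image_size = 256

y_offset = int(np.floor(np.random.uniform(0, img.shape[0] - image_size)))
x_offset = int(np.floor(np.random.uniform(0, img.shape[1] - image_size)))
crop_img = img[y_offset:y_offset + image_size, x_offset:x_offset + image_size, :]
print(crop_img.shape)  # (256, 256, 3)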

Questions while understanding code and paper

I was trying to implement UNIT in TensorFlow.

So I read the paper and the PyTorch code, and I have some questions about the code.

  1. In the paper, the discriminator for domain adaptation from SVHN to MNIST has 4 convolution layers and a single FC layer for domain classification, but the code uses dropout layers among the convolution layers. Why are dropout layers used in the code?

  2. In the code, feature losses that compute the l1_loss between two features (fake_feat_ab, fake_feat_aa) are included in the discriminator loss, but there is no explanation of that loss in the paper. Can you explain why these losses are used?

Maybe I missed the point in the paper, but I can't find it.

3 Questions

Hi !
*** Q1 ***
When doing image translation, did you not use any augmentation? I want to know about this, but I cannot find it in your code.

*** Q2 ***
For example, suppose I want to translate from domain_A (dog) to domain_B (cat). (domain_A -> domain_B)

Assume that the number of dog images in domain_A is 1000 and the number of cat images in domain_B is 1500. (That means #domain_A < #domain_B.)

If so, how do I train? I think a learning imbalance will happen because the number of images in the two domains is different.
Did you make the number of images in both domains the same? (A loader-cycling sketch is given after Q3 below.)

*** Q3 ***
Generator_loss = G_A_loss + G_B_loss
Discriminator_loss = D_A_loss + D_B_loss

When you train generators A and B, why did you train with the sum of the two losses (Generator_loss) instead of training on G_A_loss and G_B_loss separately?
Likewise, why does the discriminator do the same?
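Regarding the dataset-size mismatch in Q2: zip() simply stops at the shorter loader on each pass. One common workaround, sketched below and not necessarily what the authors did, is to restart the smaller loader whenever it runs out so that every image of the larger domain still gets paired:

def paired_batches(loader_small, loader_large):
    # Restart the smaller DataLoader whenever it is exhausted so each batch of the
    # larger domain gets a partner (reshuffling happens on every restart).
    it_small = iter(loader_small)
    for batch_large in loader_large:
        try:
            batch_small = next(it_small)
        except StopIteration:
            it_small = iter(loader_small)
            batch_small = next(it_small)
        yield batch_small, batch_large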

Segmentation fault problem

Hi Dr. Liu,

I ran the code of training attributed-based face images translation.

When the iteration count reaches about 100, training ends with a segmentation fault,

i.e.,

Iteration: 00000092/02000000
Iteration: 00000093/02000000
Iteration: 00000094/02000000
Iteration: 00000095/02000000
Iteration: 00000096/02000000
Iteration: 00000097/02000000
Iteration: 00000098/02000000
Iteration: 00000099/02000000
Iteration: 00000100/02000000
Segmentation fault

The stack-trace information:

Iteration: 00000101/02000000
Iteration: 00000102/02000000
Iteration: 00000103/02000000
Iteration: 00000104/02000000
Iteration: 00000105/02000000
Iteration: 00000106/02000000
Iteration: 00000107/02000000
Iteration: 00000108/02000000

Program received signal SIGSEGV, Segmentation fault.
0x0000555555632cb0 in ?? ()
(gdb) where
#0  0x0000555555632cb0 in ?? ()
#1  0x0000555555632d95 in ?? ()
#2  0x0000555555631f45 in ?? ()
#3  0x0000555555629b64 in _PyObject_GC_Malloc ()
#4  0x000055555562962d in _PyObject_GC_New ()
#5  0x000055555567d991 in ?? ()
#6  0x000055555566b87f in PyObject_GetIter ()
#7  0x000055555564ff90 in PyEval_EvalFrameEx ()
#8  0x000055555564d285 in PyEval_EvalCodeEx ()
#9  0x000055555566a08e in ?? ()
#10 0x000055555563b983 in PyObject_Call ()
#11 0x0000555555659460 in PyEval_CallObjectWithKeywords ()
#12 0x00007fff8f37becd in THPFunction_apply (cls=0x5555569afc80, _inputs=0x7ffff342b050) at torch/csrc/autograd/python_function.cpp:721
#13 0x000055555564f1aa in PyEval_EvalFrameEx ()
#14 0x000055555564d285 in PyEval_EvalCodeEx ()
#15 0x0000555555654d49 in PyEval_EvalFrameEx ()
#16 0x000055555564d285 in PyEval_EvalCodeEx ()
#17 0x000055555566a248 in ?? ()
#18 0x000055555563b983 in PyObject_Call ()
#19 0x00005555556516bd in PyEval_EvalFrameEx ()
#20 0x000055555564d285 in PyEval_EvalCodeEx ()
#21 0x000055555566a08e in ?? ()
#22 0x000055555563b983 in PyObject_Call ()
#23 0x00005555556805de in ?? ()
#24 0x000055555563b983 in PyObject_Call ()
#25 0x00005555556de6a7 in ?? ()
#26 0x000055555563b983 in PyObject_Call ()
#27 0x0000555555654c5f in PyEval_EvalFrameEx ()
#28 0x000055555564d285 in PyEval_EvalCodeEx ()
#29 0x000055555566a248 in ?? ()
#30 0x000055555563b983 in PyObject_Call ()
#31 0x00005555556516bd in PyEval_EvalFrameEx ()
#32 0x000055555564d285 in PyEval_EvalCodeEx ()
#33 0x000055555566a08e in ?? ()
#34 0x000055555563b983 in PyObject_Call ()
#35 0x00005555556805de in ?? ()
#36 0x000055555563b983 in PyObject_Call ()
#37 0x00005555556de6a7 in ?? ()
#38 0x000055555563b983 in PyObject_Call ()
#39 0x0000555555654c5f in PyEval_EvalFrameEx ()
#40 0x000055555564d285 in PyEval_EvalCodeEx ()
#41 0x000055555566a248 in ?? ()
#42 0x000055555563b983 in PyObject_Call ()
#43 0x00005555556516bd in PyEval_EvalFrameEx ()
#44 0x000055555564d285 in PyEval_EvalCodeEx ()
#45 0x000055555566a08e in ?? ()
#46 0x000055555563b983 in PyObject_Call ()
#47 0x00005555556805de in ?? ()
#48 0x000055555563b983 in PyObject_Call ()
#49 0x00005555556de6a7 in ?? ()
#50 0x000055555563b983 in PyObject_Call ()
#51 0x0000555555654c5f in PyEval_EvalFrameEx ()
#52 0x000055555564d285 in PyEval_EvalCodeEx ()
#53 0x000055555566a248 in ?? ()
#54 0x000055555563b983 in PyObject_Call ()
---Type <return> to continue, or q <return> to quit---return
#55 0x00005555556516bd in PyEval_EvalFrameEx ()
#56 0x000055555564d285 in PyEval_EvalCodeEx ()
#57 0x000055555566a08e in ?? ()
#58 0x000055555563b983 in PyObject_Call ()
#59 0x00005555556805de in ?? ()
#60 0x000055555563b983 in PyObject_Call ()
#61 0x00005555556de6a7 in ?? ()
#62 0x000055555563b983 in PyObject_Call ()
#63 0x0000555555654c5f in PyEval_EvalFrameEx ()
#64 0x000055555564d285 in PyEval_EvalCodeEx ()
#65 0x000055555566a248 in ?? ()
#66 0x000055555563b983 in PyObject_Call ()
#67 0x00005555556516bd in PyEval_EvalFrameEx ()
#68 0x000055555564d285 in PyEval_EvalCodeEx ()
#69 0x000055555566a08e in ?? ()
#70 0x000055555563b983 in PyObject_Call ()
#71 0x00005555556805de in ?? ()
#72 0x000055555563b983 in PyObject_Call ()
#73 0x00005555556de6a7 in ?? ()
#74 0x000055555563b983 in PyObject_Call ()
#75 0x0000555555654c5f in PyEval_EvalFrameEx ()
#76 0x0000555555654a4f in PyEval_EvalFrameEx ()
#77 0x000055555564d285 in PyEval_EvalCodeEx ()
#78 0x000055555565555b in PyEval_EvalFrameEx ()
#79 0x000055555564d285 in PyEval_EvalCodeEx ()
#80 0x000055555564d029 in PyEval_EvalCode ()
#81 0x000055555567d42f in ?? ()
#82 0x00005555556783a2 in PyRun_FileExFlags ()
#83 0x0000555555677eee in PyRun_SimpleFileExFlags ()
#84 0x0000555555628ee1 in Py_Main ()
#85 0x00007ffff6f14b45 in __libc_start_main (main=0x555555628810 <main>, argc=8, argv=0x7fffffffeba8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffeb98) at libc-start.c:287
#86 0x000055555562870a in _start ()

day2night data

Hello Dr. Liu,
I was very interested when I read your paper. I am now using CycleGAN to train day-to-night translation, but my day2night data are limited. Could you share your data with me? My email is: [email protected]

thank you very much

Some questions about the paper and results

Hi mingyuliu, thanks for your contribution, which inspires me a lot!
But I have one question: during translation, does the CNN detect the facial marks/features of the image and just inpaint the color and texture of the target domain while preserving the outline of the original image?
The image below shows that when translating from a cat to a tiger, not only are the facial features changed, but the face also becomes bigger. Can you tell me the reason?
[image: cat-to-tiger translation example in which the face is also enlarged]

Inverted colors

Probably worth mentioning that sometimes the images come out with inverted colors; I observed the same behavior of the network in CycleGAN. Restarting the training helps.

Training with animal image dataset

@mingyuliutw Nice work! BTW, how can I get the cropped animal face images to train your model?
I don't want to use the pretrained model; I want to train the model from scratch with the animal data.

Thanks in advance.

Snowy2Summery model training

Hi MingYu,

Instead of training on celeba (the face attributes dataset), I would like to test Snowy2Summery.
May I know whether I need to collect a large set of snow images if I want to train that model myself?

Thank you very much

Pre-trained models

Links to pre-trained models for a couple of tasks are available in your repo. Do you plan on sharing links to other pre-trained models as well?

image list for cat2tiger

Hi,
Thanks for your great paper and open source.
Could you offer the image list for the cat2tiger experiment?
Thanks a lot!

No module named 'net_config'

Hi, I tried to run training with the CelebA dataset and got the following error when I started training:

Traceback (most recent call last):
  File "cocogan_train.py", line 6, in <module>
    from tools import *
  File "/home/paperspace/Downloads/UNIT/src/tools/__init__.py", line 6, in <module>
    from net_config import *
ModuleNotFoundError: No module named 'net_config'

Do you have any hints on how I could solve this? Thanks!
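For reference, Python 3 removed implicit relative imports, so a bare "from net_config import *" inside src/tools/__init__.py only works on Python 2. A sketch of the usual fixes, to be applied inside that file:

# Inside src/tools/__init__.py, make the import an explicit relative import:
from .net_config import *          # Python 3 style (also valid on Python 2.6+)

# Or keep the bare import but make the package directory searchable first:
# import os, sys
# sys.path.insert(0, os.path.dirname(__file__))
# from net_config import *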

Hyperparameter information

Hi !
I am reproducing your code in TensorFlow,
but I do not know the current hyperparameter settings (batch size, input size, dropout rate, etc.).
Could you tell me which code I can check?

Why is noise in GaussianVAE2D a torch Variable?

In the GaussianVAE2D class definition in the file common_net.py, there is a method named sample.

This is how it is defined:

  def sample(self, x):
    mu = self.en_mu(x)
    sd = self.softplus(self.en_sigma(x))
    noise = Variable(torch.randn(mu.size(0), mu.size(1), mu.size(2), mu.size(3))).cuda(x.data.get_device())
    return mu + sd.mul(noise), mu, sd

I don't understand why noise is defined as a Variable, because we do not need to differentiate the loss function with respect to it, do we?
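For context, a minimal reparameterization-style example (not repo code) showing that the noise term needs no gradients of its own; wrapping it in a Variable with the default requires_grad=False simply lets it take part in arithmetic with other Variables in pre-0.4 PyTorch, while gradients still flow into mu and sd:

import torch
from torch.autograd import Variable

mu = Variable(torch.zeros(4, 8), requires_grad=True)
sd = Variable(torch.ones(4, 8), requires_grad=True)
eps = Variable(torch.randn(4, 8))          # requires_grad defaults to False

z = mu + sd * eps                          # z = mu + sd * noise, as in sample()
z.sum().backward()
print(eps.grad is None, mu.grad is not None, sd.grad is not None)  # True True True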

A puzzle about gen_update function

def gen_update(self, images_a, images_b, hyperparameters):
  self.gen.zero_grad()
  x_aa, x_ba, x_ab, x_bb, shared = self.gen(images_a, images_b)
  x_bab, shared_bab = self.gen.forward_a2b(x_ba)
  x_aba, shared_aba = self.gen.forward_b2a(x_ab)
  outs_a, outs_b = self.dis(x_ba, x_ab)
  for it, (out_a, out_b) in enumerate(itertools.izip(outs_a, outs_b)):
    outputs_a = nn.functional.sigmoid(out_a)
    outputs_b = nn.functional.sigmoid(out_b)
    all_ones = Variable(torch.ones((outputs_a.size(0))).cuda(self.gpu))
    if it == 0:
      ad_loss_a = nn.functional.binary_cross_entropy(outputs_a, all_ones)
      ad_loss_b = nn.functional.binary_cross_entropy(outputs_b, all_ones)
    else:
      ad_loss_a += nn.functional.binary_cross_entropy(outputs_a, all_ones)
      ad_loss_b += nn.functional.binary_cross_entropy(outputs_b, all_ones)

The code above is part of cocogan_trainer.py.
I think the line
all_ones = Variable(torch.ones((outputs_a.size(0))).cuda(self.gpu))
should be
all_zeros = Variable(torch.zeros((outputs_a.size(0))).cuda(self.gpu))
because it computes the loss when the inputs of the discriminator are fakeA and fakeB.
Is my understanding right?
Or do I misunderstand it?
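For context, the ones target in the generator update matches the standard non-saturating GAN convention: the generator scores its fakes against "real" labels, while the zeros labels belong in the discriminator update for the same fakes. A small stand-alone illustration (modern PyTorch API, placeholder values):

import torch
import torch.nn.functional as F

d_out_fake = torch.sigmoid(torch.randn(8))      # stand-in for sigmoid(D(G(x)))
ones = torch.ones(8)
zeros = torch.zeros(8)

gen_loss = F.binary_cross_entropy(d_out_fake, ones)        # generator tries to fool D
dis_loss_fake = F.binary_cross_entropy(d_out_fake, zeros)  # discriminator rejects fakes
print(gen_loss.item(), dis_loss_fake.item())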

Question about code for minimizing the L1 distance between features for Domain Adaptation

In page 7 of the paper, it says:

Also, for a pair of generated images in different domains, we minimized the L1 distance between the features extracted by the highest layer of the discriminators...

And in the code in cocogan_trainer_da.py, this is implemented as follows:

    dummy_variable = Variable(torch.zeros(fake_feat_aa.size()))
    feature_loss_a = self._compute_ll_loss(fake_feat_ab - fake_feat_aa, dummy_variable)
    feature_loss_b = self._compute_ll_loss(fake_feat_ba - fake_feat_bb, dummy_variable)

Isn't this an L2 loss, given that self._compute_ll_loss is implemented using torch.nn.MSELoss()?
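For reference, a stand-alone sketch of the distinction being asked about: nn.MSELoss gives a mean squared (L2-style) penalty, while nn.L1Loss gives the absolute-difference penalty the paper's wording describes. The tensor shapes here are placeholders:

import torch
import torch.nn as nn

fake_feat_ab = torch.randn(4, 512)
fake_feat_aa = torch.randn(4, 512)
dummy = torch.zeros_like(fake_feat_aa)

mse_style = nn.MSELoss()(fake_feat_ab - fake_feat_aa, dummy)  # squared penalty, as when the helper wraps MSELoss
l1_style = nn.L1Loss()(fake_feat_ab - fake_feat_aa, dummy)    # literal L1 distance between the two feature maps
print(mse_style.item(), l1_style.item())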

Question about discriminator performance with weights tied for domain adaptation

Hi,
I had a question about how tying the weights affects the discriminator's ability to differentiate between domains when training for domain adaptation (the source domain's discriminator is trained with the target and source weights tied for the higher layers, while the L1 loss between the features they output is also minimized).
Intuitively it feels like this would prevent the discriminators from being good at telling the domains apart, which seems counterproductive to the adversarial part of the training (even though it may help extract similar features from the two domains and thus give similar results on whatever task is performed at the end).
I just wanted to ask why this is not true (or whether it is true)?
Thanks!

Test time z vector representation

Hi, I couldn't find in the paper, or understand from the code, what you are doing at test time, in the following sense:

At train time you inject noise, compute mu and sd, and represent z as a random vector.

What happens at test time? I don't see you doing the same. Do you use the average mu and sd, or something else? I would love to know.

thanks,
Gal.

About the svhn dataset

Thanks for your source code for domain adaptation! However, I found that for the training set in the SVHN-MNIST experiment you chose the 'extra' split of SVHN, while many other papers use the 'train' split with 73,257 images. When I change the split to 'train', training is much slower than before; the accuracy on the MNIST test set is 0.3773 at 58k iterations. Have you tried the 'train' split? Will it take more time to train? Thanks a lot!

OpenCV Error: Assertion failed (scn == 3 || scn == 4) in cvtColor

I seem to be getting this error. AFAIK I did the CelebA pre-processing correctly, by first extracting the original CelebA data and then running the resize/crop script, which generates the img_align_crop_resize_celeba directory.

Here is the error:

$ python cocogan_train.py --config ../exps/unit/blondhair.yaml --log ../logs
self.image_display_iterations=100
self.image_save_iterations=2500
self.snapshot_save_iterations=5000
self.snapshot_prefix='../outputs/unit/celeba/blondhair/blondhair'
self.hyperparameters={'trainer': 'COCOGANTrainer', 'kl_cycle_link_w': 0.1, 'gan_w': 10, 'll_cycle_link_w': 100, 'batch_size': 1, 'll_direct_link_w': 100, 'lr': 0.0001, 'kl_direct_link_w': 0.1, 'max_iterations': 2000000, 'gen': {'ch': 64, 'name': 'COCOResGen2', 'n_gen_front_blk': 3, 'n_enc_front_blk': 3, 'input_dim_a': 3, 'n_enc_shared_blk': 1, 'input_dim_b': 3, 'n_gen_res_blk': 3, 'n_enc_res_blk': 3, 'n_gen_shared_blk': 1}, 'dis': {'n_front_layer': 2, 'n_shared_layer': 4, 'ch': 64, 'name': 'COCOSharedDis', 'input_dim_a': 3, 'input_dim_b': 3}}
self.datasets={'train_a': {'channels': 3, 'scale': 1.0, 'class_name': 'dataset_celeba', 'folder': 'img_align_crop_resize_celeba/', 'crop_image_size': 128, 'root': '../datasets/celeba/', 'list_name': 'lists/Blond_Hair_ON.txt'}, 'train_b': {'channels': 3, 'scale': 1.0, 'class_name': 'dataset_celeba', 'folder': 'img_align_crop_resize_celeba/', 'crop_image_size': 128, 'root': '../datasets/celeba/', 'list_name': 'lists/Blond_Hair_OFF.txt'}}
self.display=1
dataset=dataset_celeba(conf)
dataset=dataset_celeba(conf)
OpenCV Error: Assertion failed (scn == 3 || scn == 4) in cvtColor, file /home/travis/miniconda/conda-bld/work/opencv-2.4.11/modules/imgproc/src/color.cpp, line 3650
OpenCV Error: Assertion failed (scn == 3 || scn == 4) in cvtColor, file /home/travis/miniconda/conda-bld/work/opencv-2.4.11/modules/imgproc/src/color.cpp, line 3650
OpenCV Error: Assertion failed (scn == 3 || scn == 4) in cvtColor, file /home/travis/miniconda/conda-bld/work/opencv-2.4.11/modules/imgproc/src/color.cpp, line 3650
OpenCV Error: Assertion failed (scn == 3 || scn == 4) in cvtColor, file /home/travis/miniconda/conda-bld/work/opencv-2.4.11/modules/imgproc/src/color.cpp, line 3650
OpenCV Error: Assertion failed (scn == 3 || scn == 4) in cvtColor, file /home/travis/miniconda/conda-bld/work/opencv-2.4.11/modules/imgproc/src/color.cpp, line 3650
OpenCV Error: Assertion failed (scn == 3 || scn == 4) in cvtColor, file /home/travis/miniconda/conda-bld/work/opencv-2.4.11/modules/imgproc/src/color.cpp, line 3650
OpenCV Error: Assertion failed (scn == 3 || scn == 4) in cvtColor, file /home/travis/miniconda/conda-bld/work/opencv-2.4.11/modules/imgproc/src/color.cpp, line 3650
OpenCV Error: Assertion failed (scn == 3 || scn == 4) in cvtColor, file /home/travis/miniconda/conda-bld/work/opencv-2.4.11/modules/imgproc/src/color.cpp, line 3650
Iteration: 00000001/02000000
Iteration: 00000002/02000000
Traceback (most recent call last):
  File "cocogan_train.py", line 88, in <module>
    main(sys.argv)
  File "cocogan_train.py", line 56, in main
    for it, (images_a, images_b) in enumerate(izip(train_loader_a,train_loader_b)):
  File "/u/beckhamc/.conda/envs/pytorch-env/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 187, in __next__
    return self._process_next_batch(batch)
  File "/u/beckhamc/.conda/envs/pytorch-env/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
cv2.error: Traceback (most recent call last):
  File "/u/beckhamc/.conda/envs/pytorch-env/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data/work/beckhamc/UNIT/src/datasets/dataset_celeba.py", line 28, in __getitem__
    crop_img = self._load_one_image(self.images[index])
  File "/data/work/beckhamc/UNIT/src/datasets/dataset_celeba.py", line 34, in _load_one_image
    img = cv2.cvtColor(cv2.imread(img_name), cv2.COLOR_BGR2RGB)
error: /home/travis/miniconda/conda-bld/work/opencv-2.4.11/modules/imgproc/src/color.cpp:3650: error: (-215) scn == 3 || scn == 4 in function cvtColor

I think I know what the error means: it seems to be expecting 3/4 channels for a particular image (is it trying to convert a b/w image to RGB?), but I'm just wondering why I'm getting this issue in the first place, since nobody else has raised it here!

I'm using the OpenCV suggested in USAGE.md: conda install -y -c menpo opencv
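For reference, this cvtColor assertion typically fires when cv2.imread returns None (missing or unreadable file) rather than a proper 3-channel image. A hypothetical scan of the image list, with the paths as placeholders matching the blond-hair config:

import os
import cv2

root = "../datasets/celeba/"
folder = "img_align_crop_resize_celeba/"
list_name = "lists/Blond_Hair_ON.txt"

with open(os.path.join(root, list_name)) as f:
    for name in (line.strip() for line in f if line.strip()):
        path = os.path.join(root, folder, name)
        img = cv2.imread(path)
        if img is None:
            print("unreadable or missing:", path)
        elif img.ndim != 3 or img.shape[2] not in (3, 4):
            # rarely triggered, since imread's default flag converts to 3 channels
            print("unexpected channel count:", path, img.shape)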

Accuracy

I am getting low accuracy. May I know why?

$python cocogan_train_domain_adaptation.py --config ../exps/unit/svhn2mnist.yaml --log ../logs

Iteration: 00000010/00200000
Iteration: 00000020/00200000
Iteration: 00000030/00200000
Iteration: 00000040/00200000
Iteration: 00000050/00200000
Iteration: 00000060/00200000
Iteration: 00000070/00200000
Iteration: 00000080/00200000
Iteration: 00000090/00200000
Iteration: 00000100/00200000
Classification accuracy for Test_B dataset: 0.1296
Iteration: 00000110/00200000
Iteration: 00000120/00200000
Iteration: 00000130/00200000
Iteration: 00000140/00200000
Iteration: 00000150/00200000
Iteration: 00000160/00200000
Iteration: 00000170/00200000
Iteration: 00000180/00200000
Iteration: 00000190/00200000
Iteration: 00000200/00200000
Classification accuracy for Test_B dataset: 0.1032
Iteration: 00000210/00200000
Iteration: 00000220/00200000
Iteration: 00000230/00200000
Iteration: 00000240/00200000
Iteration: 00000250/00200000
Iteration: 00000260/00200000
Iteration: 00000270/00200000
Iteration: 00000280/00200000
Iteration: 00000290/00200000
Iteration: 00000300/00200000
Classification accuracy for Test_B dataset: 0.1084
Iteration: 00000310/00200000
Iteration: 00000320/00200000
Iteration: 00000330/00200000
Iteration: 00000340/00200000
Iteration: 00000350/00200000
Iteration: 00000360/00200000
Iteration: 00000370/00200000
Iteration: 00000380/00200000
Iteration: 00000390/00200000
Iteration: 00000400/00200000
Classification accuracy for Test_B dataset: 0.0826
Iteration: 00000410/00200000
Iteration: 00000420/00200000
Iteration: 00000430/00200000
Iteration: 00000440/00200000
Iteration: 00000450/00200000
Iteration: 00000460/00200000
Iteration: 00000470/00200000
Iteration: 00000480/00200000
Iteration: 00000490/00200000
Iteration: 00000500/00200000
Classification accuracy for Test_B dataset: 0.0984
Iteration: 00000510/00200000
Iteration: 00000520/00200000
Iteration: 00000530/00200000
Iteration: 00000540/00200000
Iteration: 00000550/00200000
Iteration: 00000560/00200000
Iteration: 00000570/00200000
Iteration: 00000580/00200000
Iteration: 00000590/00200000
Iteration: 00000600/00200000
Classification accuracy for Test_B dataset: 0.0912
Iteration: 00000610/00200000
Iteration: 00000620/00200000
Iteration: 00000630/00200000
Iteration: 00000640/00200000
Iteration: 00000650/00200000
Iteration: 00000660/00200000
Iteration: 00000670/00200000
Iteration: 00000680/00200000
Iteration: 00000690/00200000
Iteration: 00000700/00200000
Classification accuracy for Test_B dataset: 0.0819

The svhn mnist examples

Could you please provide the source code for domain adaptation from SVHN to MNIST? I guess there are some differences in training tricks compared with the face attribute translation and other tasks. Thanks very much!

Discriminator share

Why are the weights of the discriminators not shared?
Or maybe you tried, but the results were not good?

synthia2cityscape.yaml

In the file ./exps/unit_local/synthia2cityscape.yaml, is the line "root: /cosmo/datasets/cityscape_1024x512/" an error? I found that it should be set to "root: ../datasets/cityscape" for python cocogan_translate_one_image.py --config ../exps/unit_local/synthia2cityscape.yaml --a2b 0 --weights ../outputs/unit/street_scene/synthia2cityscape_gen_00250000.pkl --image_name ../images/freiburg_000000_000021_leftImg8bit.png --output_image_name ../results/synthetic_freiburg_000000_000021_leftImg8bit.png

Details about IR/RGB conversion

Hi, Mingyu.
I'm studying your amazing work on UNIT, and I'm very interested in UNIT's attractive potential for IR/RGB conversion, so I tried to repeat that experiment. Since the detailed training parameters are not provided, I simply tried borrowing the settings in synthia2cityscape.yaml for the training, but the results were not satisfying.
So I'm wondering whether it is right for me to do so. If it is not right to borrow the settings in synthia2cityscape.yaml, is there any chance that the settings for IR/RGB conversion will be released?
Thanks for your attention and amazing work :)

Loss Function

In which file have you defined the loss functions, such as mean squared error (MSE) and L2, for domain adaptation?

Several Questions

Hi, thanks for making the code publicly available. It is really amazing work.
After reading the paper and the source code, I have several questions.

  1. In cocogan_trainer.py, lines 73 to 82 define the member variables self.gen_xx_loss_xx. I only find their definitions, but I haven't found where they are used. What are these variables for?
  2. I tried to train CelebA blondhair, and I found the batch size was set to 1. Could the batch size be bigger than 1?
  3. I find the generated images for CelebA blondhair are a little blurred. Is this normal?

Clarification on the KL Divergence term in the Generator loss for the SVHN -> MNIST model

I have a question about the _compute_kl function in the class COCOGANDAContextTrainer. The following are the relevant parts of the code:

  def _compute_kl(self, mu, sd):
    mu_2 = torch.pow(mu, 2)
    sd_2 = torch.pow(sd, 2)
    encoding_loss = (mu_2 + sd_2 - torch.log(sd_2)).sum() / mu_2.size(0)
    return encoding_loss

This function is used in gen_update:

    for i, lt in enumerate(lt_codes):
      encoding_loss += 2 * self._compute_kl(*lt)
    total_loss = hyperparameters['gan_w'] * ad_loss + \
                 hyperparameters['kl_normalized_direct_w'] * encoding_loss + \
                 hyperparameters['ll_normalized_direct_w'] * (ll_loss_a + ll_loss_b)

My question is how did you derive the formula to compute the KL divergence term?

I thought it was based on the Auto-Encoding Variational Bayes paper, which has the following parts:

[image: screenshot of the KL divergence term in the variational lower bound from the Auto-Encoding Variational Bayes paper]

and in Appendix B:

[image: screenshot of Appendix B of that paper, giving the closed-form Gaussian KL: -D_KL = 1/2 * sum_j (1 + log(sigma_j^2) - mu_j^2 - sigma_j^2)]

I note the following differences between the code and the paper (Auto-Encoding Variational Bayes):

  1. The KL divergence term is multiplied by 2 instead of 1/2. I guess this does not matter much, since it just rescales the loss.

  2. There is no "- 1" in encoding_loss. Did you choose not to include this term because it will not change the optimum anyway?
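For context, a stand-alone comparison of the textbook closed form KL(N(mu, sd^2) || N(0, 1)) = 0.5 * sum(mu^2 + sd^2 - log(sd^2) - 1) with the code's variant; only the scale factor and the constant "-1" differ, so both are minimized at mu = 0, sd = 1:

import torch

def kl_standard(mu, sd):
    return 0.5 * (mu.pow(2) + sd.pow(2) - torch.log(sd.pow(2)) - 1).sum() / mu.size(0)

def kl_as_in_code(mu, sd):
    return (mu.pow(2) + sd.pow(2) - torch.log(sd.pow(2))).sum() / mu.size(0)

mu, sd = torch.zeros(2, 3), torch.ones(2, 3)      # the optimum: mu = 0, sd = 1
print(kl_standard(mu, sd).item())                  # 0.0
print(kl_as_in_code(mu, sd).item())                # 3.0 (constant offset, same minimizer)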

Feature request: Rectangular images

As far as I understand, the current version only properly supports image sets in which width == height. The current workaround would be to pre- and post-scale the aspect ratio or add padding. Neither is ideal.

Did the paper convert MNIST to RGB for the SVHN -> MNIST model?

In appendix B, under the paragraph about SVHN->MNIST, it is written:

We also found spatial context information was useful. For each input image, we created a 5-channel variant where the first three channels were the original RGB images and the last two channels were the normalized x and y coordinates.

It sounds like this paragraph is saying that the MNIST images, which are originally grayscale, get converted to RGB, so that after adding the spatial features the total number of channels for an MNIST image is 5. But in the code, the number of channels of an MNIST image after adding the spatial features is only 3.

Did I misread the paragraph?

out of memory training

Hi guys, I tried to train on a GTX 1080 Ti (Ubuntu 16.04, CUDA 8, cuDNN 6) with image size 640x480 and the following config:
train:
  snapshot_save_iterations: 5000 # How often do you want to save trained models
  image_save_iterations: 2500 # How often do you want to save output images during training
  image_display_iterations: 100
  display: 1 # How often do you want to log the training stats
  snapshot_prefix: ../outputs/unit/night2day/ # Where do you want to save the outputs
  hyperparameters:
    trainer: COCOGANTrainer
    lr: 0.0001 # learning rate
    ll_direct_link_w: 100 # weight on the self L1 reconstruction loss
    kl_direct_link_w: 0.1 # weight on VAE encoding loss
    ll_cycle_link_w: 100 # weight on the cycle L1 reconstruction loss
    kl_cycle_link_w: 0.1 # weight on the cycle L1 reconstruction loss
    gan_w: 10 # weight on the adversarial loss
    batch_size: 1 # image batch size per domain
    max_iterations: 500000 # maximum number of training epochs
    gen:
      name: COCOResGen
      ch: 64 # base channel number per layer
      input_dim_a: 3
      input_dim_b: 3
      n_enc_front_blk: 3
      n_enc_res_blk: 3
      n_enc_shared_blk: 1
      n_gen_shared_blk: 1
      n_gen_res_blk: 3
      n_gen_front_blk: 3
    dis:
      name: COCODis
      ch: 64
      input_dim_a: 3
      input_dim_b: 3
      n_layer: 6
  datasets:
    train_a: # Domain 1 dataset
      channels: 3 # image channel number
      scale: 1 # scaling factor for scaling image before processing
      crop_image_height: 480 # crop image size
      crop_image_width: 640 # crop image size
      class_name: dataset_image # dataset class name
      root: ../datasets/sg/ # dataset folder location
      folder: night/
      list_name: lists/night.txt # image list
    train_b: # Domain 2 dataset
      channels: 3 # image channel number
      scale: 1 # scaling factor for scaling image before processing
      crop_image_height: 480 # crop image size
      crop_image_width: 640 # crop image size
      class_name: dataset_image
      root: ../datasets/sg/
      folder: sunny/
      list_name: lists/sunny.txt

However, I encountered an out-of-memory error as follows:
self.display=1
dataset_image
dataset=dataset_image(conf)
dataset_image
dataset=dataset_image(conf)
Iteration: 00000001/00500000
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "cocogan_train.py", line 88, in <module>
    main(sys.argv)
  File "cocogan_train.py", line 64, in main
    image_outputs = trainer.gen_update(images_a, images_b, config.hyperparameters)
  File "/media/ml3/Volume/UNIT/src/trainers/cocogan_trainer.py", line 71, in gen_update
    total_loss.backward()
  File "/home/ml3/.conda/envs/torch/lib/python2.7/site-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/ml3/.conda/envs/torch/lib/python2.7/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/home/ml3/.conda/envs/torch/lib/python2.7/site-packages/torch/autograd/function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/home/ml3/.conda/envs/torch/lib/python2.7/site-packages/torch/autograd/_functions/basic_ops.py", line 210, in backward
    return grad_output.mul(ctx.constant).mul(var.pow(ctx.constant - 1)), None
  File "/home/ml3/.conda/envs/torch/lib/python2.7/site-packages/torch/autograd/variable.py", line 339, in mul
    return Mul.apply(self, other)
  File "/home/ml3/.conda/envs/torch/lib/python2.7/site-packages/torch/autograd/_functions/basic_ops.py", line 48, in forward
    return a.mul(b)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THC/generic/THCStorage.cu:66

Thanks for your comments!

Pretrained models.

Hello mingyuliutw,
could you share the pretrained models (Day2Night, Snowy2Summery)?
Thank you very much.

Error running testing

Hi, thank you very much for your work.

I have downloaded the pretrained model and tried to run the test from the README:
./translate_one_image.py --config ../exps/celeba_blond_hair.yaml --image_name ../images/ian.jpg --output_image_name ../results/ian_to_eyeglasses.jpg --weights ../snapshots/celeba_eyeglasses_gen_00500000.pkl --a2b 0

And I face this error:

self.image_save_iterations=500
self.display=1
self.snapshot_prefix='../outputs/celeba_blond_hair/celeba_blond_hair'
self.hyperparameters={'gen': 'CoVAE', 'dis': 'CoDis', 'vae_enc_w': 1e-05, 'vae_ll_w': 0.0001, 'gan_w': 1.0, 'batch_size': 1, 'ch': 64, 'max_iterations': 500000}
self.datasets={'a': {'channels': 3, 'image_size': 128, 'scale': 0, 'class_name': 'dataset_image', 'root': '../datasets/celeba/', 'folder': 'img_align_crop_resize_celeba/', 'list': 'lists/Blond_Hair_ON.txt'}, 'b': {'channels': 3, 'image_size': 128, 'scale': 0, 'class_name': 'dataset_image', 'root': '../datasets/celeba/', 'folder': 'img_align_crop_resize_celeba/', 'list': 'lists/Blond_Hair_OFF.txt'}}
Traceback (most recent call last):
  File "./translate_one_image.py", line 116, in <module>
    main(sys.argv)
  File "./translate_one_image.py", line 86, in main
    trainer = unit_trainer.UNITTrainer(gen_net, dis_net, batch_size, ch, input_dims, image_size)
  File "/home/iis/Documents/UNIT/src/trainers/unit_trainer.py", line 52, in __init__
    exec( 'self.dis = %s(ch, true_input_dims)' % dis)
  File "<string>", line 1, in <module>
NameError: name 'CoDis' is not defined

Can you tell me what I did wrong?
Thank you beforehand.

Some Puzzle about Shared Latent Representation

In the paper, the author assumes that images in two different domains can be encoded into a common latent representation. What I want to know is whether the "shared latent representation" assumption works for unaligned data. In the training process, the model picks arbitrary images from each distribution, and if the data are unaligned, how can the two images be encoded into a common representation? In other words, does the shared-latent-space assumption require aligned data?

street images datasets

Can you upload the street images? I am very interested in the street scene image translation tasks.

Choosing Instance normalization instead of Batch normalization

  1. Is there any reason (either empirical or theoretical) to choose instance normalization instead of batch normalization? The paper refers to the ResNet paper when explaining RESBLK, but I think ResNet does not use instance normalization layers by default. (See the sketch after question 2.)

  2. Additionally, is there any reason to use RESBLK in only a few layers of the encoder & decoder?
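For reference, a minimal sketch of a residual block with instance normalization, in the RESBLK style common to image-translation networks; the layer sizes are illustrative and not the repository's exact definition:

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super(ResBlock, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.InstanceNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
            nn.InstanceNorm2d(ch),
        )

    def forward(self, x):
        # residual connection keeps the block an identity-plus-correction mapping
        return x + self.model(x)

print(ResBlock(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])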
