
phillipi / pix2pix

9.9K stars · 322 watchers · 1.7K forks · 2.44 MB

Image-to-image translation with conditional adversarial nets

Home Page: https://phillipi.github.io/pix2pix/

License: Other

Lua 72.65% Python 17.90% Shell 0.98% MATLAB 6.28% TeX 2.19%
computer-vision computer-graphics gan pix2pix dcgan generative-adversarial-network deep-learning image-generation image-manipulation image-to-image-translation

pix2pix's People

Contributors

ag2s1, ajayjapan, arthur-qiu, brannondorsey, dexhunter, junyanz, kylemcdonald, naruto-sasuke, phillipi, rwv, salisbury-espinosa, tinghuiz, uakfdotb


pix2pix's Issues

Linux Dependencies for Mac: gwc: command not found

When running train.lua on the training datasets, during the step that combines the individual file lists into a single collection, the following error messages appear:

load the large concatenated list of sample paths to self.imagePath
cmd..gwc -L '/tmp/lua_hHqpLx' |gcut -f1 -d' '
sh: gwc: command not found
sh: gwc: command not found
/Users/mader/torch/install/bin/luajit: ...rs/mader/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 2 callback] /Users/mader/pix2pix/data/dataset.lua:211: attempt to perform arithmetic on a nil value
Creating train metadata
serial batch:, 	0
table: 0x0360eaa8
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
/tmp/lua_KBxBlh: line 1: gfind: command not found
/tmp/lua_93oBgs: line 1: gfind: command not found

How to do a 'brain scan' on cGAN

What would be the best way to tap into the Generator in order to inspect or manipulate the high-level features created in it? FYI, if I apply cGAN to images of human faces, then I expect that eventually the Generator will create high-level features similar to those a typical deep CNN would, perhaps at the level of noses and eyes, etc. This is borne out in my experiments with facial images, where I can observe a whole eye being scaled and moved around by the Generator during training.

This is of great interest to me, because it amounts to unsupervised learning of the facial parts (i.e., eyes, noses, etc.): cGAN would then sort of know that a nose in photo A and another nose in photo B are the same type of thing, if both actually activate the same 'nose neuron.' This is a big deal to me, because if this is in fact the case, then once a text label is attached to one nose in one photo (through supervised learning), we would know right away that the 'nose' label is likely applicable to all those other noses in other photos (a kind of one-shot learning, isn't it?).

So my question boils down to: how can I detect (after training) whether two noses from two photos actually activate the same neuron in cGAN? I thought this is somewhat similar to doing an fMRI scan on a human brain, hence the 'brain scan' analogy.

Needless to say, Lua is quite new to me, so any pointers on my question above would be very much appreciated.
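
Not an answer from the authors, but here is a minimal Torch sketch of the kind of probing described above. It assumes netG is the trained generator loaded from a checkpoint and img_A, img_B are two preprocessed input tensors (hypothetical names): after a forward pass every module caches its activation in .output, so you can pick a layer and compare the activations two images produce there, e.g. with cosine similarity.

    require 'nn'
    require 'nngraph'

    -- Print the flattened module list once to choose which layer to probe.
    for i, m in ipairs(netG:listModules()) do print(i, torch.type(m)) end

    -- Return the (flattened) activation of module layer_idx for a single image.
    local function activation(netG, img, layer_idx)
        netG:forward(img:view(1, img:size(1), img:size(2), img:size(3)))
        return netG:listModules()[layer_idx].output:clone():view(-1)
    end

    local idx = 10  -- hypothetical layer index, picked from the printed list above
    local a = activation(netG, img_A, idx)
    local b = activation(netG, img_B, idx)
    print('cosine similarity at layer ' .. idx .. ': ' .. torch.dot(a, b) / (a:norm() * b:norm() + 1e-8))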

Running the test.lua example line throws an error

It looks to me like there might have been a change in the way test.lua works which has not yet been reflected in the download_model.sh script:

when trying to run the example in the docs (after downloading the pre-trained facades model)
DATA_ROOT=/home/mario/Documents/pix2pix/datasets/facades/ name=facades which_direction=AtoB phase=val th test.lua

I get
/home/mario/torch-cl/install/bin/luajit: cannot open </home/mario/Documents/pix2pix/checkpoints/facades/latest_net_G.t7> in mode r at /home/mario/torch-cl/pkg/torch/lib/TH/THDiskFile.c:640

I could fix this by creating a "checkpoints" folder and then copying /models/facades/facades_label2image.t7 to /checkpoints/facades/latest_net_G.t7 - but of course ideally this should work without a workaround

Proper aspect ratio to use when preparing own dataset

Found the following in train.lua:

loadSize = 286, -- scale images to this size
fineSize = 256, -- then crop to this size

but the following in test.lua:

loadSize = 256,           -- scale images to this size
fineSize = 256,           --  then crop to this size

This looks inconsistent. Was '286' actually a typo and should be '256' instead?

I have found that my own dataset (not 1:1 aspect ratio) always gets converted to a 1:1 aspect ratio during training and looks stretched, instead of being scaled proportionally and then cropped, which wouldn't look stretched. Should I always prepare my dataset with a 1:1 aspect ratio?
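
For reference, the 286/256 pair in train.lua matches the "random jitter" described in the paper: images are scaled to loadSize and then randomly cropped back to fineSize, while test.lua uses 256/256 so no jitter is applied. Both dimensions are scaled to loadSize, which is why a non-square image ends up stretched. A rough sketch of that preprocessing (a paraphrase of the idea, not the repo's exact data-loading code):

    require 'image'

    local loadSize, fineSize = 286, 256
    -- Scale both dimensions to loadSize (this stretches non-square images),
    -- then take a random fineSize crop: the "random jitter" from the paper.
    local scaled = image.scale(img, loadSize, loadSize)          -- img: a 3xHxW tensor
    local ox = math.floor(torch.uniform(0, loadSize - fineSize))
    local oy = math.floor(torch.uniform(0, loadSize - fineSize))
    local cropped = image.crop(scaled, ox, oy, ox + fineSize, oy + fineSize)
    -- When loadSize == fineSize (as in test.lua) the crop is a no-op and there is no jitter.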

Question about patchGAN

I read your paper and the implementation. The PatchGAN method described in the paper seems promising, but I wonder how it is used in the code. Do you preprocess the data into patches before loading, or is it done some other way?

a problem about cuda

Whether I use test.lua or train.lua, the error always happens.
However, my GPU is not busy with any other job.

Problems:
Updating classList and imageClass appropriately
[==================== 1/1 ===================>] Tot: 0ms | Step: 0ms
Cleaning up temporary files
Dataset Size: 100
checkpoints_dir ./checkpoints
THCudaCheck FAIL file=/home/amax/torch/extra/cutorch/init.c line=261 error=2 : out of memory
/home/amax/torch/install/bin/luajit: /home/amax/torch/install/share/lua/5.1/trepl/init.lua:389: /home/amax/torch/install/share/lua/5.1/trepl/init.lua:389: /home/amax/torch/install/share/lua/5.1/cudnn/find.lua:165: cuda runtime error (2) : out of memory at /home/amax/torch/extra/cutorch/init.c:261
stack traceback:
[C]: in function 'error'
/home/amax/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
/data/lulei/tools/torch/pix2pix/util/util.lua:187: in function 'load'
test.lua:76: in main chunk
[C]: in function 'dofile'
...amax/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

Result in encoder_decoder

First, thanks for the great work!
I tried your code and ran into a problem: when I choose opt.which_model_netG = 'encoder_decoder', the output is a single solid color (such as red), and after some epochs it turns blue or another color.
I tried to inspect it right after the line fake_B = netG:forward(real_A) (line 221), but fake_B is still a solid-color image. Why can't 'encoder_decoder' produce results like 'unet'? With opt.which_model_netG='unet' the result is normal.
[screenshot: the solid-color output produced by the encoder_decoder generator]

When I run the demo, the options I use are as follows (other options keep their defaults):
DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA which_model_netG = encoder_decoder th train.lua

using 128px images?

256px images work fine, but when I try loadSize=128 fineSize=128
it errors out with

transferring to gpu...
done
/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/torch/install/share/lua/5.1/cudnn/init.lua:58: Error in CuDNN: CUDNN_STATUS_BAD_PARAM (cudnnGetConvolutionNdForwardOutputDim)
stack traceback:
[C]: in function 'error'
/home/ubuntu/torch/install/share/lua/5.1/cudnn/init.lua:58: in function 'errcheck'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:137: in function 'createIODescriptors'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:366: in function 'func'
...e/ubuntu/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...e/ubuntu/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
train.lua:220: in function 'createRealFake'
train.lua:324: in main chunk
[C]: in function 'dofile'
...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

Is there another parameter I need to change to use smaller images?
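
Not an official answer, but one plausible explanation (an assumption on my part): the 256px U-net encoder halves the spatial size at each of its 8 downsampling convolutions until it reaches a 1x1 bottleneck, so a 128px input runs out of resolution one layer early and the innermost 4x4 convolution receives an input it cannot handle, which would produce exactly this kind of CUDNN_STATUS_BAD_PARAM error. The arithmetic:

    -- Output size of a 4x4, stride-2, pad-1 convolution.
    local function down(n) return math.floor((n + 2 - 4) / 2) + 1 end

    for _, size in ipairs({256, 128}) do
        local n, sizes = size, {}
        for layer = 1, 8 do n = down(n); table.insert(sizes, n) end
        print(size .. ' -> ' .. table.concat(sizes, ' '))
    end
    -- 256 -> 128 64 32 16 8 4 2 1   (reaches the 1x1 bottleneck)
    -- 128 -> 64 32 16 8 4 2 1 0     (the 8th layer gets a 1x1 input and outputs size 0)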

Reason of using one discriminative network

Hi,

Thank you for this cool project. Comparing this work with Pixel level domain transfer (my implementation at https://github.com/fxia22/pldtgan), this work uses only one discriminative network D, instead of a real/fake discriminative network D plus an associated/not-associated discriminative network. Is there any reason behind this? Is it because the domain transfer is smaller (input and output are well aligned)?

Thanks!

Create smooth transitions in the generated images

First of all, amazing work!

In the paper it was mentioned that the use of a z noise vector for the generator (as in the standard GAN) was not effective, so in the end the noise is provided by dropout instead. In a standard GAN I can do things such as sampling the trained generator along a path in z space and obtain smooth transitions for the images generated along that path. This has allowed me to create smooth animations of human faces with convincing expressions. Is it fair to say that the randomness from dropout as used in pix2pix won't allow me to do the same (i.e., sample smooth transitions in the generated images)?

Feature Request: Add a "transform" mode/script which does not require image pairs

Unless I am missing something obvious, it seems that if I want to use a trained model to process a batch of images, I still have to prepare the images in the "pair" format. That seems to be a bit of a waste, since one half will be ignored and the pairs have to be split anyway.

So what I would wish for is a script that I can just point at a folder of unprocessed raw images and it will produce the "result". And of course it would be extra nice if the images could have any proportion, not just the fixed one given by "aspect_ratio".
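
Until such a script exists, something along these lines might serve as a stopgap: a hedged sketch that loads a trained generator checkpoint and runs it on a single unpaired image. The checkpoint path, the [-1,1] normalization and the (x+1)/2 de-normalization are assumptions based on common DCGAN-style preprocessing, not a confirmed description of this repo's util functions, and it assumes a checkpoint that can run on the CPU (see the GPU to CPU model conversion issue further down otherwise).

    require 'nn'
    require 'nngraph'
    require 'image'

    local netG = torch.load('./checkpoints/facades/latest_net_G.t7')  -- hypothetical checkpoint path
    netG:float()

    local img = image.load('input.jpg', 3, 'float')       -- hypothetical single input image
    img = image.scale(img, 256, 256)
    img = img:mul(2):add(-1)                               -- assumed [0,1] -> [-1,1] normalization
    local out = netG:forward(img:view(1, 3, 256, 256))[1]
    image.save('output.jpg', out:clone():add(1):div(2))   -- assumed [-1,1] -> [0,1] de-normalization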

Got `internal error in __sub: no metatable` error when I train the sample dataset

I followed the Getting Started instructions and installed pix2pix.
Then I downloaded the facades dataset with bash ./datasets/download_dataset.sh facades
and ran DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA th train.lua

But I got the following error:

~/torch-cl/install/bin/luajit: ./models.lua:69: internal error in __sub: no metatable
stack traceback:
	[C]: in function '__sub'
	./models.lua:69: in function 'defineG_unet'
	train.lua:110: in function 'defineG'
	train.lua:146: in main chunk
	[C]: in function 'dofile'
	...i/torch-cl/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
	[C]: at 0x010ec81ce0

How can I fix this?

  • run on Mac OS X
  • Python 2.7

Question regarding continuing of training

Using the "continue_train" option I can continue training a model. What I would like to know is if this is really continuing where the training was interrupted or more of a fine-tuning initialized from a previous model? The reason I ask is that whilst the previous model is loaded the epochs counter gets reset to 1 and I am not sure what happens with the learning rate and other parameters.

Visualizing Training Loss

Hi there,

Is any kind of real-time training plotting visualization in the works for pix2pix?

I'm interested in visualizing Err_G, Err_D, and ErrL1 with the UI display while training with train.lua. I've only poked around with Lua and Torch in various machine learning projects and have yet to write much of either, although I'm happy to dig in and try to figure something like this out if it seems like it would be rather trivial/helpful.

My initial thought would be to use display.plot(...) to update the display on each training batch. Anybody more familiar with the code base have any ideas or examples they would like to share?

P.S. Really rad paper + source code, super excited to have access to this research :) Thanks to all who are working on this!
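
In case it helps anyone poking at the same thing, here is a rough, untested sketch of that idea, assuming the szym/display package already used for image display; the exact option names accepted by display.plot are an assumption on my part.

    local disp = require 'display'

    local plot_data, plot_win = {}, nil

    -- Call once per training batch with the current losses.
    local function update_loss_plot(iter, errG, errD, errL1)
        table.insert(plot_data, {iter, errG, errD, errL1})
        plot_win = disp.plot(plot_data, {
            win = plot_win,                                   -- reuse the same browser pane
            title = 'pix2pix training loss',
            labels = {'iteration', 'errG', 'errD', 'errL1'},  -- assumed option names
        })
    end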

Got error when running combine_A_and_B.py

Got the following message when trying to use the provided tool combine_A_and_B.py to prepare a new dataset:

Traceback (most recent call last):
File "combine_A_and_B.py", line 46, in
im_A = cv2.imread(path_A, cv2.CV_LOAD_IMAGE_COLOR)
AttributeError: 'module' object has no attribute 'CV_LOAD_IMAGE_COLOR'

It turned out that my system has OpenCV 3 installed, which has some API differences from OpenCV 2. I changed the offending line to the following and the problem went away:

        im_A = cv2.imread(path_A, cv2.IMREAD_COLOR)
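        # cv2.IMREAD_COLOR is the OpenCV 3 name for the old OpenCV 2 flag cv2.CV_LOAD_IMAGE_COLOR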

Hope this helps somebody.

more datasets available?

Hello, the code and tutorials you provided work well. I would also like to try some of the other datasets shown in your paper; could you provide them?

Error when running th train.lua

running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
trainCache	/Users/swinghu/prog/torch/pix2pix/cache/_Users_swinghu_prog_torch_pix2pix_datasets_facades_train_trainCache.t7
Creating train metadata
serial batch:, 	0
table: 0x03a7b698
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
/tmp/lua_WkxyhE: line 1: gfind: command not found
/tmp/lua_BMi2XQ: line 1: gfind: command not found
now combine all the files to a single large file
now combine all the files to a single large file
load the large concatenated list of sample paths to self.imagePath
cmd..gwc -L '/tmp/lua_JNRyb3' |gcut -f1 -d' '
load the large concatenated list of sample paths to self.imagePath
cmd..gwc -L '/tmp/lua_Ho6Snl' |gcut -f1 -d' '
/Users/swinghu/torch/install/bin/luajit: .../swinghu/torch/install/share/lua/5.1/threads/threads.lua:264: 
[thread 2 callback] /Users/swinghu/prog/torch/pix2pix/data/dataset.lua:215: Could not find any image file in the given input paths
stack traceback:
	[C]: in function 'assert'
	/Users/swinghu/prog/torch/pix2pix/data/dataset.lua:215: in function '__init'
	/Users/swinghu/torch/install/share/lua/5.1/torch/init.lua:91: in function </Users/swinghu/torch/install/share/lua/5.1/torch/init.lua:87>
	[C]: in function 'dataLoader'
	/Users/swinghu/prog/torch/pix2pix/data/donkey_folder.lua:171: in main chunk
	[C]: in function 'dofile'
	/Users/swinghu/prog/torch/pix2pix/data/data.lua:39: in function </Users/swinghu/prog/torch/pix2pix/data/data.lua:29>
	[C]: in function 'xpcall'
	.../swinghu/torch/install/share/lua/5.1/threads/threads.lua:231: in function 'callback'
	...rs/swinghu/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...rs/swinghu/torch/install/share/lua/5.1/threads/queue.lua:41>
	[C]: in function 'pcall'
	...rs/swinghu/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
	[string "  local Queue = require 'threads.queue'..."]:13: in main chunk
[thread 1 callback] /Users/swinghu/prog/torch/pix2pix/data/dataset.lua:215: Could not find any image file in the given input paths
stack traceback:

This happens on my MacBook Pro. After downloading the datasets, I run training with
DATA_ROOT=~/prog/torch/pix2pix/datasets/facades name=facades_generation which_direction=BtoA gpu=0 cudnn=0 batchSize=10 save_epoch_freq=5 th train.lua and the error message above is displayed. Is it dependent on the Lua version?
My version:

lua -v
Lua 5.2.4  Copyright (C) 1994-2015 Lua.org, PUC-Rio

U-net with 480 x 640

Is it possible to somehow use the U-net architecture with images of size 480 x 640?
In its current implementation the U-net seems to only work with images of size 256x256,
due to the receptive fields. Also, sizes that are not powers of 2 don't work.
Is there a workaround?

Thank you!

Why does setting evaluate mode in test.lua yield black images?

Thanks for open sourcing this great project.
I saw this line of code in test.lua: --netG:evaluate(). I guess commenting it out is what keeps dropout enabled at test time. I tried setting evaluate mode when testing, but then the model generates black images.

Why is enabling dropout important when testing? Or is it possible that the black images are caused by the BN layers? Does anybody have insight on this? Thanks.

Error for training

Hi,

I followed your instructions and tried to train your network with the provided dataset, but I get the following error:

/usr/local/torch/install/bin/luajit: ./models.lua:52: nn.SpatialConvolution has no negation operator
stack traceback:
[C]: in function '__unm'
./models.lua:52: in function 'defineG_unet'
train.lua:110: in function 'defineG'
train.lua:147: in main chunk
[C]: in function 'dofile'
...ocal/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

Do you know what the problem is?

Problem with loading saved models

Hi,
When I use the continue_train=1 option, the models are loaded and transferred to the GPU, but I get the following error in the createRealFake function. It points to the Dropout layer.
output:
Dataset Size: 2500
loading previously trained netG...
loading previously trained netD...
nn.gModule
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> output]
  (1): cudnn.SpatialConvolution(6 -> 64, 4x4, 2,2, 1,1)
  (2): nn.LeakyReLU(0.2)
  (3): cudnn.SpatialConvolution(64 -> 128, 4x4, 2,2, 1,1)
  (4): cudnn.SpatialBatchNormalization
  (5): nn.LeakyReLU(0.2)
  (6): cudnn.SpatialConvolution(128 -> 256, 4x4, 2,2, 1,1)
  (7): cudnn.SpatialBatchNormalization
  (8): nn.LeakyReLU(0.2)
  (9): cudnn.SpatialConvolution(256 -> 512, 4x4, 1,1, 1,1)
  (10): cudnn.SpatialBatchNormalization
  (11): nn.LeakyReLU(0.2)
  (12): cudnn.SpatialConvolution(512 -> 1, 4x4, 1,1, 1,1)
  (13): cudnn.Sigmoid
}
transferring to gpu...
done
/home/ipcv/torch/install/bin/luajit: /home/ipcv/torch/install/share/lua/5.1/nn/Dropout.lua:26: Creating MTGP constants failed. at /tmp/luarocks_cutorch-scm-1-1283/cutorch/lib/THC/THCTensorRandom.cu:33
stack traceback:
	[C]: in function 'bernoulli'
	/home/ipcv/torch/install/share/lua/5.1/nn/Dropout.lua:26: in function 'func'
	/home/ipcv/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
	/home/ipcv/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
	train.lua:221: in function 'createRealFake'
	train.lua:325: in main chunk
	[C]: in function 'dofile'
	...ipcv/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00406670

Larger image sizes

It would be great to use pix2pix on larger images, e.g. 1024px. I guess one needs to add a couple of layers and change some parameters?

How to turn off the random jitter

I am looking for a way to turn off the 'random jitter' mentioned in the paper (which was applied to some experiments there, but not all). As best I can tell, changing the 'preprocess' option should control this, but I get an exception if I set it to anything other than 'regular' or 'colorization'.

The reason for needing this is that in my experiments with getting cGAN to fill in a missing part of human faces (e.g., a whole eye), the repaired eye ends up out of position or slightly the wrong size, and I would like to turn off the jittering to see if it helps.

Following is an example from my training session, where the right image is the ground truth image, the left image is the input image with the left eye erased, and at center is the output image with its repaired left eye out of position and slightly too small.
[screenshot: the training example described above]

"maps" dataset is not available

I'm trying to download maps dataset via
bash ./datasets/download_dataset.sh maps and getting error:

HTTP request sent, awaiting response... 404 Not Found
2016-12-07 12:16:47 ERROR 404: Not Found.

It looks like the server does not have the dataset archive:
https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/maps.tar.gz
Could you please check it?

Strange phenomenon in my extreme experiments

I am doing what you may consider extreme experiments with your code, as a part of my artistic explorations. See http://liipetti.net/erratic/2016/11/25/imaginary-landscapes-using-pix2pix/

In the images at the end of the post you will see a roughly square shape in the middle, having higher frequency content than the rest of the image. It could result from the nature of the experiment (which includes scaling and filling in the missing part), but I came to think of the PatchGAN method described in your paper. Is it used in the code, at all? By default?

I just happened to think that if the discriminator is looking at an area in the center, then the generator might learn to put more detailed content just there. I don't know yet if this is happening, but it would be a possible explanation for what I am experiencing.

Here are two images as an example, with the high-frequency "watermark" clearly visible in the center.

[two example images with the high-frequency square artifact visible in the center]

Some question about Patch Discriminator

Hi,

It is a really good work. Thanks for sharing.

Here is what I didn't understand well. In the paper you say the patch discriminator acts like a convolution over the whole image (256 x 256), and you achieve this by changing the depth of the discriminator. But how do you know the size of the receptive field (i.e., 16 x 16, 70 x 70)? In other words, how can I know the receptive field size when I increase or decrease the depth of the discriminator? Is there a way to calculate the relationship (od = F(rf)) between the receptive field size (rf x rf) and the output size of the discriminator (od x od)?

Another thing: the structure of the 256 x 256 discriminator seems to give a 286 x 286 receptive field, not the 574 x 574 you wrote in the paper.
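
For anyone else looking for the formula: the receptive field of a stack of convolutions can be computed by walking backwards from a single output unit with rf_in = (rf_out - 1) * stride + kernel. A small sketch, where the layer list for the 70x70 PatchGAN (4x4 kernels, three stride-2 convolutions followed by two stride-1 ones) is my reading of the paper's architecture description, so treat it as an assumption:

    -- Walk backwards from one output unit: rf_in = (rf_out - 1) * stride + kernel.
    local function receptive_field(layers)
        local rf = 1
        for i = #layers, 1, -1 do
            rf = (rf - 1) * layers[i].stride + layers[i].kernel
        end
        return rf
    end

    local patch70 = {
        {kernel = 4, stride = 2}, {kernel = 4, stride = 2}, {kernel = 4, stride = 2},
        {kernel = 4, stride = 1}, {kernel = 4, stride = 1},
    }
    print(receptive_field(patch70))  -- 70

Running the same stack forwards (with padding 1) over a 256x256 input gives a 30x30 output map, so each of the 30x30 discriminator outputs scores one overlapping 70x70 patch; that is the relationship between the output size and the receptive field.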

How to manipulate the z vector in pix2pix

@phillipi
I am trying to use the z vector created in pix2pix as a latent representation for my research on various applications related to cGAN. I mainly use test.lua as my reference. I need to:

  1. Retrieve the z vector out of the model, which I believe should be netG:get(21).output. Is this correct?
  2. Bypass the encoder part of the generator to inject a z vector of my choice, and have the decoder generate an image from this z vector of mine.

I am not yet sufficiently proficient with Lua/Torch to be certain of what I am doing, and would very much appreciate any pointers for achieving the above.
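
I can't confirm the exact index, but here is a hedged way to locate the bottleneck yourself: after a forward pass every module caches its activation in .output, so listing each module's output size shows where the innermost (1x1 spatial) encoding lives; that tensor is the closest thing this architecture has to a z vector.

    -- real_A: a preprocessed 1x3x256x256 input batch (hypothetical name).
    netG:forward(real_A)
    for i, m in ipairs(netG:listModules()) do
        if torch.isTensor(m.output) and m.output:nDimension() == 4 then
            print(i, torch.type(m), m.output:size(3) .. 'x' .. m.output:size(4))
        end
    end
    -- The module whose output is 1x1 spatially is the encoder bottleneck. Reading its .output
    -- covers item 1; cleanly splitting the gModule to inject your own z (item 2) is more involved.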

Small issue with batchSize>1 and save_latest_freq

I just noticed that if you increase batchSize to a value > 1, it can happen that, depending on the value of save_latest_freq, the saving of the latest_net snapshot is never triggered, since the modulo check on the counter variable never reaches 0.
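
A tiny illustration with made-up numbers (the variable names are just for the example, not the exact ones in train.lua): the counter only ever holds multiples of batchSize, so it can step over every multiple of save_latest_freq.

    local batchSize, save_latest_freq = 6, 5000
    local counter, saves = 0, 0
    for iter = 1, 2000 do
        counter = counter + batchSize
        if counter % save_latest_freq == 0 then saves = saves + 1 end
    end
    print(saves)  -- 0 after 2000 iterations (12000 images seen); a check such as
                  -- (counter % save_latest_freq) < batchSize would have fired twice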

Black output in colorization tasks (Solved)

When working with colorization tasks the output is always a black image.

I think it is because the code divides the output image by 255.0 twice for colorization tasks: once in the function util.deprocessLAB() and again at lines 365 and 366 of train.lua.

    if image_out==nil then
        image_out = torch.cat(util.deprocessL(real_A[i2]:float()), util.deprocessLAB(real_A[i2]:float(), fake_B[i2]:float()), 3) / 255.0
    else
        image_out = torch.cat(image_out, torch.cat(util.deprocessL(real_A[i2]:float()), util.deprocessLAB(real_A[i2]:float(), fake_B[i2]:float()), 3) / 255.0, 2)
    end
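    -- util.deprocessLAB() already divides by 255.0 internally, so the /255.0 above scales the
    -- colorization output down a second time, which is what blacks out the images.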

Just removing the /255.0 division in the two lines above solved the problem.

Thanks for this work, it is amazing!

Occasional appearance of persistent artifacts

In particular when training for a long time, it can happen that some very annoying artifacts appear on some of the outputs. They are usually square, sit at the same location, and look almost identical on different outputs. Once they appear, they rarely go away even with longer training.

Here are some examples:

[five screenshots showing the recurring square artifacts]

I wonder what causes them. Division by zero in a layer? An overflow? And of course it would be great if there were a way to remove them without having to restart training from scratch. I wonder if there is a way to at least identify which layer they originate from.

Assuming that this is caused by a single ill-defined weight or bias in a relatively deep layer, I wonder if there is a mathematical way I could identify that cell and then, for example, just replace its value with zero or the mean? For instance, this artifact always has a size of 48x48 pixels and the input/output is 256x256 - I guess by adding/multiplying the kernel sizes of the convolutions one could figure out at which point the accumulated kernel size becomes 48x48 and then backtrack from there?
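
For what it's worth, the backtracking idea can be made concrete with the same receptive-field style arithmetic used for the discriminator above: assuming the decoder is built from 4x4 stride-2 up-convolutions, a single bad activation n layers below the output influences a patch whose size follows the recurrence extent = (extent - 1) * 2 + 4. This is a rough estimate, not a diagnosis:

    local extent = 1
    for n = 1, 6 do
        extent = (extent - 1) * 2 + 4
        print(n, extent)
    end
    -- 1:4  2:10  3:22  4:46  5:94  6:190
    -- A 48x48 artifact is roughly consistent with a single bad value about four
    -- up-convolutions below the output (46x46, ignoring padding and overlap effects).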

Output size discriminator

Hi,

Why are you using an output size of 1x30x30 for the discriminator, and not just 1x1x1?

Thanks!

Correct util.load for setup without CuDNN

I am training a model with GPU, but not CuDNN.
Then I test it in the same setup and get

torch/install/share/lua/5.1/torch/File.lua:343: unknown Torch class <torch.CudaTensor>
stack traceback:
	[C]: in function 'error'
	torch/install/share/lua/5.1/torch/File.lua:343: in function 'readObject'
	torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	...e/holger/torch/install/share/lua/5.1/nngraph/gmodule.lua:495: in function 'read'
	torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
	torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
	pix2pix/util/util.lua:193: in function 'load'
	test.lua:76: in main chunk
	[C]: in function 'dofile'
	...lger/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
	[C]: at 0x00405ea0

It turns out that in util.lua the line require 'cunn' should be moved before the line local net = torch.load(filename):

    if opt.gpu > 0 then
        require 'cunn'
    end
    local net = torch.load(filename)
    if opt.gpu > 0 then
        net:cuda()

Then everything works fine.

OT: Adding a Spatial Transformer Layer(s)?

Sorry if this is a bit off-topic but this seems to be the best place to ask:

Do you think that adding one or more spatial transformer layers to the pix2pix architecture could increase the spatial range of the model and reduce "mosaic" artifacts? Right now it seems that there is a certain limit on how far the network can "see" around each pixel - so if the structures in a pair are very different it will not learn anything meaningful. If I understand STLs correctly, they allow moving, scaling and rotating the input, so I would imagine this could help extend the range or find the related structures in the paired image.

GPU to CPU model conversion

Unfortunately my GPU is too tiny for the models, and I'm hitting Out of Memory errors. Could anyone point me in the right direction for converting the models for use with a CPU? Google results are warping my mind. Alternatively, is there a way I can scale the net to have it fit? Working with a v.small 1GB GeForce GT 650M. Many thanks.
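
Not authoritative, but the usual Torch recipe for this kind of conversion looks roughly like the sketch below: load the checkpoint on a machine that still has the CUDA packages, convert any cudnn modules back to their nn equivalents, move the weights to CPU floats, and save a CPU copy. The paths are placeholders, and cudnn.convert is only needed if the model was trained with cudnn.

    require 'nn'
    require 'nngraph'
    require 'cunn'
    require 'cudnn'   -- needed once, where CUDA is available, to deserialize the GPU checkpoint

    local net = torch.load('checkpoints/facades/latest_net_G.t7')   -- placeholder path
    cudnn.convert(net, nn)   -- cudnn.SpatialConvolution -> nn.SpatialConvolution, etc.
    net:float()              -- move all parameters off the GPU
    torch.save('checkpoints/facades/latest_net_G_cpu.t7', net)      -- CPU copy, loadable with just nn/nngraph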

Same output for all inputs with test.lua

I'm training a model using my own data, and I've made sure to recreate the required folder structure and ran combine_A_and_B.py to create training data. In both my test and val folders I have 1000 imageA/imageB pairs.

When I run DATA_ROOT=</my/data/root/> name=<my_expt> which_direction=AtoB th test.lua (with my actual experiment name and data root), it produces the correct folder output with latest_net_G_val being the top level folder, and images and index.html

However, every single output image is the same and doesn't correspond at all to the input or target images, but it does look like it's doing a decent job at segmenting some unknown input image from the dataset. There is some very slight variation between each output image, so it seems it's just getting the same single input into the network each time (?) Any help would be greatly appreciated.

Problem for images with 1 channel

Running the code for 1 input and output channel:
input_nc = 1, -- # of input image channels
output_nc = 1, -- # of output image channels

leads to this error:
inconsistent tensor size at /usr/local/torch/pkg/torch/lib/TH/generic/THTensorCopy.c:7 (dataset.lua:357)

luajit out of memory during training

Hi,

I'm trying to train pix2pix on a particular image labeling task. Things worked fine running the facades demo, although I had to use the CPU on my MacBook since the built-in GPU didn't have enough memory for that task.

I've used the combine_A_and_B.py script to generate new image pairs from about 6k pairs of input and label images. When training, I'm getting an error message:
luajit: not enough memory

My command line is below. I've got the display frequency set high so I can see what goes on in early iterations; I would dial that down once I'm more comfortable with what is happening.

Anything I can do about the memory error?

$ DATA_ROOT=./datasets/imageClef/combined name=clef_generation which_direction=AtoB gpu=0 cudnn=0 batchSize=10 save_epoch_freq=5 display_freq=3 th train.lua
{
cudnn : 0
name : "clef_generation"
niter : 200
batchSize : 10
n_layers_D : 0
ndf : 64
which_model_netG : "unet"
save_display_freq : 5000
print_freq : 50
gpu : 0
use_GAN : 1
DATA_ROOT : "./datasets/imageClef/combined"
serial_batch_iter : 1
use_L1 : 1
save_epoch_freq : 5
output_nc : 3
checkpoints_dir : "./checkpoints"
input_nc : 3
beta1 : 0.5
continue_train : 0
which_direction : "AtoB"
phase : "train"
fineSize : 256
condition_GAN : 1
loadSize : 286
lambda : 100
ngf : 64
preprocess : "regular"
which_model_netD : "basic"
display_freq : 3
display : 1
display_id : 10
ntrain : inf
nThreads : 2
lr : 0.0002
flip : 1
save_latest_freq : 5000
serial_batches : 0
}
Random Seed: 276
#threads...2
Starting donkey with id: 2 seed: 278
table: 0x0f12f520
Starting donkey with id: 1 seed: 277
table: 0x0f14f0a8
./datasets/imageClef/combined
./datasets/imageClef/combined
trainCache /Users/danielr/Documents/src/pix2pix/cache/_Users_danielr_Documents_src_pix2pix_datasets_imageClef_combined_train_trainCache.t7
Creating train metadata
serial batch:, 0
table: 0x0f1ed738
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
trainCache /Users/danielr/Documents/src/pix2pix/cache/_Users_danielr_Documents_src_pix2pix_datasets_imageClef_combined_train_trainCache.t7
Creating train metadata
serial batch:, 0
table: 0x0f0860f8
running "find" on each class directory, and concatenate all those filenames into a single file containing all image paths for a given class
now combine all the files to a single large file
now combine all the files to a single large file
load the large concatenated list of sample paths to self.imagePath
cmd..gwc -L '/tmp/lua_4R6Gpf' |gcut -f1 -d' '
load the large concatenated list of sample paths to self.imagePath
cmd..gwc -L '/tmp/lua_H4yZFT' |gcut -f1 -d' '
5758 samples found... 0/5758 ...................] ETA: 0ms | Step: 0ms
Updating classList and imageClass appropriately
[=================== 1/1 =====================>] Tot: 2ms | Step: 2ms
5758 samples found... 0/5758 ...................] ETA: 0ms | Step: 0ms
Updating classList and imageClass appropriately
[=================== 1/1 =====================>] Tot: 2ms | Step: 2ms
Cleaning up temporary files
Cleaning up temporary files
Dataset Size: 5758
define model netG...
define model netD...
nn.gModule
nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> output]
(1): nn.SpatialConvolution(6 -> 64, 4x4, 2,2, 1,1)
(2): nn.LeakyReLU(0.2)
(3): nn.SpatialConvolution(64 -> 128, 4x4, 2,2, 1,1)
(4): nn.SpatialBatchNormalization (4D) (128)
(5): nn.LeakyReLU(0.2)
(6): nn.SpatialConvolution(128 -> 256, 4x4, 2,2, 1,1)
(7): nn.SpatialBatchNormalization (4D) (256)
(8): nn.LeakyReLU(0.2)
(9): nn.SpatialConvolution(256 -> 512, 4x4, 1,1, 1,1)
(10): nn.SpatialBatchNormalization (4D) (512)
(11): nn.LeakyReLU(0.2)
(12): nn.SpatialConvolution(512 -> 1, 4x4, 1,1, 1,1)
(13): nn.Sigmoid
}
running model on CPU
/Users/danielr/torch/install/bin/luajit: not enough memory

nn.SpatialConvolution has no negation operator

Running

DATA_ROOT=./datasets/facades name=facades_generation which_direction=BtoA th train.lua

gives

/home/hannu/torch/install/bin/luajit: ./models.lua:52: nn.SpatialConvolution has no negation operator

Reinstalled Torch, still get the same.

CPU-only support

Is there any possibility of getting this to work with only a CPU?

Filling in missing features through training

I am attempting to use this cGAN implementation to build a visual neural model of a specific person from photos or videos. One experiment I did was to remove some features from the input image (say, an eye), and then see what it takes for cGAN to recover the missing features through learning. While monitoring the progress of such a training session, I was surprised to see that cGAN attempts to move one or more whole eyes around, seemingly looking for a fit, like this:

[screenshot: training example with the eye being moved around by the generator]

My question here is: what mechanism in this cGAN implementation would explain this behavior? I could imagine that the whole eye is actually represented as a high-level feature somewhere in the upper layers of the generator, and that the fractionally-strided convolutions allow for some translation, but I am not sure why we would sometimes see two eyes, why the fitting process appears somewhat aimless and often off the mark by quite a bit (yes, it did take quite a while, with the eye being moved all over the place during training), or why cGAN would decide that adding an eye far out of position is worth trying (since the error from that would seem fairly high).

Does anybody have insight on this?
