
stylegan-encoder's Introduction

StyleGAN — Encoder for Official TensorFlow Implementation

Python 3.6 TensorFlow 1.10 cuDNN 7.3.1 License CC BY-NC

Teaser image

These people are real – their latent representations were found using the perceptual loss trick. The representations were then moved along the "smiling" direction and transformed back into images.

Short explanation of the encoding approach:

  0. The original pre-trained StyleGAN generator is used for generating images.
  1. A pre-trained VGG16 network is used to transform a reference image and a generated image into a high-level feature space.
  2. The loss is calculated as the difference between them in that feature space.
  3. Optimization is performed only over the latent representation which we want to obtain (a minimal sketch of this loop follows the list).
  4. Upon completion of the optimization you can transform your latent vector as you wish. For example, you can find a "smiling" direction in your latent space, move your latent vector in that direction, and transform it back into an image using the generator.
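
A minimal sketch of this optimization loop in TensorFlow 1.x graph mode (this is not the repo's exact code: synthesize() stands in for the repo's Generator wrapper around the synthesis network and is assumed to map a (1, 18, 512) dlatent variable to a (1, 256, 256, 3) image tensor; VGG preprocessing is omitted for brevity):

import numpy as np
import tensorflow as tf

def encode(reference_image, synthesize, iterations=1000, learning_rate=1.0):
    """reference_image: (1, 256, 256, 3) float32 array. Returns a (1, 18, 512) dlatent."""
    # The only trainable variable is the latent code itself; the networks stay frozen.
    dlatents = tf.get_variable('dlatents', shape=(1, 18, 512),
                               initializer=tf.zeros_initializer())
    generated = synthesize(dlatents)  # assumed: (1, 256, 256, 3) image tensor

    # Pre-trained VGG16 (no classifier head) used as a fixed feature extractor.
    vgg = tf.keras.applications.VGG16(include_top=False, input_shape=(256, 256, 3))
    features = tf.keras.Model(vgg.input, vgg.get_layer('block3_conv3').output)
    ref_features = features(tf.constant(reference_image, tf.float32))
    gen_features = features(generated)

    # Perceptual loss: distance between the two images in VGG feature space,
    # minimized only with respect to the latent representation.
    loss = tf.losses.mean_squared_error(ref_features, gen_features)
    step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, var_list=[dlatents])

    sess = tf.get_default_session()  # assumes a default session, e.g. from dnnlib.tflib.init_tf()
    sess.run(tf.variables_initializer([dlatents]))
    for _ in range(iterations):
        sess.run(step)
    return sess.run(dlatents)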

New scripts for finding your own directions will be released soon. For now, you can play with the existing ones: smiling, age, gender. You can find more examples in the Jupyter notebook.

Generating latent representation of your images

You can generate latent representations of your own images using two scripts:

  1. Extract and align faces from images

python align_images.py raw_images/ aligned_images/

  2. Find latent representations of the aligned images

python encode_images.py aligned_images/ generated_images/ latent_representations/

  3. Then you can play with the Jupyter notebook (a short sketch of editing the saved latents follows below)
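
Once encoding has finished, the saved latents are plain NumPy arrays and can be edited directly. A short sketch (file names are illustrative; the edited latent is fed back through the generator, e.g. via the notebook's generate_image helper):

import numpy as np

# Latent produced by encode_images.py and one of the bundled directions; both are (18, 512).
latent = np.load('latent_representations/photo_01_01.npy')
smile = np.load('ffhq_dataset/latent_directions/smile.npy')

# Move the latent along the "smiling" direction and save it for later use.
latent_smiling = latent + 2.0 * smile
np.save('latent_representations/photo_01_01_smiling.npy', latent_smiling)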

Feel free to join the research. There is still much room for improvement:

  1. A better model for the perceptual loss
  2. Is it possible to generate latent representations using another model instead of direct optimization? (WIP)

Stay tuned!

Original Readme:

This repository contains the (no longer official) TensorFlow implementation of the following paper:

A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras (NVIDIA), Samuli Laine (NVIDIA), Timo Aila (NVIDIA)
http://stylegan.xyz/paper

Abstract: We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.

For business inquiries, please contact [email protected]

For press and other inquiries, please contact Hector Marinez at [email protected]

Resources

All material related to our paper is available via the following links:

Link                        Description
http://stylegan.xyz/paper   Paper PDF.
http://stylegan.xyz/video   Result video.
http://stylegan.xyz/code    Source code.
http://stylegan.xyz/ffhq    Flickr-Faces-HQ dataset.
http://stylegan.xyz/drive   Google Drive folder.

Additional material can be found in the Google Drive folder:

Path Description
StyleGAN Main folder.
├  stylegan-paper.pdf High-quality version of the paper PDF.
├  stylegan-video.mp4 High-quality version of the result video.
├  images Example images produced by our generator.
│  ├  representative-images High-quality images to be used in articles, blog posts, etc.
│  └  100k-generated-images 100,000 generated images for different amounts of truncation.
│     ├  ffhq-1024x1024 Generated using Flickr-Faces-HQ at 1024×1024.
│     ├  bedrooms-256x256 Generated using LSUN Bedroom at 256×256.
│     ├  cars-512x384 Generated using LSUN Car at 512×384.
│     └  cats-256x256 Generated using LSUN Cat at 256×256.
├  videos Example videos produced by our generator.
│  └  high-quality-video-clips Individual segments of the result video as high-quality MP4.
├  ffhq-dataset Raw data for the Flickr-Faces-HQ dataset.
└  networks Pre-trained networks as pickled instances of dnnlib.tflib.Network.
   ├  stylegan-ffhq-1024x1024.pkl StyleGAN trained with Flickr-Faces-HQ dataset at 1024×1024.
   ├  stylegan-celebahq-1024x1024.pkl StyleGAN trained with CelebA-HQ dataset at 1024×1024.
   ├  stylegan-bedrooms-256x256.pkl StyleGAN trained with LSUN Bedroom dataset at 256×256.
   ├  stylegan-cars-512x384.pkl StyleGAN trained with LSUN Car dataset at 512×384.
   ├  stylegan-cats-256x256.pkl StyleGAN trained with LSUN Cat dataset at 256×256.
   └  metrics Auxiliary networks for the quality and disentanglement metrics.
      ├  inception_v3_features.pkl Standard Inception-v3 classifier that outputs a raw feature vector.
      ├  vgg16_zhang_perceptual.pkl Standard LPIPS metric to estimate perceptual similarity.
      ├  celebahq-classifier-00-male.pkl Binary classifier trained to detect a single attribute of CelebA-HQ.
      └ ⋯ Please see the file listing for remaining networks.

Licenses

All material, excluding the Flickr-Faces-HQ dataset, is made available under Creative Commons BY-NC 4.0 license by NVIDIA Corporation. You can use, redistribute, and adapt the material for non-commercial purposes, as long as you give appropriate credit by citing our paper and indicating any changes that you've made.

For license information regarding the FFHQ dataset, please refer to the Flickr-Faces-HQ repository.

inception_v3_features.pkl and inception_v3_softmax.pkl are derived from the pre-trained Inception-v3 network by Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. The network was originally shared under Apache 2.0 license on the TensorFlow Models repository.

vgg16.pkl and vgg16_zhang_perceptual.pkl are derived from the pre-trained VGG-16 network by Karen Simonyan and Andrew Zisserman. The network was originally shared under Creative Commons BY 4.0 license on the Very Deep Convolutional Networks for Large-Scale Visual Recognition project page.

vgg16_zhang_perceptual.pkl is further derived from the pre-trained LPIPS weights by Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The weights were originally shared under BSD 2-Clause "Simplified" License on the PerceptualSimilarity repository.

System requirements

  • Both Linux and Windows are supported, but we strongly recommend Linux for performance and compatibility reasons.
  • 64-bit Python 3.6 installation. We recommend Anaconda3 with numpy 1.14.3 or newer.
  • TensorFlow 1.10.0 or newer with GPU support.
  • One or more high-end NVIDIA GPUs with at least 11GB of DRAM. We recommend NVIDIA DGX-1 with 8 Tesla V100 GPUs.
  • NVIDIA driver 391.35 or newer, CUDA toolkit 9.0 or newer, cuDNN 7.3.1 or newer.

Using pre-trained networks

A minimal example of using a pre-trained StyleGAN generator is given in pretrained_example.py. When executed, the script downloads a pre-trained StyleGAN generator from Google Drive and uses it to generate an image:

> python pretrained_example.py
Downloading https://drive.google.com/uc?id=1MEGjdvVpUsu1jB4zrXZN7Y4kBBOzizDQ .... done

Gs                              Params    OutputShape          WeightShape
---                             ---       ---                  ---
latents_in                      -         (?, 512)             -
...
images_out                      -         (?, 3, 1024, 1024)   -
---                             ---       ---                  ---
Total                           26219627

> ls results
example.png # https://drive.google.com/uc?id=1UDLT_zb-rof9kKH0GwiJW_bS9MoZi8oP

A more advanced example is given in generate_figures.py. The script reproduces the figures from our paper in order to illustrate style mixing, noise inputs, and truncation:

> python generate_figures.py
results/figure02-uncurated-ffhq.png     # https://drive.google.com/uc?id=1U3r1xgcD7o-Fd0SBRpq8PXYajm7_30cu
results/figure03-style-mixing.png       # https://drive.google.com/uc?id=1U-nlMDtpnf1RcYkaFQtbh5oxnhA97hy6
results/figure04-noise-detail.png       # https://drive.google.com/uc?id=1UX3m39u_DTU6eLnEW6MqGzbwPFt2R9cG
results/figure05-noise-components.png   # https://drive.google.com/uc?id=1UQKPcvYVeWMRccGMbs2pPD9PVv1QDyp_
results/figure08-truncation-trick.png   # https://drive.google.com/uc?id=1ULea0C12zGlxdDQFNLXOWZCHi3QNfk_v
results/figure10-uncurated-bedrooms.png # https://drive.google.com/uc?id=1UEBnms1XMfj78OHj3_cx80mUf_m9DUJr
results/figure11-uncurated-cars.png     # https://drive.google.com/uc?id=1UO-4JtAs64Kun5vIj10UXqAJ1d5Ir1Ke
results/figure12-uncurated-cats.png     # https://drive.google.com/uc?id=1USnJc14prlu3QAYxstrtlfXC9sDWPA-W

The pre-trained networks are stored as standard pickle files on Google Drive:

# Load pre-trained network.
url = 'https://drive.google.com/uc?id=1MEGjdvVpUsu1jB4zrXZN7Y4kBBOzizDQ' # karras2019stylegan-ffhq-1024x1024.pkl
with dnnlib.util.open_url(url, cache_dir=config.cache_dir) as f:
    _G, _D, Gs = pickle.load(f)
    # _G = Instantaneous snapshot of the generator. Mainly useful for resuming a previous training run.
    # _D = Instantaneous snapshot of the discriminator. Mainly useful for resuming a previous training run.
    # Gs = Long-term average of the generator. Yields higher-quality results than the instantaneous snapshot.

The above code downloads the file and unpickles it to yield 3 instances of dnnlib.tflib.Network. To generate images, you will typically want to use Gs – the other two networks are provided for completeness. In order for pickle.load() to work, you will need to have the dnnlib source directory in your PYTHONPATH and a tf.Session set as default. The session can be initialized by calling dnnlib.tflib.init_tf().
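
A minimal setup sketch for the above (the sys.path line is only needed when running from outside the repository root; the path is illustrative):

import sys
sys.path.append('/path/to/stylegan-encoder')  # make the dnnlib package importable

import pickle
import dnnlib
import dnnlib.tflib as tflib
import config

tflib.init_tf()  # creates a tf.Session and installs it as the default
url = 'https://drive.google.com/uc?id=1MEGjdvVpUsu1jB4zrXZN7Y4kBBOzizDQ'  # karras2019stylegan-ffhq-1024x1024.pkl
with dnnlib.util.open_url(url, cache_dir=config.cache_dir) as f:
    _G, _D, Gs = pickle.load(f)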

There are three ways to use the pre-trained generator:

  1. Use Gs.run() for immediate-mode operation where the inputs and outputs are numpy arrays:

    # Pick latent vector.
    rnd = np.random.RandomState(5)
    latents = rnd.randn(1, Gs.input_shape[1])
    
    # Generate image.
    fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
    images = Gs.run(latents, None, truncation_psi=0.7, randomize_noise=True, output_transform=fmt)
    

    The first argument is a batch of latent vectors of shape [num, 512]. The second argument is reserved for class labels (not used by StyleGAN). The remaining keyword arguments are optional and can be used to further modify the operation (see below). The output is a batch of images, whose format is dictated by the output_transform argument.

  2. Use Gs.get_output_for() to incorporate the generator as a part of a larger TensorFlow expression:

    latents = tf.random_normal([self.minibatch_per_gpu] + Gs_clone.input_shape[1:])
    images = Gs_clone.get_output_for(latents, None, is_validation=True, randomize_noise=True)
    images = tflib.convert_images_to_uint8(images)
    result_expr.append(inception_clone.get_output_for(images))
    

    The above code is from metrics/frechet_inception_distance.py. It generates a batch of random images and feeds them directly to the Inception-v3 network without having to convert the data to numpy arrays in between.

  3. Look up Gs.components.mapping and Gs.components.synthesis to access individual sub-networks of the generator. Similar to Gs, the sub-networks are represented as independent instances of dnnlib.tflib.Network:

    src_latents = np.stack(np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in src_seeds)
    src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]
    src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)
    

    The above code is from generate_figures.py. It first transforms a batch of latent vectors into the intermediate W space using the mapping network and then turns these vectors into a batch of images using the synthesis network. The dlatents array stores a separate copy of the same w vector for each layer of the synthesis network to facilitate style mixing.

The exact details of the generator are defined in training/networks_stylegan.py (see G_style, G_mapping, and G_synthesis). The following keyword arguments can be specified to modify the behavior when calling run() and get_output_for():

  • truncation_psi and truncation_cutoff control the truncation trick that is performed by default when using Gs (ψ=0.7, cutoff=8). It can be disabled by setting truncation_psi=1 or is_validation=True, and the image quality can be further improved at the cost of variation by setting e.g. truncation_psi=0.5. Note that truncation is always disabled when using the sub-networks directly. The average w needed to manually perform the truncation trick can be looked up using Gs.get_var('dlatent_avg') (a sketch of doing so follows after this list).

  • randomize_noise determines whether to re-randomize the noise inputs for each generated image (True, default) or whether to use specific noise values for the entire minibatch (False). The specific values can be accessed via the tf.Variable instances that are found using [var for name, var in Gs.components.synthesis.vars.items() if name.startswith('noise')].

  • When using the mapping network directly, you can specify dlatent_broadcast=None to disable the automatic duplication of dlatents over the layers of the synthesis network.

  • Runtime performance can be fine-tuned via structure='fixed' and dtype='float16'. The former disables support for progressive growing, which is not needed for a fully-trained generator, and the latter performs all computation using half-precision floating point arithmetic.
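
As referenced above, a sketch of performing the truncation trick manually when the sub-networks are used directly (latents and synthesis_kwargs are as in the earlier examples; ψ and cutoff mirror the Gs defaults):

# Truncate the mapped dlatents towards the average w before synthesis.
dlatent_avg = Gs.get_var('dlatent_avg')                 # shape (512,)
dlatents = Gs.components.mapping.run(latents, None)     # shape (N, 18, 512)

psi, cutoff = 0.7, 8
dlatents[:, :cutoff] = dlatent_avg + psi * (dlatents[:, :cutoff] - dlatent_avg)
images = Gs.components.synthesis.run(dlatents, randomize_noise=False, **synthesis_kwargs)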

Preparing datasets for training

The training and evaluation scripts operate on datasets stored as multi-resolution TFRecords. Each dataset is represented by a directory containing the same image data in several resolutions to enable efficient streaming. There is a separate *.tfrecords file for each resolution, and if the dataset contains labels, they are stored in a separate file as well. By default, the scripts expect to find the datasets at datasets/<NAME>/<NAME>-<RESOLUTION>.tfrecords. The directory can be changed by editing config.py:

result_dir = 'results'
data_dir = 'datasets'
cache_dir = 'cache'

To obtain the FFHQ dataset (datasets/ffhq), please refer to the Flickr-Faces-HQ repository.

To obtain the CelebA-HQ dataset (datasets/celebahq), please refer to the Progressive GAN repository.

To obtain other datasets, including LSUN, please consult their corresponding project pages. The datasets can be converted to multi-resolution TFRecords using the provided dataset_tool.py:

> python dataset_tool.py create_lsun datasets/lsun-bedroom-full ~/lsun/bedroom_lmdb --resolution 256
> python dataset_tool.py create_lsun_wide datasets/lsun-car-512x384 ~/lsun/car_lmdb --width 512 --height 384
> python dataset_tool.py create_lsun datasets/lsun-cat-full ~/lsun/cat_lmdb --resolution 256
> python dataset_tool.py create_cifar10 datasets/cifar10 ~/cifar10
> python dataset_tool.py create_from_images datasets/custom-dataset ~/custom-images

Training networks

Once the datasets are set up, you can train your own StyleGAN networks as follows:

  1. Edit train.py to specify the dataset and training configuration by uncommenting or editing specific lines.
  2. Run the training script with python train.py.
  3. The results are written to a newly created directory results/<ID>-<DESCRIPTION>.
  4. The training may take several days (or weeks) to complete, depending on the configuration.

By default, train.py is configured to train the highest-quality StyleGAN (configuration F in Table 1) for the FFHQ dataset at 1024×1024 resolution using 8 GPUs. Please note that we have used 8 GPUs in all of our experiments. Training with fewer GPUs may not produce identical results – if you wish to compare against our technique, we strongly recommend using the same number of GPUs.

Expected training time for 1024×1024 resolution using Tesla V100 GPUs:

GPUs   Training time
1      5 weeks
2      3 weeks
4      2 weeks
8      1 week

Evaluating quality and disentanglement

The quality and disentanglement metrics used in our paper can be evaluated using run_metrics.py. By default, the script will evaluate the Fréchet Inception Distance (fid50k) for the pre-trained FFHQ generator and write the results into a newly created directory under results. The exact behavior can be changed by uncommenting or editing specific lines in run_metrics.py.

Expected evaluation time and results for the pre-trained FFHQ generator using one Tesla V100 GPU:

Metric     Time      Result                  Description
fid50k     16 min    4.4159                  Fréchet Inception Distance using 50,000 images.
ppl_zfull  55 min    664.8854                Perceptual Path Length for full paths in Z.
ppl_wfull  55 min    233.3059                Perceptual Path Length for full paths in W.
ppl_zend   55 min    666.1057                Perceptual Path Length for path endpoints in Z.
ppl_wend   55 min    197.2266                Perceptual Path Length for path endpoints in W.
ls         10 hours  z: 165.0106, w: 3.7447  Linear Separability in Z and W.

Please note that the exact results may vary from run to run due to the non-deterministic nature of TensorFlow.

Acknowledgements

We thank Jaakko Lehtinen, David Luebke, and Tuomas Kynkäänniemi for in-depth discussions and helpful comments; Janne Hellsten, Tero Kuosmanen, and Pekka Jänis for compute infrastructure and help with the code release.

stylegan-encoder's People

Contributors

Puzer, tkarras


stylegan-encoder's Issues

Latent layers importance

Hey, great analysis! :)
I've learned a lot from reading it.

Just a quick comment: the W matrix produced by the mapping network contains a single vector w_v tiled across the layers (so w[0] = w[1] = ... = w[n_layers - 1]).
The layer-wise affine transformation happens in the synthesis network.

The notebook however operates over this tiled w, which is why you saw the surprising behavior (since all the layers are identical).
I imagine that running it again over the transformed Ws would show something much different.

P.S: This bug also affects the result of the non-linear model
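
A quick way to see the tiling in question (assuming Gs has been loaded as in the README above):

import numpy as np

z = np.random.RandomState(0).randn(1, 512)
w = Gs.components.mapping.run(z, None)   # shape (1, 18, 512)
print(np.allclose(w[0, 0], w[0, 1]))     # True: each layer holds the same w vector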

Generated images color was deformed

Hi everyone,

I got the results shown in the attached image. I tried converting to RGB in a couple of ways, but they didn't work. Does anyone have an idea what causes this?

ValueError: Shape of a new variable (ref_img_features) must be fully defined, but instead was (?, 64, 64, 256)

When I run "python3 encode_images.py aligned_images/ generated_images/ latent_representations/", I get the error:

Traceback (most recent call last):
File "encode_images.py", line 86, in
main()
File "encode_images.py", line 61, in main
perceptual_model.build_perceptual_model(generator.generated_image)
File "/media/data2/laixc/stylegan-encoder/encoder/perceptual_model.py", line 41, in build_perceptual_model
dtype='float32', initializer=tf.initializers.zeros())
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1328, in get_variable
constraint=constraint)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1090, in get_variable
constraint=constraint)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 435, in get_variable
constraint=constraint)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 404, in _true_getter
use_resource=use_resource, constraint=constraint)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 764, in _get_single_variable
"but instead was %s." % (name, shape))
ValueError: Shape of a new variable (ref_img_features) must be fully defined, but instead was (?, 64, 64, 256).

error while running encode.py

Facing this error

Traceback (most recent call last):
File "encode_images.py", line 80, in
main()
File "encode_images.py", line 53, in main
generator = Generator(Gs_network, args.batch_size, randomize_noise=args.randomize_noise)
File "/content/gdrive/My Drive/RI/stylegan-encoder/encoder/generator_model.py", line 35, in init
self.generator_output = self.graph.get_tensor_by_name('G_synthesis_1/_Run/concat:0')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3783, in get_tensor_by_name
return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3607, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3649, in _as_graph_element_locked
"graph." % (repr(name), repr(op_name)))
KeyError: "The name 'G_synthesis_1/_Run/concat:0' refers to a Tensor which does not exist. The operation, 'G_synthesis_1/_Run/concat', does not exist in the graph."

Basic Usage Question

I apologize if I overlooked it and if this isn't an "issue", but I couldn't find any basic steps/commands needed to reproduce the change in expression found in the teaser image:
https://raw.githubusercontent.com/Puzer/stylegan-encoder/master/teaser.png

I was able to encode an image successfully using the commands in the readme but I can't seem to find any documentation on modifying the expression or angle of a face after doing so. All I'm trying to do is tinker with adjustments on a custom image, am I misunderstanding the functionality of this repo?

Generate random image

I am a bit confused about how to generate a random image, as the image generation in generate_image() seems to be quite different from the main StyleGAN example code in pretrained_example.py.

I naively tried the following:
generate_image(np.random.rand(18, 512))

which does not seem to work.
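
For reference, a sketch of one way this is usually done: sample z from a standard normal and run it through the mapping network, rather than passing uniform noise directly (generate_image is the notebook helper and is assumed to take an (18, 512) dlatent):

import numpy as np

z = np.random.randn(1, 512)                        # N(0, 1) qlatent, not np.random.rand
dlatent = Gs.components.mapping.run(z, None)[0]    # (18, 512) tiled w vector
img = generate_image(dlatent)                      # notebook helper (assumed)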

Question: running on CPU?

Do you know if it's possible to run StyleGAN on a CPU-only machine? I've been trying to run a version of StyleGAN I've trained on my desktop machine, and I would like to mess around with the checkpoints while it's training, but I keep seeing that the operations are specifically set to use the GPU.

Specifically, this error:

:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
         [[Node: Gs/_Run/Gs/latents_in = Identity[T=DT_FLOAT, _device="/device:GPU:0"](Gs/_Run/split)]]

Error while running encode_images.py in Colab

Traceback (most recent call last):
File "encode_images.py", line 80, in <module>
main()
File "encode_images.py", line 51, in main
generator_network, discriminator_network, Gs_network = pickle.load(f)
_pickle.UnpicklingError: invalid load key, '\x0a'.

I duplicated the pickle file in my Google Drive because the download quota was being exceeded, but now I am getting this error. Is it because of the copying, or something else?

According to this Stack Overflow post, maybe the file itself has been corrupted. If that's the case, is there a mirror link for the same file which can be used safely?

Interpolation between 2 faces in dlatent space not as meaningful as it is in qlatent space

Hi!

First, thanks for your work!

I tried to interpolate between 2 faces in the dlatent space (18, 512), and the result seems not as meaningful as when interpolating between 2 vectors in the qlatent space (512). It kind of works, but some transient images contain strange artifacts or do not look like valid faces. Did you notice this effect? It seems that not all points along the linear path in the dlatent space correspond to real faces, though in the qlatent space they do.

Just wondering if it is possible to somehow get latent representations in the original qlatent space to compare interpolation quality.
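
A sketch of the two interpolation schemes being compared (Gs and synthesis_kwargs as in the README examples; z1 and z2 are two (512,) qlatent vectors):

import numpy as np

alphas = np.linspace(0.0, 1.0, 9)[:, None]                       # 9 interpolation steps

# Interpolate in qlatent (Z) space, then map every point through the mapping network.
z = (1.0 - alphas) * z1[None] + alphas * z2[None]                # (9, 512)
w_from_z = Gs.components.mapping.run(z, None)                    # (9, 18, 512)

# Interpolate directly in dlatent (W) space between the two mapped endpoints.
w1, w2 = w_from_z[0], w_from_z[-1]
w_direct = (1.0 - alphas[:, None]) * w1 + alphas[:, None] * w2   # (9, 18, 512)

imgs_z = Gs.components.synthesis.run(w_from_z, randomize_noise=False, **synthesis_kwargs)
imgs_w = Gs.components.synthesis.run(w_direct, randomize_noise=False, **synthesis_kwargs)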

tensorflow.python.framework.errors_impl.InvalidArgumentError

  • I am trying to run encode_images.py and get 2 root error(s):
    (0) Invalid argument: You must feed a value for placeholder tensor 'G_synthesis_1/_Run/dlatents_in' with dtype float
    [[{{node G_synthesis_1/_Run/dlatents_in}}]]
    (1) Invalid argument: You must feed a value for placeholder tensor 'G_synthesis_1/_Run/dlatents_in' with dtype float
    [[{{node G_synthesis_1/_Run/dlatents_in}}]]
    [[G_synthesis_1/_Run/concat/concat/_8419]]
  • I am trying to encode my 2 images. Can someone help me with this issue?

How to find the attribute direction?

Hey Puzer, you did nice work. I am working on generating more meaningful face images and controlling the attributes myself. I found that you provide attribute directions like smiling, age, and gender. How do I get more attribute directions, like hair, skin color, or other facial expressions? Do you have a script or any approach for this? Thank you.

Getting black output images

Hi,

When I try to optimize d_latent as in the code, I start from a nice image but it transforms into a completely black image.
Adding an L2 loss between the reference and the generated image helps, but the resulting image is not similar to the reference.

Any ideas?

Thanks!

Using the stylegan-encoder for the LSUN bedroom dataset

@Puzer you did some great work here! I'm trying to apply your encoder to the bedroom network, however the generated images are not of the same quality as the results of the FFHQ network. Initially, I changed the SGD optimizer to the Adam optimizer, because the loss decreased faster and the generated images looked more realistic. I also tried to tweak the hyperparameters, but only the number of iterations has a major impact on the result. Do you have any ideas to improve the results?

(The Original and Results columns refer to images attached to the original issue.)

Command | Original | Results
--iterations 1000 --batch_size 1 --lr 0.1 | content 1 | content 1-1000itr
--iterations 2000 --batch_size 4 --lr 0.1 | content 1 | content 1-2000itr-4batch
--iterations 10000 --batch_size 1 --lr 0.1 | content 1 | content 1-10000itr

Command | Original | Results
--iterations 1000 --batch_size 1 --lr 0.1 | style 2 | style 2-1000itr
--iterations 2000 --batch_size 4 --lr 0.1 | style 2 | style 2-2000itr-4batch
--iterations 10000 --batch_size 1 --lr 0.1 | style 2 | style 2-10000itr

The following results are generated while using the SGD optimizer:

Command | Original | Results
--iterations 2000 --batch_size 2 --lr 1 | bedroom2-0 | bedroom2-2000itr-2batch
--iterations 3000 --batch_size 3 --lr 1 --randomize_noise | bedroom2-0 | bedroom2-3000itr-3batch-rn
--iterations 4000 --batch_size 4 --lr 1 | bedroom2-0 | bedroom2-4000itr-4batch

Results of style transfer:

Fine style mixing: [result image]

Coarse style mixing: [result image]

Both results are significantly different from the original StyleGAN style-transfer results for bedrooms. Because of the lower quality of the bedroom images, slightly less stunning results were expected; however, the original images are not even recognizable in these results.

Fast embeddings for head pose

Hey just wondering if there was a simple way of getting fast embeddings just for the pose/coarse style (face landmarks & headPose). I was thinking of trying a realtime generator responding to movement through a webcam input - has anyone tried this & do you think it could be possible?

Error when loading an image

Hi, I'm trying to use 'python align_images.py raw_images/ aligned_images/' to explore the latent space of my own image data, but it threw an error: "Unknown image file format: Unable to load image in file raw_images/.ipynb_checkpoints". I run this model on Colab. Can anyone help me fix this problem? Many thanks!

Syntax error in encode_images.py

Getting this when trying to learn new vectors and running python encode_images.py aligned_images/ generated_images/ latent_representations/:

File "/content/stylegan-encoder/encode_images.py", line 73
img.save(os.path.join(args.generated_images_dir, f'{img_name}.png'), 'PNG')

I have read that f-strings require Python 3, but I cannot run Python 3 without TensorFlow 2.x, while this model requires TensorFlow 1.14. I am not sure how to get past this.

Can anyone please help me? Thank you!

Scaled loss / magic number

Hi! Great project :)

I would like to ask where the scale in the perceptual model loss comes from. I mean this line:
self.loss = tf.losses.mean_squared_error(self.features_weight * self.ref_img_features, self.features_weight * generated_img_features) / 82890.0

I tried factorizing 82890.0 as 2 * 3^3 * 5 * 307 to check whether it's the size of the output features or anything similar, but apparently not. I also searched all of GitHub for exactly the same scale in other projects, but found no good match.

Issue with session and api

I have created an API to run this repository.
For the first client, everything executes perfectly and gives a good result, but for the second client in the same session I get an error like:
ValueError: Tensor("Const:0", shape=(3,), dtype=float32) must be from the same graph as Tensor("strided_slice:0", shape=(1, 256, 256, 3), dtype=float32)

I am trying to resolve the issue.
Please help me with this.
Thanks & Regards,
Sandhya

I just found a project that allows controlling a bunch of StyleGAN features through UI knobs:

https://github.com/SummitKwan/transparent_latent_gan

Being a total newbie at machine learning, I'm wondering, what are the main differences between Puzer's approach and transparent_latent_gan?

Another issue - transparent_latent_gan is using the smaller CelebA dataset, so that might be the reason why sometimes its features get entangled too much and StyleGAN gets stuck when you try to lock and combine too many features (try to adjust the sliders to create an old, bald, non-smiling, bearded man with eyeglasses).

I'm wondering if Puzer's approach could work better? I tried current age direction and noticed that at some point it tries to add glasses and beard. I guess, those two features got entangled with age and I'm not sure what could be done to disentangle them - I hope to get only wrinkles and receding hairline for age direction.

Also, when encoding images, I found that sometimes alignment works incorrectly, cropping away the top of the head. And for some of my images, the optimal encoder combination seems to be a learning rate of 4.0 and an image size of 512. With the default settings (learning rate of 1 and image size 256), some tricky images (old black-and-white photos) or complex scenarios (a large mustache over the lips) got totally corrupted, and for some less complex images it lost enough tiny details to make the photo feel too "uncanny" to consider an exact match, especially for younger people who don't have deep wrinkles or beards, and also when images are shot with lots of light, so those tiny details and shadows matter a lot.

Of course, 4.0 @ 512 can take a pretty long time to train, and sometimes 1000 iterations are not enough. With one specific tricky image I went as far as 4000 iterations to get satisfactory results, while for some other images such a high learning rate + iteration count led to washed-out images (overfitting?).

Originally posted by @progmars in #5 (comment)

Join Stylegan encoder research

Dear Puzer,
I am currently working on the StyleGAN encoder for architecture design opportunities. I was wondering how I can join your research lab and collaborate with you.
I am a PhD candidate at Chung-Ang University, South Korea.
Contact me at: [email protected]
Best regards,
Ahmed Khairadeen Ali

Reverse the changes of alignment

Hi! Thank you for this project. I am using this project with video frames instead of images. I notice that during facial alignment, if there is a tilt in the image etc, it is straightened out. Can I reverse these effects in the image after the image has been moved along the direction of the latent vector? If yes, how?

Cannot load network snapshot for restarting: corrupted pickles?

I'm trying to restart a job by loading a network snapshot pickle (called network-snapshot-001283.pkl). The pickle is written during training of the previous run and loaded at the beginning of the restart by the functions at the beginning of training/misc.py.

I get the following error when trying to load the snapshot:

_pickle.UnpicklingError: invalid load key, '<'.

This error appears for all my snapshots produced by training with different data on different runs, as well as when trying to use generate_figures.py.
Either the pickle written during training is corrupted, or I'm doing something wrong, although I haven't changed any lines corresponding to this part.
Any ideas?

Thanks

Why the latent code from GAN inversion methods can be manipulated by the boundary

Hi, thanks for sharing this great work!

I have some question as follow:

  1. The shape of 'donald_trump_01.npy' is (18, 512) and it has different values across the 18 layers, whereas your "smile.npy" has the same values for all 18 layers. Since the values differ, the meaning of the 18 layers of 'donald_trump_01.npy' is different too, so why can it be edited with the same smile boundary? Do you know why this still works?

  2. In the move_and_show function, new_latent_vector[:8] = (latent_vector + coeff*direction)[:8]. Why are only the first eight layers edited?

Thank you!
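
For context, the edit being asked about amounts to the following (latent_vector is an encoded (18, 512) dlatent; direction and coeff as in the notebook):

def move_along_direction(latent_vector, direction, coeff):
    # Only the first 8 layers (the coarser-resolution styles) are moved along the
    # direction; the remaining layers, which control fine details, stay untouched.
    new_latent_vector = latent_vector.copy()
    new_latent_vector[:8] = (latent_vector + coeff * direction)[:8]
    return new_latent_vector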

shape (1, 12, 512) to (1, 18, 512)?

How can I create a StyleGAN model with shape (1, 18, 512)?
My StyleGAN model produces shape (1, 12, 512), and I cannot find the latent space with Puzer's encoder because of the shape difference.

In more detail:
my model produces shape (1, 12, 512) using (https://github.com/NVlabs/stylegan),
but when I use the StyleGAN encoder (https://github.com/Puzer/stylegan-encoder) to find the latent space, it requires (1, 18, 512). Do you have any idea how I can produce a model with shape (1, 18, 512) instead of (1, 12, 512)?

Question about the direction for an attribute

Does this only work for binary labels, like smile vs. no-smile or male vs. female? Or is it possible to do it for multi-class labels like ethnicity (i.e. white, black, asian)? Thanks!

why is an optimizer used in something that is not a neural network?

Can someone explain the intuition of applying an optimizer (gradient descent, Adam) to the latent code?

The optimizer looks for the image in the latent space, so the latent code is updated instead of the weights of a neural network.

Why does this work, given that it is not the weights of a neural network that are being updated?

How does the optimizer find the latent code that represents the input image for the generator?

KeyError: "The name 'G_synthesis_1/_Run/concat:0' refers to a Tensor which does not exist. The operation, 'G_synthesis_1/_Run/concat', does not exist in the graph."

Getting this error after running the following with TensorFlow 1.x in Google Colab on a GPU runtime:
!python encode_images.py aligned_images/ generated_images/ latent_representations/ --iterations 1000

Full error:
Traceback (most recent call last):
File "encode_images.py", line 80, in
main()
File "encode_images.py", line 53, in main
generator = Generator(Gs_network, args.batch_size, randomize_noise=args.randomize_noise)
File "/content/stylegan/encoder/generator_model.py", line 35, in init
self.generator_output = self.graph.get_tensor_by_name('G_synthesis_1/_Run/concat:0')
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 3783, in get_tensor_by_name
return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 3607, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 3649, in _as_graph_element_locked
"graph." % (repr(name), repr(op_name)))
KeyError: "The name 'G_synthesis_1/_Run/concat:0' refers to a Tensor which does not exist. The operation, 'G_synthesis_1/_Run/concat', does not exist in the graph."

Any help would be greatly appreciated!

encode_images.py runs out of memory on image sequence

When encoding a large number of images, setting the reference images in the perceptual model takes longer and longer, and eventually the script crashes with the following error:

019-02-25 21:02:48.244097: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 320.00MiB. Current allocation summary follows.
2019-02-25 21:02:48.252031: W tensorflow/core/common_runtime/bfc_allocator.cc:275] ****************************______
2019-02-25 21:02:48.257082: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_grad_input_ops.cc:937 : Resource exhausted: OOM when allocating tensor with shape[5,16,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[5,16,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node gradients_10/G_synthesis_1/_Run/G_synthesis/ToRGB_lod0/Conv2D_grad/Conv2DBackpropInput}} = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients_10/G_synthesis_1/_Run/G_synthesis/ToRGB_lod0/Conv2D_grad/ShapeN, G_synthesis_1/_Run/G_synthesis/ToRGB_lod0/mul, gradients_10/G_synthesis_1/_Run/G_synthesis/ToRGB_lod0/add_grad/tuple/control_dependency)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

When training on a batch of 1 image at a time it takes about 250 images before it crashes. When training on a batch of 5 images at a time it crashes on the 10th batch (50 images).

Is the perceptual model holding onto previous images? Could there be a memory leak somewhere? As far as I can tell the crash happens on the self.sess.run command of the optimize method. I also tried removing tqdm from the script but it still crashes during training.

Reduce encode_images.py time by using one model instance

Hi, I am trying to decrease the generation time. So far I am getting 2 minutes and 20 seconds per image (generating 10 output images for age).
What I am realizing is that encode_images.py is taking this long for each input image:

  1. Initializing generator : 7.2106 secs
  2. Creating PerceptualModel : 9.0305 secs
  3. Loading Resnet model : 23.0473 secs
  4. Loop loss : 1.0582 secs
  5. Loop loss : 0.0619 secs
  6. Loop loss : 0.0630 secs
  7. Loop loss : 0.0618 secs
  8. Loop loss : 0.0628 secs
  9. Loop loss : 0.0621 secs

So I am trying to initialize the generator, create the perceptual model, and load the ResNet model once at the beginning of my script, and pass them as parameters to encode_images.py so that steps 1 to 3 are not repeated for each image.

But I have no idea if that's the right way to do it. I defined an auxiliar() function instead of calling the script directly, passing the same flags and parameters:

New defined function

auxiliar(optimizer='lbfgs', face_mask=True, iterations=6, use_lpips_loss=0, use_discriminator_loss=0, output_video=False, src_dir='aligned_images/', generated_images_dir='generated_images/', dlatent_dir='latent_representations/')

Former script call

python encode_images.py --optimizer=lbfgs --face_mask=True --iterations=6 --use_lpips_loss=0 --use_discriminator_loss=0 --output_video=False aligned_images/ generated_images/ latent_representations/

So far I am getting this error:
ValueError: Tensor("Const_1:0", shape=(3,), dtype=float32) must be from the same graph as Tensor("strided_slice:0", shape=(1, 256, 256, 3), dtype=float32).

At this point of the code that used to be in encode_images.py:

perceptual_model = PerceptualModel(args, perc_model=perc_model, batch_size=batch_size)
perceptual_model.build_perceptual_model(generator, discriminator_network)
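
A sketch of that refactor: build the heavy objects once and reuse them for every image (the class, helper, and attribute names follow encode_images.py and may differ between forks, so treat them as assumptions; URL_FFHQ, args, perc_model, and aligned_image_paths are taken from the surrounding script):

import pickle
import dnnlib
import dnnlib.tflib as tflib
import config
from encoder.generator_model import Generator
from encoder.perceptual_model import PerceptualModel

tflib.init_tf()
with dnnlib.util.open_url(URL_FFHQ, cache_dir=config.cache_dir) as f:
    generator_network, discriminator_network, Gs_network = pickle.load(f)

# Build the generator and perceptual model exactly once ...
generator = Generator(Gs_network, batch_size=1, randomize_noise=False)
perceptual_model = PerceptualModel(args, perc_model=perc_model, batch_size=1)
perceptual_model.build_perceptual_model(generator, discriminator_network)

# ... then loop over the aligned images, resetting only the latent in between.
for images_batch in split_to_batches(aligned_image_paths, 1):
    perceptual_model.set_reference_images(images_batch)
    perceptual_model.optimize(generator.dlatent_variable, iterations=args.iterations)
    dlatents = generator.get_dlatents()
    generator.reset_dlatents()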

I am trying to obtain the same result with pytorch.

I did the encoding as the author of this repository described.
VGG = perceptual model
G = generator model
G.style() is the fully connected encoder in the generator which gives the dlatent code.

First question:
After optimizing the perceptual loss between G and VGG, do I need to save the weights of G.style(), or pass a random vector to G.style() and save its output (which is the dlatent)?

Second question:
If the first approach is right, then I proceed by getting attributes of the face using the Microsoft Cognitive API. I tried just 860 images to begin with.
I do get a result, but it is wrong (see attached image).

Getting numpy array of custom feature directions

Hi,
I modified the non-linear model inputs to get different feature directions. How can I get a numpy array of those feature vectors for the non-linear model?
For the linear model, I realized it can be obtained by making minor changes here:

clf = LogisticRegression(class_weight='balanced')
clf.fit(X_data.reshape((-1, 18*512)), y_gender_data)
gender_direction = clf.coef_.reshape((18, 512))
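
The resulting array can then be saved and loaded just like the bundled directions (the output path is illustrative):

import numpy as np

np.save('ffhq_dataset/latent_directions/gender.npy', gender_direction)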

Thank you!

Can't download LATENT_TRAINING_DATA

I can't download LATENT_TRAINING_DATA in Learn_direction_in_latent_space.ipynb.

HTTPError: 404 Client Error: Not Found for url: https://drive.google.com/uc?id=1xMM3AFq0r014IIhBLiMCjKJJvbhLUQ9t

Thanks.

invalid syntax for dnnlib/__init__.py

The line "submit_config: SubmitConfig = None # Package level variable for SubmitConfig which is only valid when inside the run function." fails with "SyntaxError: invalid syntax". Should I just ignore this error? Thanks!
