
Fashion Generation through Controllable StyleGAN


GANs for All: Supporting Fun and Intuitive Exploration of GAN Latent Spaces

Authors: Wei Jiang, Richard Lee Davis, Kevin Gonyop Kim, Pierre Dillenbourg

https://proceedings.mlr.press/v176/jiang22a.html

Abstract: We have developed a new tool that makes it possible for people with zero programming experience to intentionally and meaningfully explore the latent space of a GAN. We combine a number of methods from the literature into a single system that includes multiple functionalities: uploading and locating images in the latent space, image generation with text, visual style mixing, and intentional and intuitive latent space exploration. This tool was developed to provide a means for designers to explore the "design space" of their domains. Our goal was to create a system to support novices in gaining a more complete, expert understanding of their domain's design space by lowering the barrier of entry to using deep generative models in creative practice.

Dataset

We use the Zalando dataset, which can be downloaded from Zalando images and Zalando Text Image Pairs. The dataset consists of 8,732 high-resolution images, each depicting a dress available in the Zalando shop against a white background.
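The training data path used in this README, `data/square_256_imgs.zip`, suggests the dress images were made square at 256×256 before training. The snippet below is a minimal sketch of one way to do the squaring step, assuming each image is loaded as a NumPy `uint8` array: it pads the image onto a white square canvas (matching the white background of the Zalando photos) instead of cropping away parts of the dress. The function name `pad_to_square_white` is hypothetical, and resizing to 256×256 is left to a separate step.

```python
import numpy as np


def pad_to_square_white(img: np.ndarray) -> np.ndarray:
    """Centre an H x W (x C) uint8 image on a white square canvas.

    The canvas side is max(H, W); when the padding is odd, the extra
    pixel goes to the bottom or right edge.
    """
    h, w = img.shape[:2]
    size = max(h, w)
    top = (size - h) // 2
    left = (size - w) // 2
    # Fill with 255 so the padding matches the white product-photo background.
    canvas = np.full((size, size) + img.shape[2:], 255, dtype=img.dtype)
    canvas[top:top + h, left:left + w] = img
    return canvas


if __name__ == "__main__":
    tall = np.zeros((4, 2, 3), dtype=np.uint8)  # a dummy 4x2 RGB image
    print(pad_to_square_white(tall).shape)  # (4, 4, 3)
```

Padding rather than center-cropping keeps the full silhouette of long dresses, at the cost of some white border in wide or tall images.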

Models

Train StyleGAN Model from scratch

!python train.py --outdir "training_runs" --snap 20 --metrics "none" --data "data/square_256_imgs.zip"

To resume training from a checkpoint, pass the snapshot file to --resume:

!python train.py --outdir "training_runs" --snap 20 --metrics "none" --data "data/square_256_imgs.zip" --resume "training_runs/00015-square_256_imgs-auto1-resumecustom/network-snapshot-000400.pkl"

Fine-tune DALL-E Model

!python "DALLE-pytorch/train_dalle.py" --vae_path "DALLE-pytorch/wandb/vae-final.pt" --image_text_folder "data/text_images"
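The fine-tuning command reads captioned images from `data/text_images`. To our understanding of the DALLE-pytorch README, `--image_text_folder` pairs each image with a `.txt` file of the same stem, holding one caption per line; treat this as an assumption and check the DALLE-pytorch documentation. The sketch below shows one way to lay out such a folder from `(image_path, captions)` pairs, for example from the Zalando text-image pairs. Both function names are hypothetical, and the input format of the pairs is assumed.

```python
import shutil
from pathlib import Path
from typing import List, Sequence, Tuple


def plan_image_text_files(
    pairs: Sequence[Tuple[str, Sequence[str]]],
) -> List[Tuple[Path, str, str, str]]:
    """Choose matching file names for each (image_path, captions) pair.

    Image and caption file share a numeric stem (00000.jpg / 00000.txt);
    captions are joined one per line. Returns
    (source image, image file name, text file name, caption text) tuples.
    """
    plan = []
    for i, (image_path, captions) in enumerate(pairs):
        src = Path(image_path)
        stem = f"{i:05d}"
        # Lowercase the extension so e.g. ".JPG" becomes ".jpg".
        plan.append((src, f"{stem}{src.suffix.lower()}", f"{stem}.txt", "\n".join(captions)))
    return plan


def write_image_text_folder(plan: List[Tuple[Path, str, str, str]], out_dir: str) -> None:
    """Copy the images and write the caption files into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src, image_name, text_name, text in plan:
        shutil.copy(src, out / image_name)
        (out / text_name).write_text(text, encoding="utf-8")
```

Planning the names separately from copying keeps the naming logic testable without touching the disk; usage would be `write_image_text_folder(plan_image_text_files(pairs), "data/text_images")`.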

Download the models from the following links and save them to your Google Drive.

| Model | Download |
| --- | --- |
| Pretrained fashion GAN | fashion-gan-pretrained.pkl |
| Fine-tuned DALL-E model | DALLE-finetuend.pkl |

Explore Latent Space

We applied principal component analysis (PCA) to identify semantically meaningful directions in the latent space. By exploring the first 10 principal components, we found directions corresponding to attributes such as sleeves and pattern.
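A minimal sketch of the PCA step, in the spirit of the GANSpace approach: sample many latent vectors, center them, and take the top principal directions. Which latent space was used (for StyleGAN, typically the `w` vectors produced by the mapping network) and how many samples were drawn are assumptions here, and the function name `latent_pca` is hypothetical.

```python
import numpy as np


def latent_pca(latents: np.ndarray, n_components: int = 10):
    """PCA of an (N, D) array of sampled latent vectors.

    Returns (mean, components, variances): components has shape
    (n_components, D) with unit-length rows ordered from most to least
    variance, and variances holds the variance along each component.
    """
    mean = latents.mean(axis=0)
    centered = latents - mean
    # Rows of vt are the principal directions of the centred samples.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2 / (latents.shape[0] - 1)
    return mean, vt[:n_components], variances[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for mapped latents: 5,000 samples in 512-D.
    w = rng.standard_normal((5_000, 512)) @ rng.standard_normal((512, 512))
    mean, components, variances = latent_pca(w, n_components=10)
    print(components.shape)  # (10, 512)
```

A design can then be edited by moving along one component, for example `w + alpha * components[k]` for a chosen strength `alpha`; inspecting such edits is how each direction gets a semantic label.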


To project an image into the latent space, we use SGD to minimize the sum of a perceptual loss and a pixel-wise MSE loss between the generated image and the target image. Adding the pixel-wise term noticeably improved our tool's ability to embed out-of-sample examples in the latent space of the GAN.

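$$ w^{*} = \arg\min_{w} L(w) = \arg\min_{w} \left( \lVert f(G(w)) - f(I) \rVert_2^2 + \lambda_{pix} \lVert G(w) - I \rVert_2^2 \right) $$

where $G$ is the generator, $f$ maps an image to the features used by the perceptual loss, $I$ is the target image, and $\lambda_{pix}$ weights the pixel-wise term.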
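The projection objective can be illustrated with a toy version in which both the generator and the feature extractor are linear maps, so the gradient has a closed form. This is a sketch under loud assumptions: the matrix `A` stands in for $G$, the matrix `B` stands in for $f$, and plain full-batch gradient descent replaces SGD. A real implementation would optimize through StyleGAN and a perceptual network with automatic differentiation. The function name `project` and its parameters are hypothetical.

```python
import numpy as np


def project(I, A, B, lam_pix=1.0, steps=1000, w0=None):
    """Minimise L(w) = ||f(G(w)) - f(I)||^2 + lam_pix * ||G(w) - I||^2
    by gradient descent, for the toy choices G(w) = A @ w and f(x) = B @ x.

    Returns the final latent and the loss recorded before each step.
    """
    n_pix = A.shape[0]
    # With residual r = A w - I, the loss is r^T M r where M = B^T B + lam_pix * Id.
    M = B.T @ B + lam_pix * np.eye(n_pix)
    # The Hessian of L is 2 A^T M A; stepping by 1 / (largest eigenvalue) is stable.
    H = 2.0 * A.T @ M @ A
    lr = 1.0 / np.linalg.eigvalsh(H).max()
    w = np.zeros(A.shape[1]) if w0 is None else np.array(w0, dtype=float)
    losses = []
    for _ in range(steps):
        r = A @ w - I
        losses.append(float(r @ M @ r))
        grad = 2.0 * A.T @ (M @ r)  # dL/dw
        w = w - lr * grad
    return w, losses
```

When `A` has full column rank and the target was itself produced by the toy generator, the recovered latent approaches the one that generated it; increasing `lam_pix` puts more weight on matching pixels exactly.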

Text-to-image Generation

We implemented two methods for finding designs that match a text description. The first method randomly samples images from the latent space and passes them, along with the text description, through a CLIP model to find the small number of images that most closely match the text. The second method fine-tunes a DALL-E model on the Feidegger dataset and then passes the text descriptions to DALL-E to generate designs. We compared these approaches with other models:

  • FashionGAN: realistic and diverse, but low resolution.
  • DALL-E: diverse and creative, but less accurate.
  • Stable Diffusion: accurate and high resolution, but not diverse (given a specific text prompt, its outputs differ mainly in background and model).
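The first text-to-image method ranks randomly sampled GAN images by how well they match the text in CLIP's joint embedding space. Below is a minimal sketch of just the ranking step, assuming `image_embs` holds CLIP image embeddings of images generated from random latents (one row per image) and `text_emb` is the CLIP text embedding of the description. Computing those embeddings with a real CLIP model is not shown, and `top_k_by_text` is a hypothetical name. The ranking uses cosine similarity, the usual way CLIP embeddings are compared.

```python
import numpy as np


def top_k_by_text(image_embs: np.ndarray, text_emb: np.ndarray, k: int = 4) -> np.ndarray:
    """Indices of the k images whose embeddings are most similar to the
    text embedding (cosine similarity), best match first."""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb)
    sims = img @ txt  # one cosine similarity per image
    return np.argsort(-sims)[:k]
```

The returned indices select the corresponding latent vectors, which can then be shown to the user or used as starting points for further exploration.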


WebApp

We have built a website for user testing: https://generative.fashion

YouTube Demo

To run it in Google Colab: Open In Colab

The interface of our neural design space exploration tool. Users can upload images to the workspace on the left or generate a random image with the random button. They can also generate examples from text descriptions using the text box. Users can drag these examples to the style-mixing region or save them in the workspace. The visual style-mixing panel lets users selectively combine elements from three designs. The output image is shown at the center of the canvas on the right. This two-dimensional canvas represents the design space for two attributes, one on the horizontal axis and one on the vertical axis, and each axis has a drop-down menu for changing its attribute. Dragging the image within the canvas is equivalent to moving through the latent space of the GAN in semantically meaningful directions.
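The canvas and the style-mixing panel can both be described as simple operations on latent codes. The sketch below is a hypothetical illustration, not the tool's actual code. `canvas_to_latent` treats a canvas position `(x, y)` in $[-1, 1]^2$ as steps along the two directions chosen for the axes (for example, principal components), scaled by an assumed `max_step`. `style_mix` assumes each design is stored as a per-layer W+ code of shape `(num_layers, D)` and builds the mixed code by taking each layer from the design assigned to it, which is the standard StyleGAN style-mixing idea.

```python
import numpy as np


def canvas_to_latent(w, dir_x, dir_y, x, y, max_step=3.0):
    """Map a canvas position (x, y) in [-1, 1]^2 to an edited latent.

    The canvas centre (0, 0) returns w unchanged; moving horizontally or
    vertically walks along dir_x or dir_y. Positions outside the canvas
    are clipped to its border.
    """
    x = float(np.clip(x, -1.0, 1.0))
    y = float(np.clip(y, -1.0, 1.0))
    return w + max_step * (x * dir_x + y * dir_y)


def style_mix(ws_sources, layer_owner):
    """Combine per-layer latents (W+ codes) from several designs.

    ws_sources: shape (S, num_layers, D), one W+ code per source design.
    layer_owner: one entry per layer; entry l is the index of the design
    that supplies layer l.
    """
    ws_sources = np.asarray(ws_sources, dtype=float)
    owner = np.asarray(layer_owner)
    num_layers = ws_sources.shape[1]
    if owner.shape != (num_layers,):
        raise ValueError("layer_owner needs exactly one entry per layer")
    # Row l of the result is ws_sources[owner[l], l, :].
    return ws_sources[owner, np.arange(num_layers)]
```

In StyleGAN, coarse layers mostly govern overall shape and finer layers govern color and texture, so assigning layer ranges to different designs mixes those aspects; the layer split used by the tool is not stated here.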


Citation

@InProceedings{pmlr-v176-jiang22a,
  title =     {GANs for All: Supporting Fun and Intuitive Exploration of GAN Latent Spaces},
  author =    {Jiang, Wei and Davis, Richard Lee and Kim, Kevin Gonyop and Dillenbourg, Pierre},
  booktitle = {Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track},
  pages =     {292--296},
  year =      {2022}
}

Acknowledgements

This project and application were developed as a semester project at the EPFL CHILI Lab.
