CycleGAN (Horse-to-Zebra & Zebra-to-Horse Image Translation)
This repository contains an implementation of CycleGAN, a generative adversarial network (GAN) for unpaired image-to-image translation, applied here to the horse and zebra domains. The model learns both directions, horse to zebra and zebra to horse, without requiring paired examples. The project was developed for the Vision and Perception course of my Master's in Artificial Intelligence and Robotics at the Sapienza University of Rome.
The dataset used for this implementation consists of unpaired horse and zebra images: the two sets are separate collections with no one-to-one correspondence between them. The dataset is not included in this repository due to size and licensing constraints. You can obtain it from public sources, for example the Horse2zebra Dataset on Kaggle, and organize it accordingly.
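As a rough illustration of what "unpaired" means when loading the data, the sketch below matches each horse image with a randomly chosen zebra image, so no fixed correspondence is ever assumed. The class name `UnpairedImagePaths`, the helper `list_images`, and the idea of one folder per domain (for example `trainA` and `trainB`) are assumptions made for this example, not something this repository specifies.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(folder):
    """Return the image files in `folder`, sorted for reproducibility."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.suffix.lower() in IMAGE_SUFFIXES)


class UnpairedImagePaths:
    """Hypothetical helper: yields (horse_path, zebra_path) tuples
    with no fixed correspondence between the two domains."""

    def __init__(self, horse_dir, zebra_dir, seed=None):
        self.horses = list_images(horse_dir)
        self.zebras = list_images(zebra_dir)
        if not self.horses or not self.zebras:
            raise ValueError("both domain folders must contain images")
        self.rng = random.Random(seed)

    def __len__(self):
        # One pass covers the larger of the two domains.
        return max(len(self.horses), len(self.zebras))

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Walk through the horses in order (wrapping around if needed)
        # and draw the zebra independently at random.
        horse = self.horses[index % len(self.horses)]
        zebra = self.rng.choice(self.zebras)
        return horse, zebra
```

A training loop would then open both paths, apply the same preprocessing to each image, and feed them to the two generators. The folder paths passed to the constructor depend on how you unpacked the dataset.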
In CycleGAN, three distinct loss functions are utilized: adversarial loss, cycle consistency loss, and optional identity loss.
- The adversarial loss trains each generator to produce images that its discriminator cannot distinguish from real images. It is computed with a Mean Squared Error (MSE) loss on the discriminator outputs, as in least-squares GANs.
- The cycle consistency loss ensures that translating an image to the other domain and back again yields a reconstruction that closely resembles the original. This implementation uses a Wasserstein loss for this term instead of the L1 norm used in the original paper.
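- The identity loss is optional. It encourages a generator to leave an image unchanged when that image already belongs to its target domain (for example, a zebra image fed to the horse-to-zebra generator). The original paper does not include it in the main objective and applies it only in some experiments, such as painting-to-photo translation, to preserve color composition.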
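The three terms above can be sketched in a framework-agnostic way with NumPy. This is a minimal illustration, not the code in this repository: the cycle term is shown in the L1 form from the original paper rather than the Wasserstein variant mentioned above, all function names are hypothetical, and the weights are illustrative defaults (the paper reports a cycle weight of 10).

```python
import numpy as np


def lsgan_discriminator_loss(d_real, d_fake):
    """MSE adversarial loss for a discriminator: real -> 1, fake -> 0."""
    return 0.5 * (np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))


def lsgan_generator_loss(d_fake):
    """MSE adversarial loss for a generator: it wants D(fake) -> 1."""
    return np.mean((d_fake - 1.0) ** 2)


def l1_distance(a, b):
    """Mean absolute difference between two image batches."""
    return np.mean(np.abs(a - b))


def generator_objective(d_fake_zebra, d_fake_horse,
                        horse, horse_rec, zebra, zebra_rec,
                        horse_idt=None, zebra_idt=None,
                        lambda_cycle=10.0, lambda_identity=5.0):
    """Combined loss for both generators.

    d_fake_*  : discriminator outputs on generated images
    *_rec     : round-trip reconstructions (horse -> zebra -> horse, etc.)
    *_idt     : optional outputs of a generator fed an image that is
                already in its target domain
    """
    adversarial = (lsgan_generator_loss(d_fake_zebra)
                   + lsgan_generator_loss(d_fake_horse))
    cycle = l1_distance(horse, horse_rec) + l1_distance(zebra, zebra_rec)
    total = adversarial + lambda_cycle * cycle
    # The identity term is only added when both identity outputs are given.
    if horse_idt is not None and zebra_idt is not None:
        identity = (l1_distance(horse, horse_idt)
                    + l1_distance(zebra, zebra_idt))
        total = total + lambda_identity * identity
    return total
```

Leaving `horse_idt` and `zebra_idt` as `None` drops the identity term, which mirrors its optional role. In a real training loop the same formulas would be written with the deep learning framework's tensor operations so that gradients can flow back to the generators.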
Here are some samples of the translated images generated by the trained CycleGAN model:
Due to limitations in the training environment (e.g., computational resources, time constraints, or Google Colab usage limits), training stopped at 128 epochs instead of the originally intended 200. This is the main reason the quality of the translated images is limited.
This implementation is based on the original CycleGAN paper:
Zhu, J., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).