This project implements an image-to-image translation method as described in the paper *Image-to-Image Translation with Conditional Adversarial Networks* by Phillip Isola et al. (arXiv:1611.07004).
It was made as the final project for the CS 763 (Computer Vision) course in Spring 2019 at the Indian Institute of Technology (IIT) Bombay, India.
pix2pix uses a conditional generative adversarial network to build a general-purpose image-to-image translation system. Image-to-image translation involves learning a mapping from images in one distribution to corresponding images in another distribution. Many kinds of problems can be viewed as image-to-image translation, including image colorization, edges-to-object visualization, style transfer, etc.
For example, an output for the Satellite-to-Maps task would look as follows.
All the image output files in this project follow the above format, i.e.
[Source - Target_Ground_Truth - Target_Generated]
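Under the hood, the pix2pix objective trains the generator with the conditional GAN loss plus an L1 reconstruction term weighted by a factor λ (the paper uses λ = 100). Below is a minimal PyTorch sketch of that objective, not code taken from this repository; the names `generator`, `discriminator`, `source`, `target`, and `lambda_l1` are illustrative.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # adversarial term: real-vs-fake logits
l1 = nn.L1Loss()              # reconstruction term

def generator_loss(generator, discriminator, source, target, lambda_l1=100.0):
    """Conditional GAN loss + lambda * L1 (the pix2pix generator objective)."""
    fake = generator(source)
    # The discriminator is conditioned on the source image as well.
    pred_fake = discriminator(source, fake)
    adversarial = bce(pred_fake, torch.ones_like(pred_fake))
    reconstruction = l1(fake, target)
    return adversarial + lambda_l1 * reconstruction
```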
I tested this project with the following datasets released publicly by the authors (link in the Acknowledgements section):
- Facades
- Maps (satellite-to-map)
- Maps (map-to-satellite)
Follow the instructions below to get our project running on your local machine.
- Clone the repository and make sure you have the prerequisites listed below to run the code.
- Run `python src/main.py --help` to see the various options you can specify.
- To train the model, run `python src/main.py ...` along with the appropriate flags. For example, to train on the Maps (map-to-satellite) dataset, you may run `python src/main.py --mode train --data_root '../datasets/maps' --num_epochs 100 --data_invert`
- All the outputs will be saved to `src/output/[timestamp]`, where `[timestamp]` is the start time of training.
- Python 3.7.1 or above
- PyTorch 1.0.0 or above
- CUDA 9.1 (or another version matching your PyTorch build) to utilize any compatible GPU for faster training

[The code is tested to work with the above versions on a Windows 10 machine with a GTX 1070. It may also work with lower versions.]
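To quickly check that your environment satisfies these requirements (assuming PyTorch is already installed), you can run something like:

```python
# Quick environment check; prints the versions this project expects.
import sys
import torch

print("Python:", sys.version.split()[0])             # expect 3.7.1 or above
print("PyTorch:", torch.__version__)                 # expect 1.0.0 or above
print("CUDA available:", torch.cuda.is_available())  # True if a compatible GPU can be used
```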
The code for the various modules can be found in the `modules.py` file.
- Generator
  - I used a U-Net-like architecture (arXiv:1505.04597) for the generator, which is simply an encoder-decoder architecture with skip connections between the encoder and decoder. A minimal sketch is given after this list.

    [U-Net architecture figure; image courtesy of the authors' paper]

  - Precisely, the encoder channels vary as `in_channels -> 64 -> 128 -> 256 -> 512 -> 512 -> 512 -> 512`, and the decoder's channel sizes vary accordingly.
- Discriminator
  - For the discriminator, a PatchGAN is used. A PatchGAN is similar to a common discriminator, except that it classifies each N × N patch of the image as real or fake. A minimal sketch is given after this list.
  - In our case, we take N = 70. In our code this is achieved by using a convolutional network whose receptive field on the discriminator's input image is 70. Mathematically, this can be checked to be equivalent to what is described in the paper.
  - The channel sizes in our PatchGAN vary as `in_channels -> 64 -> 128 -> 256 -> 512 -> out_channels`.
- Hyperparameters
  - I used the default parameters defined in `main.py`. You can easily test other values by suitably changing the flags.
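To make the generator description above concrete, here is a minimal sketch of a U-Net-style generator with the channel sizes listed above. It is illustrative only and not the code in `modules.py`; class and argument names are made up, and details such as dropout and the exact normalization placement may differ from the actual implementation.

```python
import torch
import torch.nn as nn

class UNetGenerator(nn.Module):
    """Encoder-decoder with skip connections.

    Encoder channels: in_channels -> 64 -> 128 -> 256 -> 512 -> 512 -> 512 -> 512.
    The decoder mirrors the encoder; each decoder output is concatenated with the
    matching encoder feature map (skip connection).
    """

    def __init__(self, in_channels=3, out_channels=3):
        super().__init__()
        enc_channels = [64, 128, 256, 512, 512, 512, 512]

        # Encoder: 4x4 convolutions with stride 2, each halving the spatial size.
        self.encoder = nn.ModuleList()
        prev = in_channels
        for ch in enc_channels:
            self.encoder.append(nn.Sequential(
                nn.Conv2d(prev, ch, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm2d(ch),
                nn.LeakyReLU(0.2, inplace=True),
            ))
            prev = ch

        # Decoder: transposed convolutions that mirror the encoder. Every block
        # after the first receives [previous decoder output, skip] concatenated,
        # hence the doubled input channel count tracked in `prev`.
        dec_channels = [512, 512, 512, 256, 128, 64]
        self.decoder = nn.ModuleList()
        prev = enc_channels[-1]
        for ch in dec_channels:
            self.decoder.append(nn.Sequential(
                nn.ConvTranspose2d(prev, ch, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm2d(ch),
                nn.ReLU(inplace=True),
            ))
            prev = ch * 2
        self.final = nn.Sequential(
            nn.ConvTranspose2d(prev, out_channels, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),  # outputs in [-1, 1], matching images normalized to that range
        )

    def forward(self, x):
        skips = []
        for block in self.encoder:
            x = block(x)
            skips.append(x)
        skips = skips[:-1][::-1]  # drop the bottleneck, reverse to match the decoder order
        for block, skip in zip(self.decoder, skips):
            x = torch.cat([block(x), skip], dim=1)
        return self.final(x)
```

For a 256 × 256 input, the seven encoder blocks reduce the feature map to 2 × 2 at the bottleneck, and the final transposed convolution restores the original resolution.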
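Similarly, here is a minimal sketch of the 70 × 70 PatchGAN discriminator with the channel progression above. Again, this is illustrative rather than the repository's exact code; the default `in_channels=6` assumes the source and target are RGB images concatenated along the channel dimension.

```python
import torch
import torch.nn as nn

class PatchGANDiscriminator(nn.Module):
    """70x70 PatchGAN sketch: in_channels -> 64 -> 128 -> 256 -> 512 -> out_channels.

    Receptive-field check (back to front, r = r * stride + kernel - stride):
        output conv (k4, s1): r = 1 * 1 + 3 = 4
        512 block   (k4, s1): r = 4 * 1 + 3 = 7
        256 block   (k4, s2): r = 7 * 2 + 2 = 16
        128 block   (k4, s2): r = 16 * 2 + 2 = 34
        64 block    (k4, s2): r = 34 * 2 + 2 = 70
    so each output value judges a 70x70 patch of the input.
    """

    def __init__(self, in_channels=6, out_channels=1):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, kernel_size=4, stride=1, padding=1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, out_channels, kernel_size=4, stride=1, padding=1),
        )

    def forward(self, source, target):
        # The discriminator is conditional: it sees the source image concatenated
        # with a real or generated target along the channel dimension.
        return self.model(torch.cat([source, target], dim=1))
```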
All the results shown here are on test data.
As a sanity check, I would like to point out that on the training set, the model was able to give good outputs as shown below, indicating that its capacity was quite sufficient.
For the Facades dataset,
| Generator Loss [Training] | Discriminator Loss [Training] |
| --- | --- |
- Vamsi Krishna Reddy Satti - vamsi3
- I would like to thank the authors of the paper for the amazing public dataset found here.
This project is licensed under the MIT License. Please see the LICENSE file for details.