The goal of this project is to colorize black-and-white portraits. It is based on deep learning and uses the Pix2Pix GAN, a general-purpose approach to image-to-image translation.
The dataset consists of about 5,000 images of people, including family portraits, wedding portraits, and so on. The images have different dimensions; the average dimension is 636 x 788.
1- The black & white images were removed from the dataset.
2- The images were resized to 768 x 768.
3- The RGB images were converted to the Lab color space.
- Why? In the Lab color space only the second (a) and third (b) channels carry the color, so the Generator can take the L-channel as given and only has to predict the remaining two channels. Predicting two channels instead of three makes the problem easier.
4- Finally, the channels were normalized:
- Every value x in the L-channel was rescaled to [-1, 1].
- Every value x in the a- and b-channels was rescaled to [-1, 1].
(These two steps yield a 3-dimensional matrix, the tensor of the image in Lab color space, in which all values lie in the range [-1, 1].)
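The normalization above can be sketched as follows. The exact scaling constants are not stated in the text, so the common choices L/50 - 1 and ab/128 (assuming L in [0, 100] and a, b in [-128, 127]) are used here:

```python
import numpy as np

def normalize_lab(lab):
    """Scale a Lab image of shape (H, W, 3) so every value lies in [-1, 1].

    Assumed ranges: L in [0, 100], a and b in [-128, 127]
    (the exact constants used in the project are not stated).
    """
    lab = lab.astype(np.float32)
    L = lab[..., 0] / 50.0 - 1.0   # [0, 100]    -> [-1, 1]
    ab = lab[..., 1:] / 128.0      # [-128, 127] -> (-1, 1]
    return np.concatenate([L[..., None], ab], axis=-1)

# Example: a dummy 2 x 2 Lab image covering the extreme values.
lab = np.array([[[0.0, -128.0, 127.0], [100.0, 0.0, 0.0]],
                [[50.0, 64.0, -64.0], [25.0, -32.0, 32.0]]])
out = normalize_lab(lab)
```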
Generator
- Takes the L-channel of the image as input and generates the ab-channels as output.
- Input (a batch of L-channel images): a tensor of shape (Batch size, 1, 768, 768)
- Output (the predicted ab-channels): a tensor of shape (Batch size, 2, 768, 768)
- Architecture: U-Net [1]
- Encoder: EfficientNet-b5 (ImageNet pre-trained weights)
- Input channels: 1
- Output channels: 2
The L-channel is concatenated with the ab-channels produced by the generator to obtain the colored image; in this sense, the generator generates the colored image.
Discriminator
- Takes an image (in Lab color space) and generates a 2-dimensional matrix (94 x 94) in which each value indicates whether the corresponding patch of the image is real or fake (a number that represents the probability of being real).
- Input (a batch containing the original images, in Lab color space, and the images generated by the generator): a tensor of shape (Batch size * 2, 3, 768, 768)
- Output (the generated matrix): a tensor of shape (Batch size * 2, 1, 94, 94)
- Input (a batch of images generated by the generator): a tensor of shape (Batch size, 3, 768, 768)
- Output (the generated matrix): a tensor of shape (Batch size, 1, 94, 94)
(1) Conv → BN → ReLU
(2) Conv → BN → ReLU
(3) Conv → BN → ReLU
(4) Conv → BN → ReLU
(5) Conv → Sigmoid
(Please check the source code for more information, especially the number of filters and their sizes)
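A sketch of this five-layer patch discriminator in PyTorch. The filter counts and kernel sizes below are assumptions (the README defers those details to the source code); the standard Pix2Pix widths 64-128-256-512 with 4 x 4 kernels reproduce the 94 x 94 output for a 768 x 768 input:

```python
import torch
import torch.nn as nn

def block(in_ch, out_ch, stride):
    """Conv -> BN -> ReLU (layers (1)-(4) above)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# Filter counts are the usual Pix2Pix PatchGAN choices, not confirmed
# against the project's source code.
discriminator = nn.Sequential(
    block(3, 64, stride=2),     # 768 -> 384
    block(64, 128, stride=2),   # 384 -> 192
    block(128, 256, stride=2),  # 192 -> 96
    block(256, 512, stride=1),  # 96  -> 95
    nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),  # 95 -> 94
    nn.Sigmoid(),               # layer (5): per-patch probability of "real"
)
discriminator.eval()

x = torch.randn(1, 3, 768, 768)  # one Lab image
with torch.no_grad():
    patches = discriminator(x)
print(patches.shape)  # torch.Size([1, 1, 94, 94])
```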
- Batch size: 3
- Train | Test split: approximately 0.9 | 0.1
Discriminator
- Optimizer: ADAM
- Loss:
0.5 · BCE(Discriminator(X), Y)
- If X is a real image, then Y is a tensor in which all the values are 1.
- If X is a fake image (the image is derived from the generator), then Y is a tensor in which all the values are 0.
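A minimal sketch of this loss in PyTorch (assuming BCE is the mean-reduced `nn.BCELoss`):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # mean-reduced binary cross-entropy

def discriminator_loss(d_out, real):
    """0.5 * BCE(Discriminator(X), Y): Y is all ones when X is a real
    image and all zeros when X comes from the generator."""
    y = torch.ones_like(d_out) if real else torch.zeros_like(d_out)
    return 0.5 * bce(d_out, y)

# A discriminator that outputs 0.9 on real patches is close to correct,
# so the loss is small: 0.5 * -log(0.9) ~= 0.0527.
d_out = torch.full((2, 1, 94, 94), 0.9)
loss = discriminator_loss(d_out, real=True)
```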
Generator
- Optimizer: ADAM
- Loss:
BCE(Discriminator(X_generated), 1_T) + 10 · |X_real - X_generated|_1
- 1_T is a tensor in which all the values are 1.
- X_real is the real image, and X_generated is the fake colored version of X_real produced by the generator.
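A corresponding sketch of the generator loss (mean-reduced BCE and L1, PyTorch's defaults, are assumed for both terms):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()
l1 = nn.L1Loss()  # mean-reduced |.|_1 is assumed here

def generator_loss(d_fake_out, x_real, x_generated, lambda_l1=10.0):
    """BCE(Discriminator(X_generated), 1_T) + 10 * |X_real - X_generated|_1."""
    ones = torch.ones_like(d_fake_out)  # 1_T: the all-ones target
    return bce(d_fake_out, ones) + lambda_l1 * l1(x_generated, x_real)

# If the generator reproduces the real image exactly, only the BCE
# term remains: -log(0.5) ~= 0.693 for a 0.5 discriminator output.
d_fake_out = torch.full((1, 1, 94, 94), 0.5)
x = torch.zeros(1, 3, 8, 8)
loss = generator_loss(d_fake_out, x, x.clone())
```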
- Python and PyTorch (using Google Colab Pro)
Training finished after 46 epochs (on a Tesla T4). Every three epochs, the weights of both networks were saved. After reviewing the test images generated by the generator at each of these checkpoints, the epoch that led to the best images was selected: at epoch 27, the generator produced the most realistic test images.
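The checkpointing schedule described above (saving both networks every three epochs) might look like the following sketch; the function and file names are illustrative, not taken from the repository:

```python
import os
import torch
import torch.nn as nn

def save_checkpoints(generator, discriminator, epoch, out_dir="."):
    # Save both networks' weights every three epochs (epochs counted from 0).
    if (epoch + 1) % 3 == 0:
        torch.save(generator.state_dict(),
                   os.path.join(out_dir, f"generator_epoch_{epoch + 1}.pt"))
        torch.save(discriminator.state_dict(),
                   os.path.join(out_dir, f"discriminator_epoch_{epoch + 1}.pt"))
```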
The Generator and Discriminator Losses (for Train images) at each epoch are shown in the figures below.
[1] Segmentation Models Pytorch - Pavel Iakubovskii
[2] How to Develop a Pix2Pix GAN for Image-to-Image Translation - Jason Brownlee