Simultanous Super Resolution of Underwater Images using Cycle Generative Adversarial Networks
Enhancing underwater imagery using GANs
Computer vision is an attractive option when it comes to underwater monitoring, especially at shallower depths. But, light-distortion factors due to refraction, absorption, and suspended particles lose many quality features in an image, with the resulting image being noisy and distorted. In this blog, we talk about using Generative Adversarial Networks (GANs) to improve underwater images’ resolution. We discuss recent breakthroughs in GANs in underwater imagery from scientific papers such as Deep SESR (we would reference all discussed papers below). We also discuss the possibility of using GANs to generate full-fledged datasets for underwater image restoration.
When we talk about automating the process of colorizing greyscale images, we usually have paired training data because any color image can be converted to black and white. With underwater images, we lack the ground truth, which hinders us from using a similar approach.
The major problem is acquiring training data. Those of you doing vision in underwater environments know the massive lack of open and annotated datasets. To bypass this problem, we use a type of GAN called the CycleGAN. CycleGANs are extremely useful in generating a dataset of image pairs. CycleGAN learns a mapping G: X-> Y such that images from G(X) are indistinguishable from images in Y, as well as a mapping F: Y -> X such that images sampled in from F(Y) are indistinguishable from images sampled in X.
- Generator-A -> Domain-A
- Generator-B -> Domain-B
Here the Generator models perform image translation where the image generation process is conditional on some input image (precisely image from the other domain). Generator-A takes an image from Domain B as input, and Generator-B takes an image from Domain-A as input.
- Domain-B -> Generator-A -> Domain-A
- Domain-A -> Generator-B -> Domain-B
Each generator has a corresponding discriminator model. The first discriminator model (Discriminator-A) takes real images from Domain-A and generates Generator-A images, and predicts whether they are real or fake. The second discriminator model (Discriminator-B) takes real images from Domain-B and generated Generator-B images and indicates whether they are real or fake.
- Domain-A -> Discriminator-A -> [Real/Fake]
- Domain-B -> Generator-A -> Discriminator-A -> [Real/Fake]
- Domain-B -> Discriminator-B -> [Real/Fake]
- Domain-A -> Generator-B -> Discriminator-B -> [Real/Fake]
So we construct two domains, A and B, where A contains the distorted images and B contains non-distorted images taken from select subsets in ImageNet. CycleGAN is then used to generate sets of image pairs such that they appear to be underwater.
There are two components to the CycleGAN objective function: An adversarial loss and cycle consistency loss. The adversarial loss is precisely like it sounds: Both generators attempt to “Trick” each other’s discriminator into being less able to distinguish between real-versions. For the adversarial loss, we can use least square loss :
However, the adversarial loss itself cannot produce acceptable images since it leaves the model constrained. It enforces the generated output to be of an appropriate domain but does not enforce input and output are recognizably the same. To address this issue, we use cycle consistency loss. It relies on the fact that if you convert an image to others and back again by successively feeding it through both generators, you should be able to get something similar to what you put in.
We compare patches on local images as one source of evaluation, as the original image has no specific ground truth.