In the last decade the availability of high-resolution displays has steadily increased. Furthermore, there have been numerous advancements in the field of deep-learning-based image processing. This project explored one particular area within that field, namely image super-resolution (SR).
The learning curve of the project participants was split into three phases. In the first phase, classical image processing theory and algorithms were explored.
During the second phase, machine learning algorithms and neural networks were explored. In the last phase, the first two phases were combined, and supervised learning methods for super-resolution were explored and built.
The goal of image super-resolution is to create a high-resolution (HR) image from a low-resolution (LR) image. The resulting super-resolution image can be used in a range of real-world applications, such as medical imaging, security and surveillance, among many others.
This article serves as a very high-level summary of what we did, which tools we used, and the end result.
The aim of super-resolution is to recover HR images from LR images. The LR image is a degraded version of the HR image: a degradation function maps the HR image to the LR image. If this exact degradation function were known and invertible, its inverse could be applied to the LR image to recover a perfect HR image.
Generally speaking, this function is not known and only the LR image is given. Estimating the inverse of the function is, as is often the case with inverse problems, ill-posed: multiple HR images can correspond to the same LR image, so more often than not there is no unique solution.
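The ill-posedness can be illustrated with a small sketch. The degradation used here (averaging each block of pixels) is only an assumption chosen for illustration, and `downscale_by_averaging` is a hypothetical helper; the real degradation function is generally unknown.

```python
import numpy as np

def downscale_by_averaging(hr, factor=2):
    """Hypothetical degradation: average each factor x factor block of pixels."""
    h, w = hr.shape
    blocks = hr.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# Two different 2x2 "HR images" (grayscale intensities).
hr_a = np.array([[10.0, 30.0],
                 [30.0, 10.0]])
hr_b = np.array([[20.0, 20.0],
                 [20.0, 20.0]])

lr_a = downscale_by_averaging(hr_a)
lr_b = downscale_by_averaging(hr_b)

# Both HR images collapse to the same single LR pixel.
print(np.array_equal(lr_a, lr_b))  # True
```

Since both HR images produce the same LR image, no inverse function can tell from the LR image alone which of the two was the original.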
The toolkit of a software engineer can either make or break their product. This especially applies to computer science subdomains like artificial intelligence, because small differences in a tool’s functionality can greatly alter the outcome of a prediction. Since this project makes extensive use of digital image processing and deep learning, picking the right tools is of the utmost importance.
As of June 2020 the general-purpose programming language Python is undeniably one of the most important languages in the field of machine learning (ML). The ease of use and shallow learning curve of Python allowed for the creation of a rich ecosystem that is geared toward many different domains, including AI.
This does not mean that there are no alternatives well suited for ML; R is a prime example. However, given our prior experience with Python and the beginner-friendly ecosystem it provides, it seemed like the most solid and straightforward choice.
Right away we noticed that the package management of Python formed an obstacle. Since we each had different Python versions installed on our machines, we ran into conflicting errors. We ended up solving this problem by using a virtual environment manager called pipenv. This ensured that all our Python versions and packages were identical. After solving the package management problem, development with Python turned into a breeze. The dynamically typed nature of Python allowed us to rapidly prototype different types of ML models. Most bugs we encountered were easily solved, because of the vast amount of discussion and reporting done by the Python machine learning community.
Google Colab is an online interactive development environment that supports many languages, including Python. Out-of-the-box it supports most libraries that are needed for ML, including TensorFlow, Keras, NumPy, etc. Moreover, it supports multiple developers simultaneously developing inside the same environment. Because of the rich support for ML and simultaneous development, we decided to make use of this for rapid prototyping.
Google Drive supports Google Colab integration. We made use of this by creating a Drive folder dedicated to Google Colab and a file we could all collaborate in. The out-of-the-box functionality was indeed as great as we expected, but the limited computing power Google provides and the way Google Colab stores variables quickly turned the development process into a bug-ridden experience. We chose not to use Google Colab any longer after running into these problems.
TensorFlow is an ML library that provides a great deal of flexibility for its users. It provides support for deep learning (DL) and many of its neighbouring concepts. The low-level capabilities that TensorFlow offers are not used directly. Instead, it acts as a foundation for the project. Keras, a high-level API that is built on top of TensorFlow, is used. This way all the power of the platform is utilised, without having to worry about minor implementation details. Since TensorFlow comes out-of-the-box with all these features and enables us to use Keras, we chose this as our foundation.
TensorFlow can make use of either a CPU or a GPU when training models. Since the GPU version of TensorFlow is much harder to get running, we initially opted for the CPU variant. We quickly discovered that training on image data sets containing gigabytes worth of images on our CPUs made for a slow training process. After realizing that training our models on a CPU could take many days, we chose to try out the GPU version of TensorFlow instead. After solving several GPU-related compatibility issues we managed to get it to recognize our GPUs. This sped up the training process significantly, and it has worked reliably ever since. Training still took a long time, but the gains from training on a GPU were significant.
Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. It was developed with a focus on enabling fast experimentation, adding several layers of abstraction in the process. Therefore you do not have to worry about low-level details, but can focus on the product instead. This is an ideal solution for prototyping. Since rapid prototyping was applied during the development process, we chose to build the project using Keras.
Apart from the occasional hiccups like bugs, the prototyping phase was frictionless. The only part that did not work out as well as we thought was the model-saving tool that is included with Keras. The weights of the optimizers did not get saved correctly when saving a model, which meant we had to start over every time we wanted to train a model. We ended up circumventing this problem by creating a workaround that properly saves and loads a trained model, including the optimizer weights.
PSNR, an abbreviation for peak signal-to-noise ratio, is a way to measure the fidelity of a copied signal to its original source. In the case of imagery, PSNR quantifies the noise in a copied image relative to the original, expressing the accuracy of the copy in decibels (dB). The higher the dB value, the more accurate the copy is. We chose this method so we could compare the pixel-wise accuracy of our model’s predictions to other scaling methods, like nearest neighbour or bicubic interpolation.
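As a minimal sketch of how PSNR can be computed with NumPy, assuming 8-bit images with a peak value of 255 (the helper name `psnr` and the `max_value` parameter are illustrative, not taken from the project code):

```python
import numpy as np

def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    mse = np.mean((reference - candidate) ** 2)  # mean squared pixel error
    if mse == 0:
        return float("inf")  # identical images: no noise at all
    return float(10.0 * np.log10(max_value ** 2 / mse))

original = np.full((4, 4), 100, dtype=np.uint8)
copy = np.full((4, 4), 110, dtype=np.uint8)  # every pixel is off by 10

print(psnr(original, copy))      # finite value in dB
print(psnr(original, original))  # inf
```

An image compared with itself has zero error, which is why a ground-truth HR image scores an infinite PSNR.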
After comparing the PSNR of our model’s results to the other scaling methods, we quickly realized that it did not give the result that we were expecting. Since our model focuses heavily on features, which often results in more accurate-looking images, we expected our model to also have the best PSNR. After 100 epochs the PSNR of both nearest neighbour and bicubic interpolation was higher, meaning the pixel-wise accuracy of our model is worse than that of the traditional methods.
Since humans focus more on features than on individual pixels, and thus perceive images differently than computers, it seems that PSNR might not be the ideal metric for our generated images. If we had more time we would use a combination of PSNR and the structural similarity index (SSIM). This way we would get both the pixel-wise and the structural accuracy.
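To give an impression of what a structural metric measures, the sketch below computes a simplified, global variant of SSIM. This is an assumption made for brevity: the standard SSIM is computed over local windows and then averaged, and this project did not implement SSIM. The function name `global_ssim` is hypothetical; the constants 0.01 and 0.03 are the commonly used defaults.

```python
import numpy as np

def global_ssim(x, y, max_value=255.0):
    """Simplified SSIM using statistics of the whole image instead of local windows."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * max_value) ** 2  # stabilises the luminance term
    c2 = (0.03 * max_value) ** 2  # stabilises the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

gradient = np.arange(16, dtype=np.float64).reshape(4, 4) * 10
flipped = gradient[::-1, ::-1]  # same pixel values, reversed structure

print(global_ssim(gradient, gradient))  # identical images score 1
print(global_ssim(gradient, flipped))   # reversed structure scores lower
```

The flipped image has the same mean and variance as the original, so the lower score comes entirely from the structural (covariance) term.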
NumPy is a library that, among other things, enables interaction with large arrays and matrices. By wrangling our data we can maximize the information that we learn from our data set. Since the data set for this project consists purely of images, and images are essentially matrices, we need something that enables us to interact with these matrices. We chose NumPy because it is the most well-known and feature-rich library that fulfills our needs for this problem. Furthermore, it integrates nicely with OpenCV and Keras.
The data set generator generates the images used for training and testing the model. Here, as in various other places, NumPy is used to manipulate the images. Throughout the project we make extensive use of NumPy’s ndarray, which is an array object that represents a multidimensional, homogeneous array of fixed-size items. The output of OpenCV’s image reading is in this same format.
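The sketch below shows what such an image array looks like. To stay self-contained it builds a stand-in array instead of reading a file; OpenCV returns colour images as `uint8` arrays of shape (height, width, channels) in BGR channel order. The scaling to the range [0, 1] is an assumed preprocessing step, not necessarily the one used in the project.

```python
import numpy as np

# Stand-in for an image read with OpenCV: 4 rows, 6 columns, 3 channels (BGR).
image_bgr = np.zeros((4, 6, 3), dtype=np.uint8)
image_bgr[..., 0] = 255  # fill the first channel, which is blue in BGR order

print(image_bgr.shape, image_bgr.dtype)

# Reverse the channel axis to convert BGR to RGB.
image_rgb = image_bgr[..., ::-1]

# Assumed preprocessing step: scale pixel values to floats in [0, 1].
normalized = image_rgb.astype(np.float32) / 255.0
print(normalized.dtype, normalized.min(), normalized.max())
```

Because every item has the same type and size, operations such as the channel reversal and the scaling apply to the whole image at once, without explicit loops.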
Since this project is focused on image processing, a library that enables us to extract data from images and to display them is needed. There are multiple widely used libraries in the Python ecosystem that support this, but none are as extensive and easy to use as OpenCV. It comes out-of-the-box with more than 2500 optimized algorithms for image processing. Some of these algorithms are specifically geared towards interpolation, which can help speed up the development process for this project.
Alongside speeding up the development process, OpenCV delivers stellar performance. The OpenCV codebase is written in C++, which generally outperforms Python in image processing. By using a Python wrapper around the C++ codebase, only a negligible amount of performance is lost. The Python wrapper enables programmers to use a high-level language to control the performant low-level libraries. Given the reasons stated above, OpenCV is a viable option as an image processing tool.
OpenCV is used in multiple places in the solution. All image reading and writing is done with OpenCV, and every prediction the GAN makes is saved using OpenCV. We encountered no issues with this tool, besides some minor IDE linter issues, which were easily solved.
This project follows the methods described by Ledig et al. in Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.
The general idea behind this method is a GAN, where the goal of the generative model G is to fool a differentiable discriminator D that is trained to distinguish super-resolved images from real images.
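The opposing goals of G and D can be sketched as two loss functions. The inputs are assumed to be the probabilities D assigns to an image being real; the function names are hypothetical, and this illustrates only the adversarial idea, not the complete loss used in the project.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: real images should score 1, super-resolved images 0."""
    return float(-np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps)))

def generator_adversarial_loss(d_fake, eps=1e-12):
    """Low when D believes the super-resolved images are real."""
    return float(-np.mean(np.log(d_fake + eps)))

# D's outputs on a batch of real HR images and on a batch of G's outputs.
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.2, 0.1])

print(discriminator_loss(d_real, d_fake))   # small: D separates the two well
print(generator_adversarial_loss(d_fake))   # large: G is not fooling D yet
```

Training alternates between lowering the first loss by updating D and lowering the second by updating G, so an improvement for one network makes the task harder for the other.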
The aim of super-resolution is to recover HR images from LR images. SRGAN turned out to be a well-performing method during the exploration of possible methods. It performs better when combined with a pretrained image classification network such as the VGG network. The recovery of HR images needs to be measured with an objective metric.
For the measurement of SR images, PSNR can be used as an objective quality metric because the SR image can be compared with the ground truth. This comparison is possible because the ground truth is down-scaled to an LR image, which is then up-scaled with SRGAN. PSNR expresses, in dB, the pixel-wise error between the generated image and the ground truth.
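The evaluation procedure described above can be sketched as follows. Several parts are assumptions made for illustration: the block-averaging `downscale`, the scale factor of 4, the random grayscale test images, and the use of nearest-neighbour upscaling as a stand-in for the trained network.

```python
import numpy as np

def downscale(hr, factor):
    """Assumed degradation: average each factor x factor block of pixels."""
    h, w = hr.shape
    return hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upscale_nearest(lr, factor):
    """Nearest-neighbour upscaling: repeat every pixel along both axes."""
    return np.repeat(np.repeat(lr, factor, axis=0), factor, axis=1)

def psnr(reference, candidate, max_value=255.0):
    mse = np.mean((reference - candidate) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_value ** 2 / mse))

def evaluate(upscaler, hr_images, factor=4):
    """Mean PSNR over a set of ground-truth images for a given upscaling method."""
    scores = []
    for hr in hr_images:
        lr = downscale(hr, factor)       # ground truth -> LR input
        sr = upscaler(lr, factor)        # LR input -> reconstructed image
        scores.append(psnr(hr, sr))      # compare against the ground truth
    return float(np.mean(scores))

rng = np.random.default_rng(0)
images = [rng.uniform(0, 255, size=(16, 16)) for _ in range(3)]
print(evaluate(upscale_nearest, images))
```

Any upscaling method with the same call signature, including a wrapper around a trained generator, could be passed to `evaluate` to obtain a comparable score.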
The network is evaluated using the DIV2K validation set. The test images were prepared in the same manner as the training images.
The table below states the network’s generative capability, measured with PSNR. The numbers in the table are the mean PSNR over all images in the test set. The header ’nearest’ stands for nearest neighbour interpolation and the header ’bicubic’ stands for bicubic interpolation.
Mean PSNR (dB) on the test set per training epoch:

Epoch   Nearest   Bicubic   SRGAN   HR
25      23.61     25.38     22.65   ∞
50      23.61     25.38     22.99   ∞
75      23.61     25.38     23.31   ∞
100     23.61     25.38     23.48   ∞
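To put the reported numbers in perspective, a difference in PSNR can be converted into a ratio of mean squared errors, since PSNR is ten times the base-10 logarithm of the squared peak value divided by the MSE. The sketch below does this for the gap between bicubic interpolation and SRGAN. Because the reported values are means over images, the resulting ratio is only indicative (roughly a geometric mean of the per-image ratios).

```python
# Mean PSNR values in dB, copied from the results reported above.
BICUBIC_DB = 25.38
SRGAN_DB_BY_EPOCH = {25: 22.65, 50: 22.99, 75: 23.31, 100: 23.48}

def mse_ratio_from_gap(gap_db):
    """Convert a PSNR gap in dB into the corresponding ratio of mean squared errors."""
    return 10 ** (gap_db / 10)

gaps = {epoch: BICUBIC_DB - srgan for epoch, srgan in SRGAN_DB_BY_EPOCH.items()}
for epoch, gap_db in gaps.items():
    ratio = mse_ratio_from_gap(gap_db)
    print(f"epoch {epoch:3d}: gap {gap_db:.2f} dB, MSE ratio {ratio:.2f}")
```

The gap to bicubic interpolation shrinks from 2.73 dB at epoch 25 to 1.90 dB at epoch 100, which corresponds to a pixel-wise squared error that is still roughly one and a half times as large.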
As can be observed in the results of the last section, the SRGAN generates SR images that are worse in terms of PSNR than those of the other methods. This can be attributed to the fact that the loss function described in section 4.3.1 focuses heavily on perceptual quality and not on pixel-wise comparisons.
Moreover, the paper of Ledig et al. shows that other methods outperform SRGAN in terms of pixel-wise objective metrics. However, when looking at subjective metrics, in this case the mean opinion score (MOS), SRGAN outperforms the other methods by a considerable margin. The same most likely holds for the implemented network.
For this project, the resources for extensive mean opinion score testing with a sufficient number of subjects N were not readily available, and therefore such tests were not performed.
The problem statement of this project is enhancing low-resolution images by applying a generative adversarial network to produce super-resolved images.
The performance of the implemented SRGAN was measured against the baseline model, which is bicubic interpolation. In terms of the quality metric PSNR, the SRGAN performed worse than bicubic interpolation (23.48 dB versus 25.38 dB after 100 epochs). This was not the expected result.
However, PSNR has been shown to perform poorly compared to other quality metrics when it comes to estimating the quality of images as perceived by humans.
The main goal of the project was to improve the perceptual quality of the SR images, hence the chosen perceptual loss function. Due to circumstances, wide-scale mean opinion score testing was not viable. Most likely such testing would have shown the perceptual quality gained.