Vignesh Ravikumar | Recurrent Neural Networks With Intra-Frame Iterations for Video Deblurring

Citation: S. Nah, S. Son, and K. M. Lee, “Recurrent Neural Networks With Intra-Frame Iterations for Video Deblurring,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 8094-8103, doi: 10.1109/CVPR.2019.00829.

Brief Summary:
This paper suggests a novel video deblurring method that beats the state-of-the-art performance using Recurrent Neural Networks (RNN). This real-time deblurring method exploits the temporal relationship between the past frame and the current frame to improve the accuracy of the model. This method does not use any additional modules, which helps in updating the weights of the iterative model in a stochastic manner. Experimental results on the GOPRO dataset show the Peak Signal-to-Noise Ratio to be 29.97.

Main Contributions:
The model produces better results without modifying the architecture by updating the hidden states multiple times internally during a single time step.
More optimal hidden state update strategies such as partial recurrence and different types of intra-frame iteration strategies are explained clearly.
The data term in the loss function minimizes the restoration error and a prior term that favors a shorter computation path.
The deblurring accuracy and the computational efficiency beats the state-of-the-art models. This can be inferred from the experimental results in GOPRO dataset.

Strengths:
Exploits both inter-frame and intra-frame recurrent schemes which are simple yet effective.
The previous literature is explained comprehensively, which gives good background knowledge about deblurring to the reader.
Strong experimental results and algorithms help in understanding the concept clearly.
The figures and tables used in the paper convey the technical work intuitively.

Weakness:
The paper does not contain code resources for reproducing the results. The captioning of the figures is not clear and the paper contains many grammatical errors. The paper heavily relies on abbreviations which might confuse the readers.

In-Depth Analysis of Strengths and Weaknesses:
The main strength of the paper is to exploit the temporal relationship between neighboring frames to improve the accuracy of the deblurring model. The single-cell and dual-cell approaches differ in which parameters are used for updating the hidden states. The former uses the same parameters to estimate both the initial hidden state and the updated hidden state, while the latter uses two RNN cells where the second cell is used to update the hidden states and predict latent frames.
The previous literature on video deblurring, burst deblurring, and stochastic neural networks was comprehensively explained. The experimental results on the GOPRO dataset and Deep video deblurring for the hand-held camera dataset were solid, which is explained in the next section.
Even though the paper has several strengths, it has some major flaws. The inability to reproduce the results without coding resources is difficult for researchers to further improve on the same line of research. Additionally, the presence of grammatical errors and confusing abbreviations make the paper less readable.

Experimental Results:
The experiments were conducted on the GOPRO dataset with 2103 training samples from 22 sequences, and 1111 evaluation samples from 11 sequences. From this dataset, blur and sharp image pairs were generated. The original video resolution was downsampled from 1280 × 720 to 960 × 540 before averaging to suppress the noise and video compression artifacts. All the models are implemented using PyTorch 0.4.1, which was created with CUDA 9.2 and cuDNN 7.1, on NVIDIA GTX 1080 Ti GPUs. Another dataset consisting of 61 sequences containing 5708 training pairs and 10 sequences including 1000 evaluation pairs was synthesized from 240fps videos. PSNR and SSIM were used to evaluate the models.
Comparisons on GOPRO Dataset:
In the GOPRO dataset, they achieved a deblurring accuracy of 29.97 PSNR and 0.8947 SSIM. This result is better than the existing state-of-the-art, and this improved result is due to the proposed intra-frame iteration scheme and the stochastic training methods.
Comparisons on Deep video deblurring for hand-held cameras Dataset and Real Videos:
For this dataset, the deblurring accuracy was 30.80 PSNR and 0.8991 SSIM. The result on real videos clarifies blurred textures.

Extensions:
The architecture of this model can be further improved by the latest architectures such as Long Short Term Memory (LSTM) and transformers.

Comments:
This paper paves way for research in image deblurring that can be used in applications such as medical imaging, microscopy, remote sensing, and planetary imaging.