1. Motivation

Deep learning based approaches usually require a large number of ground-truth images for training. Most of them are trained on synthetic hazy datasets (e.g., NYU Depth dataset and Make3D dataset). As these synthetic hazy datasets contain limited image categories and image depths, the performance of existing deep learning based algorithms is usually limited to synthetic training datasets and cannot be well generalized to real-world hazy images.

2. Contribution

The proposed algorithm applies a deep Convolutional Neural Network (CNN) containing a supervised learning branch and an unsupervised learning branch.

1. We propose a semi-supervised algorithm to learn the relationship between synthetic and real-world hazy images. The proposed network consists of a supervised branch and an unsupervised branch.

2. We exploit conventional image priors as unlabeled losses to train the unsupervised branch with real training data.

3. We conduct extensive experiments and demonstrate that the proposed semi-supervised dehazing method performs favorably against the state-of-the-art dehazing approaches both on the synthetic datasets as well as real hazy images.

3. Network

We use an encoder-decoder architecture with skip connections which has been shown effective for low-level tasks . We show the architecture and configurations of the proposed network in Figure 2 and Table I. The encoder contains three scales and each consists of three stacked residual blocks. Similar to the work by Nah et al, we do not use any normalization layer in the residual blocks.

Proposed semi-supervised learning framework for single image dehazing. The proposed method consists of two branches sharing the same weights. The supervised branch is trained using labeled synthetic data and loss functions based on mean squared, perceptual, and adversarial errors. The unsupervised branch is trained using unlabeled real data and loss functions based on dark channel loss and total variation.

table 1

4. Details

4.1 Loss fuctions

We combine supervised losses, unsupervised losses, and the adversarial loss to train the proposed network

$L= L_{c} + \lambda L_{p} + \gamma L_{t} + \mu L_{d} + \eta L_{a}$

mean squared loss + perceptual loss + total variation loss + dark channel loss + GAN loss

4.2 Training Details

We alternatively update the generator and discriminator by updating one while fixing the other. More specifically, we update the discriminator once after updating the generator five times. When updating the generator, we optimize the network parameters in a semi-supervised way. We use the Pytorch toolbox [25] and Adam [10] solver to optimize both the generator and discriminator. We set $\beta$ 1 = 0.9, $\beta$ 2 = 0.99, and the weight decay as $10^ {-4}$ . The network is trained for 300 epochs. The learning rate is set to be $10^{-4}$ at the first 150 epochs, and decreased linearly to $10^{-6}$ within the following 150 epochs by $l_r = 10^{-4} - \frac{10^{-4} - 10^{-6}}{150} (E-150)$ ,where E denotes the number of the training epoch.

We train the network by randomly choosing both labeled and unlabeled samples from the RESIDE dataset, which contains the ITS (Indoor Training Set), OTS (Outdoor Training Set), SOTS (Synthetic Object Testing Set), URHI (Unlabeled real Hazy Images), and RTTS (real Task-driven Testing Set).For labeled data, we select 4000 synthetic hazy images, 2000 from the ITS set and 2000 from the OTS set. For unlabeled data, we randomly choose 2000 real hazy images from the URHI dataset. We set the batch size to 4, and apply the following strategies to randomly augment the training data: 1) flipping horizontally and vertically, 2) rotating for −90 or 90, and 3) adding Gaussian noise with the sigma of 0.01. Then we randomly crop the images to the size of 256 × 256 and normalize the pixel values to [−1, 1].

We set the patch size as 35 × 35 when computing the DC loss. The loss weights are set as: $\lambda = 10^{-2}, \gamma = 10^{-5}, \mu=10^{-5}, \eta =10^{-3}.$