Objective Image And Video Quality Assessment With Applications

Date

2009-09-16T18:16:39Z

Publisher

Electrical Engineering

Abstract

Objective Image and Video Quality Assessment (IQA/VQA) aims to automatically measure the quality degradation perceived by the human eye. It is of fundamental importance for a wide variety of problems in image and video processing. Depending on how much information about the reference image is available, IQA/VQA models can be classified as Full-reference (FR), Reduced-reference (RR), or No-reference (NR) methods. This dissertation focuses on FRIQA/VQA and RRIQA/VQA, as well as their applications in perceptual image coding and video interpolation.
First, we propose novel metrics for FRIQA/VQA based on Structural SIMilarity (SSIM) and information-theoretic weighting. The spatial information weights for images and the spatial-temporal information weights for video are each computed within an information-theoretic framework. For FRIQA, the spatial information weight is computed as mutual information under Natural Scene Statistics (NSS) models. For FRVQA, we incorporate the prior and likelihood models of human visual speed perception to compute the spatial-temporal information weight as a sum of information content and perceptual uncertainty. Moreover, our metrics employ perceptual weights for multiscale SSIM that are derived from subjective tests.
Second, we propose general-purpose RRIQA algorithms that estimate perceptual image quality degradation using only partial information about the "perfect-quality" reference image. To account for the dependencies in natural images, a joint statistical model is applied to RRIQA, which can handle more general distortions than marginal statistics. A novel RRIQA method is proposed using the statistics of a perceptually and statistically motivated image representation.
By using a Gaussian scale mixture statistical model of image wavelet coefficients, we compute a divisive normalization transformation (DNT) for images and evaluate the quality of a distorted image by comparing a set of reduced-reference statistical features extracted from the DNT-domain representations of the reference and distorted images. This leads to a general-purpose RRIQA method in which no assumption is made about the types of distortion present in the image being evaluated. To address the problem of RRVQA, we adopt a novel statistical prior that measures the motion regularity of natural image sequences. We investigate the temporal variations of local phase structures in the complex wavelet transform domain. We observe that natural image sequences exhibit a strong prior of temporal motion smoothness, by which the local phases of wavelet coefficients can be well predicted from their temporal neighbors. We study how this statistical regularity is disrupted by "unnatural" image distortions and demonstrate the potential of temporal motion smoothness measures for RRVQA.
Third, we apply our IQA/VQA methods to perceptual image coding and video interpolation. Typically, perceptual image coding algorithms impose perceptual modeling in a preprocessing stage. A perceptual normalization model is often used to transform the original image signal into a perceptually uniform space, in which all transform coefficients have equal perceptual importance. Standard coding schemes are then applied uniformly to all coefficients. Here we take a different approach, in which we iteratively reallocate the available bits over the image space based on a maximum-of-minimal structural similarity criterion. We demonstrate the proposed method by incorporating it into the bitplane coding scheme of the set partitioning in hierarchical trees (SPIHT) algorithm.
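The maximum-of-minimal criterion can be illustrated with a toy greedy allocator. This is only a sketch under stated assumptions, not the dissertation's SPIHT-based procedure: the per-region quality curves (`quality_fns`) are hypothetical stand-ins for SSIM measured as a function of the bits spent on a region, and the function name `maxmin_allocate` is invented for illustration.

```python
def maxmin_allocate(quality_fns, total_bits, step=1):
    """Greedily spend bits on whichever region currently has the lowest quality.

    quality_fns: one callable per region mapping bits -> quality score
                 (hypothetical stand-ins for a local SSIM-vs-rate curve).
    total_bits:  bit budget to distribute across regions.
    step:        number of bits handed out per iteration.
    """
    bits = [0] * len(quality_fns)
    for _ in range(total_bits // step):
        scores = [q(b) for q, b in zip(quality_fns, bits)]
        # The region with the minimum score is the current bottleneck.
        worst = min(range(len(scores)), key=scores.__getitem__)
        bits[worst] += step
    return bits


# Two hypothetical regions: the second needs twice as many bits per unit quality.
curves = [lambda b: min(1.0, b / 10), lambda b: min(1.0, b / 20)]
allocation = maxmin_allocate(curves, total_bits=30)
worst_quality = min(q(b) for q, b in zip(curves, allocation))
```

Compared with splitting the budget evenly, this allocation raises the quality of the worst region, which is the intent of a max-min criterion.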
Finally, we propose a video frame interpolation method that uses prior knowledge of temporal motion smoothness measured in the complex wavelet domain. This allows us to avoid the time-consuming motion estimation process and thus greatly reduces the computational complexity of video interpolation.
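As an illustration of the information-weighted pooling behind the full-reference metrics summarized in this abstract, the following minimal sketch computes SSIM on small patches and pools the scores with a per-patch weight. The weight used here, log2(1 + variance / noise_var), is a simplified Gaussian-channel stand-in chosen for illustration; it is an assumption, not the NSS-based mutual information used in the dissertation. The function names and the `noise_var` parameter are likewise hypothetical.

```python
import math

# Commonly used SSIM stabilizing constants for 8-bit data.
C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2


def _stats(x, y):
    """Return means, variances and covariance of two equal-length patches."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, vx, vy, cxy


def ssim_patch(x, y):
    """SSIM index between a reference patch x and a distorted patch y."""
    mx, my, vx, vy, cxy = _stats(x, y)
    num = (2 * mx * my + C1) * (2 * cxy + C2)
    den = (mx * mx + my * my + C1) * (vx + vy + C2)
    return num / den


def info_weight(x, noise_var=1.0):
    """Assumed weight: information of a Gaussian source seen through noise."""
    _, _, vx, _, _ = _stats(x, x)
    return math.log2(1.0 + vx / noise_var)


def weighted_ssim(ref_patches, dist_patches, noise_var=1.0):
    """Pool per-patch SSIM scores using information weights."""
    scores = [ssim_patch(x, y) for x, y in zip(ref_patches, dist_patches)]
    weights = [info_weight(x, noise_var) for x in ref_patches]
    total = sum(weights)
    if total == 0:  # all patches flat: fall back to a plain average
        return sum(scores) / len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / total
```

With this weighting, distortions in flat, low-information patches count less than distortions in textured patches, which is the qualitative behavior information weighting is meant to capture.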