IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. ?, NO. ?, ? 2016. DOI 10.1109/TIP.2016.2522339


Probability Distribution Estimation for Autoregressive Pixel-predictive Image Coding

Andreas Weinlich, Peter Amon, Andreas Hutter, Member, IEEE, and André Kaup, Fellow, IEEE

Abstract—Pixel-wise linear prediction using backward-adaptive least-squares or weighted least-squares estimation of prediction coefficients is currently among the state-of-the-art methods for lossless image compression. While current research is focused on mean intensity prediction of the pixel to be transmitted, best compression requires occurrence probability estimates for all possible intensity values. Apart from common heuristic approaches, we show how prediction error variance estimates can be derived from the (weighted) least-squares training region and how a complete probability distribution can be built based on an autoregressive image model. The analysis of image stationarity properties further allows deriving a novel formula for weight computation in weighted least-squares, proving and generalizing ad-hoc equations from the literature. For sparse intensity distributions in non-natural images a modified image model is presented. Evaluations were done in the newly developed C++ framework Vanilc (Volumetric, artificial, and natural image lossless coder), which can compress a wide range of images, including 16-bit medical 3-D volumes and multi-channel data. A comparison with several of the best available lossless image codecs proves that the method can achieve very competitive compression ratios. In the spirit of reproducible research, the source code of Vanilc has been made public.

Index Terms—Vanilc, weighted least-squares weighting function, parametric distribution model, variance estimation, 16-bit lossless medical 3-D image compression, histogram sparsification.

I. INTRODUCTION

A. Applications for Lossless Image Compression

For a large part, research efforts in visual data compression are presently focused on non-reversible (lossy) coding of consumer images and videos, which is attributable to its high market share. This comprises mostly recordings of natural content, produced either with professional cameras or by users themselves with their mobile imaging devices. Screen content coding, as used for text overlays in videos, screencasts, remote desktop services, computer-generated graphics, or document archiving, is also becoming increasingly popular [1]. However, the example of screen content demonstrates a domain where reversible (lossless) compression is often desirable, since compression loss is very noticeable in this type of visual data. Another application area for lossless coding is raw material that is meant for post-processing in digital cinema or in professional photography, where even greater bit depth is often demanded. Compression loss is also not desirable when computer vision analysis is to be applied to recordings. Areas where lossy algorithms may even be prohibited are safety-critical domains like automotive applications or diagnostic medical image compression.

A. Weinlich and A. Kaup are with the Institute of Multimedia Communications and Signal Processing, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany. E-mail: {weinlich, kaup}@lnt.de
A. Weinlich, P. Amon, and A. Hutter are with Sensing and Industrial Imaging, Siemens Corporate Technology, Munich, Germany. E-mail: {p.amon, andreas.hutter}@siemens.com
Manuscript received February 25, 2015; revised December 1, 2015.

B. Overview of Existing Schemes

The history of lossless image coding dates back to the 1980s, when compression algorithms for general data became popular due to the widespread use of the Internet. One popular type of such algorithms is based on the Lempel-Ziv decorrelation method [2], where previously observed data sequences are transmitted only once if they occur repeatedly. In image coding it is used, e. g., for the GIF and PNG image formats. A recent implementation, LZMA, is still among the best compression methods for digital data [3]. One of its current main competitors is PAQ [4], which can be seen as a successor of the prediction by partial matching (PPM) [5] technique. Occurrence probabilities are computed for each bit based on many different contexts formed out of already transmitted bits. These are encoded using an arithmetic entropy coder. A current implementation is ZPAQ, which can also be configured for lossless image coding. Such a two-stage approach is also common for most algorithms that are explicitly focused on image compression: First, the zero-order entropy is reduced by exploiting inter-pixel dependencies; then entropy coding is applied to the resulting lower-entropy data. The advantage of general-purpose algorithms is their broad applicability to all types of data with impressive compression performance. However, with measured data like natural images, which contain specific types of noise as well as correlations in more than the first dimension, their performance drops, since these characteristics are hidden from them.

The most popular image compression standard, JPEG [6], in its original form is only capable of lossy compression. It partitions the image into blocks, which are then transformed using a discrete cosine transform (DCT) in order to reduce inter-pixel correlations within these blocks. The new standard JPEG XR [7], as well as intra-only image coding in the popular video compression formats H.264/AVC [8], HEVC/H.265 [9], and Google VP9 [10], also apply block processing, where typically similar transforms like integer variants of the DCT or of the discrete sine transform (DST) are used. Contrary to JPEG, these other coders also allow for lossless compression, even if this is not their primary goal. Since the DCT transform matrix is fixed, it is not required to transmit it or to estimate it at the decoder side from pixel values that may be distorted in case of



lossy coding. Furthermore, this scheme simplifies irrelevancy reduction [11], so for lossy coding it works very well. On the other hand, it suffers from a bias-variance trade-off: With larger blocks the decorrelation is suboptimal due to non-stationarity; with smaller blocks it suffers from the wrong assumption of circular stationarity within a block and from insufficient decorrelation between blocks [12]. Wavelet coders like JPEG 2000 [13] try to handle this trade-off by applying a whole-image transform. In both cases, note that it is not simple to find accurate statistics for the resulting transformed data in order to drive subsequent entropy coding. All of these drawbacks make transform coding unattractive for use in lossless compression. Even in the most recent range extensions for the modern compression standard HEVC/H.265 [14], the transform is deactivated in lossless applications and merely prediction is applied instead. Note, however, that this prediction cannot quite reach the compression performance of dedicated lossless approaches, since lossless coding has not been its primary area of application [15], [16].

Lossless pixel-predictive coders like lossless JPEG, CALIC [17], or JPEG-LS [18] apply a more efficient scheme for dependency reduction. Pixels of an image are transmitted one after another in a line-wise raster scan order, i. e., from the upper left of an image, line by line, down to the lower right. Thus, at each pixel position it may be assumed that all previous (causal) pixels are also known at the decoder side. Several of these pixels located most closely to the current pixel are called its neighborhood (see for example Fig. 1). At both encoder and decoder, neighborhood pixel values are processed by a heuristic non-linear function to form a prediction value that should be as similar as possible to the still unknown true intensity value. For example, in JPEG-LS the prediction value is obtained by using the larger of the pixel values at offsets −κ1 and −κ2 (the neighboring left and top pixels) if both of them are larger than the value at −κ3, by using the smaller of them if both are smaller than the value at −κ3, and the sum of both minus the value at −κ3 otherwise. This predictor is called median edge detection (MED); a small sketch of this rule is given at the end of this subsection. A slightly more complex scheme is applied in CALIC using gradient-adjusted prediction (GAP). Both methods rely on a subsequent error-feedback loop where the prediction is refined using all previously transmitted pixels. The difference between the true intensity and the rounded prediction is called the prediction error and should be less spatially correlated than the pixel intensities are. It is entropy coded using some model for its probability distribution. The decoder can reconstruct the original intensity by adding the prediction error to the prediction, computed in the same way as at the encoder. A more recent predictor that claims to combine the simplicity of MED with the efficiency of GAP is called gradient edge detection (GED) [19]. An advantage of these approaches is their low computational complexity combined with quite good compression ratios for lossless coding. However, the latter can still be improved.
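To make the MED rule concrete, the following is a minimal C++ sketch. The variable names a (left), b (top), and c (top-left) are our own convention, corresponding to the pixels at offsets −κ1, −κ2, and −κ3.

```cpp
#include <algorithm>

// Median edge detection (MED) from JPEG-LS: predict the current pixel from
// its left (a), top (b), and top-left (c) neighbors. If c indicates an edge,
// clamp towards the smaller or larger neighbor; otherwise assume a plane.
int medPredict(int a, int b, int c) {
    if (c >= std::max(a, b)) return std::min(a, b);  // edge: take the smaller
    if (c <= std::min(a, b)) return std::max(a, b);  // edge: take the larger
    return a + b - c;                                // smooth region: planar fit
}
```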

Fig. 1. Example of a causal pixel neighborhood from which a current pixel is predicted (the current pixel and its causal neighbors at offsets −κ1, . . . , −κ19). For smaller prediction order only the vectors κ1, . . . , κq are used.

C. Least-Squares Pixel Prediction

A more recent approach that typically achieves even better predictions at higher computational complexity is the autoregressive least-squares (LS) method. The prediction value is computed by a linear combination of causal neighborhood pixel values, where in different image regions different sets of coefficients are estimated for this linear combination. Note that this can also mean the computation of an individual coefficient set for each single pixel. Linear predictive models using autoregression are well known in statistical time series analysis [20] (the first edition of this book was published in 1970) and have been applied successfully in audio coding [21] and econometric forecasting [22] for decades. Even the application to image compression is rather old [23] and has since been employed successfully in forward-adaptive methods like TMW [24] or MRP 0.5 (minimum-rate predictors) [25], where linear prediction coefficients must be sent as side information. In order to restrict the amount of side information, groups of pixels with similar structure must be chosen such that the same predictor can be used for all of them. For this purpose, e. g., MRP uses quad-tree partitioning of an image similar to the current HEVC approach. Recently, an extension to 3-D prediction has also been presented in [26], but without a publicly available implementation. In TMW such groups are not strictly separated from each other; instead, different predictors are "blended" together in terms of the probability distribution for each pixel. Forward-adaptive methods have the advantage of low decoder complexity; however, the restricted number of predictors as well as the additional side information impairs coding efficiency. A fast and efficient technique for backward-adaptive image coding was first proposed in 1992 [27] and became more popular with the EDP (edge-directed prediction) algorithm [28], which is available on the Internet. Backward-adaptive methods use only strictly causal pixels for the estimation of linear prediction coefficients. In this way the coefficients can be estimated at the decoder side as well, such that no side information is required any more. As an optimal predictor can be estimated for each single pixel, predictions are generally even better than in forward-adaptive methods, at the expense of decoder complexity. Since 1998, numerous refinements have been presented. One popular and fairly efficient open-source implementation is Glicbawls [29], which makes use of a weighting scheme for predictor estimation that depends on the spatial distance to the pixel to be predicted. Further complexity reduction can be achieved by estimating a predictor only when necessary, as in the run-length and adaptive linear predictive (RALP) proposal [30]. In [31] it has been shown that



resolution scalability may be realized as well. A current development is MDL-PAR (minimum description length piecewise autoregression) [32]. Its idea is to choose both the neighborhood and the estimation region individually for each pixel, based on the minimum description length criterion for optimal prediction order. Unfortunately, no implementation has been made public. Even higher precision can be achieved by using weighted least-squares (WLS) schemes [33], [34], where heteroscedastic training samples are assumed, i. e., the stationarity assumption within the training region is relaxed. The goal of our contribution is to further enhance backward-adaptive LS and WLS schemes, both in theory and in practice, owing to their top-of-the-class prediction performance.

D. Probability Distribution Modeling Approaches

All of the mentioned methods primarily aim for the most accurate estimation of the prediction value. However, a subsequent entropy coder requires probability estimates for all conceivable prediction error values. There are two common options to address this issue. The first option is so-called context modeling [17], [34]: A pixel's prediction error is regarded as a random variable that can be described approximately by one out of a small number of distributions (contexts), typically with different variances. Which one to choose is decided by an empirical error-energy estimate based on gradients and prediction errors of neighboring pixels. After transmission of a pixel, the chosen distribution histogram is updated using the new observation. Note that this scheme does not require a specific error variance estimate; there must merely exist some relation between the rough error-energy estimate and the distribution shape, for which no assumptions are necessary. Yet, because only a few distributions are used, it is not possible to model arbitrary image structures instantaneously. Moreover, adaptation is rather slow, since many probabilities need to be learned for each distribution. Therefore, the probability model is poor in boundary regions and for fast-changing image characteristics. Furthermore, it requires offline training of certain threshold parameters and therefore does not generalize out of the box to arbitrary image characteristics, e. g., different bit depths. Finally, dependencies on the prediction value intensity are neglected, which is crucial, e. g., when image noise level and thus prediction error depend on the exposure of the current image region, as in high-dynamic-range (HDR) imaging. This also renders the method inefficient for redundancies in screen content, where many of the possible intensity values never occur in an image.

The second option for probability estimation is to postulate a family of model distributions. Typical choices are discrete normal [35], Laplace [32], or similar empirical [24], [29] distributions. In this way only a few parameters, usually mean and variance, need to be determined in order to obtain the whole distribution. For example, for variance estimation it is sufficient to compute some average of squared prediction error samples observed within a local neighborhood (see the sketch below).
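As an illustration of this traditional estimator, a minimal sketch follows. It assumes a CV_64F error image holding the already observed prediction errors (zero at positions not yet coded); the causal window shape and the fallback value are our own choices.

```cpp
#include <opencv2/core.hpp>

// Traditional variance estimate: average the squared prediction errors
// observed in a causal window around the current pixel.
double localErrorVariance(const cv::Mat& errImg, cv::Point cur, int radius)
{
    double sum = 0.0;
    int cnt = 0;
    for (int dy = -radius; dy <= 0; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            if (dy == 0 && dx >= 0) break;           // keep the window causal
            const int yy = cur.y + dy, xx = cur.x + dx;
            if (yy < 0 || xx < 0 || xx >= errImg.cols) continue;
            const double e = errImg.at<double>(yy, xx);
            sum += e * e;
            ++cnt;
        }
    }
    return cnt > 0 ? sum / cnt : 1.0;                // fallback for the first pixel
}
```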


Since in this manner an individual distribution, fit to local image characteristics, can be computed for each pixel, it is not necessary to make a selection out of a finite number of distributions. The adaptation to local image characteristics is faster, and no offline training is required. Additionally, a slightly modified coding method can be applied: Instead of coding prediction errors using zero-mean distributions, pixel intensities are coded directly. The prediction value is then used as the mean parameter of the model distribution. The advantages are that the range of values is not doubled and that fractional predictions need not be rounded. The latter turns out to be particularly beneficial for distribution estimates with small variance. Nevertheless, the need for some engineering parameters restricts the general applicability of the model distribution method. An example is the choice of how many prediction error values shall be used for variance estimation. A large number will distort estimates due to the non-stationarity of images and jeopardize fast adaptation. With a small number, which is inevitable at least for the first image lines, efficiency suffers from noisy estimates. In any case, variance estimation from observed prediction errors does not consider the fact that predictions become better towards the image center due to larger training and neighborhood windows. Finally and most importantly, the efficiency of this method considerably depends on the fit of the assumed model distributions: To mention an extreme case, for screen content the typical model distributions given above are as inefficient as context modeling.

Naturally, a third option for probability estimation is to combine the two distribution estimation techniques: a finite number of model distributions, e. g., Laplace [18] or generalized Gaussian [25], can be adapted as in context modeling. This reduces computational complexity and maintains fast adaptivity up to a certain degree.

E. Overview of Proposed Method and Paper Contents

In Section II we present a fairly general image model and give our own concise derivation of the LS method from scratch, based on entropy minimization. Afterwards, WLS is introduced as a means for dealing with non-stationarity. Novel insights into the impact of non-stationarity in images are then used to derive a generalization of the WLS weighting functions proposed in the literature. In order to overcome most of the drawbacks of traditional distribution modeling approaches, in Section III we suggest a closed-form approach towards estimating parameters of model distributions, such as the variance in LS and WLS predictive coding. For LS the basic theory is well known in the econometrics literature and is frequently applied for hypothesis tests. It has also been generalized to WLS [22], [36]. However, for autoregression in images several specific considerations are necessary because of their unique characteristics. Therefore, we present a practical generalization for this case to further improve the estimation. Compared to the traditional method of estimating variances from observed prediction errors, adaptation to local image characteristics is even better, and engineering parameters are no longer required. Thus, there is no trade-off in choosing the number of past prediction error values used for variance estimation. Such estimation also solves the problem of



Fig. 2. Block diagram of image model, encoder, and decoder. (The generating model F maps the innovation process n(k) to the image y; the encoder F̂⁻¹ maps y to the prediction error e(k); the decoder F̂ reconstructs the image.)

worse predictions in image border regions. It is further shown that the parametric model distribution used is optimal for the given image model. For screen content and a wide class of other artificial images that do not match the model exactly, we suggest sparsifying the probability mass functions of the distribution in order to exploit the characteristic of images with never-occurring intensity values. The presented methods have been implemented in our Vanilc (Volumetric, artificial, and natural image lossless coder) C++ framework and were compared to publicly available lossless coders (Section IV). Since lossless compression is especially desirable for medical image coding, this coder, in contrast to most other implementations, is also capable of predicting from 3-D neighborhoods in medical volume images and supports images with bit depths of up to 16 bits. Consequently, to exploit correlations between channels, multi-channel and color images are treated as 3-D volumes as well, without any need for a color transform. In the spirit of reproducible research, we have made our implementation public as an open-source project [37].

II. PIXEL-PREDICTIVE IMAGE CODING USING AUTOREGRESSION

A. Image Model

Image data is modeled as a random field x(k) where the pixel intensities x are random variables, jointly distributed according to a multivariate discrete normal distribution. The vector k represents a d-dimensional pixel position with, e. g., d = 2 for conventional images or d = 3 for volume images. From two perspectives the normal distribution assumption is reasonable for natural images: First, the data originates from a complex superposition of continuous random processes, for example from reflected light sampled by a photographic camera. According to the central limit theorem, under certain common conditions, such a superposition results in an approximately normal distribution. Second, in most coding techniques from the literature, a normal distribution is implicitly presumed as well. In transform coding, a Fourier-like decorrelating linear transform like the DCT could not achieve the smallest sum of pixel entropies for non-normally distributed data [38]. Similarly, in predictive coding a minimization of the prediction error energy is only useful under the assumption of Gaussian distributions, as described later.

For the moment let us further assume that x(k) is a stationary process. In this case the image process can be modeled as shown in the left part of Fig. 2 (i. e., the generating model),

where k is the scalar index of pixel k_k, n(k) is a sequence of independent, identically distributed (i. i. d.) normal samples called the innovation process, and F is a linear time-invariant (LTI) system. Besides, in practice we can assume that n(k) is a zero-mean process, since an observed mean value in x(k) may also have resulted from an accidental shock in the past of n(k) on which, according to F, all observations might depend. Without loss of generality [20], the output of F may be written as the limit q → ∞ of the recursive function

$$x(\mathbf{k}_k) = \sum_{i=1}^{q} b_i\, x(\mathbf{k}_k - \boldsymbol{\kappa}_i) + n(k) \quad \forall k \in \mathbb{N}, \qquad (1)$$

where the −κ_i are vectors pointing from k_k towards neighboring causal pixels (see Fig. 1 for a 2-D example). Please note that for illustration purposes we do not consider boundary treatment in detail here; for the implementation we made use of a technique like in [39]. This generating model is called an autoregressive (AR) model of infinite order, and the output x(k) is called an AR(∞) process. However, empirical experiments suggest that images may readily be approximated by a low-order AR(q) process, too. This is due to the fact that characteristics of typical images are rather more "locally predictable" than "colored-noise-like". So, with regard to the current pixel intensity, pixels far away do not provide much information that could not equally be obtained from closer positions. Thus, images usually reveal only a few peaks in the frequency domain. In matrix notation with $\mathbf{y} = (x(\mathbf{k}_k))^\top_{k\in\mathbb{N}}$, $\mathbf{b} = (b_i)^\top_{i\in\{1,\dots,q\}}$, $\mathbf{X} = (x(\mathbf{k}_k - \boldsymbol{\kappa}_i))_{k\in\mathbb{N},\,i\in\{1,\dots,q\}}$, and $\mathbf{n} = (n(k))^\top_{k\in\mathbb{N}}$, (1) reads

$$\mathcal{F}: \mathbf{n} \to \mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{n}. \qquad (2)$$
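To illustrate the generating model, here is a minimal 1-D sketch of an AR(q) process driven by i. i. d. zero-mean normal innovations; q = 2 and the (stable) coefficients are our own example values, not from the paper.

```cpp
#include <random>
#include <vector>

// Synthesize a 1-D AR(2) process per (1): x(k) = sum_i b_i x(k - i) + n(k).
std::vector<double> synthesizeAr(int length, unsigned seed = 42)
{
    const std::vector<double> b = {1.4, -0.45};      // example coefficients b_i
    std::mt19937 gen(seed);
    std::normal_distribution<double> n(0.0, 1.0);    // innovation process n(k)
    std::vector<double> x(length, 0.0);
    for (int k = (int)b.size(); k < length; ++k) {
        x[k] = n(gen);                               // current innovation sample
        for (std::size_t i = 0; i < b.size(); ++i)
            x[k] += b[i] * x[k - 1 - i];             // autoregressive part
    }
    return x;
}
```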

B. Ordinary Least-Squares Prediction

Note that n features the smallest possible zero-order entropy among all invertible representations of this sequence, since its samples are supposed to be i. i. d. Thus, if the coefficients b representing F were known, a perfect coder could simply apply the inverted system F⁻¹ to y and encode the resulting least-entropy sequence n = y − Xb using entropy coding. Yet, since F is unknown, an estimate b̂ of the coefficients needs to be computed from the image data by requiring the zero-order entropy H(n̂) of each sample n̂ in the output sequence e = n̂ = y − Xb̂ (cf. Fig. 2) to become minimal:

$$\hat{\mathbf{b}} = \operatorname*{argmin}_{\hat{\mathbf{b}}}\; H(\hat{n}) = \operatorname*{argmin}_{\hat{\mathbf{b}}}\; \mathrm{E}\!\left[-\operatorname{ld}\left(\Pr(\hat{n})\right)\right], \qquad (3)$$

where E[·] is the expectation operator. With the assumed zero-mean normal distributions this results in

$$\operatorname*{argmin}_{\hat{\mathbf{b}}}\; \mathrm{E}\!\left[-\operatorname{ld}\left(\frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{\hat{n}^2}{2\sigma^2}\right)\right)\right] = \operatorname*{argmin}_{\hat{\mathbf{b}}}\; \mathrm{E}\!\left[\hat{n}^2\right]. \qquad (4)$$

Note that the discrete normal distribution has been approximated by sampling a continuous normal distribution at integer-valued positions. Since i. i. d. sequences are ergodic, the expectation can be replaced by an average, equivalent to

$$\hat{\mathbf{b}} = \operatorname*{argmin}_{\hat{\mathbf{b}}}\; \|\hat{\mathbf{n}}\|_2^2 = \operatorname*{argmin}_{\hat{\mathbf{b}}}\; \|\mathbf{y} - \mathbf{X}\hat{\mathbf{b}}\|_2^2 = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}. \qquad (5)$$
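A minimal sketch of (5) using OpenCV, on which Vanilc builds; the row-per-training-sample matrix layout and the CV_64F type are our own assumptions.

```cpp
#include <opencv2/core.hpp>

// Ordinary LS estimate b_hat = (X^T X)^{-1} X^T y from (5).
cv::Mat estimateCoefficients(const cv::Mat& X,  // p x q neighborhood matrix
                             const cv::Mat& y)  // p x 1 training pixel values
{
    cv::Mat b;  // q x 1 prediction coefficients
    // DECOMP_NORMAL solves the normal equations X^T X b = X^T y;
    // Cholesky applies since X^T X is symmetric positive definite.
    cv::solve(X, y, b, cv::DECOMP_NORMAL | cv::DECOMP_CHOLESKY);
    return b;
}
```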




Fig. 3. Causal training region for backward-adaptive prediction and current pixel (black) with its neighborhood, both in typical sizes for WLS estimation. Neighborhoods of two example training positions are shown in the upper left.

In forward-adaptive coders these normal equations are solved once at the encoder, and then the coefficients are sent to the decoder as side information. However, because natural images cannot be expected to be stationary as a whole, but rather only locally stationary within small regions, efficient implementations like [25] apply this coding method merely to single patches of an image. If these patches are small enough to account for the non-stationarity, a significant amount of side information is generated. In order to get rid of this side information, b̂ can alternatively be estimated in exactly the same way at the decoder, too. As only causal pixels are allowed to be used for this, such an approach is referred to as backward-adaptive: after each pixel transmission, the estimation process (5) is restarted in order to refine

$$\hat{\mathbf{b}}_k = \operatorname*{argmin}_{\hat{\mathbf{b}}_k}\; \|\hat{\mathbf{n}}_k\|_2^2 = (\mathbf{X}_k^\top\mathbf{X}_k)^{-1}\mathbf{X}_k^\top\mathbf{y}_k \qquad (6)$$

by incorporating the additional observation of x(k_{k−1}). Notice that in this case the current pixel x(k_k) to be sent is never involved in the optimization procedure: the entries in $\mathbf{y}_k = (x(\mathbf{k}_k - \boldsymbol{\kappa}_j))^\top_{j\in\mathbb{T}}$ and $\mathbf{X}_k = (x(\mathbf{k}_k - \boldsymbol{\kappa}_j - \boldsymbol{\kappa}_i))_{j\in\mathbb{T},\,i\in\{1,\dots,q\}}$ are only taken from a causal training region T (see Fig. 3), which comprises |T| = p training positions. T should be neither too large, due to non-stationarities, nor too small, as a small number of values to be averaged in (6) increases the variance of b̂_k. It shall further be mentioned that, since in a backward-adaptive method the residual sequence n̂_k is re-estimated for each image position, the output prediction error sequence e = (e(k))^⊤_{k∈ℕ} is no longer the same as n̂_k but now consists of

$$e(k) = x(\mathbf{k}_k) - \hat{x}(\mathbf{k}_k) = x(\mathbf{k}_k) - \mathbf{x}_k^\top\hat{\mathbf{b}}_k, \qquad (7)$$

where x̂(k_k) is the prediction value and $\mathbf{x}_k = (x(\mathbf{k}_k - \boldsymbol{\kappa}_i))^\top_{i\in\{1,\dots,q\}}$ are the current neighboring pixels. In the following we deal only with the backward-adaptive method, so to keep the notation simple we omit the index k.
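The following sketch assembles y_k and X_k for one pixel; the offset-list representation and the omitted boundary clipping are our own simplifications.

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Build the training system (6): each causal training position j in T
// contributes its intensity to y_k and its q causal neighbors to a row of X_k.
void buildTrainingSystem(const cv::Mat& img,                       // CV_64F image
                         const std::vector<cv::Point>& trainOffs,  // offsets -kappa_j, j in T
                         const std::vector<cv::Point>& nbrOffs,    // offsets -kappa_i, i = 1..q
                         cv::Point cur, cv::Mat& Xk, cv::Mat& yk)
{
    const int p = (int)trainOffs.size(), q = (int)nbrOffs.size();
    Xk.create(p, q, CV_64F);
    yk.create(p, 1, CV_64F);
    for (int j = 0; j < p; ++j) {
        const cv::Point t = cur + trainOffs[j];      // causal training position
        yk.at<double>(j, 0) = img.at<double>(t);
        for (int i = 0; i < q; ++i)                  // its causal neighborhood
            Xk.at<double>(j, i) = img.at<double>(t + nbrOffs[i]);
    }
}
```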

C. Weighted Least-Squares Prediction

For such an LS model, fast implementations are possible [40] and used in Vanilc. But notice that the assumption of stationarity within training regions of fixed shape still does not reflect true image behavior: There are cases where an image is almost stationary over large homogeneous regions like single-colored or regularly-textured areas. On the other hand, there are regions with complex structures where stationarity and constant variance of the innovation process may only be assumed for a few pixels or along an edge. The topology of such regions can even be unconnected: sometimes, stationarity is also appropriate among non-locally recurring patterns. In order to estimate the model b̂ for the current pixel with small bias, it is desirable to use only training pixels that were generated by a similar model, i. e., that belong to the same stationarity region. Unfortunately, for a continuously changing model it is typically impossible to make hard decisions whether training points are worth incorporating or not. Yet, if the desired true model b for the current pixel were applied to predict at a dissimilar training position, the error variance of such a prediction would be high. This heteroscedasticity assumption of different error variances among observations can be incorporated in (5) by using the normalized Euclidean distance instead of the Euclidean norm, which weights all residuals in n̂ differently. This results in

$$\hat{\mathbf{b}} = (\mathbf{X}^\top\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^\top\boldsymbol{\Sigma}^{-1}\mathbf{y}, \qquad (8)$$

where Σ is a diagonal matrix containing the relative variances of all observations with respect to the variance E[n²] of the current pixel. Their inverses in Σ⁻¹ are called weights and need to be guessed by some heuristic scheme. If they are chosen identically, this WLS method reduces to ordinary LS. Compared to LS, in WLS significantly larger training regions and consequently also larger neighborhoods can be used, thus increasing prediction accuracy. This comes at the price that a fast implementation like [40] is no longer possible. Consider for example the training region from Fig. 4 (a) and a neighborhood of just one pixel to the left of the current pixel (q = 1). In this simple case it is possible to plot each training pixel value against its neighboring pixel value in the 2-D diagram of Fig. 4 (b). The gray dots in the lower left represent training positions in the dark region, whereas dots in the upper right represent those in the bright region. Because of an intensity gradient from bright to dark, training positions located on the image edge are plotted below the diagonal. The LS estimate of an affine model, plotted as a solid line, fits both bright and dark regions but cannot properly represent an edge pixel, e. g., with x = 100 (dashed line). A WLS estimate as illustrated in Fig. 4 (c) can better represent this edge value when the weights Σ⁻¹ are chosen according to the vertical standard deviation error bars. This means that, in this case, edge training positions with mean-gray neighborhood intensities close to 100 were preferred and thus had more influence on the estimated model, since they were assumed to belong to the same stationarity region.
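A minimal sketch of (8): scaling each training row by the square root of its weight reduces WLS to an ordinary LS solve. Matrix layout assumptions are as in the earlier sketches.

```cpp
#include <cmath>
#include <opencv2/core.hpp>
#include <vector>

// WLS coefficient estimate (8): b = (X^T W X)^{-1} X^T W y with W = Sigma^{-1}.
cv::Mat estimateWlsCoefficients(const cv::Mat& X,  // p x q training neighborhoods
                                const cv::Mat& y,  // p x 1 training pixels
                                const std::vector<double>& weights)  // diag of Sigma^{-1}
{
    cv::Mat Xw = X.clone(), yw = y.clone();
    for (int j = 0; j < X.rows; ++j) {
        // Row scaling by sqrt(w_j) turns the weighted problem into plain LS.
        const double s = std::sqrt(weights[j]);
        for (int i = 0; i < X.cols; ++i)
            Xw.at<double>(j, i) *= s;
        yw.at<double>(j, 0) *= s;
    }
    cv::Mat b;
    cv::solve(Xw, yw, b, cv::DECOMP_NORMAL | cv::DECOMP_CHOLESKY);
    return b;
}
```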



Fig. 4. (a) Typical image patch with an edge; (b) current intensity values vs. left neighbor values as gray dots, affine LS fit as a solid line; (c) WLS fit as a solid line, estimated by assuming standard deviations in the measurement (see error bars) that increase with the distance to the current neighbor's intensity x (dashed line); (d) same fit as before, but with a squared standard deviation increase, which better matches the measurements.

It can be seen that it is reasonable to expect the current true model b to depend on the current neighborhood x: for a brighter neighborhood value around 150, a steeper line would be more reasonable. Vice versa, not only the training pixels y but also their corresponding neighborhoods X depend on b: if b were represented by a steeper line in our example, an observation of neighboring pixel values around 150 would become more probable given that model. This overall insight, that for a fixed yet unknown model b the neighborhoods X depend on and will probably occur around the current neighborhood x, gives rise to the following new procedure for weight estimation, which explains and generalizes approaches from the literature: The current neighborhood x is taken as an estimate for the mean of the neighborhood distribution conditioned on the current model. For simplicity we first assume that the innovation process variances stay constant within a neighborhood, such that neighborhood variances can be used for estimating the diagonal entries σ_j² (j ∈ T) of Σ. If x were the true mean, an unbiased estimate of the absolute variances would be

$$\hat{\sigma}^2_{\mathrm{abs},j} = \sum_{i=1}^{q} \alpha \left(x(\mathbf{k} - \boldsymbol{\kappa}_j - \boldsymbol{\kappa}_i) - x(\mathbf{k} - \boldsymbol{\kappa}_i)\right)^2, \qquad (9)$$

with α = 1/q. Yet, in fact, x is distorted by the current innovation values n_i, which need to be subtracted from x, such that a correct expectation of the relative variances is given by

$$\hat{\sigma}_j^2 = \frac{\mathrm{E}\!\left[\sum_{i=1}^q \alpha\left(x(\mathbf{k}-\boldsymbol{\kappa}_j-\boldsymbol{\kappa}_i) - x(\mathbf{k}-\boldsymbol{\kappa}_i) + n_i\right)^2\right]}{\mathrm{E}[n^2]} = \frac{\mathrm{E}[n^2] + \sum_{i=1}^q \alpha\left(x(\mathbf{k}-\boldsymbol{\kappa}_j-\boldsymbol{\kappa}_i) - x(\mathbf{k}-\boldsymbol{\kappa}_i)\right)^2}{\mathrm{E}[n^2]}. \qquad (10)$$
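A minimal sketch of the resulting weight computation, i. e., the inverse of (10) with α = 1/q; the fixed guess for E[n²] follows the 1/120 heuristic mentioned below, although the exact constant is an implementation choice.

```cpp
#include <cmath>
#include <opencv2/core.hpp>
#include <vector>

// WLS weights: inverse relative variances per (10), estimated from the
// distance between each training neighborhood and the current one.
std::vector<double> computeWeights(const cv::Mat& X,   // p x q training neighborhoods
                                   const cv::Mat& x,   // 1 x q current neighborhood
                                   double maxIntensity)
{
    const int q = X.cols;
    const double alpha = 1.0 / q;
    const double En2 = std::pow(maxIntensity / 120.0, 2.0);  // fixed noise guess
    std::vector<double> weights(X.rows);
    for (int j = 0; j < X.rows; ++j) {
        double d2 = 0.0;
        for (int i = 0; i < q; ++i) {
            const double diff = X.at<double>(j, i) - x.at<double>(0, i);
            d2 += alpha * diff * diff;
        }
        weights[j] = En2 / (En2 + d2);   // inverse of the relative variance (10)
    }
    return weights;
}
```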

Unfortunately, at this point it is difficult to determine the correct E[n²], so in an implementation it may be set to a fixed estimate of the image noise or simply to a squared fraction of the maximum image intensity value (1/120 in Vanilc). The weights in Σ⁻¹ are then given by the inverse of (10). Motivated in a more empirical way, this weighting function with E[n²] = α was first suggested in [33]. In order to refine it further, the simplification of constant variances within neighborhoods can be replaced by the assumption that the variances increase with the distance to the current position, which yields $\alpha_i = \|\boldsymbol{\kappa}_i\|_2^{-1} / \sum_{l=1}^{q} \|\boldsymbol{\kappa}_l\|_2^{-1}$. In the special case of a maximum image intensity value of 255 and the 18-pixel neighborhood used in [34], (10) is the same as the WLS weighting function proposed there, since a constant factor does not matter as long as no prediction error variance estimation is considered (unlike in the next section). Finally, Fig. 4 illustrates that a linear model cannot perfectly model image behavior. An appropriate fitting of more complex models would require a higher number of training points. Even if this is not affordable, Fig. 4 (d) indicates that at least a squared increase of the standard deviation with respect to the distance to x can be useful. Indeed, our experiments showed that such a squared weighting function can improve the prediction error variance estimate described in the following section. For the estimation of b itself, it is only useful with very small neighborhoods, since the higher variation in the weights cuts down the equivalent number of sampling (training) points [41]. In Fig. 5 (b) and (c) the absolute prediction error, amplified by a factor of four, is shown for LS and WLS, respectively, when the lighthouse image in Fig. 5 (a) is predicted using Vanilc. Especially at complex structures like the fence, WLS achieves remarkably better predictions.

III. DISTRIBUTION MODEL AND VARIANCE ESTIMATION

For most efficient coding, the goal at this point is to find the most accurate probability distribution for the current prediction error e to be sent, in order to optimally drive entropy coding. First, variance estimation for non-weighted LS is considered. While (14) and (16) are well known in the econometrics literature, the compact derivations presented here have not been found elsewhere. Also, the subsequent extension to WLS has not been described explicitly in the literature. To the best of our knowledge, such distribution estimation has never been applied to improve entropy coding.

A. Ordinary Least-Squares Variance Estimation

Substituting (2) into (6) and (6) into (7) gives

$$e = x - \mathbf{x}^\top\mathbf{b} - \mathbf{x}^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{n}. \qquad (11)$$

Random variables x and n are normally distributed by assumption. All other variables are assumed to be deterministic in classic LS prediction, thus e must also feature a normal distribution. For the current position k, in vector notation (1) reads x = x^⊤ b + n. Consequently, the mean value of e may be computed from (11) as

$$\mathrm{E}[e] = \mathrm{E}\!\left[n - \mathbf{x}^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{n}\right] = 0 \qquad (12)$$



Fig. 5. (a) Lighthouse test image; (b) absolute value of the LS prediction error amplified by a factor of four; (c) the same with WLS; (d) estimate of the prediction error standard deviation based on averaging squared prediction errors in a causal neighborhood; (e) more precise estimate based on the proposed method.

since both the scalar n and the vector n are assumed to be zero-mean, which is not altered by a linear transform. What remains to be estimated is the variance of e, for which we compute the expectation

$$\mathrm{E}[e^2] = \mathrm{E}[n^2] - 2\,\mathrm{E}\!\left[\mathbf{x}^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{n}\,n\right] + \mathbf{x}^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\,\mathrm{E}\!\left[\mathbf{n}\mathbf{n}^\top\right]\mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}. \qquad (13)$$

Considering that n and all entries of n are mutually uncorrelated, the second expectation term becomes zero and E[nn^⊤] = E[n²] I, where I is the identity matrix. Therefore,

$$\mathrm{E}[e^2] = \mathrm{E}[n^2] + \mathrm{E}[n^2]\,\mathbf{x}^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}, \qquad (14)$$

which can be interpreted as follows: The prediction error variability introduced by the first term is caused by the variability of the current innovation process sample. The variability from the second term is caused by the stochastic estimate b̂. It is proportional to the variability of the innovation process, too, but furthermore it depends on the quality of the observed training patches X (many similar training patches lead to unstable estimates) as well as on the current neighborhood x, which causes uncertainty in the estimate if its structures are very different from the trained ones. An unbiased estimate of the innovation process variance (sample variance) is given from the observed residuals n̂ as [42]

$$\hat{\mathrm{E}}[n^2] = \frac{\hat{\mathbf{n}}^\top\hat{\mathbf{n}}}{p-q} = \frac{(\mathbf{y}-\mathbf{X}\hat{\mathbf{b}})^\top(\mathbf{y}-\mathbf{X}\hat{\mathbf{b}})}{p-q}. \qquad (15)$$

Note that simple averaging of the squared residuals in n̂ would underestimate the variance because, according to (5), the system has been estimated such that the residual variance becomes as small as possible, i. e., smaller than the innovation process variance. Multiplication in the numerator yields

$$\hat{\mathrm{E}}[n^2] = \frac{\mathbf{y}^\top\mathbf{y} - 2\hat{\mathbf{b}}^\top\mathbf{X}^\top\mathbf{y} + \hat{\mathbf{b}}^\top\mathbf{X}^\top\mathbf{X}\hat{\mathbf{b}}}{p-q} = \frac{\mathbf{y}^\top\mathbf{y} - \hat{\mathbf{b}}^\top\mathbf{X}^\top\mathbf{y}}{p-q}, \qquad (16)$$

where (5) was substituted for the last b̂. With (14) and (16) a computationally efficient estimator for the prediction error variance is available: The expression (X^⊤X)^{-1} x can be solved in a single step along with (6) and has a computational complexity of O(q²); the inner product with x^⊤ takes O(q) arithmetic operations. X^⊤y is already available from (6), so the inner product with b̂ requires another O(q) operations. All other operations have constant time complexity due to the fact that y^⊤y is obtained in one step with X^⊤X by the fast P2AR [40] algorithm. However, this may not be compared directly to the traditional squared prediction error method, whose complexity depends in the first place on the chosen causal estimation region.

B. Finding a Parametric Distribution Model

As a fast approximation, a normal distribution with the estimated variance Ê[e²] may be used for entropy coding. However, a closer look reveals that Ê[n²] is itself to be regarded as a random variable because of its dependence on random image values. Being a sum of squared normal residuals, it follows a χ²-distribution. Hence, with the available information, the normalized prediction error $e / \sqrt{\hat{\mathrm{E}}[e^2]}$ is best described by Student's t-distribution with p − q degrees of freedom (DOF). For a large number of DOF, a t-distribution looks similar to a normal distribution. For a small number of DOF, it has heavier tails, similar to a Laplace or generalized normal distribution.¹ For entropy coding, in order to get the probability of the current pixel attaining a discrete integer intensity x₀,

$$\Pr(x = x_0) = t_{p-q}\!\left(\frac{x_0 - \hat{x} + 0.5}{\sqrt{\hat{\mathrm{E}}[e^2]}}\right) - t_{p-q}\!\left(\frac{x_0 - \hat{x} - 0.5}{\sqrt{\hat{\mathrm{E}}[e^2]}}\right) \qquad (17)$$

is evaluated, where t_{p−q}(·) denotes the cumulative t-distribution function with p − q DOF. By this means, instead of transmitting prediction errors, pixel values can be coded directly, as described in Section I.

¹Such behavior may also be observed in histograms of normalized prediction errors. In general, heavier-tailed histograms occur when they are obtained from normally distributed observations with varying variances. For example, a Laplace histogram occurs when the variance is exponentially distributed [43].
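Putting (14), (15), and (17) together, the following is a minimal sketch of the per-pixel coding probability; Boost.Math supplies the t CDF, and the CV_64F column-vector layout is our own assumption.

```cpp
#include <cmath>
#include <boost/math/distributions/students_t.hpp>
#include <opencv2/core.hpp>

// Probability of the current pixel attaining the integer intensity x0,
// using the innovation variance (15), error variance (14), and CDF rule (17).
double intensityProbability(const cv::Mat& X,   // p x q training neighborhoods
                            const cv::Mat& y,   // p x 1 training pixels
                            const cv::Mat& b,   // q x 1 LS coefficients from (6)
                            const cv::Mat& x,   // q x 1 current neighborhood
                            int x0)             // candidate intensity
{
    const int p = X.rows, q = X.cols;
    cv::Mat n = y - X * b;                      // residuals n_hat
    const double En2 = n.dot(n) / (p - q);      // sample variance (15)
    cv::Mat z;
    cv::solve(X.t() * X, x, z, cv::DECOMP_CHOLESKY);   // z = (X^T X)^{-1} x
    const double Ee2 = En2 * (1.0 + x.dot(z));  // error variance (14)
    const double xhat = x.dot(b);               // prediction value (7)
    boost::math::students_t t(p - q);           // p - q degrees of freedom
    const double s = std::sqrt(Ee2);
    return boost::math::cdf(t, (x0 + 0.5 - xhat) / s)
         - boost::math::cdf(t, (x0 - 0.5 - xhat) / s);  // interval probability (17)
}
```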

C. Weighted Least-Squares Variance Estimation

For WLS, the derivations are similar to LS, except that (8) is to be consulted in place of (6). It follows again that e is zero-mean. With E[nn^⊤] = E[n²] Σ, the variance becomes

$$\mathrm{E}[e^2] = \mathrm{E}[n^2] + \mathrm{E}[n^2]\,\mathbf{x}^\top(\mathbf{X}^\top\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{x}, \qquad (18)$$

where E[n²] may be estimated using

$$\hat{\mathrm{E}}[n^2] = \frac{\hat{\mathbf{n}}^\top\boldsymbol{\Sigma}^{-1}\hat{\mathbf{n}}}{p-q} = \frac{\mathbf{y}^\top\boldsymbol{\Sigma}^{-1}\mathbf{y} - \hat{\mathbf{b}}^\top\mathbf{X}^\top\boldsymbol{\Sigma}^{-1}\mathbf{y}}{p-q}. \qquad (19)$$




However, this estimate is very sensitive to the accuracy of the matrix Σ⁻¹, which is only a rough guess. In practice, the linear model often also fits training neighborhoods significantly different from the current x. The accuracies of such training points, and in consequence the variance Ê[n²], are underestimated. A convenient approach to circumvent this issue is to scale the weights to an average of one [44]. This leads to a tolerable overestimation in the lower single-digit percentage range. Furthermore, as can be seen from Fig. 4 (d), the assumption of a squared variance increase can additionally improve the result. With respect to the original weights, the final estimate of the innovation process variance can be written as

$$\hat{\mathrm{E}}[n^2] = \frac{\hat{\mathbf{n}}^\top\boldsymbol{\Sigma}^{-2}\hat{\mathbf{n}}}{\operatorname{tr}(\boldsymbol{\Sigma}^{-2})} \cdot \frac{p}{p-q}, \qquad (20)$$

where tr(·) denotes the trace of a matrix.
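A minimal sketch of (20); the vector `weights` holds the diagonal entries of Σ⁻¹, and the implicit normalization through the trace corresponds to the weight scaling described above.

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Innovation variance for WLS with squared weights, per (20):
// E[n^2] = (n^T Sigma^{-2} n) / tr(Sigma^{-2}) * p / (p - q).
double wlsInnovationVariance(const cv::Mat& residuals,           // p x 1, n_hat
                             const std::vector<double>& weights, // diag of Sigma^{-1}
                             int q)
{
    const int p = residuals.rows;
    double num = 0.0, trace = 0.0;
    for (int j = 0; j < p; ++j) {
        const double w2 = weights[j] * weights[j];  // entry of Sigma^{-2}
        const double r = residuals.at<double>(j, 0);
        num += w2 * r * r;
        trace += w2;
    }
    return (num / trace) * (static_cast<double>(p) / (p - q));
}
```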

Without squared weights, the approach has the same complexity as in the LS case, except that, as for WLS in general, the P2AR algorithm cannot be used to obtain y^⊤Σ⁻¹y. Using squared weights, (19) cannot be applied for a fast estimate, so an evaluation of (20) involves a higher complexity of O(pq), which is still insignificant compared to the complexity O(pq²) for computing the covariance matrix in (8). Fig. 5 (d) shows a standard deviation estimate using the traditional squared prediction error method. It can be seen that the WLS approach in Fig. 5 (e) achieves less blurry estimates at sharp edges and more homogeneous estimates at noise-like structures like the grass. Also the slight overestimation of Ê[e²] is visible.

IV. IMPLEMENTATION AND EVALUATION

For running experiments and evaluations, the presented methods have been implemented in a C++ software framework for predictive image compression, called Volumetric, artificial, and natural image lossless coder (Vanilc), referring to its scope of application. To allow for the reproduction of the given results and to encourage further progress in lossless coding, the source code has been made public together with this article [37].

A. Vanilc Implementation Overview

Vanilc is a platform-independent codec which supports LS prediction using the P2AR algorithm, WLS prediction, as well as other predictors like MED from JPEG-LS or the NLM predictor [45]. For variance estimation, different methods like the conventional prediction error estimator or the analytic approach from Section III, with or without squared weights, may be selected. A distribution maker creates probability distributions for the arithmetic coder, so no binarization is required, which would typically impair the compression ratio. Next to the model distribution, a regularizing distribution may also be supplemented in order to avoid very small probabilities that might lead to numerical issues. Typically it should be heavy-tailed and have a large variance; it is then mixed into the model distribution with a small weight. Furthermore, such a distribution improves the compression of sparsely occurring innovation pixels within an image, such as at the outset of an object.
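A minimal sketch of this mixing step; the weight eps and the vector representation of the probability mass functions are our own assumptions.

```cpp
#include <vector>

// Mix a heavy-tailed regularizing distribution into the model distribution
// so that no symbol receives a vanishing probability. Both inputs are
// probability mass functions summing to one; eps is a small mixing weight.
std::vector<double> regularize(const std::vector<double>& model,
                               const std::vector<double>& reg, double eps)
{
    std::vector<double> mixed(model.size());
    for (std::size_t i = 0; i < model.size(); ++i)
        mixed[i] = (1.0 - eps) * model[i] + eps * reg[i];
    return mixed;
}
```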

If even faster compression as well as equal encoding and decoding run times are desired, a combined Rice-Golomb and run-length coder may replace arithmetic coding; however, it is naturally restricted to Laplace distributions.

A modular object-oriented design was chosen for the implementation, such that coding blocks can be combined almost arbitrarily and in order to support reusability. The use of OpenCV [46] offers easy matrix handling, a selection of efficient linear equation solvers, and support for common image file formats. Detailed settings can be made without recompiling the source code using YAML [47] configuration files. Alternatively, settings may be provided on the command line for batch processing. The build system CMake allows for various compilers, where the GNU Compiler Collection (GCC) for Linux and Microsoft Visual Studio for Windows have been tested so far. Thanks to a reviewer we can also report that it works with Mac OS X 10.10.5. Building should work out of the box without installing additional software except for some OpenCV dependencies. An extensive FAQ is provided for more detailed information regarding the implementation. With a medical application in mind, images of bit depths up to 16 bits per pixel (bpp) can be coded without special settings or tuned engineering parameters. Although the presented estimation techniques may be applied to arbitrary image dimensionality, Vanilc is restricted to 2-D and 3-D images. This is due to the limited fields of application and the intractable computational complexity that comes with higher dimensions. Neighborhood and training region sizes in all dimensions can be chosen arbitrarily. Multi-channel images like RGB color images are regarded as a special case of 3-D images where the channels are coded one after another. In this case the training region is still 2-D, but the neighborhood is chosen to comprise also one to five pixels from each causal channel. A similar approach was suggested in [48] for a forward-adaptive scheme. Some additional prediction accuracy can be gained here by adding an additional "dummy" channel with constant value in order to fit an affine model. The affine model was already mentioned in [49], but has not been widely adopted since. Compared to a simple decorrelating color transform, this approach can better exploit inter-channel dependencies adaptively and does not require any knowledge about the type of channels (RGB, YUV, . . . ). For solving the linear, symmetric, positive definite system of normal equations, first a Cholesky decomposition is attempted. If it fails due to a non-regular system, a QR decomposition is applied, which is slower but can find a minimum-norm solution. In fact, singular matrices often occur in boundary regions because of the small number of training positions, and in artificial images because of the missing image noise. In Vanilc, the additional time effort for the QR decomposition can be circumvented by using a Tikhonov regularization that can be activated selectively for the boundary and the inner image region.

B. Histogram Sparsification

Non-natural images have particularly challenging characteristics, as they do not originate from light sampled by a camera. Often the central limit theorem is not fulfilled, so they may not always be assumed to be normally distributed.



TABLE I
COMPRESSED RATE IN BITS/PIXEL FOR WATERLOO GRAY TEST SET [50], 8 BIT; TOP: SET 1; BOTTOM: SET 2

TABLE II
COMPRESSED RATE IN BITS/PIXEL FOR WATERLOO COLOR TEST SET [50], 8 BIT

A typical property of artificial images is a sparse histogram, i. e., not all possible intensity values are actually used, which is exploited for instance in indexed color image formats like GIF. Reasons can be post-processing like contrast enhancement of natural images, or that the image creator (artist for drawings, 3-D designer for renderings, programmer for screen content, etc.) chose only a limited number of colors. Since our other assumptions are mostly fulfilled, the predictive approach is still applicable. We suggest simply setting the probabilities for non-occurring intensities to zero after the distribution has been estimated. In the backward-adaptive framework these intensities are guessed from a causal image histogram, such that no side information needs to be transmitted. The case of a wrong guess is caught by the regularizing distribution. Zero entries in the histogram are regarded as never occurring, except if small-valued entries are found nearby. In the latter case there is a high chance that no systematic reason but only a low probability caused their absence, so the probability for these intensities should be kept. An ad-hoc technique to achieve this detection in Vanilc, sketched below, looks as follows: Replace all non-zero entries in the histogram function by their inverses and normalize this function such that its average is one. Afterwards, convolve it with a Gaussian kernel and keep probabilities at all positions where the function exceeds the smallest non-zero histogram entry. This scheme is applied once for the full causal image and once for a local region, in Vanilc, e. g., with a radius of 60 pixels. An advantage is that it may quickly adapt to local image characteristics, so mixed content like in screenshots can also be processed efficiently.

Another property of artificial images is repeating identical patterns at larger distances from each other, for example letters in text images or textures in artificial 3-D scenes. First, this distance requires large training regions to capture at least one repetition. In this case the number of training samples can be restricted to the ones with the greatest weight to prevent noisy estimates. Finally, the neighborhood size for weight computation by matching can be made larger than the neighborhood size for prediction, in order to find larger structures like letters. Note that the last two features for repeating patterns have not been used in our evaluation.
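The following is a minimal sketch of this ad-hoc rule, following the description above; the kernel size and the CV_64F column-histogram layout are our own choices.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Detect never-occurring intensities: invert the non-zero histogram entries,
// normalize to an average of one, smooth with a Gaussian kernel, and keep
// intensities where the smoothed function exceeds the smallest non-zero entry.
std::vector<bool> keepIntensity(const cv::Mat& hist)   // e.g. 256 x 1, CV_64F counts
{
    cv::Mat f = cv::Mat::zeros(hist.size(), CV_64F);
    double minNonZero = 0.0;
    for (int i = 0; i < hist.rows; ++i) {
        const double h = hist.at<double>(i, 0);
        if (h > 0.0) {
            f.at<double>(i, 0) = 1.0 / h;              // inverse of non-zero entries
            if (minNonZero == 0.0 || h < minNonZero) minNonZero = h;
        }
    }
    f = f / (cv::mean(f)[0] + 1e-12);                  // normalize to average one
    cv::GaussianBlur(f, f, cv::Size(1, 9), 0.0);       // smooth along the histogram
    std::vector<bool> keep(hist.rows);
    for (int i = 0; i < hist.rows; ++i)
        keep[i] = hist.at<double>(i, 0) > 0.0 || f.at<double>(i, 0) > minNonZero;
    return keep;
}
```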


C. Experimental Results

For the evaluation of coding efficiency, a number of test images have been encoded and decoded using Vanilc as well as several other freely available codecs. In order to use a wide range of different images, three test sets were chosen: The Waterloo test set [50] encompasses low-resolution images ranging from 256 × 256 pixels up to 1118 × 1105 pixels, in gray and in color. It also includes several classic test images, although they may differ from the widely used JPEG set, which is not publicly available. More up-to-date high-resolution images, ranging from 2268 × 1512 pixels to 7216 × 5412 pixels, are contained in Sachin Garg's new test images corpus [51], also in gray and in color. Both of these test sets also contain several artificial images. Finally, four 16-bit medical volume images were tested²: a 723 × 557 × 117 voxel magnetic resonance imaging (MRI) brain acquisition, a 512 × 512 × 125 voxel computed tomography (CT) scan of a thorax, a 512 × 512 × 127 voxel cardiac CT volume which is part of a dynamic 3-D + t sequence with ten volumes, and finally ten temporal slices from this sequence at position z = 50, rearranged as a 512 × 512 × 10 voxel volume.

(2) The medical data was kindly provided by Siemens Healthcare.
(3) Non-natural / artificially generated image
(4) Supports only image sizes that are multiples of eight
(5) Wrong reconstruction image
(6) Image only available in gray




TABLE III
COMPRESSED RATE IN BITS/PIXEL FOR THE NEW TEST IMAGES [51], 8 BIT.
(The four rightmost columns of each block are the Vanilc configurations LS P, LS D, WLS P, and WLS D.)

Gray:
Image               JPEG 2000  JPEG-LS  MRP    ZPAQ   GraLIC  LS P   LS D   WLS P  WLS D
artificial(3)       1.197      0.798    0.517  0.673  -(5)    0.887  0.977  0.762  0.682
big building        3.655      3.592    -(4)   4.335  3.177   3.350  3.355  3.263  3.243
big tree            3.805      3.732    -(4)   4.413  3.408   3.573  3.571  3.490  3.468
bridge              4.193      4.148    -(4)   4.725  3.870   3.949  3.939  3.864  3.842
cathedral           3.710      3.570    3.260  4.239  3.190   3.357  3.359  3.323  3.302
deer                4.582      4.659    -(4)   4.728  -(5)    4.462  4.452  4.397  4.376
fireworks           1.654      1.465    1.301  1.555  1.250   1.442  1.467  1.388  1.364
flower foveon       2.198      2.038    -(4)   2.464  -(5)    1.824  1.830  1.769  1.747
hdr                 2.344      2.175    1.854  2.589  -(5)    1.930  1.929  1.883  1.873
leaves iso 200      4.083      3.820    3.400  4.743  3.263   3.718  3.784  3.606  3.537
leaves iso 1600     4.681      4.486    4.186  5.260  4.072   4.381  4.403  4.296  4.243
nightshot iso 100   2.300      2.130    1.839  2.576  1.824   1.973  1.980  1.892  1.875
nightshot iso 1600  4.038      3.971    3.743  4.268  3.661   3.879  3.874  3.807  3.782
spider web          1.908      1.766    1.349  2.364  -(5)    1.493  1.492  1.427  1.422
zone plate(3)       5.755      7.429    2.834  5.943  0.862   1.265  1.249  0.995  0.991
Whole Set           3.470      3.429    -      3.902  -       3.032  3.039  2.946  2.922
Average             3.340      3.319    -      3.658  -       2.766  2.777  2.678  2.650

Color:
Image               JPEG 2000  JPEG-LS  ZPAQ   GraLIC  LS P   LS D   WLS P  WLS D
artificial(3)       0.980      0.774    0.346  0.316   0.562  0.589  0.456  0.410
big building        3.559      3.812    4.014  3.129   3.427  3.430  3.218  3.203
big tree            4.389      4.310    4.596  3.786   4.072  4.074  3.986  3.959
bridge              4.534      4.528    4.652  4.118   4.248  4.235  4.148  4.124
cathedral           3.748      3.774    3.841  3.083   3.437  3.442  3.326  3.305
deer                5.320      5.223    5.141  4.784   4.936  4.924  4.905  4.882
fireworks           1.781      1.521    1.418  1.265   1.442  1.464  1.350  1.337
flower foveon       2.137      2.031    1.943  1.372   1.483  1.486  1.363  1.352
hdr                 2.533      2.458    2.512  2.110   2.095  2.089  2.002  1.989
leaves iso 200      3.692      3.917    3.866  3.071   3.548  3.596  3.265  3.208
leaves iso 1600     4.494      4.635    4.627  3.802   4.306  4.322  4.136  4.093
nightshot iso 100   2.433      2.195    2.291  1.829   1.986  1.993  1.850  1.842
nightshot iso 1600  4.390      4.202    4.089  3.463   3.918  3.913  3.997  3.967
spider web          1.902      1.814    1.997  1.342   1.469  1.465  1.331  1.327
zone plate(3)       -(6)       -(6)     -(6)   -(6)    -(6)   -(6)   -(6)   -(6)
Whole Set           3.542      3.546    3.642  2.986   3.240  3.244  3.114  3.092
Average             3.278      3.228    3.238  2.676   2.924  2.930  2.810  2.785

From the large number of freely available lossless codecs, five representatives have been chosen for comparison: Kakadu 7.4 [52] is an implementation of the JPEG 2000 standard [13] based on wavelet transform coding, which is widely used in practice, in particular for medical image coding. The very fast HP reference implementation [53] of JPEG-LS [18] applies a pixel-predictive approach and typically performs slightly better than JPEG 2000 for lossless compression. The open-source codec MRP 0.5 [54], described in [25], is a modern and efficient forward-adaptive LS implementation. ZPAQ 6.19 [55] is a general-purpose open-source data archiver that is also capable of image coding using the bmp_j4 configuration. Finally, GraLIC (Graystone Lossless Image Compressor) 1.11 demo [56] is known to be among the coders with the best overall compression performance; unfortunately, neither source code nor a technical description is available. These coders have been compared to four configurations of Vanilc: First, a fast configuration using ordinary LS prediction from a neighborhood of nine pixels and 112 training positions. In color images, one additional neighborhood pixel was added for each of the previously coded channels, including the "dummy" channel. For entropy coding, normal distributions without sparsified histograms were assumed. Second, an efficient configuration using WLS prediction from a neighborhood of 13 pixels and 544 training positions. In color images, five additional cross-shaped neighborhood pixels were added per channel. For entropy coding, t-distributions with sparsified histograms were assumed; the variance estimate was computed using (20). These two configurations were each evaluated using both the traditional variance estimation approach from observed prediction errors within a radius of 4 pixels (suffix "P" in the tables) and the proposed distribution estimation approach (suffix "D"). In all tests, an arithmetic coder with a Laplace distribution as a regularizing distribution was used.
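To illustrate how such a table of symbol probabilities might be assembled for the arithmetic coder, here is a minimal sketch (ours, not the Vanilc implementation): a normal distribution stands in for the t-distribution of the efficient configuration, and the mixture weight eps and Laplace scale b are hypothetical placeholder values.

#include <cmath>
#include <vector>

// Build the discrete coding distribution for one pixel from the predicted
// mean mu and the estimated standard deviation sigma (assumed > 0).
// Sparsified intensities are removed from the model term only, so the
// Laplace regularizer still catches a "wrong guess" with small probability.
std::vector<double> codingDistribution(double mu, double sigma, int levels,
                                       const std::vector<bool>* keepMask = nullptr,
                                       double eps = 0.01, double b = 16.0) {
    auto cdf = [&](double x) {                        // normal CDF at bin edge x
        return 0.5 * std::erfc(-(x - mu) / (sigma * std::sqrt(2.0)));
    };
    std::vector<double> p(levels);
    double sum = 0.0;
    for (int v = 0; v < levels; ++v) {
        double model = cdf(v + 0.5) - cdf(v - 0.5);   // mass of quantization bin
        if (keepMask && !(*keepMask)[v]) model = 0.0; // histogram sparsification
        double lap = std::exp(-std::fabs(v - mu) / b) / (2.0 * b);
        p[v] = (1.0 - eps) * model + eps * lap;
        sum += p[v];
    }
    for (double& q : p) q /= sum;                     // renormalize
    return p;
}

The resulting probabilities would then be quantized to the integer frequency counts of the arithmetic coder; that step is omitted here.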

The settings for medical volume coding were similar, except that neighborhood and training region were extended to 3-D and no histogram sparsification was applied. In order to allow for a reproduction of the results, the original YAML configuration files are provided together with the source code of Vanilc; program calls are given in the appendix. All testing was done on one core of an Intel Xeon E5-1620 3.70 GHz SUSE Linux machine with 32 GB of random-access memory. Since GraLIC is only available as a Microsoft Windows binary, Wine [57] was used to run it on Linux.

The compression results for the different data sets are given in bits per pixel (bpp) in Tables I–IV. For color images the values are to be understood as bits per pixel per color channel. Note that Whole Set shows the bpp value for the overall data set, whereas Average is the mean of the preceding table entries. MRP and GraLIC were not able to encode all given input images: MRP can only compress gray images with width and height being multiples of eight. GraLIC could not read the Waterloo gray images, and even after replacing their header information with more conventional headers, all reconstruction images except for bridge were corrupted. This also occurred with some other gray images, so GraLIC works reliably only on color images.

For natural images, i. e., those whose name in the first column of the tables does not carry a footnote mark, it can be seen that the established standards JPEG 2000 and JPEG-LS perform very similarly. Because of its missing inter-channel prediction, JPEG-LS tends to compress more efficiently only on gray images. However, for the majority of images both cannot compete with more recent codecs like MRP, GraLIC, or Vanilc, even in its faster LS version. ZPAQ as a general data compressor seems not to be much optimized for natural images and often provides comparable or worse results than JPEG 2000 and JPEG-LS. Compared to Vanilc WLS, MRP reaches slightly better compression ratios on the new test images, but slightly worse results on the Waterloo test set. Similarly, for the color images, GraLIC reaches slightly better ratios on the new test images, but slightly worse on the Waterloo set.




TABLE IV
COMPRESSED RATE IN BITS/PIXEL FOR MEDICAL VOLUMES.

Set          JPEG 2000  JPEG-LS  Vanilc LS P  Vanilc LS D  Vanilc WLS P  Vanilc WLS D
brain MRI    3.524      1.623    1.120        1.146        1.074         1.100
thorax CT    5.489      5.522    4.930        4.926        4.884         4.883
cardiac CT   4.156      5.089    3.300        3.288        3.077         3.063
temporal CT  4.445      5.168    3.506        3.504        3.297         3.285
Whole Set    4.282      3.803    2.879        2.885        2.778         2.784
Average      4.379      4.241    3.147        3.150        3.022         3.023

For most images the Vanilc LS rate is somewhat larger than with WLS prediction. The rates for both variance estimation methods in Vanilc are very similar: for ordinary LS they tend to be up to half a percent greater with the proposed approach due to the higher non-stationarity within the larger region used for variance estimation. This non-stationarity does not matter much for WLS, so here the proposed method reduces the rate by about one percent.

On non-natural images, JPEG-LS generally performs better than JPEG 2000, even though JPEG 2000 could be tuned slightly here by turning off the frequency decomposition (Clevels=0 in Kakadu; thanks go to a reviewer for this hint). In the same way, Vanilc WLS, including the sparsified histograms, outperforms Vanilc LS significantly. Considering the most efficient methods, ZPAQ, GraLIC, MRP, and Vanilc WLS, the results differ strongly depending on the image content: compare, for example, the bpp numbers for the images squares, france, clegg, and zone plate. Overall, ZPAQ provides very good results in this discipline, so there is still room for improvement for the heuristic histogram sparsification method in Vanilc.

Medical volume images are typically stored with a raw intensity resolution of 16 bpp. Of the compared codecs, only JPEG 2000, JPEG-LS, and Vanilc support this bit depth. Moreover, with JPEG-LS each slice of a volume had to be compressed separately since it cannot handle 3-D volumes; this is why the JPEG-LS data rate is rather high for the cardiac CT data set, which features noticeable inter-slice correlation. The applied JPEG 2000 implementation, on the other hand, was able to exploit inter-slice correlations using 3-D transforms, but it could not exploit the large black background in the brain MRI volume. Overall, even the faster Vanilc LS method achieves rate savings of 24% over JPEG-LS and 33% over JPEG 2000; for WLS these become 27% and 35%, respectively.

Finally, a brief look at computation time: Vanilc WLS takes about one minute to compress a gray 512 × 512 pixel image; the same image can be compressed in about one and a half seconds by Vanilc LS. Coding and decoding of the whole new test images set took about one day with Vanilc WLS, six hours with MRP, 39 minutes with Vanilc LS, 16 minutes with ZPAQ, three minutes with GraLIC, 22 seconds with JPEG 2000, and 13 seconds with JPEG-LS.

V. CONCLUSION

Based on the fairly general finite-order autoregressive image model, assuming a multivariate normal distribution and local stationarity, it was shown that the LS predictive model may provide a prediction error sequence with least zero-order entropy. By refining the assumptions on local stationarity characteristics in images and relaxing the requirement of a constant innovation process variance, it was possible to derive the WLS approach using a new weighting function that generalizes ad-hoc schemes from the literature. Furthermore, statistical methods from econometrics were adopted to arrive at the t model distribution, for which an efficient variance estimation scheme was also given. For its enhanced WLS version, a refined weighting function was proposed in order to cope with the non-stationarity characteristics of images. Advantages comprise a more precise variance estimate, no need for an additional engineering parameter, and intrinsic adaptation to different neighborhoods and training regions, e. g., in border regions. For some applications, such as when a first part of the image is already known to the decoder, or for the method introduced in [35], another advantage is that the variance estimate does not depend on previously known prediction error values, which would require preliminary pixel predictions within a neighborhood. In general, the proposed distribution estimation approach could be incorporated into any backward-adaptive LS codec.

With Vanilc, an example open-source coder implementation was presented in order to compare the compression efficiency of the described methods. A comparison with some of the currently most efficient coding schemes proves its state-of-the-art compression performance; to the best of our knowledge, there is currently no other backward-adaptive codec available that achieves better overall compression efficiency. In contrast to most other state-of-the-art codecs, it can handle 16-bit images, 3-D volume images, gray and color images, natural and artificial images, as well as various input image file formats. Being a platform-independent open-source C++ project based on OpenCV with a comprehensive selection of backward-adaptive coding features, it encourages further contributions to the field of lossless compression so as to further approach image entropy. The strategy of extending the pixel neighborhood to neighboring color channels or volume slices allows exploiting a fairly wide range of inter-image dependencies without special parameter settings. Thus, the Vanilc framework is often also applicable when other types of similar images shall be compressed, as in videos, multiview images, depth maps, enhancement layers, near-infrared, or hyperspectral data. However, there is still room for enhancements: histogram sparsification could be refined, the image scan order could be chosen adaptively, the prediction order could be selected pixel-wise, e. g., based on minimum description length [32], [58], a near-lossless method could be incorporated [35], parallelization could be realized by using a wave-front encoding strategy, and finally the program could be sped up by using problem-specific Cholesky updates [59].
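To indicate how such a distribution estimation stage slots into a backward-adaptive predictive coding loop, the following toy sketch (ours, under strong simplifications: a two-tap left/top neighborhood, ordinary LS instead of WLS, a small causal training window, and the ideal code length in place of a real arithmetic coder) computes the number of bits an idealized codec would spend on an 8-bit gray image. It reuses the hypothetical codingDistribution() from the earlier sketch; none of these function names are part of the actual Vanilc API.

#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

// Toy backward-adaptive coding loop for an 8-bit grayscale image. Encoder
// and decoder would run the same estimation on the causal pixels, so no
// side information is needed.
double idealCodeLength(const std::vector<std::vector<int>>& img) {
    const int H = (int)img.size(), W = (int)img[0].size(), levels = 256;
    double bits = (W + H - 1) * std::log2((double)levels); // flat model for first row/column
    for (int y = 1; y < H; ++y)
        for (int x = 1; x < W; ++x) {
            // 1) Collect causal training samples (left, top, target) near (y, x).
            std::vector<std::array<double, 3>> tr;
            for (int ty = std::max(1, y - 8); ty <= y; ++ty)
                for (int tx = std::max(1, x - 8); tx <= std::min(W - 1, x + 8); ++tx) {
                    if (ty == y && tx >= x) break;         // strict causality in current row
                    tr.push_back({(double)img[ty][tx - 1], (double)img[ty - 1][tx],
                                  (double)img[ty][tx]});
                }
            // 2) LS prediction coefficients from the 2x2 normal equations.
            double a11 = 0, a12 = 0, a22 = 0, b1 = 0, b2 = 0;
            for (auto& s : tr) {
                a11 += s[0] * s[0]; a12 += s[0] * s[1]; a22 += s[1] * s[1];
                b1 += s[0] * s[2]; b2 += s[1] * s[2];
            }
            double det = a11 * a22 - a12 * a12, c1 = 0.5, c2 = 0.5; // fallback: average
            if (std::fabs(det) > 1e-9) {
                c1 = (b1 * a22 - b2 * a12) / det;
                c2 = (a11 * b2 - a12 * b1) / det;
            }
            double mu = c1 * img[y][x - 1] + c2 * img[y - 1][x];
            // 3) Prediction error variance from the same training residuals.
            double sse = 0.0;
            for (auto& s : tr) {
                double e = s[2] - c1 * s[0] - c2 * s[1];
                sse += e * e;
            }
            double sigma = std::sqrt(sse / std::max<std::size_t>(tr.size(), 1) + 0.1);
            // 4) Ideal code length of the true intensity under the distribution;
            //    the Laplace regularizer keeps every probability strictly positive.
            std::vector<double> p = codingDistribution(mu, sigma, levels);
            bits += -std::log2(p[img[y][x]]);
        }
    return bits;
}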



APPENDIX
EXAMPLE ENCODER COMMAND-LINES FOR EVALUATION

Vanilc LS P 2-D:
  ./vanilc -c=../configs/config_fast.yml --variance=RESIDUAL -i=image.ppm -b=image.vanilc
Vanilc LS D 2-D:
  ./vanilc -c=../configs/config_fast.yml -i=image.ppm -b=image.vanilc
Vanilc WLS P 2-D:
  ./vanilc --variance=RESIDUAL --neighborhood_buffer=0 -i=image.ppm -b=image.vanilc
Vanilc WLS D 2-D:
  ./vanilc -i=image.ppm -b=image.vanilc
Vanilc LS D 3-D:
  ./vanilc -c=../configs/config_3d.yml -i=volume.rawl -b=volume.vanilc -d=16x512x512x125
Vanilc WLS D 3-D:
  ./vanilc -d=16x512x512x125 -c=../configs/config_efficient_3d.yml -i=volume.rawl -b=volume.vanilc
JPEG 2000:
  ./kdu_compress -num_threads 0 -i image.ppm -o image.j2c Creversible=yes -rate -,1,0.5,0.25
JPEG 2000 3-D:
  ./kdu_compress -num_threads 0 -i "volume.rawl*125@524288" -o "volume.jpx" -jpx_layers "*" -jpx_space sLUM Creversible=yes Sdims=\{512,512\} Clayers=16 Mcomponents=125 Msigned=no Mprecision=12 Sprecision=12,12,12,12,12,13 Ssigned=no,no,no,no,no,yes Mvector_size:I4=125 Mvector_coeffs:I4=2048 Mstage_inputs:I25=\{0,124\} Mstage_outputs:I25=\{0,124\} Mstage_collections:I25=\{125,125\} Mstage_xforms:I25=\{DWT,1,4,3,0\} Mnum_stages=1 Mstages=25
JPEG-LS 8 bit:
  ./locoe image.ppm -oimage.jls
JPEG-LS 16 bit:
  ./loco16e image.ppm -oimage.jls
MRP:
  ./encmrp image.pgm image.mrp
ZPAQ:
  ./zpaqd cnist bmp_j4 image.zpaq image.ppm
GraLIC:
  wine Gralic111d.exe c image.gra image.ppm

ACKNOWLEDGMENT

The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 288502 (CONCERTO).

REFERENCES

[1] ISO/IEC JTC 1/SC 29/WG 11, "Requirements for an extension of HEVC for coding of screen content," San Jose, USA, Jan. 2014, doc. N14174.
[2] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Trans. Inf. Theory, vol. 23, no. 3, pp. 337–343, May 1977.
[3] K. G. Morse, Jr., "Compression tools compared," Linux J., vol. 2005, no. 137, p. 3, Sep. 2005.
[4] M. V. Mahoney, "Adaptive weighing of context models for lossless data compression," Florida Tech. Technical Report, vol. CS-2005-16, 2005.
[5] J. G. Cleary and I. Witten, "Data compression using adaptive coding and partial string matching," IEEE Trans. Commun., vol. 32, no. 4, pp. 396–402, Apr. 1984.
[6] G. Wallace, "The JPEG still picture compression standard," IEEE Trans. Consum. Electron., vol. 38, no. 1, pp. xviii–xxxiv, Feb. 1992.
[7] F. Dufaux et al., "The JPEG XR image coding standard," IEEE Signal Process. Mag., vol. 26, no. 6, pp. 195–199, 204, Nov. 2009.
[8] T. Wiegand et al., "Overview of the H.264/AVC video coding standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, 2003.
[9] G. J. Sullivan et al., "Overview of the high efficiency video coding (HEVC) standard," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Sep. 2012.
[10] D. Mukherjee et al., "The latest open-source video codec VP9 - an overview and preliminary results," in Proc. Picture Coding Symp., Dec. 2013, pp. 390–393.
[11] V. Goyal, "Theoretical foundations of transform coding," IEEE Signal Process. Mag., vol. 18, no. 5, pp. 9–21, Sep. 2001.


[12] A. N. Netravali and J. O. Limb, "Picture coding: A review," Proc. IEEE, vol. 68, pp. 366–406, Mar. 1980.
[13] A. Skodras et al., "The JPEG 2000 still image compression standard," IEEE Signal Process. Mag., vol. 18, no. 5, pp. 36–58, Sep. 2001.
[14] G. Sullivan et al., "Standardized extensions of high efficiency video coding (HEVC)," IEEE J. Sel. Topics in Signal Process., vol. 7, no. 6, pp. 1001–1016, Dec. 2013.
[15] Q. Cai et al., "Lossy and lossless intra coding performance evaluation: HEVC, H.264/AVC, JPEG 2000 and JPEG LS," in Proc. Signal Inform. Process. Assoc. Annu. Summit and Conf., Dec. 2012, pp. 1–9.
[16] K. Kim et al., "Improvement of implicit residual DPCM for HEVC," in Proc. Int. Conf. Signal-Image Technol. and Internet-Based Syst., Nov. 2014, pp. 652–658.
[17] X. Wu and N. Memon, "Context-based, adaptive, lossless image coding," IEEE Trans. Commun., vol. 45, no. 4, pp. 437–444, Apr. 1997.
[18] M. J. Weinberger et al., "The LOCO-I lossless image compression algorithm: principles and standardization into JPEG-LS," IEEE Trans. Image Process., vol. 9, no. 8, pp. 1309–1324, Jan. 2000.
[19] A. Avramovic and B. Reljin, "Gradient edge detection predictor for image lossless compression," in Proc. ELMAR, Sep. 2010, pp. 131–134.
[20] G. E. Box et al., Time series analysis: forecasting and control, 4th ed. John Wiley & Sons, Jul. 2008.
[21] B. Atal and M. Schroeder, "Adaptive predictive coding of speech signals," The Bell System Tech. J., vol. 49, no. 8, pp. 1973–1986, Oct. 1970.
[22] R. Stone, The measurement of consumers' expenditure and behaviour in the United Kingdom, 1920-1938. University Press, 1954.
[23] E. Delp et al., "Image modelling with a seasonal autoregressive time series with applications to data compression," in Proc. IEEE Conf. Patt. Recog. Image Process., 1978, pp. 100–104.
[24] B. Meyer and P. Tischer, "TMW - a new method for lossless image compression," in Picture Coding Symp., Sep. 1997.
[25] I. Matsuda et al., "Lossless coding using variable block-size adaptive prediction optimized for each image," in 13th European Signal Process. Conf., Sep. 2005, pp. 30–33.
[26] J. M. Santos et al., "Contributions to lossless coding of medical images using minimum rate predictors," in Proc. IEEE Int. Conf. Image Process., Sep. 2015.
[27] N. Kuroki et al., "Lossless image compression by two-dimensional linear prediction with variable coefficients," IEICE Trans. Fundamentals Electron. Commun. Comput. Sci., vol. E75-A, no. 7, pp. 882–889, Jul. 1992.
[28] X. Li and M. T. Orchard, "Edge-directed prediction for lossless compression of natural images," IEEE Trans. Image Process., vol. 10, no. 6, pp. 813–817, Jun. 2001.
[29] B. Meyer and P. Tischer, "Glicbawls - grey level image compression by adaptive weighted least squares," in Proc. Data Compression Conf., Snowbird, UT, Mar. 2001, p. 503.
[30] L.-J. Kau and Y.-P. Lin, "Least-squares-based switching structure for lossless image coding," IEEE Trans. Circuits and Syst. I: Reg. Papers, vol. 54, no. 7, pp. 1529–1541, Jul. 2007.
[31] J. Taquet and C. Labit, "Hierarchical oriented predictions for resolution scalable lossless and near-lossless compression of CT and MRI biomedical images," IEEE Trans. Image Process., vol. 21, no. 5, pp. 2641–2652, May 2012.
[32] X. Wu et al., "Adaptive sequential prediction of multidimensional signals with applications to lossless image coding," IEEE Trans. Image Process., vol. 20, no. 1, pp. 36–42, Jan. 2011.
[33] H. Ye et al., "A weighted least squares method for adaptive prediction in lossless image compression," in Proc. Picture Coding Symp., Sep. 2003, pp. 489–493.
[34] G. Ulacha and R. Stasiński, "Enhanced lossless image coding methods based on adaptive predictors," in Int. Conf. Syst. Signals Image Process., vol. 10, 2010, p. 1.
[35] A. Weinlich et al., "Near-lossless compression of computed tomography images using predictive coding with distortion optimization," in Proc. SPIE, vol. 8669, 2013, p. 86691G.
[36] J. W. Hooper and A. Zellner, "The error of forecast for multivariate regression models," Econometrica, vol. 29, no. 4, pp. 544–555, Oct. 1961.
[37] A. Weinlich, "Volumetric, artificial, and natural image lossless coder," Feb. 2015. [Online]. Available: https://github.com/siemens/vanilc
[38] J.-F. Cardoso, "Dependence, correlation and Gaussianity in independent component analysis," J. Mach. Learn. Res., vol. 4, pp. 1177–1203, Dec. 2003.



[39] A. Weinlich et al., "Massively parallel lossless compression of medical images using least-squares prediction and arithmetic coding," in Proc. IEEE Int. Conf. Image Process., Sep. 2013, pp. 1680–1684.
[40] X. Wu et al., "Piecewise 2D autoregression for predictive image coding," in Proc. IEEE Int. Conf. Image Process., Nov. 1998, pp. 901–904.
[41] R. F. Potthoff et al., "Equivalent sample size and equivalent degrees of freedom refinements for inference using survey weights under superpopulation models," J. of the American Statistical Assoc., vol. 87, no. 418, pp. 383–396, Jun. 1992.
[42] W. H. Greene, Econometric analysis, 5th ed. Upper Saddle River: Pearson Education Inc., 2003.
[43] E. Lam and J. Goodman, "A mathematical analysis of the DCT coefficient distributions for images," IEEE Trans. Image Process., vol. 9, no. 10, pp. 1661–1666, Oct. 2000.
[44] S. L. Thomas and R. H. Heck, "Analysis of large-scale secondary data in higher education research: Potential perils associated with complex sampling designs," Research in Higher Educ., vol. 42, no. 5, pp. 517–540, Oct. 2001.
[45] E. Wige et al., "Pixel-based averaging predictor for HEVC lossless coding," in Proc. IEEE Int. Conf. Image Process., Sep. 2013, pp. 1806–1810.
[46] G. Bradski, "The OpenCV library," Dr. Dobb's J., vol. 25, no. 11, pp. 120–126, 2000.
[47] O. Ben-Kiki et al., "YAML ain't markup language (YAML)," Tech. Rep., 2009. [Online]. Available: http://www.yaml.org/spec/1.2/spec.html
[48] I. Matsuda et al., "Lossless coding of color images using block-adaptive inter-color prediction," in Proc. IEEE Int. Conf. Image Process., vol. 2, Sep. 2007, pp. 329–332.
[49] X. Wu and N. Memon, "Context-based lossless interband compression - extending CALIC," IEEE Trans. Image Process., vol. 9, no. 6, pp. 994–1001, Jun. 2000.
[50] "Image repository of the University of Waterloo." [Online]. Available: http://links.uwaterloo.ca/Repository.html
[51] Sachin Garg, "The new test images," 2011. [Online]. Available: http://www.imagecompression.info/test_images/
[52] David Taubman, "Kakadu JPEG2000 codec," May 2014. [Online]. Available: http://kakadusoftware.com/
[53] Hewlett-Packard Company, "HP labs LOCO-I/JPEG-LS," 1999. [Online]. Available: http://www.hpl.hp.com/research/info_theory/loco/
[54] Ichiro Matsuda, "MRP 0.5 lossless image codec," Jan. 2005. [Online]. Available: http://itohws03.ee.noda.sut.ac.jp/~matsuda/mrp/
[55] Matt Mahoney and Jan Ondrus, "ZPAQ - incremental journaling backup utility and archiver," Jan. 2013. [Online]. Available: http://mattmahoney.net/dc/zpaq.html
[56] Alex Rhatushnyak, "GraLIC 1.11 demo," Jul. 2011. [Online]. Available: http://www.imagecompression.info/gralic/Gralic111d.zip
[57] "WineHQ - runs Windows applications on Linux." [Online]. Available: http://www.winehq.org/
[58] M. Weinberger et al., "Applications of universal context modeling to lossless compression of gray-scale images," IEEE Trans. Image Process., vol. 5, no. 4, pp. 575–586, Apr. 1996.
[59] A. Martchenko and G. Deng, "Fast 2D sliding window least-squares prediction using Cholesky updates," IEEE Trans. Signal Process., to be published.

Andreas Weinlich received his Dipl.-Ing. degree in Systems of Information and Multimedia Technology from Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, in 2009. From 2009 to 2014 he held a position as a research assistant at the Chair of Multimedia Communications and Signal Processing of the University of Erlangen-Nürnberg, Germany, and worked in a research project with Siemens Corporate Technology in the Imaging and Computer Vision department. Since 2015 he has been working for Continental AG, Villingen-Schwenningen, in the research and development department as a software project lead for camera-based systems. He is currently working towards the Dr.-Ing. degree. His research interests cover image and video signal processing with a focus on multi-dimensional medical image representation and coding. On these topics he has authored several papers and patent applications.


Peter Amon received his diploma (Dipl.-Ing.) degree in electrical engineering from Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, in 2001, where he specialized in communications and signal processing. In 2001, he joined Siemens Corporate Technology, Munich, where he is currently working as a Senior Research Scientist in the Sensing and Industrial Imaging department. His research interests include video coding and transmission, image/video processing and analytics, and future internet technologies. In these areas, he has published several conference and journal papers. Since 2003, he has actively contributed to standardization at ISO/IEC MPEG, where he worked on scalable video coding and the respective storage format as well as High Efficiency Video Coding (HEVC). He has been actively involved (e.g., as work package leader) in several European research projects. He also serves as a reviewer and member of the technical program committee for several conferences and journals in his field.

Andreas Hutter (M'95) received his diploma and Dr.-Ing. degrees in communications engineering from the Technische Universität München (TUM) in 1993 and 1999, respectively. From 1993 to 1999 he worked as a research assistant at the Institute for Integrated Circuits at TUM, where he carried out research on algorithms for video coding and on the implementation of multimedia systems for mobile terminals. He joined Siemens Corporate Technology in 1999, where he is currently leading a research group in the Sensing and Industrial Imaging department. He was an active member of MPEG from 1995 to 2006, where he contributed to the MPEG-4, MPEG-7, and MPEG-21 standards and acted as Head of Delegation of the German National Body at MPEG. His research interests include image and video coding, transmission, and visualization as well as video analytics.

André Kaup (M'96–SM'99–F'13) received the Dipl.-Ing. and Dr.-Ing. degrees in electrical engineering from Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen University, Aachen, Germany, in 1989 and 1995, respectively. He was with the Institute for Communication Engineering, RWTH Aachen University, from 1989 to 1995. He joined the Networks and Multimedia Communications Department, Siemens Corporate Technology, Munich, Germany, in 1995 and became Head of the Mobile Applications and Services Group in 1999. Since 2001 he has been a Full Professor and the Head of the Chair of Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Erlangen, Germany. From 1997 to 2001 he was the Head of the German MPEG delegation. From 2005 to 2007 he was a Vice Speaker of the DFG Collaborative Research Center. He has authored more than 250 journal and conference papers and has over 50 patents granted or patent applications published. His research interests include image and video signal processing and coding, and multimedia communication. Dr. Kaup is a Voting Member of the IEEE Communications Society Multimedia Communications Technical Committee, a member of the German ITG, and a Fellow of the IEEE. He serves as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY and was a Guest Editor for the IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING. From 1998 to 2001 he served as an Adjunct Professor with the Technical University of Munich. He was Siemens Inventor of the Year 1998 and received the 1999 ITG Award. He has received several best paper awards, including the Paul Dan Cristea Special Award from the International Conference on Systems, Signals, and Image Processing in 2013. His group won the Grand Video Compression Challenge at the Picture Coding Symposium 2013.

