
Toru Yamada

DISSERTATION

TOKYO METROPOLITAN UNIVERSITY

MARCH, 2013

Contents

1 Introduction
1.1 Research Background
1.1.1 Video-Quality Monitoring in IPTV Services
1.1.2 Categorization of the Objective Video-Quality Estimation
1.2 Scope of This Study
1.3 Outline of This Dissertation

2 End-to-End Video-Quality Estimation Based on Reduced-Reference Model
2.1 Introduction
2.2 Proposed Method
2.2.1 Calculation of the Activity Values and Their Square Errors
2.2.2 Psychovisual Weightings for the Activity Difference
2.2.3 Calculation of a Provisional Video-Quality Score
2.2.4 Adjustment of the Video-Quality Score Based on Blockiness Artifacts
2.2.5 Adjustment of the Score Based on Local Impairment
2.2.6 Bit-Rate Control for the Original-Video Information
2.3 Parameter Decisions and Experimental Results
2.4 Conclusion

3 Video-Quality Estimation at the Head-End Point without Original Videos
3.1 Introduction
3.2.4 Blockiness-Level Estimation
3.2.5 Blur-Level Estimation
3.2.6 Subjective Video-Quality Estimation
3.3 Experimental Results
3.4 Conclusion

4 Reduced-Reference Video-Quality Estimation at Network Nodes for Quality Degradation by Transmission Errors
4.1 Introduction
4.2 Context of This Chapter
4.3.1 Outline of the Proposed Method
4.3.2 Server Side (Information Extraction)
4.3.3 Client Side (PSNR Estimation)
4.4 Experimental Results
4.4.1 Experimental Conditions
4.4.2 Decision for the Number of the Divided Regions
4.4.3 Comparisons with Conventional Methods
4.5 Conclusion

5 Video-Quality Estimation at the End-User Point without Any Original-Video Information
5.1 Introduction
5.2 Context of This Chapter
5.3 Proposed Method
5.3.1 Detecting Impairment Macroblocks by Analyzing Coding Dependency
5.3.2 Evaluation of Error-Concealment Effectiveness Using Motion Information
5.3.3 Evaluation of Error-Concealment Effectiveness Using Luminance Discontinuity at Impairment-Macroblock Boundaries
5.3.4 MSE Estimation by the Number of Error-Concealment-Ineffective Macroblocks
5.4 Experimental Results
5.4.1 Experimental Conditions
5.4.2 Parameter Decision
5.4.3 Performance Evaluation
5.5 Conclusion

6 Conclusions

List of Figures

1.1 Monitoring points for IPTV services described in ITU-T G.1081.
1.2 Categorization of objective video-quality estimation approaches.
1.3 Video-quality-estimation methods proposed in this dissertation.

2.1 Video-quality-estimation method proposed in Chapter 2.
2.2 Video-quality estimation based on activity difference.
2.3 Pixel and activity values used for calculating the blockiness level.
2.4 Partial-bit transmission for the activity value.

3.1 Video-quality-estimation method proposed in Chapter 3.
3.2 Activity-difference calculation in the RR model [45].
3.3 Activity-difference calculation in the proposed method.
3.4 Diagram of the proposed method.
3.5 HF value of each frame ("bus", MPEG-2, 4 Mbps).
3.6 Information for the blockiness-level estimation.
3.7 Blur-level estimation with edge widths.
3.8 Scatter plot of the activity difference. (Training set)
3.9 Scatter plot of the proposed method. (Training set)
3.10 Scatter plot of the proposed method. (Test set)

4.1 Video-quality-estimation method proposed in Chapter 4.
4.2 Framework of the RR model in conventional methods.
4.3 Framework of the RR model in the proposed method.
4.4 A diagram of the information extraction at a server side.

4.5 Frame with a representative-luminance value indicated. ("32" is the representative-luminance value.)
4.6 Representative-luminance map for the frame shown in Figure 4.5.
4.7 Transmitted information in the proposed method.
4.8 An example of the number of blocks which include a particular luminance value.
4.9 Frame with multiple representative-luminance values indicated.
4.10 An example of block subsampling for bit-rate reduction.
4.11 Average bit rate of the extracted information for every subsampling pattern.
4.12 PSNR-estimation accuracy regarding frame-division patterns.
4.13 An example of the extracted information in the 4 x 4 division case.
4.14 PSNR-estimation-accuracy comparisons with conventional methods.
4.15 Comparisons of correlation coefficients at low bit rates.
4.16 Comparisons of RMSE at low bit rates.
4.17 Comparisons of the number of extracted pixels in a frame.

5.1 Video-quality-estimation method proposed in Chapter 5.
5.2 Framework of the typical NR model.
5.3 Framework of the NR model in the proposed method.
5.4 An example of impairment macroblocks caused by transmission errors.
5.5 Pixels along a boundary of an impairment macroblock.
5.6 Relation between threshold Thmv and correlation coefficients.
5.7 Relation between threshold ThL and correlation coefficients.
5.8 The number of the impairment macroblocks vs. MSE (Training set, Correlation coefficient: 0.86).
5.9 The number of the error-concealment-ineffective macroblocks vs. MSE (Training set, Correlation coefficient: 0.94).
5.10 The number of the impairment macroblocks vs. MSE (Test set, Correlation coefficient: 0.87).

5.13 Packet-loss ratio and correlation coefficient.
5.14 Comparison of average decoding time.

List of Tables

2.1 Subjective video-quality test conditions.
2.2 Parameters for the weighting operations.
2.3 Experimental results for training set.
2.4 Experimental results for test set.
3.1 Subjective video-quality test conditions.
3.2 Functions for the score adjustments.
3.3 Experimental results for the test set.
4.1 Conditions for the experiments.
4.2 Data size and bit rate for the representative-luminance values.
5.1 Experimental conditions.

Chapter 1

Introduction

1.1 Research Background

1.1.1 Video-Quality Monitoring in IPTV Services

With the rapid expansion of broadband network services, video-transmission services over Internet Protocol (IP) networks, such as Video-on-Demand (VOD) and video-sharing services, have become popular. In particular, TV broadcasting over IP (IPTV) is predicted to spread widely throughout the world. In IPTV services, video data are generally transmitted with the User Datagram Protocol (UDP), which does not retransmit lost packets, in order to achieve realtime broadcasting. Data losses therefore go unrecovered and result in video-quality degradation. IPTV services are expected to provide video quality at least equivalent to that of conventional broadcasting services, such as terrestrial digital broadcasting and broadcast-satellite (BS) digital broadcasting. Video-quality monitoring is therefore one of the most important concerns.

Conventionally, video-quality monitoring is conducted by human observers before the actual broadcasting or when an end-user terminal receives the video data. Monitoring by human observers suffers from missed quality problems as well as high labor costs. Automatic and accurate video-quality monitoring techniques are therefore needed. From the viewpoint of IPTV services, the monitoring points should be at the broadcasting stations, on the network paths, and at the end-user terminals. Such techniques are also needed for video-archive systems, where the quality of a large number of video contents must be checked. This study therefore aims to develop objective video-quality estimation techniques for automatic and realtime video-quality monitoring in IPTV services.

In IPTV services, video quality is degraded by two factors. One is video coding (video compression), and the other is transmission errors. Which degradation factors should be monitored depends on the monitoring point, such as the content provider, the service provider, the network provider, or the end-user terminal. To achieve video-quality monitoring at the various monitoring points, multiple video-quality estimation methods have to be developed.

The importance of video-quality monitoring in IPTV services is widely recognized. The Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T), a specialized agency of the United Nations, defines video-quality monitoring locations for IPTV services in its recommendation ITU-T G.1081 [1]. Generally, the operation of an IPTV service is divided into multiple domains, as shown in Fig.1.1. Each domain is often operated by a different player or company, so video-quality monitoring at every interface point between adjacent domains can clarify the responsibility of each player. These monitoring functions also help to quickly determine the location where a problem occurs. The monitoring locations defined in ITU-T G.1081 are intended to detect video-quality degradation at the following points.

PT1: Quality check point when the content provider delivers the video contents to the service provider. Read errors from tapes or disks are monitored. When the video contents have already been coded by the content provider, video-quality degradation by the video coding is also monitored.

PT2: Quality check point before the broadcasting. Video-quality degradation by the video coding is monitored.

PT3: Quality check point on the network paths. Video-quality degradation by the transmission errors is monitored. When transcoding (another video coding) is executed on a network path, video-quality degradation by the transcoding is also monitored.

PT4: Quality check point before the delivery of video contents to the end users. It is monitored whether the video contents are successfully delivered to the end users' houses or not.

PT5: Quality check point for the video contents which are displayed on the end-user terminals. The quality depends on the performance of home networks and the end-user terminals.

Figure 1.1: Monitoring points for IPTV services described in ITU-T G.1081. (Domains shown: Content Provider, Service Provider, Network Provider, End User.)


1.1.2 Categorization of the Objective Video-Quality Estimation

Objective video-quality estimation methods that correlate highly with subjective video quality have been discussed by the Video Quality Experts Group (VQEG), in which experts from ITU-T and the International Telecommunication Union Radiocommunication Sector (ITU-R) participate [2]. In ITU-T recommendation J.143 [3], objective video-quality estimation approaches are categorized into the following three types:

1) Full Reference (FR) models: evaluation of video quality by means of a comparison between an original video and a received video (Fig.1.2-1).

2) No Reference (NR) models: evaluation of video quality on the basis of a received video alone (Fig.1.2-2).

3) Reduced-Reference (RR) models: evaluation of video quality using both a received video and a small amount of information extracted from an original video (Fig.1.2-3).

One of the simplest FR models is the peak signal-to-noise ratio (PSNR). PSNR is based on a pixel-by-pixel comparison and does not reflect the characteristics of the human visual system. Therefore, PSNR does not correlate highly with subjective video quality. To achieve higher correlation, methods that treat a video frame as a structure have been studied [4, 5, 6, 7]. These methods measure the changes in luminance, contrast, and structure in a video frame. Methods that consider the characteristics of the human visual system have also been studied [8, 9, 10, 11, 12] and achieve higher correlation with subjective video quality by exploiting those characteristics. As international standards, methods based on the FR model are described in ITU-T recommendations J.144 [13], J.247 [14], and J.341 [15]. The FR model achieves accurate video-quality estimation by exploiting the original video. However, it would not be suitable for realtime video-quality monitoring in IPTV service operations, since almost none of the monitoring points in Fig.1.1 would be able to refer to the original video data on the spot.
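As a concrete illustration of the pixel-by-pixel comparison behind PSNR, the following minimal sketch computes the MSE and PSNR of two luminance frames. It assumes 8-bit luminance (peak value 255) and frames given as lists of rows; the function names `mse` and `psnr` are illustrative and do not come from the dissertation.

```python
import math


def mse(original, received):
    """Mean square error between two equally sized frames (lists of rows)."""
    if len(original) != len(received) or any(
        len(a) != len(b) for a, b in zip(original, received)
    ):
        raise ValueError("frames must have the same dimensions")
    total = 0.0
    count = 0
    for row_o, row_r in zip(original, received):
        for x, y in zip(row_o, row_r):
            total += (x - y) ** 2
            count += 1
    return total / count


def psnr(original, received, peak=255):
    """PSNR in dB; 'peak' is the assumed maximum luminance value (8-bit)."""
    err = mse(original, received)
    if err == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / err)


# Tiny 2 x 2 example: every pixel differs by exactly one level.
frame_a = [[10, 20], [30, 40]]
frame_b = [[11, 19], [31, 39]]
print(mse(frame_a, frame_b), psnr(frame_a, frame_b))
```

Because every pixel contributes equally to the error, the result is the same wherever in the frame the differences occur, which is one way to see why PSNR does not follow human perception closely.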

Figure 1.2: Categorization of objective video-quality estimation approaches. (Panels: 1) FR model, 2) NR model, 3) RR model.)

The RR model transmits feature parameters extracted from the original video to end users at low bit rates, and the original video itself need not be transmitted. It is therefore suitable for realtime video-quality monitoring in IPTV service operations. In addition, it is expected to achieve more accurate quality estimation than the NR model because it exploits information extracted from the original video. Methods based on the RR model are described in [16, 17, 18, 19] and in ITU-T recommendations J.240 [20], J.246 [21], J.249 [22], and J.342 [23]. These methods are designed to estimate video-quality degradation due to both video coding and transmission errors. When the information is extracted from video data that have already been compressed, only the degradation caused by transmission errors is estimated.

The NR model is also suitable for realtime video-quality monitoring since it does not use information from the original video. Moreover, since it does not need to transmit and receive extracted information as the RR model does, a monitoring system based on it can be relatively simple. However, it is difficult to distinguish video-quality degradation from features of the video itself, and for this reason it is difficult for the NR model to achieve accurate video-quality estimation. International standards based on the NR model have not been established yet. Some previously proposed NR methods [24, 25, 26, 27, 28] do not estimate overall subjective video quality but only the degree of blockiness or blur. ITU-T recommendation J.147 [29] and [30] present methods that insert invisible markers into the original video and measure the degradation of the markers at end-user terminals. Unfortunately, the insertion of invisible markers can itself lead to video-quality degradation.

To achieve accurate quality estimation with the NR model, methods that use bitstream information have been studied [31, 32, 33, 34, 35]. These methods employ Discrete Cosine Transform (DCT) coefficients or quantization parameters to improve quality-estimation accuracy. Such methods, however, depend on the video-compression algorithm and can only be used for video sequences coded with that specific algorithm. Other methods capable of dealing with degradation caused by transmission errors have been proposed in [36, 37, 38] and in ITU-T recommendations P.1201 [39] and P.1202 [40]. These methods use only packet-header information and do not take media-layer information into account. Therefore, their quality-estimation accuracy would likely be low across various types of video contents.


1.2 Scope of This Study

As discussed in the previous section, the main purpose of this study is to develop objective video-quality estimation methods for automatic and realtime video-quality monitoring in IPTV services. This dissertation proposes video-quality estimation methods based on the RR and NR models since they are suitable for realtime monitoring by IPTV service providers.

First, a method based on the RR model is proposed to estimate video-quality degradation by both video coding and transmission errors. In this method, the information of the original video is extracted before the video coding and transmitted to the end-user terminals. Video quality is evaluated on the network paths or at the end-user terminals. When videos are coded by the content providers, the information extraction must be conducted by the content providers.

Some content providers, however, may not conduct this procedure, and video contents may be delivered without the extracted information. In this case, when the IPTV service provider checks the video-quality degradation by the video coding, quality evaluation by the NR model is required since the service provider cannot refer to the original video. As the second proposal, a video-quality-estimation method based on the NR model is introduced to estimate video-quality degradation by the video coding.

With this NR method, the service provider can monitor video-quality degradation by the video coding. A method to monitor degradation by the transmission errors is required next, and a method based on the RR model is proposed for this purpose. A small amount of information is extracted from a compressed video, and video-quality degradation by the transmission errors is estimated with this information.

Nevertheless, there are cases where it is difficult to transmit the extracted information, for example when an IPTV service is already in operation and its transmission system cannot be modified. For such cases, an NR-based method for monitoring video-quality degradation by the transmission errors is also proposed. In this method, the effectiveness of the error concealment that an end-user terminal applies to video-frame regions damaged by the transmission errors is analyzed for video-quality estimation.

The proposed video-quality-estimation methods enable realtime video-quality monitoring in IPTV services at all monitoring locations defined in Fig.1.1.

Figure 1.3: Video-quality-estimation methods proposed in this dissertation. (Chapter 2: RR model for quality degradation by both video coding and transmission errors; Chapter 3: NR model for quality degradation by video coding; Chapter 4: RR model for quality degradation by transmission errors; Chapter 5: NR model for quality degradation by transmission errors.)

1.3 Outline of This Dissertation

To summarize the discussion in the previous section, the methods proposed in this dissertation are shown in Fig.1.3. This dissertation proposes four video-quality-estimation methods, each of which is presented in one of the following chapters. The dissertation is organized as follows:

Chapter 2 proposes a method based on the RR model. This method can evaluate quality degradation due to both the video coding and the transmission errors. As the information extracted from the original video, a value called "activity", which indicates the variance of luminance values, is calculated for every pixel block of a given size. The activity values of the original video are transmitted to end-user terminals. On the network paths and at the end-user terminals, the video quality of a received video is estimated on the basis of the activity difference between the original video and the received video. Psychovisual weightings and video-quality score adjustments for fatal degradations are applied to improve estimation accuracy. In addition, low-bit-rate transmission of the extracted information is achieved by using temporal subsampling and by transmitting only the lower six bits of each activity value. Accurate video-quality estimation is achieved with 15 kbps of extracted information for standard definition television (SDTV): the correlation coefficient between actual subjective video quality and estimated quality is 0.901. This method has been adopted in the international standard ITU-T recommendation J.249 Annex B [22].

Chapter 3 proposes a method based on the NR model. This method evaluates quality degradation due to the video coding. It does not need any bitstream information; only the pixel information of decoded video frames is used for the video-quality estimation. The activity values described in Chapter 2 are also employed. First, the spatial-frequency information of the decoded video frames is analyzed to detect intra-coded frames, which do not use inter-frame prediction. Then, the activity difference between each intra-coded frame and its adjacent frame is calculated to estimate the amount of quality degradation. In addition, a blockiness level and a blur level are estimated for every frame by analyzing only pixel information, and they are taken into account to improve quality-estimation accuracy. The proposed method achieves accurate video-quality estimation without the original video, that is, without a reference free of compression artifacts. The correlation coefficient between subjective video quality and estimated quality is 0.925.

Chapter 4 proposes a method based on the RR model for monitoring problems on network paths. This method evaluates quality degradation due to the transmission errors. At low bit rates, it can extract more pixels from each frame than a conventional method does. Specifically, the extracted information consists of representative-luminance values chosen from each frame and their position information. The representative-luminance values chosen for individual video frames at the server side and the pixel-position information of those values are transmitted to end-user terminals. On the basis of this information, PSNR values can be estimated at the end-user side. Accurate estimation of video-quality degradation by transmission errors is achieved with this small amount of additional information. For SDTV, accurate PSNR estimation (correlation coefficients of 0.92 to 0.95) is achieved with 10 to 50 kbps of additional information. When only the quality degradation by transmission errors is estimated, this method achieves more accurate quality estimation than the method proposed in Chapter 2, which takes into account quality degradation by both the video coding and the transmission errors.

Chapter 5 proposes a method based on the NR model. This method evaluates quality degradation due to the transmission errors. The method is a hybrid of bitstream-information analysis and pixel-information analysis. Video quality is estimated in terms of the mean square error (MSE) between degraded video frames and error-free video frames. With the proposed method, impairment macroblocks are accurately detected by bitstream-information analysis, and the effectiveness of error concealment for the impairment regions is evaluated using both the bitstream and the decoded-pixel information. Error-concealment effectiveness is evaluated using motion information and luminance discontinuity at the boundaries of impairment regions. Simulation results show a high correlation (a correlation coefficient of 0.93) between the actual MSE and the number of macroblocks in which error concealment has not been effective. A video decoder incorporating the proposed method outputs accurately estimated MSE values.

Chapter 6 concludes the dissertation. With a combination of the proposed methods, accurate and realtime video-quality estimation can be executed at the various monitoring points for IPTV services. The proposed methods contribute to stable IPTV service operations.


Chapter 2

End-to-End Video-Quality Estimation Based on Reduced-Reference Model

2.1 Introduction

In this chapter, a video-quality-estimation method based on the reduced-reference model is proposed. The proposed method can estimate video-quality degradation by both video coding and transmission errors, and it can be used at all monitoring locations shown in Fig.2.1.

With the proposed method, activity values for individual pixel blocks of a given size, which indicate the variance of luminance values, are transmitted to end-user terminals. These values indicate spatial-frequency levels and are used as original-video information. Video quality is estimated on the basis of the activity difference between the original video and the received video.

Psychovisual weightings with respect to the activity difference and video-quality-score adjustments for fatal degradations are also applied to improve estimation accuracy. In addition, low-bit-rate transmission of the feature parameters is achieved by using temporal subsampling and by transmitting only the lower bits of each activity value. Since the proposed method does not need spatial registration or gain/offset registration, it is suitable for real-time quality monitoring.

Figure 2.1: Video-quality-estimation method proposed in Chapter 2.

The subsequent sections of this chapter are organized as follows: Section 2.2 describes the proposed algorithm for estimating subjective video quality using activity-difference values; Section 2.3 discusses an evaluation of the performance of the algorithm, and Section 2.4 summarizes this chapter.


2.2 Proposed Method

The proposed method first calculates activity-difference values, and then psychovisual weightings and video-quality score adjustments are applied one by one. In this section, the basic concept of the activity difference is described first, and then the psychovisual weightings and video-quality score adjustments are explained.

2.2.1 Calculation of the Activity Values and Their Square Errors

To calculate PSNR, it is necessary to calculate the mean square error (MSE) of the luminance values between the original video and the received video. Let $X_k$ be a luminance value in a 16 × 16 pixel block of the original video, $Y_k$ be the luminance value of the received video at the same position as $X_k$, and $e_k$ be the induced noise, i.e.,

$$Y_k = X_k + e_k. \tag{2.1}$$

We now assume that $e_k$ is independent of $X_k$ and

$$E[e_k] = 0, \tag{2.2}$$

where $E[\cdot]$ denotes the average. From this assumption, the following relation is obtained:

$$\bar{Y} = E[Y_k] = E[X_k + e_k] = E[X_k] + E[e_k] = E[X_k] = \bar{X}, \tag{2.3}$$

where $\bar{X}$ and $\bar{Y}$ are the averages of the luminance values in the blocks.

For an RR approach, not all pixels can be used, so a smaller amount of information must be considered. For this purpose, the standard deviation of the luminance values is introduced. The standard deviation $\sigma(X_k)$ for each 16 × 16 pixel block is defined as:

$$\sigma(X_k) = \sqrt{\mathrm{Var}[X_k]} = \sqrt{\frac{1}{256}\sum_{k=0}^{255}(X_k - \bar{X})^2} = \sqrt{E[(X_k - \bar{X})^2]}, \tag{2.4}$$

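where $\mathrm{Var}[\cdot]$ denotes the variance. The square error (SE) of the standard deviation between the original video and the received video is calculated as:

$$
\begin{aligned}
SE_\sigma &= \left(\sigma(X_k) - \sigma(Y_k)\right)^2 \\
&= \left(\sqrt{E[(X_k - \bar{X})^2]} - \sqrt{E[(Y_k - \bar{Y})^2]}\right)^2 \\
&= \mathrm{Var}[X_k] + E[(X_k + e_k - \bar{X})^2] - 2\sqrt{\mathrm{Var}[X_k]}\sqrt{E[(X_k + e_k - \bar{X})^2]} \\
&= 2\mathrm{Var}[X_k] + 2E[(X_k - \bar{X})e_k] + E[e_k^2] - 2\sqrt{\mathrm{Var}[X_k]}\sqrt{\mathrm{Var}[X_k] + E[e_k^2]} \\
&= 2\mathrm{Var}[X_k] + E[e_k^2] - 2\sqrt{\mathrm{Var}[X_k]}\sqrt{\mathrm{Var}[X_k] + E[e_k^2]} \\
&= 2\mathrm{Var}[X_k] + E[e_k^2] - 2\mathrm{Var}[X_k]\sqrt{1 + \frac{E[e_k^2]}{\mathrm{Var}[X_k]}}.
\end{aligned}
\tag{2.5}
$$

Here Eqs. (2.1) and (2.3) are used in the third line, and the cross term $E[(X_k - \bar{X})e_k]$ vanishes because $e_k$ is independent of $X_k$ and $E[e_k] = 0$.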

Generally, since $E[e_k^2]/\mathrm{Var}[X_k]$ is small enough in compressed video sequences, the SE of the standard deviation can be described as:

$$SE_\sigma \approx 2\mathrm{Var}[X_k] + E[e_k^2] - 2\mathrm{Var}[X_k] = E[e_k^2]. \tag{2.6}$$

This shows that the SE of the standard deviation can approximate the MSE of the luminance values when $E[e_k^2]/\mathrm{Var}[X_k] \ll 1$ and can be used to estimate video quality, since video sequences generally satisfy this condition.

For simplification of the SE calculation, the proposed method uses activity values instead of the standard deviation of the luminance values. Activity values of both the original video ($ActOrg_{i,j}$) and the received video ($ActDeg_{i,j}$) are calculated as:

$$ActOrg_{i,j} = \frac{1}{256}\sum_{k=0}^{255}\left|X_k - \bar{X}\right|, \tag{2.7}$$

$$ActDeg_{i,j} = \frac{1}{256}\sum_{k=0}^{255}\left|Y_k - \bar{Y}\right|, \tag{2.8}$$

where $i$, $j$, and $k$ are a frame number, a block number in the frame, and a pixel position in the block, respectively. Video quality is estimated on the basis of the mean square error between these activity values. Figure 2.2 summarizes the activity-difference calculation in the proposed method. As shown in Figure 2.2, activity values for the original video are calculated for each block and then transmitted to the end-user terminal. At the end-user terminal, the activity values for the received video are calculated. The square of the difference between the activity values is calculated as:

$$E_{i,j} = \left(ActOrg_{i,j} - ActDeg_{i,j}\right)^2. \tag{2.9}$$

Figure 2.2: Video-quality estimation based on activity difference.

Another block size can be used to calculate the activity values. However, since video codecs generally adopt a 16 × 16 pixel block size for the quantization process, this block size is preferable for detecting impairment by the quantization. Besides, if a larger block size were adopted, it would be difficult to detect local impairments by transmission errors. Therefore, the proposed method adopts the 16 × 16 pixel block size.
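The block-wise calculation can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: it assumes that the activity of Eqs. (2.7) and (2.8) is the mean absolute deviation of the luminance values from the block mean, that frames are lists of rows whose dimensions are multiples of 16, and that blocks are visited in raster order. The function names are hypothetical.

```python
BLOCK = 16  # block size adopted in the text (16 x 16 pixels)


def block_activity(frame, top, left, size=BLOCK):
    """Mean absolute deviation of luminance from the block mean (assumed form of Eqs. 2.7/2.8)."""
    pixels = [
        frame[r][c]
        for r in range(top, top + size)
        for c in range(left, left + size)
    ]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels) / len(pixels)


def activity_map(frame, size=BLOCK):
    """Activity value of every full block of the frame, in raster order."""
    rows, cols = len(frame), len(frame[0])
    return [
        block_activity(frame, top, left, size)
        for top in range(0, rows - size + 1, size)
        for left in range(0, cols - size + 1, size)
    ]


def activity_square_errors(act_org, act_deg):
    """Per-block square of the activity difference (Eq. 2.9)."""
    if len(act_org) != len(act_deg):
        raise ValueError("both videos must have the same number of blocks")
    return [(o - d) ** 2 for o, d in zip(act_org, act_deg)]


# Synthetic 16 x 32 frame: a textured block (checkerboard of 100/120) next to a flat block.
original = [
    [100 if (r + c) % 2 == 0 else 120 for c in range(16)] + [50] * 16
    for r in range(16)
]
# Degraded frame: the texture of the first block is smoothed away entirely.
received = [[110] * 16 + [50] * 16 for _ in range(16)]

act_org = activity_map(original)   # computed at the sender and transmitted
act_deg = activity_map(received)   # computed at the receiver
print(activity_square_errors(act_org, act_deg))
```

In this example only the first block changes, so only its square error is non-zero, which shows how a 16 × 16 block keeps a local impairment visible instead of averaging it over the frame.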

For some video sequences, luminance gain control may be applied to optimize the brightness level for a display device. In conventional approaches employing pixel differences, the pixel-difference values between the original and received video would be made large by the gain control, and the estimated video quality would be low. In contrast, since the gain factor multiplied in the direct-current (DC) components is cancelled in the activity calculation, the activity difference is less affected by the gain control. The proposed method can accurately estimate video quality even for gain-controlled video sequences. Spatial shifts may also be applied to some video sequences. Conventional approaches based on pixel differences require a calculation for spatial registration. The proposed method does not need any spatial registration, because the square error is calculated from the activity values, which are more robust to spatial shifts than the pixel values.
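The robustness to a change in the DC level can be checked with a small numerical sketch. It rests on two assumptions that are not stated in this form in the text: the gain control is modelled as the same offset added to every pixel of a block, and the activity is taken as the mean absolute deviation from the block mean. The block contents and the offset value are arbitrary.

```python
def activity(block):
    """Mean absolute deviation of the luminance values from the block mean."""
    mean = sum(block) / len(block)
    return sum(abs(p - mean) for p in block) / len(block)


def pixel_mse(block_a, block_b):
    """Mean square error computed pixel by pixel."""
    return sum((a - b) ** 2 for a, b in zip(block_a, block_b)) / len(block_a)


# Synthetic 16 x 16 block flattened to 256 values; the pattern is arbitrary.
original = [64 + (7 * k) % 64 for k in range(256)]

# Hypothetical brightness adjustment: the same offset added to every pixel.
OFFSET = 20
shifted = [p + OFFSET for p in original]

mse_pixels = pixel_mse(original, shifted)
activity_se = (activity(original) - activity(shifted)) ** 2
print(mse_pixels, activity_se)
```

Under these assumptions the pixel-based MSE equals the squared offset even though no picture detail has changed, while the activity square error is zero because the offset is removed when the block mean is subtracted. A gain that rescales all pixel values is outside what this sketch models.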