Toru Yamada
DISSERTATION
TOKYO METROPOLITAN UNIVERSITY
MARCH, 2013
Contents
1 Introduction
1.1 Research Background
1.1.1 Video-Quality Monitoring in IPTV Services
1.1.2 Categorization of the Objective Video-Quality Estimation
1.2 Scope of This Study
1.3 Outline of This Dissertation
2 End-to-End Video-Quality Estimation Based on Reduced-Reference Model
2.1 Introduction
2.2 Proposed Method
2.2.1 Calculation of the Activity Values and Their Square Errors
2.2.2 Psychovisual Weightings for the Activity Difference
2.2.3 Calculation of a Provisional Video-Quality Score
2.2.4 Adjustment of the Video-Quality Score Based on Blockiness Artifacts
2.2.5 Adjustment of the Score Based on Local Impairment
2.2.6 Bit-Rate Control for the Original-Video Information
2.3 Parameter Decisions and Experimental Results
2.4 Conclusion
3 Video-Quality Estimation at the Head-End Point without Original Videos
3.1 Introduction
3.2 Proposed Method
3.2.4 Blockiness-Level Estimation
3.2.5 Blur-Level Estimation
3.2.6 Subjective Video-Quality Estimation
3.3 Experimental Results
3.4 Conclusion
4 Reduced-Reference Video-Quality Estimation at Network Nodes for Quality Degradation by Transmission Errors
4.1 Introduction
4.2 Context of This Chapter
4.3 Proposed Method
4.3.1 Outline of the Proposed Method
4.3.2 Server Side (Information Extraction)
4.3.3 Client Side (PSNR Estimation)
4.4 Experimental Results
4.4.1 Experimental Conditions
4.4.2 Decision for the Number of the Divided Regions
4.4.3 Comparisons with Conventional Methods
4.5 Conclusion
5 Video-Quality Estimation at the End-User Point without Any Original-Video Information
5.1 Introduction
5.2 Context of This Chapter
5.3 Proposed Method
5.3.1 Detecting Impairment Macroblocks by Analyzing Coding Dependency
5.3.2 Evaluation of Error-Concealment Effectiveness Using Motion Information
5.3.3 Evaluation of Error-Concealment Effectiveness Using Luminance Discontinuity at Impairment-Macroblock Boundaries
5.3.4 MSE Estimation by the Number of Error-Concealment-Ineffective Macroblocks
5.4 Experimental Results
5.4.1 Experimental Conditions
5.4.2 Parameter Decision
5.4.3 Performance Evaluation
5.5 Conclusion
6 Conclusions
List of Figures
1.1 Monitoring points for IPTV services described in ITU-T G.1081.
1.2 Categorization of objective video-quality estimation approaches.
1.3 Video-quality-estimation methods proposed in this dissertation.
2.1 Video-quality-estimation method proposed in Chapter 2.
2.2 Video-quality estimation based on activity difference.
2.3 Pixel and activity values used for calculating the blockiness level.
2.4 Partial-bit transmission for the activity value.
3.1 Video-quality-estimation method proposed in Chapter 3.
3.2 Activity-difference calculation in the RR model [45].
3.3 Activity-difference calculation in the proposed method.
3.4 Diagram of the proposed method.
3.5 HF value of each frame ("bus", MPEG-2, 4 Mbps).
3.6 Information for the blockiness-level estimation.
3.7 Blur-level estimation with edge widths.
3.8 Scatter plot of the activity difference. (Training set)
3.9 Scatter plot of the proposed method. (Training set)
3.10 Scatter plot of the proposed method. (Test set)
4.1 Video-quality-estimation method proposed in Chapter 4.
4.2 Framework of the RR model in conventional methods.
4.3 Framework of the RR model in the proposed method.
4.4 A diagram of the information extraction at a server side.
4.5 Frame with a representative-luminance value indicated. ("32" is the representative-luminance value.)
4.6 Representative-luminance map for the frame shown in Figure 4.5.
4.7 Transmitted information in the proposed method.
4.8 An example of the number of blocks which include a particular luminance value.
4.9 Frame with multiple representative-luminance values indicated.
4.10 An example of block subsampling for bit-rate reduction.
4.11 Average bit rate of the extracted information for every subsampling pattern.
4.12 PSNR-estimation accuracy regarding frame-division patterns.
4.13 An example of the extracted information in the 4 x 4 division case.
4.14 PSNR-estimation-accuracy comparisons with conventional methods.
4.15 Comparisons of correlation coefficients at low bit rates.
4.16 Comparisons of RMSE at low bit rates.
4.17 Comparisons of the number of extracted pixels in a frame.
5.1 Video-quality-estimation method proposed in Chapter 5.
5.2 Framework of the typical NR model.
5.3 Framework of the NR model in the proposed method.
5.4 An example of impairment macroblocks caused by transmission errors.
5.5 Pixels along a boundary of an impairment macroblock.
5.6 Relation between threshold Thmv and correlation coefficients.
5.7 Relation between threshold ThL and correlation coefficients.
5.8 The number of the impairment macroblocks vs. MSE (Training set, Correlation coefficient: 0.86).
5.9 The number of the error-concealment-ineffective macroblocks vs. MSE (Training set, Correlation coefficient: 0.94).
5.10 The number of the impairment macroblocks vs. MSE (Test set, Correlation coefficient: 0.87).
5.13 Packet-loss ratio and correlation coefficient.
5.14 Comparison of average decoding time.
List of Tables
2.1 Subjective video-quality test conditions.
2.2 Parameters for the weighting operations.
2.3 Experimental results for training set.
2.4 Experimental results for test set.
3.1 Subjective video-quality test conditions.
3.2 Functions for the score adjustments.
3.3 Experimental results for the test set.
4.1 Conditions for the experiments.
4.2 Data size and bit rate for the representative-luminance values.
5.1 Experimental conditions.
Chapter 1
Introduction
1.1 Research Background
1.1.1 Video-Quality Monitoring in IPTV Services
With the rapid expansion of broadband network services, video-transmission services over Internet Protocol (IP) networks, such as Video-on-Demand (VOD) and video-sharing services, have become popular. In particular, TV broadcasting services over IP (IPTV) are predicted to spread widely throughout the world. Generally, video data in IPTV services are transmitted using the User Datagram Protocol (UDP), which does not retransmit lost data, in order to achieve realtime broadcasting. Consequently, lost data are not recovered, and the losses result in video-quality degradation. IPTV services are expected to deliver video quality at least equivalent to that of conventional broadcasting services, such as terrestrial digital broadcasting or broadcast-satellite (BS) digital broadcasting. Therefore, video-quality monitoring is one of the most important concerns.
Conventionally, video-quality monitoring is conducted by human observers before the actual broadcasting or when an end-user terminal receives the video data. Monitoring by human observers suffers from overlooked quality problems in addition to expensive labor costs. Therefore, automatic and accurate video-quality monitoring techniques are needed. From the viewpoint of IPTV services, the monitoring points should be at the broadcasting stations, on the network paths, and at the end-user terminals. In addition, such techniques are especially needed for video-archive systems, where the quality of a large number of video contents must be checked. Therefore, this study aims to develop objective video-quality estimation techniques for automatic and realtime video-quality monitoring in IPTV services.
In IPTV services, there are two factors of video-quality degradation. One is degradation caused by video coding (video compression), and the other is degradation caused by transmission errors. The degradation factors that should be monitored depend on the monitoring point, such as content providers, service providers, network providers, and end-user terminals. In order to achieve video-quality monitoring at these various monitoring points, multiple video-quality estimation methods have to be developed.
The importance of video-quality monitoring in IPTV services is widely recognized. The International Telecommunication Union Telecommunication Standardization Sector (ITU-T), a sector of the ITU, which is a specialized agency of the United Nations, defines video-quality monitoring locations for IPTV services in its Recommendation ITU-T G.1081 [1]. Generally, the operation of IPTV services is divided into multiple domains, as shown in Fig.1.1. Because each domain is often operated by a different player or company, video-quality monitoring at every interface point between adjacent domains clarifies the responsibility of each player. These monitoring functions also help to determine quickly where a problem has occurred. The monitoring locations defined in ITU-T G.1081 are intended to detect video-quality degradation at the following points.
PT1: Quality check point when the content provider delivers the video contents to the service provider. Read-errors from tapes or disks are monitored. When the video contents were already coded by the content provider, video-quality degradation by the video coding is also monitored.
PT2: Quality check point before the broadcasting. Video-quality degradation by the video coding is monitored.
PT3: Quality check point on the network paths. Video-quality degradation by the transmission errors is monitored. When transcoding (another video coding) is executed on a network path, video-quality degradation by the transcoding is also monitored.
PT4: Quality check point before the delivery of video contents to the end users. It is monitored whether the video contents are successfully delivered to the end-users' houses or not.
PT5: Quality check point for the video contents which are displayed on the end-user terminals. The quality depends on the performance of home networks and the end-user terminals.
[Figure: IPTV delivery chain divided into the Content Provider, Service Provider (encoder/transcoder and service server), Network Provider, and End User (home gateway and set-top box) domains.]
Figure 1.1: Monitoring points for IPTV services described in ITU-T G.1081.
1.1.2 Categorization of the Objective Video-Quality Estimation
Objective video-quality estimation methods that correlate highly with subjective video quality have been discussed by the Video Quality Experts Group (VQEG), in which experts from ITU-T and the International Telecommunication Union Radiocommunication Sector (ITU-R) participate [2]. In ITU-T Recommendation J.143 [3], objective video-quality estimation approaches are categorized into the following three types:
1) Full Reference (FR) models: evaluation of video quality by means of a comparison between an original video and a received video (Fig.1.2-1).
2) No Reference (NR) models: evaluation of video quality on the basis of a received video alone (Fig.1.2-2).
3) Reduced-Reference (RR) models: evaluation of video quality using both a received video and a small amount of information extracted from an original video (Fig.1.2-3).
One of the simplest FR models is the peak signal-to-noise ratio (PSNR). PSNR is based on a pixel-by-pixel comparison and does not reflect the characteristics of the human visual system. Therefore, PSNR does not correlate highly with subjective video quality. To achieve higher correlation, methods which treat a video frame as a structure have been studied [4, 5, 6, 7]. These methods measure the changes in luminance, contrast, and structure in a video frame. Methods which consider the characteristics of the human visual system have also been studied [8, 9, 10, 11, 12]; they achieve higher correlation with subjective video quality by exploiting those characteristics. As international standards, methods based on the FR model are described in ITU-T Recommendations J.144 [13], J.247 [14], and J.341 [15]. The FR model achieves accurate video-quality estimation by exploiting the information of the original video. However, this model would not be suitable for realtime video-quality monitoring in IPTV service operations, since almost none of the monitoring points shown in Fig.1.1 can refer to the original video data on the spot.
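As an illustration of the pixel-by-pixel comparison that PSNR performs, the following is a minimal sketch in Python. The function name `psnr`, the list-of-rows frame representation, the 8-bit peak value of 255, and the tiny sample frames are assumptions made for this example and are not taken from the dissertation.

```python
import math
from typing import Sequence


def psnr(original: Sequence[Sequence[int]],
         received: Sequence[Sequence[int]],
         peak: int = 255) -> float:
    """PSNR in dB between two equally sized luminance frames (lists of rows).

    Assumes 8-bit luminance, so the peak value defaults to 255.
    """
    # Pixel-by-pixel squared errors over the whole frame.
    squared_errors = [
        (x - y) ** 2
        for row_x, row_y in zip(original, received)
        for x, y in zip(row_x, row_y)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical frames: no noise at all
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical 2 x 2 frames: every received pixel is off by 5, so MSE = 25.
original = [[100, 110], [120, 130]]
received = [[105, 115], [125, 135]]
print(round(psnr(original, received), 2))
```

Because the computation needs every pixel of the original frame, it can only run where the original video is available, which is the limitation of the FR model noted above.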
[Figure: three panels labeled 1) FR model, 2) NR model, and 3) RR model, each showing the original video, server, network, end-user terminal, received video, quality estimation, and estimated quality; the RR panel also shows original-video information extraction.]
Figure 1.2: Categorization of objective video-quality estimation approaches.
The RR model transmits feature parameters extracted from the original video to end users at low bit rates, so the original video itself need not be transmitted. It is therefore suitable for realtime video-quality monitoring in IPTV service operations. In addition, it is expected to achieve more accurate quality estimation than the NR model because it exploits the information extracted from the original video. Some methods based on the RR model are described in specific terms in [16, 17, 18, 19] and in ITU-T Recommendations J.240 [20], J.246 [21], J.249 [22], and J.342 [23]. These methods are designed to estimate video-quality degradation due to both video coding and transmission errors. When the information is extracted from video data that have already been compressed, only the video-quality degradation caused by transmission errors is estimated.
The NR model is also suitable for realtime video-quality monitoring since it does not use the information of the original video. Moreover, since it does not need to transmit and receive information extracted from the original video, as the RR model does, the monitoring system is relatively simple. However, it is difficult to distinguish video-quality degradation from features of the video itself. For this reason, it is difficult for the NR model to achieve accurate video-quality estimation. International standards based on the NR model have not been established yet. Some previously proposed NR methods [24, 25, 26, 27, 28] do not estimate overall subjective video quality but estimate the degree of blockiness or blur. ITU-T Recommendation J.147 [29] and [30] present methods that insert invisible markers into the original video and measure the degradation of those markers at end-user terminals. Unfortunately, inserting the invisible markers can itself degrade video quality.
To achieve accurate quality estimation with the NR model, methods which use bitstream information have been studied [31, 32, 33, 34, 35]. These methods employ Discrete Cosine Transform (DCT) coefficients or quantization parameters to improve quality-estimation accuracy. Such methods, however, depend on the video-compression algorithm and can be used only for video sequences coded with that specific algorithm.
Other methods capable of dealing with degradation caused by transmission errors have been proposed in [36, 37, 38] and in ITU-T Recommendations P.1201 [39] and P.1202 [40]. These methods use only packet-header information and do not take the media-layer information into account. Therefore, their quality-estimation accuracy is likely to be low across various types of video content.
1.2 Scope of This Study
As discussed in the previous section, the main purpose of this study is to develop objective video-quality estimation methods for automatic and realtime video-quality monitoring in IPTV services. This dissertation proposes video-quality estimation methods based on the RR and NR models, since they are suitable for realtime monitoring by IPTV service providers.
First, a method based on the RR model is proposed in order to estimate video-quality degradation caused by both video coding and transmission errors. In this method, the information of the original video is extracted before the video coding and transmitted to the end-user terminals. Video quality is evaluated on network paths or at end-user terminals. When videos are coded by the content providers, the information extraction must be conducted by the content providers.
Some content providers, however, may not conduct this procedure, and video contents may be delivered without the extracted information. In this case, when the IPTV service provider checks the video-quality degradation caused by the video coding, quality evaluation by the NR model is required, since the service provider cannot refer to the original video. As the second proposal, a video-quality-estimation method based on the NR model is introduced in order to estimate video-quality degradation caused by the video coding.
With this NR method, the service provider can monitor video-quality degradation caused by the video coding. A method to monitor degradation caused by transmission errors is required next. A method based on the RR model is proposed for this purpose: a small amount of information is extracted from a compressed video, and the video-quality degradation caused by transmission errors is estimated with this information.
Nevertheless, there are some cases where it is difficult to transmit the extracted information. For example, an IPTV service may already be in operation, making it impossible to modify its transmission system. For such cases, an NR-based method for monitoring video-quality degradation caused by transmission errors is also proposed. In this method, the effectiveness of the error concealment which an end-user terminal applies to video-frame regions damaged by the transmission errors is analyzed for video-quality estimation.
The proposed video-quality-estimation methods enable realtime video-quality monitoring in IPTV services at all monitoring locations defined in Fig.1.1.
[Figure: the IPTV delivery chain (Content Provider, Service Provider, Network Provider, End User) annotated with the four proposed methods:
Chapter 2: RR model for Quality Degradation by both Video Coding and Transmission Errors
Chapter 3: NR model for Quality Degradation by Video Coding
Chapter 4: RR model for Quality Degradation by Transmission Errors
Chapter 5: NR model for Quality Degradation by Transmission Errors]
Figure 1.3: Video-quality-estimation methods proposed in this dissertation.
1.3 Outline of This Dissertation
To summarize the discussion in the previous section, the methods proposed in this dissertation are shown in Fig.1.3. This dissertation proposes four video-quality-estimation methods, each presented in one of the following chapters. The dissertation is organized as follows:
Chapter 2 proposes a method based on the RR model. This method can evaluate quality degradation due to both video coding and transmission errors. As the information extracted from the original video, a value called "activity," which indicates the variance of luminance values, is calculated for every pixel block of a given size. The activity values of the original video are transmitted to end-user terminals. On the network paths and at the end-user terminals, the video quality of a received video is estimated on the basis of the activity difference between the original video and the received video. Psychovisual weightings and video-quality score adjustments for fatal degradations are applied in order to improve estimation accuracy. In addition, low-bit-rate transmission of the extracted information is achieved by using temporal subsampling and by transmitting only the lower six bits of each activity value. With extracted information of 15 kbps for standard-definition television (SDTV), accurate video-quality estimation is achieved: the correlation coefficient between the actual subjective video quality and the estimated quality is 0.901. This method has been adopted in an international standard, ITU-T Recommendation J.249 Annex B [22].
Chapter 3 proposes a method based on the NR model. This method evaluates quality degradation due to video coding. The proposed method does not need any bitstream information; only the pixel information of decoded video frames is used for the video-quality estimation. The activity values described in Chapter 2 are also employed. First, the spatial-frequency information of the decoded video frames is analyzed to detect intra-coded frames, to which inter-frame prediction is not applied. Then, the activity difference between the intra-coded frame and its adjacent frame is calculated to estimate the amount of quality degradation. In addition, a blockiness level and a blur level are estimated for every frame by analyzing only pixel information, and both levels are taken into account to improve quality-estimation accuracy. The proposed method achieves accurate video-quality estimation without referring to the original video, that is, the video free of compression artifacts. The correlation coefficient between subjective video quality and estimated quality is 0.925.
Chapter 4 proposes a method based on the RR model for monitoring problems on network paths. This method evaluates quality degradation due to transmission errors. The method makes it possible to extract more pixels from each frame than a conventional method does at low bit rates. Specifically, the extracted information consists of representative-luminance values, which are chosen from each frame, and their position information. The representative-luminance values chosen for individual video frames at the server side and the pixel-position information of these values are transmitted to end-user terminals. On the basis of this information, PSNR values can be estimated at the end-user side. Accurate estimation of video-quality degradation caused by transmission errors is achieved with the addition of this small amount of information. For SDTV, accurate PSNR estimation (correlation coefficient of 0.92 to 0.95) is achieved with a small amount of additional information of 10 to 50 kbps. When only the quality degradation caused by transmission errors is estimated, this method achieves more accurate quality estimation than the method proposed in Chapter 2, which takes into account quality degradation caused by both video coding and transmission errors.
Chapter 5 proposes a method based on the NR model. This method evaluates quality degradation due to transmission errors. The method is based on a hybrid of bitstream-information analysis and pixel-information analysis. Video quality is estimated in terms of the mean square error (MSE) between degraded video frames and error-free video frames. With the proposed method, impairment macroblocks are accurately detected by bitstream-information analysis, and the effectiveness of error concealment for the impairment regions is evaluated using both the bitstream and the decoded-pixel information. Error-concealment effectiveness is evaluated using motion information and luminance discontinuity at the boundaries of impairment regions. Simulation results show a high correlation (correlation coefficient of 0.93) between the actual MSE and the number of macroblocks in which error concealment has not been effective. A video decoder incorporating the proposed method outputs accurately estimated MSE values.
Chapter 6 concludes the dissertation. With a combination of the proposed methods, accurate and realtime video-quality estimation can be executed at various monitoring points for IPTV services. The proposed methods contribute to stable IPTV service operations.
Chapter 2
End-to-End Video-Quality
Estimation Based on
Reduced-Reference Model
2.1 Introduction
In this chapter, a video-quality-estimation method based on the reduced-reference model is proposed. The proposed method makes it possible to estimate video-quality degradation caused by both video coding and transmission errors. It can be used at all monitoring locations shown in Fig.2.1.
With the proposed method, activity values, which indicate the variance of luminance values, are calculated for individual pixel blocks of a given size and transmitted to end-user terminals. These values indicate spatial-frequency levels and are used as original-video information. Video quality is estimated on the basis of the activity difference between the original video and the received video. Psychovisual weightings with respect to the activity difference and video-quality-score adjustments for fatal degradations are also applied to improve estimation accuracy. In addition, low-bit-rate transmission of the feature parameters is achieved by using temporal subsampling and by transmitting only the lower bits of each activity value. Since the proposed method needs neither spatial registration nor gain/offset registration, it is suitable for real-time quality monitoring.
[Figure: the IPTV delivery chain (Content Provider, Service Provider, Network Provider, End User), labeled "Chapter 2: RR model for Quality Degradation by both Video Coding and Transmission Errors".]
Figure 2.1: Video-quality-estimation method proposed in Chapter 2.
The subsequent sections of this chapter are organized as follows: Section 2.2 describes the proposed algorithm for estimating subjective video quality using activity-difference values; Section 2.3 discusses an evaluation of the performance of the algorithm, and Section 2.4 summarizes this chapter.
2.2 Proposed Method
The proposed method first calculates activity-difference values, and then applies psychovisual weightings and video-quality score adjustments one by one. In this section, the basic concept of the activity difference is described first, and then the psychovisual weightings and video-quality score adjustments are explained.
2.2.1 Calculation of the Activity Values and Their Square Errors
To calculate PSNR, it is necessary to calculate the mean square error (MSE) of the luminance values between the original video and the received video. Let $X_k$ be a luminance value in a 16 x 16 pixel block of the original video, $Y_k$ be the luminance value of the received video at the same position as $X_k$, and $e_k$ be the induced noise, i.e.,
$$Y_k = X_k + e_k. \tag{2.1}$$
We now assume that $e_k$ is independent of $X_k$ and
$$E[e_k] = 0, \tag{2.2}$$
where $E[\cdot]$ is a function to calculate an average value. From this assumption, the following relation is obtained:
$$\bar{Y} = E[Y_k] = E[X_k + e_k] = E[X_k] + E[e_k] = E[X_k] = \bar{X}, \tag{2.3}$$
where $\bar{X}$ and $\bar{Y}$ are the average luminance values in the blocks.
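For an RR approach, since not all pixels can be used, a smaller amount of information must be considered. For that purpose, the standard deviation of the luminance values is introduced. The standard deviation $\sigma(X_k)$ for each 16 x 16 pixel block is defined as:
$$\sigma(X_k) = \sqrt{\mathrm{Var}[X_k]} = \sqrt{\frac{1}{256}\sum_{k=0}^{255}(X_k-\bar{X})^2} = \sqrt{E[(X_k-\bar{X})^2]}, \tag{2.4}$$
where $\mathrm{Var}[\cdot]$ is a function to calculate a variance. The square error (SE) of the standard deviation between the original video and the received video is calculated as:
$$\begin{aligned}
SE_\sigma &= \left(\sigma(X_k)-\sigma(Y_k)\right)^2 \\
&= \left(\sqrt{E[(X_k-\bar{X})^2]}-\sqrt{E[(Y_k-\bar{Y})^2]}\right)^2 \\
&= \mathrm{Var}[X_k] + E[(X_k+e_k-\bar{X})^2] - 2\sqrt{\mathrm{Var}[X_k]\,E[(X_k+e_k-\bar{X})^2]} \\
&= 2\mathrm{Var}[X_k] + 2E[(X_k-\bar{X})e_k] + E[e_k^2] - 2\sqrt{\mathrm{Var}[X_k]\left\{\mathrm{Var}[X_k]+E[e_k^2]\right\}} \\
&= 2\mathrm{Var}[X_k] + E[e_k^2] - 2\sqrt{\mathrm{Var}[X_k]\left\{\mathrm{Var}[X_k]+E[e_k^2]\right\}} \\
&= 2\mathrm{Var}[X_k] + E[e_k^2] - 2\mathrm{Var}[X_k]\sqrt{1+\frac{E[e_k^2]}{\mathrm{Var}[X_k]}}.
\end{aligned} \tag{2.5}$$
Here $\bar{Y} = \bar{X}$ from Eq. (2.3) is used, and the cross term $E[(X_k-\bar{X})e_k]$ vanishes because $e_k$ is independent of $X_k$ and $E[e_k] = 0$.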
Generally, since $E[e_k^2]/\mathrm{Var}[X_k]$ is small enough in compressed video sequences, the square-root factor in Eq. (2.5) is approximated by 1, and the SE of the standard deviation can be described as:
$$SE_\sigma \approx 2\mathrm{Var}[X_k] + E[e_k^2] - 2\mathrm{Var}[X_k] = E[e_k^2]. \tag{2.6}$$
This shows that the SE of the standard deviation approximates the MSE of the luminance values when $E[e_k^2]/\mathrm{Var}[X_k] \ll 1$, and that it can be used to estimate video quality, since video sequences generally satisfy this condition.
For simplification of the SE calculation, the proposed method uses activity values instead of the standard deviation of the luminance values. The activity values of both the original video ($ActOrg_{i,j}$) and the received video ($ActDeg_{i,j}$) are calculated as:
$$ActOrg_{i,j} = \frac{1}{256}\sum_{k=0}^{255}\left|X_k-\bar{X}\right|, \tag{2.7}$$
$$ActDeg_{i,j} = \frac{1}{256}\sum_{k=0}^{255}\left|Y_k-\bar{Y}\right|, \tag{2.8}$$
where $i$, $j$, and $k$ are a frame number, a block number in the frame, and a pixel position in the block, respectively. Video quality is estimated on the basis of the MSE between them. Figure 2.2 summarizes the activity-difference calculation in the proposed method. As shown in Figure 2.2, activity values for the original video are calculated at each block and then transmitted to the end-user terminal. At the end-user terminal, the activity values for the received videos are calculated. The square of the difference between the activity values is calculated as:
$$E_{i,j} = \left(ActOrg_{i,j} - ActDeg_{i,j}\right)^2. \tag{2.9}$$
[Figure: at the sender, $ActOrg_{i,j}$ is calculated from the original video and sent over the network as original-video information alongside the video bitstream; at the receiver, $ActDeg_{i,j}$ is calculated from the received video, and quality is estimated from the activity difference.]
Figure 2.2: Video-quality estimation based on activity difference.
Another block size could be used to calculate the activity values. However, since video codecs generally adopt a 16 x 16 pixel block as the unit of the quantization process, this block size is preferable for detecting impairment caused by quantization. Moreover, if a larger block size were adopted, it would be difficult to detect local impairments caused by transmission errors. Therefore, the proposed method adopts a 16 x 16 pixel block size.
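The block-wise calculation of Eqs. (2.7) to (2.9) can be sketched in Python as follows. This is an illustrative sketch only: it assumes that the activity is the mean absolute deviation of the luminance within a 16 x 16 block, and the function names, the list-of-rows frame representation, and the synthetic frames are hypothetical and not part of the dissertation.

```python
from typing import List, Sequence

BLOCK = 16  # block size adopted in the text

Frame = Sequence[Sequence[int]]  # luminance frame as a list of rows


def block_activity(frame: Frame, top: int, left: int) -> float:
    """Activity of one block: mean absolute deviation from the block mean
    (assumed reading of Eqs. (2.7) and (2.8))."""
    pixels = [frame[top + r][left + c]
              for r in range(BLOCK) for c in range(BLOCK)]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels) / len(pixels)


def frame_activities(frame: Frame) -> List[float]:
    """Activity of every complete 16 x 16 block, in raster order."""
    height, width = len(frame), len(frame[0])
    return [block_activity(frame, top, left)
            for top in range(0, height - BLOCK + 1, BLOCK)
            for left in range(0, width - BLOCK + 1, BLOCK)]


def activity_square_errors(original: Frame, received: Frame) -> List[float]:
    """Per-block squared activity difference, as in Eq. (2.9)."""
    act_org = frame_activities(original)  # would be computed at the sender
    act_deg = frame_activities(received)  # computed at the receiver
    return [(a - b) ** 2 for a, b in zip(act_org, act_deg)]


# Synthetic 16 x 32 frame: the left block is a 0/100 checkerboard,
# the right block is flat. The "received" frame flattens the left block.
original = [[(100 if (r + c) % 2 else 0) if c < BLOCK else 80
             for c in range(2 * BLOCK)] for r in range(BLOCK)]
received = [[50 if c < BLOCK else 80
             for c in range(2 * BLOCK)] for r in range(BLOCK)]
print(activity_square_errors(original, received))
```

In this sketch only one number per block (the activity) has to cross the network, which is what makes the approach a reduced-reference one; the unchanged right-hand block yields a zero error while the flattened left-hand block yields a large one.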
For some video sequences, luminance gain control may be applied to optimize the brightness level for the display device. In conventional approaches employing pixel differences, the gain control makes the pixel-difference values between the original and received videos large, and the estimated video quality becomes low. By contrast, since the gain factor multiplied in the direct-current (DC) components is cancelled in the activity calculation, the activity difference is less affected by the gain control. The proposed method can therefore estimate video quality accurately even for gain-controlled video sequences. Spatial shifts may also be applied to some video sequences. Conventional approaches based on pixel differences require a calculation for spatial registration. The proposed method does not need any spatial registration, because the square error is calculated from activity values, which are more robust to spatial shifts than pixel values.
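The robustness to a change in the DC level can be checked with a small Python sketch. It models the change, as an assumption, by adding a constant to every pixel of a block, and it again assumes that the activity is the mean absolute deviation within the block; the function names and the sample block are hypothetical. The sketch does not cover spatial shifts or other forms of gain control.

```python
from typing import List


def activity(pixels: List[int]) -> float:
    """Mean absolute deviation from the block mean (assumed activity)."""
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels) / len(pixels)


def pixel_mse(a: List[int], b: List[int]) -> float:
    """Pixel-by-pixel mean square error between two blocks."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical 16 x 16 block (256 pixels) and a copy whose DC level
# is raised by 10, with no other change.
block = [(k % 64) + 64 for k in range(256)]
shifted = [p + 10 for p in block]

# The pixel difference reports an error although nothing but the
# brightness level changed; the activity is identical for both blocks.
print(pixel_mse(block, shifted))
print(activity(block), activity(shifted))
```

Subtracting the block mean inside `activity` removes the constant before the deviations are summed, so the activity difference is zero here while the pixel-based error is not.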