Correction Confirmation Report



Correction approval date: November 19, 2018; correction application date: November 1, 2018

Title: Research on Gradient Local Binary Patterns Method for Human Detection

Author: Ning JIANG

Reporter: 木村晋二 (Head of the Doctoral Dissertation Correction Working Group, Integrated Systems Field)

Confirmed by: 巽宏平


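Under Article 23, Paragraph 1 of the Degree Regulations, this dissertation was found not to warrant revocation of the degree. However, passages requiring correction were identified. The results of verifying the corrections made by the author are reported below.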

1. Corrected Passages and Nature of the Corrections

(1) Location: Chapter 1, Introduction, pages 1 to 4. Nature of correction: revision of the text.

Details:

1. Introduction

Human detection is a key problem in computer vision and is widely used in image analysis, intelligent vehicles, and visual surveillance. The target of this dissertation is to develop robust feature extraction algorithms that encode image regions as high-dimensional feature vectors, which in turn support highly accurate human/non-human decisions.

In this chapter, I will introduce the background knowledge of object detection and human detection, and give an overview of this dissertation.

1.1 Image Understanding and Object Detection

Image understanding is a branch of computer science. Its purpose is to analyze images and extract useful information about the world. It covers a wide range of image processing fields and plays an important role in computer vision.

…(omitted)…

generative approaches and discriminative approaches. Generative methods typically use Bayesian graphical models to characterize these parts and to model their co-occurrences. Discriminative methods use machine learning to classify each feature vector and decide whether it belongs to the object.

1.2 Human Detection and Its Applications

Since face detection techniques have become practical, human detection in still images and videos has become a focal research topic in computer vision. Several of its applications could improve quality of life. Consider the image content

…(omitted)…

may facilitate searches for designated content or for relevant subsequences. In this thesis, I mainly discuss human detection in images like these. I extract features or descriptors for either the whole body or various sub-parts, and I build the detector on them.


1. Introduction

Human detection is a key problem in computer vision and is widely used in image analysis, intelligent vehicles, and visual monitoring. The purpose of this dissertation is to improve feature extraction algorithms so that they produce powerful feature vectors for faster and more effective human/non-human decisions.

This chapter first introduces the background knowledge and then gives an overview of this dissertation.

1.1 Image Understanding and Object Detection

Image understanding is an important and active research topic in computer vision. Its goal is to derive high-level, useful information by analyzing the images captured by cameras. Many image processing algorithms are used in image understanding, and the field is expected to become even more important.

Human beings take in images of the natural world through their eyes and extract information from them with their brains. The eyes and brain have limited capacity, however, so people lose information when the field of view is large or when observation lasts a long time. It is hard for a person to pick out the important information in every small local area of a huge field of view, or to stay attentive for long. Machine learning and pattern recognition have been proposed to help people obtain more information over large fields of view and long continuous periods. Image understanding aims to produce descriptions of images and to extract higher-level information from them. Monitoring systems are deployed in many parts of the world, and they provide valuable information that reflects the objective reality of the world, which makes image understanding even more important. There is still no practical solution, however, because the computational cost is large. Originally, such information was extracted by human vision. Now that huge volumes of image data are available, I aim to accomplish this task with computers.

Compared with other image processing tasks, image understanding obtains higher-level information in computer vision, such as knowledge of image semantics. Lower-level tasks such as image enhancement and image denoising work directly with data such as the intensity or color values of the original image.

High-level computer vision processing, such as the face detection and human detection shown in Figure 1, deals with abstract data such as object location or object size.

In image understanding, object detection is performed many times to obtain information from images. Human detection, vehicle detection, and face detection are important applications of object detection.


… learning method and recognition method. The representation method describes the object. The learning method learns the properties of each object class and trains the classifier models. The recognition method assigns objects to classes with the trained classifier. The object detection process therefore has two parts.

The first part comprises feature extraction and machine learning, and its purpose is to train the classifier. In feature extraction, the values of each pixel and its neighboring pixels are encoded into a feature vector that describes the object through pixel, edge, shape, and local information. There are two approaches to computing the feature vector. The first treats every image region equally, ideally so that no information is lost. The second weights regions unequally, because regions in some areas may be more important than regions in cluttered areas.

The detection part also has two groups of methods. The discriminative approach uses the feature vector to decide whether a region belongs to the object class. The generative approach uses Bayesian models for pattern recognition.

1.2 Human Detection and Its Applications

Now that face detection has become a reality, human detection is becoming an active research topic in computer vision. Several of its applications may affect quality of life.

Consider an image content analysis system. A digital camera can accumulate more than ten thousand photos in just two or three years, even if it is not used frequently. Searching and managing these pictures manually is very cumbersome.

It would therefore be very useful to develop image management software that automatically and quickly classifies images according to whether they contain people, and human detection is essential for this purpose. Human detection can also be applied to videos and films, for example in video search and real-time monitoring. Combined with face and action recognition, it can help us search for specified content or for related subsequences. This thesis discusses human detection in images. Features are extracted, and classifiers are built, for either whole human bodies or human sub-parts.

(2) Location: Introduction, pages 7 to 9. Nature of correction: revision of the text.

Details:

Human detection can also be used in intelligent vehicle systems.


Pedestrian accidents are one of the largest sources of traffic-related injuries, and a robust human detector can reduce such accidents efficiently. Most of these accidents occur either at pedestrian crossings or while a car is reversing. At speed, drivers have no time to react to a person who suddenly appears in front of the car. When reversing, the rear mirrors do not provide a full view of the scene behind the car, and this situation

…(omitted)…

Thirdly, illumination conditions vary. Direct sunlight or dim lighting at night influences the results of human detection. Models of color and illumination invariance have made significant advances. However, they are still far from effective when compared with the human and mammalian visual systems, which are extremely well adapted to such changes.

Advanced driving assistance systems (ADAS) are another application of human detection. Accidents involving pedestrians form a large class of traffic accidents, and an ADAS can reduce how often they occur. Most such accidents happen when people cross the street or when cars reverse. At high speed, drivers do not have enough time to react to people who suddenly appear. When reversing, drivers may not be able to avoid people hidden behind the vehicle. This blind spot is especially dangerous for children because of their small body size. In both cases, drivers cannot see the people who are at risk. A real-time ADAS with human detection can solve these problems efficiently. Figure 2 shows Volvo's pedestrian and cyclist alert system, as reported by the New York Daily News on August 5, 2013. The system detects people near the vehicle and gives the driver enough time to avoid an accident.

The monitoring systems in Fig. 3 are another application of human detection. A monitoring system detects human actions, human faces, and so on. Useful applications include warning about or recording misconduct in security monitoring systems, traffic management systems, and supermarket management systems. Human detection can also collect higher-level information, such as the number of pedestrians crossing a street in one hour and the percentage of them who cross on a red light. In multimedia analysis, for example, a monitoring system holds a large amount of data to search, and finding the desired content takes a long time. Some monitoring systems still require human assistance; for example, people watch real-time video or replays to search for and mark activities of interest.


However, people cannot do this perfectly. They are apt to make mistakes, such as missing key information, so automatic monitoring systems should be developed for this application. A powerful monitoring system includes segmentation to detect people in images, to maintain consistent tracking, and to classify objects and activities.

Human detection is very important in these systems. Figure 3 shows some monitoring systems. Because there are so many viewpoints, a human observer can easily miss important information.

Human detection faces many difficulties. Since the beginning of this century, more and more researchers have focused on human detection methods, yet performance is still too low for practical applications. Face detection is already used in commercial products, but human detection still needs improvement.

The first difficulty in human detection is that people's appearance varies.

For example, skin color detection works well in face detection. It is much less effective in human detection, however, because clothing covers most of the skin. People appear in images with different shapes when they take different poses. They also appear at different sizes when they stand at different distances from the camera. These varying appearances are shown in Fig. 4.

The second difficulty is that backgrounds are cluttered and vary from image to image. For example, images taken outdoors differ from those taken indoors. The classifier must be able to separate humans from intricate background areas, yet some background objects are falsely detected as humans in the detection stage. Changing the detector's parameters can raise the detection rate, but it also raises the number of false detections.

The third difficulty is that illumination conditions vary. Direct light or dim illumination affects human/non-human classification results. Algorithms for color and illumination invariance have improved considerably, but they are not yet sufficient for reliable applications.

(3) Location: page 18, lines 1 to 11. Nature of correction: revision of the text.

Details:

COV [3] assumes that the computed tensor descriptors lie on a Riemannian manifold. The feature vector for each pixel is calculated by:

$$\mathbf{z} = \left[\, x,\; y,\; |I_x|,\; |I_y|,\; \sqrt{I_x^2 + I_y^2},\; |I_{xx}|,\; |I_{yy}|,\; \arctan\frac{|I_x|}{|I_y|} \,\right]^T \tag{5}$$


where $x, y$ are the location of the pixel, and $I_x$, $I_y$, $I_{xx}$, and $I_{yy}$ are the first and second partial intensity derivatives. The last term is the edge orientation information. Denoting the index set for target region $R$ as $S$, an 8 × 8 covariance matrix can then be computed as:

$$C_R = \frac{1}{S-1} \sum_{i=1}^{S} (\mathbf{z}_i - \boldsymbol{\mu})(\mathbf{z}_i - \boldsymbol{\mu})^T \tag{6}$$

where $\boldsymbol{\mu}$ is the statistical mean of $\mathbf{z}_i$. Because the covariance matrix is symmetric, only its upper triangular part needs to be stored as the feature vector for detection.

The descriptor of a sub-region is therefore an 8 × (8 + 1) / 2 = 36-dimensional vector.

COV [3] uses a Riemannian manifold to compute the tensor descriptors. The feature vector of each pixel is computed by:

$$\mathbf{z} = \left[\, x,\; y,\; |I_x|,\; |I_y|,\; \sqrt{I_x^2 + I_y^2},\; |I_{xx}|,\; |I_{yy}|,\; \arctan\frac{|I_x|}{|I_y|} \,\right]^T \tag{5}$$

where $x, y$ are the position of the pixel, and $I_x$, $I_y$, $I_{xx}$, and $I_{yy}$ are the first and second partial intensity derivatives of the pixel. The last term is the edge orientation information. Denoting the index set for target area $R$ as $S$, an 8 × 8 covariance matrix can be computed as:

$$C_R = \frac{1}{S-1} \sum_{i=1}^{S} (\mathbf{z}_i - \boldsymbol{\mu})(\mathbf{z}_i - \boldsymbol{\mu})^T \tag{6}$$

where $\boldsymbol{\mu}$ is the statistical mean of $\mathbf{z}_i$. Because the covariance matrix is symmetric, only its upper triangular part is used as the feature vector for object detection. Each sub-region in a descriptor is therefore stored as an 8 × (8 + 1) ÷ 2 = 36-dimensional vector.
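The following pure-Python sketch illustrates how a covariance region descriptor is built from formulas (5) and (6). It assumes that a grey-scale image is stored as a list of rows. It also uses simple central differences for $I_x$, $I_y$, $I_{xx}$, and $I_{yy}$. The derivative filters and the helper names `pixel_features` and `covariance_descriptor` are illustrative assumptions, not the implementation used in [3] or in this dissertation.

```python
import math


def pixel_features(img, x, y):
    """8-D feature z = [x, y, |Ix|, |Iy|, sqrt(Ix^2+Iy^2), |Ixx|, |Iyy|, arctan(|Ix|/|Iy|)].

    img is a list of rows (img[y][x]) of grey values and (x, y) must be an
    interior pixel. Central differences for the derivatives are an assumption.
    """
    ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
    iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    ixx = img[y][x + 1] - 2.0 * img[y][x] + img[y][x - 1]
    iyy = img[y + 1][x] - 2.0 * img[y][x] + img[y - 1][x]
    # atan2(|Ix|, |Iy|) equals arctan(|Ix|/|Iy|) and also handles |Iy| == 0
    orientation = math.atan2(abs(ix), abs(iy))
    return [x, y, abs(ix), abs(iy), math.hypot(ix, iy), abs(ixx), abs(iyy), orientation]


def covariance_descriptor(img, x0, y0, w, h):
    """Upper triangle of the 8x8 covariance C_R of formula (6) over a w x h region."""
    zs = [pixel_features(img, x, y)
          for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    n, d = len(zs), len(zs[0])
    mu = [sum(z[k] for z in zs) / n for k in range(d)]  # statistical mean of z_i
    cov = [[sum((z[a] - mu[a]) * (z[b] - mu[b]) for z in zs) / (n - 1)
            for b in range(d)] for a in range(d)]
    # C_R is symmetric, so keep only entries with a <= b: 8 * 9 / 2 = 36 values
    return [cov[a][b] for a in range(d) for b in range(a, d)]
```

In a 10 × 10 image, `covariance_descriptor(img, 1, 1, 8, 8)` returns the 36 upper-triangular entries for the central 8 × 8 region. The region stays one pixel away from the border so that the central differences remain inside the image.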

(4) Location: page 19, lines 13 to 21. Nature of correction: revision of the text.

Details:

Semantic-LBP, proposed in [4], builds a 2-D histogram from the binary code of uniform $LBP_{8,1}$. The two coordinates are the number of continuous “1” pixels and the position of the middle pixel among them. Each pixel then votes with the same weight “1” into the bins of this 2-D histogram. For each block, the bin values of the 2-D histogram are transformed into a 1-D vector for SVM training.

HOG-LBP with Partial Occlusion Handling, proposed in [10], combines HOG and S-LBP into one long feature vector. After adding an occlusion … problem.

Semantic-LBP was proposed in [4]. It uses two codes from the circular uniform $LBP_{8,1}$ code to build an 8 × 8 2-D histogram. The first is the length code (1 to 8) of the continuous “1” run, and the second is the position code (1 to 8) of the median pixel of that run. Each pixel then votes an equal weight of “one” into the bins of this 2-D histogram. For each sub-block, the 8 × 8 histogram is flattened into a 1-D vector of length 64. The vectors of all sub-blocks are then concatenated into a high-dimensional vector for the subsequent training stage with a Support Vector Machine.

HOG-LBP with Partial Occlusion Handling, proposed in [10], combines HOG and S-LBP into one long feature vector. Adding an occlusion handling process to deal with partial occlusion achieves the best performance.
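The Semantic-LBP histogram described above can be sketched as follows. This is a hypothetical illustration, not the code of [4], because the excerpt does not say how several cases are handled. The sketch makes four assumptions:

- it uses integer 3 × 3 neighbours for $LBP_{8,1}$;
- it takes the lower median for even-length runs;
- it assigns position 1 to the all-ones code;
- it skips all-zero and non-uniform codes.

```python
# Circular order of the 8 neighbours at radius 1 (assumption: integer 3x3
# neighbours instead of interpolated samples on the circle).
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]


def lbp_bits(img, x, y):
    """Binary LBP_{8,1} code of interior pixel (x, y): 1 where neighbour >= centre."""
    c = img[y][x]
    return [1 if img[y + dy][x + dx] >= c else 0 for dx, dy in OFFSETS]


def run_code(bits):
    """(length, position) codes in 1..8 for a uniform code with one run of 1s.

    Returns None for all-zero and non-uniform codes (an assumed convention).
    """
    n, ones = len(bits), sum(bits)
    if ones == 0:
        return None
    if ones == n:
        return (n, 1)  # hypothetical: an all-ones run has no unique middle
    transitions = sum(bits[k] != bits[(k + 1) % n] for k in range(n))
    if transitions != 2:
        return None  # non-uniform pattern
    # Start of the run: a 1 whose circular predecessor is 0 (bits[-1] wraps)
    start = next(k for k in range(n) if bits[k] == 1 and bits[k - 1] == 0)
    middle = (start + (ones - 1) // 2) % n  # lower median for even lengths
    return (ones, middle + 1)


def slbp_block_vector(img, x0, y0, w, h):
    """8x8 (length x position) histogram of one block, flattened to 64 values."""
    hist = [[0] * 8 for _ in range(8)]
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            code = run_code(lbp_bits(img, x, y))
            if code is not None:
                length, pos = code
                hist[length - 1][pos - 1] += 1  # every pixel votes weight 1
    return [v for row in hist for v in row]


def slbp_feature(img, blocks):
    """Concatenate the 64-D vectors of all blocks (x0, y0, w, h) for SVM training."""
    feature = []
    for x0, y0, w, h in blocks:
        feature.extend(slbp_block_vector(img, x0, y0, w, h))
    return feature
```

`slbp_feature(img, blocks)` takes a list of `(x0, y0, w, h)` sub-blocks and returns 64 × (number of blocks) values. This vector would then be passed to the SVM training stage.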

(5) Location: page 21, line 20 to page 22, line 1. Nature of correction: revision of the text.

Details:

SVM [23] is effective for learning from small samples in high-dimensional spaces.

The decision rule is given by the following formula:

$$f(\mathbf{x}) = \sum_{i=1}^{N_s} \alpha_i K(\mathbf{x}, \mathbf{x}_i) + b \tag{9}$$

where $\mathbf{x}_i$ are the support vectors, $N_s$ is the number of support vectors, and $K(\mathbf{x}, \mathbf{y})$ is the kernel function.

SVM [23] is an effective learning tool in high-dimensional spaces. The SVM decision function is given by formula (9):

$$f(\mathbf{x}) = \sum_{i=1}^{N_s} \alpha_i K(\mathbf{x}, \mathbf{x}_i) + b \tag{9}$$

where $\mathbf{x}_i$ are the feature vectors of the support vectors obtained by feature extraction, $N_s$ is the number of support vectors in the SVM-based detector, and $K(\mathbf{x}, \mathbf{y})$ is the kernel function of this detector. The four most commonly used kernel functions are given by the following formulas:
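The short sketch below illustrates the decision function in formula (9). The weights `alphas`, the bias `b`, and the support vectors would come from SVM training, which is not shown. Each `alphas[i]` is assumed to already include the label $y_i$ of its support vector. The linear and Gaussian RBF kernels are two standard examples only, not necessarily among the four kernel functions the dissertation lists. The +1 = human labelling is also an assumption.

```python
import math


def linear_kernel(u, v):
    # K(u, v) = u . v
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=0.5):
    # Gaussian RBF kernel; gamma is a hypothetical default, tuned in practice
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_decision(x, support_vectors, alphas, b, kernel):
    """f(x) = sum_{i=1}^{Ns} alpha_i K(x, x_i) + b, as in formula (9).

    alphas are assumed to already include each support vector's label y_i.
    """
    return sum(a * kernel(x, sv) for a, sv in zip(alphas, support_vectors)) + b


def classify(x, support_vectors, alphas, b, kernel):
    # Sign of f(x); +1 = human, -1 = non-human is an assumed labelling
    return 1 if svm_decision(x, support_vectors, alphas, b, kernel) >= 0 else -1
```

Changing the kernel is the only difference between a linear and a non-linear detector in this form. The cost of one evaluation grows with $N_s$, so detectors with many support vectors are slower.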


2. Reasons for the Corrections

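Inappropriate quotation was found in the introductory chapter and in the introductory parts of each chapter. The sources were the doctoral dissertation of a senior member of the same laboratory and other papers. Corrections were therefore ordered at five locations.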

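Corrections (1) and (2) concerned text quoted from the doctoral dissertation of a senior member of the same laboratory who had conducted similar research. Corrections (3) to (5) concerned text quoted from other papers.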

3. Reasons for Approving the Corrections

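Corrections (1) and (2) concern text quoted from the doctoral dissertation of a senior member of the same laboratory. These passages do not affect the research results of this dissertation, so the corrections are deemed appropriate.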

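Corrections (3) to (5) likewise concern passages that introduce other research. They do not affect the main results of this dissertation, so these corrections are also deemed appropriate.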
