participants. In contrast, facial regions in thermal images need not be blurred.
In the optical-level anonymous image sensing system, therefore, using thermal images to find facial regions is privacy-safe.
In conclusion, in this experiment, the author showed that for face detection in RGB images, non-uniform lighting and fake faces can easily render the cascade classifiers ineffective. For face detection in thermal images, environment temperature and humidity can affect the results by altering the facial temperature distribution. However, the facial temperature distribution in thermal images is much more stable than the facial brightness distribution in RGB images. The reason is that, relative to the full dynamic range of the camera, the change of facial pixel values in thermal images under different environment temperatures is much smaller than the change in RGB images under different lighting conditions. Furthermore, hot objects in the background sometimes cause false alarms in face detection using thermal images, as shown in Figure 3.17 (f), but such cases are rare compared with the difficulty that fake faces cause for RGB images. In all the scenarios, face detection using thermal images outperformed face detection using RGB images. Last but not least, using thermal images to find facial regions is privacy-safe, since facial regions in thermal images look similar to one another. This means it is feasible to use a thermal camera to find facial regions in the optical-level anonymous image sensing system.
3.4 Summary and Limitations of the Proposed
effectively, and the best result comes from the one employing both of the proposed approaches. For the field experiment, the author showed the factors that affect face detection performance with both thermal and RGB images; furthermore, the author showed that face detection in thermal images is more robust than that in RGB images. The author also showed and discussed the feasibility of using thermal images in the optical-level anonymous image sensing system for privacy protection.
There are also some limitations to the proposed ideas. First, MB-ALBP and MB-ALTP are based on the assumption that human facial temperature is generally constant and higher than the background temperature. However, this assumption does not always hold. For example, in a winter outdoor environment, the facial temperature of people who have stayed outside for a long time approaches the environment temperature, which is much lower than the normal facial temperature. This phenomenon decreases the effectiveness of MB-ALBP and MB-ALTP. Second, thermal images are used for face detection because of their privacy protection property, since it is hard to recognize people's identities in them. However, when facial regions appear large enough in thermal images, some works, such as [69, 70], have still been proposed for detecting faces. This situation poses some difficulty for privacy protection using thermal images. Third, cascade classifiers with multiple feature types currently cannot match the speed of those with a single feature type. These limitations will be addressed in our future work.
Chapter 4
Abnormal Behavior Detection by C3D-AE
4.1 Introduction
This thesis discusses abnormal behavior detection within the scope of individual abnormal behaviors using RGB cameras. To detect abnormal behaviors, the most commonly adopted approaches [3, 45] consist of two steps: 1) feature extraction; 2) classification based on the extracted features.
For feature extraction, there are two methodologies for human-action-related applications using videos: 1) feature types manually designed by feature engineering, such as HOG/HOF [50] and HOG3D [51]; 2) feature types learned by deep neural networks [54]. The main advantage of manually designed feature types is that their working mechanism is generally easier to understand than that of learned feature types [71], because they are designed from expert experience and follow the human way of thinking. Their main problem is that their effectiveness for specific applications is hard to guarantee. To obtain more effective spatio-temporal feature types for videos, many deep neural networks have been proposed, such as 3D CNN [55] and C3D [18]. Among these networks, the authors of C3D showed its efficiency and generality, and released a pre-trained model for subsequent researchers. Since C3D extracts features with powerful descriptive ability, it is widely adopted in video-based human-action-related applications.
For classification based on the extracted features, a classifier needs to be built.
In the literature, individual abnormal behavior detection is generally solved by supervised learning with two classification strategies [72]: 1) two-class classification; 2) one-class classification. In two-class classification, a classifier is trained using both normal and abnormal samples with two pre-defined label types. In prediction, samples are directly classified as normal or abnormal. Employing the two-class classification strategy has two basic requirements [73, 74]: a) the normal and abnormal behaviors can be listed and defined clearly; b) the normal and abnormal data are properly sampled, which means there should be enough training samples covering all the listed normal and abnormal behaviors. In one-class classification, a classifier is trained using only normal samples with one pre-defined label type, so that the normal samples are well characterized by the classifier. In prediction, samples with characteristics similar to those of the training samples are deemed normal, while those with different characteristics are deemed out-of-normal [73]. For one-class classification, only the normal behaviors need to be well defined and sampled. For this reason, it is widely used in video-based abnormal behavior detection [58, 76, 77, 78, 79].
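To make the one-class strategy concrete, the following is a minimal sketch, not the classifier used in this thesis: it assumes a simple diagonal-Gaussian model of the normal class, and the class name `DiagonalGaussianOneClass`, the 95% quantile, and the synthetic data are illustrative choices only. The key point is that only normal samples are used in `fit`, and the decision boundary is derived from them alone.

```python
import numpy as np

class DiagonalGaussianOneClass:
    """Toy one-class classifier (hypothetical): models only the normal class."""

    def fit(self, X_normal, quantile=0.95):
        # Characterize the normal class by its per-dimension mean and spread.
        self.mean_ = X_normal.mean(axis=0)
        self.std_ = X_normal.std(axis=0) + 1e-8  # avoid division by zero
        # The decision boundary is set from normal samples alone:
        # here, an assumed quantile of their own scores.
        self.threshold_ = np.quantile(self.score(X_normal), quantile)
        return self

    def score(self, X):
        # Standardized squared distance to the center of the normal class.
        z = (X - self.mean_) / self.std_
        return np.sum(z ** 2, axis=1)

    def predict(self, X):
        # True = out-of-normal (deemed abnormal), False = normal.
        return self.score(X) > self.threshold_

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 8))  # synthetic normal samples only
X_test = np.vstack([rng.normal(0.0, 1.0, size=(5, 8)),   # normal-like
                    rng.normal(6.0, 1.0, size=(5, 8))])  # shifted, out-of-normal
clf = DiagonalGaussianOneClass().fit(X_train)
print(clf.predict(X_test))
```

The quantile controls how tightly the boundary fits the normal data; because no abnormal samples are seen during training, this choice cannot be checked against the abnormal side of the boundary.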
To adopt a suitable classification strategy for this thesis, the conditions of the application scenarios are analyzed. In the application scenarios of this thesis, abnormal behaviors vary much more than normal behaviors. For example, in the corridor scenario, normal behaviors are mostly walking in different manners or stopping, whereas abnormal behaviors take various forms, such as falling down in different directions, fighting, jumping, setting fires, and hitting the wall. It is hard to list all the possible abnormal behaviors. Furthermore, most daily behaviors are normal, while abnormal behaviors rarely happen.
It is not easy to collect enough abnormal samples covering all kinds of abnormal behaviors. These adverse conditions make it hard to meet the basic requirements of two-class classification. In this thesis, the author adopted one-class classification, since it only requires a clear definition of the normal samples, which is relatively easy. Out-of-normal samples in prediction are deemed abnormal. Also, it is more feasible to collect adequate normal samples covering the possible normal behaviors [57, 58, 59], since most behaviors are normal. The main problem in employing one-class classification [73, 75] is that only one side of the decision boundary is determined by the normal training samples. As a result, it is not easy to decide how tightly the decision boundary should fit around the normal class. In particular, if the decision boundary is non-convex and long, the required number of training samples is larger than with the two-class classification strategy [73, 75]. However, since most daily behaviors are normal, this issue can be mitigated to some extent by easily collecting a large amount of normal samples.
This chapter describes the structure of the proposed deep neural network and its training and prediction methods for abnormal behavior detection. The author proposed a neural network for abnormal behavior detection that combines a 3D convolutional neural network (C3D) for feature extraction with an autoencoder that detects abnormal behaviors using the one-class classification strategy. The input of the proposed network is a video clip with 16 frames. In the training stage, the author uses a pre-trained C3D as the feature extractor and trains the autoencoder using features extracted by the pre-trained C3D from video clips with normal behaviors. In the prediction stage, the author predicts abnormal behaviors by comparing the reconstruction errors of the autoencoder with a threshold. Since the autoencoder is trained using features extracted from videos with normal behaviors, features extracted from videos with abnormal behaviors will cause larger average reconstruction errors.
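The train-on-normal, threshold-the-reconstruction-error logic can be sketched as follows. This is an illustrative stand-in, not the proposed C3D-AE: the C3D features are replaced by synthetic vectors, and the autoencoder is replaced by a linear one built from the top principal directions of the normal features (a linear autoencoder trained with squared error spans the same subspace). The non-overlapping clip split in `clip_starts`, the class name `LinearAutoencoder`, the 99% quantile threshold, and all dimensions are hypothetical assumptions.

```python
import numpy as np

CLIP_LEN = 16  # the proposed network takes 16-frame clips as input

def clip_starts(num_frames, clip_len=CLIP_LEN):
    """Start indices of non-overlapping clips (overlap policy is an assumption)."""
    return list(range(0, num_frames - clip_len + 1, clip_len))

class LinearAutoencoder:
    """Hypothetical stand-in for the trained autoencoder: a linear
    encoder/decoder from the top-k principal directions of normal features."""

    def __init__(self, code_dim):
        self.code_dim = code_dim

    def fit(self, feats_normal):
        self.mu_ = feats_normal.mean(axis=0)
        # Rows of vt are principal directions; keep the first code_dim.
        _, _, vt = np.linalg.svd(feats_normal - self.mu_, full_matrices=False)
        self.W_ = vt[: self.code_dim].T  # shape (feat_dim, code_dim)
        return self

    def reconstruct(self, feats):
        code = (feats - self.mu_) @ self.W_  # encode
        return code @ self.W_.T + self.mu_   # decode

    def errors(self, feats):
        # Average squared reconstruction error per clip feature vector.
        return np.mean((feats - self.reconstruct(feats)) ** 2, axis=1)

rng = np.random.default_rng(0)
feat_dim, code_dim = 64, 8
# Synthetic "normal" clip features near an 8-dim subspace
# (placeholders for features a pre-trained C3D would produce).
basis = rng.normal(size=(code_dim, feat_dim))
normal = rng.normal(size=(400, code_dim)) @ basis \
    + 0.05 * rng.normal(size=(400, feat_dim))
ae = LinearAutoencoder(code_dim).fit(normal)  # trained on normal clips only

threshold = np.quantile(ae.errors(normal), 0.99)  # assumed threshold rule

test_normal = rng.normal(size=(3, code_dim)) @ basis \
    + 0.05 * rng.normal(size=(3, feat_dim))
test_abnormal = 3.0 * rng.normal(size=(3, feat_dim))  # off the normal subspace
print(ae.errors(test_normal) > threshold)
print(ae.errors(test_abnormal) > threshold)
print(clip_starts(50))
```

Here `clip_starts(50)` returns `[0, 16, 32]`. The abnormal vectors lie far from the subspace learned from normal features, so their reconstruction errors exceed the threshold by a wide margin. In the proposed network, a deep nonlinear autoencoder plays this role instead.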