Future Works - Study on Human Related Analysis in Privacy Protected Videos

classification based on the extracted features. C3D-AE consists of a 3D convolutional neural network for feature extraction and an autoencoder for modeling normal behaviors; it is trained with a one-class learning method. For the 3D convolutional neural network, the pre-trained C3D network [18] is used, which was pre-trained on Sports-1M [19]. The fully connected layers of C3D are removed, and the remaining 3D convolutional and pooling layers are connected to an autoencoder. This autoencoder is trained on the features that the remaining C3D layers extract from video clips of normal behaviors. At prediction time, a video clip is propagated through C3D-AE, and the reconstruction error of the autoencoder is compared with a threshold to predict abnormal behaviors. The author performed three experiments: 1) a hold-out validation experiment; 2) a field test in a corridor scenario; 3) a field test in a passageway scenario. In the hold-out validation experiment, the author captured a dataset from 22 participants containing seven kinds of normal behaviors and three kinds of abnormal behaviors. The author first showed the effectiveness of C3D-AE and then compared C3D-AE with other methods using videos with and without facial masks. In the two field tests, the author showed that the proposed C3D-AE is applicable and robust for abnormal behavior detection in real scenarios.
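The decision step summarized above, comparing the autoencoder's reconstruction error with a threshold, can be illustrated with a minimal sketch. It assumes the features and their reconstructions are already available as plain lists of numbers. The function names and the rule for choosing the threshold (mean plus k standard deviations of the errors on normal clips) are hypothetical and are not taken from the thesis.

```python
from statistics import mean, stdev


def reconstruction_error(features, reconstruction):
    """Mean squared error between a feature vector and its reconstruction."""
    return mean((f - r) ** 2 for f, r in zip(features, reconstruction))


def fit_threshold(normal_errors, k=3.0):
    """Hypothetical rule: mean + k * std of the errors on normal training clips."""
    return mean(normal_errors) + k * stdev(normal_errors)


def is_abnormal(error, threshold):
    """A clip is flagged as abnormal when its error exceeds the threshold."""
    return error > threshold
```

In this sketch, a clip whose features the autoencoder reconstructs poorly (large error) is flagged, which mirrors the one-class idea that only normal behaviors are seen during training.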

Nowadays, deep learning is changing the landscape of computer vision. We also intend to use the new dataset to train CNN models for face detection in thermal images and to compare the results with those obtained by the approaches in this thesis.

Abnormal behavior detection: In this thesis, the proposed abnormal behavior detection approach is tested only on individuals in corridor and indoor passageway environments. We believe the approach can also be used in more situations, so testing it in more scenarios is worthwhile future work.

Furthermore, the variations of abnormal behaviors in the current dataset are limited. However, increasing the variations of behaviors in the dataset is not a straightforward task, since the definition of normal and abnormal behaviors depends heavily on the application scenario. For example, jumping in a corridor is abnormal in most situations, whereas jumping on a playground is mostly normal. An algorithm that can automatically adapt itself to different situations is also a good direction for future work.

Acknowledgments

First of all, I would like to express my sincere gratitude to Prof. Taniguchi, Prof. Shimada, Prof. Nagahara and Prof. Uchiyama for their continuous help during my Ph.D. study. With their insightful suggestions and invaluable encouragement, I was able to keep a clear direction throughout the research and the writing of this thesis.

I would like to give my sincere thanks to Prof. Morooka of my thesis committee for his insightful comments and help. Special thanks go to Mr. Trung, who helped me improve my programming and writing abilities.

Second, I sincerely appreciate the support from the China Scholarship Council (CSC) and the care of the Consulate General of the People’s Republic of China in Fukuoka. Thanks to the financial support from the CSC, I could focus on my research without worrying about living expenses.

Third, I am also grateful to all my colleagues and the secretary in LIMU for four wonderful and memorable years with them. With their help, I was able to capture the datasets used in this thesis.

Last but not least, I would like to express my deepest thanks to my wife and my parents for all their love and encouragement throughout my life.

References

[1] B. C. Welsh, D. P. Farrington, Surveillance for crime prevention in public space: Results and policy choices in Britain and America, Criminology & Public Policy 3 (3) (2004) 497–526.

[2] D. Lyon, Surveillance society: Monitoring everyday life, McGraw-Hill Education, UK, 2001.

[3] A. Al-Dhamari, R. Sudirman, N. H. Mahmood, Abnormal behavior detection in automated surveillance videos: A review, Journal of Theoretical & Applied Information Technology 95 (19) (2017) 5245–5263.

[4] D. Lyon, Surveillance as social sorting: Privacy, risk, and digital discrimination, Routledge, Oxford, United Kingdom, 2002.

[5] F. Pittaluga, S. J. Koppal, Pre-capture privacy for small vision sensors, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (11) (2017) 2215–2226.

[6] P. K. Atrey, M. S. Kankanhalli, A. Cavallaro, Intelligent multimedia surveillance, Springer, Berlin, Germany, 2016.

[7] E. M. Newton, L. Sweeney, B. Malin, Preserving privacy by de-identifying face images, IEEE Transactions on Knowledge and Data Engineering 17 (2) (2005) 232–243.

[8] F. Peng, X. W. Zhu, M. Long, An ROI privacy protection scheme for H.264 video based on FMO and chaos, IEEE Transactions on Information Forensics and Security 8 (10) (2013) 1688–1699.

[9] A. Senior, S. Pankanti, A. Hampapur, L. Brown, Y. L. Tian, A. Ekin, J. Connell, C. F. Shu, M. Lu, Enabling video privacy through computer vision, IEEE Security & Privacy 3 (3) (2005) 50–57.

[10] H. Sohn, W. Deneve, Y. M. Ro, Privacy protection in video surveillance systems: Analysis of subband-adaptive scrambling in JPEG XR, IEEE Transactions on Circuits and Systems for Video Technology 21 (2) (2011) 170–177.

[11] K. Martin, K. N. Plataniotis, Privacy protected surveillance using secure visual object coding, IEEE Transactions on Circuits and Systems for Video Technology 18 (8) (2008) 1152–1162.

[12] I. Martínez-Ponte, X. Desurmont, J. Meessen, J. F. Delaigle, Robust human face hiding ensuring privacy, in: Proceedings of the International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), Montreux, Switzerland, 2005, Vol. 4.

[13] D. Chen, Y. Chang, R. Yan, J. Yang, Tools for protecting the privacy of specific individuals in video, EURASIP Journal on Advances in Signal Processing 2007 (1) (2007) 075427.

[14] Y. Zhang, Y. Lu, H. Nagahara, R. I. Taniguchi, Anonymous camera for privacy protection, in: Proceedings of the 22nd International Conference on Pattern Recognition (ICPR), Stockholm, Sweden, 2014, pp. 4170–4175.

[15] Y. Iwashita, S. Takaki, K. Morooka, T. Tsuji, R. Kurazume, Abnormal behavior detection using privacy protected videos, in: Proceedings of the IEEE Fourth International Conference on Emerging Security Technologies (EST), Cambridge, United Kingdom, 2013, pp. 55–57.

[16] L. Zhang, R. Chu, S. Xiang, S. Liao, S. Z. Li, Face detection based on multi-block LBP representation, in: Proceedings of the International Conference on Biometrics (ICB), Crystal City, VA, USA, 2007, pp. 11–18.

[17] OpenCV-Team, OpenCV 2.4.9.0 documentation, https://docs.opencv.org/2.4.9/modules/objdetect/doc/cascade_classification.html#featureevaluator-calcord, (accessed on 1 April 2018).

[18] D. Tran, L. Bourdev, R. Fergus, L. Torresani, M. Paluri, Learning spatiotemporal features with 3D convolutional networks, in: Proceedings of the 15th International Conference on Computer Vision (ICCV), Santiago, Chile, 2015, pp. 4489–4497.

[19] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, L. Fei-Fei, Large-scale video classification with convolutional neural networks, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, USA, 2014, pp. 1725–1732.

[20] P. S. Kumar, G. P. Babu, Intelligent multimedia data: data + indices + inference, Multimedia Systems 6 (6) (1998) 395–407.

[21] E. Bertino, J. Fan, E. Ferrari, M. S. Hacid, A. K. Elmagarmid, X. Zhu, A hierarchical access control model for video database systems, ACM Transactions on Information Systems 21 (2) (2003) 155–191.

[22] K. Chinomi, N. Nitta, Y. Ito, N. Babaguchi, Prisurv: privacy protected video surveillance system using adaptive visual abstraction, in: Proceedings of the International Conference on Multimedia Modeling (MMM), Kyoto, Japan, 2008, pp. 144–154.

[23] Verizon, Data breach investigations report, http://www.verizonenterprise.com/verizon-insights-lab/dbir/2017, (accessed on 1 April 2018).

[24] P. Korshunov, A. Melle, J. L. Dugelay, T. Ebrahimi, Framework for objective evaluation of privacy filters, in: Proceedings of SPIE Applications of Digital Image Processing XXXVI, San Diego, USA, 2013, Vol. 8856, p. 88560T.

[25] F. Pittaluga, A. Zivkovic, S. J. Koppal, Sensor-level privacy for thermal cameras, in: Proceedings of the IEEE International Conference on Computational Photography (ICCP), Evanston, USA, 2016, pp. 1–12.

[26] T. Winkler, B. Rinner, Sensor-level security and privacy protection by embedding video content analysis, in: Proceedings of the 18th International Conference on Digital Signal Processing (DSP), Fira, Greece, 2013, pp. 1–6.

[27] J. Fernández-Berni, R. Carmona-Galán, R. del Río, R. Kleihorst, W. Philips, A. Rodríguez-Vázquez, Focal-plane sensing-processing: A power-efficient approach for the implementation of privacy-aware networked visual sensors, Sensors 14 (8) (2014) 15203–15226.

[28] T. Hamada, K. Benkrid, K. Nitadori, M. Taiji, A comparative study on ASIC, FPGAs, GPUs and general purpose processors in the O(N^2) gravitational N-body simulation, in: Proceedings of the NASA/ESA Conference on Adaptive Hardware and Systems (AHS), San Francisco, USA, 2009, pp. 447–452.

[29] K. Keutzer, S. Malik, A. R. Newton, From ASIC to ASIP: The next design discontinuity, in: Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD), Freiburg, Germany, 2002, pp. 84–90.

[30] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Kauai HI, USA, 2001, Vol. 1, pp. 511–518.

[31] S. Zafeiriou, C. Zhang, Z. Zhang, A survey on face detection in the wild: past, present and future, Computer Vision and Image Understanding 138 (2015) 1–24.

[32] K. Reese, Y. Zheng, A. Elmaghraby, A comparison of face detection algorithms in visible and thermal spectrums, in: Proceedings of the Int’l Conf. on Advances in Computer Science and Application, 2012.

[33] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 2005, Vol. 1, pp. 886–893.

[34] H. X. Jia, Y. J. Zhang, Fast human detection by boosting histograms of oriented gradients, in: Proceedings of the Fourth International Conference on Image and Graphics (ICIG), Chengdu, China, 2007, pp. 683–688.

[35] Y. S. Salas, D. V. Bermudez, A. M. L. Peña, D. G. Gomez, T. Gevers, Improving HOG with image segmentation: Application to human detection, in: Proceedings of the 14th International Conference on Advanced Concepts for Intelligent Vision Systems, Brno, Czech Republic, 2012, pp. 178–189.

[36] T. Ojala, M. Pietikäinen, D. Harwood, A comparative study of texture measures with classification based on featured distributions, Pattern Recognition 29 (1) (1996) 51–59.

[37] X. Tan, B. Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions, IEEE Transactions on Image Processing 19 (6) (2010) 1635–1650.

[38] S. Liao, X. Zhu, Z. Lei, L. Zhang, S. Z. Li, Learning multi-scale block local binary patterns for face recognition, in: Proceedings of the International Conference on Biometrics (ICB), Crystal City, VA, USA, 2007, pp. 828–837.

[39] X. Jia, X. Yang, Y. Zang, N. Zhang, R. Dai, J. Tian, J. Zhao, Multi-scale block local ternary patterns for fingerprints vitality detection, in: Proceedings of the International Conference on Biometrics (ICB 2013), Phuket, Thailand, 2013, pp. 1–6.

[40] X. Wang, T. X. Han, S. Yan, An HOG-LBP human detector with partial occlusion handling, in: Proceedings of the 12th International Conference on Computer Vision (ICCV 2009), Kyoto, Japan, 2009, pp. 32–39.

[41] Y. Jiang, J. Ma, Combination features and models for human detection, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Boston, Massachusetts, USA, 2015, Vol. 1, pp. 240–248.

[42] T. Mita, T. Kaneko, O. Hori, Joint Haar-like features for face detection, in: Proceedings of the IEEE 10th International Conference on Computer Vision (ICCV), Beijing, China, 2005, Vol. 2, pp. 1619–1626.

[43] C. Xia, S. F. Sun, P. Chen, H. Luo, F. M. Dong, Haar-like and HOG fusion based object tracking, in: Proceedings of the Pacific Rim Conference on Multimedia (PCM), Kuching, Malaysia, 2014, pp. 173–182.

[44] J. Rafferty, J. Synnott, C. Nugent, G. Morrison, E. Tamburini, Fall detection through thermal vision sensing, in: Proceedings of the 10th International Conference on Ubiquitous Computing and Ambient Intelligence (UCAmI), Gran Canaria, Spain, 2016, pp. 84–90.

[45] M. Mubashir, L. Shao, L. Seed, A survey on fall detection: Principles and approaches, Neurocomputing 100 (2013) 144–152.

[46] E. E. Stone, M. Skubic, Fall detection in homes of older adults using the Microsoft Kinect, IEEE Journal of Biomedical and Health Informatics 19 (1) (2015) 290–301.

[47] R. Mehran, A. Oyama, M. Shah, Abnormal crowd behavior detection using social force model, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Miami, 2009, pp. 935–942.

[48] O. E. Rojas, C. L. Tozzi, Abnormal crowd behavior detection based on Gaussian mixture model, in: Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 2016, pp. 668–675.

[49] V. Mahadevan, W. Li, V. Bhalodia, N. Vasconcelos, Anomaly detection in crowded scenes, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, USA, 2010, pp. 1975–1981.

[50] I. Laptev, M. Marszałek, C. Schmid, B. Rozenfeld, Learning realistic human actions from movies, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Alaska, USA, 2008, pp. 1–8.

[51] A. Kläser, M. Marszałek, C. Schmid, A spatio-temporal descriptor based on 3D-gradients, in: Proceedings of the 19th British Machine Vision Conference (BMVC 2008), Leeds, United Kingdom, 2008, Vol. 275, pp. 1–10.

[52] H. Wang, A. Kläser, C. Schmid, C. L. Liu, Dense trajectories and motion boundary descriptors for action recognition, International Journal of Computer Vision 103 (1) (2013) 60–79.

[53] H. Wang, C. Schmid, Action recognition with improved trajectories, in: Proceedings of the IEEE 14th International Conference on Computer Vision (ICCV), Sydney, Australia, 2013, pp. 3551–3558.

[54] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436.

[55] S. Ji, W. Xu, M. Yang, K. Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 221–231.

[56] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation 1 (4) (1989) 541–551.

[57] D. M. Tax, R. P. Duin, Support vector domain description, Pattern Recognition Letters 20 (11-13) (1999) 1191–1199.

[58] C. Piciarelli, G. L. Foresti, Surveillance-oriented event detection in video streams, IEEE Intelligent Systems 26 (3) (2011) 32–41.

[59] S. Hawkins, H. He, G. Williams, R. Baxter, Outlier detection using replicator neural networks, in: Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery (DaWaK), Aix-en-Provence, France, 2002, pp. 170–180.

[60] M. J. Wilber, V. Shmatikov, S. Belongie, Can we still avoid automatic face detection?, in: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2016, pp. 1–9.

[61] M.-Y. Cho, Y.-S. Jeong, Face recognition performance comparison of fake faces with real faces in relation to lighting, Journal of Internet Services and Information Security 4 (4) (2014) 82–90.

[62] M. Vollmer, K.-P. Möllmann, Infrared thermal imaging: fundamentals, research and applications, John Wiley & Sons, 2017.

[63] S. Ariyaratnam, J. P. Rood, Measurement of facial skin temperature, Journal of Dentistry 18 (5) (1990) 250–253.

[64] G. Hermosilla, J. Ruiz-del Solar, R. Verschae, M. Correa, A comparative study of thermal face recognition methods in unconstrained environments, Pattern Recognition 45 (7) (2012) 2445–2459.

[65] S. Wang, Z. Liu, S. Lv, Y. Lv, G. Wu, P. Peng, F. Chen, X. Wang, A natural visible and infrared facial expression database for expression recognition and emotion inference, IEEE Transactions on Multimedia 12 (7) (2010) 682–691.

[66] L. Karlinsky, M. Dinerstein, D. Levi, S. Ullman, Combined model for detecting, localizing, interpreting and recognizing faces, in: Proceedings of the Workshop on Faces in ’Real-Life’ Images: Detection, Alignment, and Recognition, Palais des Congres Parc Chanot, France, 2008.

[67] J. Howse, Training detectors and recognizers in Python and OpenCV, in: Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Munich, Germany, 2014, pp. 1–2.

[68] S. Puttemans, E. Can, T. Goedemé, Improving open source face detection by combining an adapted cascade classification pipeline and active learning, in: Proceedings of the VISAPP 2017, Porto, Portugal, March 2017.

[69] D. Socolinsky, A. Selinger, J. Neuheisel, Face recognition with visible and thermal infrared imagery, Computer Vision and Image Understanding 91 (1-2) (2003) 72–114.

[70] X. Chen, P. J. Flynn, K. Bowyer, IR and visible light face recognition, Computer Vision and Image Understanding 99 (3) (2005) 332–358.

[71] R. Setiono, W. K. Leow, J. Y. L. Thong, Opening the neural network black box: an algorithm for extracting rules from function approximating artificial neural networks, in: Twenty First International Conference on Information Systems, Brisbane, Queensland, Australia, 2000.

[72] O. P. Popoola, K. Wang, Video-based abnormal human behavior recognition: A review, IEEE Transactions on Systems, Man, and Cybernetics 42 (6) (2012) 865–878.

[73] S. S. Khan, M. G. Madden, A survey of recent trends in one class classification, in: Proceedings of 20th Irish Conference on Artificial Intelligence and Cognitive Science, Dublin, Ireland, 2009, pp. 188–197.

[74] A. M. Bartkowiak, Anomaly, novelty, one-class classification: a comprehensive introduction, International Journal of Computer Information Systems and Industrial Management Applications 3 (1) (2011) 61–71.

[75] D. Tax, One class classification, Delft University of Technology, (2001).

[76] D. H. Hu, X. X. Zhang, Y. Jie, V. W. Zheng, Y. Qiang, Abnormal activity recognition based on HDP-HMM models, in: International Joint Conference on Artificial Intelligence, Hainan Island, China, 2009.

[77] Y. Jie, Y. Qiang, J. J. Pan, Sensor-based abnormal human-activity detection, IEEE Transactions on Knowledge and Data Engineering 20 (8) (2008) 1082–1090.

[78] M. Hasan, J. Choi, J. Neumann, A. K. Roy-Chowdhury, L. S. Davis, Learning temporal regularity in video sequences, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2014, pp. 733–742.

[79] J. Sun, S. Jie, C. He, Abnormal event detection for video surveillance using deep one-class learning, Multimedia Tools and Applications (3) (2017) 1–15.

[80] G. E. Hinton, R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.

[81] R. Leyva, V. Sanchez, C.-T. Li, The LV dataset: A realistic surveillance video dataset for abnormal event detection, in: Proceedings of the 5th International Workshop on Biometrics and Forensics (IWBF 2017), Coventry, United Kingdom, 2017, pp. 1–6.

[82] A. Adam, E. Rivlin, I. Shimshoni, D. Reinitz, Robust real-time unusual event detection using multiple fixed-location monitors, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (3) (2008) 555–560.

[83] C. Lu, J. Shi, J. Jia, Abnormal event detection at 150 fps in MATLAB, in: Proceedings of the IEEE 14th International Conference on Computer Vision (ICCV), Sydney, Australia, 2013, pp. 2720–2727.

[84] J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research 12 (2011) 2121–2159.

[85] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), Chia Laguna Resort, Sardinia, Italy, 2010, pp. 249–256.

[86] L. V. D. Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579–2605.

[87] THOTH, idt codes, http://lear.inrialpes.fr/people/wang/dense_trajectories, (accessed on 1 April 2018).

[88] Scikit-learn, Scikit-learn home page, http://scikit-learn.org/stable, (accessed on 1 April 2018).

[89] Y. S. Lee, W. Y. Chung, Visual sensor based abnormal event detection with moving shadow removal in home healthcare applications, Sensors 12 (1) (2012) 573–584.

[90] C. Rougier, J. Meunier, A. St-Arnaud, J. Rousseau, Fall detection from human shape and motion history using video surveillance, in: Proceedings of the 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW), Niagara Falls, Canada, 2007, Vol. 2, pp. 875–880.

Appendix A

Introduction to Optical Level Anonymous Image Sensing System

In Chapter 1, the author introduced the concept of an optical level anonymous image sensing system for privacy protection. This appendix describes the prototype design of the optical level anonymous image sensing system proposed by our lab in [14]. First, the hardware design is described. The device used as the SLM is a Liquid Crystal on Silicon (LCoS) device, which is a reflective device. With this device, the light rays traveling from the scene to the RGB image sensor are controlled. Second, the calibration process proposed in [14] is described. The purpose of the calibration process is to find the positional relationship between the facial regions detected in the thermal image and the masks displayed on the LCoS device. Third, the face masking software is described.

A.1 Hardware

Our lab built a prototype of the optical level anonymous image sensing system according to the concept introduced in Figure 1.1. The system configuration is given in Figure A.1 (a). The prototype system consists of three main parts: a thermal camera, an LCoS camera and a cold mirror. The LCoS camera functions as the combination of the RGB camera and the SLM device in Figure 1.1. The prototype system is shown in Figure A.1 (b).

The cold mirror in our implementation is made of germanium (Ge). This material efficiently transmits long-wave infrared light: its transmittance for wavelengths from 8 µm to 14 µm is higher than 80%, which guarantees imaging for the thermal camera behind it. It also reflects the whole visible light spectrum, and its reflectance of visible light is adequate for imaging by the RGB camera. Using the cold mirror, the system can obtain the thermal and the RGB images simultaneously.

The thermal camera in our implementation is a PI450 from Optris with a 62° FOV lens. This thermal camera is sensitive to wavelengths from 8 µm to 14 µm. The resolution of the camera is 382 (H) × 288 (V). When the camera is set to raw image mode, it records thermal images over the temperature range from −20 °C to 100 °C. This temperature range is adequate for face detection.

Component                 Type             Parameters          Manufacturer
Objective lens            HF9HA-1B         9 mm, F1.4          Fujinon
Relay lenses              -                35 mm, F2, Φ11      Lens-ya
Polarizing beam splitter  Part No. 49002   -                   Edmund Optics
LCoS device               HED 5216         1280×768 @ 60 fps   HOLOEYE
RGB camera                GS3-U3-28S5C-C   1920×1440 @ 26 fps  FLIR

Table A.1: Components used in the LCoS camera.

The LCoS camera and its principle are shown in Figure A.1 (c) and (d), respectively. As Figure A.1 (d) shows, the LCoS camera is composed of an objective lens, an LCoS device, an RGB camera, a beam splitter and several relay lenses. Table A.1 lists all the components of the LCoS camera. All of them are commercially available, so the camera can be reproduced easily. The principle of the LCoS camera is explained by its optical paths (Figure A.1 (d)). First, the rays from the scene pass through the objective lens and form an image on a virtual image plane. This image then reaches a polarizing beam splitter, which reflects S-polarized light rays and transmits P-polarized light rays. The reflected S-polarized rays are imaged on the LCoS by a relay lens. Each pixel of the LCoS device can change the polarization of the light it reflects by an arbitrary angle. If we display 255 for a pixel on the LCoS, the polarization is rotated by 90 degrees into P-polarization, so the P-polarized rays reflected by this pixel pass through the beam splitter and are imaged on the RGB sensor by a relay lens. If we display 0 for a pixel on the LCoS, the polarization remains the same, and the rays reflected by this pixel cannot pass through the beam splitter. To block the light rays from the facial regions, we display 255 for pixels in the non-facial regions and 0 for pixels in the facial regions. With these masks displayed on the LCoS, the rays from the faces in the scene cannot reach the RGB sensor, while the other rays can. In the image formed on the RGB image sensor plane, facial regions appear completely black, since no light is received in these regions. As a result, the images output by the RGB camera are privacy enhanced with facial masks.
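The masking rule described above (255 for non-facial pixels, 0 for facial pixels) can be sketched in a few lines. This is only an illustration, assuming that the facial regions are already given as rectangles in LCoS coordinates; the function name and the rectangle format (x, y, width, height) are hypothetical.

```python
def make_lcos_mask(width, height, face_boxes):
    """Build a row-major LCoS mask: 255 lets light through, 0 blocks it.

    face_boxes is a list of (x, y, w, h) rectangles in LCoS pixel coordinates
    (a hypothetical format chosen for this sketch).
    """
    mask = [[255] * width for _ in range(height)]
    for x, y, w, h in face_boxes:
        # Clip each rectangle to the display area before blanking it.
        for row in range(max(0, y), min(height, y + h)):
            for col in range(max(0, x), min(width, x + w)):
                mask[row][col] = 0
    return mask
```

For example, `make_lcos_mask(1280, 768, [(600, 300, 80, 100)])` would blank one 80×100 region on a display with the resolution listed in Table A.1.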

Figure A.1: The hardware prototype of the optical level anonymous image sensing system. (a) Configuration of the prototype system, where the LCoS camera functions as the combination of the RGB camera and SLM device in Figure 1.1. (b) Our implementation of the prototype system. (c) LCoS camera from the system in (b). (d) The principle of the LCoS camera, shown by its optical paths.

A.2 System Calibration

After the hardware is constructed, the system needs to be calibrated. The method for system calibration was proposed in [14]. The system has three image planes: the planes of the RGB and thermal image sensors, and the display plane of the LCoS device. When the system works, it first detects the facial regions in the thermal images. Then the masks are displayed on the LCoS device, which prevents the rays from the facial regions from entering the RGB camera. This process shows that, once the facial regions are found in the thermal image, we need to know the corresponding locations on the LCoS device at which to display the masks.

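We use homographies to model the relationship between the image planes. A homography is a 3×3 matrix that relates any two images of the same planar surface in the scene. Suppose we have points p_α and p_β on the image planes α and β, respectively, and that both are images of the same point in the scene. The homography H_αβ relates these two points by p_β = H_αβ p_α, where the points are written in homogeneous coordinates. Further, we can write the specific form as follows:

\begin{equation}
\begin{bmatrix} \omega x_\beta \\ \omega y_\beta \\ \omega \end{bmatrix}
=
\begin{bmatrix}
H_{\alpha\beta}^{11} & H_{\alpha\beta}^{12} & H_{\alpha\beta}^{13} \\
H_{\alpha\beta}^{21} & H_{\alpha\beta}^{22} & H_{\alpha\beta}^{23} \\
H_{\alpha\beta}^{31} & H_{\alpha\beta}^{32} & H_{\alpha\beta}^{33}
\end{bmatrix}
\cdot
\begin{bmatrix} x_\alpha \\ y_\alpha \\ 1 \end{bmatrix}.
\tag{A.1}
\end{equation}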

Theoretically, if we can find at least four corresponding point pairs on the two planes, we can estimate the homography.
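As an illustration of how a homography can be estimated from four or more point pairs, the following sketch fixes the bottom-right entry of the matrix to 1 and solves for the other eight entries by linear least squares (normal equations). It is a minimal example under these assumptions, written with the Python standard library only. The function names are hypothetical, and this is not necessarily the procedure used in [14].

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_homography(src, dst):
    """Least-squares homography mapping src to dst, with H33 fixed to 1."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each point pair gives two linear equations in the 8 unknowns.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    # Normal equations: (A^T A) h = A^T b.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, point):
    """Map a 2D point through h and divide by the homogeneous scale."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

With raw pixel coordinates the normal equations can become poorly conditioned, so in practice the coordinates are usually normalized first, or a library routine such as OpenCV's `cv2.findHomography` is used instead.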

Our goal is to find the homography H_TL between the thermal camera and the LCoS device, so that after face detection in the thermal images, we know the corresponding locations on the LCoS plane at which to display the masks. However, H_TL is hard to calculate directly, because the pattern displayed on the LCoS cannot be captured by the thermal camera. Instead, we use the relationship:

\begin{equation}
H_{TL} = H_{LR}^{-1} \cdot H_{TR}. \tag{A.2}
\end{equation}

Here H_LR is the homography between the image displayed on the LCoS and the RGB image, and H_TR is the homography between the thermal and RGB images. The relationship between the homographies of the three image planes is shown in Figure A.2 (a). Using Equation (A.2), we can calculate H_TL indirectly from H_LR and H_TR.
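Equation (A.2) can be checked numerically with a small sketch. The code below composes the thermal-to-LCoS homography from two given 3×3 matrices and maps a thermal pixel to the LCoS plane. The helper names and the example matrices are hypothetical and chosen only so that the result is easy to verify by hand.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate and the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[value / det for value in row] for row in adj]


def thermal_to_lcos(h_lr, h_tr):
    """H_TL = H_LR^{-1} * H_TR, as in Equation (A.2)."""
    return matmul3(inverse3(h_lr), h_tr)


def map_point(h, point):
    """Map a 2D point through h and divide by the homogeneous scale."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

For instance, if H_TR is a pure translation by (5, 3) and H_LR is a uniform scaling by 2, a thermal pixel at (10, 20) first maps to (15, 23) in the RGB image and then to (7.5, 11.5) on the LCoS plane.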

Figure A.2: The system calibration process. (a) The homographies between the three image planes. (b) The calibration board used for finding the homography between the thermal and the RGB images. (c) Calibration board captured by the RGB camera. (d) The same positioned calibration board captured by the thermal camera. (e) Checkerboard pattern displayed on the LCoS device. (f) Checkerboard pattern captured by the RGB camera from the LCoS device, which displays the checkerboard pattern in (e).

To estimate H_TR, we created a calibration board with holes, as shown in Figure A.2 (b). We installed one light bulb behind each hole. We chose light bulbs whose emission covers both the visible and the thermal IR bands, so that the holes can be captured by the RGB and thermal cameras at the same time. Figure A.2 (c) and (d) show images of the calibration board, in the same position, captured by the RGB and the thermal cameras in the system. From these images, we manually clicked the positions of the corresponding points p_T(x_T, y_T) and p_R(x_R, y_R). Then H_TR was estimated from these corresponding points by the least squares method.

To estimate H_LR, we displayed a checkerboard pattern on the LCoS and captured it with the RGB camera in the system. Figure A.2 (e) and (f) show the calibration pattern displayed on the LCoS and the pattern captured by the RGB camera, respectively. The locations of the corner points of the displayed checkerboard pattern are already known in the LCoS coordinate system. The corner points in the RGB image were detected with the corner detection function of OpenCV 2.4.9. All the corresponding points p_L(x_L, y_L) and p_R(x_R, y_R) were used to estimate H_LR by the least squares method.

A.3 Face Masking Software

The software for the system is designed according to the working process of the prototype system. The flowchart is shown in Figure A.3 (a).

To find the facial regions in thermal images, two methods were implemented: 1) a thresholding based method; 2) a face detection based method. The thresholding based method was introduced in [14]. Two thresholds, T_low and T_high, were set, and we assumed that the facial temperature lies within the range between them. The pixels within this range are then segmented as facial regions. The face detection based method uses the best-performing mixed features described in Chapter 3. Notably, when the face detection based method is used to find facial regions, the shape of the masks can be quite flexible, and different mask shapes can be adopted in different applications. The mask shape is changed by adding post-processing after the face detection in the thermal images. Three example mask shapes are demonstrated in Figure A.3 (c).
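The thresholding based method can be sketched as follows, assuming the thermal image is available as a 2D list of temperatures in °C. The function names are hypothetical, and any threshold values passed in (such as 30 and 38 in the example below) are illustrative assumptions, not the values used in [14].

```python
def segment_faces(thermal, t_low, t_high):
    """Mark pixels whose temperature lies in [t_low, t_high] as facial."""
    return [[t_low <= t <= t_high for t in row] for row in thermal]


def bounding_box(mask):
    """Return (x, y, width, height) enclosing all marked pixels, or None."""
    points = [(x, y)
              for y, row in enumerate(mask)
              for x, hit in enumerate(row) if hit]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1
```

Calling `bounding_box(segment_faces(image, 30.0, 38.0))` gives one rectangle around all pixels in the assumed facial temperature range. A real implementation would typically separate connected regions so that several faces yield several masks.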

To speed up the system, the whole algorithm was implemented using parallel programming. The algorithm is divided into three subtasks: subtask 1 acquires the thermal images and finds the facial regions in them; subtask 2 generates the masks and displays them on the LCoS device; subtask 3 outputs the RGB images to the user. Three threads were used to run the three subtasks. Subtask 2 relies on subtask 1, since the facial regions must be found before the masks can be generated. Subtask 3 relies on subtask 2, since the RGB images can be output only after the masks are displayed.

We set two mutexes for the communication between the subtasks. Mutex 1 is between subtask 1 and subtask 2 and stores the locations of the detected facial regions. After subtask 1 finds the locations of the facial regions in the thermal image, it writes these locations to mutex 1. Subtask 2 then reads these locations from mutex 1, generates the corresponding masks and displays them on the LCoS device. Mutex 2 is between subtask 2 and subtask 3 and stores a flag indicating that subtask 2 has finished displaying the masks on the LCoS. Subtask 3 continuously reads from mutex 2; if the flag is set, it outputs the RGB frame to the user. Figure A.3 (b) shows the implementation of the parallel programming.
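The hand-off between the three subtasks can be sketched with Python threads. This is an illustration of the described scheme rather than the prototype's code. All names (`Mailbox`, `run_pipeline`, `find_faces`, `show_masks`, `grab_rgb`) are hypothetical. The sketch also assumes that a writer waits until the previous value has been read and that a stop marker ends the run, neither of which is stated in the text.

```python
import threading
import time

_EMPTY = object()  # marks a slot with no unread value
STOP = object()    # hypothetical end-of-stream marker

class Mailbox:
    """A single shared value guarded by a mutex, accessed by polling."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = _EMPTY

    def put(self, value):
        while True:
            with self._lock:
                if self._value is _EMPTY:
                    self._value = value
                    return
            time.sleep(0.0005)  # slot still full, poll again

    def take(self):
        while True:
            with self._lock:
                if self._value is not _EMPTY:
                    value, self._value = self._value, _EMPTY
                    return value
            time.sleep(0.0005)  # nothing to read yet, poll again

def run_pipeline(thermal_frames, find_faces, show_masks, grab_rgb):
    locations = Mailbox()  # plays the role of "mutex 1"
    displayed = Mailbox()  # plays the role of "mutex 2"
    outputs = []

    def subtask1():  # find facial regions in each thermal frame
        for frame in thermal_frames:
            locations.put(find_faces(frame))
        locations.put(STOP)

    def subtask2():  # generate and display the masks, then set the flag
        while True:
            locs = locations.take()
            if locs is STOP:
                displayed.put(STOP)
                return
            show_masks(locs)
            displayed.put(True)

    def subtask3():  # output an RGB frame once the flag is set
        while True:
            if displayed.take() is STOP:
                return
            outputs.append(grab_rgb())

    threads = [threading.Thread(target=t) for t in (subtask1, subtask2, subtask3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outputs
```

In CPython the global interpreter lock limits CPU parallelism between threads, so the sketch shows the synchronisation pattern only, not the speed-up.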

Although the speed of the system depends heavily on the hardware, an evaluation result is still worth giving for reference. The time of each subtask was estimated by averaging over a duration of 15 minutes. For the estimation, the author used a PC with an i7-2600K CPU and 16 GB of memory. For subtask 1, the averaged time was about 2.25 ms when the thresholding based method was used. When the face detection based method was used, we downsized the thermal images from 382(H)*288(V) to 191(H)*144(V) to increase the speed, and the averaged time was 32 ms. For subtask 2, the averaged time was 33.5 ms. The averaged time for subtask 3 was 3.5 ms. The author also measured the frame rates using the parallel and serial programming methods. The detection program was implemented on the Windows 7 operating system, which is not a real-time system. As a result, the frame rate is determined not only by the time consumption of the subtasks but also by the scheduling algorithm of the operating system and the transmission delays of the thermal camera, the RGB camera and the LCoS device. With the parallel programming method, the frame rates of the thresholding based and face detection based methods were 24 fps and 20 fps, respectively. For comparison, with the serial programming method, the frame rates of the thresholding based and face detection based methods were 15 fps and 10 fps, respectively. For both methods, parallel programming thus achieved a much higher frame rate than serial programming.
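The measured frame rates can be put in context with a rough upper-bound calculation from the averaged subtask times. The sketch below assumes an idealised model that is not part of the original evaluation: in serial execution the three subtask times add up, whereas in a fully pipelined execution the throughput is limited by the slowest subtask. Scheduling and transmission delays are ignored, and the function name `ideal_rates` is hypothetical.

```python
def ideal_rates(t1_ms, t2_ms, t3_ms):
    """Return (serial_fps, parallel_fps) upper bounds for the given
    per-frame subtask times in milliseconds."""
    serial_fps = 1000.0 / (t1_ms + t2_ms + t3_ms)        # subtasks run back to back
    parallel_fps = 1000.0 / max(t1_ms, t2_ms, t3_ms)     # slowest stage dominates
    return serial_fps, parallel_fps

# Averaged times reported above (subtask 1 depends on the method).
thresholding = ideal_rates(2.25, 33.5, 3.5)
face_detection = ideal_rates(32.0, 33.5, 3.5)
```

Under this model the bounds are about 25.5 fps (serial) and 29.9 fps (parallel) for the thresholding based method, and about 14.5 fps (serial) and 29.9 fps (parallel) for the face detection based method. The measured frame rates all lie below these bounds, which is consistent with the additional scheduling and transmission delays mentioned above.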

Figure A.3: The software design for the system. (a) The flowchart of the face masking algorithm used in the prototype system. (b) The parallel programming timeline. Subtask 1 finds the locations of the detected facial regions and writes them to Mutex 1. Subtask 2 reads these locations from Mutex 1 and displays the masks on the LCoS device, then writes the flag to Mutex 2, which indicates the completion of the mask displaying. Subtask 3 continuously reads from Mutex 2; if the flag is set, it outputs the RGB frame to the user. The detailed description can be found in the text. (c) Three different mask patterns as examples, which could be adopted in different applications.

A.4 The Evaluation for Privacy Enhancement in Real Scenario

Figure A.4: The postures of one participant standing at 2 m for testing the privacy enhancement in the real scenario. (a) The participant slightly rotated his body to the right. (b) The participant slightly looked up. (c) The participant slightly rotated his body to the left. (d) The participant slightly tilted his head to the right. (e) The participant looked forward. (f) The participant slightly tilted his head to the left. (g) The participant slightly looked down.

The author evaluated the privacy enhancement performance of the optical level anonymous image sensing system in a real scenario. This performance depends on the ability to find facial regions in the thermal images: if the facial regions are found correctly, the privacy can be enhanced. The author therefore evaluated the thresholding based and face detection based methods used for finding facial regions. The author asked ten participants to stand at distances from 1 m to 5 m in steps of 1 m. At each distance, the author captured a video of about 1 minute with each of the two methods for finding the facial regions. During the capture, the author asked the participants to change their face postures by looking up, right, down and left, and also to rotate their bodies; the postures are shown in Figure A.4. The author then checked the videos to analyze the privacy enhancement performance.

When the thresholding based method was used for finding facial regions in thermal images, the faces were well protected 100% of the time at all the distances tested. The reason is that the temperature range of facial regions is quite stable, so the thresholding based method can correctly find the pixels within this range. Figure A.5 (a) shows the output frames from the system with frontal faces from 1 m to 5 m, and Figure A.5 (b) shows a participant standing at 5 m with a rotated body. The drawback of this method is that some non-facial pixels whose temperature is similar to that of facial regions were also masked. This limitation is shown in Figure A.5 (e): some skin regions, such as hands, and some background objects are also masked. In conclusion, the thresholding based method is more suitable for situations where strict privacy protection is required and the working distance is long, since it can hide all the facial regions.

When the face detection based method was used for finding facial regions, the best combination of mixed features introduced in Chapter 3 was adopted. To shorten the detection time, the downsized thermal images were used for finding facial regions. The effective working range was found to be around 3 m. The output frames from the system with frontal faces from 1 m to 3 m are shown in Figure A.5 (c). For each distance, the recall and precision of the face detection were measured and are shown in Figure A.5 (d). The graph shows that, as with all face detection based methods, not all faces in the thermal images can be detected, and thus the faces in the RGB images cannot be hidden 100% of the time by the system.
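For reference, recall and precision follow the usual definitions based on counts of correct detections, false detections and missed faces. The sketch below shows only the arithmetic. The function name `precision_recall` is hypothetical, and the criterion for counting a detection as correct is not specified in this section, so it is left to the caller.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).
    Returns 0.0 for a ratio whose denominator is zero."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall
```

A recall below 1 at a given distance means that some faces were missed in the thermal images and therefore remained unmasked in the RGB output.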

The advantage of the face detection based method is its flexibility in generating mask shapes with a simple post-processing step; different mask shapes are shown in Figure A.3 (c). Furthermore, the privacy enhanced images are visually tidier than those of the thresholding based method, because unrelated objects such as hands and background objects are not masked (see Figure A.5 (c)). In conclusion, the face detection based method can be used in short-distance applications where visual tidiness is important or flexibility of the facial masks is required.
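The post-processing that turns a detected face into a mask of a chosen shape can be sketched as follows. The rectangle and ellipse used here are illustrative choices and are not necessarily the patterns shown in Figure A.3 (c). The sketch assumes that the detector returns a bounding box (x, y, w, h) in pixel coordinates; the function name `make_mask` is hypothetical.

```python
import numpy as np

def make_mask(image_shape, box, style="rectangle"):
    """Return a boolean mask of size image_shape = (rows, cols) that is
    True inside the chosen shape fitted to box = (x, y, w, h)."""
    rows, cols = image_shape
    x, y, w, h = box
    yy, xx = np.mgrid[0:rows, 0:cols]  # per-pixel row and column indices
    if style == "rectangle":
        return (xx >= x) & (xx < x + w) & (yy >= y) & (yy < y + h)
    if style == "ellipse":
        # Ellipse centred on the box with semi-axes w/2 and h/2.
        cx, cy = x + (w - 1) / 2.0, y + (h - 1) / 2.0
        return ((xx - cx) / (w / 2.0)) ** 2 + ((yy - cy) / (h / 2.0)) ** 2 <= 1.0
    raise ValueError("unknown mask style: " + style)
```

Because the shape is derived only from the bounding box, other shapes can be added without changing the detection step.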
