JOURNAL OF SOUTHWEST JIAOTONG UNIVERSITY
Vol. 54 No. 6
Dec. 2019
ISSN: 0258-2724 DOI:10.35741/issn.0258-2724.54.6.31
Research Article
Computer Science and Image Processing
HISTOGRAM OF GRADIENT AND LOCAL BINARY PATTERN WITH EXTREME LEARNING MACHINE BASED EAR RECOGNITION
Ahmed Kawther Hussein
Department of Computer Science, College of Education, Mustansiriyah University, Palestine St., P.O. Box 14022, Baghdad, Iraq
Abstract
Ear recognition is an attractive research topic in the area of biometrics. It involves building machine learning models that verify the identities of humans from their ears. This article explores the performance of ear recognition with two features, local binary pattern and histogram of gradient, on the well-known USTB dataset. The finding is that the two features perform similarly in terms of accuracy but differ in the number of false predictions. The histogram of gradient based extreme learning machine achieved an accuracy of 99.86%, while the local binary pattern based extreme learning machine achieved 99.59%.
Keywords: Biometrics, Ear Recognition, Extreme Learning Machine, Local Binary Pattern, Histogram of Gradient
I. INTRODUCTION
Human identification is crucial to the operation of many of today's technologies and services. A whole field concerned with technologies that perform human identification has emerged, called biometrics [1].
In biometrics, various traits are used for human identification: fingerprints, voice, the iris of the eye, gait, etc. The special interest in biometrics stems from its practicality for consumers and from the difficulty of hacking it in comparison to passwords and cards [2]. One of the most challenging and unique applications of human identification is ear based identification. The ear has unique physiological and structural aspects. For example, in surveillance systems, detecting and identifying an ear is more feasible than detecting and identifying an eye. Another application of ear recognition is the identification of twins [3]. In some models, ears are fused with other parts of the body or other human characteristics to enable multimodal human identification [2], [3], [4].
The development of an ear based identification system goes through several phases: image pre-processing, ear detection, feature design and extraction, and classifier building. Each phase has its challenges. Image pre-processing is concerned with image enhancement, with handling parts of the body or other objects that block the ear, such as hair or earrings [5], and with handling illumination changes [12]. Ear detection is crucial for determining the region of interest (ROI). Ear recognition itself relies on machine learning approaches [6]. Feature design and extraction is therefore regarded as the core of any work related to ear identification [7], and classifier building and training are top priorities in ear recognition systems [1].
The literature contains a wide range of features for vision based recognition systems. Histogram of Oriented Gradients (HoG) and Local Binary Pattern (LBP) are two well-known features that perform well in many pattern recognition applications [8]. In this article, we use HoG and LBP for ear based identification.
II. RELATED WORKS
The literature contains numerous approaches to ear based identification. In [1], a convolutional neural network with a deep topology was designed for human ear recognition. The approach selects the optimal activation function with the goal of preventing over-fitting. The number of feature maps, the learning rate, and the other network parameters are configured during training, and the human ear recognition test is then conducted on the trained network model.
In [9], local color texture descriptors were presented and evaluated across several color spaces, with a Support Vector Machine (SVM) as the classifier. The authors concluded that BSIF-RGB shows promising performance.
A pipeline for ear recognition was proposed in [10]. It is composed of ear detection, ear feature extraction, and matching, and thus resembles a complete pattern recognition pipeline. The pipeline makes it possible to use arbitrary images of subjects taken in different environments and to identify the subjects based only on their ears. In [11], block based principal component analysis (PCA) was used to decide whether an ear is occluded and to recognize the subject's ear. In [12], a multiple-scale Faster R-CNN (faster region-based convolutional neural network) was used to detect ears in 2D profile images. In [13], eight types of features are extracted, including Local Binary Pattern, Local Phase Quantization, Binarized Statistical Image Features, Patterns of Oriented Edge Magnitudes, histogram of gradient, dense scale-invariant feature transform, and Gabor features. The authors also provided a tool for extracting these features from any ear dataset. In [14], the use of light field cameras for ear recognition was proposed; the authors argued that, because light field cameras are being commercialized, ear recognition based on them is needed. They therefore compiled the Lenslet Light Field Ear Database, which consists of 536 light field images of 67 subjects in 4 different poses taken with a Lytro ILLUM lenslet light field camera. The database contains challenging cases such as ear images partly occluded by piercings, earrings, hair, or a combination of these. Their novel light field based ear recognition solution takes advantage of the richer spatio-angular information.
III. METHODOLOGY
Our goal is to compare the performance of two classifiers for ear recognition: one trained on HoG features and the other trained on LBP features. An extreme learning machine classifier was used in both cases. This section describes the methodology of the study.
A. LBP Features
Local binary pattern is an image feature that concentrates on extracting the texture information in an image [16]. In LBP, a binary code is generated for each pixel to represent the relative change in intensity between the pixel and its surrounding pixels. Next, the frequency of occurrence of each binary code is represented by a histogram. The histogram provides a compact representation of the texture patterns in the image. There are several variants of LBP: typical LBP, uniform LBP, and completed LBP.
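The description above can be illustrated with a minimal sketch of the typical (basic) LBP variant. It assumes a 3x3 neighbourhood, a "greater than or equal" comparison, and a clockwise bit order starting at the top-left neighbour; these choices, the use of NumPy, and the function names `lbp_codes` and `lbp_histogram` are illustrative assumptions and are not taken from the article.

```python
import numpy as np

def lbp_codes(image):
    """Return the 8-neighbour LBP code of every interior pixel."""
    image = np.asarray(image, dtype=np.int64)
    rows, cols = image.shape
    center = image[1:-1, 1:-1]
    # Neighbour offsets (dy, dx) in clockwise order, starting at the top-left.
    # The bit order is an assumption; any fixed order gives a valid LBP.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = image[1 + dy:rows - 1 + dy, 1 + dx:cols - 1 + dx]
        # Set the bit when the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes

def lbp_histogram(image):
    """Normalised 256-bin histogram of LBP codes, used as the feature vector."""
    codes = lbp_codes(image)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# A single 3x3 patch has exactly one interior pixel (value 6 here).
patch = np.array([[5, 9, 1],
                  [4, 6, 7],
                  [2, 8, 3]])
code = int(lbp_codes(patch)[0, 0])   # neighbours 9, 7 and 8 set bits 1, 3 and 5
feature = lbp_histogram(patch)
```

For a real ear image the histogram is usually computed per block and the block histograms are concatenated; whether the article does so is not stated.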
B. HoG Feature
The HoG feature is a histogram of the gradient orientations within a zone of the image. The zone may be described in either Cartesian or polar coordinates. Rotation differences of the stroke within the zone are then compensated by normalization: the dominant gradient orientation is assigned to the first bin of the histogram. This normalization is important for obtaining a good match between the original histogram and the one it is compared with.
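A minimal sketch of such a zone histogram is given below. It assumes a rectangular (Cartesian) zone, 9 unsigned orientation bins over 0 to 180 degrees, magnitude-weighted voting, and a final L2 normalization; none of these parameters are specified in the article, and the function name `zone_orientation_histogram` is hypothetical.

```python
import numpy as np

def zone_orientation_histogram(zone, n_bins=9):
    """Magnitude-weighted orientation histogram of one zone,
    rotated so that the dominant orientation occupies the first bin."""
    zone = np.asarray(zone, dtype=float)
    gy, gx = np.gradient(zone)                 # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    # Unsigned gradient orientation in [0, 180) degrees.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((angle / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    # Normalization step from the text: circularly shift the histogram so that
    # the bin with the highest gradient energy becomes the first bin.
    hist = np.roll(hist, -int(np.argmax(hist)))
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A horizontal ramp and the same ramp rotated by 90 degrees
# yield the same descriptor after the shift.
ramp = np.tile(np.arange(8.0), (8, 1))
h_horizontal = zone_orientation_histogram(ramp)
h_vertical = zone_orientation_histogram(ramp.T)
```

The circular shift is what makes the two ramps comparable: without it, their energy would fall into different bins.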
C. Extreme Learning Machine (ELM)
An ELM is a feed-forward neural network with a single hidden layer that is trained using the least mean square error criterion [14]. Training starts with a random initialization of the input-to-hidden weights, after which the output matrix of the hidden layer is computed. The hidden-to-output weights are then found using the Moore-Penrose inverse. Compared with the usual gradient based training, this approach trains faster and is less prone to getting trapped in local minima.
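The training steps just described can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the article: the class name `ELMClassifier`, the Gaussian weight initialization, the one-hot target encoding, and the default of 100 hidden neurons are all assumptions. The sigmoid activation follows the choice reported later in the article.

```python
import numpy as np

class ELMClassifier:
    """Single-hidden-layer ELM: random input weights, closed-form output weights."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the randomly projected inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot target matrix T (one column per class).
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Step 1: random input-to-hidden weights and biases (never trained).
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Step 2: hidden-layer output matrix H.
        H = self._hidden(X)
        # Step 3: least-squares output weights via the Moore-Penrose inverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        H = self._hidden(np.asarray(X, dtype=float))
        return self.classes_[np.argmax(H @ self.beta, axis=1)]

# Toy usage on the XOR problem, which is not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
model = ELMClassifier(n_hidden=20).fit(X, y)
predictions = model.predict(X)
```

Because no iterative optimization is involved, the cost of training is dominated by a single pseudo-inverse computation.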
IV. PROCEDURE FOR BUILDING AN EAR BASED RECOGNITION SYSTEM
In this section, we state the procedure for building an ear based recognition system using LBP and HoG. The procedure is provided in Table 1.
Table 1.
Procedure of building an ear based recognition system
Input
Dataset
The number of hidden neurons
The activation function
Output
Accuracy
Start
1. Divide the data into training and testing sets
2. Extract HoG and LBP features
3. Normalize the features
4. Train an ELM on the HoG features (ELM-HoG) using the number of hidden neurons and the activation function
5. Train an ELM on the LBP features (ELM-LBP) using the number of hidden neurons and the activation function
6. Test ELM-HoG using the testing data
7. Test ELM-LBP using the testing data
8. Return the testing accuracy
End
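The data handling steps of Table 1 (splitting, normalization, and accuracy computation) can be sketched as below. The article does not state the split ratio or the normalization scheme, so holding out one image per subject and using min-max scaling fitted on the training set are assumptions, as are the helper names and the random placeholder features.

```python
import numpy as np

def split_per_subject(labels, n_test_per_subject=1, seed=0):
    """Hold out a fixed number of images of every subject for testing."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for subject in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == subject))
        test_idx.extend(idx[:n_test_per_subject])
        train_idx.extend(idx[n_test_per_subject:])
    return np.array(train_idx), np.array(test_idx)

def min_max_normalize(train, test):
    """Scale each feature to [0, 1] using training statistics only."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0            # constant features: avoid division by zero
    return (train - lo) / span, (test - lo) / span

def accuracy(predicted, truth):
    """Fraction of predictions that match the ground truth."""
    return float(np.mean(np.asarray(predicted) == np.asarray(truth)))

# 60 subjects with 3 images each, matching the size of the dataset.
labels = np.repeat(np.arange(60), 3)
# Placeholder feature vectors; real ones would be HoG or LBP histograms.
features = np.random.default_rng(1).random((180, 16))
train_idx, test_idx = split_per_subject(labels)
train_x, test_x = min_max_normalize(features[train_idx], features[test_idx])
```

Fitting the scaling on the training set alone keeps information about the test images out of the model.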
V. DATASET
The dataset includes 180 images of 60 subjects, both students and teachers from USTB [15], acquired in 3 sessions in July and August 2002. The database includes images of the right ear of each subject. In each session, the images were captured under different lighting and rotation conditions. Examples of images from the dataset are shown in Figure 1.
Figure 1. Example of four images from USTB dataset
VI. RESULTS AND DISCUSSION
In order to evaluate both HoG and LBP features, an ELM with 10000 hidden neurons was created. The sigmoid function was used as the activation function. The values predicted by the ELM were compared with the ground truth for each type of feature. The results are shown in Figures 2 and 3. In addition, we show the confusion matrices in Figures 4 and 5. It was observed that the accuracy of HoG (99.86%) was superior to that of LBP (99.59%). From the confusion matrices, we observed that the HoG-based and LBP-based predictions shared one false identification out of the total number of false identifications. For further exploration, we plotted the accuracy of 10 experiments on LBP and HoG features with the corresponding mean values in Figure 6. It turned out that the mean accuracy of HoG is slightly higher than that of LBP. In order to determine whether the difference is statistically significant, we conducted a t-test on the two sets of accuracies. The t-test returned 8.05974E-08, which, read as a p-value, is far below conventional significance thresholds. This indicates that HoG is superior to LBP.
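The article does not state which form of the t-test was applied. As an illustration only, the sketch below computes Welch's two-sample t statistic and its degrees of freedom with the Python standard library; the function name `welch_t` and the accuracy lists are placeholders and are not the measurements behind Figure 6.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    va = variance(a) / len(a)        # squared standard error of the mean of a
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof

# Placeholder accuracies of repeated runs (NOT the values reported in the article).
hog_runs = [0.9990, 0.9985, 0.9988, 0.9983, 0.9987]
lbp_runs = [0.9960, 0.9955, 0.9962, 0.9958, 0.9957]
t_stat, dof = welch_t(hog_runs, lbp_runs)
```

Converting the statistic into a p-value requires the cumulative distribution of Student's t; with SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both the statistic and the p-value directly.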
Figure 2. The predicted values vs. ground truth for ELM trained on LBP
Figure 3. The predicted values vs. ground truth for ELM trained on HoG
Figure 4. The confusion matrix for extreme learning machine and LBP
Figure 5. The confusion matrix for extreme learning machine and HoG
Figure 6. The accuracy of 10 experiments for LBP and HoG features based ELM (plotted series: lbp, hog, meanlbp, meanhog)
VII. CONCLUSION
Ear based recognition aims to identify people from images of their ears. It attracts researchers because of its potential for high accuracy and the feasibility of deploying it in various environments such as security systems. In this article, we found that both HoG-ELM and LBP-ELM are highly effective and accurate combinations of features and classifier. The resulting accuracy on the USTB dataset is more than 99%. Future work is to explore the performance of other types of features and to use 3D ear images as input.
ACKNOWLEDGMENT
The author would like to thank Mustansiriyah University (www.uomustansiriyah.edu.iq), Baghdad, Iraq, for its support in the present work.