JOURNAL OF SOUTHWEST JIAOTONG UNIVERSITY
Vol. 55 No. 2
Apr. 2020
ISSN: 0258-2724 DOI:10.35741/issn.0258-2724.55.2.23
Research article
Computer and Information Science
IMPROVED HYBRID SWARM INTELLIGENCE ALGORITHM FOR VOICE COMMAND RECOGNITION
Jamal Salahaldeen Majeed Alneamy, Ghada Mohammad Tahir Kasim Aldabagh
Software Department, College of Computers Science and Mathematics, University of Mosul Alkhatonia, Mosul, 41001, Iraq, [email protected]
Received: August 20, 2019 ▪ Review: September 18, 2019 ▪ Accepted: February 27, 2020
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/4.0)
Abstract
Swarm intelligence involves the aggregation of boids, which interact with each other in their own environment. The boids are agents that follow simple rules in the absence of a centralized structure. The artificial neural network, also known as a connectionist system, originated from the biological neural network. The aim of the present study is to improve the hybrid swarm intelligence algorithm for voice command recognition. The proposed hybrid algorithm combines the lion optimization algorithm with the fish swarm algorithm and was improved by using voice features. Ten voice commands were recorded: “open”, “close”, “open door”, “close door”, “open window”, “close window”, “on”, “off”, “play”, and “stop”.
Keywords: Artificial Neural Network, Swarm Intelligence, Hybrid Algorithm, Voice Recognition
I. INTRODUCTION
The artificial neural network (ANN) is known as a connectionist system that originated from the biological neural network. Applications of the ANN include image recognition and image processing. Around 20 years ago, there was significant interest in cellular automata. Cellular structure is a group-format representation based on cellular automata, and it leads further to the notion of a swarm.
Swarm intelligence (SI) was introduced by Beni and Wang in 1989 [1], [2], [3]. Swarm behavior is similar to aggregative motion. Initially, only biological researchers studied swarm behavior. More recently, engineers have taken a significant interest in swarm behavior and therefore in SI. SI is a concept based on artificial intelligence that combines the behavior of decentralized, self-organized systems, which may be artificial or natural. SI has been applied in various fields and plays an important role in the optimization of telecommunication engineering [4], automated traffic systems, military defense system applications [5], and robotic engineering [1], [6].
II. BASIC CONCEPT
SI involves the aggregation of the boids. The boids exist in their own environment without centralized structure, and interact with each other to follow simple rules. The concept of SI is seen in several natural examples such as the growth of bacteria, fish schooling, bird flocking, ant colonies, and many others. SI involves a generalized set of algorithms.
The swarm behavior model Boids is an artificial life program that was created by C. Reynolds in 1986 [7]. It imitates the flocking behavior of birds. Separation, alignment, and cohesion are the basic rules applied in the boids’ world. Complex behavior emerges from the interaction between individual boids.
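The three rules can be illustrated with a minimal two-dimensional sketch. This is not Reynolds' original program: the function name `boids_step`, the neighbor radius, and the rule weights are all hypothetical choices made for illustration.

```python
import math

def boids_step(pos, vel, radius=5.0, w_sep=0.05, w_ali=0.05, w_coh=0.01, dt=1.0):
    """Advance every boid by one time step using separation, alignment, cohesion.

    pos, vel: lists of (x, y) tuples. The radius and weights are illustrative
    assumptions, not values taken from any published model.
    """
    new_pos, new_vel = [], []
    for i, (p, v) in enumerate(zip(pos, vel)):
        # A boid only reacts to flockmates inside its neighborhood radius.
        neighbors = [j for j in range(len(pos))
                     if j != i and math.dist(p, pos[j]) < radius]
        ax = ay = 0.0
        if neighbors:
            n = len(neighbors)
            # Cohesion: steer toward the neighbors' center of mass.
            cx = sum(pos[j][0] for j in neighbors) / n
            cy = sum(pos[j][1] for j in neighbors) / n
            ax += w_coh * (cx - p[0])
            ay += w_coh * (cy - p[1])
            # Alignment: steer toward the neighbors' average velocity.
            mvx = sum(vel[j][0] for j in neighbors) / n
            mvy = sum(vel[j][1] for j in neighbors) / n
            ax += w_ali * (mvx - v[0])
            ay += w_ali * (mvy - v[1])
            # Separation: steer away from each nearby neighbor.
            for j in neighbors:
                ax += w_sep * (p[0] - pos[j][0])
                ay += w_sep * (p[1] - pos[j][1])
        vx, vy = v[0] + ax * dt, v[1] + ay * dt
        new_vel.append((vx, vy))
        new_pos.append((p[0] + vx * dt, p[1] + vy * dt))
    return new_pos, new_vel
```

No boid is told where the flock should go; each one reacts only to its local neighbors, which is the decentralized behavior described above.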
SI has two essential properties: self-organization and labor division. Self-organization is defined as the ability of a system to arrange its boids (i.e., agents or components) into an appropriate form without external help. Self-organization depends on several functions such as fluctuation, complex interaction, positive feedback, and negative feedback [8]. Feedback plays an important role in amplification and stabilization, while fluctuation is used to create randomness. Complex interaction is useful when swarms exchange data among themselves within their own search area. The labor division property of SI is defined as the synchronous execution of different feasible and simple duties by individuals. This property empowers the swarm to tackle complex problems that require individuals to work together [9].
Swarms are favorable because of their simplicity and reliability. Compared with centralized systems, a swarm requires simpler components. In systems-level robotics engineering, a swarm approach can help make robot parts modular, mass-produced, interchangeable, and disposable. A swarm is also highly reliable: it can be designed to endure different types of disturbances. Due to redundancy, a swarm is able to adapt to its working environment dynamically. A swarm acts like a massively parallel computational system, so it can carry out duties that reach beyond those of other robotic systems, such as centralized systems or complex robot systems. Thus, the swarm becomes a promising asset for robotics. SI algorithms include Ant Colony Optimization (ACO), Artificial Bee Colony (ABC), Genetic Algorithms (GA), Glowworm Swarm Optimization (GSO), Particle Swarm Optimization (PSO), Cuckoo Search Algorithm (CSA), and Differential Evolution (DE) [9]. All of these algorithms have shown the potential to solve various optimization problems.
SI algorithms are used in various forms of recognition software, such as image, speech, and voice recognition [10]. Hybrid swarm algorithms increase the reliability and efficiency of such software. The SI framework works as follows: initialize the population, define the pause or stop condition, evaluate the fitness function, update and move the agents, and return the global best solution [11]. Combining the basic SI algorithms with other techniques yields hybrid swarm algorithms that improve the optimization outcome [12], [13], [14]. For example, swarm with Viterbi is used for speech or voice recognition. Optimization of the hidden Markov model (HMM) with an evolutionary algorithm provides a solution to the voice tag problem. In this paper, the input voice is compared with the weighted output parameters. Neighboring parameters and the fitness function are analyzed to obtain the best solution. The PSO-Viterbi algorithm is routinely used because of its potential to provide a framework for future improvement. Feature extraction must characterize any speech or voice so that we can adjust the weight (the parameter within a neural network that transforms input data within the network's hidden layers) and compute the result by comparing the original voice with the weighted voice. For clearer output, we can run the algorithm for a number of iterations while varying the weighted parameter [15]. The aim of the present study is to improve the hybrid SI algorithm for voice command recognition.
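The generic SI framework listed above (initialize, stop condition, evaluate fitness, update and move, return the global best) can be sketched as a short loop. The movement rule used here, a random drift toward the current global best plus a small noise term, is a placeholder assumption and does not correspond to any specific named algorithm; the function name `swarm_minimize` and all parameter defaults are likewise hypothetical.

```python
import random

def swarm_minimize(fitness, dim, n_agents=20, max_iter=100,
                   bounds=(-5.0, 5.0), seed=0):
    """Generic swarm loop that minimizes `fitness` over a box-bounded space."""
    rng = random.Random(seed)
    lo, hi = bounds
    # 1. Initialize the population at random positions.
    agents = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    best = min(agents, key=fitness)[:]
    best_fit = fitness(best)
    # 2. Stop condition: a fixed iteration budget (an assumption).
    for _ in range(max_iter):
        for agent in agents:
            # 4. Update and move: drift toward the global best plus noise.
            for d in range(dim):
                agent[d] += rng.random() * (best[d] - agent[d]) + rng.uniform(-0.1, 0.1)
                agent[d] = min(hi, max(lo, agent[d]))
            # 3. Evaluate the fitness of the moved agent.
            f = fitness(agent)
            if f < best_fit:
                best, best_fit = agent[:], f
    # 5. Return the global best solution found.
    return best, best_fit

if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    print(swarm_minimize(sphere, dim=2))
```

Concrete algorithms such as PSO or the fish swarm algorithm differ mainly in how step 4 moves the agents.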
III. MATERIAL AND METHODS
A. Proposed Algorithm
The proposed hybrid algorithm combines the lion optimization algorithm with the fish swarm algorithm. It improved a previous hybrid algorithm using the features of the voices.
The proposed algorithm consists of four stages, which are as follows:
First stage: In this stage, we built the database to be used later in recognition. This stage consists of the following steps:
Step 1: The commands were given and the voices were recorded. In the present study, 10 voice commands were recorded. They are shown in Figure 1.
Figure 1. The database of command voices
Step 2: About twenty samples were recorded for each voice command.
Second stage: In this stage, the input command voice was recorded.
Third stage: This is known as the “feature extraction stage.” In this stage, the features of the database and input samples were extracted.
B. Analysis of the Features
The features we calculated are as follows:
1) Summation of Each Sample Value’s Analysis
Sum value (K) = \sum_{i=1}^{N} x_K(i)    (1)
where x_K(i) = the i-th value of sample K, N = length of sample, K = number of sample.
2) Calculate the Standard Deviation of Each Sample Value
Standard value (K) = (2)
where N = length of sample, K = number of sample.
3) Calculate the Mean Features of Each Sample Value
Mean value (K) = \frac{1}{N} \sum_{i=1}^{N} x_K(i)    (3)
where x_K(i) = the i-th value of sample K, N = length of sample, K = number of sample.
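The three features can be computed for one recorded sample as in the sketch below. The function name `extract_features` is hypothetical, and the standard deviation is assumed to use the population form (division by N); the study does not state which normalization was used, so this choice is an assumption.

```python
import math

def extract_features(sample):
    """Return (sum, standard deviation, mean) for one sample of length N."""
    n = len(sample)
    total = sum(sample)                      # feature 1: summation
    mean = total / n                         # feature 3: mean
    # Feature 2: standard deviation. Assumption: population form (divide by N).
    std = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return (total, std, mean)

if __name__ == "__main__":
    # Made-up amplitude values standing in for a recorded voice sample.
    print(extract_features([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

Each command recording is thus reduced to a three-element feature vector, which is much cheaper to compare than the raw signal.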
Fourth stage: In this stage, a new version of the hybrid algorithm applied the extracted features to recognize the command voice. The proposed algorithm used the lion algorithm to calculate the visual value for the fish algorithm.
X_{new} = X_{old} + visual \times rand(0 to 1)    (4)
where X_{new} = the new location of the current fish, X_{old} = the old location of the current fish, visual = the visual distance, rand(0 to 1) = random number generated between (0,1).
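A minimal sketch of the position update in Eq. (4) follows. It assumes the usual fish swarm form in which the random number scales the visual distance, and it draws an independent random number for each coordinate; both points are assumptions. The `visual` value is passed in as a plain number because the way the lion algorithm computes it is not detailed here, and the name `fish_move` is hypothetical.

```python
import random

def fish_move(old_location, visual, rng=random):
    """Move a fish from its old location by up to `visual` per coordinate.

    Assumption: new = old + visual * rand(0, 1), applied to each coordinate
    with an independent random draw.
    """
    return [x + visual * rng.random() for x in old_location]

if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so repeated runs give the same move
    print(fish_move([0.0, 1.0, 2.0], visual=0.5, rng=rng))
```

A larger `visual` lets a fish explore farther in one step, while a smaller one refines the search near its current location.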
The minimum Euclidean distance fitness function is used to recognize the voice command.
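The minimum Euclidean distance criterion can be sketched as a nearest-neighbor lookup over the feature database. The exhaustive scan below only illustrates the fitness criterion; in the proposed method the swarm performs the search. The function name `recognize` and the database layout (command name mapped to a list of feature vectors) are hypothetical.

```python
import math

def recognize(input_features, database):
    """Return (command, distance) for the stored vector closest to the input.

    database: dict mapping a command name to a list of feature vectors.
    """
    best_cmd, best_dist = None, math.inf
    for cmd, vectors in database.items():
        for vec in vectors:
            d = math.dist(input_features, vec)  # Euclidean distance
            if d < best_dist:
                best_cmd, best_dist = cmd, d
    return best_cmd, best_dist

if __name__ == "__main__":
    # Made-up (sum, std, mean) vectors, for illustration only.
    db = {"open": [[1.0, 0.1, 0.5]], "close": [[4.0, 0.3, 2.0]]}
    print(recognize([1.1, 0.1, 0.5], db))
```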
IV. RESULTS
The proposed hybrid algorithm provides many advantages over the fish and lion optimization SI algorithms, mainly with respect to recognition time and accuracy. Two aspects of the proposed algorithm were tested and compared with the earlier algorithms: the time needed to recognize the voice command and the accuracy of recognition of the voice command. Table 1 shows the results of the experiment conducted to determine the recognition time needed by the hybrid algorithm without feature extraction compared with the fish and lion optimization algorithms.
Table 1.
Voice command | Fish recognition time (sec) | Lion optimization recognition time (sec) | Hybrid algorithm recognition time (sec) | Proposed features hybrid algorithm recognition time (sec)
Open | 0.004092 | 0.005049 | 0.00321 | 0.000395
Close | 0.004172 | 0.005063 | 0.00320 | 0.000367
Open door | 0.004998 | 0.005697 | 0.00419 | 0.000435
Close door | 0.004999 | 0.005299 | 0.003933 | 0.000455
Open window | 0.004938 | 0.005538 | 0.003981 | 0.000501
Close window | 0.005001 | 0.005741 | 0.004033 | 0.000441
On | 0.004996 | 0.005548 | 0.004012 | 0.000467
Off | 0.005540 | 0.006290 | 0.004812 | 0.000441
Play | 0.005006 | 0.005655 | 0.004611 | 0.000423
Stop | 0.00302 | 0.00316 | 0.00249 | 0.000578
Table 1 also shows the results of the experiment conducted to determine the recognition time needed by the proposed features hybrid algorithm compared with the standard fish and lion optimization algorithms. As shown in Table 1, the time the hybrid algorithm took to recognize the voice commands is less than the time taken by the standard fish and lion optimization algorithms. That difference is small, whereas the proposed features hybrid algorithm is faster than the earlier algorithms by a large margin (roughly 4 to 11 times faster than the hybrid algorithm in Table 1).
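The size of the gap can be checked directly from Table 1. The snippet below is an illustrative calculation only: the variable names and the speedup ratio are not part of the original study, and the values are copied from the hybrid and proposed features hybrid columns of Table 1.

```python
# Recognition times in seconds copied from Table 1:
# (hybrid algorithm, proposed features hybrid algorithm).
table1 = {
    "Open": (0.00321, 0.000395),
    "Close": (0.00320, 0.000367),
    "Open door": (0.00419, 0.000435),
    "Close door": (0.003933, 0.000455),
    "Open window": (0.003981, 0.000501),
    "Close window": (0.004033, 0.000441),
    "On": (0.004012, 0.000467),
    "Off": (0.004812, 0.000441),
    "Play": (0.004611, 0.000423),
    "Stop": (0.00249, 0.000578),
}

# Speedup = hybrid time / proposed time for each command.
speedups = {cmd: hybrid / proposed for cmd, (hybrid, proposed) in table1.items()}

for cmd, ratio in speedups.items():
    print(f"{cmd:<13} {ratio:5.1f}x")
```

The smallest ratio belongs to "Stop" and the largest to "Off", so the speedup varies noticeably between commands.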
Table 2 presents the recognition results of each sample using the standard fish and lion optimization algorithm and the hybrid and features hybrid algorithms after removing silent moments using Matlab10.
Table 2.
Result determining the recognition time by using standard fish, lion optimization, hybrid and features hybrid proposed algorithm
Voice command | Fish recognition time (sec) | Lion optimization recognition time (sec) | Hybrid algorithm recognition time (sec) | Proposed features hybrid algorithm recognition time (sec)
Open | 0.00304 | 0.00319 | 0.00191 | 0.000022
Close | 0.00276 | 0.00279 | 0.00120 | 0.000047
Open door | 0.00371 | 0.00457 | 0.00419 | 0.0000215
Close door | 0.00413 | 0.005198 | 0.003933 | 0.000026
Open window | 0.00493 | 0.005538 | 0.003981 | 0.000043
Close window | 0.00471 | 0.004941 | 0.00403 | 0.000084
On | 0.00143 | 0.00343 | 0.00312 | 0.000023
Off | 0.00240 | 0.00470 | 0.00112 | 0.000026
Play | 0.00496 | 0.004995 | 0.00351 | 0.0000308
Stop | 0.00102 | 0.00203 | 0.00123 | 0.0000251
Figures 2 to 11 show the input voice command signals and the signals that were recognized.
Figure 2. Open voice command
Figure 4. Open door voice command
Figure 5. Close door voice command
Figure 6. Open window voice command
Figure 7. Close window voice command
Figure 8. On voice command
Figure 9. Off voice command
Figure 10. Play voice command
Figure 11. Stop voice command
V. DISCUSSION
Beni and Wang (1989) [1] reported on swarm algorithms, focusing on their uses, their advantages, and their importance in intelligence. They related SI to robotic engineering [1]. Garg et al. [3] reported on SI related to ant colony optimization, based on previous research.
Wahab et al. reported a comparative analysis of various types of algorithms through experiments conducted using well-known benchmark functions [9]. They explained the basic steps of a generic algorithm briefly and with diagrams. They carried out many statistical tests to determine significant differences in performance. The results were reported diagrammatically to explain the overall advantage of differential evolution (DE) and particle swarm optimization (PSO) [9]. The artificial ant colony and its optimization were considered. The particle swarm algorithm, cuckoo search algorithm, and glowworm swarm optimization were also explained with examples [9]. Some researchers reported analyses inspired by nature, such as ant and bee colonies, along with the application of many SI algorithms [16]. Much research has been conducted on SI-based algorithms [16]. Over the last two decades, about 9000 research studies have become available on insect- and animal-based algorithms. Some reports were published on nature-inspired optimization algorithms [17].
Speech recognition has a higher complexity (recognition of input voice and translation of spoken language into text) and a broad application range. Shukla et al. [18] explained speech recognition in which predictions were obtained using optimization techniques to redesign the artificial neural network. They compared three different algorithms to check their performance for the highest accuracy of voice recognition. All three produced good results, with an average accuracy of 95.3%. A similar report on voice recognition was published by Kaur et al. [19]. Our results are in accordance with these reports.
Swarm algorithms based on HMM have been reported in the literature [20]. These reports discussed improved particle SI algorithms for enhanced voice recognition. They suggested a novel voice recognition technique based on a particle SI algorithm and vector quantization. They produced better results than those generated by Shukla et al., with an accuracy of 97.14%. Some researchers obtained 97.8% accuracy in speech recognition using an advanced feature extraction technique along with the particle SI algorithm [21].