# Hardware Implementation of Particle Swarm Optimization and its Application for Adaptive Signal Processing Waseda University Doctoral Dissertation

全文

(2)

(3) Abstract As one of the most well-known optimization algorithm, Particle Swarm Optimization (PSO) attracts more and more attention and presents its talented potential in recent years. Strongly nature-inspired by the swarm intelligence, PSO, proposed by Kennedy and Eberhart in 1995 for the first time, is conceived as a simulation of animal flocking, learning and sharing information when a group of insects or birds seek for food in a search space. Owing to the feature of group communicating and learning like animals, PSO is first used to machine learning field in the early stage. In pace with the algorithm improving and the superior performance springing up, PSO is popularly introduced to solve complex nonlinear problems, such as data mining, biometric, circuit synthesis, economic analysis, power system optimization and positioning system, starting after 2000. Recently, in an attempting to employ PSO in above system with real-time requirements, its hardware implementation with high calculating speed and high stability based on large scale integration (LSI) goes to a significant subject. Especially, as the PSO application on adaptive signal processing and mobile system becomes worldwide attention-focused, a general purpose hardware architecture which can deal with complex nonlinear issue is sincerely expected. First, the processing speed is very important for real-time system. Furthermore, the stability is also a necessary performance. Existing PSO software is too time-consuming and cannot provide the real-time processing speed. To meet this challenge, an attractive subject is to implement PSO in. i.

(4) hardware. Thus, a novel hardware implementation of PSO with high speed and stability is sincerely desired. Second, the conventional 2-dimension PSO cannot satisfy real-time application for complex nonlinear control. Thus, a hardware implementation of multi-dimension PSO is urgently needed to control the multi-parameter of complex. application.. A. general. purpose. hardware. architecture. of. multi-dimension PSO is expected in this subject as a fundamental hardware platform. It can create a new solution for complex nonlinear control with real-time requirements. Third, as a classic subject of complex nonlinear control, adaptive filter has flexibly changeable coefficients to be regulated. Conventional algorithms of adaptive filter are difficult to be realized in hardware with a high adaptive speed. Therefore, the proposed hardware of multi-dimension PSO are applied for adaptive filter as a particular application and pioneer design. Chapter 1 [Introduction] This chapter briefly introduces the concept and the background of the PSO. The current research status and development of PSO are presented as an outstanding evolutionary algorithm. However, there still are some challenges for PSO development. Then, the objective and scope of this research are indicated in this chapter. Three subjects are proposed for facing the challenges. Chapter 2 [A hardware architecture of particle swarm optimization for real-time control] PSO with random time-varying inertia weight and acceleration coefficients (PSO-RTVIWAC) proposed in 2009 has better convergence and higher accuracy than the conventional PSO algorithms. In this. ii.

(5) chapter, as the first. subject, a novel hardware implementation of. PSO-RTVIWAC is developed to raise the processing speed for real-time system. The proposed architecture incorporates two features to improve the processing speed and the calculation stability with an acceptable chip area cost. One feature is a pipeline approach to realize parallel computation. The other is a synchronous module control. By implementing the LSI of proposed pipeline architecture on a field programmable gate array (FPGA), the calculation speed for one particle (76.9ns) is 1,039 times faster than the software implementations, outperforming the serial hardware implementation proposed in previous work by 17 percent. The chip cost of the hardware is about 47,664 gates. The system error of previous hardware is completely eliminated. The proposed architecture achieved not only high processing speed but also high stability. This novel subject provides a high performance PSO hardware for real-time system. Chapter 3 [A hardware implementation of multi-dimension PSO for complex nonlinear control] In order to meet the complex nonlinear application having multi-parameters to be controlled, this chapter further proposes a basic architecture design for establishing a general purpose hardware of multi-dimension PSO as a fundamental platform and implements the proposed hardware in FPGA. Based on the architecture proposed in Chapter 2, the PSO hardware for each dimension is paralleled to realize the multi-dimension calculation. One feature of this approach is to provide the coefficients for multi-dimension calculation by building memories and recycling the prestored values. The other feature of this multi-dimension PSO hardware is to employ the proposed pipeline technology in the selective module. The proposed. iii.

(6) hardware architecture achieves flexible scalability which can make it possible to realize the dimension extending. An 8-dimension PSO based on the proposed hardware design is implemented into Altera Cyclone II (FPGA) as an example. The chip cost includes 1,229 logic elements (Gate Count: 14,748) and 834,016 bits memory (Gate Count: 3,336,064), which is reasonable for 8 dimension extending. The calculation for one particle can be completed within 77.0ns. The proposed hardware can be employed to solve complex nonlinear problems with real-time requirements due to its high accuracy and processing speed. This hardware design is a general purposed hardware implementation of multi-dimension PSO and suitable for particular nonlinear applications with different dimension. Chapter 4 [Implementation of the multi-dimension PSO for adaptive FIR filter development] The conventional adaptive filter approaches are too complex to be implemented in hardware and the calculation speed is not satisfactory. The control of filter coefficients is a classical multi-parameters issue and the PSO dimension number can be flexibly adjusted according to the order of filter. Thus, the general purposed hardware implementation of multi-dimension PSO presented in Chapter 3 is employed to design an adaptive FIR filter as an example of complex nonlinear application in this chapter, aiming to achieve a high updating speed of filter coefficient. The adaptive filter based on 8-dimension PSO as an example in Chapter 3 is implemented in FPGA. By slightly adding fitness calculation (1.8% of the entire system), the hardware architecture can provide a much faster update speed than software and conventional approach. The coefficient update rate of the proposed design. iv.

(7) reaches 92.5 KHz (11µs), which is about 4,625 times faster than least mean squares (LMS) algorithm implemented by software and 27.2 times faster than recursive least squares (RLS) algorithm implemented by dedicated hardware. The mean square deviation between hardware results and MATLAB software results is 5.4×10-5. This adaptive filter presents superior performance for real-time requirement. The above results prove that the proposed research can provide a high coefficient updating speed to control the filter performance. This successful application verifies that the proposed multi-dimension PSO hardware can effectively work for adaptive signal processing as a complex nonlinear application. Chapter 5 [Conclusion] In this chapter, the conclusion is described and the future work is drawn for the research. This research concentrates on the PSO hardware implementation with high performance and its application for adaptive signal processing. The first subject presented a novel hardware implementation of PSO with high speed and stability for real-time control system by using pipeline technology and synchronous module control. In second subject, a general purpose hardware architecture of multi-dimension PSO with flexible scalability are proposed as a fundamental platform. In the third subject, an adaptive FIR filter design based on the proposed hardware platform is introduced as the particular application and pioneer design. These proposals are favorable contributions for LSI implementation of PSO and its application for adaptive signal processing area and mobile system.. v.

(8) Table of Contents Abstract.......................................................................................................... i Table of Contents ......................................................................................... vi List of Figures............................................................................................... x List of Tables .............................................................................................xiii Chapter 1 Introduction .................................................................................. 1 1.1 Background and motivation ......................................................... 1 1.1.1 Research background .......................................................... 1 1.1.2 Particle swarm optimization (PSO) .................................... 3 1.1.3 Challenges and motivation .................................................. 7 1.2 Objectives and outline of the thesis ........................................... 11 Chapter 2 A Hardware Architecture of Particle Swarm Optimization for Real-time Control............................................................................... 14 2.1 Overview .................................................................................... 14 2.2 PSO-RTVIWAC and position estimate ...................................... 16 2.2.1 PSO and PSO-RTVIWAC ................................................. 16 2.2.2 Position estimation ............................................................ 19 2.3 Serial PSO hardware in previous work ...................................... 20 2.4 Proposed hardware with synchronous module control and pipeline architecture ........................................................................... 23 2.4.1 Synchronous module control ............................................ 26. vi.

(9) 2.4.2 Control unit ....................................................................... 28 2.4.3 Pipeline approach .............................................................. 32 2.5 Experimental results and comparison based on FPGA .............. 36 2.5.1 Calculation time ................................................................ 38 2.5.2 Chip cost ........................................................................... 40 2.5.3 Positioning stability and accuracy..................................... 41 2.6 Summary .................................................................................... 43 Chapter 3 A Hardware Implementation of Multi-dimension PSO for Complex Nonlinear Control ............................................................... 45 3.1 Overview .................................................................................... 45 3.2 Multi-dimension PSO ................................................................ 47 3.3 Hardware design for multi-dimension PSO with flexible scalability ........................................................................................... 49 3.3.1 Hardware architecture based on integrated modules ........ 49 3.3.2 Particle calculating block .................................................. 53 3.3.3 Random coefficients ROM ............................................... 55 3.3.4 Initial ROM and intermediate RAM ................................. 57 3.3.5 Flexible scalability of proposed hardware architecture .... 59 3.4 Hardware synthesis and analysis based on FPGA ..................... 59 3.4.1 Timing analysis ................................................................. 61 3.4.2 Chip cost ........................................................................... 64. vii.

(10) 3.4.3 Hardware accuracy............................................................ 66 3.5 Summary .................................................................................... 66 Chapter 4 Implementation of the Multi-dimension PSO for Adaptive FIR Filter Development ............................................................................ 68 4.1 Overview of conventional adaptive algorithm ........................... 68 4.2 Adaptive FIR filter design based on multi-dimension PSO....... 71 4.2.1 Adaptive FIR filter system design ..................................... 71 4.2.2 Adaptive algorithm based on multi-dimension PSO ......... 74 4.2.3 Improvement of initialization ............................................ 76 4.3 Improved adaptive algorithm based on multi-dimension PSO .. 77 4.3.1 Fitness function improvement ........................................... 78 4.3.2 Convergence speed analysis .............................................. 82 4.4 A flexible control for adaptive FIR filter ................................... 85 4.4.1 A flexible control design of adaptive FIR filter ................ 85 4.4.2 Comparisons of different pass band control ..................... 90 4.5 Implementation of adaptive FIR filter based on proposed multi-dimension PSO hardware ......................................................... 97 4.5.1 The basic architecture based on proposed multi-dimension PSO hardware ............................................................................. 97 4.5.2 The architecture of fitness calculation for adaptive FIR filter ........................................................................................... 99. viii.

(11) 4.5.3 The structure of FIR filter module .................................... 99 4.6 Implemental results and comparison based on FPGA ............. 101 4.6.1 Hardware synthesis results .............................................. 104 4.6.2 Filter coefficients update rate .......................................... 105 4.6.3 Chip cost ......................................................................... 108 4.6.4 Hardware accuracy.......................................................... 109 4.7 Summary .................................................................................. 111 Chapter 5 Conclusion and Future Work ................................................... 114 5.1 Conclusion ............................................................................... 114 5.2 Future work .............................................................................. 118 Acknowledgments .................................................................................... 120 References ................................................................................................ 121 Publications .............................................................................................. 129 Academic journals ............................................................................ 129 International conferences .................................................................. 130. ix.

(12) List of Figures Figure 1.1: Searching and communicating in PSO solution space .............. 2 Figure 1.2: Particle searching and converging to target ............................... 2 Figure 1.3: The significant function of PSO for solving nonlinear problem . ................................................................................................. 5 Figure 1.4: The development of PSO applications with its performance requirements .............................................................................. 5 Figure 1.5: Particle searching and converging to target ............................... 7 Figure 1.6: Hardware implementation instead of software .......................... 8 Figure 1.7: Multi-dimension PSO ................................................................ 9 Figure 1.8: Improvement of PSO for higher hardware performance ......... 10 Figure 2.1: Position estimate in PSO-RTVIWAC ...................................... 20 Figure 2.2: PSO hardware structure in previous work ............................... 21 Figure 2.3: PSO hardware structure in this research .................................. 25 Figure 2.4: Asynchronous module control approach in previous work ...... 27 Figure 2.5: Synchronous module control approach in this research .......... 27 Figure 2.6: Control unit design and state machine ..................................... 29 Figure 2.7: Comparison of control mechanisms......................................... 31 Figure 2.8: Two stages of pipeline.............................................................. 33 Figure 2.9: Final six stages of pipeline ....................................................... 34. x.

(13) Figure 2.10: Dataflow in pipeline architecture ........................................... 35 Figure 2.11: Pipeline and state machine ..................................................... 36 Figure 3.1: 2-dimension searching space ................................................... 48 Figure 3.2: Multi-dimension searching space ............................................ 48 Figure 3.3: Hardware architecture of multi-dimension PSO ...................... 50 Figure 3.4: Basic concept for multi-dimension architecture ...................... 52 Figure 3.5: Integrated modules for multi-dimension PSO ......................... 52 Figure 3.6: Multi-dimension PCB .............................................................. 54 Figure 3.7: ROM for storing the random coefficients ................................ 56 Figure 3.8: Recycle the values in the ROM for dimension increasing ....... 57 Figure 3.9: Initial ROM and intermediate RAM ........................................ 58 Figure 3.10: Integrated module design for flexible scalability................... 60 Figure 3.11: Hardware RTL of 1-dimension PSO ...................................... 62 Figure 3.12: Hardware RTL of 8-dimension PSO ...................................... 62 Figure 3.13: Simulation result of timing analysis for ROM and PCB ....... 63 Figure 3.14: Simulation result of timing analysis for RAM ....................... 63 Figure 3.15: Chip cost of the entire system ................................................ 65 Figure 4.1: Adaptive FIR filter ................................................................... 72 Figure 4.2: Update coefficients according to different requirements ......... 72 Figure 4.3: Adaptive FIR filter system based on PSO................................ 74 Figure 4.4: Filter coefficient optimization based on multi-dimension PSO. xi.

(14) ................................................................................................. 75 Figure 4.5: Improved fitness function based adaptive FIR filter system ... 78 Figure 4.6: Improving approach of fitness function ................................... 81 Figure 4.7: Convergence characteristic of proposed approach ................... 84 Figure 4.8: Convergence characteristic of proposed approach ................... 84 Figure 4.9: Band limiting design of adaptive FIR filter ............................. 87 Figure 4.10: Simulation results of band pass response............................... 93 Figure 4.11: Simulation results of band stop response ............................... 95 Figure 4.12: Hardware architecture of multi-dimension PSO for adaptive FIR filter .................................................................................. 98 Figure 4.13: The implementation of fitness function in FCB................... 100 Figure 4.14: The pipeline technology employed in FCB ......................... 100 Figure 4.15: The hardware structure of FIR filter .................................... 101 Figure 4.16: Synthesis result of FCB ....................................................... 104 Figure 4.17: Update speed of proposed hardware for low pass response. 105 Figure 4.18: Update speed of proposed hardware for band pass response ............................................................................................... 106 Figure 4.19: Achieved performance of proposed approach ...................... 111. xii.

(15) List of Tables Table 2.1:. Comparison of hardware implementation .............................. 26. Table 2.2:. PSO-RTVIWAC parameters ................................................... 37. Table 2.3:. Comparison of calculation time.............................................. 39. Table 2.4:. Comparison of chip cost ......................................................... 41. Table 2.5:. Comparison of positioning stability and accuracy ................. 42. Table 3.1:. Speed comparison between different approaches ................... 64. Table 3.2:. Chip cost of each module ....................................................... 65. Table 4.1:. Simplification of fitness function ........................................... 80. Table 4.2:. Parameters of proposed adaptive FIR filter based on PSO ........ ................................................................................................ 83. Table 4.3:. Methods of band pass response .............................................. 89. Table 4.4:. Methods of band stop response .............................................. 90. Table 4.5:. Parameters of PSO and FIR filter ........................................... 91. Table 4.6:. System processing speed of different methods....................... 96. Table 4.7:. Parameters of adaptive filter based on PSO-RTVIWAC algorithm............................................................................... 102. Table 4.8:. Synthesis results of PSO algorithm module generated in Quartus II .............................................................................. 103. Table 4.9:. Coefficients update rates of proposed adaptive filter ........... 107. xiii.

(16) Table 4.10: Coefficients update rates of different adaptive algorithms ....... ............................................................................................. 108 Table 4.11: Chip costs of proposed hardware implementation .............. 109. xiv.

(17) Chapter 1 Introduction 1.1. Background and motivation. 1.1.1. Research background Many nonlinear problem of real world can be translated into the case of. optimization which is difficult to be solved by conventional optimization algorithms with complex calculation. As the nonlinear problems of various fields are more and more complex, optimization becomes one of the most important topics in engineering, computer science, economics, management, and many other disciplines. Optimization algorithm mainly acts to obtain the best result from a solution set by given situation. Conventional optimization algorithms perform intolerably complex calculation when meeting the nonlinear optimization. As the development of the electronics and information industry, the real-time requirement for high processing speed becomes more and more sincere.. 1.

(18) Figure 1.1: Searching and communicating in PSO solution space. Figure 1.2: Particle searching and converging to target. 2.

(19) 1.1.2. Particle swarm optimization (PSO) As one of the most well-known optimization algorithm, Particle Swarm. Optimization (PSO) attracts more and more attention and presents its talented potential in recent years. Strongly nature-inspired by the swarm intelligence, PSO, introduced by Eberhart and Kennedy in 1995 [1], is conceived as a simulation of animal flocking, learning and sharing information when a group of insects or birds seek for food in a search space. For example, a group of bees find flower. The bees move in the search space and share individual knowledge about the space. From the nature of the social behavior, if any member of bees can find out a desirable path or nearest position to the flower, it will share this information to the other bees, as described in Fig. 1.1. The Then, the rest of the members will follow quickly. This process will be repeated many times. Finally, the flower will be found as presented in Fig. 1.2. PSO algorithm is a novel solution to the complex non-linear optimization problem. In many nonlinear issues, the calculations of conventional methods are too complicated to address the practical requirements. In order to get optimal value with minimum cost, PSO have been applied to a large variety of practical nonlinear problem. It can be employed to solve nonlinear optimization with complex setting. In an extreme case, the problem is just a black box to simulations or evaluations of candidate solutions. PSO can play an important role when there is not enough calculation time or knowledge. PSO tends to find good solutions in potential solution space by far less time than it would take for an algorithm to compute the optimal solution, especially if the function of problem is non-linear and complex. It uses simple searching instead of. 3.

(20) complicated calculation as well as ability to swiftly converge to the optimum with few calculation and parameters to adjust. The significant function of PSO for solving complex nonlinear problem instead of conventional calculation is introduced in Fig. 1.3. The swarm of PSO can be envisioned as multiple bees or birds as particles that search for the best food source as optimum by using their inertia, their knowledge and swarm communication. Each single particle behave in the similarly regulation. Particles communicate and learn from each other about the searching space. Each position of searching apace represents a solution. Starting with a randomly initialized swarm and moving in randomly directions, each particle search in the solution space and take the form of a global best result and personal best result. The personal best value is the best objective value the particle ever experienced. Meanwhile, the personal best position is also remembered. The global best value is the best objective value the whole swarm ever experienced. A set of fitness function which describes the constraints of the searching space evaluates the best value. Once a global best value is figured out, all the particle will start to move to this position. For next step, the core concept of PSO is the calculation of particle velocity according to the personal best value and global best value, which takes place after the evaluation of current positions. With the new velocity, the particle will move to the next position. The best values will be evaluated again and updated. This iteration process continues until the particle swarm converges to an optimal target.. 4.

(21) Figure 1.3: The significant function of PSO for solving nonlinear problem. Figure 1.4: The development of PSO applications with its performance requirements. 5.

(22) There other conventional algorithms are inspired by natural evolution and swarm intelligence based. The most well-known algorithms are genetic algorithm (GA), simulated annealing (SA), ant colony optimization (ACO) and particle swarm optimization (PSO). Genetic Algorithm (GA) is the one of the most widely known paradigm in evolutionary algorithms. The GA uses using three operators: selection, crossover and mutation. For many applications, the GA is relatively complicated and results in an intolerable processing time with complex operations [2][3]. SA does not work with stability and escapes from local optima by occasional jumps without effective control [2][4]. ACO proposed in 1991[5] presented better performance, but it is easy to fall into local optimum problem. Compared with GA, PSO does not need too complicated operation. PSO is more stable than SA and ACO. It can produce better results in a cheaper way than other algorithms, and has much faster convergence speed. Owing to the feature of group communicating and learning like animals, PSO is first used to machine learning field in the early stage [6]-[8]. In pace with the algorithm improving and the superior performance springing up, PSO is popularly introduced to solve complex nonlinear problems, such as data mining [9]-[12], biometric [13]-[16], circuit synthesis [17]-[20], economic analysis [21]-[24], power system optimization [25]-[28] and positioning system [29]-[31], starting after 2000. The development of PSO applications with its performance requirement is presented in Fig. 1.4. Recently, in an attempting to employ PSO in above system with real-time requirements, its hardware implementation with high calculating speed and high stability based on large scale integration (LSI) goes to a significant subject.. 6.

(23) Especially, as the PSO application on adaptive signal processing [32]-[33] and mobile system [34]-[35] becomes worldwide attention-focused, a general purpose hardware architecture which can deal with complex nonlinear issue is sincerely expected. 1.1.3. Challenges and motivation However, there still are several challenges that conventional research did. not solved when PSO is employed into wider applications.. Figure 1.5: Particle searching and converging to target. 7.

(24) Figure 1.6: Hardware implementation instead of software. First, the processing speed and stability is very important performance for real-time system. Current PSO software is time-consuming and fails to solve complicated problems within a strict timing constraint. For many real-world fields such as positioning system and signal processing, the processing speed of software is too slow to meet the real-time requirements. For example, in positioning system, the requirement of position calculating speed is much higher when the objective moves in a high speed. The software has to update the position after every dozens of iterations, which cannot catch the current right position of the target, as shown the example (a) in Fig. 1.5. Another example (b) is signal processing system which also requires high processing speed. Serial hardware architecture proposed in previous work cannot satisfy the requirement of speed and stability. Thus, a novel hardware implementation of PSO with high. 8.

(25) speed and stability is proposed in this research. To meet this challenge, an interesting subject is to implement PSO in hardware. The hardware testing based on field programmable gate array (FPGA) is desired to achieve stabile and high speed hardware. Thus, a novel hardware implementation of PSO with high speed and stability is a target of this research, as described in Fig. 1.6. Second, the conventional 2-dimension PSO can only meet some simply application but cannot satisfy real-time application for complex nonlinear control such as signal processing. In complex control system, there generally are more than two parameters are waiting for controlling. Thus, a multi-dimension PSO and its hardware implementation are urgently needed to control the multi-parameter of complex application. This subject focuses on a multi-dimension PSO and its hardware implementation. A general purpose hardware architecture of multi-dimension PSO is proposed as a fundamental platform. It creates a new solution for complex nonlinear control with real-time requirements. This subject is presented in Fig. 1.7. Figure 1.7: Multi-dimension PSO. 9.

(26) Figure 1.8: Improvement of PSO for higher hardware performance. Third, conventional adaptive algorithm is too complex to realize in hardware with a high adaptive speed. A high update speed of PSO for adaptive filter coefficients is also expected as a real-time requirement of signal processing. The regulation of adaptive filter is a classic multi-parameter control with flexibly changeable coefficients number. Thus, a hardware implementation of multi-dimension PSO which can meet the real-time signal processing is urgently needed. However, the complexity of PSO hardware will become rising when the dimension number is increasing. Therefore, an improvement of PSO algorithm for higher implementation performance is expected, which is introduced. Therefore, the proposed multi-dimension PSO hardware is employed to design an adaptive filter as a particular application. The hardware testing based on FPGA is desired aiming to achieve high performance hardware. This subject is introduced in Fig. 1.8.. 10.

(27) 1.2. Objectives and outline of the thesis The thesis will concentrate on the introduction and description of research. on PSO hardware implementation and its application for adaptive signal processing. In recent years, the PSO has experienced significant developments. But there are still some challengers in utilizing this novel technology to variable applications. One particular challenge is the requirement of higher processing speed, and the other is multi-parameter control for complex nonlinear application such as adaptive signal processing based on LSI implementation. There is a growing demand for multi-dimension PSO hardware implementation that can provide satisfactory performances for these requirements. Considering about the above challenges, three subjects are proposed and presented in chapter 2, 3 and 4, respectively. Subject 1 presents a novel hardware implementing method for speeding up the PSO processing speed and stability. Subject 2 proposes a multi-dimension PSO and its hardware implementation for complex nonlinear control with real-time requirement as the first trial. Subject 3 employs the proposed multi-dimension PSO and its hardware for adaptive filter design as a pioneer design. The PSO calculation speed on conventional software platform cannot meet the high speed of real-time processing. Serial hardware cannot satisfy the requirements. Chapter 2 introduces a novel hardware implementation of PSO with random time-varying inertia weight and acceleration coefficients (PSO-RTVIWAC) with pipeline architecture to raise the processing speed for real-time system instead of software. The proposed architecture includes two features to improve the processing speed and the calculation stability with an. 11.

(28) acceptable chip area cost. One of features is a synchronous module control. Control unit play a significant role in the module control. The other is a pipeline approach to realize parallel computation for different stages. The PSO computing can be divided into 6 stages according to the hardware operation. By implementing the proposed pipeline architecture on a FPGA, PSO can perform much faster than the software implementations and outperform the hardware implementation with serial operation of previous research. The experimental results prove that it can deal with optimization works with high stability and processing speed for real-time applications. This novel subject is successful completed and provide a high performance solution for PSO hardware design. In. order. to. meet. the. complex. nonlinear. application. having. multi-parameters to be controlled, a multi-dimension PSO is proposed based on the PSO-RTVIWAC in Chapter 3. However, as the dimension number of PSO increasing, the calculation become very complex and the processing speed sharply decline. In order to fulfill the rigid real-time condition, an implementation way with high speed is expected. As a result, a hardware implementation of multi-dimension PSO is sincerely expected. Thus, this chapter further proposes a hardware implementation of PSO for the real-time complex optimization with flexible scalability as a fundamental platform. The entire hardware architecture has several modules. Each module is introduced elaborately. The synchronous module control is also employed as well. In addition, by dividing the communal module, separating the independent module and proposed ROM mapping method, a flexible scalability can be achieved. An 8-dimension PSO based on the proposed hardware design is implemented in. 12.

(29) FPGA as an example, which can provide a faster calculation speed than conventional software and hardware. This hardware design can meet for general purpose when meeting the different complex nonlinear application. The conventional adaptive filter approaches re too complex to be implemented in hardware and the calculation speed is not satisfactory. The control of filter coefficients is a classical multi-parameters issue and the PSO dimension number can be flexibly adjusted according to the order of filter. Thus, Chapter 4 illustrates an adaptive FIR filter design as a particular application based on the general purpose hardware implementation of multi-dimension PSO. For reduce the complexity of the hardware and better implementation performance, the fitness function of multi-dimension PSO is improved according to feature of the FIR filter. The proposed adaptive algorithm can work effectively at a higher convergence speed than conventional adaptive algorithm and evolutionary algorithm. The adaptive filter based on 8-dimension PSO is implemented in FPGA as an example. The hardware architecture can provide a much faster update speed than software and the conventional adaptive filter design, when it realize low pass and band pass response. The mean square deviation between hardware results and software results is low. The designed adaptive filter presents high performance and the experimental results prove that it is a superior approach to satisfy the real-time applications. Finally, the conclusion is going to be presented and some of future work will be drawn for the research in Chapter 5.. 13.

(30) Chapter 2 A Hardware Architecture of Particle Swarm Optimization for Real-time Control 2.1. Overview According to the IEEE Standard [36] the definition of real time pertains to. the performance of a computation during the actual time that the related physical process transpires in order that the results of the computation can be used in. guiding the physical process. As defined in ISO/IEC JTC1 SC31 and. the emerging standard ISO/IEC FDIS 19762-5 [37], real-time locating systems (RTLS) are a combination of wireless hardware and real-time software that are used to continuously determine and provide the real-time position of assets and resources equipped with devices designed to operate with the system. The determination of the real-time position of positioning target is essentially based on the wireless signals sent by the sensor node. Under unavoidable interferences during the signal transmission, a complex nonlinear problem arises for achieving the optimal positioning with high accuracy for which the real-time software is responsible. Among the numerous methods that are available to solve the nonlinear problem, particle swarm optimization (PSO), introduced by Kennedy and Eberhart in 1995 [1], is a population-based stochastic heuristic algorithm. PSO is recently gaining more and more attention for its powerful ability to solve complex optimization problems, especially those with numerous local optima.. 14.

(31) In the literature [2][38][39], the advantage of the PSO algorithm over traditional techniques such as the genetic algorithm (GA) is demonstrated in its faster convergence speed, ability to produce a better result in less time and at low cost. They also indicated that PSO is more suitable for dealing with a nonlinear dynamic problem than GA or simulated annealing (SA) [40] because of its better anti-locking ability. In the last few years, the PSO algorithm has already been used as the solution to several nonlinear problems [2][38][41][42]. However, the heuristic nature of PSO makes it time-consuming. Moreover, modified versions of PSO with better performance are more complex and time-consuming, which makes traditional techniques virtually ineffective in meeting the real-time requirements of RTLS. On the other hand, the simplicity of the algorithm makes it possible to implement PSO with hardware, which decreases the processing time. Several kinds of implementation have been reported [40][43][44]. Chen et al. have discussed the advantages of hardware implementation of the PSO algorithm over GA or ANN [45]. Among these implementations, the hardware architecture of the PSO algorithm for the positioning system proposed by Chen et al. [46] achieves excellent performance in terms of both calculation speed and hardware cost because of the introduction of several hardware-oriented modifications to the hardware design, such as using ROM to store random numbers and converting floating point numbers to a fixed point number. However, calculation error occurs in this implementation, which makes it too unstable for practical applications. Considering the above concern, on the basis of the framework of Chen et al.’s implementation, the pipeline architecture of PSO is proposed in this chapter.. 15.

(32) The major objective of this development is to improve the system speed and the calculation stability with acceptable hardware cost. Also, the hardware design is intended to be general and able to be modified for other kinds of applications. To achieve these goals, two distinctive features are employed in the hardware architecture, and these features will be illustrated comprehensively in the following sections. The rest of the chapter is organized as follows. In section 2.2, the procedure of the PSO algorithm and position estimation using PSO is briefly introduced. Section 2.3 introduces the serial PSO hardware proposed by Chen et al. [45][46]. In section 2.4, the proposed novel hardware with two significant features is explained in detail. Then the overall proposed hardware architecture is demonstrated. Section 2.5 shows a comparison of experimental results between several kinds of hardware and software implementations. Section 2.6 comprises the summary and conclusions of this study.. 2.2. PSO-RTVIWAC and position estimate. 2.2.1. PSO and PSO-RTVIWAC The PSO algorithm is inspired by the social behavior of bird flocks. Within. a search space, several particles with velocities and positions are set to imitate the behavior of birds. Chen et al. [46], proposed a new concept of PSO, namely, PSO-RTVIWAC which has better convergence performance and higher accuracy than the conventional PSO algorithms. PSO-RTVIWAC concept is proposed to efficiently control the search and converge to the global best. 16.

(33) solution of the swarm. A significant improvement of the convergence rate is observed in PSO-RTVIWAC method. The particle number and iterations can be reduced to a small value while accomplishing an acceptable accuracy simultaneously. Thus, through the PSO-RTVIWAC method, the processing time of PSO can be reduced to a great extent. Moreover, by employing a few particles and iterations, a reasonable higher accuracy is obtained in the PSO-RTVIWAC method. This property makes the PSO-RTVIWAC method become more attractive, since the computation efficiency is improved considerably. The computation can be completed in an extremely shorter time than other conventional PSO algorithms. This research is established based on PSO-RTVIWAC as an example because of its advantages. First, each particle adjusts their behavior mainly according to its current velocity (inertia), and other better positions (personal best and global best) within the swarm (trend). The behavior is described by. dt 1 w dt c1 ( pd xdt ) c2 ( gd xdt ). (2.1). xdt 1 k dt 1 xdt. (2.2). where vd and xd are the velocity and position of particle d; pd and gd are the personal best position and global best position; the remaining variables are control parameters, in which w is the inertia weight factor, and c1 and c2 are acceleration constants decreased linearly from maximum value cmax to minimum value cmin. k is a introduced velocity limitation factor. In the original PSO algorithm, the control parameters are all constants.. 17.

(34) After the update of velocity and position, a particle's fitness is re-evaluated and the record of personal and global best performances is updated. The whole procedure is iterated several times and the final global best is considered as the optimized result of the system. The parameter automation strategy proposed in PSO-RTVIWAC involves complex calculations of parameters with the purpose of achieving higher speed convergence and accuracy. The parameters of PSO-RTVIWAC are calculated as. w r3 wmax t (wmax wmin ) / T . (2.3). c1 r1 r4 cmax t (cmax cmin ) / T . (2.4). c2 r2 r5 cmax t (cmax cmin ) / T . (2.5). phi 4 1 r6 . (2.6). k 2 2 phi phi 2 4 phi. (2.7). where t is the number of the current generation and T is the number of total generations. r1~r6 are defined as the random coefficients between 0 and 1. The inertia weight w is decreased linearly from maximum value wmax to minimum value wmin. Then, the fitness of each particle is evaluated from its position using a fitness function. This is not a unified process, but is defined according to the specific application. After that, the personal best and global best records are updated as. 18.

(35) pbest if fitness current fitness pbest pbest current if fitness current fitness pbest . (2.8). gbest if fitness current fitness gbest gbest current if fitness current fitness gbest . (2.9). Personal best (pbest) indicates the position with the best fitness that a particle ever experienced. Global best (gbest) indicates the position with the best fit of personal best positions around all the particles. Through a time-varying control mechanism of parameters, this algorithm proves to be particularly effective in dealing with optimization functions in the nonlinear dynamic environments. At the same time, compared with the constant parameters used in the original PSO algorithm, the complex computation of parameters gives rise to high overhead in the hardware implementation of PSO-RTVIWAC. 2.2.2. Position estimation The general concept of position estimation using the PSO algorithm is. represented in Fig. 2.1. In the positioning space, there are four locators deployed with predetermined coordinates. As the system input information, the distances between locators and target are measured by the locators and are expressed as R0, R1, R2 and R3.Then with these inputs, the fitness of particles is calculated as. Dd ,m =. X m xd Ym yd 2. 2. 19. (2.10).

(36) Fi t = Di ,m Rm 3. 2. (2.11). m 0. where (Xm , Ym) is the coordinate of locator m; (xd , yd) is the coordinate of particle d; and Dd,m is the distance between particle d and locator m. Then the position and velocity of each particle are updated, as mentioned above. This process will continue several iterations until the coordinate of the target is found out. The calculation time of this process is defined from first iteration to the end.. Figure 2.1: Position estimate in PSO-RTVIWAC. 2.3. Serial PSO hardware in previous work The basic hardware structure for PSO-RTVIWAC developed by Chen et al.. [45][46] is represented in Fig. 2.2.. 20.

(37) Figure 2.2: PSO hardware structure in previous work. Before introducing the proposed architecture in this research, it is necessary to explain this developed implementation first. There are five modules in this architecture. 1) Particle computing block (PCB): this module is enabled first. If it is the first iteration, random initial velocity and position are assigned to particles. Otherwise, velocity and position are calculated using Eq. (2.1) and Eq. (2.2). In either case, the velocity and position of a particle are the output of this module. 2) Fitness computing block (FCB): this block calculates the fitness value of each particle. This fitness value is specific to each application and varies from one application to other applications. For the positioning system, the module first calculates the distances between a particle and each sensor using Eq. (2.10), and then the fitness of the particle is calculated using Eq. (2.11). After that,. 21.

(38) pbest and gbest are updated using Eq. (2.8) and Eq. (2.9). 3) Random number ROM: to implement Eq. (2.3) to Eq. (2.7) in the hardware, a large chip area is necessary. Also, the equations require long calculation time. Therefore, a certain number of these random values is calculated in the PC and prestored in the random number ROM and recycled. This feature reduces both chip cost and calculation time. 4) RAM: variables in the algorithm, such as velocity and position, are stored in RAM. 5) Control unit: the central part of the system. By sending control signals and waiting for feedback signals, the control unit regulates the progress of the entire system. In the previous implementation, a five-state machine was used in the design of the control unit. As explained by Chen et al. [45][46] within these five states, the modules operate as follows. State 1: The initial state; all the modules are disabled and wait for the enable signal. State 2: The control unit sends a computing signal to the PCB, and then the state transfers to the next state. State 3: The PCB performs calculation while the control unit waits for its finish signal. On receiving the finish signal, the state transfers to the next state. State 4: The FCB performs calculation while the control unit waits for its finish signal. On receiving the finish signal, the state transfers to. 22.

(39) the next state. State 5: The control unit sends the memory address of the next particle to the PCB and then starts the next particle calculation.. 2.4. Proposed hardware with synchronous module control and pipeline architecture In real-time applications, any improvement of calculation speed and. stability is very significant for implementing PSO in hardware. In Chen et al.’s work [45][46], it was found that the calculation is not always stable, which decreases the stability of the whole system. On the other hand, it is possible to speculate that, in order to utilize the hardware resource to the maximum limit, many advanced techniques, such as a read-ahead mechanism, are employed. Also, the self-control of computing modules shows that they are somehow independent, which provides a loose style of module control. However, because of the complex timing caused by these design methods, system stability may decrease when an unpredictable status arises inside the hardware. In addition, high-speed on-chip memory is used to achieve better performance and the communication between high-speed on-chip memories and relatively low-speed FPGA may also be a potential cause of calculation error. In this research, it is speculated that the calculation error in Chen et al.’s implementation is caused by the complicated hardware design and the using of a high-speed on-chip RAM. Therefore, in our proposed architecture, the entire hardware is redesigned on the basis of the framework proposed in Chen et al.’s research [45][46],. 23.

(40) particularly for module control. In the new design, modules are directly controlled, which means that all the modules are continuously controlled by the control unit. In other words, computing modules only complete computations and do nothing else. On the other hand, the on-chip RAM is replaced with an FPGA built-in RAM, which eliminates the possibility of error in data transfer between memory and other modules. These measures proved to be effective in avoiding unpredictable calculation error. However, as the cost of improving stability, the system speed is decreased severely to about half that of the previous implementation. A straightforward approach is taken in the proposed architecture to compensate the reduced calculation speed. Although the PSO algorithm is capable of parallel processing for each particle, a parallel structure will dramatically increase hardware cost and complicate the reading and writing of RAM and ROM as the number of particles increases. According to the intrinsic structure of the PSO algorithm, the whole procedure can be simply divided into two main steps and organized in a pipeline style. Moreover, it is possible to achieve a higher level of parallelness if the algorithm is divided into more steps resulting in more pipeline stages that increase the calculation speed. The proposed hardware structure is shown in Fig. 2.3. In order to implement the pipeline architecture, many registers are inserted between the modules and each module is under the control of control unit directly.. 24.

(41) Figure 2.3: PSO hardware structure in this research. The different features between the previous architecture and the new developed architectures are listed in Table 2.1. Three factors are compared: memory hardware, module control and pipeline approach. Compared with Chen et al.’s implementation, in the improved hardware design, FPGA built-in memory is used and direct module control is adopted, making the system more stable but slow. On the basis of the simplification, by taking the pipeline approach, a more stable and faster system is achieved. In the rest of this section, the synchronous module control and pipeline approach are explained in detail, and finally, the combination of these two features are demonstrated.. 25.

(42) Table 2.1: Comparison of hardware implementation Memory hardware. Control mechanism. Pipeline. Speed. Stability. Previous. On-chip RAM. Complicated and distributed. No pipeline. High. Unstable. Synchronous. Built-in RAM. Direct and centralized. No pipeline. Low. Stable. Pipeline. Built-in RAM. Direct and centralized. Pipeline. High. Stable. 2.4.1 Synchronous module control The most important difference between proposed approach of this research and previous hardware structure is the module control method. Figure 2.4 reviews the asynchronous control method of previous hardware, in which modules get control from each other. Because the control timing is decided by the calculating situation of previous module, there would be some control error happened based on the judgment of calculation ending. Although the communication overhead can be saved, the stability of the system cannot be satisfactory. Figure 2.5 presented synchronous control method proposed in this research, in which modules can get synchronous control from control unit. That means each module can communicate with control unit directly. The communication overhead is increased, but a stable module control can be achieved.. 26.

(43) Figure 2.4: Asynchronous module control approach in previous work. Figure 2.5: Synchronous module control approach in this research. 27.

(44) 2.4.2 Control unit In this subsection, the hardware design of the control unit and the combination of the pipeline and the control unit are explained. The control unit is the heart of the proposed hardware architecture. Through sending control signals and receiving feedback signals, this module interacts with all other modules: by sending computing and reset signals, it decides when calculations are performed; by sending the write enable signal, it decides when to update the memory. Overall, the behavior of the entire system is controlled only by the control unit. The control unit is mainly comprised of a state machine and control registers, as represented in Fig. 2.6. Since all six operations are performed simultaneously, the state machine in the pipeline architecture only needs three states. The first state is the initial state. In the second state, the control unit enables the computing modules to perform calculations and also enables RAM to update values. Then the control unit waits for the finish signals from computing modules and update complete signals from RAM. On receiving all the feedback signals from other modules, the state machine enters the third state. In this state, all the control registers are updated. Control registers are composed of a control signal register and counter register. The control signal register decides whether the corresponding module is enabled or disabled. Counter registers provide different memory addresses for each module separately, since in the pipeline approach, the progress of each module is different from each other. The way these counters are updated decides the structure of the pipeline process.. 28.

(45) Figure 2.6: Control unit design and state machine. To improve the stability compared with Chen et al.’s work, a simplified version of the PSO hardware architecture is developed. In this intermediate design, similar to the previous research, there are also five modules. On the other hand, there are also two significant modifications. One is in how the control unit enables computing modules. The other is the timing relation around modules. Figure 2.7 illustrates the two kinds of approaches to controlling a computing module. In Chen et al.’s research, in the short time period 1, the. 29.

(46) control unit sends a computing signal to the PCB to trigger its calculation. In the long time period 2, the PCB performs calculation and does not receive signals from the control unit. In the proposed design, during the long time period 3, the control unit continues to send the computing signal (enable signal) to the PCB while the PCB is performing computation. In the short time period 4, the control unit does not send the computing signal to the PCB and the PCB is idle. In other words, the PCB is disabled by the control unit. The difference can be confirmed from this comparison: in Chen et al.’s work, after receiving the computing signal, the computing modules acquire control from the control unit and begin operating independently. In contrast, in the proposed version, the computing modules are always controlled by the control unit. On the other hand, the modified timing relation can be demonstrated by the six-state machine used in the proposed design, rather than the five-state machine used in Chen et al.’s research: State 1: State 1: Initial state. State 2: Control unit enables the PCB to perform calculation. On receiving the finish signal from the PCB, the state transfers to the next state. State 3: Control unit disables PCB and enables RAM to store the calculation result. On finishing the writing of the result values, the state transfers to the next state. State 4: Control unit enables the FCB to perform calculation. On receiving the finish signal from the FCB, the state transfers to the next state. State 5: Control unit disables FCB and enables RAM to store the. 30.

(47) calculation result. On finishing the writing of the results, the state transfers to the next state. State 6: Control unit updates the address for RAM to the next particle. After that, the state transfers to state 2. These comparisons show that in the proposed version, the control unit handles all the control over other modules and plays a very important role in the system. Also, modular design makes the simplified PSO hardware more general and flexible.. Figure 2.7: Comparison of control mechanisms. 31.

(48) 2.4.3 Pipeline approach In Chen et al.’s research [45][46], in consideration of reducing the state transition overhead, the PSO algorithm is divided into two parts rather than four (explained in section 2.1). These two parts are implemented in two individual computing modules, namely, the PCB and the FCB. The PCB and FCB work sequentially, since they wait for the end of the other’s calculation before starting their own calculations. It is possible to further increase the processing speed by utilizing this waiting time effectively. The pipeline approach is introduced in the proposed architecture. Two stages of pipeline are built by dividing the algorithm into the PCB and FCB. This is possible because there are several particles to be processed iteratively and when the FCB is calculating the fitness of current particle d, PCB may start to process next particle d+1. An approximation of the pipeline structure for PSO is illustrated in Fig. 2.8. Particle 1 to Particle 10 is represented by P1 to P10. In the ideal case that the time consumption of the PCB and the FCB are equivalent, the pipeline approach achieves a speed almost two times faster than that of the serial approach without a pipeline. However, in real hardware, the implementation this kind of pipeline still does not provide the same speed as the previous implementation. Therefore, the hardware process of the PSO algorithm is further divided into six phases: reading input of the PCB, calculation of the PCB, storing output of the PCB, reading input of the FCB, calculation of the FCB, and storing output of the FCB. For instance, when the input of the PCB is being read for particle d+1, the other five operations are as follows.. 32.

(49) . PCB is performing calculation at particle d;. . Calculation result of PCB at particle d-1 is being processed;. . Input of FCB is being read from RAM at particle d;. . FCB is performing calculation at particle d-1;. . Calculation result of FCB at particle d-2 is being processed;. Figure 2.8: Two stages of pipeline. The reason for this division is that the reading and writing operations of RAM are relatively time-consuming in the proposed architecture since an FPGA built-in memory is adopted. An approximation using this kind of pipeline structure is represented in Fig. 2.9.. 33.

(50) Figure 2.9: Final six stages of pipeline In the rest of this section, the hardware that is necessary to build this kind of pipeline is explained in detail. Reading and writing operations of registers can be seen as very fast processes. Because of this feature of the register, for the two computing modules and RAM, each I/O port is assigned to a corresponding register. Input values for equation calculation or memory update are obtained from the input registers; calculation results from the computing modules are first stored in output registers. Through these registers, the I/O operations of the PCB and. 34.

(51) FCB can be ignored and consequently, the processing time is reduced. At the same time, the actual input and output processes with RAM become individual operations. Thus, through the use of the I/O register, the six stages of the enhanced pipeline and a parallel computation can be obtained. To implement this kind of pipeline structure, data is transferred not only between registers and modules, but also among registers. For example, to enable the immediate calculation of FCB after the calculation of PCB, the output values in the PCB output register are directly feed into the FCB input register. The proposed hardware architecture and detailed dataflow are represented in Fig. 2.10, which describe the date transfer between modules.. Figure 2.10: Dataflow in pipeline architecture. Figure 2.11 represents the combination of the pipeline and state machine. In the calculation state, the tasks in the six pipeline stages are performed. In the counter update state, the pipeline is suspended and control information for the. 35.

(52) pipeline is updated. Together with the function specificity of module design, these two steps form a general pipeline for any application.. Figure 2.11: Pipeline and state machine. 2.5. Experimental results and comparison based on FPGA To investigate the performance of the proposed hardware, a novel control. approach, intermediate versions and pipeline architecture with I/O registers are developed. All three approaches are directly implemented on the Cyclone II EP2C70F896C6N development board using the Verilog hardware description. 36.

(53) language in Quartus II 9.0.. Table 2.2: PSO-RTVIWAC parameters. Particle number. 10. Max. iteration times (T). 50. cmin. 0. cmax. 40. ωmin. 0.4. ωmax. 0.9. Search space. 300*300 (m2). Velocity limitation. -150 ~ +150. Position limitation. -150 ~ +450. Number of locators. 4. The system clock of the FPGA is set to 50 MHz. Comprehensive performance comparisons, including calculation time, positioning accuracy and chip cost, are presented in this section. For a fair comparison, all the PSO-RTVIWAC parameters of the proposed approaches used in the experiments are kept the same as those in Chen et al.’s implementation[45][46], as represented in Table 2.2.. 37.

(54) 2.5.1. Calculation time Table 2.3 lists the experimental results of calculation time required by the. three different approaches compared with the approaches proposed by Chen et al. [46]. The first column shows the name of the four implementations. The second to fourth columns show whether the new features are implemented. The fifth column indicates the performance of each implementation for the tracking target. It shows how many times the PSO- RTVIWAC can track the target in one second. The sixth column shows the calculation time of one evaluation (the time consumed by one particle in one iteration). The seventh column shows the total calculation time of tracking the target once (500 evaluations). The calculation time is defined as the time consumption of all fifty iterations. The software approach is implemented on a PC (Intel 2.4G Hz, 32bit, 1GB RAM), as proposed by Chen et al. [46]. The software platform is Matlab. In this implementation, 25 runs of tracking are obtained in one second, where one tracking means the execution of 500 evaluations (50 iterations with 10 particles). The calculation time of one evaluation is 80 μs, which means it takes 40 ms to complete target tracking once. Although the previous serial approach can attain a high speed through the employment of high-speed on-chip memory, it causes errors and instability.. 38.

(55) Table 2.3: Comparison of calculation time. synchronous Pipeline control Software approach on PC Previous serial approach Proposed control approach (first) Intermediate versions (second) Pipeline with proposed control (third). I/O register. Calculation Calculation Tracking times time for target per second one particle time. -. -. -. 25. 80 μs. 40 ms. -. -. -. 21,893. 90.10 ns. 45.05 μs. ○. -. -. 9,566. 209.08 ns. 104.54 μs. ○. ○. -. 18,610. 107.46 ns. 53.73 μs. ○. ○. ○. 25,981. 76.98 ns. 38.49 μs. Similarly, the performances of three hardware approaches are listed. 1) In the first experiment, based on Chen et al.’s implementation, a direct synchronous control mechanism is adopted, which achieves 9,566 targets tracking per second. 2) In the second experiment, on the basis of the results of the first experiment, the pipeline approach is developed; it largely improves the calculation speed to 18,610 targets tracking per second.. 39.

(56) 3) In the third experiment, also with the proposed hardware architecture, I/O registers are added to the second approach to enhance the pipeline, and a speed of up to 25,981 targets tracking can be attained in one second. This means that it only takes 38.49 μs to complete target tracking once, which is faster than that of the software counterpart on the PC, about 1,039 times. This speed can be considered fast enough for real-time applications with strict timing constraints. The calculation speed is improved by 17 percent compared with Chen et al.’s work. It is possible to further improve the speed performance if the algorithm is divided into more pipeline stages, and the results of this research prove that the direct control method can be used to easily build any kind of pipeline structure. 2.5.2. Chip cost For a fair comparison, chip cost is expressed in terms of a logic element. (LE), which is a general building block unit in an Altera programmable device. Table 2.4 lists the chip cost of four architectures including Chen et al.’s implementation, where the LE counts of our research are obtained by simulating the hardware design using the Cyclone II EP2C70F896C6N FPGA model to maintain consistency with Chen et al.’s work. In order to achieve excellent stability, some chip areas must be spent for the improvement of timing control in the proposed control approach. However, the calculation speed is decreased sharply, as shown in Table 2.3. Then, slightly more chip resources are invested to increase the processing speed in the final proposed approach. Although the chip cost of the proposed pipeline architecture with the I/O register is higher than the previous implementation, it is acceptable and worth for improving speed and stability. In the comparison of chip cost,. 40.

(57) since a different built-in RAM memory is adopted in this research instead of the high-speed on-chip memory employed in Chen et al.’s implementation, the comparison of chip cost is only an approximation for reference.. Table 2.4: Comparison of chip cost. PSO Architecture. Development Tool. FPGA Model. Chip Cost. Previous serial approach. Quartus II. EP2C70F896C6N. 2,789 LEs. Proposed synchronous control. Quartus II. EP2C70F896C6N. 3,543 LEs. Intermediate versions. Quartus II. EP2C70F896C6N. 3,657 LEs. Pipeline with synchronous control. Quartus II. EP2C70F896C6N. 3,972 LEs (Gate Count: 47,664). 2.5.3. Positioning stability and accuracy Stability and accuracy are the critical factors for real-time applications. In. order to compare proposed approaches with previous work, the same average error function Ep,average as in Chen et al.’s work [46] is employed. The function is expressed as. 41.

(58) E p ,average . E . 1000 r 1. 2 p. 1000. (2.12). Twenty-five target positions are selected for the experiment of comparing the positioning accuracy: (0, 0), (0, 75), (0, 150) … (225, 300), (300, 300) (unit: meter). For each target, 1,000 runs are carried out in the search space. The adoption of random number ROM and the simplification of equations are inherited from Chen et al.’s research.. Table 2.5: Comparison of positioning stability and accuracy. Ratio of available result. Ep,average. 71.4%. 2.52E-02. Proposed control approach. 100%. 2.76E-02. Pipeline with synchronous control. 100%. 2.76E-02. Previous serial approach. The experimental result of the proposed approach based on hardware described in this chapter is compared with the results of the previous serial approach, as shown in Table 2.5. In the previous implementation, a 28.6 percent error rate occurred, which means that in 1,000 positioning jobs, there are 286 times of the positioning result not being available. On the other hand, during the whole experiment with the proposed approach, no error occurs. When a different hardware approach from the previous work was adopted, Ep,average for. 42.

(59) the proposed pipeline approach slightly increased. However, such a slight degradation is acceptable and marginal. Compared with the proposed control approach, the proposed pipeline architecture does not cause any additional error since it only improves the implementation of the same algorithm based on the same hardware.. 2.6. Summary A novel pipeline architecture was described for improving the hardware. design of PSO-RTVIWAC. The proposed architecture was developed on the basis of the framework of the previous research on serial PSO hardware architecture. With the aim to improving the speed and stability compared with the previous research, the proposed architecture implemented two features. First, the hardware was completely redesigned in a simpler style. In particular, the module control mechanism was changed from the complicated asynchronous style to a direct and centralized synchronous style, which contributes to a stable hardware system. Second, the pipeline approach was originally introduced in the proposed PSO hardware architecture. The proposed architecture was built on FPGA, and the chip cost is about 47,664 gates. Through a compact arrangement of pipeline stages and the use of I/O registers for each module, the calculation speed for one particle (76.9ns) is 1,039 times faster than in the software approach on a PC was achieved. The proposed pipeline architecture outperforms the previous serial architecture by 17 percent in speed with much higher stability, and provides a practical approach for a real-time positioning system. The experimental result showed. 43.

(60) that the proposed architecture achieves not only higher processing speed but also higher stability, compared with previous research in terms of both speed and stability. This novel subject provides a high performance PSO hardware for real-time system. As a flexible method, the specific design of the proposed architecture makes it easy to adapt to any kind of real-time 2-dimension application merely by replacing the fitness function and dimension of the search space. Usually, the fitness evaluation is the most time-consuming part of the algorithm. Therefore, it is always possible to divide the fitness evaluation process into several stages in a pipeline structure to build a fast and stable PSO system.. 44.

(61) Chapter 3 A Hardware Implementation of Multi-dimension PSO for Complex Nonlinear Control 3.1. Overview There are many successful applications about PSO algorithm in low. complex nonlinear problems. These researches have shown the high convergence and the effectiveness of the PSO algorithm. However, about the applications in multi-parameter control of nonlinear optimization are rarely mentioned. Multi-parameter optimization is the process of simultaneously optimizing more than two parameters of certain constraints. About the 2-parameter control, there is only a set of 2-dimension continuous solutions to be optimized. In the multi-parameter optimization problem, there is multi-dimension of continuous solution. That means the optimal result should include multi-dimension solution. Multi-dimension problems of nonlinear optimization can be found in various fields, product and process design, aircraft design, finance, automobile design, the oil and gas industry, signal processing, or wherever optimal decisions need to be taken in the presence of trade-offs among multi-dimension solutions. Maximizing profit and minimizing the calculation cost to product optimum such as maximizing performance and minimizing fuel consumption of a vehicle and find out the best coefficients of a digital filter.. 45.

(62) Chapter 2 proposed a novel hardware architecture for 2-dimension PSO. However, it cannot meet the requirement of complex nonlinear control. The hardware design of 2-dimension PSO is fixed and difficulty to be extended for multi-dimension calculation directly. This chapter employs a multi-dimension PSO based on PSO-RTVIWAC for meeting the complex nonlinear control. Based on architecture presented in Chapter 2, a hardware implementation is proposed in this chapter as a fundamental hardware platform for multi-dimension PSO. The high calculation speed of this hardware is the most important performance for real-time application. In addition, as the dimension number of PSO increasing, the calculation become very complex and the processing speed sharply decline. In order to achieve a hardware high processing speed and reasonable chip cost, an implementation way with high performance is expected. This chapter presented a general purpose hardware implementation of multi-dimension PSO algorithm for complex nonlinear application based on previous research. With the design of highly integrated module, the proposed hardware can separate the calculation of each dimension. By dividing the communal module, paralleling the independent calculation of each dimension and proposed ROM mapping method, the proposed hardware architecture has high flexibility and scalability and any dimensions can be realized. The rest of the chapter is organized as follows. In section 3.2, multi-dimension PSO is briefly. introduced.. Section. 3.3. presents. the. hardware. design. for. multi-dimension PSO. In section 3.4, the hardware synthesis and analysis is shown. Section 3.5 comprises the summary and conclusions of this study.. 46.

関連したドキュメント

Figure 4.13, 4.14 compares the distribution of the timer interrupt delay without and with virtual core migration under frequent IPC load on top of Linux. The IPC load is generate

Nguyen Thi Thanh Hai and Toshio Obi, E-Government Implementation Failure: Insights of Prism Framework -The case of Public Administrative Management Computerization Project

Compared with Particle Swarm Optimization (PSO) algorithm, the positioning error in the proposed model is significantly reduced.. From the simulated results, it is indicated

Keywords: homology representation, permutation module, Andre permutations, simsun permutation, tangent and Genocchi

interaction abstract machine token passing on fixed graph. call

In this paper, taking into account pipelining and optimization, we improve throughput and e ﬃ ciency of the TRSA method, a parallel architecture solution for RSA security based on

Standard domino tableaux have already been considered by many authors [33], [6], [34], [8], [1], but, to the best of our knowledge, the expression of the

5 used an improved version of particle swarm optimization algorithm in order to solve the economic emissions load dispatch problem for a test system of 6 power generators, for