Graduate School of Fundamental Science and Engineering, Waseda University
Doctor Thesis Synopsis
Thesis Theme
Studies on Automatic Parallelization and Power Reduction of a Multiplayer Game
(Applicant Name)
Yasir ALDOSARY
Major in Computer Science and Engineering, Research on Advanced Computing Systems
May, 2013
Synopsis main body 1st page
In recent generations, obstacles such as the “memory wall” have led computer system developers to seek higher performance through parallel computing. Parallel computing has proven effective in many fields, such as multimedia and simulation. However, it remains very challenging to implement because of problems such as data contention and pointer analysis. To overcome these difficulties, parallelizing compilers have been extensively researched and developed, mainly to (1) extract parallelism from computer applications to enhance performance, and (2) hide the complexity of parallelization from developers so that they can focus freely on the application.
From the beginning, Video Game companies have raced to advance gaming technologies to produce the most fulfilling experiences for their customers; thus, just as the mainstream computer industry shifted towards multicore technologies, so did the game industry.
Because of the inherently sequential, waterfall-like structure of Video Games, they have yet to take considerable advantage of what multicore technology offers. Likewise, no research has examined extracting parallelism from Video Games by using parallelizing compilers.
The OSCAR compiler is a parallelizing compiler that has proven successful in extracting a considerable amount of parallelism from computer applications. Alongside enhanced performance, power reduction is an important benefit that OSCAR can deliver on multicore processors equipped with Dynamic Voltage and Frequency Scaling (DVFS) or Power Gating technologies. For OSCAR to extract the optimum amount of parallelism from an application, the application must be written in the Parallelizable-C format. Power reduction has become an important research topic with the growing interest in “Green Technologies”, and it matters even more for handheld devices, where power is a limited resource.
In this research, the parallelization and power reduction of a well-renowned Video Game called ioquake3 using the OSCAR compiler were examined.
Chapter 1, “Introduction”, states the objectives of this thesis. In particular, this chapter describes the potential benefits of applying an automatic parallelizing compiler to a well-renowned Video Game benchmark, and the importance of this work in relation to other work in this field.
Chapter 2, “OSCAR Compiler and API”, describes how the OSCAR compiler automatically extracts parallelism from a computer application and likewise reduces the power consumption of that application. First, the OSCAR compiler analyzes the application to discover control and data dependencies among its macro-tasks: basic blocks, loops, and function calls. Then, it assigns those macro-tasks to processing units to reduce the overall execution time.
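The two steps described above (dependency analysis, then assignment of macro-tasks to processing units) can be pictured with a small scheduling model. The Python sketch below is not OSCAR's scheduler; it is a generic greedy list scheduler, and the task names, costs, and dependencies are invented purely for illustration.

```python
def list_schedule(costs, deps, num_cores):
    """Greedy list scheduling of a task graph (illustrative, not OSCAR's algorithm).

    costs: {task: execution time}
    deps:  {task: [tasks that must finish first]}
    Returns (makespan, {task: (core, start, finish)}).
    """
    finish = {}                   # task -> finish time
    placement = {}                # task -> (core, start, finish)
    core_free = [0] * num_cores   # time at which each core becomes idle
    remaining = set(costs)

    def earliest_start(task):
        # A task may start only after all of its predecessors have finished.
        return max((finish[d] for d in deps.get(task, ())), default=0)

    while remaining:
        ready = [t for t in remaining
                 if all(d in finish for d in deps.get(t, ()))]
        if not ready:
            raise ValueError("dependency cycle detected")
        # Prefer the task that can start soonest; break ties by cost, then name.
        task = min(ready, key=lambda t: (earliest_start(t), -costs[t], t))
        core = min(range(num_cores), key=lambda c: core_free[c])
        start = max(core_free[core], earliest_start(task))
        finish[task] = start + costs[task]
        core_free[core] = finish[task]
        placement[task] = (core, start, finish[task])
        remaining.remove(task)

    return max(finish.values(), default=0), placement


# Hypothetical macro-task graph: A feeds B and C, which both feed D.
costs = {"A": 2, "B": 3, "C": 3, "D": 1}
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}

print(list_schedule(costs, deps, 1)[0])  # one core: tasks run back to back
print(list_schedule(costs, deps, 2)[0])  # two cores: B and C overlap
```

With these made-up numbers the makespan drops from 9 to 6 time units when a second core is available, because B and C do not depend on each other.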
Furthermore, to help reduce power consumption, the OSCAR compiler may apply one of the following methods: (1) Minimum Execution Time, in which the processor sleeps when idle; or (2) Satisfaction of Real-time Deadline, in which the frequency is lowered to exploit idle time. In addition, Parallelizable-C rules are applied to the target application’s code to help OSCAR analyze the program thoroughly and exploit the optimum amount of parallelism and power reduction potential.
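The difference between the two power reduction methods can be shown with a toy single-task model. This is a sketch under stated assumptions, not OSCAR's implementation: the cycle count, the period, and the three frequency levels are hypothetical, and the cost of changing frequency is ignored.

```python
def minimum_execution_time_plan(cycles, period_s, freqs_hz):
    """Method (1): run at the highest frequency, then sleep while idle."""
    f = max(freqs_hz)
    busy_s = cycles / f
    if busy_s > period_s:
        raise ValueError("work does not fit in the period even at full speed")
    return {"freq_hz": f, "busy_s": busy_s, "sleep_s": period_s - busy_s}


def deadline_plan(cycles, deadline_s, freqs_hz):
    """Method (2): pick the lowest frequency that still meets the deadline."""
    feasible = [f for f in freqs_hz if cycles / f <= deadline_s]
    if not feasible:
        raise ValueError("deadline cannot be met at any available frequency")
    f = min(feasible)
    busy_s = cycles / f
    return {"freq_hz": f, "busy_s": busy_s, "sleep_s": deadline_s - busy_s}


# Hypothetical numbers: 6 million cycles of work, a 20 ms period,
# and three made-up frequency levels.
freqs = (150_000_000, 300_000_000, 600_000_000)
print(minimum_execution_time_plan(6_000_000, 0.02, freqs))
print(deadline_plan(6_000_000, 0.02, freqs))
```

In this example method (1) finishes in half the period at the top frequency and sleeps for the rest, whereas method (2) stretches the same work across the whole period at the middle frequency.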
Chapter 3, “ioquake3 Structure Analysis”, describes a detailed structural analysis of a well-renowned multiplayer Video Game called ioquake3. To determine which areas of ioquake3 should be the focus of optimization, its execution was profiled with the Visual Studio Performance Profiler. The profiling results revealed that, after the ioquake3 engine finishes all Game initialization, three bottlenecks account for over 90% of the CPU processing load. To understand the design philosophy behind the ioquake3 engine, its internal modules and mechanisms were then broadly explored. The analysis showed that the three bottlenecks operate in a waterfall manner, in the following order. (1) BotAI replaces the human counterpart with an AI-driven player and acts as that player’s brain: it views the surroundings, makes decisions based on the player’s own condition and that of the surrounding area, and finally inputs the commands that carry out the chosen decision. (2) ClientThink carries out the actions chosen by the players, both human and AI-driven, in the virtual world; it computes the logic of those interactions and records the resulting changes by updating the data of all interacting entities (players, world, and items). (3) SendClientMessages updates all participating players, both human and AI-driven, with their latest surroundings as changed by the preceding actions; it uses a camera-like mechanism that takes a snapshot of the surroundings and conveys it to the designated player.
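The waterfall ordering of the three bottlenecks can be sketched as a toy frame loop. The Python model below only mirrors the data flow described above (commands, then world update, then snapshots); the function bodies, the one-dimensional world, and the field names are hypothetical stand-ins and bear no relation to the real ioquake3 code.

```python
def bot_ai(player, world):
    """Stand-in for BotAI: look at the world and choose a command."""
    # Hypothetical rule: step towards the item.
    if world["item_x"] > player["x"]:
        return 1
    if world["item_x"] < player["x"]:
        return -1
    return 0


def client_think(player, command):
    """Stand-in for ClientThink: apply the chosen command to the world."""
    player["x"] += command


def send_client_messages(viewer, players):
    """Stand-in for SendClientMessages: snapshot of the others, as seen by viewer."""
    return {p["name"]: p["x"] - viewer["x"] for p in players if p is not viewer}


def run_frame(players, world, human_commands):
    # Stage 1: AI-driven players decide; human players supply their own input.
    commands = {}
    for p in players:
        if p["is_bot"]:
            commands[p["name"]] = bot_ai(p, world)
        else:
            commands[p["name"]] = human_commands.get(p["name"], 0)
    # Stage 2: every chosen action is carried out in the virtual world.
    for p in players:
        client_think(p, commands[p["name"]])
    # Stage 3: every player receives a snapshot of the updated surroundings.
    return {p["name"]: send_client_messages(p, players) for p in players}


players = [
    {"name": "human", "x": 0, "is_bot": False},
    {"name": "bot", "x": 5, "is_bot": True},
]
print(run_frame(players, {"item_x": 2}, {"human": 1}))
```

Each stage is a loop over all players and consumes the output of the stage before it, which is the structure the chapter identifies as the target for parallelization.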
Chapter 4, “A Parallel Approach, Hazards, and Solutions for ioquake3 on a Multicore Platform”, describes the parallelization methodology that was applied to ioquake3 so that it could be compiled with the OSCAR compiler, and the performance results of the transformed code. After establishing where the CPU load was concentrated and how the internal modules interconnect, the research target was settled and the experiment was carried out as follows. First, the structure of the three bottlenecks was examined to determine how much parallelism could potentially be extracted from them. The results showed that each of the three bottlenecks has a “for loop” structure that handles a heavy amount of computation, which indicates potential for parallelism. Second, as a preliminary experiment to see how much parallelism could be extracted without any alteration to ioquake3, the game was compiled with the OSCAR compiler and then tested on a multicore machine. The results showed that, because of the many hazards present, no performance enhancement could be achieved; thus the code had to be rethought. Third, the bottlenecks were analyzed in depth, by hand and with the aid of a debugger, to discover all the potential hazards that may arise from attempting to parallelize, while noting the areas that did not comply with Parallelizable-C. Fourth, all the discovered hazards were examined, solutions were planned and then implemented, and the code was revamped to become Parallelizable-C compliant. Fifth, the revised ioquake3 code was compiled by the OSCAR compiler to produce a parallelized ioquake3 code. Finally, the performance of the newly produced code was measured on two multicore machines, IBM POWER5+ and RPX, and analyzed. The results showed that the code produced by OSCAR achieved a 5.1 times speedup on the IBM POWER5+ machine using 8 cores, and a 2 times speedup on RPX using 4 cores.
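The synopsis does not list the individual hazards, so the following Python sketch shows only one common kind as an assumed example: a scratch buffer shared by all loop iterations, and its removal by making the buffer private to each iteration. The function names and data are hypothetical, and the thread pool is used only to show that the revised loop body gives the same result regardless of execution order; CPython threads are not expected to show a real speedup here.

```python
from concurrent.futures import ThreadPoolExecutor

shared_scratch = [0, 0]  # one buffer reused by every iteration: a hazard


def think_shared(player):
    """Hazardous loop body: every iteration writes to shared_scratch."""
    shared_scratch[0] = player["x"]
    shared_scratch[1] = player["dx"]
    return shared_scratch[0] + shared_scratch[1]


def think_private(player):
    """Revised loop body: the scratch buffer is local, so iterations are independent."""
    scratch = [player["x"], player["dx"]]
    return scratch[0] + scratch[1]


def run_sequential(body, players):
    return [body(p) for p in players]


def run_parallel(body, players, workers=4):
    # Executor.map returns results in input order regardless of completion order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, players))


players = [{"x": i, "dx": 2 * i} for i in range(1000)]

# The shared version is only safe when iterations run one after another.
expected = run_sequential(think_shared, players)
# The private version may run its iterations concurrently.
assert run_parallel(think_private, players) == expected
```

Running `think_shared` under `run_parallel` would let two iterations overwrite each other's scratch values, which is the kind of dependence that prevents a loop from being parallelized as written.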
Chapter 5, “A Power Reduction Approach and Difficulties for ioquake3 on a DVFS Platform”, describes the methodology that was applied to reduce the power consumption of ioquake3 when running on a DVFS platform, and also presents the environment setup used to produce the desired results. The power consumption of ioquake3 was reduced as follows. First, ioquake3 as an application was examined in order to choose one of OSCAR’s two power reduction methods mentioned earlier. Video Games are designed around Time Frames, where the logic of the input actions must be computed within a certain time frame. This structure fits OSCAR’s Deadline approach well; thus it was chosen.
Second, for OSCAR to schedule program tasks appropriately for power reduction, the execution time and clock cycles of those tasks must be explicitly specified. Therefore, the time and clock cycles of the internal modules of ioquake3 were measured and then embedded in the code as directives for OSCAR to use in its analysis. Third, ioquake3 was compiled with OSCAR using the power reduction options. Finally, the power-reduced version of ioquake3 was executed on RPX, which is equipped with DVFS features; the power consumption of RPX was then measured and analyzed. The results showed that RPX consumed 1.6 watts on average when running the original ioquake3, and 1.25 watts on average when running the power-reduced version. Therefore, the OSCAR compiler was able to reduce the power consumption of ioquake3 down to 73% on average. It is important to mention that RPX originally ran a Linux environment with a frequency transition time of 10000 microseconds, which was too long for ioquake3’s frame rate of 30 frames per second. Therefore, a modified version of Linux with a frequency transition time of 50+ microseconds was installed, which allowed OSCAR to exploit idle time within ioquake3 successfully.
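Why the transition time matters can be checked with simple arithmetic: at 30 frames per second a frame lasts about 33,333 microseconds, so a single 10000-microsecond transition would take roughly 30% of a frame. The Python sketch below works this through under assumptions that are not from the thesis: the per-frame busy time is a made-up figure, 50 microseconds is taken as the lower bound of the "50+" value, and each frame is assumed to pay for two transitions (one down, one back up).

```python
def frame_period_us(frames_per_second):
    """Length of one frame in microseconds."""
    return 1_000_000 / frames_per_second


def usable_idle_us(frames_per_second, busy_us, transition_us,
                   transitions_per_frame=2):
    """Idle time left in one frame after paying for frequency transitions.

    Assumes (hypothetically) that each frame needs `transitions_per_frame`
    frequency changes and that no useful work overlaps with them.
    """
    idle_us = frame_period_us(frames_per_second) - busy_us
    return max(0.0, idle_us - transitions_per_frame * transition_us)


BUSY_US = 20_000  # hypothetical per-frame computation time

print(frame_period_us(30))                  # length of one frame at 30 fps
print(usable_idle_us(30, BUSY_US, 10_000))  # original Linux transition time
print(usable_idle_us(30, BUSY_US, 50))      # modified Linux transition time
```

Under these assumptions the 10000-microsecond transition leaves no exploitable idle time in a frame, while the 50-microsecond transition leaves nearly all of it.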
Chapter 6, “Conclusions”, concludes this thesis and outlines future work.
List of research achievements for application of doctorate (Dr. of Engineering), Waseda University
Full Name: Yasir ALDOSARY    Seal: ( )
(As of July, 2013)
By Type | Theme, journal name, date & year of publication, names of authors (including yourself)
Papers
○ Enhancing the Performance of a Multiplayer Game by Using a Parallelizing Compiler, 17th International Conference on Computer Games (CGAMES12), July 30th 2012, Yasir ALDOSARY, Keiji KIMURA, Hironori KASAHARA, Seinosuke NARITA
○ Enhancing the Performance of a Multiplayer Game by Using a Parallelizing Compiler, International Journal of Intelligent Games & Simulation (IJIGS), Volume 7, Number 1, April 2013, Yasir ALDOSARY, Keiji KIMURA, Hironori KASAHARA, Seinosuke NARITA
Technical reports
○ Enhancing the Performance of a Multiplayer Game by Using a Parallelizing Compiler, IPSJ SIG on Computer Architecture, April 25th 2013, Yasir ALDOSARY, Dominic HILLENBRAND, Yuuki FURUYAMA, Keiji KIMURA, Hironori KASAHARA, Seinosuke NARITA