博士学位論文 Doctoral Dissertation

内容の要旨及び審査結果の要旨

Dissertation Abstract and Summary of the Dissertation Review Result

第 37 号

The Thirty-Seventh Issue

2021年3月

March, 2021

The University of Aizu

はしがき

博士の学位を授与したので、学位規則（昭和２８年４月１日文部省令第９号）第８条の規定に基づき、その論文の内容の要旨及び論文審査の結果の要旨をここに公表する。

学位記番号に付した｢甲｣は学位規則第４条第１項（いわゆる課程博士）によるものであることを示す。

Preface

On granting the Doctoral Degree to the individuals mentioned below, abstracts of their theses and the theses review results are herewith publicly announced, in accordance with the provisions of Article 8 of the Ruling of Degrees (Ministry of Education Ordinance No. 9, enacted on April 1, 1953).

The Chinese character “甲” at the beginning of the diploma number indicates that an individual has been granted the degree in accordance with the provisions of Article 4, Paragraph 1 of the Ruling of Degrees (what is called “Katei Hakase,” or the Doctoral Degree granted by the University at which the grantee was enrolled).

Abstract

Speech is considered the most natural way of communication among humans. This makes an Automatic Speech Recognition (ASR) system a natural choice for human-machine interaction. The speech recognition problem is defined as the task of decoding an acoustic speech signal into a written text (the recognized word sequence).

Large Vocabulary Continuous Speech Recognition (LVCSR) systems are capable of dealing with a large vocabulary, typically more than 100k words, pronounced continuously in a fluent manner. Although most of the techniques used in speech recognition are language-independent, different languages still pose different types of challenges. Efficient language modelling is considered one of the hard challenges facing LVCSR systems built for languages with flexible word order. Such languages are characterized by complex morphology, which causes data sparsity and high out-of-vocabulary rates, leading to poor language model probability estimates.

To mitigate these issues, various advanced modelling techniques were investigated for the task of modelling synthetic languages. Both the predictive power of those models in terms of perplexity and their impact on ASR system performance in terms of Word Error Rate (WER) were evaluated. The combination of an n-gram model and a Recurrent Neural Network Language Model (RNN LM) improved overall system performance by 7.1%. Using a Bayesian language model, it is possible to generate meaningful, grammatically correct sentences that were not observed during model training, even for a language with very complex grammar. Factored language models seem to be able to capture additional information and improve LM probability estimates when the amount of training data is limited. The proposed factor set selection strategy achieved a relative WER improvement of 6.9%, which is higher than in previous studies. With a large amount of training data, the RNN LM showed the largest performance improvement.
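For readers unfamiliar with the metric, the following is a minimal sketch of how WER and a relative WER improvement are conventionally computed (word-level edit distance divided by the reference length). The function names and the example sentences are illustrative assumptions and are not taken from the dissertation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical sentences: one deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A relative improvement is measured against the baseline WER rather than in absolute percentage points, which is why a modest absolute change can correspond to a larger relative figure.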

Despite the progress of ASR systems built using deep neural networks (DNNs) over the last decade, state-of-the-art speech recognizers are still far from reaching satisfactory performance in challenging acoustic environments characterized by a high level of non-stationary noise. Methods to improve noise robustness usually involve adding components to the recognition system, which often require careful optimization and increase the computational load of an ASR system. Lately, noisy speech recognition research has mostly concentrated on front-end speech enhancement or on back-end DNN architectures and training objective functions, while less attention has been paid to the choice of the input features. That is why data augmentation of the input features derived from the Short-Time Fourier Transform (STFT) has become a popular approach. The STFT considers any signal to be piece-wise stationary and linear and treats it as a sum of predefined functions. However, speech is a highly non-linear and non-stationary signal which contains a multitude of information.

To relax this assumption, feature extraction methods based on adaptive mode decomposition are introduced, and their ability to improve overall ASR system performance for a DNN-based end-to-end speech recognition model is presented. Several architectures were proposed to combine those features with STFT features at different levels of the deep neural network. In addition to three straightforward combinations, informative data representations were analyzed by finding a correlation between the STFT and Hilbert-Huang Transform (HHT) feature channels, and an attention-based combination with different levels of detail was introduced.

Combined-feature models outperformed conventional ones in various train/test scenarios and with different amounts of speech data. The proposed combinations were shown to be more robust to additive natural noise at signal-to-noise ratios ranging from 20 dB to 0 dB. Moreover, this advantage does not deteriorate on real noisy speech data. In all proposed models, the increase in the number of network parameters was negligible with respect to the baseline single-feature model. This demonstrates that HHT-based features can improve the noise robustness of the end-to-end system at a small computational cost.

Summary of the Dissertation Review Result

This dissertation focuses on improving the performance of automatic speech recognition (ASR) systems. Typically, ASR systems consist of two main models: an acoustic model (AM) and a language model (LM). The AM matches the speech signal to a sequence of phonemes or words, and the LM estimates how well this sequence corresponds to the characteristics of a given language. In this study, improvements to both the AM and the LM have been achieved by proposing a new method for factor set selection in the so-called Factored LM, and a neural-network-based combination of the standard spectrogram features with Hilbert spectrum features derived from two adaptive mode decomposition techniques: Empirical Mode Decomposition (EMD) and Variational Mode Decomposition (VMD).

Different languages have different degrees of variability, which is why some standard approaches to building language models do not work well for some languages. One such language is Russian, where the word order exhibits high flexibility. One way to deal with this problem is to consider not just the words but also other information about them, such as part of speech, lemma, etc. All these word features are called factors, and the corresponding LM is called a Factored LM. To obtain an efficient and effective model, those factors have to be carefully selected. A new factor selection strategy has been proposed in this study, which resulted in better performance than the standard method.

Another difficulty for ASR systems is the presence of noise in the speech signal. There are various methods to reduce the noise, but they are usually difficult to incorporate in the AM. The standard spectrogram derived from the STFT is not noise-robust, so this study uses another way to extract features, based on adaptive mode decomposition. The two methods used, EMD and VMD, are both data-driven and noise-robust, especially at low signal-to-noise levels. However, the Hilbert spectrum derived from the EMD or VMD decomposition is not as good as the STFT spectrum for clean speech. This suggests that a combination of the STFT and Hilbert spectra can deliver improved performance for both clean and noisy speech. The second contribution of this study is a way to combine the STFT and Hilbert spectra using a neural network and an attention mechanism.

In the final review, the candidate presented his work in 60 minutes, followed by 50 minutes of questions and discussion. The committee reviewed the submitted dissertation and the responses to the questions raised after the preliminary review, and was satisfied with the answers. All members of the committee agreed to confirm the significance of the dissertation for a PhD degree.

Name

氏名

HERATH MUDIYANSELAGE Isuru Nihathamana Jayarathne

（ヘラトムディアンセラゲイスルニハタマーナジャイヤラトナ）

The relevant degree 学位の種類

甲CI博第84号

論文題目

EEG Analysis for Authentication and Affect-Guided Soundscape Exploration

脳波解析による認証と立体音響の探索

Dissertation Review Committee Members

論文審査委員

The University of Aizu, Prof. COHEN, M. (Chief Referee)

The University of Aizu, Prof. ZHAO, Q.

The University of Aizu, Prof. CHEN, W.

The University of Aizu, Associate Prof. YAGUCHI, Y.

会津大学教授マイケルコーエン（主査）

会津大学教授趙強福

会津大学教授陳文西

会津大学准教授矢口勇一

Abstract

This work is devoted to analyzing and exploiting EEG signals for security and musical relaxation. Such biometric applications are used in several areas because EEG capture is non-invasive and consumer-grade devices are affordable and readily available. In our first study, EEG signals were used as a biometric trait to authenticate access to restricted systems, which provides more security than traditional biometrics. In our second study, EEG signals were used as a measure of relaxation to reinforce the discovery and exploration of the “sweet spot” in a pantophonic musical landscape.

User authentication systems based on EEG have recently become popular, marking an inflection point in the field. Since most of the surveyed related studies achieved high accuracy, our goal was to increase user-friendliness by reducing the number of electrodes and using a simple task to collect data. EEG signals were collected across different trial phases: relaxation, visual stimulation, and mental recall. We introduce a novel derived feature, dubbed Inter-Hemispheric Amplitude Ratio (IHAR), which expresses the ratio of amplitudes of laterally corresponding electrode pairs. The extracted feature set was tested with several machine learning (ML) algorithms, including Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), and k-Nearest Neighbor (kNN). Most of the ML algorithms showed 100% accuracy with 14 electrodes, and according to our results, perfect accuracy can also be achieved using as few as 4 electrodes.
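The idea of an amplitude ratio between laterally corresponding electrodes can be illustrated with a short sketch. The abstract does not specify how amplitude is measured or which electrode pairs are used, so the mean absolute amplitude, the pair list, and all names below are assumptions made purely for illustration.

```python
from statistics import fmean

# Hypothetical left/right electrode pairs; the abstract does not list the
# pairs actually used.
ELECTRODE_PAIRS = [("AF3", "AF4"), ("F7", "F8")]


def mean_abs_amplitude(samples):
    """Mean absolute amplitude of one channel (an assumed amplitude measure)."""
    return fmean(abs(x) for x in samples)


def ihar_features(recording, pairs=ELECTRODE_PAIRS, eps=1e-12):
    """Return one left/right amplitude ratio per electrode pair.

    `recording` maps an electrode name to its list of samples.
    """
    features = {}
    for left, right in pairs:
        left_amp = mean_abs_amplitude(recording[left])
        right_amp = mean_abs_amplitude(recording[right])
        # eps guards against division by zero on a flat channel.
        features[f"{left}/{right}"] = left_amp / (right_amp + eps)
    return features


if __name__ == "__main__":
    # Invented samples, not real EEG data.
    demo = {"AF3": [1, -1, 1, -1], "AF4": [2, -2, 2, -2],
            "F7": [3, 3], "F8": [1, -1]}
    print(ihar_features(demo))
```

A ratio of this kind is insensitive to a common gain applied to both channels of a pair, which is one plausible reason such a feature might be attractive; the abstract itself does not state the motivation.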

A novel one-class classification method based on autoencoders was proposed to overcome the weaknesses of EEG-based authentication systems. In most systems, classification models need to be retrained and a considerably large dataset must be used. In the proposed system, a separate autoencoder is trained for each user, so the whole system does not need to be retrained when a new user is added. Signal augmentation techniques (jittering, amplitude warping, time warping, random downsampling, and permutation) were used to expand the training dataset. Evaluation results showed 97.1% accuracy with a 0.91 AUC value for 8 frontal electrodes.
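Three of the augmentation techniques named above can be sketched in a few lines. This is a generic illustration under stated assumptions (Gaussian jitter, equal-length segments, uniformly random sample selection); the parameters and function names are hypothetical, and amplitude warping and time warping are omitted because they need an interpolation step that the abstract does not describe.

```python
import random


def jitter(signal, sigma, rng):
    """Add zero-mean Gaussian noise to every sample."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


def permute_segments(signal, n_segments, rng):
    """Split the signal into contiguous segments and shuffle their order."""
    size = len(signal) // n_segments
    segments = [signal[i * size:(i + 1) * size] for i in range(n_segments - 1)]
    # The last segment takes whatever remains after the equal-sized ones.
    segments.append(signal[(n_segments - 1) * size:])
    rng.shuffle(segments)
    return [x for segment in segments for x in segment]


def random_downsample(signal, keep, rng):
    """Keep `keep` randomly chosen samples, preserving their time order."""
    kept = sorted(rng.sample(range(len(signal)), keep))
    return [signal[i] for i in kept]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    toy = [float(i) for i in range(10)]
    print(jitter(toy, 0.1, rng))
    print(permute_segments(toy, 3, rng))
    print(random_downsample(toy, 5, rng))
```

Each function returns a new list and leaves the input untouched, so several augmented copies can be generated from one recorded trial.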

A system was developed to address the challenge of musical relaxation using modern machine learning techniques. In our approach, computer-guided audition of spatial soundscapes was investigated, automatically exploring a polyphonic area while using bio-signals as indicators of satisfaction. We propose a reinforcement learning (RL) method to discover the sound relaxation “sweet spot” in a pantophonic soundscape. An avatar roams within a pantophonic space, surrounded by independent audio channels, while a human subject, listening through the avatar’s ears, is connected to an electroencephalographic (EEG) headset. Instead of the position being changed manually, it is controlled by a Deep Q Network (DQN) reinforcement learning agent. The reward for the DQN agent was calculated from a derived formula using valence and arousal values computed from the collected EEG signals. The performance of the DQN agent was evaluated using a simulator with virtual rewards before testing with human subjects. Test results of human trials were encouraging, suggesting that DQN has the potential to automate such “sweet spot” exploration.

Summary of the Dissertation Review Result

This dissertation features several scientific contributions:

➢ 1. A novel feature extraction method called Inter-Hemispheric Amplitude Ratio (IHAR), with signal augmentation for EEG biometric authentication, was proposed.

➢ 2. Subject-specific features can be extracted by EEG analysis.

2a. The state-of-the-art of subject-specific extraction methods, including biometric authentication, was surveyed.

2b. A new method was proposed to enhance musical relaxation according to personal taste using EEG and reinforcement learning (RL) techniques.

➢ 3. A novel one-class classification method for EEG biometric authentication based on autoencoders was also proposed. (unpublished)

Elaboration:

This work is devoted to analyzing and exploiting EEG signals for security and musical relaxation, as explored by two studies. In the first study, EEG signals were used as a biometric trait to authenticate access to restricted systems, which provides more security compared to traditional biometrics. As a second study, EEG signals were used as a measurement of relaxation to reinforce the discovery and exploration of the “sweet spot” in a pantophonic musical landscape.

➢ 1. Since most of the surveyed related studies achieved high accuracy, the goal was to increase user-friendliness by reducing the number of electrodes and using a simple task to collect data. EEG signals were collected across different trial phases: relaxation, visual stimulation, and mental recall. The candidate introduced a novel derived feature, dubbed Inter-Hemispheric Amplitude Ratio (IHAR), which expresses the ratio of amplitudes of laterally corresponding electrode pairs. The extracted feature set was tested with several machine learning (ML) algorithms. Most of the ML algorithms showed 100% accuracy with 14 electrodes, and according to the reported results, perfect accuracy can also be achieved using as few as 4 electrodes.

➢ 2a. With recent technological advancement of EEG signal capturing devices, the process is getting comparatively simpler as devices are capable of providing better portability with reduced calibration time. However, most detailed analysis suggests that a minimal number of most appropriate channels should be selected for better results, even if a system is equipped with the most advanced hardware. The candidate reviewed several approaches, providing an overview of crucial design considerations in handling EEG data for extended accuracy and practical applicability to authentication.

➢ 2b. A system was developed to address the challenge of musical relaxation using modern machine learning techniques. In the explored approach, computer-guided audition of spatial soundscapes was investigated, automatically exploring a polyphonic area while using bio-signals as indicators of satisfaction. The candidate proposed a reinforcement learning (RL) method to discover the sound relaxation “sweet spot” in a pantophonic soundscape. Instead of the position being changed manually, it is controlled by a Deep Q Network (DQN) reinforcement learning agent. The reward for the DQN agent was calculated from a derived formula using valence and arousal values computed from the collected EEG signals. Test results of human trials were encouraging, suggesting that DQN has the potential to automate such “sweet spot” exploration.

➢ 3. A novel one-class classification method based on autoencoders was proposed to overcome weaknesses of EEG-based authentication systems. In most systems, classification models need to be retrained and a considerably large dataset has to be used. In the proposed system, a separate autoencoder is trained for each user, so the whole system does not need to be retrained when a new user is added. Signal augmentation techniques (jittering, amplitude warping, time warping, random downsampling, and permutation) were used to expand the training dataset. Evaluation results showed 97.1% accuracy with a 0.91 AUC value for 8 frontal electrodes.

In the final review, the candidate gave a formal presentation of the research results, followed by a Q&A session. The referees indicated few remaining issues for closer consideration. Some previously noted concerns about the generality of the results and the coherence of the dissertation had been resolved: extended testing made the conclusions more rigorous, and the connections between the two threads of the described research were better established.

Name

氏名

YANG, Qinglin 楊青林

The relevant degree

学位の種類

甲CI博第85号

論文題目

Optimization of Resource Allocation for Edge Computing in Efficient Deep Learning Services

効率的なディープラーニングサービスを実現するエッジコンピューティングのためのリソース割り当ての最適化

Dissertation Review Committee Members

論文審査委員

The University of Aizu, Associate Prof. LI, P. (Chief Referee)

The University of Aizu, Prof. MIYAZAKI, T.

The University of Aizu, Prof. PHAM, A.

The University of Aizu, Senior Associate Prof. JING, L.

会津大学准教授李鵬（主査）

会津大学教授宮崎敏明

会津大学教授アントゥアンファン

会津大学上級准教授荊雷

Abstract

Algorithmic breakthroughs in deep learning let people enjoy a more convenient and smarter mobile life. Convolutional Neural Networks (CNNs) are an important computation model for many popular mobile artificial intelligence applications.

As the accuracy is ensured by the CNN model, the performance of deep learning service mainly depends on the response time for handling user demands, which includes the network transmission time, task scheduling time, inference time (the execution time of the deep neural network (DNN) inference), and so forth. In response time, inference time usually occupies the dominant portion, especially for a complicated DNN model.

In other words, CNN inference, i.e., processing input data based on well-trained CNN models, is computation-intensive and incurs a heavy overhead for mobile devices with limited hardware resources (e.g., storage, battery). A natural way to tackle this challenge is to employ cloud computing by offloading the computation tasks to remote servers, which raises two major concerns: high bandwidth requirements and transmission latency.

We find that the memory size varies from layer to layer in different CNN models, which means that mobile devices can run inference over the first few layers so that the intermediate data are smaller and the transmission time is reduced. Therefore, in order to maximize the utilization of edge and mobile resources (computation, communication, and storage) for efficient deep learning services, we first propose to offload a portion of the CNN inference computation of mobile devices to the edge computing site, motivated by the finding that batching tasks on GPUs can significantly reduce average inference time.

We design an algorithm that jointly considers the tasks on all mobile devices and the corresponding batching benefit at the edge site, unlike existing work on collaborative inference, which lets each mobile device make offloading decisions independently. Rather than simply offloading the whole task to the edge, we focus on partial offloading, i.e., the mobile devices conduct inference computation over a few layers of the CNN model and then send the intermediate results to the edge computing site, which completes the computation of the remaining layers. This kind of collaborative inference has been shown to be very effective in further reducing inference time and improving the energy efficiency of mobile devices. Finally, extensive simulations are conducted to evaluate the performance of our proposed algorithms, and the results show that they outperform existing work under different settings.
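The trade-off behind partial offloading can be illustrated for a single device with a small sketch: run the first k layers locally, transmit the intermediate result, and finish on the edge. This is a deliberately simplified, hypothetical model. It ignores the batching benefit and the joint multi-device decision that the dissertation's algorithm considers, and every number and name below is invented for illustration.

```python
def best_partition(mobile_ms, edge_ms, output_kb, input_kb, bandwidth_kb_per_ms):
    """Return (k, latency_ms), where the mobile device runs the first k layers.

    k = 0 offloads the raw input; k = len(mobile_ms) runs fully on-device.
    mobile_ms[i] and edge_ms[i] are per-layer compute times, and output_kb[i]
    is the size of layer i's output.
    """
    n_layers = len(mobile_ms)
    best_k, best_latency = None, float("inf")
    for k in range(n_layers + 1):
        local = sum(mobile_ms[:k])    # layers executed on the mobile device
        remote = sum(edge_ms[k:])     # layers executed on the edge server
        if k == n_layers:
            transfer = 0.0            # nothing is sent when all layers run locally
        else:
            sent_kb = input_kb if k == 0 else output_kb[k - 1]
            transfer = sent_kb / bandwidth_kb_per_ms
        latency = local + transfer + remote
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k, best_latency


if __name__ == "__main__":
    # Invented per-layer profile: early layers shrink the data considerably.
    print(best_partition(
        mobile_ms=[10, 20, 40, 80],
        edge_ms=[1, 2, 4, 8],
        output_kb=[400, 50, 25, 1],
        input_kb=600,
        bandwidth_kb_per_ms=10,
    ))
```

With this invented profile, splitting after the second layer is cheapest, because that layer's output is much smaller than the raw input while the later layers are expensive on the mobile device.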

Mobile devices have since become much more powerful than ever, with greater computing capacity and longer battery life for running deep learning-based applications.

As a supplement to MEC, when mobile devices are relatively close to each other, it can be sensible to use a direct communication link instead of delivering data via a base station, in order to achieve low latency and to save the transmission power of both devices and base stations as well as radio access and core network resources.

Many mobile devices are now able to collaborate on computing tasks by sharing their resources (e.g., CPU, GPU), which motivates us to aggregate the workloads of mobile devices to accelerate the deep learning inference process efficiently, given the finding that batching workloads on GPUs can reduce the overhead of GPU memory access.

To meet this demand, we propose to employ particle swarm optimization (PSO), a versatile population-based stochastic optimization technique, to help design our collaborative inference scheme.
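For orientation, a basic global-best PSO can be sketched as follows. This is the textbook form of the technique applied to a toy objective, not the dissertation's collaborative inference scheme; the parameter values, the function name `pso_minimize`, and the sphere objective are all illustrative assumptions.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own
                # best, and pull toward the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


if __name__ == "__main__":
    def sphere(x):
        """Toy objective with its minimum at the origin."""
        return sum(v * v for v in x)

    print(pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0)))
```

In an offloading setting, a particle would typically encode a candidate assignment of workloads and the objective would be the resulting processing time; how the dissertation encodes this is not stated in the abstract.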

Moreover, extensive simulations are carried out to evaluate the performance of the designed algorithm. The evaluation shows that the collaborative inference scheme can reduce the overall processing time in the given field compared with handling the data in the cloud, which is affected by the high transmission latency between mobile devices and the cloud.

Edge servers do not always operate under high load, yet they still consume energy. To study the energy efficiency of edge nodes under a switching ON/OFF (SO2) strategy based on task migration, we propose to apply a Deep Reinforcement Learning-based method. The challenge is that we have no knowledge of incoming tasks; because the edge nodes continue to receive tasks from consumers, the workloads vary in space and time, and the problem can be formulated as a sequential decision-making problem. Furthermore, simulation experiments are conducted to evaluate the proposed scheme, and the results show that the switching ON/OFF strategy can save energy compared with the general case.

Summary of the Dissertation Review Result

The candidate’s research topic is efficient machine learning inference for mobile devices using edge computing. This is a new and important research direction that promises to have a large research impact. The draft is well organized, and three main contributions have been clearly presented with satisfactory writing. Some critical technical weaknesses pointed out in the previous round of review have been addressed according to the reviewers’ comments and suggestions. The candidate has improved the presentation by reorganizing the contents and adding more explanation of the research background and motivations. Based on the above, the candidate has satisfied the graduation requirements, and the committee supports passing him in the final review.

Name

氏名

SHRESTHA, Shashank スレスタササンカ

The relevant degree

学位の種類

甲CI博第86号

論文題目

Open Data Integration Through Polystore Data Management Systems

ポリストアデータ管理システムによるオープンデータ統合

Dissertation Review Committee Members

論文審査委員

The University of Aizu, Prof. BHALLA, S. (Chief Referee)

The University of Aizu, Prof. VAZHENIN, A.

The University of Aizu, Prof. TEI, S.

The University of Aizu, Associate Prof. MOZGOVOY, M.

会津大学教授サバシュバーラ（主査）

会津大学教授アレクサンダーヴァジェニン

会津大学教授程子学

会津大学准教授マキシムモズゴボイ

Abstract

Recently, there has been growing interest in the database community in managing large-scale unstructured data from multiple heterogeneous data stores. This problem has garnered special attention due to the size of the data, the speed at which the data grow, and the emergence of various data types in different scientific data archives. There are various solutions that federate queries over multiple data sources using a single data model. Concepts such as Polystore, Multistore, and Polyglot systems have been proposed, which provide solutions for integrating queries from a single data model. Moreover, the emergence of open data and linked data in recent years has unlocked new research requirements and challenges.

This work is devoted to studying models of data integration, analyzing them, and incorporating them into a system to manage linked open data provided by the astronomical domain. Past models of data integration have provided an evaluation framework to understand and evaluate the current study.

A polystore system was developed to manage user workflows and visualize the data provided by the astronomical domain. Astronomy as a scientific domain produces a huge amount of data, which is stored in the data archives provided by NASA and its subsidiaries. The data mostly consist of images, unstructured text, and structured data (relations, key-values). This thesis articulates the problems of integrating multiple data stores to manage heterogeneous data and presents a Polystore architecture as a solution. A method is defined for managing a local data store and communicating with a remote cloud data store with the help of a web-based query system.
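The general idea of answering one request from stores with different data models can be sketched with a toy mediator. This is only a schematic illustration: an in-memory SQLite table stands in for a relational store and a Python dictionary for a key-value store. The schema, the identifiers, the file paths, and the function name are all invented and do not describe the system developed in the dissertation.

```python
import sqlite3

# Relational store: a toy catalogue of sources (all values invented).
relational = sqlite3.connect(":memory:")
relational.execute(
    "CREATE TABLE sources (id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL)"
)
relational.executemany(
    "INSERT INTO sources VALUES (?, ?, ?)",
    [(1, 10.5, -3.2), (2, 150.1, 22.7), (3, 12.0, -1.0)],
)

# Key-value store: hypothetical image locations keyed by source id.
key_value = {1: "img/0001.fits", 3: "img/0003.fits"}


def sources_with_images(ra_min, ra_max):
    """Query the relational store, then enrich each row from the key-value store."""
    rows = relational.execute(
        "SELECT id, ra_deg, dec_deg FROM sources "
        "WHERE ra_deg BETWEEN ? AND ? ORDER BY id",
        (ra_min, ra_max),
    ).fetchall()
    return [
        {"id": sid, "ra_deg": ra, "dec_deg": dec, "image": key_value.get(sid)}
        for sid, ra, dec in rows
    ]


if __name__ == "__main__":
    for record in sources_with_images(10, 13):
        print(record)
```

The caller sees a single result shape regardless of which store supplied each field, which is the property a polystore mediator aims to provide.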

Summary of the Dissertation Review Result

The candidate has made all the necessary changes that were suggested during the final review. Reports regarding the significance of some articles included in the dissertation were submitted. The candidate has implemented the entire set of recommendations satisfactorily.

The dissertation examines and evaluates the topic of Open Data Integration. It considers the application of data integration techniques in managing large-scale astronomical data. A query system has been developed to access open data resources. It is demonstrated that the system facilitates querying and visualizing the astronomical data with the help of different APIs provided among the site resources. The system design is compared with existing state-of-the-art Polystore systems. The comparison with existing systems and the evaluation of the query system are based on criteria discussed for Federated Database Management Systems. Different query examples and the user interaction with the query system have been shown and discussed.

The study has adopted a state-of-the-art Big Data resource provided by the Palomar Transient Factory (PTF). The data resources are supported for worldwide utilization by astronomers in the new field of time-domain astronomy. The candidate interacted with users from the California Institute of Technology.

The model of the Polystore system was presented to an audience from the field of astronomy. The system provides interactive query elicitation support. Its performance was discussed, and a demonstration was presented to visiting members from the area of astronomy. The committee evaluated the demonstrated queries and the responses from users in the field of astronomy.

During the final doctoral dissertation review, the committee examined in detail the responses to the preliminary dissertation review and the new contents included by the candidate. The questions that were not answered or only partially addressed were few and did not constitute a major omission of detail. Overall, the review committee members expressed the favorable opinion that the candidate has acquired the ability to carry out independent research activities and should be awarded the doctoral degree. Some minor modifications were required before submission of the finalized dissertation.

Name

氏名

KHAUSTOV, Victor カウストフヴィクトル

The relevant degree

学位の種類

甲CI博第87号

論文題目

Machine Learning-Based Team AI in a Cooperative Multi-Agent Game

協調型マルチエージェントゲームにおける機械学習ベースのチームAI

Dissertation Review Committee Members 論文審査委員

The University of Aizu, Associate Prof. MOZGOVOY, M. (Chief Referee)

The University of Aizu, Senior Associate Prof. YOSHIOKA, R.

The University of Aizu, Senior Associate Prof. WATANOBE, Y.

The University of Aizu, Senior Associate Prof. PYSHKIN, E.

会津大学准教授マキシムモズゴボイ（主査）

会津大学上級准教授吉岡廉太郎

会津大学上級准教授渡部有隆

会津大学上級准教授ピシキンエフゲニー

Abstract

AI researchers often rely on popular games as testbeds for evaluating new algorithms and approaches.

Sports games possess typical traits of a successful AI testbed: they are popular among the general public, they are easy to set up, and they provide sufficient challenge for AI, since the participants are expected to exhibit both athletic abilities and a certain level of tactical and strategic thinking.

In particular, sports games can be used to test the feasibility of machine learning algorithms in the domain of dynamic, game-like environments. Although machine learning is a prevalent method of creating AI systems nowadays, it is not commonly adopted for game AI systems, and its efficiency in this area still needs investigation.

The dissertation studies the process of constructing a working AI system based on machine learning, able to operate in a cooperative multi-agent game environment. The work outlines the stages of initial training data processing, automatic action markup, learning an agent model (based on a Markov decision process), and evaluation of the obtained agents. The training dataset is based on actual human tracking data, obtained with specialized video capturing and digitization equipment installed at sports arenas.
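To make the idea of learning an agent from recorded human actions concrete, here is a deliberately simple frequency-table sketch: count which action was observed in each discretized game state and replay the most common one. It is a stand-in for illustration only, not the Markov-decision-process agent model described in the dissertation, and the state labels, action names, and functions are all hypothetical.

```python
from collections import Counter, defaultdict


def learn_policy(observations):
    """Count how often each action was taken in each discretized state."""
    counts = defaultdict(Counter)
    for state, action in observations:
        counts[state][action] += 1
    return counts


def act(policy, state, default="hold"):
    """Pick the action observed most often in `state`, or a default if unseen."""
    if state not in policy:
        return default
    return policy[state].most_common(1)[0][0]


if __name__ == "__main__":
    # Invented (state, action) pairs standing in for marked-up tracking data.
    observations = [
        (("own_half", "has_ball"), "pass"),
        (("own_half", "has_ball"), "pass"),
        (("own_half", "has_ball"), "dribble"),
        (("box", "has_ball"), "shoot"),
    ]
    policy = learn_policy(observations)
    print(act(policy, ("own_half", "has_ball")))
```

Even this crude table reflects why automatic action markup matters: raw tracking snapshots carry no action labels, so the pairs above cannot be built until game events have been reconstructed.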

Experiments were conducted using a simplified soccer game engine, so the results can be considered directly applicable to soccer-like virtual environments, with possible generalizations to adjacent areas. The resulting agents are evaluated on the basis of their ability to perform reliable ghosting, i.e., to recreate the styles and skills of the players comprising the input dataset.

Our results prove that it is possible to convert human tracking data into a dataset with game events, suitable for training a machine learning-based AI system. We show that events can be detected reliably and represented in the game world with a sufficient degree of accuracy. We also demonstrate that the resulting AI system is able to accurately replicate the behavior of real soccer players according to a number of well-grounded metrics. These metrics are able to evaluate various aspects of AI behavior independently, which enables us to identify weaker aspects of our AI system and focus our efforts on them. Our experiments also prove that a machine learning-based AI provides a closer approximation of a human play style than a typical rule-based AI system.

The obtained results will be of interest to game AI developers, sports analysts, and the general AI community. A clear trend in the sports games genre is a convergence between the game and reality, which in practice means a demand for more believable AI-controlled agents. While many traditional game AI systems are based on manual design, it is arguably difficult to create diverse and realistic AI-controlled sports teams without machine learning methods. Another clear trend is a more in-depth, thorough approach to sports analytics, based on detailed analysis of player tracking data. Strong teams of sports analysts can now be found not only in world-class teams but also in teams of lower ranks. The ability of an AI system to mimic the behavior of teams and specific players can become an invaluable capability for these specialists. Finally, the creation of multi-agent coordinated behavior, emerging from the decision-making processes of individual participants, is an interesting topic for a wide audience of AI researchers.

The dissertation is organized as follows:

Chapter 2 (Background and Related Works) is dedicated to the analysis of multi-agent sports-like game environments. We discuss distinctive features characteristic of this particular type of game world and how they affect the requirements for the AI systems controlling non-player characters. We study existing literature and identify the challenges associated with such environments. We explain the choice of a soccer-based testbed game project and explore related available systems, examining their goals and application areas. We also discuss the notions of “believability” and “fun” in game AI systems and their role in the perceived quality of the resulting game project.

In Chapter 3 (Principles of Decision Making) we discuss specific machine learning methods used in our system. The choice of methods is not trivial in this case since they have to satisfy a number of criteria, including the ability to be trained on relatively small input data sets available to us. We outline how the learning process is organized, how agent knowledge is represented in computer memory, how similar case extraction is performed, and how the final decision-making process is organized. We provide grounds for our choice and briefly examine possible alternatives and options for future experiments.

The goal of Chapter 4 (Data Preprocessing) is to provide an in-depth review of tasks associated with preprocessing the source human tracking data to make it a suitable input for a machine learning algorithm. Tracking data of soccer players is captured with specialized hardware installed at a stadium. It provides a series of digitized game snapshots that lack game event information, crucial for AI decision making. Thus, these events have to be reconstructed and represented in a form applicable in a game world (which, in turn, should be seen as a rough approximation of a real soccer game rather than an accurate simulation). We discuss challenges associated with the data processing step and describe our approach to address them.

The final Chapter 5 (Evaluation Results) provides an account of our experiments performed in the course of designing and improving our AI system. We discuss the configuration of the AI agent (its feature space, its method of ranking probabilistic decisions, its choice of decision-making points, etc.), possible improvements, and the methods of evaluating AI behavior. We outline the set of metrics that can be used to assess the conformance of the AI system to the target play style, the basis of their choice, and their reliability. We discuss challenges associated with the evaluation process and describe our current results. We consider both single-player and multi-player scenarios, where either only the player currently possessing the ball is evaluated, or evaluation is made for the whole team.

Summary of the Dissertation Review Result

The dissertation studies the process of constructing a working AI system based on machine learning, able to operate in a cooperative multi-agent game environment. The work outlines the stages of initial training data processing, automatic action markup, learning an agent model (based on a Markov decision process), and evaluation of the obtained agent.

Experiments are being conducted using a simplified soccer game engine; thus, the results can be considered directly applicable to soccer-like virtual environments, with possible generalizations to adjacent areas. The resulting agents are evaluated on the basis of their ability to perform reliable ghosting, i.e., to recreate the styles and skills of the players that make up the input dataset.

The principal achievements of this work have been published in the literature. Still, certain experiments evaluating the quality of multi-agent behavior are in progress. The review committee confirms that the work is devoted to an important topic having numerous possible spillover effects and high potential impact in both gaming and non-gaming contexts. The author demonstrated a complete pipeline of an AI creation process, covering steps from preliminary data analysis to the evaluation of the resulting system according to its capability to “ghost” the original actors.

The committee also notes the candidate's high enthusiasm for research activities in general and his interest in the current topic in particular. The candidate shows an appropriate attitude to the process of scientific enquiry and is able to follow established practices. He is also good at establishing contacts and contributing to team projects. The candidate demonstrates excellent communication abilities. He expresses his ideas efficiently and answers questions without hesitation. We had a very productive discussion during the final review session, and were able to confirm that all necessary requirements for a doctoral degree are met.

Published by The University of Aizu

Tsuruga, Ikki-machi, Aizu-Wakamatsu City, Fukushima, 965-8580 Japan

TEL: 0242-37-2600

FAX: 0242-37-2526
