6.2.1 Scale-space Filtering
The DE optimization process searches for the values of the player-model parameters that represent the best player in a particular game. When the optimization evolves the player model for that game, it searches the parameter landscape that belongs to the game, hunting for the optimal point in the landscape defined by the fitness evaluation function. When the game changes, the parameter landscape changes accordingly.
Our assumption about the changing landscape is that when the game changes slightly, its landscape also changes only slightly. Fortunately, we can control the game difficulty through its game parameters. That means that, for a specific game difficulty, once the optimization process has found the optimal parameter solution, it should take less effort to find a new optimal solution at a slightly changed game difficulty.
This assumption is inspired by the canonical 1983 work of the late Andrew Witkin on scale-space filtering [62]. As the first step of his signal-processing technique, Witkin applied Gaussian smoothing repeatedly, with increasing σ values, to the same waveform. Fig. 6.1 shows the result of this sequential Gaussian smoothing. Each row represents a signal: the original signal is at the bottom of the figure, and each upper signal is a Gaussian-smoothed version of the one below it. The top row is the smoothest signal, produced with the largest σ value in the Gaussian convolution.
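The smoothing step itself is straightforward to reproduce. Below is a minimal sketch of repeated Gaussian smoothing with increasing σ over the same waveform; the test signal, the σ values, and the variable names are illustrative choices, not taken from Witkin's original implementation.

```python
# Sketch: build a simple scale space by smoothing one waveform with
# Gaussians of increasing sigma (larger sigma removes finer structure,
# producing the smoother upper rows of the figure).
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
x = np.linspace(0, 4 * np.pi, 512)
waveform = np.sin(x) + 0.4 * np.sin(7 * x) + 0.2 * rng.standard_normal(x.size)

scale_space = [gaussian_filter1d(waveform, sigma=s) for s in (1, 2, 4, 8, 16)]
```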
We can also read this figure, and the smoothing filter, in the opposite direction, from top to bottom (Fig. 6.2). Assume that the waveform in the figure is a parameter landscape. A smoother waveform in an upper row represents the parameter landscape of an easier game. In such a landscape, it is quick and simple to locate the optimal position.
In a lower row, the parameter landscape becomes more complicated because the game is harder. We can start exploring this complicated landscape from the formerly optimal position. Finding the new optimal position this way is quicker and simpler than starting the landscape exploration from random points.
Using this analogy for our problem, it is simpler to start optimizing the player model on an easy level of the game and then continue optimizing the model while gradually increasing the game difficulty. Compared with working directly on a hard level, this approach should take less optimization time or achieve a larger performance improvement at the target difficulty level.
Figure 6.1: Sequential Gaussian smoothing of a waveform. The original waveform is on the bottom row. The upper waveforms are smoother than the lower ones due to increasing σ values in the Gaussian convolution.
Figure 6.2: Waveform as a search parameter landscape. It is easier to find an optimal solution in a smoother search landscape (upper waveforms). Searching around this optimal position may help to find a new optimal solution more easily and quickly in a more complex landscape (lower waveforms).
We call this approach gradual incremental learning. It derives from incremental learning in machine learning (ML), whose goal is to train a learning model to adapt to new data while retaining all of its existing knowledge, thereby avoiding retraining the model from scratch. Our gradual incremental learning approach is similar, except that we additionally require each change to the searching landscape to be small.
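The following sketch illustrates the idea with differential evolution: optimize at an easy difficulty first, then seed the search at each slightly harder difficulty with a population centered on the optimum just found. The fitness function `evaluate_player`, the parameter bounds, the difficulty schedule, and the DE settings are all hypothetical placeholders, not the values used in our experiments.

```python
# Sketch: gradual incremental learning via warm-started differential evolution.
import numpy as np
from scipy.optimize import differential_evolution

def evaluate_player(params, difficulty):
    # Placeholder cost: in the real system this would play the game at the
    # given difficulty and return a cost (lower is better).
    return float(np.sum((params - difficulty) ** 2))

bounds = [(-5.0, 5.0)] * 4                 # player-model parameter ranges (illustrative)
lows, highs = np.array(bounds).T
difficulties = np.linspace(0.1, 1.0, 10)   # easy -> hard in small increments
pop_size = 20
rng = np.random.default_rng(1)

init = "latinhypercube"                    # random start only for the easiest game
for d in difficulties:
    result = differential_evolution(
        lambda p: evaluate_player(p, d),
        bounds, popsize=pop_size, init=init, maxiter=50, seed=1, polish=False,
    )
    # Seed the next, slightly harder landscape with a population jittered
    # around the optimum just found, instead of restarting from scratch.
    jitter = 0.05 * (highs - lows) * rng.standard_normal((pop_size, len(bounds)))
    init = np.clip(result.x + jitter, lows, highs)
```

Each run after the first therefore begins its exploration near the previously optimal region, which mirrors reading Fig. 6.2 from top to bottom.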
6.2.2 Issues in Gradual Incremental Learning
Basically, there are two major concerns when applying gradual incremental learning to parameter optimization. Both issues may affect the performance improvement achieved by the optimization or the overall time required to optimize the system.
These two major concerns are:
• Sizing Policy: The sizing policy relates to the gradual requirement in the name of the approach. It concerns the sizes of the incremental changes that are suitable for the learning.
The suitable sizes in gradual incremental learning generally vary with the parameters being optimized. On the one hand, each change should be small enough to produce a smooth transition in the searching landscape, which helps the optimization process locate the new optimal position easily. On the other hand, it should be large enough to cause a significant change in the fitness evaluation.
• Timing Policy: The timing policy determines the right time to apply the incremental changes. On the one hand, if we apply the changes too soon, the system may not yet be fully evolved, which may result in poor overall performance at the end of the optimization process. On the other hand, if we apply the changes too late, we waste time and the process becomes less efficient.
The simplest timing policy is a fixed-interval policy: we apply the incremental changes at a fixed time interval. The rule of thumb is that this interval should balance proper system evolution against an efficient use of time. Another approach is to use some other measure to manage the changing interval; typically, the fitness evaluation in the optimization process helps decide the appropriate time to change.
There are many possible implementations of this approach, e.g. making a change when the n best fitness scores have been stable for a specific period. Both kinds of policy are sketched below.
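The following sketch shows the two timing policies side by side: a fixed-interval trigger and a fitness-stability trigger. The function names, the window size, the tolerance, and the interval are illustrative assumptions; in practice the fitness history would come from the running optimization process.

```python
# Sketch: two timing policies for deciding when to apply the next
# incremental change in game difficulty.
import numpy as np

def fixed_interval_trigger(generation, interval=25):
    # Increase difficulty every `interval` generations, regardless of progress.
    return generation > 0 and generation % interval == 0

def stability_trigger(best_fitness_history, window=10, tol=1e-3):
    # Increase difficulty once the best fitness scores have stopped
    # improving, i.e. they stay within `tol` over the last `window` generations.
    if len(best_fitness_history) < window:
        return False
    recent = np.asarray(best_fitness_history[-window:])
    return float(recent.max() - recent.min()) < tol
```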
In this chapter, we set up experiments to explore gradual incremental learning with a focus on the timing policy. Because of the limited range of the Star Trek game parameters, the game is not suitable for studying the sizing policy. Although maximizing system improvement is a main advantage of incremental learning in ML, we do not concentrate on obtaining the largest improvement. Rather, we are interested in a stable improvement, even a minimal one, that provides valuable feedback for the mutual evolution process. This is the key mechanism of our automatic parameter tuning framework.