Preliminary - 本文 Thesis 総合研究大学院大学学術情報リポジトリ A1888本文

This section provides the general picture of classification-based instance matching. We also describe how a ranking factor can be included in the traditional classifier. Before getting to the detail, we note the frequently used symbols, which are listed in Table 6.1.

6.3.1 Classification-based instance matching

In classification-based instance matching, the input for learning algorithm as well as for prediction task is the examples representing the correlation of instances. To our best knowledge, the correlation reflects the similarity of the instances. Such similarity can be the literal similarities or the final matching score. Because the literal similarities provide more detail information so that the learning algorithm can find the optimal classifier, literal similarities are widely used to construct the examples. Based on the observation of existing classification-based instance matching systems, we illustrate a typical workflow that matches to all systems. Figure 6.1 depicts the workflow.

According to Figure 6.1, there are six components in the workflow.

• The first component is property alignment, which creates the property mappings.

This component is automatically or manually operated.

Co-references Property

alignment

Blocking Similarity

aggregator Labeled examples

Similarity

aggregator Classifier

Property

mappings Initial similarity

functions

Unlabeled examples RT

Similarity function generation

Labeled candidates

Unlabeled candidates RSLRSU

Learning phase

Resolution phase

Learning

Figure 6.1: A typical workflow of classification-based matching.

• The second component, similarity function generation, assigns the similarity met-rics to each property mappings and creates the initial similarity functions, which are later used to estimate the literal similarities.

• The third component is the blocking. This component generates the candidates, which are the pairs of instances having a possibility to be coreferent. The dates are divided into two parts, labeled and unlabeled sets. The labeled candi-dates are used to create the training data while to detect the coreferences among unlabeled candidates is the mission of the system.

• Using the initial similarity function generated in the second component, the fourth component, similarity aggregator, computes the literal similarities for all candi-dates and generates the examples for classification. The labeled and unlabeled candidates are used to construct the training and test data, respectively.

• Training data is input to the learning component to train a classifier.

• Finally, the last step is to apply the classifier on unlabeled examples, to predict their matching status.

As we earlier mentioned in Section 6.2, the examples can be the pairwise instances ofR_S and R_T. However, the blocking step is installed in this workflow for generalization. If a system uses all pairwise instances, the blocking step simply returns all of them without any filter.

The above workflow is very similar to the first 4 steps ofScLink. The key differences are thelearning and theclassifier. Classifier can be any model with the function that maps

each unseen example into the class positive or negative. Classification is a extensively studied area and therefore, plenty of classifiers are available. Some examples of the classifier for instance matching are J48 decision tree, Support vector machine (SVM), Random forest, and Multilayer perception, which have shown remarkable performances [74,81,114,134].

6.3.2 Ranking

In machine learning, the ranking is the problem of predicting the order of examples. For example, to predict the preference of an example among two or a larger collection of examples. This task is named learning to rank. As reviewed in Section 2.6.7, learning to rank consists of three approaches: pointwise, pairwise, and listwise. Among them, pair-wise ranking is the most suitable form for instance matching. This approach induces the ranking of examples by considering the relationship of every possible pairs of instances.

Pointwise approach considers the ranking of each example independently, it is not suit-able for our goal in instance matching. Listwise approach finds the preference of an example over a sorted collection. In instance matching, a sorted collection like that is not available because only positive/negative labels are observable. Concretely, given the set ha, Bi, if a pair ha, bii is positive and the others are negative, the preference can be defined only for each of [ha, bii,ha, bji], rather than for all of [ha, bji,ha, bki], where i6=j 6=k. Shortly, if we can only make sure that the first ranked example is correct, the order of the remaining examples is left unknown. Therefore, it is not suitable to apply listwise approach as well.

The general idea of pairwise approach is to find a ranking function f so that the order of two examplesaandbcan be defined bysign(f(a)−f(b)) ={1,−1}, where 1 meansa has higher preference thanband −1 means otherwise. f can be represented in different forms. A typical formf is the linear combination of the input feature vector: f =w×x, where w is called parameters of f. Another common form of f is the logistic function f = _1+e¹−wx. The performance of the ranking is determined by the parameter w and tuning w using training data is the task of learning algorithm. Basically, the learning algorithm optimizesw by minimizing the following loss:

L(f, w,X) = ^X

a∈X

b∈X,ℓ(b)<ℓ(a)

G(f(a)−f(b)) (6.1)

whereG is a preference function. G can be a hinge functionG(z) = (1−z)₊, exponential functionG(z) =e^−z, or logistic function G(z) = _1+e¹−z. In order to simply minimizeL, G is usually chosen so that L is convex (e.g., above listed form ofG). In that case, the

global minimal ofLis possible and identifiable. The detail of optimization algorithm is given in Section 6.4.2 in the context of our proposed ranking feature R2M.

6.3.3 Combination of classification and ranking

It is possible to include the ranking factor in the classifier. To our best knowledge, CRR [142] is the only one work that focuses on the combination of classification and ranking.

The general idea of CRR is to produce a model that can score the examples so that the result can help to make the class prediction as well as is capable of ranking the examples.

The detail of this combination is as follows.

L^CRR(f, w,X) =αL^R(f, w,X) + (1−α)L(f, w,X) (6.2) whereL(f,X) is the ranking loss (Equation 6.1) andL^R(h,X) is the classification loss, which is the logistic regression:

L^R(f, w,X) = 1

|X | X

z∈X

h(ℓ(z), f(z)) (6.3)

where:

h(y, f(z)) =y×f(z) + (1−y)×(1−f(z)) (6.4) and f is a logistic function f(z) = _1+e¹−z.

The advantage of this method is that the learning process can simultaneously optimize the classifier and ranking factor. In addition, the impact of regression and ranking are controllable by adjusting the meta-parameter α ∈ [0,1]. The priority of regression is proportional to the increase of α. However, CRR still contains some drawback. First, the trained model f has to carry two tasks regression and ranking, but is configured by only one parameter w. It is potentially ineffective as the ideal models of regression and ranking may be very different. Second, it is difficult to extend the idea to combine ranking and another classifier. The training of function-based classifier (e.g., Logistic regression, SVM, and Neural Network) can be done by minimizing the loss function like Equation 6.3. However, for some type of models, for example tree-based (e.g., ID3, and J48) or rule-based (e.g. IDA and OneR), it is impossible to include the ranking into the trained model using the 1 linear combination like CRR.

In our work, we try to include the ranking factor in the manner that is independent of the mechanism of the classifiers. We propose to include the ranking factor as elements of the feature vector. That is, the values derived from ranking function f(x) can be

included in the original feature vectorxto create a new vectorx^′. Similarly, given a set of examplesX, another new set of examples is created and the classification problem is deployed on the new dataX’. More detailed, the learning process will contain two steps.

The first step is to learn the ranking functionf from the original dataX and the second step is to learn the final classifierMfrom the modified dataX^′ =X ∪f(X).

The advantage of this method is that the classification modelMis independent with the ranking model f. That is, the ranking model is compatible with any classifier so that it is possible to employ any classifier. Furthermore, because the parameters of classifier M and those of ranking model f are independent, M and f can better adapt to the characteristic of the data.

ドキュメント内本文 Thesis 総合研究大学院大学学術情報リポジトリ A1888本文 (ページ 140-144)