Discussion - Main text, Thesis, The Graduate University for Advanced Studies (SOKENDAI) Academic Repository, 甲1878 main text

Table 3.8: Performance comparison between QALD-2 challenge participant systems and BoTLRet for DBpedia test questions

System       Total answered questions   Recall   Precision   F1 Measure
SemSek       80                         0.48     0.44        0.46
Alexandria   25                         0.46     0.43        0.45
MHE          97                         0.40     0.36        0.38
QAKiS        35                         0.37     0.39        0.38
BoTLRet      75                         0.94     0.94        0.94
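As a quick sanity check, the F1 Measure column can be recomputed from the Recall and Precision columns with the standard harmonic-mean formula. The sketch below assumes the table values are already rounded to two decimals, so a recomputed figure may differ from the reported one in the last digit. The function name `f1_score` and the `rows` dictionary are illustrative and not part of BoTLRet.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F1) copied from Table 3.8.
rows = {
    "SemSek": (0.48, 0.44, 0.46),
    "Alexandria": (0.46, 0.43, 0.45),
    "MHE": (0.40, 0.36, 0.38),
    "QAKiS": (0.37, 0.39, 0.38),
    "BoTLRet": (0.94, 0.94, 0.94),
}

for system, (recall, precision, reported) in rows.items():
    recomputed = f1_score(precision, recall)
    print(f"{system}: reported {reported:.2f}, recomputed {recomputed:.4f}")
```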

Although BoTLRet is not fully comparable with the QALD-2 challenge participant systems, because BoTLRet used manually devised keywords while the participant systems did not, we include this experiment to show how BoTLRet would perform if the required keywords were given correctly. We consider that state-of-the-art tools such as language parsers, machine-learning-based keyword finders, and entity linkers can be used to find the correct keywords. We discuss this in more detail in Chapter 5.

Therefore, languages that are less sensitive to word order, e.g., Bangla and Japanese, might be difficult to adapt to the proposed framework.

However, we adopt a greedy template management technique in which we

• select the best template according to the dataset statistics

• merge adjacent templates with their best template

This technique automatically adjusts the best possible word order of a keyword among three adjacent keywords. For example, for three adjacent keywords {k₁, k₂, k₃}, the best template can be constructed for either {k₁, k₂} or {k₂, k₃}; therefore, to some extent, the keyword order is automatically adjustable. However, if the number of keywords is more than four, this automatic approach will not work. In such a case, we consider that users can still use our framework by adjusting the word order of the query.
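The greedy selection among adjacent keywords can be illustrated with a minimal sketch. It is not the BoTLRet implementation: the scoring table `TOY_STATS` stands in for the dataset statistics, a nested tuple stands in for a merged template, and the names `greedy_merge`, `toy_score`, and `leaves` are hypothetical.

```python
from typing import Callable, List, Sequence, Union

# A node is either a keyword (str) or a merged pair of nodes (tuple).
Node = Union[str, tuple]


def leaves(node: Node) -> List[str]:
    """Return the keywords covered by a node, in left-to-right order."""
    if isinstance(node, tuple):
        return [leaf for child in node for leaf in leaves(child)]
    return [node]


def greedy_merge(keywords: Sequence[str],
                 pair_score: Callable[[Node, Node], float]) -> Node:
    """Repeatedly merge the best-scoring adjacent pair until one node remains."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    items: List[Node] = list(keywords)
    while len(items) > 1:
        # Pick the adjacent pair with the highest score (first one on ties).
        best = max(range(len(items) - 1),
                   key=lambda i: pair_score(items[i], items[i + 1]))
        items[best:best + 2] = [(items[best], items[best + 1])]
    return items[0]


# Assumed toy statistics: how well two neighbouring keywords fit one template.
TOY_STATS = {("k1", "k2"): 0.2, ("k2", "k3"): 0.9}


def toy_score(left: Node, right: Node) -> float:
    """Score the boundary between two nodes via their touching keywords."""
    return TOY_STATS.get((leaves(left)[-1], leaves(right)[0]), 0.0)


print(greedy_merge(["k1", "k2", "k3"], toy_score))
```

With these toy scores, {k₂, k₃} is paired first and k₁ is merged with the result afterwards, which mirrors the choice between {k₁, k₂} and {k₂, k₃} described above.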

Furthermore, we also find an issue in keyword-based information access, which is a more general problem of such techniques. While a natural language query is clearly defined, a keyword-based query is sometimes vague.

For example, the keywords {Barack Obama, child} could mean the children of Barack Obama, or they could mean the parents of Barack Obama. In Linked Data information access, these two intentions need to be handled with different keywords. For example, to find the children we need to pose the query with {Barack Obama, child}, while to find the parents we need to pose the keywords {Barack Obama, parent}, and so on.
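To make the ambiguity concrete, the sketch below shows how one keyword pair can correspond to two different triple patterns, depending on whether the resource is read as the subject or the object. The prefixed names `dbr:Barack_Obama`, `dbo:child`, and `dbo:parent` follow the usual DBpedia conventions and are assumptions here, as is the helper `candidate_queries`; these are not the templates of Tables 3.1 and 3.2, and prefix declarations are omitted.

```python
from typing import Dict


def candidate_queries(resource: str, predicate: str) -> Dict[str, str]:
    """Build both readings of a {resource, predicate} keyword pair."""
    return {
        # Reading 1: the resource is the subject of the predicate.
        "resource_as_subject":
            f"SELECT ?x WHERE {{ {resource} {predicate} ?x }}",
        # Reading 2: the resource is the object of the predicate.
        "resource_as_object":
            f"SELECT ?x WHERE {{ ?x {predicate} {resource} }}",
    }


for keyword in ("dbo:child", "dbo:parent"):
    for reading, query in candidate_queries("dbr:Barack_Obama", keyword).items():
        print(f"{keyword} / {reading}: {query}")
```

Fixing one reading per keyword, as in the {Barack Obama, child} versus {Barack Obama, parent} example above, is what removes the ambiguity.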

3.5.2 BoTLRet Addressed Research Challenges

In Linked Data information access, BoTLRet addresses the first two research challenges that we described in Chapter 1, Section 1.2.1. They are

• “Complex Competency”, where we described that information access over Linked Data requires complex user competency, from data modeling to data queries. As a result, accessing Linked Data is often not easy, especially for general-purpose users who have very little knowledge about the internal structure of Linked Data.

• “Costly Links”, where we described that information access over a Linked Data network requires following links, which turns Linked Data search into a sub-graph search. However, a sub-graph search over a large graph is a sub-graph isomorphism problem, which is a classical NP-complete problem. Moreover, traditional graph queries are not able to capture Linked Data’s rich semantics. As a result, accessing Linked Data requires a new kind of link following that can handle the exponential complexity of link following but can still capture Linked Data’s rich semantics.
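The cost of naive link following can be illustrated with a brute-force sub-graph matcher. This is only an illustrative sketch of exhaustive (non-induced) matching on unlabeled directed edges, not the technique used in BoTLRet; the function name and the toy graph are assumptions. It tries every injective assignment of pattern nodes to graph nodes, so the number of candidates grows as n!/(n-k)! for a graph with n nodes and a pattern with k nodes.

```python
import math
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def find_subgraph_matches(pattern_edges: Sequence[Edge],
                          graph_edges: Sequence[Edge]) -> List[Dict[str, str]]:
    """Return every injective node mapping that preserves all pattern edges."""
    pattern_nodes = sorted({node for edge in pattern_edges for node in edge})
    graph_nodes = sorted({node for edge in graph_edges for node in edge})
    graph_edge_set = set(graph_edges)
    matches = []
    # Exhaustively try each ordered choice of distinct graph nodes.
    for candidate in permutations(graph_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, candidate))
        if all((mapping[s], mapping[t]) in graph_edge_set
               for s, t in pattern_edges):
            matches.append(mapping)
    return matches


graph = [("a", "b"), ("b", "c"), ("a", "c")]
pattern = [("x", "y"), ("y", "z")]  # a two-step path

print(find_subgraph_matches(pattern, graph))
# Number of candidate assignments checked for this toy graph.
print(math.perm(3, 3))
```

Even in this toy case all six assignments are tested to find a single match, which is why restricting the search with a small set of templates is attractive on large datasets.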

Below, we discuss how BoTLRet addresses these two challenges.

• First, BoTLRet is a keyword-based system, which addresses the first challenge. The users of our framework do not need to define extra information for the keywords, such as data types or special query techniques, which is usually required by systems like [30, 31, 40]. Furthermore, users do not need to think about Linked Data’s complex data structure when querying. As users are familiar and comfortable with natural-language-based or keyword-based query techniques [77, 91], we consider that users of our framework will find it easy to use. Thereby, we address the first challenge.

Furthermore, the current framework requires users to construct the keywords manually, which they can do by learning the dataset. However, an automatic approach can be adopted for this, which we will describe in Chapter 5.

• Second, BoTLRet adopts a template-based Linked Data information access technique, which addresses the second challenge. Contemporary systems have used templates to find data links over Linked Data [95, 107]. Templates usually reduce the complexity of finding data links. Moreover, they also incorporate Linked Data’s rich semantics. These factors motivated us to adopt the template-based information access technique. However, when templates are constructed for input keywords, the contemporary systems fail to manage them effectively, because most of them use language-based tools such as language parsers, which are not stable and not well defined [30, 31, 66, 107].

As a consequence, the contemporary systems sometimes construct wrong templates, which eventually generate wrong or empty results. On the other hand, selecting all such possibilities is not an efficient technique for retrieving data over a large dataset [95].

In our proposal, we tackle the mentioned challenge with a greedy template management technique. We analyze the Linked Data structure and propose fifteen different templates (shown in Tables 3.1 and 3.2). The proposed template generation technique is stable and well defined. According to Linked Data statistics, we can determine which template needs to be generated in which situation. This solves the problem of unstable template generation that exists in the contemporary systems [30, 31, 66, 107]. Moreover, the template merging technique that we proposed in Section 3.3 gives a defined direction for handling queries with more than two keywords. It conforms to the requirement that Linked Data information access needs to be done holistically (described in Section 1.2.1).

Moreover, the greedy template generation technique that we adopt in our framework is computationally effective. It reduces the complexity of finding data links. This is because, if we followed all possible template generation possibilities, we would need at least forty-nine different templates, whereas we cover them with only fifteen templates (shown in Table 3.4). According to the results shown in Tables 3.4 and 3.5, we conclude that our proposed framework achieves the same level of performance as an exhaustive system. This confirms the completeness of this greedy template generation technique.

Furthermore, comparing the information access performance over Linked Data of contemporary systems such as GoRelations [40], SemSek [4], Alexandria [71], and QAKiS [20], among others [67], with that of our proposed framework (shown in Table 3.8), we can see that our template generation technique works effectively. Therefore, we consider that our proposed framework addresses the second research challenge comprehensively.
