Conclusions - 芝浦工業大学学術リポジトリ

This dissertation presented a novel search engine system built on an ontology and a relational database, and proposed several methods for extracting information from graph images. The main objectives of this dissertation were as follows:

• To narrow the semantic gap.

• To distinguish the graph types and propose a new graph-type classification system.

• To extract and locate the graph components.

• To propose a new method for estimating the Epsilon parameter of DBSCAN.

• To design an ontology for a semantic-based OCR-error correction system and search engine.

• To extract extended information from the data section of the graph.

• To create a prototype of an ontology-based graph search engine system.

• To evaluate the ontology-based search engine system against a traditional search engine system.

I principally addressed the semantic gap problem. I conducted several experiments and evaluated the results, which showed that the system can identify and extract information from graph images. The extracted information was stored in an ontology integrated with the search system, so the system can provide this information to users through the ontology. On this basis, I conclude that this research mitigates the semantic gap problem.

To achieve the objective of the system presented in Chapter 3, I introduced a new graph-type classification system that uses several independent techniques to prepare and classify the data, such as the DFT, the Hough transform, and the wavelet transform.

This system benefited and supported my search engine system. To retrieve specific results effectively, it was necessary to separate the graph types beforehand and extract the significant information for each type. The classifier supported three graph types: bar graphs, line and plot graphs, and pie charts. However, pie charts were not used in the search engine system. Overall, the accuracy of the proposed method reached 0.91, which indicates a high-performing system.

For the graph component extraction system presented in Chapter 4, I proposed a method to identify and extract graph components such as the X-title, the Y-title, and, optionally, the legend. Obtaining the X- and Y-titles was straightforward because they are usually located at the bottom and left sides of the graph. The legend, in contrast, may or may not appear in the graph. To detect it, I used DBSCAN to group the data based on density. DBSCAN requires the Epsilon and MinPts parameters to be set before clustering; my system could provide a suitable Epsilon automatically by analyzing the data positions. Moreover, after the graph components were retrieved, the system could identify which class the image outputs belong to. The classification accuracy reached 0.93, from which I conclude that this part of the research was successful.
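As a rough illustration of automatic Epsilon selection, the sketch below uses the widely known k-distance heuristic: sort each point's distance to its k-th nearest neighbor and take the value just before the largest jump. This is not the position-analysis method proposed in Chapter 4, whose details are not restated here; it is a substituted, simplified technique. The function names, the sample coordinates, and the choice of MinPts = 3 are all assumptions made for the example.

```python
import math

def k_distances(points, k):
    """Sorted distance from every point to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

def estimate_epsilon(points, min_pts):
    """Crude knee detection (an assumption, not the dissertation's method):
    return the k-distance just before the largest jump in the sorted curve."""
    kd = k_distances(points, min_pts - 1)
    gaps = [kd[i + 1] - kd[i] for i in range(len(kd) - 1)]
    knee = max(range(len(gaps)), key=gaps.__getitem__)
    return kd[knee]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster id per point, or -1 for noise.
    A point counts itself among its neighbors."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels

# Hypothetical text-box centers: two dense groups and one isolated box.
boxes = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (30, 30)]
eps = estimate_epsilon(boxes, min_pts=3)
print(eps, dbscan(boxes, eps, min_pts=3))
```

The largest-jump rule works only when noise points are clearly farther from their neighbors than clustered points are; real graph images would likely need a more robust knee detector.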

To achieve the goal of the system presented in Chapter 5, I designed an ontology and constructed a novel OCR-error correction system. After obtaining the graph components from the previous system, I used OCR to recognize the text and convert it to digital data. However, misrecognition may occur. The system handled this problem by using suggestions from the ontology. In the experiments, I compared the performance of ontology-based and edit distance-based OCR-error correction. The experiments yielded high accuracy and F-measure: 0.84 and 0.86, respectively. Moreover, I considered image noise, which can be a critical factor in reducing the performance of the system.

I used the proposed graph component extraction to obtain cleaned outputs. The results showed that the noise ratio decreased by around 0.19% compared with traditional image partitioning.
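For readers unfamiliar with the edit distance-based baseline mentioned above, the following minimal sketch replaces an OCR token with the closest term in a vocabulary when the Levenshtein distance is small. It illustrates the baseline idea only, not the ontology-based correction of Chapter 5. The vocabulary, the `max_distance` threshold, and the function names are assumptions made for the example.

```python
def levenshtein(a, b):
    """Levenshtein edit distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]

def correct(token, vocabulary, max_distance=2):
    """Return the closest vocabulary term, or the token unchanged when no
    term is within max_distance edits (case-insensitive comparison)."""
    best = min(vocabulary, key=lambda term: levenshtein(token.lower(), term.lower()))
    if levenshtein(token.lower(), best.lower()) <= max_distance:
        return best
    return token

# Hypothetical axis-title vocabulary and OCR outputs.
vocabulary = ["Accuracy", "Precision", "Recall", "Time"]
for token in ["Accuracv", "T1me", "xyzzy"]:
    print(token, "->", correct(token, vocabulary))
```

A purely string-based baseline like this cannot use meaning: it treats all vocabulary terms at the same distance as equally plausible, which is the kind of gap that ontology suggestions are meant to close.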

To fulfill the research target of Chapter 6, I had to extract the graph information located in the data section. I proposed a new graph information extraction method to examine the heights of bars, the trends of the data, and significant relationships. Moreover, I designed an ontology and a database that support both the OCR-error correction and search engine systems. To evaluate the system, I set up simulations based on questions that users might ask and observed the results obtained from each simulation.

I integrated all the implemented systems into one main system to extract all the information I needed from the graphs and to construct the ontology and database.

I programmed a web-based application for searching and querying through my ontology, which was built from all the extractable graph information. Ten participants helped me evaluate the systems. They could select specific questions and settings and enter keywords, and they judged each returned result as either relevant or irrelevant. I compared the performance of my ontology-based search engine system with that of the ES-based one. From the results, I concluded that my ontology-based search engine system performed better than the traditional one because it obtained a higher F-measure. Moreover, the results of a questionnaire supported this conclusion.
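To make the evaluation metric concrete, the sketch below computes precision, recall, and the F-measure from a set of returned results and a set of results judged relevant. It uses the standard definitions and is not the evaluation script used in this study. The graph identifiers and the function name are hypothetical.

```python
def f_measure(retrieved, relevant, beta=1.0):
    """Return (precision, recall, F-beta) for one query.

    retrieved: items returned by the search engine
    relevant:  items judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0, 0.0, 0.0  # avoids division by zero when nothing matches
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

# Hypothetical judgments: 4 graphs returned, 3 relevant overall, 2 in common.
p, r, f = f_measure(["g1", "g2", "g3", "g4"], ["g1", "g2", "g5"])
print(round(p, 3), round(r, 3), round(f, 3))
```

With beta = 1 the F-measure is the harmonic mean of precision and recall, so a system cannot score well by maximizing only one of the two.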

Regarding the limitations of the study, the system covered data from the computer science and biology domains only. However, it would also be applicable to other domains if the target data were expanded. Because the supported graph types were limited, the system could express information extracted only from bar graphs and 2D charts with a general graph structure.

In conclusion, I proposed systems to extract graphical and linguistic information from the graph image itself and from its descriptions. The performance measurements were strong, showing that the system can mitigate the semantic gap problem and achieve all of its objectives. The results showed that the ontology-based search engine system provides precise and concise graph information and outperforms traditional search engine systems. The major contribution is not only the new method for an ontology-based search engine system but also an ontology design that supports graph information and descriptions.

In document: 芝浦工業大学学術リポジトリ (pages 177-180)