JAIST Repository: A Study on Statistical Machine Learning Approaches to Automatic Text Summarization and Translation (テキスト自動要約翻訳の統計的機械学習アプローチに関する研究)

Figure 6.6: Examples of multiple sentence reduction output using statistical machine learning

Chapter 7 Machine Translation in Cross Language Text Summarization

This chapter introduces an example-based machine translation method that uses shallow linguistic information obtained from a chunking process. The proposed method can be applied to both translation and sentence reduction in a CLTS system. We call this adaptive translation in CLTS, or chunking-based example-based machine translation (CEBMT).

7.1 Introduction

As described in Chapter 2, the problem of cross-language text summarization is how to combine a translation engine with a mono-language summarization system. However, a translation engine is usually designed to translate whole sentences, not phrases, and it does not perform as well when the input is a list of separate phrases, whereas the output of a summarization system is often exactly such a list. Our main idea is therefore to develop a translation engine that can be applied both to whole sentences and to lists of separate phrases. We also investigate using the same method to reduce an input sentence without requiring any parser. For these reasons, we focus on the example-based machine translation method in CLTS.

For convenience, let us summarize the idea of example-based machine translation (EBMT). EBMT, originally proposed by Nagao [107], is one of the main approaches to corpus-based machine translation. Following Nagao's original proposal, several EBMT methods were presented [108], [109], [110], [111]. The excellent review of EBMT in [112] describes its main idea as follows. An input sentence in the source language is compared with the example translations in a bilingual parallel corpus to find the closest matching examples, which are then used to translate the input. After the closest matches for the source sentence are found, the translation is constructed from the corresponding parts of the target-language examples, using the structural equivalences and deviances in those matches.
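To make the retrieval step concrete, here is a minimal Python sketch that finds the closest example for an input sentence. The toy English-Turkish corpus, the function names, and the word-level similarity measure (the `difflib.SequenceMatcher` ratio) are our assumptions for illustration; EBMT systems in the literature use a variety of matching measures.

```python
from difflib import SequenceMatcher

# Toy bilingual corpus of (English source, Turkish target) examples (hypothetical).
EXAMPLES = [
    ("I will drink coffee", "kahve içeceğim"),
    ("I will drink tea", "çay içeceğim"),
    ("You came", "geldin"),
]

def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]: 2 * matched words / total words."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def closest_example(sentence: str, examples=EXAMPLES) -> tuple[str, str]:
    """Return the example pair whose source side is most similar to the input."""
    return max(examples, key=lambda pair: similarity(sentence, pair[0]))
```

For "I will drink milk", both drink examples score 0.75 and `max` returns the first one; a practical system would use a finer-grained measure or keep several candidates.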

One approach that has been applied successfully to translation from English to Turkish is the translation template learning (TTL) method [113], [23]. This algorithm uses the similarities and differences between two translation examples in a bilingual corpus to build template rules for translation. Its advantage is that it needs only morphological parsing of the source and target sentences, in both the learning phase and the translation phase of the machine translation system. In the learning phase, it generates template rules automatically in a simple way. In the translation phase, it achieves good translation results using template rules learned from a small bilingual corpus, as reported in [23].
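The following Python sketch illustrates the core of TTL in its simplest case: two examples that differ in exactly one contiguous span on each side. The shared parts become a template with a variable `X`, and the differing parts become atomic rules. This single-difference restriction and the function names are our simplifying assumptions; the full algorithm in [113], [23] handles multiple similarities and differences and works on morphologically analyzed tokens rather than surface words.

```python
def split_common(a: list[str], b: list[str]):
    """Split two token lists into (common prefix, a's difference, b's difference, common suffix)."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    j = 0
    # The common suffix must not overlap the common prefix.
    while j < n - i and a[len(a) - 1 - j] == b[len(b) - 1 - j]:
        j += 1
    return a[:i], a[i:len(a) - j], b[i:len(b) - j], a[len(a) - j:]

def learn_from_pair(ex1: tuple[str, str], ex2: tuple[str, str]) -> list[tuple[str, str]]:
    """Learn one template and two atomic rules from two (source, target) examples."""
    s_pre, s_d1, s_d2, s_suf = split_common(ex1[0].split(), ex2[0].split())
    t_pre, t_d1, t_d2, t_suf = split_common(ex1[1].split(), ex2[1].split())
    # This simplified learner needs a shared part on both sides...
    if not (s_pre or s_suf) or not (t_pre or t_suf):
        return []
    # ...and a non-empty difference in every sentence.
    if not (s_d1 and s_d2 and t_d1 and t_d2):
        return []
    template = (" ".join(s_pre + ["X"] + s_suf), " ".join(t_pre + ["X"] + t_suf))
    atomic = [(" ".join(s_d1), " ".join(t_d1)), (" ".join(s_d2), " ".join(t_d2))]
    return [template] + atomic
```

For the coffee/tea pair, this yields the template "I will drink X" ↔ "X içeceğim" together with the atomic rules coffee ↔ kahve and tea ↔ çay.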

In this chapter, we investigate using the translation template learning method for both translation and sentence reduction in a CLTS system. Intuitively, if long sentences are treated as the source language and their reduced versions as the target language, sentence reduction becomes equivalent to a translation problem.
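Under this view, a reduction rule is simply a template whose target side omits some of the source-side variables. The sketch below applies such a rule to a space-tokenized sentence. The `?X` variable notation, the backtracking matcher, and the example rule are hypothetical, chosen only to illustrate the idea.

```python
from typing import Optional

def is_var(tok: str) -> bool:
    """Hypothetical notation: tokens such as '?X' are template variables."""
    return tok.startswith("?") and len(tok) > 1

def match(pattern: list[str], tokens: list[str], binding: Optional[dict] = None) -> Optional[dict]:
    """Match pattern against tokens; each variable binds one or more tokens.
    Assumes each variable occurs at most once in the pattern."""
    binding = {} if binding is None else binding
    if not pattern:
        return binding if not tokens else None
    head, rest = pattern[0], pattern[1:]
    if is_var(head):
        # Try every possible span for the variable (backtracking search).
        for k in range(1, len(tokens) + 1):
            result = match(rest, tokens[k:], {**binding, head: tokens[:k]})
            if result is not None:
                return result
        return None
    if tokens and tokens[0] == head:
        return match(rest, tokens[1:], binding)
    return None

def apply_rule(rule: tuple[str, str], sentence: str) -> Optional[str]:
    """Rewrite a space-tokenized sentence with a (source, target) template rule."""
    source, target = rule
    binding = match(source.split(), sentence.split())
    if binding is None:
        return None
    out: list[str] = []
    for tok in target.split():
        out.extend(binding[tok] if is_var(tok) else [tok])
    return " ".join(out)

# Hypothetical reduction rule: drop a non-restrictive relative clause.
RULE = ("?X , who ?Y , ?Z", "?X ?Z")
```

Applied to "John , who lives in Tokyo , bought a car", the rule binds `?Y` to the relative clause and returns "John bought a car"; a sentence without the clause does not match and yields `None`.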

Although the translation template learning method is well suited to machine translation, it has the following drawbacks.

In the learning phase, because no linguistic knowledge is used, translation template learning produces a large number of template rules, and some of them lead to wrong translations. Linguistic information is clearly useful for translation; moreover, unreliable rules can degrade translation performance in terms of both accuracy and computation time. Incorporating linguistic knowledge into template rules is therefore a natural approach. The problem, however, is how to obtain this knowledge and how to incorporate it into translation template learning. A recent study showed that using named entity information improves translation results [114], but that work uses the information only to encode examples on the source-language side and has not been applied to the TTL method.

On the other hand, shallow parsing has been applied successfully to various natural language processing applications because of its accuracy and because it is easy to implement for languages other than English. Many natural language processing applications have used shallow parsing as a way to obtain linguistic information.

In this chapter, we propose a novel translation template learning method that uses shallow parsing to incorporate linguistic information into template rules.
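One way to picture how chunk information can constrain templates: each variable carries a chunk label from a shallow parser, so it can only match a chunk of that type. In the sketch below, the `X:NP` notation, the `*` wildcard, the one-variable-per-chunk restriction, and the toy chunked sentences are our assumptions; the chapter's own template representation is defined in Section 7.4.

```python
from typing import Optional

Chunk = tuple[str, str]  # (chunk label, chunk text), e.g. ("NP", "the report")

def match_chunks(template: list[str], chunks: list[Chunk]) -> Optional[dict[str, Chunk]]:
    """Match a chunk-level template item by item (one item per chunk).
    'X:NP' is a variable restricted to an NP chunk, 'X:*' accepts any chunk,
    and any other item must equal the chunk text exactly."""
    if len(template) != len(chunks):
        return None
    binding: dict[str, Chunk] = {}
    for item, (label, text) in zip(template, chunks):
        if ":" in item:  # variable item (literals are assumed to contain no ':')
            var, wanted = item.split(":", 1)
            if wanted != "*" and wanted != label:
                return None
            binding[var] = (label, text)
        elif item != text:
            return None
    return binding

# Toy chunker output (hypothetical).
sentence = [("NP", "the committee"), ("VP", "approved"), ("NP", "the report")]
spurious = [("NP", "the committee"), ("VP", "approved"), ("ADVP", "quickly")]
typed = ["X:NP", "approved", "Y:NP"]
untyped = ["X:*", "approved", "Y:*"]
```

The typed template rejects the spurious input whose last chunk is an ADVP, while the untyped template accepts it; this is the kind of unreliable match that chunk labels are meant to filter out.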

In the translation phase, the advantage of this method is that it needs no complex parsing, such as syntactic or semantic parsing, and it overcomes some of the imperfections of rule-based machine translation. Its disadvantage is that many templates can match a given input sentence, and some of them produce unreliable translations. To address this problem, Öz [115] presents a method that sorts template rules according to their confidence factors, so that translation results are ranked by scores derived from those factors. However, this method must evaluate all matching rules for each input sentence to obtain the output, even though many of them are redundant. When an input sentence is long and the number of template rules is large, the computation grows exponentially. We therefore present a novel method based on a hidden Markov model (HMM) that places constraints on the set of rules matching each input sentence. The translation of an input sentence is then obtained by finding the set of template rules that is most likely under our HMM.
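The gain from an HMM-style search can be illustrated with generic Viterbi decoding. The sketch assumes that the input is split into segments, that each segment has a set of candidate rules, and that start, transition, and emission probabilities are given; all rule names and numbers are hypothetical, and the chapter's actual HMM formulation (Section 7.5) may differ. With k candidate rules per segment and n segments, exhaustive evaluation scores k^n rule sequences, whereas Viterbi needs only O(nk²) steps.

```python
import math

def log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(candidates, start, trans, emit):
    """Return (best rule sequence, its log-probability).

    candidates[t] -- rule ids that match input segment t (hypothetical)
    start[r]      -- P(first rule is r)
    trans[(a, b)] -- P(rule b follows rule a); missing pairs count as 0
    emit[t][r]    -- P(input segment t | rule r)
    """
    # best[r] = (log-prob of the best partial path ending in r, that path)
    best = {r: (log(start.get(r, 0.0)) + log(emit[0][r]), [r]) for r in candidates[0]}
    for t in range(1, len(candidates)):
        new_best = {}
        for r in candidates[t]:
            # Keep only the best predecessor of r; this is what avoids
            # enumerating every combination of matching rules.
            score, path = max(
                (s + log(trans.get((prev, r), 0.0)), p) for prev, (s, p) in best.items()
            )
            new_best[r] = (score + log(emit[t][r]), path + [r])
        best = new_best
    score, path = max(best.values())
    return path, score

# Toy example with two segments and two candidate rules per segment (hypothetical).
candidates = [["R1", "R2"], ["R3", "R4"]]
start = {"R1": 0.6, "R2": 0.4}
emit = [{"R1": 0.5, "R2": 0.9}, {"R3": 0.7, "R4": 0.2}]
trans = {("R1", "R3"): 0.8, ("R1", "R4"): 0.2, ("R2", "R3"): 0.1, ("R2", "R4"): 0.9}
path, score = viterbi(candidates, start, trans, emit)
```

With these toy numbers, a greedy choice at the first segment would pick R2 (0.4 × 0.9 = 0.36 versus 0.6 × 0.5 = 0.30), but the most likely complete sequence is R1 followed by R3, with probability 0.168.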

The rest of this chapter is organized as follows. Section 7.2 introduces the architecture of our chunking-based example-based machine translation system. Section 7.3 presents the learning phase of this architecture by describing the shallow translation template learning algorithm. Section 7.4 describes translation template learning using shallow parsing. Section 7.5 presents the translation phase, describing template translation, shallow template translation, and their combination with the HMM. Section 7.6 introduces the application of CEBMT to sentence reduction. Section 7.7 gives experimental results, and Section 7.8 presents our conclusions.
