Japan Advanced Institute of Science and Technology

JAIST Repository

https://dspace.jaist.ac.jp/

Title Improving Phrase-based Machine Translation using Splitting Clause and Phrase Reordering

Author(s) Nguyen, Vinh Van

Citation

Issue Date 2008-03-04

Type Conference Paper

Text version publisher

URL http://hdl.handle.net/10119/8227

Rights

Description

JAIST 21st Century COE Symposium 2008 "Verifiable and Evolvable e-Society", held March 3-4, 2008, at the Japan Advanced Institute of Science and Technology; GRP researcher presentations, presentation material for Session A-2

Improving Phrase-based Machine Translation using Splitting Clause and Phrase Reordering

Name: Nguyen Vinh Van

Supervisor: Prof. Akira Shimazu

March 4, 2008

1 Aim of Research

Phrase-based statistical machine translation (PSMT) systems currently represent the state of the art in statistical machine translation. However, phrase-based models have some limitations. First, they handle word reordering well over short distances, but long-distance reordering remains problematic. Second, they do not capture syntactic transformations in the source or target language. Our research therefore aims to exploit linguistic knowledge and supply it to a PSMT system.

2 Proposed approach

First, we consider clause splitting in more detail. Very long and complicated sentences are hard and costly to translate. Splitting such sentences into a set of smaller clauses could bring many benefits for translation.
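As a rough illustration of what splitting a sentence into smaller clauses means at the data level, the following minimal sketch cuts a token list at positions marked as clause starts. It assumes the markers come from some upstream clause splitter and produces only flat, non-nested clauses; the function name `split_clauses` and the boolean encoding are hypothetical and are not taken from the original work.

```python
from typing import List


def split_clauses(tokens: List[str], starts: List[bool]) -> List[List[str]]:
    """Split tokens into flat clauses.

    starts[i] is True when token i opens a new clause (assumed to be
    predicted by an upstream clause splitter; hypothetical encoding).
    """
    if len(tokens) != len(starts):
        raise ValueError("tokens and starts must have the same length")
    clauses: List[List[str]] = []
    for token, is_start in zip(tokens, starts):
        # Open a new clause at each marker, and also for the very first token.
        if is_start or not clauses:
            clauses.append([])
        clauses[-1].append(token)
    return clauses


tokens = "I stayed home because it was raining".split()
starts = [True, False, False, True, False, False, False]
clauses = split_clauses(tokens, starts)
for clause in clauses:
    print(" ".join(clause))
```

Each resulting clause could then be passed to the translation system separately, which is where the expected savings in decoding cost would come from.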

Second, the reordering problem (global reordering) is one of the major problems in machine translation, since different languages have different word order requirements. We focus on the reordering problem, aiming to improve both translation quality and decoding time. Our approach is a global reordering model.
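To make concrete why long-distance reordering is hard for a baseline phrase-based decoder, the sketch below computes the usual distance-based distortion cost, which sums |start_i - end_{i-1} - 1| over the source phrases in the order they are translated. It illustrates the baseline penalty only, not the global reordering model proposed here; the function name and the 0-indexed span encoding are assumptions made for this example.

```python
from typing import List, Tuple


def distortion_cost(spans: List[Tuple[int, int]]) -> int:
    """Distance-based distortion cost of a phrase translation order.

    Each span is (start, end): inclusive, 0-indexed source word positions.
    Spans are listed in the order in which they are translated.
    """
    cost = 0
    prev_end = -1  # position just before the first source word
    for start, end in spans:
        # A monotone step (start == prev_end + 1) costs nothing;
        # any jump forward or backward costs its length in words.
        cost += abs(start - prev_end - 1)
        prev_end = end
    return cost


monotone = [(0, 1), (2, 2), (3, 5)]   # translate left to right
long_jump = [(3, 5), (0, 1), (2, 2)]  # translate the last phrase first
print(distortion_cost(monotone), distortion_cost(long_jump))
```

Because the cost grows with the size of each jump, a decoder that relies on this penalty tends to prefer nearly monotone translations, which is one reason long-distance reordering needs a dedicated model.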

3 Progress of 2007

For the first problem, we present a framework based on conditional random fields (CRFs) for clause splitting. We use rich linguistic knowledge and a new bottom-up dynamic algorithm for decoding. The experiments show that our results are competitive with previous results. The results are presented in [1][2].

For the second problem, we present a new method for reordering in phrase-based statistical machine translation. Experimental results on the English-Vietnamese language pair show that our method outperforms the baseline PSMT system in both accuracy and speed.

4 Future Direction

We will investigate modifications of appropriate learning algorithms for the first and second problems. The implementations and experiments will be extended to English-Japanese and English-French. In addition, we will integrate clause splitting into the machine translation system.

5 Publication

Journal paper

[1] Nguyen, V.V., Nguyen, L.M., Shimazu, A. "Clause Splitting with Conditional Random Fields", to be submitted.

Conference paper

[2] Nguyen, V.V., Nguyen, L.M., Shimazu, A. "Using Conditional Random Fields for Clause Splitting", in Proceedings of Pacling-07, pp. 58-65.
