
Is machine translation (MT) your friend or enemy? Challenges for English->Japanese translators


Academic year: 2022


(1)

Challenges for English into Japanese Machine Translation (MT): Can We Embrace MT for Language Teaching?

Noon Lecture

University of Michigan, February 26, 2016

Takako Aikawa

Massachusetts Institute of Technology

(2)

Outline

1. A brief introduction to my background

2. A brief introduction to Machine Translation (MT)
o Historical background of the development of MT
o How does (statistical) MT work?

3. Challenges for English -> Japanese (EJ) MT
o Open problems for MT in general
o Differences between English and Japanese

4. What can we do to improve the quality of MT for EJ translations?
o Crowdsourcing
o Controlled/Simplified English

5. Can MT be used for teaching Japanese?
o Ban or embrace MT?

6. Surprise!

7. Q & A

(3)

What is statistical machine translation (SMT)?

o Historical Development of MT

o How does SMT work?

(4)

A brief history of MT

http://www.smartling.com/blog/2012/04/20/a-brief-history-of-machine-translation/

(5)

• Claims were made that the computer would replace most human translators

• 1 to 2 million words per hour with adequate commercial computers

• When would you achieve this?

• If the experiments go well, then perhaps within 5 years or so….

(6)
(7)

Moving to Statistical MT systems from hand-coded/rule-based MT systems

Why Statistical? What’s wrong with human rules?

(8)

Scalability

Rule-based MT: You need rich linguistic resources and computational linguists for each language pair.

=> Too expensive and not scalable.

Statistical MT: All you need is bilingual parallel corpus data.

=> Resource-light, and therefore scalable.
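To make "all you need is bilingual parallel corpus data" concrete, here is a minimal sketch of IBM Model 1, the simplest of the classic statistical word-alignment models, trained with expectation-maximization (EM). It learns word translation probabilities t(f|e) from sentence pairs alone, with no dictionary and no hand-written rules. Assumptions: the three-sentence corpus is hypothetical, the Japanese side is already word-segmented, and the NULL word of the full model is left out for brevity.

```python
from collections import defaultdict
from itertools import product


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(f|e) with EM, IBM Model 1 style (no NULL word).

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict mapping (f, e) -> t(f|e).
    """
    src_vocab = {e for src, _ in pairs for e in src}
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    # Start uniform: every target word is equally likely for every source word.
    t = {(f, e): 1.0 / len(tgt_vocab) for f, e in product(tgt_vocab, src_vocab)}
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of f-e links
        total = defaultdict(float)  # expected number of links from e
        for src, tgt in pairs:
            for f in tgt:
                # E-step: share f's alignment among the source words in proportion to t(f|e).
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: turn the expected counts back into probabilities.
        t = {(f, e): count[(f, e)] / total[e] for f, e in t}
    return t


def best_translation(t, e):
    """Return the target word f with the highest t(f|e)."""
    return max((f for f, e2 in t if e2 == e), key=lambda f: t[(f, e)])


# Hypothetical toy corpus: no sentence pair says which word translates which.
toy_corpus = [
    (["Taro", "ate"], ["太郎", "食べた"]),
    (["Hanako", "ate"], ["花子", "食べた"]),
    (["Taro", "read"], ["太郎", "読んだ"]),
]
t = train_ibm_model1(toy_corpus)
for word in ["Taro", "Hanako", "ate", "read"]:
    print(word, "->", best_translation(t, word))
```

The correspondences emerge only from co-occurrence across sentence pairs: "ate" shares 食べた with two sentences, which in turn pushes 花子 toward "Hanako". Adding a new language pair means collecting new pairs, not writing new rules.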

Scalability and Sustainability

(9)

[Figure: sustainability of rule-based MT vs. SMT]

(10)

How does a statistical machine translation system work?

http://research.google.com/pubs/MachineTranslation.html
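The slide points to Google's research page rather than spelling out the model. As a hedged illustration, classic SMT is often described with the noisy-channel decision rule: choose the target sentence T that maximizes P(T) · P(S | T), where a language model P(T) rewards fluent target text and a translation model P(S | T) rewards faithful word correspondences. The sketch below scores a few hand-written Japanese candidates with hypothetical probability tables. It is a toy, not how Google Translate or Bing Translator is actually built, and it sidesteps the search (decoding) problem by enumerating candidates; the FLOOR value is a crude stand-in for real smoothing.

```python
import math

FLOOR = 1e-6  # probability given to anything missing from the toy tables


def lm_logprob(tokens, bigram_p):
    """log P(T) under a bigram language model with sentence-boundary markers."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(math.log(bigram_p.get(pair, FLOOR)) for pair in zip(padded, padded[1:]))


def tm_logprob(source, target, t_table):
    """log P(S | T), IBM-Model-1 style: each source word may come from any target word."""
    return sum(
        math.log(sum(t_table.get((s, w), FLOOR) for w in target) / len(target))
        for s in source
    )


def choose(source, candidates, t_table, bigram_p):
    """Noisy-channel rule: argmax over T of log P(T) + log P(S | T)."""
    return max(candidates, key=lambda c: lm_logprob(c, bigram_p) + tm_logprob(source, c, t_table))


# Hypothetical tables. t_table[(english, japanese)] = P(english word | japanese word).
t_table = {("Taro", "太郎"): 0.9, ("read", "読んだ"): 0.8, ("books", "本"): 0.7, ("ate", "食べた"): 0.8}
bigram_p = {("<s>", "太郎"): 0.5, ("太郎", "は"): 0.6, ("は", "本"): 0.2, ("本", "を"): 0.5,
            ("を", "読んだ"): 0.4, ("を", "食べた"): 0.3, ("読んだ", "</s>"): 0.5, ("食べた", "</s>"): 0.5}

source = ["Taro", "read", "books"]
verb_final = ["太郎", "は", "本", "を", "読んだ"]
english_order = ["太郎", "は", "読んだ", "本", "を"]
wrong_verb = ["太郎", "は", "本", "を", "食べた"]
print(choose(source, [english_order, wrong_verb, verb_final], t_table, bigram_p))
```

The two word orders receive the same translation-model score, because this model ignores word positions, so it is the language model that prefers the verb-final order; the translation model is what rules out 食べた.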

(11)

Practical Use of MT (phone applications)

• http://translate.google.com/about/intl/en_ALL/

(12)

o Open problems for MT in general

o English-Japanese specific issues, while presenting linguistic differences between the two languages

(13)

Open Problems for MT

(14)

Philipp Koehn

(15)

Lexical Ambiguity

Word ambiguity: many words have different senses depending on the context

(e.g.)

Deposited money in my bank account.

Sitting on the bank of the Charles river.

Reading an interesting book.

Book your hotel as soon as possible.
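One simple way to see what disambiguation involves is a cue-word overlap heuristic in the spirit of the simplified Lesk algorithm: each candidate Japanese translation carries a few context words, and the candidate whose cues overlap most with the sentence wins. The sense inventory below is hypothetical and tiny; an SMT system gets much of this effect implicitly from its phrase table and language model instead.

```python
# Hypothetical sense inventory: English word -> {Japanese translation: context cue words}.
SENSES = {
    "bank": {"銀行": {"money", "account", "deposited", "loan"},
             "岸": {"river", "sitting", "water", "shore"}},
    "book": {"本": {"reading", "read", "interesting", "library"},
             "予約する": {"hotel", "flight", "ticket", "table"}},
}


def translate_ambiguous(word, sentence):
    """Pick the translation whose cue words overlap most with the sentence's words."""
    context = {w.strip(".,!?").lower() for w in sentence.split()}
    candidates = SENSES[word]
    # Ties go to the first candidate listed, an arbitrary choice for this sketch.
    return max(candidates, key=lambda ja: len(candidates[ja] & context))


for word, sentence in [("bank", "Deposited money in my bank account."),
                       ("bank", "Sitting on the bank of the Charles river."),
                       ("book", "Reading an interesting book."),
                       ("book", "Book your hotel as soon as possible.")]:
    print(word, "->", translate_ambiguous(word, sentence))
```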

(16)

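VP-Attachment vs. NP-Attachment

John watched a bird [PP with binoculars].

John watched a bird [PP with feathers].

Each sentence allows two readings: VP-attachment (the PP modifies "watched") and NP-attachment (the PP modifies "a bird"). The preferred reading is VP-attachment for "binoculars" and NP-attachment for "feathers", but only world knowledge tells them apart.

Syntactic Ambiguity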
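For English -> Japanese MT the attachment decision has visible consequences: a VP-attached PP becomes a modifier of the verb (双眼鏡で鳥を見た), while an NP-attached PP becomes a modifier placed before the noun (羽のある鳥を見た). One family of data-driven fixes compares how often the preposition and its object co-occur with the verb versus with the noun. The sketch below does this with made-up counts; the numbers and the tie-breaking rule are assumptions, not results from any corpus.

```python
# Made-up counts of (head, preposition, object of preposition) attachments in a
# hypothetical parsed corpus; a real system would estimate such counts from parsed text.
VERB_ATTACH = {("watched", "with", "binoculars"): 40, ("watched", "with", "feathers"): 1}
NOUN_ATTACH = {("bird", "with", "binoculars"): 2, ("bird", "with", "feathers"): 30}


def attach(verb, noun, prep, pobj):
    """Return 'VP' if the PP more often attaches to the verb, else 'NP' (ties -> 'NP')."""
    v = VERB_ATTACH.get((verb, prep, pobj), 0)
    n = NOUN_ATTACH.get((noun, prep, pobj), 0)
    return "VP" if v > n else "NP"


print(attach("watched", "bird", "with", "binoculars"))
print(attach("watched", "bird", "with", "feathers"))
```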

(17)

It’s raining cats and dogs.

どしゃ降りです。

How are you?

お元気ですか。

Idioms
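Idioms show why phrase-based SMT translates multiword units rather than single words: if the longest matching entry in the phrase table is tried first, an idiom is translated as a whole before its individual words get a chance. The sketch below is a greedy longest-match lookup over a hypothetical four-entry phrase table; a real decoder scores many competing segmentations instead of committing greedily.

```python
# Hypothetical phrase table: multiword entries coexist with single-word entries.
PHRASE_TABLE = {
    ("it's", "raining", "cats", "and", "dogs"): "どしゃ降りです",
    ("how", "are", "you"): "お元気ですか",
    ("cats",): "猫",
    ("dogs",): "犬",
}


def translate_phrases(sentence, table=PHRASE_TABLE):
    """Greedy left-to-right lookup, trying the longest phrase first at each position."""
    # Normalise curly apostrophes and strip trailing punctuation before lookup.
    tokens = [w.replace("\u2019", "'").strip(".?!").lower() for w in sentence.split()]
    max_len = max(len(key) for key in table)
    output, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            chunk = tuple(tokens[i:i + n])
            if chunk in table:
                output.append(table[chunk])
                i += n
                break
        else:  # no entry matched: keep the word, marked as untranslated
            output.append(f"<{tokens[i]}>")
            i += 1
    return output


print(translate_phrases("It’s raining cats and dogs."))
print(translate_phrases("cats and dogs"))
```

Without the five-word entry, the same lookup would fall back to word-by-word output such as 猫 and 犬, which is exactly the literal mistranslation the slide warns about.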

Pronoun Resolution

Gender and number agreement for pronouns in many languages

o MT has to know the antecedent of a pronoun for agreement purposes

o MT has to know the gender and the number of that antecedent
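The two requirements above can be sketched as a recency-plus-agreement filter: walk back through earlier mentions and take the first one whose gender and number match the pronoun. This is a deliberately naive baseline that assumes mentions have already been found and tagged with gender and number by some hypothetical upstream step; real resolvers also use syntax, semantics, and discourse. It matters for translation because, for example, English "it" must become "il" or "elle" in French depending on the grammatical gender of the French antecedent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    gender: str  # "m", "f", or "n" (neuter)
    number: str  # "sg" or "pl"


def resolve(gender: Optional[str], number: str, preceding: List[Mention]) -> Optional[Mention]:
    """Most recent preceding mention agreeing with the pronoun; gender=None matches any."""
    for mention in reversed(preceding):
        if (gender is None or mention.gender == gender) and mention.number == number:
            return mention
    return None


# "Mary gave John the books. He read them."
mentions = [Mention("Mary", "f", "sg"), Mention("John", "m", "sg"), Mention("the books", "n", "pl")]
print("he ->", resolve("m", "sg", mentions).text)
# English plural pronouns carry no gender, so "them" is resolved on number alone.
print("them ->", resolve(None, "pl", mentions).text)
```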

Human languages are ambiguous and still very difficult for machines to handle for practical purposes!

(18)

Why is the quality of English -> Japanese MT so bad?

(19)

Differences between English and Japanese

• SVO vs. SOV: Word Order

• Case-markers

• Pronouns

• Counters (Japanese)

(20)

SVO (head-initial) vs. SOV (head-final) Word Order

Causes problems for word/phrase alignments

Taro read the book that he bought yesterday.

太郎は、きのう買った本を読んだ。

I think ……… => ……… と思う。
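The head-initial vs. head-final contrast can be pictured as one dependency structure linearized two ways: English places complements (objects, relative clauses, post-verbal adverbials) after their head, while Japanese places all dependents before the head. The sketch below uses a hypothetical Phrase structure with English-style pre- and post-head dependents and re-linearizes it head-finally. The output is English words in roughly the order of 太郎は、きのう買った本を読んだ (which additionally drops "he"), not a translation; particles and word choice are ignored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phrase:
    head: str
    pre: List["Phrase"] = field(default_factory=list)   # before the head in English (subjects)
    post: List["Phrase"] = field(default_factory=list)  # after the head in English (objects, relative clauses)


def linearize(p: Phrase, head_final: bool) -> List[str]:
    """Flatten the tree; head-final order moves English post-head dependents before the head."""
    pre = [w for d in p.pre for w in linearize(d, head_final)]
    post = [w for d in p.post for w in linearize(d, head_final)]
    return pre + post + [p.head] if head_final else pre + [p.head] + post


# "Taro read the book (that) he bought yesterday."
relative = Phrase("bought", pre=[Phrase("he")], post=[Phrase("yesterday")])
sentence = Phrase("read", pre=[Phrase("Taro")], post=[Phrase("the book", post=[relative])])
print(" ".join(linearize(sentence, head_final=False)))
print(" ".join(linearize(sentence, head_final=True)))

# "I think ..." -> "... と思う": the embedded clause moves in front of "think".
print(" ".join(linearize(Phrase("think", pre=[Phrase("I")], post=[sentence]), head_final=True)))
```

Every English complement crosses over its head, so alignment links between the two sentences cross repeatedly; that is the alignment problem the slide refers to.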

(21)

Case markers => Free word order

Taro ate the apple.

太郎がそのリンゴを食べた。

Taro ate the apple.

そのリンゴを太郎が食べた。

Taro gave Hanako the apple.

太郎が花子にそのリンゴをあげた。

Taro gave Hanako the apple.

そのリンゴを太郎が花子にあげた。
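Because the case markers, not the positions, carry the grammatical roles, the roles can be read off the particles whatever the order. The sketch below assumes the sentence is already split into noun+particle chunks with the verb last, and it maps only が, を, and に. In real text に also marks time and place (午後3時に), and は or も can stand in for が or を, so this is a simplification.

```python
# Simplified particle -> role table (hypothetical subset: は, も, and the
# time/place uses of に are not handled).
PARTICLE_ROLE = {"が": "subject", "を": "object", "に": "indirect object"}


def roles(chunks):
    """chunks: noun+particle strings in any order, with the verb as the last chunk."""
    result = {"verb": chunks[-1]}
    for chunk in chunks[:-1]:
        noun, particle = chunk[:-1], chunk[-1]
        result[PARTICLE_ROLE[particle]] = noun
    return result


a = roles(["太郎が", "花子に", "そのリンゴを", "あげた"])
b = roles(["そのリンゴを", "太郎が", "花子に", "あげた"])
print(a)
print(a == b)
```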

(22)

Postpositions

Taro ate the apple at school/at 3pm.

太郎がそのリンゴを学校で/午後3時に食べた。

Taro will come by train/by noon.

太郎は電車で/正午までに来ます。

Taro ate the pie with his friends/with a knife.

太郎が友達と/ナイフでそのパイを食べた。
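The examples show that one English preposition maps to different Japanese postpositions depending on what its object denotes ("at school" -> で but "at 3pm" -> に). A minimal sketch of that dependency keys the choice on the preposition plus a semantic class of the noun. The class labels and the small lexicons are hypothetical, built only from the examples above; a real system would have to infer such classes, which is exactly where MT tends to go wrong.

```python
# Hypothetical (English preposition, semantic class of its object) -> postposition table.
POSTPOSITION = {
    ("at", "place"): "で", ("at", "time"): "に",
    ("by", "vehicle"): "で", ("by", "time"): "までに",
    ("with", "person"): "と", ("with", "tool"): "で",
}
NOUN_CLASS = {"school": "place", "3pm": "time", "train": "vehicle",
              "noon": "time", "friends": "person", "knife": "tool"}
JA_NOUN = {"school": "学校", "3pm": "午後3時", "train": "電車",
           "noon": "正午", "friends": "友達", "knife": "ナイフ"}


def translate_pp(prep, noun):
    """Translate an English PP into Japanese noun + postposition (postposition follows the noun)."""
    return JA_NOUN[noun] + POSTPOSITION[(prep, NOUN_CLASS[noun])]


for prep, noun in [("at", "school"), ("at", "3pm"), ("by", "train"),
                   ("by", "noon"), ("with", "friends"), ("with", "knife")]:
    print(f"{prep} {noun} -> {translate_pp(prep, noun)}")
```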

(23)

Pronouns

English: Overt Pronouns

Japanese: Empty Pronouns & the Principle of Avoid Pronouns

Pronoun resolution requires the understanding of a given context!

But MT still operates at the sentence level….

The presence of empty pronouns in Japanese causes many challenges for Japanese -> English translation, as MT has to supply an appropriate pronoun based on the context.
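A very rough way to supply the missing pronoun is to carry the most recent explicit subject forward and choose a pronoun for it, falling back to "I" when there is no earlier subject (common in e-mails and conversation, but only a heuristic). The sketch assumes each clause arrives as an already-translated (subject, English predicate) pair and uses a hypothetical gender lookup. It also shows why sentence-by-sentence MT struggles: the second clause cannot be completed without the first.

```python
PRONOUN_FOR = {"Taro": "he", "Hanako": "she"}  # hypothetical gender lookup


def fill_zero_subjects(clauses, default="I"):
    """clauses: (subject or None, English predicate) pairs; None marks an empty subject."""
    sentences, last_subject = [], None
    for subject, predicate in clauses:
        if subject is None:
            # Missing or unknown antecedents fall back to the default pronoun.
            words = f"{PRONOUN_FOR.get(last_subject, default)} {predicate}"
        else:
            last_subject = subject
            words = f"{subject} {predicate}"
        sentences.append(words[0].upper() + words[1:] + ".")
    return sentences


# 太郎はきのう本を買った。今日読んだ。 (the second sentence has no overt subject)
print(fill_zero_subjects([("Taro", "bought a book yesterday"), (None, "read it today")]))
print(fill_zero_subjects([(None, "will come at 2:20pm")]))
```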

(24)

Which direction would be harder, English -> Japanese or Japanese -> English?

Japanese is more ambiguous than English in many respects.

Translating a more ambiguous language into a less ambiguous one is much harder.

(25)

Japanese Counters

Japanese counters (e.g., 本、人、冊, etc.)

Let’s translate the following via Google Translate and Bing Translator.

I saw three dogs and two cats at his house.

I ate 3 cakes yesterday.

I read 4 books at the library today.

I have three students who are working for my project.
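Choosing a counter is a lexical lookup keyed on the counted noun, information that an English source sentence never marks. A minimal sketch with a hypothetical six-noun lexicon: 匹 for small animals, 個 for small objects, 冊 for bound volumes, 人 for people, and 本 for long, thin objects (which is why four books is 4冊, not 4本, even though 本 also means "book"). It assumes the English noun has already been singularized and ignores sound changes in the readings (for example, 3匹 is read sanbiki).

```python
# Hypothetical lexicon: English noun -> (Japanese noun, counter).
COUNTER_LEXICON = {
    "dog": ("犬", "匹"), "cat": ("猫", "匹"),
    "cake": ("ケーキ", "個"), "book": ("本", "冊"),
    "student": ("学生", "人"), "pen": ("ペン", "本"),
}


def count_phrase(number, noun):
    """Render number + counter + の + noun, e.g. 3匹の犬 for 'three dogs'."""
    ja_noun, counter = COUNTER_LEXICON[noun]
    return f"{number}{counter}の{ja_noun}"


for n, noun in [(3, "dog"), (2, "cat"), (3, "cake"), (4, "book"), (3, "student"), (2, "pen")]:
    print(count_phrase(n, noun))
```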

(26)

Google Translate (Oct. 10, 2014)

Google Translate (Oct. 29, 2014)

(27)

Bing Translator (Oct. 10, 2014)

Bing Translator (Oct. 29, 2014)

(28)

Other fundamental issues

• Word-breaking

o English: uses white space to indicate word boundaries

o Japanese: needs a word breaker for any NLP/MT-related task

• Proper Nouns

o How to read proper nouns

(e.g.) 高山 (たかやま/こうざん) => Takayama?/Koozan?

金山 (かなやま/きんざん) => Kanayama?/Kinzan?

大山 (おおやま/だいさん) => Ooyama?/Daisan?

国府 (こくふ/こくぶ/こう) => Kokufu?/Kokubu?/Kou?

etc.
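To see why Japanese needs a word breaker, here is the simplest classic approach, greedy maximum matching: at each position, take the longest dictionary word that fits. The seven-word dictionary is hypothetical. Production systems use statistical morphological analyzers (MeCab is a widely used one) that also return readings, and even those can only guess between readings such as たかやま and こうざん for 高山 without further context.

```python
# Hypothetical dictionary; a real word breaker has hundreds of thousands of entries.
DICTIONARY = {"太郎", "は", "きのう", "買った", "本", "を", "読んだ"}


def max_match(text, dictionary=DICTIONARY):
    """Greedy longest-match segmentation; unknown characters become one-character words."""
    max_len = max(map(len, dictionary))
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in dictionary:
                words.append(text[i:i + n])
                i += n
                break
        else:  # nothing in the dictionary starts here
            words.append(text[i])
            i += 1
    return words


print(max_match("太郎はきのう買った本を読んだ"))
```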

(29)

Why is the quality of English -> Japanese MT so bad?

Japanese and English are so far apart……

(30)

Differences in syntactic structure between languages matter

English -> Japanese

Taro went to Chicago with his friend.

太郎が友達とシカゴに行った。

English -> Spanish

Taro went to Chicago with his friend.

Taro fue a Chicago con su amigo.

Japanese -> Korean

太郎 が 友達 と シカゴ に 行った。

타로 가 친구 와 시카고 에 갔다.
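One rough way to quantify how far apart two languages are is to align the sentences chunk by chunk and count crossing alignment links, i.e. pairs of chunks whose relative order is swapped. The chunking and alignments below are hand-made for the slide's three examples and are assumptions for illustration only.

```python
from itertools import combinations


def crossings(order):
    """Count pairs of aligned chunks whose relative order is swapped (inversions)."""
    return sum(1 for a, b in combinations(order, 2) if a > b)


# English chunks: 0 Taro | 1 went | 2 to Chicago | 3 with his friend
ja = [0, 3, 2, 1]  # 太郎が | 友達と | シカゴに | 行った
es = [0, 1, 2, 3]  # Taro | fue | a Chicago | con su amigo
# Japanese chunks: 0 太郎が | 1 友達と | 2 シカゴに | 3 行った
ko = [0, 1, 2, 3]  # 타로가 | 친구와 | 시카고에 | 갔다
print("EN->JA:", crossings(ja), "EN->ES:", crossings(es), "JA->KO:", crossings(ko))
```

With four chunks there are at most six possible crossings. English -> Japanese produces three here, while English -> Spanish and Japanese -> Korean are monotone, which is why word-for-word alignment works much better for the last two pairs.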

(31)

What can we do to improve the quality of English-Japanese MT?

Crowdsourcing (crowdsourcing of training data)

Controlled/Simplified English

(32)

“crowdsourcing”

( クラウドソーシング )

• http://en.wikipedia.org/wiki/Crowdsourcing

(33)

Turk Workers via Amazon

(34)

Training Data

The more training data, the better the quality of an SMT system.

The cleaner the training data, the better the quality of an SMT system.

Companies like Google and Microsoft are constantly looking for new data!
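On the "cleaner" side, crowdsourced or crawled sentence pairs are usually filtered before training. A minimal sketch of three common heuristics: drop empty pairs, exact duplicates, and pairs whose lengths are wildly mismatched. The 3.0 character-length ratio is an assumption; English and Japanese differ systematically in characters per sentence, so a real threshold would need tuning on actual data.

```python
def clean_parallel(pairs, max_ratio=3.0):
    """Drop empty, duplicate, and badly length-mismatched (source, target) pairs."""
    seen, kept = set(), []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side is missing
        if (src, tgt) in seen:
            continue  # exact duplicate
        if max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_ratio:
            continue  # probably misaligned
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


raw = [
    ("Taro ate the apple.", "太郎がそのリンゴを食べた。"),
    ("Taro ate the apple.", "太郎がそのリンゴを食べた。"),
    ("", "花子"),
    ("How are you?", "お元気ですか。"),
    ("Thank you.", "ありがとう。" * 10),
]
print(clean_parallel(raw))
```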

(35)

Google Translate

(36)

Bing Translator

(37)

Duolingo

(38)

Controlled English/Simplified English

(39)

Boeing Simplified Technical English
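Controlled languages are usually enforced with a checker that flags rule violations before a text is sent to translation. The sketch below implements two rules of the kind such standards contain: a sentence-length limit and a list of vague verbs to avoid. Both the 20-word limit and the word list are assumptions for illustration (the verbs echo the simple-verb problems in the Google Translate examples later in this talk), not the actual rules of Boeing Simplified Technical English.

```python
import re

MAX_WORDS = 20  # hypothetical sentence-length limit
VAGUE_VERBS = {"work", "works", "have", "has", "had", "do", "does", "get", "gets"}  # hypothetical list


def check(sentence):
    """Return a list of rule violations for one English sentence (empty list = passes)."""
    words = re.findall(r"[A-Za-z']+", sentence)
    issues = []
    if len(words) > MAX_WORDS:
        issues.append(f"too long ({len(words)} words > {MAX_WORDS})")
    vague = sorted({w.lower() for w in words} & VAGUE_VERBS)
    if vague:
        issues.append("vague verb(s): " + ", ".join(vague))
    return issues


for s in ["2:10pm works for me.", "2:10pm is fine with me.", "I had a long lunch meeting at my lab."]:
    print(s, "->", check(s))
```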

(40)

Shannon Obrien

(41)

Impact of Controlled Language

http://mt-archive.info/MTS-2007-Aikawa.pdf

(42)

From:

http://conferences.tekom.de/fileadmin/tx_doccon/slides/477_Simplified_English_and_MT_Best_Practices_for_Localization_Content_Optimization_and_Simplification.pdf

(43)

やさしい日本語 (Easy Japanese)

(44)
(45)

Can MT be useful for teaching Japanese?

Many students are already using Google Translate for their homework, etc.

今日と明日, 私はSection 3 の クラスに行きます。スケジューリングの競合があります。これは大丈夫ですか?

(46)

ありがとう。14:10~ は いいです。

今日はしゅくだいを与える。

はい、私は2時20分来ています! すぐにお会いし、相川先生!

私は来ていなかった、本当にごめんなさい。私のlabは、長いランチミーティングを持っていました。

私はあなたを悩ませてごめんなさい。

Thank you. 2:10 is a good.

Today I give homework.

Yes, I am coming at 2:20!

And see you soon, Aikawa sensei!

I'm sorry that I didn't come. My lab has had a long lunch meeting.

I'm sorry for bothering you.

(47)

Can we embrace MT for language teaching?

If so, how?

(48)

http://ictforlanguageteachers.blogspot.com/2011/11/google-translate-friend-or-foe.html

Do we want to punish students for using Google Translate?

• Or do we want to embrace MT for language teaching?

(e.g.) We can use the output of Google Translate to raise our students’ linguistic awareness: ask our students to spot the mistakes and explain why such mistakes have been made.

(49)

Google Translate (Jan. 28, 2015)

Thank you. => ありがとう。

2:10pm works for me. => 14:10は私のために動作します。

2:10pm is fine with me. => 14:10には、私と一緒に大丈夫です。

I can give you my homework today. => 今日はあなたに私の宿題を与えることができます。

Yes, I will come at 2:20pm. => はい、私は2時20分午後来る。

Yes, I will be coming at 2:20pm. => はい、私は2:20 pmに来るということだ。

See you soon, Aikawa sensei. => 、すぐに相川先生あなたを参照してください。

Sorry that I didn't come. => 私が来なかったことを申し訳ありません。

I had a long lunch meeting at my lab. => 私は私の研究室での長いランチミーティングを持っていた。

Sorry to bother you. => お邪魔して申し訳ありません。

We need to have more people. => 私たちは、より多くの人々を持っている必要があります。

There are many people who can describe the Japanese culture. => 日本文化を記述することができる多くの人々があります。

Ken wants Mary to listen to the tape. => ケンはメアリーはテープを聞くことを望んでいる。

I want you to study more. => 私はあなたがもっと勉強したいと思います。

Register/Style/Politeness Issues

Simple verbs (e.g., work, have, do, etc.)

Tense/Aspect

To-infinitive/-ing form

Usage of overt pronouns

(50)

Google Translate (Jan. 28, 2015)

ありがとう。

14:10は私のために動作します。 (…works for me.)

14:10には、私と一緒に大丈夫です。 (…fine with me.)

今日はあなたに私の宿題を与えることができます。

はい、私は2時20分午後来る。

はい、私は2:20 pmに来るということだ。

、すぐに相川先生あなたを参照してください。

私が来なかったことを申し訳ありません。

私は私の研究室での長いランチミーティングを持っていた。(…had a long lunch meeting.)

お邪魔して申し訳ありません。

私たちは、より多くの人々を持っている必要があります。

日本文化を記述することができる多くの人々があります。

ケンはメアリーはテープを聞くことを望んでいる。(Ken wants Mary to …)

私はあなたがもっと勉強したいと思います。(I want you to …)

(51)

Web-based translation tools are changing the way language is taught…

(52)

Should translation tools be embraced? Tolerated? Banned?

"Best practices are evolving," Merschel said. "We're just at the beginning of the conversation. We don't want to ban them. We should embrace and examine new technology."

Some language instructors don't believe online translation tools are a threat to the learning process… but some are NOT.

"We involve students in the learning process too. You need to use it at the right time and in the right way. It's like before you use a calculator, you've got to know how to add and subtract."

"It's pretty easy to show them the limitations of the tools,"

Decades ago, foreign language instruction leaned heavily on literal translation and memorization. But today, many language instructors eschew rote memorization in favor of interaction-based learning steeped in critical thinking and language immersion, Porter said.

And the language instructors of today didn't have Google Translate when they did their training.

"We're of a generation that learned language without access to these tools,”

http://today.duke.edu/2011/10/translation

(53)

Thank you!

Takako Aikawa

(taikawa@mit.edu)
