Challenges for English into Japanese Machine Translation (MT): Can We Embrace MT for Language Teaching?
Noon Lecture
University of Michigan, February 26, 2016
Takako Aikawa
Massachusetts Institute of Technology
Outline
• A brief introduction to my background
1. A brief introduction to Machine Translation (MT)
o Historical background of the development of MT
o How does (Statistical) MT work?
2. Challenges for English -> Japanese (EJ) MT
o Open problems for MT in general
o Differences between English and Japanese
3. What can we do to improve the quality of MT for EJ translations?
o Crowdsourcing
o Controlled/Simplified English
4. Can MT be used for teaching Japanese?
o Ban or embrace MT?
5. Surprise!
6. Q & A
What is statistical machine translation (SMT)?
o Historical Development of MT
o How does SMT work?
A brief history of MT
http://www.smartling.com/blog/2012/04/20/a-brief-history-of-machine-translation/
• Claims were made that the computer would replace most human translators.
• 1~2 million words/hour with adequate commercial computers.
• When would you achieve this?
• If the experiments go well, then perhaps within 5 years or so….
Moving to Statistical MT systems from hand-coded/rule-based MT systems
Why Statistical? What’s wrong with human rules?
Scalability
Rule-based MT: You need rich linguistic resources and computational linguists for each language pair.
=> Too expensive and not scalable.
Statistical MT: All you need is a bilingual parallel corpus.
=> Resource-light, and so scalable.
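To make the claim that a bilingual parallel corpus is all you need more concrete, here is a minimal sketch of IBM Model 1, the classic word-alignment model that learns word translation probabilities t(j | e) from sentence pairs alone, using expectation-maximization. The three-sentence English-Japanese corpus is a hypothetical toy, already segmented into words; real systems train on millions of sentence pairs and add a NULL word, which this sketch omits.

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=10):
    """Estimate t[(j, e)] = P(Japanese word j | English word e) with EM.
    Simplified IBM Model 1: no NULL word, uniform initialization."""
    ja_vocab = {j for _, ja in corpus for j in ja}
    t = defaultdict(lambda: 1.0 / len(ja_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts c(j, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each Japanese word among the English words of its sentence.
        for en, ja in corpus:
            for j in ja:
                norm = sum(t[(j, e)] for e in en)
                for e in en:
                    frac = t[(j, e)] / norm
                    count[(j, e)] += frac
                    total[e] += frac
        # M-step: turn expected counts into probabilities.
        t = defaultdict(float, {(j, e): c / total[e] for (j, e), c in count.items()})
    return t

def best_translation(t, e):
    """Return the Japanese word with the highest t(j | e)."""
    candidates = {j: p for (j, src), p in t.items() if src == e}
    return max(candidates, key=candidates.get)

# Hypothetical toy parallel corpus (pre-segmented).
corpus = [
    (["the", "house"], ["その", "家"]),
    (["the", "book"], ["その", "本"]),
    (["a", "book"], ["ある", "本"]),
]
t = train_ibm_model1(corpus)
print(best_translation(t, "book"))  # → 本
print(best_translation(t, "the"))   # → その
```

No dictionary is given to the model: "book" ends up paired with 本 only because the two co-occur more consistently across sentence pairs than any alternative.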
Scalability and Sustainability
[Figure: scalability and sustainability of rule-based MT vs. SMT]
How does a statistical machine translation system work?
http://research.google.com/pubs/MachineTranslation.html
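For reference, the classic noisy-channel formulation behind early SMT can be written as follows for an English input e and a Japanese output j. This is a simplified sketch; production phrase-based systems typically generalize it into a weighted log-linear combination of many feature scores.

```latex
\hat{j} = \arg\max_{j} P(j \mid e)
        = \arg\max_{j} \frac{P(e \mid j)\, P(j)}{P(e)}
        = \arg\max_{j} P(e \mid j)\, P(j)
```

Here P(e | j) is the translation model, estimated from a bilingual parallel corpus, and P(j) is the language model, estimated from monolingual Japanese text. P(e) is fixed for a given input sentence, so it drops out of the argmax.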
Practical Use of MT (phone applications)
• http://translate.google.com/about/intl/en_ALL/
o Open problems for MT in general
o English-Japanese specific issues, while presenting linguistic differences between the two languages
Open Problems for MT
Philipp Koehn
Lexical Ambiguity
Word ambiguity: many words have different senses depending on a given context
(e.g.)
Deposited money in my bank account.
Sitting on the bank of the Charles river.
Reading an interesting book.
Book your hotel as soon as possible.
Syntactic Ambiguity
VP-Attachment vs. NP-Attachment
John watched a bird [PP with binoculars]. (preferred: VP-attachment, the PP modifies "watched")
John watched a bird [PP with feathers]. (preferred: NP-attachment, the PP modifies "a bird")
Both attachments are structurally possible in each sentence; only world knowledge tells us which one is intended.
Idioms
It’s raining cats and dogs.
どしゃ降りです。 (lit. "It is a downpour.")
How are you?
お元気ですか。 (lit. "Are you well?")
Pronoun Resolution
Gender and number agreement for pronouns in many languages
o MT has to know the antecedent of a pronoun for agreement purposes
o MT has to know the gender and the number of that antecedent
Human languages are ambiguous and still very difficult for machines to handle for practical purposes!
Why is the quality of English -> Japanese MT so bad?
Differences between English and Japanese
• SVO vs. SOV: Word Order
• Case-markers
• Pronouns
• Counters (Japanese)
SVO (head-initial) vs. SOV (head-final) Word Order
This causes problems for word/phrase alignments.
Taro read the book that he bought yesterday.
太郎は、きのう買った本を読んだ。 (lit. "Taro, yesterday bought book read.")
I think ………
……… と思う。 (lit. "……… that (I) think.")
Case Markers => Free Word Order
Taro ate the apple.
太郎がそのリンゴを食べた。
Taro ate the apple.
そのリンゴを太郎が食べた。
Taro gave Hanako the apple.
太郎が花子にそのリンゴをあげた。
Taro gave Hanako the apple.
そのリンゴを太郎が花子にあげた。
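Why case markers permit free word order can be sketched in a few lines: when every noun phrase carries its particle, the grammatical role is read off the particle rather than off the position. This toy assumes pre-segmented (noun, particle) pairs and covers only the three particles in the examples above (が, を, に); it ignores は, which marks the topic rather than case. The English function is a deliberately simplified SVO contrast.

```python
# Hypothetical mapping from case particle to grammatical role (toy subset).
PARTICLE_ROLES = {"が": "subject", "を": "object", "に": "indirect object"}

def assign_roles(phrases):
    """phrases: (noun, particle) pairs in surface order; roles come from particles."""
    return {PARTICLE_ROLES[particle]: noun for noun, particle in phrases}

def english_roles(svo):
    """English assigns roles by position: [subject, verb, object] (toy SVO clause)."""
    subject, _verb, obj = svo
    return {"subject": subject, "object": obj}

# 太郎が花子にそのリンゴをあげた。 vs. そのリンゴを太郎が花子にあげた。
order_a = [("太郎", "が"), ("花子", "に"), ("そのリンゴ", "を")]
order_b = [("そのリンゴ", "を"), ("太郎", "が"), ("花子", "に")]
print(assign_roles(order_a) == assign_roles(order_b))  # → True

# Reordering English words changes who does what.
print(english_roles(["Taro", "ate", "the apple"]) == english_roles(["the apple", "ate", "Taro"]))  # → False
```

For English -> Japanese MT this cuts both ways: the system must produce the right particles, and it also has several acceptable word orders to choose from.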
Postpositions
Taro ate the apple at school/at 3pm.
太郎がそのリンゴを学校で/午後3時に食べた。
Taro will come by train/by noon.
太郎は電車で/正午までに来ます。
Taro ate the pie with his friends/with a knife.
太郎が友達と/ナイフでそのパイを食べた。
Pronouns
English: Overt Pronouns
Japanese: Empty Pronouns & the Principle of Avoid Pronouns
Pronoun resolution requires an understanding of the given context!
But MT still works at the sentence level….
The presence of empty pronouns in Japanese causes a lot of challenges for Japanese -> English translation, as MT has to supply an appropriate pronoun based on the context.
Which direction would be harder, English -> Japanese or Japanese -> English?
Japanese is more ambiguous than English in many respects.
Translating a more ambiguous language into a less ambiguous language is much harder.
Japanese Counters
Japanese counters (e.g., 本、人、冊, etc.)
Let’s translate the following via Google Translate and Bing Translator.
I saw three dogs and two cats at his house.
I ate 3 cakes yesterday.
I read 4 books at the library today.
I have three students who are working for my project.
Google Translate (Oct. 10, 2014)
Google Translate (Oct. 29, 2014)
Bing Translator (Oct. 10, 2014)
Bing Translator (Oct. 29, 2014)
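The counter problem can be illustrated with a toy lookup: English "three dogs" and "four books" follow one pattern, but Japanese needs a counter chosen by the semantic class of the noun, so MT must know what kind of thing is being counted. The noun-to-counter table below is a small hypothetical sample (匹 for small animals, 個 for small objects such as cakes, 冊 for bound volumes, 人 for people); it uses Arabic numerals and ignores sound changes such as 三匹 (sanbiki).

```python
# Hypothetical, tiny table: English noun -> (Japanese noun, counter).
COUNTERS = {
    "dog": ("犬", "匹"),
    "cat": ("猫", "匹"),
    "cake": ("ケーキ", "個"),
    "book": ("本", "冊"),
    "student": ("学生", "人"),
}

def count_phrase(n, english_noun):
    """Render 'n <noun>s' as <number><counter>の<noun>, e.g. 3 dogs -> 3匹の犬."""
    noun, counter = COUNTERS[english_noun]
    return f"{n}{counter}の{noun}"

print(count_phrase(3, "dog"))      # → 3匹の犬
print(count_phrase(2, "cat"))      # → 2匹の猫
print(count_phrase(4, "book"))     # → 4冊の本
print(count_phrase(3, "student"))  # → 3人の学生
```

Note that 本 appears here as the noun "book", while in the slide's list it is a counter (for long, thin objects); the same character plays both roles, which is one more source of ambiguity.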
Other fundamental issues
• Word-breaking
o English: uses white space to indicate word boundaries
o Japanese: needs a word-breaker for any NLP/MT-related task
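To illustrate what a word-breaker does, here is a minimal greedy longest-match (maximum matching) segmenter over a toy dictionary. This simple dictionary-based technique is chosen for illustration only; production systems typically rely on statistical morphological analyzers such as MeCab, and the dictionary below is hypothetical.

```python
# Hypothetical toy dictionary; real word-breakers use large lexicons plus statistics.
DICTIONARY = {"太郎", "が", "友", "友達", "達", "と", "シカゴ", "に", "行った"}

def word_break(text, dictionary=DICTIONARY):
    """Greedy longest match: at each position take the longest dictionary word,
    falling back to a single character when nothing matches."""
    max_len = max(len(w) for w in dictionary)
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary or length == 1:
                words.append(candidate)
                i += length
                break
    return words

print("Taro went to Chicago with his friend".split())  # English: spaces already mark words
print(word_break("太郎が友達とシカゴに行った"))
# → ['太郎', 'が', '友達', 'と', 'シカゴ', 'に', '行った']
```

Greedy matching picks 友達 over 友 because it is longer, but it can also go wrong when the longest match is not the intended word, which is one reason real segmenters score whole-sentence segmentations instead.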
• Proper Nouns
o How to read proper nouns
(e.g.) 高山 (たかやま/こうざん) => Takayama? / Koozan?
金山 (かなやま/きんざん) => Kanayama? / Kinzan?
大山 (おおやま/だいさん) => Ooyama? / Daisan?
国府 (こくふ/こくぶ/こう) => Kokufu? / Kokubu? / Kou? etc.
Why is the quality of English -> Japanese MT so bad?
Japanese and English are so far apart……
Differences in syntactic structures between the two languages matter
English -> Japanese
Taro went to Chicago with his friend.
太郎が友達とシカゴに行った。
English -> Spanish
Taro went to Chicago with his friend.
Taro fue a Chicago con su amigo.
Japanese -> Korean
太郎が 友達 と シカゴ に 行った。
타로 가 친구 와 시카고 에 갔다.
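One way to quantify how far apart two word orders are is to count crossing links in a word alignment: two links cross when their order in the source sentence is reversed in the target. The alignments below are hand-made for the three examples above (not the output of any aligner); words with no counterpart, such as "his" (dropped in Japanese) and the particle が, are simply left unaligned.

```python
from itertools import combinations

def crossing_links(alignment):
    """Count pairs of links (s1, t1), (s2, t2) with s1 < s2 but t1 > t2."""
    links = sorted(alignment)
    return sum(1 for (s1, t1), (s2, t2) in combinations(links, 2) if s1 < s2 and t1 > t2)

# English:  Taro(0) went(1) to(2) Chicago(3) with(4) his(5) friend(6)
# Japanese: 太郎(0) が(1) 友達(2) と(3) シカゴ(4) に(5) 行った(6)
en_ja = [(0, 0), (1, 6), (2, 5), (3, 4), (4, 3), (6, 2)]
# English -> Spanish and Japanese -> Korean align word for word.
en_es = [(i, i) for i in range(7)]
ja_ko = [(i, i) for i in range(7)]

print(crossing_links(en_ja))  # → 10
print(crossing_links(en_es))  # → 0
print(crossing_links(ja_ko))  # → 0
```

A monotone alignment (0 crossings) is easy for phrase-based SMT; the fully reversed English-Japanese structure forces long-distance reordering, which is where alignment and decoding errors accumulate.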
What can we do to improve the quality of English-Japanese MT?
Crowdsourcing (crowdsourcing of training data)
Controlled/Simplified English
“crowdsourcing” (クラウドソーシング)
• http://en.wikipedia.org/wiki/Crowdsourcing
Workers via Amazon Mechanical Turk
Training Data
The more training data, the better the quality of an SMT system.
The cleaner the training data, the better the quality of an SMT system.
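What "cleaner" training data means in practice varies. As one hedged illustration, here is a simple filter of the kind often applied to parallel corpora before SMT training: drop pairs with an empty side, exact duplicates, and pairs whose lengths differ implausibly (a sign of misalignment). The threshold of 4.0 and the use of character counts (Japanese has no spaces between words) are assumptions for this sketch, not values from the lecture.

```python
def clean_parallel_corpus(pairs, max_ratio=4.0):
    """Filter (english, japanese) sentence pairs; max_ratio is an assumed, tunable threshold."""
    seen = set()
    kept = []
    for en, ja in pairs:
        en, ja = en.strip(), ja.strip()
        if not en or not ja:
            continue  # one side is empty: nothing to learn from
        if (en, ja) in seen:
            continue  # exact duplicate: adds no new evidence
        longer, shorter = max(len(en), len(ja)), min(len(en), len(ja))
        if longer / shorter > max_ratio:
            continue  # lengths too different: likely a misaligned pair
        seen.add((en, ja))
        kept.append((en, ja))
    return kept

raw = [
    ("Taro ate the apple.", "太郎がそのリンゴを食べた。"),
    ("Taro ate the apple.", "太郎がそのリンゴを食べた。"),  # duplicate
    ("How are you?", ""),                                     # empty Japanese side
    ("I read four books at the library today and then walked home.", "はい。"),  # misaligned
]
print(clean_parallel_corpus(raw))
# → [('Taro ate the apple.', '太郎がそのリンゴを食べた。')]
```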
Companies like Google and Microsoft are constantly looking for new data!
Google Translate
Bing Translator
Duolingo
Controlled English/Simplified English
Boeing Simplified Technical English
Shannon Obrien
Impact on Controlled Language
http://mt-archive.info/MTS-2007-Aikawa.pdf
From: http://conferences.tekom.de/fileadmin/tx_doccon/slides/477_Simplified_English_and_MT_Best_Practices_for_Localization_Content_Optimization_and_Simplification.pdf
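A controlled-language checker can be sketched as a set of simple rules applied to each English source sentence before it is sent to MT. The two rules below are assumptions chosen for illustration, not an official rule set: a sentence-length cap (ASD-STE100 Simplified Technical English, for example, limits procedural sentences to 20 words) and a small list of vague general-purpose verbs such as "have", "do", and "work", which are also listed among the English -> Japanese trouble spots later in this lecture.

```python
import re

# Assumed rule parameters for this sketch.
MAX_WORDS = 20
VAGUE_VERBS = {"have", "has", "had", "do", "does", "did", "work", "works", "worked"}

def check_sentence(sentence):
    """Return a list of warnings for one English sentence (empty list = passes)."""
    words = re.findall(r"[A-Za-z']+", sentence)  # alphabetic tokens only
    warnings = []
    if len(words) > MAX_WORDS:
        warnings.append(f"too long: {len(words)} words (max {MAX_WORDS})")
    vague = sorted({w.lower() for w in words} & VAGUE_VERBS)
    if vague:
        warnings.append("vague verb(s): " + ", ".join(vague))
    return warnings

print(check_sentence("2:10pm works for me."))
# → ['vague verb(s): works']
print(check_sentence("I had a long lunch meeting at my lab."))
# → ['vague verb(s): had']
print(check_sentence("See you soon."))
# → []
```

A word-list rule is crude (it also flags "have" used as an auxiliary), so a real checker would need part-of-speech information; the point is only that each rule targets a pattern MT handles poorly.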
やさしい日本語 (Easy Japanese)
Can MT be useful for teaching Japanese?
Many students are already using Google Translate for their homework, etc.
今日と明日, 私はSection 3 の クラスに行きます。スケジューリングの競合があります。これは大丈夫ですか?
ありがとう。14:10~ は いいです。
今日はしゅくだいを与える。
はい、私は2時20分来ています! すぐにお会いし、相川先生!
私は来ていなかった、本当にごめんなさい。私のlabは、長いランチミーティングを持っていました。
私はあなたを悩ませてごめんなさい。
Thank you. 2:10 is a good.
Today I give homework.
Yes, I am coming at 2:20!
And see you soon, Aikawa sensei!
I'm sorry that I didn't come. My lab has had a long lunch meeting.
I'm sorry for bothering you.
Can we embrace MT for language teaching?
If so, how?
http://ictforlanguageteachers.blogspot.com/2011/11/google-translate-friend-or-foe.html
• Do we want to punish students for using Google Translate?
• Or do we want to embrace MT for language teaching?
(e.g.) We can use the output of Google Translate to raise our students’ linguistic awareness. Ask our students to spot the mistakes and explain why such mistakes have been made.
Thank you.
2:10pm works for me.
2:10pm is fine with me.
I can give you my homework today.
Yes, I will come at 2:20pm.
Yes, I will be coming at 2:20pm.
See you soon, Aikawa sensei.
Sorry that I didn't come.
I had a long lunch meeting at my lab.
Sorry to bother you.
We need to have more people.
There are many people who can describe the Japanese culture.
Ken wants Mary to listen to the tape.
I want you to study more.
Google Translate (Jan. 28, 2015)
ありがとう。
14:10は私のために動作します。
14:10には、私と一緒に大丈夫です。
今日はあなたに私の宿題を与えることができます。
はい、私は2時20分午後来る。
はい、私は2:20 pmに来るということだ。
、すぐに相川先生あなたを参照してください。
私が来なかったことを申し訳ありません。
私は私の研究室での長いランチミーティングを持っていた。
お邪魔して申し訳ありません。
私たちは、より多くの人々を持っている必要があります。
日本文化を記述することができる多くの人々があります。
ケンはメアリーはテープを聞くことを望んでいる。
私はあなたがもっと勉強したいと思います。
Register/Style/Politeness Issues
Simple verbs (e.g., work, have, do, etc.)
Tense/Aspect
To-infinitive/-ing form
Usage of overt pronouns
Google Translate (Jan. 28, 2015)
ありがとう。
14:10は私のために動作します。 (…works for me.)
14:10には、私と一緒に大丈夫です。 (…fine with me.)
今日はあなたに私の宿題を与えることができます。
はい、私は2時20分午後来る。
はい、私は2:20 pmに来るということだ。
、すぐに相川先生あなたを参照してください。
私が来なかったことを申し訳ありません。
私は私の研究室での長いランチミーティングを持っていた。(..had a long lunch meeting.)
お邪魔して申し訳ありません。
私たちは、より多くの人々を持っている必要があります。
日本文化を記述することができる多くの人々があります。
ケンはメアリーはテープを聞くことを望んでいる。(Ken wants Mary to …)
私はあなたがもっと勉強したいと思います。(I want you to …)
Web-based translation tools are changing the way language is taught…
Should translation tools be embraced? Tolerated? Banned?
"Best practices are evolving," Merschel said. "We're just at the beginning of the conversation. We don't want to ban them. We should embrace and examine new technology.”
Some language instructors don't believe online translation tools are a threat to the learning process… but some do.
“We involve students in the learning process too. You need to use it at the right time and in the right way. It's like before you use a calculator, you've got to know how to add and subtract.”
"It's pretty easy to show them the limitations of the tools,"
Decades ago, foreign language instruction leaned heavily on literal translation and memorization. But today, many language instructors eschew rote memorization in favor of interaction-based learning steeped in critical thinking and language immersion, Porter said.
And the language instructors of today didn't have Google Translate when they did their training.
"We're of a generation that learned language without access to these tools,”
http://today.duke.edu/2011/10/translation