Case Analysis - Experiments - Master's Thesis: Neural Grammatical Error Correction Considering Fluency and Meaning Preservation

3.3 Experiments

3.3.5 Case Analysis

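Table 10 shows three examples in which repeating the correction improved the output. Example (a) contains two grammatical errors and one spelling error. In the first pass, the correction model fixed both grammatical errors but missed the spelling error and, influenced by that misspelling, inserted an unnecessary “the” before “litterature”. The second pass then corrected the spelling of “litterature”, and the third pass deleted “the”, improving the output. Example (b) shows a compound error, a spelling error combined with a word-form error, being corrected in stages: because the first pass fixed the spelling, the second pass could change the adverb “obviously” to the adjective “obvious”. In example (c), the first pass corrected “thy” in the second half of the sentence to “they”, which allowed the second pass to correct “Thy” and “there self” in the first half to “They” and “themselves”, respectively.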
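The repeated-correction procedure described above can be sketched as a loop that feeds each output back into the corrector until a pass changes nothing. This is a minimal illustration, not the thesis's implementation: `iterate_correction`, `toy_correct`, `TOY_RULES`, and the `max_passes` limit are all hypothetical names, and the toy corrector is a hand-written stand-in for a neural correction model that merely reproduces the staged behavior of example (b).

```python
from typing import Callable, List


def iterate_correction(
    sentence: str,
    correct: Callable[[str], str],
    max_passes: int = 5,
) -> List[str]:
    """Apply `correct` repeatedly and return every intermediate output.

    Index 0 is the original input. Iteration stops early once a pass
    leaves the sentence unchanged, or after `max_passes` passes.
    """
    history = [sentence]
    for _ in range(max_passes):
        output = correct(history[-1])
        if output == history[-1]:
            break
        history.append(output)
    return history


# Hypothetical stand-in for a correction model: it fixes at most one error
# per pass, and the word-form rule only fires once the spelling is fixed.
TOY_RULES = [
    ("obvioualy", "obviously"),                # spelling error
    ("is obviously that", "is obvious that"),  # word-form error
]


def toy_correct(sentence: str) -> str:
    for wrong, right in TOY_RULES:
        if wrong in sentence:
            return sentence.replace(wrong, right, 1)
    return sentence


if __name__ == "__main__":
    source = ("On one side , it is obvioualy that many advantages "
              "have been brought to our lives .")
    for n, text in enumerate(iterate_correction(source, toy_correct)):
        print(n, text)
```

Stopping at the first unchanged output keeps the loop cheap for sentences that need no further correction after the first pass.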

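In contrast, Table 11 shows examples in which repeating the correction degraded performance. In example (d), the first pass is an appropriate correction, but the second pass wrongly inserted “the”, which lowered the F0.5 score. There were also cases, such as the second pass in example (e), where the model, in trying to make the sentence even more fluent, made corrections whose correctness is hard to judge.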
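To see why a single spurious insertion lowers F0.5, the following sketch computes the score from edit counts. The counts are hypothetical and are not taken from example (d); the function `f_beta` is an illustrative helper that uses the standard F-beta definition, and the exact scorer used in the thesis is not specified in this section.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from edit counts (true positives, false positives, false negatives).

    With beta = 0.5, precision is weighted more heavily than recall.
    Returns 0.0 when precision and recall are both zero.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical pass 1: two correct edits, nothing spurious, nothing missed.
    print(f"{f_beta(tp=2, fp=0, fn=0):.3f}")  # 1.000
    # Hypothetical pass 2: the same two edits plus one wrongly inserted "the".
    print(f"{f_beta(tp=2, fp=1, fn=0):.3f}")  # 0.714
```

Because F0.5 emphasizes precision, one unnecessary edit is penalized more than one missed edit would be under the same counts.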

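4 Conclusion

In this study, aiming at human-like grammatical error correction, we first proposed a reference-less automatic evaluation method. On benchmark data, the proposed method evaluated systems more accurately than conventional reference-based evaluation. We also showed that the proposed method may help improve the performance of correction systems. Next, we investigated the effect of repeating correction in grammatical error correction. It can be difficult to correct a sentence containing many errors all at once, and we expected such sentences to improve through repeated processing. The experiments showed, however, that the effect is limited and that for most sentences no correction is made from the second pass onward. On the other hand, we observed more cases in which the correction improved than cases in which it got worse. Finally, through the case analysis, we discussed the remaining challenges for correction systems.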

Acknowledgments

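I would like to express my sincere gratitude to the many people who offered their cooperation and advice over the course of this research. I am deeply grateful to my principal supervisor, Professor Kentaro Inui, who despite his busy schedule gave me extensive guidance and advice not only on my research but also on matters such as my career path. I am also deeply grateful to my co-supervisor, Associate Professor Jun Suzuki, for his many suggestions on my research. I sincerely thank Professor Masanori Hariyama and Professor Ayumi Shinohara for agreeing to serve on the review committee despite their busy schedules. I sincerely thank Specially Appointed Researcher Tomoya Mizumoto, who advised me directly on research direction, research methods, and paper writing. I sincerely thank Researcher Yuichiro Matsubayashi for his many suggestions at research meetings and elsewhere. I also thank the members of the Inui-Suzuki Laboratory for their advice in our daily discussions. Finally, I thank everyone I came to know during my time at the university.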


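List of Publications

Journal Papers

• Hiroki Asano, Tomoya Mizumoto, and Kentaro Inui. Reference-less Evaluation of Grammatical Error Correction Based on Grammaticality, Fluency, and Meaning Preservation. Journal of Natural Language Processing, Vol. 25, No. 5, pp. 555-576, December 2018.

International Conference Papers

• Hiroki Asano, Tomoya Mizumoto, and Kentaro Inui. Reference-based Metrics can be Replaced with Reference-less Metrics in Evaluating Grammatical Error Correction Systems. In Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP), pp. 343-348, November 2017.

Domestic Conference and Workshop Papers

• Hiroki Asano, Tomoya Mizumoto, Yuichiro Matsubayashi, and Kentaro Inui. Evaluation Performance of Reference-less Methods in Sentence-level Evaluation of Grammatical Error Correction. The 4th Natural Language Processing Symposium and the 234th Meeting of the Special Interest Group on Natural Language Processing, 8 pages, December 2017.

• Hiroki Asano, Tomoya Mizumoto, and Kentaro Inui. Reference-less Evaluation for Grammatical Error Correction. The 23rd Annual Meeting of the Association for Natural Language Processing, pp. 947-950, March 2017.

• Hiroki Asano, Tomoya Mizumoto, and Kentaro Inui. Examining the Effect of Iterative Correction in Grammatical Error Correction. The 25th Annual Meeting of the Association for Natural Language Processing, March 2019 (to appear).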

Table 10: Examples where the score improved

Example  Pass  Sentence
(a)      0     People tends to choose other medias , and that is why litterature is in danger .
(a)      1     People tend to choose other media , and that is why the litterature is in danger .
(a)      2     People tend to choose other media , and that is why the literature is in danger .
(a)      3     People tend to choose other media , and that is why ϕ literature is in danger .
(b)      0     On one side , it is obvioualy that many advantages have been brought to our lives .
(b)      1     On one side , it is obviously that many advantages have been brought to our lives .
(b)      2     On one side , it is obvious that many advantages have been brought to our lives .
(c)      0     Thy are busy in there self , thy dont spend time to help the society that they live in .
(c)      2     They are busy themselves , they do n’t spend time to help the society that they live in .
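The per-pass changes in Table 10 can be listed mechanically by diffing the tokenized output of one pass against the next. The sketch below uses Python's standard `difflib.SequenceMatcher`; `token_edits` is a hypothetical helper written for illustration and is not the alignment used by any particular grammatical error correction scorer. The deletion marker ϕ from the table is omitted from the input sentences.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def token_edits(before: str, after: str) -> List[Tuple[str, str]]:
    """Return (old, new) token spans that differ between two tokenized sentences.

    An insertion has an empty `old` span; a deletion has an empty `new` span.
    """
    a, b = before.split(), after.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


if __name__ == "__main__":
    # Example (a) from Table 10, passes 0 to 3.
    passes = [
        "People tends to choose other medias , and that is why litterature is in danger .",
        "People tend to choose other media , and that is why the litterature is in danger .",
        "People tend to choose other media , and that is why the literature is in danger .",
        "People tend to choose other media , and that is why literature is in danger .",
    ]
    for n in range(1, len(passes)):
        print(n, token_edits(passes[n - 1], passes[n]))
```

Listing edits this way makes it easy to see that the first pass introduces the extra “the” and the third pass removes it.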
