Analyzing the Impact of Spelling Errors on POS-Tagging and Chunking in Learner English

Academic year: 2021


Comparing the results of POS-tagging

Base+Affix (Orig) vs. Base+Affix (Gold)

95.31% -> 95.54% (0.23 points ↑)

Comparing the number of correct POS tags

The number of correct POS tags for misspelled words increased

i.e. 344 -> 465, 489 -> 528

For surrounding words, the number of correct POS tags showed almost no difference

e.g. affix information (e.g. ed, ing)

Various types of spelling errors

Some misspelled words still carry information that helps determine their POS

Using a spell checker improves accuracy by 0.06%

Number of spelling errors assigned a correct POS with the spell checker: 74

Number of spelling errors assigned an incorrect POS with the spell checker: 49

The accuracy of the spell checker is not 100%

It can correct unknown word errors

Known word errors are difficult to correct

It sometimes corrects unknown word errors to a different word than intended

Does an ideal spell checker have a positive effect on POS-tagging?

Analyzing the Impact of Spelling Errors on POS-Tagging and Chunking in Learner English

Tomoya Mizumoto Ryo Nagata

Background Summary

Performance Analysis of Spelling Errors

Experiments

We investigated the performance of POS-tagging

and focused our investigation on spelling errors

Experimental Setup

POS-tagging and chunking are used in NLP tasks that target learner English

10 of the 12 teams in the CoNLL shared task used POS-tagging

They are also used for linguistic analysis of learner English

explored characteristic patterns in learner English

POS sequences can be used to distinguish between different kinds of mother-tongue interference

A detailed investigation would improve related tasks

No study has described the root causes of POS-tagging errors in detail

1. Extent of performance degradation due to spelling errors

2. Types of spelling errors

3. Effect of a spell checker

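[Figure: POS-tagging accuracy (%), Base vs. Base+Affix, for Original, Spell checker, and Gold input]

Base: Original 93.97, Spell checker 94.21, Gold 94.46

Base+Affix: Original 95.31, Spell checker 95.37, Gold 95.54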

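[Figure: number of correct POS tags for misspelled words and for the previous and next words]

Misspelled words: Base (Orig) 344, Base (Gold) 489, Base+Affix (Orig) 465, Base+Affix (Gold) 528

Surrounding words: Base (Orig) 590 / 540, Base (Gold) 588 / 544, Base+Affix (Orig) 598 / 542, Base+Affix (Gold) 596 / 547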

Performance of POS-tagging: 0.23%↓

Spelling errors do not influence the accuracy of estimating the POS of their surrounding words

No difference in performance between known and unknown word errors

Improvement: 0.06% → a spell checker is not required

Extent of performance degradation due to spelling errors

Types of spelling errors

Effects of a spell checker

Learner English includes 3.4% spelling errors

Assuming that POS-tagging fails for all unknown words: performance 3.4%↓

Effect that misspelled words have on their own POS tags and on those of their surrounding words

It is very ineresting/*interesting game .

Final seen/*scene is very good .

e.g. Unknown word error: typographical (studing/*studying)

Known word error: homophones (sea/*see), derivations (smell/*smelly)

e.g. movile → movie or mobile

Data

Train: in-house data

16,375 sentences, 213,017 tokens

Test: Konan-JIEM Corpus

3,260 sentences, 30,517 tokens

The number of spelling errors: 654

Spell Checker

Based on a noisy channel model
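The poster only states that the spell checker follows a noisy channel model. As a rough illustration of that idea, the sketch below picks the candidate c that maximizes P(c) * P(w | c). Everything specific here is an assumption for illustration: the toy word counts, the one-edit candidate set, the uniform channel model, and the `p_keep` parameter are hypothetical and not taken from the authors' system.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical unigram counts standing in for a language model P(c).
TOY_COUNTS = {"movie": 50, "mobile": 20, "see": 80, "sea": 10, "title": 5, "little": 60}


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts, p_keep=0.95):
    """Return the candidate maximizing P(c) * P(word | c).

    Assumed channel model: an in-vocabulary word is typed as intended with
    probability p_keep; the remaining mass is shared uniformly among
    in-vocabulary candidates one edit away.
    """
    total = sum(counts.values())
    candidates = {c for c in edits1(word) if c in counts and c != word}
    n = max(len(candidates), 1)
    scores = {c: (counts[c] / total) * ((1 - p_keep) / n) for c in candidates}
    if word in counts:
        scores[word] = (counts[word] / total) * p_keep
    # Fall back to the input when no candidate is known.
    return max(scores, key=scores.get) if scores else word


print(correct("movile", TOY_COUNTS))  # movie (mobile is also one edit away)
print(correct("sea", TOY_COUNTS))     # sea (a known word is left unchanged)
print(correct("tittle", TOY_COUNTS))  # little (the intended word was title)
```

With these toy counts, the sketch reproduces the three behaviours listed on the poster: unknown word errors can be corrected, known word errors tend to be left alone, and an unknown word error can be corrected to a different word than the writer intended.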

Method of POS-tagging

Used a conditional random field (CRF)

Tool: CRF++ (default parameters)

Features: surface, original form, specific character + suffix (Base)
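To make the role of affix features concrete, here is a minimal sketch of per-token feature extraction for a sequence tagger. It is not the authors' feature set and not CRF++ input (CRF++ reads column data and a template file). The function name `token_features`, the feature names, the lowercased form as a stand-in for the original form, and the affix length limit are all hypothetical.

```python
def token_features(tokens, i, max_affix=3):
    """Build a feature dict for tokens[i] (illustrative feature names only)."""
    word = tokens[i]
    lower = word.lower()
    feats = {
        "surface": word,
        "lower": lower,  # crude stand-in for an "original form" feature
        "has_digit": any(ch.isdigit() for ch in word),
        "has_upper": any(ch.isupper() for ch in word),
        "has_hyphen": "-" in word,
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
    # Affix features: a misspelled word such as "studing" still ends in "ing".
    for n in range(1, max_affix + 1):
        if len(lower) > n:
            feats[f"prefix{n}"] = lower[:n]
            feats[f"suffix{n}"] = lower[-n:]
    return feats


feats = token_features(["I", "am", "studing", "English", "."], 2)
print(feats["suffix3"])  # ing
```

The point of the sketch is that the suffix of a misspelled, out-of-vocabulary token is often intact, so a tagger with affix features can still pick up the cue even though the surface form was never seen in training.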

#TP   #FP   #FN   Precision   Recall   F-score

409   197   120   67.49       77.32    72.07
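The precision, recall, and F-score in this table follow from the three counts using the standard definitions. The short check below recomputes them; the function name `precision_recall_f1` is only an illustrative choice.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and balanced F-score from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f = precision_recall_f1(tp=409, fp=197, fn=120)
print(f"{100 * p:.2f} {100 * r:.2f} {100 * f:.2f}")  # 67.49 77.32 72.07
```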

 Unknown errors: 487         

Known errors: 167

Experimental Results

Results of POS-tagging (Accuracy)

Results of POS-tagging for misspelled words and their surrounding words
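One way to obtain counts of this kind is to compare gold and predicted tags at each misspelled position and at its immediate neighbours. The sketch below shows that bookkeeping for a single sentence. The function name, the Penn Treebank style tag labels, and the toy predictions are assumptions made for illustration, not the authors' evaluation script.

```python
def count_correct_around_errors(gold, pred, error_idx):
    """Count correct tags at misspelled positions and their previous/next tokens."""
    counts = {"misspelled": 0, "previous": 0, "next": 0}
    for i in error_idx:
        for name, j in (("misspelled", i), ("previous", i - 1), ("next", i + 1)):
            # Skip neighbours that fall outside the sentence.
            if 0 <= j < len(gold) and gold[j] == pred[j]:
                counts[name] += 1
    return counts


# "It is very ineresting game ." with the misspelled word at index 3.
gold = ["PRP", "VBZ", "RB", "JJ", "NN", "."]
pred = ["PRP", "VBZ", "RB", "NN", "NN", "."]  # hypothetical tagger output
print(count_correct_around_errors(gold, pred, [3]))
# {'misspelled': 0, 'previous': 1, 'next': 1}
```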

Extent of performance degradation due to spelling errors

POS-tagging performance dropped 0.23% due to spelling errors

The effect of affix information on spelling errors

Using affix information, the POS-tagger identified the correct POS for approximately 120 additional misspelled words

Unknown word error vs. known word error

Analysis of the misspelled words for which Base+Affix (Original) could not identify the correct POS

unknown: 143/487 (29%), known: 46/167 (27.5%)
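These ratios can be cross-checked against the other counts on the poster. The sketch below recomputes both rates and verifies that the failures add up, under the assumption that 465 is the number of misspelled words tagged correctly by Base+Affix (Original); the variable names are illustrative.

```python
# Counts taken from the poster text.
total_errors = 654
unknown_total, known_total = 487, 167
unknown_wrong, known_wrong = 143, 46
correct_orig = 465  # assumed: misspelled words tagged correctly by Base+Affix (Original)

unknown_rate = unknown_wrong / unknown_total
known_rate = known_wrong / known_total

# Unknown and known word errors together cover all spelling errors.
assert unknown_total + known_total == total_errors
# The untagged remainder matches the sum of both failure counts (189).
assert unknown_wrong + known_wrong == total_errors - correct_orig

print(f"unknown: {100 * unknown_rate:.1f}%  known: {100 * known_rate:.1f}%")
# unknown: 29.4%  known: 27.5%
```

The unknown-word rate comes out at 29.4%, which the poster reports rounded to 29%.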

Effects of a spell checker

The ratios do not differ between unknown and known word errors

A spell checker does not have a positive effect on POS-tagging

It is sufficient to assign POS tags using affix information

Base+Affix (Original)                    Base (Spell checker)

pepole/Noun, singular                    people/Noun, plural

tow/Noun, singular                       two/Numeral

Base+Affix (Original)                    Base (Spell checker)

tero/Noun, singular (corr: terrorist)    to/Noun, plural

tittle/Noun, singular (corr: title)      little/Adjective

Types of spelling errors

344 -> 465
