A Crowdsourcing Approach for Annotating Causal Relation Instances in Wikipedia

Academic year: 2021

・ Integrate a crowdsourcing service and brat

Abstract Result

Collected ten annotations per article

Nyctalopia, also called night-blindness, is a condition making it difficult to see in relatively low light. Nyctalopia may exist from birth, or be caused by injury or severe malnutrition.

Wikipedia article “Nyctalopia”

Annotation policy

X promotes Y: Y is activated when X is activated

X suppresses Y: Y is inactivated when X is activated

Kazuaki Hanawa, Akira Sasaki, Naoaki Okazaki, Kentaro Inui

(Tohoku University, Tokyo Institute of Technology)

Approach Goal

・ Annotate causal relation instances in Wikipedia

・ Collected 95,008 causal relation instances in 1,494 Wikipedia articles

(http://www.cl.ecei.tohoku.ac.jp/wikipedia_pro_sup/)

・ The corpus can be used as supervision data for automatic recognition of causal relation instances

・ Revealed valuable facts for improving the annotation process of this task

Contributions

⟨PRO, nyctalopia, night-blindness⟩

⟨SUP, nyctalopia, see in relatively low light⟩

⟨PRO_BY, nyctalopia, injury⟩ = ⟨PRO, injury, nyctalopia⟩

⟨PRO_BY, nyctalopia, severe malnutrition⟩

(Figure arrow labels: promote, suppress)
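
The equivalence ⟨PRO_BY, nyctalopia, injury⟩ = ⟨PRO, injury, nyctalopia⟩ can be sketched as a small normalization step. This is a minimal illustration, not the authors' code: the `Relation` type and `normalize` function are hypothetical names, and treating SUP_BY the same way as PRO_BY is an assumption made by analogy.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """A causal relation instance <label, first argument, second argument>."""
    label: str
    first: str
    second: str


def normalize(rel: Relation) -> Relation:
    """Rewrite a *_BY relation as its forward form with the arguments swapped.

    Assumption: SUP_BY mirrors SUP in the same way that PRO_BY mirrors PRO.
    """
    if rel.label.endswith("_BY"):
        return Relation(rel.label[: -len("_BY")], rel.second, rel.first)
    return rel


example = normalize(Relation("PRO_BY", "nyctalopia", "injury"))
print(example)
```

Relations that are already in forward form (PRO, SUP) pass through unchanged.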

Using brat in crowdsourcing

Example

Micro-F1 against the gold standard

m : Number of annotators

n : Adopt only spans with n or more exactly matched annotations
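
The aggregation rule (adopt only spans with n or more exactly matched annotations) and the micro-F1 comparison could look roughly like the sketch below. The span representation `(start, end, label)` and the function names `aggregate` and `micro_f1` are assumptions for illustration; the poster does not describe the actual data format.

```python
from collections import Counter


def aggregate(annotations, n):
    """Keep spans that at least n annotators marked with identical boundaries and label.

    annotations: one collection of (start, end, label) tuples per annotator.
    """
    counts = Counter(span for worker in annotations for span in set(worker))
    return {span for span, count in counts.items() if count >= n}


def micro_f1(predicted, gold):
    """Micro-averaged F1 over pooled span sets (exact match)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations from m = 3 workers on one article.
workers = [
    {(0, 10, "PRO"), (20, 30, "SUP")},
    {(0, 10, "PRO")},
    {(0, 10, "PRO"), (21, 30, "SUP")},  # boundary differs by one character
]
adopted = aggregate(workers, n=2)
gold = {(0, 10, "PRO"), (20, 30, "SUP")}
score = micro_f1(adopted, gold)
print(adopted, score)
```

In this toy case the SUP span is dropped because the two SUP annotations do not match exactly, which illustrates how raising n trades recall for precision.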

Percentage of POS of head words

POS             %       POS       %
Noun            90.17   Mark      2.27
Verb            5.76    Particle  0.27
Auxiliary verb  1.09    Adverb    0.02
Adjective       0.41    Prefix    0.01

Automatic recognition

・ Use n = 2 data as training and test data

・ IOB2 notation was applied to the causal relations (e.g., B-PRO, I-PRO, B-SUP, I-SUP )

・ Use one-layer bi-directional LSTM
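
As an illustration of the IOB2 step, the sketch below converts token-level spans into tags such as B-PRO and I-PRO. The function name `to_iob2`, the token-index span format, and the English example tokens are assumptions; the poster does not specify the tokenization or preprocessing used.

```python
def to_iob2(num_tokens, spans):
    """Convert labeled token spans into IOB2 tags.

    spans: (start, end, label) with token indices, end exclusive.
    Spans are assumed not to overlap.
    """
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the span
    return tags


# Hypothetical tokenization of part of the Nyctalopia example.
tokens = ["caused", "by", "severe", "malnutrition"]
tags = to_iob2(len(tokens), [(2, 4, "PRO_BY")])
print(list(zip(tokens, tags)))
```

A sequence tagger such as the bi-directional LSTM mentioned above would then be trained to predict one tag per token.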

Label    precision  recall  F1
PRO      0.507      0.364   0.424
SUP      0.354      0.275   0.310
PRO_BY   0.470      0.344   0.397
SUP_BY   0.259      0.178   0.211
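
The F1 column is consistent with the harmonic mean of the reported precision and recall. The short check below recomputes it from the numbers in the table; the variable names are illustrative only.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "PRO": (0.507, 0.364, 0.424),
    "SUP": (0.354, 0.275, 0.310),
    "PRO_BY": (0.470, 0.344, 0.397),
    "SUP_BY": (0.259, 0.178, 0.211),
}

recomputed = {label: round(f1(p, r), 3) for label, (p, r, _) in reported.items()}
for label, value in recomputed.items():
    print(f"{label}: {value:.3f}")
```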

[Figure: Numbers of words and bunsetsu chunks (panels: bunsetsu chunks, words)]

Annotation interface of brat (external site)

・ Complete button
・ One out of ten is a test question
・ The character-level F1 score of a worker’s annotation is ...
　less than 0.3: Incorrect password
　0.3 or more: Correct password

Crowdsourcing interface

・ Enter the password
・ If the password is correct, the worker can claim rewards
・ Complete the task

Password strings shown in the figure: iYd2UwmHr51p, F9pw4JkD0lk3
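
The quality check on test questions can be sketched as follows. This is one plausible reading of "character-level F1", computed over the sets of annotated character offsets and ignoring labels; the exact definition used by the authors is not given, and `char_f1` and `passes_test_question` are hypothetical names.

```python
def char_f1(predicted_spans, gold_spans):
    """F1 over annotated character positions.

    Spans are (start, end) character offsets, end exclusive.
    Assumption: labels are ignored and only covered characters are compared.
    """
    predicted = {i for start, end in predicted_spans for i in range(start, end)}
    gold = {i for start, end in gold_spans for i in range(start, end)}
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def passes_test_question(predicted_spans, gold_spans, threshold=0.3):
    """A worker receives the correct password only at or above the threshold."""
    return char_f1(predicted_spans, gold_spans) >= threshold


good = passes_test_question([(0, 10)], [(5, 15)])   # half of the characters overlap
bad = passes_test_question([(0, 10)], [(9, 29)])    # a single character overlaps
print(good, bad)
```

Because the worker cannot tell which of the ten questions is the test question, a low-effort annotation risks receiving an incorrect password and no reward.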

… and result in high numbers of abnormal white blood cells.

Symptoms may include bleeding and bruising problems, …

Treatment may involve some combination of chemotherapy, …

[Figure: legend PRO, SUP, PRO_BY, SUP_BY; axis: number of annotators (0 to 10)]

・ It may be sufficient to limit annotation spans to noun phrases

・ Allowing crowd workers to choose their segment boundaries may be necessary

・ Increasing the number of annotators improves the result

・ Five annotations per article may be sufficient
