Reliability of the Lesson Observation Instrument


Chapter 3 Reliability of the Lesson Observation Instrument

This chapter discusses the validity and reliability of our lesson observation instrument. We follow the procedure of Guarino et al. (2001) to verify its reliability.

3.1 A previous study testing the validity and reliability of the SIOP

Guarino et al. (2001) tested the validity and reliability of the SIOP. They defined three major SI dimensions, (1) Preparation, (2) Instruction, and (3) Review/Evaluation, and examined the consistency of the ratings among raters (inter-rater reliability). Of the three dimensions, Preparation covered two main features, lesson preparation and building background, which together consisted of 6 assessment items, including determining the lesson objectives and content objectives, selecting age-appropriate content concepts and vocabulary, and assembling supplementary materials to contextualize the lesson. Instruction included the main features of comprehensible input, strategies, interaction, practice/application, and lesson delivery. These main features were further broken down into 20 items, such as making connections with students' background experiences and prior learning, modulating teacher speech, emphasizing vocabulary development, using multimodal techniques, promoting higher-order thinking skills, grouping students appropriately for language and content development, and providing hands-on materials. Review/Evaluation consisted of 4 items concerning the assessment of student comprehension and learning of all lesson objectives.

Guarino et al. (2001) asked four teachers to observe and rate 6 video-taped lessons, each of which lasted about 45 minutes. The raters were experienced SI teachers, and the total teaching experience of each teacher was over 30 years. Three of the video-taped lessons were taught by sheltered instruction specialists and the other three by non-SI teachers. Cronbach's alpha was used to confirm the inter-rater reliability for the three dimensions above. The results indicated that all the correlations among the raters reached .90 or higher, and thus the ratings were considered highly consistent. They also compared the scores given to SI-based lessons and non-SI-based lessons. The results showed that the raters tended to give higher scores to the SI-based lessons than to the non-SI-based lessons. This suggested that the SIOP was an appropriate tool for measuring SI-based lessons.
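As an illustration of how Cronbach's alpha can serve as an index of inter-rater consistency, the following is a minimal sketch in which each rater is treated as one "item" and each lesson as one observation. The function name, the data layout, and the scores are assumptions made for illustration only; they are not the data or the exact computation of Guarino et al. (2001).

```python
from statistics import variance


def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha, treating each rater as one 'item'.

    ratings_by_rater: list of equal-length lists, one per rater, holding
    that rater's score for every rated lesson (hypothetical layout).
    """
    k = len(ratings_by_rater)
    if k < 2:
        raise ValueError("need at least two raters")
    n = len(ratings_by_rater[0])
    if n < 2 or any(len(r) != n for r in ratings_by_rater):
        raise ValueError("every rater must score the same (>= 2) lessons")
    # Sum of each rater's sample variance across lessons.
    sum_rater_var = sum(variance(r) for r in ratings_by_rater)
    # Sample variance of the per-lesson totals (scores summed over raters).
    totals = [sum(scores) for scores in zip(*ratings_by_rater)]
    total_var = variance(totals)
    return k / (k - 1) * (1 - sum_rater_var / total_var)


# Made-up scores from four raters on six lessons (not data from the study).
example = [
    [95, 88, 91, 52, 47, 60],
    [92, 85, 94, 55, 50, 58],
    [97, 90, 89, 49, 45, 63],
    [93, 86, 92, 54, 48, 61],
]
alpha = cronbach_alpha(example)
print(round(alpha, 3))
```

When the raters rank the lessons in nearly the same way, the variance of the summed scores is large relative to the individual rater variances, and alpha approaches 1.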

3.2 Interrater reliability of our observation instrument

To test the reliability of our observation instrument, we applied it in the Japanese junior high school context and calculated the interrater reliability, following the procedure described in Guarino et al. (2001).

3.2.1 Raters

There were two raters: the author and her supervisor, a faculty member at a university of teacher education with 16 years of experience in pre-service and in-service teacher education. The author was a pre-service teacher with virtually no teaching experience. Both raters worked on the present study and had a fairly deep grasp of the SIOP.

3.2.2 The lessons observed

Three video-recorded English lessons were observed and rated. Each lasted 45 minutes and was taught by a Japanese teacher of English. Two lessons (Lessons A and B) were recorded on October 20th, 2010, and a third (Lesson C) was recorded in February, 2001. In Lesson A, the teacher, who had 14 years of teaching experience, taught the first grade of junior high school. The main focus of the lesson was to practice and repeat words previously learned from the textbook. Most of the lesson was delivered in a teacher-centered way. Lesson B was taught by a teacher with 12 years of teaching experience. Since the lesson was conducted just before the final exam, the teacher had to focus on review, and little teacher-student or student-student interaction was observed. Because Lessons A and B were teacher-centered and showed very little interaction, we added another video-recorded lesson with contrasting features. This lesson was taught by a male teacher with 25 years of teaching experience. He taught third graders and provided many opportunities for communication and for teacher-student or student-student interaction. The students were engaged in pair activities, which were effectively configured. He frequently assessed students' learning during the lesson by checking the worksheets the students worked on or by asking questions that seemed intended to assess students' understanding of the lesson content.

3.2.3 Procedure

The raters watched the three English lessons individually and scored them according to the procedure described in Chapter 2. After the evaluation sheets were collected, Pearson correlation coefficients between the two raters' scores were calculated for each of the three lessons. The results are shown in Table 2.

Table 2. Pearson correlation coefficients between the two raters

Lesson A    Lesson B    Lesson C    Mean
0.818       0.859       0.943       0.907
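The coefficients in Table 2 are Pearson product-moment correlations between the two raters' item scores. A minimal sketch of the computation is given here; the function name, the 0-4 scoring scale, and the item scores are assumptions for illustration and are not the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("undefined when one rater gives constant scores")
    return cov / sqrt(ss_x * ss_y)


# Made-up item scores on an assumed 0-4 scale for one lesson.
rater_1 = [4, 3, 3, 2, 4, 1, 0, 2, 3, 4]
rater_2 = [4, 3, 2, 2, 3, 1, 1, 2, 3, 4]
print(round(pearson_r(rater_1, rater_2), 3))

# Arithmetic mean of the three per-lesson coefficients printed in Table 2.
table_2 = [0.818, 0.859, 0.943]
print(round(sum(table_2) / len(table_2), 3))
```

The arithmetic mean of the three per-lesson values is about .873, so the .907 reported in the Mean column was presumably obtained in another way, for example from ratings pooled across the three lessons. The source does not specify the method.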

The mean of the coefficients of the three lessons was .907, which is considered an appropriate estimate of interrater reliability. However, the individual coefficients varied from .818 for Lesson A to .943 for Lesson C. To identify the cause of the discrepancy, we examined how the scores for Lessons A, B and C were distributed for each main feature. Table 3 below shows the percentages of congruence between the two raters.

Table 3. The congruence of the two raters (% of congruence, by main feature)

Main feature                     Lesson A    Lesson B    Lesson C
Lesson preparation               100.0%      50.0%       100.0%
Building background              50.0%       100.0%      100.0%
Comprehensible input             0.0%        33.3%       100.0%
Strategies                       66.7%       100.0%      66.7%
Scaffolding                      66.7%       0.0%        66.7%
[feature 6, label illegible]     25.0%       50.0%       75.0%
[feature 7, label illegible]     66.7%       50.0%       66.7%
[feature 8, label illegible]     0.0%        25.0%       100.0%
[feature 9, label illegible]     66.7%       66.7%       100.0%
[feature 10, label illegible]    100.0%      100.0%      100.0%
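One way to obtain percentages of this kind is to count, for each main feature, the items on which both raters gave exactly the same score and divide by the number of items in that feature. The sketch below assumes this exact-match definition of congruence, which the source does not state explicitly; the function name, feature groupings, and scores are hypothetical.

```python
def congruence_by_feature(rater_1, rater_2):
    """Percentage of items per feature on which both raters gave the same score.

    rater_1, rater_2: dicts mapping a feature name to that rater's list of
    item scores (hypothetical layout; exact-match agreement is assumed).
    """
    result = {}
    for feature, scores_1 in rater_1.items():
        scores_2 = rater_2[feature]
        if not scores_1 or len(scores_1) != len(scores_2):
            raise ValueError(f"item mismatch for feature {feature!r}")
        matches = sum(a == b for a, b in zip(scores_1, scores_2))
        result[feature] = 100.0 * matches / len(scores_1)
    return result


# Made-up scores for three features (not data from the study).
author = {
    "lesson preparation": [3, 4],
    "comprehensible input": [2, 3, 1],
    "interaction": [1, 2, 2, 0],
}
supervisor = {
    "lesson preparation": [3, 4],
    "comprehensible input": [2, 2, 1],
    "interaction": [1, 3, 1, 2],
}
for feature, pct in congruence_by_feature(author, supervisor).items():
    print(f"{feature}: {pct:.1f}%")
```

Under this definition, a feature with three items on which the raters match twice yields 66.7%, and a feature with four items and one match yields 25.0%, which is consistent with the kinds of values that appear in Table 3.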

As is evident, the scores for comprehensible input, scaffolding, interaction and lesson delivery given for Lessons A and B agreed to a lesser degree than those for Lesson C. We attribute this to the fact that the raters were not yet trained and tended to rely on subjective judgment in determining the degree of achievement for each feature. In addition, the descriptors for each feature on the evaluation sheet did not guide the raters to a sound judgment. This point requires further investigation to improve the instrument. That said, the scores given for the main features in Lesson C showed agreement between the raters. This is presumably because the lesson style was highly student-centered and matched the framework of the SIOP and our observation instrument. This raises another issue: the SIOP and our observation instrument both presuppose a particular type of lesson, namely one that is interactive, student-centered, activity- or task-based, and communicative. The question is whether it is necessary to incorporate other features that are observed in more traditional teaching styles. We will discuss these issues in the next chapter.

3.3 Summary

Although some inconsistency in scores between the raters was observed and further improvement is necessary, we believe our lesson observation instrument has been adapted to the Japanese context so that it can evaluate lessons in a relatively reliable way. Since our aim in developing the lesson evaluation instrument is to promote better teacher development, we should have involved the teachers who taught the lessons in the evaluation process. That way the instrument could have been refined. This issue will be pursued in future research.

Source document: The Development of a Classroom Observation Instrument for English Lessons in Japanese Junior High Schools (pages 40-45)