GROUP TEST RESULTS: INTERPRETATION AND
ARRANGEMENT FOR PROGRAM EVALUATION
Leslie I. Brezak
INTRODUCTION
Anyone who can read, add, subtract, multiply and divide can make use of this study. Its main purpose is to provide enough information so that the reader in general, and teachers in particular, can understand and use group test results. It is also designed to lead the reader further into the process of program evaluation.
In addition, this paper has value beyond its main stated purposes. It provides other information that one would usually have to search through many other sources to find. This information includes:
1) a description of Criterion-Referenced Basic Skills Tests and Group Achievement Tests;
2) definitions of various types of scores and statistical concepts; and
3) samples of the computational steps to be followed to arrive at means, medians and percentiles.
The mere word "testing" often frightens teachers, as it implies statistical manipulations and other rigorous concepts that are thought to be complex and time consuming. The same word, "testing", usually terrifies students, fearful of the dire consequences of failure or exposure to the unknown.
It is the author's hope that this study will help to allay some of the fears teachers have about testing and, as a result, aid students by providing better tests for them to take.
UNDERSTANDING TESTS
Why are tests administered?
Tests are administered to fill a need for data. Most schools that have embarked on educational programs require that testing take place to judge the effects of such programs. Many of the programs require evaluation of their content and worth. In order to judge a program's worth, measurements must be taken. These measurements, in many cases, can
best be taken by a testing program. Moreover, testing programs can provide much needed information. Test results can demonstrate the students' improvement in their achievement of learning goals. Also, some tests can be used to compare a school's results against other schools on both a citywide and a nationwide basis.
What subjects are usually tested?
Presently, throughout the Western world, and especially in the United States, there is a strong thrust in the area of what is called Basic Skills. All schools seem concerned with Reading and Mathematics, from pre-school through high school. These two subjects appear to be the focus of concern across all grade levels.
What kinds of tests are administered?
In the field of education, tests are administered to measure learning. The level of student learning is measured to see if students are ready for the next learning task. Different kinds of tests are used for this purpose. The two we will consider here, which are the ones most often used to measure learning, are the Basic Skills Test and the Group Achievement Test.
The Basic Skills Test (Criterion-Referenced)
One of the simplest tests related to Reading is a Spelling Test (see Fig. 1). At the Elementary School level, for example, by using a pre-test at the beginning of the week a teacher can test the students' knowledge of ten words ending in "at". Suppose that the teacher finds that the students cannot spell any of the words correctly. During the week, the words ending in "at" can be taught to the students. At the end of the week, by testing them on the same words again, the teacher measures the learning of each student against the criterion (measure) of being able to spell ten simple words ending in "at".
Basic Skills            Group Achievement
Spelling Test           Spelling Test
 1) cat                  1) cat
 2) bat                  2) bat
 3) fat                  3) bake
 4) hat                  4) lamb
 5) mat                  5) whose
 6) pat                  6) where
 7) rat                  7) tongue
 8) sat                  8) length
 9) tat                  9) occurrence
10) vat                 10) misspell
Fig. 1 Examples of the Two Test Types
The Group Achievement Test
Group Achievement Tests try to measure learning over a large amount of material. Because so much learning is being covered by the Group Achievement Spelling Test, for example, only a few words of differing difficulty can be included to measure different levels of Spelling skills (see Fig. 1). This limit on the number of words to be used causes test-makers to use only small samples of all the words they would like students to spell.
The test-makers would like to be confident that the words spelled are those that separate poorer spellers from better spellers. To be sure of this, before the test is officially published, the test-makers administer the test to thousands of students at different grade levels to see how well they spell the sample words. The scores of all these students are put into tables of numbers called Norms. When you wish to know how well, by comparison, a student has done on a Spelling Achievement Test, you look up that student's score in a Table of Norms.
How do the two test types compare?
Test results from Basic Skills Tests are very likely to be different from results for Group Achievement Tests. In reality this finding is thoroughly expected and predictable. These differences, and the reasons for them, become clearer when Item Analysis is performed.
Item Analyses for the Two Test Types
This type of examination can be done for both Basic Skills Criterion-Referenced Tests and Group Achievement Tests. The results for each question on the test are examined. Then the number of students who correctly answered each question, or the percentage of students who answered it correctly, is listed. For example, let us assume the following test results:
Basic Skills                  Group Achievement
Spelling Test                 Spelling Test
        Words Correct                 Words Correct
Word    Number    %           Word    Number    %
  1       18     90             1       18     90
  2       18     90             2       18     90
  3       18     90             3       16     80
  4       18     90             4       14     70
  5       18     90             5       12     60
  6       18     90             6       10     50
  7       18     90             7        8     40
  8       18     90             8        6     30
  9       18     90             9        2     10
 10       18     90            10        2     10
Fig. 2 Results of the Two Test Types
Using our Spelling Test example, we see that each of the words ending in "at" on the Basic Skills Test was spelled correctly by 90% of the students who took it. For the Achievement Test, 90% of the students spelled "cat" and "bat" correctly, but only 10% spelled "misspell" and "occurrence" correctly. These different types of results are expected for these two very different types of tests.
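The item analysis described above comes down to one division per word. The following is a minimal sketch in Python, not part of the original study. It assumes twenty students took each test (the class size at which 18 correct answers equal 90%), and the function name percent_correct is purely illustrative.

```python
# Number of students who spelled each word correctly, as listed in Fig. 2.
# Assumption: 20 students took each test (18 correct out of 20 is 90%).
STUDENTS = 20
basic_skills_correct = [18] * 10
achievement_correct = [18, 18, 16, 14, 12, 10, 8, 6, 2, 2]


def percent_correct(counts, students):
    """Return the percentage of students who answered each item correctly."""
    return [100 * count / students for count in counts]


basic_pct = percent_correct(basic_skills_correct, STUDENTS)
achievement_pct = percent_correct(achievement_correct, STUDENTS)

# Print the two item analyses side by side, one row per word.
for word, (b, a) in enumerate(zip(basic_pct, achievement_pct), start=1):
    print(f"Word {word:2d}: Basic Skills {b:.0f}%   Achievement {a:.0f}%")
```

Laying the two lists side by side makes the pattern plain: the Basic Skills items are all about equally easy, while the Achievement items run from easy to hard.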
Does this mean that the students who took the Basic Skills Test were better spellers than those who took the Achievement Test?
No! The two tests are not of equal difficulty. We don't know how students who correctly spelled 90% of one-syllable words ending in "at" would do with words like "occurrence" and "misspell". We cannot really compare the two sets of results unless the same students have taken the same tests.
While both tests deal with Spelling, the Basic Skills Test deals with a very limited group of spelling words: those of one syllable ending in "at". The Achievement Test deals with a wide range of spelling ability: one-syllable words ending in "at", words with silent "b's" and "e's", the "wh" combination, and some words that are frequently misspelled.
Despite its limitations, the Basic Skills Spelling Test does have some usefulness. It can tell us what percentage of our students taking the test know how to spell words ending in "at". However, if we wanted to know whether our students were truly good spellers, for a wide range of words, we would have to use a Group Achievement Test. Then we could compare our students' scores, as a group, with those listed in the Norm tables provided by the test publisher. The students' scores in the Norm tables represent large numbers of students living in various places. Therefore, after making the comparison, we would know how our students compare to other students on a nationwide basis.
What should a test possess?
Whichever type of test one chooses, to be useful it must be both reliable and valid. There are two types of validity that are important for test interpretation. These are Content Validity and Predictive Validity.
Content Validity
Content validity means that the test covers the subjects it is supposed to deal with. For example, if an Arithmetic Test did not have questions dealing with Fractions, it could not be totally content valid for the entire area of Arithmetic.
Predictive Validity
Predictive validity means that a student who earns a certain score on one test is expected, more or less, to earn a similar score on another test. If one thinks back to the Basic Skills Spelling Test, it can be seen that a test limited to one-syllable words ending in "at" cannot have much predictive validity. That test only deals with the lowest level skills a student might have learned. The Basic Skills Test tells us very little about the wide range of skills a student has learned.
On the other hand, the Group Achievement Test had examples of words ranging from very simple to rather difficult. Again one can see that this test is much more likely to indicate what a student will be able to do in future educational experiences. Therefore, it will have much more predictive validity than the Basic Skills Test because it does a better job of examining a wide range of student skills.
Reliability
Reliability means that the test results for a student who is retested soon after the first testing will be, more or less, the same. We would not want our testing measure to change while it was being used. Changes would confuse the meaning of the results.
UNDERSTANDING SCORE RESULTS
How are the scores reported?
There are many ways to report scores. Some of the most common ways will be
considered here.
Raw Scores
The basis of all scores is the raw score. It is so named because it has not been changed in any way. A raw score is simply the mark a student has earned on a test. For example, if one could spell seven out of ten words correctly on an examination, then the raw score for that student would be seven.
Percent
There are other ways to look at score results. For instance, seven correct out of ten is equal to seventy percent (70%). Percent is a way to describe student scores based on the number one hundred.
Percent is a very helpful concept because it allows us to compare tests of different lengths. Suppose that during one week a ten-word spelling test was given on which a student scored six out of ten correct. Then, the next week, a four-word spelling test was given on which the student spelled three words correctly.
We are able to compare the results for these tests although they were of different lengths. To explain this, the first week's test score of six out of ten equals sixty percent (60%). The second week's test score of three out of four equals seventy-five percent (75%). From this we can see that the student's spelling mark improved from the first to the second test.
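The conversion from raw score to percent can be sketched in a few lines of Python. This is an illustrative sketch only; the function name percent and the variable names are assumptions made for the example, and the figures are the two weekly spelling tests described above.

```python
def percent(raw_score, test_length):
    """Convert a raw score into a percent of the words on the test."""
    return 100 * raw_score / test_length


week_one = percent(6, 10)  # ten-word test, six words correct
week_two = percent(3, 4)   # four-word test, three words correct

print(f"Week 1: {week_one:.0f}%   Week 2: {week_two:.0f}%")
print("Improved" if week_two > week_one else "Did not improve")
```

Because both results are expressed out of one hundred, the two tests can be compared directly even though one had ten words and the other four.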
The two types of scores just described are used for Tests of Basic Skills. For Group Achievement Tests, the raw scores are compared with tables of scores representing all the students who took the Achievement Test (the Norm group).
Percentiles
The tables for the Achievement Tests contain percentiles, stanines and grade equivalents. Percentiles are related to percent correct. They describe the percentage of students who were originally tested (the Norm group) that scored below a certain score.
For example, suppose that the twenty students who took the Spelling Achievement Test shown in Fig. 1 scored in the following way: one student had all ten words spelled correctly, two students spelled nine words correctly, two had eight spelled correctly, five had seven spelled correctly, two had six spelled correctly, two had five spelled correctly, one had four spelled correctly, five had three spelled correctly, none had only two spelled correctly, none had only one word spelled correctly, and none had all ten words misspelled.
The scores of these twenty students can be handled more easily if we list them in order. One way to do this is to list every possible number of correct answers. In this case, there are eleven possible raw scores, ranging upwards from zero to ten. We put these in the first column.
Possible      Number of      Collection    Percentiles
Raw           Students       Column
Score         Per Score
------------------------------------------------------
10             1              20
 9             2              19
 8             2              17
 7 (Median)    5 (Mode)       15            50th
 6 (Mean)      2              10
 5             2               8
 4             1               6            25th
 3             5 (Mode)        5
 2             0               0
 1             0               0
 0             0               0
------------------------------------------------------
Total =       20
Fig. 3 Computing Percentiles
Next, we make a column for the number of students who spelled each possible number of spelling words correctly. Because there were twenty students, this column must total twenty.
For our third column, we add together the number of students who earned each possible raw score (number of words correct). We start our column from the lowest possible score. For example, there were no students who had zero, one or two words spelled correctly. For that reason, the rows for zero words correct, for one word correct and for two words correct are marked with zeroes. For three words spelled correctly, there were five students in that row. For four words correct, there was one student in the row. We add the number of students for three words correct (5) to those with four words correct (1), and that result is six (6) for the Collection Column. When all the students who took the test have had their scores added together in this way, the number at the top of the Collection Column must be twenty (twenty being the total number of students who took the Spelling Achievement Test).
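The Collection Column is a running total, which can be sketched in Python as follows. This is a minimal illustration rather than a prescribed procedure; the list and variable names are assumptions, and the counts are those of the twenty students in Fig. 3.

```python
from itertools import accumulate

# Number of students earning each raw score, listed from 0 up to 10 words correct.
students_per_score = [0, 0, 0, 5, 1, 2, 2, 5, 2, 2, 1]

# Running total from the lowest score upward: the "Collection Column".
collection = list(accumulate(students_per_score))

# Print the highest score first, in the same order as Fig. 3.
for score in range(10, -1, -1):
    print(f"{score:2d}   {students_per_score[score]:2d}   {collection[score]:2d}")
```

The last running total, which appears at the top of the printed column, equals the number of students tested, which is a quick check that no score was skipped or counted twice.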
Now that the scores have been put into a table, they can be used more easily to discuss percentiles. For example, we can find the score below which fifty percent of the student scores are found. This point is called the fiftieth percentile. By looking at the Collection Column, we can see that ten students (50% of all twenty students) had scores that fell below seven words correct. Therefore, the raw score of seven is at the 50th percentile.
To find the twenty-fifth percentile (the score below which 25% of the student scores are found), you can see that the scores of five students fell below the score of four words spelled correctly. Because five students are twenty-five percent of our total of twenty who took the test, a raw score of four correct is found to be at the 25th percentile.
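The percentile lookup just described can be sketched in Python. The sketch follows this paper's definition (the score below which a given percent of the student scores are found); statistical packages often use other conventions and may give slightly different answers. The function names percent_below and score_at_percentile are hypothetical, chosen only for this example.

```python
# Number of students earning each raw score, listed from 0 up to 10 words correct.
students_per_score = [0, 0, 0, 5, 1, 2, 2, 5, 2, 2, 1]
total_students = sum(students_per_score)


def percent_below(score):
    """Percent of students whose raw score fell below the given score."""
    return 100 * sum(students_per_score[:score]) / total_students


def score_at_percentile(p):
    """Lowest raw score that has at least p percent of the scores below it."""
    for score in range(len(students_per_score)):
        if percent_below(score) >= p:
            return score
    return None  # no raw score has that many scores below it


print("50th percentile:", score_at_percentile(50))
print("25th percentile:", score_at_percentile(25))
```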
Stanines
The tables for Achievement Test scores also contain Stanines and Grade Equivalents. Stanines are another form of percentiles. They divide all the 99 percentiles into nine equal parts. The first stanine begins at the first percentile, while the last stanine ends at the 99th percentile.
Grade Equivalents
Grade equivalent scores are more complex than percentiles and stanines. For this explanation, suppose a fourth grade elementary student earns a grade equivalent score of sixth grade nine months (6.9) on a Group Achievement Test for fourth graders. This only means that the student had the same raw score as someone in the norm group who was nine months into the sixth grade when that same test was taken. It does not mean that the fourth grader scoring 6.9 could function at the sixth grade level using sixth grade materials.
How can the scores of groups be described?
In trying to describe many scores, for example, to tell how well all 20 students who took the Spelling Achievement Test performed, it is not helpful to state each individual score. There are ways to summarize many scores. This leads us to a discussion of the Mode, Mean and Median.
The Mode
The Mode is the most frequent score. It is the score that the greatest number of students earned. Sometimes this score describes the entire group being examined. However, there are times when it does not accomplish this. In the case of our group of twenty students (see Fig. 3), there were two modes. One mode was found at the score of three words spelled correctly, the other at the score of seven words spelled correctly. Because there are two modes, we must find another way to summarize the scores of our twenty students.
The Mean
In addition to searching for a mode, an average or mean score can be computed. All that is needed is to add together the raw scores earned by all the students on the Spelling Test and divide that total by twenty, the total number of students who took the test. For our Spelling Test, all the student scores totaled 120, which, when divided by 20, results in an average or mean of 6. Therefore, six was the mean Spelling score for the class. This tells us that the students' average was to spell more than half of the words correctly.
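Both summaries can be checked with a short Python sketch, using the score counts shown in Fig. 3. This is only an illustration under those figures; the variable names are assumptions, and the modes are found by simple counting rather than by any method named in the original study.

```python
from collections import Counter

# Raw score -> number of students who earned it (from Fig. 3).
students_per_score = {10: 1, 9: 2, 8: 2, 7: 5, 6: 2, 5: 2, 4: 1, 3: 5}

# Expand the table into one raw score per student.
scores = [score for score, n in students_per_score.items() for _ in range(n)]

# Mode: every score earned by the greatest number of students.
counts = Counter(scores)
highest_count = max(counts.values())
modes = sorted(score for score, n in counts.items() if n == highest_count)

# Mean: the total of all raw scores divided by the number of students.
total = sum(scores)
mean = total / len(scores)

print("Modes:", modes)
print("Total:", total, "Students:", len(scores), "Mean:", mean)
```

Keeping every tied score in the list of modes, instead of reporting only one, shows at once when a group has more than one mode and the mode alone is not a good summary.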
The Median
Another way to describe a group of scores is to use the median. The median, or 50th percentile, is the point below which 50% of the students' scores are found. That means that