DRESREM 2: An Analysis System for Multi-document Software Review Using Reviewers' Eye Movements


Nara Institute of Science and Technology Academic Repository: naistar

Title: DRESREM 2: An Analysis System for Multi-document Software Review Using Reviewers' Eye Movements
Author(s): Uwano, Hidetake; Monden, Akito; Matsumoto, Ken-ichi
Citation: 2008 The Third International Conference on Software Engineering Advances, 26-31 Oct. 2008, Sliema, Malta
Issue Date: 2008
Resource Version: author
Rights: © 2008 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
DOI: 10.1109/ICSEA.2008.49
URL: http://hdl.handle.net/10061/12757

Hidetake Uwano, Akito Monden, Ken-ichi Matsumoto
Graduate School of Information Science, Nara Institute of Science and Technology, Japan
{hideta-u, akito-m, matumoto}@is.naist.jp

Abstract

To build highly reliable software, software review is essential. Typically, software review requires documents from multiple phases, such as the requirements specification, design document and source code, to reveal inconsistencies among them and to ensure the traceability of deliverables. However, most previous studies on software review (reading) techniques focus on finding defects in a single document in their experiments. In this paper, we propose a multi-document review evaluation system, DRESREM 2. This system records reviewers' eye movements and mouse/keyboard operations for analysis. To confirm the usefulness of the system, we conducted an eye gaze analysis of reviewers in a design document review with multiple documents (including the requirements specification, design document, etc.). For the performance analysis, we recorded the defect detection ratio, the detection time per defect, and the fixation ratio of eye movements on each document. As a result, reviewers who concentrated their eye movements on the requirements specification found more defects in the design document. We believe this result is good evidence to encourage developers to read high-level documents when reviewing low-level documents.

1. Introduction

Software review is a technique to improve the quality of software documents and to detect defects (i.e. bugs or faults) by reading the documents [2]. In this paper, we use the word "review" to indicate software review, inspection, walkthrough and/or other reading techniques. In software review, a developer reads the requirements specification, design document, source code and other documents to understand the system's functions and structures, and then detects defects in the documents. Defect detection by review can be performed in the early phases of software development without implementing the system; therefore, future rework costs can be reduced [5]. Especially in large-scale projects, where defect detection and correction consume huge resources, defect detection by review is necessary.

Many studies on review have been conducted, such as proposals of systematic reading techniques, experimental comparisons of reading techniques, and analyses of reviewers' behavior [1, 3, 4, 5, 6, 7]. According to these experiments, PBR (Perspective-Based Reading) is relatively more effective than CBR (Checklist-Based Reading) and AHR (Ad-Hoc Reading), which are commonly used techniques in the industry [6].

Most of these studies assume that only a single document is used in the review. In these studies, subjects read a target document (requirements specification, design document, source code or others) with a specific review technique and find defects in the document. However, software review in the industry uses not only the target document but also other relevant documents [10]. For example, in source code review, reviewers read the source code as well as the requirements specification and design document to understand system structures, functions, and data structures. Reviewers also compare the source code (target document) with the requirements/design documents (related documents) to find inconsistencies between design and implementation. Moreover, documents are compared to confirm the traceability among documents of different phases.

In such multi-document review, the time spent reading each document and the reading procedure are likely to affect defect detection performance. Hence, we believe that empirical and quantitative analysis of review behavior in multi-document review is required for the development of effective review techniques and/or guidelines.

In this paper, we propose a multi-document review evaluation system, DRESREM 2. This system is an enhancement of our previous system DRESREM [9], which was used to record reviewers' program reading procedure (single-document review). The enhanced system, DRESREM 2, has the following four characteristics, which enable us to analyze multi-document review processes using reviewers' eye movements:

1) Detection of reviewers' document switching (among requirements specification, design document, source code, checklist, etc.)
2) Line-wise eye movement recording
3) A feature enabling reviewers to take notes about detected defects during software review
4) Data analysis support (visualizing and replaying eye movements and document switching)

These characteristics facilitate the observation of multi-document review and the analysis of eye movements. In the following sections, we explain the architecture of the system and its characteristics. Then we describe an experiment on design document review conducted to evaluate the system's usefulness.

2. Multi-document Review

Industry developers usually review the target document (e.g. source code) together with its high-level documents (requirements specification, design document) or other related documents (test specification, user manual) [10]. Figure 1 describes the relationship between source code and other documents in source code review. Source code consists of several blocks such as functions, methods and classes. In single-document review (with source code only), a reviewer reads all the blocks to understand the whole program (e.g. Function a through Function d) and tries to find defects during program understanding. In multi-document review (source code review with requirements/design documents), the reviewer additionally reads each block of source code (e.g. Function b) together with the related blocks in the requirements/design documents (e.g. Design A and the corresponding requirement) to find inconsistencies among blocks at different levels. This activity is "comparison" rather than "understanding."

Figure 1. Source code review with multiple documents.

To analyze such reviewer activity in multi-document review, we use reviewers' eye movements. Eye movements allow us to observe reviewers' reading patterns and to quantitatively analyze the relationship between these patterns and review performance. We implemented a multi-document review evaluation system that records reviewers' eye movements during review. To make the purpose of the system clear, we present four requirements to be satisfied by the system.

Requirement R1: Detection of document switching.

To observe multi-document review activities, the system must identify which document is being read by the reviewer. Usually, multiple documents are displayed in multiple windows, or in a single window with tabs for switching the displayed document; hence documents can be overlapped by other documents during review tasks. This means the current focus of the reviewer cannot be identified from the coordinates of eye movements alone. Therefore, the system should be able to identify which document the reviewer is currently focused on by recording tab switching and window focusing activities.

Requirement R2: Line-wise recording of eye movements.

The primary construct of a document is a line. In particular, most programs are written on a one-statement-per-line basis, so it is reasonable to consider that the reviewer reads the document in units of lines. Hence, the measuring environment has to be capable of identifying which line of the document the reviewer is currently looking at. Note that this information must be stored as logical line numbers, which are independent of the font size and of the absolute position at which the document lines are currently displayed.

Requirement R3: Enable reviewers to take notes about detected defects.

To analyze the relationship between review performance and the reading procedure, details of detected defects need to be recorded. Hence, the system must enable reviewers to take notes about the details of detected defects, e.g. document name, location (line number) and date.

Requirement R4: Data analysis support.

Preferably, the measuring environment should provide tool support to facilitate analysis of the recorded data. In particular, features to play back and visualize the data will contribute to efficient analysis. Such features are also useful for educating novice reviewers.

In Section 3, we explain how the proposed system satisfies these requirements.

3. DRESREM 2

3.1. Outline

DRESREM 2, a multi-document review evaluation system, was developed based on DRESREM, a single-document review evaluation system [9]. Figure 2 shows the architecture of DRESREM 2. The system consists of an Eye Tracking Device, a Fixation Analyzer and a Review Platform.

We used the non-contact eye mark tracker EMR-NC (http://www.eyemark.jp/) to record subjects' eye movements. Figure 3 shows the eye tracking device used in the system. The Fixation Analyzer is a software tool that calculates fixation points from sampled gaze points. A fixation is a coordinate at which the eye mark stays for a certain period of time. Fixations are useful for distinguishing an instance of reading from a glance.

The Review Platform is a software system that shows documents to the reviewers and records their operations. The platform is implemented in Java with SWT (the Standard Widget Toolkit, http://www.eclipse.org/swt/) and comprises 5,700 steps (lines of code) and 80 classes. The platform shows the documents to reviewers through the Document Viewer. A screenshot of the Document Viewer is shown in Figure 4. Reviewers select the document that they want to display on the Document pane using the Document tabs located at the top of the Document Viewer. DRESREM 2 measures how a reviewer reads each line of a document on the computer display using eye movements and operation logs (e.g. document switching and window scrolling).

When a reviewer finds a defect in a document, the reviewer takes notes about the defect in a pop-up window, which appears when the reviewer double-clicks a line of the document. The system records these notes with the document name, line number and date. In addition, the reviewer can take free-form notes at any time using the Memo pane. The reviewer can also search for any keyword in a document using the Search pane during the review.

3.2. System functions and procedures

The procedure for recording reviewers' eye movements and operations is as follows (Figure 2). The documents used in the review are displayed in the Document Viewer. The Eye Tracking Device outputs the reviewer's gaze points, represented as coordinates (x, y) on the display. These sampled gaze points are converted to fixation points by the Fixation Analyzer.

The Window Event Capturer observes user operations on the Document Viewer and records window information, i.e. the window position and size, and the current scroll position (line number) of the document currently in focus. This satisfies Requirement R1.

The Fixation Point/Line Converter calculates the logical line number of a document from the window information and the fixation points. This satisfies Requirement R2.

Reviewer operations such as recording defect descriptions, searching for keywords and taking notes are recorded by the Operation Recorder. The Review Information Integrator then combines the operations and eye movements to create the time series data of the review history. This satisfies Requirement R3.

Recorded eye movements and operations are visualized in the Result Viewer. Figure 5 shows an example of visualized eye movements and operations in a source code review. In this figure, the left side of the window shows the source code that is read in the review, and the right side of the window shows eye movement fixations and operations as a bar chart. The sequence of eye movements can also be played back in this window. These features satisfy Requirement R4.

DRESREM 2 outputs the review history (i.e. the time series of eye movements and operations) in three formats: document-wise, block-wise and line-wise. In each format, eye movements are recorded as a series of fixations on documents, blocks or lines. These formats allow users to easily analyze the review history at different granularities.
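
The recording procedure of Section 3.2 (gaze points, then fixation points, then logical line numbers) can be illustrated with a minimal sketch. This is not the DRESREM 2 implementation, which is written in Java and whose fixation algorithm is not described in this paper. The sketch assumes a simple dispersion-threshold method for fixation detection, a fixed line height in pixels, and 1-based line numbers. All names (`Fixation`, `WindowInfo`, `detect_fixations`, `fixation_to_line`) and default thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Sample = Tuple[float, float, float]  # (time in ms, x in px, y in px)


@dataclass(frozen=True)
class Fixation:
    start_ms: float
    duration_ms: float
    x: float  # centroid of the grouped gaze points
    y: float


@dataclass(frozen=True)
class WindowInfo:
    """Hypothetical snapshot of the Document pane (position, size, scroll state)."""
    top: int
    left: int
    width: int
    height: int
    first_visible_line: int  # logical line shown at the top of the pane
    line_height: int         # rendered height of one line in px
    total_lines: int


def _dispersion(window: List[Sample]) -> float:
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples: List[Sample], max_dispersion: float = 30.0,
                     min_samples: int = 3) -> List[Fixation]:
    """Group consecutive gaze samples that stay within max_dispersion (assumed method)."""
    fixations = []
    start = 0
    while start < len(samples):
        end = start + 1
        # Grow the window while the gaze stays within the dispersion threshold.
        while end < len(samples) and _dispersion(samples[start:end + 1]) <= max_dispersion:
            end += 1
        window = samples[start:end]
        if len(window) >= min_samples:
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append(Fixation(window[0][0], window[-1][0] - window[0][0], cx, cy))
            start = end
        else:
            start += 1  # too short to be a fixation: treat it as a glance
    return fixations


def fixation_to_line(fix: Fixation, win: WindowInfo) -> Optional[int]:
    """Return the logical line under the fixation, or None if it is off the document."""
    inside_x = win.left <= fix.x < win.left + win.width
    inside_y = win.top <= fix.y < win.top + win.height
    if not (inside_x and inside_y):
        return None
    # Row within the pane, shifted by the scroll position, so the result does
    # not depend on where the line is currently displayed.
    row = int(fix.y - win.top) // win.line_height
    line = win.first_visible_line + row
    return line if line <= win.total_lines else None
```

For example, with a pane whose top edge is at y = 100, a line height of 20 px and line 11 scrolled to the top, a fixation at y = 145 falls in row 2 and maps to logical line 13. The same screen position maps to line 3 once the document is scrolled back to line 1, which is the independence from display position that Requirement R2 asks for.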

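The three output formats described above differ only in granularity, so a coarser view can be derived from a finer one. As a hedged illustration, the sketch below aggregates a line-wise review history into document-wise fixation ratios and counts document switches. The record layout (`FixationRecord` with `document`, `line` and `duration_ms`) and the function names are hypothetical; the paper does not specify the actual output file format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FixationRecord:
    """Hypothetical line-wise history entry: one fixation on one line."""
    document: str
    line: int
    duration_ms: float


def fixation_ratio_by_document(history: List[FixationRecord]) -> Dict[str, float]:
    """Share of total fixation time spent on each document (document-wise view)."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in history:
        totals[rec.document] += rec.duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {doc: t / grand_total for doc, t in totals.items()}


def count_document_switches(history: List[FixationRecord]) -> int:
    """Number of times consecutive fixations fall on different documents."""
    return sum(1 for prev, cur in zip(history, history[1:])
               if prev.document != cur.document)
```

Given a history in which a reviewer fixates on the requirements specification, moves to the design document, and then returns, `count_document_switches` reports two switches, and `fixation_ratio_by_document` reports the per-document share of fixation time.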
Figure 2. Architecture of DRESREM 2.

Figure 3. Eye tracking device EMR-NC.

4. Case Study

4.1. Outline

We experimentally evaluated the usefulness of the proposed system. In the experiment, subjects were asked to find defects in a design document by reading all given documents. In the review, four documents (requirements specification, design document, data file, and a checklist for design document review) were used. The original requirements specification and design document contained no defects. We injected nine defects into the design document. The review finished when a subject (reviewer) concluded that the design document had no more defects.

The subjects were eleven graduate students and one faculty member of Nara Institute of Science and Technology. Their average programming experience was 7.6 years. Two of them had software development experience in industry.

4.2. Materials

The documents used in the experiment describe a rental house search system that was actually used in an industrial training workshop. The documents consist of a requirements specification, a design document and a data file. The system reads a data file in which a set of rental houses is listed. A system user inputs a condition on the rental houses (e.g. distance from the nearest train station, floor space and rent) that he/she wants to look at. According to the user input, the system outputs a list of rental houses that match the condition.

Figure 4. Screenshot of Document Viewer.

Requirements Specification: This document consists of 40 lines of Japanese text describing system functions and requirements.

Design Document: This document describes details of each function's interface, data and processes. It consists of 30 lines of Japanese text.

Data File: This file is read by the system when the system starts. The file consists of a list of rental houses.

Checklist: This is a generic checklist for design document review, written based on existing literature [7, 8].

We injected nine defects in total (three defects for each of three defect types) into the design document in advance. The defect types are as follows.

Inconsistency with requirements: The design document contains a function described in the requirements specification, but the function does not fulfill the requirements.

Omission of requirements: The design document has no description of a function described in the requirements specification.

Excess design: The design document contains excess descriptions that are not described in the requirements specification. This defect can also be considered an insufficient description in the requirements specification.

4.3. Result

Data from twelve subjects were collected in the experiment. One subject was removed from the analysis because of insufficient accuracy of the eye movement data. The average review time was 25 minutes. From interviews with the reviewers conducted after the experiment, we confirmed that the subjects' motivation to find defects remained high during the experiment. All subjects found at least three defects (the average was 5.45).

Using the replay function of DRESREM 2 extensively, we investigated the eye movements of the individual subjects. As a result, we found that every reviewer switched documents frequently during the review. Figure 6 depicts an example of a reviewer's eye movements in the experiment. The graph shows the time series of eye movements on each line of the documents: the horizontal axis shows the fixation ID (transitions of fixation among lines) and the vertical axis shows the line number of the documents. In the figure, eye transitions between the requirements specification and the design document can be observed. In the experiment, reviewers switched documents every 19.9 seconds on average.

Figure 5. Example of eye movement visualization using DRESREM 2.

Figure 6. An example of eye movements.

From a quantitative analysis, we found that reviewers spent different amounts of fixation time on each document. Table 1 shows the percentage of fixation time for each document. The result indicates that reviewers spent most of their time on the requirements specification and the design document (96.5% on average), with the largest share on the design document. However, there were considerable differences among reviewers. For example, Subject A spent only 19.8% of the time on the requirements specification and most of the time on the design document (72.9%). On the other hand, Subject B concentrated more on the requirements specification (40.9%) and spent less time on the design document (54.8%).

Table 1. Fixation ratio for each document.

          Requirements  Design  Data file  Checklist  Other
Average   28.5%         68.0%   0.6%       2.2%       0.7%
Minimum   19.8%         54.8%   0.0%       0.0%       0.0%
Maximum   40.9%         73.3%   1.2%       3.6%       3.8%

A statistical analysis revealed a marginally significant correlation between the defect detection ratio for "omission of requirements" and the review time on the requirements specification (r = 0.593, p = 0.054). This suggests that, to find omissions of requirements in the design document, reviewers need to read the requirements specification. In other words, reading only the design document yields a poorer understanding of the system requirements.

5. Conclusion

In this paper, we proposed a multi-document review evaluation system, DRESREM 2. The proposed system records reviewers' eye movements and mouse/keyboard operations and visualizes them to analyze the relationship between review performance and reading procedure. The system also provides features to play back the eye movements and operations for qualitative analysis of software review activities.

We experimentally evaluated the usefulness of the proposed system. In the experiment, a design document review using multiple documents (requirements specification, design document and others) was performed. As a result, the system contributed to revealing the reading process that affected review performance. A statistical analysis revealed a marginally significant correlation between the defect detection ratio for "omission of requirements" and the review time on the requirements specification. This suggests that, to find omissions of requirements in the design document, reviewers need to read the requirements specification.

The major limitation of our experiment is that the sample size was small (twelve subjects). Also, only one software system was reviewed. In the future, we will increase the sample size and use different software systems. More detailed analyses of eye movements, such as function-wise (block-wise) analysis and time series analysis, are also important future work.

6. Acknowledgment

This work was conducted as a part of the StagE Project, the Development of Next Generation IT Infrastructure, supported by the Ministry of Education, Culture, Sports, Science and Technology, Japan.

References

[1] V. R. Basili, S. Green, O. Laitenberger, F. Lanubile, F. Shull, S. Sørumgård, and M. V. Zelkowitz. The empirical investigation of perspective-based reading. An International Journal of Empirical Software Engineering, 1(2):133-163, 1996.
[2] B. W. Boehm. Software Engineering Economics. Prentice Hall, 1981.
[3] M. E. Fagan. Design and code inspection to reduce errors in program development. IBM Systems Journal, 15(3):182-211, 1976.
[4] M. Halling, S. Biffl, T. Grechenig, and M. Köhle. Using reading techniques to focus inspection performance. In Proceedings of 27th Euromicro Workshop Software Process and Product Improvement, pages 248-257, 2001.
[5] O. Laitenberger, T. Beil, and T. Schwinn. An industrial case study to examine a non-traditional inspection implementation for requirements specifications. In Proceedings of Eighth IEEE Symposium on Software Metrics, pages 97-106, 2002.
[6] O. Laitenberger and J. DeBaud. An encompassing life cycle centric survey of software inspection. Journal of Systems and Software, 50(1):5-31, 2000.
[7] A. A. Porter, L. G. Votta, and V. R. Basili. Comparing detection methods for software requirements inspection - a replicated experiment. IEEE Transactions on Software Engineering, 21(6):563-575, 1995.
[8] T. Thelin, P. Runeson, and C. Wohlin. An experimental comparison of usage-based and checklist-based reading. IEEE Transactions on Software Engineering, 29(8):687-704, 2003.
[9] H. Uwano, M. Nakamura, A. Monden, and K. Matsumoto. Exploiting eye movements for evaluating reviewer's performance in software review. IEICE Transactions on Fundamentals, E90-A(10):317-328, October 2007.
[10] K. Wiegers. Peer Reviews in Software - A Practical Guide. Addison-Wesley, 2002.
