
5.5 Technological experimental evaluation

5.5.1 Experimental design

For question (A) (Ref. 5.5 A), five approaches were implemented to evaluate the performance of the proposed recommendation methods. As Tab. 5.3 shows, the five approaches can be grouped into three typical recommendation methods: content-based filtering, collaborative filtering, and hybrid filtering. The first is conventional content-based filtering using a keyword-based approach.

In this approach, the learning resources and user profiles are represented as sets of keywords. The second is our proposed method, content-based filtering based on a keyword map (KM-CBF). The third is conventional collaborative filtering, that is, item-to-item collaborative filtering; here we used a 1-to-5 rating scale because it is widely used in other recommender systems and has proved fine-grained enough to collect product evaluations. The fourth is our proposed collaborative filtering based on learners' relationships (LR-CF), which integrates the learners' comparison matrix, and the fifth is KM-CBF combined with LR-CF, which is the configuration our system implements. In addition, we developed four procedures based on the database of our system, so the recommendation lists of the other four approaches can also be extracted.
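For concreteness, the following is a minimal sketch of item-to-item collaborative filtering over a 1-to-5 rating matrix; the cosine item similarity, the toy matrix, and all names are illustrative assumptions rather than the implementation used in our system.

```python
import numpy as np

def item_to_item_scores(ratings, user):
    """Predict scores for a user's unrated items via item-to-item CF.

    ratings: (n_users, n_items) matrix of 1-5 votes, 0 = unrated.
    user: index of the target user.
    Returns {item: predicted score} for the user's unrated items.
    """
    # Cosine similarity between item rating columns.
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0                      # avoid division by zero
    sim = (ratings.T @ ratings) / np.outer(norms, norms)

    rated = np.flatnonzero(ratings[user])        # items the user has voted on
    preds = {}
    for item in np.flatnonzero(ratings[user] == 0):
        weights = sim[item, rated]
        if weights.sum() > 0:
            # Similarity-weighted average of the user's own 1-5 votes.
            preds[item] = float(weights @ ratings[user, rated] / weights.sum())
    return preds

# Example: 4 learners x 5 resources; recommend for learner 0.
R = np.array([[5, 0, 3, 0, 1],
              [4, 2, 3, 1, 0],
              [0, 5, 0, 4, 2],
              [5, 1, 4, 0, 1]])
print(sorted(item_to_item_scores(R, 0).items(), key=lambda kv: -kv[1]))
```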

For question (B) (Ref. 5.5 B), as Tab. 5.4 shows, three approaches were implemented to demonstrate that the learning process is an effective component for inferring learners' preferences. At the same time, as Tab. 5.5 shows, we demonstrated the effect of the learning step within the learning process. For question (C) (Ref. 5.5 C), as Tab. 5.6 shows, two approaches were implemented to demonstrate that social interactions can improve recommendation performance. In addition, as Tab. 5.8 shows, we compared the approach proposed in this research, which analyzes social interactions while considering learners' understanding and knowledge levels, with the approach based on the diffusion of innovations theory that does not consider them [Wan et al., 2008].


(1) Participants and procedure

Ten learners, master's and postgraduate students who studied as a group, participated in the experiment. The participants' average age was 31, and the experimental period was one month. Finally, the learners evaluated the first ten items of the recommendation ranking lists.

(2) Analytical method

Existing studies of recommender systems have used a number of different measures to evaluate a recommender system's success. In this research, we use the following three measures: accuracy, coverage, and perfect prediction.

i) Accuracy empirically measures how closely a recommender system's predicted ranking of items for a user matches the user's true ranking of preference. Among the measures used to evaluate recommender systems, accuracy is the most popular. Herlocker et al. divided the measures of accuracy into three categories [Herlocker et al., 2002].

• The first category, Predictive Accuracy Metrics, includes the Mean Absolute Error, the Mean Squared Error, and the Normalized Mean Absolute Error. These metrics measure how close the recommender's predictions are to the true user ratings. For instance, the Mean Absolute Error (MAE) [Sarwar et al., 2000] evaluates accuracy by measuring the mean absolute deviation between the predicted ratings and the true ratings (see the sketch after this list).

• The second category, Classification Accuracy Metrics, includes methods such as Receiver Operating Characteristic (ROC) curves and the F1 metric, and measures how often a recommender system decides correctly whether an item is beneficial for the user and therefore should be suggested. These metrics require a binary classification of items into useful and not useful. The recall metric indicates the effectiveness of a method at locating interesting items, while the precision metric represents the extent to which the items recommended by a method are really interesting to users.

• The last category of metrics, called Rank Accuracy Metrics, measures the proximity of a predicted ordering of items, as generated by a recommender system, to the actual user ordering of the same items.
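To illustrate the first two categories, the following is a minimal sketch computing MAE alongside precision, recall, and F1; the function names and the relevance threshold of a 4-or-higher vote on the 1-to-5 scale are our own assumptions, not part of this chapter's implementation.

```python
import numpy as np

def mae(predicted, actual):
    """Predictive accuracy: mean absolute deviation between
    predicted and true ratings (lower is better)."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.mean(np.abs(predicted - actual))

def precision_recall_f1(recommended, relevant):
    """Classification accuracy: items are binary-classified as
    useful (relevant) or not; compare a recommendation list with
    the set of items the user actually found interesting."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 1-to-5 scale votes; items rated >= 4 are treated as "useful".
print(mae([4.5, 3.0, 2.0], [5, 3, 1]))               # 0.5
print(precision_recall_f1([1, 2, 3], [2, 3, 7, 9]))  # ~(0.67, 0.5, 0.57)
```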


The choice among those metrics should be based on the selected user tasks and the nature of the data sets. In some cases, applying an incorrect evaluation metric may result in selecting an inappropriate recommendation method. We wanted our proposed algorithms to generate a top-N recommendation list rather than derive a predicted score for already-rated items. In addition, estimating the utility of a list of recommendations requires a model of the way users interact with the system. Accordingly, we used Rank Accuracy Metrics.

Breese et al. suggested a metric based on the expected utility of the recommendation list [Breese et al., 1998]. The utility of each item is calculated as the difference between the vote for the item and a "neutral" weight. The metric is then computed as the weighted sum of the utilities of the items in the list, where the weight signifies the probability that the item at that position in the ranking list will be viewed. This probability is based on an exponential decay.

$$ R = 100 \, \frac{\sum_a R_a}{\sum_a R_a^{\max}} \qquad (5.11) $$

where $R_a^{\max}$ is the maximum achievable utility if all observed items had been shown at the top of the ranking list, and $R_a$ is expressed as below.

$$ R_a = \sum_j \frac{\max(r_{a,j} - d,\; 0)}{2^{(j-1)/(\alpha-1)}} \qquad (5.12) $$

In the above expression, $r_{a,j}$ is user $a$'s predicted vote for item $j$, $d$ is the neutral vote, and $\alpha$ is the viewing half-life: the rank of the item on the list at which there is a 50-50 chance that the user will view that item.
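The following sketch computes Eqs. (5.11) and (5.12) for a set of ranked vote lists; the defaults of neutral vote d = 3 and half-life α = 5 are illustrative assumptions, not values fixed by this chapter.

```python
import numpy as np

def r_score(ranked_votes, d=3.0, alpha=5.0):
    """Expected utility R_a of one ranked list (Eq. 5.12): the sum over
    positions j (1-based) of max(r_aj - d, 0), exponentially decayed
    with viewing half-life alpha."""
    votes = np.asarray(ranked_votes, dtype=float)
    j = np.arange(1, len(votes) + 1)
    return np.sum(np.maximum(votes - d, 0.0) / 2 ** ((j - 1) / (alpha - 1)))

def r_metric(ranked_lists):
    """Overall R (Eq. 5.11): 100 * sum_a R_a / sum_a R_a^max, where
    R_a^max puts each user's observed votes in the best possible order."""
    ra = sum(r_score(v) for v in ranked_lists)
    ra_max = sum(r_score(sorted(v, reverse=True)) for v in ranked_lists)
    return 100.0 * ra / ra_max

# Two users' 1-to-5 votes, in the order the recommender ranked the items.
print(r_metric([[5, 2, 4, 1], [3, 5, 4, 2]]))
```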

ii) Coverage measures the percentage of items for which a recommender system is capable of making predictions. To compare the performance of the proposed content-based filtering based on a keyword map with conventional content-based filtering, we use the following measure. Since accuracy cannot distinguish a relevant item from a more relevant item, to cope with this problem and to measure the quality of justification, we use an item-oriented measure called completion coverage. For a learner u who receives a recommendation list L, the completion coverage of the justification list J is defined as follows:

$$ \mathrm{CompletionCoverage}(J, u) = \frac{\sum_{\forall (f_i, c_{f_i}) \in J} \min\{c_{f_i},\, p(f_i, u)\}}{\sum_{\forall f_i \in F} p(f_i, u)} \qquad (5.13) $$

where each pair $(f_i, c_{f_i})$ denotes that element $f_i$ has overall frequency $c_{f_i}$ inside L, and $p(f_i, u)$ is the frequency of $f_i$ in the keyword map profile of u. Completion coverage takes values in the range [0, 1], with values closer to 1 corresponding to better coverage.
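A minimal sketch of Eq. (5.13) follows, assuming the justification list J and the learner's keyword map profile are given as feature-frequency dictionaries; all names and values are hypothetical.

```python
def completion_coverage(justification, profile):
    """Completion coverage (Eq. 5.13) for one learner.

    justification: {feature: frequency c_fi inside the recommendation list L}
    profile: {feature: frequency p(f_i, u) in the learner's keyword map}
    Returns a value in [0, 1]; closer to 1 means better coverage.
    """
    covered = sum(min(c, profile.get(f, 0)) for f, c in justification.items())
    total = sum(profile.values())           # sum over all features f_i in F
    return covered / total if total else 0.0

# Hypothetical keyword frequencies for one learner.
J = {"ontology": 3, "java": 1}              # justification of list L
P = {"ontology": 2, "java": 2, "uml": 1}    # keyword map profile of u
print(completion_coverage(J, P))            # (min(3,2)+min(1,2)) / 5 = 0.6
```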

iii) Perfect predictions are the percentage of perfect estimations with respect to the total estimations made. Perfect estimations are those that match the value voted by the user, taking as the estimation the rounded value of the aggregation over the k-neighborhood. Here we use perfect predictions to compare the proposed collaborative filtering based on learners' relationships with conventional collaborative filtering.
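As a small illustration, assuming the aggregated k-neighborhood estimates and the users' true votes are given as parallel lists:

```python
def perfect_predictions(estimates, true_votes):
    """Percentage of estimations whose rounded value exactly
    matches the value voted by the user."""
    hits = sum(round(e) == t for e, t in zip(estimates, true_votes))
    return 100.0 * hits / len(true_votes)

# Aggregated k-neighborhood estimates vs. actual 1-to-5 votes.
print(perfect_predictions([4.2, 2.8, 5.0, 1.6], [4, 3, 4, 2]))  # 75.0
```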