KAZUKO TAKAHASHI (1), HIROYA TAKAMURA (2), and MANABU OKUMURA (2)
(1) KEIAI UNIVERSITY, FACULTY OF INTERNATIONAL STUDIES
(2) TOKYO INSTITUTE OF TECHNOLOGY, PRECISION AND INTELLIGENCE LABORATORY
PACIFIC-ASIA KNOWLEDGE DISCOVERY AND DATA MINING (PAKDD-07)
Table of Contents
1. Motivation
2. Proposed Method
   a. A Method Using an Accuracy Table
   b. A Method Applying a Logistic Regression
3. Experiments
4. Conclusions
Motivation
Estimating the probability that a sample belongs to its predicted class (the class membership probability) is useful in many applications, such as document classification.
Human decision making: e.g. the NANACO system displays the outputs of the automatic system, together with their class membership probabilities, as candidate occupational codes to help human annotators (coders) with occupation coding in social surveys.
The Occupation Coding
Occupation Data
- job task (open-ended)
- industry (open-ended)
- employment categories
- job title
- firm size
Occupational Code
- one of nearly 200
A Picture by the NANACO System
[Screenshot of the NANACO system. Occupation data: job task ("to arrange the delivery vehicles, load and unload of luggage"); industry; firm size (8: from 500 to 999); employment & job title (2: regular employee, 1: no managerial post); attribute (education; 9: junior high school). Candidates of occupational codes, shown with class membership probabilities: 563 transportation clerk, 685 transportation laborers, 556 shipping/sorting clerks, 604 automobile drivers, 560 postal/communication clerks.]
The NANACO system is used for the JGSS (Japanese General Social Surveys) and the SSM (Social Stratification and Social Mobility) survey.
Existing Methods (for Binary Classifiers)
• Platt’s Method (sigmoid function): $P(f) = 1 / (1 + \exp(Af + B))$
• Zadrozny’s Binning Method
• Isotonic Regression Method (PAV algorithm)
Extension to multiclass: divide the multiclass classifier into binary classifiers
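[Figure: ten example samples sorted by score]
score                   -2   -1.5  -1.3  -0.5  0    0.2  0.5  0.6  0.8  0.9
status (accuracy)       0    0     0     0     1    0    0    1    0    1
accuracy for each bin   0    0     0     0     0.3  0.3  0.3  0.5  0.5  1
[A further row labelled "Accuracy" is only partially recoverable: 0.1 0.3 0.4 0.5 0.7 0.9 0.9]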
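The "accuracy for each bin" values in the example are consistent with what the pool-adjacent-violators (PAV) algorithm yields for those ten statuses. The following is a minimal sketch of PAV, assuming the samples are already sorted by score; the function name `pav` is hypothetical and not taken from the slides.

```python
def pav(labels):
    """Pool-adjacent-violators: non-decreasing least-squares fit to labels sorted by score."""
    blocks = []  # each block is [sum_of_labels, count]
    for y in labels:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Statuses of the ten samples in the example (0 = incorrect, 1 = correct).
status = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
calibrated = pav(status)
print([round(p, 1) for p in calibrated])
```

Because PAV needs a single ordering of the samples, this sketch only covers the one-score case.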
What is the Problem in Multiclass Classification?
• The relationship among the scores:
  the 1st class’s score > the 2nd class’s score > the 3rd class’s score > … > the nth class’s score
• The 1st class is determined not by the absolute value of its score, but by its relative position among the scores.

                              Example 1      Example 2
the 1st class’s score         1.5 (large)    0.1 (small)
the 2nd class’s score         1.4 (large)    -1.5 (small)
the status of the 1st class   incorrect      correct
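One way to read the two examples is through the gap between the two highest scores. The sketch below ranks per-class scores and reports that gap; the class names and the third score in each example are hypothetical values added only to make the ranking visible.

```python
def ranked_scores(scores_by_class, k=2):
    """Return the k highest (class, score) pairs, best first."""
    ranked = sorted(scores_by_class.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# The rank-1 and rank-2 scores come from the examples; class_c is a made-up filler.
example_1 = {"class_a": 1.5, "class_b": 1.4, "class_c": -0.8}
example_2 = {"class_a": 0.1, "class_b": -1.5, "class_c": -2.0}

for name, scores in (("Example 1", example_1), ("Example 2", example_2)):
    (top_class, f1), (_, f2) = ranked_scores(scores)
    print(name, top_class, "f1 =", f1, "f2 =", f2, "gap =", round(f1 - f2, 2))
```

In Example 1 the top score is large but the runner-up is almost as large, whereas in Example 2 the top score is small but well separated.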
For Effective Estimation
• Does the class membership probability for the 1st class depend not only on the 1st class’s score but also on the other classes’ scores?
• It would be better to use not only the 1st class’s score, but also the other classes’ scores.
Proposed Method
• Using multiple classification scores
• As a method for estimating class membership probabilities
a. (indirectly) Using “an accuracy table”
b. (directly) Applying a logistic regression
A Method Using an Accuracy Table
An Accuracy Table (e.g. 2 dimensions)
[Figure: an accuracy table whose axes are the 1st class’s scores and the 2nd class’s scores; cell accuracies shown: 0.15 0.36 0.29 0.53 0.39 0.28 0.67 0.53 0.48 0.46]
Both the binning method and the isotonic regression method are difficult to extend to multiple dimensions: with several scores, it is not easy to sort all samples according to a single criterion.
Process
Using multiple scores
STEP 1 Create cells for an accuracy table.
STEP 2 Smooth accuracies.
STEP 3 Estimate class membership probability for an evaluation sample.
STEP 1 Create Cells for an Accuracy Table
Training data → n-fold cross validation (training data / test data) → scores $\{f_i\}$ and status (correct/incorrect) → an accuracy table
Accuracy = # correctly-classified samples / # samples
[Figure: example accuracy table; cell accuracies shown: 0.15 0.36 0.29 0.53 0.39 0.28 0.67 0.53 0.48 0.46]
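The cell-creation step can be sketched as follows, assuming equal-width cells and scores that have already been collected on held-out folds. The function names, the cell-indexing scheme and the sample values are all hypothetical; the cross-validation loop that would produce the scores is omitted.

```python
import math
from collections import defaultdict


def cell_of(scores, interval):
    """Map a tuple of ranked scores to a cell index, one equal-width bin per axis."""
    return tuple(math.floor(f / interval) for f in scores)


def build_accuracy_table(samples, interval):
    """samples: iterable of (scores, is_correct). Returns {cell: (n_correct, n_total)}."""
    counts = defaultdict(lambda: [0, 0])
    for scores, is_correct in samples:
        cell = counts[cell_of(scores, interval)]
        cell[0] += int(is_correct)
        cell[1] += 1
    return {cell: (c, n) for cell, (c, n) in counts.items()}


def accuracy(table, cell):
    """Accuracy of a cell = # correctly-classified samples / # samples."""
    n_correct, n_total = table[cell]
    return n_correct / n_total


# Hypothetical held-out samples: ((rank-1 score, rank-2 score), correct?)
held_out = [
    ((0.92, 0.13), True), ((0.95, 0.18), True), ((0.97, 0.11), False),
    ((0.34, 0.31), False), ((0.38, 0.33), False), ((0.39, 0.36), True),
]
table = build_accuracy_table(held_out, interval=0.1)
for cell in sorted(table):
    print(cell, table[cell], round(accuracy(table, cell), 2))
```

Only non-empty cells are stored, so sparsely populated regions simply have no entry until smoothing or a fallback is applied.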
STEP 2 Smooth Accuracies
[Figure: accuracy table whose axes are the 1st class’s scores and the 2nd class’s scores, with the target cell marked]
Smoothing Methods
• Using only a target cell
  – Laplace’s law (Lap): $P_{Lap}(f) = (N_p(c(f)) + 1) / (N(c(f)) + 2)$
  – Lidstone method (Lid): $P_{Lid}(f) = (N_p(c(f)) + \delta) / (N(c(f)) + 2\delta)$
• Using not only a target cell but also surrounding cells
  – Moving average method (MA): $P_{MA}(f) = \left( \frac{N_p(c(f))}{N(c(f))} + \sum_{s \in Nb(c(f))} \frac{N_p(s)}{N(s)} \right) / n$
  – Median method (Median): $P_{Median}(f) = \mathrm{median}\left( \left\{ \frac{N_p(c(f))}{N(c(f))} \right\} \cup \left\{ \frac{N_p(s)}{N(s)} : s \in Nb(c(f)) \right\} \right)$
  – Moving average with coverage method (MA_cov): $P_{MA\_cov}(f) = \frac{ \frac{N_p(c(f))}{N(c(f))} C(c(f)) + \sum_{s \in Nb(c(f))} \frac{N_p(s)}{N(s)} C(s) }{ C(c(f)) + \sum_{s \in Nb(c(f))} C(s) }$
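The smoothing formulas can be sketched in code under several assumptions that the slides do not spell out: N(c) is read as the number of samples in cell c, N_p(c) as the number of correctly classified samples, Nb(c) as the non-empty cells adjacent to c along one axis, n as the number of cells averaged, and the coverage C(s) as the share of all samples that fall in cell s. All function names are hypothetical.

```python
from statistics import median


def laplace(n_correct, n_total):
    return (n_correct + 1) / (n_total + 2)


def lidstone(n_correct, n_total, delta):
    return (n_correct + delta) / (n_total + 2 * delta)


def neighbours(cell, table):
    """Non-empty cells adjacent to `cell` along one axis (an assumed definition of Nb)."""
    out = []
    for axis in range(len(cell)):
        for step in (-1, 1):
            nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
            if nb in table:
                out.append(nb)
    return out


def local_accuracies(cell, table):
    """Raw accuracies of the target cell followed by those of its neighbours."""
    cells = [cell] + neighbours(cell, table)
    return [table[c][0] / table[c][1] for c in cells]


def moving_average(cell, table):
    accs = local_accuracies(cell, table)
    return sum(accs) / len(accs)


def median_smoothing(cell, table):
    return median(local_accuracies(cell, table))


def moving_average_cov(cell, table):
    """Weight each accuracy by coverage, assumed here to be the cell's share of all samples."""
    total = sum(n for _, n in table.values())
    cells = [cell] + neighbours(cell, table)
    weights = [table[c][1] / total for c in cells]
    accs = [table[c][0] / table[c][1] for c in cells]
    return sum(a * w for a, w in zip(accs, weights)) / sum(weights)


# Hypothetical 1-dimensional table: {cell: (n_correct, n_total)}, non-empty cells only.
table = {(0,): (1, 4), (1,): (3, 6), (2,): (9, 10)}
print(laplace(3, 6), lidstone(3, 6, 0.5))
print(moving_average((1,), table), median_smoothing((1,), table), moving_average_cov((1,), table))
```

The same functions work for two or more score axes because cells are plain tuples of indices.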
STEP 3 Estimate class membership probability for an evaluation sample
Scores of an evaluation sample $\{f_i\}$ → cell of the accuracy table → class membership probability P = 0.29
[Figure: accuracy table whose axes are the 1st class’s scores and the 2nd class’s scores; cell accuracies shown: 0.15 0.36 0.29 0.53 0.39 0.28 0.67 0.53 0.48 0.46]
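The lookup step might be sketched as below. The table layout, the cell indexing and the fallback value for cells that were never observed are assumptions made for illustration; the slides do not say how unseen cells are handled.

```python
import math


def class_membership_probability(scores, table, interval, default=0.5):
    """Return the smoothed accuracy of the cell that contains the ranked scores."""
    cell = tuple(math.floor(f / interval) for f in scores)
    # `default` is a hypothetical fallback for cells absent from the table.
    return table.get(cell, default)


# Hypothetical smoothed accuracy table keyed by (rank-1 cell, rank-2 cell).
smoothed = {(9, 1): 0.67, (9, 2): 0.53, (3, 3): 0.29}
p = class_membership_probability((0.34, 0.31), smoothed, interval=0.1)
print(p)
```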
A Method Applying a Logistic Regression
Formula of a logistic regression: $P(f_1, \ldots, f_n) = 1 / (1 + \exp(\sum_i A_i f_i + B))$
$\{A_i\}, B$: parameters; $f_i$: the $i$th class’s score
Process
Using multiple scores
STEP 1 Estimate the parameters with the maximum likelihood method.
STEP 2 Estimate class membership probability for an evaluation sample.
STEP 1 Estimate Parameters with the Maximum Likelihood Method
Training data → n-fold cross validation (training data / test data) → scores $\{f_i\}$ and status (correct/incorrect)
$\{A_i\}, B$: estimated with maximum likelihood
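As a rough illustration of the estimation step, the sketch below fits $\{A_i\}$ and $B$ by batch gradient descent on the negative log-likelihood, keeping the sign convention of the formula $P = 1 / (1 + \exp(\sum_i A_i f_i + B))$. The optimizer, learning rate, epoch count and sample values are assumptions; the slides only state that maximum likelihood is used.

```python
import math


def predict(scores, A, B):
    """P(f_1..f_n) = 1 / (1 + exp(sum_i A_i f_i + B))."""
    z = sum(a * f for a, f in zip(A, scores)) + B
    z = max(min(z, 700.0), -700.0)  # keep exp() within floating-point range
    return 1.0 / (1.0 + math.exp(z))


def nll(samples, A, B):
    """Negative log-likelihood of the statuses under the current parameters."""
    total = 0.0
    for scores, y in samples:
        p = predict(scores, A, B)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit(samples, n_scores, learning_rate=0.5, epochs=2000):
    """Maximise the likelihood by gradient descent on the negative log-likelihood."""
    A, B = [0.0] * n_scores, 0.0
    for _ in range(epochs):
        grad_A, grad_B = [0.0] * n_scores, 0.0
        for scores, y in samples:
            err = y - predict(scores, A, B)  # derivative of the loss w.r.t. z
            for i, f in enumerate(scores):
                grad_A[i] += err * f
            grad_B += err
        A = [a - learning_rate * g / len(samples) for a, g in zip(A, grad_A)]
        B -= learning_rate * grad_B / len(samples)
    return A, B


# Hypothetical held-out samples: ((rank-1 score, rank-2 score), 1 if correct else 0)
held_out = [
    ((1.5, 1.4), 0), ((1.2, 1.0), 0), ((1.0, 0.8), 1), ((0.9, 0.7), 0),
    ((1.4, -0.5), 1), ((0.8, -0.9), 1), ((0.3, -1.2), 0), ((0.1, -1.5), 1),
]
A, B = fit(held_out, n_scores=2)
print(A, B, nll(held_out, A, B))
```

A library optimizer would normally replace the hand-written loop; it is written out here only to keep the sketch self-contained.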
STEP 2 Estimate class membership probability for an evaluation sample
Scores of an evaluation sample $\{f_i\}$ → class membership probability
$P(f_1, \ldots, f_n) = 1 / (1 + \exp(\sum_i A_i f_i + B))$
The Purpose of Experiments
• Experiment 1
- Evaluation of various methods including the proposed methods
• Experiment 2
- Evaluation of the best method
Experimental Setting
• Classifier
  – one-versus-rest method to extend SVMs to a multiclass classifier
    • A linear kernel
    • Soft margin parameter C = 0.6
    • Features, e.g. for the JGSS dataset (on the next slide):
      - words in responses to “job task”
      - words in responses to “industry”
      - responses to “employment status” and “job title”
  – Naïve Bayes classifier
Dataset
• The JGSS dataset (23,838 samples)
  – Japanese survey data (open-ended)
  – The number of classes is nearly 200
  – Training data: old data (JGSS-2000, -2001, -2002); test data: new data (JGSS-2003)
• The 20 Newsgroups dataset (18,828 samples)
  – English newsgroup articles
  – The number of classes is 20
  – 5-fold cross validation
Cell Intervals for an Accuracy Table
• Cell intervals: 0.05, 0.1, 0.2, 0.3, 0.5, etc.
The relationship between cell intervals and the number of cells

Cell intervals                          0.05  0.1  0.2  0.3  0.5
# cells (the 1st class’s score used)    60    30   16   12   7
Evaluation Metrics
• In experiment 1
  – Negative log-likelihood (a loss function): $L = -\sum_i \left( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right)$
    $y_i$: status of an evaluation sample (correct: 1, incorrect: 0)
    $p_i$: predicted class membership probability of an evaluation sample
    The lower L is, the better the method.
• In experiment 2
  – Reliability diagram: the predicted values vs. the true values
  – ROC (receiver operating characteristic) curve: FPF (false positive fraction) vs. TPF (true positive fraction)
  – Ability to detect misclassified samples
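The first three metrics can be sketched as follows. The binning used for the reliability diagram, the handling of ties in the ROC computation (none) and the sample values are assumptions chosen for illustration; the function names are hypothetical.

```python
import math


def negative_log_likelihood(y, p):
    """L = -sum_i (y_i log p_i + (1 - y_i) log(1 - p_i)); lower is better."""
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))


def reliability_points(y, p, n_bins=5):
    """(mean predicted value, observed accuracy) for each non-empty equal-width bin."""
    bins = [[] for _ in range(n_bins)]
    for yi, pi in zip(y, p):
        bins[min(int(pi * n_bins), n_bins - 1)].append((pi, yi))
    return [
        (sum(pi for pi, _ in b) / len(b), sum(yi for _, yi in b) / len(b))
        for b in bins if b
    ]


def roc_points(y, p):
    """(FPF, TPF) pairs obtained by accepting samples in order of decreasing p."""
    positives = sum(y)
    negatives = len(y) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for yi, _ in sorted(zip(y, p), key=lambda pair: -pair[1]):
        if yi:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical statuses (1 = correct) and predicted class membership probabilities.
y = [1, 1, 0, 1, 0, 0]
p = [0.9, 0.85, 0.7, 0.65, 0.3, 0.25]
print(negative_log_likelihood(y, p))
print(reliability_points(y, p))
print(roc_points(y, p))
```

A well-calibrated method gives reliability points close to the diagonal, and a ROC curve that rises quickly indicates that low probabilities single out the misclassified samples.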
The Proposed Method for Creating Cells

Classifier               Dataset          Equal intervals        Equal samples
SVMs                     JGSS             2369.3 (# cells=30)    2678.3 (# cells=12)
SVMs                     20 Newsgroups    1472.3 (# cells=30)    1572.9 (# cells=12)
Naïve Bayes classifier   20 Newsgroups    1679.8 (# cells=16)    1671.0 (# cells=12)

Negative log-likelihood in the best case in each method
Experiment 1 (1/2)

Cell intervals  Used scores    No smoothing  Lap     Lid     MA      Median  MA_cov  Logistic regression
0.1             rank1          2309.3        2368.9  2368.9  2367.5  2372.6  2364.7  2367.6
0.1             rank1 & rank2  -             2356.8  2355.8  2245.8  -       2232.7  2246.9
0.2             rank1          2371.3        2371.0  2370.3  2369.3  2370.0  2369.3  2367.6
0.2             rank1 & rank2  -             2252.7  2254.7  2240.6  2341.8  2235.0  2246.9
0.5             rank1          2381.9        2381.8  2381.6  2395.9  2396.4  2409.9  2367.6
0.5             rank1 & rank2  2265.8        2265.6  2265.7  2327.5  2298.8  2320.6  2246.9

Negative Log-likelihood (SVMs, the JGSS dataset)
Experiment 1 (2/2)

Cell intervals  Used scores    No smoothing  Lap     Lid     MA      Median  MA_cov  Logistic regression
0.1             rank1          1472.3        1472.4  1472.2  1468.1  1469.6  1467.4  1482.3
0.1             rank1 & rank2  -             1390.2  1388.3  1362.3  -       1360.3  1386.6
0.2             rank1          1472.5        1472.7  1472.5  1474.4  1473.3  1482.7  1482.3
0.2             rank1 & rank2  -             1365.4  1366.9  1374.9  -       1377.7  1386.6
0.5             rank1          1487.4        1487.5  1487.4  1503.9  1497.0  1537.9  1482.3
0.5             rank1 & rank2  1388.1        1387.7  1387.8  1447.2  1408.7  1479.4  1386.6

Negative Log-likelihood (SVMs, the 20 Newsgroups dataset)
Negative Log-likelihood with SVMs on Both Datasets
• A method using an accuracy table
  rank1 & rank2 < rank1 & rank2 & rank3 << rank1
• A method applying a logistic regression
  rank1 & rank2 & rank3 < rank1 & rank2 << rank1
• Using multiple scores was very effective with SVMs
  – The method using an accuracy table (cell intervals = 0.1, smoothing method = MA_cov) was the best of all cases.
  – The method applying a logistic regression was stable.
Negative Log-likelihood with Naïve Bayes classifier on the 20 Newsgroups dataset

# cells (the 1st class’s score used)  Used scores    No smoothing         Lap     MA      Median  MA_cov
30                                    rank1          (missing in source)  1680.6  1670.1  1668.4  1675.0
30                                    rank1 & rank2  -                    1439.7  1409.8  -       1415.3
16                                    rank1          1680.2               1679.8  1679.6  1675.8  1696.2
16                                    rank1 & rank2  -                    1428.1  1515.5  -       1536.2
7                                     rank1          1697.2               1697.2  1712.0  1713.6  1732.8
7                                     rank1 & rank2  -                    1474.8  1626.3  1644.8  1664.1

In the case of the method using an accuracy table