Advanced Lecture on
Neural Information Processing Systems (Lecture 02)
Ichiro Takeuchi
Nagoya Institute of Technology
Computer programs learned by themselves
▶
Learn a computer program
double func(double x1, double x2) {
    double y = ???;
    return y;
}
that satisfies the following input-output relations:
x_1 = 2, x_2 = 4 ⇒ y = 5
x_1 = 1, x_2 = 8 ⇒ y = 2
x_1 = 6, x_2 = 9 ⇒ y = 7
x_1 = 3, x_2 = 3 ⇒ y = 4
Input output relations
Linear models
▶
Consider linear input-output relations:
double func(double x1, double x2) {
    double y = w1*x1 + w2*x2;
    return y;
}
▶
Linear model for d-dimensional input $x \in \mathbb{R}^d$:
$$y = f(x_1, x_2, \ldots, x_d) = w_1 x_1 + w_2 x_2 + \ldots + w_d x_d$$
Simple linear regression: 1D input case
f(x) = wx
Examples
▶
Find a function
y = f(x) = wx
that satisfies the following input-output relations:
Example 1
x = 2 ⇒ y = 1
x = 6 ⇒ y = 3
x = 8 ⇒ y = 4
x = 4 ⇒ y = 2
Example 2
x = 2 ⇒ y = 1
x = 6 ⇒ y = 2
x = 8 ⇒ y = 4
x = 4 ⇒ y = 3
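With a single weight, an exact fit exists only when every pair implies the same slope y/x. The check below is a minimal C sketch of that idea. The function name `fits_exactly` and the tolerance parameter `tol` are illustrative and not part of the lecture, and the sketch assumes that no input x is zero.

```c
#include <math.h>
#include <stddef.h>

/* Returns 1 if one w satisfies y[i] = w * x[i] for every pair (within tol)
   and stores that w in *w_out; returns 0 otherwise.
   Assumes n >= 1 and x[i] != 0 for all i. */
int fits_exactly(const double *x, const double *y, size_t n,
                 double tol, double *w_out)
{
    double w = y[0] / x[0];            /* slope implied by the first pair */
    for (size_t i = 1; i < n; ++i) {
        if (fabs(y[i] - w * x[i]) > tol)
            return 0;                  /* this pair disagrees with w */
    }
    *w_out = w;
    return 1;
}
```

On Example 1 the check succeeds with w = 0.5. On Example 2 it fails, because the first pair implies w = 0.5 while the pair x = 6, y = 2 would need w = 1/3, so no single w reproduces all four relations.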
Plots
Minimizing errors
▶
Training data
input x    output y
x_1        y_1
x_2        y_2
...        ...
x_n        y_n
▶
Minimizing the sum of squared errors:
$$w^* = \arg\min_{w \in \mathbb{R}} \sum_{i=1}^{n} (y_i - w x_i)^2$$
Exercise
▶
Compute the optimal w ∈ R that minimizes the sum of squared errors:
$$E := \sum_{i=1}^{n} (y_i - w x_i)^2$$
where the training set is given as
x = 2 ⇒ y = 1
x = 6 ⇒ y = 2
x = 8 ⇒ y = 4
x = 4 ⇒ y = 3
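One way to check an answer to this exercise numerically: setting dE/dw = 0 for E = Σ (y_i − w x_i)² gives w* = Σ x_i y_i / Σ x_i². The C sketch below evaluates that formula and the error E(w). The function names `fit_slope` and `sse` are illustrative and not from the lecture, and the sketch assumes at least one x_i is nonzero.

```c
#include <stddef.h>

/* Sum of squared errors E(w) = sum_i (y[i] - w * x[i])^2. */
double sse(const double *x, const double *y, size_t n, double w)
{
    double e = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double r = y[i] - w * x[i];    /* residual of the i-th pair */
        e += r * r;
    }
    return e;
}

/* Least-squares slope for f(x) = w x: w* = sum x_i y_i / sum x_i^2.
   Assumes at least one x[i] is nonzero. */
double fit_slope(const double *x, const double *y, size_t n)
{
    double sxy = 0.0, sxx = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sxy += x[i] * y[i];
        sxx += x[i] * x[i];
    }
    return sxy / sxx;
}
```

For the four pairs above, Σ x_i y_i = 58 and Σ x_i² = 120, so `fit_slope` returns 58/120 ≈ 0.483, and `sse` at that value is slightly below its value of 2 at w = 0.5.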
Multiple linear regression
$$f(x_1, x_2, \ldots, x_d) = w_1 x_1 + w_2 x_2 + \ldots + w_d x_d$$
Caution: notations of the training data
▶
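The training data is represented by $X \in \mathbb{R}^{n \times d}$ and $y \in \mathbb{R}^{n}$:
$$\underset{n \times d}{X} :=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1d} \\
x_{21} & x_{22} & \cdots & x_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nd}
\end{bmatrix}
=
\begin{bmatrix}
x_1^\top \\ x_2^\top \\ \vdots \\ x_n^\top
\end{bmatrix},
\quad
\underset{n \times 1}{y} :=
\begin{bmatrix}
y_1 \\ y_2 \\ \vdots \\ y_n
\end{bmatrix}$$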
▶
$x_{ij} \in \mathbb{R}$: the $j$-th input variable of the $i$-th training instance
▶
$x_i \in \mathbb{R}^d$: the input vector of the $i$-th training instance
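In C, this notation maps naturally onto a flat array. The sketch below assumes X is stored row by row (row-major), so that x_ij sits at index i*d + j with zero-based i and j. That layout and the function name `model_output` are assumptions made for illustration, not something the lecture prescribes.

```c
#include <stddef.h>

/* Returns the model output for the i-th training instance,
   w[0]*x_i[0] + ... + w[d-1]*x_i[d-1].
   X is an n-by-d matrix stored row-major; i and j are zero-based. */
double model_output(const double *X, size_t d, size_t i, const double *w)
{
    const double *xi = X + i * d;      /* start of the i-th row, i.e. x_i */
    double out = 0.0;
    for (size_t j = 0; j < d; ++j)
        out += w[j] * xi[j];
    return out;
}
```

With this layout each input vector x_i is a contiguous block of d values, so looping over instances touches memory sequentially.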
Least squares linear regression
▶
Training data:
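$$\underset{n \times d}{X} :=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1d} \\
x_{21} & x_{22} & \cdots & x_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nd}
\end{bmatrix},
\quad
\underset{n \times 1}{y} :=
\begin{bmatrix}
y_1 \\ y_2 \\ \vdots \\ y_n
\end{bmatrix}$$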
▶
Linear model estimation by LS method:
$$w^* = \arg\min_{w \in \mathbb{R}^d} \sum_{i=1}^{n} \bigl(y_i - (w_1 x_{i1} + \ldots + w_d x_{id})\bigr)^2$$
LS linear regression in matrix vector form
▶
The sum of squared errors is written as
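$$\begin{aligned}
E &:= \sum_{i=1}^{n} \bigl(y_i - (w_1 x_{i1} + \ldots + w_d x_{id})\bigr)^2 \\
  &= \sum_{i=1}^{n} (y_i - x_i^\top w)^2 \\
  &= (y - Xw)^\top (y - Xw) \\
  &= w^\top (X^\top X) w - 2 (X^\top y)^\top w + y^\top y
\end{aligned}$$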
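The last equality follows from expanding the product. A sketch of the intermediate step, using only the fact that the scalar $y^\top X w$ equals its own transpose $w^\top X^\top y = (X^\top y)^\top w$:

```latex
\begin{aligned}
(y - Xw)^\top (y - Xw)
  &= y^\top y - y^\top X w - w^\top X^\top y + w^\top X^\top X w \\
  &= y^\top y - 2 (X^\top y)^\top w + w^\top (X^\top X) w \\
  &= w^\top (X^\top X) w - 2 (X^\top y)^\top w + y^\top y .
\end{aligned}
```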
Optimality conditions
Minimizing the sum of squared errors
▶
The sum of squared errors:
$$E = w^\top (X^\top X) w - 2 (X^\top y)^\top w + y^\top y$$
▶
The optimality condition:
$$\frac{\partial E}{\partial w} = 2 (X^\top X) w - 2 (X^\top y) = 0$$
▶
Normal equations:
$$(X^\top X) w = X^\top y$$
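For d = 2 the normal equations form a 2×2 linear system that can be solved directly. The C sketch below accumulates the entries of XᵀX and Xᵀy and applies Cramer's rule. The function name `solve_normal_2d`, the row-major layout of X, and the singularity threshold `1e-12` are assumptions made for illustration. For larger d a general linear solver would be needed instead.

```c
#include <math.h>
#include <stddef.h>

/* Solves the normal equations (X^T X) w = X^T y for d = 2.
   X is an n-by-2 matrix stored row-major (an assumption of this sketch).
   Returns 0 on success, -1 if X^T X is numerically singular. */
int solve_normal_2d(const double *X, const double *y, size_t n, double w[2])
{
    double a11 = 0.0, a12 = 0.0, a22 = 0.0;   /* entries of symmetric X^T X */
    double b1 = 0.0, b2 = 0.0;                /* entries of X^T y */
    for (size_t i = 0; i < n; ++i) {
        double x1 = X[2 * i], x2 = X[2 * i + 1];
        a11 += x1 * x1;
        a12 += x1 * x2;
        a22 += x2 * x2;
        b1 += x1 * y[i];
        b2 += x2 * y[i];
    }
    double det = a11 * a22 - a12 * a12;
    if (fabs(det) < 1e-12)
        return -1;                            /* no unique solution */
    w[0] = (b1 * a22 - a12 * b2) / det;       /* Cramer's rule */
    w[1] = (a11 * b2 - a12 * b1) / det;
    return 0;
}
```

Applied to the four input-output relations from the opening slide, (2, 4) ⇒ 5, (1, 8) ⇒ 2, (6, 9) ⇒ 7, (3, 3) ⇒ 4, this yields XᵀX with entries 50, 79, 170 and Xᵀy = (66, 111), so w = (2451/2259, 336/2259) ≈ (1.085, 0.149).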
Example
Final exercise
▶
Consider a linear regression problem with a constant term:
▶ Training data: $\{(x_i, y_i) \in \mathbb{R} \times \mathbb{R}\}_{i=1}^{n}$
▶ Model: $f(x_i) = w_0 + w_1 x_i$, $i = 1, \ldots, n$
▶ Problem: $\min_{w_0, w_1 \in \mathbb{R}} \sum_{i=1}^{n} (y_i - (w_0 + w_1 x_i))^2$
▶
Show that the solution of the problem is given by the following system of linear equations:
$$n w_0 + \Bigl(\sum_{i=1}^{n} x_i\Bigr) w_1 = \sum_{i=1}^{n} y_i$$
(∑_{i=1}^{n} x_i) w_0 + (∑_{i=1}^{n}