Advanced Lecture on Neural Information Processing Systems (Lecture 02)

(1)

Advanced Lecture on Neural Information Processing Systems (Lecture 02)

Ichiro Takeuchi

Nagoya Institute of Technology

(2)

Computer programs that learn by themselves

▶

Learn a computer program

    double func(double x1, double x2) {
        double y = ???;  /* unknown expression, to be learned */
        return y;
    }

that satisfies the following input-output relations:

x₁ = 2, x₂ = 4 ⇒ y = 5

x₁ = 1, x₂ = 8 ⇒ y = 2

x₁ = 6, x₂ = 9 ⇒ y = 7

x₁ = 3, x₂ = 3 ⇒ y = 4

(3)

Input-output relations

(4)

Linear models

▶

Consider linear input-output relations:

    /* Weights of the linear model; their values are to be learned. */
    double w1, w2;

    double func(double x1, double x2) {
        double y = w1 * x1 + w2 * x2;
        return y;
    }

▶

Linear model for d-dimensional input $x \in \mathbb{R}^d$:

$$y = f(x_1, x_2, \ldots, x_d) = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d$$
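The d-dimensional linear model is a dot product between the weights and the input. Below is a minimal C sketch under that reading. The function name `linear_model` and the 0-indexed array layout are assumptions made for illustration and do not appear on the slides.

```c
#include <stddef.h>

/* Evaluate the linear model y = w[0]*x[0] + ... + w[d-1]*x[d-1].
 * Arrays are 0-indexed, so w[j] plays the role of w_{j+1} on the slide. */
double linear_model(const double *w, const double *x, size_t d) {
    double y = 0.0;
    for (size_t j = 0; j < d; ++j) {
        y += w[j] * x[j];
    }
    return y;
}
```

With d = 2 this reduces to the two-input `func` shown earlier, with the weights passed in explicitly instead of being fixed in the function body.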

(5)

Simple linear regression: 1D input case

f(x) = wx

(6)

Examples

▶

Find a function

y = f(x) = wx

that satisfies the following input-output relations:

Example 1

x = 2 ⇒ y = 1

x = 6 ⇒ y = 3

x = 8 ⇒ y = 4

x = 4 ⇒ y = 2

Example 2

x = 2 ⇒ y = 1

x = 6 ⇒ y = 2

x = 8 ⇒ y = 4

x = 4 ⇒ y = 3
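The two examples differ in whether a single w can reproduce every pair exactly. A small C sketch of that check follows. The function name `exact_slope` and the tolerance parameter `tol` are hypothetical, and the code assumes n ≥ 1 and x[0] ≠ 0.

```c
#include <math.h>
#include <stddef.h>

/* Returns 1 and stores the slope in *w_out if one value of w satisfies
 * y[i] = w * x[i] for every pair (within tol); returns 0 otherwise.
 * Assumes n >= 1 and x[0] != 0. */
int exact_slope(const double *x, const double *y, size_t n,
                double tol, double *w_out) {
    double w = y[0] / x[0];  /* slope implied by the first pair */
    for (size_t i = 1; i < n; ++i) {
        if (fabs(y[i] - w * x[i]) > tol) {
            return 0;        /* this pair disagrees with the slope */
        }
    }
    *w_out = w;
    return 1;
}
```

For Example 1 the check succeeds with w = 0.5. For Example 2 it fails at the second pair, because w = 0.5 predicts y = 3 for x = 6 while the data say y = 2, so no exact fit exists and an error criterion is needed.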

(7)

Plots

(8)

Minimizing errors

▶

Training data

    input x    output y
    x₁         y₁
    x₂         y₂
    ⋮          ⋮
    x_n        y_n

▶

Minimizing the sum of squared errors:

$$w^* = \arg\min_{w \in \mathbb{R}} \sum_{i=1}^{n} (y_i - w x_i)^2$$
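The objective being minimized can be evaluated directly for any candidate w. The following C sketch does so. The function name `sum_squared_errors` is an assumption for illustration.

```c
#include <stddef.h>

/* Sum of squared errors of the model f(x) = w * x on n training pairs. */
double sum_squared_errors(const double *x, const double *y, size_t n,
                          double w) {
    double e = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double r = y[i] - w * x[i];  /* residual of the i-th pair */
        e += r * r;
    }
    return e;
}
```

Evaluating this function for several values of w shows that the error is a quadratic function of w, which is why it has a single minimizer whenever at least one x_i is nonzero.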

(9)

Exercise

▶

Compute the optimal w ∈ R that minimizes the sum of squared errors:

$$E := \sum_{i=1}^{n} (y_i - w x_i)^2$$

where the training set is given as

x = 2 ⇒ y = 1

x = 6 ⇒ y = 2

x = 8 ⇒ y = 4

x = 4 ⇒ y = 3
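One way to check a hand computation: setting dE/dw = −2 ∑ x_i (y_i − w x_i) to zero gives w = (∑ x_i y_i) / (∑ x_i²), provided ∑ x_i² ≠ 0. The C sketch below implements this formula. The function name `fit_slope` is hypothetical.

```c
#include <stddef.h>

/* Least-squares slope for the model f(x) = w * x (no constant term):
 * w = (sum of x_i * y_i) / (sum of x_i^2).
 * Assumes at least one x[i] is nonzero. */
double fit_slope(const double *x, const double *y, size_t n) {
    double sxy = 0.0;  /* running sum of x_i * y_i */
    double sxx = 0.0;  /* running sum of x_i^2     */
    for (size_t i = 0; i < n; ++i) {
        sxy += x[i] * y[i];
        sxx += x[i] * x[i];
    }
    return sxy / sxx;
}
```

Calling `fit_slope` with the four training pairs of the exercise returns the value to compare against the result derived by hand.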

(10)

Multiple linear regression

$$f(x_1, x_2, \ldots, x_d) = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d$$

(11)

Caution: notation for the training data

▶

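The training data is represented by $X \in \mathbb{R}^{n \times d}$ and $y \in \mathbb{R}^{n}$:

$$
\underset{n \times d}{X} :=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1d} \\
x_{21} & x_{22} & \cdots & x_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nd}
\end{bmatrix}
=
\begin{bmatrix}
x_1^\top \\ x_2^\top \\ \vdots \\ x_n^\top
\end{bmatrix},
\qquad
\underset{n \times 1}{y} :=
\begin{bmatrix}
y_1 \\ y_2 \\ \vdots \\ y_n
\end{bmatrix}
$$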

▶

$x_{ij} \in \mathbb{R}$: the $j$-th input variable of the $i$-th training instance

▶

$x_i \in \mathbb{R}^d$: the input vector of the $i$-th training instance

(12)

Least-squares linear regression

▶

Training data:

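$$
\underset{n \times d}{X} :=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1d} \\
x_{21} & x_{22} & \cdots & x_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nd}
\end{bmatrix},
\qquad
\underset{n \times 1}{y} :=
\begin{bmatrix}
y_1 \\ y_2 \\ \vdots \\ y_n
\end{bmatrix}
$$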

▶

Linear model estimation by the least-squares (LS) method:

$$w^* = \arg\min_{w \in \mathbb{R}^d} \sum_{i=1}^{n} \bigl(y_i - (w_1 x_{i1} + \cdots + w_d x_{id})\bigr)^2$$

(13)

LS linear regression in matrix-vector form

▶

The sum of squared errors is written as

$$
\begin{aligned}
E &:= \sum_{i=1}^{n} \bigl(y_i - (w_1 x_{i1} + \cdots + w_d x_{id})\bigr)^2 \\
  &= \sum_{i=1}^{n} \bigl(y_i - x_i^\top w\bigr)^2 \\
  &= (y - Xw)^\top (y - Xw) \\
  &= w^\top (X^\top X) w - 2 (X^\top y)^\top w + y^\top y
\end{aligned}
$$
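The last equality skips one step. The expansion below spells it out, using only the symbols already defined (X, y, w) and the fact that a scalar equals its own transpose.

```latex
\begin{aligned}
(y - Xw)^\top (y - Xw)
  &= y^\top y - y^\top X w - w^\top X^\top y + w^\top X^\top X w \\
  &= y^\top y - 2\, w^\top X^\top y + w^\top X^\top X w
     && \text{since } y^\top X w = (y^\top X w)^\top = w^\top X^\top y \\
  &= w^\top (X^\top X) w - 2 (X^\top y)^\top w + y^\top y
     && \text{since } w^\top X^\top y = (X^\top y)^\top w
\end{aligned}
```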

(14)

Optimality conditions

(15)

Minimizing the sum of squared errors

▶

The sum of squared errors:

$$E = w^\top (X^\top X) w - 2 (X^\top y)^\top w + y^\top y$$

▶

The optimality condition:

$$\frac{\partial E}{\partial w} = 2 (X^\top X) w - 2 (X^\top y) = 0$$

▶

Normal equations:

$$(X^\top X) w = X^\top y$$
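For d = 2 the normal equations form a 2×2 linear system that can be solved in closed form. The C sketch below builds X⊤X and X⊤y from the data and applies Cramer's rule. The function name `fit_two_inputs` and the choice of passing the two input columns as separate arrays are assumptions for illustration. A general d would need a linear solver such as Gaussian elimination.

```c
#include <stddef.h>

/* Solve the normal equations (X^T X) w = X^T y for d = 2 inputs.
 * x1[i] and x2[i] are the two input variables of training instance i.
 * Returns 0 on success, -1 if the determinant of X^T X is exactly zero. */
int fit_two_inputs(const double *x1, const double *x2, const double *y,
                   size_t n, double w[2]) {
    double a11 = 0.0, a12 = 0.0, a22 = 0.0;  /* entries of X^T X (symmetric) */
    double b1 = 0.0, b2 = 0.0;               /* entries of X^T y */
    for (size_t i = 0; i < n; ++i) {
        a11 += x1[i] * x1[i];
        a12 += x1[i] * x2[i];
        a22 += x2[i] * x2[i];
        b1  += x1[i] * y[i];
        b2  += x2[i] * y[i];
    }
    double det = a11 * a22 - a12 * a12;
    if (det == 0.0) {
        return -1;  /* no unique solution */
    }
    /* Cramer's rule for the 2x2 system */
    w[0] = (b1 * a22 - a12 * b2) / det;
    w[1] = (a11 * b2 - a12 * b1) / det;
    return 0;
}
```

As a usage example, the four input-output relations from the opening slide can be passed as x1 = {2, 1, 6, 3}, x2 = {4, 8, 9, 3}, y = {5, 2, 7, 4}. The returned weights are the least-squares fit and do not reproduce the four outputs exactly.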

(16)

Example

(17)

Example

(18)

Final exercise

▶

Consider a linear regression problem with a constant term:

▶ Training data: $\{(x_i, y_i) \in \mathbb{R} \times \mathbb{R}\}_{i=1}^{n}$

▶ Model: $f(x_i) = w_0 + w_1 x_i$, $i = 1, \ldots, n$

▶ Problem: $\min_{w_0, w_1 \in \mathbb{R}} \sum_{i=1}^{n} \bigl(y_i - (w_0 + w_1 x_i)\bigr)^2$

▶

Show that the solution of the problem is given by the following system of linear equations:

$$
\begin{aligned}
n w_0 + \Bigl(\sum_{i=1}^{n} x_i\Bigr) w_1 &= \sum_{i=1}^{n} y_i \\
\Bigl(\sum_{i=1}^{n} x_i\Bigr) w_0 + \Bigl(\sum_{i=1}^{n} x_i^2\Bigr) w_1 &= \sum_{i=1}^{n} x_i y_i
\end{aligned}
$$
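The system above can be checked numerically once it has been derived. The C sketch below accumulates the four sums and solves the 2×2 system by Cramer's rule. It does not replace the derivation the exercise asks for. The function name `fit_line` is hypothetical.

```c
#include <stddef.h>

/* Solve
 *     n   * w0 + sx  * w1 = sy
 *     sx  * w0 + sxx * w1 = sxy
 * where sx, sy, sxx, sxy are the sums of x_i, y_i, x_i^2 and x_i * y_i.
 * Returns 0 on success, -1 if the determinant is exactly zero
 * (for example when all x_i are equal). */
int fit_line(const double *x, const double *y, size_t n,
             double *w0, double *w1) {
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    double det = (double)n * sxx - sx * sx;
    if (det == 0.0) {
        return -1;  /* no unique solution */
    }
    /* Cramer's rule for the 2x2 system */
    *w0 = (sy * sxx - sx * sxy) / det;
    *w1 = ((double)n * sxy - sx * sy) / det;
    return 0;
}
```

As a usage example, for the four pairs x = {2, 6, 8, 4}, y = {1, 2, 4, 3} used earlier in the lecture, the sums are ∑x = 20, ∑y = 10, ∑x² = 120 and ∑xy = 58, which gives w0 = 0.5 and w1 = 0.4.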
