Biometrics lecture materials: iwatawiki lec14 s

Academic year: 2018


060310391

0560565

14

2017/1/16 13:00-14:45

@1 - 4

1

DNA

2

• 

• 

• 

• 

–  k-means

SOM

• 

–  SVM

3

• 

mRNA

4

(2)

http://www.scq.ubc.ca/spot-your-genes-an-overview-of-the-microarray/ (Art by Jiang Long)

cDNA

2

2

5

cDNA

= /

http://www.promega.com/enotes/applications/ap0066.htm

6

GeneChip

1

mRNA → cDNA → cRNA → cRNA

GeneChip

20 25

10 20

Perfect Match (PM), Mismatch (MM)

MM PM 1

http://www.scq.ubc.ca/spot-your-genes-an-overview-of-the-microarray/ (Art by Jiang Long)

GeneChip PM MM

•  1

•  PM MM

Schadt et al. (2000) J Cell Biochem 80: 192

8

Perfect Match PM Mismatch MM

(3)

RNA-seq

9

Wang et al. (2009) Nat Rev Genet. 10: 57–63

RNA-seq

Conesa et al. (2016) Genome Biology 17: 13

‘align-then-assemble’ and ‘assemble-then-align’

11

Haas and Zody (2010) Nat. Biotech. 28: 421-423

• 

– 

– 

– 

• 

– 

• 

– 

-

12

(4)

• 

–  hierarchical clustering

–  non-hierarchical clustering

– 

13

Expression levels (rows: genes 1-4, columns: samples 1-4)

      1     2     3     4
1  1.53  2.38  2.80  0.60
2  1.03  2.54  3.29  0.80
3  0.85  0.21  0.34  3.02
4  1.03  0.82  0.94  1.20

Correlation matrix (Pearson r between rows)

 1.0    0.95  -0.92  -0.88
 0.95   1.0   -0.74  -0.77
-0.92  -0.74   1.0    0.93
-0.88  -0.77   0.93   1.0

Absolute value of the correlation, |r|

1.0   0.95  0.92  0.88
0.95  1.0   0.74  0.77
0.92  0.74  1.0   0.93
0.88  0.77  0.93  1.0

Euclidean distance matrix

0.0   0.74  4.13  2.55
0.74  0.0   4.36  2.94
4.13  4.36  0.0   2.02
2.55  2.94  2.02  0.0

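Pearson correlation coefficient:

\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]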
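As a check on the correlation formula, the following Python sketch recomputes r and the Euclidean distance for rows 1 and 2 of the expression table. The function names `pearson` and `euclidean` are illustrative and do not come from the slides; the results should agree with the tabulated values up to rounding of the displayed data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Rows 1 and 2 of the expression table.
row1 = [1.53, 2.38, 2.80, 0.60]
row2 = [1.03, 2.54, 3.29, 0.80]

r = pearson(row1, row2)
print("r          =", round(r, 2))
print("1 - r      =", round(1 - r, 2))   # correlation turned into a dissimilarity
print("Euclidean  =", round(euclidean(row1, row2), 2))
```

A correlation-based dissimilarity such as 1 - r treats profiles with the same shape as close even when their absolute levels differ, whereas the Euclidean distance does not.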

[Figure: expression level (y-axis) of genes 1-4 plotted against sample ID (x-axis).]

•  1

•  dendrogram

• 

• 

16

(5)

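For x_i = (x_{i1}, …, x_{ip}) and x_j = (x_{j1}, …, x_{jp}):

Euclidean distance

\[ d(x_i, x_j) = \sqrt{(x_{i1} - x_{j1})^2 + \cdots + (x_{ip} - x_{jp})^2} \]

Manhattan distance

\[ d(x_i, x_j) = |x_{i1} - x_{j1}| + \cdots + |x_{ip} - x_{jp}| \]

Maximum distance

\[ d(x_i, x_j) = \max\{ |x_{i1} - x_{j1}|, \ldots, |x_{ip} - x_{jp}| \} \]

Minkowski distance (the exponent is written q here to keep it apart from the dimension p)

\[ d(x_i, x_j) = \left( |x_{i1} - x_{j1}|^{q} + \cdots + |x_{ip} - x_{jp}|^{q} \right)^{1/q} \]

q = 1: Manhattan distance
q = 2: Euclidean distance
q → ∞: Maximum distance

Canberra distance

\[ d(x_i, x_j) = \frac{|x_{i1} - x_{j1}|}{|x_{i1} + x_{j1}|} + \cdots + \frac{|x_{ip} - x_{jp}|}{|x_{ip} + x_{jp}|} \]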
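The distance definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used for the slides: the function names are made up here, the Minkowski exponent is called `q` to keep it apart from the dimension p, and Canberra terms with a zero denominator are simply skipped.

```python
def minkowski(x, y, q):
    """Minkowski distance with exponent q (q = 1 Manhattan, q = 2 Euclidean)."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

def euclidean(x, y):
    return minkowski(x, y, 2)

def manhattan(x, y):
    return minkowski(x, y, 1)

def maximum(x, y):
    """Limit of the Minkowski distance as q grows without bound."""
    return max(abs(a - b) for a, b in zip(x, y))

def canberra(x, y):
    # Assumption of this sketch: terms whose denominator is zero are skipped.
    return sum(abs(a - b) / abs(a + b) for a, b in zip(x, y) if a + b != 0)

# Hypothetical two-dimensional example (not data from the slides).
x, y = (1.0, 2.0), (4.0, 6.0)
for name, d in [("euclidean", euclidean(x, y)),
                ("manhattan", manhattan(x, y)),
                ("maximum", maximum(x, y)),
                ("minkowski q=50", minkowski(x, y, 50)),
                ("canberra", canberra(x, y))]:
    print(name, d)
```

With a large exponent the Minkowski distance approaches the maximum distance, which is the q → ∞ case listed above.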

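1.  Compute the distances between all pairs of objects.

2.  Merge the closest pair into 1 cluster.

3.  Compute the distances between the new cluster and the remaining clusters.

4.  Repeat steps 2-3 until everything is merged into 1 cluster.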
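The merge loop of agglomerative hierarchical clustering can be sketched as follows. This is a minimal illustration, assuming Euclidean distance between points and the group average method between clusters; the five two-dimensional points are hypothetical and are not the gene1 to gene5 coordinates of the slides, and the function names are made up for this example.

```python
import math
from itertools import combinations

def average_linkage(a, b, points):
    """Group average distance between clusters a and b (tuples of point indices)."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerate(points, linkage=average_linkage):
    """Return the list of merges as (cluster_a, cluster_b, height)."""
    clusters = [(i,) for i in range(len(points))]   # every object starts alone
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage(pair[0], pair[1], points))
        height = linkage(a, b, points)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, height))
    return merges

# Hypothetical expression levels in two experiments.
points = [(1.0, 1.0), (1.2, 1.1), (4.0, 4.0), (4.1, 4.3), (2.5, 2.0)]
merges = agglomerate(points)
for a, b, height in merges:
    print(a, "+", b, "at height", round(height, 3))
```

The recorded heights are what a dendrogram draws on its vertical axis. Recomputing every linkage from scratch keeps the sketch short but is slow for large data sets.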

[Figure: scatter plot of gene1-gene5 (x-axis: expression level in Exp.1, y-axis: expression level in Exp.2) and the corresponding dendrogram from hclust with "average" linkage (leaf order: gene1, gene2, gene5, gene3, gene4; vertical axis: distance).]

(1)

1. nearest neighbor method (a.k.a. single linkage)

2. furthest neighbor method (a.k.a. complete linkage)

(6)

(2)

3. group average method

4. centroid method

5. median method

6. Ward's method

\[ d(A, B) = E(A \cup B) - E(A) - E(B) \]

where E(·) is the sum of squared deviations from the centroid within a cluster.
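Ward's criterion can be evaluated directly from its definition. The sketch below assumes E(C) is the sum of squared Euclidean distances from the members of C to the centroid of C; the clusters A and B are hypothetical and the function names are illustrative.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def ess(points):
    """E(C): sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((p[k] - c[k]) ** 2 for k in range(len(c))) for p in points)

def ward_distance(a, b):
    """Increase in the within-cluster sum of squares caused by merging a and b."""
    return ess(a + b) - ess(a) - ess(b)

# Hypothetical clusters on a line.
A = [(0.0, 0.0), (2.0, 0.0)]
B = [(10.0, 0.0)]
print(ward_distance(A, B))  # 54.0
```

Here E(A) = 2, E(B) = 0 and E(A ∪ B) = 56, so merging costs 54. Ward's method merges the pair of clusters for which this increase is smallest.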

[Figure: scatter plot of gene1-gene5 (x-axis: expression level in Exp.1, y-axis: expression level in Exp.2) with two dendrograms: complete linkage, hclust (*, "complete"), leaf order gene3, gene4, gene5, gene1, gene2; and single linkage, hclust (*, "single"), leaf order gene1, gene2, gene5, gene3, gene4. Vertical axis: distance.]

• 

[Figure: single-linkage dendrograms, hclust (*, "single"), of gene1-gene5 built from four distance measures computed with dist(): Euclidean, maximum, Manhattan, and Canberra. Vertical axis: height.]

(7)

• 

•  1 − r_ij, 1 − |r_ij|

25

(1)

• 

•  0, 15, 30 min; 1, 2, 3, 4, 8, 12, 16, 20, 24 h

•  0

• 

Eisen et al. (1998) PNAS 95: 14863

Cholesterol biosynthesis

Cell cycle

Immediate-early response

Signaling & angiogenesis

Wound healing & Tissue remodeling

Transposed data (rows: samples 1-4, columns: genes 1-4)

      1     2     3     4
1  1.53  1.03  0.85  1.03
2  2.38  2.54  0.21  0.82
3  2.80  3.29  0.34  0.94
4  0.60  0.80  3.02  1.20

27

(2)

•  60

cell lines

• 

Ross et al. (2000) Nat Genet 24: 227

28

(8)

• 

•  k-means

•  Self-organizing maps (SOM)

29

k-means

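1.  Choose the number of clusters k and pick k initial cluster centers.

2.  Assign each object to the nearest of the k centers.

3.  Recompute each center as the mean of the objects assigned to it.

4.  Repeat steps 2-3 until the assignments no longer change.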
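The k-means iteration can be sketched in plain Python as below. This is an illustration, not the code behind the slides: the initial centers are passed in explicitly so the run is reproducible, the data points are hypothetical, and a center that loses all its members is simply left where it is.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Lloyd-style k-means; returns (labels, centers)."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(len(centers)),
                          key=lambda k: math.dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:      # nothing changed, so we have converged
            break
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:               # assumption: keep an empty cluster's center
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Hypothetical data with two well separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)
print(centers)
```

The result of k-means depends on the initial centers, so in practice it is usually run from several starting configurations.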

30

k-means

Bishop CM (2006) Pattern recognition and machine learning. Springer.

31

•  386 1311SNPs

•  k− 5

• 

32

[Figure: progress of k-means shown on principal component scatter plots (PC1 vs. PC2 and PC3 vs. PC4) at Rep 0, Rep 1, Rep 2, and Rep 10.]

(9)

• 

•  L

Soukas et al. (2000) Genes & Development 14: 963

33

k-medoids

•  : 229

22

• 

48

•  k-medoids (R package cluster, function pam), 48 medoids

•  48

● 48

•  229

48

34

[Figure: scatter plots of PC1 vs. PC2 and PC3 vs. PC4, and histograms (frequency) of the PC 1 to PC 4 scores for all samples (pca.tr$x[, i]) and for the samples selected by k-medoids (pca.tr$x[kmed$id.med, i]).]

•  medoid

•  medoid

• 

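1.  Prepare a grid of units (e.g. 5x5).

2.  Initialize the weight vector w(0) of each unit.

3.  Select 1 data vector g_i.

4.  Find the unit whose weight vector is closest to g_i: the BMU (best matching unit).

5.  Move the weight vectors of the BMU and of the units near the BMU toward g_i:

w(t+1) = w(t) + h(d, t)(g_i − w(t))

h(d, t) = θ(d, t)α(t)

where d is the distance from the BMU on the grid, θ(d, t) is the neighborhood function, and α(t) is the learning rate.

6.  Repeat steps 3-5 while decreasing θ(d, t) and α(t).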

35

g_i = (0.70, 0.23, 0.31)

w_BMU = (0.60, 0.26, 0.30)

The BMU and its neighbors are moved toward g_i:

\[ w(t+1) = w(t) + h(d, t)\,(g_i - w(t)) \]

BMU, h(0, t) = 0.8:

\[
w(t+1) =
\begin{pmatrix} 0.60 \\ 0.26 \\ 0.30 \end{pmatrix}
+ 0.8 \times \left(
\begin{pmatrix} 0.70 \\ 0.23 \\ 0.31 \end{pmatrix}
-
\begin{pmatrix} 0.60 \\ 0.26 \\ 0.30 \end{pmatrix}
\right)
=
\begin{pmatrix} 0.68 \\ 0.24 \\ 0.31 \end{pmatrix}
\]

Neighboring unit (2,3), h(1, t) = 0.4 (h(d>1, t) = 0):

\[
w(t+1) =
\begin{pmatrix} 0.66 \\ 0.72 \\ 0.72 \end{pmatrix}
+ 0.4 \times \left(
\begin{pmatrix} 0.70 \\ 0.23 \\ 0.31 \end{pmatrix}
-
\begin{pmatrix} 0.66 \\ 0.72 \\ 0.72 \end{pmatrix}
\right)
=
\begin{pmatrix} 0.68 \\ 0.52 \\ 0.56 \end{pmatrix}
\]
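The update rule w(t+1) = w(t) + h(d, t)(g_i − w(t)) can be checked numerically with a short Python sketch. The dictionary keys "bmu" and "neighbor" and the function names are illustrative labels chosen for this example; the vectors are the ones from the worked example, with h = 0.8 for the BMU and h = 0.4 for a unit at grid distance 1.

```python
import math

def find_bmu(weights, g):
    """Return the key of the unit whose weight vector is closest to g."""
    return min(weights, key=lambda unit: math.dist(weights[unit], g))

def som_update(w, g, h):
    """Move weight vector w toward data vector g by the fraction h."""
    return [wi + h * (gi - wi) for wi, gi in zip(w, g)]

g_i = [0.70, 0.23, 0.31]
weights = {
    "bmu": [0.60, 0.26, 0.30],
    "neighbor": [0.66, 0.72, 0.72],
}

bmu = find_bmu(weights, g_i)
new_bmu = som_update(weights["bmu"], g_i, h=0.8)
new_neighbor = som_update(weights["neighbor"], g_i, h=0.4)
print(bmu)
print([round(v, 3) for v in new_bmu])
print([round(v, 3) for v in new_neighbor])
```

Because 0 < h ≤ 1, each updated vector lies on the segment between the old weight and g_i, so the BMU ends up closer to g_i than its neighbor does.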

36

(10)

•  http://genomics.stanford.edu

•  6 × 5 828

• 

• 

• 

Tamayo et al. (1999) PNAS 96: 2907

37

SOM

38

• 

• 

39

•  supervised learning

– 

1 → (classification)

→ regression

•  SVM, Random Forest

•  unsupervised learning

– 

•  k-means

40

(11)

support vector machine (SVM)

• 

• 

• 

• 

41

SVM

42

(basis function)

(kernel function)

(feature space)

(input space)

43

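Linear kernel

\[ k(x, z) = x^{T} z \]

Polynomial kernel

\[ k(x, z) = (x^{T} z + c)^{M} \]

Gaussian kernel

\[ k(x, z) = \exp\left( -\frac{\lVert x - z \rVert^{2}}{2\sigma^{2}} \right) \]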
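The three kernel functions can be written directly from their formulas. The following Python sketch is illustrative only: the function names and the default values c = 1, M = 2 and sigma = 1 are assumptions made for this example.

```python
import math

def linear_kernel(x, z):
    """k(x, z) = x^T z"""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, c=1.0, M=2):
    """k(x, z) = (x^T z + c)^M"""
    return (linear_kernel(x, z) + c) ** M

def gaussian_kernel(x, z, sigma=1.0):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2))"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

# Hypothetical input vectors.
x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z))
print(polynomial_kernel(x, z))
print(gaussian_kernel(x, z))
```

The Gaussian kernel equals 1 when the two inputs coincide and decays toward 0 as they move apart, with sigma controlling how quickly.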

44

(12)

45

• 

y = 5 sin(x) + e,  e ~ N(0, 1)

[Figure: linear regression fitted to the simulated data (x-axis: data$x, y-axis: data$y).]

• 

• 

[Figure: shape of the kernel exp(-beta * x^2) for beta = 1, plotted against x.]

•  2

•  2 x i , x j

•  x, y

\[ k(x_j, x_i) = \exp\left( -\beta \lVert x_j - x_i \rVert^{2} \right) \]

\[ y = f(x) = \sum_{j=1}^{n} \alpha_j\, k(x_j, x) \]

x x j k(x j , x)

α j

y x

x

φ

feature space)

input space)

x φ x

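\[ y = \sum_{m=1}^{M} w_m x_m + e = w^{T} x + e \]

\[ y = \sum_{k=1}^{K} w_k \phi_k(x) + e = w^{T} \phi(x) + e \]

\[ w = \sum_{j=1}^{n} \alpha_j \phi(x_j) \]

\[ y = \sum_{j=1}^{n} \alpha_j \phi(x_j)^{T} \phi(x) + e = \sum_{j=1}^{n} \alpha_j\, k(x_j, x) + e \]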

47

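\[
K = \begin{pmatrix}
k(x_1, x_1) & \cdots & k(x_n, x_1) \\
\vdots & \ddots & \vdots \\
k(x_1, x_n) & \cdots & k(x_n, x_n)
\end{pmatrix}
\]

Residual sum of squares:

\[
R(\alpha) = \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{n} \alpha_j\, k(x_j, x_i) \right)^{2}
= (y - K\alpha)^{T} (y - K\alpha)
\]

The α that minimizes R(α):

\[ \alpha = (K^{T} K)^{-1} K^{T} y = K^{-1} y \]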

[Figure: kernel regression without regularization (x-axis: data$x, y-axis: data$y).]

•  overfitting

• 

48

\[ R(\alpha) = (y - K\alpha)^{T} (y - K\alpha) + \lambda\, \alpha^{T} K \alpha \]

•  λ: regularization parameter

The α that minimizes R(α):

\[ \alpha = (K + \lambda I)^{-1} y \]
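The regularized solution α = (K + λI)^{-1} y can be tried on simulated data of the form y = 5 sin(x) + e. The Python sketch below is a minimal illustration under several assumptions: the 21 grid points, the random seed, β = 1 and the small Gaussian-elimination solver are all choices made for this example, not details taken from the slides.

```python
import math
import random

def gaussian_kernel(a, b, beta=1.0):
    """k(a, b) = exp(-beta * (a - b)^2) for scalar inputs."""
    return math.exp(-beta * (a - b) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(xs, ys, lam, beta=1.0):
    """Return alpha = (K + lam * I)^(-1) y."""
    n = len(xs)
    A = [[gaussian_kernel(xs[i], xs[j], beta) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, ys)

def predict(x, xs, alpha, beta=1.0):
    """f(x) = sum_j alpha_j k(x_j, x)"""
    return sum(a * gaussian_kernel(xj, x, beta) for a, xj in zip(alpha, xs))

rng = random.Random(0)                                # fixed seed (assumption)
xs = [0.5 * i for i in range(21)]                     # x = 0, 0.5, ..., 10
ys = [5 * math.sin(x) + rng.gauss(0, 1) for x in xs]
lam = 0.4
alpha = fit(xs, ys, lam)
print([round(predict(x, xs, alpha), 2) for x in (1.0, 2.5, 5.0)])
```

A larger λ shrinks α and gives a smoother curve, while λ close to 0 lets the fit pass through almost every noisy point.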

[Figure: kernel regression fits for lmbd = 0.4, lmbd = 0.04, and lmbd = 4 (x-axis: data$x, y-axis: data$y).]

•  λ = 0.04

•  λ = 4

(13)

SVM

[Figure: SVM with a linear kernel and with a Gaussian kernel (x-axis: x[,1], y-axis: x[,2]).]

Brown et al. (2000) PNAS 97: 262

•  SVM

• 

•  2,467 79 5

50

• 

– 

• 

– 

• 

– 

false discovery rate: FDR

• 

51

• 

• 

• 

–  eQTL QTL

52

(14)

• 

(R 5)

• 

•  :

•  ISBN-10: 4320019253

•  ISBN-13: 978-4320019256

• 

• 

R

http://www.amazon.co.jp/
