2017/11/24
2
1
(association analysis)
(association study)
……T……C……A……G……T………A…………A………
•
DNA
……T……C……A……A……T………A…………A………
……T……A……G……G……T………A…………A………
……G……C……A……A……T………C…………T………
……T……A……A……G……T………A…………A………
……T……C……A……A……T………A…………A………
……T……C……A……G……T………A…………T………
……T……A……G……G……T………C…………A………
……G……C……A……G……C………A…………T………
……T……A……A……G……C………A…………A………
……T……C……A……G……C………C…………T………
……T……A……A……A……C………C…………A………
……T……C……G……G……C………A…………T………
……G……A……A……G……C………A…………A………
……T……C……A……A……C………C…………A………
A
B
C
:
: :
DNA
200
70/80
23/120
200
2
0 7 1 A
T
T
T
T
T
C
•
• QTL
•
• GWAS
• χ² test
• Fisher's exact test
•
•
• 1 2
•
3
• 3
AA, Aa, aa 3
30,000•
•
2005
Pictured by Dr. Satoshi Niikura
QTL
• DNA
QTL
•
•
•
•
×
• DNA
SNP
•
•
•
ATCGAG TAGACT TATACG
ATCGAG TAGACT TATACG
ATCGAG TAGACA TATACG
ATCGAG TAGACT TATACG
ATCGAG TAGACT TATACG
ATCGAG TAGACT TATACG
ATCGAG TAGACA TATACG
ATCGAG TAGACA TATACG
ATCGAG TAGACT TATACG
ATCGAG TAGACA TATACG
(single nucleotide polymorphisms: SNPs)
GeneChip Rice 44K SNP Genotyping Array
• 44,100 SNPs (about 1 SNP per 10 kb)
Tung et al. (2010) Rice 3:205-217
QTL
DNA
GWAS
DNA
Morrell et al. (2012) Nature Reviews Genetics 13:85
QTL vs.
linkage disequilibrium: LD
linkage disequilibrium: LD
QTL
(Modified from Balding 2006)
SNP
( )
SNP
( )
SNP
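(Rafalski 2002)
Haplotype frequencies at two loci (alleles A/a and B/b):

          B       b       Total
A         p_AB    p_Ab    p_A
a         p_aB    p_ab    p_a
Total     p_B     p_b

$$D = p_{AB} - p_A p_B = p_{AB} p_{ab} - p_{Ab} p_{aB}$$
$$r^2 = \frac{D^2}{p_A p_a p_B p_b}$$
Example: $$r^2 = \frac{0.25^2}{0.5 \times 0.5 \times 0.5 \times 0.5} = 1$$
[Figure: haplotype configurations of loci A and B with r² = 1, r² = 0.1024, and r² = 0]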
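The definitions of D and r² can be checked numerically. The following is a minimal sketch in Python; the function name `ld_stats` and the haplotype frequencies in the second and third calls are illustrative assumptions (chosen so that r² matches the values 0.1024 and 0 shown on the slide), not data from the lecture.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, r2) for two biallelic loci from the four haplotype frequencies."""
    # Allele frequencies are the margins of the 2 x 2 haplotype table.
    p_A = p_AB + p_Ab
    p_a = p_aB + p_ab
    p_B = p_AB + p_aB
    p_b = p_Ab + p_ab
    D = p_AB - p_A * p_B                   # D = p_AB - p_A p_B
    r2 = D ** 2 / (p_A * p_a * p_B * p_b)  # r^2 = D^2 / (p_A p_a p_B p_b)
    return D, r2


# Complete LD: only haplotypes AB and ab exist (the r^2 = 1 example).
print(ld_stats(0.5, 0.0, 0.0, 0.5))        # (0.25, 1.0)

# Hypothetical frequencies giving D = 0.08 and r^2 close to 0.1024.
D, r2 = ld_stats(0.33, 0.17, 0.17, 0.33)
print(round(D, 4), round(r2, 4))

# Linkage equilibrium: all four haplotypes equally frequent, so D = 0 and r^2 = 0.
print(ld_stats(0.25, 0.25, 0.25, 0.25))
```

Because r² is scaled by the allele frequencies at both loci, it can be compared across marker pairs, whereas D itself depends on those frequencies.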
•
•
(Rafalski 2002)
LD
Gupta et al. (2005)
•
•
AA
aa
AA aa
χ² test, Fisher's exact test
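χ² test

Observed counts:

          (R)         (S)         Total
AA        f11 (11)    f12 (4)     f1. (15)
aa        f21 (3)     f22 (7)     f2. (10)
Total     f.1 (14)    f.2 (11)    n (25)

Marginal proportions:
p(R) = f.1 / n = 14 / 25 = 0.56
p(S) = f.2 / n = 11 / 25 = 0.44
p(AA) = f1. / n = 15 / 25 = 0.60
p(aa) = f2. / n = 10 / 25 = 0.40
↓
Expected counts under independence:
R, AA: n × p(R) × p(AA) = 25 × 0.56 × 0.60 = 8.4
R, aa: n × p(R) × p(aa) = 25 × 0.56 × 0.40 = 5.6
S, AA: n × p(S) × p(AA) = 25 × 0.44 × 0.60 = 6.6
S, aa: n × p(S) × p(aa) = 25 × 0.44 × 0.40 = 4.4

$$\chi^2 = \sum \frac{(\mathrm{obs} - \mathrm{exp})^2}{\mathrm{exp}} = \frac{(11 - 8.4)^2}{8.4} + \frac{(3 - 5.6)^2}{5.6} + \frac{(4 - 6.6)^2}{6.6} + \frac{(7 - 4.4)^2}{4.4} = 4.57$$

Degrees of freedom of the χ² distribution: (r-1)(c-1), where r and c are the numbers of rows and columns of the table.
$$\chi^2_{0.01}(1) = 6.63 > \chi^2 = 4.57 > \chi^2_{0.05}(1) = 3.84$$
Significant at the 5% level.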
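The arithmetic of the χ² example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `chi_square_table` is an assumption, and the critical values 3.84 and 6.63 are taken from the slide rather than computed.

```python
def chi_square_table(table):
    """Return (chi2, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence: n * p(row) * p(column).
            exp = row_totals[i] * col_totals[j] / n
            chi2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: marker genotypes AA, aa. Columns: phenotype classes R, S.
observed = [[11, 4],
            [3, 7]]
chi2, df = chi_square_table(observed)
print(round(chi2, 2), df)   # 4.57 1

# Critical values for df = 1 as quoted on the slide.
print(chi2 > 3.84)          # True: significant at the 5% level
print(chi2 > 6.63)          # False: not significant at the 1% level
```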
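:
[Figure: phenotype distributions for the QTL genotypes QQ and qq (means m_QQ, m_qq) and for the marker genotypes AA and aa (means m_AA, m_aa). Panel labels: QTL genotype; marker genotype; latent mixture distribution; phenotype distribution by marker genotype.]

Counts of individuals by marker genotype and QTL genotype:

          QQ    qq
AA        40    10
aa        10    40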
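$$y_i = \mu + \beta_j x_{ij} + e_i$$
[Figure: phenotype (y) plotted against marker genotype (x); x-axis: Marker genotype, y-axis: Phenotype]
Genotype coding for individual i: AA → $x_i = 2$, aa → $x_i = 0$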
xi yi
2 D C N
83
0 2 18
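[Figure: scatter plot of y against x with the fitted line; the fitted value $\hat{y}_i = \alpha + \beta x_i$ and the residual $\varepsilon_i$ are marked at $x_i$]
$$y_i = \alpha + \beta x_i + \varepsilon_i = \hat{y}_i + \varepsilon_i$$
y: dependent (response) variable
x: independent (explanatory) variable (SNP)
β: regression coefficient
ε: residuals
α: intercept, constant term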
The method of least squares
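$$\varepsilon_i = y_i - (\alpha + \beta x_i)$$
$$SSE = \sum_{i}^{n} \varepsilon_i^2 = \sum_{i}^{n} (y_i - \alpha - \beta x_i)^2$$
Setting the partial derivatives of SSE to zero:
$$\frac{\partial SSE}{\partial \beta} = -2 \sum_{i}^{n} (y_i - \alpha - \beta x_i)\, x_i = 0$$
$$\frac{\partial SSE}{\partial \alpha} = -2 \sum_{i}^{n} (y_i - \alpha - \beta x_i) = 0$$
$$a = \frac{\sum_{i}^{n} y_i}{n} - b\,\frac{\sum_{i}^{n} x_i}{n} = \bar{y} - b\bar{x}$$
$$b = \frac{\sum_{i}^{n} x_i y_i - \frac{1}{n}\sum_{i}^{n} x_i \sum_{i}^{n} y_i}{\sum_{i}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i}^{n} x_i\right)^2} = \frac{\sum_{i}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}^{n} (x_i - \bar{x})^2}$$
In matrix form:
$$\begin{pmatrix} n & \sum_{i}^{n} x_i \\ \sum_{i}^{n} x_i & \sum_{i}^{n} x_i^2 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_{i}^{n} y_i \\ \sum_{i}^{n} x_i y_i \end{pmatrix}$$
[Figure: scatter plot of y against x with the fitted line $\hat{y}_i = \alpha + \beta x_i$ and the residual $\varepsilon_i$ at $x_i$]
(a and b are the estimates of α and β)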
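The closed-form least-squares estimates can be sketched in Python as follows. The function name `fit_simple_regression` and the six genotype/phenotype pairs are made-up illustrations (marker genotype coded 0 for aa and 2 for AA, as in the single-marker model); they are not data from the lecture.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates (a, b) for the model y_i = alpha + beta * x_i + e_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    # a = y_bar - b * x_bar
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data: genotype scores (0 = aa, 2 = AA) and phenotypes.
x = [0, 0, 0, 2, 2, 2]
y = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
a, b = fit_simple_regression(x, y)
print(a, b)  # 5.0 1.5

# Residual sum of squares (SSE) at the fitted values.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
print(sse)   # 4.0
```

With a 0/2 genotype coding, `a` is the mean of the aa class and `a + 2 * b` is the mean of the AA class, so `b` is half the difference between the two class means.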
$$y_i = \mu + \beta_j x_{ij} + e_i$$
1311
SNPs
All materials can be downloaded from http://ricediversity.org/
SNPs …
[Figure: Manhattan plot; x-axis: position (bp), y-axis: −log10(p)]
LD
LD
(Rafalski and Morgante 2004; Oraguzie et al. 2007)
:
p
“… suppose that a would-be geneticist set out to study the “trait” of ability to eat with chopsticks in the San Francisco population by performing an association study with the HLA complex. The allele HLA-A1 would turn out to be positively associated with ability to use chopsticks … because the allele HLA-A1 is more common among Asians than Caucasians.”
Lander and Schork (1994)
HLA: Human Leukocyte Antigen
?
1
2
(Modified from Balding 2006)
• Indica
Japonica
$$y_i = \mu + \beta_j x_{ij} + \sum_{k=1}^{K} v_k q_{ik} + e_i$$
1.0
0.0 0.5
Bayesian
Structure: Pritchard et al. (2000) Genetics 155:945–959
4
6 K=6
2
qi1 = 0.78, qi3 = 0.32, 0
[Figure: Manhattan plot; x-axis: position (bp), y-axis: −log10(p)]
…
→
A
• Yu et al. (2006) Nat. Genet. 38: 203
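$$y_i = \mu + \beta_j x_{ij} + \sum_{k=1}^{K} v_k q_{ik} + \alpha_i + e_i$$
$$\boldsymbol{\alpha} \sim N(0, A\sigma_\alpha^2)$$
$$\mathrm{var}(\alpha_i) = a(i, i)\,\sigma_\alpha^2$$
$$\mathrm{cov}(\alpha_i, \alpha_{i'}) = a(i, i')\,\sigma_\alpha^2$$
→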
[Figure: Manhattan plot; x-axis: position (bp), y-axis: −log10(p)]
1SNP
SNP
SNP
A A
B B
B A
B A
SNPs
SNP
A
A B
B A
B
Bayesian
QTL
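$$y_i = \mu + \sum_{j=1}^{J} \beta_j \gamma_j x_{ij} + \sum_{k=1}^{K} v_k q_{ik} + \alpha_i + e_i$$
• Iwata et al. (2007) Theor Appl Genet 114:1437–1449
$$\gamma_j = \begin{cases} 1 & \text{(the $j$-th SNP is included in the model)} \\ 0 & \text{(the $j$-th SNP is not included in the model)} \end{cases}$$
$$\gamma_j \sim B(1, p_j)$$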
[Figure: posterior probability of QTL along the genome; x-axis: position (bp), y-axis: Posterior prob. of QTL]
Bayesian
→
γ j
…
Chr. 3
GS3
63 kb
id3008333
546 kb
id3008127
Chr. 5
qSW5
id5002699
1.0 0.95
66 kb 0.88
[Figure: posterior probability of QTL along the genome; x-axis: position (bp), y-axis: Posterior prob. of QTL]
SNPs
Q, K
• Q
– Structure (Pritchard et al. 2000)
– (Price et al. 2006)
– (Zhu and Yu 2009)
• K:
– (Loiselle et al. 1995; Ritland et al. 1996)
– (Zhao et al. 2007)
–
GWAS
Atwell et al. (2010) Nature 465: 627
→ a
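Outcomes of multiple tests (counts A to D):

                          Not significant    Significant
Null hypothesis true      A                  B
Null hypothesis false     C                  D

false positive rate: FPR = B / (A + B)
false negative rate: FNR = C / (C + D)
false discovery rate: FDR = B / (B + D)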
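• FDR: false discovery rate
1. Compute the P values of the K tests.
2. Sort the P values in ascending order: $P_{(1)} \le P_{(2)} \le \dots \le P_{(K)}$
3. For the FDR level $q^*$ (e.g., 5%), find the largest $i$ that satisfies $P_{(i)} \le \frac{i}{K} q^*$
4. Reject the hypotheses with the 1st to the $i$-th smallest P values.
Benjamini and Hochberg (1995)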
• p
• K P
p K × P
•
$$P_{(i)} \le \frac{i\,q^*}{K} \iff q^* \ge \frac{K P_{(i)}}{i}$$
• q* i
K × P
• q* 5%
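The Benjamini and Hochberg (1995) step-up rule can be sketched in a few lines of Python. The function name `benjamini_hochberg` and the ten P values are hypothetical and serve only to illustrate the rule $P_{(i)} \le i q^* / K$.

```python
def benjamini_hochberg(p_values, q_star=0.05):
    """Return the (sorted) indices of the tests rejected at FDR level q_star."""
    K = len(p_values)
    # Order the tests by ascending P value: P_(1) <= P_(2) <= ... <= P_(K).
    order = sorted(range(K), key=lambda idx: p_values[idx])
    largest_i = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank i with P_(i) <= i * q_star / K.
        if p_values[idx] <= rank * q_star / K:
            largest_i = rank
    # Reject the hypotheses with the 1st to the largest_i-th smallest P values.
    return sorted(order[:largest_i])


# Hypothetical P values from K = 10 tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q_star=0.05))  # [0, 1]

# For comparison: a Bonferroni-style threshold of 0.05 / K applied to each test.
print([i for i, pv in enumerate(p) if pv <= 0.05 / len(p)])  # [0]
```

In this made-up example the FDR rule rejects two hypotheses while the stricter per-test threshold rejects only one.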
GWAS
• GWAS Hd6
Yano et al. (2016) Nature Genetics 48: 927
Yang et al. (2003) Am. J. Hum. Genet 73: 627
• ACTN3 α-actinin-3
• X α-actinin-3
• α-actinin-3 α-actinin-2 ACTN3
126,559 GWAS
• Fig. 1 3 SNPs
(Rietveld et al. 2014, PNAS 13790; Ward et al. 2014, PLoS ONE e100248)
• R2 0.02%
• Fig. 2
Rietveld et al. (2013) Science 340: 1467
•
•
•
•
•
• DNA
QTL
• 2050
1.7
•
2050 90 …
Tester M, Langridge P (2010) Science 327: 818
•
1975
/ 292
1982
1984
Meuwissen et al. (2001) Genetics 157:1819
DNA 5
…
Garcia-Ruiz et al.
Proc Natl Acad Sci 113(28): E3995-4004
“The most dramatic response to genomic selection was observed for the lowly heritable traits DPR, PL, and SCS. Genetic trends changed from close to zero to large and favorable, resulting in rapid genetic improvement in fertility, lifespan, and health in a breed where these traits eroded over time.”
(
, ,
)
Watanabe et al. (2005) Ann Bot. 95:1131
…
DNA
DNA
DNA
LD
GS
DNA
yi: i
xij: i j
[Diagram labels: $y$; $x_1, x_2, x_3, x_4, \dots, x_K$]
$$y = f(x_1, x_2, \dots, x_K)$$
(y), DNA ($x_i$)
$f(x)$
$f(\cdot)$
y, DNA $x_1, \dots, x_K$
vs.
Watanabe et al. (2005) Ann Bot. 95:1131
DNA
DNA
$$y_i = \mu + \sum_{j=0}^{N} \beta_j x_{ij} + e_i$$
...
DNA
vs.
$$y_i = \mu + \sum_{j=0}^{N} \beta_j x_{ij} + e_i$$
vs.
$$y_i = \mu + \sum_{j=0}^{N} \beta_j x_{ij} + e_i$$
http://www.soran.net/index_html/A0084070.htm
1 3
F1
6
GS GS
GS GS
3
Marker-assisted selection: MAS
…
×
↓
QTL QTL
MAS
• ≠
– QTL
– QTL QTL
• QTL
– QTL
– QTL
GS
• QTL
→ QTL
x
y = f (x)
\\\x
y
273 142
304 423
173 373 234 138 203133 223
y = f (x)
GWAS GS
GS model
GWAS model
“large p small n” problem
p >> n
x, y 1
↓
↓
60K …
…
$$y_i = \mu + \sum_{j=0}^{J} \beta_j x_{ij} + e_i$$
PRESS
Se
GS
[Figure: the same data (x, y) fitted with two models]
$$y = b_0 + b_1 x$$
$$y = b_0 + \sum_{k=1}^{7} b_k x^k$$
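(n-fold cross-validation)
1. Divide the data into n subsets.
2. Build a model from the data without the i-th subset.
3. Predict the i-th subset with the model built in step 2.
4. Repeat steps 2 and 3 n times.
5. Evaluate the prediction accuracy from the observed and predicted values (e.g., r², PRESS).
When n equals the number of samples: leave-one-out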
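The cross-validation steps above can be sketched as follows in Python. This is an illustration under stated assumptions: a single-predictor least-squares line stands in for the prediction model, folds are assigned by sample index, PRESS is computed as the sum of squared differences between observed and predicted values, and the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) of the line y = a + b * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def cross_validate(x, y, n_folds):
    """Return (PRESS, predictions) from n-fold cross-validation."""
    fold_of = [i % n_folds for i in range(len(x))]       # step 1: split into n subsets
    y_hat = [None] * len(x)
    for fold in range(n_folds):                          # step 4: repeat n times
        train = [i for i in range(len(x)) if fold_of[i] != fold]
        a, b = fit_line([x[i] for i in train],           # step 2: fit without this subset
                        [y[i] for i in train])
        for i in range(len(x)):
            if fold_of[i] == fold:
                y_hat[i] = a + b * x[i]                  # step 3: predict the held-out subset
    # step 5: PRESS = sum of squared prediction errors on held-out samples
    press = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return press, y_hat


# Hypothetical data, roughly y = 1 + 2x with noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]
press_3fold, _ = cross_validate(x, y, n_folds=3)
press_loo, _ = cross_validate(x, y, n_folds=len(x))      # leave-one-out
print(press_3fold, press_loo)
```

Because every prediction is made for a sample that was not used to fit the model, PRESS penalizes overfitted models that a residual sum of squares on the training data would favor.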
[Figure labels: observed $y_i$ and predicted $\hat{y}_i$]
[Figure: GWAS result; x-axis: Position (bp), y-axis: −log10(p)]
SNPs: $x_{212}, x_{468}, x_{985}, x_{276}, x_{647}$
S: set of SNPs
$$y_i = \sum_{j \in S} x_{ij} b_j + e_i$$
[Figure: observed (y.obs) vs. predicted (y.pred) values, MLR, r² = 0.36]
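$$\underset{\mathbf{b}}{\arg\min}\,(E) = \underset{\mathbf{b}}{\arg\min} \left[ \sum_{i} (y_i - \mathbf{x}_i^T \mathbf{b})^2 + \lambda \|\mathbf{b}\|_2^2 \right]$$
$$y_i = \sum_{j}^{J} x_{ij} b_j + e_i = \mathbf{x}_i^T \mathbf{b} + e_i$$
$$\|\mathbf{b}\|_2^2 = \sum_{j}^{J} b_j^2$$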
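For a squared (L2) penalty, the penalized least-squares criterion has the closed-form minimizer $\mathbf{b} = (X^T X + \lambda I)^{-1} X^T \mathbf{y}$. The sketch below, in plain Python, solves that linear system with a small Gaussian elimination routine; the helper names `solve_linear_system` and `ridge`, the toy design matrix with more markers than samples, and the value of λ are all illustrative assumptions.

```python
def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ridge(X, y, lam):
    """Minimize sum_i (y_i - x_i^T b)^2 + lam * sum_j b_j^2."""
    p = len(X[0])
    XtX = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    for j in range(p):
        XtX[j][j] += lam                           # X^T X + lam * I
    return solve_linear_system(XtX, Xty)


# Toy example with more markers (3) than samples (2): X^T X alone is singular,
# but adding lam * I makes the system solvable.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]
b = ridge(X, y, lam=1.0)
print([round(bj, 3) for bj in b])  # [0.125, 0.625, 0.75]
```

The example shows why such a penalty is useful in the p >> n setting: ordinary least squares has no unique solution here, whereas the penalized estimate exists for any λ > 0.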
[Figure: observed (y.obs) vs. predicted (y.pred) values, r² = 0.50; Δr² = +0.14 relative to MLR]
LASSO
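$$\underset{\mathbf{b}}{\arg\min} \left[ \sum_{i} (y_i - \mathbf{x}_i^T \mathbf{b})^2 + \lambda \|\mathbf{b}\|_1 \right]$$
$$y_i = \sum_{j}^{J} x_{ij} b_j + e_i = \mathbf{x}_i^T \mathbf{b} + e_i$$
LASSO:
$$\|\mathbf{b}\|_1 = \sum_{j}^{J} |b_j|$$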
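Unlike a squared penalty, the L1-penalized criterion has no closed-form solution in general. One common way to minimize it is cyclic coordinate descent with soft-thresholding, sketched below in plain Python. The algorithm choice, the function names, and the toy data are assumptions for illustration; the lecture does not state which solver was used.

```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t; values within [-t, t] become exactly 0."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize sum_i (y_i - x_i^T b)^2 + lam * sum_j |b_j| by coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # sum_i x_ij * (partial residual without marker j)
            z = 0.0     # sum_i x_ij^2
            for i in range(n):
                partial = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            # The one-dimensional problem z*b^2 - 2*rho*b + lam*|b| is solved by
            # soft-thresholding rho at lam / 2.
            b[j] = soft_threshold(rho, lam / 2) / z if z > 0 else 0.0
    return b


# Toy orthogonal design: the small effect is set exactly to zero,
# the large effect is shrunk by lam / 2.
X = [[1.0, 0.0],
     [0.0, 1.0]]
y = [3.0, 0.4]
print(lasso_coordinate_descent(X, y, lam=1.0))  # [2.5, 0.0]
```

The exact zero for the second coefficient illustrates the variable-selection behavior of the L1 penalty, in contrast to a squared penalty, which shrinks coefficients without setting them to zero.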
[Figure: plot with axes x and y]
[Figure: observed (y.obs) vs. predicted (y.pred) values, LASSO, r² = 0.54; Δr² = +0.18 relative to MLR]
ridge
[Figure: estimated coefficients along the genome; panels: Ridge, LASSO; x-axis: Position (bp), y-axis: Coefficients]
Ridge, LASSO
Ridge LASSO
0 shrink
GS3
LASSO
GS3
↓
Ridge polygene LASSO
QTL
←
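Elastic net
$$\underset{\mathbf{b}}{\arg\min} \left[ \sum_{i} (y_i - \mathbf{x}_i^T \mathbf{b})^2 + \lambda \left( \frac{1 - \alpha}{2} \|\mathbf{b}\|_2^2 + \alpha \|\mathbf{b}\|_1 \right) \right]$$
$$y_i = \sum_{j}^{J} x_{ij} b_j + e_i = \mathbf{x}_i^T \mathbf{b} + e_i$$
$$\|\mathbf{b}\|_1 = \sum_{j}^{J} |b_j|$$
$$\|\mathbf{b}\|_2^2 = \sum_{j}^{J} b_j^2$$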