Microarray Basic Analysis I


Visit the web site for related materials:

http://lucas.genome-analyst.org/activities-articles/course/nrpb2011q4

Microarray Basic Analysis

Hands-On from Your Very First Analysis

董建億


What is the microarray technique?

Principle of microarrays

Technology platforms

– Spotted arrays

  – Pre-synthesized probes (glass slide)

– In situ synthesis

  – Ink-jet technology (e.g. Agilent system)

  – Photolithographic technology (e.g. Affymetrix system)

Quality Check before Microarray experiments

Sample quality check

Experiment Design

Where does experimental variation come from?

– Signal: fat or not (the factor being studied)

– Noise: batch, gender, age, …

Systematic errors

– Technical replicates

– Biological replicates

– How many replicates?

Good Fat Normal

Batch Age Batch Age

1 1 6

2 2 9

3 3 18

4 1 15

5 2 8

6 3 10
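One rough way to approach "How many replicates?" is the standard normal-approximation formula for comparing two group means. The sketch below is an illustration under assumptions, not a rule taken from these slides. The helper name is hypothetical. sigma, the per-gene standard deviation on the log2 scale, has to come from pilot or published data. Because the formula ignores the t distribution, it slightly underestimates n for very small groups.

```python
import math
from statistics import NormalDist

def replicates_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate biological replicates per group (hypothetical helper).

    delta: smallest difference in mean log2 expression worth detecting (1.0 = two-fold)
    sigma: assumed per-gene standard deviation of log2 expression
    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_power = z.inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)

print(replicates_per_group(delta=1.0, sigma=0.5))               # 4
print(replicates_per_group(delta=1.0, sigma=0.5, alpha=0.001))  # 9
```

Thousands of genes are tested at once, so the per-gene alpha is usually much smaller than 0.05. In this example that raises the answer from 4 to 9 replicates per group.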


General Analysis workflow

Data preprocessing

Normalization

Signal adjustment (flag, flooring…)

Probe summarization

Signal transform (fold change, log transform)

Data QC/QA

QC index (Hybridization control)

Visualization (PCA, MDS, box plot, …)

Find differentially expressed genes

Statistical analysis

p-Value correction

Biological interpretation

Gene annotation

Gene set analysis

Gene network analysis


Data preprocessing

Normalization

Dual color system

Single color system

Dual color system

– Probe flag

– R/G Normalization

– Signal flooring

– Convert to signal ratio

– Log Transform
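The dual-color steps above can be sketched for a single array in Python. This is a minimal illustration with several assumptions. The function name and the floor of 20 intensity units are hypothetical. Flags are taken as a ready-made True/False list from the scanner software. R/G normalization is done by simple global median scaling; many analyses use intensity-dependent (loess) normalization instead.

```python
import math
from statistics import median

def dual_color_log_ratios(red, green, flagged, floor=20.0):
    """Hypothetical two-color preprocessing; returns log2(R/G) per kept spot.

    red, green: background-subtracted intensities of the two channels per spot
    flagged:    True for spots marked unreliable (saturated, not found, ...)
    floor:      assumed minimum intensity (arbitrary illustrative value)
    """
    # 1. Probe flag: drop flagged spots
    kept = [(r, g) for r, g, bad in zip(red, green, flagged) if not bad]
    # 2. R/G normalization: scale red so both channels share the same median
    scale = median(g for _, g in kept) / median(r for r, _ in kept)
    kept = [(r * scale, g) for r, g in kept]
    # 3. Signal flooring: raise very low (or negative) signals to the floor
    kept = [(max(r, floor), max(g, floor)) for r, g in kept]
    # 4-5. Convert to ratio, then log2 transform
    return [math.log2(r / g) for r, g in kept]

ratios = dual_color_log_ratios(
    red=[200, 400, 50, 1000, 10],
    green=[100, 200, 100, 500, 30],
    flagged=[False, False, True, False, False],
)
print([round(x, 3) for x in ratios])   # [0.0, 0.0, 0.0, -0.585]
```

Flooring before taking the ratio keeps weak or negative spots from producing huge or undefined log ratios.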

Single color system

– Background correction

– Summarization

– Normalization

– Log Transform

– Signal flooring
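For the single-color list, the normalization, log-transform, and flooring steps can be sketched as follows. Background correction and summarization are assumed to be done already. Quantile normalization is used here as one common choice of normalization, which the slide does not name. It is applied in the order listed, after summarization; RMA, by contrast, applies it to probe-level values. Function names and the floor value (log2 = 1.0) are hypothetical, and ties are broken by position for simplicity.

```python
import math

def quantile_normalize(arrays):
    """Give every array the same intensity distribution.

    arrays: list of arrays, each a list of intensities for the same genes/probes.
    """
    n = len(arrays[0])
    orders = [sorted(range(n), key=lambda i: a[i]) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays
    reference = [
        sum(a[order[k]] for a, order in zip(arrays, orders)) / len(arrays)
        for k in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for k, i in enumerate(order):
            out[i] = reference[k]   # k-th smallest value gets k-th reference value
        normalized.append(out)
    return normalized

def log2_with_floor(arrays, floor_log2=1.0):
    """Log2 transform (values assumed > 0), then floor on the log scale."""
    return [[max(math.log2(x), floor_log2) for x in a] for a in arrays]

normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
print(normalized)   # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
print(log2_with_floor(normalized))
```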


Signal adjustment (flag, flooring…)

Probe summarization

Signal transform (fold change, log transform)
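Probe summarization collapses the several probes of one probe set into a single value per array. A widely used method is Tukey's median polish, which RMA applies to log2 probe intensities. The sketch below is simplified: the function name and the fixed number of iterations are assumptions, whereas R's `medpolish` also stops early once the residuals stop changing. A fold change then becomes a difference on the log2 scale.

```python
from statistics import median

def median_polish(probes, n_iter=10):
    """Tukey median polish for one probe set (simplified sketch).

    probes: rows = probes, columns = arrays, values = log2 intensities.
    Fits value = overall + probe effect + array effect + residual and
    returns overall + array effect, one summarized log2 value per array.
    """
    resid = [list(row) for row in probes]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep out row (probe) medians
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep out column (array) medians
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
    return [overall + c for c in col_eff]

# Hypothetical probe set: 3 probes x 3 arrays (log2 scale)
expr = median_polish([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
print(expr)                      # [2.0, 3.0, 4.0]
log2_fc = expr[2] - expr[0]      # array 3 vs array 1 on the log2 scale
print(log2_fc, 2 ** log2_fc)     # 2.0 4.0
```

Using medians rather than means makes the summary robust to a single outlying probe.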


Microarray data quality control and assessment

Did the hybridization reaction work?

Are the signals comparable across arrays?

Are the expected variations observed?
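A box plot per array is one common way to ask whether signals are comparable. The sketch computes the quartiles a box plot would draw for each array. It then flags any array whose median sits far from the median of all array medians. The function name and the tolerance of 1.0 log2 unit are hypothetical choices, not values from the slides.

```python
from statistics import median, quantiles

def boxplot_check(arrays, max_median_shift=1.0):
    """Box-plot quartiles per array plus a crude comparability check.

    arrays: dict mapping array name -> list of log2 intensities (>= 2 values each)
    max_median_shift: hypothetical tolerance in log2 units
    """
    stats = {name: tuple(quantiles(values, n=4)) for name, values in arrays.items()}
    center = median(q2 for _, q2, _ in stats.values())   # median of the medians
    suspicious = [name for name, (_, q2, _) in stats.items()
                  if abs(q2 - center) > max_median_shift]
    return stats, suspicious

stats, suspicious = boxplot_check({
    "A": [6, 7, 8, 9, 10],
    "B": [6.2, 7.1, 8.1, 9.0, 10.2],
    "C": [9, 10, 11, 12, 13],
})
print(suspicious)   # ['C']
```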

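Distance measurement

– Euclidean distance

$$d_{E}(A,B) = \sqrt{\sum_{i=1}^{6} \left(X_{iA} - X_{iB}\right)^{2}}$$

– Manhattan distance

$$d_{M}(A,B) = \sum_{i=1}^{6} \left|X_{iA} - X_{iB}\right|$$

– Pearson correlation

$$r_{XY} = \frac{\sum_{i=1}^{6} (X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\, S_X S_Y}$$

Here X_{iA} and X_{iB} are the i-th of six measured values for objects A and B, n = 6, and S_X and S_Y are the sample standard deviations of X and Y.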
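The three measures can be written directly from the formulas. The sketch uses two hypothetical six-value profiles. B has the same shape as A at twice the level, so the two distances are large while the correlation is 1. A correlation-based distance such as 1 - r is therefore often preferred when clustering genes by expression pattern rather than by level.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson(x, y):
    """Pearson correlation written term by term as in the formula."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / (n - 1))   # sample SD of X
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / (n - 1))   # sample SD of Y
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)

A = [1, 2, 3, 4, 5, 6]        # hypothetical profile
B = [2, 4, 6, 8, 10, 12]      # same shape, twice the level
print(euclidean(A, B), manhattan(A, B), pearson(A, B))
```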


Distance matrix

Hierarchical clustering

Dimension reduction

Principal component analysis (PCA) or metric multidimensional scaling (MDS)
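For the dimension-reduction step, PCA of an arrays-by-genes matrix can be computed with a singular value decomposition of the gene-centered data. The sketch assumes NumPy is installed. The function name and the four-array example are hypothetical. Classical (metric) MDS on Euclidean distances yields the same sample configuration as these PCA scores, up to sign.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA of a samples x genes matrix via SVD.

    X: one row per array (sample), one column per gene (log2 values).
    Returns sample coordinates on the first principal components and the
    fraction of total variance explained by each of them.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                            # center each gene
    U, S, _vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]    # sample coordinates
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]

# Two hypothetical groups of two arrays each, three genes
X = [[0.0, 0.0, 0.0],
     [0.1, 0.0, 0.0],
     [5.0, 5.0, 5.0],
     [5.1, 5.0, 5.0]]
scores, explained = pca_scores(X)
print(scores[:, 0])   # the two groups fall on opposite sides of PC1
print(explained)
```

For the hierarchical clustering item, SciPy's `scipy.cluster.hierarchy.linkage` is one commonly used implementation.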


Find differentially expressed genes

Statistics

Two samples (ratio)

One-class samples

Two-class samples

  – t-test (Welch's t-test)

  – limma

Multiple-class samples

  – One-way ANOVA

  – limma

Paired comparison

  – Paired t-test

  – Two-way ANOVA

Multi-factor analysis (n-way ANOVA)

Time course
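For the two-class case, Welch's t-test for one gene compares the group means without assuming equal variances. Its degrees of freedom come from the Welch-Satterthwaite formula. The sketch computes only the statistic and the degrees of freedom, on hypothetical data. If SciPy is available, `scipy.stats.ttest_ind(x, y, equal_var=False)` runs the same test and also returns the p-value.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t-statistic and Welch-Satterthwaite df for one gene.

    x, y: log2 expression values of the gene in group 1 and group 2.
    """
    se2_x = variance(x) / len(x)    # squared standard error, group 1
    se2_y = variance(y) / len(y)    # squared standard error, group 2
    t = (mean(x) - mean(y)) / math.sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1))
    return t, df

t, df = welch_t([8.1, 8.3, 8.2], [7.0, 7.4, 7.2])
print(round(t, 3), round(df, 3))   # 7.746 2.941
```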


Comparison of Statistics

| Test | Variance modeling | Reference | R package |
|---|---|---|---|
| Welch's t-test | Fixed variance; heteroscedasticity | Welch | CIT (internal) |
| ANOVA | Fixed variance; homoscedasticity | Fisher | CIT (internal) |
| Wilcoxon | Non-parametric | Wilcoxon | stats |
| SAM | Non-parametric | Tusher et al. 2001 | samr |
| RVM | Inverse gamma distribution on the variance (estimated from the whole data set) | Wright & Simon 2003 | CIT (internal) |
| limma | Moderated t-test: the usual variance is replaced by a conditional variance (Bayesian approach) | Smyth 2004 | limma |
| VarMixt | Gamma mixture model on the variance | Delmar et al. 2005 | varmixt |
| SMVar | Mixed model (fixed condition effect, random gene effect) | Jaffrézic et al. 2007 | SMVar |

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2933223/?tool=pubmed

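| Test | Type I error, small n | Type I error, large n | Power, small n | Power, large n | Stability | Ease of use | Calculation time |
|---|---|---|---|---|---|---|---|
| t-test | + | +++ | + | +++ | +++ | +++ | +++ |
| ANOVA | +++ | +++ | + | +++ | +++ | +++ | +++ |
| Wilcoxon | + | + | + | ++ | ++ | +++ | ++ |
| SAM | +++ | +++ | + | ++ | ++ | ++ | ++ |
| RVM | + | + | +++ | +++ | ++ | + | + |
| limma | +++ | +++ | +++ | +++ | +++ | ++ | +++ |
| VarMixt | +++ | +++ | +++ | +++ | + | + | + |
| SMVar | + | + | ++ | +++ | + | ++ | +++ |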
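The limma row describes a moderated t-test, in which each gene's variance is replaced by a conditional (posterior) variance. The core formula (Smyth 2004) shrinks the gene's pooled variance s² toward a prior value s0², weighting the prior by d0 prior degrees of freedom. In limma, s0² and d0 are estimated from all genes by empirical Bayes (`eBayes`). In this sketch they are simply passed in as assumed numbers, so it illustrates the formula rather than replacing the package. With d0 = 0 it reduces to the ordinary pooled two-sample t-test.

```python
import math
from statistics import mean, variance

def moderated_t(x, y, s0_sq, d0):
    """Two-group moderated t-statistic in the style of limma (sketch).

    s0_sq, d0: prior variance and prior degrees of freedom (assumed given here;
               limma estimates them from all genes by empirical Bayes).
    """
    n1, n2 = len(x), len(y)
    d = n1 + n2 - 2                                                # residual df
    s_sq = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / d   # pooled variance
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)                # shrink toward prior
    t = (mean(x) - mean(y)) / math.sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
    return t, d0 + d                                               # statistic and df

x, y = [8.1, 8.3, 8.2], [7.0, 7.4, 7.2]   # hypothetical log2 values for one gene
print(moderated_t(x, y, s0_sq=0.1, d0=0))  # ordinary pooled t
print(moderated_t(x, y, s0_sq=0.1, d0=4))  # smaller t, more df
```

Shrinkage pulls down the t-statistic of genes whose small-sample variance looks implausibly small. It also adds prior degrees of freedom, which is why limma stays stable with few replicates.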


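p-Value Correction

Multiple test correction (MTC)

α_e: experiment-wise significance level

α_c: comparison-wise significance level

Bonferroni correction

When K tests are each run at level α_c, the expected number of type I errors is K × α_c, and the experiment-wise level satisfies

$$\alpha_e \le K\,\alpha_c \quad\Longrightarrow\quad \alpha_c = \frac{\alpha_e}{K}$$

Corrected p = min(1, p × K)

Dunn-Šidák correction (assumes the K tests are independent)

$$\alpha_e = 1 - (1 - \alpha_c)^{K} \quad\Longleftrightarrow\quad \alpha_c = 1 - (1 - \alpha_e)^{1/K}$$

Corrected p = 1 - (1 - p)^K

Bonferroni step-down (Holm)

Sort the p-values in ascending order. The 1st corrected p-value is the smallest p-value × n (n = number of genes), the 2nd is the second-smallest p-value × (n - 1), and so on. Each corrected p-value is then raised, if needed, to at least the previous one so that the order is preserved, and capped at 1.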
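The three corrections can be sketched as functions on a list of raw p-values, one per gene. The function names are hypothetical. The Šidák version assumes independent tests. The Holm version includes the running maximum that keeps adjusted p-values in the same order as the raw ones. R's `p.adjust` offers the Bonferroni and Holm methods if you prefer an established implementation.

```python
def bonferroni(pvals):
    k = len(pvals)
    return [min(1.0, p * k) for p in pvals]

def sidak(pvals):
    # Dunn-Sidak: assumes the k tests are independent
    k = len(pvals)
    return [1 - (1 - p) ** k for p in pvals]

def holm(pvals):
    # Bonferroni step-down: smallest p times n, next times n - 1, ...
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, pvals[i] * (n - rank))
        running_max = max(running_max, value)   # keep adjusted p-values monotone
        adjusted[i] = running_max
    return adjusted

p = [0.01, 0.04, 0.03, 0.005]   # hypothetical raw p-values for four genes
print(bonferroni(p))
print(sidak(p))
print(holm(p))
```

Holm is never more conservative than Bonferroni. In this example, the gene with p = 0.01 gets 0.04 from Bonferroni but 0.03 from Holm.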
