Visit the web site for related materials:
http://lucas.genome-analyst.org/activities-articles/course/nrpb2011q4
Basic Microarray Analysis
Getting It Right on Your First Analysis
董建億
What Is Microarray Technology?
Principle of microarrays
Technology platforms
Spotted arrays
Pre-synthesized probes (glass slide)
In situ synthesis
Ink-jet technology (e.g. Agilent system)
Photolithographic technology (e.g. Affymetrix system)
Quality Check before Microarray experiments
Sample quality check
Experiment Design
Where does experimental variation come from?
Signal: fat or not
Noise: batch, gender, age, ...
Systematic errors
Technical replicates
Biological replicates
How many replicates?
Good Fat Normal
Batch Age Batch Age
1 1 6
2 2 9
3 3 18
4 1 15
5 2 8
6 3 10
General Analysis workflow
Data preprocessing
Normalization
Signal adjustment (flag, flooring…)
Probe summarization
Signal transform (fold change, log transform)
Data QC/QA
QC index (Hybridization control)
Visualization (PCA, MDS, Box plot..)
Find differentially expressed genes
Statistical analysis
p-Value correction
Biological interpretation
Gene annotation
Gene set analysis
Gene network analysis
Data preprocessing
Normalization
Dual color system
Single color system
Dual color system
– Probe flag
– R/G Normalization
– Signal flooring
– Convert to signal ratio.
– Log Transform
Single color system
– Background correction
– Summarization
– Normalization
– Log Transform
– Signal flooring
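The two preprocessing paths above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the floor value of 1.0, and the choice of quantile normalization for the single-color path are not from the slides, which only say "Normalization". Probe flagging, R/G normalization, background correction, and summarization are left out.

```python
import math

def log_ratio(red, green, floor=1.0):
    """Dual color: floor both channels, convert to an R/G ratio, then log2.

    The floor of 1.0 is an illustrative assumption, not a recommended value.
    """
    r = max(red, floor)
    g = max(green, floor)
    return math.log2(r / g)

def quantile_normalize(arrays):
    """Single color: give every array the same intensity distribution.

    arrays: list of equal-length lists, one list of probe signals per array.
    Ties are broken by probe order (a simplification).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean signal at each rank, averaged across arrays.
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result
```

For example, `log_ratio(400, 100)` gives a log2 ratio of 2.0, i.e. a four-fold higher red signal.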
Signal adjustment (flag, flooring…)
Probe summarization
Signal transform (fold change, log transform)
Microarray data quality control and assessment
Is the hybridization reaction OK?
Are the signals comparable?
Are the expected variations observed?
Distance measurement
Euclidean
$$\sqrt{\sum_{i=1}^{6} (X_{iA} - X_{iB})^2}$$
Manhattan
$$\sum_{i=1}^{6} |X_{iA} - X_{iB}|$$
Pearson Correlation
$$\frac{\sum_{i=1}^{6} (X_i - \bar{X})(Y_i - \bar{Y})}{(n-1) S_X S_Y}$$
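As a rough illustration, the three measures can be computed for two samples A and B, each a list of signals for the same probes. The function names are hypothetical, and the standard deviations use the n-1 denominator to match the (n-1) S_X S_Y term in the Pearson formula.

```python
import math

def euclidean(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson(x, y):
    """Pearson correlation using sample standard deviations (n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
    s_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)
```

Note that a correlation is a similarity, not a distance; a common convention (not stated in the slides) is to use 1 - r when a distance is needed.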
Distance Matrix
Hierarchical Clustering
Dimension reduction
Principal Component Analysis or Metric Multidimensional Scaling
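A minimal sketch of hierarchical clustering on a precomputed sample-to-sample distance matrix is shown below. The slides do not name a linkage rule, so average linkage is assumed here, and the function name and return format are illustrative only.

```python
def average_linkage(dist):
    """Agglomerative clustering with average linkage (an assumed choice).

    dist: symmetric n x n distance matrix as a list of lists.
    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of sample indices.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []

    def between(c1, c2):
        # Mean of all pairwise distances between the two clusters.
        total = sum(dist[i][j] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = between(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        merges.append((clusters[p], clusters[q], d))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return merges
```

In a QC setting, the early merges should pair up replicates of the same condition; samples that join late or with the wrong group are candidates for batch effects or failed hybridizations.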
Find differentially expressed genes
Statistics
Two samples (ratio)
One-class samples
Two-class samples
t-test (Welch's t-test)
limma
Multi-class samples
One-way ANOVA
limma
Paired comparison
Paired t-test
Two-way ANOVA
Multi-factor analysis (n-way ANOVA)
Time course
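For the two-class case, Welch's t statistic for a single gene can be sketched as follows. This is an illustration only: the function name is hypothetical, the inputs are assumed to be log-transformed signals, and turning the statistic into a p-value (which needs the t distribution) is left to a statistics package.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    group_a, group_b: signals of one gene in each class (at least 2 values each).
    Variances are estimated separately, so equal variance is not assumed.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = variance(group_a), variance(group_b)  # sample variances (n - 1)
    se2 = var_a / n_a + var_b / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2)
    df = se2 ** 2 / ((var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1))
    return t, df
```

Running this gene by gene produces one test per probe, which is why the multiple-testing correction later in the workflow matters.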
Comparison of Statistics
Test | Variance modeling | Reference | R package
--- | --- | --- | ---
Welch's t-test | Fixed variance; heteroscedasticity | Welch | CIT (internal)
ANOVA | Fixed variance; homoscedasticity | Fisher | CIT (internal)
Wilcoxon | Non-parametric | Wilcoxon | stats
SAM | Non-parametric | Tusher et al. 2001 | samr
RVM | Inverse gamma distribution on the variance (estimated from the whole data set) | Wright & Simon 2003 | CIT (internal)
limma | Moderated t-test; the usual variance is replaced by a conditional variance; Bayesian approach | Smyth 2004 | limma
VarMixt | Gamma mixture model on the variance | Delmar et al. 2005 | varmixt
SMVar | Mixed model (fixed condition effect and random gene effect) | Jaffrézic et al. 2007 | SMVar
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2933223/?tool=pubmed
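Test | Type I error (small sample) | Type I error (large sample) | Power (small sample) | Power (large sample) | Stability | Ease of use | Calculation time
--- | --- | --- | --- | --- | --- | --- | ---
t-test | + | +++ | + | +++ | +++ | +++ | +++
ANOVA | +++ | +++ | + | +++ | +++ | +++ | +++
Wilcoxon | + | + | + | ++ | ++ | +++ | ++
SAM | +++ | +++ | + | ++ | ++ | ++ | ++
RVM | + | + | +++ | +++ | ++ | + | +
limma | +++ | +++ | +++ | +++ | +++ | ++ | +++
VarMixt | +++ | +++ | +++ | +++ | + | + | +
SMVar | + | + | ++ | +++ | + | ++ | +++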
p-Value Correction
Multiple test correction (MTC)
α_e: experiment-wise significance level
α_c: comparison-wise significance level
Bonferroni correction
K independent tests, each at the same level α_c, so the expected overall number of type I errors = K x α_c
$\alpha_e = K \alpha_c \Rightarrow \alpha_c = \alpha_e / K$
Corrected p = p x K
Dunn-Šidák
$\alpha_e = 1 - (1 - \alpha_c)^K$
Corrected p = $1 - (1 - p)^K$
Bonferroni step-down (Holm)
Rank the p-values from smallest to largest (n = number of genes):
1st: corrected p-value = p-value x n
2nd: corrected p-value = p-value x (n-1)
...
nth: corrected p-value = p-value x 1
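The three corrections can be sketched in plain Python as below. The function names are hypothetical. Two details are assumptions that go beyond the slide text but follow standard practice: corrected p-values are capped at 1, and the Holm values are made non-decreasing along the ranking. The Dunn-Šidák function uses corrected p = 1 - (1 - p)^K, the form that matches α_e = 1 - (1 - α_c)^K.

```python
def bonferroni(pvals):
    """Corrected p = p x K, capped at 1."""
    k = len(pvals)
    return [min(1.0, p * k) for p in pvals]

def sidak(pvals):
    """Corrected p = 1 - (1 - p)^K."""
    k = len(pvals)
    return [1.0 - (1.0 - p) ** k for p in pvals]

def holm(pvals):
    """Bonferroni step-down: the smallest p is multiplied by n, the next by n-1, ..."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, idx in enumerate(order):
        value = min(1.0, pvals[idx] * (n - rank))
        # Keep corrected values non-decreasing along the ranking (assumed convention).
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted
```

Results are returned in the original gene order, so they can be compared directly against the uncorrected p-values.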