Microarray Analysis You Can Do at Home
• Desktop or laptop PC/Linux/Mac
• Free software
• Free Array Data
Speaker: Dr. 董建億
VYMGC Microarray Core lab
Workflow
• Data preprocessing
– Normalization
– Probe summarization
– Signal adjustment (flag, flooring)
– Signal transform (fold change, log transform)
• Data QC/QA
– QC index (Hybridization control)
– Similarity analysis (Clustering, PCA, MDS)
• Find differentially expressed genes
– Statistical analysis
– Set cut-off
• Biological interpretation
– Gene annotation
– Gene set analysis
– Network analysis
System requirements
• Hardware:
– CPU: Pentium 4 (P4) or better
– 4 GB RAM with a 64-bit OS is strongly recommended!
– With great patience, 2 GB RAM with a 32-bit OS is OK.
• Software:
– R GUI environment
• http://cran.csie.ntu.edu.tw/
– MultiExperiment Viewer (MeV)
• http://sourceforge.net/projects/mev-tm4/files/latest/download
MultiExperiment Viewer
MeV is a desktop application for the analysis, visualization and data-mining
of large-scale genomic data. (http://www.tm4.org/mev/)
The R Project for Statistical Computing
R is a free software environment for statistical computing and
graphics. It compiles and runs on a wide variety of UNIX
platforms, Windows and MacOS. (http://www.r-project.org/)
R language
Installation
• Download MeV (4.8) and unzip it to your hard drive
• Install the Java 3D API (32-bit) for the PCA 3D plot
• Download EazyKit.zip and unzip it into the R folder of MeV
• Run RunMeFirst.bat
• Run RGUI
• library(lucasLazyPack)
• lz.GUI()
Data source
• NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/)
Demo samples
GSE12211
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12211
Normalized Data Matrix
Raw Data
Workflows
• Data sources: NCBI and ArrayExpress
• Input files
– .CEL files (Affymetrix)
– .txt files (Agilent, Illumina, GPR): probe signal table
• Steps (performed by R script or by MeV)
– Raw data (.CEL)
– Normalization (RMA)
– Hybridization QC report
– Data QC/QA (box plot, MDS, PCA)
– Statistical analysis (limma, ANOVA)
– Clustering (HCL, SOM, ...)
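RMA, the normalization named in this workflow, combines background correction, quantile normalization and probe summarization. The sketch below shows only the quantile normalization idea in base R. It is not the code that lucasLazyPack runs; the function name `quantile_normalize` and the small signal table are hypothetical, and missing values are assumed absent.

```r
# Quantile normalization: force every array (column) to share the same
# empirical distribution. Assumes the matrix has no missing values.
quantile_normalize <- function(x) {
  # Rank of each probe within its own array (ties broken by position)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Sort each array, then average the k-th smallest values across arrays
  sorted <- apply(x, 2, sort)
  target <- rowMeans(sorted)
  # Replace each value by the target value that has the same rank
  normalized <- apply(ranks, 2, function(r) target[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Hypothetical probe signal table: 4 probes x 3 arrays (filled column by column)
signal <- matrix(c(5, 2, 3, 4,
                   4, 1, 4, 2,
                   3, 4, 6, 8),
                 nrow = 4,
                 dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))

normalized <- quantile_normalize(signal)
print(normalized)
```

After this step every array contains the same set of values, only assigned to different probes, which is why box plots of normalized arrays line up.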
lucasLazyPack
Download:
• https://sites.google.com/site/lucastproject/tools/lucaslazypack
Install from EasyKit folder:
• Double-click RunMeFirst.bat
Install from RGUI:
• install.packages(file.choose(), repos = NULL), then select the downloaded zip file
Usage:
• In the EasyKit folder, click RunGUI.bat
• library(lucasLazyPack)
• lz.GUI()
Data preprocessing
• Normalization
1. Load CEL files
2. Select normalization algorithms
3. Show QC report
4. Save result
• Signal flooring
1. Load data file
2. Select flooring level
3. Save result
• Log2 transform
• Sample Reorder
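The flooring and log2 steps above can be sketched in base R as follows. This is a minimal illustration, not the lucasLazyPack implementation; the signal values and the flooring level of 16 are assumptions chosen only for the example.

```r
# Hypothetical normalized signals on the linear scale (filled column by column)
signal <- matrix(c(12, 250, 3,
                   40, 1024, 8),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("control", "treated")))

floor_level <- 16  # assumed flooring level; pick one suited to your platform

# Signal flooring: raise every value below the floor up to the floor
floored <- pmax(signal, floor_level)

# Log2 transform
log2_signal <- log2(floored)

# On the log2 scale, the fold change is a simple difference
log2_fc <- log2_signal[, "treated"] - log2_signal[, "control"]
print(log2_fc)
```

Flooring keeps near-background signals from producing large but meaningless fold changes: geneC (3 vs 8 before flooring) ends up with a log2 fold change of 0.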
QC report
Dimensional reduction
• DimR.R
1. Load Data matrix
2. Assign data point
3. Show MDS plot
• PCA (MeV)
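For readers who want to see what the MDS and PCA steps compute, here is a base R sketch. It does not reproduce DimR.R or MeV; the expression matrix is simulated and the group labels are hypothetical.

```r
set.seed(1)

# Hypothetical log2 expression matrix: 50 genes x 6 samples, two groups of 3
expr <- matrix(rnorm(50 * 6), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("S", 1:6)))
expr[1:10, 4:6] <- expr[1:10, 4:6] + 3   # shift 10 genes in the second group
group <- factor(rep(c("A", "B"), each = 3))

# MDS: classical scaling of sample-to-sample Euclidean distances
# (samples are columns, so transpose before calling dist)
sample_dist <- dist(t(expr))
mds <- cmdscale(sample_dist, k = 2)

# PCA on the samples, with the share of variance carried by each component
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Draw the MDS plot only in an interactive session
if (interactive()) {
  plot(mds, col = as.integer(group), pch = 19, xlab = "MDS 1", ylab = "MDS 2")
  text(mds, labels = colnames(expr), pos = 3)
}
```

With Euclidean distances, classical MDS and PCA give the same sample coordinates up to sign, so the two plots should tell the same story.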
MeV
Two class analysis
• Create RMA data files (by LazyPack)
• Load Data
– [File]>[Load Data]>[Select File loader]>[Affymetrix]>[RMA File]
– [Adjust Data]>[Gene\Row Adjustment]>[Mean center gene/rows]
• Assign sample group
– [Display]>[Sample/Column Label]>[Edit label/Reorder Samples]>[Edit]
– [Cluster Manager]>[Sample Clusters]>[Auto-Cluster by Factor]
• Statistical analysis
– T-test
• 2 classes
• Paired samples
– Limma
• 2 classes
– Multiple test correction
• Clustering (HCL)
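MeV runs these tests through its menus. As a rough illustration of what the two-class and paired t-tests do for each gene, here is a base R sketch. The data are simulated, and the pairing (ctrl1 with trt1, and so on) is an assumption of the example.

```r
set.seed(42)

# Hypothetical log2 expression: 100 genes, 4 control + 4 treated samples
expr <- matrix(rnorm(100 * 8, mean = 8), nrow = 100,
               dimnames = list(paste0("gene", 1:100),
                               c(paste0("ctrl", 1:4), paste0("trt", 1:4))))
expr[1:5, 5:8] <- expr[1:5, 5:8] + 4   # spike in 5 changed genes
is_trt <- rep(c(FALSE, TRUE), each = 4)

# Two classes, unpaired: Welch's t-test per gene (t.test default, var.equal = FALSE)
p_welch <- apply(expr, 1, function(g) t.test(g[is_trt], g[!is_trt])$p.value)

# Paired samples: assumes ctrl1 is matched with trt1, ctrl2 with trt2, ...
p_paired <- apply(expr, 1, function(g) {
  t.test(g[is_trt], g[!is_trt], paired = TRUE)$p.value
})

# Genes with the smallest unpaired p-values
print(head(sort(p_welch), 5))
```

These raw p-values still need multiple test correction before genes are selected.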
Multiple Class analysis
• Create RMA data files (by LazyPack)
• Load Data
– [File]>[Load Data]>[Select File loader]>[Affymetrix]>[RMA File]
– [Adjust Data]>[Gene\Row Adjustment]>[Mean center gene/rows]
• Assign sample group
– [Display]>[Sample/Column Label]>[Edit label/Reorder Samples]>[Edit]
– [Cluster Manager]>[Sample Clusters]>[Auto-Cluster by Factor]
• Statistical analysis
– ANOVA
– Limma
– Multiple test correction
• Clustering
– Hierarchical Clustering
– Self-Organizing Map
– K-means/K-medians
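The mean centering and clustering steps can also be tried outside MeV. The base R sketch below uses simulated data. The correlation-based distance, the average linkage and the choice of 3 clusters are assumptions of this example, not MeV defaults confirmed by the slides.

```r
set.seed(7)

# Hypothetical log2 expression: 30 genes x 6 samples
expr <- matrix(rnorm(30 * 6), nrow = 30,
               dimnames = list(paste0("gene", 1:30), paste0("S", 1:6)))

# Mean center genes/rows: subtract each gene's mean from its row
centered <- sweep(expr, 1, rowMeans(expr))

# Hierarchical clustering of genes using 1 - Pearson correlation as distance
gene_dist <- as.dist(1 - cor(t(centered)))
hc <- hclust(gene_dist, method = "average")
hc_clusters <- cutree(hc, k = 3)   # cut the tree into 3 gene clusters

# K-means on the same centered data, with several random starts
km <- kmeans(centered, centers = 3, nstart = 10)

print(table(hc_clusters))
print(table(km$cluster))
```

In an interactive session, `plot(hc)` draws the dendrogram.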
Find differentially expressed genes
• Statistics
– Two samples (ratio)
– One class sample
– Two sample pools
• t-test (Welch's t-test)
• limma
– >2 groups (one-way ANOVA)
– Paired comparison (paired t-test)
– Multi-factor analysis (n-way ANOVA)
– Time course
• Select D.E. genes
– Multiple test correction
• p-value correction
– Filtration
• Fold-change cut-off
• p-value cut-off
– Volcano plot
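The selection steps above (p-value correction, then fold-change and p-value cut-offs) can be sketched in base R as follows. The result table is hypothetical. The Benjamini-Hochberg method and the cut-offs (2-fold, adjusted p of 0.05) are assumptions of this example, since the slides do not name specific values.

```r
# Hypothetical per-gene results from a two-class comparison
results <- data.frame(
  gene    = paste0("gene", 1:6),
  log2_fc = c(2.5, -1.8, 0.3, 1.2, -0.1, -3.0),
  p_value = c(0.0004, 0.003, 0.04, 0.012, 0.60, 0.0001),
  stringsAsFactors = FALSE
)

# Multiple test correction: Benjamini-Hochberg false discovery rate
results$adj_p <- p.adjust(results$p_value, method = "BH")

# Filtration: keep genes passing both cut-offs
fc_cutoff <- 1      # |log2 fold change| >= 1, i.e. at least 2-fold
p_cutoff  <- 0.05   # adjusted p-value cut-off
results$selected <- abs(results$log2_fc) >= fc_cutoff & results$adj_p <= p_cutoff

de_genes <- results$gene[results$selected]
print(de_genes)
# gene1, gene2, gene4 and gene6 pass both cut-offs
```

gene3 has an adjusted p-value below 0.05 but is dropped because its fold change is too small, which is why both filters are applied together.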
T-test
Equal sample sizes, equal variance
Unequal sample sizes, equal variance
Unequal sample sizes, unequal variance
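The formulas for these three cases did not survive in the slide text. The standard textbook forms are given below as a reference, with $\bar{X}_i$, $s_i^2$ and $n_i$ denoting the mean, sample variance and size of group $i$. This notation is an assumption of this note, not taken from the slides.

```latex
% Case 1: equal sample sizes (n per group), equal variance
\[
t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{2/n}}, \qquad
s_p = \sqrt{\frac{s_1^2 + s_2^2}{2}}, \qquad
\mathrm{df} = 2n - 2
\]

% Case 2: unequal sample sizes, equal variance (pooled variance)
\[
t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}, \qquad
\mathrm{df} = n_1 + n_2 - 2
\]

% Case 3: unequal sample sizes, unequal variance (Welch's t-test)
\[
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad
\mathrm{df} \approx
\frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}
     {\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
\]
```

Case 3 is Welch's t-test, the variant named earlier in the slides; it is the safer default when group variances may differ.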
Analysis of Variance (ANOVA)
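One-way ANOVA compares the variation between group means with the variation within groups. The base R sketch below computes the F statistic by hand for a single gene and checks it against the built-in `anova(lm(...))`. The expression values and the three group labels are hypothetical.

```r
# Hypothetical log2 expression of one gene in three groups of three samples
values <- c(7.1, 7.4, 6.9,   9.0, 9.3, 8.8,   7.9, 8.2, 8.0)
group  <- factor(rep(c("A", "B", "C"), each = 3))

# Between-group and within-group sums of squares
grand_mean  <- mean(values)
group_means <- tapply(values, group, mean)
group_sizes <- tapply(values, group, length)
ss_between  <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within   <- sum((values - group_means[as.character(group)])^2)

# F = mean square between / mean square within
df_between <- nlevels(group) - 1
df_within  <- length(values) - nlevels(group)
f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# The same test with R's built-in functions
fit <- anova(lm(values ~ group))
print(c(manual_F = f_stat, builtin_F = fit$`F value`[1]))
```

For a whole data matrix, the same calculation is repeated for every gene, and the resulting p-values then go through multiple test correction.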
Volcano plot
• X axis: fold change (usually on a log2 scale)
• Y axis: statistical significance (usually -log10 of the p-value)
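A volcano plot can be drawn from any table of fold changes and p-values. The base R sketch below uses simulated values, and the cut-offs (2-fold, p of 0.05) are assumptions of this example.

```r
set.seed(3)

# Hypothetical per-gene statistics
n_genes <- 200
log2_fc <- rnorm(n_genes, sd = 1.2)
p_value <- runif(n_genes)^2          # simulated p-values, skewed toward 0

# Volcano plot coordinates
volcano <- data.frame(x = log2_fc,            # X axis: log2 fold change
                      y = -log10(p_value))    # Y axis: -log10(p-value)

# Flag genes that pass both cut-offs
fc_cutoff <- 1
p_cutoff  <- 0.05
volcano$hit <- abs(volcano$x) >= fc_cutoff & p_value <= p_cutoff

# Draw the plot only in an interactive session
if (interactive()) {
  plot(volcano$x, volcano$y, pch = 20,
       col = ifelse(volcano$hit, "red", "grey40"),
       xlab = "log2 fold change", ylab = "-log10(p-value)")
  abline(v = c(-fc_cutoff, fc_cutoff), h = -log10(p_cutoff), lty = 2)
}
```

Selected genes fall in the upper left and upper right corners: large fold change and small p-value.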