
Microarray Basic Analysis III


Academic year: 2018


(1)

Functional annotation and gene set analysis

(2)

General workflow

•  Data preprocessing

–  Normalization

–  Signal adjustment (flag, flooring…)

–  Probe summarization

–  Signal transform (fold change, log transform)

•  Data QC/QA

–  QC index (hybridization control)

–  Visualization (PCA, MDS, box plot…)

•  Find differentially expressed genes

–  Statistical analysis

–  Set cut-off

•  Biological interpretation

–  Gene annotation

–  Gene set analysis

–  Gene network analysis

[Figure: from array profiles to biology; labels: "A few D.E. genes", "Large-scale screening", "Literature studies", "Biomarkers", "Cellular functions", "Genes"]
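The flooring and transform steps listed under data preprocessing can be sketched as follows. This is a minimal illustration, not the course's pipeline: the floor value of 20 and the probe intensities are hypothetical, and normalization and probe summarization are not shown.

```python
import math

def floor_signals(signals, floor=20.0):
    """Signal flooring: raise any raw intensity below `floor` up to `floor`.
    The default of 20 is a hypothetical choice, not a recommended value."""
    return [max(s, floor) for s in signals]

def log2_fold_change(test, control):
    """Per-probe log2 fold change, log2(test) - log2(control)."""
    return [math.log2(t) - math.log2(c) for t, c in zip(test, control)]

# Hypothetical raw intensities for three probes.
control = floor_signals([5.0, 100.0, 800.0])
test = floor_signals([40.0, 100.0, 200.0])
print(log2_fold_change(test, control))
```

Flooring before the log keeps very low, noisy intensities from producing huge fold changes; here the first control probe is raised from 5 to 20, so the three log2 fold changes come out at about 1, 0 and -2.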

(3)

Biological interpretation

•  Functional annotation of individual genes

–  Gene ontology (GO, KEGG..)

–  Pathway (KEGG, BIOCARTA, GenMAPP)

–  OMIM

–  Database gateway (GeneCards, Uniprot..)

•  Gene set analysis

–  Functional annotation enrichment analysis

•  DAVID, GOTM, …

–  Gene set enrichment analysis

•  GSEA

(4)

Gene Ontology

•  Systematic vocabulary of biological functions

•  Directed acyclic graph (DAG) structure

•  Gene associations (annotations)

www.geneontology.org 
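Because GO is a directed acyclic graph, a gene annotated to a specific term is implicitly annotated to all of that term's ancestors (the "true path rule"), and enrichment tools count genes this way. A minimal sketch of that propagation, using a hypothetical four-term toy graph rather than real GO IDs or a real ontology file:

```python
from collections import defaultdict

# Hypothetical toy GO fragment: term -> parent terms (is_a / part_of edges).
# GO:C has two parents, which a tree could not express but a DAG can.
PARENTS = {
    "GO:C": ["GO:A", "GO:B"],
    "GO:A": ["GO:root"],
    "GO:B": ["GO:root"],
    "GO:root": [],
}

def ancestors(term, parents):
    """Return `term` together with every term reachable via parent edges."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def propagate(direct_annotations, parents):
    """Map each term to the genes annotated to it or to any descendant term."""
    term_genes = defaultdict(set)
    for gene, terms in direct_annotations.items():
        for term in terms:
            for anc in ancestors(term, parents):
                term_genes[anc].add(gene)
    return dict(term_genes)

print(propagate({"gene1": ["GO:C"], "gene2": ["GO:A"]}, PARENTS))
```

gene1 is annotated only to GO:C, yet after propagation it also counts toward GO:A, GO:B and GO:root.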

(5)

KEGG: Kyoto Encyclopedia of Genes and Genomes 

www.genome.jp/kegg/ 

(6)

The Reactome Project 

(7)

OMIM  ‐ Online Mendelian Inheritance in Man 

http://www.ncbi.nlm.nih.gov/omim

(8)

Gene Expression Omnibus

•  NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/)

(9)

Biological interpretation

•  Functional annotation of individual genes

–  Gene ontology (GO, KEGG..)

–  Pathway (KEGG, BIOCARTA, GenMAPP)

–  OMIM

–  Database gateway (GeneCards, Uniprot..)

[Figure: D.E. genes from the array → probe description → annotation databases; database gateway]

(10)

GeneCards

http://www.genecards.org/

(11)

Uniprot 

•  http://www.uniprot.org/

(12)

Biological interpretation

•  Functional annotation of individual genes

–  Gene ontology (GO, KEGG..)

–  Pathway (KEGG, BIOCARTA, GenMAPP)

–  OMIM

–  Database gateway (GeneCards, Swiss-Prot..)

•  Gene set analysis

–  Functional annotation enrichment analysis

•  DAVID, GOTM, …

–  Gene set enrichment analysis

•  GSEA

(13)

What happens to cellular function?

•  Individual gene studies

–  Find differentially expressed genes (DEGs)

•  Gene-phenotype correlation (t-test, ANOVA, limma…)

–  Predict the functional role from annotation (PubMed, OMIM, GeneCards…)

–  Validate the function by genetic manipulation (overexpression, knockdown…)

•  The risks of this straightforward strategy

–  Which DEG?

•  A trial-and-error game

–  Which function?

•  Did you choose the right functional assay?

–  Does it work?

•  Functional regulation is teamwork.

[Figure: Function 1 and Function 2 linked by cross-talk]
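The gene-phenotype correlation step can be sketched with a per-gene two-class t-statistic. This sketch uses Welch's unequal-variance t, not limma's moderated t, and stops at the statistic because the Python standard library has no t distribution for p-values; the gene names and log2 values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t-statistic for one gene: mean difference over its standard error."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

def rank_genes(expr, n_a):
    """Rank genes by |t|; expr maps gene -> log2 values, first n_a columns = class A."""
    scores = {gene: welch_t(v[:n_a], v[n_a:]) for gene, v in expr.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical log2 expression, 3 class-A samples then 3 class-B samples.
expr = {
    "geneX": [8.1, 8.3, 8.2, 6.0, 6.2, 6.1],
    "geneY": [7.0, 7.4, 6.8, 7.1, 6.9, 7.2],
}
for gene, t in rank_genes(expr, n_a=3):
    print(gene, round(t, 2))
```

geneX separates the classes cleanly and ranks first; geneY has equal class means and a t near zero. Which genes then count as "DEGs" still depends on the cut-off, which is exactly the trial-and-error risk the slide points to.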

(14)

Expression Analysis Systematic Explorer (EASE)

•  Also known as:

–  Functional annotation enrichment analysis

–  Gene Ontology analysis

•  Interpretation of gene lists

–  Gene Ontology

–  KEGG pathway

–  BioCarta pathway

•  Statistical methods

–  Fisher exact test

–  Hypergeometric test

–  Binomial test
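The hypergeometric and binomial tests ask the same question of a gene list: are more list genes in the term than expected by chance? The hypergeometric test treats the list as drawn without replacement from the background; the binomial test approximates this as drawing with replacement. A minimal sketch, borrowing the counts of the p53 example (300 list genes, 40 pathway genes, 30,000-gene genome, 3 hits) and assuming the list is a subset of the genome:

```python
from math import comb

def hypergeom_upper(k, list_size, term_size, population):
    """P(X >= k): list_size genes drawn without replacement from a population
    in which term_size genes belong to the annotation term."""
    upper = min(list_size, term_size)
    hits = sum(comb(term_size, x) * comb(population - term_size, list_size - x)
               for x in range(k, upper + 1))
    return hits / comb(population, list_size)  # exact integers, one final division

def binomial_upper(k, list_size, term_size, population):
    """P(X >= k) under the binomial approximation (drawing with replacement)."""
    p = term_size / population
    return sum(comb(list_size, x) * p**x * (1 - p)**(list_size - x)
               for x in range(k, list_size + 1))

print(hypergeom_upper(3, 300, 40, 30000), binomial_upper(3, 300, 40, 30000))
```

Both values fall below 0.01 here, with the binomial tail slightly larger; the approximation is close when the list is a small fraction of the background.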

 

(15)

The Database for Annotation, Visualization and Integrated Discovery (DAVID)

•  ID conversion 

•  Gene annota)on 

•  EASE analysis 

•  ….

(16)

Fisher Exact Test 

•  When members of two independent groups can fall into one of two mutually exclusive categories, the Fisher exact test determines whether the proportions falling into each category differ between the groups. In the DAVID annotation system, the Fisher exact test is used to measure gene enrichment in annotation terms.

A Hypothetical Example:

In the human genome background (30,000 genes in total), 40 genes are involved in the p53 signalling pathway. In a given gene list, 3 out of 300 genes belong to the p53 signalling pathway. We then ask whether 3/300 is more than expected by random chance compared with the human background of 40/30,000.

A 2x2 contingency table is built from the above numbers:

                    User Genes    Genome
    In Pathway               3        40
    Not In Pathway         297     29960

Fisher exact P-value = 0.008. Since P <= 0.01, this user gene list is specifically associated with (enriched in) the p53 signalling pathway beyond random chance.

Adapted from DAVID (http://niaid.abcc.ncifcrf.gov/)
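The one-sided Fisher exact test on a 2x2 table is a hypergeometric tail sum, and DAVID's EASE score is, as commonly described, a more conservative version of it that removes one gene from the list's hits before testing. A minimal sketch on the table's counts (3, 40, 297, 29960); it assumes the removed gene leaves the "not in pathway" count unchanged and does not attempt to reproduce DAVID's exact background handling, so its Fisher value (just under 0.01) need not match the quoted 0.008 digit for digit.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided (greater) Fisher exact P for the table [[a, b], [c, d]].

    a: list genes in the term      b: background genes in the term
    c: list genes not in the term  d: background genes not in the term
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(n, row1)

def ease_score(a, b, c, d):
    """EASE-style penalty: Fisher P after removing one list gene from the term."""
    if a < 1:
        raise ValueError("need at least one list gene in the term")
    return fisher_one_sided(a - 1, b, c, d)

p = fisher_one_sided(3, 40, 297, 29960)
print(f"Fisher P = {p:.4f}, EASE = {ease_score(3, 40, 297, 29960):.4f}")
```

The penalty matters most for terms supported by only a few genes: here the Fisher P stays below 0.01 while the EASE score rises to roughly 0.06, so a term that looks significant on three genes no longer passes a 0.05 threshold.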

(17)

Pitfalls of EASE

•  Cut-off issues

•  Gene set issues

[Figure: Class A vs. Class B; genes passing a t-test cut-off (FDR < 0.05) on each side; "Biological meaning?"]

(18)

Gene set approaches

•  From a gene list (IGA)

–  Expression Analysis Systematic Explorer

•  From gene set scores (GSA)

–  Gene Set Enrichment Analysis

–  event-based Gene Set Analysis

(19)

Gene Set Enrichment Analysis

http://www.pnas.org/content/102/43/15545.abstract

(20)

Gene Set Enrichment Analysis

ES/NES statistic

[Figure: genes ranked between Class A and Class B (+ to −) with a t-test cut-off, and Gene Sets 1–3 mapped onto the ranked list; Gene set 2 is enriched in Class A and Gene set 3 in Class B]
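The ES walks down the ranked list, stepping up at gene-set members (P_hit) and down at all other genes (P_miss), and reports the largest deviation from zero. A minimal sketch under stated assumptions: genes are already ranked with their correlation scores, and the sign of the maximum deviation is kept so that a positive ES means the set sits near the top of the ranking and a negative ES near the bottom. NES and permutation p-values are not shown; gene names and scores are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: genes sorted from most positively to most negatively
    correlated with the phenotype; scores: their correlation values.
    weight=0 gives an unweighted, KS-like walk; weight=1 weights each hit
    by |score|, the default in the 2005 GSEA paper.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if n_hit == 0 or n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must partly overlap the ranked list")
    running = best = 0.0
    for s, h in zip(scores, hits):
        # P_hit grows on set members, P_miss grows on every other gene.
        running += abs(s) ** weight / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation from zero
    return best

genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(genes, scores, {"g1", "g2"}, weight=0))  # 1.0
```

A set concentrated at the top gives ES = +1 in this toy case; the same set placed at the bottom ({"g5", "g6"}) gives -1.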

(21)

[Figure: dataset distribution; x-axis: gene expression level, y-axis: number of genes]

The Kolmogorov–Smirnov test is used to determine whether two underlying one-dimensional probability distributions differ, or whether an underlying probability distribution differs from a hypothesized distribution, in either case based on finite samples.

The one-sample KS test compares the empirical distribution function with the cumulative distribution function specified by the null hypothesis. Its main applications are testing goodness of fit to the normal and uniform distributions.

The two-sample KS test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples.

[Figure: distributions of Gene set 1 and Gene set 2]

http://bioinfo.cnio.es/files/training/Fourth_Functional_Analysis_of_Gene_Expression/CNIO_GSEA_16_02_2010.ppt
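The two-sample KS statistic described above is the largest vertical gap between two empirical CDFs. A minimal sketch of the statistic only (no p-value), with made-up inputs:

```python
from bisect import bisect_right

def ecdf(sorted_values, t):
    """Fraction of observations <= t (empirical CDF)."""
    return bisect_right(sorted_values, t) / len(sorted_values)

def ks_two_sample(x, y):
    """Two-sample KS statistic D = sup_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    # Both ECDFs are step functions that only jump at observed values,
    # so the supremum is attained at one of the pooled observations.
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in xs + ys)

print(ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

D is 0 for identical samples and 1 when the samples do not overlap at all; in the GSEA setting, one "sample" plays the role of the gene-set members' positions in the ranking.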

(22)

Gene Set Enrichment Analysis

$ES(S) = \max_i \lvert P_{hit}(S, i) - P_{miss}(S, i) \rvert$, the Kolmogorov–Smirnov (K-S) distance

•  NES

•  p-value

•  FDR (Benjamini–Hochberg)
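The Benjamini–Hochberg step-up procedure named above turns a list of p-values into FDR-adjusted values. A minimal sketch of standard BH with made-up p-values; note that, as described in the 2005 GSEA paper, GSEA's own FDR q-values come from permutation nulls of the NES rather than from this formula.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down; the running minimum keeps the
    # adjusted values monotone in the original p-value order.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Gene sets whose adjusted value falls below the chosen threshold (for example 0.05) are reported; here only the first set passes.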

(23)
(24)

Practice of GSEA

http://www.broadinstitute.org/gsea/msigdb/index.jsp

•  Gene sets (.gmt files)

•  GSEA in MeV

•  GSEA for Java

•  GSEA for R
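MSigDB distributes gene sets as tab-delimited .gmt files: one set per line, holding the set name, a description, then the member genes. A minimal reader sketch; the two sets in the example are hypothetical, not real MSigDB entries.

```python
import io

def read_gmt(handle):
    """Parse GMT text: one gene set per line, tab-separated as
    name, description, gene1, gene2, ..."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}  # drop empty trailing fields
    return gene_sets

# Hypothetical two-set file; real files come from MSigDB downloads.
example = "SET_A\tna\tTP53\tMDM2\tCDKN1A\nSET_B\tna\tCCNB1\tCDK1\n"
print(read_gmt(io.StringIO(example)))
```

With a real file, pass an open text handle instead of `io.StringIO`, e.g. `with open(path) as fh: sets = read_gmt(fh)`.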

(25)

The problem of heterogeneous gene expression

•  Simple regulatory mechanism

•  Multiple regulatory routes

–  Multiple regulatory pathways

•  Pathway 1

•  Pathway 2

–  Multiple regulatory points

•  Pathway 1 (points 1–3)

•  Pathway 2 (points 1–n)

•  Noisy crosstalk

[Figure: a gene set, Pathway 1 with regulatory points 1–3, Pathway 2, noisy crosstalk, "others", and the resulting functional change]

(26)

event‐based Gene Set Analysis

(27)

Regulatory event‐based approach

[Figure: regulatory events (Up-RE, Down-RE) of genes in a functional gene set (Gene1 … GeneX) across samples 1–6, control vs. test, with example RE f: (3 − 1)/6; schematic of a gene set's functional change via Pathway 1 (points 1–3), Pathway 2, noisy crosstalk and others]

(28)

event‐based Gene Set Analysis

Q1: sample randomized
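Sample randomization builds a null distribution by shuffling the control/test labels and recomputing the gene-set score. The slide does not give eGSA's actual statistic, so this minimal sketch uses a hypothetical score (mean over set genes of the test-minus-control mean difference) and made-up data; it shows only the generic permutation idea.

```python
import random
from statistics import mean

def set_score(expr, labels, gene_set):
    """Hypothetical gene-set score: mean over set genes of (test mean - control mean)."""
    diffs = []
    for gene in gene_set:
        values = expr[gene]
        test = [v for v, lab in zip(values, labels) if lab == "test"]
        ctrl = [v for v, lab in zip(values, labels) if lab == "control"]
        diffs.append(mean(test) - mean(ctrl))
    return mean(diffs)

def permutation_pvalue(expr, labels, gene_set, n_perm=1000, seed=0):
    """Two-sided p-value from shuffling sample labels (sample randomization)."""
    rng = random.Random(seed)
    observed = set_score(expr, labels, gene_set)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(set_score(expr, shuffled, gene_set)) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # +1 avoids reporting p = 0

labels = ["control"] * 3 + ["test"] * 3
expr = {"g1": [1, 1, 1, 5, 5, 5], "g2": [2, 2, 2, 6, 6, 6]}
print(permutation_pvalue(expr, labels, ["g1", "g2"]))
```

With only 3 + 3 samples there are 20 distinct labelings, so even a perfect separation cannot reach a p-value much below 0.1; small designs limit what sample randomization can show.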

(29)

Organizing the HCC functional map

•  Map structure

–  GO term structure

•  Map refinement

–  Branch-reducing logic

•  Map presentation

–  Cytoscape

(30)

4 stages of HCV‐induced hepatocarcinogenesis

(31)

Event table of HCC1 data set 

(32)

KEGG Cell Cycle Pathway

(33)

[Figure: A. Workflow of eGSA-R (components: user data, e.g. GSE6764; eGSA.Read(); eGSA.Dataset with gene signal table, p-values table, gene set table and sample info; FetchEvent(), Event(), NormDist(), GeneSummary, eGSA(), WriteGoMap(); events; changed p-values). B. Function pattern of HCV-induced HCCs. C. Event heatmap of GO:0045087]

(34)

Something else..

•  Microarray is dead! NGS is the new king!

•  Great! My magic gene can be linked to the cell cycle, apoptosis and the p53 pathway!

•  Follow your interests and the biological clues.
