Protein Complex Prediction Using Random Walks Under Domain Structural Constraints
全文
(2) Vol.2012-MPS-90 No.15 2012/9/19. IPSJ SIG Technical Report. P2. P3. D3. P1 D2 D1. Fig. 1. second-order Markov chains. For Eq. (1), we can write the i-th element as follows. P(Xt = vi ) = αbC(i) + (1 − α) · ∑ P(Xt = vi |Xt−1 = v j )P(Xt−1 = v j ). D4. P4. P5. D6 D5. D7. For second-order Markov chains under the domain structural constraint, we have the following representation corresponding to random walks with restarts. P(Xt = vi , Xt−1 = v j ) = αP(Xt = vi |Xt−1 = v j )bC( j) ∑ +(1 − α) P(Xt = vi |Xt−1 = v j , Xt−2 = vk ) · (v j ,vk )∈E. P(Xt−1 = v j , Xt−2 = vk ). Illustration of random walks under domain structural constraints.. deletes overlapping complexes with less significance if the ratio |C1 ∩C2 |/ min {|C1 |, |C2 |} is more than the threshold θ for two complexes C1 and C2 . In addition, algorithm RandomWalk finds a steady state vector x for vertices V such that x = αbC + (1 − α)Px,. (1). where α denotes the restart probability, the i-th element of bC , bC(i) = 1/|C| if vi ∈ C, otherwise 0, and P denotes the transi∑ tion probability matrix that Pi j = wi j / (vk ,v j )∈E wk j . The NWE method [6] successfully improved prediction accuracy by using ∑ ∑ ∑ bC(i) = (vi ,vk )∈E wik / v j ∈C (v j ,vk )∈E w jk if vi ∈ C. 2.2 Second-order Markov Chains on PPI Networks We assume that one domain interacts with at most one domain. Consider the case that domain Da in protein Pi can interact with domain Db in protein P j and Dc in Pk , and a random walker moved from P j to Pi . Then, domains Da and Db are regarded to have interacted with each other. Therefore, domain Da cannot interact with any other domain. It means that the walker cannot move from Pi to Pk using Dc . Figure 1 shows an example of a PPI network with five proteins P1 , . . . , P5 , where solid lines between domains indicate that these domains can interact with each other, and simultaneously indicate that there exist interactions between proteins. Suppose that a random walker moved from P5 to P1 . Then, domain D7 is regarded to be interacting with D1 , and cannot interact with domains D4 and D5 . It means that the walker cannot move from P1 to P3 or P4 . Then, the probabilities that the walker move from P1 to P2 , P3 , P4 , and P5 are w12 /(w12 + w15 ), 0, 0, and w15 /(w12 + w15 ), respectively. Let Xt be a random variable taking the vertex where a random walker is at time t. In the existing methods of RRW and NWE, the first-order Markov property P(Xt |Xt−1 , . . . , X1 ) = P(Xt |Xt−1 ) holds. However, under the domain structural constraint, the property does not always hold. In order to completely satisfy the domain structural constraint, we cannot assume any Markov property. Although, in the previous example, we considered the probability that the walker will move to Xt+1 after Xt = P1 and Xt−1 = P5 , we should consider Xt−2 , Xt−3 , . . ., that is, before the walker arrived at P5 . In this technical report, we consider only. c 2012 Information Processing Society of Japan. (2). (vi ,v j )∈E. (3). Let Y be a steady state matrix in this second-order Markov chain. Then, Yi j represents P(Xt = vi , Xt−1 = v j ), and we have ∑ Y = αPBC + (1 − α) Q( j) f j (Y 0 ), (4) j. where BC denotes the diagonal matrix in which the diagonal entries BC(ii) are bC(i) , P is the same transition matrix as in Eq. (1), Q( j) denotes the transition probability matrix that Q(i,kj) = P(Xt = vi |Xt−1 = v j , Xt−2 = vk ), Y 0 denotes the transpose of Y, and f j (Y) denotes the matrix in which the j-th column is the same as that of Y and the other columns are zero. In a similar way to RRW and NWE, we can get the steady state Y by repeatedly calculating the right hand side of Eq. (4) until Y converges. Thus, in the RepeatedRandomWalk algorithm, our method replaces RandomWalk that finds steady state vectors with the method that finds a steady state matrix Y in Eq. (4).. 3. Conclusion We introduced domain structural constraints that one domain interacts with at most one domain, approximately derived secondorder Markov chains, and proposed modified random walks with restarts under the constrained condition. References [1] [2] [3] [4] [5] [6] [7]. [8]. Enright, A., Dongen, S. and Ouzounis, C.: An efficient algorithm for large-scale detection of protein families, Nucleic Acids Research, Vol. 30, pp. 1575–1584 (2002). Bader, G. and Hogue, C.: An automated method for finding molecular complexes in large protein interaction networks, BMC Bioinformatics, Vol. 4, p. 2 (2003). King, A., Prˇulj, N. and Jurisica, I.: Protein complex prediction via costbased clustering, Bioinformatics, Vol. 20, pp. 3013–3020 (2004). Wu, M., Li, X., Kwoh, C. and Ng, S.: A core-attachment based method to detect protein complexes in PPI networks, BMC Bioinformatics, Vol. 10, p. 169 (2009). Macropol, K., Can, T. and Singh, A.: RRW: repeated random walks on genome-scale protein networks for local cluster discovery, BMC Bioinformatics, Vol. 10, p. 283 (2009). Maruyama, O. and Chihara, A.: NWE: Node-weighted expansion for protein complex prediction using random walk distances, Proteome Science, Vol. 9, No. Suppl 1, p. S14 (2011). Ozawa, Y., Saito, R., Fujimori, S., Kashima, H., Ishizaka, M., Yanagawa, H., Miyamoto-Sato, E. and Tomita, M.: Protein complex prediction via verifying and reconstructing the topology of domain-domain interactions, BMC Bioinformatics, Vol. 11, p. 350 (2010). Zhao, Y., Hayashida, M., Nacher, J., Nagamochi, H. and Akutsu, T.: Protein complex prediction via improved verification methods using constrained domain-domain matching, pp. 394–406 (2012).. 2.
(3)
図
関連したドキュメント
Northern blot analysis using 5’ portion of the chicken DDB1 cDNA as a probe detected a single transcript of ~ 4.3 kb in chicken DT40 cells as well as in human HeLa cells
Methods: Angiopoietin-like protein-3 (ANGPTL3), LPL activity, HTGL activity, remnant lipoproteins (RLP-C & RLP-TG), small dense LDL-Cholesterol (sd LDL-C) were measured in
Ability of HBx to overcome H-RAS V12 -induced senescence in BJ cells immortalized by hTERT Seeing as HBx did not exhibit the ability to immortalize primary human fibroblasts or
Character- ization and expression analysis of mesenchymal stem cells from human bone marrow and adipose tissue. IGFBP-4 is an inhibitor of canonical Wnt signalling
14 It is true that although proliferating bile ductules were scattered within portal tracts, MCP-1 expression in bile ductules and αSMA-positive HSCs were not found in CHF,
performed 4 h and 8 h euglycemic (5.5 mmol/l) clamps with 3 different insulin concentrations (basal, medium postprandial and high postprandial, ranging from ~ 35 to ~ 1450 pmol/l)
[Journal Article] Two independent regions of human telomerase reverse transcriptase (hTERT) are important for their oligomerization and telomerase 2002.
By adapting tools from information theory, I construct optimal, nonlinear local statistical predictors for random fields on networks; these take the form of minimal