Infinite limits of the duplication model and graph folding
Anthony Bonato
1†and Jeannette Janssen
2‡1Department of Mathematics, Wilfrid Laurier University, Waterloo, ON, Canada, N2L 3C5
2Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada, B3H 3J5
We study infinite limits of graphs generated by the duplication model for biological networks. We prove that with probability1, the sole nontrivial connected component of the limits is unique up to isomorphism. We describe certain infinite deterministic graphs which arise naturally from the model. We characterize the isomorphism type and induced subgraph structure of these infinite graphs using the notion of dismantlability from the theory of vertex pursuit games, and graph homomorphisms.
Keywords: massive networks, duplication model, infinite random graph, folding, adjacency property, graph homo- morphism
Much recent attention has focused on stochastic models of real-world massive self-organizing networks.
The reader is directed to the two recent surveys [1, 2]. In self-organizing networks, each node acts as an independent agent, which will base its decision on how to link to the existing network on local knowledge.
As a result, the neighbourhood of a new node will often be similar to that of an existing node. An important example of such a network consists of the protein-protein interactions in a living cell. Theduplication modelof [7] was designed to model such biological networks. The two parameters of the model are a real numberp∈(0,1), and a finite undirected graphH. New nodes are introduced over a countable sequence of discrete time-steps, with the graph atG0at timet= 0equallingH. At timet+ 1, a nodeuis chosen uniformly at random from the existing nodes inGt. To formGt+1, a new nodevt+1is added. For each of the neighbourszofu, we add the edgezvt+1to the edges ofGt+1with probabilityp.
In this paper, we study the infinite graphs that result when time is allowed to go to infinity. The study of infinite limits of random graph models can give new insight into the properties of the model. Ifp∈(0,1), then the classical results of Erd˝os, R´enyi [8] imply that with probability1, an infinite limit ofG(n, p) graphs is isomorphic to a unique isomorphism type of graph writtenR. This well-known result may appear contradictory at first, since it suggests an infinite random process with a deterministic conclusion.
The deterministic graphRis often called theinfinite random graph, and is the unique isomorphism type of countable graph satisfying theexistentially closedore.c.adjacency property: for all finite disjoint sets of nodesX andY, there is a node not inX ∪Y that is joined to all ofX and none ofY.A logically
†Supported by an NSERC Discovery grant, and from a MITACS grant.
‡Supported by an NSERC Discovery grant, and from a MITACS grant.
1365–8050 c2005 Discrete Mathematics and Theoretical Computer Science (DMTCS), Nancy, France
weaker adjacency property islocally e.c.,introduced by the authors in [3]. (In [3], locally e.c. is referred to as property (B).) Ify is a node ofG, thenNG(y) ={z ∈V(G) :yz ∈ E(G)}is theneighbour set ofyinG.A graphGislocally e.c.if for each nodeyofG, for each finiteX ⊆N(y), and each finite Y ⊆V(G)\X, there exists a nodeznot in{y} ∪X∪Y that is joined toX and not toY.The locally e.c. property is therefore a variant of the e.c. property that applies only to sets contained in the neighbour set of a node.
A preliminary connection between the locally e.c. property and the duplication model was made ex- plicit by the following theorem, first proven in [3]. If (Gt : t ∈ N)is a sequence of graphs with Gt
an induced subgraph ofGt+1, then define thelimit of the Gt, writtenG = limt→∞Gt,byV(G) = S
t∈NV(Gt), E(G) =S
t∈NE(Gt).
Theorem 1 Fixp∈(0,1), andG0 =H a finite graph. With probability1,a limitG= limt→∞Gtof graphs generated by the duplication model is locally e.c.
As was proved in [3], in stark contrast to the e.c. property, there are uncountably many non-isomorphic locally e.c. graphs. The main goal of the present article is to introduce countably infinite graphs, which are, with high probability, theuniquelimits of the duplication model, up to the presence of isolated nodes.
These graphs are therefore analogues of the infinite random graph for the duplication model.
FixHa finite graph. LetR0∼=H.Assume thatRtis defined and is finite. For each nodey ∈V(Rt), and each subsetX ⊆ N(y),add a new nodezy,X joined only toX.This gives the graphRt+1 which containsRtas an induced subgraph. DefineRH = limt→∞Rt.Observe that the (deterministic) graph RHis locally e.c.
Suppose thatH, J are finite graphs, withv ∈ V(J). DefineH 4v J if there is a nodeuinJ −v such thatN(v)⊆N(u), andH =J−v. We say that the graphJfolds ontoH.Note that, for loop-free graphs, the definition implies thatuandv are non-joined. We writeH 4 J if there is a nonnegative integerm, graphsH0 =H, H1, . . . , Hm =J, and nodesv0, . . . , vm−1 ∈V(J)so thatHi 4vi Hi+1
for all0 ≤ i ≤ m−1. Note that the relation4is an order relation on the class of all finite graphs.
We name this thefolding order. For example,K2 4T whereT is a tree, while two cliques of different orders are incomparable in the folding order. The relation4is a form of dismantling used to characterize certain vertex pursuit games. The graphs aboveK2in the folding order are sometimes calleddismantlable graphs; see [4].
We extend the folding order to countable graphs as follows. LetH andJ be countable graphs. The relationH 4v Jis defined exactly as in the finite case. FixIas eitherNor one of the sets{0,1, . . . n}, wheren ∈ N. We writeH 4 J if there exists a sequence of countable graphs(Ht : t ∈ I)so that H0 = H, Ht 4v Ht+1 for allt ∈ I,andJ = limt→∞HtifI = N,orJ = Hn ifIis of the form {0,1, . . . n}. For example, K2 4 G,whereGis the infinite one-way path. Note that for all t > 0, H 4Rt,andRt4Rt+1.Hence,H 4RH.
Assume thatH is connected and nontrivial. It is straightforward to see that the graphRHconsists of a unique infinite connected component, along with infinitely many isolated nodes. We use the notation CHfor this infinite component. We use the notationCtfor the unique nontrivial connected component ofRtinRH.Hence,CH = limt→∞Ct,whereC0 ∼=H,andCt4v Ct+1for allt ∈N.Note that each v ∈ V(Ct+1)\V(Ct)is joined to some node inCt.The following theorem ties together the relation4, the graphCH,and the locally e.c. property.
Theorem 2 LetH be a fixed finite nontrivial connected graph. IfGis a countable connected locally e.c.
graph such thatH 4G,thenG∼=CH.
Proof: Suppose, without loss of generality, thatG= limt→∞Gt,whereG0∼=H,andGt4v Gt+1for allt∈N.SinceGis connected, eachv∈V(Gt+1)\V(Gt)is joined to some node inGt.
We define an isomorphismf : CH → Ginductively as follows. Letf0 : C0 → G0 be any fixed isomorphism. As the induction hypothesis, suppose that for a fixed integert≥0,there is a finite induced subgraphJtofGcontainingGtalong with an isomorphismft:Ct→Jtextendingf0.(Note that we do not claim here thatJtis of the formGsfor somes≥0.)
Enumerate all pairs(y, S)wherey is a node ofJtandS ⊆ NJt(y)is nonempty as{(yi, Si) : 1 ≤ i≤kt}.By the locally e.c. property forG,there is a nodexy1,S1∈V(G)\V(Jt)that is joined toS1and to no other nodes ofJt.A straightforward (and therefore, omitted) inductive argument supplies, for all j∈ {1. . . kt},a nodexyj,Sj in
V(G)\(V(Jt)∪ {y1, . . . , yj})
that is joined toSjand to no other node ofV(Jt)∪ {y1, . . . , yj}.Let T ={xyi,Si: 1≤i≤kt} ⊆V(G).
DefineJt+1to be the subgraph ofGinduced byV(Jt)∪T.Note thatT is an independent set of nodes.
By the definition ofCt+1, Jt+1is isomorphic toCt+1by an isomorphism extendingft.The unique node v ∈V(Gt+1)\V(Gt)is of the formxy,S for someyandS inGt.AsGtis an induced subgraph ofJt, we have thatv∈V(Jt+1).Hence,Gt+1is contained inJt+1,and the induction step is complete.
Definef : CH → Gbyf = S
t∈N
ft. The mapf is an embedding as each mapftis an isomorphism,
and it is onto by construction. Hence,f is an isomorphism. 2
Ifp∈(0,1), then the classic results of Erd˝os, R´enyi [8] imply that with probability1, an infinite limit ofG(n, p)graphs is isomorphic toR. Perhaps surprisingly, for the duplication model there is a similar result, replacingRbyCH. The next corollary follows directly by Theorems 1 and 2.
Corollary 3 Fixp∈(0,1)andH a finite nontrivial connected graph. With probability1a limit graph G= limt→∞Gtgenerated by the duplication model is isomorphic to the disjoint union ofCHand a set Iof isolated nodes, where|I|is countable.
Following Brightwell and Winkler in [4], we say that a graphJ isstiff if there is no proper induced subgraphH with the property thatH4J. By Theorem 4.4 of [4], each finite graphHcontains a unique (up to isomorphism graph) stiff induced subgraph, writtenc(H),so thatc(H)4H. We refer toc(H)as thestiff-coreofH.The stiff-core determine the isomorphism types of our limit graphs as follows.
Corollary 4 Fix finite connected nontrivial graphsH andJ. IfH 4J, thenRH ∼=RJ.In particular, RH ∼=Rc(H).
Our next result demonstrates how the graphsRHplay the role of “minimal” graphs for the locally e.c.
property.
Theorem 5 FixH a finite graph, and letGbe a locally e.c. graph. ThenRH ≤Gif and only ifH ≤G.
A vertex mappingf : G→ H is ahomomorphismifxy ∈ E(G),implies thatf(x)f(y) ∈ E(H).
We writeG →H to denote thatGadmits a homomorphism toH without reference to a specific map- ping. See the excellent book [9] for more background on graph homomorphisms. Folding gives rise to homomorphisms as follows.
Lemma 6 LetGandHbe countable graphs. IfH 4G,thenG→H.
The converse of Lemma 6 does not follow (considerH asK2andGasC6). However, the following theorem establishes an interesting connection between graph homomorphisms and the induced subgraphs ofRH.
Theorem 7 Fix na positive integer, and letGand H be finite graphs. Then G ≤ RH if and only if G→H.
Proof: For the forward direction, suppose thatG≤ RH.ThenG≤ Rtfor somet ≥0,and soG → Rt.Since H 4 Rt,by Lemma 6, we have that Rt → H.Hence, G → H by the transitivity of the homomorphism relation.
For the converse, we introduce an auxiliary graph construction. Fixf : G → H a homomorphism.
AssumeV(G)andV(H)are disjoint, and define a graphH(G, f)to have nodes V(G)∪V(H),and edges
E(G)∪E(H)∪ {xy:x∈V(G), y∈V(H),andf(x)y∈E(H)}.
We refer to the induced copy ofH inH(G, f)asH0.We proceed by induction on|V(G)|to show that H(G, f)≤RH. SinceG≤H(G, f), the proof of the converse will follow. Note that, if|V(G)|= 1, thenH(G, f)is isomorphic to the disjoint union ofH andK1. The base case follows. The induction hypothesis is that if|V(G)|=n,wheren≥1is fixed, thenH(G, f)is an induced subgraph ofRHwith H0the copy ofH att= 0.
Let|V(G)|=n+ 1,and fixx∈V(G).By induction hypothesis,H(G−x, f(G−x))is a subgraph SofRHwithH0the copy ofHatt= 0.By the definition ofH(G, f),all the neighbours ofxinH(G, f) are also neighbours of the nodef(x)inH0.Note thatN(x)⊆V(S).By the locally e.c. property ofRH, there is a nodezofRHjoined to the nodes ofN(x)inS and to no other nodes ofS. AddingztoSin RH will give an induced subgraph ofRH which is isomorphic toH(G, f), whileH0 is unchanged. This
completes the induction. 2
Theorem 7 characterizes theageofRH (that is, the class of isomorphism types of finite induced sub- graphs ofRH). In addition, it supplies us with structural information aboutRH.
Corollary 8 1. For a fixed finite graphH,all countableH-colourable graphs embed inRH. 2. The clique and chromatic numbers ofRH equal the clique and chromatic numbers ofH, respec-
tively.
As one consequence,RK2 ≤RC6andRC6 ≤RR2. Observe however, thatRK2 RC6 asRC6does not fold ontoK2.
In the duplication model, the neighbourhood of a new node will always be a subset in the neighbourhood of an existing node. While this is sufficient for the modelling of protein-protein interaction networks, for other applications it is desirable to extend the model to allow for a number of random edges. Let ρ: N → Nbe a monotone increasing function. The duplication model can be generalized by adding a second step, where at each timeta set ofρ(t)nodes is selected u.a.r., and edges from each node in this set to the new node are added. There is no unique limit in this case, but it can be shown that there is a unique minimal infinite graph which, with probability1, is contained in any infinite limit. These minimal
graphs are graphs that extend the idea behind the creation ofRH, and are calledR(n)H .They are defined as follows.
Fix a finite graphH. LetR0∼=H.Assume thatRtis defined and is finite. For each nodey ∈V(Rt), and each subsetX ⊆N(y),add a new nodezy,X joined only toX.For each subset ofY of nodes with cardinality at mostnadd a new nodezY joined only toY.This gives the graphRt+1which containsRt
as an induced subgraph. DefineR(n)H = limt→∞Rt.
Results similar to those described above hold forR(n)H for an appropriate adjacency property (namely, then-e.c. property). These results will be described more fully in the journal version of this extended abstract.
References
[1] B. Bollob´as, O. Riordan, Mathematical results on scale-free graphs, In: Handbook of graphs and networks(S. Bornholdt, H. Schuster, eds.) Wiley-VCH, Berlin (2002) 1-34.
[2] A. Bonato, A survey of models of the web graph, In:Proceedings of Combinatorial and Algorithmic Aspects of Networking, 2004.
[3] A. Bonato, J. Janssen, Infinite limits of copying models of the web graph, Internet Mathematics1 (2004) 193-213.
[4] G. Brightwell, P. Winkler, Gibbs measures and dismantlable graphs, J. Comb. Theory Ser. B 78 (2000) 141-169.
[5] P.J. Cameron, The random graph, In: Algorithms and Combinatorics 14 (R.L. Graham and J.
Neˇsetˇril, eds.), Springer Verlag, New York (1997) 333-351.
[6] P.J. Cameron, The random graph revisited, In:European Congress of MathematicsVol. I (C. Casacu- berta, R. M. Mir´o-Roig, J. Verdera and S. Xamb´o-Descamps, eds.), Birkhauser, Basel (2001) 267- 274.
[7] F.R.K. Chung, G. Dewey, D.J. Galas, L. Lu, Duplication models for biological networks,J. Comput.
Biol.10(2003) 677-688.
[8] P. Erd˝os, A. R´enyi, Asymmetric graphs,Acta Math. Acad. Sci. Hungar.14(1963) 295-315.
[9] P. Hell, J. Neˇsetˇril,Graphs and homomorphisms, Oxford University Press, Oxford, 2004.
[10] J. Kleinberg, R. Kleinberg, Isomorphism and embedding problems for infinite limits of scale-free graphs, In:Proceedings of ACM-SIAM Symposium on Discrete Algorithms, 2005.