The binomial ideal of the intersection axiom for conditional probabilities


DOI 10.1007/s10801-010-0253-5


Alex Fink

Department of Mathematics, North Carolina State University, Raleigh, NC, USA
e-mail: [email protected]

Received: 10 February 2009 / Accepted: 27 August 2010 / Published online: 17 September 2010

Abstract The binomial ideal associated with the intersection axiom of conditional probability is shown to be radical and is expressed as an intersection of toric prime ideals. This solves a problem in algebraic statistics posed by Cartwright and Engström.

Keywords Conditional independence · Intersection axiom

Conditional independence constraints are a family of natural constraints on probability distributions, describing situations in which two random variables are independently distributed given knowledge of a third. Statistical models built around considerations of conditional independence, in particular graphical models in which the constraints are encoded in a graph on the random variables, enjoy wide applicability in determining relationships among random variables in statistics and in dealing with uncertainty in artificial intelligence.

One can take a purely combinatorial perspective on the study of conditional independence, as does Studený [10], conceiving of it as a relation on triples of subsets of a set of observables which must satisfy certain axioms. A number of elementary implications among conditional independence statements are recognized as axioms.

Among these are the semi-graphoid axioms, which are implications of conditional independence statements lacking further hypotheses, and hence are purely combinatorial statements. The intersection axiom is also often added to the collection, but unlike the semi-graphoid axioms it is not uniformly true; it is our subject here.

Formally, a conditional independence model $\mathcal{M}$ is a set of probability distributions characterized by satisfying several conditional independence constraints. We will work in the discrete setting, where a probability distribution $p$ is a multi-way table of probabilities, and we follow the notational conventions in [1].

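Consider the discrete conditional independence model $\mathcal{M}$ given by
\[ \{X_1 \perp\!\!\!\perp X_2 \mid X_3,\ \ X_1 \perp\!\!\!\perp X_3 \mid X_2\} \]
where $X_i$ is a random variable taking values in the set $[r_i] = \{1, \dots, r_i\}$. Throughout we assume $r_1 \ge 2$. Let $p_{ijk}$ be the unknown probability $P(X_1 = i, X_2 = j, X_3 = k)$ in a distribution from the model $\mathcal{M}$. The set of distributions in the model $\mathcal{M}$ is the variety whose defining ideal $I_{\mathcal{M}} \subseteq S = \mathbb{C}[p_{ijk}]$ is
\[
\begin{aligned}
I_{\mathcal{M}} ={}& \bigl( p_{ijk}p_{i'j'k} - p_{ij'k}p_{i'jk} : i, i' \in [r_1],\ j, j' \in [r_2],\ k \in [r_3] \bigr) \\
&+ \bigl( p_{ijk}p_{i'jk'} - p_{ijk'}p_{i'jk} : i, i' \in [r_1],\ j \in [r_2],\ k, k' \in [r_3] \bigr).
\end{aligned}
\]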

The intersection axiom is the implication whose premises are the statements of $\mathcal{M}$ and whose conclusion is $X_1 \perp\!\!\!\perp (X_2, X_3)$. To be true, this implication requires the further hypothesis that the distribution $p$ is in the interior of the probability simplex, i.e. that no individual probability $p_{ijk}$ is zero. It is thus a natural question to ask what can be inferred about distributions $p$ which may lie on the boundary of the probability simplex. In algebraic terms, we are asking for the (set-theoretic) components of the variety $V(I_{\mathcal{M}})$.
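To make the role of the positivity hypothesis concrete, the following Python sketch tests one boundary distribution against the premises and the conclusion of the intersection axiom. The sizes $r_1 = r_2 = r_3 = 2$, the table `p`, and the helper names are illustrative assumptions that do not come from the text; indices are 0-based.

```python
from fractions import Fraction
from itertools import product

R1, R2, R3 = 2, 2, 2  # assumed small state spaces, indexed from 0

# Hypothetical boundary distribution: mass only where (j, k) is (0, 0) or (1, 1).
p = {idx: Fraction(0) for idx in product(range(R1), range(R2), range(R3))}
p[0, 0, 0], p[1, 0, 0] = Fraction(1, 8), Fraction(3, 8)
p[0, 1, 1], p[1, 1, 1] = Fraction(3, 8), Fraction(1, 8)


def premises_hold(p):
    """True if every generator of the ideal of the model vanishes at p."""
    for i, i2 in product(range(R1), repeat=2):
        # Generators coming from X1 independent of X2 given X3.
        for j, j2, k in product(range(R2), range(R2), range(R3)):
            if p[i, j, k] * p[i2, j2, k] != p[i, j2, k] * p[i2, j, k]:
                return False
        # Generators coming from X1 independent of X3 given X2.
        for j, k, k2 in product(range(R2), range(R3), range(R3)):
            if p[i, j, k] * p[i2, j, k2] != p[i, j, k2] * p[i2, j, k]:
                return False
    return True


def conclusion_holds(p):
    """True if the r1 x (r2 r3) flattening of p has all 2 x 2 minors zero."""
    cols = list(product(range(R2), range(R3)))
    return all(
        p[(i,) + c] * p[(i2,) + c2] == p[(i,) + c2] * p[(i2,) + c]
        for i, i2 in product(range(R1), repeat=2)
        for c, c2 in product(cols, repeat=2)
    )


print(sum(p.values()), premises_hold(p), conclusion_holds(p))
```

For this particular table the entries sum to 1 and both premises hold, while the conclusion $X_1 \perp\!\!\!\perp (X_2, X_3)$ fails, so the zero entries are what allow the implication to break.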

A problem posed by Dustin Cartwright and Alexander Engström appears in Sect. 6.6.3 of [1], giving a conjectural description of the associated primes of $I_{\mathcal{M}}$ in terms of certain subgraphs of a complete bipartite graph. Our main theorem resolves this conjecture in the affirmative, and gives stronger information, namely the primary decomposition of $I_{\mathcal{M}}$.

In the course of this project the author computed primary decompositions of $I_{\mathcal{M}}$ for various values of $r_1$, $r_2$, and $r_3$ with the computer algebra system Singular [4, 5]. Thomas Kahle has recently written dedicated Macaulay2 code [3] for binomial primary decompositions [7], in which the same computations may be carried out.

A broad generalization of this paper's results to the class of binomial edge ideals of graphs has been obtained by Herzog, Hibi, Hreinsdóttir, Kahle, and Rauh [6]. The $r_1 = 2$ case of $I_{\mathcal{M}}$ is treated, with a different term order, in their Sect. 4.

Let $K_{p,q}$ be the complete bipartite graph with bipartitioned vertex set $[p] \sqcup [q]$.

Given a subgraph $G$ of $K_{r_2,r_3}$ with edge set $\operatorname{Edges}(G)$, the prime $P_G$ to which it corresponds is defined to be
\[ P_G = P_G^{(0)} + P_G^{(1)} \]
where
\[
\begin{aligned}
P_G^{(0)} &= \bigl( p_{ijk} : i \in [r_1],\ (j,k) \notin \operatorname{Edges}(G) \bigr), \\
P_G^{(1)} &= \bigl( p_{ijk}p_{i'j'k'} - p_{ij'k'}p_{i'jk} : i, i' \in [r_1];\ \text{and } j, j' \in [r_2],\ k, k' \in [r_3] \\
&\qquad \text{are in the same connected component of } G \bigr).
\end{aligned}
\]
Note that $j$ need not be distinct from $j'$, nor $k$ from $k'$. We will also want to refer to the individual summands $P_C^{(1)}$ of $P_G^{(1)}$, where $P_C^{(1)}$ includes only those generators in the variables $\{p_{ijk} : (j,k) \in C\}$ arising from edges in the connected component $C$ of $G$. Then
\[ P_G = P_G^{(0)} + \sum_{C} P_C^{(1)}, \tag{1} \]
where $C$ runs over connected components of $G$.

We say that a subgraph $G$ of $K_{r_2,r_3}$ is admissible if $G$ has vertex set $[r_2] \sqcup [r_3]$ and all connected components of $G$ are isomorphic to some complete bipartite graph $K_{p,q}$ with $p, q \ge 1$.
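The definition of admissibility can be checked mechanically. The sketch below is one possible way to do so in Python, assuming 0-based vertex labels and an edge list of pairs `(j, k)`; the function names `components`, `is_admissible` and `count_admissible` are hypothetical helpers introduced only for illustration.

```python
from itertools import combinations, product


def components(r2, r3, edges):
    """Connected components of a bipartite graph on the disjoint union [r2] + [r3].

    Vertices are ('L', j) and ('R', k); an edge (j, k) joins ('L', j) to ('R', k).
    """
    comp = {("L", j): {("L", j)} for j in range(r2)}
    comp.update({("R", k): {("R", k)} for k in range(r3)})
    for j, k in edges:
        a, b = comp[("L", j)], comp[("R", k)]
        if a is not b:
            a |= b  # merge the two components
            for v in b:
                comp[v] = a
    return {frozenset(c) for c in comp.values()}


def is_admissible(r2, r3, edges):
    """True if every component is complete bipartite K_{p,q} with p, q >= 1."""
    edges = set(edges)
    for c in components(r2, r3, edges):
        left = [j for side, j in c if side == "L"]
        right = [k for side, k in c if side == "R"]
        if not left or not right:
            return False  # an isolated vertex is not K_{p,q} with p, q >= 1
        if any((j, k) not in edges for j in left for k in right):
            return False  # the component is not complete bipartite
    return True


def count_admissible(r2, r3):
    """Brute-force count over all subgraphs of K_{r2,r3}; only for tiny r2, r3."""
    all_edges = list(product(range(r2), range(r3)))
    return sum(
        is_admissible(r2, r3, subset)
        for n in range(len(all_edges) + 1)
        for subset in combinations(all_edges, n)
    )
```

For instance, `count_admissible(2, 2)` evaluates to 3: the complete graph $K_{2,2}$ and the two perfect matchings.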

Let $\prec_{\mathrm{dp}}$ be the revlex term order on $S$ over the lexicographic variable order on subscripts, with earlier subscripts more significant. Thus under $\prec_{\mathrm{dp}}$, we have $p_{111} \prec_{\mathrm{dp}} p_{112} \prec_{\mathrm{dp}} p_{211}$.

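Theorem 1 The primary decomposition
\[ I_{\mathcal{M}} = \bigcap_{G} P_G \tag{2} \]
holds and is an irredundant decomposition, where the intersection is over admissible graphs $G$ on $[r_2] \sqcup [r_3]$. We also have
\[ \operatorname{in}_{\prec_{\mathrm{dp}}} I_{\mathcal{M}} = \bigcap_{G} \operatorname{in}_{\prec_{\mathrm{dp}}} P_G. \]
Each $\operatorname{in}_{\prec_{\mathrm{dp}}} P_G$ is squarefree, so $\operatorname{in}_{\prec_{\mathrm{dp}}} I_{\mathcal{M}}$ and hence $I_{\mathcal{M}}$ are radical ideals.

In particular, the value of $r_1$ is irrelevant to the combinatorial nature of the primary decomposition.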

Corollary 2 (Conjecture, Cartwright–Engström) The set of minimal primes of the ideal $I_{\mathcal{M}}$ is
\[ \{ P_G : G \text{ an admissible graph on } [r_2] \sqcup [r_3] \}. \]

Remark 3 This corollary amounts to the set-theoretic identity
\[ V(I_{\mathcal{M}}) = \bigcup_{G \text{ admissible}} V(P_G). \]
Points $(p_{ijk})$ on the variety $V(P_G)$ are characterized by the conditions that $p_{ijk} = 0$ for $(j,k) \notin \operatorname{Edges}(G)$, and that for any two edges $(j,k)$ and $(j',k')$ in the same connected component of $G$, the vectors $(p_{\cdot jk})$ and $(p_{\cdot j'k'})$ in $\mathbb{C}^{r_1}$ are proportional.

The core ideas of a proof of Corollary 2 are present in [1, Sect. 6.6.4]. That discussion focuses on the prime $P_{K_{r_2,r_3}}$, corresponding to the locus where the conclusion of the intersection axiom is valid, but it extends without great difficulty to any $P_G$.

It is noted in [1, §6.6] that the number $\eta(p,q)$ of admissible graphs $G$ on $[p] \sqcup [q]$ is given by the generating function
\[ \exp\bigl( (e^x - 1)(e^y - 1) \bigr) = \sum_{p,q \ge 0} \eta(p,q)\, \frac{x^p y^q}{p!\, q!}, \tag{3} \]
which in that reference is said to follow from manipulations of Stirling numbers. This equation (3) can also be obtained as a direct consequence of a bivariate form of the exponential formula for exponential generating functions [9, §5.1], using the observation that
\[ (e^x - 1)(e^y - 1) = \sum_{p,q \ge 1} \frac{x^p y^q}{p!\, q!} \]
is the exponential generating function for complete bipartite graphs with $p, q \ge 1$, and these are the possible connected components of admissible graphs.
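Equation (3) can be checked numerically for small $p$ and $q$. The Python sketch below expands $\exp((e^x-1)(e^y-1))$ as a truncated bivariate power series with exact rational coefficients and compares the result with the closed form $\sum_k k!\, S(p,k)\, S(q,k)$, where $S$ denotes Stirling numbers of the second kind. That closed form is our own assumption about what the Stirling number manipulation yields (choose set partitions of $[p]$ and $[q]$ into $k$ blocks each, then a bijection between the blocks); it is not stated in the text, and all function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def stirling2(n, k):
    """Stirling number of the second kind S(n, k), by the usual recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def eta_closed_form(p, q):
    """Assumed closed form: sum over k of k! * S(p, k) * S(q, k)."""
    return sum(factorial(k) * stirling2(p, k) * stirling2(q, k)
               for k in range(min(p, q) + 1))


def multiply(a, b, n):
    """Product of two bivariate series, truncated at degree n in each variable."""
    c = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(n + 1):
            if a[i][j]:
                for k in range(n + 1 - i):
                    for l in range(n + 1 - j):
                        c[i + k][j + l] += a[i][j] * b[k][l]
    return c


def eta_from_series(n):
    """Table of eta(p, q) for p, q <= n, read off exp((e^x - 1)(e^y - 1))."""
    f = [[Fraction(1, factorial(p) * factorial(q)) if p and q else Fraction(0)
          for q in range(n + 1)] for p in range(n + 1)]
    power = [[Fraction(int(p == 0 and q == 0)) for q in range(n + 1)]
             for p in range(n + 1)]  # f**0 = 1
    total = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    # f**k has x-degree at least k, so powers beyond n vanish after truncation.
    for k in range(n + 1):
        for p in range(n + 1):
            for q in range(n + 1):
                total[p][q] += power[p][q] / factorial(k)
        power = multiply(power, f, n)
    return [[total[p][q] * factorial(p) * factorial(q) for q in range(n + 1)]
            for p in range(n + 1)]
```

With these definitions the series coefficients and the assumed closed form agree for all $p, q \le 3$; for example both give $\eta(2,2) = 3$.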

We now review some standard facts on binomial and toric ideals [2]. Let $I$ be a binomial ideal in $\mathbb{C}[x_1, \dots, x_n]$, generated by binomials of the form $x^v - x^w$ with $v, w \in \mathbb{N}^n$. There is a lattice $L_I \subseteq \mathbb{Z}^n$ such that the localization $I_{x_1 \cdots x_n} \subseteq \mathbb{C}[x_1^{\pm 1}, \dots, x_n^{\pm 1}]$ has the form $(x^v - 1 : v \in L_I)$, provided that this localization is a proper ideal, i.e. $I$ contains no monomial. If $\varphi_I : \mathbb{Z}^n \to \mathbb{Z}^m$ is a $\mathbb{Z}$-linear map whose kernel contains $L_I$, then $\varphi_I$ provides a multigrading with respect to which $I$ is homogeneous. (If $\ker \varphi_I = L_I$ exactly then $\varphi_I$ is said to compute the minimal sufficient statistics for the statistical model associated to $I$.)

The condition that a multivariate Laurent polynomial $f \in \mathbb{C}[x_1^{\pm 1}, \dots, x_n^{\pm 1}]$ lies in $I_{x_1 \cdots x_n}$ can be expressed in terms of a graph $\Gamma$, whose vertices are $\mathbb{Z}^n$ and whose edge set is $\{(v, w) : x^v - x^w \text{ is a Laurent monomial multiple of a generator of } I\}$; in the statistical context these edges are known as moves. To wit, $f$ is in $I_{x_1 \cdots x_n}$ if and only if, for each connected component $C$ of $\Gamma$, the sum of the coefficients on all monomials $x^v$ with $v \in C$ is zero. In particular $I_{x_1 \cdots x_n}$ is determined by the partition of $\mathbb{Z}^n$ into connected components of $\Gamma$. Note that this partition refines the partition of $\mathbb{Z}^n$ into fibers of $\varphi_I$, for any map $\varphi_I$ as in the last paragraph. If we are concerned with membership in $I$ rather than $I_{x_1 \cdots x_n}$, analogues of everything in this paragraph are true if we substitute $\mathbb{N}^n$ for $\mathbb{Z}^n$ and use ordinary monomials rather than Laurent monomials in defining the edges. We will denote the resulting graph on $\mathbb{N}^n$ by $\Gamma(I)$, and its induced subgraph on a subset $F \subseteq \mathbb{N}^n$ by $\Gamma_F(I)$.
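As an illustration of the graph of moves, here is a minimal Python sketch for the ideal $I_{\mathcal{M}}$ defined earlier, assuming $r_1 = r_2 = r_3 = 2$ and 0-based indices. A monomial is stored as a sorted tuple of index triples, and `component` explores the monomials reachable from it by moves; the representation and the function names are our own choices and not part of the text.

```python
from collections import deque
from itertools import product

R1, R2, R3 = 2, 2, 2  # assumed small sizes; indices start at 0


def generator_moves():
    """Pairs (lhs, rhs) of variable pairs such that p_lhs - p_rhs generates I_M."""
    moves = []
    for i, i2 in product(range(R1), repeat=2):
        for j, j2, k in product(range(R2), range(R2), range(R3)):
            moves.append((((i, j, k), (i2, j2, k)), ((i, j2, k), (i2, j, k))))
        for j, k, k2 in product(range(R2), range(R3), range(R3)):
            moves.append((((i, j, k), (i2, j, k2)), ((i, j, k2), (i2, j, k))))
    return moves


MOVES = generator_moves()


def neighbours(mono):
    """Monomials one move away from `mono`, a sorted tuple of index triples."""
    found = set()
    for lhs, rhs in MOVES:
        for take, give in ((lhs, rhs), (rhs, lhs)):
            rest = list(mono)
            try:
                rest.remove(take[0])
                rest.remove(take[1])
            except ValueError:
                continue  # the monomial is not divisible by this term
            found.add(tuple(sorted(rest + list(give))))
    return found


def component(mono):
    """Connected component of `mono` in the graph of moves (breadth-first search)."""
    start = tuple(sorted(mono))
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbours(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Under these assumptions the search finds that $p_{000}p_{111}$ is alone in its component, so $p_{000}p_{111} - p_{011}p_{100} \notin I_{\mathcal{M}}$, whereas after multiplying both monomials by $p_{001}$ they lie in the same component.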

Any prime binomial ideal $I$ is equal to the toric ideal $I_A$ of a lattice point configuration $A$, where $I_A$ is the kernel of the monomial map whose monomials are the points of $A$. Sturmfels shows in [8] that the radicals of the monomial initial ideals of $I_A$ are exactly the Stanley–Reisner ideals of regular triangulations of $A$. The Stanley–Reisner ideal $I_\Delta$ of a simplicial complex $\Delta$ on a vertex set $T$ is the monomial ideal of $\mathbb{C}[x_t : t \in T]$ generated by the products of variables $x_{t_1} \cdots x_{t_k}$ for which $\{t_1, \dots, t_k\}$ is not a face of $\Delta$. Primary decompositions of Stanley–Reisner ideals are easily described: $I_\Delta$ is the intersection of the ideals $(x_t : t \notin F)$ over all facets $F$ of $\Delta$.

Sturmfels treats explicitly the $2 \times 2$ determinantal ideal of an $r \times s$ matrix, which is the toric ideal $I_A$ for $A$ the set of vertices of the product $\Delta_{r-1} \times \Delta_{s-1}$ of two simplices.

Theorem 4 (Sturmfels [8]) Let $I$ be the ideal of $2 \times 2$ minors of an $r \times s$ matrix of indeterminates $Y = (y_{ij})$. For any term order $\prec$, $\operatorname{in}_\prec I$ is a squarefree monomial ideal.

Remark 5 If $\prec$ is the revlex term order on the $y_{ij}$, set up analogously to $\prec_{\mathrm{dp}}$, then the simplicial complex $\Delta$ with $I_\Delta = \operatorname{in}_\prec I$ has a pleasant description [8]: it is the staircase triangulation. The facets of $\Delta$ are the sets $\pi$ of entries of the matrix $Y$ which form maximal ("staircase") paths through $Y$ starting at the upper left corner, taking steps right and down, and terminating at the lower right corner. Note that staircase paths are maximal subsets of indeterminates not including both $y_{ij'}$ and $y_{i'j}$ for any $i < i'$ and $j < j'$. Thus the associated primes of $\operatorname{in}_\prec I$ are generated by minimal sets of variables $y_{ij}$ which include at least one of $y_{ij'}$ and $y_{i'j}$ whenever $i < i'$ and $j < j'$.
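The combinatorial description of staircase paths in Remark 5 can be tested by brute force on small matrices. The following Python sketch, with hypothetical helper names and 0-based cell coordinates `(i, j)`, enumerates the right/down paths and compares them with the maximal sets of cells containing no pair $y_{ij'}, y_{i'j}$ with $i < i'$ and $j < j'$.

```python
from itertools import combinations


def staircase_paths(r, s):
    """All right/down lattice paths from (0, 0) to (r-1, s-1), as sets of cells."""
    paths = set()

    def extend(path):
        i, j = path[-1]
        if (i, j) == (r - 1, s - 1):
            paths.add(frozenset(path))
            return
        if i + 1 < r:
            extend(path + [(i + 1, j)])  # step down
        if j + 1 < s:
            extend(path + [(i, j + 1)])  # step right

    extend([(0, 0)])
    return paths


def has_antidiagonal_pair(cells):
    """True if cells contains (i, j') and (i', j) with i < i' and j < j'."""
    return any(i < i2 and j > j2 for (i, j) in cells for (i2, j2) in cells)


def maximal_subsets_without_antidiagonal_pair(r, s):
    """Inclusion-maximal sets of cells with no antidiagonal pair (exhaustive search)."""
    cells = [(i, j) for i in range(r) for j in range(s)]
    good = [frozenset(c) for n in range(len(cells) + 1)
            for c in combinations(cells, n) if not has_antidiagonal_pair(c)]
    return {g for g in good if not any(g < h for h in good)}
```

For a $2 \times 3$ matrix both functions return the same three sets of cells, in agreement with the remark. The exhaustive search is exponential in $rs$ and is meant only as a sanity check.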

The significance of the ideals $P_C^{(1)}$ of connected components comes from the following fact.

Fact 6 If $G$ is an admissible graph, then (1) expresses $P_G$ as a sum of primes in disjoint sets of variables.

Indeed, $P_G^{(0)}$ is the irrelevant ideal in the $p_{ijk}$ with $(j,k) \notin \operatorname{Edges}(G)$, and for each connected component $C$ of $G$ with $s$ left and $t$ right vertices, $P_C^{(1)}$ is the $2 \times 2$ determinantal ideal of the $r_1 \times st$ matrix of indeterminates $(p_{ijk})$ where the row indices are $i \in [r_1]$ and the column indices $(j,k) \in \operatorname{Edges}(C)$. The irrelevant ideal can mostly be ignored, and so this fact reduces many of our considerations to handling $2 \times 2$ determinantal ideals. (Note that the hypothesis that $G$ be admissible is needed, since otherwise $P_G^{(1)}$ includes variables corresponding to nonedges of $G$. We could amend the definition of $P_G$ to salvage Fact 6, but we would lose the also important fact that the summands are determinantal.)

For a first immediate application, by Theorem 4 the $\operatorname{in}_{\prec_{\mathrm{dp}}} P_G$ are squarefree monomial ideals, implying the radicality claim of Theorem 1.

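For a second, we recover the primary decomposition of $\operatorname{in}_\prec P_G$ for an arbitrary admissible graph $G$. Let the connected components of $G$ be $C_1, \dots, C_l$. Fact 6 implies that $\operatorname{in}_\prec P_G = \operatorname{in}_\prec P_G^{(0)} + \sum_i \operatorname{in}_\prec P_{C_i}^{(1)}$. It then also gives us that if $\operatorname{in}_\prec P_{C_i}^{(1)} = \bigcap_j Q_{C_i,j}$ are primary decompositions of the $\operatorname{in}_\prec P_{C_i}^{(1)}$, then
\[ \operatorname{in}_\prec P_G = \bigcap_{\mathbf{j}} \Bigl( P_G^{(0)} + \sum_{i=1}^{l} \operatorname{in}_\prec Q_{C_i, j_i} \Bigr) \]
is a primary decomposition of $\operatorname{in}_\prec P_G$, where $\mathbf{j} = (j_1, \dots, j_l)$ ranges over the Cartesian product of the index sets of the decompositions $\bigcap_j Q_{C_i,j}$.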

Lemma 7 Let $G$ and $G'$ be subgraphs of $K_{r_2,r_3}$. Then $P_G \subseteq P_{G'}$ if and only if $\operatorname{Edges}(G') \subseteq \operatorname{Edges}(G)$ and every connected component of $G$ is a union of connected components $C_1, \dots, C_l$ of $G'$ such that at most one $C_i$ contains more than one vertex.

In particular, for any subgraph $G$ of $K_{r_2,r_3}$ there exists an admissible graph $\overline{G}$ such that $P_{\overline{G}} \subseteq P_G$. Such a $\overline{G}$ can be constructed from $G$ as follows: add to $G$ one new edge incident to each of its isolated vertices, and then complete each connected component of the new graph to a bipartite complete graph.
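The two-step construction of an admissible graph from an arbitrary subgraph can be sketched in Python as follows. The choice to attach every isolated vertex to vertex 0 on the opposite side is an arbitrary assumption made for definiteness (the text allows any new edge), vertices are 0-based, and the function names are hypothetical.

```python
def components(r2, r3, edges):
    """Connected components of a bipartite graph on the disjoint union [r2] + [r3].

    Vertices are ('L', j) and ('R', k); an edge (j, k) joins ('L', j) to ('R', k).
    """
    comp = {("L", j): {("L", j)} for j in range(r2)}
    comp.update({("R", k): {("R", k)} for k in range(r3)})
    for j, k in edges:
        a, b = comp[("L", j)], comp[("R", k)]
        if a is not b:
            a |= b  # merge the two components
            for v in b:
                comp[v] = a
    return {frozenset(c) for c in comp.values()}


def admissible_closure(r2, r3, edges):
    """One admissible graph containing `edges`, following the two steps in the text.

    Assumes r2 >= 1 and r3 >= 1.
    """
    edges = set(edges)
    used_left = {j for j, _ in edges}
    used_right = {k for _, k in edges}
    # Step 1: add an edge at each isolated vertex (attaching to vertex 0 is our choice).
    edges |= {(j, 0) for j in range(r2) if j not in used_left}
    edges |= {(0, k) for k in range(r3) if k not in used_right}
    # Step 2: complete every connected component to a complete bipartite graph.
    closure = set()
    for c in components(r2, r3, edges):
        left = [j for side, j in c if side == "L"]
        right = [k for side, k in c if side == "R"]
        closure |= {(j, k) for j in left for k in right}
    return closure
```

For example, with $r_2 = r_3 = 3$ the path with edges $(0,0), (0,1), (1,1)$ is completed to all of $K_{3,3}$, while the matching $(1,1), (2,2)$ only gains the edge $(0,0)$.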

Proof First suppose the right-hand condition fails. Then either

(1) $G'$ contains an edge not in $G$, or

(2) some connected component of $G'$ is not contained in a single connected component of $G$, or

(3) a connected component of $G$ contains two connected components of $G'$ both larger than one vertex.

In case (1), let $(j,k)$ be an edge of $G'$ not in $G$. Then $p_{1jk} \in P_G$, but $p_{1jk} \notin P_{G'}$, since Remark 3 describes points in $V(P_{G'})$ with $p_{1jk} \ne 0$. Case (2) implies case (1). And in case (3), let $(j,k)$ and $(j',k')$ be edges of $G'$ in different connected components there but in the same connected component of $G$. Then $p_{1jk}p_{2j'k'} - p_{1j'k'}p_{2jk}$ is in $P_G$ but not $P_{G'}$, again using Remark 3.

Suppose instead the right-hand condition holds. The generators of $P_G^{(0)}$ are in $P_{G'}$, since nonedges of $G$ are nonedges of $G'$. The generators of $P_G^{(1)}$ are also in $P_{G'}$. For every pair of edges $(j,k)$, $(j',k')$ in a connected component of $G$, either all their endpoints are in the same component of $G'$ or one of their endpoints is isolated: in the former case $p_{ijk}p_{i'j'k'} - p_{i'jk}p_{ij'k'}$ is in $P_{G'}^{(1)}$, in the latter case in $P_{G'}^{(0)}$.

Proof of Theorem 1 We first check that the right side of (2) is an irredundant primary decomposition. Let $G$ be an admissible graph. Since $P_G$ is a sum of primes in disjoint variables by Fact 6, it is prime. Irredundance of (2) is the assertion that for $G$ and $G'$ distinct admissible graphs, $P_G$ is not contained in $P_{G'}$. This follows directly from the definition of admissibility and Lemma 7.

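So we must prove the intersection statement (2). Let $\prec$ be $\prec_{\mathrm{dp}}$, and write $I = I_{\mathcal{M}}$. It is apparent that
\[ I \subseteq P_G \tag{4} \]
for each $G$ (without using admissibility). Indeed, given a generator $f$ of $I$, without loss of generality of the form $f = p_{ijk}p_{i'j'k} - p_{ij'k}p_{i'jk}$, either both edges $(j,k)$ and $(j',k)$ lie in $\operatorname{Edges}(G)$, in which case $f$ is a generator of $P_G^{(1)}$, or one of these edges is not in $\operatorname{Edges}(G)$, in which case $f \in P_G^{(0)}$. Therefore the containments
\[ \operatorname{in}_\prec I \subseteq \operatorname{in}_\prec \bigcap_{G} P_G \subseteq \bigcap_{G} \operatorname{in}_\prec P_G \]
hold, the intersections still being over admissible $G$. It now suffices to show an equality of Hilbert functions
\[ H(S / \operatorname{in}_\prec I) = H\Bigl( S \Big/ \bigcap_{G} \operatorname{in}_\prec P_G \Bigr), \tag{5} \]
forcing these containments to be equalities.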

The lattice $L_I$ associated to our $I$ is generated by all vectors of the form $e_{ijk} + e_{i'j'k} - e_{ij'k} - e_{i'jk}$ and $e_{ijk} + e_{i'jk'} - e_{ijk'} - e_{i'jk}$. The map $\varphi = \varphi_I : \mathbb{Z}^{r_1 r_2 r_3} \to \mathbb{Z}^{r_1 + r_2 r_3}$ sending $(u_{ijk})$ to
\[ \Bigl( \sum_{(j,k)} u_{1jk},\ \dots,\ \sum_{(j,k)} u_{r_1 jk},\ \sum_{i} u_{i11},\ \dots,\ \sum_{i} u_{i r_2 r_3} \Bigr) \]
has kernel containing $L_I$. Therefore we obtain a $\mathbb{Z}^{r_1 + r_2 r_3}$-valued multigrading on $S$, $\deg_\varphi p_{ijk} = (e_i, e_{jk})$, in which $I$ is homogeneous. The $\deg_\varphi$ multigrading refines the standard grading. We will prove that (5) holds in this stronger context of $\varphi$-graded Hilbert functions.

Let $d \in \mathbb{Z}^{r_1 + r_2 r_3}$ be the multidegree of some monomial, and write its components as $d_i$ for $i \in [r_1]$ and $d_{jk}$ for $(j,k) \in [r_2] \times [r_3]$. Let $G(d)$ be the bipartite graph with vertex set $[r_2] \sqcup [r_3]$ and edge set $\{(j,k) : d_{jk} \ne 0\}$. We now prove the following two claims:

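Claim 1 $I_d = (P_{G(d)})_d$.

Claim 2 $\bigl( \bigcap_{G \text{ admissible}} \operatorname{in}_\prec P_G \bigr)_d = (\operatorname{in}_\prec P_{G(d)})_d$.

These claims imply
\[ H(\operatorname{in}_\prec I)(d) = H(I)(d) = H(P_{G(d)})(d) = H(\operatorname{in}_\prec P_{G(d)})(d) = H\Bigl( \bigcap_{G} \operatorname{in}_\prec P_G \Bigr)(d). \]
Thence we conclude that (5) holds, proving Theorem 1.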

Proof of Claim 1 Observe first that no polynomial homogeneous of multidegree $d$ can be divisible by any $p_{ijk}$ with $(j,k) \notin \operatorname{Edges}(G(d))$. Accordingly we have $(P_{G(d)})_d = (P_{G(d)}^{(1)})_d$, and we will work with $P_{G(d)}^{(1)}$.

Since $I$ and $P_{G(d)}^{(1)}$ are binomial ideals generated by differences of monomials, it will suffice to show that the two graphs $\Gamma_F(I)$ and $\Gamma_F(P_{G(d)}^{(1)})$ of moves on the fiber $F = \varphi^{-1}(d)$ have the same partition into connected components. It is clear that $\Gamma_F(I)$ is a subgraph of $\Gamma_F(P_{G(d)}^{(1)})$, since containment (4) implies $I_d \subseteq (P_{G(d)})_d = (P_{G(d)}^{(1)})_d$. So given an edge of $\Gamma_F(P_{G(d)}^{(1)})$, say with endpoints $u, u' \in F$, we must show that this edge is contained in a connected component of $\Gamma_F(I)$, i.e. that $p^u - p^{u'} \in I$. We have $u' = u + e_{i_0 j_0 k_0} + e_{i_\ell j_\ell k_\ell} - e_{i_0 j_\ell k_\ell} - e_{i_\ell j_0 k_0}$ for some $i_0, i_\ell \in [r_1]$ and some two edges $(j_0, k_0)$, $(j_\ell, k_\ell)$ of $G(d)$ in the same component. Let $(j_m, k_m)_{m = 0, \dots, \ell}$ be the edges of a path in $G(d)$ between these, so that for each $0 \le m < \ell$ either $j_m = j_{m+1}$ or $k_m = k_{m+1}$. Assume the $(j_m, k_m)$ are pairwise distinct. For each $1 \le m \le \ell - 1$, let $i_m$ be such that $p_{i_m j_m k_m}$ divides $p^u$. Such an $i_m$ must exist because $d_{j_m k_m}$ is positive. Then
\[
\begin{aligned}
&(p_{i_0 j_0 k_0} p_{i_\ell j_\ell k_\ell} - p_{i_0 j_\ell k_\ell} p_{i_\ell j_0 k_0})\, p_{i_1 j_1 k_1} \cdots p_{i_{\ell-1} j_{\ell-1} k_{\ell-1}} \\
&\quad = \sum_{m=0}^{\ell-1} p_{i_1 j_0 k_0} \cdots p_{i_m j_{m-1} k_{m-1}}\, g^{i_0 j_m k_m}_{i_{m+1} j_{m+1} k_{m+1}}\, p_{i_{m+2} j_{m+2} k_{m+2}} \cdots p_{i_\ell j_\ell k_\ell} \\
&\qquad - \sum_{m=0}^{\ell-2} p_{i_1 j_0 k_0} \cdots p_{i_m j_{m-1} k_{m-1}}\, g^{i_\ell j_m k_m}_{i_{m+1} j_{m+1} k_{m+1}}\, p_{i_{m+2} j_{m+2} k_{m+2}} \cdots p_{i_{\ell-1} j_{\ell-1} k_{\ell-1}}\, p_{i_0 j_\ell k_\ell}
\end{aligned}
\]

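is in $I$, where to save space $g^{ijk}_{i'j'k'}$ denotes the generator $p_{ijk}p_{i'j'k'} - p_{ij'k'}p_{i'jk}$ of $I$. The binomial $p^u - p^{u'}$ is a monomial multiple of this binomial, so $p^u - p^{u'} \in I$.

Proof of Claim 2 There is an admissible graph $\overline{G}$ such that $P_{\overline{G}} \subseteq P_{G(d)}$, by Lemma 7. Then $\operatorname{in}_\prec P_{\overline{G}} \subseteq \operatorname{in}_\prec P_{G(d)}$, which implies $\bigcap_{G \text{ admissible}} \operatorname{in}_\prec P_G \subseteq \operatorname{in}_\prec P_{G(d)}$, one of the containments of the claim.

For the other containment, suppose $p^u$ is a monomial of multidegree $d$ belonging to $\operatorname{in}_\prec P_{G(d)}$. By Remark 5, $p^u$ is divisible by some $p_{ij'k'}p_{i'jk}$ for $i < i'$ in $[r_1]$ and $(j,k)$, $(j',k')$ two edges lying in the same connected component of $G(d)$ with $(j,k) < (j',k')$ lexicographically. Now let $G$ be any admissible graph. If $G(d)$ is not a subset of $G$, then $p^u$ is divisible by some indeterminate $p_{ijk}$ with $(j,k) \notin \operatorname{Edges}(G)$, so $p^u \in \operatorname{in}_\prec P_G$. Otherwise $G(d) \subseteq G$, in which case the edges $(j,k)$ and $(j',k')$ lie in the same component of $G$, so $p_{ij'k'}p_{i'jk} \in \operatorname{in}_\prec P_G$, implying $p^u \in \operatorname{in}_\prec P_G$. Therefore $\operatorname{in}_\prec P_{G(d)} \subseteq \bigcap_{G \text{ admissible}} \operatorname{in}_\prec P_G$.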

Observe finally that Claim 1 alone would suffice for the radicality in Theorem 1, supposing we already knew the $P_G$ to be the associated primes. For radicality it suffices that $I$ contain the intersection of its minimal primes, and this follows using Claim 1 one multidegree at a time, since the multidegree $d$ part of this intersection is contained in $(P_{G(d)})_d$.

Acknowledgements We thank Thomas Kahle for discussions, and Bernd Sturmfels and a patient referee for careful readings and for several helpful suggestions.

Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

References

1. Drton, M., Sturmfels, B., Sullivant, S.: Lectures on Algebraic Statistics. Oberwolfach Seminars, vol. 39. Springer, Berlin (2009)

2. Eisenbud, D., Sturmfels, B.: Binomial ideals. Duke Math. J. 84, 1–45 (1996)

3. Grayson, D.R., Stillman, M.: Macaulay2, a software system for research in algebraic geometry. Available at http://www.math.uiuc.edu/Macaulay2/

4. Greuel, G.-M., Pfister, G., Schönemann, H.: SINGULAR3.0—a computer algebra system for polynomial computations. In: Kerber, M., Kohlhase, M. (eds.) Symbolic Computation and Automated Reasoning, The Calculemus-2000 Symposium, pp. 227–233 (2001)

5. Greuel, G.-M., Pfister, G.: primdec.lib, a SINGULAR 3.0 library for computing the primary decomposition and radical of ideals (2005)

6. Herzog, J., Hibi, T., Hreinsdóttir, F., Kahle, T., Rauh, J.: Binomial edge ideals and conditional independence statements. arXiv:0909.4717

7. Kahle, T.: Binomials.m2, code for binomial primary decomposition in Macaulay2. http://personal-homepages.mis.mpg.de/kahle/bpd/index.html

8. Sturmfels, B.: Gröbner bases of toric varieties. Tôhoku Math. J. 43, 249–261 (1991)

9. Stanley, R.P.: Enumerative Combinatorics, vol. 2. Cambridge Studies in Advanced Mathematics, vol. 62. Cambridge University Press, Cambridge (1997)

10. Studený, M.: Probabilistic Conditional Independence Structures, Information Science and Statistics. Springer, New York (2005)