Parameterizations and fitting of bi-directed graph models to categorical data

MONIA LUPPARELLI
Dipartimento di Economia Politica e Metodi Quantitativi, via S. Felice, 7, 27100, Pavia, Italy

GIOVANNI M. MARCHETTI
Dipartimento di Statistica "G. Parenti", viale Morgagni, 59, 50134, Florence, Italy

WICHER P. BERGSMA
London School of Economics and Political Science, Houghton Street, WC2A 2AE London, UK

Abstract. We discuss two parameterizations of models for marginal independencies for discrete distributions which are representable by bi-directed graph models, under the global Markov property. Such models are useful data analytic tools, especially if used in combination with other graphical models. The first parameterization, in the saturated case, is also known as the multivariate logistic transformation; the second is a variant that allows, in some (but not all) cases, variation independent parameters. An algorithm for maximum likelihood fitting is proposed, based on an extension of the Aitchison and Silvey method.

E-mail addresses: mlupparelli@eco.unipv.it, giovanni.marchetti@ds.unifi.it, W.P.Bergsma@lse.ac.uk.

Date: 3 January 2008.

Key words and phrases. covariance graphs, complete hierarchical parameterizations, connected set Markov property, constrained maximum likelihood, marginal independence, marginal log-linear models, multivariate logistic transformation, variation independence.

1. Introduction

This paper deals with the parameterization and fitting of a class of marginal independence models for multivariate discrete distributions. These models are associated with a class of graphs where the missing edges represent marginal independence. The graphs used have special edges to distinguish them from undirected graphs used to encode conditional independencies.
Cox & Wermuth (1993) use dashed edges and call the graphs covariance graphs, stressing the equivalence between a marginal pairwise independence and a zero covariance in a Gaussian distribution. Richardson & Spirtes (2002) use instead bi-directed edges, following the tradition of path analysts. The interpretation of the graphs in terms of independencies is based on the pairwise and global Markov properties discussed originally by Kauermann (1996) for covariance graphs and later developed by Richardson (2003). These are recalled in Section 2.

Models of marginal independence can be useful in several contexts. For instance, Cox & Wermuth (1993) present an example on diabetic patients concerning four continuous variables: $X_1$, the duration of the illness, $X_2$, the quantity of a particular metabolic parameter, $X_3$, a score for the knowledge about the illness, and $X_4$, a questionnaire score measuring a patient's attitude called external fatalism. The structure of the correlation matrix suggests for this data set the marginal independencies $X_4 \perp\!\!\!\perp \{X_1, X_2\}$ and $X_1 \perp\!\!\!\perp \{X_3, X_4\}$. This marginal independence model can be represented by the bi-directed graph in Figure 1(a), called a 4-chain. The suggested interpretation is that the duration of illness $X_1$ and the external fatalism $X_4$ are independent explanatory variables of the responses $X_2$, $X_3$ in two seemingly unrelated regressions. For further discussion on the interpretation of covariance chains see Wermuth et al. (2006). Bi-directed graph models are sometimes useful to represent marginal independence structures induced after marginalizing over latent variables.
The independence structure of the diabetes data, for example, might be represented by assuming an underlying generating process described by a directed acyclic graph, shown in Figure 1(b), with one latent variable pointing both to $X_2$ and $X_3$. After marginalizing over the latent variable the induced independencies are exactly those encoded in the bi-directed graph of Figure 1(a).

Figure 1. (a) A bi-directed graph, called 4-chain, implying the independencies $4 \perp\!\!\!\perp 12$ and $1 \perp\!\!\!\perp 34$. Directed acyclic graphs inducing the same independencies after marginalization over the latent variables (with nodes $\otimes$): (b) with one latent variable; (c) with 3 latent variables.

As another example with four binary variables, consider the data by Coppen (1966) shown in Table 1, concerning symptoms of 362 psychiatric patients. The symptoms are: $X_1$: stability, $X_2$: validity, $X_3$: acute depression and $X_4$: solidity. The chi-squared tests of the hypotheses of marginal independence $X_4 \perp\!\!\!\perp \{X_1, X_2\}$ and $X_1 \perp\!\!\!\perp \{X_3, X_4\}$, with p-values 0.32 and 0.14, respectively, are separately not significant, and the independence model defined by the two statements jointly gives a satisfactory fit with a deviance of 8.61 on 5 degrees of freedom. Thus the same bi-directed graph model defined by the 4-chain of Figure 1(a) is adequate. In Section 6 we discuss the details of this application. In this example, if all symptoms are treated on the same footing, it is less plausible that a single latent variable will explain the independence structure, and more (at least three) latent variables are required to suggest a generating process, as shown in the graph of Figure 1(c).
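The two separate tests of marginal independence quoted above can be checked with a few lines of code. The following is a minimal sketch, assuming that the tests are Pearson chi-squared tests on the collapsed two-way tables (the text does not say which statistic was used); the function names `pearson_chi2` and `chi2_sf_3df` are illustrative only, and the closed-form tail probability is valid for 3 degrees of freedom only.

```python
import math

# Counts of Table 1, indexed by (x1, x2, x3, x4); x3 is coded 1 = yes, 2 = no.
counts = {
    (1, 1, 1, 1): 15, (1, 2, 1, 1): 30, (1, 1, 1, 2): 9,  (1, 2, 1, 2): 32,
    (1, 1, 2, 1): 25, (1, 2, 2, 1): 22, (1, 1, 2, 2): 46, (1, 2, 2, 2): 27,
    (2, 1, 1, 1): 23, (2, 2, 1, 1): 22, (2, 1, 1, 2): 14, (2, 2, 1, 2): 16,
    (2, 1, 2, 1): 14, (2, 2, 2, 1): 8,  (2, 1, 2, 2): 47, (2, 2, 2, 2): 12,
}

def pearson_chi2(counts, a, b):
    """Pearson statistic and degrees of freedom for independence between the
    variable groups a and b (tuples of 0-based positions in the cell index),
    computed in the margin obtained by summing over all other variables."""
    joint, row, col, n = {}, {}, {}, 0
    for cell, c in counts.items():
        r = tuple(cell[k] for k in a)
        s = tuple(cell[k] for k in b)
        joint[(r, s)] = joint.get((r, s), 0) + c
        row[r] = row.get(r, 0) + c
        col[s] = col.get(s, 0) + c
        n += c
    stat = 0.0
    for r in row:
        for s in col:
            expected = row[r] * col[s] / n
            stat += (joint.get((r, s), 0) - expected) ** 2 / expected
    df = (len(row) - 1) * (len(col) - 1)
    return stat, df

def chi2_sf_3df(x):
    """Upper tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# X4 versus {X1, X2}, and X1 versus {X3, X4}
stat_a, df_a = pearson_chi2(counts, (3,), (0, 1))
stat_b, df_b = pearson_chi2(counts, (0,), (2, 3))
p_a, p_b = chi2_sf_3df(stat_a), chi2_sf_3df(stat_b)
print(round(stat_a, 2), df_a, round(p_a, 2))
print(round(stat_b, 2), df_b, round(p_b, 2))
```

Under these assumptions both tests have 3 degrees of freedom, and the resulting p-values are in line with the values 0.32 and 0.14 reported in the text.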
Developing a parameterization for Gaussian bi-directed graph models is straightforward, since the pairwise and the global Markov properties are equivalent and they can be simply fulfilled by constraining to zero a subset of covariances. Accomplishing the same task in the discrete case is much more difficult, due to the high number of parameters and to the non-equivalence of the two Markov properties. Recently, Drton & Richardson (2007) studied the parameterization of bi-directed graph models for discrete binary distributions, based on Möbius parameters, and proposed a version of their iterative conditional fitting algorithm for maximum likelihood estimation. In this paper we propose different parameterizations, suitable for general categorical variables, based on the class of marginal log-linear models of Bergsma & Rudas (2002). One special case of this class, especially useful in the context of bi-directed graph models, is the multivariate logistic parameterization of Glonek & McCullagh (1995); see also Kauermann (1997). We discuss a further marginal log-linear parameterization that can, in special cases, be shown to imply variation independent parameters. We show that the marginal log-linear parameterizations suggest a class of reduced models defined by constraining certain higher-order log-linear parameters to zero. Then we discuss maximum likelihood estimation of the models and we propose a general algorithm based on previous works by Aitchison & Silvey (1958), Lang (1996) and Bergsma (1997).

The remainder of this paper is organized as follows. Section 2 reviews discrete bi-directed graphs and their Markov properties. In Section 3 we give the essential results concerning the theory of marginal log-linear models.
Two parameterizations of bi-directed graph models are then given in Section 4, illustrating their properties with special emphasis on variation independence and the interpretation of the parameters. In Section 5 we propose an algorithm for maximum likelihood fitting and then, in Section 6, we provide some examples. Finally, in Section 7 we give a short discussion, with a comparison with the approach by Drton & Richardson (2007).

Table 1. Data by Coppen (1966) on symptoms of psychiatric patients. The variables are $X_1$: stability (1 = extroverted, 2 = introverted), $X_2$: validity (1 = psychasthenic, 2 = energetic), $X_3$: depression (yes, no), $X_4$: solidity (1 = hysteric, 2 = rigid).

                   X4 = 1            X4 = 2
    X1   X3    X2 = 1  X2 = 2    X2 = 1  X2 = 2
    1    yes     15      30         9      32
    1    no      25      22        46      27
    2    yes     23      22        14      16
    2    no      14       8        47      12

2. Discrete bi-directed graph models

Bi-directed graphs are essentially undirected graphs with edges represented by bi-directed arrows instead of full lines. We review in this section the main concepts of graph theory required to understand the models.

A bi-directed graph is a pair $G = (V, E)$, where $V = \{1, \dots, d\}$ is a set of nodes, and $E$ is a set of edges defined by two-element subsets of $V$. Two nodes $u$, $v$ are adjacent or neighbours if $uv$ is an edge of $G$, and in this case the edge is drawn as bi-directed, $u \leftrightarrow v$. Two edges are adjacent if they have an end node in common. A path from a node $u$ to a node $v$ is a sequence of adjacent edges connecting $u$ and $v$ for which the corresponding sequence of nodes contains no repetitions. The nodes $u$ and $v$ are called the endpoints of the path and all the other nodes are called the inner nodes. A graph $G$ is complete if all its nodes are pairwise adjacent. A non-empty graph $G$ is called connected if any two of its nodes are linked by a path in $G$; otherwise it is called disconnected.
If $A$ is a subset of the node set $V$ of $G$, the graph $G_A$ with nodes $A$ and containing all the edges of $G$ with endpoints in $A$ is called an induced subgraph. If a subgraph $G_A$ is connected (resp. disconnected, complete) we also call $A$ connected (resp. disconnected, complete) in $G$. The set of all disconnected sets of the graph $G$ will be denoted by $\mathcal{D}$, and the set of all the connected sets of $G$ will be denoted by $\mathcal{C}$. In a graph $G$ a connected component, or simply a component, is a maximal connected subgraph. If a subset $D$ of nodes is disconnected then it can be uniquely decomposed into connected components $C_1, \dots, C_r$, say, such that $D = C_1 \cup \dots \cup C_r$. The usual notion of separation in undirected graphs can be used also for bi-directed graphs. Thus, given three disjoint subsets of nodes $A$, $B$ and $C$, $A$ and $B$ are said to be separated by $C$ if for any $u$ in $A$ and any $v$ in $B$ all paths from $u$ to $v$ have at least one inner node in $C$. The cardinality of a set $V$ will be denoted by $|V|$. The set of all the subsets of $V$, the power set, will be denoted by $\mathcal{P}(V)$. We also use the notation $\mathcal{P}_0(V)$ for the set of all nonempty subsets of $V$.

Let $X = (X_v, v \in V)$ be a discrete random vector with each component $X_v$ taking on values in the finite set $\mathcal{I}_v = \{1, \dots, b_v\}$. The Cartesian product $\mathcal{I}_V = \times_{v \in V} \mathcal{I}_v$ is a contingency table, with generic element $i = (i_v, v \in V)$, called a cell of the table, and with total number of cells $t = |\mathcal{I}_V|$. We assume that $X$ has a joint probability function $p(i)$, $i \in \mathcal{I}_V$, giving the probability that an individual falls in cell $i$. Given a subset $M \subseteq V$ of the variables, the marginal contingency table is $\mathcal{I}_M = \times_{v \in M} \mathcal{I}_v$ with generic cell $i_M$, and the marginal probability function of the random vector $X_M = (X_v, v \in M)$ is

$$p_M(i_M) = \sum_{j \in \mathcal{I}_V \,:\, j_M = i_M} p(j).$$
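The marginal probability function $p_M$ can be computed by a single pass over the cells of the table. The sketch below is only an illustration: the joint distribution is a hypothetical product of three binary margins chosen for the example (it is not data from the paper), and the function name `marginal` is ours.

```python
from itertools import product

def marginal(p, M):
    """Marginal probability function p_M.
    p maps each cell i (a tuple of levels) to p(i);
    M is a tuple of 0-based positions of the retained variables."""
    p_M = {}
    for cell, prob in p.items():
        i_M = tuple(cell[v] for v in M)
        # add p(j) to the marginal cell i_M for every j with j_M = i_M
        p_M[i_M] = p_M.get(i_M, 0.0) + prob
    return p_M

# Hypothetical joint distribution of three independent binary variables
# (levels 1 and 2), used only to illustrate the formula.
margins = [{1: 0.2, 2: 0.8}, {1: 0.5, 2: 0.5}, {1: 0.3, 2: 0.7}]
p = {}
for cell in product((1, 2), repeat=3):
    p[cell] = margins[0][cell[0]] * margins[1][cell[1]] * margins[2][cell[2]]

p_13 = marginal(p, (0, 2))   # margin M = {1, 3}
print(p_13)
```

Here the table has $t = 8$ cells and the margin $M = \{1, 3\}$ has 4 cells, each obtained by summing the two joint cells that agree with it on $X_1$ and $X_3$.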
A bi-directed graph $G = (V, E)$ induces an independence model for the discrete random vector $X = (X_v, v \in V)$ by defining a Markov property, i.e. a rule for reading off the graph the independence relations. In the following we shall use the shorthand notation $A \perp\!\!\!\perp B \mid C$ to indicate the conditional independence $X_A \perp\!\!\!\perp X_B \mid X_C$, where $A$, $B$ and $C$ are three disjoint subsets of $V$. Similarly $A \perp\!\!\!\perp B$ and $A \perp\!\!\!\perp B \perp\!\!\!\perp C$ will denote the marginal and the complete independence, respectively, of sub-vectors of $X$. There are two Markov properties describing the independence model associated with a bi-directed graph, which we consider in this paper: (a) the global Markov property of Kauermann (1996) and (b) the connected set Markov property by Richardson (2003).

The distribution of the random vector $X$ satisfies the global Markov property for the bi-directed graph $G$ if for any triple of disjoint sets $A$, $B$ and $C$,

$$A \perp\!\!\!\perp B \mid V \setminus (A \cup B \cup C) \quad \text{whenever } A \text{ is separated from } B \text{ by } C \text{ in } G.$$

Instead, the distribution of $X$ is said to satisfy the connected set Markov property if

$$(1) \qquad C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r$$

whenever $C_1, \dots, C_r$ are the connected components of a disconnected set $D \in \mathcal{D}$, for every such set. Richardson (2003) proves that the two properties are equivalent; see also Drton & Richardson (2007). Following these authors we define a discrete bi-directed graph model as follows.

Definition 2.1. A discrete bi-directed graph model associated with a bi-directed graph $G = (V, E)$ is a family of discrete joint probability distributions $p$ for the discrete random vector $X = (X_v, v \in V)$ that satisfies the property (1) for $G$, i.e. such that, for every disconnected set $D$ in the graph,

$$p_D(i_D) = p_{C_1}(i_{C_1}) \times \cdots \times p_{C_r}(i_{C_r}),$$

where $C_1, \dots, C_r$ are the connected components of $D$.
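The factorizations required by Definition 2.1 are indexed by the disconnected sets of the graph and their connected components, which can be listed by brute-force enumeration of subsets. The following sketch does this for the 4-chain of Figure 1(a), taken here to have edges 12, 23 and 34; the function names `components` and `disconnected_sets` are illustrative, and the enumeration is exponential in the number of nodes, so it is meant for small graphs only.

```python
from itertools import combinations

def components(nodes, edges):
    """Connected components of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:   # keep only edges with both endpoints in `nodes`
            adj[u].add(v)
            adj[v].add(u)
    comps, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:                    # depth-first search from `start`
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps

def disconnected_sets(V, edges):
    """Map every disconnected subset D of V to the list of its connected components."""
    out = {}
    for k in range(2, len(V) + 1):
        for D in combinations(sorted(V), k):
            comps = components(D, edges)
            if len(comps) > 1:
                out[frozenset(D)] = comps
    return out

chain4 = [(1, 2), (2, 3), (3, 4)]
dsets = disconnected_sets({1, 2, 3, 4}, chain4)
for D, comps in sorted(dsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(D), [sorted(C) for C in comps])
```

Each entry of `dsets` corresponds to one factorization in Definition 2.1: for instance, the set 134 has components 1 and 34, so that $p_{134} = p_1 \times p_{34}$.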
If the global Markov property holds then, for any pair of non-adjacent nodes, the associated random variables are marginally independent. This implication is called the pairwise Markov property and it is, for discrete variables, a necessary but not sufficient condition for the global Markov property. This is in sharp contrast with the family of Gaussian distributions, where the two properties are equivalent.

Example 1. Here and henceforth we shall use the short forms 34 and 12 to denote the sets $\{3, 4\}$ and $\{1, 2\}$, and so on. The graph of Figure 1(a) is a chain in 4 nodes with disconnected sets $\mathcal{D} = \{13, 14, 24, 134, 124\}$. Thus, $D = 13$ has the components $C_1 = 1$ and $C_2 = 3$, while $D = 134$ can be decomposed into $C_1 = 1$ and $C_2 = 34$. The pairwise Markov property implies $1 \perp\!\!\!\perp 3$, $1 \perp\!\!\!\perp 4$ and $2 \perp\!\!\!\perp 4$, while the connected set Markov property implies further that $1 \perp\!\!\!\perp 34$ and $4 \perp\!\!\!\perp 12$. The global Markov property implies the equivalent set of independence statements $1 \perp\!\!\!\perp 4$, $2 \perp\!\!\!\perp 4 \mid 1$ and $1 \perp\!\!\!\perp 3 \mid 4$.

Note that the complete list of all marginal independencies implied by a bi-directed graph model is derived from the class $\mathcal{D}$ of all disconnected sets of the graph.

Figure 2. Two bi-directed graphs. The independencies implied by the connected set Markov property (or, equivalently, the global Markov property) are: (a) $1 \perp\!\!\!\perp 34$, $3 \perp\!\!\!\perp 15$ and $5 \perp\!\!\!\perp 23$; (b) $1 \perp\!\!\!\perp 3 \perp\!\!\!\perp 5$, $1 \perp\!\!\!\perp 345$, $12 \perp\!\!\!\perp 45$ and $123 \perp\!\!\!\perp 5$.

Example 2. The graph of Figure 2(a) has 7 disconnected sets and thus the associated discrete bi-directed graph model fulfills the independencies

$$1 \perp\!\!\!\perp 3, \quad 1 \perp\!\!\!\perp 4, \quad 2 \perp\!\!\!\perp 5, \quad 3 \perp\!\!\!\perp 5, \quad 1 \perp\!\!\!\perp 34, \quad 5 \perp\!\!\!\perp 23, \quad 3 \perp\!\!\!\perp 15,$$

that reduce to $1 \perp\!\!\!\perp 34$, $3 \perp\!\!\!\perp 15$ and $5 \perp\!\!\!\perp 23$, after eliminating redundancies.
The discrete model associated with the graph of Figure 2(b), with 16 disconnected subsets, satisfies 16 marginal independencies that can be reduced to the four statements

$$1 \perp\!\!\!\perp 3 \perp\!\!\!\perp 5, \quad 1 \perp\!\!\!\perp 345, \quad 12 \perp\!\!\!\perp 45, \quad 123 \perp\!\!\!\perp 5.$$

The stronger condition required by Definition 2.1 implies that in some situations not all marginal independence relations are representable by bi-directed graphs, as the following example shows.

Example 3. Consider the data in Table 2, due to Lienert (1970). The variables are 3 symptoms after LSD intake, recorded to be present (level 1) or absent (level 2): distortions in affective behavior ($X_1$), distortions in thinking ($X_2$), and dimming of consciousness ($X_3$). As Wermuth (1998) points out, the frequencies in the three marginal tables show that the three symptom pairs are close to independence, but at the same time the variables are not mutually independent, as witnessed by the strong three-factor interaction due to the quite distinct conditional odds ratios between $X_1$ and $X_2$ at the two levels of $X_3$. Thus, in this case, despite three marginal independencies, a discrete bi-directed graph model can represent just one of them, and thus must include at least two edges.

Pearl & Wermuth (1994) studied the Markov equivalence between bi-directed graph models (actually the covariance graphs) and directed acyclic graph models, i.e. when the two models imply exactly the same conditional independence statements under their respective global Markov property (for the global Markov property see Lauritzen, 1996). They showed that each bi-directed graph is always Markov equivalent to a directed acyclic graph with additional synthetic latent nodes, after marginalizing over the additional nodes, as exemplified in Figure 1(b, c).
Moreover they also give a Markov equivalence result, proving that a bi-directed graph is equivalent to a directed acyclic graph with the same set of nodes if and only if it contains no 4-chain. Thus, there is no directed acyclic graph which is Markov equivalent to the bi-directed graphs of Figures 1(a), 2(a) or 2(b).

3. Marginal log-linear parameterizations

Discrete bi-directed graph models may be defined as marginal log-linear models, using complete hierarchical parameterizations as defined by Bergsma & Rudas (2002). In this section we review the basic concepts and we discuss the definitions of the parameters involved.

Let $p(i) > 0$ be a strictly positive probability distribution of a discrete random vector $X = (X_v, v \in V)$ and let $p_M(i_M)$ be any marginal probability distribution of a sub-vector $X_M$, $M \subseteq V$. The marginal probability distribution admits a log-linear expansion

$$\log p_M(i_M) = \sum_{L \subseteq M} \lambda^M_L(i_L),$$

where $\lambda^M_L(i_L)$ is a function defining the log-linear parameters indexed by the subset $L$ of $M$. The functions $\lambda^M_L(i_L)$ are defined by

$$\lambda^M_L(i_L) = \sum_{A \subseteq L} (-1)^{|L \setminus A|} \log p_M(i_A, i^*_{M \setminus A}),$$

where $i^* = (1, \dots, 1)$ denotes a baseline cell of the table; see Whittaker (1990) and Lauritzen (1996). The function $\lambda^M_L(i_L)$ is zero whenever at least one index in $i_L$ is equal to 1. Therefore, $\lambda^M_L(i_L)$ defines only $\prod_{v \in L} (b_v - 1)$ parameters, where $b_v$ is the number of categories of variable $X_v$. Due to the constraint on the probabilities, which must sum to one, the parameter $\lambda^M_\emptyset = \log p_M(i^*_M)$ is a function of the others, and can thus be eliminated.

If $\lambda^M_L$ is the vector containing the parameters $\lambda^M_L(i_L)$, then it can be obtained explicitly using Kronecker products as follows.
For any subset $L$ of $M$, let $C_{v,L}$ be the matrix

$$C_{v,L} = \begin{cases} \left( -\mathbf{1}_{b_v - 1} \;\; I_{b_v - 1} \right) & \text{if } v \in L, \\ \left( 1 \;\; \mathbf{0}^{\top}_{b_v - 1} \right) & \text{if } v \notin L, \end{cases}$$

and let $\pi^M$ be the $t_M \times 1$ column vector of the marginal cell probabilities in lexicographic order. Then, the vector of the log-linear parameters $\lambda^M_L(i_L)$ is

$$(2) \qquad \lambda^M_L = C^M_L \log \pi^M, \quad \text{where } C^M_L = \bigotimes_{v \in M} C_{v,L}.$$

For a discussion of the technique of building all log-linear parameters based on Kronecker products see Wermuth & Cox (1992). The coding used in this paper corresponds to their indicator coding, and gives the parameters used for example by the program glim.

Table 2. Data by Lienert (1970) concerning symptoms after LSD intake. OR is the conditional odds ratio between $X_1$ and $X_2$ given $X_3$. The frequencies show evidence of pairwise independence, but mutual dependence.

               X3 = 1            X3 = 2
    X1     X2 = 1  X2 = 2    X2 = 1  X2 = 2
    1        21       5         4      16
    2         2      13        11       1
    OR         27.3              0.023

A marginal log-linear parameterization of the probability distribution $p(i)$ is obtained by combining the log-linear parameters $\lambda^M_L$ for many different marginal probability distributions. The general theory is developed in Bergsma & Rudas (2002) and is summarized below.

Definition 3.1. Let $\mathcal{M} = (M_1, \dots, M_s)$ be an ordered sequence of margins of interest and, for each $M_j$, $j = 1, \dots, s$, let $\mathcal{L}_j$ be the collection of sets $L$ for which $\lambda^{M_j}_L$ is defined with equation (2). Then, $(\lambda^{M_j}_L)$ is said to be a hierarchical and complete marginal log-linear parameterization for $p(i)$ if
(i) the sequence $M_1, \dots, M_s$ is non-decreasing;
(ii) the last margin is $M_s = V$;
(iii) the sets defining the log-linear parameters in each margin are
$$\mathcal{L}_1 = \mathcal{P}_0(M_1), \quad \text{and} \quad \mathcal{L}_j = \mathcal{P}_0(M_j) \setminus \bigcup_{h=1}^{j-1} \mathcal{L}_h, \quad \text{for } j > 1,$$
where $\mathcal{P}_0(M_j)$ denotes the collection of all non-empty subsets of $M_j$.
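The allocation rule (iii) of Definition 3.1 can be sketched in code as follows. This is a minimal illustration, not part of the original development: the function names `nonempty_subsets` and `allocate` are ours, and the check of condition (i) reads "non-decreasing" as "no margin is contained in a preceding one", which is an assumption about the intended meaning.

```python
from itertools import combinations

def nonempty_subsets(M):
    """The collection P_0(M) of all non-empty subsets of M."""
    M = sorted(M)
    return [frozenset(c) for k in range(1, len(M) + 1) for c in combinations(M, k)]

def allocate(margins):
    """For an ordered sequence of margins (M_1, ..., M_s), return the pairs (M_j, L_j)
    where L_j contains the subsets of M_j not already assigned to an earlier margin."""
    margins = [frozenset(M) for M in margins]
    for j, Mj in enumerate(margins):
        # assumed reading of condition (i): M_j must not be a subset of an earlier margin
        if any(Mj <= Mh for Mh in margins[:j]):
            raise ValueError("the sequence of margins is not non-decreasing")
    assigned, result = set(), []
    for Mj in margins:
        Lj = [L for L in nonempty_subsets(Mj) if L not in assigned]
        assigned.update(Lj)
        result.append((Mj, Lj))
    return result

V = {1, 2, 3}
# A single margin M = (V): every effect is defined within the joint table.
single = allocate([V])
# All non-empty margins ordered by size: each margin carries only its own highest-order term.
all_margins = allocate(nonempty_subsets(V))
# A mixed sequence: the terms 1, 2, 12 are defined in margin 12, the remaining ones in V.
mixed = allocate([{1, 2}, V])
for Mj, Lj in mixed:
    print(sorted(Mj), [sorted(L) for L in Lj])
```

When the last margin is $V$, as required by condition (ii), every non-empty subset of $V$ is assigned to exactly one margin, which is the completeness property of the parameterization.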
The parameterization is called hierarchical because it is generated by a non-decreasing sequence $\mathcal{M}$, and complete because it defines all possible log-linear parameter terms, each within one and only one marginal table. Notice that the parameterization is associated uniquely with a particular sequence $\mathcal{M}$ of margins. Thus, a different (still non-decreasing) ordering of the sequence induces a different parameterization; see the examples in Section 4.2.

The above construction defines a map from the simplex $\Delta_V$ of the strictly positive distributions $p(i)$ of the discrete random vector $X$ into the set $\Lambda$ of possible values for the whole vector of the marginal log-linear parameters $\lambda = (\lambda^{M_j}_L)$, with $j = 1, \dots, s$ and $L \in \mathcal{L}_j$. The following general result shows that a complete hierarchical marginal log-linear model defines a proper parameterization.

Proposition 1. (Bergsma & Rudas, 2002) The map $\Delta_V \to \Lambda \subseteq \mathbb{R}^{t-1}$ defined by a complete and hierarchical marginal log-linear parameterization is a diffeomorphism.

The parameters $\lambda$ can be written in matrix form

$$\lambda = C \log(T \pi),$$

where $\pi$ is the $t \times 1$ vector of all the cell probabilities in lexicographical order, $T$ is an $m \times t$ marginalization matrix such that

$$T \pi = \begin{pmatrix} \pi^{M_1} \\ \vdots \\ \pi^{M_s} \end{pmatrix}$$

and $C = \operatorname{diag}(C^M_L)$ is a $(t - 1) \times m$ block diagonal matrix, with $m = \sum_{j=1}^{s} |\mathcal{I}_{M_j}|$. For a discussion of algorithms for computing the matrices $C$ and $T$ see Bartolucci et al. (2007), who generalize the approach by Bergsma & Rudas (2002) to logits and higher order effects of global and continuation type, suitable for ordinal data.

The log-linear parameterization and the multivariate logistic transformation represent two special cases of marginal log-linear models. The standard log-linear parameters are generated by $\mathcal{M} = \{V\}$.
They will be denoted by $\theta_L = \lambda^V_L$ for $L \in \mathcal{P}_0(V)$, and the whole vector of parameters by $\theta$. The parameter space coincides with $\mathbb{R}^{t-1}$ and the map from $\pi$ to $\theta$ admits an inverse in closed form, provided that $\pi > 0$.

The multivariate logistic parameters (Glonek & McCullagh, 1995) are generated by $\mathcal{M} = \mathcal{P}_0(V)$, in any non-decreasing order. They will be denoted by $\eta_M = \lambda^M_M$, with $\eta$ representing the whole vector. Thus the parameters $\eta_M$ correspond to the highest order log-linear parameters within each marginal table $\mathcal{I}_M$, for each nonempty set $M \subseteq V$. The parameter space is in general a strict subset of $\mathbb{R}^{t-1}$, except when the number of variables is $d = 2$. In general there is no closed form inverse transforming $\eta$ back into $\pi$. The inverse operation, however, may be accomplished using for example the iterative proportional fitting algorithm.

Thus, while the log-linear parameters $\theta$ are always variation independent, and for any $\theta$ in $\mathbb{R}^{t-1}$ there is a unique associated joint probability distribution $\pi$, the multivariate logistic parameters are never variation independent for $d > 2$. Thus there are vectors $\eta$ in $\mathbb{R}^{t-1}$ that are not compatible with any joint probability distribution $\pi$. The latter assertion is also implied by a further result by Bergsma & Rudas (2002), which proves that the hierarchical and complete marginal log-linear parameterization generated by a sequence $\mathcal{M}$ is variation independent if and only if $\mathcal{M}$ satisfies a property called ordered decomposability. A sequence of arbitrary subsets of $V$ is said to be ordered decomposable if it has at most two elements or if there is an ordering $M_1, \dots, M_s$ of its elements such that $M_i \not\subseteq M_j$ if $i > j$ and, for $k = 3, \dots, s$, the maximal elements (i.e. those not contained in any other sets) of $\{M_1, \dots, M_k\}$ form a decomposable set.
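For binary variables the multivariate logistic parameters $\eta_M = \lambda^M_M$ discussed above can be computed directly from the alternating-sum definition of the log-linear parameters applied to each margin. The sketch below does this for the data of Table 2 by Lienert (1970); it assumes binary variables with level 1 as baseline, and the function name `eta` is ours.

```python
import math
from itertools import combinations

# Counts of Table 2, indexed by (x1, x2, x3), with levels 1 and 2.
counts = {
    (1, 1, 1): 21, (1, 2, 1): 5,  (2, 1, 1): 2,  (2, 2, 1): 13,
    (1, 1, 2): 4,  (1, 2, 2): 16, (2, 1, 2): 11, (2, 2, 2): 1,
}

def eta(counts, M):
    """Multivariate logistic parameter eta_M for binary variables.
    M is a tuple of 0-based positions; level 1 is the baseline level."""
    # marginal table over the variables in M
    marg = {}
    for cell, c in counts.items():
        key = tuple(cell[v] for v in M)
        marg[key] = marg.get(key, 0) + c
    total = sum(marg.values())
    # alternating sum over subsets A of M: variables in A at level 2, the others at baseline
    value = 0.0
    for k in range(len(M) + 1):
        for A in combinations(range(len(M)), k):
            cell = tuple(2 if pos in A else 1 for pos in range(len(M)))
            value += (-1) ** (len(M) - k) * math.log(marg[cell] / total)
    return value

eta_12 = eta(counts, (0, 1))       # log odds ratio in the margin X1, X2
eta_13 = eta(counts, (0, 2))
eta_23 = eta(counts, (1, 2))
eta_123 = eta(counts, (0, 1, 2))   # three-factor interaction in the joint table
print(round(eta_12, 3), round(eta_13, 3), round(eta_23, 3), round(eta_123, 3))
```

In this example the two-factor parameters are the marginal log odds ratios, which are small in absolute value, while the three-factor parameter equals the difference between the logarithms of the two conditional odds ratios reported in Table 2 and is large in absolute value.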
For further details and examples about ordered decomposability see Rudas & Bergsma (2004). More properties of the two parameterizations $\theta$ and $\eta$, connected to graphical models, will be described in Section 4.

4. Parameterizations of discrete bi-directed graph models

We now suggest two different marginal log-linear parameterizations of discrete bi-directed graph models, and we compare their advantages and shortcomings.

4.1. Multivariate logistic parameterization.

It is known that the complete independence of two sub-vectors $X_A$, $X_B$ of the random vector $X$ is equivalent to a set of zero restrictions on multivariate logistic parameters.

Lemma 1. (Kauermann (1997), Lemma 1). If $\{A, B\}$ is a partition of $V$ and $\eta = (\eta_M)$, $M \in \mathcal{P}_0(V)$, is the multivariate logistic parameterization, then

$$A \perp\!\!\!\perp B \iff \eta_M = 0 \text{ for all } M \in \mathcal{Q},$$

where $\mathcal{Q} = \{M \subseteq A \cup B : M \cap A \neq \emptyset, \; M \cap B \neq \emptyset\}$.

We generalize this result to complete independence of more than two random vectors. Given a partition $\{C_1, \dots, C_r\}$ of a set $D \subseteq V$, we define

$$\mathcal{Q}(C_1, \dots, C_r) = \mathcal{P}\Big(\bigcup_{i=1}^{r} C_i\Big) \setminus \bigcup_{i=1}^{r} \mathcal{P}(C_i).$$

This is the set of all subsets of $D$ not completely contained in a single class, i.e. containing elements coming from at least two classes of the partition. With this notation, the set $\mathcal{Q}$ of Lemma 1 may be denoted by $\mathcal{Q}(A, B)$. Then we have the following result.

Proposition 2. Let $X = (X_v)$, $v \in V$, be the discrete random vector with multivariate logistic parameterization $\eta = (\eta_M)$, $M \in \mathcal{P}_0(V)$. If $D \subseteq V$ is partitioned into the classes $\{C_1, \dots, C_r\}$ then

$$C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r \iff \eta_M = 0 \text{ for all } M \in \mathcal{Q}(C_1, \dots, C_r).$$

Proof. First, we use the shorthand notation $\mathcal{Q}$ to denote the set $\mathcal{Q}(C_1, \dots, C_r)$ and $\mathcal{Q}_i$ to denote the set $\mathcal{Q}(C_i, C_{-i})$, $i = 1, \dots, r$, where $C_{-i} = D \setminus C_i$.
We first note that $\mathcal{Q} = \bigcup_{i=1}^{r} \mathcal{Q}_i$. In fact, since $\mathcal{Q}_i \subseteq \mathcal{Q}$ for each $i$, then $\bigcup_{i=1}^{r} \mathcal{Q}_i \subseteq \mathcal{Q}$. Conversely, for any $M \in \mathcal{Q}$ there is always a class $C_i$ such that $M \cap C_i \neq \emptyset$ and, since $M$ is not contained in $C_i$, also $M \cap C_{-i} \neq \emptyset$; hence, by definition, $M \in \mathcal{Q}_i$. Hence, for every $M \in \mathcal{Q}$, $M \in \bigcup_{i=1}^{r} \mathcal{Q}_i$ and thus $\mathcal{Q} \subseteq \bigcup_{i=1}^{r} \mathcal{Q}_i$. Then, the complete independence $C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r$ is equivalent to $C_i \perp\!\!\!\perp C_{-i}$ for all $i = 1, \dots, r$. By Lemma 1, applied to the sub-vector $X_D$, each independence $C_i \perp\!\!\!\perp C_{-i}$ is equivalent to the restriction $\eta_M = 0$ for $M \in \mathcal{Q}_i$, and the parameters $\eta_M$ are identical to the corresponding multivariate logistic parameters for the full random vector $X_V$. Thus, the complete independence $C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r$ is equivalent to $\eta_M = 0$ for $M \in \mathcal{Q}_i$, $i = 1, \dots, r$, i.e. for $M \in \bigcup_{i=1}^{r} \mathcal{Q}_i = \mathcal{Q}$. $\square$

Proposition 2 implies that a statement of complete independence $C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r$ is equivalent to a set of zero constraints on the multivariate logistic parameters. The following result explains how the constraints must be chosen in order to satisfy all the independencies required by Definition 2.1 of a bi-directed graph model.

Proposition 3. Given a bi-directed graph $G = (V, E)$, the discrete bi-directed graph model associated with $G$ is defined by the set of strictly positive discrete probability distributions with multivariate logistic parameters $\eta = (\eta_M)$, $M \in \mathcal{P}_0(V)$, such that

$$\eta_M = 0 \text{ for every } M \in \mathcal{D},$$

where $\mathcal{D}$ is the set of all disconnected sets of nodes in the graph $G$.

Proof. Given a set $D \in \mathcal{D}$, denote its connected components by $\{C_1, \dots, C_r\}$ and by $\mathcal{Q}_D$ the set $\mathcal{Q}(C_1, \dots, C_r)$. First, we prove that $\mathcal{D} = \bigcup_{D \in \mathcal{D}} \mathcal{Q}_D$. In fact, for any $D \in \mathcal{D}$, $\mathcal{Q}_D \subseteq \mathcal{D}$ because it is a class of disconnected subsets of $D$. Thus, $\bigcup_{D \in \mathcal{D}} \mathcal{Q}_D \subseteq \mathcal{D}$. Conversely, if $D \in \mathcal{D}$, then $D \in \mathcal{Q}_D$ and thus $\mathcal{D} \subseteq \bigcup_{D \in \mathcal{D}} \mathcal{Q}_D$.
By Definition 2.1, the independence $C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r$ is implied for each disconnected set $D$ with connected components $C_1, \dots, C_r$. By Proposition 2, this is equivalent to the zero restrictions on the multivariate logistic parameters $\eta_M = 0$, for all $M \in \mathcal{Q}_D$, $D \in \mathcal{D}$, i.e. for all $M \in \bigcup_{D \in \mathcal{D}} \mathcal{Q}_D = \mathcal{D}$. $\square$

A consequence of Proposition 3 is that all possible discrete bi-directed graphical models can be identified within the multivariate logistic parameterization under the zero constraints associated with the disconnected sets.

Table 3. Comparison between two parameterizations of the discrete chordless 4-chain model of Figure 1(a): ($\eta$) with bi-directed edges; ($\theta$) with undirected edges.

    Terms  1    2    3    4    12    13  14  23    24  34    123    124  134  234    1234
    η      η_1  η_2  η_3  η_4  η_12  0   0   η_23  0   η_34  η_123  0    0    η_234  η_1234
    θ      θ_1  θ_2  θ_3  θ_4  θ_12  0   0   θ_23  0   θ_34  0      0    0    0      0

Example 4. The discrete model associated with the chordless 4-chain of Figure 1(a) is defined by the multivariate logistic parameters shown in Table 3, first row. There are 5 zero constraints on the highest-order log-linear parameters of the tables 13, 14, 24, 124 and 134. There are three nonzero two-factor marginal log-linear parameters $\eta_{ij}$ associated with the edges of the graph, which may be interpreted as sets of marginal association coefficients between the involved variables, based on the chosen contrasts. Consider now the reduced model resulting after dropping the edge $2 \leftrightarrow 3$ and implying the independence $12 \perp\!\!\!\perp 34$. This model can be obtained, within the same parameterization, by the additional zero constraints on $\eta_{23}$, $\eta_{123}$, $\eta_{234}$ and $\eta_{1234}$.

While the parameters are in general not variation independent, they satisfy the upward compatibility property, because they have the same meaning across different marginal distributions.
Using this property, we can prove the following result concerning the effect of marginalization onto a subset $A$ of the variables. Let $G_A = (A, E_A)$ be the subgraph induced by $A$, and let $\mathcal{D}_A$ be the set of all disconnected sets of $G_A$.

Proposition 4. If a discrete probability distribution $p(i)$ for $i \in \mathcal{I}_V$ satisfies a bi-directed graph model defined by the graph $G = (V, E)$, then the marginal distribution $p_A(i_A)$ over $A \subseteq V$ satisfies the bi-directed graph model defined by $G_A = (A, E_A)$, and its multivariate logistic parameters are $\eta = (\eta_M)$, $M \in \mathcal{P}_0(A)$, with constraints $\eta_M = 0$ for $M \in \mathcal{D}_A$.

Proof. After marginalization onto $A$, the multivariate logistic parameters associated with $p_A(i_A)$, by the property of upward compatibility, are $(\eta_M, M \in \mathcal{P}_0(A))$. Some of these parameters are zero by the constraints implied by the original bi-directed graph model, i.e. $\eta_M = 0$ for $M \in \mathcal{D} \cap \mathcal{P}_0(A)$. The result is proved by showing that $\mathcal{D} \cap \mathcal{P}_0(A) = \mathcal{D}_A$. First, we note that if $D \subseteq A \subseteq V$, then the graph $G_D = (D, E_D)$ with edges $E_D = (D \times D) \cap E = (D \times D) \cap E_A$ is a subgraph of both $G_A$ and $G$. Thus, if $D \subseteq A$ and $D \in \mathcal{D}$, then the induced subgraph $G_D$ is disconnected and, being also a subgraph of $G_A$, $D$ is also a disconnected set of $G_A$. Thus $\mathcal{D} \cap \mathcal{P}_0(A) \subseteq \mathcal{D}_A$. Conversely, if $D$ is a disconnected set of $G_A$, then the subgraph $G_D$ is disconnected and, being a subgraph of $G$, $D$ is also a disconnected set of $G$. Thus $\mathcal{D}_A \subseteq \mathcal{D} \cap \mathcal{P}_0(A)$, and the result follows. $\square$

Discrete bi-directed graph models in the multivariate logistic parameterization can be compared with discrete log-linear graphical models represented by undirected graphs with the same skeleton (i.e. with the same set $E$). To facilitate the comparison we state the following well-known result, following from the Hammersley and Clifford theorem (see Lauritzen, 1996, p.
36), which is the undirected graph model counterpart of Proposition 3.

Proposition 5. Given an undirected graph $G = (V, E)$, a discrete graphical log-linear model associated with $G$ is defined by the set of strictly positive discrete probability distributions with log-linear parameters $\theta = (\theta_L, L \in \mathcal{P}_0(V))$, such that $\theta_L = 0$ for every $L \in \mathcal{N}$, where $\mathcal{N}$ is the set of all incomplete subsets of nodes in the graph $G$.

The set $\mathcal{D}$ of all disconnected sets of a graph $G$ is included in the set $\mathcal{N}$ of the incomplete sets. Therefore the number of zero restrictions of the undirected graph model is never smaller, and is typically larger, than the number of zero restrictions of the bi-directed graph model with the same skeleton (see Drton & Richardson, 2007).

Example 5. A discrete undirected graph model for the 4-chain implies the independencies $12 \perp\!\!\!\perp 4 \mid 3$ and $1 \perp\!\!\!\perp 34 \mid 2$ and is defined by zero constraints on 8 log-linear parameters $\theta_L$, shown in Table 3, second row. Also, Proposition 5 implies that in the discrete undirected graph model the general hierarchy principle holds, i.e. if a particular log-linear term is zero then all higher terms containing the same set of subscripts are also set to zero. On the contrary, by Proposition 3, in the multivariate logistic parameterization of the bi-directed graph model the hierarchy principle is violated, because a superset of a disconnected set may be connected. Thus, for instance, in the example shown in Table 3 there are zero pairwise associations, like $\eta_{13} = 0$, but nonzero higher order log-linear parameters like $\eta_{123} \neq 0$ and $\eta_{1234} \neq 0$.

4.2. The disconnected sets parameterization.
We now discuss another marginal log-linear parameterization that can represent the independence constraints implied by any discrete bi-directed graph model, but involving only those marginal tables which are needed. This parameterization defines the log-linear parameters within the margins associated with the disconnected sets of the graph defining the model. Specifically, given a discrete graph model with a graph $G$, we arbitrarily order the disconnected sets of the graph to yield a non-decreasing sequence $(D_1, \dots, D_s)$ such that $D_k \not\supseteq D_{k+1}$ for $k = 1, \dots, s-1$. Then, the disconnected set parameterization of the discrete bi-directed graph model associated with $G$ is the hierarchical and complete marginal log-linear parameterization $\lambda = (\lambda^{M_j}_L)$ generated, following Definition 3.1, by the sequence of margins

$$
\mathcal{M}_G =
\begin{cases}
(D_1, \dots, D_s) & \text{if } D_s = V, \\
(D_1, \dots, D_s, V) & \text{otherwise.}
\end{cases}
\tag{3}
$$

This parameterization contains by definition the log-linear parameters $\lambda^D_D = \eta_D$ for every disconnected set $D$ and thus can define the independence model by the same constraints of Proposition 3.

Proposition 6. Given a bi-directed graph $G = (V, E)$, the discrete bi-directed graph model associated with $G$ is defined by the set of strictly positive discrete probability distributions with a disconnected set parameterization $(\lambda^{M_j}_L)$, such that $\lambda^{M_j}_{M_j} = 0$ for every $M_j \in \mathcal{D}$, where $\mathcal{D}$ is the class of all disconnected sets for $G$. Moreover, the constraints are independent of the ordering chosen to define $\mathcal{M}_G$.

Proof. The disconnected set parameterization defined by the sequence (3) contains the parameters $\lambda^D_L$, with $D \in \mathcal{D}$. By Definition 3.1, the class $\mathcal{L}_j$ of interactions assigned to the margin $D_j$, $j = 1, \dots, s$, always contains the set $D_j$ itself. This happens whatever ordering is used to define $\mathcal{M}_G$. Thus the parameterization always includes $\lambda^D_D = \eta_D$ for every $D \in \mathcal{D}$, it is possible to impose the constraints $\eta_D = 0$ for every $D \in \mathcal{D}$, and the result follows by Proposition 3. □

Table 4. Comparison of three parameterizations for the bi-directed graph model $G$ of Figure 1(a). One-factor log-linear parameters are omitted. The columns of parameters to be constrained to zero are marked with an asterisk (boldfaced labels in the original).

Terms   12        13*       14*      23         24*       34        123         124*       134*       234         1234
η       η_12      η_13      η_14     η_23       η_24      η_34      η_123       η_124      η_134      η_234       η_1234
M_G     λ^124_12  λ^13_13   λ^14_14  λ^1234_23  λ^24_24   λ^134_34  λ^1234_123  λ^124_124  λ^134_134  λ^1234_234  λ^1234_1234
M'_G    λ^124_12  λ^134_13  λ^14_14  λ^1234_23  λ^124_24  λ^134_34  λ^1234_123  λ^124_124  λ^134_134  λ^1234_234  λ^1234_1234

While the constrained parameters defining the bi-directed graph model are actually the same as in the multivariate logistic parameterization, the other unconstrained log-linear parameters are defined in larger marginal tables, and thus have a different interpretation. An important difference is that the disconnected set parameterization is tied to the specific graph $G$ defining the model. This implies that it is not possible to define every bi-directed graph model within the same disconnected set parameterization. A different graph $G$ implies a different sequence $\mathcal{M}_G$ of disconnected sets and thus a different list of log-linear parameters.

Example 6. For the chordless 4-chain graph of Figure 1(a), there are several possible orderings of the 5 disconnected sets $\mathcal{D} = \{13, 14, 24, 134, 124\}$. The discrete bi-directed graph model is defined by choosing for example $\mathcal{M}_G = (13, 14, 24, 134, 124, 1234)$, and by constraining the marginal log-linear parameters $\lambda^D_D = 0$ for $D \in \mathcal{D}$.
The unconstrained parameters differ from the multivariate logistic ones. For example, the two-factor log-linear parameter between $X_1$ and $X_2$, $\lambda^{124}_{12}$, is defined within the marginal table 124 instead of the marginal table 12. A detailed comparison between the parameters is reported in the first two rows of Table 4.

The previous example shows that we can collect the log-linear parameters into a reduced number of marginal tables. An alternative selection of marginal tables could be chosen in order to fulfil the conditional independencies implied by the global Markov property. We will describe the method in the special case of the chordless 4-chain graph. It is conjectured that a general variation independent parameterization does not exist for all bi-directed graphs, but the definition of a sub-class admitting such a parameterization is still an open problem.

Example 7. In Example 1 we stated that, for the bi-directed 4-chain graph of Figure 1(a), the global Markov property implies the independencies $1 \perp\!\!\!\perp 4$, $2 \perp\!\!\!\perp 4 \mid 1$ and $1 \perp\!\!\!\perp 3 \mid 4$. Thus, the relevant margins can be collected in the sequence $\mathcal{M}'_G = (14, 134, 124, 1234)$, where the first three allow the definition of the conditional independencies and the last one serves as completion of the parameterization. The complete hierarchical parameterization generated by $\mathcal{M}'_G$ is slightly different from that generated by $\mathcal{M}_G$ (see Table 4, third row), but with the 5 zero constraints on the higher level log-linear parameters within each margin we obtain the required independencies

$$
\begin{aligned}
1 \perp\!\!\!\perp 4 &\iff \lambda^{14}_{14} = 0, \\
2 \perp\!\!\!\perp 4 \mid 1 &\iff \lambda^{124}_{24} = 0 \ \text{and}\ \lambda^{124}_{124} = 0, \\
1 \perp\!\!\!\perp 3 \mid 4 &\iff \lambda^{134}_{13} = 0 \ \text{and}\ \lambda^{134}_{134} = 0.
\end{aligned}
$$
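The way a sequence of margins generates the rows of Table 4 can be sketched in a few lines of Python. This is an illustration only: it assumes that Definition 3.1 (not reproduced in this section) assigns each interaction $L$ to the first margin of the sequence that contains it, as in the construction of Bergsma & Rudas (2002). The function name `assign_margins` is hypothetical.

```python
from itertools import combinations


def assign_margins(margins):
    """Map every nonempty subset L of the last margin to the first
    margin in the sequence that contains L (assumed reading of
    Definition 3.1)."""
    full = margins[-1]
    table = {}
    for size in range(1, len(full) + 1):
        for L in combinations(full, size):
            for M in margins:
                if set(L) <= set(M):
                    table[L] = M
                    break
    return table


# The two sequences of Examples 6 and 7 for the chordless 4-chain.
M_G = [(1, 3), (1, 4), (2, 4), (1, 3, 4), (1, 2, 4), (1, 2, 3, 4)]
M_G_prime = [(1, 4), (1, 3, 4), (1, 2, 4), (1, 2, 3, 4)]

for name, seq in (("M_G", M_G), ("M'_G", M_G_prime)):
    table = assign_margins(seq)
    pairs = {L: M for L, M in table.items() if len(L) == 2}
    print(name, pairs)
```

Under this assumption the interaction 12 falls in margin 124 for both sequences, while 13 falls in margin 13 under $\mathcal{M}_G$ and in margin 134 under $\mathcal{M}'_G$, in agreement with the second and third rows of Table 4.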
Note that these independencies can also be represented by a chain graph with two components, $\{1, 4\}$ and $\{2, 3\}$, under the alternative Markov property (see Andersson et al., 2001). The associated discrete model is interpreted as a system of seemingly unrelated regressions, with two joint responses $X_2$ and $X_3$. In this context the associations of interest are the effect parameters between every response and each explanatory variable conditional on the remaining explanatory variable, i.e. $\lambda^{124}_{12}$, $\lambda^{124}_{24}$, $\lambda^{134}_{13}$ and $\lambda^{134}_{34}$, and the marginal association parameter between the explanatory variables, $\lambda^{14}_{14}$. By relaxing the constraint $\lambda^{14}_{14} = 0$ we obtain a discrete chain graph model with two complete chain components, under the alternative Markov property.

In the comparison between different parameterizations the property of variation independence may also be relevant. Following Bergsma & Rudas (2002), given a discrete bi-directed graph model, there is a variation independent parameterization if there is at least one sequence $\mathcal{M}_G$ which is ordered decomposable. This property is quite relevant because the lack of variation independence may make the separate interpretation of the parameters misleading.

Example 8. In the previous example both the parameterizations based on $\mathcal{M}_G$ and $\mathcal{M}'_G$ are variation independent (unlike the multivariate logistic parameterization) because the sequences of margins are both ordered decomposable. Consider instead the bi-directed graph in Figure 2(a). Two possible disconnected set parameterizations of the discrete model may be based for example on

$$
\mathcal{M}_G = (13, 14, 25, 35, 134, 135, 235, 12345), \qquad
\mathcal{M}'_G = (13, 35, 135, 14, 25, 134, 235, 12345),
$$

with the constraints $\lambda^D_D = 0$ for any disconnected set $D$.
In this case we can verify that only the sequence $\mathcal{M}'_G$ is ordered decomposable and thus implies variation independent parameters.

5. Maximum likelihood estimation of discrete bi-directed graph models

We now study the maximum likelihood estimation of the discrete bi-directed graph models under any of the parameterizations previously discussed. Assuming a multinomial sampling scheme with sample size $N$, each individual falls in a cell $i$ of the given contingency table $\mathcal{I}_V$ with probability $p(i) > 0$. Let $n(i)$ be the cell count and let $n = (n(i), i \in \mathcal{I}_V)$ be a $t \times 1$ vector. Thus, $n$ has a multinomial distribution with parameters $N$ and $\pi$, the vector of the cell probabilities $p(i)$. If $\mu = N\pi > 0$ is the expected value of $n$ and $\omega = \log \mu$, then for any appropriate marginal log-linear parameterization $\lambda$ we have

$$
\lambda = C \log(T\pi) = C \log(T \exp(\omega)),
$$

because the contrasts of marginal probabilities are equal to the contrasts of expected counts. Given a discrete bi-directed graph model defined by the graph $G = (V, E)$, if $\lambda$ is defined either by the multivariate logistic parameterization or by the disconnected set parameterization, we can always split $\lambda$ in two components $\lambda_{\mathcal{D}}$ and $\lambda_{\mathcal{C}}$, indexed by the disconnected sets $\mathcal{D}$ and by the connected sets $\mathcal{C}$ of the graph, respectively. If $C_{\mathcal{D}}$ is the sub-matrix of the contrast matrix $C$ obtained by selecting the rows associated with the disconnected sets of the graph $G$, then

$$
\lambda_{\mathcal{D}} = C_{\mathcal{D}} \log(T \exp(\omega)) = h(\omega),
$$

where $C_{\mathcal{D}}$ has dimensions $q \times v$ with $q = \sum_{D \in \mathcal{D}} \prod_{v \in D} (b_v - 1)$. Thus, the kernel of the log-likelihood function of the discrete bi-directed graph model is defined by

$$
l(\omega; n) = n^T \omega - 1^T \exp(\omega), \qquad \omega \in \Omega_{BG}, \tag{4}
$$

with $\Omega_{BG} = \{\omega \in \mathbb{R}^t : h(\omega) = 0,\ 1^T \exp(\omega) = N\}$. Note that (4) defines a curved exponential family model, as the set $\Omega_{BG}$ is a smooth manifold in the space $\mathbb{R}^t$ of the canonical parameters $\omega$.
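As an illustration of the map $\lambda = C \log(T \exp(\omega))$, the following Python sketch builds $T$ and $C$ by hand for a 2 × 2 table and evaluates the three multivariate logistic parameters. The cell ordering and the sign convention of the contrasts (logits of level 2 against level 1, and the log odds-ratio for the pair) are assumptions made for this sketch; the contrasts (2) of the paper are not reproduced in this section.

```python
import math


def matvec(A, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Cells ordered (1,1), (1,2), (2,1), (2,2).
# T stacks the margins {1}, {2} and {1,2} (8 marginal cells in total).
T = [
    [1, 1, 0, 0],  # X1 = 1
    [0, 0, 1, 1],  # X1 = 2
    [1, 0, 1, 0],  # X2 = 1
    [0, 1, 0, 1],  # X2 = 2
    [1, 0, 0, 0],  # joint table, cell (1,1)
    [0, 1, 0, 0],  # cell (1,2)
    [0, 0, 1, 0],  # cell (2,1)
    [0, 0, 0, 1],  # cell (2,2)
]
# One contrast per parameter (assumed sign convention).
C = [
    [-1, 1, 0, 0, 0, 0, 0, 0],    # eta_1: marginal logit of X1
    [0, 0, -1, 1, 0, 0, 0, 0],    # eta_2: marginal logit of X2
    [0, 0, 0, 0, 1, -1, -1, 1],   # eta_12: log odds-ratio
]


def marginal_loglinear(omega):
    """lambda = C log(T exp(omega))."""
    mu = [math.exp(w) for w in omega]
    return matvec(C, [math.log(m) for m in matvec(T, mu)])


# Expected counts with independent rows and columns: eta_12 vanishes.
mu_indep = [12.0, 18.0, 28.0, 42.0]
lam = marginal_loglinear([math.log(m) for m in mu_indep])
print(lam)

# Rescaling mu (adding a constant to omega) leaves lambda unchanged,
# which is why contrasts of counts equal contrasts of probabilities.
lam_scaled = marginal_loglinear([math.log(m / 100.0) for m in mu_indep])
```

For the graph on two nodes with no edge, the only disconnected set is 12, so $h(\omega)$ is the third component of the vector returned by `marginal_loglinear`.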
Maximum likelihood estimation is a constrained optimization problem and the maximum likelihood estimate is a saddle point of the Lagrangian log-likelihood

$$
\ell(\omega, \tau) = n^T \omega - 1^T \exp(\omega) + \tau^T h(\omega),
$$

where $\tau$ is a $q \times 1$ vector of unknown Lagrange multipliers. To solve the equations we propose an iterative procedure inspired by Aitchison & Silvey (1958), Lang (1996) and Bergsma (1997). Define first

$$
\xi = \begin{pmatrix} \omega \\ \tau \end{pmatrix}, \qquad
f(\xi) = \frac{\partial \ell}{\partial \xi} = \begin{pmatrix} f_\omega \\ f_\tau \end{pmatrix}, \qquad
F(\xi) = -E\left[\frac{\partial^2 \ell}{\partial \xi\, \partial \xi^T}\right]
= \begin{pmatrix} F_{\omega\omega} & F_{\omega\tau} \\ \cdot & F_{\tau\tau} \end{pmatrix},
$$

where the dot is a shortcut to denote a symmetric sub-matrix. Differentiating the Lagrangian with respect to $\omega$ and $\tau$ and equating the result to zero we obtain

$$
\begin{pmatrix} f_\omega \\ f_\tau \end{pmatrix}
= \begin{pmatrix} e + H\tau \\ h(\omega) \end{pmatrix} = 0, \tag{5}
$$

where $e = \partial l / \partial \omega = n - \mu$, $H = \partial h / \partial \omega^T = D_\mu T^T D^{-1}_{T\mu} C^T_{\mathcal{D}}$, and $D_{T\mu}$ and $D_\mu$ are diagonal matrices with nonzero elements $T\mu$ and $\mu$, respectively. Let $\hat\omega$ be a local maximum of the likelihood subject to the constraint $h(\omega) = 0$. A classical result (Bertsekas, 1982) is that if $H$ is of full column rank at $\hat\omega$, there is a unique $\hat\tau$ such that $\partial \ell(\hat\omega, \hat\tau) / \partial \xi = 0$. In the sequel, it is assumed that the maximum likelihood estimate $\hat\omega$ is a solution to equation (5). Note that the constraint $1^T \mu = 1^T n$ is automatically satisfied, as it can be verified that $H^T 1 = 0$ and thus from (5) it follows that $1^T e = 0$.

Aitchison and Silvey propose a Fisher scoring type updating function

$$
\xi^{(k+1)} = u(\xi^{(k)}), \qquad \text{with } u(\xi) = \xi + F^{-1}(\xi) f(\xi), \tag{6}
$$

yielding the estimate $\xi^{(k+1)}$ at cycle $k+1$ from that at cycle $k$. As the algorithm does not always converge when the starting estimates are not close enough to $\hat\omega$, it is necessary to introduce a step size into the updating equation. The standard approach to choosing a step size in optimization problems is to use a value for which the objective function to be maximized increases.
However, since in this case we are looking for a saddle point of the Lagrangian likelihood $\ell$, we need to adjust the standard strategy. First, the matrix $F$ has a special structure with $F_{\omega\omega} = D_\mu$, $F_{\omega\tau} = -H$ and $F_{\tau\tau} = 0$. Thus, indicating the sub-matrices of $F^{-1}$ by superscripts, we have $F^{\tau\omega} F_{\omega\tau} = I$ and $F^{\omega\omega} F_{\omega\tau} = 0$. Hence the updating function $u(\xi)$ of (6) can be rewritten as

$$
u_\omega(\omega) = \omega + F^{\omega\omega} e + F^{\omega\tau} h(\omega), \qquad
u_\tau(\omega) = F^{\tau\omega} e + F^{\tau\tau} h(\omega),
$$

neither of which is a function of $\tau$. As the updating of the Lagrange multipliers does not depend on the estimate of $\tau$ at the previous step, the algorithm essentially searches in the space of $\omega$. Hence, inserting a step size is only required for updating $\omega$ and we propose, following Bergsma (1997), to use the following basic updating equation with an added step size, $0 < step^{(k)} \le 1$:

$$
\omega^{(k+1)} = \omega^{(k)} + step^{(k)} \left\{ F^{\omega\omega(k)} e^{(k)} + F^{\omega\tau(k)} h(\omega^{(k)}) \right\},
$$

where $e^{(k)} = n - \hat\mu^{(k)}$ and where $F^{\omega\omega(k)}$ and $F^{\omega\tau(k)}$ are two blocks of $\hat F^{-1}$ at cycle $k$. We chose the step size by a simple step halving criterion, but more sophisticated step size rules could also be considered. A discussion on the choice of the step size may be found in Bergsma (1997). Note that the algorithm's updates take place in the rectangular space $\mathbb{R}^t$ of $\omega$ rather than in the not necessarily rectangular space $\Lambda$ of the marginal log-linear parameters, which may not be variation independent.

The algorithm converges if it is started from suitable initial estimates of $\omega$ and $\tau$. While usually a zero vector is a good choice for $\tau$, we found empirically that the number of iterations to convergence can be reduced substantially by using as a starting value for $\omega$ an approximate maximum likelihood estimate based on results by Cox & Wermuth (1990) and Roddam (2004).
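The updating equation can be made concrete in the smallest possible case: a 2 × 2 table with the single constraint that the log odds-ratio is zero. The Python sketch below is an illustration under stated assumptions, not the authors' R implementation. With $T = I$ the constraint is linear, $h(\omega) = c^T\omega$ with $c = (1, -1, -1, 1)^T$, so $H = c$ and $S = H^T D_\mu^{-1} H$ is a scalar; the block inverse of $F$ then gives $F^{\omega\omega} e + F^{\omega\tau} h = D_\mu^{-1}\{e - H S^{-1}(H^T D_\mu^{-1} e + h)\}$. The starting value (the saturated fit) and the step halving rule (halve until the norm of the next proposed increment decreases) are assumptions of this sketch; the paper does not state its criterion. The function name is hypothetical.

```python
import math


def fit_independence_2x2(n, tol=1e-10, max_iter=50):
    """Aitchison-Silvey type iteration for a 2x2 table under the single
    constraint h(omega) = log odds-ratio = 0 (illustrative sketch)."""
    c = [1.0, -1.0, -1.0, 1.0]         # h(omega) = c' omega, hence H = c
    omega = [math.log(x) for x in n]   # assumed start: saturated fit

    def increment(om):
        """F^{ww} e + F^{wt} h = D^{-1} (e - H k), k = (H'D^{-1}e + h)/S."""
        mu = [math.exp(w) for w in om]
        e = [ni - mi for ni, mi in zip(n, mu)]
        h = sum(ci * wi for ci, wi in zip(c, om))
        s = sum(ci * ci / mi for ci, mi in zip(c, mu))  # S = H' D^{-1} H
        k = (sum(ci * ei / mi for ci, ei, mi in zip(c, e, mu)) + h) / s
        return [(ei - ci * k) / mi for ei, ci, mi in zip(e, c, mu)]

    def norm(d):
        return math.sqrt(sum(x * x for x in d))

    for _ in range(max_iter):
        delta = increment(omega)
        size = norm(delta)
        if size < tol:
            break
        step = 1.0
        while True:
            trial = [w + step * d for w, d in zip(omega, delta)]
            # Assumed criterion: accept when the next increment is smaller;
            # give up halving below a minimal step to guarantee termination.
            if norm(increment(trial)) < size or step < 1e-3:
                break
            step /= 2.0
        omega = trial
    return [math.exp(w) for w in omega]


mu_hat = fit_independence_2x2([40.0, 10.0, 20.0, 30.0])
print([round(m, 4) for m in mu_hat])
```

For the counts (40, 10, 20, 30) the fitted values should agree with the familiar closed form (row total × column total / N), that is (30, 20, 30, 20), and the total $1^T\hat\mu$ equals $N$ without being imposed explicitly, as noted after equation (5).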
At convergence, we obtain the maximum likelihood estimates $\hat\mu = \exp(\hat\omega)$ and $\hat\pi = N^{-1}\hat\mu$ and the asymptotic covariance matrices

$$
\operatorname{cov}(\hat\omega) = \hat F^{\omega\omega}, \qquad
\operatorname{cov}(\hat\lambda) = H_{sat} \hat F^{\omega\omega} H_{sat}^T, \qquad
\text{with } H_{sat} = D_{\hat\mu} T^T D^{-1}_{T\hat\mu} C^T.
$$

6. Analysis of some examples

The examples of this section illustrate both the parameterizations and the fitting of marginal independence models. It is rare that a pure marginal independence model is useful in isolation and thus it is usually interpreted in combination with other graphical models. However, the problem of simultaneous testing of multiple marginal independencies in a general contingency table is often present in applications, and it can be carried out with the technique discussed in this paper. All the computations were programmed in the R language (R Development Core Team, 2007).

Table 5. Parameter estimates of the 4-chain model for the data on symptoms of psychiatric patients under the multivariate logistic and the disconnected set parameterizations. The fit is $\chi^2_5 = 8.61$. Columns (1) and (2) are studentized estimates.

Multivariate logistic param.     Disconnected set param.
Margin   η̂       (1)             Margin   Interaction   λ̂       (2)
1        -0.28   -2.62           13       1             -0.28   -2.62
2        -0.13   -1.23                    3              0.21    1.95
3         0.21    1.95                    13             0.00
4         0.24    2.31           14       4              0.24    2.31
12       -0.72   -3.47                    14             0.00
13        0.00                   24       2             -0.13   -1.23
14        0.00                            24             0.00
23       -1.12   -5.32           124      12            -0.72   -3.47
24        0.00                            124            0.00
34        0.79    3.80           134      34             0.79    3.80
123       0.16    0.36                    134            0.00
124       0.00                   1234     23            -0.78   -1.80
134       0.00                            123            0.14    0.20
234      -0.90   -2.03                    234           -1.02   -1.63
1234      0.15    0.16                    1234           0.15    0.16

Example 9. The 4-chain marginal independence model was fitted to the data on symptoms of psychiatric patients of Table 1 with the algorithm of Section 5.
After 22 iterations, the algorithm leads to a chi-squared goodness of fit of 8.61 on 5 degrees of freedom. By comparison, the best graphical log-linear model has generators [12][234] with a deviance of 8.4 on 6 degrees of freedom. Thus, both models fit adequately and provide interesting interpretations of the data. Table 5 summarizes the estimates of the 4-chain graph model, showing the parameter estimates and the studentized estimates under the multivariate logistic and the disconnected set parameterizations. In the multivariate logistic parameterization the two-factor parameters have the simple interpretation of marginal association coefficients. It must be kept in mind that they measure just the strength of marginal association between pairs of adjacent variables in the graph, but that the model includes higher order log-linear parameters which are not visible from the graph. For instance, both $\hat\eta_{23} = -1.12$ and $\hat\eta_{234} = -0.90$ are measures of association for variables $X_2$ and $X_3$. In general, for any connected subgraph, all higher order log-linear parameters are expected. As explained in Section 4, the interpretation of the parameters necessarily depends on the chosen parameterization. For instance, $\hat\eta_{23} = -1.12$ and $\hat\lambda^{1234}_{23} = -0.78$ are a marginal association measure and a conditional association measure, respectively. The four-factor log-linear parameter is not significant, and a simpler reduced model with the additional zero constraint on this parameter has an adequate chi-squared goodness of fit of 8.63 on 6 degrees of freedom.

The following example concerns a larger contingency table including two ordinal variables with three levels. In the analysis these variables are treated as nominal variables using the baseline contrasts (2). Although the nature of the variables could be handled by using other more appropriate contrasts, as explained in Bartolucci et al. (2007), the fit of the marginal independence model is nevertheless invariant.

Example 10. Table 6 summarizes observations for 13067 individuals on 6 variables obtained from as many questions taken from the U.S. General Social Survey (Davis et al., 2007) during the years 1972-2006. The variables are reported below with the original name in the GSS Codebook:

C  cappun: do you favor or oppose death penalty for persons convicted of murder? (1 = favor, 2 = oppose)
F  confinan: confidence in banks and financial institutions (1 = a great deal, 2 = only some, 3 = hardly any)
G  gunlaw: would you favor or oppose a law which would require a person to obtain a police permit before he or she could buy a gun? (1 = favor, 2 = oppose)
J  satjob: how satisfied are you with the work you do? (1 = very satisfied, 2 = moderately satisfied, 3 = a little dissatisfied, 4 = very dissatisfied). Categories 3 and 4 of satjob were merged together.
S  sex: gender (f, m)
A  abrape: do you think it should be possible for a pregnant woman to obtain legal abortion if she became pregnant as a result of rape? (1 = yes, 2 = no)

Table 6. Data from U.S. General Social Survey.

            F:    1     1     1     2     2     2     3     3     3
S  C  G  A  J:    1     2     3     1     2     3     1     2     3
m  1  1  1      410   241    80   691   556   187   192   148    84
m  1  1  2       71    31     9   109    64    34    27    26    15
m  1  2  1      181   128    42   307   284    82    84    93    41
m  1  2  2       41    17     5    61    35    20    18    13     5
m  2  1  1       96    77    29   163   151    76    58    55    27
m  2  1  2       34    18     7    58    36    15    17    13     6
m  2  2  1       29    37     4    55    54    31    22    26    17
m  2  2  2       16     6     6    16    16     7    10     7     2
f  1  1  1      552   353   145   899   793   265   180   162    94
f  1  1  2       98    60    15   186   122    47    40    23    14
f  1  2  1      133    74    33   219   164    66    36    47    24
f  1  2  2       25    15     1    54    40    13    14     6     4
f  2  1  1      228   153    60   356   343   166    95    80    41
f  2  1  2       75    45    12   125   116    34    25    20    12
f  2  2  1       41    25    13    64    56    22    15    14    11
f  2  2  2       17     6     1    19    18     6     3     3     2

In data sets of this kind there is a large number of missing values and the table used in this example collects only individuals with complete observations. Therefore, the following exploratory analysis is intended to be only an illustration with a realistic example. From a first analysis of the data, the following marginal independencies are not rejected by the chi-squared goodness of fit test statistic:

$F \perp\!\!\!\perp CA$: $\chi^2_6 = 6.7$;  $G \perp\!\!\!\perp JA$: $\chi^2_5 = 3.3$;  $J \perp\!\!\!\perp GS$: $\chi^2_6 = 8.1$;  $A \perp\!\!\!\perp FG$: $\chi^2_5 = 2.1$;

and thus they suggest the independence model represented by the bi-directed graph in Figure 3(a). Fitting this model, under the multinomial sampling assumption, we obtain an adequate fit with a deviance of 17.29 on 17 degrees of freedom. The Aitchison and Silvey algorithm converges after 13 iterations. The encoded independencies cannot be represented by a directed acyclic graph model with the same observed variables, because the graph contains at least one subgraph which is a chordless 4-chain. The disconnected set parameterization defined by the ordered decomposable sequence

$$
\mathcal{M}_G = \{CF, FA, GJ, GA, JS, CFA, FGA, GJS, GJA, CFGJSA\}
$$

is variation independent. Instead, by searching in the class of graphical log-linear models with the backward stepwise selection procedure of mim (Edwards, 2000) we found a model with a deviance of 103.16 on 110 degrees of freedom. The model graph is shown in Figure 3(b). Other selection procedures show however that there are several equally well fitting models.
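The degrees of freedom quoted in Example 10 can be reproduced from the formula $q = \sum_{D \in \mathcal{D}} \prod_{v \in D} (b_v - 1)$ of Section 5. The short Python check below takes the disconnected sets from the sequence $\mathcal{M}_G$ above (all margins except the full table) and the numbers of levels from the variable list, with $J$ reduced to three levels after merging; the function names are illustrative.

```python
from math import prod


def n_constraints(disconnected_sets, levels):
    """q = sum over disconnected sets D of prod_{v in D} (b_v - 1)."""
    return sum(prod(levels[v] - 1 for v in D) for D in disconnected_sets)


def independence_df(left, right, levels):
    """Degrees of freedom for testing the marginal independence of two
    groups of variables, each group treated as one compound variable."""
    cells_left = prod(levels[v] for v in left)
    cells_right = prod(levels[v] for v in right)
    return (cells_left - 1) * (cells_right - 1)


# Numbers of levels of the six GSS variables (J has 3 after merging).
levels_gss = {"C": 2, "F": 3, "G": 2, "J": 3, "S": 2, "A": 2}
D_gss = ["CF", "FA", "GJ", "GA", "JS", "CFA", "FGA", "GJS", "GJA"]

q_gss = n_constraints(D_gss, levels_gss)
print(q_gss)  # 17

tests = [("F", "CA"), ("G", "JA"), ("J", "GS"), ("A", "FG")]
dfs = [independence_df(a, b, levels_gss) for a, b in tests]
print(dfs)  # [6, 5, 6, 5]

# The binary 4-chain of Example 9 has five disconnected sets.
levels_chain = {"1": 2, "2": 2, "3": 2, "4": 2}
q_chain = n_constraints(["13", "14", "24", "124", "134"], levels_chain)
print(q_chain)  # 5
```

The value 17 matches the degrees of freedom of the deviance 17.29, the list [6, 5, 6, 5] matches the four preliminary tests, and the value 5 matches the fit of the 4-chain model in Table 5.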
The chosen undirected graph is slightly simpler (2 edges fewer) than the bi-directed graph. As anticipated, the number of constraints on parameters is however much higher. From the inspection of the studentized multivariate logistic estimates, we noticed that the higher order log-linear parameters are almost all not significant and thus we fitted a reduced model, by further restricting to zero all the log-linear parameters of order higher than two, obtaining a deviance of 108.34 on 118 degrees of freedom. The estimates of the remaining nonzero two-factor log-linear parameters are shown in Table 7. These are estimated local log odds-ratios in the selected two-way marginal tables and they have the expected signs. By comparison, the fitted non-graphical log-linear model with the graph of Figure 3(b), with additional zero constraints on the log-linear parameters of order higher than two, leads to a chi-squared goodness of fit of 118.49 on 119 degrees of freedom. Both models thus appear adequate.

Figure 3. Data from the U.S. General Social Survey 1972-2006. (a) A bi-directed graph model ($\chi^2_{17} = 17.29$). (b) A graphical log-linear model ($\chi^2_{110} = 103.16$). [Graphs on the nodes G, S, F, C, J, A not reproduced.]

Table 7. Estimates of two-factor log-linear parameters for the bi-directed graph model of Figure 3(a) with additional zero restrictions on higher order terms. The asterisks indicate the parameters for which the Wald statistic is significant.

Margin   Parameter   Estimate   s.e.
CG       (1)         -0.38      0.048   *
CJ       (1)          0.10      0.043   *
         (2)          0.14      0.058   *
CS       (1)          0.46      0.040   *
CA       (1)          0.56      0.049   *
GS       (1)         -0.77      0.042   *
JA       (1)         -0.21      0.051   *
         (2)         -0.03      0.075
SA       (1)          0.18      0.047   *
FG       (1)         -0.01      0.047
         (2)          0.16      0.058   *
FJ       (1)          0.29      0.044   *
         (2)          0.05      0.065
         (3)          0.04      0.056
         (4)          0.36      0.072   *
FS       (1)         -0.004     0.040
         (2)         -0.35      0.051   *

The last example shows that sometimes the best fitted marginal independence model may be simpler than the best fitted directed acyclic model.

Example 11. The set of data in Table 8 is taken from the General Social Survey in Germany in 1998 (ALLBUS, 1998). In a selected population aged between 18 and 65, the answers of 1228 respondents are collected about the following 5 binary variables: U, unconcerned about environment (yes, no); P, no own political impact expected (yes, no); E, parents' education, both at lower level (at most 10 years) (yes, no); A, age under 40 years (yes, no); S, gender (female, male). A possible ordering of the variables has been suggested by Wermuth (2003), who analyzed a superset of this data set and discussed a directed acyclic graph model. Using a similar ordering, limited to the variables studied here, we consider the variables $\{A, S\}$ as purely explanatory, E and P as intermediate and U as final response. Our final well fitting directed acyclic graph model, shown in Figure 4(a), has a deviance of 3.70 on 3 degrees of freedom. The subgraph for all the variables except gender S is complete. Specifically, the graph has an edge $E \to U$, indicating a direct effect of education on the final response. The model without the arrow $E \to U$ has a worse goodness of fit, $\chi^2_{15} = 36.0$, and further it can be verified that the two-factor log-linear parameters EP and EA are large and significant. Model selection in the class of the graphical log-linear models does not lead to any sensible reduction, whilst a search in the class of bi-directed graph models shows that a special structure of marginal independencies holds.

Table 8. Data from the German General Social Survey in 1998.

        U:   yes   yes   yes   yes    no    no    no    no
        S:     f     f     m     m     f     f     m     m
A    E  P:   yes    no   yes    no   yes    no   yes    no
no   yes       6     8     7    27    66   186    24   230
no   no        4     0     1     9     8    64     4    60
yes  yes       2     2    11     6    28   159    16   130
yes  no        0     1     0     2     4    75     8    80

Figure 4. Two graphical models fitted to data from the General Social Survey in Germany, 1998. (a) A directed acyclic graph model: $\chi^2_3 = 3.70$. (b) A bi-directed graph model: $\chi^2_5 = 5.91$. [Graphs on the nodes S, U, P, A, E not reproduced.]

The final selected bi-directed graph, shown in Figure 4(b), represents the marginal independencies $S \perp\!\!\!\perp A, E$ and $E \perp\!\!\!\perp S, U$. The bi-directed graph contains the chordless 4-chain $E\,A\,U\,S$ and thus it is not Markov equivalent to any directed acyclic graph in the five variables. This suggests that the directed acyclic graph model conceals some distortions due to the presence of latent variables. Also in this case, the disconnected set parameterization defined by the sequence $\mathcal{M}_G = (GE, GF, AE, GFE, GEA, ABEFG)$ leads to a variation independent parameterization because it can be verified that the sequence $\mathcal{M}_G$ is ordered decomposable.

7. Discussion

The discrete models based on marginal log-linear models by Bergsma & Rudas (2002) form a large class that includes several discrete graphical models. The undirected graph models and the chain graph models under the classical (Lauritzen, Wermuth, Frydenberg) interpretation can be parameterized as marginal log-linear models. For an introduction see Rudas et al. (2006). This paper shows that the discrete bi-directed graph models under the global Markov property are included in the same class by specifying the constraints appropriately. In general, three main criteria were considered in choosing a marginal log-linear parameterization.
(a) Upward compatibility: if the parameters have a meaning that is invariant across different marginal distributions, then the interpretations remain the same when a sub-model is chosen. We saw that the multivariate logistic parameterization has this property.
(b) Modelling considerations: the parameterization should contain all the parameters that are of interest for the problem at hand. For example, in a regression context where some variables are prior to others, effect parameters conditional on the explanatory variables are most meaningful. In the seemingly unrelated regression problem of Example 7, the chosen parameters have the interpretation of logistic regression coefficients.
(c) Variation independence: if the parameter space is the whole Euclidean space, this has certain advantages. First, the interpretations are simpler, because in a certain sense different parameters measure different things. Second, in a Bayesian context, prior specification is easier. Finally, the problem of out-of-bound estimates when transforming the parameters to probabilities is avoided. In the examples, we always found a variation independent parameterization, but a characterization of the class of bi-directed graphs admitting a variation independent complete and hierarchical marginal log-linear parameterization is an open problem.

The three criteria are in some cases conflicting: typically variation independence is obtained at the expense of upward compatibility.

The multivariate logistic parameterization has a purpose similar to that of the Möbius parameterization recently proposed by Drton & Richardson (2007) for binary marginal independence models, which is based on a minimal set of marginal probabilities identifying the joint distribution.
These authors discuss the type of constraints on the Möbius parameters needed to specify a marginal independence, showing that they take a simple multiplicative form. The same constraints are defined by zero restrictions on marginal log-linear parameters in our approach. Even if the parametric space can be awkward, this problem is handled by a fitting algorithm that operates in the space of the expected frequencies, while the parameters are used only to define the independence constraints. Moreover, defining the models through the complete specification of the marginal log-linear parameters gives some advantage when there is a mixture of nominal and ordinal variables, because it allows appropriate parameters to be defined for both types of variables using the theory of generalized marginal interactions by Bartolucci et al. (2007). This opens the way to defining subclasses of discrete graphical models specified by equality and inequality constraints.

The proposed algorithm for maximum likelihood fitting of the bi-directed graph model is a very general algorithm of constrained optimization based on Lagrange multipliers. It is essentially based on Aitchison & Silvey (1958) as later developed by Bergsma (1997). Similar algorithms have been proposed, for instance, by Molenberghs & Lesaffre (1994), Glonek & McCullagh (1995) and Lang (1996), and further generalized by Colombi & Forcina (2001). Its main advantage is its generality: it can be applied to all models defined by constraints on the marginal log-linear parameters. As previously stated, the algorithm does not require further iterative procedures for computing, at each step, the inverse transformation from the marginal log-linear parameters to the cell probabilities. Thus, the risk of incompatible estimates that could arise from the lack of variation independence is avoided.
The disadvantage is that, as for many gradient-based algorithms of this type, convergence is not guaranteed, and that it requires the computation of a large expected information matrix. However, empirically, convergence is achieved in relatively few iterations by including a step adjustment. An alternative algorithm with convergence guarantees is the Iterated Conditional Fitting algorithm, proposed by Drton & Richardson (2007) for binary bi-directed graph models in the Möbius parameterization. A comparison between the two algorithms in terms of performance, speed and memory requirements needs further investigation.

Acknowledgement

We thank Nanny Wermuth for helpful discussions. The work of the first two authors was partially supported by MIUR, Rome, under the project PRIN 2005132307.

References

Aitchison, J. & Silvey, S. D. (1958). Maximum likelihood estimation of parameters subject to restraints. Annals of Mathematical Statistics 29, 813–828.
ALLBUS (1998). Codebook ZA-Nr. 3755. German Social Science Infrastructure Services.
Andersson, S., Madigan, D. & Perlman, M. (2001). Alternative Markov properties for chain graphs. Scandinavian Journal of Statistics 28, 33–85.
Bartolucci, F., Colombi, R. & Forcina, A. (2007). An extended class of marginal link functions for modelling contingency tables by equality and inequality constraints. Statistica Sinica 17, 691–711.
Bergsma, W. P. (1997). Marginal models for categorical data. Ph.D. thesis, Tilburg.
Bergsma, W. P. & Rudas, T. (2002). Marginal log-linear models for categorical data. Annals of Statistics 30, 140–159.
Bertsekas, D. P. (1982). Constrained optimization and Lagrange multiplier methods. Academic Press, New York.
Colombi, R. & Forcina, A. (2001). Marginal regression models for the analysis of positive association of ordinal response variables. Biometrika 88, 1007–1019.
Coppen, A. (1966). The Mark-Nyman temperament scale: an English translation. Brit. J. Med. Psychol. 33, 55–59.
Cox, D. R. & Wermuth, N. (1990). An approximation to maximum likelihood estimates in reduced models. Biometrika 77, 747–761.
Cox, D. R. & Wermuth, N. (1993). Linear dependencies represented by chain graphs (with discussion). Statistical Science 8, 204–218, 247–277.
Davis, J., Smith, T. & Marsden, J. A. (2007). General Social Surveys Cumulative Codebook: 1972-2006. NORC: Chicago.
Drton, M. & Richardson, T. S. (2007). Binary models for marginal independence. Journal of the Royal Statistical Society, Ser. B, forthcoming.
Edwards, D. (2000). Introduction to graphical modelling. Springer Verlag, New York, 2nd edn.
Glonek, G. J. N. & McCullagh, P. (1995). Multivariate logistic models. Journal of the Royal Statistical Society, Ser. B 57, 533–546.
Kauermann, G. (1996). On a dualization of graphical Gaussian models. Scandinavian Journal of Statistics 23, 105–116.
Kauermann, G. (1997). A note on multivariate logistic models for contingency tables. Australian Journal of Statistics 39, 261–276.
Lang, J. B. (1996). Maximum likelihood methods for a generalized class of log-linear models. Annals of Statistics 24, 726–752.
Lauritzen, S. L. (1996). Graphical models. Oxford University Press, Oxford.
Lienert, G. A. (1970). Konfigurationsfrequenzanalyse einiger Lysergsäurediäthylamid-Wirkungen. Arzneimittelforschung 20, 912–913.
Molenberghs, G. & Lesaffre, E. (1994). Marginal modelling of multivariate categorical data. Journal of the American Statistical Association 89, 633–644.
Pearl, J. & Wermuth, N. (1994). When can association graphs admit a causal interpretation? In P. Cheesman & W. Oldford, eds., Models and data, artificial intelligence and statistics iv. Springer, New York, pp. 205–214.
R Development Core Team (2007). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
Richardson, T. S. (2003). Markov property for acyclic directed mixed graphs. Scandinavian Journal of Statistics 30, 145–157.
Richardson, T. S. & Spirtes, P. (2002). Ancestral graph Markov models. Annals of Statistics 30, 962–103.
Roddam, A. W. (2004). An approximate maximum likelihood procedure for parameter estimation in multivariate discrete data regression models. J. of Applied Statistics 28, 273–279.
Rudas, T. & Bergsma, W. P. (2004). On applications of marginal models for categorical data. Metron LXII, 1–25.
Rudas, T., Bergsma, W. P. & Németh, R. (2006). Parameterization and estimation of path models for categorical data. In A. Rizzi & M. Vichi, eds., Compstat 2006 Proceedings in Computational Statistics. Physica-Verlag, Heidelberg, pp. 383–394.
Wermuth, N. (1998). Pairwise independence. In P. Armitage & T. Colton, eds., Encyclopedia of biostatistics. Wiley, New York, pp. 3244–324.
Wermuth, N. (2003). Analysing social science data with graphical Markov models. In P. Green, N. Hjort & T. S. Richardson, eds., Highly structured stochastic systems. Oxford University Press, pp. 47–52.
Wermuth, N. & Cox, D. R. (1992). On the relation between interactions obtained with alternative codings of discrete variables. Methodika VI, 76–85.
Wermuth, N., Cox, D. R. & Marchetti, G. M. (2006). Covariance chains. Bernoulli 12, 841–862.
Whittaker, J. (1990). Graphical models in applied multivariate statistics. John Wiley.
