Parameterizations and fitting of bi-directed graph models to categorical data
We discuss two parameterizations of models for marginal independencies for discrete distributions which are representable by bi-directed graph models, under the global Markov property. Such models are useful data analytic tools especially if used in …
Authors: Monia Lupparelli, Giovanni M. Marchetti, Wicher P. Bergsma
MONIA LUPPARELLI
Dipartimento di Economia Politica e Metodi Quantitativi, via S. Felice, 7, 27100, Pavia, Italy

GIOVANNI M. MARCHETTI
Dipartimento di Statistica “G. Parenti”, viale Morgagni, 59, 50134, Florence, Italy

WICHER P. BERGSMA
London School of Economics and Political Science, Houghton Street, WC2A 2AE London, UK

Abstract. We discuss two parameterizations of models for marginal independencies for discrete distributions which are representable by bi-directed graph models, under the global Markov property. Such models are useful data analytic tools especially if used in combination with other graphical models. The first parameterization, in the saturated case, is also known as the multivariate logistic transformation, the second is a variant that allows, in some (but not all) cases, variation independent parameters. An algorithm for maximum likelihood fitting is proposed, based on an extension of the Aitchison and Silvey method.

E-mail addresses: mlupparelli@eco.unipv.it, giovanni.marchetti@ds.unifi.it, W.P.Bergsma@lse.ac.uk.
Date: 3 January 2008.
Key words and phrases. covariance graphs, complete hierarchical parameterizations, connected set Markov property, constrained maximum likelihood, marginal independence, marginal log-linear models, multivariate logistic transformation, variation independence.

1. Introduction

This paper deals with the parameterization and fitting of a class of marginal independence models for multivariate discrete distributions. These models are associated with a class of graphs where the missing edges represent marginal independence. The graphs used have special edges to distinguish them from undirected graphs used to encode conditional independencies.
Cox & Wermuth (1993) use dashed edges and call the graphs covariance graphs, stressing the equivalence between a marginal pairwise independence and a zero covariance in a Gaussian distribution. Richardson & Spirtes (2002) use instead bi-directed edges, following the tradition of path analysts. The interpretation of the graphs in terms of independencies is based on the pairwise and global Markov properties discussed originally by Kauermann (1996) for covariance graphs and later developed by Richardson (2003). These are recalled in Section 2.

Models of marginal independence can be useful in several contexts. For instance, Cox & Wermuth (1993) present an example on diabetic patients concerning four continuous variables: $X_1$, the duration of the illness, $X_2$, the quantity of a particular metabolic parameter, $X_3$, a score for the knowledge about the illness, and $X_4$, a questionnaire score measuring a patient's attitude called external fatalism. The structure of the correlation matrix suggests for this data set the marginal independencies $X_4 \perp\!\!\!\perp \{X_1, X_2\}$ and $X_1 \perp\!\!\!\perp \{X_3, X_4\}$. This marginal independence model can be represented by the bi-directed graph in Figure 1(a), called a 4-chain. The suggested interpretation is that the duration of illness $X_1$ and the external fatalism $X_4$ are independent explanatory variables of the responses $X_2$, $X_3$ in two seemingly unrelated regressions. For further discussion on the interpretation of covariance chains see Wermuth et al. (2006). Bi-directed graph models are sometimes useful to represent marginal independence structures induced after marginalizing over latent variables.
The independence structure of the diabetes data, for example, might be represented by assuming an underlying generating process described by a directed acyclic graph, shown in Figure 1(b), with one latent variable pointing both to $X_2$ and $X_3$. After marginalizing over the latent variable the induced independencies are exactly those encoded in the bi-directed graph of Figure 1(a).

[Figure 1. (a) A bi-directed graph, called 4-chain, implying the independencies $4 \perp\!\!\!\perp 12$ and $1 \perp\!\!\!\perp 34$. Directed acyclic graphs inducing the same independencies after marginalization over the latent variables (with nodes ⊗): (b) with one latent variable; (c) with 3 latent variables.]

As another example with four binary variables, consider the data by Coppen (1966) shown in Table 1, concerning symptoms of 362 psychiatric patients. The symptoms are: $X_1$: stability, $X_2$: validity, $X_3$: acute depression and $X_4$: solidity. The chi-squared tests of the hypotheses of marginal independence $X_4 \perp\!\!\!\perp \{X_1, X_2\}$ and $X_1 \perp\!\!\!\perp \{X_3, X_4\}$, with p-values, respectively, 0.32 and 0.14, are separately not significant, and the independence model defined by the two statements jointly gives a satisfactory fit with a deviance of 8.61 on 5 degrees of freedom. Thus the same bi-directed graph model defined by the 4-chain of Figure 1(a) is adequate. In Section 6 we discuss the details of this application. In this example, if all symptoms are treated on the same footing, it is less plausible that a single latent variable will explain the independence structure, and more (at least three) latent variables are required to suggest a generating process, as shown in the graph of Figure 1(c).
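The two marginal tests quoted above can be approximately reproduced from the counts of Table 1. The following is only a sketch, not the authors' code: it assumes Pearson's statistic (the text does not say which chi-squared statistic was used), codes the levels yes/no of $X_3$ as 1/2, and uses the closed-form upper tail of a chi-squared variable with 3 degrees of freedom. The helper names `pearson_test` and `chi2_sf_3df` are hypothetical.

```python
from itertools import product
from math import erfc, exp, pi, sqrt

# Counts from Table 1 (Coppen, 1966). Assumption: x3 = 1 codes "yes", x3 = 2 codes "no".
# Each row lists the counts for (x4, x2) = (1,1), (1,2), (2,1), (2,2).
rows = {
    (1, 1): (15, 30, 9, 32),
    (1, 2): (25, 22, 46, 27),
    (2, 1): (23, 22, 14, 16),
    (2, 2): (14, 8, 47, 12),
}
counts = {}
for (x1, x3), row in rows.items():
    for (x4, x2), n in zip(product((1, 2), repeat=2), row):
        counts[(x1, x2, x3, x4)] = n


def pearson_test(counts, a, b):
    """Pearson chi-squared statistic and degrees of freedom for X_a independent of X_b.

    a and b are tuples of 0-based positions in the cell tuples.
    """
    table, row_tot, col_tot = {}, {}, {}
    for cell, n in counts.items():
        r = tuple(cell[v] for v in a)
        c = tuple(cell[v] for v in b)
        table[r, c] = table.get((r, c), 0) + n
        row_tot[r] = row_tot.get(r, 0) + n
        col_tot[c] = col_tot.get(c, 0) + n
    total = sum(row_tot.values())
    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / total
            stat += (table.get((r, c), 0) - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df


def chi2_sf_3df(x):
    """Upper tail probability of a chi-squared variable with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


stat_a, df_a = pearson_test(counts, a=(3,), b=(0, 1))  # X4 versus {X1, X2}
stat_b, df_b = pearson_test(counts, a=(0,), b=(2, 3))  # X1 versus {X3, X4}
p_a, p_b = chi2_sf_3df(stat_a), chi2_sf_3df(stat_b)
print(sum(counts.values()), df_a, df_b)
print(round(stat_a, 2), round(p_a, 3), round(stat_b, 2), round(p_b, 3))
```

Under these assumptions both p-values should land close to the reported 0.32 and 0.14. The joint fit (deviance 8.61 on 5 degrees of freedom) requires constrained maximum likelihood fitting and is not attempted in this sketch.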
Developing a parameterization for Gaussian bi-directed graph models is straightforward, since the pairwise and the global Markov property are equivalent and they can be simply fulfilled by constraining to zero a subset of covariances. Accomplishing the same task in the discrete case is much more difficult, due to the high number of parameters and to the non-equivalence of the two Markov properties. Recently, Drton & Richardson (2007) studied the parameterization of bi-directed graph models for discrete binary distributions, based on Möbius parameters, and proposed a version of their iterative conditional fitting algorithm for maximum likelihood estimation. In this paper we propose different parameterizations, suitable for general categorical variables, based on the class of marginal log-linear models of Bergsma & Rudas (2002). One special case of this class, especially useful in the context of bi-directed graph models, is the multivariate logistic parameterization of Glonek & McCullagh (1995); see also Kauermann (1997). We discuss a further marginal log-linear parameterization that can, in special cases, be shown to imply variation independent parameters. We show that the marginal log-linear parameterizations suggest a class of reduced models defined by constraining certain higher-order log-linear parameters to zero. Then we discuss maximum likelihood estimation of the models and we propose a general algorithm based on previous works by Aitchison & Silvey (1958), Lang (1996) and Bergsma (1997).

The remainder of this paper is organized as follows. Section 2 reviews discrete bi-directed graphs and their Markov properties. In Section 3 we give the essential results concerning the theory of marginal log-linear models.
Two parameterizations of bi-directed graph models are then given in Section 4, illustrating their properties with special emphasis on variation independence and the interpretation of the parameters. In Section 5 we propose an algorithm for maximum likelihood fitting and then, in Section 6, we provide some examples. Finally, in Section 7 we give a short discussion, with a comparison with the approach by Drton & Richardson (2007).

Table 1. Data by Coppen (1966) on symptoms of psychiatric patients. The variables are $X_1$: stability (1=extroverted, 2=introverted), $X_2$: validity (1=psychasthenic, 2=energetic), $X_3$: depression (yes, no), $X_4$: solidity (1=hysteric, 2=rigid).

                    X4 = 1              X4 = 2
    X1   X3     X2 = 1   X2 = 2     X2 = 1   X2 = 2
    1    y        15       30          9       32
    1    n        25       22         46       27
    2    y        23       22         14       16
    2    n        14        8         47       12

2. Discrete bi-directed graph models

Bi-directed graphs are essentially undirected graphs with edges represented by bi-directed arrows instead of full lines. We review in this section the main concepts of graph theory required to understand the models.

A bi-directed graph is a pair $G = (V, E)$, where $V = \{1, \dots, d\}$ is a set of nodes and $E$ is a set of edges defined by two-element subsets of $V$. Two nodes $u, v$ are adjacent or neighbours if $uv$ is an edge of $G$, and in this case the edge is drawn as bi-directed, $u \leftrightarrow v$. Two edges are adjacent if they have an end node in common. A path from a node $u$ to a node $v$ is a sequence of adjacent edges connecting $u$ and $v$ for which the corresponding sequence of nodes contains no repetitions. The nodes $u$ and $v$ are called the endpoints of the path and all the other nodes are called the inner nodes. A graph $G$ is complete if all its nodes are pairwise adjacent. A non-empty graph $G$ is called connected if any two of its nodes are linked by a path in $G$, otherwise it is called disconnected.
If $A$ is a subset of the node set $V$ of $G$, the graph $G_A$ with nodes $A$ and containing all the edges of $G$ with endpoints in $A$ is called an induced subgraph. If a subgraph $G_A$ is connected (resp. disconnected, complete) we also call $A$ connected (resp. disconnected, complete) in $G$. The set of all disconnected sets of the graph $G$ will be denoted by $\mathcal{D}$, and the set of all the connected sets of $G$ will be denoted by $\mathcal{C}$. In a graph $G$ a connected component, or simply a component, is a maximal connected subgraph. If a subset $D$ of nodes is disconnected then it can be uniquely decomposed into connected components $C_1, \dots, C_r$, say, such that $D = C_1 \cup \dots \cup C_r$.

The usual notion of separation in undirected graphs can be used also for bi-directed graphs. Thus, given three disjoint subsets of nodes $A$, $B$ and $C$, $A$ and $B$ are said to be separated by $C$ if for any $u$ in $A$ and any $v$ in $B$ all paths from $u$ to $v$ have at least one inner node in $C$.

The cardinality of a set $V$ will be denoted by $|V|$. The set of all the subsets of $V$, the power set, will be denoted by $\mathcal{P}(V)$. We use also the notation $\mathcal{P}_0(V)$ for the set of all nonempty subsets of $V$.

Let $X = (X_v, v \in V)$ be a discrete random vector with each component $X_v$ taking on values in the finite set $\mathcal{I}_v = \{1, \dots, b_v\}$. The Cartesian product $\mathcal{I}_V = \times_{v \in V} \mathcal{I}_v$ is a contingency table, with generic element $i = (i_v, v \in V)$, called a cell of the table, and with total number of cells $t = |\mathcal{I}_V|$. We assume that $X$ has a joint probability function $p(i)$, $i \in \mathcal{I}_V$, giving the probability that an individual falls in cell $i$. Given a subset $M \subseteq V$ of the variables, the marginal contingency table is $\mathcal{I}_M = \times_{v \in M} \mathcal{I}_v$ with generic cell $i_M$, and the marginal probability function of the random vector $X_M = (X_v, v \in M)$ is
\[
p_M(i_M) = \sum_{j \in \mathcal{I}_V \,:\, j_M = i_M} p(j).
\]
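As a small illustration of the marginal probability function $p_M$, the sketch below sums a joint table over the cells that agree on the coordinates in $M$. The joint distribution is hypothetical and chosen only for the example; the function name `marginal` and the 0-based positions used to index variables are likewise assumptions of this sketch.

```python
from itertools import product


def marginal(p, M):
    """Marginal probability function p_M: sum p(j) over all cells j with j_M = i_M.

    p maps cells (tuples over V) to probabilities; M lists 0-based positions.
    """
    p_M = {}
    for cell, prob in p.items():
        key = tuple(cell[v] for v in M)
        p_M[key] = p_M.get(key, 0.0) + prob
    return p_M


# Hypothetical joint distribution of three binary variables with levels 1 and 2.
weights = [4, 1, 2, 3, 1, 2, 3, 4]
cells = list(product((1, 2), repeat=3))  # lexicographic order of the cells
p = {cell: w / sum(weights) for cell, w in zip(cells, weights)}

p_13 = marginal(p, M=(0, 2))  # marginal table of (X1, X3)
print(p_13)
```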
A bi-directed graph $G = (V, E)$ induces an independence model for the discrete random vector $X = (X_v, v \in V)$ by defining a Markov property, i.e. a rule for reading off the graph the independence relations. In the following we shall use the shorthand notation $A \perp\!\!\!\perp B \mid C$ to indicate the conditional independence $X_A \perp\!\!\!\perp X_B \mid X_C$, where $A$, $B$ and $C$ are three disjoint subsets of $V$. Similarly $A \perp\!\!\!\perp B$ and $A \perp\!\!\!\perp B \perp\!\!\!\perp C$ will denote the marginal and the complete independence, respectively, of sub-vectors of $X$.

There are two Markov properties describing the independence model associated with a bi-directed graph which we consider in this paper: (a) the global Markov property of Kauermann (1996) and (b) the connected set Markov property of Richardson (2003). The distribution of the random vector $X$ satisfies the global Markov property for the bi-directed graph $G$ if for any triple of disjoint sets $A$, $B$ and $C$,
\[
A \perp\!\!\!\perp B \mid V \setminus (A \cup B \cup C) \quad \text{whenever $A$ is separated from $B$ by $C$ in $G$.}
\]
Instead, the distribution of $X$ is said to satisfy the connected set Markov property if
\[
C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r \tag{1}
\]
whenever $C_1, \dots, C_r$ are the connected components of a disconnected set $D$, for every $D \in \mathcal{D}$. Richardson (2003) proves that the two properties are equivalent; see also Drton & Richardson (2007). Following these authors we define a discrete bi-directed graph model as follows.

Definition 2.1. A discrete bi-directed graph model associated with a bi-directed graph $G = (V, E)$ is a family of discrete joint probability distributions $p$ for the discrete random vector $X = (X_v, v \in V)$ that satisfies the property (1) for $G$, i.e. such that, for every disconnected set $D$ in the graph,
\[
p_D(i_D) = p_{C_1}(i_{C_1}) \times \cdots \times p_{C_r}(i_{C_r}),
\]
where $C_1, \dots, C_r$ are the connected components of $D$.
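The disconnected sets and their connected components, which drive Definition 2.1, can be enumerated by brute force for small graphs. The sketch below is illustrative only (the function names `components` and `disconnected_sets` are hypothetical) and is applied to the 4-chain of Figure 1(a), taken here to have edges 1↔2, 2↔3 and 3↔4.

```python
from itertools import combinations


def components(nodes, edges):
    """Connected components of the subgraph induced by `nodes`."""
    nodes, comps = set(nodes), []
    while nodes:
        stack, comp = [nodes.pop()], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for e in edges:
                if u in e:
                    (w,) = e - {u}  # the other endpoint of the edge
                    if w in nodes:
                        nodes.remove(w)
                        stack.append(w)
        comps.append(frozenset(comp))
    return sorted(comps, key=min)


def disconnected_sets(V, edges):
    """Map every disconnected subset D of V to its list of connected components."""
    out = {}
    for size in range(2, len(V) + 1):
        for D in combinations(sorted(V), size):
            comps = components(D, edges)
            if len(comps) > 1:
                out[D] = comps
    return out


V = {1, 2, 3, 4}
E = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4)]}  # the 4-chain
for D, comps in disconnected_sets(V, E).items():
    print(D, [sorted(c) for c in comps])
```

Each printed pair gives one factorization required by Definition 2.1; for instance the set 134 splits into the components 1 and 34, so the model requires $p_{134} = p_1 \times p_{34}$. The enumeration is exponential in $|V|$ and is meant for small examples only.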
If the global Markov property holds then, for any pair of non-adjacent nodes, the associated random variables are marginally independent. This implication is called the pairwise Markov property and it is, for discrete variables, a necessary but not sufficient condition for the global Markov property. This is in sharp contrast with the family of Gaussian distributions, where the two properties are equivalent.

Example 1. Here and henceforth we shall use the short forms 34 and 12 to denote the sets $\{3, 4\}$ and $\{1, 2\}$, and so on. The graph of Figure 1(a) is a chain in 4 nodes with disconnected sets $\mathcal{D} = \{13, 14, 24, 134, 124\}$. Thus, $D = 13$ has the components $C_1 = 1$ and $C_2 = 3$, while $D = 134$ can be decomposed into $C_1 = 1$ and $C_2 = 34$. The pairwise Markov property implies $1 \perp\!\!\!\perp 3$, $1 \perp\!\!\!\perp 4$ and $2 \perp\!\!\!\perp 4$, while the connected set Markov property implies further that $1 \perp\!\!\!\perp 34$ and $4 \perp\!\!\!\perp 12$. The global Markov property implies the equivalent set of independence statements $1 \perp\!\!\!\perp 4$, $2 \perp\!\!\!\perp 4 \mid 1$ and $1 \perp\!\!\!\perp 3 \mid 4$.

Note that the complete list of all marginal independencies implied by a bi-directed graph model is derived from the class $\mathcal{D}$ of all disconnected sets of the graph.

[Figure 2. Two bi-directed graphs on the nodes 1, ..., 5. The independencies implied by the connected set Markov property (or, equivalently, the global Markov property) are: (a) $1 \perp\!\!\!\perp 34$, $3 \perp\!\!\!\perp 15$ and $5 \perp\!\!\!\perp 23$; (b) $1 \perp\!\!\!\perp 3 \perp\!\!\!\perp 5$, $1 \perp\!\!\!\perp 345$, $12 \perp\!\!\!\perp 45$ and $123 \perp\!\!\!\perp 5$.]

Example 2. The graph of Figure 2(a) has 7 disconnected sets and thus the associated discrete bi-directed graph model fulfills the independencies
\[
1 \perp\!\!\!\perp 3, \quad 1 \perp\!\!\!\perp 4, \quad 2 \perp\!\!\!\perp 5, \quad 3 \perp\!\!\!\perp 5, \quad 1 \perp\!\!\!\perp 34, \quad 5 \perp\!\!\!\perp 23, \quad 3 \perp\!\!\!\perp 15,
\]
that reduce to $1 \perp\!\!\!\perp 34$, $3 \perp\!\!\!\perp 15$ and $5 \perp\!\!\!\perp 23$, after eliminating redundancies.
The discrete model associated with the graph of Figure 2(b), with 16 disconnected subsets, satisfies 16 marginal independencies that can be reduced to the four statements
\[
1 \perp\!\!\!\perp 3 \perp\!\!\!\perp 5, \quad 1 \perp\!\!\!\perp 345, \quad 12 \perp\!\!\!\perp 45, \quad 123 \perp\!\!\!\perp 5.
\]

The stronger condition required by Definition 2.1 implies that in some situations not all marginal independence relations are representable by bi-directed graphs, as the following example shows.

Example 3. Consider the data in Table 2, due to Lienert (1970). The variables are 3 symptoms after LSD intake, recorded to be present (level 1) or absent (level 2): distortions in affective behavior ($X_1$), distortions in thinking ($X_2$), and dimming of consciousness ($X_3$). As Wermuth (1998) points out, the frequencies in the three marginal tables show that the three symptom pairs are close to independence, but at the same time the variables are not mutually independent, as witnessed by the strong three-factor interaction due to the quite distinct conditional odds ratios between $X_1$ and $X_2$ at the two levels of $X_3$. Thus, in this case, despite three marginal independencies, a discrete bi-directed graph model can represent just one of them, and thus must include at least two edges.

Pearl & Wermuth (1994) studied the Markov equivalence between bi-directed graph models (actually the covariance graphs) and directed acyclic graph models, i.e. when the two models imply exactly the same conditional independence statements under their respective global Markov property (for the global Markov property see Lauritzen, 1996). They showed that each bi-directed graph is always Markov equivalent to a directed acyclic graph with additional synthetic latent nodes, after marginalizing over the additional nodes, as exemplified in Figure 1(b, c).
Moreover they also give a Markov equivalence result, proving that a bi-directed graph is equivalent to a directed acyclic graph with the same set of nodes if and only if it contains no 4-chain. Thus, there is no directed acyclic graph which is Markov equivalent to the bi-directed graphs of Figures 1(a), 2(a) or 2(b).

3. Marginal log-linear parameterizations

Discrete bi-directed graph models may be defined as marginal log-linear models, using complete hierarchical parameterizations as defined by Bergsma & Rudas (2002). In this section we review the basic concepts and we discuss the definitions of the parameters involved.

Let $p(i) > 0$ be a strictly positive probability distribution of a discrete random vector $X = (X_v, v \in V)$ and let $p_M(i_M)$ be any marginal probability distribution of a sub-vector $X_M$, $M \subseteq V$. The marginal probability distribution admits a log-linear expansion
\[
\log p_M(i_M) = \sum_{L \subseteq M} \lambda^M_L(i_L),
\]
where $\lambda^M_L(i_L)$ is a function defining the log-linear parameters indexed by the subset $L$ of $M$. The functions $\lambda^M_L(i_L)$ are defined by
\[
\lambda^M_L(i_L) = \sum_{A \subseteq L} (-1)^{|L \setminus A|} \log p_M(i_A, i^*_{M \setminus A}),
\]
where $i^* = (1, \dots, 1)$ denotes a baseline cell of the table; see Whittaker (1990) and Lauritzen (1996). The function $\lambda^M_L(i_L)$ is zero whenever at least one index in $i_L$ is equal to 1. Therefore, $\lambda^M_L(i_L)$ defines only $\prod_{v \in L} (b_v - 1)$ parameters, where $b_v$ is the number of categories of variable $X_v$. Due to the constraint on the probabilities, which must sum to one, the parameter $\lambda^M_\emptyset = \log p_M(i^*_M)$ is a function of the others, and can thus be eliminated.

If $\lambda^M_L$ is the vector containing the parameters $\lambda^M_L(i_L)$, then it can be obtained explicitly using Kronecker products as follows.
For any subset $L$ of $M$, let $C_{v,L}$ be the matrix
\[
C_{v,L} =
\begin{cases}
\begin{pmatrix} -1_{b_v - 1} & I_{b_v - 1} \end{pmatrix} & \text{if } v \in L, \\
\begin{pmatrix} 1 & 0^{\top}_{b_v - 1} \end{pmatrix} & \text{if } v \notin L,
\end{cases}
\]
where $1_{b_v - 1}$ and $0_{b_v - 1}$ are column vectors of ones and zeros and $I_{b_v - 1}$ is an identity matrix, and let $\pi_M$ be the $t_M \times 1$ column vector of the marginal cell probabilities in lexicographic order. Then, the vector of the log-linear parameters $\lambda^M_L(i_L)$ is
\[
\lambda^M_L = C^M_L \log \pi_M, \quad \text{where } C^M_L = \bigotimes_{v \in M} C_{v,L}. \tag{2}
\]

Table 2. Data by Lienert (1970) concerning symptoms after LSD intake. OR is the conditional odds ratio between $X_1$ and $X_2$ given $X_3$. The frequencies show evidence of pairwise independence, but mutual dependence.

                X3 = 1              X3 = 2
    X1      X2 = 1   X2 = 2     X2 = 1   X2 = 2
    1         21        5          4       16
    2          2       13         11        1
    OR           27.3                0.023

For a discussion of the technique of building all log-linear parameters based on Kronecker products see Wermuth & Cox (1992). The coding used in this paper corresponds to their indicator coding, and gives the parameters used for example by the program glim.

A marginal log-linear parameterization of the probability distribution $p(i)$ is obtained by combining the log-linear parameters $\lambda^M_L$ for many different marginal probability distributions. The general theory is developed in Bergsma & Rudas (2002) and is summarized below.

Definition 3.1. Let $\mathcal{M} = (M_1, \dots, M_s)$ be an ordered sequence of margins of interest and, for each $M_j$, $j = 1, \dots, s$, let $\mathcal{L}_j$ be the collection of sets $L$ for which $\lambda^{M_j}_L$ is defined with equation (2). Then, $(\lambda^{M_j}_L)$ is said to be a hierarchical and complete marginal log-linear parameterization for $p(i)$ if
(i) the sequence $M_1, \dots, M_s$ is non-decreasing;
(ii) the last margin is $M_s = V$;
(iii) the sets defining the log-linear parameters in each margin are
\[
\mathcal{L}_1 = \mathcal{P}_0(M_1), \quad \text{and} \quad \mathcal{L}_j = \mathcal{P}_0(M_j) \setminus \bigcup_{h=1}^{j-1} \mathcal{L}_h, \quad \text{for } j > 1,
\]
where $\mathcal{P}_0(M_j)$ denotes the collection of all non-empty subsets of $M_j$.
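Equation (2) can be checked numerically on the data of Table 2. The sketch below is an illustration under stated assumptions, not the authors' implementation: observed proportions stand in for the cell probabilities, variables are addressed by 0-based position, and the helper names (`kron`, `C_vL`, `loglin`, `pi_margin`) are hypothetical. It uses only the Python standard library.

```python
from functools import reduce
from math import log


def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def C_vL(b, in_L):
    """C_{v,L}: contrasts with the baseline level if v is in L, baseline selector otherwise."""
    if in_L:
        return [[-1] + [int(j == k) for k in range(b - 1)] for j in range(b - 1)]
    return [[1] + [0] * (b - 1)]


def loglin(pi_M, levels, in_L):
    """lambda^M_L = C^M_L log(pi_M); in_L[k] says whether the k-th variable of M is in L."""
    C = reduce(kron, [C_vL(b, flag) for b, flag in zip(levels, in_L)])
    logs = [log(x) for x in pi_M]
    return [sum(c * l for c, l in zip(row, logs)) for row in C]


def pi_margin(counts, M):
    """Observed marginal proportions over positions M, in lexicographic cell order."""
    tot = {}
    for cell, n in counts.items():
        key = tuple(cell[v] for v in M)
        tot[key] = tot.get(key, 0) + n
    N = sum(tot.values())
    return [tot[k] / N for k in sorted(tot)]


# Table 2 (Lienert, 1970): cells are (x1, x2, x3).
counts = {(1, 1, 1): 21, (1, 2, 1): 5, (1, 1, 2): 4, (1, 2, 2): 16,
          (2, 1, 1): 2, (2, 2, 1): 13, (2, 1, 2): 11, (2, 2, 2): 1}

pi_V = pi_margin(counts, (0, 1, 2))
lam_123 = loglin(pi_V, [2, 2, 2], [True, True, True])[0]     # three-factor term in margin 123
lam_12_V = loglin(pi_V, [2, 2, 2], [True, True, False])[0]   # 12 term in margin 123 (X3 at baseline)
eta_12 = loglin(pi_margin(counts, (0, 1)), [2, 2], [True, True])[0]  # 12 term in margin 12
eta_13 = loglin(pi_margin(counts, (0, 2)), [2, 2], [True, True])[0]  # 13 term in margin 13
print(lam_123, lam_12_V, eta_12, eta_13)
```

With binary variables each $C^M_L$ is a single row of signs and zeros, so every parameter is a log odds ratio or a contrast of log odds ratios. In particular the term for $L = 12$ computed in the full table is the log of the conditional odds ratio at $X_3 = 1$ reported in Table 2, while the same $L$ computed in the margin 12 is a marginal log odds ratio.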
The parameterization is called hierarchical because it is generated by a non-decreasing sequence $\mathcal{M}$, and complete because it defines all possible log-linear parameter terms, each within one and only one marginal table. Notice that the parameterization is associated uniquely with a particular sequence $\mathcal{M}$ of margins. Thus, a different (still non-decreasing) ordering of the sequence induces a different parameterization; see the examples in Section 4.2.

The above construction defines a map from the simplex $\Delta_V$ of the strictly positive distributions $p(i)$ of the discrete random vector $X$ into the set $\Lambda$ of possible values for the whole vector of the marginal log-linear parameters $\lambda = (\lambda^{M_j}_L)$, with $j = 1, \dots, s$ and $L \in \mathcal{L}_j$. The following general result shows that a complete hierarchical marginal log-linear model defines a proper parameterization.

Proposition 1. (Bergsma & Rudas, 2002) The map $\Delta_V \to \Lambda \subseteq \mathbb{R}^{t-1}$ defined by a complete and hierarchical marginal log-linear parameterization is a diffeomorphism.

The parameters $\lambda$ can be written in matrix form
\[
\lambda = C \log(T \pi),
\]
where $\pi$ is the $t \times 1$ vector of all the cell probabilities in lexicographical order, $T$ is an $m \times t$ marginalization matrix such that
\[
T \pi = \begin{pmatrix} \pi_{M_1} \\ \vdots \\ \pi_{M_s} \end{pmatrix}
\]
and $C = \operatorname{diag}(C^M_L)$ is a $(t - 1) \times m$ block diagonal matrix, with $m = \sum_{j=1}^{s} |\mathcal{I}_{M_j}|$. For a discussion of algorithms for computing the matrices $C$ and $T$ see Bartolucci et al. (2007), who generalize the approach by Bergsma & Rudas (2002) to logits and higher order effects of global and continuation type, suitable for ordinal data.

The log-linear parameterization and the multivariate logistic transformation represent two special cases of marginal log-linear models. The standard log-linear parameters are generated by $\mathcal{M} = \{V\}$.
They will be denoted by $\theta_L = \lambda^V_L$ for $L \in \mathcal{P}_0(V)$ and the whole vector of parameters by $\theta$. The parameter space coincides with $\mathbb{R}^{t-1}$ and the map from $\pi$ to $\theta$ admits an inverse in closed form, provided that $\pi > 0$.

The multivariate logistic parameters of Glonek & McCullagh (1995) are generated by $\mathcal{M} = \mathcal{P}_0(V)$, in any non-decreasing order. They will be denoted by $\eta_M = \lambda^M_M$, with $\eta$ representing the whole vector. Thus the parameters $\eta_M$ correspond to the highest order log-linear parameters within each marginal table $\mathcal{I}_M$, for each nonempty set $M \subseteq V$. The parameter space is in general a strict subset of $\mathbb{R}^{t-1}$, except when the number of variables is $d = 2$. In general there is no closed form inverse transforming $\eta$ back into $\pi$. The inverse operation however may be accomplished using, for example, the iterative proportional fitting algorithm.

Thus, while the log-linear parameters $\theta$ are always variation independent, and for any $\theta$ in $\mathbb{R}^{t-1}$ there is a unique associated joint probability distribution $\pi$, the multivariate logistic parameters are never variation independent for $d > 2$. Thus there are vectors $\eta$ in $\mathbb{R}^{t-1}$ that are not compatible with any joint probability distribution $\pi$. The latter assertion is also implied by a further result by Bergsma & Rudas (2002), which proves that the hierarchical and complete marginal log-linear parameterization generated by a sequence $\mathcal{M}$ is variation independent if and only if $\mathcal{M}$ satisfies a property called ordered decomposability. A sequence of arbitrary subsets of $V$ is said to be ordered decomposable if it has at most two elements or if there is an ordering $M_1, \dots, M_s$ of its elements such that $M_i \not\subseteq M_j$ if $i > j$ and, for $k = 3, \dots, s$, the maximal elements (i.e. those not contained in any other sets) of $\{M_1, \dots, M_k\}$ form a decomposable set.
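The allocation of effects to margins in Definition 3.1, and the two special cases just described, can be sketched as follows. This is an illustrative reading of the definition, with hypothetical function names; "non-decreasing" is interpreted, as in the ordering condition above, to mean that no margin is contained in an earlier one.

```python
from itertools import combinations


def nonempty_subsets(M):
    """All non-empty subsets of M, ordered by size and then lexicographically."""
    M = sorted(M)
    return [frozenset(c) for r in range(1, len(M) + 1) for c in combinations(M, r)]


def allocate_effects(margins, V):
    """Pair each margin M_j with its collection L_j, following Definition 3.1."""
    margins = [frozenset(M) for M in margins]
    for j, M in enumerate(margins):
        if any(M <= earlier for earlier in margins[:j]):
            raise ValueError("condition (i) fails: a margin is contained in an earlier one")
    if margins[-1] != frozenset(V):
        raise ValueError("condition (ii) fails: the last margin must be V")
    used, alloc = set(), []
    for M in margins:
        L_j = [L for L in nonempty_subsets(M) if L not in used]  # condition (iii)
        used.update(L_j)
        alloc.append((M, L_j))
    return alloc


V = {1, 2, 3}
loglinear = allocate_effects([V], V)                        # margins M = {V}
mv_logistic = allocate_effects(nonempty_subsets(V), V)      # margins M = P_0(V)

for M, L_j in mv_logistic:
    print(sorted(M), [sorted(L) for L in L_j])
```

In the log-linear case every effect is defined in the single margin $V$, whereas in the multivariate logistic case each margin $M$ contributes only its own highest-order effect $L = M$. In both cases each of the $2^d - 1$ non-empty subsets is allocated exactly once, which is the completeness requirement.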
For further details and examples about ordered decomposability see Rudas & Bergsma (2004). More properties of the two parameterizations $\theta$ and $\eta$, connected to graphical models, will be described in Section 4.

4. Parameterizations of discrete bi-directed graph models

We now suggest two different marginal log-linear parameterizations of discrete bi-directed graph models, and we compare advantages and shortcomings.

4.1. Multivariate logistic parameterization.

It is known that the complete independence of two sub-vectors $X_A$, $X_B$ of the random vector $X$ is equivalent to a set of zero restrictions on multivariate logistic parameters.

Lemma 1. (Kauermann (1997), Lemma 1). If $\{A, B\}$ is a partition of $V$ and $\eta = (\eta_M)$, $M \in \mathcal{P}_0(V)$, is the multivariate logistic parameterization, then
\[
A \perp\!\!\!\perp B \iff \eta_M = 0 \text{ for all } M \in \mathcal{Q},
\]
where $\mathcal{Q} = \{M \subseteq A \cup B : M \cap A \neq \emptyset,\ M \cap B \neq \emptyset\}$.

We generalize this result to complete independence of more than two random vectors. Given a partition $\{C_1, \dots, C_r\}$ of a set $D \subseteq V$, we define
\[
\mathcal{Q}(C_1, \dots, C_r) = \mathcal{P}\Big(\bigcup_{i=1}^{r} C_i\Big) \setminus \bigcup_{i=1}^{r} \mathcal{P}(C_i).
\]
This is the set of all subsets of $D$ not completely contained in a single class, i.e. containing elements coming from at least two classes of the partition. With this notation, the set $\mathcal{Q}$ of Lemma 1 may be denoted by $\mathcal{Q}(A, B)$. Then we have the following result.

Proposition 2. Let $X = (X_v)$, $v \in V$, be the discrete random vector with multivariate logistic parameterization $\eta = (\eta_M)$, $M \in \mathcal{P}_0(V)$. If $D \subseteq V$ is partitioned into the classes $\{C_1, \dots, C_r\}$ then
\[
C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r \iff \eta_M = 0 \text{ for all } M \in \mathcal{Q}(C_1, \dots, C_r).
\]

Proof. First, use the shorthand notations $\mathcal{Q}$ to denote the set $\mathcal{Q}(C_1, \dots, C_r)$ and $\mathcal{Q}_i$ to denote the set $\mathcal{Q}(C_i, C_{-i})$, $i = 1, \dots, r$, where $C_{-i} = D \setminus C_i$.
We have $\mathcal{Q} = \bigcup_{i=1}^{r} \mathcal{Q}_i$. In fact, since $\mathcal{Q}_i \subseteq \mathcal{Q}$ for every $i$, then $\bigcup_{i=1}^{r} \mathcal{Q}_i \subseteq \mathcal{Q}$. Conversely, for any $M \in \mathcal{Q}$ there is always a class $C_i$ such that $\emptyset \neq M \cap C_i \neq M$, and hence, by definition, $M \in \mathcal{Q}_i$. Hence, for every $M \in \mathcal{Q}$, $M \in \bigcup_{i=1}^{r} \mathcal{Q}_i$ and thus $\mathcal{Q} \subseteq \bigcup_{i=1}^{r} \mathcal{Q}_i$. Then, the complete independence $C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r$ is equivalent to $C_i \perp\!\!\!\perp C_{-i}$ for all $i = 1, \dots, r$. By Lemma 1, applied to the sub-vector $X_D$, each independence $C_i \perp\!\!\!\perp C_{-i}$ is equivalent to the restriction $\eta_M = 0$ for $M \in \mathcal{Q}_i$, and the parameters $\eta_M$ are identical to the corresponding multivariate logistic parameters for the full random vector $X_V$. Thus, the complete independence $C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r$ is equivalent to $\eta_M = 0$ for $M \in \mathcal{Q}_i$, $i = 1, \dots, r$, i.e. for $M \in \bigcup_{i=1}^{r} \mathcal{Q}_i = \mathcal{Q}$.

Proposition 2 implies that a statement of complete independence $C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r$ is equivalent to a set of zero constraints on the multivariate logistic parameters. The following result explains how the constraints must be chosen in order to satisfy all the independencies required by Definition 2.1 of a bi-directed graph model.

Proposition 3. Given a bi-directed graph $G = (V, E)$, the discrete bi-directed graph model associated with $G$ is defined by the set of strictly positive discrete probability distributions with multivariate logistic parameters $\eta = (\eta_M)$, $M \in \mathcal{P}_0(V)$, such that
\[
\eta_M = 0 \text{ for every } M \in \mathcal{D},
\]
where $\mathcal{D}$ is the set of all disconnected sets of nodes in the graph $G$.

Proof. Given a set $D \in \mathcal{D}$, denote its connected components by $\{C_1, \dots, C_r\}$ and by $\mathcal{Q}_D$ the set $\mathcal{Q}(C_1, \dots, C_r)$. First, we prove that $\mathcal{D} = \bigcup_{D \in \mathcal{D}} \mathcal{Q}_D$. In fact, for any $D \in \mathcal{D}$, $\mathcal{Q}_D \subseteq \mathcal{D}$ because it is a class of disconnected subsets of $D$. Thus, $\bigcup_{D \in \mathcal{D}} \mathcal{Q}_D \subseteq \mathcal{D}$. Conversely, if $D \in \mathcal{D}$, then $D \in \mathcal{Q}_D$ and thus $\mathcal{D} \subseteq \bigcup_{D \in \mathcal{D}} \mathcal{Q}_D$.
By Definition 2.1, the independence $C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r$ is implied for each disconnected set $D$ with connected components $C_1, \dots, C_r$. By Proposition 2, this is equivalent to the zero restrictions on the multivariate logistic parameters $\eta_M = 0$, for all $M \in \mathcal{Q}_D$, $D \in \mathcal{D}$, i.e. for all $M \in \bigcup_{D \in \mathcal{D}} \mathcal{Q}_D = \mathcal{D}$.

A consequence of Proposition 3 is that all possible discrete bi-directed graphical models can be identified within the multivariate logistic parameterization under the zero constraints associated with the disconnected sets.

Table 3. Comparison between two parameterizations of the discrete chordless 4-chain model of Figure 1(a): (η) with bi-directed edges; (θ) with undirected edges.

    Terms   1    2    3    4    12    13   14   23    24   34    123    124   134   234    1234
    η       η1   η2   η3   η4   η12   0    0    η23   0    η34   η123   0     0     η234   η1234
    θ       θ1   θ2   θ3   θ4   θ12   0    0    θ23   0    θ34   0      0     0     0      0

Example 4. The discrete model associated with the chordless 4-chain of Figure 1(a) is defined by the multivariate logistic parameters shown in Table 3, first row. There are 5 zero constraints on the highest-order log-linear parameters of the tables 13, 14, 24, 124 and 134. There are three nonzero two-factor marginal log-linear parameters $\eta_{ij}$ associated with the edges of the graph, which may be interpreted as sets of marginal association coefficients between the involved variables, based on the chosen contrasts. Consider now the reduced model resulting after dropping the edge $2 \leftrightarrow 3$ and implying the independence $12 \perp\!\!\!\perp 34$. This model can be obtained, within the same parameterization, by the additional zero constraints on $\eta_{23}$, $\eta_{123}$, $\eta_{234}$ and $\eta_{1234}$.

While the parameters are in general not variation independent, they satisfy the upward compatibility property, because they have the same meaning across different marginal distributions.
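The pattern of zeros in Table 3 and the extra constraints of Example 4 can be reproduced by enumerating subsets of nodes. In the sketch below (hypothetical helper names, brute-force enumeration suitable only for small graphs) the zero $\eta$ terms are the disconnected sets, as in Proposition 3, and the zero $\theta$ terms in the second row of Table 3 are taken to be the incomplete sets, i.e. the subsets containing at least one non-adjacent pair.

```python
from itertools import combinations


def is_connected(S, edges):
    """True if the subgraph induced by S is connected."""
    S = set(S)
    seen, stack = set(), [min(S)]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(w for w in S - seen if frozenset((u, w)) in edges)
    return seen == S


def is_complete(S, edges):
    """True if all pairs of nodes in S are adjacent."""
    return all(frozenset(pair) in edges for pair in combinations(S, 2))


def zero_terms(V, edge_list, keeps_term):
    """Labels of the subsets whose parameter is constrained to zero."""
    edges = {frozenset(e) for e in edge_list}
    subsets = [c for r in range(1, len(V) + 1) for c in combinations(sorted(V), r)]
    return ["".join(map(str, S)) for S in subsets if not keeps_term(S, edges)]


V = [1, 2, 3, 4]
chain = [(1, 2), (2, 3), (3, 4)]   # 4-chain of Figure 1(a)
reduced = [(1, 2), (3, 4)]         # after dropping the edge between 2 and 3

eta_zero = zero_terms(V, chain, is_connected)    # bi-directed graph, first row of Table 3
theta_zero = zero_terms(V, chain, is_complete)   # undirected graph, second row of Table 3
extra = sorted(set(zero_terms(V, reduced, is_connected)) - set(eta_zero), key=lambda s: (len(s), s))
print(eta_zero)
print(theta_zero)
print(extra)
```

The first list has 5 entries and the second 8, matching the two rows of Table 3, and the third list gives the four additional constraints of the reduced model in Example 4.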
Using this prop erty , we can p ro v e the follo wing resu lt concerning the effect of marginalization ov er a subset A of the v ariables. Let G A = ( A, E A ) b e t he sub graph induced b y A , and let D A b e the set of all disconnected sets of G A . Prop osition 4. If a discr ete pr ob ability distribution p ( i ) for i ∈ I V satisfies a bi-dir e cte d gr aph mo del define d by the gr aph G = ( V , E ) then the ma r ginal distribution p A ( i A ) over A ⊆ V satisfies the bi-dir e cte d gr aph mo del define d b y G A = ( A, E A ) and its multivaria te lo gistic p ar ameters a r e η = ( η M ) , M ∈ P 0 ( A ) with c onstr aints η M = 0 , for M ∈ D A . Pr o of. After m arginalizati on o v er A , the m ultiv ariate logi stic parameters asso ciated with p A ( i A ), b y the prop ert y of upw ard compatibilit y , are ( η M , M ∈ P 0 ( A )). Some of these parameters are zero b y the constrain ts implied by the orig inal bi-directed graph mo del, i.e. η M = 0 , for M ∈ D ∩ P 0 ( A ). The r esult is p ro ved by sh o win g that D ∩ P 0 ( A ) = D A . First, we n ote that if D ⊆ A ⊆ V , then the graph G D = ( D , E D ) with edges E D = ( D × D ) ∩ E = ( D × D ) ∩ E A is a sub graph of b oth G A and G . Th us , if D ⊆ A and D ∈ D then the in duced sub graph G D is disconnected and b eing also a sub graph of G A then D is also a d isconn ected set of G A . Thus D ∩ P 0 ( A ) ⊆ D A . Con v ersely , if D is a disconnected set of G A , th en the subgraph G D is disconnected, and b eing a subgraph of G , then D is also a d isconn ected s et of G . Th us D A ⊆ D ∩ P ( A ), and the r esu lt follo ws. Discrete bi-directed graph mo d els in the multiv ariate logistic parameterizatio n can b e compared with discrete log-linea r graphical mo dels r epresen ted b y undirected graphs with the same ske leton (i.e. with the same set E ). T o f acilitate the comparison we state the follo wing w ell-kno wn result, follo wing from the Hammers ley and Clifford theorem, (see Lauritzen, 1996, p. 
36), which is the undirected graph model counterpart of Proposition 3.

Proposition 5. Given an undirected graph $G = (V, E)$, a discrete graphical log-linear model associated with $G$ is defined by the set of strictly positive discrete probability distributions with log-linear parameters $\theta = (\theta_L, L \in \mathcal{P}_0(V))$, such that $\theta_L = 0$ for every $L \in \mathcal{N}$, where $\mathcal{N}$ is the set of all incomplete subsets of nodes in the graph $G$.

The set $\mathcal{D}$ of all disconnected sets of a graph $G$ is included in the set $\mathcal{N}$ of the incomplete sets, and therefore the number of zero restrictions of the undirected graph model is never smaller, and typically larger, than the number of zero restrictions of the bi-directed graph model with the same skeleton (see Drton & Richardson, 2007).

Example 5. A discrete undirected graph model for the 4-chain implies the independencies $12 \perp\!\!\!\perp 4 \mid 3$ and $1 \perp\!\!\!\perp 34 \mid 2$ and is defined by zero constraints on 8 log-linear parameters $\theta_L$, shown in Table 3, second row. Also, Proposition 5 implies that in the discrete undirected graph model the general hierarchy principle holds, i.e. if a particular log-linear term is zero then all higher terms containing the same set of subscripts are also set to zero. On the contrary, by Proposition 3, in the multivariate logistic parameterization of the bi-directed graph model the hierarchy principle is violated because a superset of a disconnected set may be connected. Thus, for instance in the example shown in Table 3 there are zero pairwise associations, like $\eta_{13} = 0$, but nonzero higher order log-linear parameters like $\eta_{123} \neq 0$ and $\eta_{1234} \neq 0$.

4.2. The disconnected sets parameterization.
We discuss now another marginal log-linear parameterization that can represent the independence constraints implied by any discrete bi-directed graph model, but involving only those marginal tables which are needed. This parameterization defines the log-linear parameters within the margins associated with the disconnected sets of the graph defining the model. Specifically, given a discrete graph model with a graph $G$, we arbitrarily order the disconnected sets of the graph to yield a non-decreasing sequence $(D_1, \dots, D_s)$ such that $D_k \not\supseteq D_{k+1}$ for $k = 1, \dots, s-1$. Then, the disconnected set parameterization of the discrete bi-directed graph model associated with $G$ is the hierarchical and complete marginal log-linear parameterization $\lambda = (\lambda^{M_j}_L)$ generated, following Definition 3.1, by the sequence of margins

$$M_G = \begin{cases} (D_1, \dots, D_s) & \text{if } D_s = V, \\ (D_1, \dots, D_s, V) & \text{otherwise.} \end{cases} \tag{3}$$

This parameterization contains by definition the log-linear parameters $\lambda^D_D = \eta_D$ for every disconnected set $D$ and thus can define the independence model by the same constraints of Proposition 3.

Proposition 6. Given a bi-directed graph $G = (V, E)$, the discrete bi-directed graph model associated with $G$ is defined by the set of strictly positive discrete probability distributions with a disconnected set parameterization $(\lambda^{M_j}_L)$, such that $\lambda^{M_j}_{M_j} = 0$ for every $M_j \in \mathcal{D}$, where $\mathcal{D}$ is the class of all disconnected sets for $G$. Moreover, the constraints are independent of the ordering chosen to define $M_G$.

Proof. The disconnected set parameterization defined by the sequence (3) contains the parameters $\lambda^D_L$, with $D \in \mathcal{D}$. By Definition 3.1, each class $\mathcal{L}_j$, $j = 1, \dots, s$, always contains the set $D_j$ itself. This happens whatever ordering is used to define $M_G$. Thus the parameterization always includes $\lambda^D_D = \eta_D$ for every $D \in \mathcal{D}$, so it is possible to impose the constraints $\eta_D = 0$ for every $D \in \mathcal{D}$, and the result follows by Proposition 3.

While the constrained parameters defining the bi-directed graph model are actually the same as in the multivariate logistic parameterization, the other unconstrained log-linear parameters are defined in larger marginal tables, and thus have a different interpretation. An important difference is that the disconnected set parameterization is tied to the specific graph $G$ defining the model. This implies that it is not possible to define every bi-directed graph model within the same disconnected set parameterization. A different model $G$ implies a different sequence $M_G$ of disconnected sets and thus a different list of log-linear parameters.

Table 4. Comparison of three parameterizations for the bi-directed graph model $G$ of Figure 1(a). One-factor log-linear parameters are omitted. The columns of parameters to be constrained to zero have a boldfaced label.

| Terms | 12 | $\mathbf{13}$ | $\mathbf{14}$ | 23 | $\mathbf{24}$ | 34 | 123 | $\mathbf{124}$ | $\mathbf{134}$ | 234 | 1234 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $\eta$ | $\eta_{12}$ | $\eta_{13}$ | $\eta_{14}$ | $\eta_{23}$ | $\eta_{24}$ | $\eta_{34}$ | $\eta_{123}$ | $\eta_{124}$ | $\eta_{134}$ | $\eta_{234}$ | $\eta_{1234}$ |
| $M_G$ | $\lambda^{124}_{12}$ | $\lambda^{13}_{13}$ | $\lambda^{14}_{14}$ | $\lambda^{1234}_{23}$ | $\lambda^{24}_{24}$ | $\lambda^{134}_{34}$ | $\lambda^{1234}_{123}$ | $\lambda^{124}_{124}$ | $\lambda^{134}_{134}$ | $\lambda^{1234}_{234}$ | $\lambda^{1234}_{1234}$ |
| $M'_G$ | $\lambda^{124}_{12}$ | $\lambda^{134}_{13}$ | $\lambda^{14}_{14}$ | $\lambda^{1234}_{23}$ | $\lambda^{124}_{24}$ | $\lambda^{134}_{34}$ | $\lambda^{1234}_{123}$ | $\lambda^{124}_{124}$ | $\lambda^{134}_{134}$ | $\lambda^{1234}_{234}$ | $\lambda^{1234}_{1234}$ |

Example 6. For the chordless 4-chain graph of Figure 1(a), there are several possible orderings of the 5 disconnected sets $\mathcal{D} = \{13, 14, 24, 134, 124\}$. The discrete bi-directed graph model is defined by choosing for example $M_G = (13, 14, 24, 134, 124, 1234)$, and by constraining the marginal log-linear parameters $\lambda^D_D = 0$ for $D \in \mathcal{D}$.
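The set $\mathcal{D}$ used in Example 6 can be enumerated mechanically. The following Python sketch is an illustration only (it is not the authors' R code, and the helper names `is_connected`, `is_complete`, `disconnected_sets` and `incomplete_sets` are hypothetical). It assumes the 4-chain has edges 1-2, 2-3 and 3-4, lists its disconnected sets, builds a sequence of margins as in (3) with smaller sets first, and also counts the incomplete sets for comparison with Example 5.

```python
from itertools import combinations

def is_connected(nodes, edges):
    """Return True if the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            # Follow only edges with both endpoints inside the subset.
            if a in nodes and b in nodes:
                if a == u and b not in seen:
                    seen.add(b)
                    stack.append(b)
                elif b == u and a not in seen:
                    seen.add(a)
                    stack.append(a)
    return seen == nodes

def is_complete(nodes, edges):
    """Return True if every pair of `nodes` is joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(p) in edge_set for p in combinations(nodes, 2))

def subsets(vertices, min_size=2):
    """All subsets with at least `min_size` elements, smaller sets first."""
    for k in range(min_size, len(vertices) + 1):
        yield from combinations(vertices, k)

def disconnected_sets(vertices, edges):
    return [s for s in subsets(vertices) if not is_connected(s, edges)]

def incomplete_sets(vertices, edges):
    return [s for s in subsets(vertices) if not is_complete(s, edges)]

V = (1, 2, 3, 4)
E = [(1, 2), (2, 3), (3, 4)]          # assumed edges of the 4-chain

D = disconnected_sets(V, E)            # constrained margins
N = incomplete_sets(V, E)              # constrained terms of the undirected model
M_G = D if D and D[-1] == V else D + [V]   # sequence of margins as in (3)

print(D)
print(M_G)
print(len(D), len(N))
```

Ordering the sets by size is one convenient way to satisfy the requirement $D_k \not\supseteq D_{k+1}$; any other admissible ordering leads to the same constraints by Proposition 6.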
The unconstrained parameters differ from the multivariate logistic ones. For example the two-factor log-linear parameters between $X_1$ and $X_2$, $\lambda^{124}_{12}$, are defined within the marginal table 124 instead of the marginal table 12. A detailed comparison between the parameters is reported in the first two rows of the Table 3.

The previous example shows that we can collect the log-linear parameters into a reduced number of marginal tables. An alternative selection of marginal tables could be chosen in order to fulfill the conditional independencies implied by the global Markov property. We will describe the method in the special case of the chordless 4-chain graph. It is conjectured that a general variation independent parameterization does not exist for all bi-directed graphs, but the definition of a sub-class admitting such a parameterization is still an open problem.

Example 7. In Example 1 we stated that, for the bi-directed 4-chain graph of Figure 1(a), the global Markov property implies the conditional independencies $1 \perp\!\!\!\perp 4$, $2 \perp\!\!\!\perp 4 \mid 1$ and $1 \perp\!\!\!\perp 3 \mid 4$. Thus, the relevant margins can be collected in the sequence $M'_G = (14, 134, 124, 1234)$, where the first three allow the definition of the conditional independencies and the last one serves as completion of the parameterization. The complete hierarchical parameterization generated by $M'_G$ is slightly different from that generated by $M_G$ (see Table 4, third row), but with the 5 zero constraints on the higher level log-linear parameters within each margin, we obtain the required independencies

$$\begin{aligned}
1 \perp\!\!\!\perp 4 &\iff \lambda^{14}_{14} = 0, \\
2 \perp\!\!\!\perp 4 \mid 1 &\iff \lambda^{124}_{24} = 0, \ \lambda^{124}_{124} = 0, \\
1 \perp\!\!\!\perp 3 \mid 4 &\iff \lambda^{134}_{13} = 0, \ \lambda^{134}_{134} = 0.
\end{aligned}$$
Note that these independencies can also be represented by a chain graph with two components, $\{1, 4\}$ and $\{2, 3\}$, under the alternative Markov property (see Andersson et al., 2001). The associated discrete model is interpreted as a system of seemingly unrelated regressions, with two joint responses $X_2$ and $X_3$. In this context the associations of interest are the effect parameters between every response and each explanatory variable conditional on the remaining explanatory variable, i.e. $\lambda^{124}_{12}$, $\lambda^{124}_{24}$, $\lambda^{134}_{13}$ and $\lambda^{134}_{34}$, and the marginal association parameters between the explanatory variables, $\lambda^{14}_{14}$. By relaxing the constraint $\lambda^{14}_{14} = 0$ we obtain a discrete chain graph model with two complete chain components, under the alternative Markov property.

In the comparison between different parameterizations also the property of variation independence may be relevant. Following Bergsma & Rudas (2002), given a discrete bi-directed graph model, there is a variation independent parameterization if there is at least one sequence $M_G$ which is ordered decomposable. This property is quite relevant because the lack of variation independence may make the separate interpretation of the parameters misleading.

Example 8. In the previous example both the parameterizations based on $M_G$ and $M'_G$ are variation independent (unlike the multivariate logistic parameterization) because the sequences of margins are both ordered decomposable. Consider instead the bi-directed graph in Figure 2(a). Two possible disconnected set parameterizations of the discrete model may be based for example on

$$M_G = (13, 14, 25, 35, 134, 135, 235, 12345), \qquad M'_G = (13, 35, 135, 14, 25, 134, 235, 12345),$$

with the constraints $\lambda^D_D = 0$ for any disconnected set $D$.
In this case we can verify that only the sequence $M'_G$ is ordered decomposable and thus implies variation independent parameters.

5. Maximum likelihood estimation of discrete bi-directed graph models

We study now the maximum likelihood estimation of the discrete bi-directed graph models under any of the parameterizations previously discussed. Assuming a multinomial sampling scheme with sample size $N$, each individual falls in a cell $i$ of the given contingency table $\mathcal{I}_V$ with probability $p(i) > 0$. Let $n(i)$ be the cell count and let $n = (n(i), i \in \mathcal{I}_V)$ be a $t \times 1$ vector. Thus, $n$ has a multinomial distribution with parameters $N$ and $\pi$. If $\mu = N\pi > 0$ is the expected value of $n$ and $\omega = \log \mu$, then for any appropriate marginal log-linear parameterization $\lambda$ we have

$$\lambda = C \log(T\pi) = C \log(T \exp(\omega))$$

because the contrasts of log marginal probabilities are equal to the contrasts of log expected counts (the constant $\log N$ cancels in every contrast). Given a discrete bi-directed graph model defined by the graph $G = (V, E)$, if $\lambda$ is defined either by the multivariate logistic parameterization or by the disconnected set parameterization, we can always split $\lambda$ in two components $\lambda_{\mathcal{D}}$ and $\lambda_{\mathcal{C}}$ indexed by the disconnected sets $\mathcal{D}$ and by the connected sets $\mathcal{C}$ of the graph, respectively. If $C_{\mathcal{D}}$ is a sub-matrix of the contrast matrix $C$, obtained by selecting the rows associated with the disconnected sets of the graph $G$,

$$\lambda_{\mathcal{D}} = C_{\mathcal{D}} \log(T \exp(\omega)) = h(\omega)$$

where $C_{\mathcal{D}}$ has dimensions $q \times v$ with $q = \sum_{D \in \mathcal{D}} \prod_{v \in D} (b_v - 1)$. Thus, the kernel of the log-likelihood function of the discrete bi-directed graph model is defined by

$$l(\omega; n) = n^T \omega - 1^T \exp(\omega), \qquad \omega \in \Omega_{BG}, \tag{4}$$

with $\Omega_{BG} = \{\omega \in \mathbb{R}^t : h(\omega) = 0, \ 1^T \exp(\omega) = N\}$. Note that (4) defines a curved exponential family model as the set $\Omega_{BG}$ is a smooth manifold in the space $\mathbb{R}^t$ of the canonical parameters $\omega$.
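To make the constraint function concrete, the sketch below evaluates $h(\omega) = C_{\mathcal{D}} \log(T \exp(\omega))$ and the kernel (4) in a deliberately small setting. Everything specific here is an assumption chosen for illustration: three binary variables, a single disconnected set $\{1, 2\}$, cells in lexicographic order with the third index running fastest, and made-up probabilities. The function names `h` and `loglik_kernel` are hypothetical.

```python
import numpy as np

# Three binary variables; cells ordered lexicographically (i1, i2, i3), i3 fastest.
# T sums over X3 and so produces the 2x2 marginal table of (X1, X2).
T = np.kron(np.eye(4), np.ones((1, 2)))
# One contrast row: the log odds ratio in the {1, 2} margin.
C_D = np.array([[1.0, -1.0, -1.0, 1.0]])

def h(omega):
    """Constraint function h(omega) = C_D log(T exp(omega))."""
    return C_D @ np.log(T @ np.exp(omega))

def loglik_kernel(omega, n):
    """Kernel (4): n' omega - 1' exp(omega)."""
    return n @ omega - np.exp(omega).sum()

# Illustrative distribution: X1 and X2 marginally independent, X3 depends on both.
p1 = np.array([0.3, 0.7])
p2 = np.array([0.6, 0.4])
p3_given_12 = np.array([[0.2, 0.5], [0.9, 0.4]])   # P(X3 = first level | x1, x2)
pi = np.empty((2, 2, 2))
pi[:, :, 0] = np.outer(p1, p2) * p3_given_12
pi[:, :, 1] = np.outer(p1, p2) * (1.0 - p3_given_12)
pi = pi.ravel()

N = 500
omega = np.log(N * pi)

print(h(omega))          # numerically zero: the marginal independence holds
print(h(np.log(pi)))     # same value: contrasts do not depend on N
```

The second print illustrates why $\lambda$ can be written either in terms of $\pi$ or in terms of $\exp(\omega)$: each row of the contrast matrix sums to zero, so the sample size drops out.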
Maximum likelihood estimation is a constrained optimization problem and the maximum likelihood estimate is a saddle point of the Lagrangian log-likelihood

$$\ell(\omega, \tau) = n^T \omega - 1^T \exp(\omega) + \tau^T h(\omega)$$

where $\tau$ is a $q \times 1$ vector of unknown Lagrange multipliers. To solve the equations we propose an iterative procedure inspired by Aitchison & Silvey (1958), Lang (1996) and Bergsma (1997). Define first

$$\xi = \begin{pmatrix} \omega \\ \tau \end{pmatrix}, \qquad f(\xi) = \frac{\partial \ell}{\partial \xi} = \begin{pmatrix} f_\omega \\ f_\tau \end{pmatrix}, \qquad F(\xi) = -E\left(\frac{\partial^2 \ell}{\partial \xi \, \partial \xi^T}\right) = \begin{pmatrix} F_{\omega\omega} & F_{\omega\tau} \\ \cdot & F_{\tau\tau} \end{pmatrix},$$

where the dot is a shortcut to denote a symmetric sub-matrix. Differentiating the Lagrangian with respect to $\omega$ and $\tau$ and equating the result to zero we obtain

$$\begin{pmatrix} f_\omega \\ f_\tau \end{pmatrix} = \begin{pmatrix} e + H\tau \\ h(\omega) \end{pmatrix} = 0 \tag{5}$$

where $e = \partial l / \partial \omega = n - \mu$, $H = \partial h^T / \partial \omega = D_\mu T^T D_{T\mu}^{-1} C_{\mathcal{D}}^T$ is a $t \times q$ matrix, and $D_{T\mu}$ and $D_\mu$ are diagonal matrices, with nonzero elements $T\mu$ and $\mu$, respectively. Let $\hat\omega$ be a local maximum of the likelihood subject to the constraint $h(\omega) = 0$. A classical result (Bertsekas, 1982) is that if $H$ is of full column rank at $\hat\omega$, there is a unique $\hat\tau$ such that $\partial \ell(\hat\omega, \hat\tau) / \partial \xi = 0$. In the sequel, it is assumed that the maximum likelihood estimate $\hat\omega$ is a solution to the equation (5). Note that the constraint $1^T \mu = 1^T n$ is automatically satisfied, as it can be verified that $H^T 1 = 0$ and thus from (5) it follows that $1^T e = 0$.

Aitchison and Silvey propose a Fisher score like updating function

$$\xi^{(k+1)} = u(\xi^{(k)}), \qquad \text{with } u(\xi) = \xi + F^{-1}(\xi) f(\xi), \tag{6}$$

yielding the estimate $\xi^{(k+1)}$ at cycle $k + 1$ from that at cycle $k$. As the algorithm does not always converge when starting estimates are not close enough to $\hat\omega$, it is necessary to introduce a step size into the updating equation. The standard approach to choosing a step size in optimization problems is to use a value for which the objective function to be maximized increases.
However, since in this case we are looking for a saddle point of the Lagrangian likelihood $\ell$, we need to adjust the standard strategy. First, the matrix $F$ has a special structure with $F_{\omega\omega} = D_\mu$, $F_{\omega\tau} = -H$ and $F_{\tau\tau} = 0$. Thus, indicating the sub-matrices of $F^{-1}$ by superscripts, we have $F^{\tau\omega} F_{\omega\tau} = I$ and $F^{\omega\omega} F_{\omega\tau} = 0$. Thus the updating function $u(\xi)$ of (6) can be rewritten as follows

$$u_\omega(\omega) = \omega + F^{\omega\omega} e + F^{\omega\tau} h(\omega), \qquad u_\tau(\omega) = F^{\tau\omega} e + F^{\tau\tau} h(\omega),$$

neither of which is a function of $\tau$. As the updating of the Lagrange multipliers does not depend on the estimate of $\tau$ at the previous step, the algorithm essentially searches in the space of $\omega$. Hence, inserting a step size is only required for updating $\omega$ and we propose, following Bergsma (1997), to use the following basic updating equations with an added step size, $0 < \mathrm{step}^{(k)} \le 1$:

$$\omega^{(k+1)} = \omega^{(k)} + \mathrm{step}^{(k)} \left\{ F^{\omega\omega(k)} e^{(k)} + F^{\omega\tau(k)} h(\omega^{(k)}) \right\},$$

where $e^{(k)} = n - \hat\mu^{(k)}$ and where $F^{\omega\omega(k)}$ and $F^{\omega\tau(k)}$ are two sections of $\hat F^{-1}$ at cycle $k$. We chose the step size by a simple step halving criterion, but more sophisticated step size rules could also be considered. A discussion on the choice of the step size may be found in Bergsma (1997). Note that the algorithm's updates take place in the rectangular space $\mathbb{R}^t$ of $\omega$ rather than the not necessarily rectangular space $\Lambda$ of the marginal log-linear parameters, which may not be variation independent. The algorithm converges if it is started from suitable initial estimates of $\omega$ and $\tau$. While usually a zero vector is a good choice for $\tau$, we found empirically that the number of iterations to convergence can be reduced substantially by using as a starting value for $\omega$ an approximate maximum likelihood estimate based on results by Cox & Wermuth (1990) and Roddam (2004).
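The updating equation for $\omega$ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R implementation. The blocks $F^{\omega\omega} = D_\mu^{-1} - D_\mu^{-1} H (H^T D_\mu^{-1} H)^{-1} H^T D_\mu^{-1}$ and $F^{\omega\tau} = -D_\mu^{-1} H (H^T D_\mu^{-1} H)^{-1}$ are obtained here by inverting the partitioned matrix $F$ with the structure given above. The step-halving criterion is not specified in the text, so the merit function used below, the saturated starting value $\omega = \log n$ (instead of the approximate estimate mentioned above), and the name `fit_constrained` are all assumptions made for the example.

```python
import numpy as np

def fit_constrained(n, T, C_D, max_iter=100, tol=1e-8):
    """Sketch of the damped update for omega under h(omega) = 0.

    Returns the fitted expected counts exp(omega).
    """
    n = np.asarray(n, dtype=float)

    def evaluate(omega):
        mu = np.exp(omega)
        m = T @ mu                                        # marginal expected counts
        h = C_D @ np.log(m)                               # constraint function
        H = (T.T * mu[:, None]) @ (C_D.T / m[:, None])    # D_mu T' D_{T mu}^{-1} C_D'
        DinvH = H / mu[:, None]                           # D_mu^{-1} H
        K = H.T @ DinvH                                   # H' D_mu^{-1} H
        e = n - mu
        De = e / mu                                       # D_mu^{-1} e
        Fww_e = De - DinvH @ np.linalg.solve(K, H.T @ De)  # F^{omega omega} e
        Kinv_h = np.linalg.solve(K, h)
        Fwt_h = -DinvH @ Kinv_h                           # F^{omega tau} h
        # Assumed merit function: nonnegative, zero at a solution of (5).
        merit = e @ Fww_e + h @ Kinv_h
        return Fww_e + Fwt_h, merit

    omega = np.log(n)        # saturated start (assumption); needs all counts > 0
    delta, merit = evaluate(omega)
    for _ in range(max_iter):
        if np.max(np.abs(delta)) < tol:
            break
        step = 1.0
        while True:
            cand = omega + step * delta
            cand_delta, cand_merit = evaluate(cand)
            if cand_merit <= merit or step < 1e-3:
                break
            step /= 2.0      # step halving
        omega, delta, merit = cand, cand_delta, cand_merit
    return np.exp(omega)

# Toy model: three binary variables, marginal independence of X1 and X2.
T = np.kron(np.eye(4), np.ones((1, 2)))       # sums over X3 (X3 index fastest)
C_D = np.array([[1.0, -1.0, -1.0, 1.0]])      # log odds ratio in the {1, 2} margin
n = np.array([10, 20, 30, 40, 15, 25, 35, 45], dtype=float)   # made-up counts
mu_hat = fit_constrained(n, T, C_D)

# For this particular model the estimate has a closed form, useful as a check:
# fitted {1, 2} margin = product of its one-way margins / N, with the observed
# conditional distribution of X3 given (X1, X2) left untouched.
n12 = T @ n
tab = n12.reshape(2, 2)
fit12 = np.outer(tab.sum(axis=1), tab.sum(axis=0)).ravel() / n.sum()
mu_closed = np.repeat(fit12 / n12, 2) * n
print(np.max(np.abs(mu_hat - mu_closed)))
```

Only the $\omega$ part of $\xi$ is carried between cycles, which reflects the remark above that the update does not depend on the previous value of $\tau$. The closed-form comparison is available only because this toy model constrains a single margin; in general no such shortcut exists.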
At convergence, we obtain the maximum likelihood estimates $\hat\mu = \exp(\hat\omega)$ and $\hat\pi = N^{-1} \hat\mu$ and the asymptotic covariance matrices

$$\mathrm{cov}(\hat\omega) = \hat F^{\omega\omega}, \qquad \mathrm{cov}(\hat\lambda) = H_{\mathrm{sat}}^T \hat F^{\omega\omega} H_{\mathrm{sat}}, \qquad \text{with } H_{\mathrm{sat}} = D_{\hat\mu} T^T D_{T\hat\mu}^{-1} C^T.$$

6. Analysis of some examples

The examples of this section illustrate both the parameterizations and the fitting of marginal independence models. It is rare that a pure marginal independence model is useful in isolation and thus usually it is interpreted in combination with other graphical models. However, the problem of simultaneous testing of multiple marginal independencies in a general contingency table is often present in applications and it can be carried out with the technique discussed in this paper. All the computations were programmed in the R language (R Development Core Team, 2007).

Table 5. Parameter estimates of the 4-chain model for the data on symptoms of psychiatric patients under the multivariate logistic and the disconnected set parameterizations. The fit is $\chi^2_5 = 8.61$. Columns (1) and (2) are studentized estimates. The first three columns refer to the multivariate logistic parameterization, the last four to the disconnected set parameterization.

| Margin | $\hat\eta$ | (1) | Margin | Interaction | $\hat\lambda$ | (2) |
|---|---|---|---|---|---|---|
| 1 | −0.28 | −2.62 | 13 | 1 | −0.28 | −2.62 |
| 2 | −0.13 | −1.23 | | 3 | 0.21 | 1.95 |
| 3 | 0.21 | 1.95 | | 13 | 0.00 | |
| 4 | 0.24 | 2.31 | 14 | 4 | 0.24 | 2.31 |
| 12 | −0.72 | −3.47 | | 14 | 0.00 | |
| 13 | 0.00 | | 24 | 2 | −0.13 | −1.23 |
| 14 | 0.00 | | | 24 | 0.00 | |
| 23 | −1.12 | −5.32 | 124 | 12 | −0.72 | −3.47 |
| 24 | 0.00 | | | 124 | 0.00 | |
| 34 | 0.79 | 3.80 | 134 | 34 | 0.79 | 3.80 |
| 123 | 0.16 | 0.36 | | 134 | 0.00 | |
| 124 | 0.00 | | 1234 | 23 | −0.78 | −1.80 |
| 134 | 0.00 | | | 123 | 0.14 | 0.20 |
| 234 | −0.90 | −2.03 | | 234 | −1.02 | −1.63 |
| 1234 | 0.15 | 0.16 | | 1234 | 0.15 | 0.16 |

Example 9. The 4-chain marginal independence model was fitted to the data on symptoms of psychiatric patients of Table 1 with the algorithm of Section 5.
After 22 iterations, the algorithm leads to a chi-squared goodness of fit of 8.61 on 5 degrees of freedom. By comparison, the best graphical log-linear model has generators [12][234] with a deviance of 8.4 on 6 degrees of freedom. Thus, both models provide adequate interesting interpretations of the data. Table 5 summarizes the estimates of the 4-chain graph model, showing the parameter estimates and the studentized estimates under the multivariate logistic and the disconnected set parameterizations. In the multivariate logistic parameterization the two-factor parameters have the simple interpretation of marginal association coefficients. It must be kept in mind that they measure just the strength of marginal association between pairs of adjacent variables in the graph, but that the model includes higher order log-linear parameters which are not visible from the graph. For instance, both $\hat\eta_{23} = -1.12$ and $\hat\eta_{234} = -0.90$ are measures of association for variables $X_2$ and $X_3$. In general, for any connected subgraph, all higher order log-linear parameters are expected. As explained in Section 4, the interpretation of the parameters necessarily depends on the chosen parameterization. For instance, $\hat\eta_{23} = -1.12$ and $\hat\lambda^{1234}_{23} = -0.78$ are a marginal association measure and a conditional association measure, respectively. The four-factor log-linear parameter is not significant, and a simpler reduced model with the additional zero constraint on this parameter has an adequate chi-squared goodness of fit of 8.63 on 6 degrees of freedom.

The following example concerns a larger contingency table including two ordinal variables with three levels. In the analysis these variables are treated as nominal variables using the baseline contrasts (2). Although the nature of the variables could be handled by using other more appropriate contrasts, as explained in Bartolucci et al. (2007), the fit of the marginal independence model is nevertheless invariant.

Table 6. Data from U.S. General Social Survey. Columns are indexed by the levels of F and, within F, by the levels of J.

| S | C | G | A | F=1, J=1 | F=1, J=2 | F=1, J=3 | F=2, J=1 | F=2, J=2 | F=2, J=3 | F=3, J=1 | F=3, J=2 | F=3, J=3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| m | 1 | 1 | 1 | 410 | 241 | 80 | 691 | 556 | 187 | 192 | 148 | 84 |
| m | 1 | 1 | 2 | 71 | 31 | 9 | 109 | 64 | 34 | 27 | 26 | 15 |
| m | 1 | 2 | 1 | 181 | 128 | 42 | 307 | 284 | 82 | 84 | 93 | 41 |
| m | 1 | 2 | 2 | 41 | 17 | 5 | 61 | 35 | 20 | 18 | 13 | 5 |
| m | 2 | 1 | 1 | 96 | 77 | 29 | 163 | 151 | 76 | 58 | 55 | 27 |
| m | 2 | 1 | 2 | 34 | 18 | 7 | 58 | 36 | 15 | 17 | 13 | 6 |
| m | 2 | 2 | 1 | 29 | 37 | 4 | 55 | 54 | 31 | 22 | 26 | 17 |
| m | 2 | 2 | 2 | 16 | 6 | 6 | 16 | 16 | 7 | 10 | 7 | 2 |
| f | 1 | 1 | 1 | 552 | 353 | 145 | 899 | 793 | 265 | 180 | 162 | 94 |
| f | 1 | 1 | 2 | 98 | 60 | 15 | 186 | 122 | 47 | 40 | 23 | 14 |
| f | 1 | 2 | 1 | 133 | 74 | 33 | 219 | 164 | 66 | 36 | 47 | 24 |
| f | 1 | 2 | 2 | 25 | 15 | 1 | 54 | 40 | 13 | 14 | 6 | 4 |
| f | 2 | 1 | 1 | 228 | 153 | 60 | 356 | 343 | 166 | 95 | 80 | 41 |
| f | 2 | 1 | 2 | 75 | 45 | 12 | 125 | 116 | 34 | 25 | 20 | 12 |
| f | 2 | 2 | 1 | 41 | 25 | 13 | 64 | 56 | 22 | 15 | 14 | 11 |
| f | 2 | 2 | 2 | 17 | 6 | 1 | 19 | 18 | 6 | 3 | 3 | 2 |

Example 10. Table 6 summarizes observations for 13067 individuals on 6 variables obtained from as many questions taken from the U.S. General Social Survey (Davis et al., 2007) during the years 1972-2006. The variables are reported below with the original name in the GSS Codebook:

C cappun: do you favor or oppose death penalty for persons convicted of murder? (1 = favor, 2 = oppose)
F confinan: confidence in banks and financial institutions (1 = a great deal, 2 = only some, 3 = hardly any)
G gunlaw: would you favor or oppose a law which would require a person to obtain a police permit before he or she could buy a gun? (1 = favor, 2 = oppose)
J satjob: how satisfied are you with the work you do? (1 = very satisfied, 2 = moderately satisfied, 3 = a little dissatisfied, 4 = very dissatisfied).
Categories 3 and 4 of satjob were merged together.
S sex: gender (f, m)
A abrape: do you think it should be possible for a pregnant woman to obtain legal abortion if she became pregnant as a result of rape? (1 = yes, 2 = no)

In data sets of this kind there are a large number of missing values and the table used in this example collects only individuals with complete observations. Therefore, the following exploratory analysis is intended to be only an illustration with a realistic example. From a first analysis of the data, the following marginal independencies are not rejected by the chi-squared goodness of fit test statistic

$$F \perp\!\!\!\perp CA \ (\chi^2_6 = 6.7), \qquad G \perp\!\!\!\perp JA \ (\chi^2_5 = 3.3), \qquad J \perp\!\!\!\perp GS \ (\chi^2_6 = 8.1), \qquad A \perp\!\!\!\perp FG \ (\chi^2_5 = 2.1)$$

and thus they suggest the independence model represented by the bi-directed graph in Figure 3(a). Fitting this model, under the multinomial sampling assumption, we obtain an adequate fit with a deviance of 17.29 on 17 degrees of freedom. The Aitchison and Silvey algorithm converges after 13 iterations. The encoded independencies cannot be represented by a directed acyclic graph model with the same observed variables, because the graph contains at least one subgraph which is a chordless 4-chain. The disconnected set parameterization defined by the ordered decomposable sequence

$$M_G = \{CF, FA, GJ, GA, JS, CFA, FGA, GJS, GJA, CFGJSA\}$$

is variation independent.

Instead, by searching in the class of graphical log-linear models with the backward stepwise selection procedure of mim (Edwards, 2000) we found a model with a deviance of 103.16 over 110 degrees of freedom. The model graph is shown in Figure 3(b). Other selection procedures show however that there are several equally well fitting models.
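As a side check on the degrees of freedom quoted for the bi-directed graph models, the count $q = \sum_{D \in \mathcal{D}} \prod_{v \in D} (b_v - 1)$ of Section 5 can be evaluated directly. The short Python sketch below does this for Example 10, taking the disconnected sets from the sequence $M_G$ above and the numbers of levels from the variable list, and for the 4-chain of Example 9, assuming four binary variables (as the single parameter per term in Table 5 suggests). The function name `n_constraints` is hypothetical.

```python
from math import prod

def n_constraints(disconnected, levels):
    """q = sum over disconnected sets D of prod over v in D of (b_v - 1)."""
    return sum(prod(levels[v] - 1 for v in D) for D in disconnected)

# Example 10: number of levels of each variable (J after merging categories 3 and 4).
levels_gss = {"C": 2, "F": 3, "G": 2, "J": 3, "S": 2, "A": 2}
D_gss = ["CF", "FA", "GJ", "GA", "JS", "CFA", "FGA", "GJS", "GJA"]
q_gss = n_constraints(D_gss, levels_gss)

# Example 9: 4-chain, assuming four binary variables.
levels_chain = {v: 2 for v in "1234"}
D_chain = ["13", "14", "24", "134", "124"]
q_chain = n_constraints(D_chain, levels_chain)

print(q_gss, q_chain)   # 17 5
```

Both values agree with the degrees of freedom reported for the two fitted models.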
The chosen undirected graph is slightly simpler (2 edges less) than the bi-directed graph. As anticipated, the number of constraints on parameters is however much higher. From the inspection of the studentized multivariate logistic estimates, we noticed that the higher order log-linear parameters are almost all not significant and thus we fitted a reduced model, by further restricting to zero all the log-linear parameters of order higher than two, obtaining a deviance of 108.34 on 118 degrees of freedom. The estimates of the remaining nonzero two-factor log-linear parameters are shown in Table 7. These are estimated local log odds-ratios in the selected two-way marginal tables and they have the expected signs. By comparison, the fitted non-graphical log-linear model with the graph of Figure 3(b), with additional zero constraints on the log-linear parameters of order higher than two, leads to a chi-squared goodness of fit of 118.49 on 119 degrees of freedom. Both models thus appear adequate.

Figure 3. Data from the U.S. General Social Survey 1972-2006. (a) A bi-directed graph model ($\chi^2_{17} = 17.29$). (b) A graphical log-linear model ($\chi^2_{110} = 103.16$).

Table 7. Estimates of two-factor log-linear parameters for the bi-directed graph model of Figure 3(a) with additional zero restrictions on higher order terms. The asterisks indicate the parameters for which the Wald statistic is significant.

| Margin | Parameter | Estimate | s.e. | |
|---|---|---|---|---|
| CG | (1) | −0.38 | 0.048 | * |
| CJ | (1) | 0.10 | 0.043 | * |
| | (2) | 0.14 | 0.058 | * |
| CS | (1) | 0.46 | 0.040 | * |
| CA | (1) | 0.56 | 0.049 | * |
| GS | (1) | −0.77 | 0.042 | * |
| JA | (1) | −0.21 | 0.051 | * |
| | (2) | −0.03 | 0.075 | |
| SA | (1) | 0.18 | 0.047 | * |
| FG | (1) | −0.01 | 0.047 | |
| | (2) | 0.16 | 0.058 | * |
| FJ | (1) | 0.29 | 0.044 | * |
| | (2) | 0.05 | 0.065 | |
| | (3) | 0.04 | 0.056 | |
| | (4) | 0.36 | 0.072 | * |
| FS | (1) | −0.004 | 0.040 | |
| | (2) | −0.35 | 0.051 | * |

The last example shows that sometimes the best fitted marginal independence model may be simpler than the best fitted directed acyclic model.

Example 11. The set of data in Table 8 is taken from the General Social Survey in Germany in 1998 (ALLBUS, 1998). In a selected population aged between 18 and 65, the answers of 1228 respondents are collected about the following 5 binary variables: U, unconcerned about environment (yes, no); P, no own political impact expected (yes, no); E, parents' education, both at lower level (at most 10 years) (yes, no); A, age under 40 years (yes, no); S, gender (female, male). A possible ordering of the variables has been suggested by Wermuth (2003), who analyzed a superset of this data set and discussed a directed acyclic graph model. Using a similar ordering, limited to the variables here studied, we consider the variables $\{A, S\}$ as purely explanatory, E and P as intermediate and U as final response. Our final well fitting directed acyclic graph model, shown in Figure 4(a), has a deviance of 3.70 over 3 degrees of freedom. The subgraph for all the variables except gender S is complete. Specifically, the graph has an edge $E \to U$, indicating a direct effect of education on the final response. The model without the arrow $E \to U$ has a worse goodness of fit, $\chi^2_{15} = 36.0$, and further it can be verified that the two-factor log-linear parameters EP and EA are large and significant. Model selection in the class of the graphical log-linear models does not lead to any sensible reduction whilst search in the class of bi-directed graph models shows that a special structure of marginal independencies holds.

Table 8. Data from the German General Social Survey in 1998.
| A | E | U=yes, S=f, P=yes | U=yes, S=f, P=no | U=yes, S=m, P=yes | U=yes, S=m, P=no | U=no, S=f, P=yes | U=no, S=f, P=no | U=no, S=m, P=yes | U=no, S=m, P=no |
|---|---|---|---|---|---|---|---|---|---|
| no | yes | 6 | 8 | 7 | 27 | 66 | 186 | 24 | 230 |
| no | no | 4 | 0 | 1 | 9 | 8 | 64 | 4 | 60 |
| yes | yes | 2 | 2 | 11 | 6 | 28 | 159 | 16 | 130 |
| yes | no | 0 | 1 | 0 | 2 | 4 | 75 | 8 | 80 |

Figure 4. Two graphical models fitted to data from the General Social Survey in Germany, 1998. (a) A directed acyclic graph model: $\chi^2_3 = 3.70$. (b) A bi-directed graph model: $\chi^2_5 = 5.91$.

The final selected bi-directed graph, represented in Figure 4(b), represents the marginal independencies $S \perp\!\!\!\perp A, E$ and $E \perp\!\!\!\perp S, U$. The bi-directed graph contains the chordless 4-chain $EAUS$ and thus it is not Markov equivalent to any directed acyclic graph in the five variables. This suggests that the directed acyclic graph model conceals some distortions due to the presence of latent variables. Also in this case, the disconnected set parameterization defined by the sequence $M_G = (GE, GF, AE, GFE, GEA, ABEFG)$ leads to a variation independent parameterization because it can be verified that the sequence $M_G$ is ordered decomposable.

7. Discussion

The discrete models based on marginal log-linear models by Bergsma & Rudas (2002) form a large class that includes several discrete graphical models. The undirected graph models and the chain graph models under the classical (Lauritzen, Wermuth, Frydenberg) interpretation can be parameterized as marginal log-linear models. For an introduction see Rudas et al. (2006). This paper shows that the discrete bi-directed graph models under the global Markov property are included in the same class by specifying the constraints appropriately. In general, three main criteria were considered in choosing a marginal log-linear parameterization.
(a) Upward compatibility: if the parameters have a meaning that is invariant across different marginal distributions, then the interpretations remain the same when a sub-model is chosen. We saw that the multivariate logistic parameterization has this property.
(b) Modelling considerations: the parameterization should contain all the parameters that are of interest for the problem at hand. For example, in a regression context where some variables are prior to others, effect parameters conditional on the explanatory variables are most meaningful. In the seemingly unrelated regression problem of Example 7, the chosen parameters have the interpretation of logistic regression coefficients.
(c) Variation independence: if the parameter space is the whole Euclidean space, this has certain advantages. First, the interpretations are simpler, because in a certain sense different parameters measure different things. Second, in a Bayesian context, prior specification is easier. Finally, the problem of out-of-bound estimates when transforming the parameters to probabilities is avoided. In the examples, we always found a variation independent parameterization, but a characterization of the class of bi-directed graphs admitting a variation independent complete and hierarchical marginal log-linear parameterization is an open problem.

The three criteria are in some cases conflicting: typically variation independence is obtained at the expense of upward compatibility.

The multivariate logistic parameterization has a purpose similar to that of the Möbius parameterization recently proposed by Drton & Richardson (2007) for binary marginal independence models, which is based on a minimal set of marginal probabilities identifying the joint distribution.
These authors discuss the type of constraints on the Möbius parameters needed to specify a marginal independence, showing that they take a simple multiplicative form. The same constraints are defined by zero restrictions on marginal log-linear parameters in our approach. Even if the parametric space can be awkward, this problem is handled by a fitting algorithm that operates in the space of the expected frequencies, while the parameters are used only to define the independence constraints. Moreover, the definition of the models through the complete specification of the marginal log-linear parameters gives some advantage when there is a mixture of nominal and ordinal variables, because it allows one to define appropriate parameters for both types of variables using the theory of generalized marginal interactions by Bartolucci et al. (2007). This opens the way to defining subclasses of discrete graphical models specifying equality and inequality constraints.

The proposed algorithm for maximum likelihood fitting of the bi-directed graph model is a very general algorithm of constrained optimization based on Lagrange multipliers. It is essentially based on Aitchison & Silvey (1958) as later developed by Bergsma (1997). Similar algorithms have been proposed, for instance, by Molenberghs & Lesaffre (1994), Glonek & McCullagh (1995) and Lang (1996), and further generalized by Colombi & Forcina (2001). Its main advantage is its generality (it can be applied to all models defined by constraints on the marginal log-linear parameters). As previously stated, the algorithm does not require further iterative procedures for computing, at each step, the inverse transformation from the marginal log-linear parameters to the cell probabilities. Thus, the risk of incompatible estimates that could arise from the lack of variation independence is avoided.
The disadvantage of the algorithm is that, as for many gradient-based algorithms of this type, convergence is not guaranteed and that it requires the computation of a large expected information matrix. However, empirically, convergence is achieved in relatively few iterations by including a step adjustment. An alternative algorithm with convergence guarantees is the Iterated Conditional Fitting algorithm, proposed by Drton & Richardson (2007) for binary bi-directed graph models in the Möbius parameterization. A comparison between the two algorithms in terms of performance, speed and memory requirements needs further investigation.

Acknowledgement

We thank Nanny Wermuth for helpful discussions. The work of the first two authors was partially supported by MIUR, Rome, under the project PRIN 2005132307.

References

Aitchison, J. & Silvey, S. D. (1958). Maximum likelihood estimation of parameters subject to restraints. Annals of Mathematical Statistics 29, 813–828.
ALLBUS (1998). Codebook ZA-Nr. 3755. German Social Science Infrastructure Services.
Andersson, S., Madigan, D. & Perlman, M. (2001). Alternative Markov properties for chain graphs. Scandinavian Journal of Statistics 28, 33–85.
Bartolucci, F., Colombi, R. & Forcina, A. (2007). An extended class of marginal link functions for modelling contingency tables by equality and inequality constraints. Statistica Sinica 17, 691–711.
Bergsma, W. P. (1997). Marginal models for categorical data. Ph.D. thesis, Tilburg.
Bergsma, W. P. & Rudas, T. (2002). Marginal log-linear models for categorical data. Annals of Statistics 30, 140–159.
Bertsekas, D. P. (1982). Constrained optimization and Lagrange multiplier methods. Academic Press, New York.
Colombi, R. & Forcina, A. (2001).
Marginal regression models for the analysis of positive association of ordinal response variables. Biometrika 88, 1007–1019.
Coppen, A. (1966). The Mark-Nyman temperament scale: an English translation. Brit. J. Med. Psychol. 33, 55–59.
Cox, D. R. & Wermuth, N. (1993). Linear dependencies represented by chain graphs (with discussion). Statistical Science 8, 204–218, 247–277.
Cox, D. R. & Wermuth, N. (1990). An approximation to maximum likelihood estimates in reduced models. Biometrika 77, 747–761.
Davis, J., Smith, T. & Marsden, J. A. (2007). General Social Surveys Cumulative Codebook: 1972-2006. NORC: Chicago.
Drton, M. & Richardson, T. S. (2007). Binary models for marginal independence. Journal of the Royal Statistical Society, Ser. B, forthcoming.
Edwards, D. (2000). Introduction to graphical modelling. Springer Verlag, New York, 2nd edn.
Glonek, G. J. N. & McCullagh, P. (1995). Multivariate logistic models. Journal of the Royal Statistical Society, Ser. B 57, 533–546.
Kauermann, G. (1996). On a dualization of graphical Gaussian models. Scandinavian Journal of Statistics 23, 105–116.
Kauermann, G. (1997). A note on multivariate logistic models for contingency tables. Australian Journal of Statistics 39, 261–276.
Lang, J. B. (1996). Maximum likelihood methods for a generalized class of log-linear models. Annals of Statistics 24, 726–752.
Lauritzen, S. L. (1996). Graphical models. Oxford University Press, Oxford.
Lienert, G. A. (1970). Konfigurationsfrequenzanalyse einiger Lysergsäurediäthylamid-Wirkungen. Arzneimittelforschung 20, 912–913.
Molenberghs, G. & Lesaffre, E. (1994). Marginal modelling of multivariate categorical data. Journal of the American Statistical Association 89, 633–644.
Pearl, J. & Wermuth, N. (1994). When can association graphs admit a causal interpretation? In P. Cheeseman & W.
Oldford, eds., Models and data, artificial intelligence and statistics IV. Springer, New York, pp. 205–214.
R Development Core Team (2007). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
Richardson, T. S. (2003). Markov property for acyclic directed mixed graphs. Scandinavian Journal of Statistics 30, 145–157.
Richardson, T. S. & Spirtes, P. (2002). Ancestral graph Markov models. Annals of Statistics 30, 962–1030.
Roddam, A. W. (2004). An approximate maximum likelihood procedure for parameter estimation in multivariate discrete data regression models. J. of Applied Statistics 28, 273–279.
Rudas, T. & Bergsma, W. P. (2004). On applications of marginal models for categorical data. Metron LXII, 1–25.
Rudas, T., Bergsma, W. P. & Németh, R. (2006). Parameterization and estimation of path models for categorical data. In A. Rizzi & M. Vichi, eds., Compstat 2006 Proceedings in Computational Statistics. Physica-Verlag, Heidelberg, pp. 383–394.
Wermuth, N. (1998). Pairwise independence. In P. Armitage & T. Colton, eds., Encyclopedia of biostatistics. Wiley, New York, pp. 3244–324.
Wermuth, N. (2003). Analysing social science data with graphical Markov models. In P. Green, N. Hjort & T. S. Richardson, eds., Highly structured stochastic systems. Oxford University Press, pp. 47–52.
Wermuth, N. & Cox, D. R. (1992). On the relation between interactions obtained with alternative codings of discrete variables. Methodika VI, 76–85.
Wermuth, N., Cox, D. R. & Marchetti, G. M. (2006). Covariance chains. Bernoulli 12, 841–862.
Whittaker, J. (1990). Graphical models in applied multivariate statistics. John Wiley.