Profile Graphical Models


Authors: Alejandra Avalos-Pacheco, Monia Lupparelli, Francesco C. Stingo

Alejandra Avalos-Pacheco 1,2,*, Monia Lupparelli 3, and Francesco C. Stingo 3

1 Institute of Applied Statistics, Johannes Kepler University Linz
2 Harvard-MIT Center for Regulatory Science, Harvard Medical School
3 Depart. of Statistics, Computer Science, Applications "G. Parenti", University of Florence
* alejandra.avalos_pacheco@jku.at

October 2024

Abstract

We introduce a novel class of graphical models, termed profile graphical models, that represent, within a single graph, how an external factor influences the dependence structure of a multivariate set of variables. This class is quite general and includes multiple graphs and chain graphs as special cases. Profile graphical models capture the conditional distributions of a multivariate random vector given different levels of a risk factor, and learn how the conditional independence structure among variables may vary across these risk profiles; we formally define this family of models and establish their corresponding Markov properties. We derive key structural and probabilistic properties that underpin a more powerful inferential framework than existing approaches, underscoring that our contribution extends beyond a novel graphical representation. Furthermore, we show that the resulting profile undirected graphical models are independence-compatible with two-block LWF chain graph models. We then develop a Bayesian approach for Gaussian undirected profile graphical models based on continuous spike-and-slab priors to learn shared sparsity structures across different levels of the risk factor. We also design a fast EM algorithm for efficient inference. Inferential properties are explored through simulation studies, including the comparison with competing methods. The practical utility of this class of models is demonstrated through the analysis of protein network data from various subtypes of acute myeloid leukemia.
Our results show a more parsimonious network and greater patient heterogeneity than its competitors, highlighting its enhanced ability to capture subject-specific differences.

Keywords: Undirected graphs; Chain graphs; Context-specific independence; Multiple graphs; Differential graphs.

1 Introduction

Multivariate regression models can be represented by chain graphs [Lauritzen and Wermuth, 1989, Frydenberg, 1990, Andersson et al., 2001, Wermuth and Sadeghi, 2012], which capture the conditional independence structure between multiple response and explanatory variables. In their simplest form, responses and predictors form two distinct chain components, with missing edges corresponding to conditional independencies under suitable Markov properties. Among the various types of chain graphs [Drton, 2009], we focus on the LWF chain graph [Frydenberg, 1990], which defines a smooth statistical model and provides a flexible framework for modeling dependencies among outcomes given covariates. Model selection for chain graphs has attracted significant attention, with recent developments in penalized likelihood [Rothman et al., 2010, Yin and Li, 2011, Lee and Liu, 2012], two-step procedures [Cai et al., 2012, Chen et al., 2016], and Bayesian approaches [Bhadra and Mallick, 2013, Consonni et al., 2017]. However, these models remain limited when the goal is to characterize how an explanatory variable affects the joint dependence structure among outcomes, rather than each outcome individually. The conditional independence models encoded by chain graphs provide information only through missing edges, leaving unexplored how existing dependencies change with external factors. This issue is particularly relevant in situations where associations between variables may reverse or shift across different conditions, as in the well-known effect reversal [Cox and Wermuth, 2003] and Simpson's paradox [Simpson, 1951].
An alternative line of work addresses this issue indirectly by modeling subgroups or subpopulations through multiple graphical models [Guo et al., 2011, Danaher et al., 2014, Peterson et al., 2015] or context-specific independencies [Hojsgaard, 2003, Corander, 2003, Nyman et al., 2014, 2016]. Yet, these approaches typically do not incorporate external factors directly into the model and often restrict context-specific variations to adjacent vertices. Building on these ideas, we propose a novel class of graphical models, termed profile undirected graphs, which preserve the interpretability of chain graphs while extending them to model how dependence structures among responses vary with an external factor. Our contributions are fourfold: (i) we introduce profile undirected graphs as a general framework for modeling all profile outcome distributions, i.e., conditional distributions of responses given any level of a risk factor; (ii) we derive the corresponding Markov properties based on a unified connected set rule; (iii) we establish formal compatibility, in terms of independence models, between the proposed profile graphs and specific chain graph structures; (iv) we develop parameterizations for Gaussian models and incorporate continuous spike-and-slab priors to learn shared sparsity patterns across levels of the external factor. For efficient inference, we implement a fast EM algorithm and demonstrate the usefulness of our proposed methodology through extensive simulations, including comparison with competing methods, and a cancer genomics application. We provide a profile graphical model of protein networks that evolve across disease subtypes, thereby uncovering subtype-specific dependency patterns invisible to standard chain graph or multiple graph analysis.
2 Theoretical framework: basic setup

Let 𝐺 = (𝑉, 𝐸) be a graph defined by a set of vertices 𝑎 ∈ 𝑉 and a set of edges (𝑎, 𝑏) ∈ 𝐸 joining pairs of vertices 𝑎, 𝑏 ∈ 𝑉, and let 𝑌_𝑉 = (𝑌_𝑎)_{𝑎 ∈ 𝑉} be a random vector of variables indexed by the finite set 𝑉, with 𝑝 = |𝑉|. A graph associated to a random vector 𝑌_𝑉 is generally used to represent conditional independence structures under suitable Markov properties. Typically, missing edges in 𝐺 correspond to conditional independencies for the joint distribution of 𝑌_𝑉. Also, let us consider the random categorical variable 𝑋 with strictly positive probability distribution representing an external factor with respect to (in the sequel, wrt) the random vector 𝑌_𝑉 of outcome/response variables. The variable 𝑋 takes level 𝑥 ∈ X, with 𝑞 = |X|. Our interest lies in the effect of 𝑋 on the joint independence structure of 𝑌_𝑉 and, in particular, in exploring via a graphical modelling approach how this structure may change under different levels 𝑥 ∈ X, which we call profiles.

Chain graphs are generally used to model the effects of background variables on joint response variables. In the simplest form, a two-block chain graph 𝐶 = [{𝐶_1, 𝐶_2}, 𝐸] is defined by a set of vertices partitioned in chain components 𝐶_1 and 𝐶_2, and a set of edges 𝐸. Depending on the set of Markov properties specified for the chain graph, we may have different independence models for the joint distribution of the random vectors (𝑌_{𝐶_𝑡})_{𝑡 ∈ {1,2}} associated to the chain components {𝐶_𝑡}_{𝑡 ∈ {1,2}}. In particular, we focus on the class of LWF chain graph models [Frydenberg, 1990]; these models correspond to multivariate regression models with suitable independence constraints, corresponding to missing edges, both within and between chain components.
Any pair of vertices 𝑎, 𝑏 ∈ 𝐶_𝑡 within the same chain component, with 𝑡 = 1, 2 and 𝑎 ≠ 𝑏, can be joined by undirected edges; vertices between chain components, 𝑎 ∈ 𝐶_1 and 𝑏 ∈ 𝐶_2, are joined by directed edges preserving the same direction, such that cycles are not allowed. For our purpose, the sets of vertices 𝐶_1 and 𝐶_2 are associated, respectively, to the random vector 𝑌_𝑉 of response variables and to the background variable 𝑋, so that 𝐶_1 = 𝑉 and |𝐶_2| = 1. In principle, the chain component 𝐶_2 may include a multiple categorical random vector; in this case 𝑋 represents a random variable with state space given by the combination of multiple factor levels.

For any 𝑥 ∈ X, let 𝑌_𝑉(𝑥) be an 𝑥-profile outcome vector, that is, the random vector 𝑌_𝑉 | {𝑋 = 𝑥} conditioned on a specific profile 𝑥 of the factor 𝑋, and let 𝑃(𝑌_𝑉(𝑥)) be the corresponding 𝑥-profile probability distribution of 𝑌_𝑉(𝑥), that is, the conditional probability distribution 𝑃(𝑌_𝑉 | {𝑋 = 𝑥}). Note that 𝑃(·) can be a probability density function or a probability mass function, depending on the continuous or discrete nature of the multivariate random variable 𝑌_𝑉(𝑥), with 𝑥 ∈ X. For the sake of simplicity, in the sequel we omit the prefix 𝑥 to denote both the profile outcome vector and the profile outcome distribution. Then, for a given multivariate random vector 𝑌_𝑉 and an external factor 𝑋, let 𝑌_𝑉|X = [𝑌_𝑉(𝑥)]_{𝑥 ∈ X} be the finite set of all profile outcome vectors and let 𝑃(𝑌_𝑉|X) = [𝑃(𝑌_𝑉(𝑥))]_{𝑥 ∈ X} be the corresponding set of all profile outcome distributions. For any 𝐴 ⊆ 𝑉, 𝑌_𝐴|X = [𝑌_𝐴(𝑥)]_{𝑥 ∈ X} is the set of marginal profile outcome vectors with corresponding profile probability distributions 𝑃(𝑌_𝐴|X) = [𝑃(𝑌_𝐴(𝑥))]_{𝑥 ∈ X}. A definition of profile independence follows.
Definition 1 Given a graph 𝐺 = (𝑉, 𝐸) and a partition 𝐴, 𝐵, 𝐶 ⊆ 𝑉, the profile conditional independence 𝑌_𝐴(𝑥) ⊥⊥ 𝑌_𝐵(𝑥) | 𝑌_𝐶(𝑥) and the profile marginal independence 𝑌_𝐴(𝑥) ⊥⊥ 𝑌_𝐵(𝑥) correspond, respectively, to the factorizations

𝑃[𝑌_𝐴(𝑥), 𝑌_𝐵(𝑥) | 𝑌_𝐶(𝑥)] = 𝑃[𝑌_𝐴(𝑥) | 𝑌_𝐶(𝑥)] × 𝑃[𝑌_𝐵(𝑥) | 𝑌_𝐶(𝑥)],  (1)

𝑃[𝑌_𝐴(𝑥), 𝑌_𝐵(𝑥)] = 𝑃[𝑌_𝐴(𝑥)] × 𝑃[𝑌_𝐵(𝑥)],  (2)

of the joint profile distribution of 𝑌_𝑉(𝑥), for any 𝑥 ∈ X.

The following lemma holds by definition of conditional independence.

Lemma 1 If the profile independence statements in Equations (1) and (2) hold for any level 𝑥 ∈ X, then these equations imply that 𝑌_𝐴 ⊥⊥ 𝑌_𝐵 | {𝑌_𝐶, 𝑋} and 𝑌_𝐴 ⊥⊥ 𝑌_𝐵 | 𝑋, respectively.

Finally, let us consider a collection of multiple graphs 𝐺_𝑉|X = {𝐺(𝑥) = (𝑉, 𝐸(𝑥))}_{𝑥 ∈ X} associated to the profile outcome distributions 𝑃(𝑌_𝑉|X). Under suitable Markov properties, any graph 𝐺(𝑥) represents an independence model for the profile outcome vector 𝑌_𝑉(𝑥), for any 𝑥 ∈ X. In particular, missing edges wrt 𝐺(𝑥) correspond to profile conditional independencies for the joint distribution of 𝑌_𝑉(𝑥), with 𝑥 ∈ X. Graphs 𝐺(𝑥) ∈ 𝐺_𝑉|X may have different skeletons. We remark that chain graph models do not allow us to explore how the independence structure of 𝑌_𝑉 may considerably vary for any profile 𝑥 ∈ X. Multiple graphs do not allow us to model the effect of 𝑋 on each outcome 𝑌_𝑎 ∈ 𝑌_𝑉. In essence, the idea is to provide a single graph able to embed, at the same time, information about the profile independence structure for any 𝑌_𝑉(𝑥) ∈ 𝑌_𝑉|X and about the conditional independence between 𝑋 and any outcome 𝑌_𝑎 ∈ 𝑌_𝑉.
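Lemma 1 turns per-profile independencies into independencies conditional on 𝑋; it says nothing about the marginal behaviour of 𝑌_𝑉 once 𝑋 is marginalized out, which can differ sharply, as in Simpson's paradox recalled in the Introduction. The following toy numerical illustration (all numbers are ours, purely for exposition) checks the factorization in Equation (2) at each profile while the marginal distribution fails to factorize:

```python
import numpy as np

# Toy binary example: Y_a(x) is independent of Y_b(x) at every profile x,
# yet Y_a and Y_b are marginally dependent once X is marginalized out.
px = np.array([0.5, 0.5])                  # P(X = x)
pa = np.array([[0.1, 0.9], [0.9, 0.1]])    # pa[x, ya] = P(Y_a = ya | X = x)
pb = np.array([[0.1, 0.9], [0.9, 0.1]])    # pb[x, yb] = P(Y_b = yb | X = x)

joint = np.einsum("x,xa,xb->xab", px, pa, pb)   # P(X = x, Y_a, Y_b)

# Profile marginal independence (Equation (2)) holds at each x:
for x in range(2):
    pab_x = joint[x] / joint[x].sum()
    assert np.allclose(pab_x, np.outer(pab_x.sum(1), pab_x.sum(0)))

# ...but the marginal distribution of (Y_a, Y_b) does not factorize:
pab = joint.sum(axis=0)
assert not np.allclose(pab, np.outer(pab.sum(1), pab.sum(0)))
```

Here P(Y_a = 1, Y_b = 1) = 0.41 while the product of the marginals gives 0.25, so conditioning on the profile is essential.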
In the following sections, we derive the properties of this new class of graphical models, which fills the gap between the class of chain graphs and that of multiple graphs, and highlight the connection between these two classes. We also exploit our results to define an inferential procedure more powerful than state-of-the-art approaches.

3 Profile undirected graphical models

3.1 Profile undirected graphs

We introduce the class of undirected profile graphs. A profile undirected graph G_U = (𝑉, E) is defined by the set 𝑉 of vertices and a set of Z-labelled edges E, which are labelled according to a subset Z ⊆ X. Let (𝑎, 𝑏)_Z be the generic element of E associated to any pair 𝑎, 𝑏 ∈ 𝑉, where the presence or absence of the edge between 𝑎 and 𝑏 is determined by the subset Z of the state space X. For each pair 𝑎, 𝑏 ∈ 𝑉, the corresponding edge (𝑎, 𝑏)_Z ∈ E belongs to one of the following three categories: (i) if Z = X, vertices 𝑎 and 𝑏 are not joined by any edge; (ii) if Z is a nonempty proper subset of X, i.e., Z ⊂ X and Z ≠ ∅, vertices 𝑎 and 𝑏 are joined by a dotted Z-labelled edge; (iii) if Z = ∅, vertices 𝑎 and 𝑏 are joined by a full edge and, for the sake of simplicity, the ∅-label is not displayed in the graph.

Under suitable Markov properties, the profile graph G_U provides an independence model for the joint distributions of a random vector 𝑌_𝑉|X of profile outcomes. In particular, a missing edge in G_U corresponds to a profile conditional independence for each profile 𝑥 ∈ X. A Z-labelled dotted edge in G_U corresponds to profile conditional independencies holding only for the profiles 𝑥 ∈ Z, with Z ⊂ X and Z ≠ ∅.

Further technical definitions follow. For any couple of vertices 𝑎, 𝑏 ∈ 𝑉, we say that 𝑏 is an 𝑥-neighbour of 𝑎, and vice versa, if they are joined by a Z-labelled edge such that 𝑥 ∉ Z, with Z ⊂ X.
Let nb_𝑥(𝑎) be the set of all 𝑥-neighbours of 𝑎, with 𝑎 ∈ 𝑉 and 𝑥 ∈ X. For any pair 𝑎, 𝑏 ∈ 𝑉 and 𝑥 ∈ X, an 𝑥-path between 𝑎 and 𝑏 is given by a sequence of (𝑎, 𝑏)_Z edges, for any Z ⊂ X, such that 𝑥 ∉ Z for all edges in the sequence. Given any nonempty subset 𝐶 of 𝑉, 𝐶 is said to be 𝑥-connected if any pair 𝑎, 𝑏 ∈ 𝐶 is joined by an 𝑥-path, with 𝑥 ∈ X. Any nonempty subset 𝐷 of 𝑉 is said to be 𝑥-disconnected if it is not 𝑥-connected, with 𝑥 ∈ X; let 𝐾_1, ..., 𝐾_𝑟 be the 𝑥-connected components of 𝐷. For any triple 𝐴, 𝐵, 𝐶 of disjoint subsets of 𝑉 and 𝑥 ∈ X, we say that 𝐶 𝑥-separates 𝐴 from 𝐵 if every 𝑥-path from any vertex 𝑎 ∈ 𝐴 to any vertex 𝑏 ∈ 𝐵 intersects 𝐶. The technical 𝑥-definitions above can be simply extended to Z-definitions, for any subset Z of X, if they hold for all 𝑥 ∈ Z.

Example 1 Consider the profile undirected graph G_U in the left panel of Figure 1. Vertices 𝑎 and 𝑐 are both {1, 2}-neighbours, because they are joined by a dotted edge with label Z = {0} that contains neither 1 nor 2. Vertices 𝑏 and 𝑑 are X-neighbours, because they are joined by a full edge. The sequence of edges {(𝑎, 𝑐)_{0}, (𝑎, 𝑏)_{2}, (𝑏, 𝑑)_∅} is a {1}-path, since 1 is not included in any label of the edges in the sequence.

[Figure 1: Given 𝑉 = {𝑎, 𝑏, 𝑐, 𝑑}, G_U is a profile undirected graph for the profile outcome vectors 𝑌_𝑉|X = [(𝑌_𝑉(𝑥))_{𝑥 ∈ X}] with X = {0, 1, 2}. Any 𝑈(𝑥) is the induced undirected graph for the profile outcome vector 𝑌_𝑉(𝑥), with 𝑥 ∈ X.]

The same sequence is not a {2}-path, since the label of the couple (𝑎, 𝑏) contains 2. The set 𝑉 is {1}-connected, because every pair of vertices in 𝑉 is joined by a {1}-path.
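The 𝑥-path and 𝑥-connectedness definitions above are easy to operationalize: for a fixed profile 𝑥, keep exactly the edges whose label Z does not contain 𝑥, and run an ordinary connected-components search on the resulting induced graph 𝑈(𝑥). The sketch below is illustrative Python of our own (the edge labels of Figure 1 are hard-coded from Example 1; the function names are ours), not the authors' code:

```python
# Z-labelled edges of the profile graph G_U in Figure 1:
# (a,b)_{2}, (a,c)_{0}, (b,c)_{1,2}, (b,d)_EMPTY; missing pairs have Z = X.
EDGES = {
    frozenset("ab"): {2},
    frozenset("ac"): {0},
    frozenset("bc"): {1, 2},
    frozenset("bd"): set(),
}

def induced_graph(x):
    """Edges of the induced undirected graph U(x): keep (a,b)_Z iff x not in Z."""
    return {e for e, z in EDGES.items() if x not in z}

def x_connected_components(vertices, x):
    """Connected components of `vertices` in U(x), via depth-first search."""
    adj = {v: set() for v in vertices}
    for e in induced_graph(x):
        a, b = tuple(e)
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)
    seen, comps = set(), []
    for v in vertices:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

With these labels, `x_connected_components(set("abcd"), 1)` returns a single component, matching the {1}-connectedness of 𝑉 in Example 1, while for 𝑥 = 2 it splits 𝑉 into {𝑎, 𝑐} and {𝑏, 𝑑}.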
The same set is {2}-disconnected, with {2}-connected components {𝑎, 𝑐} and {𝑏, 𝑑}, because there does not exist a {2}-path between 𝑎 and 𝑏. Vertices 𝑐 and 𝑑 are {1}-separated by 𝑎, because the only {1}-path {(𝑎, 𝑐)_{0}, (𝑎, 𝑏)_{2}, (𝑏, 𝑑)_∅} between 𝑐 and 𝑑 intersects 𝑎; vertex 𝑎 does not {0}-separate 𝑐 and 𝑑, because there exists the {0}-path {(𝑏, 𝑐)_{1,2}, (𝑏, 𝑑)_∅} between them that does not intersect 𝑎.

3.2 Profile undirected Markov properties

In this section, we first define Markov properties for profile undirected graphs and then derive a few results that connect them.

Definition 2 The probability distributions 𝑃[𝑌_𝑉|X] of the profile outcome vectors 𝑌_𝑉|X satisfy the profile undirected Pairwise Markov Property (U-PMP) wrt the graph G_U = (𝑉, E_U) if, for any (𝑎, 𝑏)_Z ∈ E_U with Z ⊆ X,

𝑌_𝑎(𝑥) ⊥⊥ 𝑌_𝑏(𝑥) | 𝑌_{𝑉\{𝑎,𝑏}}(𝑥),  𝑥 ∈ Z.  (3)

Definition 3 The probability distributions 𝑃[𝑌_𝑉|X] of the profile outcome vectors 𝑌_𝑉|X satisfy the profile undirected Global Markov Property (U-GMP) wrt the graph G_U = (𝑉, E_U) if, for any triple 𝐴, 𝐵, 𝐶 of disjoint subsets of 𝑉 such that 𝐶 𝑥-separates 𝐴 from 𝐵 in G_U,

𝑌_𝐴(𝑥) ⊥⊥ 𝑌_𝐵(𝑥) | 𝑌_𝐶(𝑥),  𝑥 ∈ X.  (4)

Definition 4 The probability distributions 𝑃[𝑌_𝑉|X] of the profile outcome vectors 𝑌_𝑉|X satisfy the profile undirected Connected Set Markov Property (U-CSMP) wrt the graph G_U = (𝑉, E_U) if, for any 𝑥-disconnected set 𝐷 of 𝑉, with 𝐾_1, ..., 𝐾_𝑟 the 𝑥-connected components of 𝐷,

𝑌_{𝐾_1}(𝑥) ⊥⊥ ... ⊥⊥ 𝑌_{𝐾_𝑟}(𝑥) | 𝑌_{𝑉\𝐷}(𝑥),  𝑥 ∈ X.  (5)

Example 2 Consider the graph G_U in the left panel of Figure 1. 𝑃[𝑌_𝑉|X] satisfy the U-PMP wrt G_U if 𝑌_𝑏(𝑥) ⊥⊥ 𝑌_𝑐(𝑥) | {𝑌_𝑎(𝑥), 𝑌_𝑑(𝑥)} for 𝑥 ∈ {1, 2}, since (𝑏, 𝑐)_{1,2} ∈ E_U.
𝑃[𝑌_𝑉|X] satisfy the U-GMP wrt G_U if 𝑌_𝑐(1) ⊥⊥ {𝑌_𝑏(1), 𝑌_𝑑(1)} | 𝑌_𝑎(1), because 𝑎 {1}-separates 𝑐 from {𝑏, 𝑑}. Consider the subset 𝐷 = {𝑎, 𝑏, 𝑐} of 𝑉; 𝑃[𝑌_𝑉|X] satisfy the U-CSMP wrt G_U if {𝑌_𝑎(2), 𝑌_𝑐(2)} ⊥⊥ 𝑌_𝑏(2) | 𝑌_𝑑(2), because 𝐷 is a {2}-disconnected set with two {2}-connected components, {𝑎, 𝑐} and {𝑏}.

We prove that all the independence statements encoded in a profile undirected graph under the global Markov property can be derived by applying the connected set rule, which gives insight on the connectivity and on the path setting of the graph across profiles.

Theorem 1 Let G_U = (𝑉, E_U) be a profile undirected graph model associated to the profile outcome vectors 𝑌_𝑉|X with probability distributions 𝑃[𝑌_𝑉|X]. The U-GMP is satisfied if and only if the U-CSMP is satisfied wrt G_U.

The proof of Theorem 1 is given in the Supplementary Material, along with all other proofs. The local Markov property for profile undirected graphs is also included in the Supplementary Material.

Given a profile undirected graph G_U = (𝑉, E_U) for the profile outcome vectors 𝑌_𝑉|X, the corresponding class of multiple undirected graphs associated to each random vector 𝑌_𝑉(𝑥) ∈ 𝑌_𝑉|X can be defined.

Definition 5 Given a profile undirected graph G_U = (𝑉, E_U) for the profile outcome vectors 𝑌_𝑉|X, let 𝑈_𝑉|X = {𝑈(𝑥) = (𝑉, 𝐸_𝑈(𝑥))}_{𝑥 ∈ X} be the induced class of multiple undirected graphs where, for any 𝑈(𝑥) ∈ 𝑈_𝑉|X, the couple 𝑎, 𝑏 ∈ 𝑉 is joined by an undirected edge if 𝑥 ∉ Z in the corresponding edge (𝑎, 𝑏)_Z ∈ E_U, with Z ⊆ X.

Then, a missing edge in G_U corresponds to a missing edge in 𝑈(𝑥), for any 𝑥 ∈ X; a Z-labelled dotted edge in G_U corresponds to a missing edge in 𝑈(𝑥) if 𝑥 ∈ Z, and to a full edge in 𝑈(𝑥) if 𝑥 ∉ Z; a full edge in G_U corresponds to a full edge in 𝑈(𝑥), for any 𝑥 ∈ X.
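The 𝑥-separation criterion behind the U-GMP can likewise be checked mechanically: 𝐶 𝑥-separates 𝐴 from 𝐵 exactly when no 𝑥-path from 𝐴 reaches 𝐵 once the vertices of 𝐶 are deleted. The following is an illustrative Python sketch of our own (edge labels hard-coded from Figure 1; names are ours), not the authors' implementation:

```python
# Z-labelled edges of G_U in Figure 1 (missing pairs have Z = X and are omitted).
EDGES = {
    frozenset("ab"): {2},
    frozenset("ac"): {0},
    frozenset("bc"): {1, 2},
    frozenset("bd"): set(),
}
VERTICES = set("abcd")

def x_separates(C, A, B, x):
    """True iff C x-separates A from B: every x-path from A to B meets C."""
    allowed = VERTICES - set(C)              # delete the separator
    adj = {v: set() for v in allowed}
    for e, z in EDGES.items():
        a, b = tuple(e)
        if x not in z and a in allowed and b in allowed:
            adj[a].add(b)
            adj[b].add(a)
    # breadth-first search from A within the x-induced graph minus C
    frontier = reached = set(A) & allowed
    while frontier:
        nxt = set().union(*(adj[v] for v in frontier)) - reached
        frontier, reached = nxt, reached | nxt
    return reached.isdisjoint(B)
```

For Figure 1, `x_separates({'a'}, {'c'}, {'b', 'd'}, 1)` is True, matching Example 2 (𝑎 {1}-separates 𝑐 from {𝑏, 𝑑}), while `x_separates({'a'}, {'c'}, {'d'}, 0)` is False, since the {0}-path c-b-d avoids 𝑎.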
Example 3 Consider Figure 1. Given the profile undirected graph G_U, let 𝑈_𝑉|X = {𝑈(0), 𝑈(1), 𝑈(2)} be the induced class of multiple undirected graphs. The couple 𝑎, 𝑑 is disjoined in G_U and in any 𝑈(𝑥) ∈ 𝑈_𝑉|X. The couple 𝑏, 𝑐 is joined by a {1, 2}-labelled dotted edge in G_U; then it is joined by a full edge in 𝑈(0) and is disjoined in 𝑈(1) and 𝑈(2). The couple 𝑏, 𝑑 is joined by a full edge in G_U and in any 𝑈(𝑥) ∈ 𝑈_𝑉|X.

The pairwise, local, and global Markov properties of probability distributions associated to undirected graphs are well known [Lauritzen, 1996]. The following corollary, derived directly from Theorem 1, shows that the full set of conditional independencies implied by the global Markov property for any undirected graph can also be derived by applying the connected set rule.

Corollary 1 Given an undirected graph model 𝑈(𝑥) = (𝑉, 𝐸(𝑥)) associated to the profile outcome vectors 𝑌_𝑉|X, the probability distribution 𝑃[𝑌_𝑉(𝑥)] satisfies the global Markov property wrt 𝑈(𝑥) if and only if the connected set Markov property is satisfied for every 𝑥-disconnected set 𝐷 ⊆ 𝑉, with 𝑥 ∈ X.

The following proposition shows that the full set of independencies encoded in the induced undirected graph model for any 𝑌_𝑉(𝑥) ∈ 𝑌_𝑉|X can be derived from the profile undirected graph model for the joint distributions of 𝑌_𝑉|X.

Proposition 1 Consider a profile undirected graph G_U = (𝑉, E_U) associated to the profile outcome vectors 𝑌_𝑉|X and the induced class of multiple undirected graphs 𝑈_𝑉|X. If the probability distributions 𝑃[𝑌_𝑉|X] satisfy the U-CSMP wrt G_U, the probability distribution 𝑃[𝑌_𝑉(𝑥)] of each profile vector 𝑌_𝑉(𝑥) ∈ 𝑌_𝑉|X satisfies the global Markov property wrt the induced undirected graph 𝑈(𝑥) ∈ 𝑈_𝑉|X.
In the following proposition we show that the U-GMP, U-CSMP and U-PMP are equivalent for the class of profile undirected graph models in the case of strictly positive probability distributions. This result derives directly from Proposition 1.

Proposition 2 Let G_U = (𝑉, E_U) be a profile undirected graph associated to the profile outcome vectors 𝑌_𝑉|X with strictly positive probability distributions 𝑃[𝑌_𝑉|X]. The U-GMP is satisfied if and only if the U-PMP is satisfied wrt G_U.

4 Profile undirected graphs and LWF chain graphs

For any profile undirected graph G_U, we derive an induced class of two-block LWF chain graphs C_𝑈 = {𝐶_𝑈}, with generic element 𝐶_𝑈 = [{𝑉, 𝑋}, 𝐸_{𝐶_𝑈}], such that the joint distribution 𝑃(𝑌_𝑉, 𝑋) under 𝐶_𝑈 is compatible, in terms of independence models, with the set of profile distributions 𝑃[𝑌_𝑉|X] under G_U. Preliminary definitions are required to specify the class of LWF chain graphs induced by a profile graph.

A joint probability distribution 𝑃(𝑌_𝑉, 𝑋) satisfies the LWF Global Markov Property (LWF-GMP) wrt the LWF chain graph 𝐶_𝑈 = [{𝑉, 𝑋}, 𝐸_{𝐶_𝑈}] if [Frydenberg, 1990, Drton, 2009]: for any disconnected set 𝐷 ⊆ 𝑉 with connected components 𝐾_1, ..., 𝐾_𝑟,

𝑌_{𝐾_1} ⊥⊥ ... ⊥⊥ 𝑌_{𝐾_𝑟} | {𝑌_{𝑉\𝐷}, 𝑋};  (6)

for any subset 𝐴 ⊆ 𝑉 such that there is a missing arrow between any vertex 𝑎 ∈ 𝐴 and 𝑋,

𝑌_𝐴 ⊥⊥ 𝑋 | 𝑌_{𝑉\𝐴}.  (7)

We remark that Equation (6) derives directly from Theorem 1. A definition of Markov-compatibility between a profile undirected graph and an LWF chain graph follows.

Definition 6 An LWF chain graph 𝐶_𝑈 = [{𝑉, 𝑋}, 𝐸_{𝐶_𝑈}] is Markov-compatible with a profile undirected graph G_U if the LWF-GMP in (6) for 𝐶_𝑈 is implied by the U-GMP for G_U.

We now derive the class of LWF graphs induced by a profile undirected graph such that Markov-compatibility is satisfied.
Theorem 2 Consider a profile undirected graph G_U = (𝑉, E_U) associated to the profile outcome vectors 𝑌_𝑉|X. If the probability distributions 𝑃[𝑌_𝑉|X] satisfy the U-GMP for G_U, then the LWF-GMP in (6) is also satisfied for the induced class C_𝑈 of two-block LWF chain graphs, where any 𝐶_𝑈 = [{𝑉, 𝑋}, 𝐸_{𝐶_𝑈}] belongs to C_𝑈 if: (i) any couple 𝑎, 𝑏 ∈ 𝑉 is joined by an undirected edge in 𝐶_𝑈 if Z ⊂ X for the pair (𝑎, 𝑏)_Z ∈ E_U; (ii) for any couple 𝑎, 𝑏 ∈ 𝑉, 𝑎 and 𝑏 are both reached by an arrow in 𝐶_𝑈 starting from 𝑋 if Z ⊂ X and Z ≠ ∅ for the pair (𝑎, 𝑏)_Z ∈ E_U.

Necessary conditions (i) and (ii) in Theorem 2 ensure that there will always exist at least one Markov-compatible LWF chain graph for any given profile undirected graph; specifically, the chain graph with no missing arrows is always compatible. Condition (i) is related to the missing/non-missing undirected edges for any induced chain graph; it states that dotted and full edges in profile undirected graphs correspond to full edges in chain graphs. Condition (ii) is related to missing/non-missing directed edges for any induced chain graph; it states that vertices joined by a dotted edge in a profile undirected graph cannot be disjoined from 𝑋 in the induced chain graph. Since condition (ii) may not be intuitive, the following counterexample shows that it is a necessary condition.

Example 4 Let 𝑉 = {𝑎, 𝑏, 𝑐} be a set of response variables and 𝑋 a factor with state space X = {0, 1}, and let G_U = (𝑉, E_U) be a profile undirected graph with E_U = {(𝑎, 𝑏)_{0}, (𝑎, 𝑐)_∅, (𝑏, 𝑐)_X}, where the pair 𝑎, 𝑏 is joined by a {0}-dotted edge that implies

𝑌_𝑎(0) ⊥⊥ 𝑌_𝑏(0) | 𝑌_𝑐(0)  and  𝑌_𝑎(1) ⊥̸⊥ 𝑌_𝑏(1) | 𝑌_𝑐(1).  (8)

We explore Markov-compatibility for some chain graphs.
Consider a chain graph 𝐶_𝑈 = [{𝑉, 𝑋}, 𝐸_{𝐶_𝑈}] with 𝐸_{𝐶_𝑈} = {(𝑎, 𝑏), (𝑏, 𝑐), (𝑋, 𝑐)}, where vertices 𝑎 and 𝑏 are both disjoined from 𝑋. By condition (7), we have {𝑌_𝑎, 𝑌_𝑏} ⊥⊥ 𝑋 | 𝑌_𝑐, i.e., 𝑃(𝑌_𝑎(0), 𝑌_𝑏(0) | 𝑌_𝑐(0)) = 𝑃(𝑌_𝑎(1), 𝑌_𝑏(1) | 𝑌_𝑐(1)), which is not compatible with condition (8) for the profile graph; then 𝐶_𝑈 does not belong to the induced class C_𝑈.

Consider the chain graph 𝐶′_𝑈 with 𝐸_{𝐶′_𝑈} = {(𝑎, 𝑏), (𝑏, 𝑐), (𝑋, 𝑏), (𝑋, 𝑐)}, where only 𝑎 is disjoined from 𝑋. Equation (7) implies that 𝑌_𝑎 ⊥⊥ 𝑋 | {𝑌_𝑏, 𝑌_𝑐}, that is, 𝑃[𝑌_𝑎(0) | 𝑌_𝑏(0), 𝑌_𝑐(0)] = 𝑃[𝑌_𝑎(1) | 𝑌_𝑏(1), 𝑌_𝑐(1)], which is not compatible with condition (8) for the profile graph, since (8) implies that 𝑃[𝑌_𝑎(1) | 𝑌_𝑏(1), 𝑌_𝑐(1)] ≠ 𝑃[𝑌_𝑎(0) | 𝑌_𝑏(0), 𝑌_𝑐(0)]. It follows that 𝐶′_𝑈 does not belong to the induced class C_𝑈.

Consider a third chain graph 𝐶″_𝑈 with the set 𝐸_{𝐶″_𝑈} = {(𝑎, 𝑏), (𝑏, 𝑐), (𝑋, 𝑎), (𝑋, 𝑏), (𝑋, 𝑐)} of edges, where 𝑎 and 𝑏 are both joined to 𝑋. This chain graph is compatible with condition (8) for the profile graph G_U. 𝐶″_𝑈 satisfies both conditions (i) and (ii) in Theorem 2 and belongs to the induced class C_𝑈.

Consider the last chain graph 𝐶‴_𝑈 with the set 𝐸_{𝐶‴_𝑈} = {(𝑎, 𝑏), (𝑏, 𝑐), (𝑋, 𝑎), (𝑋, 𝑏)} of edges, where only 𝑎 and 𝑏 are joined to 𝑋. This graph implies 𝑌_𝑐 ⊥⊥ 𝑋 | {𝑌_𝑎, 𝑌_𝑏}, i.e., 𝑃[𝑌_𝑐(1) | 𝑌_𝑎(1), 𝑌_𝑏(1)] = 𝑃[𝑌_𝑐(0) | 𝑌_𝑎(0), 𝑌_𝑏(0)], which is compatible with condition (8) for the profile graph G_U. 𝐶‴_𝑈 satisfies both conditions (i) and (ii) in Theorem 2.

The induced class C_𝑈 = {𝐶″_𝑈, 𝐶‴_𝑈} of chain graphs includes two elements, where 𝐶″_𝑈 has no missing arrows and 𝐶‴_𝑈 has a missing arrow in correspondence of the vertex 𝑐, which is not involved in dotted edges.
In essence, given a profile graph G_U, the induced class C_𝑈 includes LWF chain graphs 𝐶_𝑈 = [{𝑉, 𝑋}, 𝐸_{𝐶_𝑈}] where the chain component 𝑉 has the same skeleton as G_U, and which differ only according to which arrows are missing. Within this class, we can identify the maximum element, i.e., the chain graph with no missing arrows, and the minimum element, i.e., the chain graph with a set of arrows that point to all vertices 𝑎 ∈ 𝑉 such that nb_Z(𝑎) ≠ ∅, with Z ⊂ X and Z ≠ ∅.

[Figure 2: A profile undirected graph with a compatible LWF chain graph.]

In order to also account for the LWF-GMP in (7) and to establish a one-to-one relationship between profile undirected graphs and LWF chain graphs, we generalize the class of profile undirected graphs. Given a profile undirected graph G_U = (𝑉, E_U), consider the partition 𝑉 = 𝑉° ∪ 𝑉□ of the vertex set, so that we distinguish between two types of vertices, a circle vertex 𝑎° ∈ 𝑉° and a square vertex 𝑎□ ∈ 𝑉□, drawn as ○ and □, respectively. For every 𝑎□ ∈ 𝑉□, we assume 𝑌_𝑎 ⊥⊥ 𝑋 | 𝑌_{𝑉\𝑎}; that is, the univariate profile distribution of 𝑌_𝑎(𝑥) is invariant for any 𝑥 ∈ X, given the remaining variables 𝑌_{𝑉\𝑎}. Otherwise, if 𝑎° ∈ 𝑉°, we assume 𝑌_𝑎 ⊥̸⊥ 𝑋 | 𝑌_{𝑉\𝑎}. The profile graph in this generalized representation also includes information about the independence structure between subsets of response variables 𝑌_𝐴, with 𝐴 ⊆ 𝑉, and the external factor 𝑋. In particular, for any 𝐴 ⊆ 𝑉□, we assume that 𝑌_𝐴 ⊥⊥ 𝑋 | 𝑌_{𝑉\𝐴}. Then, given a profile undirected graph G_U = (𝑉°, 𝑉□, E_U), the compatible two-block LWF chain graph 𝐶_𝑈 = [{𝑉, 𝑋}, 𝐸_{𝐶_𝑈}] in the class C_𝑈 is unique and is defined by a chain graph where the undirected graph of the response component 𝑉 has the same skeleton as G_U and there are missing arrows between 𝑋 and any square vertex 𝑎□ ∈ 𝑉□.
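The minimum element of the induced class C_𝑈 can be read off the labelled edge set directly: an arrow from 𝑋 must point at every vertex incident to at least one dotted edge. A minimal Python sketch of this rule, under the edge representation used in our earlier snippets (names and data layout are ours, not the paper's):

```python
def minimum_element_arrows(edges, x_space):
    """Vertices that must receive an arrow from X in the minimum element of C_U:
    those incident to a dotted edge, i.e. an edge whose label Z satisfies
    EMPTY != Z != X (a nonempty proper subset of the state space)."""
    targets = set()
    for pair, z in edges.items():
        if z and z != set(x_space):      # nonempty proper subset of X: dotted edge
            targets |= pair
    return targets

# Profile graph of Example 4: (a,b)_{0} dotted, (a,c)_EMPTY full, (b,c)_X missing.
example4 = {
    frozenset("ab"): {0},
    frozenset("ac"): set(),
    frozenset("bc"): {0, 1},
}
```

Here `minimum_element_arrows(example4, {0, 1})` returns {'a', 'b'}: the minimum element keeps arrows only into 𝑎 and 𝑏, matching the chain graph 𝐶‴_𝑈 of Example 4, while the maximum element has arrows into every vertex.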
Square and circle vertices provide insights into how the dependence structure varies across profiles. Square vertices identify variables with stable pairwise dependencies, while circle vertices denote variables characterized by pairwise context-specific independencies. The fewer the square nodes, the greater the difference across profiles.

Example 5 Consider the profile undirected graph and the induced chain graph in Figure 2. Vertices 𝑎, 𝑏, 𝑐 are circle vertices while 𝑑 is a square vertex wrt G_U, i.e., {𝑎, 𝑏, 𝑐} ⊆ 𝑉° and 𝑑 ∈ 𝑉□. Then, both the profile undirected graph and the chain graph imply the independence statement 𝑌_𝑑 ⊥⊥ 𝑋 | {𝑌_𝑎, 𝑌_𝑏, 𝑌_𝑐}. Also, both graphs imply that {𝑌_𝑎, 𝑌_𝑐} ⊥⊥ 𝑌_𝑑 | {𝑌_𝑏, 𝑋}. Unlike the profile graph, the chain graph does not provide information about the effect of 𝑋 on the 𝑌_𝑉 association structure, e.g., 𝑌_𝑎(2) ⊥⊥ 𝑌_𝑏(2) | {𝑌_𝑐(2), 𝑌_𝑑(2)}.

5 Gaussian profile undirected graphical model

We can now define the class of Gaussian profile undirected graphical models by imposing zero constraints over the model parameters; these constraints naturally follow from the Markov equivalence between profile graphs and multiple graphs, and from the compatibility between profile graphs and chain graphs established previously. For all 𝑥 ∈ X, let 𝑌_𝑉(𝑥) ∼ 𝑁(𝛼 + 𝛽_𝑥, Σ_𝑥), where [𝛼_𝑎 + 𝛽_𝑎𝑥]_{𝑎 ∈ 𝑉} = E[𝑌_𝑎(𝑥)]_{𝑎 ∈ 𝑉} is the profile marginal mean vector and Σ_𝑥 is the profile covariance matrix with entries 𝜎_𝑎𝑏,𝑥, 𝑥 ∈ X. Let 𝜁_𝑎𝑥, 𝑎 ∈ 𝑉, be the linear effect of the external factor on the profile conditional mean vector E[𝑌_𝑎(𝑥) | 𝑌_{𝑉\𝑎}(𝑥)]_{𝑎 ∈ 𝑉}, and let Ω_𝑥 = Σ_𝑥^{-1} be the profile precision matrix with entries 𝜔_𝑎𝑏,𝑥, 𝑥 ∈ X; note that 𝜁_𝑥 = Ω_𝑥 𝛽_𝑥, where 𝜁_𝑥 = [𝜁_𝑎𝑥]_{𝑎 ∈ 𝑉} and 𝛽_𝑥 = [𝛽_𝑎𝑥]_{𝑎 ∈ 𝑉} [Andersson et al., 2001].
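The identity 𝜁_𝑥 = Ω_𝑥 𝛽_𝑥 links the marginal 𝑥-effect 𝛽 to the conditional one: for 𝑌_𝑉(𝑥) ∼ 𝑁(𝛼 + 𝛽𝑥, Σ), the coefficient of 𝑥 in E[𝑌_𝑎(𝑥) | 𝑌_{𝑉\𝑎}(𝑥)] is 𝛽_𝑎 − Σ_{𝑎,𝐵} Σ_{𝐵,𝐵}^{-1} 𝛽_𝐵 with 𝐵 = 𝑉\{𝑎}, which standard Gaussian conditioning reduces to 𝜁_𝑎 / 𝜔_𝑎𝑎; in particular, 𝜁_𝑎 = 0 exactly when the external factor drops out of the conditional mean of 𝑌_𝑎, as in Definition 7 below. A small numerical check of this reduction (illustrative only, with an arbitrary positive definite Σ of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)          # arbitrary positive definite covariance
beta = rng.standard_normal(p)            # marginal x-effects
Omega = np.linalg.inv(Sigma)
zeta = Omega @ beta                      # zeta_x = Omega_x beta_x

# x-coefficient of the conditional mean E[Y_a | Y_rest], via Schur complement
a = 0
B = [1, 2, 3]
cond_effect = beta[a] - Sigma[a, B] @ np.linalg.solve(Sigma[np.ix_(B, B)], beta[B])
assert np.isclose(cond_effect, zeta[a] / Omega[a, a])
```

The agreement of the two expressions is exact up to floating-point error, for any choice of positive definite Σ and any vertex 𝑎.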
Definition 7 The Gaussian profile undirected graphical model for 𝑌_𝑉|X wrt G_U = (𝑉, E_U) is such that: (i) for any 𝑎 ∈ 𝑉□, 𝜁_𝑎𝑥 = 0 for all 𝑥 ∈ X; (ii) for any (𝑎, 𝑏)_Z ∈ E_U, with Z ⊆ X, 𝜔_𝑎𝑏,𝑥 = 0 for each 𝑥 ∈ Z.

Estimation of Gaussian profile undirected graphical models can build on existing methods for Gaussian chain or multiple graph inference. Penalty terms that promote shared network structures across profiles, similar to the group graphical lasso, the joint graphical lasso [Danaher et al., 2014], or GemBag [Yang et al., 2021], may be appropriate when the 𝑥-profile vectors follow distributions with comparable graph structures. In Section 5.1, we extend the model of Yang et al. [2021] by introducing profile indicators 𝑟_𝑎𝑏,𝑥 that specify the sparsity of the corresponding entries 𝜔_𝑎𝑏,𝑥, and profile coefficients 𝛽_𝑎,𝑥. The entries 𝜔_𝑎𝑏,𝑥 estimated to be zero define the zero patterns that correspond to a given profile undirected graph G_U = (𝑉, E_U).

5.1 Bayesian model formulation

To express our model in a Bayesian framework, we first introduce binary global-level indicators 𝛾_𝑖𝑗 such that 𝛾_𝑖𝑗 := 1 indicates that nodes 𝑖 and 𝑗 are connected in at least one of the 𝑞 graphs, and 𝛾_𝑖𝑗 := 0 denotes no such connection. We place independent Bernoulli priors 𝛾_𝑖𝑗 ∼ Bernoulli(𝛾_𝑖𝑗 | 𝑝_1) on these indicators. Similarly, we introduce global-level coefficient indicators 𝜃_𝑖. The indicator 𝜃_𝑖 denotes whether at least one of the coefficients that represent the effect of the external factor is non-zero (𝜃_𝑖 := 1) or zero (𝜃_𝑖 := 0). These indicators also affect the profile-level sparsity in the corresponding entries of the precision matrices across the 𝑞 graphs. We assign a Bernoulli prior 𝜃_𝑖 ∼ Bernoulli(𝜃_𝑖 | 𝑝_2) to each indicator.

We introduce profile indicators 𝑟_𝑖𝑗,𝑥 to capture the sparsity of the corresponding entries 𝜔_𝑖𝑗,𝑥.
To encourage similarity among these indicators, we place joint priors on $r_{ij,0}, \ldots, r_{ij,q-1} \mid \gamma_{ij}$. These distributions encourage across-profile information sharing while allowing within-profile heterogeneity. In this paper, we assume the following hierarchical structure for this distribution:
$$P(r_{ij,0}, \ldots, r_{ij,q-1} \mid \gamma_{ij}, \theta_i, \theta_j) = \gamma_{ij} \theta_i \theta_j \prod_{x=0}^{q-1} \text{Bernoulli}(r_{ij,x} \mid p_3) + (1 - \gamma_{ij}) \prod_{x=0}^{q-1} \delta_0(r_{ij,x}) + \gamma_{ij} (1 - \theta_i \theta_j)\, \text{Bernoulli}(r_{ij,0} \mid p_4)\, \delta_{(r_{ij,1}, \ldots, r_{ij,q-1})}(r_{ij,0}), \quad (9)$$
where $\delta_0(\cdot)$ denotes a point mass at zero and $\delta_{(r_{ij,1}, \ldots, r_{ij,q-1})}(r_{ij,0})$ a point mass forcing $r_{ij,1} = \cdots = r_{ij,q-1} = r_{ij,0}$. Under this setup, when the global-level indicator $\gamma_{ij} = 0$, all the $r_{ij,x}$ are set to zero. When $\gamma_{ij} = 1$, each $r_{ij,x}$ can still independently take the value 0 with probability $1 - p_3$ if $\theta_i = \theta_j = 1$, or with probability $1 - p_4$ if $\theta_i \theta_j = 0$, in which case $r_{ij,0} = r_{ij,1} = \cdots = r_{ij,q-1}$. This provides a flexible approach to sharing information across profiles while also accounting for the effect of the external factor, e.g., the inclusion or exclusion of arrows.

We place an exponential prior on the positive diagonal entries of the $q$ precision matrices to induce proper shrinkage, $\omega_{ii,x} \sim \text{Exponential}(\tau)$, and a spike-and-slab prior on the upper-triangular entries $\omega_{ij,x}$ ($i < j$):
$$\omega_{ij,x} \mid r_{ij,x} \sim r_{ij,x}\, P(\omega_{ij,x} \mid \nu_1) + (1 - r_{ij,x})\, P(\omega_{ij,x} \mid \nu_0),$$
with $\nu_1 > \nu_0 > 0$. Following Yang et al. [2021], we adopt a spike-and-slab Lasso prior [Rockova and George, 2018], where $P(\omega_{ij,x} \mid \nu_1)$ represents the slab component with a large variance, allowing for large signals, and $P(\omega_{ij,x} \mid \nu_0)$ represents the spike component with a small variance, encouraging values close to zero.
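The three branches of the hierarchy in (9) can be mimicked by a small forward sampler; a minimal sketch with illustrative hyperparameter values (the function and the values are ours, not the authors'):

```python
import numpy as np

rng = np.random.default_rng(0)
q, p1, p2, p3, p4 = 4, 0.5, 0.5, 0.8, 0.3  # illustrative hyperparameters

def sample_profile_indicators(rng, q, p1, p2, p3, p4):
    """Draw (gamma_ij, theta_i, theta_j, r_ij) following the three
    branches of the hierarchical prior (9)."""
    gamma_ij = rng.binomial(1, p1)
    theta_i, theta_j = rng.binomial(1, p2, size=2)
    if gamma_ij == 0:
        r = np.zeros(q, dtype=int)           # no edge in any profile
    elif theta_i == 1 and theta_j == 1:
        r = rng.binomial(1, p3, size=q)      # profile-specific inclusion
    else:
        r = np.full(q, rng.binomial(1, p4))  # shared across all profiles
    return gamma_ij, theta_i, theta_j, r

g, ti, tj, r = sample_profile_indicators(rng, q, p1, p2, p3, p4)
```

Whenever $\gamma_{ij} = 0$ the sampler returns an all-zero indicator vector, and whenever $\theta_i \theta_j = 0$ (with $\gamma_{ij} = 1$) it returns a constant vector, matching the point masses in (9).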
Finally, we set a normal spike-and-slab prior [George and McCulloch, 1993] on the profile coefficients:
$$\beta_{i,x} \mid \theta_i \sim \theta_i\, P(\beta_{i,x} \mid \lambda_1) + (1 - \theta_i)\, P(\beta_{i,x} \mid \lambda_0),$$
with $\lambda_1 > \lambda_0 > 0$. Figure 3 provides a graphical representation of our Bayesian model.

Parameter estimation for our proposed Bayesian Gaussian profile undirected graphical model is conducted using an Expectation-Maximisation (EM) algorithm [Dempster et al., 1977], along the lines of the EM algorithm of Yang et al. [2021]. Let $\Delta = (\alpha, \beta_x, \Omega_x)$ be the unknown parameters. The EM algorithm aims to maximise the log-posterior $\log P(\Delta \mid Y_{Vx})$ by working with the complete-data log-posterior $\log P(\Delta \mid Y_{Vx}, \theta, R)$. The EM algorithm has two steps: the E-step calculates the expected values $\mathrm{E}[r_{ij,x} \mid \hat{\Delta}, Y_{Vx}]$ and $\mathrm{E}[\theta_i \mid \hat{\Delta}, Y_{Vx}]$, where $\Delta^{(t)} = \hat{\Delta}$ are the current values of $\Delta$ at iteration $t$. The M-step maximises $Q(\hat{\Delta}) = \mathrm{E}_{\theta, R \mid Y, \hat{\Delta}}[\log P(\hat{\Delta}, \theta, R \mid Y_{Vx})]$ with respect to $\hat{\Delta} = (\hat{\alpha}, \hat{\beta}_x, \hat{\Omega}_x)$, giving new updates $\Delta^{(t+1)} = (\alpha^{(t+1)}, \beta_x^{(t+1)}, \Omega_x^{(t+1)})$. Supplementary Material C provides a detailed derivation of our EM algorithm.

Figure 3: Directed acyclic graph (DAG) for the Gaussian undirected profile graphical model, with spike-and-slab priors on the profile precision matrices and the profile coefficients ($0 \le x \le q-1$, $1 \le i < j \le p$).

6 Results

6.1 Simulations

We conduct simulation studies to evaluate the performance of the proposed Bayesian Gaussian undirected profile graphical model (BPUGM). We consider $q = 4$ levels of the covariate $X$, with $\mathcal{X} = \{0, 1, 2, 3\}$, and examine performance under varying numbers of nodes $p \in \{20, 50, 100\}$ and three levels of sparsity $s$ (the larger the value of $s$, the sparser the graphs).
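The E-step/M-step interplay described above can be illustrated on a self-contained toy problem. The following sketch runs EM with a normal spike-and-slab prior on a vector of means; it mirrors the structure of the algorithm (posterior inclusion probabilities in the E-step, penalised updates in the M-step) but it is not the paper's EM:

```python
import numpy as np

def em_spike_slab_means(y, nu0=0.1, nu1=2.0, p=0.5, n_iter=50):
    """Toy EM: y_i = mu_i + N(0,1) noise, with a normal spike-and-slab
    prior on mu_i (spike sd nu0, slab sd nu1, slab weight p).
    E-step: posterior inclusion probabilities r*; M-step: ridge-type
    update of mu with the r*-weighted prior precision."""
    mu = y.copy()
    for _ in range(n_iter):
        # E-step: r*_i = P(slab | mu_i) under N(0, nu1^2) vs N(0, nu0^2).
        slab = p * np.exp(-mu**2 / (2 * nu1**2)) / nu1
        spike = (1 - p) * np.exp(-mu**2 / (2 * nu0**2)) / nu0
        r_star = slab / (slab + spike)
        # M-step: maximise Q; the prior precision mixes the components.
        prior_prec = r_star / nu1**2 + (1 - r_star) / nu0**2
        mu = y / (1.0 + prior_prec)
    return mu, r_star

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 0.1, 20), rng.normal(5, 0.1, 5)])
mu, r_star = em_spike_slab_means(y)
```

The large observations keep an inclusion probability near one and are barely shrunk, while the near-zero observations are assigned to the spike and shrunk heavily, which is the qualitative behaviour the profile-level indicators $r_{ij,x}$ and $\theta_i$ induce on the precision entries and coefficients.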
Four structural scenarios are investigated: (i) all four levels have distinct graph structures; (ii) $\{G(0) = G(1)\} \neq \{G(2) = G(3)\}$; (iii) $\{G(0) = G(1) = G(2)\} \neq \{G(3)\}$; and (iv) all levels share the same graph structure. For each scenario, data are generated independently for each $x \in \mathcal{X}$ from a multivariate normal distribution $N(\beta_x, \Sigma_x)$, where $\Sigma_x = \Omega_x^{-1}$ and $\beta_x = \Sigma_x \zeta_x$, with sample size $n_x = 50$. Additional details on the data-generating mechanism are provided in the Supplementary Materials.

We compare BPUGM with GemBag [Yang et al., 2021], the fused graphical lasso (FGL), and the group graphical lasso (GGL) [Danaher et al., 2014]. Graph structure estimation accuracy is assessed using the area under the ROC curve (AUC), accuracy, sensitivity, and specificity of $\Lambda(x)$ for all $x \in \mathcal{X}$. Results are averaged over 100 simulated datasets, with standard errors reported. Summaries are presented in Figures 4 to 6 and detailed values in Tables 1 to 4 of the Supplementary Materials.

Across all scenarios, values of $p$, and sparsity levels, BPUGM consistently achieves the highest AUC, indicating superior edge discrimination compared with competing methods. The improvement is most pronounced in Scenarios 2 and 3, where partial sharing of graph structures across levels allows BPUGM to effectively borrow strength across groups. In Scenario 1, where graph structures are fully distinct, BPUGM remains competitive and stable as $p$ increases. In Scenario 4, where all graphs are identical, BPUGM performs comparably to FGL and GGL, while maintaining slightly higher sensitivity without loss of specificity. Overall, BPUGM provides a favorable balance between sensitivity and specificity, leading to higher accuracy.
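For one level $x$, the data-generating step $\Sigma_x = \Omega_x^{-1}$, $\beta_x = \Sigma_x \zeta_x$, $Y_V(x) \sim N(\beta_x, \Sigma_x)$ can be sketched as follows (the precision matrix and effect vector below are arbitrary illustrative choices, not the paper's simulation design):

```python
import numpy as np

rng = np.random.default_rng(42)
n_x = 50

# Illustrative sparse precision matrix and conditional-mean effect for one x.
Omega_x = np.array([[1.5, 0.4, 0.0],
                    [0.4, 1.5, 0.4],
                    [0.0, 0.4, 1.5]])
zeta_x = np.array([0.5, 0.0, -0.5])

Sigma_x = np.linalg.inv(Omega_x)    # Sigma_x = Omega_x^{-1}
beta_x = Sigma_x @ zeta_x           # beta_x = Sigma_x zeta_x
Y_x = rng.multivariate_normal(beta_x, Sigma_x, size=n_x)  # N(beta_x, Sigma_x)
print(Y_x.shape)  # (50, 3)
```
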
These results demonstrate the advantages of the proposed Bayesian profile graphical modeling framework for joint estimation of multiple related graph structures under finite-sample conditions.

6.2 AML protein data

We analyze protein expression data from patients affected by acute myeloid leukemia (AML) with the goal of reconstructing and comparing protein networks across disease subtypes; comparing the networks for these groups provides insight into the differences in protein signaling that may affect whether treatments for one subtype will be effective in another. A set of protein levels, collected using reverse phase protein array (RPPA) technology, is observed in a sample of 213 newly diagnosed AML patients [Kornblau et al., 2009]¹. Patients are classified by subtype according to the French-American-British (FAB) classification system. We consider 4 different profiles given by 4 AML subtypes for which a reasonable sample size is available: M0 (17 subjects), M1 (34 subjects), M2 (68 subjects), and M4 (59 subjects). These profiles, based on criteria including cytogenetics and cellular morphology, show varying prognoses. We expect to observe different protein interactions in the subtypes. We focus on 18 proteins relevant to the apoptosis and cell cycle regulation KEGG pathways [Kanehisa et al., 2011]. Our interest is in modelling the effect of the AML subtype on the joint independence structure of the protein levels. Profile undirected graphical models are an encompassing tool that coherently and jointly performs all inferential tasks of interest: learning how the protein dependency structure changes across subtypes as well as the mean protein levels.
Therefore, assuming that the $p = 18$ protein levels follow a multivariate Gaussian distribution and considering $q = 4$ different profiles of AML, where the levels $x \in \mathcal{X} = \{0, 1, 2, 3\}$ denote the subtypes M0, M1, M2, M4, respectively, we estimate and select the profile undirected graphical model represented in Figure 7. For the sake of comparison, we represent the corresponding multiple-graph in Figure 9; this graph is arguably harder to read. Most importantly, the many profile-specific independencies are obviously missed by this graph. For instance, from the selected profile graph we learn that for the profiles $x \in \{0, 1\}$, $Y_{\text{AKTp.308}}(x) \perp\!\!\!\perp Y_{\text{BCI.2}}(x) \mid Y_{V \setminus \{\text{AKTp.308}, \text{BCI.2}\}}(x)$; for any profile $x \in \mathcal{X}$, $Y_{\text{AKTp.308}}(x) \perp\!\!\!\perp Y_{\text{BAD}}(x) \mid Y_{V \setminus \{\text{AKTp.308}, \text{BAD}\}}(x)$. The levels of the proteins BAX, GSK3 and XIAP are independent of the AML subtype.

¹http://bioinformatics.mdanderson.org/Supplements/Kornblau-AML-RPPA/aml-rppa.xls

Figure 4: AUC over 100 datasets, $P = 20$, $K = 4$, $N_x = 20$, for the four different scenarios.

Figure 5: AUC over 100 datasets, $P = 50$, $K = 4$, $N_x = 50$, for the four different scenarios.

Figure 6: AUC over 100 datasets, $P = 100$, $K = 4$, $N_x = 50$, for the four different scenarios.

The selected profile graph has only three square vertices, and only three arrows can be removed; all edges are dotted with the exception of one. This means that the dependence structure of $Y_V$ is expected to vary substantially across profiles, and modeling the direct effect of $X$ on each $Y_a \in Y_V$ becomes relevant to capture how the conditional dependence structure of $Y_V$ changes in different profiles. We then compute the maximal connected components of each profile graph, corresponding to the set of maximal paths. These quantities summarise the heterogeneity in protein connectivity and path structure across profiles.
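Computing the maximal connected components of a profile graph is a standard traversal over its adjacency matrix; a generic sketch (not the authors' code):

```python
import numpy as np

def connected_components(adj):
    """Maximal connected components of an undirected graph given as a
    boolean adjacency matrix, found by depth-first search."""
    p = adj.shape[0]
    seen, comps = set(), []
    for start in range(p):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.append(v)
            stack.extend(int(u) for u in np.flatnonzero(adj[v]) if u not in seen)
        comps.append(sorted(comp))
    return comps

# Toy 4-node graph with two components: {0, 1} and {2, 3}.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=bool)
print(connected_components(adj))  # [[0, 1], [2, 3]]
```

Applying this per profile graph gives the component counts compared across the four AML subtypes below.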
The cardinalities of the resulting sets differ across the four levels, taking values 4, 9, 13 and 12, respectively, indicating variation in structural complexity between profiles.

The estimated profile graphs obtained with our method exhibit varying levels of sparsity across groups compared with those from the GemBag approach. For groups 0 and 1, the inferred graphs are substantially sparser than those produced by GemBag, suggesting that explicitly modeling the effect of $X$ on $Y_V \mid \mathcal{X}$ may explain a large portion of the dependence structure within $Y_V \mid \mathcal{X}$, thereby reducing the number of residual conditional associations. Table 1 provides a direct comparison of edge detection results between GemBag and BPUGM. For groups 0 and 1, all edges identified by BPUGM are also detected by GemBag, while for groups 2 and 3 the majority of BPUGM edges are likewise recovered. This pattern suggests that BPUGM yields a more parsimonious representation of the conditional dependence structure. Moreover, group-specific graphs generated by our method (Figure 8) appear more heterogeneous than those obtained with GemBag (Figure 9), indicating a greater ability to capture group-specific differences.

            |      U(0)      |      U(1)      |      U(2)      |      U(3)      |     Total
GemBag      | Edge   No edge | Edge   No edge | Edge   No edge | Edge   No edge | Edge   No edge
Edge        |   3      12    |  10       4    |  14       0    |  13       0    |  40      16
No edge     |   0     138    |   0     139    |   4     135    |   3     137    |   7     549

Table 1: Comparison of GemBag (rows) and BPUGM (columns, within each group) edge detection.

Finally, we repeated the experiment 100 times, randomly removing 10% of the data each time, to assess the robustness of the inferred models. For each repetition, we computed the balanced accuracy between the multiple-graphs obtained from the reduced dataset and the multiple-graphs estimated from the full dataset, the latter considered the ground truth for this metric. Table 2 shows that BPUGM exhibited higher overall robustness than GemBag, achieving a higher overall balanced accuracy (0.9736 vs.
0.9339). In particular, BPUGM substantially outperformed GemBag in the two smallest-sample groups ($U(0)$ and $U(1)$), where it inferred sparser graphs, while performance was comparable between methods in the larger groups ($U(2)$ and $U(3)$). This consistent improvement demonstrates the advantage of BPUGM in model selection, highlighting its greater stability and ability to recover graph structures closer to the original model under data perturbations.

Figure 7: The selected profile undirected graph model for the protein data.

            |      BPUGM       |      GemBag
            | Mean      SE     | Mean      SE
U(0)        | 0.9803    0.0425 | 0.8607    0.0502
U(1)        | 0.9622    0.0395 | 0.9509    0.0394
U(2)        | 0.9772    0.0172 | 0.9736    0.0262
U(3)        | 0.9758    0.0235 | 0.9829    0.0210
Overall     | 0.9736    0.0143 | 0.9339    0.0192

Table 2: Mean balanced accuracy and standard error (SE) between the multiple-graphs inferred from reduced datasets and those estimated from the full dataset, computed across 100 repetitions, comparing BPUGM and GemBag.

7 Discussion

We propose a class of graphical models that generalizes both chain graphs and multiple graphs and, for the first time, we establish compatibility between these two types of graphs. In line with LWF chain graphs, profile undirected graphs can be used for modelling the profile conditional independencies resulting from a sequence of non-independent regression models involving all response variables. From this perspective, the specification of a class of profile chain graphs represents an interesting generalization for exploring profile independencies in a multivariate regression setting. The parameterization discussed in Section 5 for the Gaussian case is quite standard.
Under the assumption of a multinomial sampling scheme for the multivariate outcome vector, a parameterization based on the log-linear transformation [Lauritzen, 1996] could be used for profile undirected graph models. We developed a Bayesian inferential procedure and a companion EM algorithm for fast inference. Alternative inferential approaches for this type of graph are possible; for example, model comparison within the class of profile undirected graphs can be based on the likelihood ratio test in the case of nested models. These graphical models are smooth and belong to the curved exponential family, so the likelihood ratio test has an asymptotic chi-square distribution.

We demonstrate the practical utility of this class of models through the analysis of protein network data from multiple subtypes of acute myeloid leukemia. In this application, our proposed approach yields more robust network estimates than GemBag, as evidenced by higher balanced accuracy when the experiment is repeated 100 times with 10% of the data randomly removed at each iteration. Although this represents limited empirical evidence, the observed gains in robustness suggest that our method tends to include fewer false-positive edges. This behavior is consistent with the explicit modeling of external factors acting on the nodes, which allows the method to disentangle genuine conditional dependencies from associations induced by unmodeled heterogeneity. As a result, BPUGM learns sparser and more stable graph structures by excluding edges that are not truly supported by the data, whereas GemBag appears to retain weaker residual associations that are less stable across subsamples. Overall, these findings indicate that explicitly accounting for external factors can lead to more reliable inference of biologically meaningful network connections.

Acknowledgement

The authors gratefully acknowledge Andrea Lazzerini for his contributions.
Supplementary material

The Supplementary Material includes the profile local Markov property, proofs of the theorems and propositions, details of the EM algorithm, the data-generating mechanism used in the simulation studies, and additional simulation results.

References

S. A. Andersson, D. Madigan, and M. D. Perlman. Alternative Markov properties for chain graphs. Scandinavian Journal of Statistics, 28(1):33–85, 2001.

A. Bhadra and B. K. Mallick. Joint high-dimensional Bayesian variable and covariance selection with an application to eQTL analysis. Biometrics, 69(2):447–457, 2013.

T. T. Cai, H. Li, W. Liu, and J. Xie. Covariate-adjusted precision matrix estimation with an application in genetical genomics. Biometrika, page ass058, 2012.

M. Chen, Z. Ren, H. Zhao, and H. Zhou. Asymptotically normal and efficient estimation of covariate-adjusted Gaussian graphical model. Journal of the American Statistical Association, 111(513):394–406, 2016.

G. Consonni, L. La Rocca, and S. Peluso. Objective Bayes covariate-adjusted sparse graphical model selection. Scandinavian Journal of Statistics, pages 741–764, 2017. doi: 10.1111/sjos.12273.

J. Corander. Labelled graphical models. Scandinavian Journal of Statistics, 30(3):493–508, 2003. doi: 10.1111/1467-9469.00344.

D. Cox and N. Wermuth. A general condition for avoiding effect reversal after marginalization. Journal of the Royal Statistical Society, Series B, 65:937–941, 2003.

P. Danaher, P. Wang, and D. Witten. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B, 76:373–397, 2014.

A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.

M. Drton.
Discrete chain graph models. Bernoulli, 15(3):736–753, 2009. doi: 10.3150/08-BEJ172.

M. Frydenberg. The chain graph Markov property. Scandinavian Journal of Statistics, pages 333–353, 1990.

L. Gan, N. N. Narisetty, and F. Liang. Bayesian regularization for graphical models with unequal shrinkage. Journal of the American Statistical Association, 114(527):1218–1231, 2019.

E. George and R. McCulloch. Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423):881–889, 1993. doi: 10.1080/01621459.1993.10476353.

J. Guo, E. Levina, G. Michailidis, and J. Zhu. Joint estimation of multiple graphical models. Biometrika, 98(1):1–15, 2011.

S. Hojsgaard. Split models for contingency tables. Computational Statistics & Data Analysis, 42(4):621–645, 2003.

M. Kanehisa, S. Goto, Y. Sato, M. Furumichi, and M. Tanabe. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Research, 40(D1):D109–D114, 2011.

S. M. Kornblau, R. Tibes, Y. H. Qiu, W. Chen, H. M. Kantarjian, and M. Andreeff. Functional proteomic profiling of AML predicts response and survival. Blood, 113(1):154–164, 2009.

S. L. Lauritzen. Graphical Models. Oxford University Press, New York, 1996.

S. L. Lauritzen and N. Wermuth. Graphical models for associations between variables, some of which are qualitative and some quantitative. The Annals of Statistics, pages 31–57, 1989.

W. Lee and Y. Liu. Simultaneous multiple response regression and inverse covariance matrix estimation via penalized Gaussian maximum likelihood. Journal of Multivariate Analysis, 111:241–255, 2012.

H. Nyman, J. Pensar, T. Koski, and J. Corander. Stratified graphical models - context-specific independence in graphical models. Bayesian Analysis, 9(4):883–908, 2014.

H. Nyman, J. Pensar, T. Koski, and J. Corander.
Context-specific independence in graphical log-linear models. Computational Statistics, 31(4):1493–1512, 2016.

C. B. Peterson, F. Stingo, and M. Vannucci. Bayesian inference of multiple Gaussian graphical models. Journal of the American Statistical Association, 110(509):159–174, 2015.

V. Rockova and E. I. George. EMVS: The EM approach to Bayesian variable selection. Journal of the American Statistical Association, 109(506):828–846, 2014. doi: 10.1080/01621459.2013.869223.

V. Rockova and E. I. George. The Spike-and-Slab LASSO. Journal of the American Statistical Association, 113(521):431–444, 2018. doi: 10.1080/01621459.2016.1260469.

A. J. Rothman, E. Levina, and J. Zhu. Sparse multivariate regression with covariance estimation. Journal of Computational and Graphical Statistics, 19(4):947–962, 2010.

E. H. Simpson. The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society: Series B, 13:238–241, 1951.

N. Wermuth and K. Sadeghi. Sequences of regressions and their independences. TEST, 21(2):215–252, 2012.

X. Yang, L. Gan, N. N. Narisetty, and F. Liang. GemBag: Group estimation of multiple Bayesian graphical models. Journal of Machine Learning Research, 22:1–48, 2021.

J. Yin and H. Li. A sparse conditional Gaussian graphical model for analysis of genetical genomics data. The Annals of Applied Statistics, 5(4):2630, 2011.
Figure 8: Induced undirected multiple-graph for the profile outcome vector $Y_V(x)$.

Figure 9: Induced multiple-graph from GemBag.

Supplemental Materials: Profile Graphical Models

A Profile Local Markov property

The probability distributions $P[Y_V \mid \mathcal{X}]$ of the profile outcome vectors $Y_V \mid \mathcal{X}$ satisfy the profile undirected Local Markov Property (U-LMP) wrt the graph $\mathcal{G}_U = (V, \mathcal{E}_U)$ if, for any vertex $a \in V$,
$$Y_a(x) \perp\!\!\!\perp Y_{V \setminus \{a \cup nb_x(a)\}}(x) \mid Y_{nb_x(a)}(x), \quad x \in \mathcal{X}. \quad (S1)$$

B Proofs

Proof 1 of Theorem 1. Let $\mathcal{G}_U = (V, \mathcal{E}_U)$ be a profile undirected graph and let $D \subseteq V$ be any $x$-disconnected set with $x$-connected components $K_1, \ldots, K_r$ such that, for every pair $K_i, K_j$ with $i, j = 1, \ldots, r$, $i \neq j$, the U-CSMP wrt $\mathcal{G}_U$ gives
$$Y_{K_i}(x) \perp\!\!\!\perp Y_{K_j}(x) \mid Y_{V \setminus \{K_i, K_j\}}(x), \quad x \in \mathcal{X}. \quad (S2)$$
For any pair $K_i, K_j \subset D$ with $i, j = 1, \ldots, r$, $i \neq j$, the set $S_{ij} = V \setminus \{K_i, K_j\}$ is an $x$-separator.
Then, the U-CSMP implies the U-GMP wrt $\mathcal{G}_U$. Conversely, consider any $x$-connected set $C$ wrt $\mathcal{G}_U$, let $nb_x(C) = \bigcup_{a \in C} nb_x(a)$ be the neighbour set including $C$, and let $S_C = nb_x(C) \setminus C$ be an $x$-separator for the sets $C$ and $V \setminus nb_x(C)$, for any $x \in \mathcal{X}$. The U-GMP implies that
$$Y_C(x) \perp\!\!\!\perp Y_{V \setminus nb_x(C)}(x) \mid Y_{S_C}(x), \quad x \in \mathcal{X}. \quad (S3)$$
Note that $C \cup \{V \setminus nb_x(C)\}$ is an $x$-disconnected set, for $x \in \mathcal{X}$. We distinguish two cases, according to whether $V \setminus nb_x(C)$ is $x$-connected or $x$-disconnected. In the first case, the $x$-connected components of $C \cup \{V \setminus nb_x(C)\}$ are $C$ and $V \setminus nb_x(C)$, so the U-CSMP is satisfied. If $V \setminus nb_x(C)$ is $x$-disconnected with connected components $K_1 \cup \cdots \cup K_r$, the U-GMP also implies that
$$Y_C(x) \perp\!\!\!\perp Y_{K_1}(x) \perp\!\!\!\perp \ldots \perp\!\!\!\perp Y_{K_r}(x) \mid Y_{S_C}(x), \quad x \in \mathcal{X}. \quad (S4)$$
Then the U-GMP implies the U-CSMP wrt $\mathcal{G}_U$.

Proof 2 of Proposition 1. Consider a profile undirected graph $\mathcal{G}_U = (V, \mathcal{E}_U)$ associated with the profile outcome vectors $Y_V \mid \mathcal{X}$ and the induced class $U_{V \mid \mathcal{X}}$ of multiple undirected graphs. If the probability distributions $P[Y_V \mid \mathcal{X}]$ satisfy the U-CSMP wrt $\mathcal{G}_U$, the U-GMP is also satisfied by Theorem 1. So, given three disjoint subsets $A, B, C$ of $V$,
$$Y_A(x) \perp\!\!\!\perp Y_B(x) \mid Y_C(x), \quad (S5)$$
where $A$ and $B$ are $x$-separated by $C$, with $x \in \mathcal{X}$. The result follows by Definition 5, since $A$ and $B$ are $x$-separated by $C$ in $\mathcal{G}_U$ if and only if they are $x$-separated by $C$ in $U(x) \in U_{V \mid \mathcal{X}}$, with $x \in \mathcal{X}$.

Proof 3 of Proposition 2. Given an undirected graph $U = (V, E_U)$ associated with a random vector $Y_V$, the global and the pairwise Markov properties are equivalent if the joint probability distribution $P(Y_V)$ is strictly positive; see Lauritzen [1996]. The proposition follows by applying this result to the strictly positive probability distribution $P[Y_V(x)]$ of any profile outcome vector $Y_V(x) \in Y_V \mid \mathcal{X}$.
Proof 4 of Theorem 2. First of all, we recall that, given a profile undirected graph $\mathcal{G}_U$ associated with the profile outcome vectors $Y_V \mid \mathcal{X}$, by Theorem 1 the probability distributions $P[Y_V \mid \mathcal{X}]$ satisfy the U-CSMP if and only if the U-GMP is satisfied. In order to prove the compatibility, we distinguish three cases:

a) if a set $D$ is $x$-disconnected in $\mathcal{G}_U$ for no $x \in \mathcal{X}$ (i.e., $\mathcal{Z} = \emptyset$), this means that it is a connected set, and then the same set $D$ is also connected in any $C_U \in \mathcal{C}_U$ by condition ($i$); therefore, the U-CSMP trivially implies condition (6), as no independence statement is implied for $Y_D$ given $X$ and the remaining set of variables; in this case condition ($ii$) is not invoked, as no independence statements hold for $Y_D$ regardless of whether any $Y_j \in Y_D$ is dependent or independent of $X$ given the remaining set of variables;

b) if a set $D$ is $x$-disconnected in $\mathcal{G}_U$ for all $x \in \mathcal{X}$, then the same set $D$ is also disconnected in any $C_U \in \mathcal{C}_U$ by condition ($i$); therefore, the U-CSMP implies condition (6) for the class of induced chain graphs; in this case condition ($ii$) is also not invoked, as
$$Y_{K_1}(x) \perp\!\!\!\perp \ldots \perp\!\!\!\perp Y_{K_r}(x) \mid Y_{V \setminus D}(x), \ \forall x \in \mathcal{X} \ \Rightarrow \ Y_{K_1} \perp\!\!\!\perp \ldots \perp\!\!\!\perp Y_{K_r} \mid \{Y_{V \setminus D}, X\},$$
regardless of whether any $Y_j \in Y_D$ is dependent or independent of $X$ given the remaining set of variables;

c) if a set $D$ is $x$-disconnected in $\mathcal{G}_U$ for some $x \in \mathcal{Z} \subset \mathcal{X}$ with $\mathcal{Z} \neq \emptyset$, then the same set $D$ is connected in any $C_U \in \mathcal{C}_U$ by condition ($i$); the U-CSMP implies condition (6), as
$$Y_{K_1}(x) \perp\!\!\!\perp \ldots \perp\!\!\!\perp Y_{K_r}(x) \mid Y_{V \setminus D}(x), \ x \in \mathcal{Z} \subset \mathcal{X} \ \Rightarrow \ Y_{K_1} \not\perp\!\!\!\perp \ldots \not\perp\!\!\!\perp Y_{K_r} \mid \{Y_{V \setminus D}, X\};$$
in this case $Y_D$ cannot be independent of $X$ given $Y_{V \setminus D}$, as the profile conditional distribution of $Y_D(x) \mid Y_{V \setminus D}(x)$ behaves differently for each $x \in \mathcal{X}$, i.e., $Y_{K_1}(x) \perp\!\!\!\perp \ldots \perp\!\!\!\perp Y_{K_r}(x) \mid Y_{V \setminus D}(x)$ for $x \in \mathcal{Z}$ and $Y_{K_1}(x) \not\perp\!\!\!\perp \ldots$
$\not\perp\!\!\!\perp Y_{K_r}(x) \mid Y_{V \setminus D}(x)$ for $x \in \mathcal{X} \setminus \mathcal{Z}$; condition ($ii$) is then required so that $Y_D \not\perp\!\!\!\perp X \mid Y_{V \setminus D}$.

C EM algorithm

The E-step takes the expectation of the complete-data log-posterior,
$$Q(\hat{\Delta}) = \mathrm{E}_{\theta, R \mid Y, \hat{\Delta}}\Bigg\{ \sum_{x=0}^{q-1} \log P(Y_{Vx} \mid \hat{\Omega}_x, \hat{\beta}_x, \hat{\alpha}) + \sum_{x=0}^{q-1} \sum_{i<j} \log P(\hat{\omega}_{ij,x} \mid r^*_{ij,x}) + \sum_{x=0}^{q-1} \sum_{i=1}^{p} \Big[ \log P(\hat{\beta}_{ix} \mid \theta^*_i) + \log P(\hat{\omega}_{ii,x} \mid \tau) \Big] \Bigg\}. \quad (S6)$$
Let $Q = Q(\hat{\Delta}) = \mathrm{E}_{\theta, R \mid Y, \hat{\Delta}}[\log P(\hat{\Delta}, \theta, R \mid Y_{Vx})]$. Then
$$Q \propto \frac{1}{2} \sum_{x=0}^{q-1} \Bigg[ n_x \log\big(\det(\hat{\Omega}_x)\big) - \mathrm{tr}\Bigg( \sum_{k=1}^{n_x} \big(Y_{x,k} - (\hat{\alpha} + \hat{\beta}_x)\big) \big(Y_{x,k} - (\hat{\alpha} + \hat{\beta}_x)\big)^\top \hat{\Omega}_x \Bigg) \Bigg] + \sum_{x=0}^{q-1} \sum_{i=1}^{p} \Big[ \log P(\hat{\beta}_{ix} \mid \theta^*_i) - \tau \hat{\omega}_{ii,x} \Big] + \sum_{x=0}^{q-1} \sum_{i<j} \log P(\hat{\omega}_{ij,x} \mid r^*_{ij,x}), \quad (S7)$$
where $\mathrm{tr}(A)$ is the trace of matrix $A$ and $\det(A)$ its determinant. Setting a Laplace spike-and-slab prior on $\omega_{ij,x} \mid r_{ij,x}$ as in Yang et al.
[2021], and a normal spike-and-slab on $\beta_{ix} \mid \theta_i$ as in Rockova and George [2014], (S7) becomes
$$Q \propto \frac{1}{2} \sum_{x=0}^{q-1} \Bigg[ n_x \log\big(\det(\hat{\Omega}_x)\big) - \mathrm{tr}\Bigg( \sum_{k=1}^{n_x} \big(Y_{x,k} - (\hat{\alpha} + \hat{\beta}_x)\big) \big(Y_{x,k} - (\hat{\alpha} + \hat{\beta}_x)\big)^\top \hat{\Omega}_x \Bigg) \Bigg] - \sum_{x=0}^{q-1} \sum_{i=1}^{p} \Bigg[ \frac{1}{2} \hat{\beta}_{ix}^2\, \mathrm{E}_{\theta \mid \hat{\Delta}, Y_{Vx}}\Big[ \frac{\theta_i}{\lambda_1} + \frac{1 - \theta_i}{\lambda_0} \Big] + \tau \hat{\omega}_{ii,x} \Bigg] - \sum_{x=0}^{q-1} \sum_{i<j} |\hat{\omega}_{ij,x}|\, \mathrm{E}_{R \mid \hat{\Delta}, Y_{Vx}}\Big[ \frac{r_{ij,x}}{\nu_1} + \frac{1 - r_{ij,x}}{\nu_0} \Big]. \quad (S8)$$
Equation (S8) only depends on $(\theta, R)$ through
$$\mathrm{E}_{\theta \mid \hat{\Delta}, Y_{Vx}}\Big[ \frac{\theta_i}{\lambda_1} + \frac{1 - \theta_i}{\lambda_0} \Big] = \frac{\mathrm{E}_{\theta \mid \hat{\Delta}, Y_{Vx}}[\theta_i]}{\lambda_1} + \frac{\mathrm{E}_{\theta \mid \hat{\Delta}, Y_{Vx}}[1 - \theta_i]}{\lambda_0} = \frac{\theta^*_i}{\lambda_1} + \frac{1 - \theta^*_i}{\lambda_0} \quad (S9)$$
and
$$\mathrm{E}_{R \mid \hat{\Delta}, Y_{Vx}}\Big[ \frac{r_{ij,x}}{\nu_1} + \frac{1 - r_{ij,x}}{\nu_0} \Big] = \frac{\mathrm{E}_{R \mid \hat{\Delta}, Y_{Vx}}[r_{ij,x}]}{\nu_1} + \frac{\mathrm{E}_{R \mid \hat{\Delta}, Y_{Vx}}[1 - r_{ij,x}]}{\nu_0} = \frac{r^*_{ij,x}}{\nu_1} + \frac{1 - r^*_{ij,x}}{\nu_0}. \quad (S10)$$
The model hierarchy separates $\theta$ from the data $Y_{Vx}$ through the coefficients $\beta$ and the precision $\Omega$, so that $P[\theta \mid \hat{\Delta}, Y_{Vx}] = P[\theta \mid \hat{\Delta}] = P[\theta \mid \beta, \Omega]$. This leads to
$$\theta^*_i = \mathrm{E}_{\theta \mid \hat{\Delta}, Y_{Vx}}[\theta_i] = P[\theta_i = 1 \mid \beta_i, \omega_{i\cdot}] = \frac{P[\beta_i, \omega_{i\cdot} \mid \theta_i = 1]\, P[\theta_i = 1]}{P[\beta_i, \omega_{i\cdot} \mid \theta_i = 1]\, P[\theta_i = 1] + P[\beta_i, \omega_{i\cdot} \mid \theta_i = 0]\, P[\theta_i = 0]} = \frac{p_2\, P[\omega_{i\cdot} \mid \theta_i = 1] \prod_{x=0}^{q-1} P_1(\beta_{ix})}{p_2\, P[\omega_{i\cdot} \mid \theta_i = 1] \prod_{x=0}^{q-1} P_1(\beta_{ix}) + (1 - p_2)\, P[\omega_{i\cdot} \mid \theta_i = 0] \prod_{x=0}^{q-1} P_0(\beta_{ix})}. \quad (S11)$$
The term $P[\omega_{i\cdot} \mid \theta_i]$ in (S11) is
$$P(\omega_{i\cdot} \mid \theta_i) = \prod_j \sum_{\theta_j} \sum_{r_{ij,0}, \ldots, r_{ij,q-1}} \sum_{\gamma_{ij}} \Bigg[ \Bigg( \prod_{x=0}^{q-1} P(\omega_{ij,x} \mid r_{ij,x}) \Bigg) P(r_{ij,0}, \ldots, r_{ij,q-1} \mid \theta_i, \theta_j, \gamma_{ij})\, P(\theta_j)\, P(\gamma_{ij}) \Bigg]$$
$$= \prod_j \sum_{\theta_j} \sum_{r_{ij,0}, \ldots, r_{ij,q-1}} \sum_{\gamma_{ij}} \Bigg[ \Bigg( \prod_{x=0}^{q-1} \big( r_{ij,x} P_1(\omega_{ij,x}) + (1 - r_{ij,x}) P_0(\omega_{ij,x}) \big) \Bigg) \Bigg( \gamma_{ij} \theta_i \theta_j \prod_{x=0}^{q-1} p_3^{r_{ij,x}} (1 - p_3)^{1 - r_{ij,x}} + (1 - \gamma_{ij}) \prod_{x=0}^{q-1} \delta_0(r_{ij,x}) + \gamma_{ij} (1 - \theta_i \theta_j)\, p_4^{r_{ij,0}} (1 - p_4)^{1 - r_{ij,0}}\, \delta_{(r_{ij,1}, \ldots, r_{ij,q-1})}(r_{ij,0}) \Bigg) \big( p_2^{\theta_j} (1 - p_2)^{1 - \theta_j} \big) \big( p_1^{\gamma_{ij}} (1 - p_1)^{1 - \gamma_{ij}} \big) \Bigg]$$
and, summing over $\gamma_{ij}$,
$$= \prod_j \sum_{\theta_j} \sum_{r_{ij,0}, \ldots, r_{ij,q-1}} \Bigg[ \Bigg( \prod_{x=0}^{q-1} \big( r_{ij,x} P_1(\omega_{ij,x}) + (1 - r_{ij,x}) P_0(\omega_{ij,x}) \big) \Bigg) \big( p_2^{\theta_j} (1 - p_2)^{1 - \theta_j} \big) \Bigg( p_1 \theta_i \theta_j \prod_{x=0}^{q-1} p_3^{r_{ij,x}} (1 - p_3)^{1 - r_{ij,x}} + p_1 (1 - \theta_i \theta_j)\, p_4^{r_{ij,0}} (1 - p_4)^{1 - r_{ij,0}}\, \delta_{(r_{ij,1}, \ldots, r_{ij,q-1})}(r_{ij,0}) + (1 - p_1) \prod_{x=0}^{q-1} \delta_0(r_{ij,x}) \Bigg) \Bigg]. \quad (S12)$$
Then, setting $\theta_i = 1$ and summing over $\theta_j$,
$$P(\omega_{i\cdot} \mid \theta_i = 1) = \prod_j \sum_{r_{ij,0}, \ldots, r_{ij,q-1}} \Bigg[ \Bigg( \prod_{x=0}^{q-1} \big( r_{ij,x} P_1(\omega_{ij,x}) + (1 - r_{ij,x}) P_0(\omega_{ij,x}) \big) \Bigg) \Bigg( p_1 p_2 \prod_{x=0}^{q-1} p_3^{r_{ij,x}} (1 - p_3)^{1 - r_{ij,x}} + p_1 (1 - p_2)\, p_4^{r_{ij,0}} (1 - p_4)^{1 - r_{ij,0}}\, \delta_{(r_{ij,1}, \ldots, r_{ij,q-1})}(r_{ij,0}) + (1 - p_1) \prod_{x=0}^{q-1} \delta_0(r_{ij,x}) \Bigg) \Bigg]$$
and, summing over the indicators,
$$= \prod_j \Bigg[ \prod_{x=0}^{q-1} P_0(\omega_{ij,x}) \big\{ p_1 p_2 (1 - p_3)^q + p_1 (1 - p_2)(1 - p_4) + (1 - p_1) \big\} + \prod_{x=0}^{q-1} P_1(\omega_{ij,x}) \big\{ p_1 p_2 p_3^q + p_1 (1 - p_2) p_4 \big\} + p_1 p_2 \sum_{k=1}^{q-1} p_3^k (1 - p_3)^{q-k} \sum_{n \in A_q^{(k)}} \prod_{l=0}^{q-1} P_{n_l}(\omega_{ij,l}) \Bigg], \quad (S13)$$
, 𝑛 𝑞 − 1 : 𝑛 𝑙 ∈ { 0 , 1 } f or all 0 ≤ 𝑙 ≤ 𝑞 − 1 and Í 𝑞 − 1 𝑙 = 0 = 𝑘 ) } , the set of { 0 , 1 } -valued binary seq uences of length q, with 𝑘 elements with P 1 ( 𝜔 𝑖 𝑗 , 𝑘 ) , 𝑘 = 1 , . . . , 𝑞 − 1 . W e note that the cardinality of 𝐴 ( 𝑘 ) 𝑞 is # 𝐴 ( 𝑘 ) 𝑞 =  𝑞 𝑘  and Í 𝑞 𝑘 = 0 # 𝐴 ( 𝑘 ) 𝑞 = Í 𝑞 𝑘 = 0  𝑞 𝑘  = 2 𝑞 . 6 P ( 𝜔 𝑖 · | 𝜃 𝑖 = 0 ) =  𝑗  𝜃 𝑗  𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 " 𝑞 − 1 Ö 𝑥 = 0 𝑟 𝑖 𝑗 , 𝑥 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) + ( 1 − 𝑟 𝑖 𝑗 , 𝑥 ) P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ! ∗  𝑝 𝜃 𝑗 2 ( 1 − 𝑝 2 ) 1 − 𝜃 𝑗  ∗ ( ( 1 − 𝑝 1 ) 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) ! + 𝑝 1 𝑝 𝑟 𝑖 𝑗 , 0 4 ( 1 − 𝑝 4 ) 1 − 𝑟 𝑖 𝑗 , 0 𝛿 𝑟 𝑖 𝑗 , 1 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 ( 𝑟 𝑖 𝑗 , 0 ) ) # =  𝑗  𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 " 𝑞 − 1 Ö 𝑥 = 0 𝑟 𝑖 𝑗 , 𝑥 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) + ( 1 − 𝑟 𝑖 𝑗 , 𝑥 ) P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ! ∗ ( ( 1 − 𝑝 1 ) 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) ! + 𝑝 1 𝑝 𝑟 𝑖 𝑗 , 0 4 ( 1 − 𝑝 4 ) 1 − 𝑟 𝑖 𝑗 , 0 𝛿 𝑟 𝑖 𝑗 , 1 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 ( 𝑟 𝑖 𝑗 , 0 ) ) # =  𝑗 " 𝑞 − 1 Ö 𝑥 = 0 P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ( 𝑝 1 ( 1 − 𝑝 4 ) + ( 1 − 𝑝 1 ) ) + 𝑞 − 1 Ö 𝑥 = 0 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) 𝑝 1 𝑝 4 # (S14) 7 For instance, assuming 𝑞 = 4 , Eq uation (S13) and (S14) are of the f or m P ( 𝜔 𝑖 · | 𝜃 𝑖 = 1 ) =  𝑗 " P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) n 𝑝 1 𝑝 2 ( 1 − 𝑝 3 ) 4 + 𝑝 1 ( 1 − 𝑝 2 ) ( 1 − 𝑝 4 ) + ( 1 − 𝑝 1 ) 𝑝 2 o + P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 ) n 𝑝 1 𝑝 2 𝑝 4 3 + 𝑝 1 ( 1 − 𝑝 2 ) 𝑝 4 o + 𝑝 1 𝑝 2 n 𝑝 3 ( 1 − 𝑝 3 ) 3  P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) + P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) + P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) + P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 )  + 𝑝 2 3 ( 1 − 𝑝 3 ) 2  P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) + P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) + P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 
, 3 ) + P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) + P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 ) + P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 )  + 𝑝 3 3 ( 1 − 𝑝 3 )  P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 ) + P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 ) + P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 ) + P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 )  o # , P ( 𝜔 𝑖 · | 𝜃 𝑖 = 0 ) =  𝑗 " P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) ( 𝑝 1 ( 1 − 𝑝 4 ) + ( 1 − 𝑝 1 ) ) + P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 ) 𝑝 1 𝑝 4 # . (S15) 8 The conditional e xpectation E 𝑅 | b Δ ,𝑌 𝑉 𝑥 [ 𝑟 𝑖 𝑗 , 𝑥 ] = 𝑟 ∗ 𝑖 𝑗 , 𝑥 is 𝑟 ∗ 𝑖 𝑗 , 𝑥 = E 𝑅 | b Δ ,𝑌 𝑉 𝑥 [ 𝑟 𝑖 𝑗 , 𝑥 ] = E 𝑅 | b Δ [ 𝑟 𝑖 𝑗 , 𝑥 ] = P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | Δ ) = P ( 𝛾 𝑖 𝑗 = 1 | Δ ) ∗ h P ( 𝜃 𝑖 = 1 | Δ ) P ( 𝜃 𝑗 = 1 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 1 , 𝜃 𝑗 = 1 , Δ ) + P ( 𝑟 𝑖 𝑗 , / 𝑥 = 1 | Δ ) n P ( 𝜃 𝑖 = 1 | Δ ) P ( 𝜃 𝑗 = 0 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 1 , 𝜃 𝑗 = 0 , Δ ) + P ( 𝜃 𝑖 = 0 | Δ ) P ( 𝜃 𝑗 = 1 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 0 , 𝜃 𝑗 = 1 , Δ ) + P ( 𝜃 𝑖 = 0 | Δ ) P ( 𝜃 𝑗 = 0 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 0 , 𝜃 𝑗 = 0 , Δ ) o i (S16) 1 . P ( 𝛾 i j = 1 | 𝚫 ) in Equation (S16) is P ( 𝛾 𝑖 𝑗 = 1 | Δ ) = P ( Δ | 𝛾 𝑖 𝑗 = 1 ) P ( 𝛾 𝑖 𝑗 = 1 ) P ( Δ | 𝛾 𝑖 𝑗 = 1 ) P ( 𝛾 𝑖 𝑗 = 1 ) + P ( Δ | 𝛾 𝑖 𝑗 = 0 ) P ( 𝛾 𝑖 𝑗 = 0 ) = P ( Δ | 𝛾 𝑖 𝑗 = 1 ) 𝑝 1 P ( Δ | 𝛾 𝑖 𝑗 = 1 ) 𝑝 1 + P ( Δ | 𝛾 𝑖 𝑗 = 0 ) ( 1 − 𝑝 1 ) . , (S17) P ( Δ | 𝛾 𝑖 𝑗 ) =  𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1  𝜃 𝑖  𝜃 𝑗 " 𝑞 − 1 Ö 𝑥 = 0 P ( 𝜔 𝑖 𝑗 , 𝑥 | 𝑟 𝑖 𝑗 , 𝑥 ) ! P ( 𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 𝑥 | 𝜃 𝑖 , 𝜃 𝑗 , 𝛾 𝑖 𝑗 ) ∗ 𝑞 − 1 Ö 𝑥 = 0 P ( 𝛽 𝑖 , 𝑥 | 𝜃 𝑖 ) P ( 𝛽 𝑗 , 𝑥 | 𝜃 𝑗 ) ! P ( 𝜃 𝑖 ) P ( 𝜃 𝑗 ) # , (S18) 9 P ( Δ | 𝛾 𝑖 𝑗 ) =  𝑟 𝑖 𝑗 , 0 , . . . 
, 𝑟 𝑖 𝑗 , 𝑞 − 1  𝜃 𝑖  𝜃 𝑗 " 𝑞 − 1 Ö 𝑥 = 0 𝑟 𝑖 𝑗 , 𝑥 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) + ( 1 − 𝑟 𝑖 𝑗 , 𝑥 ) P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ! ∗ 𝛾 𝑖 𝑗 𝜃 𝑖 𝜃 𝑗 𝑞 − 1 Ö 𝑥 = 0 𝑝 𝑟 𝑖 𝑗 , 𝑥 3 ( 1 − 𝑝 3 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 ! + ( 1 − 𝛾 𝑖 𝑗 ) 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) ! + 𝛾 𝑖 𝑗 ( 1 − 𝜃 𝑖 𝜃 𝑗 ) 𝑝 𝑟 𝑖 𝑗 , 0 4 ( 1 − 𝑝 4 ) 1 − 𝑟 𝑖 𝑗 , 0 𝛿 𝑟 𝑖 𝑗 , 1 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 ( 𝑟 𝑖 𝑗 , 0 ) ! ∗ 𝑞 − 1 Ö 𝑥 = 0  𝜃 𝑖 P 1 ( 𝛽 𝑖 𝑥 ) + ( 1 − 𝜃 𝑖 ) P 0 ( 𝛽 𝑖 𝑥 )   𝜃 𝑗 P 1 ( 𝛽 𝑗 𝑥 ) + ( 1 − 𝜃 𝑗 ) P 0 ( 𝛽 𝑗 𝑥 )  ! ∗ 𝑝 𝜃 𝑖 2 ( 1 − 𝑝 2 ) 1 − 𝜃 𝑖 𝑝 𝜃 𝑗 2 ( 1 − 𝑝 2 ) 1 − 𝜃 𝑗 # =  𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 " 𝑞 − 1 Ö 𝑥 = 0 𝑟 𝑖 𝑗 , 𝑥 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) + ( 1 − 𝑟 𝑖 𝑗 , 𝑥 ) P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ! ∗ ( 𝑝 2 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  ∗ 𝛾 𝑖 𝑗 𝑞 − 1 Ö 𝑥 = 0 𝑝 𝑟 𝑖 𝑗 , 𝑥 3 ( 1 − 𝑝 3 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 ! + ( 1 − 𝛾 𝑖 𝑗 ) 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) ! ! + ( 1 − 𝑝 2 ) 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  + 𝑝 2 ( 1 − 𝑝 2 ) 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  ! ! ∗ ( 1 − 𝛾 𝑖 𝑗 ) 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) ! + 𝛾 𝑖 𝑗 𝑝 𝑟 𝑖 𝑗 , 0 4 ( 1 − 𝑝 4 ) 1 − 𝑟 𝑖 𝑗 , 0 𝛿 𝑟 𝑖 𝑗 , 1 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 ( 𝑟 𝑖 𝑗 , 0 ) ! ) # (S19) 10 Set 𝛾 𝑖 𝑗 = 0 and let 𝑔 0 ( 𝜔 𝑖 𝑗 ) = P ( Δ | 𝛾 𝑖 𝑗 = 0 ) 𝑔 0 ( 𝜔 𝑖 𝑗 ) =  𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 " 𝑞 − 1 Ö 𝑥 = 0 𝑟 𝑖 𝑗 , 𝑥 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) + ( 1 − 𝑟 𝑖 𝑗 , 𝑥 ) P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ! ∗ ( 𝑝 2 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  ∗ 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) + 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) ! ∗ ( 1 − 𝑝 2 ) 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  + 𝑝 2 ( 1 − 𝑝 2 ) 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  ! ! ) # = 𝑞 − 1 Ö 𝑥 = 0 P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) " 𝑝 2 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + ( 1 − 𝑝 2 ) 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  + 𝑝 2 ( 1 − 𝑝 2 ) ∗ 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  ! ! # . 
(S20) Set 𝛾 𝑖 𝑗 = 1 and let 𝑔 1 ( 𝜔 𝑖 𝑗 ) = P ( Δ | 𝛾 𝑖 𝑗 = 1 ) 𝑔 1 ( 𝜔 𝑖 𝑗 ) =  𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 " 𝑞 − 1 Ö 𝑥 = 0 𝑟 𝑖 𝑗 , 𝑥 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) + ( 1 − 𝑟 𝑖 𝑗 , 𝑥 ) P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ! ∗ ( 𝑝 2 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  ∗ 𝑞 − 1 Ö 𝑥 = 0 𝑝 𝑟 𝑖 𝑗 , 𝑥 3 ( 1 − 𝑝 3 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 ! + ( 1 − 𝑝 2 ) 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  + 𝑝 2 ( 1 − 𝑝 2 ) 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  ! ! ∗ 𝑝 𝑟 𝑖 𝑗 , 0 4 ( 1 − 𝑝 4 ) 1 − 𝑟 𝑖 𝑗 , 0 𝛿 𝑟 𝑖 𝑗 , 1 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 ( 𝑟 𝑖 𝑗 , 0 ) ! ) # 11 𝑔 1 ( 𝜔 𝑖 𝑗 ) = 𝑞 − 1 Ö 𝑥 = 0 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) ∗ ( 𝑝 𝑞 3 ∗ 𝑝 2 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + 𝑝 4 ∗ ( 1 − 𝑝 2 ) 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  + 𝑝 2 ( 1 − 𝑝 2 ) 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  ! ! ) + 𝑞 − 1 Ö 𝑥 = 0 P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ∗ ( ( 1 − 𝑝 3 ) 𝑞 ∗ 𝑝 2 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + ( 1 − 𝑝 4 ) ∗ ( 1 − 𝑝 2 ) 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  + 𝑝 2 ( 1 − 𝑝 2 ) 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  ! ! ) + 𝑝 2 2 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  𝑞 − 1  𝑘 = 1 𝑝 𝑘 3 ( 1 − 𝑝 3 ) 𝑞 − 𝑘  𝑛 ∈ 𝐴 ( 𝑘 ) 𝑞 𝑞 − 1 Ö 𝑙 = 0 P 𝑛 𝑙 ( 𝜔 𝑖 𝑗 ,𝑙 ) . (S21) 12 2 . 
P ( r i j , x = 1 | 𝛾 i j = 1 , 𝜃 i , 𝜃 j , 𝚫 ) = h 𝜃 i , 𝜃 j ( 𝜔 i j , x ) is equal to = P ( Δ | 𝑟 𝑖 𝑗 , 𝑥 = 1 , 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) h P ( Δ | 𝑟 𝑖 𝑗 , 𝑥 = 1 , 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) + P ( Δ | 𝑟 𝑖 𝑗 , 𝑥 = 0 , 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 0 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) i − 1 = P ( 𝜔 𝑖 𝑗 , 𝑥 | 𝑟 𝑖 𝑗 , 𝑥 = 1 , 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) h P ( 𝜔 𝑖 𝑗 , 𝑥 | 𝑟 𝑖 𝑗 , 𝑥 = 1 , 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) + P ( 𝜔 𝑖 𝑗 , 𝑥 | 𝑟 𝑖 𝑗 , 𝑥 = 0 , 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 0 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) i − 1 = P ( 𝜔 𝑖 𝑗 , 𝑥 | 𝑟 𝑖 𝑗 , 𝑥 = 1 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) h P ( 𝜔 𝑖 𝑗 , 𝑥 | 𝑟 𝑖 𝑗 , 𝑥 = 1 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) + P ( 𝜔 𝑖 𝑗 , 𝑥 | 𝑟 𝑖 𝑗 , 𝑥 = 0 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 0 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) i − 1 (S22) Setting P ( 𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 1 | 𝛾 𝑖 𝑗 , 𝜃 𝑖 , 𝜃 𝑗 ) as defined in Equation ( ?? ) w e compute P ( 𝑟 𝑖 𝑗 , 𝑥 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) =  𝑟 𝑖 𝑗 , / 𝑥 " 𝜃 𝑖 𝜃 𝑗 𝑞 − 1 Ö 𝑥 = 0 𝑝 𝑟 𝑖 𝑗 , 𝑥 3 ( 1 − 𝑝 3 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 ! 
+ ( 1 − 𝜃 𝑖 𝜃 𝑗 ) 𝑝 𝑟 𝑖 𝑗 , 𝑥 4 ( 1 − 𝑝 4 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 𝛿 𝑟 𝑖 𝑗 , / 𝑥 ( 𝑟 𝑖 𝑗 , 𝑥 ) # = 𝜃 𝑖 𝜃 𝑗 𝑝 𝑟 𝑖 𝑗 , 𝑥 3 ( 1 − 𝑝 3 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 + ( 1 − 𝜃 𝑖 𝜃 𝑗 )  𝑟 𝑖 𝑗 , / 𝑥 h 𝑝 𝑟 𝑖 𝑗 , 𝑥 4 ( 1 − 𝑝 4 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 𝛿 𝑟 𝑖 𝑗 , / 𝑥 ( 𝑟 𝑖 𝑗 , 𝑥 ) i = 𝜃 𝑖 𝜃 𝑗 𝑝 𝑟 𝑖 𝑗 , 𝑥 3 ( 1 − 𝑝 3 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 + ( 1 − 𝜃 𝑖 𝜃 𝑗 ) h 𝑝 𝑟 𝑖 𝑗 , 𝑥 4 + ( 1 − 𝑝 4 ) ( 1 − 𝑟 𝑖 𝑗 , 𝑥 ) i = 𝜃 𝑖 𝜃 𝑗 Bernoulli ( 𝑟 𝑖 𝑗 , 𝑥 | 𝑝 3 ) + ( 1 − 𝜃 𝑖 𝜃 𝑗 ) Ber noulli ( 𝑟 𝑖 𝑗 , 𝑥 | 𝑝 4 ) (S23) P ( 𝑟 𝑖 𝑗 , 𝑥 | 𝛾 𝑖 𝑗 = 0 , 𝜃 𝑖 , 𝜃 𝑗 ) =  𝑟 𝑖 𝑗 , / 𝑥 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) (S24) 13 Substituting Equation (S23), Equation (S22) ℎ 𝜃 𝑖 , 𝜃 𝑗 ( 𝜔 𝑖 𝑗 , 𝑥 ) = P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) h 𝜃 𝑖 𝜃 𝑗 𝑝 3 + ( 1 − 𝜃 𝑖 𝜃 𝑗 ) 𝑝 4 i P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) h 𝜃 𝑖 𝜃 𝑗 𝑝 3 + ( 1 − 𝜃 𝑖 𝜃 𝑗 ) 𝑝 4 i + P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) h 𝜃 𝑖 𝜃 𝑗 ( 1 − 𝑝 3 ) + ( 1 − 𝜃 𝑖 𝜃 𝑗 ) ( 1 − 𝑝 4 ) i (S25) Thus 𝑟 ∗ 𝑖 𝑗 , 𝑥 is equal to 𝑟 ∗ 𝑖 𝑗 , 𝑥 = P ( 𝛾 𝑖 𝑗 = 1 | Δ ) ∗ h P ( 𝜃 𝑖 = 1 | Δ ) P ( 𝜃 𝑗 = 1 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 1 , 𝜃 𝑗 = 1 , Δ ) + P ( 𝑟 𝑖 𝑗 , / 𝑥 = 1 | Δ ) n P ( 𝜃 𝑖 = 1 | Δ ) P ( 𝜃 𝑗 = 0 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 1 , 𝜃 𝑗 = 0 , Δ ) + P ( 𝜃 𝑖 = 0 | Δ ) P ( 𝜃 𝑗 = 1 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 0 , 𝜃 𝑗 = 1 , Δ ) + P ( 𝜃 𝑖 = 0 | Δ ) P ( 𝜃 𝑗 = 0 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 0 , 𝜃 𝑗 = 0 , Δ ) o i = 𝛾 ∗ 𝑖 𝑗 " 𝜃 ∗ 𝑖 𝜃 ∗ 𝑗 ℎ 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) + 𝑟 ∗ 𝑖 𝑗 , / 𝑥 ℎ 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) n 𝜃 ∗ 𝑖 ( 1 − 𝜃 ∗ 𝑗 ) + ( 1 − 𝜃 ∗ 𝑖 ) 𝜃 ∗ 𝑗 + ( 1 − 𝜃 ∗ 𝑖 ) ( 1 − 𝜃 ∗ 𝑗 ) o # (S26) In which 𝛾 ∗ 𝑖 𝑗 = 𝑔 1 ( 𝜔 𝑖 𝑗 ) 𝑝 1 𝑔 1 ( 𝜔 𝑖 𝑗 ) 𝑝 1 + 𝑔 0 ( 𝜔 𝑖 𝑗 ) ( 1 − 𝑝 1 ) , (S27) From Equation (S25) w e see that ℎ 1 , 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) = ℎ 0 , 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) = ℎ 0 , 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) = ℎ 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) . W e then defined ℎ 1 , 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) = ℎ 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) , g etting: ℎ 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) = P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) 𝑝 3 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) 𝑝 3 + P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ( 1 − 𝑝 3 ) ℎ 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) = P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) 𝑝 4 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) 𝑝 4 + P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ( 1 − 𝑝 4 ) (S28) In the M-step we maximise 𝑄 ( Δ ) w .r .t. ( Ω 𝑥 , 𝛽 𝑥 , 𝛼 ) . 
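To make the E-step updates concrete, the sketch below (our own illustration, not the authors' code) evaluates the edge-inclusion probabilities $h_1$ and $h_0$ of Equation (S28), assuming Laplace spike and slab densities for $\mathrm{P}_0$ and $\mathrm{P}_1$ with scales `nu0` and `nu1`; all variable names and numeric values are illustrative.

```python
import numpy as np

def laplace_pdf(w, scale):
    # Mean-zero Laplace density; the slab P_1 uses scale nu1, the spike P_0 scale nu0.
    return np.exp(-np.abs(w) / scale) / (2.0 * scale)

def h(omega_ij_x, p_edge, nu0, nu1):
    # Posterior inclusion probability of r_ij,x given omega_ij,x, shape of (S28):
    #   h(w) = P_1(w) p / (P_1(w) p + P_0(w) (1 - p))
    num = laplace_pdf(omega_ij_x, nu1) * p_edge
    den = num + laplace_pdf(omega_ij_x, nu0) * (1.0 - p_edge)
    return num / den

# h_1 uses p3 (both nodes active); h_0 uses p4 (otherwise), as in (S28).
h1 = h(0.4, p_edge=0.5, nu0=0.1, nu1=1.0)
h0 = h(0.4, p_edge=0.1, nu0=0.1, nu1=1.0)
```

With these illustrative scales, $h_1 > h_0$ for the same $\omega_{ij,x}$, reflecting that an edge is more plausible a priori when both endpoints are active ($p_3 > p_4$).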
Without loss of generality, we assume the observations have been mean-centered, i.e. $\alpha = (0,\dots,0)^\top$. Let $Y_x \in \mathbb{R}^{n_x\times p}$ be the matrix whose $k$-th row is $Y_{x,k}$ for individual $k$, $\beta_x = [\beta_{ix}]_{i\in V} \in \mathbb{R}^{p\times 1}$, and $1_{n_x} \in \mathbb{R}^{1\times n_x}$ a row vector of ones. Setting $\mathbb{E}_{\theta\mid\widehat\Delta,Y_{V_x}}\big[\tfrac{\theta_i}{\lambda_1}+\tfrac{1-\theta_i}{\lambda_0}\big]=\tfrac{\theta^*_i}{\lambda_1}+\tfrac{1-\theta^*_i}{\lambda_0}$ and $\mathbb{E}_{R\mid\widehat\Delta,Y_{V_x}}\big[\tfrac{r_{ij,x}}{\nu_1}+\tfrac{1-r_{ij,x}}{\nu_0}\big]=\tfrac{r^*_{ij,x}}{\nu_1}+\tfrac{1-r^*_{ij,x}}{\nu_0}$, Equation (S8) can be written as:

$$
\begin{aligned}
Q \propto{}& \frac{1}{2}\sum_{x=0}^{q-1}\Big[n_x\log\det(\widehat\Omega_x)-\operatorname{tr}\Big\{\big(Y_x^\top-\widehat\beta_x 1_{n_x}\big)\big(Y_x^\top-\widehat\beta_x 1_{n_x}\big)^\top\widehat\Omega_x\Big\}\Big] \\
&-\sum_{x=0}^{q-1}\sum_{i=1}^{p}\Big[\frac{1}{2}\widehat\beta_{ix}^{\,2}\Big(\frac{\theta^*_i}{\lambda_1}+\frac{1-\theta^*_i}{\lambda_0}\Big)+\tau\,\widehat\omega_{ii,x}\Big]
-\sum_{x=0}^{q-1}\sum_{i<j}|\widehat\omega_{ij,x}|\Big(\frac{r^*_{ij,x}}{\nu_1}+\frac{1-r^*_{ij,x}}{\nu_0}\Big) \\
\propto{}& \frac{1}{2}\sum_{x=0}^{q-1}\Big[n_x\log\det(\widehat\Omega_x)-\operatorname{tr}\Big\{\big(Y_x^\top Y_x-2\widehat\beta_x 1_{n_x}Y_x+\widehat\beta_x 1_{n_x}1_{n_x}^\top\widehat\beta_x^\top\big)\widehat\Omega_x\Big\}\Big] \\
&-\sum_{x=0}^{q-1}\sum_{i=1}^{p}\Big[\frac{1}{2}\widehat\beta_{ix}^{\,2}\Big(\frac{\theta^*_i}{\lambda_1}+\frac{1-\theta^*_i}{\lambda_0}\Big)+\tau\,\widehat\omega_{ii,x}\Big]
-\sum_{x=0}^{q-1}\sum_{i<j}|\widehat\omega_{ij,x}|\Big(\frac{r^*_{ij,x}}{\nu_1}+\frac{1-r^*_{ij,x}}{\nu_0}\Big).
\end{aligned}
\tag{S29}
$$

To maximise $Q$ w.r.t. $\beta_x$, we set the partial derivative with respect to $\beta_x$ to 0:

$$
\frac{\partial Q}{\partial \beta_x}
= \widehat\Omega_x^\top Y_x^\top 1_{n_x}^\top
- \widehat\Omega_x^\top\widehat\beta_x 1_{n_x}1_{n_x}^\top
- \operatorname{diag}\Big\{\frac{\theta^*_1}{\lambda_1}+\frac{1-\theta^*_1}{\lambda_0},\ \dots,\ \frac{\theta^*_p}{\lambda_1}+\frac{1-\theta^*_p}{\lambda_0}\Big\}\,\widehat\beta_x
= 0.
\tag{S30}
$$

Solving Equation (S30),

$$
\widehat\beta_x = \big(n_x\widehat\Omega_x + D_{\Theta^*}\big)^{-1}\widehat\Omega_x Y_x^\top 1_{n_x}^\top,
\tag{S31}
$$

with $D_{\Theta^*} = \operatorname{diag}\big\{\tfrac{\theta^*_1}{\lambda_1}+\tfrac{1-\theta^*_1}{\lambda_0},\ \dots,\ \tfrac{\theta^*_p}{\lambda_1}+\tfrac{1-\theta^*_p}{\lambda_0}\big\}$. Equation (S31) has the form of a ridge regression estimator with penalty $D_{\Theta^*}$. Maximising $Q$ w.r.t. $\widehat\Omega_x$ for each $x = 0,\dots,q-1$ implies optimising the following objective function:

$$
Q(\widehat\Omega_x)
= \frac{n_x}{2}\log\det(\widehat\Omega_x)
- \frac{n_x}{2}\operatorname{tr}\big\{\widehat S_x\widehat\Omega_x\big\}
- \sum_{i=1}^{p}\tau\,\widehat\omega_{ii,x}
- \sum_{i<j}|\widehat\omega_{ij,x}|\Big(\frac{r^*_{ij,x}}{\nu_1}+\frac{1-r^*_{ij,x}}{\nu_0}\Big),
\tag{S32}
$$

with $\widehat S_x = \frac{1}{n_x}\sum_{k=1}^{n_x}(Y_{x,k}-\widehat\beta_x)(Y_{x,k}-\widehat\beta_x)^\top$.
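As a concrete check of the ridge form of the M-step update for $\beta_x$, the snippet below (an illustrative sketch with made-up values, not the authors' implementation) computes the update $\widehat\beta_x = (n_x\widehat\Omega_x + D_{\Theta^*})^{-1}\widehat\Omega_x Y_x^\top 1_{n_x}^\top$ for one level $x$; with $\widehat\Omega_x = I$ it reduces to coordinate-wise shrinkage of the column sums.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, p = 50, 5
Y_x = rng.normal(size=(n_x, p))   # observations at level x (rows = individuals)
Omega_x = np.eye(p)               # current precision estimate (identity for illustration)
theta_star = np.full(p, 0.5)      # E-step inclusion probabilities theta*_i
lam1, lam0 = 10.0, 0.1            # slab / spike variances of the beta prior

# Ridge penalty D_Theta* = diag{theta*_i/lam1 + (1 - theta*_i)/lam0}
D_theta = np.diag(theta_star / lam1 + (1 - theta_star) / lam0)

# beta_x = (n_x Omega_x + D_Theta*)^{-1} Omega_x Y_x^T 1
beta_x = np.linalg.solve(n_x * Omega_x + D_theta, Omega_x @ Y_x.T @ np.ones(n_x))
```

With $\widehat\Omega_x = I$ and equal $\theta^*_i$, each coordinate equals the column sum of $Y_x$ divided by $n_x + \theta^*_i/\lambda_1 + (1-\theta^*_i)/\lambda_0$, i.e. a shrunken sample mean.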
We optimise $Q(\widehat\Omega_x)$ subject to the constraints $\widehat\Omega_x \succ 0$ and $\|\widehat\Omega_x\|_2 \le B$, with $B$ reasonably large, so that the objective $Q(\widehat\Omega_x)$ is strictly convex and the local solution $\widehat\Omega_x$ is guaranteed to be the unique solution, as in Yang et al. [2021]. To optimise $Q(\widehat\Omega_x)$, we follow the algorithm suggested in Gan et al. [2019].

D Simulations: Data generating mechanism

We generate observations from a set $Y_{V|X}$ of random vectors associated with a profile undirected graph $\mathcal{G}_U$ with $p = 20, 50$ and $100$ nodes and $q = 4$ levels of $X$, such that $x \in \mathcal{X} = \{0,1,2,3\}$. Following Peterson et al. [2015], we first construct $\Omega_0$, the precision matrix of the baseline level $x = 0$. We set $\Omega_0$ to be a $p\times p$ symmetric matrix with main-diagonal entries $\omega_{aa,0} = a$, for $a = 1,\dots,p$, and off-diagonal entries $\omega_{(a+1)a,0} = \omega_{a(a+1),0} = 0.5$ for $a = 1,\dots,p-1$ and $\omega_{(a+2)a,0} = \omega_{a(a+2),0} = 0.4$ for $a = 1,\dots,p-2$. For all $a \in V$, we set both $\alpha_a$ and $\zeta_{a0}$ to zero. For $x \in \{1,2,3\}$, we set $\zeta_{ax} = 1$ for $a = 1,\dots,4$ and $\zeta_{ax} = 0$ for $a = 5,\dots,p$; i.e., the external factor $X$ affects only the first four response variables. The remaining precision matrices $\Omega_x$ for $x \in \{1,2,3\}$ are obtained as follows: first we set $\Omega_x = \Omega_0$, then each of its non-zero off-diagonal entries is set to zero independently with probability 0.5. We then change the sparsity level of the precision matrices, varying the $s$ parameter from 0.0010 to 0.0050 to obtain an increasing number of non-zero elements. Data are generated by drawing a random sample of size $n_x = 50$ from the distribution $\mathcal{N}(\beta_x, \Sigma_x)$, where $\Sigma_x = \Omega_x^{-1}$ and $\beta_x = \Sigma_x\zeta_x$, for all $x \in \mathcal{X}$.
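The data-generating mechanism of Section D can be sketched as follows (illustrative code for $p = 20$; the variable names are ours, not from the paper). The banded baseline $\Omega_0$ is diagonally dominant, so it and the knocked-out versions $\Omega_x$ remain positive definite.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n_x = 20, 4, 50

# Baseline precision: diagonal entries a = 1..p, first two off-diagonals 0.5 and 0.4
Omega0 = np.diag(np.arange(1.0, p + 1))
for a in range(p - 1):
    Omega0[a, a + 1] = Omega0[a + 1, a] = 0.5
for a in range(p - 2):
    Omega0[a, a + 2] = Omega0[a + 2, a] = 0.4

# External factor shifts only the first four variables at levels x = 1, 2, 3
zeta = np.zeros((q, p))
zeta[1:, :4] = 1.0

data = {}
for x in range(q):
    Omega_x = Omega0.copy()
    if x > 0:  # knock out each non-zero off-diagonal entry with probability 0.5
        for a in range(p):
            for b in range(a + 1, p):
                if Omega_x[a, b] != 0 and rng.random() < 0.5:
                    Omega_x[a, b] = Omega_x[b, a] = 0.0
    Sigma_x = np.linalg.inv(Omega_x)
    beta_x = Sigma_x @ zeta[x]
    data[x] = rng.multivariate_normal(beta_x, Sigma_x, size=n_x)
```

The sparsity parameter $s$ of the simulations is not varied here; this sketch only reproduces the baseline construction and the mean shift $\beta_x = \Sigma_x\zeta_x$.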
E More results

Table 1: Accuracy, sensitivity, specificity, balanced accuracy and AUC over 100 datasets, K = 4, N = 50. Scenario 1: different graphs.

                       Accuracy       Sensitivity    Specificity    AUC
                       Mean    SE     Mean    SE     Mean    SE     Mean    SE
p = 20,  S = 0.010
    BPUGM              0.914   0.000  0.837   0.026  0.914   0.000  0.875   0.006
    GemBag             0.996   0.000  0.572   0.064  1.000   0.000  0.786   0.016
    FGL                0.983   0.001  0.603   0.096  0.986   0.001  0.794   0.023
    GGL                0.982   0.001  0.665   0.083  0.984   0.001  0.825   0.020
p = 20,  S = 0.025
    BPUGM              0.899   0.000  0.431   0.007  0.915   0.000  0.673   0.002
    GemBag             0.972   0.000  0.182   0.007  1.000   0.000  0.591   0.002
    FGL                0.943   0.001  0.275   0.022  0.967   0.002  0.621   0.004
    GGL                0.948   0.001  0.298   0.021  0.971   0.002  0.635   0.004
p = 20,  S = 0.050
    BPUGM              0.804   0.000  0.195   0.002  0.915   0.000  0.555   0.000
    GemBag             0.851   0.000  0.033   0.000  0.999   0.000  0.516   0.000
    FGL                0.831   0.001  0.110   0.008  0.962   0.003  0.536   0.001
    GGL                0.828   0.001  0.129   0.007  0.955   0.003  0.542   0.001
p = 50,  S = 0.0010
    BPUGM              0.948   0.000  0.790   0.051  0.949   0.000  0.869   0.013
    GemBag             0.999   0.000  0.623   0.150  1.000   0.000  0.811   0.037
    FGL                0.998   0.000  0.353   0.190  0.998   0.000  0.676   0.047
    GGL                0.997   0.000  0.490   0.194  0.997   0.000  0.744   0.048
p = 50,  S = 0.0025
    BPUGM              0.948   0.000  0.596   0.018  0.949   0.000  0.773   0.004
    GemBag             0.998   0.000  0.362   0.023  1.000   0.000  0.681   0.006
    FGL                0.995   0.000  0.340   0.075  0.997   0.000  0.668   0.018
    GGL                0.995   0.000  0.380   0.061  0.997   0.000  0.688   0.015
p = 50,  S = 0.0050
    BPUGM              0.946   0.000  0.429   0.005  0.950   0.000  0.690   0.001
    GemBag             0.993   0.000  0.240   0.007  1.000   0.000  0.620   0.002
    FGL                0.988   0.000  0.140   0.015  0.995   0.000  0.568   0.003
    GGL                0.991   0.000  0.271   0.033  0.997   0.000  0.634   0.008
p = 100, S = 0.0010
    BPUGM              0.965   0.000  0.520   0.008  0.966   0.000  0.743   0.002
    GemBag             0.999   0.000  0.378   0.013  1.000   0.000  0.689   0.003
    FGL                0.999   0.000  0.223   0.047  1.000   0.000  0.611   0.012
    GGL                0.999   0.000  0.277   0.043  1.000   0.000  0.638   0.011
p = 100, S = 0.0025
    BPUGM              0.965   0.000  0.546   0.004  0.966   0.000  0.756   0.001
    GemBag             0.998   0.000  0.368   0.006  1.000   0.000  0.684   0.002
    FGL                0.997   0.000  0.285   0.031  0.999   0.000  0.642   0.008
    GGL                0.997   0.000  0.374   0.024  0.999   0.000  0.687   0.006
p = 100, S = 0.0050
    BPUGM              0.959   0.000  0.235   0.001  0.966   0.000  0.600   0.000
    GemBag             0.991   0.000  0.123   0.000  1.000   0.000  0.561   0.000
    FGL                0.989   0.000  0.100   0.004  0.998   0.000  0.549   0.001
    GGL                0.990   0.000  0.127   0.003  0.998   0.000  0.563   0.001

Table 2: Accuracy, sensitivity, specificity, balanced accuracy and AUC over 100 datasets, K = 4, N = 50. Scenario 2: {G(0) = G(1)} ≠ {G(2) = G(3)}.

                       Accuracy       Sensitivity    Specificity    AUC
                       Mean    SE     Mean    SE     Mean    SE     Mean    SE
p = 20,  S = 0.010
    BPUGM              0.913   0.000  0.837   0.025  0.914   0.000  0.875   0.006
    GemBag             0.996   0.000  0.567   0.052  1.000   0.000  0.783   0.013
    FGL                0.974   0.002  0.708   0.069  0.976   0.002  0.842   0.016
    GGL                0.968   0.002  0.708   0.084  0.970   0.002  0.839   0.019
p = 20,  S = 0.025
    BPUGM              0.894   0.000  0.399   0.006  0.914   0.000  0.657   0.001
    GemBag             0.967   0.000  0.174   0.006  1.000   0.000  0.587   0.002
    FGL                0.944   0.001  0.254   0.023  0.972   0.002  0.613   0.004
    GGL                0.946   0.001  0.266   0.017  0.974   0.002  0.620   0.003
p = 20,  S = 0.050
    BPUGM              0.804   0.000  0.213   0.001  0.915   0.000  0.564   0.000
    GemBag             0.849   0.000  0.048   0.000  0.999   0.000  0.524   0.000
    FGL                0.845   0.000  0.083   0.005  0.987   0.001  0.535   0.001
    GGL                0.837   0.001  0.097   0.006  0.976   0.001  0.537   0.001
p = 50,  S = 0.0010
    BPUGM              0.948   0.000  0.815   0.069  0.948   0.000  0.881   0.017
    GemBag             0.999   0.000  0.505   0.220  1.000   0.000  0.752   0.055
    FGL                0.998   0.000  0.225   0.158  0.998   0.000  0.612   0.039
    GGL                0.995   0.000  0.415   0.197  0.995   0.000  0.705   0.049
p = 50,  S = 0.0025
    BPUGM              0.948   0.000  0.601   0.014  0.949   0.000  0.775   0.004
    GemBag             0.998   0.000  0.383   0.017  1.000   0.000  0.691   0.004
    FGL                0.996   0.000  0.531   0.099  0.998   0.000  0.765   0.025
    GGL                0.995   0.000  0.446   0.079  0.997   0.000  0.722   0.020
p = 50,  S = 0.0050
    BPUGM              0.946   0.000  0.429   0.005  0.950   0.000  0.690   0.001
    GemBag             0.993   0.000  0.240   0.007  1.000   0.000  0.620   0.002
    FGL                0.989   0.000  0.136   0.014  0.996   0.000  0.566   0.003
    GGL                0.992   0.000  0.270   0.034  0.998   0.000  0.634   0.008
p = 100, S = 0.0010
    BPUGM              0.965   0.000  0.549   0.008  0.965   0.000  0.757   0.002
    GemBag             0.999   0.000  0.415   0.014  1.000   0.000  0.707   0.003
    FGL                0.999   0.000  0.528   0.038  0.999   0.000  0.764   0.009
    GGL                0.999   0.000  0.507   0.040  0.999   0.000  0.753   0.010
p = 100, S = 0.0025
    BPUGM              0.965   0.000  0.594   0.004  0.965   0.000  0.780   0.001
    GemBag             0.998   0.000  0.412   0.008  1.000   0.000  0.706   0.002
    FGL                0.998   0.000  0.423   0.044  0.999   0.000  0.711   0.011
    GGL                0.998   0.000  0.413   0.045  0.999   0.000  0.706   0.011
p = 100, S = 0.0050
    BPUGM              0.959   0.000  0.269   0.001  0.966   0.000  0.617   0.000
    GemBag             0.991   0.000  0.147   0.001  1.000   0.000  0.573   0.000
    FGL                0.990   0.000  0.240   0.012  0.997   0.000  0.619   0.003
    GGL                0.989   0.000  0.186   0.005  0.997   0.000  0.592   0.001

Table 3: Accuracy, sensitivity, specificity, balanced accuracy and AUC over 100 datasets, K = 4, N = 50. Scenario 3: {G(0) = G(1) = G(2)} ≠ {G(3)}.

                       Accuracy       Sensitivity    Specificity    AUC
                       Mean    SE     Mean    SE     Mean    SE     Mean    SE
p = 20,  S = 0.010
    BPUGM              0.913   0.000  0.843   0.039  0.913   0.000  0.878   0.010
    GemBag             0.998   0.000  0.677   0.091  1.000   0.000  0.839   0.023
    FGL                0.990   0.000  0.698   0.167  0.991   0.000  0.844   0.042
    GGL                0.985   0.001  0.703   0.164  0.986   0.001  0.844   0.040
p = 20,  S = 0.025
    BPUGM              0.892   0.000  0.297   0.005  0.915   0.000  0.606   0.001
    GemBag             0.967   0.000  0.139   0.004  0.999   0.000  0.569   0.001
    FGL                0.947   0.001  0.200   0.011  0.976   0.001  0.588   0.002
    GGL                0.948   0.001  0.198   0.011  0.978   0.001  0.588   0.002
p = 20,  S = 0.050
    BPUGM              0.797   0.000  0.156   0.001  0.914   0.000  0.535   0.000
    GemBag             0.849   0.000  0.023   0.000  1.000   0.000  0.511   0.000
    FGL                0.832   0.001  0.076   0.005  0.970   0.002  0.523   0.000
    GGL                0.826   0.001  0.085   0.006  0.960   0.002  0.523   0.000
p = 50,  S = 0.0010
    BPUGM              0.948   0.000  0.740   0.194  0.948   0.000  0.844   0.048
    GemBag             0.999   0.000  0.260   0.194  1.000   0.000  0.630   0.049
    FGL                0.999   0.000  0.040   0.039  1.000   0.000  0.520   0.010
    GGL                0.997   0.000  0.140   0.122  0.997   0.000  0.569   0.030
p = 50,  S = 0.0025
    BPUGM              0.948   0.000  0.626   0.017  0.949   0.000  0.788   0.004
    GemBag             0.998   0.000  0.394   0.023  1.000   0.000  0.697   0.006
    FGL                0.997   0.000  0.440   0.088  0.998   0.000  0.719   0.022
    GGL                0.995   0.000  0.438   0.074  0.997   0.000  0.717   0.018
p = 50,  S = 0.0050
    BPUGM              0.946   0.000  0.429   0.005  0.950   0.000  0.690   0.001
    GemBag             0.993   0.000  0.240   0.007  1.000   0.000  0.620   0.002
    FGL                0.989   0.000  0.136   0.014  0.996   0.000  0.566   0.003
    GGL                0.992   0.000  0.270   0.034  0.998   0.000  0.634   0.008
p = 100, S = 0.0010
    BPUGM              0.965   0.000  0.515   0.008  0.965   0.000  0.740   0.002
    GemBag             0.999   0.000  0.397   0.013  1.000   0.000  0.698   0.003
    FGL                0.999   0.000  0.485   0.039  0.999   0.000  0.742   0.010
    GGL                0.998   0.000  0.467   0.037  0.999   0.000  0.733   0.009
p = 100, S = 0.0025
    BPUGM              0.965   0.000  0.545   0.005  0.966   0.000  0.755   0.001
    GemBag             0.998   0.000  0.393   0.008  1.000   0.000  0.696   0.002
    FGL                0.998   0.000  0.419   0.064  0.999   0.000  0.709   0.016
    GGL                0.998   0.000  0.405   0.049  0.999   0.000  0.702   0.012
p = 100, S = 0.0050
    BPUGM              0.963   0.000  0.492   0.001  0.966   0.000  0.729   0.000
    GemBag             0.996   0.000  0.354   0.003  1.000   0.000  0.677   0.001
    FGL                0.992   0.000  0.493   0.046  0.995   0.000  0.744   0.011
    GGL                0.994   0.000  0.525   0.012  0.997   0.000  0.761   0.003

Table 4: Accuracy, sensitivity, specificity, balanced accuracy and AUC over 100 datasets, K = 4, N = 50. Scenario 4: same graph.

                       Accuracy       Sensitivity    Specificity    AUC
                       Mean    SE     Mean    SE     Mean    SE     Mean    SE
p = 20,  S = 0.010
    BPUGM              0.915   0.000  0.840   0.020  0.915   0.000  0.878   0.005
    GemBag             0.996   0.000  0.642   0.052  1.000   0.000  0.821   0.013
    FGL                0.966   0.001  0.575   0.092  0.970   0.001  0.773   0.020
    GGL                0.977   0.001  0.772   0.078  0.980   0.001  0.876   0.019
p = 20,  S = 0.025
    BPUGM              0.898   0.000  0.502   0.007  0.915   0.000  0.709   0.002
    GemBag             0.969   0.000  0.264   0.008  1.000   0.000  0.632   0.002
    FGL                0.907   0.003  0.420   0.026  0.928   0.003  0.674   0.004
    GGL                0.923   0.003  0.470   0.023  0.943   0.004  0.706   0.005
p = 20,  S = 0.050
    BPUGM              0.815   0.000  0.286   0.002  0.919   0.000  0.602   0.000
    GemBag             0.852   0.000  0.095   0.001  1.000   0.000  0.547   0.000
    FGL                0.792   0.001  0.319   0.009  0.884   0.002  0.602   0.001
    GGL                0.799   0.001  0.312   0.009  0.894   0.003  0.603   0.001
p = 50,  S = 0.0010
    BPUGM              0.950   0.000  0.812   0.038  0.950   0.000  0.881   0.009
    GemBag             0.999   0.000  0.693   0.089  1.000   0.000  0.846   0.022
    FGL                0.998   0.000  0.193   0.057  0.999   0.000  0.596   0.014
    GGL                0.999   0.000  0.590   0.212  0.999   0.000  0.795   0.053
p = 50,  S = 0.0025
    BPUGM              0.949   0.000  0.603   0.015  0.950   0.000  0.777   0.004
    GemBag             0.998   0.000  0.384   0.018  1.000   0.000  0.692   0.004
    FGL                0.994   0.000  0.193   0.032  0.997   0.000  0.595   0.008
    GGL                0.997   0.000  0.390   0.081  0.999   0.000  0.694   0.020
p = 50,  S = 0.0050
    BPUGM              0.946   0.000  0.429   0.005  0.950   0.000  0.690   0.001
    GemBag             0.993   0.000  0.240   0.007  1.000   0.000  0.620   0.002
    FGL                0.988   0.000  0.143   0.015  0.995   0.000  0.569   0.003
    GGL                0.992   0.000  0.275   0.033  0.997   0.000  0.636   0.008
p = 100, S = 0.0010
    BPUGM              0.966   0.000  0.609   0.010  0.966   0.000  0.787   0.002
    GemBag             0.999   0.000  0.466   0.012  1.000   0.000  0.733   0.003
    FGL                0.998   0.000  0.127   0.015  0.999   0.000  0.563   0.004
    GGL                0.999   0.000  0.444   0.069  1.000   0.000  0.722   0.017
p = 100, S = 0.0025
    BPUGM              0.965   0.000  0.591   0.003  0.966   0.000  0.779   0.001
    GemBag             0.998   0.000  0.432   0.005  1.000   0.000  0.716   0.001
    FGL                0.995   0.000  0.221   0.019  0.997   0.000  0.609   0.005
    GGL                0.997   0.000  0.530   0.046  0.999   0.000  0.764   0.012
p = 100, S = 0.0050
    BPUGM              0.960   0.000  0.313   0.001  0.966   0.000  0.640   0.000
    GemBag             0.991   0.000  0.190   0.001  1.000   0.000  0.595   0.000
    FGL                0.986   0.000  0.146   0.003  0.995   0.000  0.571   0.001
    GGL                0.990   0.000  0.264   0.009  0.998   0.000  0.631   0.002
