Profile Graphical Models


Authors: Alejandra Avalos-Pacheco, Monia Lupparelli, Francesco C. Stingo

Alejandra Avalos-Pacheco 1,2,*, Monia Lupparelli 3, and Francesco C. Stingo 3

1 Institute of Applied Statistics, Johannes Kepler University Linz
2 Harvard-MIT Center for Regulatory Science, Harvard Medical School
3 Depart. of Statistics, Computer Science, Applications "G. Parenti", University of Florence
* alejandra.avalos_pacheco@jku.at

October 2024

Abstract

We introduce a novel class of graphical models, termed profile graphical models, that represent, within a single graph, how an external factor influences the dependence structure of a multivariate set of variables. This class is quite general and includes multiple graphs and chain graphs as special cases. Profile graphical models capture the conditional distributions of a multivariate random vector given different levels of a risk factor, and learn how the conditional independence structure among variables may vary across these risk profiles; we formally define this family of models and establish their corresponding Markov properties. We derive key structural and probabilistic properties that underpin a more powerful inferential framework than existing approaches, underscoring that our contribution extends beyond a novel graphical representation. Furthermore, we show that the resulting profile undirected graphical models are independence-compatible with two-block LWF chain graph models. We then develop a Bayesian approach for Gaussian undirected profile graphical models based on continuous spike-and-slab priors to learn shared sparsity structures across different levels of the risk factor. We also design a fast EM algorithm for efficient inference. Inferential properties are explored through simulation studies, including the comparison with competing methods. The practical utility of this class of models is demonstrated through the analysis of protein network data from various subtypes of acute myeloid leukemia.
Our results show a more parsimonious network and greater patient heterogeneity than its competitors, highlighting its enhanced ability to capture subject-specific differences.

Keywords: Undirected graphs; Chain graphs; Context-specific independence; Multiple graphs; Differential graphs.

1 Introduction

Multivariate regression models can be represented by chain graphs [Lauritzen and Wermuth, 1989, Frydenberg, 1990, Andersson et al., 2001, Wermuth and Sadeghi, 2012], which capture the conditional independence structure between multiple response and explanatory variables. In their simplest form, responses and predictors form two distinct chain components, with missing edges corresponding to conditional independencies under suitable Markov properties. Among the various types of chain graphs [Drton, 2009], we focus on the LWF chain graph [Frydenberg, 1990], which defines a smooth statistical model and provides a flexible framework for modeling dependencies among outcomes given covariates. Model selection for chain graphs has attracted significant attention, with recent developments in penalized likelihood [Rothman et al., 2010, Yin and Li, 2011, Lee and Liu, 2012], two-step procedures [Cai et al., 2012, Chen et al., 2016], and Bayesian approaches [Bhadra and Mallick, 2013, Consonni et al., 2017]. However, these models remain limited when the goal is to characterize how an explanatory variable affects the joint dependence structure among outcomes, rather than each outcome individually. The conditional independence models encoded by chain graphs provide information only through missing edges, leaving unexplored how existing dependencies change with external factors. This issue is particularly relevant in situations where associations between variables may reverse or shift across different conditions, as in the well-known effect reversal [Cox and Wermuth, 2003] and Simpson's paradox [Simpson, 1951].
An alternative line of work addresses this issue indirectly by modeling subgroups or subpopulations through multiple graphical models [Guo et al., 2011, Danaher et al., 2014, Peterson et al., 2015] or context-specific independencies [Hojsgaard, 2003, Corander, 2003, Nyman et al., 2014, 2016]. Yet, these approaches typically do not incorporate external factors directly into the model and often restrict context-specific variations to adjacent vertices. Building on these ideas, we propose a novel class of graphical models, termed profile undirected graphs, which preserve the interpretability of chain graphs while extending them to model how dependence structures among responses vary with an external factor. Our contributions are fourfold: (i) we introduce profile undirected graphs as a general framework for modeling all profile outcome distributions, i.e., conditional distributions of responses given any level of a risk factor; (ii) we derive the corresponding Markov properties based on a unified connected set rule; (iii) we establish formal compatibility, in terms of independence models, between the proposed profile graphs and specific chain graph structures; (iv) we develop parameterizations for Gaussian models and incorporate continuous spike-and-slab priors to learn shared sparsity patterns across levels of the external factor. For efficient inference, we implement a fast EM algorithm and demonstrate the usefulness of our proposed methodology through extensive simulations, including comparison with competing methods, and a cancer genomics application. We provide a profile graphical model of protein networks that evolve across disease subtypes, thereby uncovering subtype-specific dependency patterns invisible to standard chain graph or multiple graph analysis.
2 Theoretical framework: basic setup

Let 𝐺 = (𝑉, 𝐸) be a graph defined by a set of vertices 𝑎 ∈ 𝑉 and a set of edges (𝑎, 𝑏) ∈ 𝐸 joining pairs of vertices 𝑎, 𝑏 ∈ 𝑉, and let 𝑌_𝑉 = (𝑌_𝑎)_{𝑎 ∈ 𝑉} be a random vector of variables indexed by the finite set 𝑉, with 𝑝 = |𝑉|. A graph associated to a random vector 𝑌_𝑉 is generally used to represent conditional independence structures under suitable Markov properties. Typically, missing edges in 𝐺 correspond to conditional independencies for the joint distribution of 𝑌_𝑉. Also, let us consider the random categorical variable 𝑋 with strictly positive probability distribution representing an external factor with respect to (in the sequel, wrt) the random vector 𝑌_𝑉 of outcome/response variables. The variable 𝑋 takes level 𝑥 ∈ X, with 𝑞 = |X|. Our interest lies in the effect of 𝑋 on the joint independence structure of 𝑌_𝑉 and, in particular, in exploring via a graphical modelling approach how this structure may change under different levels 𝑥 ∈ X, which we call profiles.

Chain graphs are generally used to model the effects of background variables on joint response variables. In the simplest form, a two-block chain graph 𝐶 = [{𝐶_1, 𝐶_2}, 𝐸] is defined by a set of vertices partitioned in chain components 𝐶_1 and 𝐶_2, and a set of edges 𝐸. Depending on the set of Markov properties specified for the chain graph, we may have different independence models for the joint distribution of the random vectors (𝑌_{𝐶_𝑡})_{𝑡 ∈ {1,2}} associated to the chain components {𝐶_𝑡}_{𝑡 ∈ {1,2}}. In particular, we focus on the class of LWF chain graph models [Frydenberg, 1990]; these models correspond to multivariate regression models with suitable independence constraints, corresponding to missing edges, both within and between chain components.
Any pair of vertices 𝑎, 𝑏 ∈ 𝐶_𝑡 within the same chain component, with 𝑡 = 1, 2 and 𝑎 ≠ 𝑏, can be joined by undirected edges; vertices between chain components, 𝑎 ∈ 𝐶_1 and 𝑏 ∈ 𝐶_2, are joined by directed edges preserving the same direction, such that cycles are not allowed. For our purpose, the sets of vertices 𝐶_1 and 𝐶_2 are associated, respectively, to the random vector 𝑌_𝑉 of response variables and to the background variable 𝑋, so that 𝐶_1 = 𝑉 and |𝐶_2| = 1. In principle, the chain component 𝐶_2 may include a multiple categorical random vector; in this case 𝑋 represents a random variable with state space given by the combination of multiple factor levels.

For any 𝑥 ∈ X, let 𝑌_𝑉(𝑥) be an 𝑥-profile outcome vector, that is, the random vector 𝑌_𝑉 | {𝑋 = 𝑥} conditioned on a specific profile 𝑥 of the factor 𝑋, and let 𝑃(𝑌_𝑉(𝑥)) be the corresponding 𝑥-profile probability distribution of 𝑌_𝑉(𝑥), that is, the conditional probability distribution 𝑃(𝑌_𝑉 | {𝑋 = 𝑥}). Note that 𝑃(·) can be a probability density function or a probability mass function, depending on the continuous or discrete nature of the multivariate random variable 𝑌_𝑉(𝑥), with 𝑥 ∈ X. For the sake of simplicity, in the sequel we omit the prefix 𝑥 to denote both the profile outcome vector and the profile outcome distribution. Then, for a given multivariate random vector 𝑌_𝑉 and an external factor 𝑋, let 𝑌_𝑉|X = [𝑌_𝑉(𝑥)]_{𝑥 ∈ X} be the finite set of all profile outcome vectors and let 𝑃(𝑌_𝑉|X) = [𝑃(𝑌_𝑉(𝑥))]_{𝑥 ∈ X} be the corresponding set of all profile outcome distributions. For any 𝐴 ⊆ 𝑉, 𝑌_𝐴|X = [𝑌_𝐴(𝑥)]_{𝑥 ∈ X} is the set of marginal profile outcome vectors with corresponding profile probability distributions 𝑃(𝑌_𝐴|X) = [𝑃(𝑌_𝐴(𝑥))]_{𝑥 ∈ X}. A definition of profile independence follows.
Definition 1 Given a graph 𝐺 = (𝑉, 𝐸) and a partition 𝐴, 𝐵, 𝐶 ⊆ 𝑉, the profile conditional independence 𝑌_𝐴(𝑥) ⊥⊥ 𝑌_𝐵(𝑥) | 𝑌_𝐶(𝑥) and the profile marginal independence 𝑌_𝐴(𝑥) ⊥⊥ 𝑌_𝐵(𝑥) correspond, respectively, to the factorizations

𝑃[𝑌_𝐴(𝑥), 𝑌_𝐵(𝑥) | 𝑌_𝐶(𝑥)] = 𝑃[𝑌_𝐴(𝑥) | 𝑌_𝐶(𝑥)] × 𝑃[𝑌_𝐵(𝑥) | 𝑌_𝐶(𝑥)],  (1)

𝑃[𝑌_𝐴(𝑥), 𝑌_𝐵(𝑥)] = 𝑃[𝑌_𝐴(𝑥)] × 𝑃[𝑌_𝐵(𝑥)],  (2)

of the joint profile distribution of 𝑌_𝑉(𝑥), for any 𝑥 ∈ X.

The following lemma holds by definition of conditional independence.

Lemma 1 If the profile independence statements in Equations (1) and (2) hold for any level 𝑥 ∈ X, then these equations imply that 𝑌_𝐴 ⊥⊥ 𝑌_𝐵 | {𝑌_𝐶, 𝑋} and 𝑌_𝐴 ⊥⊥ 𝑌_𝐵 | 𝑋, respectively.

Finally, let us consider a collection of multiple graphs 𝐺_𝑉|X = {𝐺(𝑥) = (𝑉, 𝐸(𝑥))}_{𝑥 ∈ X} associated to the profile outcome distributions 𝑃(𝑌_𝑉|X). Under suitable Markov properties, any graph 𝐺(𝑥) represents an independence model for the profile outcome vector 𝑌_𝑉(𝑥), for any 𝑥 ∈ X. In particular, missing edges wrt 𝐺(𝑥) correspond to profile conditional independencies for the joint distribution of 𝑌_𝑉(𝑥), with 𝑥 ∈ X. Graphs 𝐺(𝑥) ∈ 𝐺_𝑉|X may have different skeletons. We remark that chain graph models do not allow us to explore how the independence structure of 𝑌_𝑉 may considerably vary for any profile 𝑥 ∈ X. Multiple graphs do not allow us to model the effect of 𝑋 on each outcome 𝑌_𝑎 ∈ 𝑌_𝑉. In essence, the idea is to provide a single graph able to embed, at the same time, information about the profile independence structure for any 𝑌_𝑉(𝑥) ∈ 𝑌_𝑉|X and about the conditional independence between 𝑋 and any outcome 𝑌_𝑎 ∈ 𝑌_𝑉.
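Lemma 1 turns per-profile independencies into independencies conditional on 𝑋; it says nothing about the marginal behaviour of 𝑌_𝑉 once 𝑋 is marginalized out, which can differ sharply, as in Simpson's paradox recalled in the Introduction. The following toy numerical illustration (all numbers are ours, purely for exposition) checks the factorization in Equation (2) at each profile while the marginal distribution fails to factorize:

```python
import numpy as np

# Toy binary example: Y_a(x) is independent of Y_b(x) at every profile x,
# yet Y_a and Y_b are marginally dependent once X is marginalized out.
px = np.array([0.5, 0.5])                  # P(X = x)
pa = np.array([[0.1, 0.9], [0.9, 0.1]])    # pa[x, ya] = P(Y_a = ya | X = x)
pb = np.array([[0.1, 0.9], [0.9, 0.1]])    # pb[x, yb] = P(Y_b = yb | X = x)

joint = np.einsum("x,xa,xb->xab", px, pa, pb)   # P(X = x, Y_a, Y_b)

# Profile marginal independence (Equation (2)) holds at each x:
for x in range(2):
    pab_x = joint[x] / joint[x].sum()
    assert np.allclose(pab_x, np.outer(pab_x.sum(1), pab_x.sum(0)))

# ...but the marginal distribution of (Y_a, Y_b) does not factorize:
pab = joint.sum(axis=0)
assert not np.allclose(pab, np.outer(pab.sum(1), pab.sum(0)))
```

Here P(Y_a = 1, Y_b = 1) = 0.41 while the product of the marginals gives 0.25, so conditioning on the profile is essential.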
In the following sections, we derive the properties of this new class of graphical models, which fills the gap between the class of chain graphs and that of multiple graphs, and highlight the connection between these two classes. We also exploit our results to define an inferential procedure more powerful than state-of-the-art approaches.

3 Profile undirected graphical models

3.1 Profile undirected graphs

We introduce the class of undirected profile graphs. A profile undirected graph G_U = (𝑉, E) is defined by the set 𝑉 of vertices and a set of Z-labelled edges E, which are labelled according to a subset Z ⊆ X. Let (𝑎, 𝑏)_Z be the generic element of E associated to any pair 𝑎, 𝑏 ∈ 𝑉, where the presence or absence of the edge between 𝑎 and 𝑏 is determined by the subset Z of the state space X. For each pair 𝑎, 𝑏 ∈ 𝑉, the corresponding edge (𝑎, 𝑏)_Z ∈ E belongs to one of the following three categories: (i) if Z = X, vertices 𝑎 and 𝑏 are not joined by any edge; (ii) if Z is a nonempty proper subset of X, i.e., Z ⊂ X and Z ≠ ∅, vertices 𝑎 and 𝑏 are joined by a dotted Z-labelled edge; (iii) if Z = ∅, vertices 𝑎 and 𝑏 are joined by a full edge and, for the sake of simplicity, the ∅-label is not displayed in the graph.

Under suitable Markov properties, the profile graph G_U provides an independence model for the joint distributions of a random vector 𝑌_𝑉|X of profile outcomes. In particular, a missing edge in G_U corresponds to a profile conditional independence for each profile 𝑥 ∈ X. A Z-labelled dotted edge in G_U corresponds to profile conditional independencies holding only for the profiles 𝑥 ∈ Z, with Z ⊂ X and Z ≠ ∅.

Further technical definitions follow. For any couple of vertices 𝑎, 𝑏 ∈ 𝑉, we say that 𝑏 is an 𝑥-neighbour of 𝑎, and vice versa, if they are joined by a Z-labelled edge such that 𝑥 ∉ Z, with Z ⊂ X.
Let nb_𝑥(𝑎) be the set of all 𝑥-neighbours of 𝑎, with 𝑎 ∈ 𝑉 and 𝑥 ∈ X. For any pair 𝑎, 𝑏 ∈ 𝑉 and 𝑥 ∈ X, an 𝑥-path between 𝑎 and 𝑏 is given by a sequence of (𝑎, 𝑏)_Z edges, for any Z ⊂ X, such that 𝑥 ∉ Z for all edges in the sequence. Given any nonempty subset 𝐶 of 𝑉, 𝐶 is said to be 𝑥-connected if any pair 𝑎, 𝑏 ∈ 𝐶 is joined by an 𝑥-path, with 𝑥 ∈ X. Any nonempty subset 𝐷 of 𝑉 is said to be 𝑥-disconnected if it is not 𝑥-connected, with 𝑥 ∈ X; let 𝐾_1, ..., 𝐾_𝑟 be the 𝑥-connected components of 𝐷. For any triple 𝐴, 𝐵, 𝐶 of disjoint subsets of 𝑉 and 𝑥 ∈ X, we say that 𝐶 𝑥-separates 𝐴 from 𝐵 if every 𝑥-path from any vertex 𝑎 ∈ 𝐴 to any vertex 𝑏 ∈ 𝐵 intersects 𝐶. The technical 𝑥-definitions above can be simply extended to Z-definitions, for any subset Z of X, if they hold for all 𝑥 ∈ Z.

Example 1 Consider the profile undirected graph G_U in the left panel of Figure 1. Vertices 𝑎 and 𝑐 are both {1, 2}-neighbours, because they are joined by a dotted edge with label Z = {0} that contains neither 1 nor 2. Vertices 𝑏 and 𝑑 are X-neighbours, because they are joined by a full edge. The sequence of edges {(𝑎, 𝑐)_{0}, (𝑎, 𝑏)_{2}, (𝑏, 𝑑)_∅} is a {1}-path, since 1 is not included in any label of the edges in the sequence.

[Figure 1: Given 𝑉 = {𝑎, 𝑏, 𝑐, 𝑑}, G_U is a profile undirected graph for the profile outcome vectors 𝑌_𝑉|X = [(𝑌_𝑉(𝑥))_{𝑥 ∈ X}] with X = {0, 1, 2}. Any 𝑈(𝑥) is the induced undirected graph for the profile outcome vector 𝑌_𝑉(𝑥), with 𝑥 ∈ X.]

The same sequence is not a {2}-path, since the label of the couple (𝑎, 𝑏) contains 2. The set 𝑉 is {1}-connected, because every pair of vertices in 𝑉 is joined by a {1}-path.
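The 𝑥-path and 𝑥-connectedness definitions above are easy to operationalize: for a fixed profile 𝑥, keep exactly the edges whose label Z does not contain 𝑥, and run an ordinary connected-components search on the resulting induced graph 𝑈(𝑥). The sketch below is illustrative Python of our own (the edge labels of Figure 1 are hard-coded from Example 1; the function names are ours), not the authors' code:

```python
# Z-labelled edges of the profile graph G_U in Figure 1:
# (a,b)_{2}, (a,c)_{0}, (b,c)_{1,2}, (b,d)_EMPTY; missing pairs have Z = X.
EDGES = {
    frozenset("ab"): {2},
    frozenset("ac"): {0},
    frozenset("bc"): {1, 2},
    frozenset("bd"): set(),
}

def induced_graph(x):
    """Edges of the induced undirected graph U(x): keep (a,b)_Z iff x not in Z."""
    return {e for e, z in EDGES.items() if x not in z}

def x_connected_components(vertices, x):
    """Connected components of `vertices` in U(x), via depth-first search."""
    adj = {v: set() for v in vertices}
    for e in induced_graph(x):
        a, b = tuple(e)
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)
    seen, comps = set(), []
    for v in vertices:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

With these labels, `x_connected_components(set("abcd"), 1)` returns a single component, matching the {1}-connectedness of 𝑉 in Example 1, while for 𝑥 = 2 it splits 𝑉 into {𝑎, 𝑐} and {𝑏, 𝑑}.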
The same set is {2}-disconnected, with {2}-connected components {𝑎, 𝑐} and {𝑏, 𝑑}, because there does not exist a {2}-path between 𝑎 and 𝑏. Vertices 𝑐 and 𝑑 are {1}-separated by 𝑎, because the only {1}-path {(𝑎, 𝑐)_{0}, (𝑎, 𝑏)_{2}, (𝑏, 𝑑)_∅} between 𝑐 and 𝑑 intersects 𝑎; vertex 𝑎 does not {0}-separate 𝑐 and 𝑑, because there exists the {0}-path {(𝑏, 𝑐)_{1,2}, (𝑏, 𝑑)_∅} between them that does not intersect 𝑎.

3.2 Profile undirected Markov properties

In this section, we first define Markov properties for profile undirected graphs and then derive a few results that connect them.

Definition 2 The probability distributions 𝑃[𝑌_𝑉|X] of the profile outcome vectors 𝑌_𝑉|X satisfy the profile undirected Pairwise Markov Property (U-PMP) wrt the graph G_U = (𝑉, E_U) if, for any (𝑎, 𝑏)_Z ∈ E_U with Z ⊆ X,

𝑌_𝑎(𝑥) ⊥⊥ 𝑌_𝑏(𝑥) | 𝑌_{𝑉\{𝑎,𝑏}}(𝑥),  𝑥 ∈ Z.  (3)

Definition 3 The probability distributions 𝑃[𝑌_𝑉|X] of the profile outcome vectors 𝑌_𝑉|X satisfy the profile undirected Global Markov Property (U-GMP) wrt the graph G_U = (𝑉, E_U) if, for any triple 𝐴, 𝐵, 𝐶 of disjoint subsets of 𝑉 such that 𝐶 𝑥-separates 𝐴 from 𝐵 in G_U,

𝑌_𝐴(𝑥) ⊥⊥ 𝑌_𝐵(𝑥) | 𝑌_𝐶(𝑥),  𝑥 ∈ X.  (4)

Definition 4 The probability distributions 𝑃[𝑌_𝑉|X] of the profile outcome vectors 𝑌_𝑉|X satisfy the profile undirected Connected Set Markov Property (U-CSMP) wrt the graph G_U = (𝑉, E_U) if, for any 𝑥-disconnected set 𝐷 of 𝑉, with 𝐾_1, ..., 𝐾_𝑟 the 𝑥-connected components of 𝐷,

𝑌_{𝐾_1}(𝑥) ⊥⊥ ... ⊥⊥ 𝑌_{𝐾_𝑟}(𝑥) | 𝑌_{𝑉\𝐷}(𝑥),  𝑥 ∈ X.  (5)

Example 2 Consider the graph G_U in the left panel of Figure 1. 𝑃[𝑌_𝑉|X] satisfy the U-PMP wrt G_U if 𝑌_𝑏(𝑥) ⊥⊥ 𝑌_𝑐(𝑥) | {𝑌_𝑎(𝑥), 𝑌_𝑑(𝑥)} for 𝑥 ∈ {1, 2}, since (𝑏, 𝑐)_{1,2} ∈ E_U.
𝑃[𝑌_𝑉|X] satisfy the U-GMP wrt G_U if 𝑌_𝑐(1) ⊥⊥ {𝑌_𝑏(1), 𝑌_𝑑(1)} | 𝑌_𝑎(1), because 𝑎 {1}-separates 𝑐 from {𝑏, 𝑑}. Consider the subset 𝐷 = {𝑎, 𝑏, 𝑐} of 𝑉; 𝑃[𝑌_𝑉|X] satisfy the U-CSMP wrt G_U if {𝑌_𝑎(2), 𝑌_𝑐(2)} ⊥⊥ 𝑌_𝑏(2) | 𝑌_𝑑(2), because 𝐷 is a {2}-disconnected set with two {2}-connected components, {𝑎, 𝑐} and {𝑏}.

We prove that all the independence statements encoded in a profile undirected graph under the global Markov property can be derived by applying the connected set rule, which gives insight on the connectivity and on the path setting of the graph across profiles.

Theorem 1 Let G_U = (𝑉, E_U) be a profile undirected graph model associated to the profile outcome vectors 𝑌_𝑉|X with probability distributions 𝑃[𝑌_𝑉|X]. The U-GMP is satisfied if and only if the U-CSMP is satisfied wrt G_U.

The proof of Theorem 1 is given in the Supplementary Material, along with all other proofs. The local Markov property for profile undirected graphs is also included in the Supplementary Material.

Given a profile undirected graph G_U = (𝑉, E_U) for the profile outcome vectors 𝑌_𝑉|X, the corresponding class of multiple undirected graphs associated to each random vector 𝑌_𝑉(𝑥) ∈ 𝑌_𝑉|X can be defined.

Definition 5 Given a profile undirected graph G_U = (𝑉, E_U) for the profile outcome vectors 𝑌_𝑉|X, let 𝑈_𝑉|X = {𝑈(𝑥) = (𝑉, 𝐸_𝑈(𝑥))}_{𝑥 ∈ X} be the induced class of multiple undirected graphs where, for any 𝑈(𝑥) ∈ 𝑈_𝑉|X, the couple 𝑎, 𝑏 ∈ 𝑉 is joined by an undirected edge if 𝑥 ∉ Z in the corresponding edge (𝑎, 𝑏)_Z ∈ E_U, with Z ⊆ X.

Then, a missing edge in G_U corresponds to a missing edge in 𝑈(𝑥), for any 𝑥 ∈ X; a Z-labelled dotted edge in G_U corresponds to a missing edge in 𝑈(𝑥) if 𝑥 ∈ Z, and to a full edge in 𝑈(𝑥) if 𝑥 ∉ Z; a full edge in G_U corresponds to a full edge in 𝑈(𝑥), for any 𝑥 ∈ X.
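The 𝑥-separation criterion behind the U-GMP can likewise be checked mechanically: 𝐶 𝑥-separates 𝐴 from 𝐵 exactly when no 𝑥-path from 𝐴 reaches 𝐵 once the vertices of 𝐶 are deleted. The following is an illustrative Python sketch of our own (edge labels hard-coded from Figure 1; names are ours), not the authors' implementation:

```python
# Z-labelled edges of G_U in Figure 1 (missing pairs have Z = X and are omitted).
EDGES = {
    frozenset("ab"): {2},
    frozenset("ac"): {0},
    frozenset("bc"): {1, 2},
    frozenset("bd"): set(),
}
VERTICES = set("abcd")

def x_separates(C, A, B, x):
    """True iff C x-separates A from B: every x-path from A to B meets C."""
    allowed = VERTICES - set(C)              # delete the separator
    adj = {v: set() for v in allowed}
    for e, z in EDGES.items():
        a, b = tuple(e)
        if x not in z and a in allowed and b in allowed:
            adj[a].add(b)
            adj[b].add(a)
    # breadth-first search from A within the x-induced graph minus C
    frontier = reached = set(A) & allowed
    while frontier:
        nxt = set().union(*(adj[v] for v in frontier)) - reached
        frontier, reached = nxt, reached | nxt
    return reached.isdisjoint(B)
```

For Figure 1, `x_separates({'a'}, {'c'}, {'b', 'd'}, 1)` is True, matching Example 2 (𝑎 {1}-separates 𝑐 from {𝑏, 𝑑}), while `x_separates({'a'}, {'c'}, {'d'}, 0)` is False, since the {0}-path c-b-d avoids 𝑎.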
Example 3 Consider Figure 1. Given the profile undirected graph G_U, let 𝑈_𝑉|X = {𝑈(0), 𝑈(1), 𝑈(2)} be the induced class of multiple undirected graphs. The couple 𝑎, 𝑑 is disjoined in G_U and in any 𝑈(𝑥) ∈ 𝑈_𝑉|X. The couple 𝑏, 𝑐 is joined by a {1, 2}-labelled dotted edge in G_U; then it is joined by a full edge in 𝑈(0) and is disjoined in 𝑈(1) and 𝑈(2). The couple 𝑏, 𝑑 is joined by a full edge in G_U and in any 𝑈(𝑥) ∈ 𝑈_𝑉|X.

The pairwise, local, and global Markov properties of probability distributions associated to undirected graphs are well known [Lauritzen, 1996]. The following corollary, derived directly from Theorem 1, shows that the full set of conditional independencies implied by the global Markov property for any undirected graph can also be derived by applying the connected set rule.

Corollary 1 Given an undirected graph model 𝑈(𝑥) = (𝑉, 𝐸(𝑥)) associated to the profile outcome vectors 𝑌_𝑉|X, the probability distribution 𝑃[𝑌_𝑉(𝑥)] satisfies the global Markov property wrt 𝑈(𝑥) if and only if the connected set Markov property is satisfied for every 𝑥-disconnected set 𝐷 ⊆ 𝑉, with 𝑥 ∈ X.

The following proposition shows that the full set of independencies encoded in the induced undirected graph model for any 𝑌_𝑉(𝑥) ∈ 𝑌_𝑉|X can be derived from the profile undirected graph model for the joint distributions of 𝑌_𝑉|X.

Proposition 1 Consider a profile undirected graph G_U = (𝑉, E_U) associated to the profile outcome vectors 𝑌_𝑉|X and the induced class of multiple undirected graphs 𝑈_𝑉|X. If the probability distributions 𝑃[𝑌_𝑉|X] satisfy the U-CSMP wrt G_U, the probability distribution 𝑃[𝑌_𝑉(𝑥)] of each profile vector 𝑌_𝑉(𝑥) ∈ 𝑌_𝑉|X satisfies the global Markov property wrt the induced undirected graph 𝑈(𝑥) ∈ 𝑈_𝑉|X.
In the following proposition we show that the U-GMP, U-CSMP and U-PMP are equivalent for the class of profile undirected graph models in the case of strictly positive probability distributions. This result derives directly from Proposition 1.

Proposition 2 Let G_U = (𝑉, E_U) be a profile undirected graph associated to the profile outcome vectors 𝑌_𝑉|X with strictly positive probability distributions 𝑃[𝑌_𝑉|X]. The U-GMP is satisfied if and only if the U-PMP is satisfied wrt G_U.

4 Profile undirected graphs and LWF chain graphs

For any profile undirected graph G_U, we derive an induced class of two-block LWF chain graphs C_𝑈 = {𝐶_𝑈}, with generic element 𝐶_𝑈 = [{𝑉, 𝑋}, 𝐸_{𝐶_𝑈}], such that the joint distribution 𝑃(𝑌_𝑉, 𝑋) under 𝐶_𝑈 is compatible, in terms of independence models, with the set of profile distributions 𝑃[𝑌_𝑉|X] under G_U. Preliminary definitions are required to specify the class of LWF chain graphs induced by a profile graph.

A joint probability distribution 𝑃(𝑌_𝑉, 𝑋) satisfies the LWF Global Markov Property (LWF-GMP) wrt the LWF chain graph 𝐶_𝑈 = [{𝑉, 𝑋}, 𝐸_{𝐶_𝑈}] if [Frydenberg, 1990, Drton, 2009]: for any disconnected set 𝐷 ⊆ 𝑉 with connected components 𝐾_1, ..., 𝐾_𝑟,

𝑌_{𝐾_1} ⊥⊥ ... ⊥⊥ 𝑌_{𝐾_𝑟} | {𝑌_{𝑉\𝐷}, 𝑋};  (6)

for any subset 𝐴 ⊆ 𝑉 such that there is a missing arrow between any vertex 𝑎 ∈ 𝐴 and 𝑋,

𝑌_𝐴 ⊥⊥ 𝑋 | 𝑌_{𝑉\𝐴}.  (7)

We remark that Equation (6) derives directly from Theorem 1. A definition of Markov-compatibility between a profile undirected graph and an LWF chain graph follows.

Definition 6 An LWF chain graph 𝐶_𝑈 = [{𝑉, 𝑋}, 𝐸_{𝐶_𝑈}] is Markov-compatible with a profile undirected graph G_U if the LWF-GMP in (6) for 𝐶_𝑈 is implied by the U-GMP for G_U.

We now derive the class of LWF graphs induced by a profile undirected graph such that Markov-compatibility is satisfied.
Theorem 2 Consider a profile undirected graph G_U = (𝑉, E_U) associated to the profile outcome vectors 𝑌_𝑉|X. If the probability distributions 𝑃[𝑌_𝑉|X] satisfy the U-GMP for G_U, then the LWF-GMP in (6) is also satisfied for the induced class C_𝑈 of two-block LWF chain graphs, where any 𝐶_𝑈 = [{𝑉, 𝑋}, 𝐸_{𝐶_𝑈}] belongs to C_𝑈 if: (i) any couple 𝑎, 𝑏 ∈ 𝑉 is joined by an undirected edge in 𝐶_𝑈 if Z ⊂ X for the pair (𝑎, 𝑏)_Z ∈ E_U; (ii) for any couple 𝑎, 𝑏 ∈ 𝑉, 𝑎 and 𝑏 are both reached by an arrow in 𝐶_𝑈 starting from 𝑋 if Z ⊂ X and Z ≠ ∅ for the pair (𝑎, 𝑏)_Z ∈ E_U.

Necessary conditions (i) and (ii) in Theorem 2 ensure that there will always exist at least one Markov-compatible LWF chain graph for any given profile undirected graph; specifically, the chain graph with no missing arrows is always compatible. Condition (i) is related to the missing/non-missing undirected edges for any induced chain graph; it states that dotted and full edges in profile undirected graphs correspond to full edges in chain graphs. Condition (ii) is related to missing/non-missing directed edges for any induced chain graph; it states that vertices joined by a dotted edge in a profile undirected graph cannot be disjoined from 𝑋 in the induced chain graph. Since condition (ii) may not be intuitive, the following counterexample shows that it is a necessary condition.

Example 4 Let 𝑉 = {𝑎, 𝑏, 𝑐} be a set of response variables and 𝑋 a factor with state space X = {0, 1}, and let G_U = (𝑉, E_U) be a profile undirected graph with E_U = {(𝑎, 𝑏)_{0}, (𝑎, 𝑐)_∅, (𝑏, 𝑐)_X}, where the pair 𝑎, 𝑏 is joined by a {0}-dotted edge that implies

𝑌_𝑎(0) ⊥⊥ 𝑌_𝑏(0) | 𝑌_𝑐(0)  and  𝑌_𝑎(1) ⊥̸⊥ 𝑌_𝑏(1) | 𝑌_𝑐(1).  (8)

We explore Markov-compatibility for some chain graphs.
Consider a chain graph 𝐶_𝑈 = [{𝑉, 𝑋}, 𝐸_{𝐶_𝑈}] with 𝐸_{𝐶_𝑈} = {(𝑎, 𝑏), (𝑏, 𝑐), (𝑋, 𝑐)}, where vertices 𝑎 and 𝑏 are both disjoined from 𝑋. By condition (7), we have {𝑌_𝑎, 𝑌_𝑏} ⊥⊥ 𝑋 | 𝑌_𝑐, i.e., 𝑃(𝑌_𝑎(0), 𝑌_𝑏(0) | 𝑌_𝑐(0)) = 𝑃(𝑌_𝑎(1), 𝑌_𝑏(1) | 𝑌_𝑐(1)), which is not compatible with condition (8) for the profile graph; then 𝐶_𝑈 does not belong to the induced class C_𝑈.

Consider the chain graph 𝐶′_𝑈 with 𝐸_{𝐶′_𝑈} = {(𝑎, 𝑏), (𝑏, 𝑐), (𝑋, 𝑏), (𝑋, 𝑐)}, where only 𝑎 is disjoined from 𝑋. Equation (7) implies that 𝑌_𝑎 ⊥⊥ 𝑋 | {𝑌_𝑏, 𝑌_𝑐}, that is, 𝑃[𝑌_𝑎(0) | 𝑌_𝑏(0), 𝑌_𝑐(0)] = 𝑃[𝑌_𝑎(1) | 𝑌_𝑏(1), 𝑌_𝑐(1)], which is not compatible with condition (8) for the profile graph, since (8) implies that 𝑃[𝑌_𝑎(1) | 𝑌_𝑏(1), 𝑌_𝑐(1)] ≠ 𝑃[𝑌_𝑎(0) | 𝑌_𝑏(0), 𝑌_𝑐(0)]. It follows that 𝐶′_𝑈 does not belong to the induced class C_𝑈.

Consider a third chain graph 𝐶″_𝑈 with the set 𝐸_{𝐶″_𝑈} = {(𝑎, 𝑏), (𝑏, 𝑐), (𝑋, 𝑎), (𝑋, 𝑏), (𝑋, 𝑐)} of edges, where 𝑎 and 𝑏 are both joined to 𝑋. This chain graph is compatible with condition (8) for the profile graph G_U. 𝐶″_𝑈 satisfies both conditions (i) and (ii) in Theorem 2 and belongs to the induced class C_𝑈.

Consider the last chain graph 𝐶‴_𝑈 with the set 𝐸_{𝐶‴_𝑈} = {(𝑎, 𝑏), (𝑏, 𝑐), (𝑋, 𝑎), (𝑋, 𝑏)} of edges, where only 𝑎 and 𝑏 are joined to 𝑋. This graph implies 𝑌_𝑐 ⊥⊥ 𝑋 | {𝑌_𝑎, 𝑌_𝑏}, i.e., 𝑃[𝑌_𝑐(1) | 𝑌_𝑎(1), 𝑌_𝑏(1)] = 𝑃[𝑌_𝑐(0) | 𝑌_𝑎(0), 𝑌_𝑏(0)], which is compatible with condition (8) for the profile graph G_U. 𝐶‴_𝑈 satisfies both conditions (i) and (ii) in Theorem 2.

The induced class C_𝑈 = {𝐶″_𝑈, 𝐶‴_𝑈} of chain graphs includes two elements, where 𝐶″_𝑈 has no missing arrows and 𝐶‴_𝑈 has a missing arrow in correspondence of the vertex 𝑐, which is not involved in dotted edges.
In essence, given a profile graph G_U, the induced class C_𝑈 includes LWF chain graphs 𝐶_𝑈 = [{𝑉, 𝑋}, 𝐸_{𝐶_𝑈}] where the chain component 𝑉 has the same skeleton as G_U, and which differ only according to which arrows are missing. Within this class, we can identify the maximum element, i.e., the chain graph with no missing arrows, and the minimum element, i.e., the chain graph with a set of arrows that point to all vertices 𝑎 ∈ 𝑉 such that nb_Z(𝑎) ≠ ∅, with Z ⊂ X and Z ≠ ∅.

[Figure 2: A profile undirected graph with a compatible LWF chain graph.]

In order to also account for the LWF-GMP in (7) and to establish a one-to-one relationship between profile undirected graphs and LWF chain graphs, we generalize the class of profile undirected graphs. Given a profile undirected graph G_U = (𝑉, E_U), consider the partition 𝑉 = 𝑉° ∪ 𝑉□ of the vertex set, so that we distinguish between two types of vertices, a circle vertex 𝑎° ∈ 𝑉° and a square vertex 𝑎□ ∈ 𝑉□, drawn as ○ and □, respectively. For every 𝑎□ ∈ 𝑉□, we assume 𝑌_𝑎 ⊥⊥ 𝑋 | 𝑌_{𝑉\𝑎}; that is, the univariate profile distribution of 𝑌_𝑎(𝑥) is invariant for any 𝑥 ∈ X, given the remaining variables 𝑌_{𝑉\𝑎}. Otherwise, if 𝑎° ∈ 𝑉°, we assume 𝑌_𝑎 ⊥̸⊥ 𝑋 | 𝑌_{𝑉\𝑎}. The profile graph in this generalized representation also includes information about the independence structure between subsets of response variables 𝑌_𝐴, with 𝐴 ⊆ 𝑉, and the external factor 𝑋. In particular, for any 𝐴 ⊆ 𝑉□, we assume that 𝑌_𝐴 ⊥⊥ 𝑋 | 𝑌_{𝑉\𝐴}. Then, given a profile undirected graph G_U = (𝑉°, 𝑉□, E_U), the compatible two-block LWF chain graph 𝐶_𝑈 = [{𝑉, 𝑋}, 𝐸_{𝐶_𝑈}] in the class C_𝑈 is unique and is defined by a chain graph where the undirected graph of the response component 𝑉 has the same skeleton as G_U and there are missing arrows between 𝑋 and any square vertex 𝑎□ ∈ 𝑉□.
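The minimum element of the induced class C_𝑈 can be read off the labelled edge set directly: an arrow from 𝑋 must point at every vertex incident to at least one dotted edge. A minimal Python sketch of this rule, under the edge representation used in our earlier snippets (names and data layout are ours, not the paper's):

```python
def minimum_element_arrows(edges, x_space):
    """Vertices that must receive an arrow from X in the minimum element of C_U:
    those incident to a dotted edge, i.e. an edge whose label Z satisfies
    EMPTY != Z != X (a nonempty proper subset of the state space)."""
    targets = set()
    for pair, z in edges.items():
        if z and z != set(x_space):      # nonempty proper subset of X: dotted edge
            targets |= pair
    return targets

# Profile graph of Example 4: (a,b)_{0} dotted, (a,c)_EMPTY full, (b,c)_X missing.
example4 = {
    frozenset("ab"): {0},
    frozenset("ac"): set(),
    frozenset("bc"): {0, 1},
}
```

Here `minimum_element_arrows(example4, {0, 1})` returns {'a', 'b'}: the minimum element keeps arrows only into 𝑎 and 𝑏, matching the chain graph 𝐶‴_𝑈 of Example 4, while the maximum element has arrows into every vertex.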
Square and circle vertices provide insights into how the dependence structure varies across profiles. Square vertices identify variables with stable pairwise dependencies, while circle vertices denote variables characterized by pairwise context-specific independencies. The fewer the square nodes, the greater the difference across profiles.

Example 5 Consider the profile undirected graph and the induced chain graph in Figure 2. Vertices 𝑎, 𝑏, 𝑐 are circle vertices while 𝑑 is a square vertex wrt G_U, i.e., {𝑎, 𝑏, 𝑐} ⊆ 𝑉° and 𝑑 ∈ 𝑉□. Then, both the profile undirected graph and the chain graph imply the independence statement 𝑌_𝑑 ⊥⊥ 𝑋 | {𝑌_𝑎, 𝑌_𝑏, 𝑌_𝑐}. Also, both graphs imply that {𝑌_𝑎, 𝑌_𝑐} ⊥⊥ 𝑌_𝑑 | {𝑌_𝑏, 𝑋}. Unlike the profile graph, the chain graph does not provide information about the effect of 𝑋 on the 𝑌_𝑉 association structure, e.g., 𝑌_𝑎(2) ⊥⊥ 𝑌_𝑏(2) | {𝑌_𝑐(2), 𝑌_𝑑(2)}.

5 Gaussian profile undirected graphical model

We can now define the class of Gaussian profile undirected graphical models by imposing zero constraints over the model parameters; these constraints naturally follow from the Markov equivalence between profile graphs and multiple graphs, and from the compatibility between profile graphs and chain graphs established previously. For all 𝑥 ∈ X, let 𝑌_𝑉(𝑥) ∼ 𝑁(𝛼 + 𝛽_𝑥, Σ_𝑥), where [𝛼_𝑎 + 𝛽_𝑎𝑥]_{𝑎 ∈ 𝑉} = E[𝑌_𝑎(𝑥)]_{𝑎 ∈ 𝑉} is the profile marginal mean vector and Σ_𝑥 is the profile covariance matrix with entries 𝜎_𝑎𝑏,𝑥, 𝑥 ∈ X. Let 𝜁_𝑎𝑥, 𝑎 ∈ 𝑉, be the linear effect of the external factor on the profile conditional mean vector E[𝑌_𝑎(𝑥) | 𝑌_{𝑉\𝑎}(𝑥)]_{𝑎 ∈ 𝑉}, and let Ω_𝑥 = Σ_𝑥^{-1} be the profile precision matrix with entries 𝜔_𝑎𝑏,𝑥, 𝑥 ∈ X; note that 𝜁_𝑥 = Ω_𝑥 𝛽_𝑥, where 𝜁_𝑥 = [𝜁_𝑎𝑥]_{𝑎 ∈ 𝑉} and 𝛽_𝑥 = [𝛽_𝑎𝑥]_{𝑎 ∈ 𝑉} [Andersson et al., 2001].
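The identity 𝜁_𝑥 = Ω_𝑥 𝛽_𝑥 links the marginal 𝑥-effect 𝛽 to the conditional one: for 𝑌_𝑉(𝑥) ∼ 𝑁(𝛼 + 𝛽𝑥, Σ), the coefficient of 𝑥 in E[𝑌_𝑎(𝑥) | 𝑌_{𝑉\𝑎}(𝑥)] is 𝛽_𝑎 − Σ_{𝑎,𝐵} Σ_{𝐵,𝐵}^{-1} 𝛽_𝐵 with 𝐵 = 𝑉\{𝑎}, which standard Gaussian conditioning reduces to 𝜁_𝑎 / 𝜔_𝑎𝑎; in particular, 𝜁_𝑎 = 0 exactly when the external factor drops out of the conditional mean of 𝑌_𝑎, as in Definition 7 below. A small numerical check of this reduction (illustrative only, with an arbitrary positive definite Σ of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)          # arbitrary positive definite covariance
beta = rng.standard_normal(p)            # marginal x-effects
Omega = np.linalg.inv(Sigma)
zeta = Omega @ beta                      # zeta_x = Omega_x beta_x

# x-coefficient of the conditional mean E[Y_a | Y_rest], via Schur complement
a = 0
B = [1, 2, 3]
cond_effect = beta[a] - Sigma[a, B] @ np.linalg.solve(Sigma[np.ix_(B, B)], beta[B])
assert np.isclose(cond_effect, zeta[a] / Omega[a, a])
```

The agreement of the two expressions is exact up to floating-point error, for any choice of positive definite Σ and any vertex 𝑎.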
Definition 7 The Gaussian profile undirected graphical model for 𝑌_𝑉|X wrt G_U = (𝑉, E_U) is such that: (i) for any 𝑎 ∈ 𝑉□, 𝜁_𝑎𝑥 = 0 for all 𝑥 ∈ X; (ii) for any (𝑎, 𝑏)_Z ∈ E_U, with Z ⊆ X, 𝜔_𝑎𝑏,𝑥 = 0 for each 𝑥 ∈ Z.

Estimation of Gaussian profile undirected graphical models can build on existing methods for Gaussian chain or multiple graph inference. Penalty terms that promote shared network structures across profiles, similar to the group graphical lasso, the joint graphical lasso [Danaher et al., 2014], or GemBag [Yang et al., 2021], may be appropriate when the 𝑥-profile vectors follow distributions with comparable graph structures. In Section 5.1, we extend the model of Yang et al. [2021] by introducing profile indicators 𝑟_𝑎𝑏,𝑥 that specify the sparsity of the corresponding entries 𝜔_𝑎𝑏,𝑥, and profile coefficients 𝛽_𝑎,𝑥. The entries 𝜔_𝑎𝑏,𝑥 estimated to be zero define the zero patterns that correspond to a given profile undirected graph G_U = (𝑉, E_U).

5.1 Bayesian model formulation

To express our model in a Bayesian framework, we first introduce binary global-level indicators 𝛾_𝑖𝑗 such that 𝛾_𝑖𝑗 := 1 indicates that nodes 𝑖 and 𝑗 are connected in at least one of the 𝑞 graphs, and 𝛾_𝑖𝑗 := 0 denotes no such connection. We place independent Bernoulli priors 𝛾_𝑖𝑗 ∼ Bernoulli(𝛾_𝑖𝑗 | 𝑝_1) on these indicators. Similarly, we introduce global-level coefficient indicators 𝜃_𝑖. The indicator 𝜃_𝑖 denotes whether at least one of the coefficients that represent the effect of the external factor is non-zero (𝜃_𝑖 := 1) or zero (𝜃_𝑖 := 0). These indicators also affect the profile-level sparsity in the corresponding entries of the precision matrices across the 𝑞 graphs. We assign a Bernoulli prior 𝜃_𝑖 ∼ Bernoulli(𝜃_𝑖 | 𝑝_2) to each indicator.

We introduce profile indicators 𝑟_𝑖𝑗,𝑥 to capture the sparsity of the corresponding entries 𝜔_𝑖𝑗,𝑥.
To encourage similarity among these indicators, we place joint priors on $r_{ij,0}, \ldots, r_{ij,q-1} \mid \gamma_{ij}$. These distributions encourage across-profile information sharing while allowing within-profile heterogeneity. In this paper, we assume the following hierarchical structure for this distribution:
$$P(r_{ij,0}, \ldots, r_{ij,q-1} \mid \gamma_{ij}, \theta_i, \theta_j) = \gamma_{ij} \theta_i \theta_j \prod_{x=0}^{q-1} \text{Bernoulli}(r_{ij,x} \mid p_3) + (1 - \gamma_{ij}) \prod_{x=0}^{q-1} \delta_0(r_{ij,x}) + \gamma_{ij} (1 - \theta_i \theta_j)\, \text{Bernoulli}(r_{ij,0} \mid p_4)\, \delta_{(r_{ij,1}, \ldots, r_{ij,q-1})}(r_{ij,0}), \quad (9)$$
where $\delta_0(\cdot)$ denotes a point mass at zero and $\delta_{(r_{ij,1}, \ldots, r_{ij,q-1})}(r_{ij,0})$ a point mass forcing $r_{ij,1} = \cdots = r_{ij,q-1} = r_{ij,0}$. Under this setup, when the global-level indicator $\gamma_{ij} = 0$, all the $r_{ij,x}$ are set to zero. When $\gamma_{ij} = 1$, each $r_{ij,x}$ can still independently take the value 0 with probability $1 - p_3$ if $\theta_i = \theta_j = 1$, or with probability $1 - p_4$ if $\theta_i \theta_j = 0$, in which case $r_{ij,0} = r_{ij,1} = \cdots = r_{ij,q-1}$. This provides a flexible approach to sharing information across profiles while also accounting for the effect of the external factor, e.g., the inclusion or exclusion of arrows.

We place an exponential prior on the positive diagonal entries of the $q$ precision matrices to induce proper shrinkage, $\omega_{ii,x} \sim \text{Exponential}(\tau)$, and a spike-and-slab prior on the upper-triangular entries $\omega_{ij,x}$ ($i < j$):
$$\omega_{ij,x} \mid r_{ij,x} \sim r_{ij,x}\, P(\omega_{ij,x} \mid \nu_1) + (1 - r_{ij,x})\, P(\omega_{ij,x} \mid \nu_0),$$
with $\nu_1 > \nu_0 > 0$. Following Yang et al. [2021], we adopt a spike-and-slab Lasso prior [Rockova and George, 2018], where $P(\omega_{ij,x} \mid \nu_1)$ represents the slab component with a large variance, allowing for large signals, and $P(\omega_{ij,x} \mid \nu_0)$ represents the spike component with a small variance, encouraging values close to zero.
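The three branches of the hierarchy in (9) can be mimicked by a small forward sampler; a minimal sketch with illustrative hyperparameter values (the function and the values are ours, not the authors'):

```python
import numpy as np

rng = np.random.default_rng(0)
q, p1, p2, p3, p4 = 4, 0.5, 0.5, 0.8, 0.3  # illustrative hyperparameters

def sample_profile_indicators(rng, q, p1, p2, p3, p4):
    """Draw (gamma_ij, theta_i, theta_j, r_ij) following the three
    branches of the hierarchical prior (9)."""
    gamma_ij = rng.binomial(1, p1)
    theta_i, theta_j = rng.binomial(1, p2, size=2)
    if gamma_ij == 0:
        r = np.zeros(q, dtype=int)           # no edge in any profile
    elif theta_i == 1 and theta_j == 1:
        r = rng.binomial(1, p3, size=q)      # profile-specific inclusion
    else:
        r = np.full(q, rng.binomial(1, p4))  # shared across all profiles
    return gamma_ij, theta_i, theta_j, r

g, ti, tj, r = sample_profile_indicators(rng, q, p1, p2, p3, p4)
```

Whenever $\gamma_{ij} = 0$ the sampler returns an all-zero indicator vector, and whenever $\theta_i \theta_j = 0$ (with $\gamma_{ij} = 1$) it returns a constant vector, matching the point masses in (9).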
Finally, we set a normal spike-and-slab prior [George and McCulloch, 1993] on the profile coefficients:
$$\beta_{i,x} \mid \theta_i \sim \theta_i\, P(\beta_{i,x} \mid \lambda_1) + (1 - \theta_i)\, P(\beta_{i,x} \mid \lambda_0),$$
with $\lambda_1 > \lambda_0 > 0$. Figure 3 provides a graphical representation of our Bayesian model.

Parameter estimation for our proposed Bayesian Gaussian profile undirected graphical model is conducted using an Expectation-Maximisation (EM) algorithm [Dempster et al., 1977], along the lines of the EM algorithm of Yang et al. [2021]. Let $\Delta = (\alpha, \beta_x, \Omega_x)$ be the unknown parameters. The EM algorithm aims to maximise the log-posterior $\log P(\Delta \mid Y_{Vx})$ by working with the complete-data log-posterior $\log P(\Delta \mid Y_{Vx}, \theta, R)$. The EM algorithm has two steps: the E-step calculates the expected values $\mathrm{E}[r_{ij,x} \mid \hat{\Delta}, Y_{Vx}]$ and $\mathrm{E}[\theta_i \mid \hat{\Delta}, Y_{Vx}]$, where $\Delta^{(t)} = \hat{\Delta}$ are the current values of $\Delta$ at iteration $t$. The M-step maximises $Q(\hat{\Delta}) = \mathrm{E}_{\theta, R \mid Y, \hat{\Delta}}[\log P(\hat{\Delta}, \theta, R \mid Y_{Vx})]$ with respect to $\hat{\Delta} = (\hat{\alpha}, \hat{\beta}_x, \hat{\Omega}_x)$, giving new updates $\Delta^{(t+1)} = (\alpha^{(t+1)}, \beta_x^{(t+1)}, \Omega_x^{(t+1)})$. Supplementary Material C provides a detailed derivation of our EM algorithm.

Figure 3: Directed acyclic graph (DAG) for the Gaussian undirected profile graphical model, with spike-and-slab priors on the profile precision matrices and the profile coefficients ($0 \le x \le q-1$, $1 \le i < j \le p$).

6 Results

6.1 Simulations

We conduct simulation studies to evaluate the performance of the proposed Bayesian Gaussian undirected profile graphical model (BPUGM). We consider $q = 4$ levels of the covariate $X$, with $\mathcal{X} = \{0, 1, 2, 3\}$, and examine performance under varying numbers of nodes $p \in \{20, 50, 100\}$ and three levels of sparsity $s$ (the larger the value of $s$, the sparser the graphs).
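The E-step/M-step interplay described above can be illustrated on a self-contained toy problem. The following sketch runs EM with a normal spike-and-slab prior on a vector of means; it mirrors the structure of the algorithm (posterior inclusion probabilities in the E-step, penalised updates in the M-step) but it is not the paper's EM:

```python
import numpy as np

def em_spike_slab_means(y, nu0=0.1, nu1=2.0, p=0.5, n_iter=50):
    """Toy EM: y_i = mu_i + N(0,1) noise, with a normal spike-and-slab
    prior on mu_i (spike sd nu0, slab sd nu1, slab weight p).
    E-step: posterior inclusion probabilities r*; M-step: ridge-type
    update of mu with the r*-weighted prior precision."""
    mu = y.copy()
    for _ in range(n_iter):
        # E-step: r*_i = P(slab | mu_i) under N(0, nu1^2) vs N(0, nu0^2).
        slab = p * np.exp(-mu**2 / (2 * nu1**2)) / nu1
        spike = (1 - p) * np.exp(-mu**2 / (2 * nu0**2)) / nu0
        r_star = slab / (slab + spike)
        # M-step: maximise Q; the prior precision mixes the components.
        prior_prec = r_star / nu1**2 + (1 - r_star) / nu0**2
        mu = y / (1.0 + prior_prec)
    return mu, r_star

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 0.1, 20), rng.normal(5, 0.1, 5)])
mu, r_star = em_spike_slab_means(y)
```

The large observations keep an inclusion probability near one and are barely shrunk, while the near-zero observations are assigned to the spike and shrunk heavily, which is the qualitative behaviour the profile-level indicators $r_{ij,x}$ and $\theta_i$ induce on the precision entries and coefficients.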
Four structural scenarios are investigated: (i) all four levels have distinct graph structures; (ii) $\{G(0) = G(1)\} \neq \{G(2) = G(3)\}$; (iii) $\{G(0) = G(1) = G(2)\} \neq \{G(3)\}$; and (iv) all levels share the same graph structure. For each scenario, data are generated independently for each $x \in \mathcal{X}$ from a multivariate normal distribution $N(\beta_x, \Sigma_x)$, where $\Sigma_x = \Omega_x^{-1}$ and $\beta_x = \Sigma_x \zeta_x$, with sample size $n_x = 50$. Additional details on the data-generating mechanism are provided in the Supplementary Materials.

We compare BPUGM with GemBag [Yang et al., 2021], the fused graphical lasso (FGL), and the group graphical lasso (GGL) [Danaher et al., 2014]. Graph structure estimation accuracy is assessed using the area under the ROC curve (AUC), accuracy, sensitivity, and specificity of $\Lambda(x)$ for all $x \in \mathcal{X}$. Results are averaged over 100 simulated datasets, with standard errors reported. Summaries are presented in Figures 4 to 6 and detailed values in Tables 1 to 4 of the Supplementary Materials.

Across all scenarios, values of $p$, and sparsity levels, BPUGM consistently achieves the highest AUC, indicating superior edge discrimination compared with competing methods. The improvement is most pronounced in Scenarios 2 and 3, where partial sharing of graph structures across levels allows BPUGM to effectively borrow strength across groups. In Scenario 1, where graph structures are fully distinct, BPUGM remains competitive and stable as $p$ increases. In Scenario 4, where all graphs are identical, BPUGM performs comparably to FGL and GGL, while maintaining slightly higher sensitivity without loss of specificity. Overall, BPUGM provides a favorable balance between sensitivity and specificity, leading to higher accuracy.
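For one level $x$, the data-generating step $\Sigma_x = \Omega_x^{-1}$, $\beta_x = \Sigma_x \zeta_x$, $Y_V(x) \sim N(\beta_x, \Sigma_x)$ can be sketched as follows (the precision matrix and effect vector below are arbitrary illustrative choices, not the paper's simulation design):

```python
import numpy as np

rng = np.random.default_rng(42)
n_x = 50

# Illustrative sparse precision matrix and conditional-mean effect for one x.
Omega_x = np.array([[1.5, 0.4, 0.0],
                    [0.4, 1.5, 0.4],
                    [0.0, 0.4, 1.5]])
zeta_x = np.array([0.5, 0.0, -0.5])

Sigma_x = np.linalg.inv(Omega_x)    # Sigma_x = Omega_x^{-1}
beta_x = Sigma_x @ zeta_x           # beta_x = Sigma_x zeta_x
Y_x = rng.multivariate_normal(beta_x, Sigma_x, size=n_x)  # N(beta_x, Sigma_x)
print(Y_x.shape)  # (50, 3)
```
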
These results demonstrate the advantages of the proposed Bayesian profile graphical modeling framework for joint estimation of multiple related graph structures under finite-sample conditions.

6.2 AML protein data

We analyze protein expression data from patients affected by acute myeloid leukemia (AML) with the goal of reconstructing and comparing protein networks across disease subtypes; comparing the networks for these groups provides insight into the differences in protein signaling that may affect whether treatments for one subtype will be effective in another. A set of protein levels, collected using reverse phase protein array (RPPA) technology, is observed in a sample of 213 newly diagnosed AML patients [Kornblau et al., 2009]¹. Patients are classified by subtype according to the French-American-British (FAB) classification system. We consider 4 different profiles given by 4 AML subtypes for which a reasonable sample size is available: M0 (17 subjects), M1 (34 subjects), M2 (68 subjects), and M4 (59 subjects). These profiles, based on criteria including cytogenetics and cellular morphology, show varying prognoses. We expect to observe different protein interactions in the subtypes. We focus on 18 proteins relevant to the apoptosis and cell cycle regulation KEGG pathways [Kanehisa et al., 2011]. Our interest is in modelling the effect of the AML subtype on the joint independence structure of the protein levels. Profile undirected graphical models are an encompassing tool that coherently and jointly performs all inferential tasks of interest: learning how the protein dependency structure changes across subtypes as well as the mean protein levels.
Therefore, assuming that the $p = 18$ protein levels follow a multivariate Gaussian distribution and considering $q = 4$ different profiles of AML, where the levels $x \in \mathcal{X} = \{0, 1, 2, 3\}$ denote the subtypes M0, M1, M2, M4, respectively, we estimate and select the profile undirected graphical model represented in Figure 7. For the sake of comparison, we represent the corresponding multiple-graph in Figure 9; this graph is arguably harder to read. Most importantly, the many profile-specific independencies are obviously missed by this graph. For instance, from the selected profile graph we learn that for the profiles $x \in \{0, 1\}$, $Y_{\text{AKTp.308}}(x) \perp\!\!\!\perp Y_{\text{BCI.2}}(x) \mid Y_{V \setminus \{\text{AKTp.308}, \text{BCI.2}\}}(x)$; for any profile $x \in \mathcal{X}$, $Y_{\text{AKTp.308}}(x) \perp\!\!\!\perp Y_{\text{BAD}}(x) \mid Y_{V \setminus \{\text{AKTp.308}, \text{BAD}\}}(x)$. The levels of the proteins BAX, GSK3 and XIAP are independent of the AML subtype.

¹http://bioinformatics.mdanderson.org/Supplements/Kornblau-AML-RPPA/aml-rppa.xls

Figure 4: AUC over 100 datasets, $P = 20$, $K = 4$, $N_x = 20$, for the four different scenarios.

Figure 5: AUC over 100 datasets, $P = 50$, $K = 4$, $N_x = 50$, for the four different scenarios.

Figure 6: AUC over 100 datasets, $P = 100$, $K = 4$, $N_x = 50$, for the four different scenarios.

The selected profile graph has only three square vertices, and only three arrows can be removed; all edges are dotted with the exception of one. This means that the dependence structure of $Y_V$ is expected to vary substantially across profiles, and modeling the direct effect of $X$ on each $Y_a \in Y_V$ becomes relevant to capture how the conditional dependence structure of $Y_V$ changes in different profiles. We then compute the maximal connected components of each profile graph, corresponding to the set of maximal paths. These quantities summarise the heterogeneity in protein connectivity and path structure across profiles.
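Computing the maximal connected components of a profile graph is a standard traversal over its adjacency matrix; a generic sketch (not the authors' code):

```python
import numpy as np

def connected_components(adj):
    """Maximal connected components of an undirected graph given as a
    boolean adjacency matrix, found by depth-first search."""
    p = adj.shape[0]
    seen, comps = set(), []
    for start in range(p):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.append(v)
            stack.extend(int(u) for u in np.flatnonzero(adj[v]) if u not in seen)
        comps.append(sorted(comp))
    return comps

# Toy 4-node graph with two components: {0, 1} and {2, 3}.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=bool)
print(connected_components(adj))  # [[0, 1], [2, 3]]
```

Applying this per profile graph gives the component counts compared across the four AML subtypes below.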
The cardinalities of the resulting sets differ across the four levels, taking values 4, 9, 13 and 12, respectively, indicating variation in structural complexity between profiles.

The estimated profile graphs obtained with our method exhibit varying levels of sparsity across groups compared with those from the GemBag approach. For groups 0 and 1, the inferred graphs are substantially sparser than those produced by GemBag, suggesting that explicitly modeling the effect of $X$ on $Y_V \mid \mathcal{X}$ may explain a large portion of the dependence structure within $Y_V \mid \mathcal{X}$, thereby reducing the number of residual conditional associations. Table 1 provides a direct comparison of edge detection results between GemBag and BPUGM. For groups 0 and 1, all edges identified by BPUGM are also detected by GemBag, while for groups 2 and 3 the majority of BPUGM edges are likewise recovered. This pattern suggests that BPUGM yields a more parsimonious representation of the conditional dependence structure. Moreover, group-specific graphs generated by our method (Figure 8) appear more heterogeneous than those obtained with GemBag (Figure 9), indicating a greater ability to capture group-specific differences.

            |      U(0)      |      U(1)      |      U(2)      |      U(3)      |     Total
GemBag      | Edge   No edge | Edge   No edge | Edge   No edge | Edge   No edge | Edge   No edge
Edge        |   3      12    |  10       4    |  14       0    |  13       0    |  40      16
No edge     |   0     138    |   0     139    |   4     135    |   3     137    |   7     549

Table 1: Comparison of GemBag (rows) and BPUGM (columns, within each group) edge detection.

Finally, we repeated the experiment 100 times, randomly removing 10% of the data each time, to assess the robustness of the inferred models. For each repetition, we computed the balanced accuracy between the multiple-graphs obtained from the reduced dataset and the multiple-graphs estimated from the full dataset, the latter considered the ground truth for this metric. Table 2 shows that BPUGM exhibited higher overall robustness than GemBag, achieving a higher overall balanced accuracy (0.9736 vs.
0.9339). In particular, BPUGM substantially outperformed GemBag in the two smallest-sample groups ($U(0)$ and $U(1)$), where it inferred sparser graphs, while performance was comparable between methods in the larger groups ($U(2)$ and $U(3)$). This consistent improvement demonstrates the advantage of BPUGM in model selection, highlighting its greater stability and ability to recover graph structures closer to the original model under data perturbations.

Figure 7: The selected profile undirected graph model for the protein data.

            |      BPUGM       |      GemBag
            | Mean      SE     | Mean      SE
U(0)        | 0.9803    0.0425 | 0.8607    0.0502
U(1)        | 0.9622    0.0395 | 0.9509    0.0394
U(2)        | 0.9772    0.0172 | 0.9736    0.0262
U(3)        | 0.9758    0.0235 | 0.9829    0.0210
Overall     | 0.9736    0.0143 | 0.9339    0.0192

Table 2: Mean balanced accuracy and standard error (SE) between the multiple-graphs inferred from reduced datasets and those estimated from the full dataset, computed across 100 repetitions, comparing BPUGM and GemBag.

7 Discussion

We propose a class of graphical models that generalizes both chain graphs and multiple graphs and, for the first time, we establish compatibility between these two types of graphs. In line with LWF chain graphs, profile undirected graphs can be used for modelling the profile conditional independencies resulting from a sequence of non-independent regression models involving all response variables. From this perspective, the specification of a class of profile chain graphs represents an interesting generalization for exploring profile independencies in a multivariate regression setting. The parameterization discussed in Section 5 for the Gaussian case is quite standard.
Under the assumption of a multinomial sampling scheme for the multivariate outcome vector, a parameterization based on the log-linear transformation [Lauritzen, 1996] could be used for profile undirected graph models. We developed a Bayesian inferential procedure and a companion EM algorithm for fast inference. Alternative inferential approaches for this type of graph are possible; for example, model comparison within the class of profile undirected graphs can be based on the likelihood ratio test in the case of nested models. These graphical models are smooth and belong to the curved exponential family, so the likelihood ratio test has an asymptotic chi-square distribution.

We demonstrate the practical utility of this class of models through the analysis of protein network data from multiple subtypes of acute myeloid leukemia. In this application, our proposed approach yields more robust network estimates than GemBag, as evidenced by higher balanced accuracy when the experiment is repeated 100 times with 10% of the data randomly removed at each iteration. Although this represents limited empirical evidence, the observed gains in robustness suggest that our method tends to include fewer false-positive edges. This behavior is consistent with the explicit modeling of external factors acting on the nodes, which allows the method to disentangle genuine conditional dependencies from associations induced by unmodeled heterogeneity. As a result, BPUGM learns sparser and more stable graph structures by excluding edges that are not truly supported by the data, whereas GemBag appears to retain weaker residual associations that are less stable across subsamples. Overall, these findings indicate that explicitly accounting for external factors can lead to more reliable inference of biologically meaningful network connections.

Acknowledgement

The authors gratefully acknowledge Andrea Lazzerini for his contributions.
Supplementary material

The Supplementary Material includes the profile local Markov property, proofs of the theorems and propositions, details of the EM algorithm, the data-generating mechanism used in the simulation studies, and additional simulation results.

References

S. A. Andersson, D. Madigan, and M. D. Perlman. Alternative Markov properties for chain graphs. Scandinavian Journal of Statistics, 28(1):33–85, 2001.

A. Bhadra and B. K. Mallick. Joint high-dimensional Bayesian variable and covariance selection with an application to eQTL analysis. Biometrics, 69(2):447–457, 2013.

T. T. Cai, H. Li, W. Liu, and J. Xie. Covariate-adjusted precision matrix estimation with an application in genetical genomics. Biometrika, page ass058, 2012.

M. Chen, Z. Ren, H. Zhao, and H. Zhou. Asymptotically normal and efficient estimation of covariate-adjusted Gaussian graphical model. Journal of the American Statistical Association, 111(513):394–406, 2016.

G. Consonni, L. La Rocca, and S. Peluso. Objective Bayes covariate-adjusted sparse graphical model selection. Scandinavian Journal of Statistics, pages 741–764, 2017. doi: 10.1111/sjos.12273.

J. Corander. Labelled graphical models. Scandinavian Journal of Statistics, 30(3):493–508, 2003. doi: 10.1111/1467-9469.00344.

D. Cox and N. Wermuth. A general condition for avoiding effect reversal after marginalization. Journal of the Royal Statistical Society, Series B, 65:937–941, 2003.

P. Danaher, P. Wang, and D. Witten. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B, 76:373–397, 2014.

A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.

M. Drton.
Discrete chain graph models. Bernoulli, 15(3):736–753, 2009. doi: 10.3150/08-BEJ172.

M. Frydenberg. The chain graph Markov property. Scandinavian Journal of Statistics, pages 333–353, 1990.

L. Gan, N. N. Narisetty, and F. Liang. Bayesian regularization for graphical models with unequal shrinkage. Journal of the American Statistical Association, 114(527):1218–1231, 2019.

E. George and R. McCulloch. Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423):881–889, 1993. doi: 10.1080/01621459.1993.10476353.

J. Guo, E. Levina, G. Michailidis, and J. Zhu. Joint estimation of multiple graphical models. Biometrika, 98(1):1–15, 2011.

S. Hojsgaard. Split models for contingency tables. Computational Statistics & Data Analysis, 42(4):621–645, 2003.

M. Kanehisa, S. Goto, Y. Sato, M. Furumichi, and M. Tanabe. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Research, 40(D1):D109–D114, 2011.

S. M. Kornblau, R. Tibes, Y. H. Qiu, W. Chen, H. M. Kantarjian, and M. Andreeff. Functional proteomic profiling of AML predicts response and survival. Blood, 113(1):154–164, 2009.

S. L. Lauritzen. Graphical Models. Oxford University Press, New York, 1996.

S. L. Lauritzen and N. Wermuth. Graphical models for associations between variables, some of which are qualitative and some quantitative. The Annals of Statistics, pages 31–57, 1989.

W. Lee and Y. Liu. Simultaneous multiple response regression and inverse covariance matrix estimation via penalized Gaussian maximum likelihood. Journal of Multivariate Analysis, 111:241–255, 2012.

H. Nyman, J. Pensar, T. Koski, and J. Corander. Stratified graphical models - context-specific independence in graphical models. Bayesian Analysis, 9(4):883–908, 2014.

H. Nyman, J. Pensar, T. Koski, and J. Corander.
Context-specific independence in graphical log-linear models. Computational Statistics, 31(4):1493–1512, 2016.

C. B. Peterson, F. Stingo, and M. Vannucci. Bayesian inference of multiple Gaussian graphical models. Journal of the American Statistical Association, 110(509):159–174, 2015.

V. Rockova and E. I. George. EMVS: The EM approach to Bayesian variable selection. Journal of the American Statistical Association, 109(506):828–846, 2014. doi: 10.1080/01621459.2013.869223.

V. Rockova and E. I. George. The Spike-and-Slab LASSO. Journal of the American Statistical Association, 113(521):431–444, 2018. doi: 10.1080/01621459.2016.1260469.

A. J. Rothman, E. Levina, and J. Zhu. Sparse multivariate regression with covariance estimation. Journal of Computational and Graphical Statistics, 19(4):947–962, 2010.

E. H. Simpson. The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society: Series B, 13:238–241, 1951.

N. Wermuth and K. Sadeghi. Sequences of regressions and their independences. TEST, 21(2):215–252, 2012.

X. Yang, L. Gan, N. N. Narisetty, and F. Liang. GemBag: Group estimation of multiple Bayesian graphical models. Journal of Machine Learning Research, 22:1–48, 2021.

J. Yin and H. Li. A sparse conditional Gaussian graphical model for analysis of genetical genomics data. The Annals of Applied Statistics, 5(4):2630, 2011.
Figure 8: Induced undirected multiple-graph for the profile outcome vector $Y_V(x)$.

Figure 9: Induced multiple-graph from GemBag.

Supplemental Materials: Profile Graphical Models

A Profile Local Markov property

The probability distributions $P[Y_V \mid \mathcal{X}]$ of the profile outcome vectors $Y_V \mid \mathcal{X}$ satisfy the profile undirected Local Markov Property (U-LMP) wrt the graph $\mathcal{G}_U = (V, \mathcal{E}_U)$ if, for any vertex $a \in V$,
$$Y_a(x) \perp\!\!\!\perp Y_{V \setminus \{a \cup nb_x(a)\}}(x) \mid Y_{nb_x(a)}(x), \quad x \in \mathcal{X}. \quad (S1)$$

B Proofs

Proof 1 of Theorem 1. Let $\mathcal{G}_U = (V, \mathcal{E}_U)$ be a profile undirected graph and let $D \subseteq V$ be any $x$-disconnected set with $x$-connected components $K_1, \ldots, K_r$ such that, for every pair $K_i, K_j$ with $i, j = 1, \ldots, r$, $i \neq j$, the U-CSMP wrt $\mathcal{G}_U$ gives
$$Y_{K_i}(x) \perp\!\!\!\perp Y_{K_j}(x) \mid Y_{V \setminus \{K_i, K_j\}}(x), \quad x \in \mathcal{X}. \quad (S2)$$
For any pair $K_i, K_j \subset D$ with $i, j = 1, \ldots, r$, $i \neq j$, the set $S_{ij} = V \setminus \{K_i, K_j\}$ is an $x$-separator.
Then, the U-CSMP implies the U-GMP wrt $\mathcal{G}_U$. Conversely, consider any $x$-connected set $C$ wrt $\mathcal{G}_U$, let $nb_x(C) = \bigcup_{a \in C} nb_x(a)$ be the neighbour set including $C$, and let $S_C = nb_x(C) \setminus C$ be an $x$-separator for the sets $C$ and $V \setminus nb_x(C)$, for any $x \in \mathcal{X}$. The U-GMP implies that
$$Y_C(x) \perp\!\!\!\perp Y_{V \setminus nb_x(C)}(x) \mid Y_{S_C}(x), \quad x \in \mathcal{X}. \quad (S3)$$
Note that $C \cup \{V \setminus nb_x(C)\}$ is an $x$-disconnected set, for $x \in \mathcal{X}$. We distinguish two cases, according to whether $V \setminus nb_x(C)$ is $x$-connected or $x$-disconnected. In the first case, the $x$-connected components of $C \cup \{V \setminus nb_x(C)\}$ are $C$ and $V \setminus nb_x(C)$, so the U-CSMP is satisfied. If $V \setminus nb_x(C)$ is $x$-disconnected with connected components $K_1 \cup \cdots \cup K_r$, the U-GMP also implies that
$$Y_C(x) \perp\!\!\!\perp Y_{K_1}(x) \perp\!\!\!\perp \ldots \perp\!\!\!\perp Y_{K_r}(x) \mid Y_{S_C}(x), \quad x \in \mathcal{X}. \quad (S4)$$
Then the U-GMP implies the U-CSMP wrt $\mathcal{G}_U$.

Proof 2 of Proposition 1. Consider a profile undirected graph $\mathcal{G}_U = (V, \mathcal{E}_U)$ associated with the profile outcome vectors $Y_V \mid \mathcal{X}$ and the induced class $U_{V \mid \mathcal{X}}$ of multiple undirected graphs. If the probability distributions $P[Y_V \mid \mathcal{X}]$ satisfy the U-CSMP wrt $\mathcal{G}_U$, the U-GMP is also satisfied by Theorem 1. So, given three disjoint subsets $A, B, C$ of $V$,
$$Y_A(x) \perp\!\!\!\perp Y_B(x) \mid Y_C(x), \quad (S5)$$
where $A$ and $B$ are $x$-separated by $C$, with $x \in \mathcal{X}$. The result follows by Definition 5, since $A$ and $B$ are $x$-separated by $C$ in $\mathcal{G}_U$ if and only if they are $x$-separated by $C$ in $U(x) \in U_{V \mid \mathcal{X}}$, with $x \in \mathcal{X}$.

Proof 3 of Proposition 2. Given an undirected graph $U = (V, E_U)$ associated with a random vector $Y_V$, the global and the pairwise Markov properties are equivalent if the joint probability distribution $P(Y_V)$ is strictly positive; see Lauritzen [1996]. The proposition follows by applying this result to the strictly positive probability distribution $P[Y_V(x)]$ of any profile outcome vector $Y_V(x) \in Y_V \mid \mathcal{X}$.
Proof 4 of Theorem 2. First of all, we recall that, given a profile undirected graph $\mathcal{G}_U$ associated with the profile outcome vectors $Y_V \mid \mathcal{X}$, by Theorem 1 the probability distributions $P[Y_V \mid \mathcal{X}]$ satisfy the U-CSMP if and only if the U-GMP is satisfied. In order to prove the compatibility, we distinguish three cases:

a) if a set $D$ is $x$-disconnected in $\mathcal{G}_U$ for no $x \in \mathcal{X}$ (i.e., $\mathcal{Z} = \emptyset$), this means that it is a connected set, and then the same set $D$ is also connected in any $C_U \in \mathcal{C}_U$ by condition ($i$); therefore, the U-CSMP trivially implies condition (6), as no independence statement is implied for $Y_D$ given $X$ and the remaining set of variables; in this case condition ($ii$) is not invoked, as no independence statements hold for $Y_D$ regardless of whether any $Y_j \in Y_D$ is dependent or independent of $X$ given the remaining set of variables;

b) if a set $D$ is $x$-disconnected in $\mathcal{G}_U$ for all $x \in \mathcal{X}$, then the same set $D$ is also disconnected in any $C_U \in \mathcal{C}_U$ by condition ($i$); therefore, the U-CSMP implies condition (6) for the class of induced chain graphs; in this case condition ($ii$) is also not invoked, as
$$Y_{K_1}(x) \perp\!\!\!\perp \ldots \perp\!\!\!\perp Y_{K_r}(x) \mid Y_{V \setminus D}(x), \ \forall x \in \mathcal{X} \ \Rightarrow \ Y_{K_1} \perp\!\!\!\perp \ldots \perp\!\!\!\perp Y_{K_r} \mid \{Y_{V \setminus D}, X\},$$
regardless of whether any $Y_j \in Y_D$ is dependent or independent of $X$ given the remaining set of variables;

c) if a set $D$ is $x$-disconnected in $\mathcal{G}_U$ for some $x \in \mathcal{Z} \subset \mathcal{X}$ with $\mathcal{Z} \neq \emptyset$, then the same set $D$ is connected in any $C_U \in \mathcal{C}_U$ by condition ($i$); the U-CSMP implies condition (6), as
$$Y_{K_1}(x) \perp\!\!\!\perp \ldots \perp\!\!\!\perp Y_{K_r}(x) \mid Y_{V \setminus D}(x), \ x \in \mathcal{Z} \subset \mathcal{X} \ \Rightarrow \ Y_{K_1} \not\perp\!\!\!\perp \ldots \not\perp\!\!\!\perp Y_{K_r} \mid \{Y_{V \setminus D}, X\};$$
in this case $Y_D$ cannot be independent of $X$ given $Y_{V \setminus D}$, as the profile conditional distribution of $Y_D(x) \mid Y_{V \setminus D}(x)$ behaves differently for each $x \in \mathcal{X}$, i.e., $Y_{K_1}(x) \perp\!\!\!\perp \ldots \perp\!\!\!\perp Y_{K_r}(x) \mid Y_{V \setminus D}(x)$ for $x \in \mathcal{Z}$ and $Y_{K_1}(x) \not\perp\!\!\!\perp \ldots$
$\not\perp\!\!\!\perp Y_{K_r}(x) \mid Y_{V \setminus D}(x)$ for $x \in \mathcal{X} \setminus \mathcal{Z}$; condition ($ii$) is then required so that $Y_D \not\perp\!\!\!\perp X \mid Y_{V \setminus D}$.

C EM algorithm

The E-step takes the expectation of the complete-data log-posterior,
$$Q(\hat{\Delta}) = \mathrm{E}_{\theta, R \mid Y, \hat{\Delta}}\Bigg\{ \sum_{x=0}^{q-1} \log P(Y_{Vx} \mid \hat{\Omega}_x, \hat{\beta}_x, \hat{\alpha}) + \sum_{x=0}^{q-1} \sum_{i<j} \log P(\hat{\omega}_{ij,x} \mid r^*_{ij,x}) + \sum_{x=0}^{q-1} \sum_{i=1}^{p} \Big[ \log P(\hat{\beta}_{ix} \mid \theta^*_i) + \log P(\hat{\omega}_{ii,x} \mid \tau) \Big] \Bigg\}. \quad (S6)$$
Let $Q = Q(\hat{\Delta}) = \mathrm{E}_{\theta, R \mid Y, \hat{\Delta}}[\log P(\hat{\Delta}, \theta, R \mid Y_{Vx})]$. Then
$$Q \propto \frac{1}{2} \sum_{x=0}^{q-1} \Bigg[ n_x \log\big(\det(\hat{\Omega}_x)\big) - \mathrm{tr}\Bigg( \sum_{k=1}^{n_x} \big(Y_{x,k} - (\hat{\alpha} + \hat{\beta}_x)\big) \big(Y_{x,k} - (\hat{\alpha} + \hat{\beta}_x)\big)^\top \hat{\Omega}_x \Bigg) \Bigg] + \sum_{x=0}^{q-1} \sum_{i=1}^{p} \Big[ \log P(\hat{\beta}_{ix} \mid \theta^*_i) - \tau \hat{\omega}_{ii,x} \Big] + \sum_{x=0}^{q-1} \sum_{i<j} \log P(\hat{\omega}_{ij,x} \mid r^*_{ij,x}), \quad (S7)$$
where $\mathrm{tr}(A)$ is the trace of matrix $A$ and $\det(A)$ its determinant. Setting a Laplace spike-and-slab prior on $\omega_{ij,x} \mid r_{ij,x}$ as in Yang et al.
[2021], and a normal spike-and-slab on $\beta_{ix} \mid \theta_i$ as in Rockova and George [2014], (S7) becomes
$$Q \propto \frac{1}{2} \sum_{x=0}^{q-1} \Bigg[ n_x \log\big(\det(\hat{\Omega}_x)\big) - \mathrm{tr}\Bigg( \sum_{k=1}^{n_x} \big(Y_{x,k} - (\hat{\alpha} + \hat{\beta}_x)\big) \big(Y_{x,k} - (\hat{\alpha} + \hat{\beta}_x)\big)^\top \hat{\Omega}_x \Bigg) \Bigg] - \sum_{x=0}^{q-1} \sum_{i=1}^{p} \Bigg[ \frac{1}{2} \hat{\beta}_{ix}^2\, \mathrm{E}_{\theta \mid \hat{\Delta}, Y_{Vx}}\Big[ \frac{\theta_i}{\lambda_1} + \frac{1 - \theta_i}{\lambda_0} \Big] + \tau \hat{\omega}_{ii,x} \Bigg] - \sum_{x=0}^{q-1} \sum_{i<j} |\hat{\omega}_{ij,x}|\, \mathrm{E}_{R \mid \hat{\Delta}, Y_{Vx}}\Big[ \frac{r_{ij,x}}{\nu_1} + \frac{1 - r_{ij,x}}{\nu_0} \Big]. \quad (S8)$$
Equation (S8) only depends on $(\theta, R)$ through
$$\mathrm{E}_{\theta \mid \hat{\Delta}, Y_{Vx}}\Big[ \frac{\theta_i}{\lambda_1} + \frac{1 - \theta_i}{\lambda_0} \Big] = \frac{\mathrm{E}_{\theta \mid \hat{\Delta}, Y_{Vx}}[\theta_i]}{\lambda_1} + \frac{\mathrm{E}_{\theta \mid \hat{\Delta}, Y_{Vx}}[1 - \theta_i]}{\lambda_0} = \frac{\theta^*_i}{\lambda_1} + \frac{1 - \theta^*_i}{\lambda_0} \quad (S9)$$
and
$$\mathrm{E}_{R \mid \hat{\Delta}, Y_{Vx}}\Big[ \frac{r_{ij,x}}{\nu_1} + \frac{1 - r_{ij,x}}{\nu_0} \Big] = \frac{\mathrm{E}_{R \mid \hat{\Delta}, Y_{Vx}}[r_{ij,x}]}{\nu_1} + \frac{\mathrm{E}_{R \mid \hat{\Delta}, Y_{Vx}}[1 - r_{ij,x}]}{\nu_0} = \frac{r^*_{ij,x}}{\nu_1} + \frac{1 - r^*_{ij,x}}{\nu_0}. \quad (S10)$$
The model hierarchy separates $\theta$ from the data $Y_{Vx}$ through the coefficients $\beta$ and the precision $\Omega$, so that $P[\theta \mid \hat{\Delta}, Y_{Vx}] = P[\theta \mid \hat{\Delta}] = P[\theta \mid \beta, \Omega]$. This leads to
$$\theta^*_i = \mathrm{E}_{\theta \mid \hat{\Delta}, Y_{Vx}}[\theta_i] = P[\theta_i = 1 \mid \beta_i, \omega_{i\cdot}] = \frac{P[\beta_i, \omega_{i\cdot} \mid \theta_i = 1]\, P[\theta_i = 1]}{P[\beta_i, \omega_{i\cdot} \mid \theta_i = 1]\, P[\theta_i = 1] + P[\beta_i, \omega_{i\cdot} \mid \theta_i = 0]\, P[\theta_i = 0]} = \frac{p_2\, P[\omega_{i\cdot} \mid \theta_i = 1] \prod_{x=0}^{q-1} P_1(\beta_{ix})}{p_2\, P[\omega_{i\cdot} \mid \theta_i = 1] \prod_{x=0}^{q-1} P_1(\beta_{ix}) + (1 - p_2)\, P[\omega_{i\cdot} \mid \theta_i = 0] \prod_{x=0}^{q-1} P_0(\beta_{ix})}. \quad (S11)$$
The term $P[\omega_{i\cdot} \mid \theta_i]$ in (S11) is
$$P(\omega_{i\cdot} \mid \theta_i) = \prod_j \sum_{\theta_j} \sum_{r_{ij,0}, \ldots, r_{ij,q-1}} \sum_{\gamma_{ij}} \Bigg[ \Bigg( \prod_{x=0}^{q-1} P(\omega_{ij,x} \mid r_{ij,x}) \Bigg) P(r_{ij,0}, \ldots, r_{ij,q-1} \mid \theta_i, \theta_j, \gamma_{ij})\, P(\theta_j)\, P(\gamma_{ij}) \Bigg]$$
$$= \prod_j \sum_{\theta_j} \sum_{r_{ij,0}, \ldots, r_{ij,q-1}} \sum_{\gamma_{ij}} \Bigg[ \Bigg( \prod_{x=0}^{q-1} \big( r_{ij,x} P_1(\omega_{ij,x}) + (1 - r_{ij,x}) P_0(\omega_{ij,x}) \big) \Bigg) \Bigg( \gamma_{ij} \theta_i \theta_j \prod_{x=0}^{q-1} p_3^{r_{ij,x}} (1 - p_3)^{1 - r_{ij,x}} + (1 - \gamma_{ij}) \prod_{x=0}^{q-1} \delta_0(r_{ij,x}) + \gamma_{ij} (1 - \theta_i \theta_j)\, p_4^{r_{ij,0}} (1 - p_4)^{1 - r_{ij,0}}\, \delta_{(r_{ij,1}, \ldots, r_{ij,q-1})}(r_{ij,0}) \Bigg) \big( p_2^{\theta_j} (1 - p_2)^{1 - \theta_j} \big) \big( p_1^{\gamma_{ij}} (1 - p_1)^{1 - \gamma_{ij}} \big) \Bigg]$$
and, summing over $\gamma_{ij}$,
$$= \prod_j \sum_{\theta_j} \sum_{r_{ij,0}, \ldots, r_{ij,q-1}} \Bigg[ \Bigg( \prod_{x=0}^{q-1} \big( r_{ij,x} P_1(\omega_{ij,x}) + (1 - r_{ij,x}) P_0(\omega_{ij,x}) \big) \Bigg) \big( p_2^{\theta_j} (1 - p_2)^{1 - \theta_j} \big) \Bigg( p_1 \theta_i \theta_j \prod_{x=0}^{q-1} p_3^{r_{ij,x}} (1 - p_3)^{1 - r_{ij,x}} + p_1 (1 - \theta_i \theta_j)\, p_4^{r_{ij,0}} (1 - p_4)^{1 - r_{ij,0}}\, \delta_{(r_{ij,1}, \ldots, r_{ij,q-1})}(r_{ij,0}) + (1 - p_1) \prod_{x=0}^{q-1} \delta_0(r_{ij,x}) \Bigg) \Bigg]. \quad (S12)$$
Then, setting $\theta_i = 1$ and summing over $\theta_j$,
$$P(\omega_{i\cdot} \mid \theta_i = 1) = \prod_j \sum_{r_{ij,0}, \ldots, r_{ij,q-1}} \Bigg[ \Bigg( \prod_{x=0}^{q-1} \big( r_{ij,x} P_1(\omega_{ij,x}) + (1 - r_{ij,x}) P_0(\omega_{ij,x}) \big) \Bigg) \Bigg( p_1 p_2 \prod_{x=0}^{q-1} p_3^{r_{ij,x}} (1 - p_3)^{1 - r_{ij,x}} + p_1 (1 - p_2)\, p_4^{r_{ij,0}} (1 - p_4)^{1 - r_{ij,0}}\, \delta_{(r_{ij,1}, \ldots, r_{ij,q-1})}(r_{ij,0}) + (1 - p_1) \prod_{x=0}^{q-1} \delta_0(r_{ij,x}) \Bigg) \Bigg]$$
and, summing over the indicators,
$$= \prod_j \Bigg[ \prod_{x=0}^{q-1} P_0(\omega_{ij,x}) \big\{ p_1 p_2 (1 - p_3)^q + p_1 (1 - p_2)(1 - p_4) + (1 - p_1) \big\} + \prod_{x=0}^{q-1} P_1(\omega_{ij,x}) \big\{ p_1 p_2 p_3^q + p_1 (1 - p_2) p_4 \big\} + p_1 p_2 \sum_{k=1}^{q-1} p_3^k (1 - p_3)^{q-k} \sum_{n \in A_q^{(k)}} \prod_{l=0}^{q-1} P_{n_l}(\omega_{ij,l}) \Bigg], \quad (S13)$$
, 𝑛 𝑞 − 1 : 𝑛 𝑙 ∈ { 0 , 1 } f or all 0 ≤ 𝑙 ≤ 𝑞 − 1 and Í 𝑞 − 1 𝑙 = 0 = 𝑘 ) } , the set of { 0 , 1 } -valued binary seq uences of length q, with 𝑘 elements with P 1 ( 𝜔 𝑖 𝑗 , 𝑘 ) , 𝑘 = 1 , . . . , 𝑞 − 1 . W e note that the cardinality of 𝐴 ( 𝑘 ) 𝑞 is # 𝐴 ( 𝑘 ) 𝑞 =  𝑞 𝑘  and Í 𝑞 𝑘 = 0 # 𝐴 ( 𝑘 ) 𝑞 = Í 𝑞 𝑘 = 0  𝑞 𝑘  = 2 𝑞 . 6 P ( 𝜔 𝑖 · | 𝜃 𝑖 = 0 ) =  𝑗  𝜃 𝑗  𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 " 𝑞 − 1 Ö 𝑥 = 0 𝑟 𝑖 𝑗 , 𝑥 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) + ( 1 − 𝑟 𝑖 𝑗 , 𝑥 ) P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ! ∗  𝑝 𝜃 𝑗 2 ( 1 − 𝑝 2 ) 1 − 𝜃 𝑗  ∗ ( ( 1 − 𝑝 1 ) 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) ! + 𝑝 1 𝑝 𝑟 𝑖 𝑗 , 0 4 ( 1 − 𝑝 4 ) 1 − 𝑟 𝑖 𝑗 , 0 𝛿 𝑟 𝑖 𝑗 , 1 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 ( 𝑟 𝑖 𝑗 , 0 ) ) # =  𝑗  𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 " 𝑞 − 1 Ö 𝑥 = 0 𝑟 𝑖 𝑗 , 𝑥 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) + ( 1 − 𝑟 𝑖 𝑗 , 𝑥 ) P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ! ∗ ( ( 1 − 𝑝 1 ) 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) ! + 𝑝 1 𝑝 𝑟 𝑖 𝑗 , 0 4 ( 1 − 𝑝 4 ) 1 − 𝑟 𝑖 𝑗 , 0 𝛿 𝑟 𝑖 𝑗 , 1 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 ( 𝑟 𝑖 𝑗 , 0 ) ) # =  𝑗 " 𝑞 − 1 Ö 𝑥 = 0 P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ( 𝑝 1 ( 1 − 𝑝 4 ) + ( 1 − 𝑝 1 ) ) + 𝑞 − 1 Ö 𝑥 = 0 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) 𝑝 1 𝑝 4 # (S14) 7 For instance, assuming 𝑞 = 4 , Eq uation (S13) and (S14) are of the f or m P ( 𝜔 𝑖 · | 𝜃 𝑖 = 1 ) =  𝑗 " P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) n 𝑝 1 𝑝 2 ( 1 − 𝑝 3 ) 4 + 𝑝 1 ( 1 − 𝑝 2 ) ( 1 − 𝑝 4 ) + ( 1 − 𝑝 1 ) 𝑝 2 o + P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 ) n 𝑝 1 𝑝 2 𝑝 4 3 + 𝑝 1 ( 1 − 𝑝 2 ) 𝑝 4 o + 𝑝 1 𝑝 2 n 𝑝 3 ( 1 − 𝑝 3 ) 3  P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) + P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) + P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) + P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 )  + 𝑝 2 3 ( 1 − 𝑝 3 ) 2  P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) + P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) + P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 
, 3 ) + P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) + P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 ) + P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 )  + 𝑝 3 3 ( 1 − 𝑝 3 )  P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 ) + P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 ) + P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 ) + P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 )  o # , P ( 𝜔 𝑖 · | 𝜃 𝑖 = 0 ) =  𝑗 " P 0 ( 𝜔 𝑖 𝑗 , 0 ) P 0 ( 𝜔 𝑖 𝑗 , 1 ) P 0 ( 𝜔 𝑖 𝑗 , 2 ) P 0 ( 𝜔 𝑖 𝑗 , 3 ) ( 𝑝 1 ( 1 − 𝑝 4 ) + ( 1 − 𝑝 1 ) ) + P 1 ( 𝜔 𝑖 𝑗 , 0 ) P 1 ( 𝜔 𝑖 𝑗 , 1 ) P 1 ( 𝜔 𝑖 𝑗 , 2 ) P 1 ( 𝜔 𝑖 𝑗 , 3 ) 𝑝 1 𝑝 4 # . (S15) 8 The conditional e xpectation E 𝑅 | b Δ ,𝑌 𝑉 𝑥 [ 𝑟 𝑖 𝑗 , 𝑥 ] = 𝑟 ∗ 𝑖 𝑗 , 𝑥 is 𝑟 ∗ 𝑖 𝑗 , 𝑥 = E 𝑅 | b Δ ,𝑌 𝑉 𝑥 [ 𝑟 𝑖 𝑗 , 𝑥 ] = E 𝑅 | b Δ [ 𝑟 𝑖 𝑗 , 𝑥 ] = P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | Δ ) = P ( 𝛾 𝑖 𝑗 = 1 | Δ ) ∗ h P ( 𝜃 𝑖 = 1 | Δ ) P ( 𝜃 𝑗 = 1 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 1 , 𝜃 𝑗 = 1 , Δ ) + P ( 𝑟 𝑖 𝑗 , / 𝑥 = 1 | Δ ) n P ( 𝜃 𝑖 = 1 | Δ ) P ( 𝜃 𝑗 = 0 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 1 , 𝜃 𝑗 = 0 , Δ ) + P ( 𝜃 𝑖 = 0 | Δ ) P ( 𝜃 𝑗 = 1 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 0 , 𝜃 𝑗 = 1 , Δ ) + P ( 𝜃 𝑖 = 0 | Δ ) P ( 𝜃 𝑗 = 0 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 0 , 𝜃 𝑗 = 0 , Δ ) o i (S16) 1 . P ( 𝛾 i j = 1 | 𝚫 ) in Equation (S16) is P ( 𝛾 𝑖 𝑗 = 1 | Δ ) = P ( Δ | 𝛾 𝑖 𝑗 = 1 ) P ( 𝛾 𝑖 𝑗 = 1 ) P ( Δ | 𝛾 𝑖 𝑗 = 1 ) P ( 𝛾 𝑖 𝑗 = 1 ) + P ( Δ | 𝛾 𝑖 𝑗 = 0 ) P ( 𝛾 𝑖 𝑗 = 0 ) = P ( Δ | 𝛾 𝑖 𝑗 = 1 ) 𝑝 1 P ( Δ | 𝛾 𝑖 𝑗 = 1 ) 𝑝 1 + P ( Δ | 𝛾 𝑖 𝑗 = 0 ) ( 1 − 𝑝 1 ) . , (S17) P ( Δ | 𝛾 𝑖 𝑗 ) =  𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1  𝜃 𝑖  𝜃 𝑗 " 𝑞 − 1 Ö 𝑥 = 0 P ( 𝜔 𝑖 𝑗 , 𝑥 | 𝑟 𝑖 𝑗 , 𝑥 ) ! P ( 𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 𝑥 | 𝜃 𝑖 , 𝜃 𝑗 , 𝛾 𝑖 𝑗 ) ∗ 𝑞 − 1 Ö 𝑥 = 0 P ( 𝛽 𝑖 , 𝑥 | 𝜃 𝑖 ) P ( 𝛽 𝑗 , 𝑥 | 𝜃 𝑗 ) ! P ( 𝜃 𝑖 ) P ( 𝜃 𝑗 ) # , (S18) 9 P ( Δ | 𝛾 𝑖 𝑗 ) =  𝑟 𝑖 𝑗 , 0 , . . . 
, 𝑟 𝑖 𝑗 , 𝑞 − 1  𝜃 𝑖  𝜃 𝑗 " 𝑞 − 1 Ö 𝑥 = 0 𝑟 𝑖 𝑗 , 𝑥 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) + ( 1 − 𝑟 𝑖 𝑗 , 𝑥 ) P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ! ∗ 𝛾 𝑖 𝑗 𝜃 𝑖 𝜃 𝑗 𝑞 − 1 Ö 𝑥 = 0 𝑝 𝑟 𝑖 𝑗 , 𝑥 3 ( 1 − 𝑝 3 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 ! + ( 1 − 𝛾 𝑖 𝑗 ) 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) ! + 𝛾 𝑖 𝑗 ( 1 − 𝜃 𝑖 𝜃 𝑗 ) 𝑝 𝑟 𝑖 𝑗 , 0 4 ( 1 − 𝑝 4 ) 1 − 𝑟 𝑖 𝑗 , 0 𝛿 𝑟 𝑖 𝑗 , 1 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 ( 𝑟 𝑖 𝑗 , 0 ) ! ∗ 𝑞 − 1 Ö 𝑥 = 0  𝜃 𝑖 P 1 ( 𝛽 𝑖 𝑥 ) + ( 1 − 𝜃 𝑖 ) P 0 ( 𝛽 𝑖 𝑥 )   𝜃 𝑗 P 1 ( 𝛽 𝑗 𝑥 ) + ( 1 − 𝜃 𝑗 ) P 0 ( 𝛽 𝑗 𝑥 )  ! ∗ 𝑝 𝜃 𝑖 2 ( 1 − 𝑝 2 ) 1 − 𝜃 𝑖 𝑝 𝜃 𝑗 2 ( 1 − 𝑝 2 ) 1 − 𝜃 𝑗 # =  𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 " 𝑞 − 1 Ö 𝑥 = 0 𝑟 𝑖 𝑗 , 𝑥 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) + ( 1 − 𝑟 𝑖 𝑗 , 𝑥 ) P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ! ∗ ( 𝑝 2 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  ∗ 𝛾 𝑖 𝑗 𝑞 − 1 Ö 𝑥 = 0 𝑝 𝑟 𝑖 𝑗 , 𝑥 3 ( 1 − 𝑝 3 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 ! + ( 1 − 𝛾 𝑖 𝑗 ) 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) ! ! + ( 1 − 𝑝 2 ) 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  + 𝑝 2 ( 1 − 𝑝 2 ) 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  ! ! ∗ ( 1 − 𝛾 𝑖 𝑗 ) 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) ! + 𝛾 𝑖 𝑗 𝑝 𝑟 𝑖 𝑗 , 0 4 ( 1 − 𝑝 4 ) 1 − 𝑟 𝑖 𝑗 , 0 𝛿 𝑟 𝑖 𝑗 , 1 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 ( 𝑟 𝑖 𝑗 , 0 ) ! ) # (S19) 10 Set 𝛾 𝑖 𝑗 = 0 and let 𝑔 0 ( 𝜔 𝑖 𝑗 ) = P ( Δ | 𝛾 𝑖 𝑗 = 0 ) 𝑔 0 ( 𝜔 𝑖 𝑗 ) =  𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 " 𝑞 − 1 Ö 𝑥 = 0 𝑟 𝑖 𝑗 , 𝑥 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) + ( 1 − 𝑟 𝑖 𝑗 , 𝑥 ) P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ! ∗ ( 𝑝 2 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  ∗ 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) + 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) ! ∗ ( 1 − 𝑝 2 ) 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  + 𝑝 2 ( 1 − 𝑝 2 ) 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  ! ! ) # = 𝑞 − 1 Ö 𝑥 = 0 P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) " 𝑝 2 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + ( 1 − 𝑝 2 ) 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  + 𝑝 2 ( 1 − 𝑝 2 ) ∗ 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  ! ! # . 
(S20) Set 𝛾 𝑖 𝑗 = 1 and let 𝑔 1 ( 𝜔 𝑖 𝑗 ) = P ( Δ | 𝛾 𝑖 𝑗 = 1 ) 𝑔 1 ( 𝜔 𝑖 𝑗 ) =  𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 " 𝑞 − 1 Ö 𝑥 = 0 𝑟 𝑖 𝑗 , 𝑥 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) + ( 1 − 𝑟 𝑖 𝑗 , 𝑥 ) P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ! ∗ ( 𝑝 2 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  ∗ 𝑞 − 1 Ö 𝑥 = 0 𝑝 𝑟 𝑖 𝑗 , 𝑥 3 ( 1 − 𝑝 3 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 ! + ( 1 − 𝑝 2 ) 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  + 𝑝 2 ( 1 − 𝑝 2 ) 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  ! ! ∗ 𝑝 𝑟 𝑖 𝑗 , 0 4 ( 1 − 𝑝 4 ) 1 − 𝑟 𝑖 𝑗 , 0 𝛿 𝑟 𝑖 𝑗 , 1 , . . . , 𝑟 𝑖 𝑗 , 𝑞 − 1 ( 𝑟 𝑖 𝑗 , 0 ) ! ) # 11 𝑔 1 ( 𝜔 𝑖 𝑗 ) = 𝑞 − 1 Ö 𝑥 = 0 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) ∗ ( 𝑝 𝑞 3 ∗ 𝑝 2 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + 𝑝 4 ∗ ( 1 − 𝑝 2 ) 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  + 𝑝 2 ( 1 − 𝑝 2 ) 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  ! ! ) + 𝑞 − 1 Ö 𝑥 = 0 P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ∗ ( ( 1 − 𝑝 3 ) 𝑞 ∗ 𝑝 2 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + ( 1 − 𝑝 4 ) ∗ ( 1 − 𝑝 2 ) 2 ∗ 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  + 𝑝 2 ( 1 − 𝑝 2 ) 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 1 ( 𝛽 𝑗 𝑥 )  + 𝑞 − 1 Ö 𝑥 = 0  P 1 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  ! ! ) + 𝑝 2 2 𝑞 − 1 Ö 𝑥 = 0  P 0 ( 𝛽 𝑖 𝑥 ) P 0 ( 𝛽 𝑗 𝑥 )  𝑞 − 1  𝑘 = 1 𝑝 𝑘 3 ( 1 − 𝑝 3 ) 𝑞 − 𝑘  𝑛 ∈ 𝐴 ( 𝑘 ) 𝑞 𝑞 − 1 Ö 𝑙 = 0 P 𝑛 𝑙 ( 𝜔 𝑖 𝑗 ,𝑙 ) . (S21) 12 2 . 
P ( r i j , x = 1 | 𝛾 i j = 1 , 𝜃 i , 𝜃 j , 𝚫 ) = h 𝜃 i , 𝜃 j ( 𝜔 i j , x ) is equal to = P ( Δ | 𝑟 𝑖 𝑗 , 𝑥 = 1 , 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) h P ( Δ | 𝑟 𝑖 𝑗 , 𝑥 = 1 , 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) + P ( Δ | 𝑟 𝑖 𝑗 , 𝑥 = 0 , 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 0 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) i − 1 = P ( 𝜔 𝑖 𝑗 , 𝑥 | 𝑟 𝑖 𝑗 , 𝑥 = 1 , 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) h P ( 𝜔 𝑖 𝑗 , 𝑥 | 𝑟 𝑖 𝑗 , 𝑥 = 1 , 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) + P ( 𝜔 𝑖 𝑗 , 𝑥 | 𝑟 𝑖 𝑗 , 𝑥 = 0 , 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 0 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) i − 1 = P ( 𝜔 𝑖 𝑗 , 𝑥 | 𝑟 𝑖 𝑗 , 𝑥 = 1 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) h P ( 𝜔 𝑖 𝑗 , 𝑥 | 𝑟 𝑖 𝑗 , 𝑥 = 1 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) + P ( 𝜔 𝑖 𝑗 , 𝑥 | 𝑟 𝑖 𝑗 , 𝑥 = 0 ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 0 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) i − 1 (S22) Setting P ( 𝑟 𝑖 𝑗 , 0 , . . . , 𝑟 𝑖 𝑗 , 1 | 𝛾 𝑖 𝑗 , 𝜃 𝑖 , 𝜃 𝑗 ) as defined in Equation ( ?? ) w e compute P ( 𝑟 𝑖 𝑗 , 𝑥 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 , 𝜃 𝑗 ) =  𝑟 𝑖 𝑗 , / 𝑥 " 𝜃 𝑖 𝜃 𝑗 𝑞 − 1 Ö 𝑥 = 0 𝑝 𝑟 𝑖 𝑗 , 𝑥 3 ( 1 − 𝑝 3 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 ! 
+ ( 1 − 𝜃 𝑖 𝜃 𝑗 ) 𝑝 𝑟 𝑖 𝑗 , 𝑥 4 ( 1 − 𝑝 4 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 𝛿 𝑟 𝑖 𝑗 , / 𝑥 ( 𝑟 𝑖 𝑗 , 𝑥 ) # = 𝜃 𝑖 𝜃 𝑗 𝑝 𝑟 𝑖 𝑗 , 𝑥 3 ( 1 − 𝑝 3 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 + ( 1 − 𝜃 𝑖 𝜃 𝑗 )  𝑟 𝑖 𝑗 , / 𝑥 h 𝑝 𝑟 𝑖 𝑗 , 𝑥 4 ( 1 − 𝑝 4 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 𝛿 𝑟 𝑖 𝑗 , / 𝑥 ( 𝑟 𝑖 𝑗 , 𝑥 ) i = 𝜃 𝑖 𝜃 𝑗 𝑝 𝑟 𝑖 𝑗 , 𝑥 3 ( 1 − 𝑝 3 ) 1 − 𝑟 𝑖 𝑗 , 𝑥 + ( 1 − 𝜃 𝑖 𝜃 𝑗 ) h 𝑝 𝑟 𝑖 𝑗 , 𝑥 4 + ( 1 − 𝑝 4 ) ( 1 − 𝑟 𝑖 𝑗 , 𝑥 ) i = 𝜃 𝑖 𝜃 𝑗 Bernoulli ( 𝑟 𝑖 𝑗 , 𝑥 | 𝑝 3 ) + ( 1 − 𝜃 𝑖 𝜃 𝑗 ) Ber noulli ( 𝑟 𝑖 𝑗 , 𝑥 | 𝑝 4 ) (S23) P ( 𝑟 𝑖 𝑗 , 𝑥 | 𝛾 𝑖 𝑗 = 0 , 𝜃 𝑖 , 𝜃 𝑗 ) =  𝑟 𝑖 𝑗 , / 𝑥 𝑞 − 1 Ö 𝑥 = 0 𝛿 0 ( 𝑟 𝑖 𝑗 , 𝑥 ) (S24) 13 Substituting Equation (S23), Equation (S22) ℎ 𝜃 𝑖 , 𝜃 𝑗 ( 𝜔 𝑖 𝑗 , 𝑥 ) = P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) h 𝜃 𝑖 𝜃 𝑗 𝑝 3 + ( 1 − 𝜃 𝑖 𝜃 𝑗 ) 𝑝 4 i P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) h 𝜃 𝑖 𝜃 𝑗 𝑝 3 + ( 1 − 𝜃 𝑖 𝜃 𝑗 ) 𝑝 4 i + P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) h 𝜃 𝑖 𝜃 𝑗 ( 1 − 𝑝 3 ) + ( 1 − 𝜃 𝑖 𝜃 𝑗 ) ( 1 − 𝑝 4 ) i (S25) Thus 𝑟 ∗ 𝑖 𝑗 , 𝑥 is equal to 𝑟 ∗ 𝑖 𝑗 , 𝑥 = P ( 𝛾 𝑖 𝑗 = 1 | Δ ) ∗ h P ( 𝜃 𝑖 = 1 | Δ ) P ( 𝜃 𝑗 = 1 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 1 , 𝜃 𝑗 = 1 , Δ ) + P ( 𝑟 𝑖 𝑗 , / 𝑥 = 1 | Δ ) n P ( 𝜃 𝑖 = 1 | Δ ) P ( 𝜃 𝑗 = 0 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 1 , 𝜃 𝑗 = 0 , Δ ) + P ( 𝜃 𝑖 = 0 | Δ ) P ( 𝜃 𝑗 = 1 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 0 , 𝜃 𝑗 = 1 , Δ ) + P ( 𝜃 𝑖 = 0 | Δ ) P ( 𝜃 𝑗 = 0 | Δ ) P ( 𝑟 𝑖 𝑗 , 𝑥 = 1 | 𝛾 𝑖 𝑗 = 1 , 𝜃 𝑖 = 0 , 𝜃 𝑗 = 0 , Δ ) o i = 𝛾 ∗ 𝑖 𝑗 " 𝜃 ∗ 𝑖 𝜃 ∗ 𝑗 ℎ 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) + 𝑟 ∗ 𝑖 𝑗 , / 𝑥 ℎ 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) n 𝜃 ∗ 𝑖 ( 1 − 𝜃 ∗ 𝑗 ) + ( 1 − 𝜃 ∗ 𝑖 ) 𝜃 ∗ 𝑗 + ( 1 − 𝜃 ∗ 𝑖 ) ( 1 − 𝜃 ∗ 𝑗 ) o # (S26) In which 𝛾 ∗ 𝑖 𝑗 = 𝑔 1 ( 𝜔 𝑖 𝑗 ) 𝑝 1 𝑔 1 ( 𝜔 𝑖 𝑗 ) 𝑝 1 + 𝑔 0 ( 𝜔 𝑖 𝑗 ) ( 1 − 𝑝 1 ) , (S27) From Equation (S25) w e see that ℎ 1 , 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) = ℎ 0 , 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) = ℎ 0 , 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) = ℎ 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) . W e then defined ℎ 1 , 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) = ℎ 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) , g etting: ℎ 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) = P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) 𝑝 3 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) 𝑝 3 + P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ( 1 − 𝑝 3 ) ℎ 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) = P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) 𝑝 4 P 1 ( 𝜔 𝑖 𝑗 , 𝑥 ) 𝑝 4 + P 0 ( 𝜔 𝑖 𝑗 , 𝑥 ) ( 1 − 𝑝 4 ) (S28) In the M-step we maximise 𝑄 ( Δ ) w .r .t. ( Ω 𝑥 , 𝛽 𝑥 , 𝛼 ) . 
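To make the E-step updates concrete, the sketch below (our own illustration, not the authors' code) evaluates the edge-inclusion probabilities $h_1$ and $h_0$ of Equation (S28), assuming Laplace spike and slab densities for $\mathrm{P}_0$ and $\mathrm{P}_1$ with scales `nu0` and `nu1`; all variable names and numeric values are illustrative.

```python
import numpy as np

def laplace_pdf(w, scale):
    # Mean-zero Laplace density; the slab P_1 uses scale nu1, the spike P_0 scale nu0.
    return np.exp(-np.abs(w) / scale) / (2.0 * scale)

def h(omega_ij_x, p_edge, nu0, nu1):
    # Posterior inclusion probability of r_ij,x given omega_ij,x, shape of (S28):
    #   h(w) = P_1(w) p / (P_1(w) p + P_0(w) (1 - p))
    num = laplace_pdf(omega_ij_x, nu1) * p_edge
    den = num + laplace_pdf(omega_ij_x, nu0) * (1.0 - p_edge)
    return num / den

# h_1 uses p3 (both nodes active); h_0 uses p4 (otherwise), as in (S28).
h1 = h(0.4, p_edge=0.5, nu0=0.1, nu1=1.0)
h0 = h(0.4, p_edge=0.1, nu0=0.1, nu1=1.0)
```

With these illustrative scales, $h_1 > h_0$ for the same $\omega_{ij,x}$, reflecting that an edge is more plausible a priori when both endpoints are active ($p_3 > p_4$).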
Without loss of generality, we assume the observations have been mean-centered, i.e. $\alpha = (0,\dots,0)^\top$. Let $Y_x \in \mathbb{R}^{n_x\times p}$ be the matrix whose $k$-th row is $Y_{x,k}$ for individual $k$, $\beta_x = [\beta_{ix}]_{i\in V} \in \mathbb{R}^{p\times 1}$, and $1_{n_x} \in \mathbb{R}^{1\times n_x}$ a row vector of ones. Setting $\mathbb{E}_{\theta\mid\widehat\Delta,Y_{V_x}}\big[\tfrac{\theta_i}{\lambda_1}+\tfrac{1-\theta_i}{\lambda_0}\big]=\tfrac{\theta^*_i}{\lambda_1}+\tfrac{1-\theta^*_i}{\lambda_0}$ and $\mathbb{E}_{R\mid\widehat\Delta,Y_{V_x}}\big[\tfrac{r_{ij,x}}{\nu_1}+\tfrac{1-r_{ij,x}}{\nu_0}\big]=\tfrac{r^*_{ij,x}}{\nu_1}+\tfrac{1-r^*_{ij,x}}{\nu_0}$, Equation (S8) can be written as:

$$
\begin{aligned}
Q \propto{}& \frac{1}{2}\sum_{x=0}^{q-1}\Big[n_x\log\det(\widehat\Omega_x)-\operatorname{tr}\Big\{\big(Y_x^\top-\widehat\beta_x 1_{n_x}\big)\big(Y_x^\top-\widehat\beta_x 1_{n_x}\big)^\top\widehat\Omega_x\Big\}\Big] \\
&-\sum_{x=0}^{q-1}\sum_{i=1}^{p}\Big[\frac{1}{2}\widehat\beta_{ix}^{\,2}\Big(\frac{\theta^*_i}{\lambda_1}+\frac{1-\theta^*_i}{\lambda_0}\Big)+\tau\,\widehat\omega_{ii,x}\Big]
-\sum_{x=0}^{q-1}\sum_{i<j}|\widehat\omega_{ij,x}|\Big(\frac{r^*_{ij,x}}{\nu_1}+\frac{1-r^*_{ij,x}}{\nu_0}\Big) \\
\propto{}& \frac{1}{2}\sum_{x=0}^{q-1}\Big[n_x\log\det(\widehat\Omega_x)-\operatorname{tr}\Big\{\big(Y_x^\top Y_x-2\widehat\beta_x 1_{n_x}Y_x+\widehat\beta_x 1_{n_x}1_{n_x}^\top\widehat\beta_x^\top\big)\widehat\Omega_x\Big\}\Big] \\
&-\sum_{x=0}^{q-1}\sum_{i=1}^{p}\Big[\frac{1}{2}\widehat\beta_{ix}^{\,2}\Big(\frac{\theta^*_i}{\lambda_1}+\frac{1-\theta^*_i}{\lambda_0}\Big)+\tau\,\widehat\omega_{ii,x}\Big]
-\sum_{x=0}^{q-1}\sum_{i<j}|\widehat\omega_{ij,x}|\Big(\frac{r^*_{ij,x}}{\nu_1}+\frac{1-r^*_{ij,x}}{\nu_0}\Big).
\end{aligned}
\tag{S29}
$$

To maximise $Q$ w.r.t. $\beta_x$, we set the partial derivative with respect to $\beta_x$ to 0:

$$
\frac{\partial Q}{\partial \beta_x}
= \widehat\Omega_x^\top Y_x^\top 1_{n_x}^\top
- \widehat\Omega_x^\top\widehat\beta_x 1_{n_x}1_{n_x}^\top
- \operatorname{diag}\Big\{\frac{\theta^*_1}{\lambda_1}+\frac{1-\theta^*_1}{\lambda_0},\ \dots,\ \frac{\theta^*_p}{\lambda_1}+\frac{1-\theta^*_p}{\lambda_0}\Big\}\,\widehat\beta_x
= 0.
\tag{S30}
$$

Solving Equation (S30),

$$
\widehat\beta_x = \big(n_x\widehat\Omega_x + D_{\Theta^*}\big)^{-1}\widehat\Omega_x Y_x^\top 1_{n_x}^\top,
\tag{S31}
$$

with $D_{\Theta^*} = \operatorname{diag}\big\{\tfrac{\theta^*_1}{\lambda_1}+\tfrac{1-\theta^*_1}{\lambda_0},\ \dots,\ \tfrac{\theta^*_p}{\lambda_1}+\tfrac{1-\theta^*_p}{\lambda_0}\big\}$. Equation (S31) has the form of a ridge regression estimator with penalty $D_{\Theta^*}$. Maximising $Q$ w.r.t. $\widehat\Omega_x$ for each $x = 0,\dots,q-1$ implies optimising the following objective function:

$$
Q(\widehat\Omega_x)
= \frac{n_x}{2}\log\det(\widehat\Omega_x)
- \frac{n_x}{2}\operatorname{tr}\big\{\widehat S_x\widehat\Omega_x\big\}
- \sum_{i=1}^{p}\tau\,\widehat\omega_{ii,x}
- \sum_{i<j}|\widehat\omega_{ij,x}|\Big(\frac{r^*_{ij,x}}{\nu_1}+\frac{1-r^*_{ij,x}}{\nu_0}\Big),
\tag{S32}
$$

with $\widehat S_x = \frac{1}{n_x}\sum_{k=1}^{n_x}(Y_{x,k}-\widehat\beta_x)(Y_{x,k}-\widehat\beta_x)^\top$.
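As a concrete check of the ridge form of the M-step update for $\beta_x$, the snippet below (an illustrative sketch with made-up values, not the authors' implementation) computes the update $\widehat\beta_x = (n_x\widehat\Omega_x + D_{\Theta^*})^{-1}\widehat\Omega_x Y_x^\top 1_{n_x}^\top$ for one level $x$; with $\widehat\Omega_x = I$ it reduces to coordinate-wise shrinkage of the column sums.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, p = 50, 5
Y_x = rng.normal(size=(n_x, p))   # observations at level x (rows = individuals)
Omega_x = np.eye(p)               # current precision estimate (identity for illustration)
theta_star = np.full(p, 0.5)      # E-step inclusion probabilities theta*_i
lam1, lam0 = 10.0, 0.1            # slab / spike variances of the beta prior

# Ridge penalty D_Theta* = diag{theta*_i/lam1 + (1 - theta*_i)/lam0}
D_theta = np.diag(theta_star / lam1 + (1 - theta_star) / lam0)

# beta_x = (n_x Omega_x + D_Theta*)^{-1} Omega_x Y_x^T 1
beta_x = np.linalg.solve(n_x * Omega_x + D_theta, Omega_x @ Y_x.T @ np.ones(n_x))
```

With $\widehat\Omega_x = I$ and equal $\theta^*_i$, each coordinate equals the column sum of $Y_x$ divided by $n_x + \theta^*_i/\lambda_1 + (1-\theta^*_i)/\lambda_0$, i.e. a shrunken sample mean.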
We optimise $Q(\widehat\Omega_x)$ subject to the constraints $\widehat\Omega_x \succ 0$ and $\|\widehat\Omega_x\|_2 \le B$, with $B$ reasonably large, so that the objective $Q(\widehat\Omega_x)$ is strictly convex and the local solution $\widehat\Omega_x$ is guaranteed to be the unique solution, as in Yang et al. [2021]. To optimise $Q(\widehat\Omega_x)$, we follow the algorithm suggested in Gan et al. [2019].

D Simulations: Data generating mechanism

We generate observations from a set $Y_{V|X}$ of random vectors associated with a profile undirected graph $\mathcal{G}_U$ with $p = 20, 50$ and $100$ nodes and $q = 4$ levels of $X$, such that $x \in \mathcal{X} = \{0,1,2,3\}$. Following Peterson et al. [2015], we first construct $\Omega_0$, the precision matrix of the baseline level $x = 0$. We set $\Omega_0$ to be a $p\times p$ symmetric matrix with main-diagonal entries $\omega_{aa,0} = a$, for $a = 1,\dots,p$, and off-diagonal entries $\omega_{(a+1)a,0} = \omega_{a(a+1),0} = 0.5$ for $a = 1,\dots,p-1$ and $\omega_{(a+2)a,0} = \omega_{a(a+2),0} = 0.4$ for $a = 1,\dots,p-2$. For all $a \in V$, we set both $\alpha_a$ and $\zeta_{a0}$ to zero. For $x \in \{1,2,3\}$, we set $\zeta_{ax} = 1$ for $a = 1,\dots,4$ and $\zeta_{ax} = 0$ for $a = 5,\dots,p$; i.e., the external factor $X$ affects only the first four response variables. The remaining precision matrices $\Omega_x$ for $x \in \{1,2,3\}$ are obtained as follows: first we set $\Omega_x = \Omega_0$, then each of its non-zero off-diagonal entries is set to zero independently with probability 0.5. We then change the sparsity level of the precision matrices, varying the $s$ parameter from 0.0010 to 0.0050 to obtain an increasing number of non-zero elements. Data are generated by drawing a random sample of size $n_x = 50$ from the distribution $\mathcal{N}(\beta_x, \Sigma_x)$, where $\Sigma_x = \Omega_x^{-1}$ and $\beta_x = \Sigma_x\zeta_x$, for all $x \in \mathcal{X}$.
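The data-generating mechanism of Section D can be sketched as follows (illustrative code for $p = 20$; the variable names are ours, not from the paper). The banded baseline $\Omega_0$ is diagonally dominant, so it and the knocked-out versions $\Omega_x$ remain positive definite.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n_x = 20, 4, 50

# Baseline precision: diagonal entries a = 1..p, first two off-diagonals 0.5 and 0.4
Omega0 = np.diag(np.arange(1.0, p + 1))
for a in range(p - 1):
    Omega0[a, a + 1] = Omega0[a + 1, a] = 0.5
for a in range(p - 2):
    Omega0[a, a + 2] = Omega0[a + 2, a] = 0.4

# External factor shifts only the first four variables at levels x = 1, 2, 3
zeta = np.zeros((q, p))
zeta[1:, :4] = 1.0

data = {}
for x in range(q):
    Omega_x = Omega0.copy()
    if x > 0:  # knock out each non-zero off-diagonal entry with probability 0.5
        for a in range(p):
            for b in range(a + 1, p):
                if Omega_x[a, b] != 0 and rng.random() < 0.5:
                    Omega_x[a, b] = Omega_x[b, a] = 0.0
    Sigma_x = np.linalg.inv(Omega_x)
    beta_x = Sigma_x @ zeta[x]
    data[x] = rng.multivariate_normal(beta_x, Sigma_x, size=n_x)
```

The sparsity parameter $s$ of the simulations is not varied here; this sketch only reproduces the baseline construction and the mean shift $\beta_x = \Sigma_x\zeta_x$.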
E More results

Table 1: Accuracy, sensitivity, specificity, balanced accuracy and AUC over 100 datasets, K = 4, N = 50. Scenario 1: different graphs.

                       Accuracy       Sensitivity    Specificity    AUC
                       Mean    SE     Mean    SE     Mean    SE     Mean    SE
p = 20,  S = 0.010
    BPUGM              0.914   0.000  0.837   0.026  0.914   0.000  0.875   0.006
    GemBag             0.996   0.000  0.572   0.064  1.000   0.000  0.786   0.016
    FGL                0.983   0.001  0.603   0.096  0.986   0.001  0.794   0.023
    GGL                0.982   0.001  0.665   0.083  0.984   0.001  0.825   0.020
p = 20,  S = 0.025
    BPUGM              0.899   0.000  0.431   0.007  0.915   0.000  0.673   0.002
    GemBag             0.972   0.000  0.182   0.007  1.000   0.000  0.591   0.002
    FGL                0.943   0.001  0.275   0.022  0.967   0.002  0.621   0.004
    GGL                0.948   0.001  0.298   0.021  0.971   0.002  0.635   0.004
p = 20,  S = 0.050
    BPUGM              0.804   0.000  0.195   0.002  0.915   0.000  0.555   0.000
    GemBag             0.851   0.000  0.033   0.000  0.999   0.000  0.516   0.000
    FGL                0.831   0.001  0.110   0.008  0.962   0.003  0.536   0.001
    GGL                0.828   0.001  0.129   0.007  0.955   0.003  0.542   0.001
p = 50,  S = 0.0010
    BPUGM              0.948   0.000  0.790   0.051  0.949   0.000  0.869   0.013
    GemBag             0.999   0.000  0.623   0.150  1.000   0.000  0.811   0.037
    FGL                0.998   0.000  0.353   0.190  0.998   0.000  0.676   0.047
    GGL                0.997   0.000  0.490   0.194  0.997   0.000  0.744   0.048
p = 50,  S = 0.0025
    BPUGM              0.948   0.000  0.596   0.018  0.949   0.000  0.773   0.004
    GemBag             0.998   0.000  0.362   0.023  1.000   0.000  0.681   0.006
    FGL                0.995   0.000  0.340   0.075  0.997   0.000  0.668   0.018
    GGL                0.995   0.000  0.380   0.061  0.997   0.000  0.688   0.015
p = 50,  S = 0.0050
    BPUGM              0.946   0.000  0.429   0.005  0.950   0.000  0.690   0.001
    GemBag             0.993   0.000  0.240   0.007  1.000   0.000  0.620   0.002
    FGL                0.988   0.000  0.140   0.015  0.995   0.000  0.568   0.003
    GGL                0.991   0.000  0.271   0.033  0.997   0.000  0.634   0.008
p = 100, S = 0.0010
    BPUGM              0.965   0.000  0.520   0.008  0.966   0.000  0.743   0.002
    GemBag             0.999   0.000  0.378   0.013  1.000   0.000  0.689   0.003
    FGL                0.999   0.000  0.223   0.047  1.000   0.000  0.611   0.012
    GGL                0.999   0.000  0.277   0.043  1.000   0.000  0.638   0.011
p = 100, S = 0.0025
    BPUGM              0.965   0.000  0.546   0.004  0.966   0.000  0.756   0.001
    GemBag             0.998   0.000  0.368   0.006  1.000   0.000  0.684   0.002
    FGL                0.997   0.000  0.285   0.031  0.999   0.000  0.642   0.008
    GGL                0.997   0.000  0.374   0.024  0.999   0.000  0.687   0.006
p = 100, S = 0.0050
    BPUGM              0.959   0.000  0.235   0.001  0.966   0.000  0.600   0.000
    GemBag             0.991   0.000  0.123   0.000  1.000   0.000  0.561   0.000
    FGL                0.989   0.000  0.100   0.004  0.998   0.000  0.549   0.001
    GGL                0.990   0.000  0.127   0.003  0.998   0.000  0.563   0.001

Table 2: Accuracy, sensitivity, specificity, balanced accuracy and AUC over 100 datasets, K = 4, N = 50. Scenario 2: {G(0) = G(1)} ≠ {G(2) = G(3)}.

                       Accuracy       Sensitivity    Specificity    AUC
                       Mean    SE     Mean    SE     Mean    SE     Mean    SE
p = 20,  S = 0.010
    BPUGM              0.913   0.000  0.837   0.025  0.914   0.000  0.875   0.006
    GemBag             0.996   0.000  0.567   0.052  1.000   0.000  0.783   0.013
    FGL                0.974   0.002  0.708   0.069  0.976   0.002  0.842   0.016
    GGL                0.968   0.002  0.708   0.084  0.970   0.002  0.839   0.019
p = 20,  S = 0.025
    BPUGM              0.894   0.000  0.399   0.006  0.914   0.000  0.657   0.001
    GemBag             0.967   0.000  0.174   0.006  1.000   0.000  0.587   0.002
    FGL                0.944   0.001  0.254   0.023  0.972   0.002  0.613   0.004
    GGL                0.946   0.001  0.266   0.017  0.974   0.002  0.620   0.003
p = 20,  S = 0.050
    BPUGM              0.804   0.000  0.213   0.001  0.915   0.000  0.564   0.000
    GemBag             0.849   0.000  0.048   0.000  0.999   0.000  0.524   0.000
    FGL                0.845   0.000  0.083   0.005  0.987   0.001  0.535   0.001
    GGL                0.837   0.001  0.097   0.006  0.976   0.001  0.537   0.001
p = 50,  S = 0.0010
    BPUGM              0.948   0.000  0.815   0.069  0.948   0.000  0.881   0.017
    GemBag             0.999   0.000  0.505   0.220  1.000   0.000  0.752   0.055
    FGL                0.998   0.000  0.225   0.158  0.998   0.000  0.612   0.039
    GGL                0.995   0.000  0.415   0.197  0.995   0.000  0.705   0.049
p = 50,  S = 0.0025
    BPUGM              0.948   0.000  0.601   0.014  0.949   0.000  0.775   0.004
    GemBag             0.998   0.000  0.383   0.017  1.000   0.000  0.691   0.004
    FGL                0.996   0.000  0.531   0.099  0.998   0.000  0.765   0.025
    GGL                0.995   0.000  0.446   0.079  0.997   0.000  0.722   0.020
p = 50,  S = 0.0050
    BPUGM              0.946   0.000  0.429   0.005  0.950   0.000  0.690   0.001
    GemBag             0.993   0.000  0.240   0.007  1.000   0.000  0.620   0.002
    FGL                0.989   0.000  0.136   0.014  0.996   0.000  0.566   0.003
    GGL                0.992   0.000  0.270   0.034  0.998   0.000  0.634   0.008
p = 100, S = 0.0010
    BPUGM              0.965   0.000  0.549   0.008  0.965   0.000  0.757   0.002
    GemBag             0.999   0.000  0.415   0.014  1.000   0.000  0.707   0.003
    FGL                0.999   0.000  0.528   0.038  0.999   0.000  0.764   0.009
    GGL                0.999   0.000  0.507   0.040  0.999   0.000  0.753   0.010
p = 100, S = 0.0025
    BPUGM              0.965   0.000  0.594   0.004  0.965   0.000  0.780   0.001
    GemBag             0.998   0.000  0.412   0.008  1.000   0.000  0.706   0.002
    FGL                0.998   0.000  0.423   0.044  0.999   0.000  0.711   0.011
    GGL                0.998   0.000  0.413   0.045  0.999   0.000  0.706   0.011
p = 100, S = 0.0050
    BPUGM              0.959   0.000  0.269   0.001  0.966   0.000  0.617   0.000
    GemBag             0.991   0.000  0.147   0.001  1.000   0.000  0.573   0.000
    FGL                0.990   0.000  0.240   0.012  0.997   0.000  0.619   0.003
    GGL                0.989   0.000  0.186   0.005  0.997   0.000  0.592   0.001

Table 3: Accuracy, sensitivity, specificity, balanced accuracy and AUC over 100 datasets, K = 4, N = 50. Scenario 3: {G(0) = G(1) = G(2)} ≠ {G(3)}.

                       Accuracy       Sensitivity    Specificity    AUC
                       Mean    SE     Mean    SE     Mean    SE     Mean    SE
p = 20,  S = 0.010
    BPUGM              0.913   0.000  0.843   0.039  0.913   0.000  0.878   0.010
    GemBag             0.998   0.000  0.677   0.091  1.000   0.000  0.839   0.023
    FGL                0.990   0.000  0.698   0.167  0.991   0.000  0.844   0.042
    GGL                0.985   0.001  0.703   0.164  0.986   0.001  0.844   0.040
p = 20,  S = 0.025
    BPUGM              0.892   0.000  0.297   0.005  0.915   0.000  0.606   0.001
    GemBag             0.967   0.000  0.139   0.004  0.999   0.000  0.569   0.001
    FGL                0.947   0.001  0.200   0.011  0.976   0.001  0.588   0.002
    GGL                0.948   0.001  0.198   0.011  0.978   0.001  0.588   0.002
p = 20,  S = 0.050
    BPUGM              0.797   0.000  0.156   0.001  0.914   0.000  0.535   0.000
    GemBag             0.849   0.000  0.023   0.000  1.000   0.000  0.511   0.000
    FGL                0.832   0.001  0.076   0.005  0.970   0.002  0.523   0.000
    GGL                0.826   0.001  0.085   0.006  0.960   0.002  0.523   0.000
p = 50,  S = 0.0010
    BPUGM              0.948   0.000  0.740   0.194  0.948   0.000  0.844   0.048
    GemBag             0.999   0.000  0.260   0.194  1.000   0.000  0.630   0.049
    FGL                0.999   0.000  0.040   0.039  1.000   0.000  0.520   0.010
    GGL                0.997   0.000  0.140   0.122  0.997   0.000  0.569   0.030
p = 50,  S = 0.0025
    BPUGM              0.948   0.000  0.626   0.017  0.949   0.000  0.788   0.004
    GemBag             0.998   0.000  0.394   0.023  1.000   0.000  0.697   0.006
    FGL                0.997   0.000  0.440   0.088  0.998   0.000  0.719   0.022
    GGL                0.995   0.000  0.438   0.074  0.997   0.000  0.717   0.018
p = 50,  S = 0.0050
    BPUGM              0.946   0.000  0.429   0.005  0.950   0.000  0.690   0.001
    GemBag             0.993   0.000  0.240   0.007  1.000   0.000  0.620   0.002
    FGL                0.989   0.000  0.136   0.014  0.996   0.000  0.566   0.003
    GGL                0.992   0.000  0.270   0.034  0.998   0.000  0.634   0.008
p = 100, S = 0.0010
    BPUGM              0.965   0.000  0.515   0.008  0.965   0.000  0.740   0.002
    GemBag             0.999   0.000  0.397   0.013  1.000   0.000  0.698   0.003
    FGL                0.999   0.000  0.485   0.039  0.999   0.000  0.742   0.010
    GGL                0.998   0.000  0.467   0.037  0.999   0.000  0.733   0.009
p = 100, S = 0.0025
    BPUGM              0.965   0.000  0.545   0.005  0.966   0.000  0.755   0.001
    GemBag             0.998   0.000  0.393   0.008  1.000   0.000  0.696   0.002
    FGL                0.998   0.000  0.419   0.064  0.999   0.000  0.709   0.016
    GGL                0.998   0.000  0.405   0.049  0.999   0.000  0.702   0.012
p = 100, S = 0.0050
    BPUGM              0.963   0.000  0.492   0.001  0.966   0.000  0.729   0.000
    GemBag             0.996   0.000  0.354   0.003  1.000   0.000  0.677   0.001
    FGL                0.992   0.000  0.493   0.046  0.995   0.000  0.744   0.011
    GGL                0.994   0.000  0.525   0.012  0.997   0.000  0.761   0.003

Table 4: Accuracy, sensitivity, specificity, balanced accuracy and AUC over 100 datasets, K = 4, N = 50. Scenario 4: same graph.

                       Accuracy       Sensitivity    Specificity    AUC
                       Mean    SE     Mean    SE     Mean    SE     Mean    SE
p = 20,  S = 0.010
    BPUGM              0.915   0.000  0.840   0.020  0.915   0.000  0.878   0.005
    GemBag             0.996   0.000  0.642   0.052  1.000   0.000  0.821   0.013
    FGL                0.966   0.001  0.575   0.092  0.970   0.001  0.773   0.020
    GGL                0.977   0.001  0.772   0.078  0.980   0.001  0.876   0.019
p = 20,  S = 0.025
    BPUGM              0.898   0.000  0.502   0.007  0.915   0.000  0.709   0.002
    GemBag             0.969   0.000  0.264   0.008  1.000   0.000  0.632   0.002
    FGL                0.907   0.003  0.420   0.026  0.928   0.003  0.674   0.004
    GGL                0.923   0.003  0.470   0.023  0.943   0.004  0.706   0.005
p = 20,  S = 0.050
    BPUGM              0.815   0.000  0.286   0.002  0.919   0.000  0.602   0.000
    GemBag             0.852   0.000  0.095   0.001  1.000   0.000  0.547   0.000
    FGL                0.792   0.001  0.319   0.009  0.884   0.002  0.602   0.001
    GGL                0.799   0.001  0.312   0.009  0.894   0.003  0.603   0.001
p = 50,  S = 0.0010
    BPUGM              0.950   0.000  0.812   0.038  0.950   0.000  0.881   0.009
    GemBag             0.999   0.000  0.693   0.089  1.000   0.000  0.846   0.022
    FGL                0.998   0.000  0.193   0.057  0.999   0.000  0.596   0.014
    GGL                0.999   0.000  0.590   0.212  0.999   0.000  0.795   0.053
p = 50,  S = 0.0025
    BPUGM              0.949   0.000  0.603   0.015  0.950   0.000  0.777   0.004
    GemBag             0.998   0.000  0.384   0.018  1.000   0.000  0.692   0.004
    FGL                0.994   0.000  0.193   0.032  0.997   0.000  0.595   0.008
    GGL                0.997   0.000  0.390   0.081  0.999   0.000  0.694   0.020
p = 50,  S = 0.0050
    BPUGM              0.946   0.000  0.429   0.005  0.950   0.000  0.690   0.001
    GemBag             0.993   0.000  0.240   0.007  1.000   0.000  0.620   0.002
    FGL                0.988   0.000  0.143   0.015  0.995   0.000  0.569   0.003
    GGL                0.992   0.000  0.275   0.033  0.997   0.000  0.636   0.008
p = 100, S = 0.0010
    BPUGM              0.966   0.000  0.609   0.010  0.966   0.000  0.787   0.002
    GemBag             0.999   0.000  0.466   0.012  1.000   0.000  0.733   0.003
    FGL                0.998   0.000  0.127   0.015  0.999   0.000  0.563   0.004
    GGL                0.999   0.000  0.444   0.069  1.000   0.000  0.722   0.017
p = 100, S = 0.0025
    BPUGM              0.965   0.000  0.591   0.003  0.966   0.000  0.779   0.001
    GemBag             0.998   0.000  0.432   0.005  1.000   0.000  0.716   0.001
    FGL                0.995   0.000  0.221   0.019  0.997   0.000  0.609   0.005
    GGL                0.997   0.000  0.530   0.046  0.999   0.000  0.764   0.012
p = 100, S = 0.0050
    BPUGM              0.960   0.000  0.313   0.001  0.966   0.000  0.640   0.000
    GemBag             0.991   0.000  0.190   0.001  1.000   0.000  0.595   0.000
    FGL                0.986   0.000  0.146   0.003  0.995   0.000  0.571   0.001
    GGL                0.990   0.000  0.264   0.009  0.998   0.000  0.631   0.002
