A survey of discrete methods in (algebraic) statistics for networks

Sonja Petrović

Abstract. Sampling algorithms, hypergraph degree sequences, and polytopes play a crucial role in statistical analysis of network data. This article offers a brief overview of open problems in this area of discrete mathematics from the point of view of a particular family of statistical models for networks called exponential random graph models. The problems and underlying constructions are also related to well-known concepts in commutative algebra and graph-theoretic concepts in computer science. We outline a few lines of recent work that highlight the natural connection between these fields and unify them into some open problems. While these problems are often relevant in discrete mathematics in their own right, the emphasis here is on statistical relevance with the hope that these lines of research do not remain disjoint. Suggested specific open problems and general research questions should advance algebraic statistics theory as well as applied statistical tools for rigorous statistical analysis of networks.

1. Introduction

The development of a rich literature on networks in the past decade has left ample opportunities for complementary mathematically rigorous results that should serve as the foundation for statistical modeling and fast computation of network features. In this review, we focus on exponential families for random graphs, models over equivalence classes of graphs summarized by a selected set of graph or network summary statistics. Here, the word ‘model’ is used in the statistical sense, and the terms ‘graph’ and ‘network’ are used interchangeably. Introductory paragraphs of Sections 3 and 4 offer an answer to the following question: Why consider statistics when studying random graphs and networks?
A summarized answer can be phrased as follows: how a network is generated is crucial to properly calculate statistical network properties and specify the distributions being sampled. This includes properties that depend on the degrees of nodes in the network, higher-order degree correlations, or other, possibly global, summary statistics; cf. an excellent example and the concluding paragraph in [Jac08, Section 4.2.1].

The reader may wonder why the focus is on exponential families for random graphs, or ERGMs. While being able to write a random graph model as an exponential family is not impressive in and of itself, understanding the geometry, algebra, and the discrete structures supporting the model offers various statistical insights that are the theme of this chapter. A recent computer science article [BDE] shows that ERGMs are ‘hard’, that is, their normalizing constants cannot be computed efficiently in general. While this important result about inapproximability in polynomial time formalizes the inherent complexity of this family of models, as a statistical model family they are still very broad and rich and have desirable properties. As such, they at the very least offer the theoretical foundation for studying random graphs and networks, and a bedrock for algorithmic exploratory analysis of sampling distributions of various graph summary statistics.

The statistics literature on understanding and developing network models is ever-growing; a partial list of references is offered below in context. Within this realm, though, the basic problem of establishing rigorous procedures for statistical inference is still a challenge, due to the varying complexity of the models and types of properties of data they capture, as well as networks being a novel data type in terms of traditional statistics.

Date: August 2015; revised January 2016.
Key words and phrases. random graphs, network models, alternating cone, balanced graphs, balanced hypergraphs, vertex bi-coloring, Markov bases, algebraic statistics, exponential families.

The motivating problem for this discussion is thus a basic one of statistical inference, tightly related to several fundamental tasks required for statistical analysis of (any kind of) data: parameter estimation (footnote 1), sampling from the distributions in the model, testing model/data fit, and model selection. The broad goal of statistical inference is to decide, with a high degree of confidence, whether an observed data sample $x = x_1, \dots, x_N$ can be regarded as a draw from a distribution $p_{\theta_0} \in \mathcal{M}$ coming from a candidate statistical model $\mathcal{M}$, specified by the unknown parameter $\theta_0$. Of course, this entails two crucial steps: 1) estimation: use the observed data $x$ to produce an optimal estimate $\hat\theta = \hat\theta(x)$ of $\theta_0$; and 2) goodness-of-fit testing: assess whether $p_{\hat\theta}$ can be considered a satisfactory generative model for the observed data $x$. Surprisingly, these fundamental tasks pose a family of open problems, in particular for discrete sparse small-sample data such as networks [KK15, HGH08, HL81, Hab81, Agr92, YLZ, CKHG15, CDS11]. These problems and their natural connection to discrete structures are the focus of this short overview.

In the remainder of the chapter, we will use the β model for random graphs, defined in Example 2.1, as a running example to illustrate the structure and implications of various statistical modeling questions that discrete mathematics tools can answer.

2. Preliminaries

We begin with some technical preliminaries on linear ERGMs and statistical considerations about these models. A statistical model $\mathcal{M}$ is a family of probability distributions indexed by a set of parameters $\Theta \subset \mathbb{R}^n$.
In exponential random graph models, or ERGMs for short, one first selects the network characteristic $T$ of interest in a particular problem. This selection is done so that $T$ represents some interpretable and meaningful summary statistic of a network. The resulting model is then the collection of probability measures $\mathcal{M} = \{p_\theta : \theta \in \Theta\}$, indexed by points in $\Theta \subseteq \mathbb{R}^n$ such that, for any $\theta \in \Theta$, the probability of observing a given network (footnote 2) $G = g$ takes on the exponential form

(2.1)    $p_\theta(g) = \exp\{\langle T(g), \theta\rangle - \psi(\theta)\}$,

where $\psi(\theta) := \log \sum_g \exp\{\langle T(g), \theta\rangle\}$ is a normalizing function called the log-partition function, and $T(g)$ is the vector of minimal sufficient statistics for the model. The exponential form of the probabilities above is a central theme in statistical theory, and statistical models of this form, known as exponential families, are known to exhibit optimal statistical performance and appear with a growing appeal in the machine learning community as well. Consistent with the running examples, we are considering linear exponential families for now, that is, $\langle T(g), \theta\rangle$ is a linear map of the state space. General exponential families can use non-linear maps, of course. For a more detailed statistical treatment of this family of models the reader is referred to classical references [Bro86], [BN14] and [LC83, LC98], as well as a more recent book [BD15].

ERGMs are, effectively, models over equivalence classes of networks, where two networks $g_1$ and $g_2$ are regarded as probabilistically equivalent whenever $T(g_1) = T(g_2)$. Conditioning on the values of $T$ permits the reduction of the data through sufficient statistics and thus eliminates nuisance parameters [Agr92], which can be very helpful in applications.

Footnote 1. Sometimes called ‘fitting’ in computer science or statistical physics literature, but it is different from model fit testing.
Footnote 2. As is standard in statistics, $G$ represents a random variable and the lowercase $g$ its realization.

Example 2.1 (The β model for graphs). The β model for graphs is a well-known statistical model for random undirected graphs. In its original form, the β model considers simple graphs, that is, does not allow loops or multiple edges; but the general version allows edges to appear with bounded multiplicity. It essentially assigns parameters $\beta_1, \dots, \beta_n$ to the $n$ vertices in the network that measure their ‘friendliness’, or propensity to attract edges. The edges are then assumed to be independent, and each appears with probability proportional to the product of the parameters of its endpoints; in symbols:

$\mathrm{Prob}(G = g) \propto \prod_{\{i,j\} \in E(g)} \beta_i \beta_j = \prod_{i=1}^{n} \beta_i^{\deg(i)}$,

where ‘$\propto$’ refers to the fact that the resulting quantity should be normalized by the log-partition function. In this model, the sufficient statistics vector $T(g)$ is the degree sequence of the graph $g$. Indeed, the model is usually given directly in the exponential family form (2.1) with $T(g) = (d_1, \dots, d_n)$ being the degree sequence of $g$:

(2.2)    $p_\beta(g) = \exp\{\sum_{i=1}^{n} d_i \beta_i - \psi(\beta)\}$.

For a brief history of the model, see for example the introduction and references given in [RPF13], which deals with the general version of the β model and the special case of the simple graph version studied in [CDS11].

The β model (2.2) is indeed one of the simplest interpretable undirected random graph models of relevance in statistical applications. In fact, the study of the degree sequences and, in particular, of the degree distributions of real networks is a classical topic in network analysis, which has received extensive treatment in the statistics literature (starting with, e.g., [HL81], [FW81], [FMW85]), the physics literature (e.g.
[NSW01], [New03], [PN04], [WAD09]) as well as in the social network literature (e.g., [RPKL07], [Goo07], [HM07], and references therein). See also the monograph by [GZFA09] and the books by [Kol09] and [New10]. Its known properties can be used as a blueprint for consideration of more complex models. Specifically, it is a very nice and well-understood ERGM from the following points of view that simultaneously serve as an outline of the remainder of this chapter:

(1) Exact inference for model fitting: exact testing is used for testing goodness of fit of a model when applicability of standard statistical asymptotic methods is unclear. Typically, this is the case for small sample sizes $N$, and also largely remains a problem for network data in particular. These topics are the content of Section 3.
    (a) For the β model, exact testing depends on sampling graphs with a prescribed degree sequence, for which there are several algorithms throughout the statistics, computer science and graph theory literatures. The general problem is wide open for other ERGMs.
    (b) To construct an appropriate Markov chain for exact testing, sampling graphs with fixed properties should be done within the context of a statistical model.
    (c) As we will see, all linear ERGMs reduce to hypergraph degree sequence problems.
(2) Parameter estimation and noisy data: Viewing random graph models through the lens of exponential families offers a way to capture the issue of existence of maximum likelihood estimators (MLE), a fundamental problem that is largely unexplored. The problem and its statistical implications are explained in Section 4.
    (a) For the β model, the geometry of MLE existence is captured by the convex hull of all degree sequences. The extreme points of that polytope correspond to threshold graphs and facet-defining inequalities have been characterized.
        In general, the geometry of MLE existence is captured by the model polytope, whose lattice points are realizable sufficient statistics vectors. These polytopes are not known for other ERGMs.
    (b) In data privacy considerations, or in dealing with noisy data, observed sufficient statistics may contain errors and thus in particular may not be realizable by any graph. For the β model, it is known when an integer sequence is graphical, i.e., when there exists a graph that realizes the sequence as its degree sequence. The general problem of characterizing realizable sufficient statistic vectors is wide open yet crucial for establishing reliability of inference.
    (c) Efficient facet description of the model polytope for other examples of ERGMs would be crucial to develop appropriate tools for dealing with noise in the data.
(3) The model has desirable asymptotic properties [CDS11], closely related to graphons [LS06]. As the asymptotics are different from the asymptotics mentioned in item (1) above, we will not discuss these issues in this chapter, but instead point the interested reader to [WO] and for a quick list of references to [RPF13, Introduction].

Here are examples of some other models that build on the β graph model and for which many of these questions remain open. In the interest of space and readability, equations repeating the structure (2.1) for each case are omitted but can easily be found in the references.

Example 2.2. The joint degree matrix model is the ERGM of the form (2.1) where the sufficient statistic $T(g)$ is the joint degree matrix, or JDM for short, of $g$. The JDM counts the number of edges between nodes of given degrees, for all degree pairs.
The model, as an ERGM, was introduced to the statistics literature in recent work [SR14b], motivated by previous lines of research outside of mainstream statistics, such as [EMT15], which offers a fast mixing algorithm and a fantastic introduction and overview of the joint degree matrix as a network statistic, and [SP12]. As pointed out in [EMT15], fixing the value of the JDM of a network is stronger than just fixing the degree sequence, though it uniquely defines the degree sequence.

Example 2.3. The β model for hypergraphs was introduced in [SSR+14]. It looks identical to the β model for graphs, except the edges in the network $g$ can be of size larger than 2. In particular, there are three variants: uniform, layered uniform, and general. In the uniform variant, hyperedges of fixed size $k$ occur independently with probability proportional to $\beta_{i_1} \cdots \beta_{i_k}$ for all $k$-tuples $i_1, \dots, i_k$ of vertices on the graph. The sufficient statistic in this case is the hypergraph $k$-degree sequence, that is, the number of edges of size $k$ to which each vertex belongs. In the layered variant, the set of possible hyperedges is extended to include edges of all sizes up to and including some fixed size $k$. The sufficient statistics are thus layered hypergraph degree sequences: the number of hyperedges of each size, $2, \dots, k$, to which a vertex belongs. In the general variant, all edge sizes are allowed, and the model is more complicated; for details see [SSR+14].

Example 2.4. The $p_1$ model for random graphs assigns probabilities to directed edges in a random graph according to the propensity of the nodes to receive, send, or reciprocate edges. The sufficient statistics vector $T(g)$ of the model consists of: the in-degrees of all nodes, out-degrees of all nodes, and the number of reciprocated edges, where a reciprocated edge is of the form $i \leftrightarrow j$.
The model has a history in statistics and applications, see for example [HL81], [FW81] and [PRF10]. The last reference interprets the model through the lens of algebraic statistics and studies its linear ERGM structure in terms of algebra and geometry; cf. [RPF13].

A fairly recent extended review article on statistical network models [GZFA09] offers many other examples of models, beyond linear ERGMs, whose geometric and combinatorial properties are yet to be explored.

3. Sampling algorithms in exact testing

Testing the fit of the model means deciding whether the model provides a plausible probabilistic representation of the available data. Unfortunately, most tests for goodness of fit are based on large sample approximations that are not applicable to data such as sparse contingency tables used in, for example, cross-classification of categorical variables, or data such as random graphs that can be naturally thought of as contingency tables through incidence matrices. Fundamental examples of inadequacy of the use of asymptotic approximations were already pointed out over two decades ago in statistics literature [Hab88], [Agr92]: when sample size $N$ is small, or for example if some data table cell entries are much smaller than others, exact testing should be performed instead. Even so, a large part of the literature on, say, network computation and modeling does not address the lack of model fitting and testing methodology (footnote 3) beyond heuristic algorithms [Han03, CKHG15, HGH08]. This is largely due to the inherent model complexity or degeneracy and the lack of tools that can handle network models and sparse small-sample data.

The role and relevance of discrete mathematics in exact testing is as follows.
In an exact test for a model with sufficient statistics vector $T$, the observed data $x$ with $T(x) = t_{obs}$ is compared to a reference set $\mathcal{F}_{t_{obs}} := \{y : T(y) = t_{obs}\}$ of data with the same value of the sufficient statistic. This reference set is also called the fiber in algebraic statistics literature. It is a fiber of the algebraic map that computes the sufficient statistics; in the β model case the map is linear, since node degrees can be computed as row/column sums of the incidence matrix. Note that, by definition of sufficiency, the probability of any data point in the fiber is determined completely by the value $t_{obs}$, hence the fiber is a good reference set to consider for testing. Using a suitable choice of a goodness-of-fit statistic that measures, for example, the distance of the data $x$ from the expected value under the model, the simulated data points in $\mathcal{F}_{t_{obs}}$ are compared to the observed data. If a relatively large number of those are closer to the expected value than the observed data, then one declares poor model fit. These are, roughly, the essential elements of an exact test, whose name refers to the exact conditional distribution of data points in the fiber $\mathcal{F}_{t_{obs}}$, where we are conditioning on the observed value $t_{obs}$ of the sufficient statistics.

To visualize the elements of an exact conditional test, consider the following example. Suppose the observed graph is $g_1$ in Figure 1; see [OHT13, Figure 13] and [GPS, Section 4.1]. Given the degree sequence of $g_1$, the expected value under the model is visualized as a weighted graph whose edge weights are the expected values of the random variables representing the edges. This is computed using the MLE of the parameters; see Equation (4.1). One may view these expected values as computing the relative frequency of the edges in the set of all graphs whose degree sequence equals that of the graph $g_1$.
For example, the edge $\{3, 4\}$ appears in 15% of graphs that realize this degree sequence, and since the conditional distribution on the fibers of the β model is uniform, this frequency doesn't need to be weighted to compute the expected values. Intuitively, the exact test is meant to assess the relative closeness of the observed graph to the expected graph, and thus it answers the question: is $g_1$ more like the expected graph than the graphs $g_2, g_3, g_4$ from Figure 2 and the other 587 graphs in the fiber?

There are two important points to be made before we discuss the key problem. First, choosing a good goodness-of-fit statistic is in general an open problem, and is difficult in particular in random graph models for which edges are not independent random variables. For a discussion of the special case of 2-way contingency table models, see [Agr92, §3.1]. Second, a good example of a goodness-of-fit statistic to use for random graphs with independent edges is the chi-square statistic, which simply computes the square of the weighted Euclidean distance in $\mathbb{R}^n$ between the observed graph and the expected graph (or MLE). It is defined as:

$\chi^2(g) := \sum_{i,j} \frac{(\hat{g}_{ij} - g_{ij})^2}{\hat{g}_{ij}}$,

where $g_{ij}$ is 1 if the edge $\{i, j\}$ is present in $g$ and 0 otherwise, and $\hat{g}_{ij} \in (0, 1)$ is the expected value of the random variable representing the edge $\{i, j\}$ under the MLE, as illustrated in Figure 1.

Footnote 3. The reader is urged to distinguish between the statistical questions of model fitting (‘is the model correct?’) and parameter estimation (‘assuming the model is correct, find the parameter value(s) that best fit the data’). The latter is often alluded to within the context of ‘model validation’ in the computer science literature.

[Figure 1. Observed graph $g_1$ (left) with degree sequence (2,1,1,2,3,1,2,1), and the expected probability-weighted graph under the β model.]

[Figure 2. Three out of the 590 other graphs in the fiber $\mathcal{F}_{(2,1,1,2,3,1,2,1)}$: $g_2$ (left), $g_3$ (center), $g_4$ (right).]

The values of the chi-square statistic for the four graphs above are:

$\chi^2(g_1) = 19.49$, $\chi^2(g_2) = 17.42$, $\chi^2(g_3) = 21.56$, $\chi^2(g_4) = 26.53$.

To determine whether the value $\chi^2(g_1)$ is too large, one should enumerate the fiber and compute the value of the statistic for all 591 graphs in it. As it is computationally prohibitive to enumerate the fiber except in very small cases, one tries to instead sample from the fiber and estimate the number of data points that are more extreme than the observed, with respect to the chosen goodness-of-fit statistic. Thus the key problem in the exact testing procedure is that of exploring the fiber through sampling from the conditional distribution given $t_{obs}$.

Problem 3.1 (General Problem 1). Fix a random graph model $\mathcal{M}$ with sufficient statistics $T$. Develop an efficient algorithm that samples from the space of graphs with arbitrary fixed value of $T(g) = t_{obs}$.
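The comparison step of the exact test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `chi_square` and `exact_p_value` are ours, the expected values $\hat{g}_{ij}$ are taken as given input rather than computed from the MLE, and the toy fiber (the three perfect matchings on four vertices, i.e. degree sequence (1,1,1,1)) is a hypothetical example, not the fiber from Figure 2.

```python
from fractions import Fraction

def chi_square(edges, expected):
    """Sum over vertex pairs {i, j} of (ghat_ij - g_ij)^2 / ghat_ij.

    `edges` lists the edges present in g; `expected` maps every
    vertex pair (i, j) to its expected value ghat_ij in (0, 1).
    """
    present = {frozenset(e) for e in edges}
    total = 0
    for pair, ghat in expected.items():
        g = 1 if frozenset(pair) in present else 0
        total += (ghat - g) ** 2 / ghat
    return total

def exact_p_value(observed, fiber_sample, expected):
    """Fraction of sampled fiber points at least as extreme as `observed`."""
    t_obs = chi_square(observed, expected)
    hits = sum(1 for g in fiber_sample if chi_square(g, expected) >= t_obs)
    return Fraction(hits, len(fiber_sample))

# Hypothetical toy fiber: all graphs on 4 vertices with degree sequence (1,1,1,1).
pairs = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
toy_fiber = [[(1, 2), (3, 4)], [(1, 3), (2, 4)], [(1, 4), (2, 3)]]
# Each edge lies in exactly one of the three matchings, so ghat_ij = 1/3.
toy_expected = {p: Fraction(1, 3) for p in pairs}
toy_p = exact_p_value(toy_fiber[0], toy_fiber, toy_expected)
```

In this toy case every graph in the fiber has the same chi-square value, so the estimated p-value is 1 and the test gives no evidence against the model. For a realistic fiber, `fiber_sample` would be the output of a sampler solving Problem 3.1 rather than a full enumeration.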
A fiber sampling algorithm can be a direct sampler or it can be based on a Markov chain and used through the Metropolis algorithm, which is a standard algorithm and can be found, for example, in [RC99]. The algebraic statistics literature focuses on the latter. The reason for this is the natural connection to algebraic geometry. Namely, the theoretical feasibility of constructing Markov chains to solve General Problem 1 for arbitrary linear ERGMs, as well as finiteness of the complexity of steps needed to perform the Markov chain random walk on fibers of every log-linear model, form a cornerstone of traditional algebraic statistics through a fundamental theorem from [DS98]; see also [DSS09, Theorem 1.3.6]. Its applied solution and relevance in exact testing for any linear ERGM hinges upon the development of efficient sampling algorithms for graphs and hypergraphs, as outlined here.

The steps or moves needed to perform the Markov chain above for a given model are called Markov moves. A Markov basis for a model is a collection of moves guaranteed to connect each fiber of that model. A move is an element of the Graver basis of the model if it contains no proper sub-moves; see Figure 3 for a running example.

Remark 3.2. In order to be of statistical relevance, and make any inference from the observed data, Problem 3.1 should be solved within the context of a statistical model. In other words, when a network feature, or a set thereof, is fixed in sampling, it is understood that the feature can serve as a sufficient statistic of some model; the statisticians then ask to identify the model and study its various properties in order to determine feasibility and reliability of inference and model fitting.

Example 3.3.
For the β model for random graphs from Example 2.1, the reference set $\mathcal{F}_{t_{obs}}$ is the set of graphs whose degree sequence $d$ equals the degree sequence $t_{obs} := d(g)$ of the observed graph $g$. It is easy to see that in the β model the conditional probability distribution of graphs given a fixed observed degree sequence is uniform; in general, of course, this distribution depends on the model. Therefore, in the β model, as well as others we discuss in this chapter as the reader may easily verify, graphs should be sampled uniformly, because conditioning on the sufficient statistics results in the uniform distribution on the fiber.

Of course, Problem 3.1 has been solved for the β model for graphs. Indeed, various algorithms for constructing random graphs with a fixed degree sequence (in other words, sampling the fibers of the β model) have been proposed over the years, with or without the context of the model, and the reader is probably familiar with the graph theory literature on this topic, the most relevant part of which is cited throughout this chapter. On the statistics side, [BD10] derive a sequential importance sampling algorithm for this model. A different random generation algorithm for simple connected graphs can be found in [VL05].

As mentioned above, one could think of solving the sampling problem in different ways: by an algorithm that constructs, uniformly at random, a graph that realizes the given degree sequence (where care should be taken that such an algorithm can, in fact, discover the entire fiber!); or by finding a set of moves, sometimes also called ‘edge swaps’ in graph theory, that can serve as a basis for a Markov chain Monte Carlo (MCMC) sampling algorithm on the fiber. Here the word ‘basis’ is used in a technical sense but can also be understood intuitively: the moves should guarantee to discover the entire fiber and define a symmetric, aperiodic chain on any given fiber of the model.
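As a minimal sketch of the second approach for simple graphs, the following Python code performs a lazy random walk using double edge swaps, which replace edges {a,b}, {c,d} by {a,d}, {c,b}. The function names are ours, and the sketch assumes the classical graph-theoretic fact that such swaps connect the set of simple graphs with a given degree sequence; it says nothing about how many steps are needed for the walk to mix.

```python
import random

def degree_sequence(edges, n):
    """Degree sequence of a simple graph on vertices 0, ..., n-1."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

def swap_step(edges, rng):
    """One lazy step: propose a double edge swap, stay put if it is invalid."""
    e1, e2 = rng.sample(sorted(edges), 2)
    a, b = e1
    c, d = e2
    if rng.random() < 0.5:          # choose one of the two possible rewirings
        c, d = d, c
    new1 = tuple(sorted((a, d)))
    new2 = tuple(sorted((c, b)))
    # Reject swaps that would create a loop or a multiple edge.
    if len({a, b, c, d}) < 4 or new1 in edges or new2 in edges:
        return edges
    return (edges - {e1, e2}) | {new1, new2}

def sample_fiber(edges, steps, seed=0):
    """Run the swap chain for `steps` steps from the given simple graph."""
    rng = random.Random(seed)
    current = frozenset(tuple(sorted(e)) for e in edges)
    for _ in range(steps):
        current = swap_step(current, rng)
    return current

# Hypothetical starting graph: a path on 6 vertices.
path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
walked = sample_fiber(path, steps=500, seed=1)
```

Each proposal is as likely as its reverse, so the chain is symmetric, and rejected proposals make it lazy and hence aperiodic; the degree sequence of `walked` equals that of `path` by construction.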
The Markov chain approach offers a nice alternative and an interesting connection to other areas of mathematics. As such, it is largely the focus of the remainder of this section.

3.1. Ingredients for a Markov chain fiber sampler.

In order to use the MCMC method to sample the fiber of an observed graph, three important questions should be answered. First, a Markov basis should be specified: a set of moves that the algorithm uses to go from any graph to any other graph in any fiber. Second, the stationary distribution of the proposed Markov chain should be the correct conditional distribution; recall that in the β model example, it should be uniform. Third, mixing time considerations cannot be avoided in this approach: one can easily propose seemingly straightforward Markov chains that have extremely unreasonable behavior in that they will take a very long time to converge or to explore the fiber. One way to answer the first two questions simultaneously is through the use of algebraic statistics. The third question on mixing time is discussed subsequently.

There is an algebraic geometry or, if you will, commutative algebra equivalent to a Markov basis for a linear ERGM stemming from the fundamental theorem from [DS98]. Namely, every such statistical model lies in a natural algebraic variety: a variety is the set of all solutions to a system of polynomial equations, and those encoding parametric statistical models can equivalently be described parametrically (there is an algebra-geometry duality here); see the excellent introductory explanation in [SKKT00, §2.4 and 2.5]. Focusing on the algebra, the parametrization is encoded via a homomorphism between two polynomial rings, say $\phi : \mathbb{C}[E] \to \mathbb{C}[\theta_1, \dots, \theta_n]$, where $E$ is some set of variables, and $\theta_i$ are parameters.
The indeterminates in the two polynomial rings carry a special meaning in algebraic statistics: $E$ represent random variables of interest, while $\theta = (\theta_1, \dots, \theta_n)$ is the vector of unknown model parameters. For example, in the case of ERGMs whose edges are independent random variables, $E = E(G)$ are edges of the random graph $G$. The kernel of the map $\phi$ is an ideal in the polynomial ring $\mathbb{C}[E]$, and this ideal is the defining ideal of the algebraic variety whose real positive part contains the statistical model. Readers interested in further algebraic details of this correspondence may consult [DSS09, page 25].

Example 3.4. In the specific case of the β model for graphs with no statistical sampling constraints, the graph $G$ is the complete graph because all edges are allowed to appear in the random graph realization $g$. The homomorphism $\phi$ maps each edge $e_{ij}$ to the product of the parameters $\beta_i \beta_j$. For example, consider the random graph β model on $n = 8$ nodes. The coordinate map corresponding to the parameterization of the model is

$\phi_{K_8} : \mathbb{C}[e_{12}, e_{13}, \dots, e_{78}] \to \mathbb{C}[\beta_1, \dots, \beta_8]$, $e_{ij} \mapsto \beta_i \beta_j$.

Note that here we have forgotten the exponential and the normalizing constant because they do not affect the algebraic structure of the parametrization, so by abuse of notation we drop the ‘exp’ from the model specification. An example of an equation that vanishes on the model is $e_{17} e_{24} e_{34} e_{58} - e_{14} e_{23} e_{45} e_{78}$. The algebraic relation holds because

$\phi_{K_8}(e_{17} e_{24} e_{34} e_{58}) = \phi_{K_8}(e_{14} e_{23} e_{45} e_{78})$

since, of course, the two graphs on edge sets $\{17, 24, 34, 58\}$ and $\{14, 23, 45, 78\}$ have the same degree sequence, which is the sufficient statistic for the model; refer to Figure 3. Note that the equation vanishing on the model is equivalent to saying that the binomial is in the kernel of the coordinate map:

$e_{17} e_{24} e_{34} e_{58} - e_{14} e_{23} e_{45} e_{78} \in \ker \phi_{K_8}$.
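The kernel membership in Example 3.4 can be checked mechanically. Since $\phi_{K_8}$ sends a squarefree monomial in the $e_{ij}$ to $\prod_i \beta_i^{\deg(i)}$, comparing the images of two monomials amounts to comparing degree sequences. The short Python sketch below does exactly this; the helper name `phi_exponents` is our own.

```python
def phi_exponents(edge_set, n):
    """Exponent vector of the image of prod e_ij under e_ij -> beta_i * beta_j.

    Entry i-1 is the exponent of beta_i, i.e. the degree of vertex i
    in the graph with the given edge set (vertices labelled 1..n).
    """
    exps = [0] * n
    for i, j in edge_set:
        exps[i - 1] += 1
        exps[j - 1] += 1
    return exps

# The two monomials of the binomial from Example 3.4.
E_plus = [(1, 7), (2, 4), (3, 4), (5, 8)]
E_minus = [(1, 4), (2, 3), (4, 5), (7, 8)]

# The binomial lies in ker(phi_K8) exactly when both images agree.
in_kernel = phi_exponents(E_plus, 8) == phi_exponents(E_minus, 8)
```

Both monomials map to $\beta_1 \beta_2 \beta_3 \beta_4^2 \beta_5 \beta_7 \beta_8$, so `in_kernel` is `True`.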
To summarize: probability distributions in the model correspond to real positive points on an algebraic variety, where the parametric description of the model gives rise to the parametrization of the variety. This variety is defined implicitly by the equations that vanish on the model. Coordinates of the polynomial ring that is the domain of $\phi$ correspond to the joint probabilities of random variables in the model. Equations that vanish on the variety represent algebraic relations among the joint probabilities. To visualize what these relations mean, consider $\prod_{\{i,j\} \in E^+} e_{ij} - \prod_{\{i,j\} \in E^-} e_{ij}$, for two sets of edges $E^+$ and $E^-$. This is an equation that vanishes for all points on the model (i.e., a relation on the edges) if and only if the graphs whose edge sets are $E^+$ and $E^-$ have the same probability under the model. In turn this happens if and only if the two subgraphs have the same values of the sufficient statistics vector.

In the linear ERGM case, the variety has a special structure called toric, due to linearity, and as a consequence the defining equations are all binomials. The crucial observation is then that such binomials correspond to moves on the fibers of the model, and in fact the moves comprise a Markov basis if and only if the binomials suffice to generate all equations that vanish on the variety [DS98]. This fact is sometimes called the Fundamental Theorem of Markov Bases. This correspondence should be interpreted in the following way: $h_1 - h_2$ is a defining equation of the variety corresponding to the model if and only if replacing the subgraph $h_2 \subset g$ by the subgraph $h_1 \subset g$ is a move on the fiber of the model, meaning that the move does not change the value of the sufficient statistics vector, as illustrated in Figure 3.
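To make the interpretation of a binomial $h_1 - h_2$ as a move concrete, here is a small Python sketch for simple graphs under the β model. The names `apply_move` and `degrees` are ours, and the graph `g` below is a hypothetical example containing $h_2 = \{14, 23, 45, 78\}$ plus one extra edge; it is not the graph $g_1$ of Figure 1, whose full edge list is not reproduced in the text.

```python
def degrees(graph, n):
    """Degree sequence of a simple graph on vertices 1..n."""
    deg = [0] * n
    for e in graph:
        for v in e:
            deg[v - 1] += 1
    return deg

def apply_move(graph, h_plus, h_minus):
    """Replace the subgraph h_minus of `graph` by h_plus.

    Returns the new edge set, or None when the move is not applicable:
    h_minus must be contained in the graph, and h_plus must not create
    a multiple edge (we stay within simple graphs).
    """
    g = {frozenset(e) for e in graph}
    minus = {frozenset(e) for e in h_minus}
    plus = {frozenset(e) for e in h_plus}
    if not minus <= g or (g - minus) & plus:
        return None
    return (g - minus) | plus

h1 = [(1, 7), (2, 4), (3, 4), (5, 8)]   # edges added by the move
h2 = [(1, 4), (2, 3), (4, 5), (7, 8)]   # edges removed by the move
g = h2 + [(5, 6)]                        # hypothetical graph containing h2
moved = apply_move(g, h1, h2)
```

The move is applicable to `g`, and `degrees(moved, 8)` equals `degrees(g, 8)`, so the new graph stays in the same fiber. Applying the same move to a graph that does not contain all of $h_2$ returns `None`.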
Figure 3. A Markov move under the $\beta$ model for the graph $g = g_1$ from Figure 1: the graph $g_1$ with edges $h_2$ to be removed in blue (left), and the resulting graph $g_2$ from Figure 2 after $h_2$ is replaced by the edges in $h_1$, shown in red (right). Note that this move $h_1 - h_2$ is not a minimal one, cf. Figure 4. Instead, it is an example of a Graver move. If structural zeros were present in the model, moves like this one may be needed for connecting the fiber.

Figure 4. A walk using only quadratic moves from $g_1$ to $g_2$ on the fiber $F(2, 1, 1, 2, 3, 1, 2, 1)$.

In fact, the correspondence applies to general log-linear models for discrete data, not only networks, and is one of the cornerstones of algebraic statistics. In the general case the moves $h_1 - h_2$ need not be squarefree and need not represent graphs; instead each of $h_1$ and $h_2$ can be an arbitrary monomial in the indeterminates $E$, and this machinery works for general contingency table models. Simple graphs are just 0/1 contingency tables, while graphs can in general be represented by non-negative tables, where each positive entry indicates presence of an edge, with multiplicity if the model allows for it. Given the very extensive literature on this topic, for example summarized in the recent book [AHT12] and references given therein, let us not spend more time studying details of this equivalence here.
Instead, we pause to note the important consequences: By a fundamental theorem in commutative algebra called the Hilbert basis theorem, a Markov basis exists and is finite for every linear ERGM. Furthermore, using the moves from a Markov basis given by the algebraic correspondence theorem to generate moves in the Metropolis algorithm automatically converges to the correct stationary distribution on the fiber. Note that, by definition, any set of moves containing a minimal Markov basis has the same property.

The $\beta$ model for graphs from Example 3.3 somehow seems to have a parallel history in various literature, as reflected upon in the following Remark.

Remark 3.5. In commutative algebra, Markov bases for the $\beta$ model exist under the name of generators of toric ideals of graphs. This name comes from the fact that the moves from the $\beta$ model can be encoded by closed alternating walks on graphs: sets of edges partitioned into two sets, $E^+$ and $E^-$, such that they constitute a closed walk when traversed in alternating order $e_1, \ldots, e_m$ with $e_i \in E^+$ for even $i$ and $e_i \in E^-$ for odd $i$. For example, the move corresponding to removing blue and adding red edges in Figure 3 represents one such alternating walk, with $E^+$ and $E^-$ corresponding to red and blue edges, respectively. The reader will note that this move can be obtained as a sequence of three moves represented by closed 4-cycles illustrated in Figure 4. Equivalently, the polynomial $e_{17} e_{24} e_{34} e_{58} - e_{14} e_{23} e_{45} e_{78}$ lies in the ideal generated by the three polynomials representing the 4-cycles:
$$e_{17} e_{24} e_{34} e_{58} - e_{14} e_{23} e_{45} e_{78} = e_{24} e_{34} (e_{17} e_{58} - e_{15} e_{78}) + e_{45} e_{78} (e_{13} e_{24} - e_{14} e_{23}) + e_{24} e_{78} (e_{15} e_{34} - e_{13} e_{45}).$$
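The decomposition into three 4-cycle binomials in Remark 3.5 can be verified by expanding both sides. Below is a minimal sketch that assumes a deliberately naive polynomial representation (a dictionary from sorted tuples of edge labels to integer coefficients); the helper names `mono`, `binomial`, `times` and `add` are hypothetical and not part of any library.

```python
def mono(*edges):
    """A monomial as a sorted tuple of edge labels such as '17'."""
    return tuple(sorted(edges))

def binomial(plus, minus):
    """The polynomial plus - minus, as a map monomial -> coefficient."""
    return {mono(*plus): 1, mono(*minus): -1}

def times(monomial, poly):
    """Multiply every term of a polynomial by a monomial."""
    return {mono(*monomial, *m): c for m, c in poly.items()}

def add(*polys):
    """Sum polynomials and drop terms whose coefficients cancel."""
    total = {}
    for p in polys:
        for m, c in p.items():
            total[m] = total.get(m, 0) + c
    return {m: c for m, c in total.items() if c != 0}

lhs = binomial(("17", "24", "34", "58"), ("14", "23", "45", "78"))
rhs = add(
    times(("24", "34"), binomial(("17", "58"), ("15", "78"))),
    times(("45", "78"), binomial(("13", "24"), ("14", "23"))),
    times(("24", "78"), binomial(("15", "34"), ("13", "45"))),
)
identity_holds = lhs == rhs
print(identity_holds)
```

The four intermediate monomials cancel in pairs, which is the algebraic counterpart of the walk in Figure 4 passing through intermediate graphs before arriving at $g_2$.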
The commutative algebra community has known the structure of these generators, without calling them moves or relating them to MCMC methods, for at least 20 years, for example, in [Vil95, Vil00] and [OH99], as well as [Stu96, Chapter 9] and [RTT12].

Further, these results have been known in the algebraic statistics literature for some time as well. Notable examples include works such as [SS06, Theorem 4] on toric geometry of compatible full conditionals, [Mor13, Theorem 3.2] on relations among conditional probabilities, [SW12], which studies the algebra and geometry of statistical ranking models and the ascending model in particular, and [PRF10], which studies the structure of Markov and Graver bases for the directed model $p_1$ whose restriction (to the part of the random graph consisting only of reciprocated edges) is the $\beta$ model. More recently, [OHT13] offer the exact test implementation whose main ingredient is a random generation of Markov and Graver basis elements for the $\beta$ model.

Interestingly, bicolored or alternating closed walks on graphs make a later appearance in the computer science literature as well. [BPS09] define the alternating cone of a graph to be the set of characteristic vectors of alternating walks on the graph, cf. [BPS07] and the subsequent book chapter [Bha12]. In algebraic terms, this is the cone of exponent vectors of the binomials in the toric ideal of that graph. Results from toric algebra of graphs and the definition of the alternating cone should easily imply that [BPS09, Conjecture 3.1] is actually true, and can furthermore be generalized for hypergraphs appropriately. [Sri] is a very nice set of notes on open problems related to threshold graphs; cf. Conjecture 4. To prove the conjecture, note that a 'balanced sum of cycles' is certainly a balanced graph, thus it is a lattice point in the cone of balanced subgraphs.
Each of these lattice points is an integer exponent of some binomial in the toric ideal of the graph. It is well known that the Graver basis is an integral generating set of the lattice, in that every lattice point in the cone can be written as an integral sum of the Graver exponents. Finally, Graver bases of these cones for graphs are encoded by alternating walks by [Vil95]; see also the recent literature cited above. More generally, for hypergraphs, the walks are interpreted as bi-colored [PS14] and again form a Graver basis; see [PTV] for the general non-uniform case. By definition, these are exactly balanced subgraphs. Thus, the exponent vector of a balanced sum of cycles is an integral sum of balanced subgraphs represented by Graver elements. The present author is not aware of whether the conjecture has been proved in other literature; only the fractional sum part of the conjecture is proved in the works on alternating cones cited above.

Apart from the $\beta$ model for random graphs, sampling algorithms for the space of graphs with fixed properties have evolved disjointly from statistics. At this point, solutions to the following two subproblems of Problem 3.1 are of special importance:

Problem 3.6 (An instance of General Problem 1: $p_1$ model moves). Determine a set of moves sufficient to connect the space of graphs, consisting of both directed and bidirected (reciprocated) edges, with a given directed degree sequence and number of reciprocated edges.

The problem above was solved in algebraic statistics [PRF10] and an implementation of a dynamic algorithm was offered in [GPS]; see the subsection on mixing times below for a related open problem.

Problem 3.7 (An instance of General Problem 1: Hypergraph $\beta$ model moves). Determine a set of moves sufficient to connect the space of hypergraphs, uniform, layered or general, with a given degree sequence.
Of course, one can 'solve' these problems in various ways. From the point of view of (algebraic) statistics applied to networks, the most interesting types of solutions to Problems 3.7 and 3.1 in general will at least address general models with sampling constraints and forbidden edges, or derive bases of bounded complexity with good mixing time properties for a subclass of the models. The literature offers some examples of such solutions, but many problems remain open. These issues are discussed next.

Sampling constraints. If, in applications, the random sampler of the model fibers is allowed to step through the extended state space of non-simple graphs, where edges can have a multiplicity, then the problem is solved. For example, the simple-edge-swap solution to Problem 3.7 follows from standard textbook results in algebraic geometry; namely, the generating set of the toric ideal for the model is known to consist of quadratic binomials, as the algebraic closure of the model is defined by the toric ideal of the second hypersimplex, known to be generated by quadrics. However, inherent sampling constraints can compound the issue of computing a set of moves guaranteed to connect each fiber; for example, the sampler may need to step through only simple graphs, or graphs with bounded multiplicity. In some statistical modeling considerations, other constraints may come in the form of forbidden edges or forbidden neighbors; such constraints in statistics are called structural zeros of the model. Of course, sampling and modeling constraints restrict the fiber, as expected, but they also make the required moves more complicated: consider the situation when the edges used in the intermediate steps of the walk in Figure 4, such as the edge $\{1, 3\}$ for example, are structural zeros.
Interestingly, [HTE+09] characterize when a sequence is realizable as a degree sequence of a simple graph such that a given set of edges from an arbitrary node is avoided. This allows the authors to construct a swap-free algorithm for sampling the restricted fiber of the $\beta$ model. If one is interested in sampling a restricted fiber for any general linear ERGM or any log-linear model, [OHT13] and [AHT12] show that one needs a much larger set of moves to guarantee connectivity, i.e., minimal Markov bases may not do the job. For example, the set known to algebraists to be inherently more complicated, called the Graver basis of the toric ideal [DSS09, §1.3], [AHT12, §4.6], will suffice to connect the fiber; however, it is also so complex and large that it is not feasible to compute it for any reasonable example beyond toy-size networks. To give an idea of such size, for the $p_1$ model this basis can only be directly computed for networks on fewer than 7 nodes. What [OHT13] then do is provide an algorithm that constructs one step in the chain, a Markov move, in a dynamic fashion, and show its performance on several small network datasets. Since the moves generated belong to the Graver basis, the method is provably applicable in the case of structural zeros and sampling constraints. This method does rely on the understanding of the algebraic machinery underlying the model, but appears to be inefficient in discovering the fiber, and the resulting Markov chain has a high rejection rate. In contrast, simulations indicate that the chain from [GPS, §4.1] does not have that problem, but there is no formal proof of its good properties. This leads us to the next topic regarding sampling fibers.

Mixing time considerations. Considerations of mixing times and convergence are crucial in any MCMC sampling scheme [RC09], [BGJM11].
Notably, using minimal Markov bases may not produce good mixing, as seen in some experiments and pointed out recently in [Win], which formalizes how mixing properties of Markov bases depend on the structure, and connectedness in particular, of the underlying fiber graph. Moreover, it explains that results on mixing times of chains built on minimal Markov bases are in general prohibitive to obtain, and suggests instead the construction of related expander graphs for the problem, a worthwhile pursuit. In earlier work, [GPS] construct a chain on the fiber graph for the $p_1$ model in such a way that it is, in fact, a complete graph, with loops having seemingly low probability. In simulations, the chain behaves quite well and indicates fast mixing. Nevertheless, a formal proof of rapid mixing, or of any mixing properties, is left as an open question, as the authors there did not study the structure of the transition matrix for that chain.

This is where recent advances in the discrete mathematics literature, for some reason disjoint from the statistics literature on networks, offer example results for the $\beta$ model that, in this author's opinion, should be connected to statistics (algebraic statistics in particular) and used as a guide for studying Markov bases of other linear and general ERGMs. Namely, [KTV99] conjectured that the Markov chain based on swaps mixes rapidly, and the proofs for arbitrary regular graphs and half-regular bipartite graphs were given in [CDG07] and [MES13], respectively, while irregular graphs are studied in [Gre15]. Recently, [EMT15] offer insight into the general case via the study of swap-based chains on the fibers of the JDM model from Example 2.2; they prove fast mixing on a subset of the JDM-model fibers and offer an excellent review of related literature, while irreducibility of the Markov chain on the JDM realizations was proved in [CDEM15].
To translate what this means for Markov bases, note that in algebraic statistics language, 'swaps' of two edges at a time are quadratic Markov moves, or quadratic generators of the toric ideal, since taking two edges at a time and replacing them by two other edges translates to a difference of two quadratic monomials in the notation of Example 3.4. Crucially, however, the difference is not just linguistic: in general in algebraic statistics, Markov moves are sampled from a set of all possible moves that apply to the model and that have been computed a priori. Such sampling can clearly lead to more rejections. On the other hand, 'swaps' are constructed from observed edges, so rarely is a non-applicable swap proposed! In particular, the literature cited here provides a set of examples of rapidly mixing sequences of fiber graphs, and thus also hope that Markov-move-based constructions, adapted appropriately, can still lead to good chains for fiber exploration and goodness-of-fit tests; compare to examples reported in [GPS] or [OHT13] for an illustration.

In the remainder of this subsection, let us clarify how these chains should be constructed 'with care', and why an out-of-the-box Markov basis may not be good enough. For example, [Win] observes that enlarging minimal Markov bases to, say, Graver bases does not improve mixing time. Intuitively, if the chain can be constructed so that the fiber graph has a relatively small number of loops, then rapid mixing may be possible. Such a fiber graph would resemble the idea of using moves corresponding to graph-theoretic 'swaps': namely, not attempting to construct or apply Markov moves that work on just some fiber, but instead focusing on the given observed graph and its particular fiber. We refer to such moves as data-dependent.
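The contrast between precomputed moves and swaps built from observed edges can be made concrete. The following is a minimal sketch of a swap proposal on simple graphs, assuming edges are stored as two-element frozensets; the function names `swap_step` and `degrees` and the optional `forbidden` set of structural zeros are illustrative assumptions. It is not the chain of [GPS] or [OHT13], and no claim is made here about its mixing behavior.

```python
import random

def degrees(edges, n):
    """Degree sequence of a graph on nodes 1..n."""
    d = [0] * (n + 1)
    for e in edges:
        for v in e:
            d[v] += 1
    return d[1:]

def swap_step(edges, rng, forbidden=frozenset()):
    """Propose one swap built from two observed edges {a,b}, {c,d},
    replacing them by {a,d}, {c,b}. Returns the new edge set, or the
    unchanged one if the proposal leaves the set of simple graphs
    or uses a structural zero."""
    e1, e2 = rng.sample(sorted(edges, key=sorted), 2)
    a, b = sorted(e1)
    c, d = sorted(e2)
    if rng.random() < 0.5:          # choose one of the two rewirings
        c, d = d, c
    new1, new2 = frozenset((a, d)), frozenset((c, b))
    if len(new1) < 2 or len(new2) < 2:            # would create a loop
        return edges
    if new1 in edges or new2 in edges:            # would create a multi-edge
        return edges
    if new1 in forbidden or new2 in forbidden:    # structural zero
        return edges
    return (edges - {e1, e2}) | {new1, new2}

rng = random.Random(0)
start = frozenset(frozenset(e) for e in [(1, 4), (2, 3), (4, 5), (7, 8)])
current = start
for _ in range(200):
    current = swap_step(current, rng, forbidden={frozenset((1, 3))})
print(degrees(current, 8) == degrees(start, 8))
```

Every proposal removes two observed edges and reuses their endpoints, so the degree sequence is preserved by construction and only simplicity and structural zeros need to be checked.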
Incidentally, the chain on the $p_1$ model, which uses a large set of moves guaranteed to contain the Graver basis in combination with constructing data-dependent moves dynamically, involving both 'large' and 'small' steps on the fibers of the $p_1$ and $\beta$ models, shows excellent mixing properties in simulations on networks of various sizes. In fact, the Graver move from Figure 3 was constructed using the particular implementation from [GPS], without going through the intermediate steps in Figure 4. In the language of Remark 3.5, what we construct is a random walk on the realizable lattice points of the alternating cone using a superset of the Graver basis for the toric ideal of the complete graph, or for an arbitrary graph whose missing edges represent structural zeros in the $p_1$ model. This leads us to the final remark about Markov bases construction, which is intimately related to mixing time, but deserves to be singled out.

Data-dependent samplers. That knowing an entire Markov basis for a model may still not be sufficient to run goodness-of-fit tests efficiently is a well-known fact in algebraic statistics. Namely, Markov bases are data-independent [DFR+08, Problem 5.5]. To paraphrase [AHT12]: since a Markov basis is common for every fiber, the set of moves connecting the particular fiber of the observed data will usually be significantly smaller than the entire basis for the model. The subsequent paper [Win] offers another intuitive explanation of why data-dependent fiber samplers are necessary: "The conclusions we draw [...] are that an adaptation of the Markov basis has to take place depending on the right-hand side", meaning that distinct fibers require different adaptations, confirming previous observations.
To handle precisely this issue, [Dob12] suggested generating only moves needed to complete one step of the random walk, and then covering the fiber by sets of local moves; this was the first scalable method for exact tests on tables beyond decomposable models [DS04] and applicable to log-linear models where sufficient statistics are table marginals. However, many models of interest are not captured by marginals [SZP14, HL81, KPP+, GZFA09], leaving a gaping hole in the methodology for other categorical models, particularly for sparse data. This leads us to consider more general models for networks, with a long-term aim to design a data-dependent sampler beyond the one in [GPS] and [OHT13] for the $p_1$ and $\beta$ models.

3.2. General linear ERGMs. The situation may now seem hopeless for general non-degree-based models, because even if we establish fast mixing for Markov chains on $\beta$ model fibers, there are infinitely many other models that are left to be considered. However, not all is lost: it turns out that as long as the model is log-linear and the sufficient statistics are given by a linear map from the sample space, sampling hypergraphs with prescribed degree sequences, uniform or not, and with prescribed forbidden edges actually suffices for performing the exact test. This takes care of a number of interesting models! The answer lies in the structure of all log-linear models, including linear ERGMs: each of these models is encoded by a hypergraph, defined in [PS14], Section 3.1 of which provides further details and a simple example for the independence model.

Definition 3.8. The parameter hypergraph $H_M$ of any log-linear model $M$ for discrete data, and any linear ERGM in particular, is the hypergraph whose vertex set $\theta_1, \ldots, \theta_n$ corresponds to the parameters of the statistical model, and whose edge set is determined by joint probabilities of all possible states of the random variables $Z_1, \ldots, Z_m$. More precisely, $\{\theta_j\}_{j \in J}$ is an edge in the parameter hypergraph if the index set $J$ describes one of the probabilities in the model, that is, there exist values $i_1, \ldots, i_m$ such that $\mathrm{Prob}(Z_1 = i_1, \ldots, Z_m = i_m) = \prod_{j \in J} \theta_j$.

Example 3.9. Let $M_\beta$ be the $\beta$ model for random graphs on $n$ vertices $v_1, \ldots, v_n$. The parameter hypergraph of $M_\beta$ is the complete graph on vertices $\beta_1, \ldots, \beta_n$. To see why, recall the probability formula of the edges in Example 2.1: the random variables $Z_1, \ldots, Z_m$ represent the $m = \binom{n}{2}$ edges of the graph. The probability of the joint state where $Z_k = 1$ and $Z_l = 0$ for $l \neq k$, that is, the event of the occurrence of only the edge $Z_k := \{i, j\}$ in the graph, is proportional to $\beta_i \beta_j$ (see footnote 4). Note that $H_M$ being complete does not mean that all edges $\{v_i, v_j\}$ in the random graph are observed, but instead that all edges $\{\beta_i, \beta_j\}$ of the parameter hypergraph of $M_\beta$ correspond to random graph edges that are allowed. In particular, this is why the graph on the right of Figure 1 is complete with non-zero probability weights on every edge: there are no structural zeros in that example.

Similarly, for the hypergraph $\beta$ model, the parameter hypergraph is the complete hypergraph, either uniform, layered, or general, respectively, in each of the three cases of the model. Recently, the hypergraph $H_{p_1}$ of the $p_1$ random graph model was described in [GPS] and consists of edges of sizes 1, 3 and 7, for probabilities of no connection between two nodes, or a directed edge, or a reciprocated edge, respectively; for example, a directed edge $i \to j$ occurs with probability $\lambda_{ij} \alpha_i \beta_j$ and is thus encoded by a 3-edge on $H_{p_1}$.

Remark 3.10.
The parameter hypergraph runs the danger of being lost in translation, so to understand its construction, recall that the sufficient statistics of $M$ are computed using a linear map. Let $M$ be the matrix of that linear map. The parameter hypergraph $H_M$ of the model is simply the hypergraph whose vertex-edge incidence matrix is this matrix $M$. For the $\beta$ model, the definition may appear trivialized, but further log-linear model examples show this is not the case in general; see for example Figure 2 in [GPS] for two further examples of parameter hypergraphs.

What about the fibers of $M$? Consider again the $\beta$ model for graphs: two random graphs $g_1$ and $g_2$ occur with the same probability and are in the same fiber of the model if and only if their images under the linear map $M$ are the same. If $g_1$ and $g_2$ are encoded by a set of red and blue edges on the parameter hypergraph of $M_\beta$, respectively, they have the same probability if and only if the same multiset of parameters $\beta_i$ is covered by the red and blue edges. In other words, the red edge set and the blue edge set are graphs with the same degree sequence. This degree sequence correspondence for $M_\beta$ seems trivial; remarkably, it holds for all linear ERGMs.

Definition 3.11. Let $E$ be a multiset collection of edges in a hypergraph $H$. $E$ is balanced with respect to a given bicoloring of $H$ if for each vertex $v$ covered by $E$, the number of red edges containing $v$ equals the number of blue edges containing $v$. In other words, the 'blue degree' of $E$ equals its 'red degree'.

Footnote 4. Of course we take the minimal presentation here; knowing $\mathrm{Prob}(Z_1 = 1, Z_2 = 0)$ and $\mathrm{Prob}(Z_1 = 0, Z_2 = 1)$, we need not consider the joint probability $\mathrm{Prob}(Z_1 = 1, Z_2 = 1)$.

The definition above appears as Definition 2.7 in [PS14] for the case of uniform hypergraphs, and Definition 5.7 in [PTV] for an arbitrary hypergraph.
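The incidence-matrix description in Remark 3.10 can be sketched for the $\beta$ model as follows. This is a minimal illustration, assuming the parameter hypergraph is the complete graph $K_n$ and that a graph is given as a set of pairs $(i, j)$ with $i < j$; the function names `incidence_matrix` and `sufficient_statistic` are hypothetical.

```python
from itertools import combinations

def incidence_matrix(n):
    """Vertex-edge incidence matrix of K_n: rows are indexed by the
    parameters beta_1..beta_n, columns by the pairs {i, j}."""
    pairs = list(combinations(range(1, n + 1), 2))
    rows = [[1 if v in pair else 0 for pair in pairs]
            for v in range(1, n + 1)]
    return rows, pairs

def sufficient_statistic(edge_set, n):
    """Apply the linear map M to the 0/1 indicator vector of a graph."""
    rows, pairs = incidence_matrix(n)
    g = [1 if pair in edge_set else 0 for pair in pairs]
    return [sum(r * x for r, x in zip(row, g)) for row in rows]

# The red and blue edge sets from Example 3.4, viewed as two graphs.
g_red = {(1, 7), (2, 4), (3, 4), (5, 8)}
g_blue = {(1, 4), (2, 3), (4, 5), (7, 8)}
t_red = sufficient_statistic(g_red, 8)
t_blue = sufficient_statistic(g_blue, 8)
print(t_red, t_blue, t_red == t_blue)
```

Both graphs map to the same vector under $M$, so they lie in the same fiber; for a model other than the $\beta$ model only the matrix changes, not the test.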
The motivation for this definition and the interpretation of the binomials as bicolored edge sets was to study toric algebra of hypergraphs by generalizing the algebraic literature on graphs. The bicoloring construction idea simply generalizes Villarreal's [Vil95] 'closed alternating walks' on graphs; see [RTT12] for a further classification and characterization of the walks. A straightforward argument now shows the following.

Theorem 3.12 (See Theorem 2.8 of [PS14] for the uniform case). Let $H_M$ be the parameter hypergraph of the model $M$. Then any balanced collection of edges $E \subset E(H_M)$ constitutes a move in the toric ideal associated to $H_M$. In particular, the set of balanced bicolored subgraphs connects all fibers of the model $M$.

Note that a balanced bicolored hypergraph is simply a set of two hypergraphs, not necessarily simple, that have the same degree sequence! Examples and structure of balanced edge sets on a hypergraph, as well as the extension to the non-uniform case, can be found in [PTV, §5.1 and 5.2].

Thus far, we have seen how all linear ERGMs that are encoded by 0/1 matrices $M$ ('legal' incidence matrices) give rise to parameter hypergraphs. But what about models whose probability form is more general? Surprisingly, by a recent result [PTV, Theorem 6.2], all toric ideals, and therefore all linear ERGMs and log-linear models for discrete data, are such that there exists a hypergraph that encodes their Graver bases and preserves the complexity. Therefore, we arrive at the following crucial fact: The parameter hypergraph encodes sufficient statistics of any linear ERGM by a hypergraph degree sequence. This further reinforces the present author's view that the discrete mathematics literature on sampling graphs and hypergraphs with fixed properties should be imported to statistics and considered a crucial tool in linear ERGM model fitting in particular.
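The balance condition of Definition 3.11 is easy to test directly. The sketch below assumes hyperedges are given as tuples of vertex labels and that a multiset of edges is simply a list in which repetitions are allowed; the names `colored_degree` and `is_balanced` are hypothetical.

```python
from collections import Counter

def colored_degree(edges):
    """Number of edges of one color containing each vertex."""
    degree = Counter()
    for edge in edges:
        for v in edge:
            degree[v] += 1
    return degree

def is_balanced(red, blue):
    """True if every vertex lies in equally many red and blue edges."""
    return colored_degree(red) == colored_degree(blue)

# A 3-uniform example: both colors cover each of the vertices 1..6 once.
balanced_uniform = is_balanced([(1, 2, 3), (4, 5, 6)],
                               [(1, 2, 4), (3, 5, 6)])
# A non-uniform example with edges of sizes 1 and 2.
balanced_general = is_balanced([(1, 2), (3,)], [(1,), (2, 3)])
# Not balanced: vertex 3 is covered only by red, vertex 4 only by blue.
unbalanced = is_balanced([(1, 2, 3)], [(1, 2, 4)])
print(balanced_uniform, balanced_general, unbalanced)
```

A balanced pair of edge lists corresponds to a binomial (red monomial minus blue monomial) in the toric ideal of the hypergraph, which is the sense in which it is a move.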
The following General Problem summarizes the discussion of this section:

Problem 3.13 (A restatement of General Problem 1 with constraints). Fix a general linear ERGM with model hypergraph $H$, so that the incidence matrix of $H$ computes the sufficient statistics vector $T(g)$ for the random graph $g$. Develop an efficient algorithm that samples from the set of sub-hypergraphs of $H$ with arbitrary fixed degree sequence.

Of course, as discussed above, we know that '2-switches', or quadratic Markov moves, suffice to connect all realizations of hypergraph degree sequences because they correspond to the defining ideal of the variety of Veronese type corresponding to the $r$-th hypersimplex [Stu96, Chapter 14], but intermediate steps may go through non-simple hypergraphs, that is, those that have edges with multiplicities larger than 1. Further, this construction ignores possible forbidden edges.

A related problem for the restricted class of complete $k$-uniform hypergraphs can be found in the problem collection [Wes, Problem 2]: For $k \geq 4$, determine whether there exists a function $f(k)$ and a set of operations, each on at most $f(k)$ vertices, that can be used to transform one $k$-realization of a $k$-graphic sequence into another. Even if this problem is solved, it would only impact the uniform case of the hypergraph $\beta$ model, for which $H_M$ is complete, that is, there are no forbidden edges.

For applications, statistical sampling constraints and forbidden edges really impact the solution. For example, the famous 'bad news Theorem' for Markov bases in algebraic statistics from [DLO06] shows that Markov bases for already the simplest non-trivial model on 3 discrete random variables, called the no-three-way interaction model, are arbitrarily complicated.
What this means is that there are fibers of observable data points that require moves of arbitrarily large degree if the numbers of states of at least two of the 3 variables are not bounded. For completeness, let me add also the complementing 'good news Theorem' for Markov bases famous in algebraic statistics: [HS07] show that the Markov bases for that same no-three-way interaction model have bounded complexity if the number of states of two of the 3 variables is bounded. Incidentally, the parameter hypergraph of the no-three-way model is 3-uniform and 3-partite, and it is not complete: there are as many edges as in $K_{p,q}$, where $p$ and $q$ are sizes of two of its parts, and it is regular in each part. So the presence of forbidden edges in $H_M$ compounds the issue of connecting the fiber in the worst possible way. However, positive results can be obtained for restricted classes of problems, references to which are scattered throughout this section.

4. Polytopes play an important role in ERGMs

The parameter estimation problem. In the estimation problem, given an observation of the joint states of the random variables, the goal is to estimate the unknown probability distribution $p_{\theta_0}$ from a model $M$ that 'best explains' the data. A basic method commonly used is that of maximum likelihood estimation: a maximum likelihood estimate (MLE) of $\theta_0$ is a parameter vector $\hat{\theta} \in \Theta$ that makes the given data most likely to have been observed. It is defined as:
$$\hat{\theta} = \operatorname*{argmax}_{\theta \in \mathbb{R}^n} p_\theta(x). \tag{4.1}$$
Computing it amounts to maximizing the log-likelihood function, which maps the parameters indexing the probability distributions in a model to the likelihood of observing the given data.
While there are several estimators that can be used, an MLE is a very straightforward one that enjoys nice statistical properties such as consistency and asymptotic efficiency, and computing it is a seemingly simple optimization problem. However, there are several obstacles. One of them is that the MLE of the natural parameters of the model may not exist, meaning that we are unable to make inferences about any or a subset of the parameters from the given data.

As expected, the estimation problem has been studied in detail in the statistics literature for many classes of models. In the last decade, it has been shown that for discrete exponential families, the geometric structure of the model captures important information about parameter estimation including MLE existence; see [RFZ09], [DFR+08], cf. [Gey09], [Han03]. Luckily, in case of exponential families (see footnote 5), if the MLE exists it is unique. The following polyhedral object captures the MLE information in exponential family models.

Footnote 5. The general case is not so lucky at all: an MLE may not exist because the likelihood function is not globally bounded; further, if an MLE exists, it may not be unique. Also, the likelihood function may be highly multimodal; a simple example is provided already by a standard example of a mixture of normal distributions. In that case, standard tools such as hill-climbing algorithms (e.g. Newton-Raphson) may fail to find the maximum and converge to a local optimum only, and would not be aware of it. Finally, the MLE may be located on the boundary of the model, which is poorly understood except in a few cases in very recent literature [KRS15].

Definition 4.1. Let $\mathcal{T} = \{T(g) \in \mathbb{R}^d\}$ be the range of sufficient statistics $T$, where $g$ ranges over the set of all observable networks in the sample space. The model polytope associated with the family $M$ is $P = \mathrm{convhull}(\mathcal{T}) \subset \mathbb{R}^d$. We may assume $P$ to be full-dimensional.

Example 4.2. The model polytope $P$ for the $\beta$ model for random graphs on $n$ vertices is the polytope of degree sequences of graphs on $n$ vertices. [RPF13] use the explicit description of facets from [MP95] to give a characterization of when the sufficient statistic vector is on the boundary of the polytope, and study the statistical consequences.

The complexity of the MLE problem for the model $M$ is captured by the geometry of the polytope $P$ and by the combinatorics of its face lattice as follows: the MLE for the observed data exists if and only if the observed sufficient statistic vector lies in the relative interior of the model polytope $P$. For instance, [RPF13] studies MLE existence for the $\beta$ model for graphs and links to the graph theory literature on degree sequences. Nonetheless, partly because the link with polyhedral geometry has remained largely unexplored in both the mathematical and statistical literature, methodologies for estimation and model validation under a non-existent MLE with proven statistical performance have yet to be developed. A first step in this direction is to solve the following.

Problem 4.3 (General Problem 2). Fix a random graph model with sufficient statistics vector $T(g)$ and sample space $G_n$. Often, $G_n$ will be the set of simple graphs on $n$ vertices. Determine the model polytope $P = \mathrm{conv}\{T(g) : g \in G_n\}$. What are the extreme points? What is the facial structure? What is the proportion of realizable lattice points on the boundary vs. in the interior?

The following can be thought of as a subproblem of Problem 4.3, but due to its difficulty and relevance for developing algorithms to detect MLE non-existence, we single it out.

Problem 4.4 (A sequel to General Problem 2). Develop an efficient algorithm that can determine whether a given point lies in the relative interior of $P$; i.e., obtain a facet description of the model polytope that is 'efficient'.
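For very small $n$, the point set whose convex hull is the model polytope of Definition 4.1 can be listed by brute force. The sketch below does this for a model whose sufficient statistic is the pair (number of edges, number of triangles), with $G_n$ taken to be all simple graphs on $n$ labeled nodes; the function name `edge_triangle_statistics` is hypothetical, the enumeration is exponential in the number of node pairs, and computing the hull or its facets is left to a separate tool.

```python
from itertools import combinations

def edge_triangle_statistics(n):
    """Set of points T(g) = (#edges, #triangles) over all simple
    graphs g on n labeled nodes (only feasible for tiny n)."""
    nodes = range(n)
    pairs = list(combinations(nodes, 2))
    triples = list(combinations(nodes, 3))
    points = set()
    for mask in range(1 << len(pairs)):
        # Bit k of mask decides whether the k-th pair is an edge.
        edges = {p for k, p in enumerate(pairs) if mask >> k & 1}
        triangles = sum(
            1 for a, b, c in triples
            if (a, b) in edges and (a, c) in edges and (b, c) in edges
        )
        points.add((len(edges), triangles))
    return points

points_n4 = edge_triangle_statistics(4)
print(sorted(points_n4))
```

On 4 nodes this yields nine distinct points; an observed statistic such as $(0, 0)$ or $(6, 4)$ is an extreme point of their convex hull, so by the criterion above the MLE would not exist for the empty or the complete graph.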
While the general theoretical link between statistical estimation and polyhedral geometry is in place for discrete exponential families as outlined in [RFZ09] and references therein, statistics now relies on discrete methods to provide the required tools. Let us gather here some of the relevant results. We emphasize again that for discrete exponential families, if an MLE exists then it is unique. Estimation algorithms will behave well when off the boundary of the polytope. For example, [SSR+14] define the hypergraph β model and show that fixed point algorithms will converge to the MLE geometrically fast as long as the sufficient statistic is in the relative interior of the polytope. [RPF13] use the polytope of graphical degree sequences to offer a complementary study to [CDS11] of the estimation problem in the β model. Another simple model of interest is called the edge-triangle model. Its sufficient statistic vector consists of two numbers: the number of edges and the number of triangles in the graph. Its polytope and estimation issues are addressed in [RFZ09], and the asymptotics of the MLE existence problem, the polytope, and its fan in [YRF13], while [HCT15] study its degeneracy from the point of view of statistical mechanics, dealing with the issue of non-realizable points that exist within the model polytope in such a way that they are meant to represent data averages but are by definition non-interpretable.

However, Problem 4.3 has not been solved for most models listed in this review. For example, the polytope of hypergraph degree sequences has been studied in the literature and some partial results are known: [Liu13] shows that the set of hypergraph degree sequences for uniform hypergraphs is non-convex, and thus Erdős-Gallai-type theorems do not hold. [MS02] also offers a very nice review of the problem, and shows that vertices of the polytope are known by Theorem 2.5.
Namely, they are $r$-threshold sequences, where $r$ is the uniformity of the hypergraph. Further, some of the facet inequalities are known.

Data privacy and noisy sufficient statistics. Statistical analyses of network data sometimes require projection of a noisy degree sequence onto the model polytope, in particular onto its realizable points. The noisy sequence need not be a realizable degree sequence vector at all. The example below is considered from the point of view of data privacy and confidentiality, but note that the problem is also relevant for noisy sampling, for example when there are measurement errors and the reported observed sufficient statistic does not represent a realizable sequence.

The basic concept of data privacy and confidentiality is to minimize the risk of releasing sensitive information about, say in our case, people in a social network, while maximizing statistical utility of the data. Having statistical utility means that the released information can be used for statistical analyses. A straightforward example of such information that one may want to release is the observed value of the sufficient statistic. However, in the age where information about any particular data set may be gathered from different sources, releasing a sufficient statistic, e.g., the degree sequence in the β model case, may not be 'private enough'. For this reason, in the interest of preserving data privacy, noise is added to the observed sufficient statistic in order to make it releasable to the public while not jeopardizing sensitive network information. Of course this presents various problems for statistical inference; thus the noise is added in a principled way. There are several well-known privacy-preserving mechanisms.
The statistical task is then to determine how to perform reliable inference using the noisy statistic; for an example and overview of the relevant privacy literature for ERGMs see [SKK14]. One of the steps in [KS16] that is required for computing a private parameter estimator of the released data, with good statistical properties, is to project the noisy sufficient statistic onto the lattice points in the model polytope. To this end, one needs to understand the following.

Problem 4.5 (General Problem 3). Fix a random graph model with sufficient statistics vector $T(g)$ and sample space $G_n$. Often, $G_n$ will be the set of graphs on $n$ vertices. Let $P = \mathrm{conv}\{T(g) : g \in G_n\}$ be the model polytope. Develop an (efficient) algorithm to compute the projection of a noisy sufficient statistic vector onto the realizable vectors in $P$.

Obviously, determining if a lattice point is a realizable sufficient statistic for the model $M$ is a problem on its own. In fact, realizability results of Havel-Hakimi [Hav55, Hak62] type can be used to solve Problem 4.5:

Example 4.6. For the β model, [KS16, KS12] use the Havel-Hakimi decomposition to compute the realizable lattice point in the polytope of degree sequences that is closest to the given noisy sequence. In order to achieve a more efficient implementation, they in fact use the polytope of degree partitions [BSS06], that is, of sorted degree sequences. This polytope can be thought of as an asymmetrized version of the degree sequence polytope.

This directly leads to the analogous problems for the other models, which are singled out for convenience.

Problem 4.7 (An instance of General Problem 3). Generalize Havel-Hakimi to the $p_1$ model polytope.

Remarkably, the directed part of this problem has been solved in [EMT10], where they also study the case of some forbidden edges.
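As a small illustration of the undirected case behind Problem 4.5 and Example 4.6, the following Python sketch pairs the classical Havel-Hakimi realizability test with a naive search for the closest realizable degree sequence. The function names `is_graphical` and `nearest_graphical`, the choice of the $L_1$ distance, and the exhaustive search over $\{0, \dots, n-1\}^n$ are assumptions made for illustration; this is not the algorithm of [KS16], which the text above describes as working with the polytope of degree partitions.

```python
from itertools import product


def is_graphical(degrees):
    """Havel-Hakimi test: True iff `degrees` is the degree sequence of a simple graph."""
    seq = sorted(degrees, reverse=True)
    while seq and seq[0] > 0:
        d = seq.pop(0)          # take the largest remaining degree
        if d > len(seq):
            return False        # not enough other vertices to connect to
        for i in range(d):      # connect it to the d next-largest vertices
            seq[i] -= 1
            if seq[i] < 0:
                return False
        seq.sort(reverse=True)
    return all(x == 0 for x in seq)


def nearest_graphical(noisy):
    """Brute-force L1 projection of a noisy vector onto graphical sequences.

    Enumerates all n^n integer vectors with entries in 0..n-1, so it is only
    usable for very small n. Ties are broken by enumeration order.
    """
    n = len(noisy)
    best, best_dist = None, float("inf")
    for cand in product(range(n), repeat=n):
        if not is_graphical(cand):
            continue
        dist = sum(abs(c - x) for c, x in zip(cand, noisy))
        if dist < best_dist:
            best, best_dist = cand, dist
    return best


if __name__ == "__main__":
    print(is_graphical([3, 3, 1, 1]))
    print(nearest_graphical([3.4, 0.8, 1.1, 0.9]))
```

The exhaustive search only pins down what "projection onto the realizable vectors" means on toy inputs; an efficient procedure is exactly what Problem 4.5 asks for.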
The authors there also prove a result useful for designing an MCMC algorithm to find random realizations of prescribed directed degree sequences, which relates back to Problem 3.6 (solved in [PRF10]). However, this sampling algorithm offers only a partial solution to the problem in the general case, though it should be easy to modify it to fit the statistics framework. Namely, in the general variant of the $p_1$ model, reciprocated edges are of the form $i \to j$ and $i \leftarrow j$ simultaneously, combined into one bidirected edge $i \leftrightarrow j$. The sufficient statistics of the $p_1$ model include the number of reciprocated edges! This means that in sampling or finding random realizations, this number also matters: reciprocated edges are allowed to be 'pulled apart' while sampling, but the total number of them is to remain constant. Recall again that forbidden edges in this case correspond to structural zeros in the $p_1$ model.

Problem 4.8 (An instance of General Problem 3). Generalize Havel-Hakimi to the three variants of the β hypergraph model polytope: uniform, layered, or general hypergraph degree sequences.

The uniform instance of this problem is essentially [BEF+13, Problem 1.1], whose authors also emphasize that the only known characterization of $k$-graphic sequences is due to Dewdney in 1975, but it does not yield an efficient algorithm. Two of the variants of the β hypergraph model work with nonuniform hypergraphs.

5. Concluding remarks

There exist, of course, families of ERGMs beyond the ones discussed in this brief overview, including those based on global summary statistics not related to node degrees, for example the cores decomposition [KPP+], or other types of models such as graphical models for networks [SR14a]. A uniform sampler for the fibers of the former model family has not been developed, while the algebra and geometry of the latter model family has not been explored.
Clearly these are interesting problems in the statistical analysis of networks and, in particular, algebraic statistics. But the main points of this chapter are:
• Several basic examples of linear ERGMs already offer many interesting open problems in discrete mathematics, instances and special cases of which are already known and interesting in their own right from the graph-theoretic point of view.
• These models also provide a theoretical foundation for exploring distributions of various network summary statistics that could serve to develop a rigorous testing and model fitting framework for networks, nowadays generally lacking despite the exploding literature on network analysis.

The first type of problem discussed here is the sampling problem given a fixed set of characteristics of a network. Within the context of statistical modeling of networks, the value of sampling and therefore of exact testing can be summarized as follows. Comparing to the reference set $F_t$ avoids the use of asymptotics, offers an alternative to model fit testing, and makes it unnecessary to use large-sample approximations to sampling distributions, in particular when their adequacy has not been determined. As demonstrated in several cases in the literature, tools from algebraic statistics offer a valid and critical approach for analyzing such models. However, within this realm of mathematical research, several problems remain, including problems related to scalability and applicability of the algebraic methods.

The second type of problem discussed here is related to the model polytope, defined to be the convex hull of all sufficient statistics for the model. The polytope captures the difficulty of parameter estimation and MLE existence and is a crucial step in statistical inference.
Both problems are key steps in the general scheme of testing goodness of fit of the model and model selection, detailed statistical considerations of which are beyond the scope of the present chapter.

To close, it is worth pointing out that algebraic statistics is in fact a much broader field, not confined to the study of log-linear models for discrete data. Indeed, the field builds upon the rich and long history of the use of algebraic tools in statistics, starting with Fisher [Fis25] and algebra for confounding [PW83], but has expanded to encompass at least three distinct fields of mathematics: commutative and computational algebra, via solving systems of polynomial/rational equations; combinatorics of graphs, hypergraphs, and simplicial complexes; and geometry: algebraic, convex, and polyhedral. The field is roughly two decades old, yet it has many facets, and recent theoretical advances show that there is a significant impact on applicability and behavior of various statistical methods, including but not limited to the following problems: parameter estimation and reliability of inference for discrete exponential families [RFZ09, RPF10] and Gaussian models [ZUR]; general methods for experimental design [PRW01]; parameter identifiability, studied in numerous references of [APRS11], [FSD11]; complexity of parameter estimation for Gaussian graphical models [DFD12], [DGP12], [GS]; Bayesian model selection and singular learning theory [Wat10, Wat09]; sampling from conditional or marginal distributions on contingency tables, with implications for cell bounds and data privacy [Sla10, SZP14]; exact tests for marginal table models [Dob03]; and model fitting via exact tests for degree-based network models [OHT13, SSR+14], as well as offering new models and algorithms for networks not based on degrees [KPP+].
This bibliography is far from complete, but should give the interested reader enough information on some recent advances in the field.

Acknowledgements

Supported in part by AFOSR Grant #FA9550-14-1-0141. The author is endlessly grateful to Stephen E. Fienberg for his guidance in the study of statistical network models, and Alessandro Rinaldo, Despina Stasi and Elizabeth Gross for continued enlightening collaborations and motivating discussions. Many thanks to Tobias Windisch for a thorough reading and many important corrections of the initial version of this manuscript. In addition, two anonymous referees have helped improve the text tremendously. The author is also grateful to Zoltán Toroczkai and Éva Czabarka for introduction to the relevant graph theory literature, and Péter L. Erdős and István Miklós for clarifying questions and remarks.

References

[Agr92] Alan Agresti, A survey of exact inference for contingency tables, Statistical Science 7 (1992), no. 1, 131–153.
[AHT12] Satoshi Aoki, Hisayuki Hara, and Akimichi Takemura, Markov bases in algebraic statistics, Springer Series in Statistics, Springer New York, 2012.
[APRS11] Elizabeth Allman, Sonja Petrović, John Rhodes, and Seth Sullivant, Identifiability of two-tree mixtures under group-based models, IEEE/ACM Transactions in Computational Biology and Bioinformatics 8 (2011), no. 3, 710–722.
[BD10] Joseph Blitzstein and Persi Diaconis, A sequential importance sampling algorithm for generating random graphs with prescribed degrees, Internet Math 6 (2010), 489–522.
[BD15] Peter J. Bickel and Kjell A. Doksum, Mathematical statistics, basic ideas and selected topics, vol. 1, 2 ed., Chapman & Hall/CRC Texts in Statistical Science, 2015.
[BDE] Michael J. Bannister, William E.
Devanny, and David Eppstein, ERGMs are Hard, Preprint, arXiv:1412.1787 [cs.DS].
[BEF+13] Sarah Behrens, Catherine Erbes, Michael Ferrara, Stephen G. Hartke, Benjamin Reiniger, Hannah Spinoza, and Charles Tomlinson, New results on degree sequences of uniform hypergraphs, Electronic Journal of Combinatorics 20 (2013), no. 4.
[BGJM11] Steve Brooks, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng (eds.), Handbook of Markov Chain Monte Carlo, Handbook of Modern statistical methods, Chapman & Hall/CRC, 2011.
[Bha12] Amitava Bhattacharya, Alternating reachability and integer sum of closed alternating trails, Graph-Theoretic Concepts in Computer Science: 38th International Workshop WG 2012 (Martin Charles Golumbic, Michael Stern, Avivit Levy, and Gila Morgenstern, eds.), Lecture Notes in Computer Science, Springer, 2012.
[BN14] Ole Barndorff-Nielsen, Information and exponential families in statistical theory, Wiley, 1978, reprinted in 2014.
[BPS07] Amitava Bhattacharya, Uri Peled, and Murali K. Srinivasan, Cones of closed alternating walks and trails, Linear Algebra and its Applications 423 (2007), no. 2-3, 351–365.
[BPS09] Amitava Bhattacharya, Uri Peled, and Murali K. Srinivasan, The cone of balanced subgraphs, Linear Algebra and its Applications 431 (2009), no. 1-2, 266–273.
[Bro86] Lawrence Brown, Fundamentals of statistical exponential families, Monograph Series, vol. 9, IMS Lecture Notes, Hayward, CA, 1986.
[BSS06] Amitava Bhattacharya, Sivaramakrishnan Sivasubramanian, and Murali K. Srinivasan, The polytope of degree partitions, Electronic Journal of Combinatorics 13 (2006), no. 1.
[CDEM15] Éva Czabarka, Aaron Dutle, Péter L. Erdős, and István Miklós, On realizations of a joint degree matrix, Discrete Applied Mathematics 181 (2015), 283–288.
[CDG07] Colin Cooper, Martin Dyer, and Catherine Greenhill, Sampling regular graphs and a peer-to-peer network, Combinatorics, Probability and Computing 16 (2007), no.
4, 557–593.
[CDS11] Sourav Chatterjee, Persi Diaconis, and Allan Sly, Random graphs with a given degree sequence, Ann. Appl. Probab. 21 (2011), no. 4, 1400–1435.
[CKHG15] Nicole Bohme Carnegie, Pavel N. Krivitsky, David R. Hunter, and Steven M. Goodreau, An approximation method for improving dynamic network model fitting, Journal of Computational and Graphical Statistics 24 (2015), no. 2, in press.
[DFD12] Mathias Drton, Rina Foygel, and Jan Draisma, Half-trek criterion for generic identifiability of linear structural equation models, Annals of Statistics 40 (2012), no. 3, 1682–1713.
[DFR+08] Adrian Dobra, Stephen E. Fienberg, Alessandro Rinaldo, Aleksandra Slavković, and Yi Zhou, Algebraic statistics and contingency table problems: Log-linear models, likelihood estimation and disclosure limitation, In IMA Volumes in Mathematics and its Applications: Emerging Applications of Algebraic Geometry, Springer Science+Business Media, Inc, 2008, pp. 63–88.
[DGP12] Mathias Drton, Elizabeth Gross, and Sonja Petrović, Maximum likelihood degree of variance component models, Electronic Journal of Statistics 6 (2012), no. 0, 993–1016.
[DLO06] Jesus A. De Loera and Shmuel Onn, Markov bases of three-way tables are arbitrarily complicated, J. Symbolic Comput. 41 (2006), no. 2, 173–181.
[Dob03] Adrian Dobra, Markov bases for decomposable graphical models, Bernoulli 9 (2003), no. 6, 1093–1108.
[Dob12] Adrian Dobra, Dynamic Markov bases, Journal of Computational and Graphical Statistics (2012), 496–517.
[DS98] Persi Diaconis and Bernd Sturmfels, Algebraic algorithms for sampling from conditional distributions, Annals of Statistics 26 (1998), no. 1, 363–397.
[DS04] Adrian Dobra and Seth Sullivant, A divide-and-conquer algorithm for generating Markov bases of multi-way tables, Computational Statistics 19 (2004), 347–366.
[DSS09] Mathias Drton, Bernd Sturmfels, and Seth Sullivant, Lectures on algebraic statistics, Oberwolfach Seminars, vol. 39, Birkhäuser, 2009.
[EMT10] Péter L. Erdős, István Miklós, and Zoltán Toroczkai, A simple Havel–Hakimi type algorithm to realize graphical degree sequences of directed graphs, Electronic Journal of Combinatorics 17 (2010), no. 1.
[EMT15] Péter L. Erdős, István Miklós, and Zoltán Toroczkai, A decomposition based proof for fast mixing of a Markov chain over balanced realizations of a joint degree matrix, SIAM Journal of Discrete Math 29 (2015), 481–499.
[Fis25] Sir Ronald Aylmer Fisher, Statistical methods for research workers, Edinburgh: Oliver & Boyd, 1925.
[FMW85] Stephen E. Fienberg, Michael M. Meyer, and Stanley S. Wasserman, Statistical analysis of multiple sociometric relations, Journal of the American Statistical Association 80 (1985), 51–67.
[FSD11] Rina Foygel, Seth Sullivant, and Mathias Drton, Global identifiability of linear structural equation models, Annals of Statistics 39 (2011), 865–886.
[FW81] Stephen E. Fienberg and Stanley S. Wasserman, Discussion of Holland, P. W. and Leinhardt, S. "An exponential family of probability distributions for directed graphs.", Journal of the American Statistical Association 76 (1981), 54–57.
[Gey09] Charles J. Geyer, Likelihood inference in exponential families and directions of recession, Electronic Journal of Statistics 3 (2009), 259–289.
[Goo07] Steven M. Goodreau, Advances in exponential random graph ($p^*$) models applied to a large social network, Social Networks 29 (2007), no. 2, 231–248.
[GPS] Elizabeth Gross, Sonja Petrović, and Despina Stasi, Goodness-of-fit for log-linear network models: Dynamic Markov bases using hypergraphs, Annals of the Institute of Statistical Mathematics, to appear.
[Gre15] Catherine Greenhill, The switch Markov chain for sampling irregular graphs, 26th Annual ACM-SIAM Symposium on Discrete Algorithms (New York-Philadelphia), ACM, 2015, pp. 1564–1572.
[GS] Elizabeth Gross and Seth Sullivant, The maximum likelihood threshold of a graph, Preprint available at
[GZFA09] Anna Goldenberg, Alice X. Zheng, Stephen E. Fienberg, and Edoardo M. Airoldi, A survey of statistical network models, Foundations and Trends in Machine Learning 2 (2009), no. 2, 129–233.
[Hab81] Shelby J. Haberman, An exponential family of probability distributions for directed graphs: Comment, Journal of the American Statistical Association 76 (1981), no. 373, 60–61.
[Hab88] Shelby J. Haberman, A warning on the use of chi-squared statistics with frequency tables with small expected cell counts, Journal of the American Statistical Association 83 (1988), 555–560.
[Hak62] Seifollah Louis Hakimi, On realizability of a set of integers as degrees of the vertices of a linear graph, I, Journal of the Society for Industrial and Applied Mathematics 10 (1962), 496–506.
[Han03] Mark S. Handcock, Assessing degeneracy in statistical models for social networks, Working paper 39, Center for Statistics and the Social Sciences, University of Washington, Seattle, 2003.
[Hav55] Václav Havel, A remark on the existence of finite graphs, Časopis pro pěstování matematiky 80 (1955), 477–480.
[HCT15] Szabolcs Horvát, Éva Czabarka, and Zoltán Toroczkai, Reducing degeneracy in maximum entropy models of networks, Physical Review Letters 114 (2015), no. 158701.
[HGH08] David R. Hunter, Steven M. Goodreau, and Mark S. Handcock, Goodness of fit of social network models, Journal of the American Statistical Association 103 (2008), no. 481.
[HL81] Paul W.
Holland and Samuel Leinhardt, An exponential family of probability distributions for directed graphs (with discussion), Journal of the American Statistical Association 76 (1981), no. 373, 33–65.
[HM07] Mark S. Handcock and Martina Morris, A simple model for complex networks with arbitrary degree distribution and clustering, Statistical Network Analysis: Models, Issues and New Directions (Edo Airoldi, D. Blei, Stephen E. Fienberg, A. Goldenberg, E. Xing, and A. Zheng, eds.), Lecture Notes in Computer Science, vol. 4503, Springer, 2007, pp. 103–114.
[HS07] Serkan Hoşten and Seth Sullivant, A finiteness theorem for Markov bases of hierarchical models, J. Combin. Theory Ser. A 114 (2007), no. 2, 311–321.
[HTE+09] Kim Hyunju, Zoltán Toroczkai, Péter L. Erdős, István Miklós, and László A. Székely, Degree-based graph construction, Journal of Physics A: Mathematical and Theoretical 42 (2009), no. 39.
[Jac08] Matthew O. Jackson, Social and economic networks, Princeton University Press, 2008.
[KK15] Pavel N. Krivitsky and Eric D. Kolaczyk, On the question of effective sample size in network modeling: An asymptotic inquiry, To appear in Statistical Science, 2015.
[Kol09] Eric D. Kolaczyk, Statistical analysis of network data, Springer, 2009.
[KPP+] Vishesh Karwa, Michael Pelsmajer, Sonja Petrović, Despina Stasi, and Dane Wilburne, Statistical models for cores decomposition of an undirected random graph, Submitted; in revision.
[KRS15] Kaie Kubjas, Elina Robeva, and Bernd Sturmfels, Fixed points of the EM algorithm and nonnegative rank boundaries, Annals of Statistics 43 (2015), no. 1, 422–461.
[KS12] Vishesh Karwa and Aleksandra Slavković, Differentially private graphical degree sequences and synthetic graphs, Privacy in Statistical Databases (2012), 273–285.
[KS16] Vishesh Karwa and Aleksandra Slavković, Inference using noisy degrees: Differentially private β-model and synthetic graphs, Annals of Statistics 44 (2016), no. 1, 87–112.
[KTV99] Ravi Kannan, Prasad Tetali, and Santosh Vempala, Simple Markov-chain algorithms for generating bipartite graphs and tournaments, Random Structures and Algorithms 14 (1999), no. 1, 293–308.
[LC83] Erich Leo Lehmann and George Casella, Theory of point estimation, 1 ed., London: Chapman and Hall, 1983.
[LC98] Erich Leo Lehmann and George Casella, Theory of point estimation, 2 ed., Springer-Verlag, New York, 1998.
[Liu13] Ricky Ini Liu, Nonconvexity of the set of hypergraph degree sequences, Electronic Journal of Combinatorics 20 (2013), no. 1.
[LS06] László Lovász and Balázs Szegedy, Limits of dense graph sequences, Journal of Combinatorial Theory, Series B 96 (2006), no. 6, 933–957.
[MES13] István Miklós, Péter L. Erdős, and Lajos Soukup, Towards random uniform sampling of bipartite graphs with given degree sequence, Electronic Journal of Combinatorics 20 (2013).
[Mor13] Jason Morton, Relations among conditional probabilities, Journal of Symbolic Computation 50 (2013), 478–492.
[MP95] N. V. R. Mahadev and Uri N. Peled, Threshold graphs and related topics, Elsevier, 1995.
[MS02] N. L. Bhanu Murthy and Murali K. Srinivasan, The polytope of degree sequences of hypergraphs, Linear Algebra and its Applications 350 (2002), no. 1-3, 147–170.
[New03] Mark E. J. Newman, The structure and function of complex networks, SIAM Review 45 (2003), no. 2, 167–256.
[New10] Mark E. J. Newman, Networks: An introduction, Oxford University Press, 2010.
[NSW01] Mark E. J. Newman, S. H. Strogatz, and D. J. Watts, Random graphs with arbitrary degree distributions and their applications, Physical Review E 026118 (2001).
[OH99] Hidefumi Ohsugi and Takayuki Hibi, Toric ideals generated by quadratic binomials, Journal of Algebra 218 (1999), no. 2, 509–527.
[OHT13] Mitsunori Ogawa, Hisayuki Hara, and Akimichi Takemura, Graver basis for an undirected graph and its application to testing the beta model of random graphs, Annals of Institute of Statistical Mathematics 65 (2013), 191–212.
[PN04] Juyong Park and Mark E. J. Newman, Statistical mechanics of networks, Physical Review E 066117 (2004).
[PRF10] Sonja Petrović, Alessandro Rinaldo, and Stephen E. Fienberg, Algebraic statistics for a directed random graph model with reciprocation, Algebraic Methods in Statistics and Probability II (Marlos A. G. Viana and Henry Wynn, eds.), Contemporary Mathematics, vol. 516, American Mathematical Society, 2010.
[PRW01] Giovanni Pistone, Eva Riccomagno, and Henry Wynn, Computational commutative algebra in discrete statistics, Contemporary Mathematics 287 (2001), 267–282.
[PS14] Sonja Petrović and Despina Stasi, Toric algebra of hypergraphs, Journal of Algebraic Combinatorics 39 (2014), no. 1, 187–208.
[PTV] Sonja Petrović, Apostolos Thoma, and Marius Vladoiu, Bouquet algebra of toric ideals, Submitted. Preprint available at h
[PW83] Giovanni Pistone and Henry Wynn, Generalised confounding with Gröbner bases, Biometrika 96 (1983), 653–666.
[RC99] Christian Robert and George Casella, Monte Carlo statistical methods, Springer Texts in Statistics, Springer-Verlag, New York, 1999.
[RC09] Christian Robert and George Casella, Introducing Monte Carlo methods with R, Springer-Verlag, 2009.
[RFZ09] Alessandro Rinaldo, Stephen E. Fienberg, and Yi Zhou, On the geometry of discrete exponential families with application to exponential random graph models, Electronic Journal of Statistics 3 (2009), 446–484.
[RPF10] Alessandro Rinaldo, Sonja Petrović, and Stephen E. Fienberg, On the existence of the MLE for a directed random graph network model with reciprocation, Tech.
report, 2010,
[RPF13] Alessandro Rinaldo, Sonja Petrović, and Stephen E. Fienberg, Maximum likelihood estimation in the Beta model, Annals of Statistics 41 (2013), no. 3, 1085–1110.
[RPKL07] Garry Robins, Pip Pattison, Yuval Kalish, and Dean Lusher, An introduction to exponential random graph ($p^*$) models for social networks, Social Networks 29 (2007), no. 2, 173–191.
[RTT12] Enrique Reyes, Christos Tatakis, and Apostolos Thoma, Minimal generators of toric ideals of graphs, Advances in Applied Mathematics 48 (2012), no. 1, 64–78.
[SKK14] Aleksandra Slavković, Vishesh Karwa, and Pavel N. Krivitsky, Differentially private exponential random graphs, Privacy in Statistical Databases, Lecture Notes in Computer Science, vol. 8744, Springer, 2014, pp. 143–155.
[SKKT00] Karen Smith, Lauri Kahanpää, Pekka Kekäläinen, and William Traves, An invitation to algebraic geometry, Universitext, Springer, 2000.
[Sla10] Aleksandra Slavković, Partial information releases for confidential contingency table entries: Present and future research efforts, Journal of Privacy and Confidentiality 1 (2010), no. 2.
[SP12] Isabelle Stanton and Ali Pinar, Constructing and sampling graphs with a prescribed joint degree distribution, Journal of Experimental Algorithmics (JEA) 17 (2012).
[SR14a] Kayvan Sadeghi and Alessandro Rinaldo, Hierarchical models for independence structures of networks, To appear in the Journal of the American Statistical Association, 2014.
[SR14b] Kayvan Sadeghi and Alessandro Rinaldo, Statistical models for degree distributions of networks, NIPS 2014 Workshop "From Graphs to Rich Data", available at arXiv:1411.3825, 2014.
[Sri] Murali K. Srinivasan, Some problems motivated by the notion of threshold graphs, Online, available at http://www.math.iitb.ac.in/ mks/papers/threshold.pdf.
[SS06] Aleksandra B. Slavković and Seth Sullivant, The space of compatible full conditionals is a unimodular toric variety
, Journal of Symbolic Computation 41 (2006), no. 2, 196–209.
[SSR+14] Despina Stasi, Kayvan Sadeghi, Alessandro Rinaldo, Sonja Petrović, and Stephen E. Fienberg, Beta models for random hypergraphs with a given degree sequence, Proceedings of 21st International Conference on Computational Statistics, 2014.
[Stu96] Bernd Sturmfels, Gröbner bases and convex polytopes, University Lecture Series, no. 8, American Mathematical Society, 1996.
[SW12] Bernd Sturmfels and Volkmar Welker, Commutative algebra of statistical ranking, Journal of Algebra 361 (2012), 264–286.
[SZP14] Aleksandra Slavković, Xiaotian Zhu, and Sonja Petrović, Fibers of multi-way contingency tables given conditionals: relation to marginals, cell bounds and Markov bases, Annals of the Institute of Statistical Mathematics (2014).
[Vil95] Rafael H. Villarreal, Rees algebras of edge ideals, Communications in Algebra 23 (1995), no. 9, 3513–3524.
[Vil00] Rafael H. Villarreal, Monomial algebras, CRC Press, 2000.
[VL05] F. Viger and M. Latapy, Efficient and simple generation of random simple connected graphs with prescribed degree sequence, Computing and Combinatorics 3595 (2005), 440–449.
[WAD09] W. Willinger, D. Alderson, and J. C. Doyle, Mathematics and the internet: a source of enormous confusion and great potential, Notices of the American Mathematical Society 56 (2009), no. 2, 586–599.
[Wat09] Sumio Watanabe, Algebraic geometry and statistical learning theory, Cambridge University Press, 2009.
[Wat10] Sumio Watanabe, Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory, Journal of Machine Learning Research 11 (2010), 3571–3594.
[Wes] Douglas West, Hypergraphic Sequences (2012) - (presented by Mike Ferrara - REGS 2012), http://www.math.illinois.edu/~dwest/regs/hypergraphic.html.
[Win] Tobias Windisch, Rapid mixing and Markov bases, Preprint available at
[WO] Patrick Wolfe and Sofia C. Olhede, Nonparametric graphon estimation, Preprint, available at
[YLZ] Ting Yan, Chenlei Leng, and Ji Zhu, Asymptotics in directed exponential random graph models with an increasing bi-degree sequence, Preprint available at http://arxiv.org/pdf/1408.1156.pdf.
[YRF13] Mei Yin, Alessandro Rinaldo, and Sukhada Fadnavis, Asymptotic quantization of exponential random graphs, Submitted. Preprint available at http://arxiv.org/abs/1311.1738, 2013.
[ZUR] Piotr Zwiernik, Caroline Uhler, and Donald St. P. Richards, Maximum likelihood estimation for linear Gaussian covariance models, Submitted, preprint available at

Department of Applied Mathematics, Illinois Institute of Technology, Chicago, Illinois 60616
E-mail address: sonja.petrovic@iit.edu
