Spatial Temporal Exponential-Family Point Process Models for the Evolution of Social Systems
We develop a class of exponential-family point processes based on a latent social space to model the coevolution of social structure and behavior over time. Temporal dynamics are modeled as a discrete Markov process specified through individual trans…
Authors: Joshua D. EmBree, Mark S. Handcock
Spatial Temporal Exponential-Family Point Process Models for the Evolution of Social Systems

Joshua D. EmBree∗ Mark S. Handcock†

October 4, 2016

Abstract

We develop a class of exponential-family point processes based on a latent social space to model the coevolution of social structure and behavior over time. Temporal dynamics are modeled as a discrete Markov process specified through individual transition distributions for each actor in the system at a given time. We prove that these distributions have an analytic closed form under certain conditions and use the result to develop likelihood-based inference. We provide a computational framework to enable both simulation and inference in practice. Finally, we demonstrate the value of these models by analyzing alcohol and drug use over time in the context of adolescent friendship networks.

Keywords: STEPP, social network analysis, spatial-temporal, point process, longitudinal, latent space, Markov, substance use.

1 Introduction

Social systems play a fundamental role in the dynamics of human behavior and interest in studying these systems is growing. For example, Fujimoto and Valente (2012) investigate contagion mechanisms for the transmission of drinking and smoking behaviors through adolescent social networks. However, work of this nature is often limited by a lack of realistic stochastic models for the phenomena of interest. For such models to be applicable, they must adequately represent the complexity of social relations and behavior as they coevolve over time.

Most often, social relations are measured with dyadic tie variables, for example friendship, and then assembled to form networks. There are numerous stochastic models for the evolution of social networks. Holland and Leinhardt (1977) provide one of the earliest continuous-time Markov models for the process by which social structure affects individual behavior.
Arguably, the most popular subclass of these continuous-time Markov models is the so-called stochastic actor-oriented model (SAOM) described in Snijders (2005) and Snijders, Van de Bunt, and Steglich (2010), which is framed in the context of individual actors making decisions to form or break ties with other actors. Snijders, Steglich, and Schweinberger (2007) extend SAOMs to jointly model selection (individuals' network-related choices) and influence (the effect of actors on each other's attributes). SAOMs are accessible to practitioners through the RSiena software package (Ripley, Boitmanis, and Snijders, 2013).

In addition to the continuous-time Markov models, exponential-family random graph models (ERGMs) provide a distinctly different view of social networks. Holland and Leinhardt (1981) introduce the first exponential family of probability distributions for directed graphs, which is applicable only to cross-sectional social networks. However, Robins and Pattison (2001) naturally extend this framework by allowing for dependence between graphs across discrete time steps. Moreover, Hanneke, Fu, Xing, et al. (2010) define a discrete Temporal ERGM (TERGM) which assumes an exponential-family model for the transitions between graphs. Krivitsky and Handcock (2013) further specify TERGMs with the Separable Temporal ERGM (STERGM) by postulating that the processes by which actors form and dissolve ties are independent, or separable, conditional on the previous state of the network. Various other discrete-time models for social network dynamics provide means for data-driven analyses of social systems, but drawbacks persist.

∗ Adjunct Researcher, RAND Corporation, Santa Monica, CA 90401 (E-mail: jembree@rand.org).
† Professor of Statistics, Department of Statistics, University of California, Los Angeles, CA 90095-1554 (E-mail: handcock@ucla.edu).
Observed social networks are typically represented by directed (or undirected) graphs where edges indicate the presence of a relationship, e.g., friendship. As a result, complex relations are reduced to a binary indicator. Advances in latent space models for rank data (Gormley and Murphy, 2007) provide new context for conceptualizing this information. Hoff, Raftery, and Handcock (2002) summarize general latent space approaches to social network analysis while Handcock, Raftery, and Tantrum (2007) describe an unobserved Euclidean social space where the actors' locations arise stochastically from a mixture of distributions corresponding to different clusters. These strategies are appealing for their flexibility and interpretability but have only been developed for cross-sectional networks. Since latent space approaches to social network analysis postulate the existence of an unobserved space where points represent actors, a natural extension would be to propose a spatial-temporal point process for the underlying dynamics.

A major drawback in the current models for social network evolution is the assumption that the set of actors remains fixed over time. In real social systems, e.g., an urban high school, the set of actors is constantly changing so this assumption can be problematic. Spatial birth-death processes (Moller and Waagepetersen, 2003) offer a stochastic framework for the positions of actors as they enter or exit the system over time. Unfortunately, these processes cannot model changes in the positions of persistent actors (those present at several consecutive time points). Hence, we seek a stochastic model that can reasonably describe the positions of actors as they enter, navigate, and exit the social space.

In Section 2, we formally define the social space and derive a discrete-time Markov process to describe fundamental social phenomena.
In Section 3, we present analytic results, develop likelihood-based inferential methods, and discuss computation. In Section 4, we apply the methodology to a longitudinal study of adolescent students to explore changes in risky behavior in the context of friendship networks. In Section 5, we discuss the relevance of this work in broader social science research and consider extensions to the modeling framework.

2 Point Process Models for Social Systems

2.1 Conceptualization

To conceptualize the methods presented here, consider the population of people in a fixed location over time, e.g., students at an urban middle school. We want to understand the social and behavioral dynamics of these people over time. For example, we might ask how a student's social ties affect her propensity to drink alcohol or engage in risky sex. To do so, we need a rich representation of the time-sensitive social landscape. Note that this approach is distinctly different from a traditional panel survey where we attempt to follow a fixed cohort over time. Instead, we focus on the interactions of a dynamic population in a fixed location where we may observe significant composition change within the group between waves of data collection. That is, we do not expect to observe the same set of people at every wave.

Generally, consider a set of actors in social space at time $t$. In the example above, the actors are students and the social space is the school where they interact. Also consider the positions of actors in social space at time $t$. While we formalize this below, the intuition is straightforward: the set of distances between positions in social space represents social relations. For example, two people who have been friends for years tend to be very close to one another in the space whereas casual acquaintances tend to be considerably further apart. The major advantage of this conceptualization is flexibility.
Complex and nuanced relationships can be accurately represented by a distance metric. Conventionally, we study social networks where relationships are binary, e.g., 1 indicates a friendship nomination and 0 the absence of such a nomination.

2.2 Specification

For $t = 0, 1, \ldots$, let $N^t = \{1, \ldots, n^t\}$ be the set of unique actor labels up to time $t$ with $N^0 \subseteq N^1 \subseteq \cdots$ and let $S^t \subseteq N^t$ denote the set of actors present at time $t$. Further, let $(\mathcal{S}, \|\cdot\|)$ be a normed space where $Z^t = \{Z_i^t \in \mathcal{S} : i \in S^t\}$ is the set of actor locations at time $t$ and $X^t$ is an $n^t \times q$ matrix of actor covariates. We say that $\{S^t, X^t, Z^t\}_{t \ge 0}$ defines a social space. Next, suppose that $S^t$ (and implicitly $N^t$), $X^t$, and $Z^t$ are random variables that jointly form a stochastic process. If $\{S^t, X^t, Z^t\}_{t \ge 0}$ satisfies the Markov property in time and the transition probability $P(S^t, X^t, Z^t \mid S^{t-1}, X^{t-1}, Z^{t-1})$ is an exponential family, then we call $\{S^t, X^t, Z^t\}_{t \ge 0}$ a Spatial Temporal Exponential-Family Point Process (STEPP). Next, we construct a fundamental class of STEPPs by making a few assumptions about the social space and deriving transition distributions.

Assumption 1: $\{S^t\}_{t \ge 0}$ is a process exogenous to $(X^t, Z^t)$. Recall, $S^t$ is the set of actors who are currently in the system at time $t$, e.g., students in a classroom. While one can imagine many scenarios in which the set of actors who enter or exit the social space is endogenous, e.g., delinquent students are more likely to be expelled, we focus here on the exogenous cases.

Assumption 2: Actor positions in social space, $\{Z_i^t : i \in S^t\}$, are conditionally independent given the previous positions, $Z^{t-1}$. This assumes that actors move through the social space based on the information available at the current time.
Assumption 1 makes modeling the composition change of actor sets between waves distinctly separate from the changes in actor positions and their corresponding covariates. We refer to $\{S^t\}_{t \ge 0}$ as a migration process where the actors who enter the system are immigrants and the actors who exit the system are emigrants. Assumption 2 implies that we can marginalize the transition distributions at the actor level. Thus, we derive a general class of STEPPs below by specifying the form of $P(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1})$ where it is implicit that $i \in S^t$. We refer to this as the ego transition distribution (ETD) and it is specified by a series of increasingly complex processes. These processes are basic drift, atomic drift, homophilous attraction, homophilous repulsion, heterophilous attraction, and heterophilous repulsion.

A basic drift process describes actor positions only and is determined by a single parameter $\delta_0 \ge 0$. The ETD is given by

$$P_{\delta_0}(Z_i^t \mid Z^{t-1}, S^{t-1}) = \frac{\exp\left\{-\delta_0 \|Z_i^t - Z_i^{t-1}\|\right\}}{c(\delta_0)} \tag{1}$$

where

$$c(\delta_0) = \int_{\mathcal{S}} \exp\left\{-\delta_0 \|z - Z_i^{t-1}\|\right\} \mu(dz)$$

is the normalizing constant. Note that given the space $(\mathcal{S}, \|\cdot\|)$, the underlying measure $\mu$ must be chosen to ensure $c(\delta_0) < \infty$. A basic drift process is the simplest stochastic model for actor mobility in social space.

Along these lines, we also have behavior persistence. For $m = 1, \ldots, q$ and for every $i \in S^{t-1} \cap S^t$, let

$$\rho_m = P(X_{im}^t = x \mid X_{im}^{t-1} = x) \tag{2}$$

denote the probability that behavior $m$ persists through a single transition. Note that this alone does not completely specify a probability distribution except in the case of a Bernoulli random variable. Also, note that in the case where a covariate is structurally non-random, we can set $\rho_m = 1$.

To derive more complex processes, we need to formalize the notion of closeness in social space.
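Before doing so, it may help to make (1) concrete with a worked special case. The calculation below assumes $\mathcal{S} = \mathbb{R}^d$, Lebesgue measure for $\mu$, and the squared Euclidean distance $\|z\| = \sum_{l=1}^d z_l^2$ that is adopted later in Section 3.1; these choices are assumptions for illustration and are not required by the general definition. The substitution variable $u$ is introduced here only for the calculation.

```latex
\begin{align*}
c(\delta_0)
  &= \int_{\mathbb{R}^d} \exp\Big\{-\delta_0 \sum_{l=1}^{d} u_l^2\Big\}\, du
     \qquad (u = z - Z_i^{t-1}) \\
  &= \prod_{l=1}^{d} \int_{\mathbb{R}} \exp\{-\delta_0 u_l^2\}\, du_l
   = \left(\frac{\pi}{\delta_0}\right)^{d/2},
     \qquad \delta_0 > 0 .
\end{align*}
```

Under these assumptions $c(\delta_0)$ is finite only when $\delta_0 > 0$, and (1) is then the density of a $d$-variate normal distribution centered at $Z_i^{t-1}$ with covariance $\frac{1}{2\delta_0} I_d$, so larger $\delta_0$ means smaller moves.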
For any $z \in \mathcal{S}$ and $k \in \mathbb{N}$, consider a set $E \subset \mathcal{S}$ with $|E| < \infty$ and $z \notin E$, where $|\cdot|$ denotes the counting measure. Let

$$I_1 = \arg\min_{z' \in E} \|z - z'\|.$$

For $j = 2, \ldots, k$, let $J_{j-1} = E \setminus \bigcup_{l=1}^{j-1} I_l$ where

$$I_j = \arg\min_{z' \in J_{j-1}} \|z - z'\|.$$

Then we say that

$$B_k(z, E) = \bigcup_{j=1}^{k} I_j \tag{3}$$

defines a neighbor set for $z$ where $E$ is the defining expression.

Next, let $w : \mathcal{S} \times \mathcal{S} \to [0, 1]$ be a weighting function for two positions in social space. If $w$ satisfies

(i) $w(z, z) = 1$;
(ii) there exists a $z' \neq z$ such that $w(z, z') = 1$;
(iii) $w(z, z') \to 0$ as $\|z - z'\| \to \infty$;
(iv) $w(z, z') \to 0$ as $\|z - z'\| \to 0$,

then we say that it is an atomic weighting. For motivation of this definition, see Section 2.3 below.

Similar to a basic drift process, an atomic drift process describes actor positions only and is determined by a single parameter $\delta_1 \ge 0$. However, the ETD is considerably more complex. For an atomic weighting $w$, atomic drift is defined by

$$P_{\delta_1}(Z_i^t \mid Z^{t-1}, S^{t-1}) = \frac{\exp\left\{-\delta_1 \sum_{j \in S^{t-1}} \mathbf{1}\left(Z_j^{t-1} \in B_k(Z_i^{t-1}, Z_{-i}^{t-1})\right) w(Z_i^{t-1}, Z_j^{t-1}) \|Z_i^t - Z_j^{t-1}\|\right\}}{c(\delta_1)} \tag{4}$$

where $Z_{-i}^{t-1} = Z^{t-1} \setminus \{Z_i^{t-1}\}$ and $\mathbf{1}(\cdot)$ is the indicator function. As specified above, $B_k(Z_i^{t-1}, Z_{-i}^{t-1})$ is, with some exceptions, the set of $k$ nearest neighbors of ego $i$ at time $t-1$. In the event that $|S^{t-1}| \le k$, this neighbor set will have fewer than $k$ members and in the event that multiple actors occupy the exact same position at $t-1$, it could have more than $k$ members. Nonetheless, we refer to this as the set of $k$ nearest neighbors for ego $i$ at time $t-1$.

Finally, we combine basic drift and atomic drift to define the general drift process which we denote

$$P_\delta(Z_i^t \mid Z^{t-1}, S^{t-1}) = P_{\delta_0}(Z_i^t \mid Z^{t-1}, S^{t-1})\, P_{\delta_1}(Z_i^t \mid Z^{t-1}, S^{t-1}).$$

Next, we introduce homophilous and heterophilous attraction processes.
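Before turning to those processes, the neighbor set in (3) can be illustrated with a short sketch. It assumes a Euclidean social space with squared Euclidean distance (the choice made later in Section 3.1), returns actor indices rather than positions, and detects ties by exact floating-point equality, which is a simplification. The function names `sq_dist` and `neighbor_set` are hypothetical and are not taken from the authors' software.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points (assumed norm)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighbor_set(z, positions, k):
    """Indices of the positions in B_k(z, E), built as in (3).

    Each round takes the arg-min set I_j (every remaining position at the
    current minimum distance from z) and removes it from the pool, so tied
    positions enter together and the result may exceed k members.
    """
    remaining = {idx: sq_dist(z, p) for idx, p in enumerate(positions)}
    chosen = []
    for _ in range(k):
        if not remaining:  # fewer candidates than k rounds
            break
        nearest = min(remaining.values())
        tier = [idx for idx, dist in remaining.items() if dist == nearest]
        chosen.extend(tier)
        for idx in tier:
            del remaining[idx]
    return sorted(chosen)


# Hypothetical example: actors 1 and 2 occupy the same position.
ego = (0.0, 0.0)
others = [(1.0, 0.0), (0.0, 2.0), (0.0, 2.0), (5.0, 5.0)]
two_nearest = neighbor_set(ego, others, 2)
```

In this example the second round picks up both actors at (0, 2), so the "2 nearest neighbors" set has three members, which is the situation described above for actors sharing a position.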
For a discrete covariate $X_m^t$ and ego $i$, let

$$A_{im}^t = \{Z_l^{t-1} \in Z_{-i}^{t-1} : l \in S^{t-1},\ X_{im}^t = X_{lm}^{t-1}\}$$

and

$$U_{im}^t = \{Z_l^{t-1} \in Z_{-i}^{t-1} : l \in S^{t-1},\ X_{im}^t \neq X_{lm}^{t-1}\}.$$

Note that natural extensions exist for continuous covariates but we do not explicitly define them here. For the sake of this construction, assume that all covariates are discrete.

Given a set of parameters $\alpha_1, \ldots, \alpha_q \ge 0$ and an atomic weighting $w$, which we write $w_{ij}^{t-1} = w(Z_i^{t-1}, Z_j^{t-1})$ for simplicity, the ETD of a homophilous attraction process on the $m$th covariate is

$$P_{\alpha_m}(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1}) = \frac{\exp\left\{-\alpha_m \sum_{j \in S^{t-1}} \mathbf{1}\left(Z_j^{t-1} \in B_k(Z_i^{t-1}, A_{im}^t)\right) w_{ij}^{t-1} \|Z_i^t - Z_j^{t-1}\|\right\}}{c(\alpha_m)}. \tag{5}$$

The ETD for homophilous attraction on all covariates is defined by

\begin{align}
P_\alpha(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1}) &= \prod_{m=1}^{q} P_{\alpha_m}(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1}) \tag{6} \\
&\propto \exp\left\{-\sum_{m=1}^{q} \sum_{j \in S^{t-1}} \alpha_m \mathbf{1}\left(Z_j^{t-1} \in B_k(Z_i^{t-1}, A_{im}^t)\right) w_{ij}^{t-1} \|Z_i^t - Z_j^{t-1}\|\right\}. \tag{7}
\end{align}

Here, we omit the normalizing constant in the definition and use the proportional symbol, $\propto$.

Given a set of parameters $\upsilon_1, \ldots, \upsilon_q \ge 0$ and an atomic weighting $w$, the ETD of a heterophilous attraction process on the $m$th covariate is

$$P_{\upsilon_m}(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1}) = \frac{\exp\left\{-\upsilon_m \sum_{j \in S^{t-1}} \mathbf{1}\left(Z_j^{t-1} \in B_k(Z_i^{t-1}, U_{im}^t)\right) w_{ij}^{t-1} \|Z_i^t - Z_j^{t-1}\|\right\}}{c(\upsilon_m)}, \tag{8}$$

which is similar to homophilous attraction but with the neighbor set $U_{im}^t$. Naturally, the ETD for heterophilous attraction on all covariates is defined by

\begin{align}
P_\upsilon(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1}) &= \prod_{m=1}^{q} P_{\upsilon_m}(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1}) \tag{9} \\
&\propto \exp\left\{-\sum_{m=1}^{q} \sum_{j \in S^{t-1}} \upsilon_m \mathbf{1}\left(Z_j^{t-1} \in B_k(Z_i^{t-1}, U_{im}^t)\right) w_{ij}^{t-1} \|Z_i^t - Z_j^{t-1}\|\right\}. \tag{10}
\end{align}

Last, we introduce homophilous and heterophilous repulsion.
If $(\mathcal{S}, \|\cdot\|)$ is a linear space, we can alter the ETD for attraction to obtain an opposing effect which we refer to as repulsion. Given the determining parameters $\tilde{\alpha}_1, \ldots, \tilde{\alpha}_q \ge 0$, the ETD of a homophilous repulsion process is given by

\begin{align}
&P_{\tilde{\alpha}}(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1}) \tag{11} \\
&\quad \propto \exp\left\{-\sum_{m=1}^{q} \sum_{j \in S^{t-1}} \tilde{\alpha}_m \mathbf{1}\left(Z_j^{t-1} \in B_k(Z_i^{t-1}, A_{im}^t)\right) w_{ij}^{t-1} \|Z_i^t - (2Z_i^{t-1} - Z_j^{t-1})\|\right\}. \notag
\end{align}

Note that repulsion-like distributions are possible in non-linear spaces but are not addressed here. Homophilous repulsion is structurally very similar to homophilous attraction except we replace $\|Z_i^t - Z_j^{t-1}\|$ with $\|Z_i^t - (2Z_i^{t-1} - Z_j^{t-1})\|$ in the ETD. In a linear space, this has the effect of reflecting the point $Z_j^{t-1}$ through $Z_i^{t-1}$ and considering the attraction toward the reflected point, which can be viewed as a repulsion away from the original point $Z_j^{t-1}$. Similarly, for parameters $\tilde{\upsilon}_1, \ldots, \tilde{\upsilon}_q$, the ETD of a heterophilous repulsion process is given by

\begin{align}
&P_{\tilde{\upsilon}}(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1}) \tag{12} \\
&\quad \propto \exp\left\{-\sum_{m=1}^{q} \sum_{j \in S^{t-1}} \tilde{\upsilon}_m \mathbf{1}\left(Z_j^{t-1} \in B_k(Z_i^{t-1}, U_{im}^t)\right) w_{ij}^{t-1} \|Z_i^t - (2Z_i^{t-1} - Z_j^{t-1})\|\right\}. \notag
\end{align}

The complete specification for this class of STEPPs is a combination of the processes derived above and an exponential-family model for $P_\lambda(S^t \mid S^{t-1})$ where $\lambda$ is a parameter vector that determines the distribution. Recall that we assume an exogenous migration process which may take many forms, e.g., the number of emigrants follows a binomial distribution and the number of immigrants a Poisson distribution. To preserve generality, we do not further specify this distribution. For homophily (heterophily), either attraction or repulsion can be used but not both simultaneously. Assuming homophilous and heterophilous attraction, we let

$$\theta = (\delta_0, \delta_1, \rho_1, \ldots, \rho_q, \alpha_1, \ldots, \alpha_q, \upsilon_1, \ldots, \upsilon_q, \lambda^\top)^\top$$

denote the complete parameter vector for this class of STEPPs. The complete transition probability is given by

\begin{align}
P_\theta(S^t, X^t, Z^t \mid S^{t-1}, X^{t-1}, Z^{t-1}) = {} & P_\lambda(S^t \mid S^{t-1}) \prod_{i \in S^t} P_\delta(Z_i^t \mid Z^{t-1}, S^{t-1}) \tag{13} \\
& \times \exp\left( \sum_{m=1}^{q} \left[ \mathbf{1}(X_{im}^t = X_{im}^{t-1}) \log \rho_m + \mathbf{1}(X_{im}^t \neq X_{im}^{t-1}) \log(1 - \rho_m) \right] \right) \tag{14} \\
& \times P_\alpha(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1})\, P_\upsilon(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1}), \tag{15}
\end{align}

which is an exponential family. Although many other specifications exist for STEPPs, when we write $(S^t, X^t, Z^t) \sim \mathrm{STEPP}(\theta)$ it is in reference to this particular class.

2.3 Description

In this section, we further describe and interpret the class of STEPPs constructed above. The previous section provides a formal specification. This section expands the intuition and motivation for each individual process as well as a complete view of the entire model class.

The drift processes should be regarded as foundational elements for this class of STEPPs. Basic drift is governed by $\delta_0$, a parameter that dictates the magnitude of actors' movements between transitions, and has the simplest ETD. The probability mass in the ETD is symmetric about the ego's previous position and the rate of decay is proportional to $\delta_0$. That is, larger values of $\delta_0$ place more mass near the previous position than would smaller values. The mode of the ETD is always the ego's previous position so basic drift reinforces the notion that actors tend to navigate the social space with respect to their current position rather than jump around sporadically. A STEPP with basic drift alone results in actors generally drifting around the space making predictable, symmetric movements between transitions.

The ETD of atomic drift is considerably more complex than that of basic drift, but this is essential for ensuring that a specification resembles actual social processes.
In essence, the atomic drift process allows other actors to impact the movement of the ego through a transition, with the caveat that only a fixed number of them may have an actual effect and their distance relative to the ego largely determines the magnitude of said effect. We use neighbor sets to fix the number of actors in the social space who may have an effect on the ego because it is impractical to assume that the ego is affected by every other actor at a given time. For example, if the social space is a large corporate office with thousands of employees, any one person cannot possibly know everyone else, let alone be significantly influenced by them socially. It is more likely that an employee is aware of a few hundred others and noticeably influenced by one or two dozen of them. Thus, we only sum over the $k$ nearest neighbors in the atomic drift ETD.

Focusing on the effect of a single neighbor $j$ on the ego $i$, the functional form would be

$$\exp\left\{-\delta_1 w(Z_i^{t-1}, Z_j^{t-1}) \|Z_i^t - Z_j^{t-1}\|\right\}.$$

This is strikingly similar to the ETD for basic drift with the inclusion of a weight. This is where using atomic weights is crucial. Newton's law of universal gravitation tells us that any two bodies will attract one another with a force that is inversely proportional to the square of the distance between them. In particle physics, this force is considered negligible due to the fact that individual atomic masses are extremely small in comparison to surrounding bodies, e.g., the Earth. However, there is a repulsive electromagnetic force between two atoms when the distance between them is small. This force exists due to the negative charge of the electrons associated with each atom. One can imagine a universe where there are no large bodies to dwarf the mass of individual atoms so these forces can coexist. The observable result would be a weak attractive force between atoms that increases as the distance between them decreases.
Once the distance becomes sufficiently small, there is a weak repulsive force that increases as the distance between the atoms decreases. Thus, a natural balance arises.

Schopenhauer (1974) cleverly describes this as the porcupine dilemma:

"a number of porcupines huddled together for warmth on a cold day in winter; but, as they began to prick one another with their quills, they were obliged to disperse. However the cold drove them together again, when just the same thing happened. At last, after many turns of huddling and dispersing, they discovered that they would be best off by remaining at a little distance from one another. In the same way the need of society drives the human porcupines together, only to be mutually repelled by the many prickly and disagreeable qualities of their nature."

As such, we incorporate atomic weights in the ETD for an atomic drift process to provide general attraction between actors while providing stability in the social space over time. In the complete ETD for atomic drift, we combine the effects of each properly weighted neighbor and scale the overall effect by $\delta_1$. Intuitively, the nearest neighbors have the largest effects and the furthest neighbors have the smallest effects except in cases when near neighbors are too close to the ego. Recall that we only require atomic weights to approach zero in the respective limits so the specific functional form may dramatically impact the dynamics of a social space.

Homophilous and heterophilous attraction are similar to atomic drift but the primary difference is in the specification of neighbor sets. In homophilous (heterophilous) attraction, the set $A_{im}^t$ ($U_{im}^t$) is constructed based on the random state of $X_{im}^t$, which provides a crucial dependence between the ego's social position and behavior.
Given the random state of $X_{im}^t$, we consider the set of nearest homophilous (heterophilous) neighbors based on the behavior of those neighbors at time $t-1$ in order to compute the ETD. That is, the ego does not speculate about the future behavior of others.

Homophilous (heterophilous) repulsion is similar to attraction since we use the same neighbor set $A_{im}^t$ ($U_{im}^t$) in the ETD but the position adjustment is fundamentally different. Recall that for repulsion, we replace $\|Z_i^t - Z_j^{t-1}\|$ with $\|Z_i^t - (2Z_i^{t-1} - Z_j^{t-1})\|$. In the ETD, the term $\|Z_i^t - Z_j^{t-1}\|$ places some mass of the distribution centered around the position of actor $j$ at time $t-1$. It follows that the term $\|Z_i^t - (2Z_i^{t-1} - Z_j^{t-1})\|$ places the same mass centered around the position $2Z_i^{t-1} - Z_j^{t-1}$. In a linear space, this point is equivalent to the reflection of $Z_j^{t-1}$ through $Z_i^{t-1}$. In this form, it is clear that repulsion is actually an opposing attraction.

In this model class, each process is straightforward and motivated by basic social forces. As a result, it may be difficult to grasp the gravity of a complete specification. Since we cannot include attraction and repulsion on the same covariate, consider a STEPP with (basic and atomic) drift, homophilous attraction on each covariate and heterophilous repulsion on each covariate. This complete process is extremely complex in its raw functional form but at the core, the ETD has a summation over different effects from neighboring actors to the ego. Each effect is slightly different depending on time-sensitive information (relative distances between actors and behavior) and global properties determined by each parameter. By construction, each parameter is non-negative so we can focus on their relative differences for interpretation. For example, the largest of $\alpha_1, \ldots, \alpha_q$ indicates the covariate which exhibits the strongest attraction between similar actors.
Alternatively, one of the $\tilde{\upsilon}_1, \ldots, \tilde{\upsilon}_q$ being very small or 0 indicates a covariate which exhibits little to no repulsion between dissimilar actors. It is crucial to note that these parameters determine global properties of the social space as opposed to time-dependent or individual properties, which are the focus of future work. Simulated examples of various STEPPs are available at http://tinyurl.com/STEPPMODELS .

3 Statistical Inference

3.1 Analysis for a general Euclidean social space

By slightly restricting the general STEPP model of Section 2, we can derive closed form ETDs and inferential methods. In this section, we show that the ETD for $Z_i^t$ conditional on $X_i^t$ for any subset of the processes described above is multivariate normal if $\mathcal{S} = \mathbb{R}^d$ and the norm is Euclidean distance squared, i.e., $\|z\| = \sum_{i=1}^{d} z_i^2$. Based on this result, we derive the marginal ETD for $X_i^t$ and provide a closed form distribution for this class of STEPP models.

First, assume that $\mathcal{S} = \mathbb{R}^d$ and for $z \in \mathcal{S}$, $\|z\| = \sum_{i=1}^{d} z_i^2$. Using a general Euclidean space is somewhat restrictive in a mathematical sense but practically, it provides a flexible, comprehensible foundation for the social space. From this point on, when we say "distance" it is in reference to standard Euclidean distance whereas the norm is specified above. To motivate using the square of Euclidean distance for a norm, we appeal to physics and the inverse square law which generally states

$$\text{Intensity} \propto \frac{1}{\text{distance}^2}.$$

In practice, we use the atomic weighting function

$$w(z_1, z_2) = \begin{cases} 1 & \text{if } z_1 = z_2, \\ \|z_1 - z_2\| & \text{if } \|z_1 - z_2\| < c \text{ and } z_1 \neq z_2, \\ \|z_1 - z_2\|^{-1} & \text{if } \|z_1 - z_2\| \ge c, \end{cases}$$

where $0 < c \le 1$ is some threshold. Then when the distance between actors exceeds $\sqrt{c}$, the effect on the ETD is inversely proportional to said distance squared. For shorter distances, we cannot apply the same relationship because it leads to instability as previously discussed.
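The weighting function above is simple to compute, and a minimal sketch may help fix the three cases. It assumes points are given as coordinate tuples and that the norm is the squared Euclidean distance of this section. The names `sq_norm` and `atomic_weight` are illustrative, and the default threshold `c = 1.0` is our choice for the sketch because it keeps every weight in [0, 1].

```python
def sq_norm(z):
    """The norm of Section 3.1: squared Euclidean length."""
    return sum(x * x for x in z)


def atomic_weight(z1, z2, c=1.0):
    """Atomic weighting w(z1, z2) with threshold 0 < c <= 1 (c assumed 1)."""
    r = sq_norm([a - b for a, b in zip(z1, z2)])
    if r == 0.0:      # identical positions
        return 1.0
    if r < c:         # too close: weight shrinks toward 0 with the norm
        return r
    return 1.0 / r    # far apart: inverse-square decay in distance


# Hypothetical positions at Euclidean distances 0.5, 1 and 2 from the origin.
origin = (0.0, 0.0)
w_close = atomic_weight(origin, (0.5, 0.0))
w_unit = atomic_weight(origin, (1.0, 0.0))
w_far = atomic_weight(origin, (2.0, 0.0))
```

With `c = 1.0` the weight peaks at Euclidean distance 1 and falls off on both sides, which is the balance between attraction at range and repulsion at close quarters described in Section 2.3.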
Since many physical phenomena, e.g., Newton's law of universal gravitation, follow an inverse square law, it provides a natural foundation for a Euclidean social space. It is important to note that we need not specify an atomic weighting for the results in this section to hold, but it is necessary to properly motivate this specification.

Next, we will prove that the ETD has an analytic closed form through a series of lemmas leading up to the final theorem. First, we adopt some notation. For functions $h, g : \mathbb{R}^d \to \mathbb{R}$, if $h(z) = g(z) + c_0$ where $c_0$ is a constant, we say $h(z) \doteq g(z)$.

Lemma 1: For $w_1, \ldots, w_n \ge 0$ and $\mu_1, \ldots, \mu_n \in \mathbb{R}^d$,

$$\sum_{j=1}^{n} w_j \|z - \mu_j\| \doteq w^* \|z - \mu^*/w^*\| \quad \text{where} \quad w^* = \sum_{j=1}^{n} w_j \ \text{ and } \ \mu^* = \sum_{j=1}^{n} w_j \mu_j.$$

Proof. See the Supplementary Materials.

Lemma 2: Let $Z \in \mathbb{R}^d$ be a random vector with $\mu_1, \ldots, \mu_n \in \mathbb{R}^d$ and $w_1, \ldots, w_n \ge 0$ where $w^* = \sum_{i=1}^{n} w_i > 0$ and $\mu^* = \sum_{i=1}^{n} w_i \mu_i$. If $P(Z = z) \propto \exp\left\{-\sum_{i=1}^{n} w_i \|z - \mu_i\|\right\}$, then

$$Z \sim \mathrm{MVN}\left(\frac{\mu^*}{w^*},\ \frac{1}{2w^*} I_d\right).$$

Proof. See the Supplementary Materials.

Theorem: For each $i \in S^t$,

$$[Z_i^t \mid X_i^t, Z^{t-1}, X^{t-1}, S^{t-1}] \sim \mathrm{MVN}(\mu_i^t, \Sigma_i^t)$$

where

$$\mu_i^t = \frac{\sum_j \theta^\top H_{ij}^t w_{ij}^t Z_j^{t-1}}{\sum_j \theta^\top H_{ij}^t w_{ij}^t}, \qquad \Sigma_i^t = \left(\frac{1}{2 \sum_j \theta^\top H_{ij}^t w_{ij}^t}\right) I_d,$$

and an explicit expression for the elements of $H_{ij}^t$ is given in the Supplementary Materials.

Proof. See the Supplementary Materials.

Since we assume that the covariates are discrete, it is straightforward to calculate the marginal distribution of $[X_i^t \mid Z^{t-1}, X^{t-1}]$ based on the theorem. We know $P_\theta(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1})$ up to a normalizing constant and $P_\theta(Z_i^t \mid X_i^t, Z^{t-1}, X^{t-1})$ completely so it is possible to integrate out $Z_i^t$ for each value of $X_i^t$. Since the conditional distribution of $Z_i^t$ given $X_i^t$ is multivariate normal, this integral is a function of the variance $\Sigma_i^t$ and the value of $X_i^t$.
Then we know $P(X_i^t \mid Z^{t-1}, X^{t-1})$ up to a normalizing constant for every value of $X_i^t$ and can renormalize these values to obtain the complete distribution. Thus, the complete ETD can be written in closed form as

$$P_\theta(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}) = P_\theta(Z_i^t \mid X_i^t, Z^{t-1}, X^{t-1})\, P_\theta(X_i^t \mid Z^{t-1}, X^{t-1}).$$

Recall that the migration process $\{S^t\}_{t \ge 0}$ is an exogenous exponential family so we have the necessary components for a complete, closed form likelihood.

3.2 Likelihood-Based Inference

In this section, we use the analytic results from the previous section to develop a likelihood-based inferential framework. At this juncture, it is also possible to develop a full Bayesian inferential framework; however, here we focus on a frequentist inference framework. This provides straightforward calculations of parameter estimates and standard errors that might otherwise be computationally complex and conceptually challenging to interpret. The natural extension to a full Bayesian framework will appear elsewhere.

Suppose that $(S^t, X^t, Z^t) \sim \mathrm{STEPP}(\theta)$ for $t = 0, \ldots, \tau$. That is, this is one STEPP with $\tau$ transitions. For brevity, we suppress the superscripts and simply write $(S, X, Z)$ to denote the complete data over all time steps. Then the likelihood is given by

\begin{align*}
L(\theta \mid S, X, Z) &= \prod_{t=1}^{\tau} P_\theta(S^t, X^t, Z^t \mid S^{t-1}, X^{t-1}, Z^{t-1}) \\
&= \prod_{t=1}^{\tau} \left( \prod_{i \in S^t} P_\theta(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^t) \right) P_\theta(S^t \mid S^{t-1}) \\
&= \prod_{t=1}^{\tau} \left( \prod_{i \in S^t} P_\theta(Z_i^t \mid X_i^t, Z^{t-1}, X^{t-1}, S^t)\, P_\theta(X_i^t \mid Z^{t-1}, X^{t-1}, S^t) \right) P_\theta(S^t \mid S^{t-1}).
\end{align*}

It is implicit in this formulation that the initial state $(S^0, X^0, Z^0)$ is fixed and not random. It is natural to extend this model class to allow for a random initial state but it is not explored here.
However, we must note that the parameters in this class of STEPPs determine transitions between states rather than isolated states so a model for $(S^0, X^0, Z^0)$ may be difficult to align conceptually.

As shown above, the likelihood function has a computationally closed form so we can use standard optimization routines to obtain parameter estimates and standard errors. However, calculating the likelihood can be cumbersome due to the inherent complexity of each ETD. In the next section, we address these issues and provide a general computational framework for performing likelihood-based inference.

3.3 Computation

In this section, we describe the computational challenges of implementing likelihood-based inference for STEPP data. The likelihood function provided above is straightforward to calculate but doing so may be computationally expensive. Since the migration process $\{S^t\}_{t \ge 0}$ is exogenous, we focus on elements of the likelihood that involve actor positions and covariates. Explicitly, we need to calculate

$$\prod_{t=1}^{\tau} \prod_{i \in S^t} P_\theta(Z_i^t \mid X_i^t, Z^{t-1}, X^{t-1}, S^t)\, P_\theta(X_i^t \mid Z^{t-1}, X^{t-1}, S^t).$$

As shown previously, $[Z_i^t \mid X_i^t, Z^{t-1}, X^{t-1}]$ follows a multivariate normal distribution so calculating $P_\theta(Z_i^t \mid X_i^t, Z^{t-1}, X^{t-1}, S^t)$ given parameters $\mu_i^t$ and $\Sigma_i^t$ is extremely fast. However, calculating these parameters can be computationally demanding. For each time period $t$ and ego $i \in S^t$, we must compute multiple pairwise distances, weights, neighbor sets and sum over every element. Above all, computing neighbor sets is the most demanding. Given $q$ covariates, a full specification requires computing up to $2q + 1$ neighbor sets for each ego.

While we have shown that one can calculate the distribution of $[X_i^t \mid Z^{t-1}, X^{t-1}]$ for an arbitrary discrete covariate, we focus on the case where the support of each is finite.
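To make the position term discussed above concrete, the following sketch applies Lemmas 1 and 2 to a generic set of non-negative coefficients and centres. It does not attempt to build the coefficients $\theta^\top H_{ij}^t w_{ij}^t$ of the theorem, whose explicit form is in the Supplementary Materials; the inputs `weights` and `centres` simply stand in for them, and all function names are hypothetical.

```python
import math


def combine_quadratics(weights, centres):
    """Mean vector and common variance implied by Lemma 2.

    A density proportional to exp{-sum_j w_j * ||z - mu_j||}, with ||.|| the
    squared Euclidean norm, is normal with mean mu*/w* and covariance
    I_d / (2 w*). Requires sum(weights) > 0.
    """
    w_star = sum(weights)
    d = len(centres[0])
    mu_star = [sum(w * c[l] for w, c in zip(weights, centres)) for l in range(d)]
    mean = [m / w_star for m in mu_star]
    variance = 1.0 / (2.0 * w_star)
    return mean, variance


def position_logpdf(z, mean, variance):
    """Log density of N(mean, variance * I_d) evaluated at z."""
    d = len(z)
    sq = sum((a - b) ** 2 for a, b in zip(z, mean))
    return -0.5 * d * math.log(2.0 * math.pi * variance) - sq / (2.0 * variance)


def lemma1_gap(z, weights, centres):
    """sum_j w_j ||z - mu_j|| minus w* ||z - mu*/w*||; constant in z by Lemma 1."""
    mean, _ = combine_quadratics(weights, centres)
    lhs = sum(w * sum((a - b) ** 2 for a, b in zip(z, c))
              for w, c in zip(weights, centres))
    rhs = sum(weights) * sum((a - b) ** 2 for a, b in zip(z, mean))
    return lhs - rhs


# Hypothetical inputs: two neighbours with coefficients 1 and 3.
weights = [1.0, 3.0]
centres = [(0.0, 0.0), (4.0, 0.0)]
mean, variance = combine_quadratics(weights, centres)
```

Once the coefficients are available, the mean, variance, and log density cost only a few sums, which is consistent with the remark that the expensive part is assembling distances, weights, and neighbor sets rather than evaluating the normal density.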
Recall that $X_i^t$ is a random vector with $q$ components and, for each of them, we calculate every value in the probability mass function. Each calculation has a closed form but depends crucially on the variance $\Sigma_i^t$, which is computationally demanding. Thus, we must calculate $\Sigma_i^t$ conditional on each element in the full support of the vector $X_i^t$ to obtain $P_\theta(X_i^t \mid Z^{t-1}, X^{t-1}, S^t)$ as required.

In practice, we maximize the log-likelihood function

$$\ell(\theta) = \sum_{t=1}^{\tau} \sum_{i \in S^t} \Big[ \log P_\theta(Z_i^t \mid X_i^t, Z^{t-1}, X^{t-1}, S^t) + \log P_\theta(X_i^t \mid Z^{t-1}, X^{t-1}, S^t) \Big]$$

since it is slightly more stable numerically. Explicitly, $\hat{\theta} = \arg\max_{\theta \in \Theta} \ell(\theta)$ is the maximum likelihood estimator.

3.4 Analysis

In this subsection we consider the properties of the maximum likelihood estimator (MLE). The asymptotic properties of the MLE depend on the framework in which the inference is embedded. If we could observe a sequence of independent and identically distributed (IID) STEPPs, standard large-sample theory would imply that $\hat{\theta}$ is consistent and asymptotically efficient (Casella and Berger, 2002). Moreover, one can derive a standard Central Limit Theorem for this situation. However, in reality, we can rarely observe a sequence of IID STEPPs since social systems are constantly evolving, and we do not detail analytic results pertaining to this case.

[Figure 1 image: "MLE Performance for Simulated STEPPs". Horizontal axis: deviation from true parameter (-1 to 1). Panels: Basic Drift ($\delta_0$), Atomic Drift ($\delta_1$), Homophilous Attraction ($\alpha_1$), Heterophilous Repulsion ($\upsilon_1$), Behavior Persistence ($\rho_1$).]

Figure 1: Maximum likelihood estimates of parameters over 100 simulated STEPPs with a kernel density of each sample overlaid and sample means marked with dashed lines.

Here we explore the properties for the most common situation where a single STEPP is observed through a moderate number of time points.
Specifically, we generated 100 independent STEPPs, each with one binary covariate and 50 actors across five time periods, and with no migration. Further, we used homophilous attraction and heterophilous repulsion on the single covariate. The parameter values were set to moderate effects, as specified in the first row of Table 1. For each generated STEPP, we computed the MLE $\hat{\theta}$ and display the estimates in Figure 1. Note that the true parameter value is subtracted from each estimate for comparison, i.e., zero corresponds to the true value. Additionally, we mark the sample mean of each sample with a vertical bar and overlay a kernel density estimate for each set of estimates. We see that the distributions are centered around the true values and the shapes are approximately Gaussian.

As we have an explicit and computable expression for the log-likelihood, we can employ it to summarize the inference. We can compute standard errors from the numerical Hessian of $\ell(\theta)$. In Table 1 we summarize the standard error estimates and compare them to their true values.

| | $\delta_0$ | $\delta_1$ | $\alpha_1$ | $\tilde{\upsilon}_1$ | $\rho_1$ |
|---|---|---|---|---|---|
| true parameter | 0.50 | 0.50 | 1.00 | 0.75 | 0.80 |
| mean of MLE estimate | 0.52 | 0.52 | 0.99 | 0.75 | 0.80 |
| std. dev. of MLE estimates | 0.29 | 0.24 | 0.12 | 0.11 | 0.03 |
| mean of SE estimates | 0.25 | 0.19 | 0.14 | 0.10 | 0.03 |
| std. dev. of SE estimates | 0.05 | 0.02 | 0.01 | 0.01 | < 0.01 |

Table 1: Assessing the standard error estimates. We simulate 100 STEPPs with 50 actors and 5 time transitions, then compute the MLE and standard error for each. This table shows the sample mean and sample standard deviation of the MLE and standard error estimate of each parameter. In all cases, both the MLE and standard error estimates are close to their true values.

These simulation results support the use of likelihood-based inference and suggest that the MLE and standard errors will be credible.
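As a rough illustration of how standard errors can be obtained from a numerical Hessian, the following sketch uses central finite differences and inverts the observed information (the negative Hessian of the log-likelihood at the maximum). It assumes NumPy is available; the function names and the toy Gaussian log-likelihood are hypothetical stand-ins for $\ell(\theta)$, not the implementation used for the results in Table 1.

```python
import numpy as np

def numerical_hessian(f, theta, eps=1e-4):
    """Central finite-difference Hessian of a scalar function f at theta."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    hess = np.zeros((p, p))
    for a in range(p):
        for b in range(p):
            ea = np.zeros(p)
            eb = np.zeros(p)
            ea[a] = eps
            eb[b] = eps
            hess[a, b] = (f(theta + ea + eb) - f(theta + ea - eb)
                          - f(theta - ea + eb) + f(theta - ea - eb)) / (4 * eps**2)
    return hess

def standard_errors(loglik, theta_hat):
    """Square roots of the diagonal of the inverse observed information."""
    info = -numerical_hessian(loglik, theta_hat)
    return np.sqrt(np.diag(np.linalg.inv(info)))

# Toy stand-in for the STEPP log-likelihood: Gaussian mean, unit variance.
y = np.array([0.2, -0.1, 0.4, 0.3, 0.1, 0.5, -0.2, 0.0])

def toy_loglik(theta):
    return -0.5 * np.sum((y - theta[0]) ** 2)

se = standard_errors(toy_loglik, np.array([y.mean()]))
```

For the toy example the exact standard error is $1/\sqrt{n}$ with $n = 8$, which gives a simple check on the finite-difference approximation. For a real log-likelihood the step size `eps` would need some care, since each evaluation is expensive and the surface may be poorly scaled.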
In most situations where this estimator will be used, the amount of information will be at the level of this simulation or higher.

The explicit form of the log-likelihood, $\ell(\theta)$, also makes it possible to assess the goodness-of-fit of a model via an analysis of deviance. Specifically, we can compute the change in log-likelihood from the null model ($\theta = 0$) to the MLE. Similarly, we can use graphical goodness-of-fit ideas to assess the overall fit of the model (Hunter, Goodreau, and Handcock, 2008a). To do so, we can use temporal-structural network summary statistics (e.g., persistence of ties, counts of nodal types) to compare observed behavior with that produced by the STEPP model.

4 Application to Adolescent Risk Behavior in Networks

In this section we apply the STEPP framework to a longitudinal network of friendships within a school. The primary question of interest is the coevolution of risky behavior and friendship ties. More specifically, the interaction between social forces and substance use in adolescents has drawn significant attention in recent research. In particular, researchers are interested in quantifying the effect of peers on individual behavior as well as the effect an individual may have on their peers. Brechwald and Prinstein (2011) summarize recent advances in the study of peer influence and explore the range of behaviors for which peer influence occurs. Poulin, Kiesner, Pedersen, and Dishion (2011) provide a longitudinal analysis of friendship selection on adolescent cigarette, alcohol, and marijuana use. Their analysis depends on a cross-lag panel model that tests for the reciprocated association between substance use and the number of new friends who use the same substance. While this work provides unique insight into peer influence on adolescent substance use, the lack of sophisticated models for complex social processes is problematic.
The SAOMs discussed in Section 1 have been applied by De La Haye, Green, Kennedy, Pollard, and Tucker (2013) to explicitly model selection and influence in adolescent social networks with respect to marijuana use at two schools in the Add Health study. Their results indicate that having friends who have used marijuana in their lifetime is not a significant predictor of individual initiation, while recent use (within the last six months) is a significant predictor of individual initiation at one of the schools. Tucker, De La Haye, Kennedy, Green, and Pollard (2014) use SAOMs to model selection and influence effects of marijuana more extensively in the Add Health study. They found that in one school, influence occurred in reciprocated relationships, which are hypothesized to be characterized by closeness and trust. However, in another school it was found that adopting friends' drug use behavior appears to be a strategy for attaining social status. SAOMs provide researchers with a sophisticated model for beginning to disentangle social forces and behavior, but they require strong assumptions. Specifically, the set of actors is assumed to remain fixed over time, and the coevolution of social structure and behavior is based on friendship tie variables, which may be unstable.

4.1 CARBIN Study

The Contextualizing Adolescent Risk Behavior In Networks (CARBIN) study was designed to investigate changes in risky behavior, e.g., drug use, amongst middle and high school students in the context of directed friendship networks. While several waves of data were collected from a few urban schools in Peoria, IL, we use four waves of data collected from one school for this illustration. Each student filled out an extensive survey that asked them about their personal behavior and to nominate their friends within the school.
We utilize these friendship nominations to construct social networks and focus on two substance use variables: any alcohol and any marijuana use in the last 30 days. Table 2 summarizes the raw data of interest for this exercise. Since there are only three time transitions and the amount of composition change is relatively modest, we do not explore models for the migration process. Instead, we focus on the latent social space and the processes that govern it. In the next section, we fit latent positions to the observed networks and estimate STEPP parameters.

| | wave 1 | wave 2 | wave 3 | wave 4 |
|---|---|---|---|---|
| respondents | 150 | 168 | 173 | 168 |
| immigrants | 0 | 30 | 5 | 11 |
| emigrants | 0 | 7 | 19 | 72 |
| alcohol users | 52 | 53 | 60 | 36 |
| marijuana users | 9 | 13 | 13 | 10 |
| friendship ties | 754 | 842 | 707 | 317 |

Table 2: Summary counts for the CARBIN study (by wave)

4.2 Implementation and Estimation

To estimate STEPP parameters, we first need to infer actor positions in the latent social space from observed networks. Krivitsky and Handcock (2008) provide a compelling framework for fitting latent position cluster models to cross-sectional social networks. Based on this work, we use the latentnet package (Krivitsky and Handcock, 2014a) to fit latent positions for each wave of data. The standard procedure for fitting such positions uses the minimum Kullback-Leibler (MKL) divergence from a prior distribution which is invariant under rotation and scaling. Since we are concerned with the transitions between time periods, each set of cross-sectional positions should reasonably align with the others. The data exhibit strong non-planar patterns in space, so we fit positions in three dimensions ($d = 3$). First, we fit an aggregate model to a single combination of all four networks to obtain a reference point for each individual cross-section. It is important to note that we only use the observed friendship ties and do not adjust for any covariate information in this process.
Next, we fit latent positions to each cross-sectional network using this set of points from the aggregate model as references. That is, the reference points provide starting values for the optimization process. Finally, we use Procrustes analysis to minimize any residual variance from rotation or scaling between waves. The final sequence of points is referred to in the standard STEPP notation. Thus, we have actor positions $Z^0, Z^1, Z^2, Z^3$ in three dimensions, covariate matrices $X^0, X^1, X^2, X^3$ where the first column is a binary indicator of individual alcohol use and the second column is a binary indicator of individual marijuana use, and persistence sets $S^0, S^1, S^2, S^3$ that indicate which actors are present at each point in time.

Using the inferential framework presented in Section 3, we estimate a STEPP model for these data. Consider a model with a drift process (basic and atomic), homophilous attraction on both variables, and heterophilous repulsion on both variables. We use homophilous attraction because it is reasonable to assume that students who use alcohol or marijuana are likely to attract other students who use, and vice versa. Similarly, we use heterophilous repulsion because students who do not share similar substance use behavior are more likely to be repelled by one another than attracted. With a fully specified model, we can state a null hypothesis regarding the parameter values and use the likelihood to compute standard errors and p-values. We consider the null hypothesis

$$\delta_0 = 1, \qquad \delta_1 = \alpha_1 = \alpha_2 = \tilde{\upsilon}_1 = \tilde{\upsilon}_2 = 0, \qquad \rho_1 = \rho_2 = 0.5.$$

That is, we assume that the only process at work is basic drift and the behavior persistence terms are equivalent to a fair coin flip. Note that setting $\delta_0 = 1$ in the null is somewhat arbitrary, but the other parameters are of primary interest.
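A nominal two-sided p-value for a single parameter against its null value can be computed from an estimate and its standard error under a normal approximation. The sketch below is a minimal illustration using only the Python standard library; the function name `wald_p_value` and the numbers passed to it are hypothetical and are not taken from the CARBIN analysis.

```python
import math

def wald_p_value(estimate, std_error, null_value=0.0):
    """Two-sided p-value for (estimate - null) / SE under a N(0, 1) limit."""
    z = (estimate - null_value) / std_error
    # P(|N(0,1)| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# Illustrative values only: an estimate two standard errors from the null.
p = wald_p_value(estimate=0.10, std_error=0.05, null_value=0.0)
```

The null value matters here: in the hypothesis above it is 1 for $\delta_0$, 0.5 for the persistence parameters, and 0 for the remaining parameters. Whether the normal approximation is adequate near a boundary of the parameter space (for example an estimate of exactly zero) is a separate question that this sketch does not address.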
To obtain standard errors, we use a standard estimate of the Hessian of the likelihood function and then base nominal p-values on the nominal limiting normal distribution discussed in Section 3. In the next section, we report results and provide a brief interpretation.

4.3 Results

The results from the estimation are summarized in Table 3. We observe that all of the processes except heterophilous repulsion on marijuana are significant at the 10% level.

| Parameter | Estimate | Std. Error | p-value | |
|---|---|---|---|---|
| Basic drift - $\delta_0$ | 0.0817 | (0.018) | < 0.0001 | ∗∗∗ |
| Atomic drift - $\delta_1$ | 0.0707 | (0.041) | 0.0873 | . |
| Alcohol attraction - $\alpha_1$ | 0.0766 | (0.027) | 0.0053 | ∗∗ |
| Marijuana attraction - $\alpha_2$ | 0.0984 | (0.042) | 0.0181 | ∗ |
| Alcohol repulsion - $\tilde{\upsilon}_1$ | 0.1084 | (0.018) | < 0.0001 | ∗∗∗ |
| Marijuana repulsion - $\tilde{\upsilon}_2$ | 0.0000 | (0.000) | 0.9980 | |
| Alcohol persistence - $\rho_1$ | 0.7077 | (0.024) | < 0.0001 | ∗∗∗ |
| Marijuana persistence - $\rho_2$ | 0.9248 | (0.019) | < 0.0001 | ∗∗∗ |

Table 3: Summary of STEPP model estimates for the CARBIN data.

To compare the effects of each process on the social system, we rescale the estimated parameters to produce a relative interpretation. Recall that the parameters can be viewed as pseudo-scaling factors in calculating the mean and variance of a normal distribution. That is, scaling all of the parameters would have a small or negligible effect on the mean and an inversely proportional effect on the variance. Intuitively, the mean of each ETD provides information regarding where actors are likely to move and the variance provides information regarding the range of those potential movements. Hence, we report $\delta_0^*, \delta_1^*, \alpha_1^*, \alpha_2^*, \tilde{\upsilon}_1^*, \tilde{\upsilon}_2^*$ and $\tau = \delta_0 + \delta_1 + \alpha_1 + \alpha_2 + \tilde{\upsilon}_1 + \tilde{\upsilon}_2$, where

$$\delta_0^* = \delta_0/\tau, \quad \delta_1^* = \delta_1/\tau, \quad \alpha_1^* = \alpha_1/\tau, \quad \alpha_2^* = \alpha_2/\tau, \quad \tilde{\upsilon}_1^* = \tilde{\upsilon}_1/\tau, \quad \tilde{\upsilon}_2^* = \tilde{\upsilon}_2/\tau.$$

These rescaled parameters reflect the proportion of actors' movements through the social space, e.g., $\delta_0^* = 0.1$ would imply that 10% of social movement is attributable to basic drift.

The rescaled parameters in Table 4 tell a more compelling story about this social space. We observe that 18.7% of movement is basic drift, 16.2% is atomic drift, 17.6% is homophilous attraction on alcohol use, 22.6% is homophilous attraction on marijuana use, 24.9% is heterophilous repulsion based on alcohol use, and 0% is heterophilous repulsion on marijuana use. The persistence for alcohol use (or non-use) is 70.8% and the persistence for marijuana use (or non-use) is 92.5%.

| Basic $\delta_0^*$ | Atomic $\delta_1^*$ | Alcohol $\alpha_1^*$ | Marijuana $\alpha_2^*$ | Alcohol $\tilde{\upsilon}_1^*$ | Marijuana $\tilde{\upsilon}_2^*$ | Sum |
|---|---|---|---|---|---|---|
| 0.187 | 0.162 | 0.176 | 0.226 | 0.249 | 0.000 | 1.000 |

| Variation $\tau$ | Alcohol $\rho_1$ | Marijuana $\rho_2$ |
|---|---|---|
| 0.436 | 0.708 | 0.925 |

Table 4: Rescaled STEPP parameters

Using STEPPs to model adolescent substance use provides a new conceptualization and quantification of the social forces at play in a community. Since the complexity of social networks is captured by distance in Euclidean space, we do not need to model nuanced changes in friendship ties or shared substance use. Instead, we simply model the fundamental forces of attraction and repulsion as they pertain to each substance. In the example above, we observe that the forces of homophilous attraction and heterophilous repulsion on alcohol are very strong, accounting for a combined 42.5% of the actors' movements. Conversely, there is no heterophilous repulsion on marijuana and a modest homophilous attraction contributing to 22.6% of actors' movements. This result tells a compelling story about the difference between alcohol and marijuana in this school. Based on the model formulation, we can infer that shared alcohol use (or non-use) is pushing students together while students with opposing usage are driven apart, and some students are influenced to adopt new behaviors based on those closest to them in social space.
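The rescaling behind the reported proportions is simple arithmetic on the point estimates in Table 3. The sketch below copies those six spatial-process estimates and divides each by their sum; the dictionary keys and variable names are illustrative labels, not notation from the paper.

```python
# Point estimates of the spatial-process parameters, copied from Table 3.
estimates = {
    "basic drift (delta_0)": 0.0817,
    "atomic drift (delta_1)": 0.0707,
    "alcohol attraction (alpha_1)": 0.0766,
    "marijuana attraction (alpha_2)": 0.0984,
    "alcohol repulsion (upsilon_1)": 0.1084,
    "marijuana repulsion (upsilon_2)": 0.0000,
}

# tau is the sum of the six estimates; each share is estimate / tau.
tau = sum(estimates.values())
rescaled = {name: value / tau for name, value in estimates.items()}

for name, share in rescaled.items():
    print(f"{name}: {share:.3f}")
print(f"tau: {tau:.3f}")
```

Rounded to three decimals, the shares agree with the rescaled values 0.187, 0.162, 0.176, 0.226, 0.249 and 0.000, with $\tau \approx 0.436$. The persistence parameters $\rho_1$ and $\rho_2$ are not rescaled because they are already probabilities.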
Conversely, shared marijuana use (or non-use) is pushing students together while opposing usage has no effect on driving them apart. It must be noted, however, that this is not a statement regarding the general influence of alcohol or marijuana, since our analysis is merely an illustration of the methods presented in this paper. It is possible that we are observing the effects of social stigma. That is, marijuana is less prevalent in the example and might be stigmatized compared to alcohol. Formalizing and testing this notion is the topic of future work.

5 Discussion

This paper introduces a novel class of stochastic models for the coevolution of social structure and individual behavior over time. This model class is built on ideas successfully applied in latent space approaches to network analysis, longitudinal social network analysis, and spatial-temporal point processes, with specific components being motivated by physics and psychology. Additionally, realistic specifications of the broader class lead to traditional likelihood-based inference and computationally tractable solutions to estimation problems.

As shown in Section 2, complex social systems can be stochastically represented by a set of egocentric processes. The drift process provides an intuitive baseline for the ways in which actors navigate a social space with respect to their own position and neighboring actors' positions. Moreover, the atomic drift process incorporates principles from particle physics to produce stable dynamics over time, while Schopenhauer's porcupine dilemma provides a philosophical argument for the functional form of the transition distribution. The attraction and repulsion processes introduce a fundamental dependence between the evolution of individual behavior and one's social position over time. That is, these processes shed light on selection and influence phenomena in a way that is distinctly different from existing frameworks like the SAOMs.
While these models reflect real social processes, Section 3 shows that natural specifications also lead to multivariate normal conditional ETDs and computationally tractable likelihood-based inference. It is easy to get lost in the technical details and lose sight of the overarching elegance in these results. Recall that we implement a probabilistic version of the inverse square law, one of the most fundamental relationships in nature, and the result is an analytically closed-form transition distribution. Furthermore, that distribution is one of the most fundamental in science, nature and statistical theory: the multivariate normal distribution. Although the parameters of this distribution can be cumbersome to calculate, the result makes likelihood-based inference possible.

Section 4 illustrates the core methods developed in this paper with an application to a study of alcohol and drug use in adolescent friendship networks. This shows the potential of these methods in numerous applications, but it also highlights future challenges. The process of fitting latent positions to cross-sectional networks and minimizing the variation across observed points in time can be nuanced and challenging. It is not practical for applied researchers interested in implementing these models to perform this exercise every time. Hence, the focus of future work in this area is to develop more holistic approaches to inferring latent actor positions. Alternatively, this class of models may lead to different forms of data collection in social systems that inform the latent positions more accurately than social ties.

The STEPP framework for social systems can be extended both methodologically and in application. In this paper, we focus on population-level forces, but it is natural to extend the model to allow for individual- or community-level forces. A natural example is to extend the notion of attractive forces and allow actors to have different masses.
That is, some actors may have inherently more social influence or attractiveness. Also, we might allow for differential homophily or heterophily, e.g., the attractive force between non-users is weaker than the force between users.

In addition to methodological extensions, there are applications of the STEPP framework that do not require solving inferential problems. For example, one can use STEPP simulation as a virtual laboratory for intervention assessment. Given reasonable assumptions regarding actor positions and parameter values, it is possible to stage hypothetical interventions and simulate possible outcomes. Consider a class of 100 students where 40 of them are known binge drinkers and the administration has two options: they can target a few students and conduct intense personal interventions that are 90% effective, or implement a binge drinking prevention program that every student participates in but is only 25% effective. Since targeting popular students could have spillover effects, it is unclear which option is best. Furthermore, it is unclear which students to target. Simulated STEPPs can shed light on an otherwise uncertain decision process.

In conclusion, the STEPP framework provides realistic stochastic representations of complex social systems, which provide novel tools for social science research.

Acknowledgements

Research reported in this publication was supported by the National Institute on Drug Abuse of the National Institutes of Health under award number R01DA033280. The project described was supported by grant numbers 1R21HD063000 and 5R21HD075714-02 from NICHD, grant number N00014-08-1-1015 from the ONR, and grant numbers MMS-0851555 and MMS-1357619 from the NSF. We are grateful to the California Center for Population Research at UCLA (CCPR) for general support.
CCPR receives population research infrastructure funding (R24-HD041022) from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Demographic & Behavioral Sciences (DBS) Branch, the National Science Foundation, the Office of Naval Research, or the National Institutes of Health. The authors would also like to acknowledge Kayla de la Haye and the principal investigators of the CARBIN study, Harold D. Green, Jr. and Dorothy Espelage, for facilitating this work and motivating the methods presented in this paper.

Appendix

Covariance Matrix in Theorem 1

In Theorem 1, the expression for the covariance is:

$$
H_{ij}^t = \begin{pmatrix}
\mathbf{1}(j = i) \\
\mathbf{1}\big(Z_j^{t-1} \in B_k(Z_i^{t-1}, Z_{-i}^{t-1})\big) \\
0 \\
\vdots \\
0 \\
\mathbf{1}\big(Z_j^{t-1} \in B_k(Z_i^{t-1}, A_{i1}^t)\big) \\
\vdots \\
\mathbf{1}\big(Z_j^{t-1} \in B_k(Z_i^{t-1}, A_{iq}^t)\big) \\
\mathbf{1}\big(Z_j^{t-1} \in B_k(Z_i^{t-1}, U_{i1}^t)\big) \\
\vdots \\
\mathbf{1}\big(Z_j^{t-1} \in B_k(Z_i^{t-1}, U_{iq}^t)\big) \\
0
\end{pmatrix}
$$

where the 0s are matched to $\rho_1, \ldots, \rho_q$ and $\lambda$ in the parameter $\theta$.

Proof of Results

In this appendix we provide proofs of the two lemmas and the theorem in Section 3. Throughout, $\doteq$ denotes equality up to an additive term that does not depend on $z$, and, as in the proof of Lemma 2, $\|z\|$ is shorthand for $z^\top z$.

Proof of Lemma 1

Proof. Base case:

$$w_1 \|z - \mu_1\| + w_2 \|z - \mu_2\| \doteq (w_1 + w_2)\, \|z - (w_1 \mu_1 + w_2 \mu_2)/(w_1 + w_2)\|.$$

Initially, the subscripts on $\mu_1$ and $\mu_2$ will be set to superscripts so the subscript can denote individual components. First,

$$w_1 \|z - \mu^1\| = w_1 \sum_{i=1}^{d} (z_i - \mu_i^1)^2 = w_1 \sum_{i=1}^{d} \big(z_i^2 - 2 z_i \mu_i^1 + (\mu_i^1)^2\big) \doteq w_1 \sum_{i=1}^{d} (z_i^2 - 2 z_i \mu_i^1).$$

Then

$$
\begin{aligned}
w_1 \|z - \mu^1\| + w_2 \|z - \mu^2\|
&\doteq w_1 \sum_{i=1}^{d} (z_i^2 - 2 z_i \mu_i^1) + w_2 \sum_{i=1}^{d} (z_i^2 - 2 z_i \mu_i^2) \\
&= \sum_{i=1}^{d} (w_1 z_i^2 - 2 w_1 z_i \mu_i^1 + w_2 z_i^2 - 2 w_2 z_i \mu_i^2) \\
&= \sum_{i=1}^{d} \big((w_1 + w_2) z_i^2 - 2 z_i (w_1 \mu_i^1 + w_2 \mu_i^2)\big) \\
&= (w_1 + w_2) \sum_{i=1}^{d} \big(z_i^2 - 2 z_i (w_1 \mu_i^1 + w_2 \mu_i^2)/(w_1 + w_2)\big) \\
&\doteq (w_1 + w_2) \sum_{i=1}^{d} \big(z_i - (w_1 \mu_i^1 + w_2 \mu_i^2)/(w_1 + w_2)\big)^2 \\
&= (w_1 + w_2)\, \|z - (w_1 \mu^1 + w_2 \mu^2)/(w_1 + w_2)\|.
\end{aligned}
$$

Induction step: Assume $\sum_{j=1}^{n} w_j \|z - \mu_j\| \doteq w^* \|z - \mu^*/w^*\|$ for $n = k$ and show that it holds for $n = k + 1$. Let $w' = \sum_{j=1}^{k} w_j$ and $w'' = \sum_{j=1}^{k+1} w_j$. Similarly, let $\mu' = \sum_{j=1}^{k} w_j \mu_j$ and $\mu'' = \sum_{j=1}^{k+1} w_j \mu_j$. Then

$$
\begin{aligned}
\sum_{j=1}^{k+1} w_j \|z - \mu_j\|
&= \sum_{j=1}^{k} w_j \|z - \mu_j\| + w_{k+1} \|z - \mu_{k+1}\| \\
&\doteq w' \|z - \mu'/w'\| + w_{k+1} \|z - \mu_{k+1}\| \\
&\doteq w^* \|z - \mu^*/w^*\|,
\end{aligned}
$$

where

$$w^* = \sum_{j=1}^{k} w_j + w_{k+1} = w''$$

and

$$\mu^* = w' \frac{\mu'}{w'} + w_{k+1} \mu_{k+1} = \mu' + w_{k+1} \mu_{k+1} = \sum_{j=1}^{k} w_j \mu_j + w_{k+1} \mu_{k+1} = \mu''.$$

Proof of Lemma 2

Proof. First, observe that if $h(z) \doteq g(z)$, then $e^{h(z)} \propto e^{g(z)}$. Then by Lemma 1,

$$P(Z = z) \propto \exp\Big\{ -\sum_{i=1}^{n} w_i \|z - \mu_i\| \Big\} \propto \exp\Big\{ -w^* \Big\| z - \frac{\mu^*}{w^*} \Big\| \Big\}.$$

Since we can rewrite $\|z\| = z^\top z$,

$$P(Z = z) \propto \exp\Big\{ -w^* \Big(z - \frac{\mu^*}{w^*}\Big)^{\!\top} \Big(z - \frac{\mu^*}{w^*}\Big) \Big\} = \exp\Big\{ -\frac{1}{2} \Big(z - \frac{\mu^*}{w^*}\Big)^{\!\top} \Big(\frac{1}{2 w^*} I_d\Big)^{-1} \Big(z - \frac{\mu^*}{w^*}\Big) \Big\}.$$

Therefore,

$$P(Z = z) = (2\pi)^{-d/2} (2 w^*)^{d/2} \exp\Big\{ -\frac{1}{2} \Big(z - \frac{\mu^*}{w^*}\Big)^{\!\top} \Big(\frac{1}{2 w^*} I_d\Big)^{-1} \Big(z - \frac{\mu^*}{w^*}\Big) \Big\}.$$

Proof of the Theorem

Proof. First, we need to verify the marginal distribution of $[Z_i^t \mid X_i^t, Z^{t-1}, X^{t-1}, S^{t-1}]$ up to a normalizing constant. Recall the complete STEPP distribution

$$
\begin{aligned}
P_\theta(S^t, X^t, Z^t \mid S^{t-1}, X^{t-1}, Z^{t-1}) = P_\lambda(S^t \mid S^{t-1}) \prod_{i \in S^t} \; & P_\delta(Z_i^t \mid Z^{t-1}, S^{t-1}) \\
& \times \exp\Big( \sum_{m=1}^{q} \big[ \mathbf{1}(X_{im}^t = X_{im}^{t-1}) \log \rho_m + \mathbf{1}(X_{im}^t \neq X_{im}^{t-1}) \log(1 - \rho_m) \big] \Big) \\
& \times P_\alpha(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1})\, P_\upsilon(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1}).
\end{aligned}
$$
By marginalizing and conditioning on $S^t$ and $X_i^t$, we can reduce this to

$$P(Z_i^t \mid X_i^t, Z^{t-1}, X^{t-1}) = P_\delta(Z_i^t \mid Z^{t-1}, S^{t-1})\, P_\alpha(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1})\, P_\upsilon(Z_i^t, X_i^t \mid Z^{t-1}, X^{t-1}, S^{t-1}).$$

Since each term on the right-hand side has an exponential form, the exponents sum as follows:

$$
\begin{aligned}
& \delta_0 \|Z_i^t - Z_i^{t-1}\| + \delta_1 \sum_{j \in S^t} \mathbf{1}\big(Z_j^{t-1} \in B_k(Z_i^{t-1}, Z_{-i}^{t-1})\big)\, w_{ij}^{t-1} \|Z_i^t - Z_j^{t-1}\| \\
& \quad + \sum_{m=1}^{q} \sum_{j \in S^t} \alpha_m \mathbf{1}\big(Z_j^{t-1} \in B_k(Z_i^{t-1}, A_{im}^t)\big)\, w_{ij}^{t-1} \|Z_i^t - Z_j^{t-1}\| \\
& \quad + \sum_{m=1}^{q} \sum_{j \in S^t} \upsilon_m \mathbf{1}\big(Z_j^{t-1} \in B_k(Z_i^{t-1}, U_{im}^t)\big)\, w_{ij}^{t-1} \|Z_i^t - Z_j^{t-1}\| \\
& = \sum_{j \in S^t} \Big( \delta_0 \mathbf{1}(i = j) + \delta_1 \mathbf{1}\big(Z_j^{t-1} \in B_k(Z_i^{t-1}, Z_{-i}^{t-1})\big) + \sum_{m=1}^{q} \alpha_m \mathbf{1}\big(Z_j^{t-1} \in B_k(Z_i^{t-1}, A_{im}^t)\big) \\
& \qquad\qquad + \sum_{m=1}^{q} \upsilon_m \mathbf{1}\big(Z_j^{t-1} \in B_k(Z_i^{t-1}, U_{im}^t)\big) \Big)\, w_{ij}^{t-1} \|Z_i^t - Z_j^{t-1}\| \\
& = \sum_{j \in S^t} \theta^\top H_{ij}^t\, w_{ij}^t \|Z_i^t - Z_j^{t-1}\|.
\end{aligned}
$$

Hence,

$$P(Z_i^t \mid X_i^t, Z^{t-1}, X^{t-1}) \propto \exp\Big\{ -\sum_{j \in S^t} \theta^\top H_{ij}^t\, w_{ij}^t \|Z_i^t - Z_j^{t-1}\| \Big\},$$

and Lemma 2 implies that $[Z_i^t \mid X_i^t, Z^{t-1}, X^{t-1}, S^{t-1}] \sim \mathrm{MVN}(\mu_i^t, \Sigma_i^t)$.

References

Whitney A Brechwald and Mitchell J Prinstein. Beyond homophily: A decade of advances in understanding peer influence processes. Journal of Research on Adolescence, 21(1):166–179, 2011.

G Casella and R. L. Berger. Statistical Inference. Duxbury, Pacific Grove, 2nd edition, 2002.

Kayla De La Haye, Harold D Green, David P Kennedy, Michael S Pollard, and Joan S Tucker. Selection and influence mechanisms associated with marijuana initiation and use in adolescent friendship networks. Journal of Research on Adolescence, 23(3):474–486, 2013.

Kayo Fujimoto and Thomas W Valente. Social network influences on adolescent substance use: Disentangling structural equivalence from cohesion. Social Science & Medicine, 74(12):1952–1960, 2012.

Isobel Claire Gormley and Thomas Brendan Murphy. A latent space model for rank data. In Statistical Network Analysis: Models, Issues, and New Directions, pages 90–102. Springer, New York, 2007.

Mark S Handcock, Adrian E Raftery, and Jeremy M Tantrum. Model-based clustering for social networks. Journal of the Royal Statistical Society: Series A (Statistics in Society), 170(2):301–354, 2007.

Mark S. Handcock, David R. Hunter, Carter T. Butts, Steven M. Goodreau, and Martina Morris. statnet: Software tools for the representation, visualization, analysis and simulation of network data. Journal of Statistical Software, 24(1):1–11, 2008. URL http://www.jstatsoft.org/v24/i01.

Mark S. Handcock, David R. Hunter, Carter T. Butts, Steven M. Goodreau, Pavel N. Krivitsky, Skye Bender-deMoll, and Martina Morris. statnet: Software tools for the Statistical Analysis of Network Data. The Statnet Project (http://www.statnet.org), 2014a. URL CRAN.R-project.org/package=statnet. R package version 2014.2.0.

Mark S. Handcock, David R. Hunter, Carter T. Butts, Steven M. Goodreau, Pavel N. Krivitsky, and Martina Morris. ergm: Fit, Simulate and Diagnose Exponential-Family Models for Networks. The Statnet Project (http://www.statnet.org), 2014b. URL CRAN.R-project.org/package=ergm. R package version 3.1.2.

Steve Hanneke, Wenjie Fu, Eric P Xing, et al. Discrete temporal models of social networks. Electronic Journal of Statistics, 4:585–605, 2010.

Peter D Hoff. Bilinear mixed-effects models for dyadic data. Journal of the American Statistical Association, 100(469):286–295, 2005.

Peter D Hoff, Adrian E Raftery, and Mark S Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090–1098, 2002.

Paul W Holland and Samuel Leinhardt. Transitivity in structural models of small groups. Comparative Group Studies, 1971.

Paul W Holland and Samuel Leinhardt. A dynamic model for social networks. Journal of Mathematical Sociology, 5(1):5–20, 1977.

Paul W Holland and Samuel Leinhardt. An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76(373):33–50, 1981.

David R. Hunter, Steven M. Goodreau, and Mark S. Handcock. Goodness of fit for social network models. Journal of the American Statistical Association, 103:248–258, 2008a.

David R Hunter, Mark S Handcock, Carter T Butts, Steven M Goodreau, and Martina Morris. ergm: A package to fit, simulate and diagnose exponential-family models for networks. Journal of Statistical Software, 24(3):nihpa54860, 2008b.

Pavel N Krivitsky and Mark S Handcock. Fitting position latent cluster models for social networks with latentnet. Journal of Statistical Software, 24(2), 2008.

Pavel N Krivitsky and Mark S Handcock. A Separable Model for Dynamic Networks. Journal of the Royal Statistical Society B, 75(4), 2013.

Pavel N. Krivitsky and Mark S. Handcock. latentnet: Latent position and cluster models for statistical networks. The Statnet Project (http://www.statnet.org), 2014a. URL CRAN.R-project.org/package=latentnet. R package version 2.5.1.

Pavel N. Krivitsky and Mark S. Handcock. tergm: Fit, Simulate and Diagnose Models for Network Evolution based on Exponential-Family Random Graph Models. The Statnet Project (http://www.statnet.org), 2014b. URL CRAN.R-project.org/package=tergm. R package version 3.1.4.

Mark Lubell, John Scholz, Ramiro Berardo, and Garry Robins. Testing policy theory with statistical models of networks. Policy Studies Journal, 40(3):351–374, 2012.

Jesper Moller and Rasmus Plenge Waagepetersen. Statistical inference and simulation for spatial point processes. CRC Press, New York, 2003.

Martina Morris, Steven M Goodreau, Carter T Butts, Mark S Handcock, and David R Hunter. ergm: A package to fit, simulate and diagnose exponential-family models for networks. Journal of Statistical Software, 24(03), 2008.

François Poulin, Jeff Kiesner, Sara Pedersen, and Thomas J Dishion. A short-term longitudinal analysis of friendship selection on early adolescent substance use. Journal of Adolescence, 34(2):249–256, 2011.

Ruth Ripley, Krists Boitmanis, and Tom A.B. Snijders. RSiena: Siena - Simulation Investigation for Empirical Network Analysis, 2013. URL http://CRAN.R-project.org/package=RSiena. R package version 1.1-232.

Garry Robins and Philippa Pattison. Random graph models for temporal processes in social networks. Journal of Mathematical Sociology, 25(1):5–41, 2001.

David R Schaefer, Sandra D Simpkins, Andrea E Vest, and Chara D Price. The contribution of extracurricular activities to adolescent friendships: New insights through social network analysis. Developmental Psychology, 47(4):1141, 2011.

Arthur Schopenhauer. Parerga and Paralipomena: Short Philosophical Essays, vol. 2, translated by E. F. J. Payne, 1974.

Susan Shortreed, Mark S Handcock, and Peter Hoff. Positional estimation within a latent space model for networks. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 2(1):24, 2006.

Thomas A. B. Snijders, C. E. G. Steglich, and M. Schweinberger. Modeling the co-evolution of networks and behavior. In Kees van Montfort, Johan Oud, and Albert Satorra, editors, Longitudinal models in the behavioral and related sciences, pages 41–71. Routledge, London, UK, 2007.

Tom AB Snijders. Models for longitudinal network data. Models and Methods in Social Network Analysis, 1:215–247, 2005.

Tom AB Snijders. Longitudinal methods of network analysis. Encyclopedia of Complexity and System Science, pages 5998–6013, 2009.

Tom AB Snijders, Gerhard G Van de Bunt, and Christian EG Steglich. Introduction to stochastic actor-based models for network dynamics. Social Networks, 32(1):44–60, 2010.

Christian Steglich, Tom AB Snijders, and Michael Pearson. Dynamic networks and behavior: Separating selection from influence. Sociological Methodology, 40(1):329–393, 2010.

Joan S Tucker, Kayla De La Haye, David P Kennedy, Harold D Green, and Michael S Pollard. Peer influence on marijuana use in different types of friendships. Journal of Adolescent Health, 54(1):67–73, 2014.

Gerhard G Van de Bunt, Marijtje AJ Van Duijn, and Tom AB Snijders. Friendship networks through time: An actor-oriented dynamic statistical network model. Computational & Mathematical Organization Theory, 5(2):167–192, 1999.