Computable de Finetti measures
We prove a computable version of de Finetti's theorem on exchangeable sequences of real random variables. As a consequence, exchangeable stochastic processes expressed in probabilistic functional programming languages can be automatically rewritten a…
Authors: Cameron E. Freer, Daniel M. Roy
Cameron E. Freer^a, Daniel M. Roy^b

^a Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
^b Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA

Abstract

We prove a computable version of de Finetti's theorem on exchangeable sequences of real random variables. As a consequence, exchangeable stochastic processes expressed in probabilistic functional programming languages can be automatically rewritten as procedures that do not modify non-local state. Along the way, we prove that a distribution on the unit interval is computable if and only if its moments are uniformly computable.

Key words: de Finetti's theorem, exchangeability, computable probability theory, probabilistic programming languages, mutation
2010 MSC: 03D78, 60G09, 68Q10, 03F60, 68N18

1. Introduction

The classical de Finetti theorem states that an exchangeable sequence of real random variables is a mixture of independent and identically distributed (i.i.d.) sequences of random variables. Moreover, there is an (almost surely unique) measure-valued random variable, called the directing random measure, conditioned on which the random sequence is i.i.d. The distribution of the directing random measure is called the de Finetti measure or the mixing measure.

This paper examines the computable probability theory of exchangeable sequences of real-valued random variables. We prove a computable version of de Finetti's theorem: the distribution of an exchangeable sequence of real random variables is computable if and only if its de Finetti measure is computable. The classical proofs do not readily effectivize; instead, we show how to directly compute the de Finetti measure (as characterized by the classical theorem) in terms of a computable representation of the distribution of the exchangeable sequence.
Along the way, we prove that a distribution on $[0,1]^\omega$ is computable if and only if its moments are uniformly computable, which may be of independent interest.

A key step in the proof is to describe the de Finetti measure in terms of the moments of a set of random variables derived from the exchangeable sequence. When the directing random measure is (almost surely) continuous, we can show that these moments are computable, which suffices to complete the proof of the main theorem in this case. In the general case, we give a proof inspired by a randomized algorithm that, with probability one, computes the de Finetti measure.

1.1. Computable Probability Theory

These results are formulated in the Turing-machine-based bit-model for computation over the reals (for a general survey, see Braverman and Cook [1]). This computational model has been explored both via the type-2 theory of effectivity (TTE) framework for computable analysis, and via effective domain-theoretic representations of measures. Computable analysis has its origins in the study of recursive real functions, and can be seen as a way to provide "automated numerical analysis" (for a tutorial, see Brattka, Hertling, and Weihrauch [2]). Effective domain theory has its origins in the study of the semantics of programming languages, where it continues to have many applications (for a survey, see Edalat [3]). Here we use methods from these approaches to transfer a representational result from probability theory to a setting where it can directly transform statistical objects as represented on a computer.

The computable probability measures in the bit-model coincide with those distributions from which we can generate exact samples to arbitrary precision on a computer. Results in the bit-model also have direct implications for programs that manipulate probability distributions numerically.
In many areas of statistics and computer science, especially machine learning, the objects of interest include distributions on data structures that are higher-order or are defined using recursion. Probabilistic functional programming languages provide a convenient setting for describing and manipulating such distributions, and the theory we present here is directly relevant to this setting.

Exchangeable sequences play a fundamental role in both statistical models and their implementation on computers. Given a sequential description of an exchangeable process, in which one uses previous samples or sufficient statistics to sample the next element in the sequence, a direct implementation in a probabilistic functional programming language would need to use non-local communication (to access old samples or update sufficient statistics). This is often implemented by modifying the program's internal state directly (i.e., using mutation), or via some indirect method such as a state monad. The classical de Finetti theorem implies that (for such sequences over the reals) there is an alternative description in which samples are conditionally independent (and so could be implemented without non-local communication), thereby allowing parallel implementations. But the classical result does not imply that there is a program that samples the sequence according to this description. Even when there is such a program, the classical theorem does not provide a method for finding it. The computable de Finetti theorem states that such a program does exist. Moreover, the proof itself provides a method for constructing the desired program. In Section 6 we describe how an implementation of the computable de Finetti theorem would perform a code transformation that eliminates the use of non-local state in procedures that induce exchangeable stochastic processes.
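The contrast between the two descriptions can be made concrete with a small sketch. It uses the Pólya urn of Section 2.1 as an assumed example, and relies on the fact stated there that its directing weight is Beta-distributed. The function names (`make_urn_sampler`, `make_mixture_sampler`, `estimate_joint`) are hypothetical and are not the transformation of Section 6; the sketch only shows a sampler that mutates shared state next to one that does not.

```python
import random

def make_urn_sampler(alpha, beta, rng):
    """Sequential scheme: every draw reads and mutates shared counts."""
    counts = {1: alpha, 0: beta}          # non-local state shared by all draws
    def sample():
        p_one = counts[1] / (counts[0] + counts[1])
        x = 1 if rng.random() < p_one else 0
        counts[x] += 1                    # mutation: update the sufficient statistic
        return x
    return sample

def make_mixture_sampler(alpha, beta, rng):
    """Conditionally i.i.d. scheme: draw a latent weight once, then never mutate."""
    theta = rng.betavariate(alpha, beta)  # fixed once the sampler is built
    def sample():
        return 1 if rng.random() < theta else 0
    return sample

def estimate_joint(make_sampler, alpha, beta, runs, seed):
    """Monte Carlo estimate of P{X1 = 1, X2 = 1} over `runs` fresh processes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        draw = make_sampler(alpha, beta, rng)
        first, second = draw(), draw()
        hits += first * second
    return hits / runs

urn_estimate = estimate_joint(make_urn_sampler, 2, 3, 20_000, seed=1)
mixture_estimate = estimate_joint(make_mixture_sampler, 2, 3, 20_000, seed=2)
```

With these parameters both estimates should be close to 2/5 · 3/6 = 1/5. Once `theta` has been drawn, calls to the second sampler share no state, so they could be evaluated in parallel.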
This transformation is of interest beyond its implications for programming language semantics. In statistics and machine learning, it is often desirable to know the representation of an exchangeable stochastic process in terms of its de Finetti measure (for several examples, see Section 6.3). Many such processes in machine learning have very complicated (though computable) distributions, and it is not always feasible to find the de Finetti representation by hand. The computable de Finetti theorem provides a method for automatically obtaining such representations.

2. de Finetti's Theorem

We assume familiarity with the standard measure-theoretic formulation of probability theory (see, e.g., Billingsley [4] or Kallenberg [5]). Fix a basic probability space $(\Omega, \mathcal{F}, \mathbf{P})$ and let $\mathcal{B}_{\mathbb{R}}$ denote the Borel sets of $\mathbb{R}$. Note that we will use $\omega$ to denote the set of nonnegative integers (as in logic), rather than an element of the basic probability space $\Omega$ (as in probability theory). By a random measure we mean a random element in the space of Borel measures on $\mathbb{R}$, i.e., a kernel from $(\Omega, \mathcal{F})$ to $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$. An event $A \in \mathcal{F}$ is said to occur almost surely (a.s.) if $\mathbf{P}A = 1$. We denote the indicator function of a set $B$ by $1_B$.

Definition 2.1 (Exchangeable sequence). Let $X = \{X_i\}_{i \ge 1}$ be a sequence of real-valued random variables. We say that $X$ is exchangeable if, for every finite set $\{k_1, \dots, k_j\}$ of distinct indices, $(X_{k_1}, \dots, X_{k_j})$ is equal in distribution to $(X_1, \dots, X_j)$.

Theorem 2.2 (de Finetti [6, Chap. 1.1]). Let $X = \{X_i\}_{i \ge 1}$ be an exchangeable sequence of real-valued random variables. There is a random probability measure $\nu$ on $\mathbb{R}$ such that $\{X_i\}_{i \ge 1}$ is conditionally i.i.d. with respect to $\nu$. That is,
\[ \mathbf{P}[X \in \cdot \mid \nu] = \nu^\infty \quad \text{a.s.} \tag{1} \]
Moreover, $\nu$ is a.s. unique and given by
\[ \nu(B) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} 1_B(X_i) \quad \text{a.s.}, \tag{2} \]
where $B$ ranges over $\mathcal{B}_{\mathbb{R}}$.
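As a rough numerical illustration of the limit in Eq. (2), the following sketch computes the empirical frequency $\frac{1}{n}\sum_{i \le n} 1_B(X_i)$ for a binary sequence that is i.i.d. given a fixed weight. The value `theta = 0.3` and the helper `empirical_measure` are assumptions made for the example; they stand in for one realization of $\nu\{1\}$.

```python
import random

def empirical_measure(xs, indicator):
    """The n-th term of the limit in Eq. (2): (1/n) * sum of 1_B(X_i)."""
    return sum(1 for x in xs if indicator(x)) / len(xs)

rng = random.Random(0)
theta = 0.3  # assumed realization of the mass that nu places on {1}
xs = [1 if rng.random() < theta else 0 for _ in range(50_000)]

# Empirical frequency of the set B = {1} along the sampled sequence.
estimate = empirical_measure(xs, lambda x: x == 1)
```

The estimate approaches `theta` as the sample grows, but the code gives no bound on how fast. Section 2.2 identifies that missing rate of convergence as the reason the limit cannot be used directly.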
The random measure $\nu$ is called the directing random measure.¹ Its distribution (a measure on probability measures), which we denote by $\mu$, is called the de Finetti measure or the mixing measure. As in Kallenberg [6, Chap. 1, Eq. 3], we may take expectations on both sides of (1) to arrive at a characterization
\[ \mathbf{P}\{X \in \cdot\,\} = \mathbf{E}\,\nu^\infty = \int m^\infty \, \mu(dm) \tag{3} \]
of an exchangeable sequence as a mixture of i.i.d. sequences.

¹ The directing random measure is only unique up to a null set, but it is customary to refer to it as if it were unique, as long as we only rely on almost-sure properties.

A Bayesian perspective suggests the following interpretation: exchangeable sequences arise from independent observations from a latent measure $\nu$. Posterior analysis follows from placing a prior distribution on $\nu$. For further discussion of the implications of de Finetti's theorem for the foundations of statistical inference, see Dawid [7] and Lauritzen [8].

In 1931, de Finetti [9] proved the classical result for binary exchangeable sequences, in which case the de Finetti measure is simply a mixture of Bernoulli distributions; the exchangeable sequence is equivalent to repeatedly flipping a coin whose weight is drawn from some distribution on $[0,1]$. In 1937, de Finetti [10] extended the result to arbitrary real-valued exchangeable sequences. We will refer to this more general version as the de Finetti theorem. Later, Hewitt and Savage [11] extended the result to compact Hausdorff spaces, and Ryll-Nardzewski [12] introduced a weaker notion than exchangeability that suffices to give a conditionally i.i.d. representation. Hewitt and Savage [11] provide a history of the early developments, and a discussion of some subsequent extensions can be found in Kingman [13], Diaconis and Freedman [14], and Aldous [15]. A recent book by Kallenberg [6] provides a comprehensive view of the area of probability theory that has grown out of de Finetti's theorem, stressing the role of invariance under symmetries.

2.1. Examples

Consider an exchangeable sequence of $[0,1]$-valued random variables. In this case, the de Finetti measure is a distribution on the (Borel) measures on $[0,1]$. For example, if the de Finetti measure is a Dirac measure on the uniform distribution on $[0,1]$ (i.e., the distribution of a random measure which is almost surely the uniform distribution), then the induced exchangeable sequence consists of independent, uniformly distributed random variables on $[0,1]$.

As another example, let $p$ be a random variable, uniformly distributed on $[0,1]$, and let $\nu := \delta_p$, i.e., the Dirac measure concentrated on $p$. Then the de Finetti measure is the uniform distribution on Dirac measures on $[0,1]$, and the corresponding exchangeable sequence is $p, p, \dots$, i.e., a constant sequence, marginally uniformly distributed.

As a further example, we consider a stochastic process $\{X_i\}_{i \ge 1}$ composed of binary random variables whose finite marginals are given by
\[ \mathbf{P}\{X_1 = x_1, \dots, X_n = x_n\} = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \, \frac{\Gamma(\alpha + S_n)\,\Gamma(\beta + (n - S_n))}{\Gamma(\alpha + \beta + n)}, \tag{4} \]
where $S_n := \sum_{i \le n} x_i$, and where $\Gamma$ is the Gamma function and $\alpha, \beta$ are positive real numbers. (One can verify that these marginals satisfy the consistency conditions of Kolmogorov's extension theorem [5, Theorem 6.16], and so there is a stochastic process $\{X_i\}_{i \ge 1}$ with these finite marginals.) Clearly this process is exchangeable, as $n$ and $S_n$ are invariant to order. This process can also be described by a sequential scheme known as Pólya's urn [16, Chap. 11.4]. Each $X_i$ is sampled in turn according to the conditional distribution
\[ \mathbf{P}\{X_{n+1} = 1 \mid X_1 = x_1, \dots, X_n = x_n\} = \frac{\alpha + S_n}{\alpha + \beta + n}. \tag{5} \]
This process is often described as repeated sampling from an urn: starting with $\alpha$ red balls and $\beta$ black balls, a ball is drawn at each stage uniformly at random, and then returned to the urn along with an additional ball of the same color. By de Finetti's theorem, there exists a random variable $\theta \in [0,1]$ with respect to which the sequence is conditionally independent and $\mathbf{P}\{X_i = 1 \mid \theta\} = \theta$ for each $i$. In fact,
\[ \mathbf{P}[X_1 = x_1, \dots, X_n = x_n \mid \theta] = \prod_{i \le n} \mathbf{P}[X_i = x_i \mid \theta] = \theta^{S_n} (1 - \theta)^{(n - S_n)}. \tag{6} \]
Furthermore, one can show that $\theta$ is Beta($\alpha, \beta$)-distributed, and so the process given by the marginals (4) is called the Beta-Bernoulli process. Finally, the de Finetti measure is the distribution of the random Bernoulli measure $\theta\delta_1 + (1 - \theta)\delta_0$.

2.2. The Computable de Finetti Theorem

In each of these examples, the de Finetti measure is a computable measure. (In Section 3, we make this and related notions precise. For an implementation of the Beta-Bernoulli process in a probabilistic programming language, see Section 6.) A natural question to ask is whether computable exchangeable sequences always arise from computable de Finetti measures. In fact, computable de Finetti measures give rise to computable distributions on exchangeable sequences (see Proposition 5.1). Our main result is the converse: every computable distribution on real-valued exchangeable sequences arises from a computable de Finetti measure.

Theorem 2.3 (Computable de Finetti). Let $\chi$ be the distribution of a real-valued exchangeable sequence $X$, and let $\mu$ be the distribution of its directing random measure $\nu$. Then $\mu$ is computable relative to $\chi$, and $\chi$ is computable relative to $\mu$. In particular, $\chi$ is computable if and only if $\mu$ is computable.

The directing random measure is classically given a.s. by the explicit limiting expression (2).
Without a computable handle on the rate of convergence, the limit is not directly computable, and so we cannot use it to compute the de Finetti measure. However, we are able to reconstruct the de Finetti measure using the moments of random variables derived from the directing random measure.

2.2.1. Outline of the Proof

Recall that $\mathcal{B}_{\mathbb{R}}$ denotes the Borel sets of $\mathbb{R}$. Let $\mathcal{I}_{\mathbb{R}}$ denote the set of open intervals, and let $\mathcal{I}_{\mathbb{Q}}$ denote the set of open intervals with rational endpoints. Then $\mathcal{I}_{\mathbb{Q}} \subsetneq \mathcal{I}_{\mathbb{R}} \subsetneq \mathcal{B}_{\mathbb{R}}$. For $k \ge 1$ and $\beta \in \mathcal{B}_{\mathbb{R}}^k = \mathcal{B}_{\mathbb{R}} \times \cdots \times \mathcal{B}_{\mathbb{R}}$, we write $\beta(i)$ to denote the $i$th coordinate of $\beta$.

Let $X = \{X_i\}_{i \ge 1}$ be an exchangeable sequence of real random variables, with distribution $\chi$ and directing random measure $\nu$. For every $\gamma \in \mathcal{B}_{\mathbb{R}}$, we define a $[0,1]$-valued random variable $V_\gamma := \nu\gamma$. A classical result in probability theory [5, Lem. 1.17] implies that a Borel measure on $\mathbb{R}$ is uniquely characterized by the mass it places on the open intervals with rational endpoints. Therefore, the distribution of the stochastic process $\{V_\tau\}_{\tau \in \mathcal{I}_{\mathbb{Q}}}$ determines the de Finetti measure $\mu$ (the distribution of $\nu$).

Definition 2.4 (Mixed moments). Let $\{x_i\}_{i \in C}$ be a family of random variables indexed by a set $C$. The mixed moments of $\{x_i\}_{i \in C}$ are the expectations $\mathbf{E}\bigl(\prod_{i=1}^{k} x_{j(i)}\bigr)$, for $k \ge 1$ and $j \in C^k$.

We can now restate the consequence of de Finetti's theorem described in Eq. (3), in terms of the finite-dimensional marginals of the exchangeable sequence $X$ and the mixed moments of $\{V_\beta\}_{\beta \in \mathcal{B}_{\mathbb{R}}}$.

Corollary 2.5. $\mathbf{P}\bigl(\bigcap_{i=1}^{k} \{X_i \in \beta(i)\}\bigr) = \mathbf{E}\bigl(\prod_{i=1}^{k} V_{\beta(i)}\bigr)$ for $k \ge 1$ and $\beta \in \mathcal{B}_{\mathbb{R}}^k$.

For $k \ge 1$, let $\mathcal{L}_{\mathbb{R}^k}$ denote the set of finite unions of open rectangles in $\mathbb{R}^k$ (i.e., the lattice generated by $\mathcal{I}_{\mathbb{R}}^k$), and let $\mathcal{L}_{\mathbb{Q}^k}$ denote the set of finite unions of open rectangles in $\mathbb{R}^k$ with rational endpoints (i.e., the lattice generated by $\mathcal{I}_{\mathbb{Q}}^k$). (Note that $\mathcal{I}_{\mathbb{Q}} \subsetneq \mathcal{L}_{\mathbb{Q}} \subsetneq \mathcal{L}_{\mathbb{R}} \subsetneq \mathcal{B}_{\mathbb{R}}$.)
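Corollary 2.5 can be checked exactly in the Beta-Bernoulli example of Section 2.1, where $V_{\{1\}} = \theta$ and $V_{\{0\}} = 1 - \theta$. The sketch below assumes integer parameters $\alpha, \beta \ge 1$, so that $\Gamma(n) = (n-1)!$ and all arithmetic is rational. It compares the marginal formula (4), the product of the sequential conditionals (5), and the mixed moment $\mathbf{E}\,\theta^{S_n}(1-\theta)^{n-S_n}$ computed from the Beta moments. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

def gamma_int(n):
    """Gamma function at a positive integer: Gamma(n) = (n - 1)!."""
    return factorial(n - 1)

def marginal(xs, a, b):
    """Eq. (4) for integer parameters a, b >= 1."""
    n, s = len(xs), sum(xs)
    return Fraction(gamma_int(a + b) * gamma_int(a + s) * gamma_int(b + n - s),
                    gamma_int(a) * gamma_int(b) * gamma_int(a + b + n))

def sequential(xs, a, b):
    """Chain rule applied to the urn conditionals of Eq. (5)."""
    prob, s = Fraction(1), 0
    for n, x in enumerate(xs):
        p_one = Fraction(a + s, a + b + n)
        prob *= p_one if x == 1 else 1 - p_one
        s += x
    return prob

def beta_moment(j, a, b):
    """E[theta^j] for theta ~ Beta(a, b)."""
    m = Fraction(1)
    for r in range(j):
        m *= Fraction(a + r, a + b + r)
    return m

def mixed_moment(xs, a, b):
    """E[theta^S (1 - theta)^(n - S)], expanding (1 - theta)^(n - S) binomially."""
    n, s = len(xs), sum(xs)
    return sum((-1) ** i * comb(n - s, i) * beta_moment(s + i, a, b)
               for i in range(n - s + 1))

# All three descriptions agree on every binary string of length 4.
agree = all(marginal(xs, 2, 3) == sequential(xs, 2, 3) == mixed_moment(xs, 2, 3)
            for xs in product((0, 1), repeat=4))
```

For example, with $\alpha = 2$ and $\beta = 3$, all three functions return 1/5 on the outcome $(1, 1)$.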
As we will show in Lemma 3.5, when $\chi$ is computable, we can enumerate all rational lower bounds on quantities of the form
\[ \mathbf{P}\Bigl(\bigcap_{i=1}^{k} \{X_i \in \sigma(i)\}\Bigr), \tag{7} \]
where $k \ge 1$ and $\sigma \in \mathcal{L}_{\mathbb{Q}}^k$.

In general, we cannot enumerate all rational upper bounds on (7). However, if $\sigma \in \mathcal{L}_{\mathbb{Q}}^k$ (for $k \ge 1$) is such that, with probability one, $\nu$ places no mass on the boundary of any $\sigma(i)$, then $\mathbf{P}\bigl(\bigcap_{i=1}^{k} \{X_i \in \sigma(i)\}\bigr) = \mathbf{P}\bigl(\bigcap_{i=1}^{k} \{X_i \in \overline{\sigma(i)}\}\bigr)$, where $\overline{\sigma(i)}$ denotes the closure of $\sigma(i)$. In this case, for every rational upper bound $q$ on (7), we have that $1 - q$ is a lower bound on
\[ \mathbf{P}\Bigl(\bigcup_{i=1}^{k} \{X_i \notin \overline{\sigma(i)}\}\Bigr), \tag{8} \]
a quantity for which we can enumerate all rational lower bounds. If this property holds for all $\sigma \in \mathcal{L}_{\mathbb{Q}}^k$, then we can compute the mixed moments of $\{V_\tau\}_{\tau \in \mathcal{L}_{\mathbb{Q}}}$. A natural condition that implies this property for all $\sigma \in \mathcal{L}_{\mathbb{Q}}^k$ is that $\nu$ is a.s. continuous (i.e., with probability one, $\nu\{x\} = 0$ for every $x \in \mathbb{R}$).

In Section 4, we show how to computably recover a distribution from its moments. This suffices to recover the de Finetti measure when $\nu$ is a.s. continuous, as we show in Section 5.1. In the general case, point masses in $\nu$ can prevent us from computing the mixed moments. Here we use a proof inspired by a randomized algorithm that almost surely avoids the point masses and recovers the de Finetti measure. For the complete proof, see Section 5.3.

3. Computable Representations

We begin by introducing notions of computability on various spaces. These definitions follow from more general TTE notions, though we will sometimes derive simpler equivalent representations for the concrete spaces we need (such as the real numbers, Borel measures on reals, and Borel measures on Borel measures on reals). For details, see the original papers, as noted.

We assume familiarity with standard notions of computability theory, such as computable and computably enumerable (c.e.)
sets (see, e.g., Rogers [17] or Soare [18]). Recall that $r \in \mathbb{R}$ is a c.e. real (sometimes called a left-c.e. or left-computable real) when the set of all rationals less than $r$ is a c.e. set. Similarly, $r$ is a co-c.e. real (sometimes called a right-c.e. or right-computable real) when the set of all rationals greater than $r$ is c.e. A real $r$ is a computable real when it is both a c.e. and co-c.e. real.

To represent more general spaces, we work in terms of an effectively presented topology. Suppose that $S$ is a second-countable $T_0$ topological space with subbasis $\mathcal{S}$. For every point $x \in S$, define the set $\mathcal{S}_x := \{B \in \mathcal{S} : x \in B\}$. Because $S$ is $T_0$, we have $\mathcal{S}_x \ne \mathcal{S}_y$ when $x \ne y$, and so the set $\mathcal{S}_x$ uniquely determines the point $x$. It is therefore convenient to define representations on topological spaces under the assumption that the space is $T_0$. In the specific cases below, we often have much more structure, which we use to simplify the representations.

We now develop these definitions more formally.

Definition 3.1 (Computable topological space). Let $S$ be a second-countable $T_0$ topological space with a countable subbasis $\mathcal{S}$. Let $s : \omega \to \mathcal{S}$ be an enumeration of $\mathcal{S}$ (possibly with repetition), i.e., a total surjective (but not necessarily injective) function. We say that $S$ is a computable topological space (with respect to $s$) when the set
\[ \{\langle m, n \rangle : s(m) = s(n)\} \tag{9} \]
is a c.e. subset of $\omega$, where $\langle \cdot, \cdot \rangle$ is a standard pairing function.

This definition of a computable topological space is derived from Weihrauch's definition [21, Def. 3.2.1] in terms of "notations". (See also, e.g., Grubba, Schröder, and Weihrauch [19, Def. 3.1].)

It is often possible to pick a subbasis $\mathcal{S}$ (and enumeration $s$) for which the elemental "observations" that one can computably observe are those of the form $x \in B$, where $B \in \mathcal{S}$.
Then the set $\mathcal{S}_x = \{B \in \mathcal{S} : x \in B\}$ is computably enumerable (with respect to $s$) when the point $x$ is such that it is eventually noticed to be in each basic open set containing it; we will call such a point $x$ computable. This is one motivation for the definition of computable point in a $T_0$ space below.

Note that in a $T_1$ space, two computable points are computably distinguishable, but in a $T_0$ space, computable points will be, in general, distinguishable only in a computably enumerable fashion. However, this is essentially the best that is possible, if the open sets are those that we can "observe". (For more details on this approach to considering datatypes as topological spaces, in which basic open sets correspond to "observations", see Battenfeld, Schröder, and Simpson [20, §2].) Note that the choice of topology and subbasis are essential; for example, we can recover both computable reals and c.e. reals as instances of "computable point" for appropriate computable topological spaces, as we describe in Section 3.1.

Definition 3.2 (Names and computable points). Let $(S, \mathcal{S})$ be a computable topological space with respect to an enumeration $s$. Let $x \in S$. The set
\[ \{n : s(n) \in \mathcal{S}_x\} = \{n : x \in s(n)\} \tag{10} \]
is called the $s$-name (or simply, name) of $x$. We say that $x$ is computable when its $s$-name is c.e.

Note that this use of the term "name" is similar to the notion of a "complete name" (see [21, Lem. 3.2.3]), but differs somewhat from TTE usage (see [21, Def. 3.2.2]).

Definition 3.3 (Computable functions). Let $(S, \mathcal{S})$ and $(T, \mathcal{T})$ be computable topological spaces (with respect to enumerations $s$ and $t$, respectively).
We say that a function $f : S \to T$ is computable (with respect to $s$ and $t$) when there is a partial computable functional $g : \omega^\omega \to \omega^\omega$ such that for all $x \in \mathrm{dom}(f)$ and enumerations $N = \{n_i\}_{i \in \omega}$ of an $s$-name of $x$, we have that $g(N)$ is an enumeration of a $t$-name of $f(x)$.

(See [21, Def. 3.1.3] for more details.) Note that an implication of this definition is that computable functions are continuous.

Recall that a functional $g : \omega^\omega \to \omega^\omega$ is partial computable if there is a monotone computable function $h : \omega^{<\omega} \to \omega^{<\omega}$ mapping finite prefixes (of integer sequences) to finite prefixes, such that given increasing prefixes of an input $N$ in the domain of $g$, the output of $h$ will eventually include every finite prefix of $g(N)$. (See [21, Def. 2.1.11] for more details.) Informally, $h$ can be used to read in an enumeration of an $s$-name of a point $x$ and output an enumeration of a $t$-name of the point $f(x)$.

Let $(S, \mathcal{S})$ and $(T, \mathcal{T})$ be computable topological spaces. In many situations where we are interested in establishing the computability of some function $f : S \to T$, we may refer to the function implicitly via pairs of points $x \in S$ and $y \in T$ related by $y = f(x)$. In this case, we will say that $y$ (under the topology $\mathcal{T}$) is computable relative to $x$ (under the topology $\mathcal{S}$) when $f : S \to T$ is a computable function. We will often elide one or both topologies when they are clear from context.

3.1. Representations of Reals

We will use both the standard topology and right order topology on the real line $\mathbb{R}$. The reals under the standard topology are a computable topological space using the basis $\mathcal{I}_{\mathbb{Q}}$ with respect to a straightforward effective enumeration; the computable points of this space are the computable reals.
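To make the notion of a name concrete, here is a minimal sketch for the point $x = \sqrt{2}$ under the standard topology. It assumes $x$ is given by a hypothetical approximation routine `sqrt2_approx(n)` returning a rational within $2^{-n}$ of $x$. Membership of $x$ in a rational interval is then only semi-decided: the search succeeds when $x$ lies in the interval and otherwise never reports anything. A genuine enumeration of the name would dovetail this search over all basic intervals; the bounded `steps` parameter is a simplification.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_approx(n):
    """A rational q with sqrt(2) - 2**-n < q <= sqrt(2), via an integer square root."""
    return Fraction(isqrt(2 * 4 ** n), 2 ** n)

def confirms_membership(approx, a, b, steps):
    """Look for evidence that a < x < b; True if found within `steps`, else None."""
    for n in range(steps):
        q, eps = approx(n), Fraction(1, 2 ** n)
        if a < q - eps and q + eps < b:
            return True
    return None  # no evidence yet: x may lie outside, or more steps are needed

def name_prefix(approx, intervals, steps):
    """Indices of the listed rational intervals confirmed so far to contain x."""
    return [i for i, (a, b) in enumerate(intervals)
            if confirms_membership(approx, a, b, steps)]

intervals = [
    (Fraction(1), Fraction(2)),
    (Fraction(3, 2), Fraction(2)),            # does not contain sqrt(2)
    (Fraction(141, 100), Fraction(142, 100)),
]
confirmed = name_prefix(sqrt2_approx, intervals, steps=30)
```

Here `confirmed` lists the first and third intervals. The second is never confirmed, however many steps are allowed, which matches the one-sided character of a c.e. name.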
The reals under the right order topology are a computable topological space using the basis
\[ \mathcal{R}_< := \{(c, \infty) : c \in \mathbb{Q}\}, \tag{11} \]
under a standard enumeration; the computable points of this space are the c.e. reals.

Recall that, for $k \ge 1$, the set $\mathcal{I}_{\mathbb{Q}}^k$ is a basis for the (product of the) standard topology on $\mathbb{R}^k$ that is closed under intersection and makes $(\mathbb{R}^k, \mathcal{I}_{\mathbb{Q}}^k)$ a computable topological space (under a straightforward enumeration of $\mathcal{I}_{\mathbb{Q}}^k$). Likewise, an effective enumeration of cylinders $\sigma \times \mathbb{R}^\omega$, for $\sigma \in \bigcup_{k \ge 1} \mathcal{I}_{\mathbb{Q}}^k$, makes $\mathbb{R}^\omega$ a computable topological space. Replacing $\mathcal{I}_{\mathbb{Q}}$ with $\mathcal{R}_<$ and "standard" with "right order" above gives a characterization of computable vectors and sequences of reals under the right order topology.

We can use the right order topology to define a representation for open sets. Let $(S, \mathcal{S})$ be a computable topological space, with respect to an enumeration $s$. Then an open set $B \subseteq S$ is c.e. open when the indicator function $1_B$ is computable with respect to $\mathcal{S}$ and $\mathcal{R}_<$. The c.e. open sets can be shown to be the computable points in the space of open sets under the Scott topology. Note that for the computable topological space $\omega$ (under the discrete topology and the identity enumeration) the c.e. open sets are precisely the c.e. sets of naturals.

3.2. Representations of Continuous Real Functions

We now consider computable representations for continuous functions on the reals. Let $(S, \mathcal{S})$ and $(T, \mathcal{T})$ each be either of $(\mathbb{R}, \mathcal{I}_{\mathbb{Q}})$ or $(\mathbb{R}, \mathcal{R}_<)$, and let $s$ and $t$ be the associated enumerations. For $k \ge 1$, the compact-open topology on the space of continuous functions from $S^k$ to $T$ has a subbasis composed of sets of the form
\[ \{f : f(A) \subseteq B\}, \tag{12} \]
where $A$ and $B$ are elements in the bases $\mathcal{S}^k$ and $\mathcal{T}$, respectively. An effective enumeration of this subbasis can be constructed in a straightforward fashion from $s$ and $t$.
In particular, let $k \ge 1$ and let $s_k$ be an effective enumeration of $k$-tuples of basis elements derived from $s$. Then a continuous function $f : (\mathbb{R}^k, \mathcal{S}^k) \to (\mathbb{R}, \mathcal{T})$ is computable (under the compact-open topology) when
\[ \{\langle m, n \rangle : f(s_k(m)) \subseteq t(n)\} \tag{13} \]
is a c.e. set. The set (13) is the name of $f$.

A continuous function is computable in this sense if and only if it is computable according to Definition 3.3. (See [21, Ch. 6] and [21, Thm. 3.2.14].) Note that when $\mathcal{S} = \mathcal{T} = \mathcal{I}_{\mathbb{Q}}$, this recovers the standard definition of a computable real function. When $\mathcal{S} = \mathcal{I}_{\mathbb{Q}}$ and $\mathcal{T} = \mathcal{R}_<$, this recovers the standard definition of a lower-semicomputable real function [22].

3.3. Representations of Borel Probability Measures

The following representations for probability measures on computable topological spaces are derived from more general TTE representations in Schröder [23] and Bosserhoff [24], and agree with Weihrauch [25] in the case of the unit interval. In particular, the representation for $\mathcal{M}_1(S)$ below is admissible with respect to the weak topology, hence computably equivalent (see Weihrauch [21, Chap. 3]) to the canonical TTE representation for Borel measures given in Schröder [23].

Schröder [23] has also shown the equivalence of this representation for probability measures (as a computable space under the weak topology) with probabilistic processes. A probabilistic process (see Schröder and Simpson [26]) formalizes a notion of a program that uses randomness to sample points in terms of their names of the form (10).

For a second-countable $T_0$ topological space $S$ with subbasis $\mathcal{S}$, let $\mathcal{M}_1(S)$ denote the set of Borel probability measures on $S$ (i.e., the probability measures on the $\sigma$-algebra generated by $\mathcal{S}$). Such measures are determined by the measure they assign to finite intersections of elements of $\mathcal{S}$. Note that $\mathcal{M}_1(S)$ is itself a second-countable $T_0$ space.
Now let $(S, \mathcal{S})$ be a computable topological space with respect to the enumeration $s$. We will describe a subbasis for $\mathcal{M}_1(S)$ that makes it a computable topological space. Let $\mathcal{L}_{\mathcal{S}}$ denote the lattice generated by $\mathcal{S}$ (i.e., the closure of $\mathcal{S}$ under finite union and intersection), and let $s_{\mathcal{L}}$ be an effective enumeration derived from $s$. Then, the class of sets
\[ \{\gamma \in \mathcal{M}_1(S) : \gamma\sigma > q\}, \tag{14} \]
where $\sigma \in \mathcal{L}_{\mathcal{S}}$ and $q \in \mathbb{Q}$, is a subbasis for the weak topology on $\mathcal{M}_1(S)$. An effective enumeration of this subbasis can be constructed in a straightforward fashion from the enumeration of $\mathcal{S}$ and an effective enumeration $\{q_n\}_{n \in \omega}$ of the rationals, making $\mathcal{M}_1(S)$ a computable topological space. In particular, the name of a measure $\eta \in \mathcal{M}_1(S)$ is the set $\{\langle m, n \rangle : \eta\, s_{\mathcal{L}}(m) > q_n\}$.

Corollary 3.4 (Computable distribution). A Borel probability measure $\eta \in \mathcal{M}_1(S)$ is computable (under the weak topology) if and only if $\eta B$ is a c.e. real, uniformly in the $s_{\mathcal{L}}$-index of $B \in \mathcal{L}_{\mathcal{S}}$.

Note that, for computable topological spaces $(S, \mathcal{S})$ and $(T, \mathcal{T})$ with enumerations $s$ and $t$, a measure $\eta \in \mathcal{M}_1(T)$ is computable relative to a point $x \in S$ when $\eta B$ is a c.e. real relative to $x$, uniformly in the $t_{\mathcal{L}}$-index of $B \in \mathcal{L}_{\mathcal{T}}$. Corollary 3.4 implies that the measure of a c.e. open set (i.e., the c.e. union of basic open sets) is a c.e. real (uniformly in the enumeration of the terms in the union), and that the measure of a co-c.e. closed set (i.e., the complement of a c.e. open set) is a co-c.e. real (similarly uniformly); see, e.g., [27, §3.3] for details. Note that on a discrete space, where singletons are both c.e. open and co-c.e. closed, the measure of each singleton is a computable real.
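One way to see why lower bounds are the natural data in Corollary 3.4 is to start from a sampler. The sketch below assumes a hypothetical bit-driven sampler for the uniform distribution on $[0,1]$: after reading a finite bit prefix, the sample is known to lie in a closed dyadic cell. Counting the prefixes whose cell already lies inside an open interval $(a, b)$ gives a rational lower bound on its measure, and the bounds increase with depth. The names `uniform_cell` and `measure_lower_bound` are illustrative only.

```python
from fractions import Fraction
from itertools import product

def uniform_cell(bits):
    """Closed dyadic interval known to contain the sample after reading `bits`."""
    lo = sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))
    return lo, lo + Fraction(1, 2 ** len(bits))

def measure_lower_bound(cell, a, b, depth):
    """Total mass of the length-`depth` bit prefixes whose cell lies inside (a, b)."""
    inside = 0
    for bits in product((0, 1), repeat=depth):
        lo, hi = cell(bits)
        if a < lo and hi < b:   # the closed cell is contained in the open interval
            inside += 1
    return Fraction(inside, 2 ** depth)

a, b = Fraction(1, 3), Fraction(2, 3)
bounds = [measure_lower_bound(uniform_cell, a, b, depth) for depth in range(1, 9)]
```

For this interval the bounds are nondecreasing and stay strictly below the true measure 1/3. No finite depth certifies an upper bound, since the cells containing an endpoint are never resolved.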
For a general space, however, it is too strong to require that even basic open sets have computable measure (see Weihrauch [25] for a discussion; moreover, such a requirement is stronger than what is necessary to ensure that, e.g., a probabilistic Turing machine can produce exact samples to arbitrary accuracy).

We will be interested in computable measures in $\mathcal{M}_1(S)$, where $S$ is either $\mathbb{R}^\omega$, $[0,1]^k$, or $\mathcal{M}_1(\mathbb{R})$. In order to apply Corollary 3.4 to characterize concrete notions of computability for $\mathcal{M}_1(S)$, we will now describe choices of topologies on these three spaces.

3.3.1. Measures on Real Vectors and Sequences under the Standard Topology

Using Corollary 3.4, we can characterize the class of computable distributions on real sequences using the computable topological spaces characterized above in Section 3.1. Let $\vec{x} = \{x_i\}_{i \ge 1}$ be a sequence of real-valued random variables (e.g., the exchangeable sequence $X$, or the derived random variables $\{V_\tau\}_{\tau \in \mathcal{I}_{\mathbb{Q}}}$ under the canonical enumeration of $\mathcal{I}_{\mathbb{Q}}$), and let $\eta$ be the joint distribution of $\vec{x}$. Then $\eta$ is computable if and only if $\eta(\sigma \times \mathbb{R}^\omega) = \mathbf{P}\{\vec{x} \in \sigma \times \mathbb{R}^\omega\}$ is a c.e. real, uniformly in $k \ge 1$ and $\sigma \in \mathcal{L}_{\mathbb{Q}^k}$. The following simpler characterization was given by Müller [28, Thm. 3.7].

Lemma 3.5 (Computable distribution under the standard topology). Let $\vec{x} = \{x_i\}_{i \ge 1}$ be a sequence of real-valued random variables with joint distribution $\eta$. Then $\eta$ is computable if and only if
\[ \eta(\tau \times \mathbb{R}^\omega) = \mathbf{P}\Bigl(\bigcap_{i=1}^{k} \{x_i \in \tau(i)\}\Bigr) \tag{15} \]
is a c.e. real, uniformly in $k \ge 1$ and $\tau \in \mathcal{I}_{\mathbb{Q}}^k$.

Therefore knowing the measure of the sets in $\bigcup_k \mathcal{I}_{\mathbb{Q}}^k \subsetneq \bigcup_k \mathcal{L}_{\mathbb{Q}^k}$ is sufficient. Note that the right-hand side of (15) is precisely the form of the left-hand side of the expression in Corollary 2.5. Note also that one obtains a characterization of the computability of a finite-dimensional vector by embedding it as an initial segment of a sequence.

3.3.2. Measures on Real Vectors and Sequences under the Right Order Topology

Borel measures on $\mathbb{R}$ under the right order topology play an important role when representing measures on measures, as Corollary 3.4 portends.

Corollary 3.6 (Computable distribution under the right order topology). Let $\vec{x} = \{x_i\}_{i \ge 1}$ be a sequence of real-valued random variables with joint distribution $\eta$. Then $\eta$ is computable under the (product of the) right order topology if and only if
\[ \eta\Bigl(\bigcup_{i=1}^{m} (c_{i1}, \infty) \times \cdots \times (c_{ik}, \infty) \times \mathbb{R}^\omega\Bigr) = \mathbf{P}\Bigl(\bigcup_{i=1}^{m} \bigcap_{j=1}^{k} \{x_j > c_{ij}\}\Bigr) \tag{16} \]
is a c.e. real, uniformly in $k, m \ge 1$ and $C = (c_{ij}) \in \mathbb{Q}^{m \times k}$.

Again, one obtains a characterization of the computability of a finite-dimensional vector by embedding it as an initial segment of a sequence. Note also that if a distribution on $\mathbb{R}^k$ is computable under the standard topology, then it is clearly computable under the right order topology. The above characterization is used in the next section as well as in Proposition 5.1, where we must compute an integral with respect to a topology that is coarser than the standard topology.

3.3.3. Measures on Borel Measures

The de Finetti measure $\mu$ is the distribution of the directing random measure $\nu$, an $\mathcal{M}_1(\mathbb{R})$-valued random variable. Recall the definition $V_\beta := \nu\beta$, for $\beta \in \mathcal{B}_{\mathbb{R}}$. From Corollary 3.4, it follows that $\mu$ is computable under the weak topology if and only if
\[ \mu\Bigl(\bigcup_{i=1}^{m} \bigcap_{j=1}^{k} \{\gamma \in \mathcal{M}_1(\mathbb{R}) : \gamma\sigma(j) > c_{ij}\}\Bigr) = \mathbf{P}\Bigl(\bigcup_{i=1}^{m} \bigcap_{j=1}^{k} \{V_{\sigma(j)} > c_{ij}\}\Bigr) \tag{17} \]
is a c.e. real, uniformly in $k, m \ge 1$ and $\sigma \in \mathcal{L}_{\mathbb{Q}}^k$ and $C = (c_{ij}) \in \mathbb{Q}^{m \times k}$. As an immediate consequence of (17) and Corollary 3.6, we obtain the following characterization of computable de Finetti measures.

Corollary 3.7 (Computable de Finetti measure). The de Finetti measure $\mu$ is computable relative to the joint distribution of $\{V_\tau\}_{\tau \in \mathcal{L}_{\mathbb{Q}}}$ under the right order topology, and vice versa.
In particular, $\mu$ is computable if and only if the joint distribution of $\{V_\tau\}_{\tau \in \mathcal{L}_\mathbb{Q}}$ is computable under the right order topology.

3.3.4. Integration

The following lemma is a restatement of an integration result by Schröder [23, Prop. 3.6], which itself generalizes integration results on standard topologies of finite-dimensional Euclidean spaces by Müller [28] and the unit interval by Weihrauch [25]. Define

$$\mathcal{I} := \{A \cap [0,1] : A \in \mathcal{I}_\mathbb{Q}\},$$ (18)

which is a basis for the standard topology on $[0,1]$, and define

$$\mathcal{I}_< := \{A \cap [0,1] : A \in \mathcal{R}_<\},$$ (19)

which is a basis for the right order topology on $[0,1]$.

Lemma 3.8 (Integration of bounded lower-semicontinuous functions). Let $k \ge 1$ and let $\mathcal{S}$ be either $\mathcal{I}_\mathbb{Q}$ or $\mathcal{R}_<$. Let

$$f : (\mathbb{R}^k, \mathcal{S}^k) \to ([0,1], \mathcal{I}_<)$$ (20)

be a continuous function and let $\mu$ be a Borel probability measure on $(\mathbb{R}^k, \mathcal{S}^k)$. Then

$$\int f \, d\mu$$ (21)

is a c.e. real relative to $f$ and $\mu$.

The following result of Müller [28] is an immediate corollary.

Corollary 3.9 (Integration of bounded continuous functions). Let

$$g : (\mathbb{R}^k, \mathcal{I}^k_\mathbb{Q}) \to ([0,1], \mathcal{I})$$ (22)

be a continuous function and let $\mu$ be a Borel probability measure on $(\mathbb{R}^k, \mathcal{I}^k_\mathbb{Q})$. Then

$$\int g \, d\mu$$ (23)

is a computable real relative to $g$ and $\mu$.

4. The Computable Moment Problem

One often has access to the moments of a distribution, and wishes to recover the underlying distribution. Let $\vec{x} = (x_i)_{i \in \omega}$ be a random vector in $[0,1]^\omega$ with distribution $\eta$. Classically, the distribution of $\vec{x}$ is uniquely determined by the mixed moments of $\vec{x}$. We show that the distribution is in fact computable from the mixed moments.

One classical way to pass from the moments of $\vec{x}$ to its distribution is via the Lévy inversion formula, which maps the characteristic function $\phi_{\vec{x}} : \mathbb{R}^\omega \to \mathbb{C}$, given by

$$\phi_{\vec{x}}(t) := \mathbb{E}\bigl(e^{i \langle t, \vec{x} \rangle}\bigr),$$ (24)

to the distribution of $\vec{x}$.
However, even in the finite-dimensional case, the inversion formula involves a limit for which we have no direct handle on the rate of convergence, and so the distribution it defines is not obviously computable. Instead, we use a computable version of the Weierstrass approximation theorem to compute the distribution relative to the mixed moments.

To show that $\eta$ is computable relative to the mixed moments, it suffices to show that $\eta(\sigma \times [0,1]^\omega) = \mathbb{E}\, 1_\sigma(x_1, \ldots, x_k)$ is a c.e. real relative to the mixed moments, uniformly in $\sigma \in \bigcup_{k \ge 1} \mathcal{I}^k_\mathbb{Q}$. We begin by building sequences of polynomials that converge pointwise from below to indicator functions of the form $1_\sigma$ for $\sigma \in \bigcup_{k \ge 1} \mathcal{L}_{\mathbb{Q}^k}$.

Lemma 4.1 (Polynomial approximations). Let $k \ge 1$ and $\sigma \in \mathcal{L}_{\mathbb{Q}^k}$. There is a sequence

$$(p_{n,\sigma} : n \in \omega)$$ (25)

of rational polynomials of degree $k$, computable uniformly in $n$, $k$, and $\sigma$, such that, for all $\vec{x} \in [0,1]^k$, we have

$$-2 \le p_{n,\sigma}(\vec{x}) \le 1_\sigma(\vec{x}) \quad \text{and} \quad \lim_{m \to \infty} p_{m,\sigma}(\vec{x}) = 1_\sigma(\vec{x}).$$ (26)

Proof. Let $k \ge 1$. For $\sigma \in \mathcal{L}_{\mathbb{Q}^k}$ and $\vec{x} \in \mathbb{R}^k$, define $d(\vec{x}, [0,1]^k \setminus \sigma)$ to be the distance from $\vec{x}$ to the nearest point in $[0,1]^k \setminus \sigma$. It is straightforward to show that $d(\vec{x}, [0,1]^k \setminus \sigma)$ is a computable real function of $\vec{x}$, uniformly in $k$ and $\sigma$. For $n \in \omega$, define $f_{n,\sigma} : \mathbb{R}^k \to \mathbb{R}$ by

$$f_{n,\sigma}(\vec{x}) := -\frac{1}{n+1} + \min\{1,\; n \cdot d(\vec{x}, [0,1]^k \setminus \sigma)\},$$ (27)

and note that $-1 \le f_{n,\sigma}(\vec{x}) \le 1_\sigma(\vec{x}) - \frac{1}{n+1}$ and $\lim_{m \to \infty} f_{m,\sigma}(\vec{x}) = 1_\sigma(\vec{x})$. Furthermore, $f_{n,\sigma}(\vec{x})$ is a computable (hence continuous) real function of $\vec{x}$, uniformly in $n$, $k$, and $\sigma$.

By the effective Weierstrass approximation theorem (see Pour-El and Richards [30, p. 45]), we can find (uniformly in $n$, $k$, and $\sigma$) a polynomial $p_{n,\sigma}$ with rational coefficients that uniformly approximates $f_{n,\sigma}$ to within $1/(n+1)$ on $[0,1]^k$. These polynomials have the desired properties.
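The construction in the proof of Lemma 4.1 can be sketched concretely in one dimension. The following is a minimal illustration, not the authors' procedure: it assumes $k = 1$ and a single interval $\sigma = (a, b)$ with $0 < a < b < 1$, and it swaps in Bernstein polynomials for the effective Weierstrass theorem, using the standard bound that the degree-$N$ Bernstein polynomial of an $L$-Lipschitz function on $[0,1]$ is within $L / (2\sqrt{N})$ of it. All function names are hypothetical. The last function shows why such polynomials are useful here: the expectation of each one is a rational combination of moments.

```python
from fractions import Fraction
from math import comb


def dist_to_complement(x, a, b):
    """Distance from x in [0, 1] to [0, 1] minus (a, b), assuming 0 < a < b < 1."""
    return max(Fraction(0), min(x - a, b - x))


def f(n, x, a, b):
    """f_{n,sigma}(x) from (27), for the single interval sigma = (a, b)."""
    return Fraction(-1, n + 1) + min(Fraction(1), n * dist_to_complement(x, a, b))


def bernstein_degree(n):
    """A degree N with n / (2 sqrt(N)) <= 1 / (n + 1); f_n is n-Lipschitz."""
    return max(1, (n * (n + 1) // 2) ** 2)


def p(n, x, a, b):
    """Bernstein polynomial of f_n (assumed stand-in for p_{n,sigma}) at rational x."""
    N = bernstein_degree(n)
    return sum(
        f(n, Fraction(j, N), a, b) * comb(N, j) * x**j * (1 - x) ** (N - j)
        for j in range(N + 1)
    )


def expected_p(n, a, b, moment):
    """E[p_n(X)] as a rational combination of the moments moment(j) = E[X^j]."""
    N = bernstein_degree(n)
    total = Fraction(0)
    for j in range(N + 1):
        # E[X^j (1 - X)^(N - j)], expanded with the binomial theorem.
        basis = sum(
            (-1) ** i * comb(N - j, i) * moment(j + i) for i in range(N - j + 1)
        )
        total += f(n, Fraction(j, N), a, b) * comb(N, j) * basis
    return total


def indicator(x, a, b):
    """Indicator function of the open interval (a, b)."""
    return 1 if a < x < b else 0


a, b = Fraction(1, 4), Fraction(3, 4)
uniform_moment = lambda j: Fraction(1, j + 1)  # moments of Uniform[0, 1]
# Each entry is a rational lower bound on eta((1/4, 3/4)) = 1/2.
lower_bounds = [expected_p(n, a, b, uniform_moment) for n in range(5)]
```

Exact rational arithmetic is used so that the inequalities in (26) can be checked without rounding error. The bounds obtained for small $n$ are weak; only their supremum over all $n$ recovers the measure of $\sigma$.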
We thank the anonymous referee for suggestions that simplified the proof of this lemma.

Using these polynomials, we can compute the distribution from the moments. The other direction follows from computable integration results.

Theorem 4.2 (Computable moments). Let $\vec{x} = (x_i)_{i \in \omega}$ be a random vector in $[0,1]^\omega$ with distribution $\eta$. Then $\eta$ is computable relative to the mixed moments of $\{x_i\}_{i \in \omega}$, and vice versa. In particular, $\eta$ is computable if and only if the mixed moments of $\{x_i\}_{i \in \omega}$ are uniformly computable.

Proof. Any monic monomial in $k$ variables, considered as a real function, computably maps $[0,1]^k$ into $[0,1]$ (under the standard topology). Furthermore, as the restriction of $\eta$ to any $k$ coordinates is computable relative to $\eta$ (uniformly in the coordinates), it follows from Corollary 3.9 that each mixed moment (the expectation of a monomial under such a restriction of $\eta$) is computable relative to $\eta$, uniformly in the index of the monomial and the coordinates.

Let $k \ge 1$ and $\sigma \in \mathcal{I}^k_\mathbb{Q}$. To establish the computability of $\eta$, it suffices to show that

$$\eta(\sigma \times [0,1]^\omega) = \mathbb{E}\, 1_{\sigma \times [0,1]^\omega}(\vec{x}) = \mathbb{E}\, 1_\sigma(x_1, \ldots, x_k)$$ (28)

is a c.e. real relative to the mixed moments, uniformly in $k$ and $\sigma$. By Lemma 4.1, there is a uniformly computable sequence of polynomials $(p_{n,\sigma})_{n \in \omega}$ that converges pointwise from below to the indicator $1_\sigma$. Therefore, by the dominated convergence theorem,

$$\mathbb{E}\, 1_\sigma(x_1, \ldots, x_k) = \sup_n \mathbb{E}\, p_{n,\sigma}(x_1, \ldots, x_k).$$ (29)

The expectation $\mathbb{E}\, p_{n,\sigma}(x_1, \ldots, x_k)$ is a $\mathbb{Q}$-linear combination of mixed moments, hence a computable real relative to the mixed moments, uniformly in $n$, $k$, and $\sigma$. Thus the supremum (29) is a c.e. real relative to the mixed moments, uniformly in $k$ and $\sigma$.

5.
Proof of the Computable de Finetti Theorem

For the remainder of the paper, let $X$ be a real-valued exchangeable sequence with distribution $\chi$, let $\nu$ be its directing random measure, and let $\mu$ be the corresponding de Finetti measure.

Classically, the joint distribution of $X$ is uniquely determined by the de Finetti measure (see Equation (3)). We now show that the joint distribution of $X$ is in fact computable relative to the de Finetti measure.

Proposition 5.1. The distribution $\chi$ is computable relative to $\mu$.

Proof. Let $k \ge 1$ and $\sigma \in \mathcal{I}^k_\mathbb{Q}$. All claims are uniform in $k$ and $\sigma$. In order to show that $\chi$, the distribution of $X$, is computable relative to $\mu$, we must show that $\mathbb{P}\bigl(\bigcap_{i=1}^{k} \{X_i \in \sigma(i)\}\bigr)$ is a c.e. real relative to $\mu$. Note that, by Corollary 2.5,

$$\mathbb{P}\Bigl(\bigcap_{i=1}^{k} \{X_i \in \sigma(i)\}\Bigr) = \mathbb{E}\Bigl(\prod_{i=1}^{k} V_{\sigma(i)}\Bigr).$$ (30)

Let $\eta$ be the joint distribution of $(V_{\sigma(i)})_{i \le k}$ and let $f : [0,1]^k \to [0,1]$ be defined by

$$f(x_1, \ldots, x_k) := \prod_{i=1}^{k} x_i.$$ (31)

To complete the proof, we now show that

$$\int f \, d\eta = \mathbb{E}\Bigl(\prod_{i=1}^{k} V_{\sigma(i)}\Bigr)$$ (32)

is a c.e. real relative to $\mu$. Note that $\eta$ is computable under the right order topology relative to $\mu$. Furthermore, $f$ is order-preserving (in each dimension) and lower-semicontinuous, i.e., is a continuous (and obviously computable) function from $([0,1]^k, \mathcal{I}^k_<)$ to $([0,1], \mathcal{I}_<)$. Therefore, by Lemma 3.8, we have that $\int f \, d\eta$ is a c.e. real relative to $\mu$.

We will first prove the main theorem under the additional hypothesis that the directing random measure is almost surely continuous. We then sketch a randomized argument that succeeds with probability one. Finally, we present the proof of the main result, which can be seen as a derandomization.

5.1. Almost Surely Continuous Directing Random Measures

For $k \ge 1$ and $\psi \in \mathcal{L}^k_\mathbb{R}$, we say that $\psi$ is a $\nu$-continuity set when, for $i \le k$, we have $\nu(\partial\psi(i)) = 0$ a.s., where $\partial\psi(i)$ denotes the boundary of $\psi(i)$.
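For concreteness, identity (30) and the notion of a $\nu$-continuity set can be worked out by hand for the Beta-Bernoulli process of Section 2.1. The calculation below is an illustration only and is not part of the original argument; the intervals $\sigma = (1/2, 3/2)$ and $\tau = (0, 1)$ are choices made here for the example.

```latex
% Directing random measure of the Beta-Bernoulli process:
\nu = \theta\,\delta_1 + (1-\theta)\,\delta_0, \qquad \theta \sim \mathrm{Beta}(a,b).

% Taking \sigma(i) = (1/2, 3/2) for every i \le k gives V_{\sigma(i)} = \nu(\sigma(i)) = \theta,
% so (30) becomes
\mathbb{P}\{X_1 = \cdots = X_k = 1\}
  = \mathbb{E}\,\theta^k
  = \prod_{i=0}^{k-1} \frac{a+i}{a+b+i}.

% \sigma is a \nu-continuity set, because its boundary \{1/2, 3/2\} carries no mass:
\nu(\partial\sigma(i)) = 0 \quad \text{a.s.}

% \tau = (0,1) is not a \nu-continuity set, because its boundary is \{0,1\}:
\nu(\partial\tau) = 1, \qquad
V_{\tau} = \nu\bigl((0,1)\bigr) = 0, \qquad
V_{\overline{\tau}} = \nu\bigl([0,1]\bigr) = 1 \quad \text{a.s.}
```

The second case shows how far apart the measure of an open set and the measure of its closure can be when the directing random measure places mass on the boundary.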
Lemma 5.2. Relative to $\chi$, the mixed moments of $\{V_\tau\}_{\tau \in \mathcal{L}_\mathbb{Q}}$ are uniformly c.e. reals and the mixed moments of $\{V_{\overline{\tau}}\}_{\tau \in \mathcal{L}_\mathbb{Q}}$ are uniformly co-c.e. reals, where $\overline{\tau}$ denotes the closure of $\tau$; in particular, if $\sigma \in \mathcal{L}^k_\mathbb{Q}$ (for $k \ge 1$) is a $\nu$-continuity set, then the mixed moment $\mathbb{E}\bigl(\prod_{i=1}^{k} V_{\sigma(i)}\bigr)$ is a computable real, uniformly in $k$ and $\sigma$.

Proof. Let $k \ge 1$ and $\sigma \in \mathcal{L}^k_\mathbb{Q}$. All claims are uniform in $k$ and $\sigma$. By Corollary 2.5,

$$\mathbb{E}\Bigl(\prod_{i=1}^{k} V_{\sigma(i)}\Bigr) = \mathbb{P}\Bigl(\bigcap_{i=1}^{k} \{X_i \in \sigma(i)\}\Bigr),$$ (33)

which is a c.e. real relative to $\chi$. The set $\overline{\sigma}$ is a co-c.e. closed set in $\mathbb{R}^k$ because we can computably enumerate all $\tau \in \mathcal{L}^k_\mathbb{Q}$ contained in the complement of $\overline{\sigma}$. Therefore,

$$\mathbb{E}\Bigl(\prod_{i=1}^{k} V_{\overline{\sigma(i)}}\Bigr) = \mathbb{P}\Bigl(\bigcap_{i=1}^{k} \{X_i \in \overline{\sigma(i)}\}\Bigr)$$ (34)

is the measure of a co-c.e. closed set, hence a co-c.e. real relative to $\chi$. When $\sigma$ is a $\nu$-continuity set,

$$\mathbb{E}\Bigl(\prod_{i=1}^{k} V_{\sigma(i)}\Bigr) = \mathbb{E}\Bigl(\prod_{i=1}^{k} V_{\overline{\sigma(i)}}\Bigr),$$ (35)

and so the expectation is a computable real relative to $\chi$.

Proposition 5.3 (Almost surely continuous directing random measure). Assume that $\nu$ is almost surely continuous. Then $\mu$ is computable relative to $\chi$.

Proof. Let $k \ge 1$ and $\sigma \in \mathcal{L}^k_\mathbb{Q}$. The almost sure continuity of $\nu$ implies that $\sigma$ is a $\nu$-continuity set. Therefore, by Lemma 5.2, the moment $\mathbb{E}\bigl(\prod_{i=1}^{k} V_{\sigma(i)}\bigr)$ is a computable real relative to $\chi$, uniformly in $k$ and $\sigma$. The computable moment theorem (Theorem 4.2) then implies that the joint distribution of the variables $\{V_\tau\}_{\tau \in \mathcal{L}_\mathbb{Q}}$ is computable under the standard topology relative to $\chi$, and so their joint distribution is also computable under the (coarser) right order topology relative to $\chi$. By Corollary 3.7, this implies that $\mu$ is computable relative to $\chi$.

5.2. "Randomized" Proof Sketch

In general, the joint distribution of $\{V_\sigma\}_{\sigma \in \mathcal{L}_\mathbb{Q}}$ is not computable under the standard topology because the directing random measure $\nu$ may, with nonzero probability, have a point mass on a rational.
In this case, the mixed moments of $\{V_\tau\}_{\tau \in \mathcal{L}_\mathbb{Q}}$ are c.e., but not co-c.e., reals relative to $\chi$. As a result, the computable moment theorem (Theorem 4.2) is inapplicable. For arbitrary directing random measures, we give a proof of the computable de Finetti theorem that works regardless of the location of point masses.

Consider the following sketch of a "randomized algorithm": We independently sample a countably infinite sequence of real numbers $\mathbf{A}$ from a computable, absolutely continuous distribution that has support everywhere on the real line (e.g., a Gaussian or Cauchy). Let $\mathcal{L}_{\mathbf{A}}$ denote the lattice generated by open intervals with endpoints in $\mathbf{A}$. Note that, with probability one, $\mathbf{A}$ will be dense in $\mathbb{R}$ and every $\psi \in \mathcal{L}_{\mathbf{A}}$ will be a $\nu$-continuity set. If the algorithm proceeds analogously to the case where $\nu$ is almost surely continuous, using $\mathcal{L}_{\mathbf{A}}$ as our basis, rather than $\mathcal{L}_\mathbb{Q}$, then it will compute the de Finetti measure with probability one.

Let $A$ be a dense sequence of reals such that $\nu(A) = 0$ a.s. Consider the variables $V_\zeta$ defined in terms of elements $\zeta$ of the new basis $\mathcal{L}_A$ (defined analogously to $\mathcal{L}_{\mathbf{A}}$). We begin by proving an extension of Lemma 5.2: The mixed moments of the set of variables $\{V_\zeta\}_{\zeta \in \mathcal{L}_A}$ are computable relative to $A$ and $\chi$.

Lemma 5.4. Let $k \ge 1$ and $\psi \in \mathcal{L}^k_A$. The mixed moment $\mathbb{E}\bigl(\prod_{i=1}^{k} V_{\psi(i)}\bigr)$ is a computable real relative to $A$ and $\chi$, uniformly in $k$ and $\psi$.

Proof. Let $k \ge 1$ and $\psi \in \mathcal{L}^k_A$. All claims are uniform in $k$ and $\psi$. We first show that, relative to $A$ and $\chi$, the mixed moments of $\{V_\zeta\}_{\zeta \in \mathcal{L}_A}$ are uniformly c.e. reals. We can compute (relative to $A$) a sequence

$$\sigma_1, \sigma_2, \ldots \in \mathcal{L}^k_\mathbb{Q}$$ (36)

such that, componentwise for each $n \ge 1$, $\sigma_n \subseteq \sigma_{n+1}$ and $\bigcup_m \sigma_m = \psi$.
(37)

Note that if $\zeta, \varphi \in \mathcal{L}_\mathbb{Q}$ satisfy $\zeta \subseteq \varphi$, then $V_\zeta \le V_\varphi$ (a.s.), and so, by the continuity of measures (and of multiplication), $\prod_{i=1}^{k} V_{\sigma_n(i)}$ converges from below to $\prod_{i=1}^{k} V_{\psi(i)}$ with probability one. Therefore, the dominated convergence theorem gives us

$$\mathbb{E}\Bigl(\prod_{i=1}^{k} V_{\psi(i)}\Bigr) = \sup_n \mathbb{E}\Bigl(\prod_{i=1}^{k} V_{\sigma_n(i)}\Bigr).$$ (38)

Using Corollary 2.5, we see that the expectation $\mathbb{E}\bigl(\prod_{i=1}^{k} V_{\sigma_n(i)}\bigr)$ is a c.e. real relative to $A$ and $\chi$, uniformly in $n$, and so the supremum (38) is a c.e. real relative to $A$ and $\chi$.

Similarly, the mixed moments of $\{V_{\overline{\zeta}}\}_{\zeta \in \mathcal{L}_A}$ are uniformly co-c.e. reals relative to $A$ and $\chi$, as can be seen via a sequence of nested unions of rational intervals whose union has complement equal to $\overline{\psi}$. Thus, because $\psi$ is a $\nu$-continuity set, the mixed moment $\mathbb{E}\bigl(\prod_{i=1}^{k} V_{\psi(i)}\bigr)$ is a computable real relative to $A$ and $\chi$.

Lemma 5.5. The de Finetti measure $\mu$ is computable relative to $A$ and $\chi$.

Proof. It follows immediately from Lemma 5.4 and Theorem 4.2 that the joint distribution of $\{V_\psi\}_{\psi \in \mathcal{L}_A}$ is computable relative to $A$ and $\chi$. This joint distribution classically determines the de Finetti measure. Moreover, as we now show, we can compute (relative to $A$ and $\chi$) the desired representation with respect to the (original) rational basis. In particular, we prove that the joint distribution of $\{V_\tau\}_{\tau \in \mathcal{L}_\mathbb{Q}}$ is computable under the right order topology relative to $A$ and $\chi$.

Let $m, k \ge 1$, let $\tau \in \mathcal{L}^k_\mathbb{Q}$, and let $C = (c_{ij}) \in \mathbb{Q}^{m \times k}$. We will express $\tau$ as a union of elements of $\mathcal{L}^k_A$. Note that $\tau$ is a c.e. open set (relative to $A$) with respect to the basis $\mathcal{L}^k_A$. In particular, we can computably enumerate (relative to $A$, and uniformly in $k$ and $\tau$) a sequence $\sigma_1, \sigma_2, \ldots \in \mathcal{L}^k_A$ such that $\bigcup_n \sigma_n = \tau$ and $\sigma_n \subseteq \sigma_{n+1}$. Note that $V_{\tau(j)} \ge V_{\sigma_n(j)}$ (a.s.) for all $n \ge 1$ and $j \le k$.
By the continuity of measures (and of union and intersection),

$$\mathbb{P}\Bigl(\bigcup_{i=1}^{m} \bigcap_{j=1}^{k} \{V_{\tau(j)} > c_{ij}\}\Bigr) = \sup_n \mathbb{P}\Bigl(\bigcup_{i=1}^{m} \bigcap_{j=1}^{k} \{V_{\sigma_n(j)} > c_{ij}\}\Bigr).$$ (39)

The probability $\mathbb{P}\bigl(\bigcup_{i=1}^{m} \bigcap_{j=1}^{k} \{V_{\sigma_n(j)} > c_{ij}\}\bigr)$ is a c.e. real relative to $A$ and $\chi$, uniformly in $n$, $m$, $k$, $\tau$, and $C$, and so the supremum (39) is a c.e. real relative to $A$ and $\chi$, uniformly in $m$, $k$, $\tau$, and $C$.

Let $\Phi$ denote the map taking $(A, \chi)$ to $\mu$, as described in Lemma 5.5. Recall that $\mathbf{A}$ is a random dense sequence with a computable distribution, as defined above, and let $\hat{\mu} = \Phi(\mathbf{A}, \chi)$. Then $\hat{\mu}$ is a random variable, and moreover, $\hat{\mu} = \mu$ almost surely. However, while $\mathbf{A}$ is almost surely noncomputable, the distribution of $\mathbf{A}$ is computable, and so the distribution of $\hat{\mu}$ is computable relative to $\chi$. Expectations with respect to the distribution of $\hat{\mu}$ can then be used to (deterministically) compute $\mu$ relative to $\chi$.

A proof along these lines could be made precise by making $\mathcal{M}_1(\mathcal{M}_1(\mathcal{M}_1(\mathbb{R})))$ into a computable topological space. Instead, in Section 5.3, we complete the proof by explicitly computing $\mu$ relative to $\chi$ in terms of the standard rational basis. This construction can be seen as a "derandomization" of the above algorithm.

Alternatively, the above sketch could be interpreted as a degenerate probabilistic process (see Schröder and Simpson [26]) that samples a name of the de Finetti measure with probability one. Schröder [23] shows that representations in terms of probabilistic processes are computably reducible to representations of computable distributions.

The structure of the derandomized argument occurs in other proofs in computable analysis and probability theory. Weihrauch [25, Thm.
3.6] proves a computable integration result via an argument that could likewise be seen as a derandomization of an algorithm that densely subdivides the unit interval at random locations to find continuity sets. Bosserhoff [24, Lem. 2.15] uses a similar argument to compute a basis for a computable metric space, for which every basis element is a continuity set; this suggests an alternative approach to completing our proof. Müller [28, Thm. 3.7] uses a similar construction to find open hypercubes such that for any $\varepsilon > 0$, the probability on their boundaries is less than $\varepsilon$. These arguments also resemble the proof of the classical Portmanteau theorem [5, Thm. 4.25], in which an uncountable family of sets with disjoint boundaries is defined, almost all of which are continuity sets.

5.3. "Derandomized" Construction

Let $m, k \ge 1$ and $C = (c_{ij}) \in \mathbb{Q}^{m \times k}$. By an abuse of notation, we define

$$1_C : [0,1]^k \to [0,1]$$ (40)

to be the indicator function for the set

$$\bigcup_{i=1}^{m} (c_{i1}, 1] \times \cdots \times (c_{ik}, 1].$$ (41)

For $n \in \omega$, we denote by $p_{n,C}$ the polynomial $p_{n,\sigma}$ (as defined in Lemma 4.1), where

$$\sigma := \bigcup_{i=1}^{m} (c_{i1}, 2) \times \cdots \times (c_{ik}, 2) \in \mathcal{L}_{\mathbb{Q}^k}.$$ (42)

Here, we have arbitrarily chosen $2 > 1$ so that the sequence of polynomials $\{p_{n,C}\}_{n \in \omega}$ converges pointwise from below to $1_C$ on $[0,1]^k$.

Let $\vec{x} = (x_1, \ldots, x_k)$ and $\vec{y} = (y_1, \ldots, y_k)$. We can write

$$p_{n,C}(\vec{x}) = p^+_{n,C}(\vec{x}) - p^-_{n,C}(\vec{x}),$$ (43)

where $p^+_{n,C}$ and $p^-_{n,C}$ are polynomials with positive coefficients. Define the $2k$-variable polynomial

$$q_{n,C}(\vec{x}, \vec{y}) := p^+_{n,C}(\vec{x}) - p^-_{n,C}(\vec{y}).$$ (44)

We denote

$$q_{n,C}(V_{\varphi(1)}, \ldots, V_{\varphi(k)}, V_{\zeta(1)}, \ldots, V_{\zeta(k)})$$ (45)

by $q_{n,C}(V_\varphi, V_\zeta)$, and similarly with $p_{n,C}$.

Proposition 5.6. Let $n \in \omega$, let $k, m \ge 1$, let $\sigma \in \mathcal{L}^k_\mathbb{Q}$, and let $C \in \mathbb{Q}^{m \times k}$. Then $\mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\sigma}})$ is a c.e.
real relative to $\chi$, uniformly in $n$, $k$, $m$, $\sigma$, and $C$.

Proof. By Lemma 5.2, relative to $\chi$, and uniformly in $n$, $k$, $m$, $\sigma$, and $C$, each monomial of $p^+_{n,C}(V_\sigma)$ has a c.e. real expectation, and each monomial of $p^-_{n,C}(V_{\overline{\sigma}})$ has a co-c.e. real expectation, and so by the linearity of expectation $\mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\sigma}})$ is a c.e. real.

In the final proof we use the following dense partial order on products of $\mathcal{L}_\mathbb{R}$.

Definition 5.7. Let $k \ge 1$. We call $\psi \in \mathcal{L}^k_\mathbb{R}$ a refinement of $\varphi \in \mathcal{L}^k_\mathbb{R}$, and write $\psi \triangleleft \varphi$, when

$$\overline{\psi(i)} \subseteq \varphi(i)$$ (46)

for all $i \le k$.

We are now ready to prove the main theorem.

Proof of Theorem 2.3 (Computable de Finetti). The distribution $\chi$ (of the exchangeable sequence $X$) is computable relative to the de Finetti measure $\mu$ by Proposition 5.1. We now give a proof of the other direction, showing that the joint distribution of $\{V_\sigma\}_{\sigma \in \mathcal{L}_\mathbb{Q}}$ is computable under the right order topology relative to $\chi$, which by Corollary 3.7 will complete the proof.

Let $k, m \ge 1$, let $\pi \in \mathcal{L}^k_\mathbb{Q}$, and let $C = (c_{ij}) \in \mathbb{Q}^{m \times k}$. For $\zeta \in \mathcal{L}^k_\mathbb{R}$, let $V_\zeta$ denote the $k$-tuple $(V_{\zeta(1)}, \ldots, V_{\zeta(k)})$ and similarly for $V_{\overline{\zeta}}$. Take $1_C$ to be defined as above in (40) and (41). It suffices to show that

$$\mathbb{P}\Bigl(\bigcup_{i=1}^{m} \bigcap_{j=1}^{k} \{V_{\pi(j)} > c_{ij}\}\Bigr) = \mathbb{E}\, 1_C(V_\pi)$$ (47)

is a c.e. real relative to $\chi$, uniformly in $k$, $m$, $\pi$, and $C$. We do this by a series of reductions, which results in a supremum over quantities of the form $\mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\sigma}})$ for $\sigma \in \mathcal{L}^k_\mathbb{Q}$.

By the density of the reals and the continuity of measures, we have that

$$V_\pi = \sup_{\psi \triangleleft \pi} V_\psi \quad \text{a.s.},$$ (48)

where $\psi$ ranges over $\mathcal{L}^k_\mathbb{R}$. It follows that

$$1_C(V_\pi) = \sup_{\psi \triangleleft \pi} 1_C(V_\psi) \quad \text{a.s.},$$ (49)

because $1_C$ is lower-semicontinuous and order-preserving (in each dimension), as (41) is an open set in the right order topology on $[0,1]^k$. Therefore, by the dominated convergence theorem, we have that $\mathbb{E}\, 1_C(V_\pi) = \sup_{\psi \triangleleft \pi} \mathbb{E}\, 1_C(V_\psi)$.
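(50)

Recall that the polynomials $\{p_{n,C}\}_{n \in \omega}$ converge pointwise from below to $1_C$ in $[0,1]^k$. Therefore, by the dominated convergence theorem,

$$\mathbb{E}\, 1_C(V_\psi) = \sup_n \mathbb{E}\, p_{n,C}(V_\psi).$$ (51)

As $V_{\overline{\psi(i)}} \ge V_{\psi(i)}$ a.s. for $i \le k$, we have that

$$\mathbb{E}\, p_{n,C}(V_\psi) = \mathbb{E}\, p^+_{n,C}(V_\psi) - \mathbb{E}\, p^-_{n,C}(V_\psi)$$ (52)

$$\ge \mathbb{E}\, p^+_{n,C}(V_\psi) - \mathbb{E}\, p^-_{n,C}(V_{\overline{\psi}}).$$ (53)

Note that if $\psi$ is a $\nu$-continuity set, then $V_{\psi(i)} = V_{\overline{\psi(i)}}$ a.s., and so

$$\mathbb{E}\, p_{n,C}(V_\psi) = \mathbb{E}\, p^+_{n,C}(V_\psi) - \mathbb{E}\, p^-_{n,C}(V_{\overline{\psi}}).$$ (54)

Again, the dominated convergence theorem gives us

$$\mathbb{E}\Bigl(\prod_{i=1}^{k} V_{\psi(i)}\Bigr) = \sup_{\sigma \triangleleft \psi} \mathbb{E}\Bigl(\prod_{i=1}^{k} V_{\sigma(i)}\Bigr) \quad \text{and}$$ (55)

$$\mathbb{E}\Bigl(\prod_{i=1}^{k} V_{\overline{\psi(i)}}\Bigr) = \inf_{\tau \triangleright \psi} \mathbb{E}\Bigl(\prod_{i=1}^{k} V_{\overline{\tau(i)}}\Bigr),$$ (56)

where $\sigma$ and $\tau$ range over $\mathcal{L}^k_\mathbb{Q}$. Therefore, by the linearity of expectation,

$$\mathbb{E}\, p^+_{n,C}(V_\psi) = \sup_{\sigma \triangleleft \psi} \mathbb{E}\, p^+_{n,C}(V_\sigma) \quad \text{and}$$ (57)

$$\mathbb{E}\, p^-_{n,C}(V_{\overline{\psi}}) = \inf_{\tau \triangleright \psi} \mathbb{E}\, p^-_{n,C}(V_{\overline{\tau}}),$$ (58)

and so, if $\psi$ is a $\nu$-continuity set, we have that

$$\mathbb{E}\, p_{n,C}(V_\psi) = \sup_{\sigma \triangleleft \psi} \mathbb{E}\, p^+_{n,C}(V_\sigma) - \inf_{\tau \triangleright \psi} \mathbb{E}\, p^-_{n,C}(V_{\overline{\tau}})$$ (59)

$$= \sup_{\sigma \triangleleft \psi \triangleleft \tau} \mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\tau}}).$$ (60)

Because $\nu$ has at most countably many point masses, those $\psi \in \mathcal{I}^k_\mathbb{R}$ that are $\nu$-continuity sets are dense in $\mathcal{I}^k_\mathbb{Q}$. On the other hand, for those $\psi$ that are not $\nu$-continuity sets, (60) is a lower bound, as can be shown from (53). Therefore,

$$\sup_{\psi \triangleleft \pi} \mathbb{E}\, p_{n,C}(V_\psi) = \sup_{\psi \triangleleft \pi} \; \sup_{\sigma \triangleleft \psi \triangleleft \tau} \mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\tau}}).$$ (61)

Note that $\{(\sigma, \tau) : (\exists \psi \triangleleft \pi)\ \sigma \triangleleft \psi \triangleleft \tau\} = \{(\sigma, \tau) : \sigma \triangleleft \pi \text{ and } \sigma \triangleleft \tau\}$. Hence

$$\sup_{\psi \triangleleft \pi} \; \sup_{\sigma \triangleleft \psi} \; \sup_{\tau \triangleright \psi} \mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\tau}}) = \sup_{\sigma \triangleleft \pi} \; \sup_{\tau \triangleright \sigma} \mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\tau}}).$$ (62)

Again by dominated convergence we have

$$\sup_{\tau \triangleright \sigma} \mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\tau}}) = \mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\sigma}}).$$ (63)

Combining (47), (50), (51), (61), (62), and (63), we have

$$\mathbb{E}\, 1_C(V_\pi) = \sup_n \; \sup_{\sigma \triangleleft \pi} \mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\sigma}}).$$ (64)

Finally, by Proposition 5.6, the expectation

$$\mathbb{E}\, q_{n,C}(V_\sigma, V_{\overline{\sigma}})$$ (65)

is a c.e. real relative to $\chi$, uniformly in $\sigma$, $n$, $k$, $m$, $\pi$, and $C$.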
Hence the supremum (64) is a c.e. real relative to $\chi$, uniformly in $k$, $m$, $\pi$, and $C$.

6. Exchangeability in Probabilistic Functional Programming Languages

The computable de Finetti theorem has implications for the semantics of probabilistic functional programming languages, and in particular, gives conditions under which it is possible to eliminate modifications of non-local state. Furthermore, an implementation of the computable de Finetti theorem itself performs this code transformation automatically.

For context, we provide some background on probabilistic functional programming languages. We then describe the code transformation performed by the computable de Finetti theorem, using the example of the Pólya urn and Beta-Bernoulli process discussed earlier. Finally, we discuss partial exchangeability and its role in recent machine learning applications.

6.1. Probabilistic Functional Programming Languages

Functional programming languages with probabilistic choice operators have recently been proposed as universal languages for statistical modeling (e.g., IBAL [31], λ◦ [32], Church [33], and HANSEI [34]). Within domain theory, researchers have considered idealized functional languages that can manipulate exact real numbers, such as Escardó's RealPCF+ [35] (based on Plotkin [36]), and functional languages have also been extended by probabilistic choice operators (e.g., by Escardó [37] and Saheb-Djahromi [38]).

The semantics of probabilistic programs have been studied extensively in theoretical computer science in the context of randomized algorithms, probabilistic model checking, and other areas. However, the application of probabilistic programs to universal statistical modeling has a somewhat different character from much of the other work on probabilistic programming languages.
In Bayesian analysis, the goal is to use observed data to understand unobserved variables in a probabilistic model. This type of inductive reasoning, from evidence to hypothesis, can be thought of as inferring the hidden states of a program that generates the observed output. One speaks of the conditional execution of probabilistic programs, in which they are "run backwards" to sample from the conditional probability distribution given the observed data.

A wide variety of algorithms implement conditional inference in probabilistic functional programming. Goodman et al. [33] describe the language Church, which extends a pure subset of Scheme, and whose implementation MIT-Church performs approximate conditional execution via Markov chain Monte Carlo (which can be thought of as a random walk over the execution of a Lisp machine). Park, Pfenning, and Thrun [32] describe the language λ◦, which extends OCaml, and they implement approximate conditional execution by Monte Carlo importance sampling. Ramsey and Pfeffer [39] describe a stochastic lambda calculus whose semantics are given by measure terms, which support the efficient computation of conditional expectations.

Finally, in nonparametric Bayesian statistics, higher-order distributions (e.g., distributions on distributions, or distributions on trees) arise naturally, and so it is helpful to work in a language that can express these types. Probabilistic functional programming languages are therefore a convenient choice for expressing nonparametric models.

The representation of distributions by randomized algorithms that produce samples can highlight algorithmic issues. For example, a distribution will, in general, have many different representations as a probabilistic program, each with its own time, space, and entropy complexity.
For instance, both ways of sampling a Beta-Bernoulli process described in Section 2.1 can be represented in, e.g., the Church probabilistic programming language. One of the questions that motivated the present work was whether there is always an algorithm for sampling from the de Finetti measure when there is an algorithm for sampling the exchangeable sequence. This question was first raised by Roy et al. [40]. The computable de Finetti theorem answers this question in the affirmative, and, furthermore, shows that one can move between these representations automatically. In the following section, we provide a concrete example of the representational change made possible by the computable de Finetti transformation, using the syntax of the Church probabilistic programming language.

6.2. Code Transformations

Church extends a pure subset of Scheme (a dialect of Lisp) with a stochastic, binary-valued flip procedure (see footnote 2), calls to which return independent, Bernoulli($\frac{1}{2}$)-distributed random values in $\{0, 1\}$. Using the semantics of Church, it is possible to associate every closed Church expression (i.e., one without free variables) with a distribution on values. For example, evaluations of the expression

    (+ (flip) (flip) (flip))

produce samples from the Binomial($n = 3$, $p = \frac{1}{2}$) distribution, while evaluations of

    (λ (x) (if (= 1 (flip)) x 0))

always return a procedure, applications of which behave like the probability kernel $x \mapsto \frac{1}{2}(\delta_x + \delta_0)$, where $\delta_r$ denotes the Dirac measure concentrated on the real $r$. Church is call-by-value and so evaluations of

    (= (flip) (flip))

return true and false with equal probability, while the application of the procedure (λ (x) (= x x)) to the argument (flip), written

    ((λ (x) (= x x)) (flip)),

always returns true. (For more examples, see [33].)
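The four Church expressions above can be mirrored in Python to make the call-by-value point concrete. This is a rough analogue rather than Church itself: the function names are hypothetical, and an explicit random number generator `rng` is passed around where Church's `flip` draws its randomness implicitly.

```python
import random


def flip(rng, weight=0.5):
    """Return 1 with probability `weight`, else 0 (a 0/1-valued analogue of flip)."""
    return 1 if rng.random() < weight else 0


def sum_of_three_flips(rng):
    # (+ (flip) (flip) (flip)): one Binomial(3, 1/2) sample.
    return flip(rng) + flip(rng) + flip(rng)


def make_kernel(rng):
    # (lambda (x) (if (= 1 (flip)) x 0)): a procedure acting like the
    # kernel x -> (delta_x + delta_0) / 2.
    return lambda x: x if flip(rng) == 1 else 0


def two_flips_equal(rng):
    # (= (flip) (flip)): two independent draws, equal with probability 1/2.
    return flip(rng) == flip(rng)


def one_flip_equals_itself(rng):
    # ((lambda (x) (= x x)) (flip)): the argument is evaluated once and the
    # resulting value is compared with itself, so the answer is always True.
    return (lambda x: x == x)(flip(rng))


rng = random.Random(0)
kernel = make_kernel(rng)
```

The last two functions differ only in how many times `flip` is evaluated, which is the distinction the call-by-value remark draws.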
In Scheme, unlike Church, one can modify the state of a non-local variable using mutation via the set! procedure. (In functional programming languages, non-local state may be implemented via other methods. For example, in Haskell, one could use the state monad.) If we consider introducing a set! operator to Church, thereby allowing a procedure to modify its environment using mutation, it is not clear how one can, in a manner similar to above, associate procedures with probability kernels and closed expressions with distributions. For example, a procedure could then keep a counter variable and return an increasing sequence of integers on repeated calls. Such a procedure would not correspond to a probability kernel.

A generic way to translate code with mutation into code without mutation is to perform a state-passing transformation, where the state is explicitly threaded throughout the program. In particular, a variable representing state is passed into all procedures as an additional argument, transformed in lieu of set! operations, and returned alongside the original return values at the end of procedures. Under such a transformation, the procedure in the counter variable example would be transformed into one that accepted the current count and returned the incremented count. One downside of such a transformation is that it obscures conditional independencies in the program, and thus complicates inference from an algorithmic standpoint.

Footnote 2: The original Church paper defined the flip procedure to return true or false, but it is easy to move between these two definitions.

An alternative transformation is made possible by the computable de Finetti theorem, which implies that a particular type of exchangeable mutation can be removed without requiring a state-passing transformation. Furthermore, this alternative transformation exposes the conditional independencies.
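As a point of comparison, the generic state-passing transformation described above can be sketched for the counter example. This is an illustrative Python analogue with hypothetical names, not code from Church: the first version mutates non-local state, and the second threads the count through as an explicit argument and return value.

```python
def make_counter():
    """Mutating version: a closure that updates non-local state on each call."""
    count = 0

    def next_value():
        nonlocal count
        count += 1  # plays the role of set!
        return count

    return next_value


def next_value_pure(count):
    """State-passing version: takes the current count, returns (value, new state)."""
    new_count = count + 1
    return new_count, new_count


def run_pure(steps, state=0):
    """Thread the state through `steps` calls; return the values and final state."""
    values = []
    for _ in range(steps):
        value, state = next_value_pure(state)
        values.append(value)
    return values, state
```

In the state-passing version every call depends on the state returned by the previous one, which is the sense in which such a transformation hides any independence structure among the calls.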
The rest of this section describes a concrete example of this alternative transformation, and builds on the mathematical characterization of the Beta-Bernoulli process and the Pólya urn scheme as described in Section 2.1.

Recall that the Pólya urn scheme induces the Beta-Bernoulli process, which can also be described directly as a sequence of independent Bernoulli random variables with a shared parameter sampled from a Beta distribution. In Church it is possible to write code corresponding to both descriptions, but expressing the Pólya urn scheme without the use of mutation requires that we keep track of the counts and thread these values throughout the sequence. If instead we introduce the set! operator and track the number of red and black balls by mutating non-local state, we can compactly represent the Pólya urn scheme in a way that mirrors the form of the more direct description using Beta and Bernoulli random variables.

Fix $a, b > 0$, and define sample-beta-coin and sample-pólya-coin as follows:

(i)
    (define (sample-beta-coin)
      (let ((weight (beta a b)))
        (λ () (flip weight))))

(ii)
    (define (sample-pólya-coin)
      (let ((red a)
            (total (+ a b)))
        (λ ()
          (let ((x (flip (/ red total))))
            (set! red (+ red x))
            (set! total (+ total 1))
            x))))

Recall that, given a Church expression E, the evaluation of the (λ () E) special form in an environment ρ creates a procedure of no arguments whose application results in the evaluation of the expression E in the environment ρ. The application of either sample-beta-coin or sample-pólya-coin returns a procedure of no arguments whose application returns (random) binary values.
In particular, if we sample two procedures my-beta-coin and my-pólya-coin via

    (define my-beta-coin (sample-beta-coin))
    (define my-pólya-coin (sample-pólya-coin))

then repeated applications of both my-beta-coin and my-pólya-coin produce random binary sequences that are Beta-Bernoulli processes.

Evaluating (my-beta-coin) returns 1 with probability weight and 0 otherwise, where the shared weight parameter is itself drawn from a Beta($a, b$) distribution on $[0,1]$. The sequence induced by repeated applications of my-beta-coin is exchangeable because applications of flip return independent samples. Note that the sequence is not i.i.d.; for example, an initial sequence of ten 1's would lead one to predict that the next application is more likely to return 1 than 0. However, conditioned on weight (a variable hidden within the opaque procedure my-beta-coin) the sequence is i.i.d. If we sample another procedure, my-other-beta-coin, via

    (define my-other-beta-coin (sample-beta-coin))

then its corresponding weight variable will be independent, and so repeated applications will generate a sequence that is independent of that generated by my-beta-coin.

The code in (ii) implements the Pólya urn scheme with $a$ red balls and $b$ black balls (see [16, Chap. 11.4]), and so the sequence of return values from repeated applications of my-pólya-coin is exchangeable. Therefore, de Finetti's theorem implies that the distribution of the sequence is equivalent to that induced by i.i.d. draws from the directing random measure. In the case of the Pólya urn scheme, we know that the directing random measure is a random Bernoulli measure whose parameter has a Beta($a, b$) distribution. In fact, the (random) distribution of each sample produced by my-beta-coin is such a random Bernoulli measure. Informally, we can therefore think of sample-beta-coin as the de Finetti measure of the Beta-Bernoulli process.
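The two samplers can be mirrored in Python to see the difference between hidden immutable state and mutated state. This is a sketch under assumptions, not a translation endorsed by the source: the names are hypothetical, `a` and `b` are passed as arguments rather than fixed globally, and an explicit generator `rng` is threaded through. The helper `polya_probability` computes the exact probability of a finite 0/1 sequence under the urn, so that exchangeability can be checked on small cases.

```python
import random
from fractions import Fraction


def sample_beta_coin(a, b, rng):
    """Analogue of (i): draw the weight once; the returned coin never changes it."""
    weight = rng.betavariate(a, b)
    return lambda: 1 if rng.random() < weight else 0


def sample_polya_coin(a, b, rng):
    """Analogue of (ii): the returned coin mutates the urn counts on each call."""
    red, total = a, a + b

    def coin():
        nonlocal red, total
        x = 1 if rng.random() < red / total else 0
        red += x      # plays the role of (set! red (+ red x))
        total += 1    # plays the role of (set! total (+ total 1))
        return x

    return coin


def polya_probability(seq, a, b):
    """Exact probability that the urn yields the 0/1 sequence `seq` (rational a, b)."""
    red, total, prob = Fraction(a), Fraction(a + b), Fraction(1)
    for x in seq:
        p_one = red / total
        prob *= p_one if x == 1 else 1 - p_one
        red += x
        total += 1
    return prob


rng = random.Random(0)
my_beta_coin = sample_beta_coin(1, 1, rng)
my_polya_coin = sample_polya_coin(1, 1, rng)
```

For $a = b = 1$, any ordering of two 1's and one 0 has probability $1/12$ under the urn, which agrees with $\int_0^1 \theta^2 (1 - \theta)\, d\theta = 1/12$ for a uniformly distributed weight.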
Although the distributions on sequences induced by my-beta-coin and my-pólya-coin are identical, there is an important semantic difference between these two implementations caused by the use of set!. While applications of sample-beta-coin produce samples from the de Finetti measure in the sense described above, applications of sample-pólya-coin do not; successive applications of my-pólya-coin produce samples from different distributions, none of which is the directing random measure for the sequence (a.s.). In particular, the distribution on return values changes each iteration as the sufficient statistics are updated (using the mutation operator set!). In contrast, applications of my-beta-coin do not modify non-local state; in particular, the sequence produced by such applications is i.i.d. conditioned on the variable weight, which does not change during the course of execution.

An implementation of the computable de Finetti theorem (Theorem 2.3), specialized to the case of binary sequences (in which case the de Finetti measure is a distribution on Bernoulli measures and is thus determined by the distribution on [0, 1] of the random probability assigned to the value 1), transforms (ii) into a mutation-free procedure whose return values have the same distribution as that of the samples produced by evaluating (beta a b).

In the general case, given a program that generates an exchangeable sequence of reals, an implementation of the computable de Finetti theorem produces a mutation-free procedure generated-code such that applications of the procedure sample-directing-random-measure defined by

    (define (sample-directing-random-measure)
      (let ((shared-randomness (uniform 0 1)))
        (λ () (generated-code shared-randomness))))

sample from the de Finetti measure in the sense described above.
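To make the shape of sample-directing-random-measure concrete, here is a hedged Python sketch for one special case. It is not the output of the algorithm behind the computable de Finetti theorem: generated_code below is written by hand for the urn (ii) with b = 1, where the de Finetti measure is known in closed form. In that case the weight has a Beta(a, 1) distribution, whose CDF on [0, 1] is w^a, so a uniform value u is mapped to a weight by the quantile function u^(1/a). The explicit a and rng parameters are assumptions added for self-containment.

```python
import random


def generated_code(shared_randomness, a, rng):
    """Hand-written stand-in for generated-code, valid only for the urn (ii) with b = 1.

    Maps the shared uniform value to a Beta(a, 1) weight via the quantile
    function u ** (1 / a), then flips a coin with that weight. No state is mutated.
    """
    weight = shared_randomness ** (1.0 / a)
    return 1 if rng.random() < weight else 0


def sample_directing_random_measure(a, rng):
    """Analogue of the Church procedure: fix the shared randomness once per coin."""
    shared_randomness = rng.random()  # analogue of (uniform 0 1)

    def coin():
        return generated_code(shared_randomness, a, rng)

    return coin


rng = random.Random(0)
coin = sample_directing_random_measure(2.0, rng)
draws = [coin() for _ in range(10)]
```

Because coin only reads shared_randomness, repeated applications are i.i.d. conditioned on it, which is the property that distinguishes this form from the mutating urn.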
In particular, (ii) would be transformed into a procedure generated-code such that the sequences produced by repeated applications of the procedures returned by sample-beta-coin and sample-directing-random-measure have the same distribution.

In addition to their simpler semantics, mutation-free procedures are often desirable for practical reasons. For example, having sampled the directing random measure, an exchangeable sequence of random variables can be efficiently sampled in parallel without the overhead necessary to communicate sufficient statistics. Mansinghka [41] describes some situations where one can exploit conditional independence and exchangeability in probabilistic programming languages for improved parallel execution.

6.3. Partial Exchangeability of Arrays and Other Data Structures

The example above involved binary sequences, but the computable de Finetti theorem can be used to transform implementations of real exchangeable sequences. Consider the following exchangeable sequence whose combinatorial structure is known as the Chinese restaurant process (see Aldous [15]). Let α > 0 be a computable real and let H be a computable distribution on R. For n ≥ 1, each X_n is sampled in turn according to the conditional distribution

\[ P[X_{n+1} \mid X_1, \dots, X_n] = \frac{1}{n + \alpha} \Bigl( \alpha H + \sum_{i=1}^{n} \delta_{X_i} \Bigr) \quad \text{a.s.} \tag{66} \]

The sequence {X_n}_{n ≥ 1} is exchangeable and the directing random measure is a Dirichlet process whose "base measure" is αH. Given such a program, we can automatically recover the underlying Dirichlet process prior, samples from which are random measures whose discrete structure was characterized by Sethuraman's "stick-breaking construction" [42]. Note that the random measure is not produced in the same manner as Sethuraman's construction and certainly is not of closed form. But the resulting mathematical objects have the same structure and distribution.
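The conditional distribution (66) translates directly into a sequential sampler. The following Python sketch is illustrative only: the function name, the base_sampler callback standing in for H, and the choice of a standard normal H in the usage lines are assumptions, not part of the paper. At step k (with k values already drawn), a fresh draw from H is taken with probability α/(k + α); otherwise one of the k earlier values is repeated uniformly at random, so each earlier value has probability 1/(k + α).

```python
import random


def sample_crp_sequence(n, alpha, base_sampler, rng):
    """Draw X_1, ..., X_n sequentially according to the conditional law (66)."""
    xs = []
    for k in range(n):
        # With k previous values: new draw from H w.p. alpha / (k + alpha).
        # For k = 0 this probability is 1, so X_1 always comes from H.
        if rng.random() < alpha / (k + alpha):
            xs.append(base_sampler(rng))
        else:
            xs.append(rng.choice(xs))  # repeat an earlier value, uniformly
    return xs


rng = random.Random(0)
standard_normal = lambda r: r.gauss(0.0, 1.0)  # hypothetical choice of H
xs = sample_crp_sequence(50, 1.0, standard_normal, rng)
num_distinct = len(set(xs))
```

Like the urn in (ii), this sampler depends on accumulated state (the list xs); a mutation-free counterpart would instead sample the directing random measure once and then draw i.i.d. values from it.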
Exchangeable sequences of random objects other than reals can often be given de Finetti-type representations. For example, the Indian buffet process, defined by Griffiths and Ghahramani [43], is the combinatorial process underlying a set-valued exchangeable sequence that can be written in a way analogous to the Pólya urn in (ii). Just as the Chinese restaurant process gives rise to the Dirichlet process, the Indian buffet process gives rise to the Beta process (see Thibaux and Jordan [44] for more details).

In the case where the "base measure" of the underlying Beta process is discrete, the resulting exchangeable sequence of sets corresponds to an exchangeable sequence of integer indices (encoding finite subsets of the countable support of the discrete base measure). If we are given such a representation, the computable de Finetti theorem implies the existence of a computable de Finetti measure.

However, the case of a general base measure is more complicated. A "stick-breaking construction" of the Indian buffet process given by Teh, Görür, and Ghahramani [45] is analogous to the code in (i), but samples only a Δ_1-index for the (a.s. finite) sets, rather than a canonical index (see Soare [18, II.2]); however, many applications depend on having a canonical index. These observations were first noted by Roy et al. [40]. Similar problems arise when using the Inverse Lévy Measure method [46] to construct the Indian buffet process. The computable de Finetti theorem is not directly applicable in this case because the theorem pertains only to exchangeable sequences of real random variables, not random sets, although an extension of the theorem to computable Polish spaces might suffice.

Combinatorial structures other than sequences have been given de Finetti-type representational theorems based on notions of partial exchangeability.
For example, an array of random variables is called separately (or jointly) exchangeable when its distribution is invariant under (simultaneous) permutations of the rows and columns and their higher-dimensional analogues. Nearly fifty years after de Finetti's result, Aldous [47] and Hoover [48] showed that the entries of an infinite array satisfying either separate or joint exchangeability are conditionally i.i.d. These results have been connected with the theory of graph limits by Diaconis and Janson [49] and Austin [50] by considering the adjacency matrix of an exchangeable random graph.

As we have seen with the Beta-Bernoulli process and other examples, structured probabilistic models can often be represented in multiple ways, each with its own advantages (e.g., representational simplicity, compositionality, inherent parallelism, etc.). Extensions of the computable de Finetti theorem to partially exchangeable settings could provide analogous transformations between representations on a wider range of data structures, including many that are increasingly used in practice. For example, the Infinite Relational Model [51] can be viewed as an urn scheme for a partially exchangeable array, while the hierarchical stochastic block model constructed from a Mondrian process in [52] is described in a way that mirrors the Aldous-Hoover representation, making the conditional independence explicit.

Acknowledgements

C.E.F. has been partially supported by NSF Grant No. DMS-0901020, and D.M.R. has been partially supported by an NSF Graduate Research Fellowship. Some of the results in this paper were presented at the Computability in Europe conference in Heidelberg, Germany, July 19–24, 2009, and an extended abstract [53] was published in the proceedings.
The authors would like to thank Nate Ackerman, Oleg Kiselyov, Vikash Mansinghka, Hartley Rogers, Chung-chieh Shan, and the anonymous referees of both the extended abstract and the present article for helpful comments.

References

[1] M. Braverman, S. Cook, Computing over the reals: foundations for scientific computing, Notices Amer. Math. Soc. 53 (3) (2006) 318–329.
[2] V. Brattka, P. Hertling, K. Weihrauch, A tutorial on computable analysis, in: S. B. Cooper, B. Löwe, A. Sorbi (Eds.), New computational paradigms: changing conceptions of what is computable, Springer, Berlin, 2008.
[3] A. Edalat, Domains for computation in mathematics, physics and exact real arithmetic, Bull. Symbolic Logic 3 (4) (1997) 401–452.
[4] P. Billingsley, Probability and measure, John Wiley & Sons Inc., New York, third edn., 1995.
[5] O. Kallenberg, Foundations of modern probability, Springer, New York, second edn., 2002.
[6] O. Kallenberg, Probabilistic symmetries and invariance principles, Springer, New York, 2005.
[7] A. P. Dawid, Intersubjective statistical models, in: Exchangeability in probability and statistics (Rome, 1981), North-Holland, Amsterdam, 217–232, 1982.
[8] S. L. Lauritzen, Extreme point models in statistics, Scand. J. Statist. 11 (2) (1984) 65–91.
[9] B. de Finetti, Funzione caratteristica di un fenomeno aleatorio, Atti della R. Accademia Nazionale dei Lincei, Ser. 6. Memorie, Classe di Scienze Fisiche, Matematiche e Naturali 4 (1931) 251–299.
[10] B. de Finetti, La prévision : ses lois logiques, ses sources subjectives, Ann. Inst. H. Poincaré 7 (1) (1937) 1–68.
[11] E. Hewitt, L. J. Savage, Symmetric measures on Cartesian products, Trans. Amer. Math. Soc. 80 (1955) 470–501.
[12] C. Ryll-Nardzewski, On stationary sequences of random variables and the de Finetti's equivalence, Colloq. Math. 4 (1957) 149–156.
[13] J. F. C. Kingman, Uses of exchangeability, Ann. Probability 6 (2) (1978) 183–197.
[14] P. Diaconis, D. Freedman, Partial exchangeability and sufficiency, in: Statistics: applications and new directions (Calcutta, 1981), Indian Statist. Inst., Calcutta, 205–236, 1984.
[15] D. J. Aldous, Exchangeability and related topics, in: École d'été de probabilités de Saint-Flour, XIII—1983, Lecture Notes in Math., vol. 1117, Springer, Berlin, 1–198, 1985.
[16] B. de Finetti, Theory of probability. Vol. 2, John Wiley & Sons Ltd., London, 1975.
[17] H. Rogers, Jr., Theory of recursive functions and effective computability, MIT Press, Cambridge, MA, second edn., 1987.
[18] R. I. Soare, Recursively enumerable sets and degrees, Perspectives in Mathematical Logic, Springer-Verlag, Berlin, 1987.
[19] T. Grubba, M. Schröder, K. Weihrauch, Computable metrization, Math. Logic Q. 53 (4-5) (2007) 381–395.
[20] I. Battenfeld, M. Schröder, A. Simpson, A convenient category of domains, in: Computation, meaning, and logic: articles dedicated to Gordon Plotkin, vol. 172 of Electron. Notes Theor. Comput. Sci., Elsevier, Amsterdam, 69–99, 2007.
[21] K. Weihrauch, Computable analysis: an introduction, Springer, Berlin, 2000.
[22] K. Weihrauch, X. Zheng, Computability on continuous, lower semi-continuous and upper semi-continuous real functions, Theoret. Comput. Sci. 234 (1-2) (2000) 109–133.
[23] M. Schröder, Admissible representations for probability measures, Math. Logic Q. 53 (4-5) (2007) 431–445.
[24] V. Bosserhoff, Notions of probabilistic computability on represented spaces, J. of Universal Comput. Sci. 14 (6) (2008) 956–995.
[25] K. Weihrauch, Computability on the probability measures on the Borel sets of the unit interval, Theoret. Comput. Sci. 219 (1–2) (1999) 421–437.
[26] M. Schröder, A. Simpson, Representing probability measures using probabilistic processes, J. Complex. 22 (6) (2006) 768–782.
[27] V. Brattka, G. Presser, Computability on subsets of metric spaces, Theor. Comput. Sci. 305 (1-3) (2003) 43–76.
[28] N. T. Müller, Computability on random variables, Theor. Comput. Sci. 219 (1-2) (1999) 287–299.
[29] K. Weihrauch, On computable metric spaces Tietze-Urysohn extension is computable, in: J. Blanck, V. Brattka, P. Hertling (Eds.), Computability and Complexity in Analysis, 4th International Workshop, CCA 2000, Swansea, UK, September 17-19, 2000, Selected Papers, vol. 2064 of Lecture Notes in Comput. Sci., Springer, 357–368, 2000.
[30] M. B. Pour-El, J. I. Richards, Computability in analysis and physics, Springer, Berlin, 1989.
[31] A. Pfeffer, IBAL: A probabilistic rational programming language, in: Proc. of the 17th Int. Joint Conf. on Artificial Intelligence, Morgan Kaufmann Publ., 733–740, 2001.
[32] S. Park, F. Pfenning, S. Thrun, A probabilistic language based on sampling functions, ACM Trans. Program. Lang. Syst. 31 (1) (2008) 1–46.
[33] N. D. Goodman, V. K. Mansinghka, D. M. Roy, K. Bonawitz, J. B. Tenenbaum, Church: a language for generative models, in: Uncertainty in Artificial Intelligence, 2008.
[34] O. Kiselyov, C. Shan, Embedded probabilistic programming, in: W. M. Taha (Ed.), Domain-Specific Languages, IFIP TC 2 Working Conference, DSL 2009, Oxford, UK, July 15-17, 2009, Proceedings, vol. 5658 of Lecture Notes in Comput. Sci., Springer, 360–384, 2009.
[35] M. Escardó, T. Streicher, Induction and recursion on the partial real line with applications to Real PCF, Theoret. Comput. Sci. 210 (1) (1999) 121–157.
[36] G. D. Plotkin, LCF considered as a programming language, Theoret. Comput. Sci. 5 (3) (1977/78) 223–255.
[37] M. Escardó, Semi-decidability of may, must and probabilistic testing in a higher-type setting, Electron. Notes in Theoret. Comput. Sci. 249 (2009) 219–242.
[38] N. Saheb-Djahromi, Probabilistic LCF, in: Mathematical Foundations of Computer Science, 1978 (Proc. Seventh Sympos., Zakopane, 1978), vol. 64 of Lecture Notes in Comput. Sci., Springer, Berlin, 442–451, 1978.
[39] N. Ramsey, A. Pfeffer, Stochastic lambda calculus and monads of probability distributions, Proc. of the 29th ACM SIGPLAN-SIGACT Symp. on Principles of Program. Lang. (2002) 154–165.
[40] D. M. Roy, V. K. Mansinghka, N. D. Goodman, J. B. Tenenbaum, A stochastic programming perspective on nonparametric Bayes, in: Nonparametric Bayesian Workshop, Int. Conf. on Machine Learning, 2008.
[41] V. K. Mansinghka, Natively probabilistic computation, Ph.D. thesis, Massachusetts Institute of Technology, 2009.
[42] J. Sethuraman, A constructive definition of Dirichlet priors, Statistica Sinica 4 (1994) 639–650.
[43] T. L. Griffiths, Z. Ghahramani, Infinite latent feature models and the Indian buffet process, in: Adv. in Neural Inform. Processing Syst. 17, MIT Press, Cambridge, MA, 475–482, 2005.
[44] R. Thibaux, M. I. Jordan, Hierarchical beta processes and the Indian buffet process, in: Proc. of the 11th Conf. on A.I. and Stat., 2007.
[45] Y. W. Teh, D. Görür, Z. Ghahramani, Stick-breaking construction for the Indian buffet process, in: Proc. of the 11th Conf. on A.I. and Stat., 2007.
[46] R. L. Wolpert, K. Ickstadt, Simulation of Lévy random fields, in: Practical nonparametric and semiparametric Bayesian statistics, vol. 133 of Lecture Notes in Statist., Springer, New York, 227–242, 1998.
[47] D. J. Aldous, Representations for partially exchangeable arrays of random variables, J. Multivariate Analysis 11 (4) (1981) 581–598.
[48] D. N. Hoover, Relations on probability spaces and arrays of random variables, preprint, Institute for Advanced Study, Princeton, NJ, 1979.
[49] P. Diaconis, S. Janson, Graph limits and exchangeable random graphs, Rendiconti di Matematica, Ser. VII 28 (1) (2008) 33–61.
[50] T. Austin, On exchangeable random variables and the statistics of large graphs and hypergraphs, Probab. Surv. 5 (2008) 80–145.
[51] C. Kemp, J. Tenenbaum, T. Griffiths, T. Yamada, N. Ueda, Learning systems of concepts with an infinite relational model, in: Proc. of the 21st Nat. Conf. on Artificial Intelligence, 2006.
[52] D. M. Roy, Y. W. Teh, The Mondrian process, in: Adv. in Neural Inform. Processing Syst. 21, 2009.
[53] C. E. Freer, D. M. Roy, Computable exchangeable sequences have computable de Finetti measures, in: K. Ambos-Spies, B. Löwe, W. Merkle (Eds.), Mathematical Theory and Computational Practice (CiE 2009), Proc. of the 5th Conf. on Computability in Europe, vol. 5635 of Lecture Notes in Comput. Sci., Springer, 218–231, 2009.