Distributions.jl: Definition and Modeling of Probability Distributions in the JuliaStats Ecosystem

Distributions.jl : Deﬁnition and Mo deling of Probabilit y Distributions in the JuliaStats Ecosystem Mathieu Besançon Zuse Institute Berlin Theo dore P apamark ou The Univ ersity of Manc hester Da vid Anthoﬀ Univ ersity of California at Berk eley Alex Arslan Beacon Biosignals, Inc. Simon Byrne California Institute of T ec hnology Dah ua Lin Chinese Univ ersity of Hong K ong John P earson Duk e Universit y Medical Cen ter Abstract Random v ariables and their distributions are a cen tral part in many areas of statistical metho ds. The Distributions.jl pac kage pro vides Julia users and developers to ols for work- ing with probabilit y distributions, lev eraging Julia features for their in tuitive and ﬂexible manipulation, while remaining highly eﬃcient through zero-cost abstractions. K eywor ds : Julia , distributions, mo deling, interface, mixture, KDE, sampling, probabilistic programming, inference. 1. In tro duction The Distributions.jl pac kage ( JuliaStats 2019 ) deﬁnes interfaces for represen ting probabilit y distributions and other mathematical objects describing the generation of random samples. It is part of the JuliaStats op en-source organization hosting pac kages for core mathemati- cal and statistical functions ( StatsF uns.jl , PDMats.jl ), mo del ﬁtting ( GLM.jl for generalized linear mo dels, Hyp othesisT ests.jl , MixedMo dels.jl , KernelDensity .jl , Surviv al.jl ) and utilities ( Distances.jl , Rmath.jl ). Distributions.jl deﬁnes generic and sp eciﬁc b ehavior required for distributions and other “sample-able” objects. The package implemen ts a large n umber of distributions to b e di- rectly usable for probabilistic mo deling, estimation and simulation problems. It lev erages idiomatic features of Julia including multiple dispatc h and the type system, b oth presented in more detail in Bezanson, Edelman, Karpinski, and Shah ( 2017 ) and Bezanson et al. ( 2018 ). In man y applications, including but not limited to ph ysical sim ulation, mathematical opti- mization or data analysis, computer programs need to generate random num b ers from sample spaces or more sp eciﬁcally from a probabilit y distribution or to infer a probability distribution giv en prior knowledge and observ ed samples. Distributions.jl uniﬁes these use cases under one set of mo deling to ols. 2 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem A core idea of the package is to build an equiv alence b etw een mathematical ob jects and corresp onding Julia t yp es. A probability distribution is therefore represen ted by a type with a given b eha vior provided through the implementation of a set of metho ds. This equiv alence mak es the syntax of programs ﬁt the semantics of the mathematical ob jects. Related soft ware In the scipy.stats 1 mo dule of the SciPy project ( Virtanen et al. 2020 ), distributions are created and manipulated as ob jects in the object-oriented programming sense. Metho ds are deﬁned for a distribution class, then sp ecialized for contin uous and discrete distribution classes inheriting from the generic distribution, from which inherit classes representing actual distribution families. Represen tations of probabilit y distributions ha v e also b een implemen ted in statically-typed functional programming languages, in Hask ell in the Probabilistic F unctional Programming pac kage ( Erwig and Kollmansberger 2006 ), and in OCaml ( Kiselyo v and Shan 2009 ), supp ort- ing only discrete distributions as an asso ciation b etw een collection elemen ts and probabilities. The w ork in Ścibior, Ghahramani, and Gordon ( 2015 ) presents a generic monad-based repre- sen tation of probability distributions allowing for b oth contin uous and discrete distributions. The stats pac kage, whic h is distributed as part of the R language, includes functions related to probabilit y distributions which use a preﬁx naming conv en tion: r dist for random sampling, d dist for computing the probabilit y densit y , p dist for computing the cumulativ e distribution function, and q dist for computing the quantiles. The distr pac kage ( R uckdesc hel, K ohl, Stabla, and Camphausen 2006 ) also allo ws R users to deﬁne their own distribution as a class of the S4 ob ject-oriented system, with four functions r, d, p, q stored by the ob ject. Only one of the four functions has to b e pro vided b y the user when creating a distribution ob ject, the other functions are computed in a suitable wa y . This approac h increases ﬂexibilit y but implies a run time cost dep ending on whic h function has b een pro vided to deﬁne a given distribution ob ject. F or instance, when only the random generation function is provided, the RtoDPQ function empirically constructs an estimation for the others, which requires drawing samples instead of directly ev aluating an analytical density . In C++ , the b o ost library ( Bo ost Dev elop ers 2018 ), the maths comp onent includes common distributions and computations up on them. As in Distributions.jl , probabilit y distributions are represen ted by t yp es (classes), as opp osed to functions. The underlying numeric t yp es are rendered generic by the use of templates. The parameters of eac h distribution are accessed through exposed metho ds, while common op erations are deﬁned as non-mem b er functions, th us sharing a similar syn tax with single dispatch. 2 Design and implemen tation prop osals for a m ultiple dispatc h mec hanism in C++ hav e b een inv estigated in Pirk elbauer, Solo dkyy , and Stroustrup ( 2010 ) and describ ed as a library in Le Go c and Donzé ( 2015 ), which would allow for more sophisticated dispatc h rules in the Bo ost distribution interface. This pap er is organized as follo ws. Section 2 presen ts the main t yp es deﬁned in the package and their hierarch y . In Section 3 , the Sampleable t yp e and the asso ciated sampling interface are presen ted. Section 4 presen ts the distribution interface, whic h is the central part of the pac kage. Section 5 presents the av ailable to ols for ﬁtting and estimation of distributions 1 https://docs.scipy.org/doc/scipy/reference/stats.html 2 https://www.boost.org/doc/libs/1_69_0/libs/math/doc/html/math_toolkit/stat_tut/overview/ generic.html Mathieu Besançon, Da vid Anthoﬀ, Alex Arslan, Simon Byrne, Dahua Lin, Theo dore P apamarkou, John Pearson 3 from data using parametric and non-parametric tec hniques. Section 6 presents mo deling to ols and algorithms for mixtures of distributions. Section 7 highligh ts t w o applications of Distributions.jl in related pac kages for k ernel densit y estimation and the implemen tation of Probabilistic Programming Languages in pure Julia . Section 8 concludes on the w ork presen ted and on futu re dev elopment of the ecosystem for probability distributions in Julia . 2. T yp e hierarc h y The Julia language allows the deﬁnition of new t yp es and their use for specifying function argumen ts using the multiple dispatch mec hanism ( Zappa Nardelli, Bely ako v a, Pelenitsyn, Ch ung, Bezanson, and Vitek 2018 ). Most common probabilit y distributions can b e broadly classiﬁed along tw o facets: • the dimensionalit y of the v alues (e.g., univ ariate, multiv ariate, matrix v ariate) • whether it has discr ete or c ontinuous supp ort, corresp onding to a densit y with resp ect to a coun ting measure or a Leb esgue measure In the Julia t yp e system seman tics, these prop erties can b e captured b y adding t yp e pa- rameters c haracterizing the random v ariable to the distribution t yp e whic h represen ts it. P arametric t yping mak es these pieces of information on the sample space av ailable to the Julia compiler, allo wing dispatc h to b e performed at compile-time, making the op eration a zero-cost abstraction. Distribution is an abstract t yp e that tak es t wo parameters: a VariateForm t yp e whic h describ es the dimensionality , and ValueSupport type whic h describes the discreteness or con tinuit y of the supp ort. These “prop erty types” ha ve singleton subtypes which en umerate these prop erties: julia> abstract type VariateForm end julia> struct Univariate <: VariateForm end julia> struct Multivariate <: VariateForm end julia> struct MatrixVariate <: VariateForm end julia> abstract type ValueSupport end julia> struct Discrete <: ValueSupport end julia> struct Continuous <: ValueSupport end V arious t yp e aliases are then deﬁned for user con v enience: julia> DiscreteUnivariateDistribution = Distribution{Univariate, Discrete} julia> ContinuousUnivariateDistribution = Distribution{Univariate, Continuous} julia> DiscreteMultivariateDistribution = Distribution{Multivariate, Discrete} julia> ContinuousMultivariateDistribution = Distribution{Multivariate, Continuous} The Julia <: op erator in the deﬁnition of a new t yp e sp eciﬁes the direct sup ertype. Sp eciﬁc distribution families are then implemented as sub-types of Distribution : t ypically these are deﬁned as comp osite t yp es (" struct ") with ﬁelds capturing the parameters of the distribution. F urther information on the t yp e system can be found in Appendix B . F or example, the univ ariate uniform distribution on ( a, b ) is deﬁned as: julia> struct Uniform{T<:Real} <: ContinuousUnivariateDistribution 4 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem a::T b::T end Note in this case the Uniform distribution is itself a parametric t yp e, this allo ws it to make use of diﬀerent numeric types. By default these are Float64 , but they can also b e Float32 , BigFloat , Rational , the Dual type from F orwardDiﬀ.jl in order to support features like automatic diﬀerentiation ( Rev els, Lubin, and Papamark ou 2016 ), or user deﬁned num b er t yp es. Probabilities are assigned to subsets in a sample space, probabilistic t yp es are qualiﬁed based on this sample space from a VariateForm corresp onding to ranks of the samples (scalar, v ector, matrix, tensor) and a ValueSupport corresp onding to the set from which eac h scalar elemen t is restricted. Other t yp es of sample spaces can b e deﬁned for diﬀerent use cases by implemen ting new sub-t yp es. W e provide t wo examples b elow, one for ValueSupport and one for VariateForm . There are diﬀerent p ossibilities to represen t sto c hastic pro cesses using the to ols from Distri- butions.jl . One p ossibilit y is to deﬁne them as a new ValueSupport type. julia> using Distributions julia> abstract type StochasticProcess <: ValueSupport end julia> struct ContinuousStochasticProcess <: StochasticProcess end julia> abstract type ContinuousStochasticSampler{F<:VariateForm} <: Sampleable{F,ContinuousStochasticProcess} end More complete examples of represen tations of sto chastic pro cesses can b e found in Bridge.jl ( Sc hauer 2018 ). Distributions.jl can also b e extended to supp ort tensor-v ariate random v ari- ables in a similar fashion: julia> using Distributions julia> struct TensorVariate <: VariateForm end julia> abstract type TensorSampleable{S<:ValueSupport} <: Sampleable{TensorVariate,S} end This allo ws other developers to deﬁne their o wn mo d els on top of Distributions.jl without requiring the mo diﬁcation of the pac kage, while end-users b eneﬁt from the same in terface and con ven tions, regardless of whether one t yp e was deﬁned in Distributions.jl or in an external pac kage. The types describing a probabilistic sampling pro cess then dep end on t w o t yp e parameters inheriting from VariateForm and ValueSupport . The most generic form of suc h a construct is represented b y the Sampleable type, deﬁning something from whic h random samples can b e dra wn: julia> abstract type Sampleable{F<:VariateForm,S<:ValueSupport} end A Distribution is a sub-type of Sampleable , carrying the same t yp e parameter capturing the sample space: julia> abstract type Distribution{F<:VariateForm,S<:ValueSupport} <: Sampleable{F,S} end Mathieu Besançon, Da vid Anthoﬀ, Alex Arslan, Simon Byrne, Dahua Lin, Theo dore P apamarkou, John Pearson 5 A Distribution is more sp eciﬁc than a Sampleable , it describ es the probability law map- ping elements of a σ -algebra (subsets of the sample space) to corresponding probabilities of o ccurrence and is asso ciated with corresp onding probability distribution functions (CDF, PDF). As such, it extends the required interface as detailed in Section 4 . In Distributions.jl , distribution families are represen ted as types and particular distributions as instances of these t yp es. One adv antage of this structure is the ease of deﬁning a new distribution family by creating a sub-t yp e of Distribution resp ecting the in terface. The b ehavior of distributions can also b e extended b y deﬁning new functions ov er all sub-types of Distribution and using the in terface. 3. Sampling in terface Some programs require the generation of random v alues in a certain fashion without requiring the analytical closed-form probability distribution. The Sampleable type and interface serve these use cases. A random quantit y drawn from a sample space with given probability distribution requires a w ay to sample v alues. Suc h construct from which v alues can b e sampled is programmatically deﬁned as an abstract type Sampleable parameterized by F and S . The ﬁrst type parameter F classiﬁes sampling ob jects and distribution b y their dimension, univ ariate distributions asso ciated with scalar random v ariables, multiv ariate distributions asso ciated with vec tor random v ariables and matrix-v ariate distributions asso ciated with random matrices. The second type parameter S sp eciﬁes the supp ort, discrete or con tinuous. New v alue supp ort and v ariate form types can also b e deﬁned, subt yp ed from ValueSupport or VariateForm . F urthermore, probability distributions are mathematical functions to consider as immutable. A Sampleable on the other hand can b e a m utable ob ject as shown b elow. Example 3.1. Consider the following implemen tation of a discrete N-state Mark o v chain as a Sampleable . julia> using Distributions julia> mutable struct MarkovChain <: Sampleable{Univariate,Discrete} s::Int m::Matrix{Float64} end A type implementation can only b e a subt yp e of an abstract type as Sampleable . With the structure deﬁned, we can implemen t the required metho d for a Sampleable , namely rand whic h is deﬁned in Base Julia. julia> import Base: rand julia> import Random julia> function rand(rng, mc::MarkovChain) r = rand(rng) v = cumsum(mc.m[mc.s,:]) idx = findfirst(x -> x >= r, v) mc.s = idx return idx end julia> rand(mc::MarkovChain) = rand(Random.GLOBAL_RNG, mc) 6 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem rng is a random num b er generator (RNG) ob ject, passing it as an argumen t makes the rand implemen tation predictable. If repro ducibility is not imp ortant to the use case, rand can b e called as rand(mc) as implemen ted in the second metho d, using the global random num b er generator Random.GLOBAL_RNG . Note that the Mark ov c h ain implemen tation could hav e b een deﬁned in an immutable wa y . This implementations how ever highlights one possible deﬁnition of a Sampleable diﬀerent from a probabilit y distribution. This ﬂexibilit y of implementation comes with a homogeneous in terface, any item from which random samples can b e generated is called in the same fashion. F rom the user p ersp ectiv e, where a sampler is deﬁned has no impact on its use thanks to Julia multiple dispatch mechanism: the correct metho d (function implementation) is called dep ending on the input t yp e. The rand function is the only required element for deﬁning the sampling with a particular pro cess. Diﬀeren t metho ds are deﬁned for this function, sp ecifying the pseudo-random n um- b er generator (PRNG) to b e used. The default RNG uses the Mersenne-T wister algorithm ( Matsumoto and Nishimura 1998 ). The sampling state is kept in a global v ariable used by default when no RNG is provided. New random num b er generators can b e deﬁned b y users, sub-t yp ed from the Random.AbstractRNG t yp e. 4. Distribution in terface and t yp es The core of the package are probabilit y distributions, deﬁned as an abstract t yp e as pre- sen ted in 2 . An y distribution m ust implement the rand metho d, which means random v alues follo wing the distribution can b e generated. The t wo other essen tial metho ds to implement are pdf giving the probability densit y function at one point for con tinuous distributions or the probability mass function at one point for discrete distributions and cdf ev aluat- ing the Cumulated Densit y F unction at one p oint. The quantile method from the stan- dard library Statistics mo dule can be implemented for a Distribution type, with the form quantile(d::Distribution,p::Number) and returning the v alue corresp onding to the cor- resp onding cumulativ e probabilit y p . Give n that the method rand() without any argumen t follo ws a uniform pseudo-random num b er in the in terv al [0 , 1] , a default fall-bac k method for random num b er generation can b e deﬁned b y in v erse transform sampling for a univ ariate distribution as: julia> rand(d::UnivariateDistribution) = quantile(d, rand()) The equiv alen t R functions for the normal distribution can b e matched to the Distributions.jl w ay of expressing them as follows: julia> using Distributions julia> rnorm(n, mu, sig) = rand(Normal(mu, sig), n) julia> dnorm(x, mu, sig) = pdf(Normal(mu, sig), x) julia> pnorm(x, mu, sig) = cdf(Normal(mu, sig), x) julia> qnorm(p, mu, sig) = quantile(Normal(mu, sig), p) The adv antage of using multiple dispatc h is that supp orting a new distribution only requires the package API to grow b y one elemen t whic h is the new distribution type, instead of four new functions. Most common probability distributions are deﬁned by a mathematical form and a set of parameters. All distributions must implemen t the params metho d from the Mathieu Besançon, Da vid Anthoﬀ, Alex Arslan, Simon Byrne, Dahua Lin, Theo dore P apamarkou, John Pearson 7 StatsBase.jl , allowing the f ollo wing example to alwa ys w ork for any given distribution type Dist and d an instance of the distribution: julia> p = params(d) julia> new_dist = Dist(p...) Dep ending on the distribution, sampling a batch of data can b e done in a more eﬃcien t manner than multiple individual samples. Sp ecialized samplers can b e provided for distribu- tions, letting user sample batches of v alues. Sampling from the Kolmogoro v distribution for instance is done b y pulling atomic samples using an alternating series metho d ( Devroy e 1986 , IV.5), while the P oisson distribution lets users use either a coun ting sampler or Ahrens-Dieter sampler ( Ahrens and Dieter 1982 ). W e will not expand explanations on most trivial functions of the distribution interface, suc h as minimum , maximum whic h can b e deﬁned in terms of support . Other v alues are optional to deﬁne for distributions, such as mean , median , variance . Not deﬁning these metho ds as mandatory allo ws for instance for the Cauch y-Lorentz distribution to b e deﬁned. 5. Distribution ﬁtting and estimation Giv en a collection of samples and a distribution dep enden t on a vector of parameters, the distribution ﬁtting task consists in ﬁnding an estimation of the distribution parameters ˆ θ . 5.1. Maximum lik eliho o d estimation Maxim um Likelihoo d is a common technique for estimating the parameters θ of a distribution giv en observ ations ( Wilks 1938 ; Aldrich 1997 ), with numerous applications in statistics but also in signal pro cessing ( Pham and Garat 1997 ). The Distributions.jl in terface is deﬁned as one fit_mle function with t wo metho ds: Distributions.fit_mle(D, x) Distributions.fit_mle(D, x, w) With D a distribution t yp e to ﬁt, x b eing either a v ector of observ ations if D is univ ariate, a ma- trix of individuals/attributes if D is m ultiv ariate, and w weigh ts for the individual data p oints. The function call returns an instance of D with the parameter adjusted to the observ ations from x . Example 5.1. Fitting a normal distribution b y maximum likelihoo d. julia> using Distributions julia> import Random julia> Random.seed!(33) julia> xs = rand(Normal(50.0, 10.0), 1000) julia> fit_mle(Normal, xs) Distributions.Normal{Float64}( µ =49.781, σ =10.002) A dditional partial information can b e provided with distribution-sp eciﬁc keyw ords. fit_mle(Normal, x) accepts for instance the keyw ord arguments mu and sigma to sp ecify either a ﬁxed mean or standard deviation. 8 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem The maximum likelihoo d estimation (MLE) function is implemen ted for sp eciﬁc distribution t yp es, corresp onding to the fact that there is no default and eﬃcient MLE algorithm for a generic distribution. The implementations present in Distributions.jl only require the com- putation of suﬃcien t statistics. F or a more adv anced use cases, section D.3 presen ts the maximization of the likelihoo d for a custom distribution ov er a real dataset with a Cartesian pro duct of univ ariate distributions. The b ehavior can b e extended to a new distribution t yp e D b y implementing fit_mle for it: julia> using Distributions julia> struct D <: ContinuousUnivariateDistribution end julia> function Distributions.fit_mle(::Type{D}, xs::AbstractVector) # implementation end 5.2. Non-parametric estimation Some distribution estimation tasks hav e to b e carried out without the kno wledge of the dis- tribution form or family for v arious reasons. Non-parametric estimation generally comprises the estimation of the Cum ulative Density F unction and of the Probability Densit y F unction. StatsBase.jl provides the ecdf function, a higher order function returning the empirical CDF at any new p oint. Since the empirical CDF is built as a weigh ted sum of Heaviside functions, it is discon tinuous. StatsBase.jl also pro vides a generic fit function, used to build density histograms. Histogram is a t yp e storing the bins and heigh ts, other functions from Base Julia are deﬁned on Histogram for con venience. fit for histogram is deﬁned with the following signature: StatsBase.fit(Histogram, data[, weight][, edges]; closed=:right, nbins) 6. Mo deling mixtures of distributions Mixture mo dels are used for mo deling p opulations which consists of multiple sub-p opulations; mixture mo dels are often applied to clustering or noise separation tasks. A mixture distribution consists of sev eral c omp onent distributions , eac h of which has a relativ e c omp onent weight or c omp onent prior pr ob ability . F or a mixture of n comp onen ts, then the densities can b e written as a w eigh ted sum: f mix ( x ; π , θ ) = n X i =1 π i f ( x, θ i ) where the parameters are the comp onent weigh ts π = ( π 1 , . . . , π n ) , taking v alues on the unit simplex, and eac h θ i parameterizes the i th comp onen t distribution. Sampling from a mixture model consists of ﬁrst selecting the comp onen t according to the relativ e w eight, then sampling from the corresp onding comp onen t distribution. Therefore a mixture mo del can also b e interpreted as a hierarc hical mo del. Mixtures are deﬁned as an abstract sub-type of Distribution , parameterized by the same t yp e information on the v ariate form and v alue supp ort: Mathieu Besançon, Da vid Anthoﬀ, Alex Arslan, Simon Byrne, Dahua Lin, Theo dore P apamarkou, John Pearson 9 julia> abstract type AbstractMixtureModel{VF<:VariateForm,VS<:ValueSupport} <: Distribution{VF, VS} end An y AbstractMixtureModel is therefore a Distribution and therefore implements sp ecial- ized the mandatory metho ds insupport , mean , var , etc. Mixture mo dels also need to imple- men t the following b ehavior: • ncomponents(d::AbstractMixtureModel) : returns the n umber of comp onents in the mixture • component(d::AbstractMixtureModel, k) : returns the k-th comp onen t • probs(d) : returns the vector of prior probabilities ov er comp onents A concrete generic implemen tation is then deﬁned as: julia> struct MixtureModel{VF<:VariateForm,VS<:ValueSupport,Component<:Distribution} <: AbstractMixtureModel{VF,VS} components::Vector{Component} prior::Categorical end Once constructed, it can b e manipulated as an y distribution with the diﬀeren t metho ds of the in terface. Example 6.1. Figure 1 shows the plot resulting from the follo wing construction of a uni- v ariate Gaussian mixture mo del. julia> using Distributions julia> import Plots julia> gmm = MixtureModel( Normal.([-1.0, 0.0, 3.0], [0.3, 0.5, 1.0]), [0.25, 0.25, 0.5], ) julia> xs = -2.0:0.01:6.0 julia> Plots.plot(xs, pdf.(gmm, xs), legend=nothing) julia> Plots.ylabel!("\$f_X(x)\$") julia> Plots.xlabel!("\$x\$") julia> Plots.title!("Gaussian mixture PDF") The MixtureModel constructor is taking as argument a vector of Normal distributions, deﬁned here from a v ector of mean v alues and a vector of standard deviation v alues. Mixture mo dels remain an activ e area of research and of dev elopment within Distributions.jl , estimation metho ds, and impro v ed multiv ariate and matrix-v ariate supp ort are plann ed for future releases. A more adv anced example of estimation of a Gaussian mixture b y a generic EM algorithm is presen ted in section D.4 . 7. Applications of Distributions.jl 10 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem Figure 1: Example PDF of a univ ariate Gaussian mixture Distributions.jl has become a foundation for statistical programming in Julia , notably used for economic mo deling ( F ederal Reserv e Bank Of New Y ork 2019 ; Lyon, Sargent, and Stach urski 2017 ), Marko v chain Monte Carlo simulations ( Smith 2018 ) or randomized black-box opti- mization F eldt ( 2017 ) 3 . W e presen t b elo w tw o applications of the package for non -parametric con tinuous density estimation and probabilistic programming. 7.1. Kernel density estimation Probabilit y densit y functions can b e estimated in a non-parametric fashion using kernel den- sit y estimation (KDE, Rosenblatt 1956 ; Parzen 1962 ). This is supp orted through the Ker- nelDensit y .jl pac kage, deﬁning the kde function to infer an estimate density from data. Both univ ariate and biv ariate density estimates are supp orted. Most of the algorithms and param- eter selection heuristics developed in KernelDensity .jl are based on Silverman ( 2018 ). Ker- nelDensit y .jl supp orts m ultiple distributions as base kernels, and can b e extended to other families. The default kernel used is Gaussian. Example 7.1. W e highlight the estimation of a kernel density on data generated from the mixture of a log-normal and uniform distributions. julia> import Random, Plots, KernelDensity julia> using Distributions julia> function generate_point(rng = Random.GLOBAL_RNG) thres = rand(rng) if thres >= 0.5 rand(rng, LogNormal()) else rand(rng, Uniform(2.0, 3.0)) end end julia> mt = Random.MersenneTwister(42) julia> xs = [generate_point(mt) for _ in 1:5000] 3 Other packages depending on Distributions.jl can b e found on the Julia General registry: https://github. com/JuliaRegistries/General Mathieu Besançon, Da vid Anthoﬀ, Alex Arslan, Simon Byrne, Dahua Lin, Theo dore P apamarkou, John Pearson 11 Figure 2: A djusted KDE with v arious bandwidths julia> bandwidths = [0.05, 0.1, 0.5] julia> densities = [KernelDensity.kde(xs, bandwidth = bw) for bw in bandwidths] julia> p = Plots.plot() julia> for (b,d) in zip(bandwidths, densities) Plots.plot!(p, d.x, d.density, labels = "bw = $b") end julia> Plots.xlims!(p, 0.0, 8.0) julia> Plots.title!("KDE with Gaussian Kernel") The bandwidth bw = 0 . 1 seems not to o verﬁt isolated data p oin ts without smo othing out imp ortan t comp onents. All examples pro vided use the Plots.jl 4 pac kage to plot results, whic h can b e used with v arious plotting engines as bac k-end. W e can compare the kernel densit y estimate to the real PDF as done in the following script and illustrated Figure 3 . julia> xvals = 0.01:0.01:8.0 julia> yvals = map(xvals) do x comp1 = pdf(LogNormal(), x) comp2 = pdf(Uniform(2.0, 3.0), x) 0.5 * comp1 + 0.5 * comp2 end julia> p = Plots.plot(xvals, yvals, labels = "Real distribution") julia> kde = KernelDensity.kde(xs, bandwidth = 0.1) julia> Plots.plot!(p, kde.x, kde.density, labels = "KDE") julia> Plots.plot!(p, xvals, yvals, labels = "Real distribution") julia> Plots.xlims!(p, 0.0, 8.0) The kernel density estimation tec hnique relies on a base distribution (the k ernel) whic h is con voluted against the data. The pac kage uses directly the in terface presented in section 4 to accept a Distribution as second parameter. The follo wing script computes the kernel density estimates with a Gaussian and triangular distributions. The result is illustrated Figure 4 . julia> mt = Random.MersenneTwister(42) 4 h ttp://do cs.juliaplots.org 12 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem Figure 3: Comparison of the exp erimen tal and real PDF Figure 4: Comparison of diﬀerent kernels deﬁned as Distribution ob jects julia> ( µ , σ ) = (5.0, 1.0) julia> xs = [rand(mt, Normal( µ , σ )) for _ in 1:50] julia> ndist = Normal(0.0, 0.3) julia> gkernel = kde(xs, ndist) julia> tdist = TriangularDist(-0.5, 0.5) julia> tkernel = KernelDensity.kde(xs, tdist) julia> p = Plots.plot(tkernel.x, tkernel.density, labels = "Triangular kernel") julia> Plots.plot!(p, gkernel.x, gkernel.density, labels = "Gaussian kernel", legend = :left) julia> Plots.title!(p, "Comparison of Gaussian and triangular kernels") The density estimator is computed via the F ourier transform: any distribution with a deﬁned c haracteristic function (via the cf function from Distributions.jl ) can b e used as a k ernel. The abilit y to manipulate distributions as t yp es and ob jects allo ws end-users and pac kage dev elop ers to compose on top of deﬁned distributions and to deﬁne their own to use for k ernel densit y estimations without an y additional run time cost. The kernel estimation of a bi-v ariate Mathieu Besançon, Da vid Anthoﬀ, Alex Arslan, Simon Byrne, Dahua Lin, Theo dore P apamarkou, John Pearson 13 densit y is shown in D.2 . 7.2. Probabilistic programming languages Probabilistic programs are computer programs with the added capability of dra wing the v alue of a v ariable from a probability distribution and conditioning v ariable v alues on observ ations ( Gordon, Henzinger, Nori, and Rajamani 2014 ). They allow for the sp eciﬁcation of complex relations b etw een random v ariables using a set of constructs that go b eyond t ypical graphical mo dels to include ﬂo w con trol (e.g., lo ops, conditions), recursion, user-deﬁned t yp es and other algorithmic building blo c ks, diﬀering from explicit graph construction by users, as done for instance in Smith ( 2018 ). Probabilistic programming languages (PPL) are programming language or libraries enabling developers to write such a probabilistic program. That is, they are a t yp e of domain-sp eciﬁc language (DSL, F owler 2010 ). W e refer the interested reader to V an de Meen t, P aige, Y ang, and W o o d ( 2018 ) for an ov erview of PPL design and use, to Ścibior et al. ( 2017 ) for the analysis of Bay esian inference algorithms as transformations of in termediate representations, and to Vákár, Kammar, and Staton ( 2019 ) for representation of recursiv e types in PPLs. They often fall in tw o categories. In the ﬁrst type, the PPL is its o wn indep endent language, including syntax and parsing, though it may rely on another “host” language for its execution engine. F or example, Stan ( Carp en ter et al. 2017 ), uses its o wn .stan mo del sp eciﬁcation format, though the parser and inference engine are written in C++ . This type of approach b eneﬁts from deﬁning its o wn syn tax, thus designing it to lo ok similar to the wa y equiv alen t statistical mo dels are written. The mo del is also v eriﬁed for syn tactic correctness at compile-time. 5 In the second type of PPL, the language is embedded within and makes use of the syn tax of the host language, leveraging that language’s native constructs. F or example, PyMC ( P atil, Huard, and F onnesb eck 2010 ; K o ch urov, Carroll, Wiecki, and Lao 2019 ), Pyro ( Bingham et al. 2019 ), and Edward ( T ran, Kucukelbir, Dieng, Rudolph, Liang, and Blei 2016 ) all use Python as the host language, lev eraging Theano , T orch , and T ensorFlow , respectively , to p erform man y of the underlying inference computations. Here, the key adv antage is that these PPLs gain access to the ecosystem of the host language (data structures, libraries, to oling). User do cumen tation and dev elopment eﬀorts can also fo cus on the key aspects of a PPL instead of the full stac k. 6 Ho wev er, b oth approaches to PPLs suﬀer from drawbac ks. In the ﬁrst type of PPLs, users m ust add new elements to their to olchains for the developmen t and inference of statistical mo dels. That is, they are likely to use a general-purp ose programming language for other tasks, then switch to the PPL en vironment for the sampling and inference task, and then read the result back in to the general-purp ose programming language. One solution to this problem is to develop APIs in general purp ose languages, but full inter-operability b etw een t wo en vironments is non-trivial. Dev elop ers must then build a whole ecosystem around the PPL, including data structures, input/output, and text editor supp ort, which costs time that migh t otherwise go tow ard improving the inference algorithms and optimization pro cedures. While the second type of PPL may not require the developmen t of a separate, parallel set of to ols, it often runs into “imp edance mismatc hes” of its o wn. In many cases, the host language has not been designed for a re-use of its constructs for probabilistic programming, resulting 5 An example mo del combining Stan with R can be found at Gelman ( 2019 ). 6 The same golf putting data were used using PyMC as PPL in PyMC ( 2019 ). 14 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem in syn tax that aligns p o orly with users’ mathematical in tuitions. Moreo ver, duplication may still o ccur at the level of classes or libraries, as, for example, Pyro and Edward dep end on T orc h and T ensorFlo w, which replicate m uch of the linear algebra functionality of NumPy for their o wn array constructs. By contrast, the design of Distributions.jl has enabled the dev elopment of embedded prob- abilistic programming languages such as T uring.jl ( Ge, Xu, and Ghahramani 2018 ), SOSS.jl ( Sc herrer 2019 ), Omega.jl ( T av ares 2018 ) or Poirot.jl ( Innes 2019 ) with comparatively less o verhead or friction with the host language. These PPLs are able to make use of three el- emen ts unique to the Julia ecosystem to c hallenge the dic hotomy b etw een embedded and stand-alone PPLs. First, they mak e use of Julia ’s rich type system and multiple dispatc h, whic h more easily supp ort mo deling mathematical constructs as types. Second, they all utilize Distributions.jl ’s types and hierarch y for the sampling of random v alues and representation of their distributions. Finally , they use manipulations and transformations of the user Julia co de to create new syn tax that matc hes domain-sp eciﬁc requiremen ts. These transformations are either p erformed through macros (for Soss.jl and T uring.jl ) or mo d- iﬁcations of the compilation phases through compiler ov erdubbing ( Omega.jl and Poirot.jl use Cassette.jl and IR T o ols.jl resp ectiv ely to mo dify the user co de or its in termediate representa- tions). As in the Lisp tradition, Julia ’s macros are written and manipulable in Julia itself and rewrite co de during the lo w ering phase ( Bezanson et al. 2018 ). Macros allow PPL design- ers to k eep programs close to s tandard Julia while introducing package-speciﬁc syntax that more closely mimics statistical conv en tions, all without compromising on p erformance. F or instance, a simple mo del deﬁnition from the T uring.jl do cumentation illustrates a mo del that is b oth readable as Julia co de while staying close to statistical conv en tions: julia> @model coinflip(y) = begin p ~ Beta(1, 1) N = length(y) for n in 1:N y[n] ~ Bernoulli(p) end end With these syn tax manipulation capabilities, along with compiled language p erformance, the com bination of Julia with Distributions.jl pro vides an excellen t foundation for further researc h and dev elopment of PPLs. 7 8. Conclusion and future w ork W e presen ted some of the types, structures and to ols for mo deling and computing on prob- abilit y distributions in the JuliaStats ecosystem. The JuliaStats packages lev erage some key constructs of Julia suc h as deﬁnition of new comp osite t yp es, abstract parametric t yp es and sub-t yping, along with m ultiple dispatc h. This allows users to express computations inv olving random num b er generation and the manipulation of probabilit y distributions. A clear inter- face allows package developers to build new distributions from their mathematical deﬁnition. New algorithms can b e dev elop ed, using the Distribution in terface and as suc h extending the 7 The golf putting mo del was p orted to T uring.jl in Duncan ( 2020 ). Mathieu Besançon, Da vid Anthoﬀ, Alex Arslan, Simon Byrne, Dahua Lin, Theo dore P apamarkou, John Pearson 15 new features to all sub-t yp es. These tw o features of Julia , namely extension of b ehavior to new types and deﬁnition of behavior ov er existing type hierarc hy , allow diﬀerent features built around the Distribution type to in ter-op erate seamlessly without hard coupling nor additional w ork from the package dev elop ers or end-users. F uture w ork on the package will include the implemen tation of maximum lik eliho o d estimation for other distributions, including mixtures and matrix-v ariate distributions. A c kno wledgmen ts The authors thank and ackno wledge the w ork of all contributors and maintainers of packages of the JuliaStats ecosystem and esp ecially Dav e Kleinsc hmidt for his v aluable feedback on early v ersions of this article, Andreas Noac k for the ov erall sup ervision of the pac kage, including the in tegration of n umerous pull requests from external con tributors, Moritz Sc hauer for discussions on the represen tability of mathematical objects in the Julia type system and Zenna T a v ares for the critical review of early drafts. Researc h sp onsored by the Lab oratory Directed Researc h and Developmen t Program of Oak Ridge National Laboratory , managed by UT-Battelle, LLC, for the US Departmen t of Energy under con tract DE-A C05-00OR22725. This researc h used resources of the Compute and Data En vironment for Science (CADES) at the Oak Ridge National Lab oratory , whic h is supp orted by the Oﬃce of Science of the U.S. Department of Energy under Contract No. DE- A C05-00OR22725. This applies speciﬁcally to funding pro vided by the Oak Ridge National Lab oratory to Theo dore Papamark ou. References A eb erhard S, Co omans D, De V el O (1994). “Comparativ e Analysis of Statistical Pattern Recognition Metho ds in High Dimensional Settings. ” Pattern R e c o gnition , 27 (8), 1065– 1077. doi:10.1016/0031- 3203(94)90145- 7 . Ahrens JH, Dieter U (1982). “Computer Generation of P oisson Deviates from Modiﬁed Normal Distributions. ” A CM T r ansactions on Mathematic al Softwar e , 8 (2), 163–179. doi: 10.1145/355993.355997 . Aldric h J (1997). “R.A. Fisher and the Making of Maximum Likelihoo d 1912–1922. ” Statistic al Scienc e , 12 (3), 162–176. doi:10.1214/ss/1030037906 . Bezanson J, Chen J, Chung B, Karpinski S, Shah VB, Vitek J, Zoubritzky L (2018). “ Julia : Dynamism and Performance Reconciled by Design. ” Pr o c e e dings of the A CM on Pr o gr am- ming L anguages , 2 (OOPSLA), 120. doi:10.1145/3276490 . Bezanson J, Edelman A, Karpinski S, Shah V (2017). “ Julia : A F resh Approac h to Numerical Computing. ” SIAM R eview , 59 (1), 65–98. doi:10.1137/141000671 . Bingham E, Chen JP , Jank owiak M, Ob ermeyer F, Pradhan N, Karaletsos T, Singh R, Szerlip P , Horsfall P , Go o dman ND (2019). “ Pyro : Deep Univ ersal Probabilistic Programming. ” The Journal of Machine L e arning R ese ar ch , 20 (1), 973–978. 16 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem Bo ost Dev elop ers (2018). “ Bo ost C++ Libraries. ” URL https://www.boost.org/ . Carp en ter B, Gelman A, Hoﬀman MD, Lee D, Go o drich B, Betancourt M, Brubaker M, Guo J, Li P , Riddell A (2017). “ Stan : A Probabilistic Programming Language. ” Journal of Statistic al Softwar e , 76 (1), 1–32. doi:10.18637/jss.v076.i01 . Devro ye L (1986). Non-Uniform R andom V ariate Gener ation . Springer-V erlag, New Y ork. Duncan J (2020). “Mo del building of golf putting with T uring.jl . ” URL https://web.archive.org/web/20210611145623/https://jduncstats.com/posts/ 2019- 11- 02- golf- turing/ . Erwig M, K ollmansb erger S (2006). “F unctional Pearls: Probabilistic F unctional Program- ming in Hask ell . ” Journal of F unctional Pr o gr amming , 16 (1), 21–34. doi:10.1017/ s0956796805005721 . F ederal Reserve Bank Of New Y ork (2019). “ DSGE.jl : Solv e and Estimate Dynamic Sto chastic General Equilibrium Mo dels (Including the New Y ork F ed DSGE). ” URL https://github. com/FRBNY- DSGE/DSGE.jl . F eldt R (2017). “ Blac kBoxOptim.jl . ” URL https://github.com/robertfeldt/ BlackBoxOptim.jl . F o wler M (2010). Domain-Sp e ciﬁc L anguages . P earson Education. Ge H, Xu K, Ghahramani Z (2018). “ T uring : Comp osable Inference for Probabilistic Pro- gramming. ” In Pr o c e e dings of the Twenty-First International Confer enc e on A rtiﬁcial Intel- ligenc e and Statistics , pp. 1682–1690. URL http://proceedings.mlr.press/v84/ge18b. html . Gelman A (2019). “Mo del Building and Expansion for Golf Putting (accessed June 2021). ” URL https://mc- stan.org/users/documentation/case- studies/golf.html . Gordon AD, Henzinger T A, Nori A V, Ra jamani SK (2014). “Probabilistic Programming. ” In Pr o c e e dings of the on F utur e of Softwar e Engine ering , pp. 167–181. ACM. Innes M (2019). “ Poirot.jl . ” V ersion 0.1, URL https://github.com/MikeInnes/Poirot.jl . JuliaStats (2019). “ Distributions.jl . ” doi:10.5281/zenodo.2647458 . Kisely ov O, Shan CC (2009). “Embedded Probabilistic Programming. ” In WM T aha (ed.), Domain-Sp e ciﬁc L anguages , pp. 360–384. Springer-V erlag, Berlin. K o c huro v M, Carroll C, Wiecki T, Lao J (2019). “ PyMC4 : Exploiting Coroutines for Im- plemen ting a Probabilistic Programming F ramew ork. ” URL https://openreview.net/ forum?id=rkgzj5Za8H . Le Go c Y, Donzé A (2015). “ EVL : A F ramew ork for Multi-Metho ds in C++ . ” Scienc e of Computer Pr o gr amming , 98 , 531–550. ISSN 0167-6423. doi:10.1016/j.scico.2014.08. 003 . Ly on S, Sargen t TJ, Stac hurski J (2017). “ QuantEcon.jl . ” URL https://github.com/ QuantEcon/QuantEcon.jl . Mathieu Besançon, Da vid Anthoﬀ, Alex Arslan, Simon Byrne, Dahua Lin, Theo dore P apamarkou, John Pearson 17 Matsumoto M, Nishim ura T (1998). “Mersenne T wister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator. ” A CM T r ansactions on Mo deling and Com- puter Simulation , 8 (1), 3–30. doi:10.1145/272991.272995 . Mogensen PK, Riseth AN (2018). “ Optim : A Mathematical Optimization P ac kage for Julia . ” Journal of Op en Sour c e Softwar e , 3 (24), 615. doi:10.21105/joss.00615 . Orban D, Siqueira AS (2019). “ JuliaSmo othOptimizers : Infrastructure and Solvers for Con- tin uous Optimization in Julia . ” doi:10.5281/zenodo.2655082 . P arzen E (1962). “On Estimation of a Probability Density F unction and Mo de. ” The A nnals of Mathematic al Statistics , 33 (3), 1065–1076. doi:10.1214/aoms/1177704472 . P atil A, Huard D, F onnesb eck C (2010). “ PyMC : Bay esian Sto c hastic Mo delling in Python . ” Journal of Statistic al Softwar e , 35 (4), 1–81. doi:10.18637/jss.v035.i04 . Pham DT, Garat P (1997). “Blind Separation of Mixture of Independent Sources through a Quasi-Maxim um Likelihoo d Approach. ” IEEE T r ansactions on Signal Pr o c essing , 45 (7), 1712–1725. doi:10.1109/78.599941 . Pirk elbauer P , Solo dkyy Y, Stroustrup B (2010). “Design and Ev aluation of C++ Open Multi- Metho ds. ” Scienc e of Computer Pr o gr amming , 75 (7), 638–667. doi:10.1016/j.scico. 2009.06.002 . Generativ e Programming and Comp onent Engineering (GPCE 2007). PyMC (2019). “Mo del building and expansion for golf putting. ” URL http: //web.archive.org/web/20191210110708/https://nbviewer.jupyter.org/github/ pymc- devs/pymc3/blob/master/docs/source/notebooks/putting_workflow.ipynb . Rev els J, Lubin M, Papamark ou T (2016). “F orw ard-Mo de Automatic Diﬀeren tiation in Julia . ” arXiv 1607.07892 , arXiv.org E-Prin t Archiv e. URL . Rosen blatt M (1956). “Remarks on Some Nonparametric Estimates of a Density F unction. ” The A nnals of Mathematic al Statistics , 27 (3), 832–837. doi:10.1214/aoms/1177728190 . R uckdesc hel P , Kohl M, Stabla T, Camphausen F (2006). “ S 4 Classes for Distributions. ” R News , 6 (2), 2–6. Sc hauer M (2018). “ Bridge.jl : a Statistical T o olb o x for Diﬀusion Pro cesses and Sto c hastic Diﬀeren tial Equations. ” doi:10.5281/zenodo.891230 . V ersion 0.9. Sc herrer C (2019). “ Soss.jl : Probabilistic Programming via Source Rewriting. ” V ersion 0.1, URL https://github.com/cscherrer/Soss.jl . Ścibior A, Ghahramani Z, Gordon AD (2015). “Practical Probabilistic Programming with Monads. ” A CM SIGPLAN Notic es , 50 (12), 165–176. doi:10.1145/2887747.2804317 . Ścibior A, Kammar O, Vákár M, Staton S, Y ang H, Cai Y, Ostermann K, Moss SK, Heunen C, Ghahramani Z (2017). “Denotational V alidation of Higher-Order Bay esian Inference. ” Pr o c e e dings of the A CM on Pr o gr amming L anguages , 2 , 60. doi:10.1145/3158148 . Silv erman BW (2018). Density Estimation for Statistics and Data A nalysis . Routledge. 18 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem Smith BJ (2018). “ Mam ba.jl : Marko v Chain Mon te Carlo (MCMC) for Bay esian Analysis in Julia . ” V ersion 0.12, URL https://github.com/brian- j- smith/Mamba.jl . T a v ares Z (2018). “ Omega.jl : Causal, Higher-Order, Probabilistic Programming. ” V ersion 0.1, URL https://github.com/zenna/Omega.jl . T ran D, Kucukelbir A, Dieng AB, R udolph M, Liang D, Blei DM (2016). “ Edward : A Library for Probabilistic Mo deling, Inference, and Criticism. ” arXiv 1610.09787 , arXiv.org E-Print Arc hive. URL . Vákár M, Kammar O, Staton S (2019). “A Domain Theory for Statistical Probabilistic Programming. ” Pr o c e e dings of the A CM on Pr o gr amming L anguages , 3 (POPL), 36. doi: 10.1145/3290349 . V an de Meen t JW, Paige B, Y ang H, W oo d F (2018). “An Introduction to Probabilistic Programming. ” arXiv 1809.10756 , arXiv.org E-Prin t Arc hive. URL abs/1809.10756 . Virtanen P , Gommers R, Oliphan t TE, Hab erland M, Reddy T, Cournap eau D, Burovski E, Peterson P , W ec kesser W, Brigh t J, v an der W alt SJ, Brett M, Wilson J, Millman KJ, Ma yoro v N, Nelson ARJ, Jones E, Kern R, Larson E, Carey CJ, Polat İ, F eng Y, Mo ore EW, V anderPlas J, Laxalde D, Perktold J, Cimrman R, Henriksen I, Quintero EA, Harris CR, Archibald AM, Ribeiro AH, P edregosa F, v an Mulbregt P , SciPy 10 Con tributors (2020). “ SciPy 1.0: F undamental Algorithms for Scientiﬁc Computing in Python . ” Natur e Metho ds , 17 , 261–272. doi:10.1038/s41592- 019- 0686- 2 . Wilks SS (1938). “The Large-Sample Distribution of the Likelihoo d Ratio for T esting Comp osite Hyp otheses. ” The A nnals of Mathematic al Statistics , 9 (1), 60–62. doi: 10.1214/aoms/1177732360 . Zappa Nardelli F, Belyak ov a J, Pelenitsyn A, Chung B, Bezanson J, Vitek J (2018). “ Julia Subt yping: A Rational Reconstruction. ” Pr o c e e dings of the A CM on Pr o gr amming L an- guages , 2 (OOPSLA), 113. doi:10.1145/3276483 . Mathieu Besançon, Da vid Anthoﬀ, Alex Arslan, Simon Byrne, Dahua Lin, Theo dore P apamarkou, John Pearson 19 A. Installation of the relev ant pac kages The recommended installation of the relev an t pac kages is done through the Pkg.jl 8 to ol av ail- able within the Julia distribution standard library . The common w ay to interact with the to ol is within the REPL for Julia 1.0 and ab o v e. The closing square brack et "]" starts the pkg mo de, in whic h the REPL sta ys un til a return key is stroke. julia> ] add StatsBase (v1.0) pkg> add Distributions (v1.0) pkg> add KernelDensity This installation and all co de snippets are guaran teed to w ork on Julia 1.0 and later with the semantic versioning commitment. Once installed, a package can b e remov ed with the command rm PackageName or up dated. All package mo diﬁcations are guaran teed to resp ect v ersion constrain ts for their direct and transitiv e dep endencies. The packages StatsBase.jl , Distributions.jl , KernelDensit y .jl are all op en-source under the MIT license. B. Julia t yp e system B.1. F unctions and metho ds In Julia , a function “is an ob ject that maps a tuple of argument v alues to a return v alue” . 9 A sp ecial case is an empty tuple as input, as in y = f() , and a function that returns the nothing v alue. A function deﬁnition creates a top-lev el identiﬁer with the function name. This can b e passed around as any other v alue, for example to other functions. The function map takes a function and a collection map(f, c) , and applies the function to each elemen t of the collection to return the mapp ed v alues. A function migh t represen t a conceptual computation but diﬀerent sp eciﬁc implementations migh t exist for this computation. F or instance, the addition of tw o num b ers is a common concept, but ho w it is implemented dep ends on the n umber t yp e. The sp eciﬁc implementation of addition for complex n um b ers is 10 julia> +(z::Complex, w::Complex) = Complex(real(z) + real(w), imag(z) + imag(w)) This sp eciﬁc implementation is a metho d of the + function. Users can deﬁne their own implemen tation of existing functions, thus creating a new metho d for this function. Diﬀeren t metho ds can b e implemented b y changing the tuple of argumen ts, either with a diﬀerent n umber of arguments or diﬀerent types. 11 Example B.1. In the follo wing example, the function f has tw o metho ds. The function called dep ends on the n um b er of argumen ts. 8 https://docs.julialang.org/en/v1/stdlib/Pkg/ 9 Julia do cumen tation - F unctions: https://docs.julialang.org/en/v1/manual/functions 10 Julia source co de, base/complex.jl 11 Julia do cumen tation - Metho ds: https://docs.julialang.org/en/v1/manual/methods 20 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem julia> function f(x) println(x) end julia> function f(x, y) println("x: $x & y: $y") end Example B.2. In the following example, the function g has tw o metho ds. The ﬁrst one is the most sp eciﬁc metho d and will b e called for any type of the argument x that is a Number . Otherwise, the second metho d, whic h is less sp eciﬁc, will b e called. julia> g(x::T) where {T<:Number} = (3*x, x) julia> g(x) = (x, 3) Note that the order of deﬁnitions do es not matter here, the least speciﬁc could hav e b een deﬁned ﬁrst, and then the n um b er-sp ecialized implemen tation. The metho d dispatc hed on b y the Julia run time is alwa ys the most sp eciﬁc. Example B.3. If there is no unique most sp eciﬁc metho d, Julia will raise a MethodError julia> f(x, b::Float64) = x f (generic function with 1 method) julia> f(x::Float64, b) = b f (generic function with 2 methods) julia> f(3.0, 2.0) ERROR: MethodError: f(::Float64, ::Float64) is ambiguous. Candidates: f(x::Float64, b) in Main at REPL[2]:1 f(x, b::Float64) in Main at REPL[1]:1 Possible fix, define f(::Float64, ::Float64) B.2. Types Julia enables users to deﬁne their o wn types including abstract t yp es, mutable and immutable comp osite types or structures and primitive t yp es (comp osed of bits). P ackages often de- ﬁne abstract t yp es to regroup t yp es under one lab el and pro vide default implementation for a group of types. F or examples, lots of metho ds require arguments which are identiﬁed as Number , up on whic h arithmetic op erations can b e applied, without ha ving to re-deﬁne meth- o ds for each of the concrete n um b er t yp es. The most common t yp e deﬁnition for end-users is comp osite t yp es with the k eyw ord struct as follo ws: julia> struct S field1::TypeOfField1 field2::TypeOfField2 field3 # a field without specified type will take the type ' Any ' end In some cases, a t yp e is deﬁned for the sole purp ose of creating a new metho d and do es not require additional data in ﬁelds. The following deﬁnition is th us v alid: julia> struct S end Mathieu Besançon, Da vid Anthoﬀ, Alex Arslan, Simon Byrne, Dahua Lin, Theo dore P apamarkou, John Pearson 21 An instance of S is called a singleton , there is only one instance of such type. Example B.4. One use case is the sp eciﬁcation of diﬀerent algorithms for a pro cedure. The input of the pro cedure is alwa ys the same, so is the exp ected output, but diﬀerent w ays to compute the result are a v ailable. julia> struct Alg1 end julia> struct Alg2 end julia> mul(x::Unsigned, y::Unsigned, ::Alg1) = x * y julia> mul(x::T, y::Unsigned, ::Alg2) where {T<:Unsigned} = T(sum(x for _ in 1:y)) Note that in the second example, the information of the concrete type T of x is required to con vert the sum expression into a num b er of type T . C. Comparison of Distributions.jl , Python / SciPy and R In this section, we dev elop several comparativ e examples of ho w v arious computation tasks link ed with probability distributions are p erformed using R , Python with SciPy/NumPy and Julia . C.1. Sampling from v arious distributions The following programs draw 100 samples from a Gamma distribution Γ( k = 10 , θ = 2) , in R : R> set.seed(42) R> rgamma(100, shape = 10, scale = 2) The follo wing Python v ersion uses the NumPy & SciPy libraries. >>> import numpy as np >>> import scipy.stats as stats >>> np.random.seed(42) >>> stats.gamma.rvs(10, scale = 2, size = 100) The follo wing Julia v ersion is written to sta y close to the previous ones, thus setting the global seed and not passing along a new RNG ob ject. julia> import Random julia> using Distributions julia> Random.seed(42) julia> g = Gamma(10, 2) julia> rand(g, 100) C.2. Representing a distribution with v arious scale parameters The follo wing co de examples are in the same order as the previous sub-section: R> x <- seq(-3.0,3.0,0.01) R> y <- dnorm(x, mean = 0.5, sd = 0.75) R> plot(x, y) The Python example uses NumPy , SciPy and matplotlib . 22 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem >>> import numpy as np >>> from scipy import stats >>> from matplotlib import pyplot as plt >>> x = np.arange(-3.0,3.0,0.01) >>> y = stats.norm.pdf(x, loc = 0.5, scale = 0.75) >>> plt.plot(x,y) The Julia v ersion uses deﬁnes a Gaussian random v ariable ob ject n : julia> using Distributions julia> using Plots julia> n = Normal(0.5, 0.75) julia> x = -3.0:0.01:3.0 julia> y = pdf.(n, x) julia> plot(x, y) Note that unlike the R and Python examples, x do es not need to b e represented as an array but as an iterable not storing all intermediate v alues, saving time and memory . The dot or broadcast op erator in pdf.(n, x) applies the function to all elements of x . D. Wine data analysis In this section, we show the example of analyses run on the wine dataset A eb erhard, Co omans, and De V el ( 1994 ) obtained on the UCI mac hine learning rep ository . The data can b e fetched directly o ver HTTP from within Julia . julia> using Distributions, DelimitedFiles julia> import Plots julia> wine_data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data" julia> wine_data_file = download(wine_data_url) julia> raw_wine_data = readdlm(wine_data_file, ' , ' , Float64) julia> wine_quant = raw_wine_data[:,2: end ] julia> wine_labels = Int.(raw_wine_data[:,1]) D.1. Automatic m ultiv ariate ﬁtting W e can then ﬁt a m ultiv ariate distribution to some v ariables and observe the result: julia> alcohol = wine_quant[:,1] julia> log_malic_acid = log.(wine_quant[:,2]) julia> obs = permutedims([alcohol log_malic_acid]) julia> res_normal = fit_mle(MvNormal, obs) julia> cont_func(x1, x2) = pdf(res_normal, [x1,x2]) julia> p = Plots.contour(11.0:0.05:15.0, -0.5:0.05:2.5, cont_func) julia> Plots.scatter!(p, alcohol, log_malic_acid, label="Data points") julia> Plots.title!( p, "Wine scatter plot & Gaussian maximum likelihood estimation") Note that fit_mle needs observ ations as columns, hence the use of permutedims . D.2. Non-parametric density estimation Mathieu Besançon, Da vid Anthoﬀ, Alex Arslan, Simon Byrne, Dahua Lin, Theo dore P apamarkou, John Pearson 23 Figure 5: Multiv ariate Gauss ian MLE o ver ( x 1 , x 2 ) of the wine data Figure 6: KDE ov er ( x 1 , x 2 ) of the wine data Assuming a multiv ariate Gaussian distribution might b e considered a to o strong assumption, a k ernel density estimator can b e used on the same data: julia> wine_kde = KernelDensity.kde((alcohol, log_malic_acid)) julia> cont_kde(x1, x2) = pdf(wine_kde, x1, x2) julia> p = Plots.contour(11.0:0.05:15.0, -0.5:0.05:2.5, cont_kde) julia> Plots.scatter!(p, alcohol, log_malic_acid, group=wine_labels) julia> Plots.title!(p, "Wine scatter plot & Kernel Density Estimation") The resulting kernel density estimation is sho wn Figure 6 . The level curves highlight the cen ters of the three classes of the dataset. D.3. Pro duct distribution mo del Giv en that a logarithmic transform was used for x 2 , a shifted log-normal distribution can b e ﬁtted: X 2 − 0 . 73 ∼ Log N or mal ( µ, σ ) . The maximum likelihoo d estimator could b e computed 24 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem as in section D.1 . Instead, w e will demonstrate the simplicit y of building new constructs and optimize ov er them. A Product distribution is implemented in Distributions.jl , deﬁning a Cartesian pro duct of the t w o v ariables, thus assuming their indep endence: X = ( X 1 , X 2 ) (1) Assuming X 1 follo ws a normal distribution, given a v ector of 4 parameters [ µ 1 , σ 1 , µ 2 , σ 2 ] , the distribution can b e constructed: julia> function build_product_distribution(p) return product_distribution([ Normal(p[1], p[2]), LogNormal(p[3], p[4]), ]) end Computing the log-likelihoo d of a pro duct distribution b oils do wn to the sum of the individual log-lik eliho o d. The gradien t could b e computed analytically but automatic diﬀerentiation will b e lev eraged here using F orw ardDiﬀ.jl ( Revels et al. 2016 ): julia> function loglike(p) d = build_product_distribution(p) return loglikelihood(d.v[1], wine_quant[:,1]) + loglikelihood(d.v[2], wine_quant[:,2] .- 0.73) end julia> ∇ L(p) = ForwardDiff.gradient(loglike, p) Once the gradient obtained, ﬁrst-order optimization metho ds can b e applied. T o keep ev ery- thing self-con tained, a gradient descent with decreasing step will b e applied: x k +1 = x k + ρ k ∇L ( x k ) (2) ρ k +1 = ρ 0 / ( k + m ) (3) With ( ρ 0 , m ) constants. julia> p = [10.0 + 3.0 * rand(), rand()+1, 2.0 + 3.0*rand(), rand()+1] julia> iter = 1 julia> maxiter = 5000 julia> while iter <= maxiter && sum(abs.( ∇ L(p))) >= 10ˆ-6 p = p + 0.05 * inv(iter+5) * ∇ L(p) p[2] = p[2] < 0 ? -p[2] : p[2] p[4] = p[4] < 0 ? -p[4] : p[4] iter += 1 end julia> d = build_product_distribution(p) Without further co de or metho d tuning, the optimization takes ab out 590 µs and few iterations to con verge as sho wn Figure 8 . F or more complex use cases, users w ould lo ok at more sophisticated techniques as dev elop ed in JuliaSmo othOptimizers ( Orban and Siqueira 2019 ) or Optim.jl ( Mogensen and Riseth 2018 ). Figure 7 highlights the resulting marginal and joint distributions. D.4. Implementation of an Exp ectation-Maximization algorithm Mathieu Besançon, Da vid Anthoﬀ, Alex Arslan, Simon Byrne, Dahua Lin, Theo dore P apamarkou, John Pearson 25 Figure 7: Resulting inferred joint and marginal distribution for ( X 1 , X 2 ) Figure 8: Illustration of the likelihoo d maximization con vergence In this section, we highlight how Julia dispatch mechanism and Distributions.jl t yp e hierarch y can help users deﬁne algorithms in a generic fashion, leveraging sp eciﬁc structure but allo wing their extension. W e know that the observ ations from the wine data set are split into diﬀerent classes Z . W e will consider only the t w o ﬁrst v ariables, with the second at a log-scale: X = ( x 1 , l og ( x 2)) . The exp ectation step computes the probabilit y for each observ ation i to b elong to eac h lab el k : julia> function expectation_step(X, dists, prior) n = size(X, 1) Z = Matrix{Float64}(undef, n, length(prior)) 26 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem for k in eachindex(prior) for i in 1:n Z[i,k] = prior[k] * pdf(dists[k], X[i,:]) / sum( prior[j] * pdf(dists[j], X[i,:]) for j in eachindex(prior) ) end end return Z end dists is a vector of distributions, note that no assumption is needed on these distributions, other than the standard interface. The only computation applied is indeed the computation of the pdf for eac h of these. The op eration for whic h the sp eciﬁc structure of the distribution is required is the maximiza- tion step. F or man y distributions, a closed-form solution of the maximum-lik eliho o d estimator is kno wn, av oiding exp ensive optimization schemes as developed in D.3 . In the case of the Normal distribution, the maxim um lik eliho o d estimator is giv en by the empirical mean and co v ariance matrix. W e deﬁne the function maximization_step to take a distribution type, the data X and current lab el estimates Z , computing b oth the prior probabilities of each of the classes π k and the corresp onding conditional distribution D k , it returns the pair of vectors ( D , π ) . julia> function maximization_step(::Type{<:MvNormal}, X, Z) n = size(X, 1) Nk = size(Z, 2) µ = map(1:Nk) do k num = sum(Z[i,k] .* X[i,:] for i in 1:n) den = sum(Z[i,k] for i in 1:n) num / den end Σ = map(1:Nk) do k num = zeros(size(X, 2), size(X, 2)) for i in 1:n r = X[i,:] .- µ [k] num .= num .+ Z[i,k] .* (r * r ' ) end den = sum(Z[i,k] for i in 1:n) num ./ den end prior = [inv(n) * sum(Z[i,k] for i in 1:n) for k in 1:Nk ] dists = map(1:Nk) do k MvNormal( µ [k], Σ [k] + 10e-7I) end return (dists, prior) end The ﬁnal blo c k is a function alternativ ely computing Z and ( D , π ) un til conv ergence: Mathieu Besançon, Da vid Anthoﬀ, Alex Arslan, Simon Byrne, Dahua Lin, Theo dore P apamarkou, John Pearson 27 julia> function expectation_maximization( D::Type{<:Distribution}, X, Nk, maxiter = 500, loglike_diff = 10e-5) # initialize classes n = size(X,1) Z = zeros(n, Nk) for i in 1:n j0 = mod(i,Nk)+1 j1 = j0 > 1 ? j0-1 : 2 Z[i,j0] = 0.75 Z[i,j1] = 0.25 end (dists, prior) = maximization_step(D, X, Z) l = loglike_mixture(X, dists, prior) lprev = 0.0 iter = 0 # EM iterations while iter < maxiter && abs(lprev-l) > loglike_diff Z = expectation_step(X, dists, prior) (dists, prior) = maximization_step(D, X, Z) lprev = l l = loglike_mixture(X, dists, prior) iter += 1 end return (dists, prior, Z, l, iter) end julia> function loglike_mixture(X, dists, prior) l = zero(eltype(X)) n = size(X,1) for i in 1:n l += log( sum(prior[k] * pdf(dists[k], X[i,:]) for k in eachindex(prior)) ) end return l end Starting from an alternated aﬀectation of observ ations to lab els, it calls the t wo metho ds deﬁned ab o v e for the E and M steps. The strength of this implementation is that extending it to more functions only requires to implemen t maximization_step(D, X, Z) for the new distribution. The rest of the structure will then w ork as expected. E. Benc hmarks on mixture PDF ev aluations In this section, we ev aluate the computation time for the PDF of mixtures of distributions, on mixture of Gaussian univ ariate distributions, and then on a mixture of heterogeneous distributions including normal and log-normal distributions. 28 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem n = 0 n = 20 n = 25 n = 50 Figure 9: Con vergence of the EM algorithm julia> using BenchmarkTools: @btime julia> distributions = [ Normal(-1.0, 0.3), Normal(0.0, 0.5), Normal(3.0, 1.0), ] julia> priors = [0.25, 0.25, 0.5] julia> gmm_normal = MixtureModel(distributions, priors) julia> x = rand() julia> @btime pdf($gmm_normal, $x) julia> large_normals = [Normal(rand(), rand()) for _ in 1:1000] julia> large_probs = [rand() for _ in 1:1000] julia> large_probs .= large_probs ./ sum(large_probs) julia> gmm_large = MixtureModel(large_normals, large_probs) @btime pdf($gmm_large, $x) julia> large_h = append!( ContinuousUnivariateDistribution[Normal(rand(), rand()) for _ in 1:1000], ContinuousUnivariateDistribution[LogNormal(rand(), rand()) for _ in 1:1000], ) julia> large_h_probs = [rand() for _ in 1:2000] julia> large_h_probs .= large_h_probs ./ sum(large_h_probs) julia> gmm_h = MixtureModel(large_h, large_h_probs) julia> @btime pdf($gmm_h, $x) Mathieu Besançon, Da vid Anthoﬀ, Alex Arslan, Simon Byrne, Dahua Lin, Theo dore P apamarkou, John Pearson 29 The PDF computation on the small, large and heterogeneous mixture tak e on a verage 332 ns , 105 µs and 259 µs resp ectively . W e also compare the computation time with the man ual computation of the mixture ev aluated as such: julia> function manual_computation(mixture, x) v = zero(eltype(mixture)) p = probs(mixture) d = components(mixture) for i in 1:ncomponents(mixture) if p[i] > 0 v += p[i] * pdf(d[i], x) end end return v end The PDF computation currently in the library is faster than the manual one, the sum o v er w eighted PDF of the comp onen ts b eing p erformed without branching. The ratio of manual time ov er the implementation of Distributions.jl b eing on av erage 1.24, 5.88 and 1.28 resp ec- tiv ely . The greatest acceleration is observed for large mixtures with homogeneous comp onent t yp es. The slo w-down for heterogeneous comp onent t yp es is due to the compiler not able to infer the type of each elemen t of the mixture, th us replacing s tatic with dynamic dis- patc h. The Distributions.jl package also con tains a /perf folder containing exp eriments on p erformance to trac k p ossible regressions and compare diﬀeren t implemen tations. Aﬃliation: Mathieu Besançon Zuse Institute Berlin and Departmen t of Applied Mathematics & Industrial Engineering P olytechnique Montréal, Montréal, QC, Canada and École Cen trale de Lille, Villeneuve d’Ascq, F rance and INOCS, INRIA Lille & CRIST AL, Villeneuv e d’Ascq, F rance E-mail: mathieu.besancon@polymtl.ca 30 Distributions.jl : Probabilit y Distributions in the JuliaStats Ecosystem Theo dore P apamark ou Departmen t of Mathematics The Univ ersity of Manchester and Departmen t of Mathematics Univ ersity of T ennessee and Computational Sciences & Engineering Division Oak Ridge National Lab oratory E-mail: theodore.papamarkou@manchester.ac.uk Da vid Anthoﬀ Energy and Resources Group Univ ersity of California at Berkeley E-mail: anthoff@berkeley.edu Alex Arslan Beacon Biosignals, Inc. E-mail: alex.arslan@beacon.bio Simon Byrne California Institute of T ec hnology E-mail: simonbyrne@caltech.edu Dah ua Lin The Chinese Univ ersity of Hong Kong E-mail: dhlin@ie.cuhk.edu.hk John P earson Departmen t of Biostatistics and Bioinformatics Duk e Universit y Medical Center Durham, NC, United States of America E-mail: john.pearson@duke.edu

Distributions.jl: Definition and Modeling of Probability Distributions in the JuliaStats Ecosystem

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment