Boltzmann Samplers, Polya Theory, and Cycle Pointing
We introduce a general method to count unlabeled combinatorial structures and to efficiently generate them at random. The approach is based on pointing unlabeled structures in an "unbiased" way that a structure of size n gives rise to n pointed struc…
Authors: Manuel Bodirsky, Eric Fusy, Mihyun Kang
Pointing (or rooting) is an important tool to derive decompositions of combinatorial structures, with many applications in enumerative combinatorics. Such decompositions can for instance be used in polynomial-time algorithms that sample structures of a combinatorial class uniformly at random. For a class of labeled structures, pointing corresponds to taking the derivative of the (typically exponential) generating function of the class. In other words, each structure of size n gives rise to n pointed (or rooted) structures. Other important operations on classes of combinatorial structures are the disjoint union, the product, and the substitution operation; they correspond to addition, multiplication, and composition of the associated generating functions. Together with the usual basic classes of combinatorial structures (finite classes, the class of finite sets, the class of finite sequences, and the class of cycles), this collection of constructions is a powerful device to define a great variety of combinatorial families.
If a class of structures can be described by recursive specifications involving pointing, disjoint unions, products, substitutions, and the basic classes, then the techniques of analytic combinatorics can be applied to obtain enumerative results, to study statistical properties of random structures in the class, and to derive efficient random samplers. An expository account for this line of research is [10]. Among recent developments in the area of random sampling are Boltzmann samplers [9], which are an attractive alternative to the recursive method of sampling [25,11]. Both approaches provide in a systematic way polynomial-time uniform random generators for decomposable combinatorial classes. The advantage of Boltzmann samplers over the recursive method of sampling is that Boltzmann samplers operate in linear time if a small relative tolerance is allowed for the size of the output, and that they have a small preprocessing cost, which makes it possible to sample very large structures.
A third general random sampling approach is based on Markov chains. This approach does not require recursive decompositions of structures, and is applicable to a wide range of combinatorial classes. However, Markov chain methods are mostly limited to approximate uniformity. Moreover, it is usually difficult to obtain bounds on the rate of convergence to the uniform distribution [20].
All the results in this paper concern classes of unlabeled combinatorial structures, i.e., the structures are considered up to isomorphism. In the case of the class of all graphs, the labeled and the unlabeled model do not differ much, which is due to the fact that almost all graphs do not have a non-trivial automorphism (see e.g., [5,Ch. 9.] and [17]). However, for many interesting classes of combinatorial structures (and for most of the classes studied in this paper), the difference between the labeled and the unlabeled setting does matter.
The Markov chain approach can be adapted also to sample unlabeled structures approximately uniformly at random, based on the orbit counting lemma [19]. Again, the rate of convergence is usually very difficult to analyze, and frequently these Markov chains do not have a polynomial convergence rate [15]. For unlabeled structures, the approach faces additional difficulties: to computationally implement the transitions of the Markov chain, we have to be able to generate structures with a given symmetry uniformly at random (this is for instance a difficult task for planar graphs), and we have to be able to generate a symmetry of a given structure (for the class of all finite graphs, this task is computationally equivalent to the graph isomorphism problem).
To enumerate unlabeled structures, typically the ordinary generating functions are the appropriate tool. Disjoint unions and products for unlabeled structures then still correspond to addition and multiplication of the associated generating functions as for labeled cases. Boltzmann samplers for classes described by recursive specifications involving these operations have recently been developed in [13]. However, the substitution operation for unlabeled structures no longer corresponds to the composition of generating functions, due to the symmetries an unlabeled structure might have. This problem can be solved by Pólya theory, which uses the generalization of generating functions to cycle index sums to take care of potential symmetries. Pólya theory provides a computation rule for the cycle index sum associated to a substitution construction. The presence of symmetries leads to another problem with the pointing (or rooting) operator: the fundamental property that a structure of size n gives rise to n pointed structures does not hold. Indeed, if a structure of size n has a non-trivial automorphism, then it corresponds to less than n pointed structures (because pointing at two vertices in symmetric positions produces the same pointed structure). Thus, for unlabeled structures, the classical pointing operator does not correspond to the derivative of the ordinary generating function.
In this paper, we introduce an unbiased pointing operator for unlabeled structures. The operator is unbiased in the sense that a structure of size n gives rise to n pointed structures. It produces cycle-pointed structures, i.e., combinatorial structures A together with a marked cyclic sequence of atoms of A that is a cycle of an automorphism of A. Accordingly, we call our operator cycle-pointing. The idea is based on Parker's lemma in permutation group theory [7] (this will be discussed in Remark 3.1). We develop techniques to apply this new pointing operator to enumeration and random sampling of unlabeled combinatorial classes. The crucial point is that cycle-pointing is unbiased. As a consequence, performing both tasks of enumeration and uniform random sampling on a combinatorial class is equivalent to performing these tasks on the associated cycle-pointed class.
To understand how we use our operator, it is instructive to look at the class of free trees, i.e., unrooted and nonplane trees (equivalently, acyclic connected graphs). Building on the work of Cayley and Pólya, Otter [26] determined the exact and asymptotic number of free trees. To this end, he developed the by-now standard dissimilarity characteristic equation, which relates the number of free trees with the number of rooted nonplane trees; see [16]. The best-known method to sample free trees uniformly at random is due to Wilf [34], and uses the concept of the centroid of a tree. The method is an example of application of the recursive method of sampling and requires a pre-processing step where a table of quadratic size is computed.
Cycle-pointing provides a new way to count and sample free trees. Both tasks are carried out on cycle-pointed (nonplane) trees. The advantage of studying cycle-pointed trees is that the pointed cycle provides a starting point for a recursive decomposition. In the case of cycle-pointed trees, we can formulate such a decomposition using standard constructions such as disjoint union, product, and substitution (which requires suitably adapting these constructions to the cycle-pointed framework). We want to stress that, despite some superficial similarities, this method for counting free trees is fundamentally different from the previously existing methods mentioned above, and it proves particularly fruitful in the context of random generation. Indeed, the dissimilarity characteristic equation [26] and the dissymmetry theorem [2] both lead to generating function equations involving subtraction. However, subtraction yields massive rejection when translated into a random generator for the class of structures (both for Boltzmann samplers and for the recursive method). In contrast, the equations produced by the method based on cycle-pointing have only positive signs, and the existence of a Boltzmann sampler for (cycle-pointed) free trees, with no rejection involved, will follow directly from the general results derived in this paper. As usual, the Boltzmann samplers we obtain have a running time that is linear in the size of the structure generated, and have small pre-processing cost.
Similarly, we can decompose plane and nonplane trees, and more generally all sorts of treelike structures. By the observation that the block decomposition of a graph has also a tree-like structure, we can apply the method to classes of graphs where the two-connected components can be explicitly enumerated. This leads to efficient Boltzmann samplers, for instance, for cacti graphs and outerplanar graphs, improving on the generators of [4]. Further, our strategy is not limited to only tree-like structures, but can also be applied to other classes of structures that allow for a recursive decomposition. To demonstrate this, we sketch how the method can be applied to count and sample certain classes of planar maps.
Outline of the paper. To formalize our general results on enumeration and sampling for classes of unlabeled combinatorial structures in full generality, we apply the concept of combinatorial species; we recall this concept in Section 2. In Section 3 we introduce cycle-pointed species and the cycle-pointing operator. Section 4 is devoted to applications of our cycle-pointing operator in enumeration. The technique in which we use the operator to obtain recursive decomposition strategies for unlabeled enumeration is very generally applicable; we illustrate it by the enumeration of (unrooted) non-plane and plane trees, cacti graphs, and maps. Finally, in Section 5, we present how to apply the concepts introduced in this paper to obtain highly efficient random sampling procedures for unlabeled structures. This is illustrated by applications for sampling of concrete and fundamental classes of unlabeled combinatorial structures; several of these concrete sampling results are either new or improve the state-of-the-art of sampling efficiency.
In this paper we work with classes of combinatorial structures, such as graphs, relational structures, functions, trees, plane trees, maps, words, terms, or permutations. There are many ways to define formally these objects; for example, rooted trees can be coded as special types of directed graphs, but also as terms. Which formal representation of a combinatorial structure is best usually depends on the application.
When we are interested in combinatorial enumeration, the differences in representation are not essential; in the example above, we have the same number of rooted trees, no matter how they are represented. We would thus like to have a formalism that is sufficiently abstract so that our results apply to broad classes of combinatorial structures. At the same time, we would like to have a formalism for classes of combinatorial structures that supports fundamental construction operations for combinatorial classes, such as formation of disjoint unions, products, and substitution, and that allows us to state general results about enumeration and random sampling.
The theory of combinatorial species is an elegant tool that fully satisfies the needs mentioned above. We give a brief introduction to species, and refer to Bergeron, Labelle, and Leroux [2] for a broader treatment of the topic. The fundamental and well-known classes of combinatorial structures that we treat in Section 4 provide ample illustration of the concepts we define in this section.
2.1. Combinatorial Species. We closely follow the presentation in [2]. A species of structures is a functor A from finite sets U to finite sets A[U ], together with a rule that produces for each bijection σ : U → V a function from A[U ] to A[V ]. Slightly abusing notation, this function will also be denoted by A[σ]. The functions A[σ] must satisfy the following two (functorial) properties:
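• for all bijections σ : U → V and τ : V → W, $\mathcal{A}[\tau \circ \sigma] = \mathcal{A}[\tau] \circ \mathcal{A}[\sigma]$;
• for the identity map Id_U : U → U, $\mathcal{A}[\mathrm{Id}_U] = \mathrm{Id}_{\mathcal{A}[U]}$.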
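The two functorial properties can be checked mechanically on a small example. The following is a minimal sketch for the species of simple graphs; the representation (a graph as a set of 2-element sets) and the helper names `graphs_on`, `transport`, and `compose` are illustrative assumptions, not notation from [2].

```python
from itertools import combinations

def graphs_on(U):
    """All simple graphs on the finite set U, each a frozenset of 2-element frozensets."""
    pairs = [frozenset(p) for p in combinations(sorted(U), 2)]
    return {frozenset(p for i, p in enumerate(pairs) if mask >> i & 1)
            for mask in range(2 ** len(pairs))}

def transport(sigma, graph):
    """Transport of a graph along the bijection sigma (given as a dict U -> V)."""
    return frozenset(frozenset(sigma[v] for v in edge) for edge in graph)

def compose(tau, sigma):
    """The bijection tau o sigma (apply sigma first, then tau)."""
    return {u: tau[v] for u, v in sigma.items()}

U = {1, 2, 3}
sigma = {1: "a", 2: "b", 3: "c"}        # bijection U -> V
tau = {"a": "x", "b": "y", "c": "z"}    # bijection V -> W
identity = {u: u for u in U}

# Transport along a composition equals the composition of transports.
functorial = all(
    transport(compose(tau, sigma), g) == transport(tau, transport(sigma, g))
    for g in graphs_on(U)
)
# Transport along the identity is the identity.
identity_ok = all(transport(identity, g) == g for g in graphs_on(U))
```

Any other concrete species (sequences, trees, permutations) could be substituted here, as long as its transport relabels the atoms consistently.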
The advantage of species is that the rule A that produces the structures A[U ] and the transport functions A[σ] can be described in any way; the book [2] gives instructive examples where this description is by axiomatic systems, explicit constructions, algorithms, combinatorial operations, or functional equations.
Definition 2.2. A species A is a subspecies of a species B (and we write A ⊆ B) if it satisfies the following two conditions:
• for any finite set U, $\mathcal{A}[U] \subseteq \mathcal{B}[U]$;
• for any bijection σ : U → V, $\mathcal{A}[\sigma]$ is the restriction of $\mathcal{B}[\sigma]$ to $\mathcal{A}[U]$.
2.2. Enumeration. For all finite sets U , the number of A-structures on U depends only on the number of elements of U (and not on the elements of U ). If A is a species, we write $a_n$ for $|\mathcal{A}[\{1, \ldots, n\}]|$, the cardinality of the set of A-structures on {1, . . . , n}. The series
$$\mathcal{A}(x) := \sum_{n \ge 0} a_n \frac{x^n}{n!}$$
is called the exponential generating series of the species A.
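In this article we focus on unlabeled structures, i.e., we consider structures up to isomorphism. We may restrict ourselves to structures on sets of the form U = [1..n] := {1, . . . , n}, and write $\mathcal{A}[n]$ for $\mathcal{A}[[1..n]]$. Two structures $A_1, A_2 \in \mathcal{A}[n]$ are called isomorphic, written $A_1 \sim A_2$, if there exists a permutation σ of [1..n] such that $\mathcal{A}[\sigma](A_1) = A_2$.
We also say that $A_1$ and $A_2$ have the same isomorphism type, and the equivalence classes of A-structures on [1..n] with respect to ∼ are also called unlabeled A-structures of size n. The set of all those equivalence classes is denoted by $\tilde{\mathcal{A}}_n$, and we write $\tilde{a}_n$ for its cardinality $|\tilde{\mathcal{A}}_n|$. The series
$$\tilde{\mathcal{A}}(x) := \sum_{n \ge 0} \tilde{a}_n x^n$$
is called the ordinary generating series (OGS) of A. We use the classical notation $[x^n]\tilde{\mathcal{A}}(x)$ to denote the n-th coefficient $\tilde{a}_n$ in the power series $\tilde{\mathcal{A}}(x)$.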
Cycle index sums. For a species A and each n ≥ 0, a symmetry of A of size n is a pair (A, σ) where A is from $\mathcal{A}[n]$ and σ is an automorphism of A. We call A the underlying structure of the symmetry (A, σ). Notice that the automorphism σ can be the identity. We denote by Sym(A) the species defined by
$$\mathrm{Sym}(\mathcal{A})[n] := \{(A, \sigma) \mid A \in \mathcal{A}[n],\ \sigma \text{ is an automorphism of } A\}.$$
The definition of the transport of Sym(A) is obvious (the definition of symmetries as an auxiliary species is standard; see Section 4.3 in [2]). The weight-monomial of a symmetry (A, σ) of size n is defined as
$$w_{(A,\sigma)} := \frac{1}{n!} \prod_{i=1}^{n} s_i^{c_i(\sigma)},$$
where, for i ∈ [1..n], $s_i$ is a formal variable and $c_i(\sigma)$ is the number of cycles of σ of length i. For simplicity, in the following we will write $c_i$ for $c_i(\sigma)$ if the corresponding automorphism is clear from the context. The cycle index sum of A, denoted by $Z_{\mathcal{A}}(s_1, s_2, \ldots)$, or shortly $Z_{\mathcal{A}}$, is the formal power series defined as the sum of the weight-monomials over all the symmetries of A,
$$Z_{\mathcal{A}}(s_1, s_2, \ldots) := \sum_{(A,\sigma) \in \mathrm{Sym}(\mathcal{A})} w_{(A,\sigma)}.$$
Figure 1. The cycle index sums of basic species and of species composed by +, ·, and ∘ (table with columns: basic species, notation, cycle index sum).
Cycle index sums for classes of combinatorial structures have been introduced by Pólya [28]. The following fact, which is based on Burnside's lemma, shows that cycle index sums are a refinement of ordinary generating series.
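Lemma 1 (Pólya). Let A be a species of structures. For n ≥ 0, each unlabeled structure $\tilde{A} \in \tilde{\mathcal{A}}_n$ gives rise to n! symmetries, i.e., there are n! symmetries (A, σ) such that $A \in \tilde{A}$. Hence,
$$\tilde{\mathcal{A}}(x) = Z_{\mathcal{A}}(x, x^2, x^3, \ldots).$$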
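Lemma 1 can be checked by brute force on a small case. The sketch below does so for the species of simple graphs, chosen here purely as an illustration: it counts all symmetries (A, σ) on [1..n] and all isomorphism types, and the lemma predicts that the first count is n! times the second. The function names are hypothetical.

```python
from itertools import combinations, permutations
from math import factorial

def relabel(graph, p):
    """Transport a graph (frozenset of 2-element frozensets) along the permutation p."""
    return frozenset(frozenset(p[v] for v in edge) for edge in graph)

def count_symmetries_and_types(n):
    """Return (number of symmetries, number of unlabeled graphs) on n vertices."""
    pairs = [frozenset(p) for p in combinations(range(n), 2)]
    perms = list(permutations(range(n)))
    graphs = [frozenset(p for i, p in enumerate(pairs) if mask >> i & 1)
              for mask in range(2 ** len(pairs))]
    # A symmetry is a pair (graph, automorphism of that graph).
    symmetries = sum(1 for g in graphs for p in perms if relabel(g, p) == g)
    # An isomorphism type is an orbit of graphs under relabeling.
    iso_types = {frozenset(relabel(g, p) for p in perms) for g in graphs}
    return symmetries, len(iso_types)

symmetries, types = count_symmetries_and_types(4)
# Lemma 1: every unlabeled structure accounts for exactly n! symmetries.
lemma_holds = symmetries == factorial(4) * types
```

For n = 4 there are 11 unlabeled graphs and 264 = 4! · 11 symmetries, in agreement with the lemma.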
In the proofs of combinatorial identities it will be convenient to identify species that are essentially the same.
Definition 2.3. Let A and B be two species. An isomorphism from A to B is a family of bijections $\alpha_U : \mathcal{A}[U] \to \mathcal{B}[U]$ which satisfies the following naturality condition: for any bijection σ : U → V between two finite sets U and V , and for any A-structure A on U , one must have
$$\alpha_V(\mathcal{A}[\sigma](A)) = \mathcal{B}[\sigma](\alpha_U(A)).$$
It is known (see [2]) that when A is isomorphic to B, then $\mathcal{A}(x) = \mathcal{B}(x)$, $\tilde{\mathcal{A}}(x) = \tilde{\mathcal{B}}(x)$, and $Z_{\mathcal{A}}(x_1, x_2, \ldots) = Z_{\mathcal{B}}(x_1, x_2, \ldots)$. Hence, for the purposes of combinatorial enumeration we can even identify isomorphic species, and as in [2] we write A = B when A and B are isomorphic species.
2.4. Basic species and combinatorial constructions. We now recall a collection of basic species and combinatorial constructions. We start with the description of some basic species, and then introduce the three fundamental constructions of disjoint union, product, and substitution.
• the species Cyc of oriented cycles (or cyclic permutations), whose structures on a nonempty finite set U are the cyclic orderings of the elements of U.
In each case, the definition of the transport A[σ] of A-structures is obvious.
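If A is a species, then $\mathcal{A}^{[n]}$ denotes the species defined by
$$\mathcal{A}^{[n]}[U] := \begin{cases} \mathcal{A}[U] & \text{if } |U| = n, \\ \emptyset & \text{otherwise.} \end{cases}$$
In particular, X is $\mathrm{Set}^{[1]}$.
2.4.2. Constructions. We now describe the three fundamental constructions that we use to construct species from other species.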
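Disjoint union. For two species A and B, the species A + B, called the disjoint union (or sum) of A and B, is defined by
$$(\mathcal{A} + \mathcal{B})[U] := \mathcal{A}[U] \uplus \mathcal{B}[U],$$
where a structure is transported along a bijection σ by $\mathcal{A}[\sigma]$ or by $\mathcal{B}[\sigma]$, depending on whether it belongs to $\mathcal{A}[U]$ or to $\mathcal{B}[U]$.
Note that when A, A ′ , B, B ′ are species such that A = A ′ (i.e., A and A ′ are isomorphic) and B = B ′ , then A + B = A ′ + B ′ .
The disjoint union of a countably infinite sequence $\mathcal{A}_1, \mathcal{A}_2, \ldots$ of species is defined analogously,
$$\Big(\sum_{i \ge 1} \mathcal{A}_i\Big)[U] := \biguplus_{i \ge 1} \mathcal{A}_i[U]$$
(this species is well-defined provided the set on the right is finite for each finite set U ).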
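Product. The cartesian (often also called partitional or dinary) product A · B of two species A and B is the species defined as follows.
$$(\mathcal{A} \cdot \mathcal{B})[U] := \{(A, B) \mid A \in \mathcal{A}[U_1],\ B \in \mathcal{B}[U_2],\ U_1 \cup U_2 = U,\ U_1 \cap U_2 = \emptyset\},$$
and the transport along a bijection σ : U → V is given by
$$(\mathcal{A} \cdot \mathcal{B})[\sigma]((A, B)) := (\mathcal{A}[\sigma_1](A), \mathcal{B}[\sigma_2](B)),$$
where $\sigma_i$ is the restriction $\sigma|_{U_i}$ of σ on $U_i$ for i ∈ {1, 2}.
Again we note that when A, A ′ , B, B ′ are species such that A = A ′ and B = B ′ , then A · B = A ′ · B ′ .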
Substitution. Given two species A and B such that B[∅] = ∅, the (partitional) composite of B in A, denoted by A ∘ B, is the species C obtained as follows. A C-structure on U is a triple (π, A, B) where
• π is a partition of U ;
• A is an A-structure on the set of classes of π;
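• $B := (B_p)_{p \in \pi}$ where for each class p of π, $B_p$ is a B-structure on p.
The transport along a bijection σ : U → V is carried out by setting, for any C-structure $C = (\pi, A, (B_p)_{p \in \pi})$ on U, $\mathcal{C}[\sigma](C) := (\bar{\pi}, \bar{A}, (\bar{B}_{\bar{p}})_{\bar{p} \in \bar{\pi}})$, where
• $\bar{\pi}$ is the partition of V obtained by transport of π along σ;
• the structure $\bar{A}$ is obtained from the structure A by A-transport along the bijection $\bar{\sigma} : \pi \to \bar{\pi}$ induced by σ on π;
• for each $\bar{p} = \bar{\sigma}(p) \in \bar{\pi}$, the structure $\bar{B}_{\bar{p}}$ is obtained from the structure $B_p$ by B-transport along $\sigma|_p$.
We call the A-structure A the core of (π, A, B), and $B_p$ in $B = (B_p)_{p \in \pi}$ a component of (π, A, B). Also for the substitution construction, we have that when A, A ′ , B, B ′ are species such that A = A ′ and B = B ′ , then A ∘ B = A ′ ∘ B ′ .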
Together with the basic species, these constructions provide an extremely powerful device for the description of combinatorial families. The substitution construction is particularly interesting, as it allows us to express other combinatorial constructions, such as the formation of sequences, sets, and cycles of structures of a species A, which are specified as Seq ∘ A, Set ∘ A, and Cyc ∘ A, respectively.
As usual, and to avoid clumsy expressions with many brackets, we make the convention that in expressions involving several of the symbols +, ·, ∘, the symbol ∘ binds stronger than the symbol ·, and the symbol · binds stronger than the symbol +.
There are explicit rules to compute the cycle index sum for the basic species and for each construction.
Definition 2.4. Given two power series $f := f(x_1, x_2, x_3, \ldots)$ and $g := g(x_1, x_2, x_3, \ldots)$ such that $g(0, 0, \ldots) = 0$, the plethystic composition of f and g, as defined in [2], is the power series
$$f \circ g := f(g_1, g_2, g_3, \ldots), \quad \text{where } g_k := g(x_k, x_{2k}, x_{3k}, \ldots).$$
In other words, f ∘ g is the series f where each variable $x_k$ is replaced by $g(x_k, x_{2k}, x_{3k}, \ldots)$.
Proposition 2 (Pólya, Bergeron et al. [2]). For each of the basic species $\{0, 1, X, \mathrm{Seq}^{[k]}, \mathrm{Set}^{[k]}, \mathrm{Cyc}^{[k]}, \mathrm{Seq}, \mathrm{Set}, \mathrm{Cyc}\}$, the associated cycle index sum has an explicit expression, as given in Figure 1. For each of the fundamental constructions ∧ ∈ {+, ·, ∘}, there is an explicit rule to compute the cycle index sum of the species A ∧ B, as given in Figure 1.
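Remark 2.1. The ordinary generating series of a species can be obtained from the cycle index sums for the species. For the sum and product constructions, we obtain
$$\widetilde{(\mathcal{A} + \mathcal{B})}(x) = \tilde{\mathcal{A}}(x) + \tilde{\mathcal{B}}(x), \qquad \widetilde{(\mathcal{A} \cdot \mathcal{B})}(x) = \tilde{\mathcal{A}}(x) \cdot \tilde{\mathcal{B}}(x).$$
For the substitution construction, the computation rule is
$$\widetilde{(\mathcal{A} \circ \mathcal{B})}(x) = Z_{\mathcal{A}}(\tilde{\mathcal{B}}(x), \tilde{\mathcal{B}}(x^2), \tilde{\mathcal{B}}(x^3), \ldots).$$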
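To illustrate the substitution rule, in which the ordinary generating series of a composite is the cycle index sum of the core evaluated at the series of the components in x, x^2, x^3, and so on, here is a minimal sketch. It counts unlabeled rooted nonplane trees, assuming the standard specification T = X · Set ∘ T and the standard cycle index sum of Set, exp(s_1 + s_2/2 + s_3/3 + ...). The function names are hypothetical, and the fixed-point iteration is one simple choice among several.

```python
from fractions import Fraction

def series_exp(g, n):
    """exp(g) truncated at x^n, for a power series g with g[0] == 0 (uses f' = g' f)."""
    f = [Fraction(0)] * (n + 1)
    f[0] = Fraction(1)
    for m in range(1, n + 1):
        f[m] = sum(k * g[k] * f[m - k] for k in range(1, m + 1)) / m
    return f

def set_of(b, n):
    """OGS of Set o B, i.e. Z_Set(B(x), B(x^2), ...) = exp(sum_i B(x^i) / i)."""
    g = [Fraction(0)] * (n + 1)
    for i in range(1, n + 1):
        for k in range(1, n // i + 1):
            g[i * k] += Fraction(b[k], i)   # contribution of B(x^i) / i
    return series_exp(g, n)

def rooted_trees(n):
    """Coefficients t[0..n] of the OGS of rooted nonplane trees, T = X * Set(T)."""
    t = [0] * (n + 1)
    for _ in range(n):                       # each pass fixes one more coefficient
        s = set_of(t, n)
        t = [0] + [int(s[m]) for m in range(n)]   # multiply by x
    return t

counts = rooted_trees(8)
```

Exact rational arithmetic is used so that the division by i in the exponent does not introduce rounding; the resulting coefficients are integers.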
2.5. Recursive Specifications. It is possible to define species via recursive specifications that involve the fundamental constructions introduced above.
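Definition 2.5. A (standard) recursive specification with variables $x_1, \ldots, x_m$ over the species $\mathcal{A}_1, \ldots, \mathcal{A}_\ell$ is a system Ψ of equations $x_1 = e_1, \ldots, x_m = e_m$ where each $e_i$ is an expression built from the variables $x_1, \ldots, x_m$ and the species $\mathcal{A}_1, \ldots, \mathcal{A}_\ell$ using the symbols +, ·, and ∘.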
Under a certain condition, a recursive specification Ψ with variables $x_1, \ldots, x_m$ defines new species $\mathcal{X}_1, \ldots, \mathcal{X}_m$ as follows. We first define for each i ≥ 0 a vector of species $(\mathcal{X}^{(i)}_1, \ldots, \mathcal{X}^{(i)}_m)$, starting with $(\mathcal{X}^{(0)}_1, \ldots, \mathcal{X}^{(0)}_m) = (0, \ldots, 0)$. For i ≥ 1 and 1 ≤ j ≤ m, the species $\mathcal{X}^{(i)}_j$ is defined from $e_j$ by substituting for all 1 ≤ k ≤ m the occurrences of $x_k$ in $e_j$ by $\mathcal{X}^{(i-1)}_k$. The resulting expression only contains species and symbols +, ·, and ∘, and hence evaluates to a species, unless $e_j = a \circ b$ and b is substituted by a species that contains structures of size 0. If this case never occurs, i.e., if B[∅] = ∅ whenever a species B is substituted for b in an expression a ∘ b, then we call Ψ admissible.
Note that the sequence $\mathcal{X}^{(0)}_k, \mathcal{X}^{(1)}_k, \ldots$ is monotone, that is, $\mathcal{X}^{(i-1)}_k \subseteq \mathcal{X}^{(i)}_k$ for all 1 ≤ k ≤ m and all i ≥ 1; this follows from the following basic fact (we omit the proof, which is straightforward).
Proposition 3. If A, A ′ , B, B ′ are species, and A ⊆ A ′ and B ⊆ B ′ , then A + B ⊆ A ′ + B ′ , A · B ⊆ A ′ · B ′ , and A ∘ B ⊆ A ′ ∘ B ′ .
Note that due to this proposition, it is easy to decide for a given recursive specification whether or not it is admissible (given also the information which of the species $\mathcal{A}_1, \ldots, \mathcal{A}_\ell$ contains structures of size 0).
Definition 2.6. Let Ψ be an admissible recursive specification with variables $x_1, \ldots, x_m$ over the species $\mathcal{A}_1, \ldots, \mathcal{A}_\ell$, and let the species $\mathcal{X}^{(i)}_j$ be as described above. If for each n and each j, the set $\bigcup_{i \ge 1} \mathcal{X}^{(i)}_j[n]$ is finite, then the species $\mathcal{X}_1, \ldots, \mathcal{X}_m$ specified by Ψ are defined as follows. We set $\mathcal{X}_j[n] := \bigcup_{i \ge 1} \mathcal{X}^{(i)}_j[n]$. By monotonicity, for each n we have that $\mathcal{X}_j[n]$ equals $\mathcal{X}^{(k)}_j[n]$ for some k, and so we can define the transport of $\mathcal{X}_j$ by $\mathcal{X}_j[\sigma] := \mathcal{X}^{(k)}_j[\sigma]$.
Definition 2.7. Let A be a class of species. The class of species that is decomposable over A is the smallest class of species B that contains A, and that contains all species that can be defined by recursive specifications over species from B.
2.6. Decomposition of symmetries. In this subsection we provide a proof of Proposition 2. The proof relies on a precise description of the nature of the automorphisms for the basic species and for the species obtained from one of the constructions +, ·, or ∘. Even though proofs and details can already be found in [2], we give our own presentation here since we build on this later on, in particular to define random generation rules (Section 5).
2.6.1. Automorphisms of some basic species. By convention, the neutral species 1 has the cycle index sum 1 (the structure of size 0 is assumed to be fixed by the "empty" automorphism, of weight 1).
The cycle index sum of X is s 1 . The only X-structures are over U with |U | = 1, and all automorphisms of such structures consist of a single cycle that has weight s 1 .
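For the species Seq, the only automorphism of each structure is the identity, and for a sequence of length k the identity has weight $s_1^k/k!$. As there are k! sequences of length k, the cycle index sum of $\mathrm{Seq}^{[k]}$ is $s_1^k$. Since $\mathrm{Seq} = \sum_{k \ge 0} \mathrm{Seq}^{[k]}$, the cycle index sum for Seq is
$$Z_{\mathrm{Seq}} = \sum_{k \ge 0} s_1^k = \frac{1}{1 - s_1}.$$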
For the species Cyc, the automorphisms are exactly the 'shifts': for a cycle $(v_1, \ldots, v_k)$, the shift of this cycle by m ∈ [0..k-1] maps $v_i$ to $v_{i+m}$ where the indices are modulo k. Such an automorphism consists of k/r cycles of length r, where r is the order of m in Z/(kZ). For each divisor r of k, there are φ(r) elements of order r in Z/(kZ), where φ(.) is the Euler totient function. Hence, for each cycle of length k, the sum of the weight-monomials over all the shifts of order r is $\varphi(r) s_r^{k/r}/k!$, and the sum of the weight-monomials over all the symmetries is $\sum_{r | k} \varphi(r) s_r^{k/r}/k!$. As there are (k-1)! cycles of length k, the cycle index sum of $\mathrm{Cyc}^{[k]}$ is
$$Z_{\mathrm{Cyc}^{[k]}} = \frac{1}{k} \sum_{r | k} \varphi(r)\, s_r^{k/r}.$$
For the species $\mathrm{Cyc} = \sum_{k \ge 1} \mathrm{Cyc}^{[k]}$ of all cycles, the sum of the weight-monomials over all symmetries whose automorphism has order r, which we denote by $Z_{\mathrm{Cyc},r}$, is thus
$$Z_{\mathrm{Cyc},r} = \sum_{j \ge 1} \frac{\varphi(r)}{rj}\, s_r^{j} = \frac{\varphi(r)}{r} \log \frac{1}{1 - s_r}.$$
Therefore, summing $Z_{\mathrm{Cyc},r}$ over r ≥ 1, one obtains the expression of $Z_{\mathrm{Cyc}}$ given in Figure 1, namely $Z_{\mathrm{Cyc}} = \sum_{r \ge 1} \frac{\varphi(r)}{r} \log \frac{1}{1 - s_r}$.
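The cycle index sum of Cyc^[k], that is, (1/k) times the sum over divisors r of k of φ(r) s_r^(k/r), can be sanity-checked numerically. A minimal sketch: substituting the number m for every variable s_r is the classical Pólya count of cyclic arrangements of k beads in m colors, up to rotation, which the code compares against direct enumeration. All function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import gcd

def phi(r):
    """Euler totient function."""
    return sum(1 for m in range(1, r + 1) if gcd(m, r) == 1)

def cycle_index_cyc(k):
    """Z_{Cyc^[k]} as a dict {r: coeff}, standing for sum of coeff * s_r^(k/r)."""
    return {r: Fraction(phi(r), k) for r in range(1, k + 1) if k % r == 0}

def necklaces_by_formula(k, m):
    """Evaluate Z_{Cyc^[k]} at s_r = m for all r."""
    total = sum(c * m ** (k // r) for r, c in cycle_index_cyc(k).items())
    return int(total)

def necklaces_brute_force(k, m):
    """Count length-k words over m letters up to rotation."""
    words = product(range(m), repeat=k)
    return len({min(w[i:] + w[:i] for i in range(k)) for w in words})

binary_necklaces_4 = necklaces_by_formula(4, 2)
```

For k = 4 and m = 2 the formula gives (16 + 4 + 2·2)/4 = 6, matching the six binary necklaces of length 4.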
For the species Set, the automorphisms are all permutations. Hence, the cycle index sum for Set is simply the exponential generating series for all permutations, where each cycle of length i is marked by a variable $s_i$. In other words, when f (x) denotes the exponential generating series for permutations, and $f(x; s_1, s_2, s_3, \ldots)$ denotes the same generating series where each variable $s_i$ marks the number of cycles of length i, then the rules for computing exponential generating series yield
$$f(x; s_1, s_2, s_3, \ldots) = \exp\Big(\sum_{i \ge 1} \frac{s_i x^i}{i}\Big).$$
Hence, $f(1; s_1, s_2, s_3, \ldots) = \exp\big(\sum_{i \ge 1} s_i/i\big)$ is the cycle index sum for Set, and the cycle index sum for $\mathrm{Set}^{[k]}$ is $[x^k] f(x; s_1, s_2, s_3, \ldots)$, which corresponds to a restriction to permutations of size k. Notice that $Z_{\mathrm{Set}^{[k]}}$ is always a polynomial; for instance, we have $Z_{\mathrm{Set}^{[2]}} = \tfrac{1}{2} s_1^2 + \tfrac{1}{2} s_2$ and $Z_{\mathrm{Set}^{[3]}} = \tfrac{1}{6} s_1^3 + \tfrac{1}{2} s_1 s_2 + \tfrac{1}{3} s_3$.
Disjoint union. Clearly, each symmetry of C = A + B is either a symmetry of A or of B depending on whether the underlying structure is in A or in B. Consequently, Sym(A + B) is the disjoint union of Sym(A) and Sym(B). This directly yields the formula
$$Z_{\mathcal{A} + \mathcal{B}} = Z_{\mathcal{A}} + Z_{\mathcal{B}}.$$
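The description of the cycle index sum of Set^[k] earlier in this subsection, as the generating polynomial of all permutations of size k by cycle type, divided by k!, can be reproduced by direct enumeration. The sketch below builds the polynomial as a dictionary from cycle types (c_1, ..., c_k) to coefficients and compares each coefficient with the closed form 1/(∏ i^{c_i} c_i!) obtained by expanding the exponential. The helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def cycle_type(perm):
    """Return (c_1, ..., c_n), where c_i is the number of cycles of length i in perm."""
    n = len(perm)
    seen = [False] * n
    counts = [0] * n
    for start in range(n):
        length, v = 0, start
        while not seen[v]:
            seen[v] = True
            v = perm[v]
            length += 1
        if length:
            counts[length - 1] += 1
    return tuple(counts)

def cycle_index_set(k):
    """Z_{Set^[k]} as {cycle type: coefficient}, by enumerating all k! permutations."""
    types = Counter(cycle_type(p) for p in permutations(range(k)))
    return {t: Fraction(c, factorial(k)) for t, c in types.items()}

def closed_form(t):
    """Coefficient of prod s_i^{c_i} in exp(sum_i s_i / i)."""
    denom = 1
    for i, c in enumerate(t, start=1):
        denom *= i ** c * factorial(c)
    return Fraction(1, denom)

z3 = cycle_index_set(3)
```

The coefficients of each polynomial sum to 1, reflecting the fact that there is exactly one unlabeled set of each size.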
Product. Consider a product species C = A · B. Then there is the following bijective correspondence between symmetries of C and ordered pairs of a symmetry of A and a symmetry of B: when ((A, B), σ) is a symmetry of C where U are the atoms of A and V are the atoms of B, then ((A, B), σ) is in correspondence with the symmetry $(A, \sigma|_U)$ of A and the symmetry $(B, \sigma|_V)$ of B (where $\sigma|_U$ and $\sigma|_V$ denote the restriction of σ to U and V , respectively).
Hence,
$$Z_{\mathcal{A} \cdot \mathcal{B}} = Z_{\mathcal{A}} \cdot Z_{\mathcal{B}}.$$
(Indeed, for the species C the cycle index sum acts like an exponential generating series for the species Sym(C) of symmetries when taking the weight-monomials as refined weights.)
Substitution. Consider a composite species C = A ∘ B.
To understand the automorphisms of C-structures, consider a C-structure C = (π, A, (B p ) p∈π ) over [1..n]. Let σ be an automorphism of C. It is clear that if two atoms v 1 and v 2 of C are in the same component, then σ(v 1 ) and σ(v 2 ) have to be on the same component as well, by definition of the transport for C; moreover, σ induces an automorphism τ of A.
Consider an atom v from a component $B_p$ of C, and let k be the length of the cycle of τ containing p. Note that $\sigma^k$ maps $B_p$ to itself, so $\sigma^k$ induces an automorphism on $B_p$, the resulting symmetry being denoted by $(B_p, \sigma_p)$. Consider the cycle $c = (p_1, \ldots, p_k)$ in τ where $p_1 = p$. Observe that the symmetries $(B_{p_i}, \sigma_{p_i})$, for i ∈ [1..k], can be seen as k copies of the same symmetry of B, which we denote by $(B_c, \sigma_c)$. For each cycle $d = (w_1, \ldots, w_\ell)$ of $\sigma_c$, let $(d_1, \ldots, d_k)$ be the copies of d at $(p_1, \ldots, p_k)$, respectively. Then one can merge $d_1, \ldots, d_k$ into a unique cycle of length ℓ · k using a specific operation which we call composition of cycles.
Definition 2.8. Let $d = (v_1, \ldots, v_\ell)$ be a cycle of atoms from [1..n], with $v_1$ the atom of d having the smallest label. Let $d_1, \ldots, d_k$ be a sequence of k copies of the cycle. Then the composed cycle of $d_1, \ldots, d_k$ is the cycle of atoms of length ℓk such that, for 1 ≤ i < k and 1 ≤ j ≤ ℓ, the successor of the atom $v_j$ in $d_i$ is the atom $v_j$ in $d_{i+1}$; and for 1 ≤ j ≤ ℓ, the successor of the atom $v_j$ in $d_k$ is the atom $v_{j+1}$ in $d_1$, where $v_{\ell+1} := v_1$.
This definition correctly reflects how each cycle of σ is assembled from copies of cycles that are on isomorphic components. Indeed, walking k steps forward on the composed cycle corresponds to walking one step forward on one fixed cycle, which corresponds to the fact that the induced automorphism on each component B vi is the effect of σ iterated k times.
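The composition of cycles can be made concrete with a few lines of code. The following is a sketch under an assumed encoding: the atom v in the i-th copy is represented as the pair (i, v) with i in 0..k-1, and the function names are hypothetical. It builds the composed cycle and confirms that walking k steps on it is the same as walking one step on the original cycle inside a fixed copy.

```python
def compose_cycles(d, k):
    """Composed cycle of k copies of the cycle d, as a list of atoms (copy index, atom).

    The successor of (i, v_j) is (i + 1, v_j) for i < k - 1,
    and the successor of (k - 1, v_j) is (0, v_{j+1}), with wrap-around in j.
    """
    return [(i, v) for v in d for i in range(k)]

def as_permutation(cycle):
    """The cyclic permutation that maps each atom to its successor on the cycle."""
    return {a: cycle[(idx + 1) % len(cycle)] for idx, a in enumerate(cycle)}

def power(perm, k):
    """The k-th iterate of a permutation given as a dict."""
    out = {a: a for a in perm}
    for _ in range(k):
        out = {a: perm[b] for a, b in out.items()}
    return out

d = (3, 7, 5)          # v_1 = 3 carries the smallest label
k = 2
composed = compose_cycles(d, k)
sigma = as_permutation(composed)
sigma_k = power(sigma, k)

# On each copy i, sigma^k acts exactly as one step of the original cycle d.
step = as_permutation(list(d))
induces_d = all(sigma_k[(i, v)] == (i, step[v]) for i in range(k) for v in d)
```

Here the composed cycle has length ℓk = 6 and visits the copies alternately.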
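In each symmetry (A, σ) where A is an A-structure over [1..n], the automorphism σ induces a partition of [1..n] corresponding to the cycles of σ. We say that σ has type $t = (c_1, \ldots, c_n)$ when $c_i$ is the number of cycles of length i in σ. Note that $n = \sum_{i=1}^{n} i c_i$ (and such integer sequences $(c_1, \ldots, c_n)$ will also be called partition sequences (of order n)).
To compute the cycle index sum of A ∘ B, fix a type $t = (c_1, \ldots, c_n)$ and consider the symmetries (C, ρ) of A ∘ B where ρ induces on the core A a permutation σ of type t. To build such a symmetry, we choose a symmetry of A of type t, and then choose $c_i$ symmetries of B for each i (one for each cycle of length i of σ). The sum $Z^{(t)}_{\mathcal{A} \circ \mathcal{B}}$ of the weight-monomials for all those symmetries is therefore
$$Z^{(t)}_{\mathcal{A} \circ \mathcal{B}} = a_t \prod_{i=1}^{n} Z_{\mathcal{B}}(s_i, s_{2i}, s_{3i}, \ldots)^{c_i},$$
where $a_t$ denotes the coefficient of $s_1^{c_1} \cdots s_n^{c_n}$ in $Z_{\mathcal{A}}$. Summing over all possible types t of permutations, one obtains
$$Z_{\mathcal{A} \circ \mathcal{B}} = Z_{\mathcal{A}} \circ Z_{\mathcal{B}} = Z_{\mathcal{A}}\big(Z_{\mathcal{B}}(s_1, s_2, \ldots), Z_{\mathcal{B}}(s_2, s_4, \ldots), Z_{\mathcal{B}}(s_3, s_6, \ldots), \ldots\big).$$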
In this section we introduce cycle-pointed species, and our unbiased pointing operator.
3.1. Cycle-Pointed Species. Let A be a species. Then the cycle-pointed species of A, denoted by $\mathcal{A}^\circ$, is defined as follows. For a finite set U , the set $\mathcal{A}^\circ[U]$ is defined to be the set of all pairs P = (A, c) where $A \in \mathcal{A}[U]$ and $c = (v_1, \ldots, v_\ell)$ is a cycle of atoms of A such that there exists at least one automorphism of A having c as one of its cycles (i.e., the automorphism maps $(v_1, \ldots, v_\ell)$ to $(v_2, \ldots, v_\ell, v_1)$). The cycle c is called the marked cycle (or pointed cycle) of P , and A is called the underlying structure of P . Note that cycle-pointed species are in particular species. Thus, the theory from the previous section also applies to cycle-pointed species. An automorphism σ of A having c as one of its cycles is called a c-automorphism of P , and the other cycles of σ are called unmarked. By definition, two cycle-pointed structures P and P ′ are isomorphic if there exists an isomorphism from the underlying structure of P to the underlying structure of P ′ that maps the marked cycle of P to the marked cycle of P ′ (i.e., each atom of the marked cycle of P is mapped to an atom of the marked cycle of P ′ , and the cyclic order is preserved).
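For a cycle-pointed species P and ℓ ≥ 1, we write $\mathcal{P}^{(\ell)}$ for the species that consists of those structures from P whose marked cycle has length ℓ.
We define $\mathcal{A}^\circledast$ to be the subspecies of $\mathcal{A}^\circ$ where in all structures the marked cycle has length greater than 1, i.e., $\mathcal{A}^\circledast = \sum_{\ell \ge 2} (\mathcal{A}^\circ)^{(\ell)}$. (All cycle-pointed species that will be considered in the applications, except for maps, are either of the form $\mathcal{A}^\circ$ or $\mathcal{A}^\circledast$.)
Definition 3.2. Let P be a cycle-pointed species. We define $\mathcal{P}^\dagger$ to be the species obtained from P by removing the marked cycle from all P-structures; that is, the set of $\mathcal{P}^\dagger$-structures on a finite set U is $\{A \mid (A, c) \in \mathcal{P}[U] \text{ for some cycle } c\}$.
Clearly, for any species A we have that $(\mathcal{A}^\circ)^\dagger = \mathcal{A}$.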
In order to develop Pólya theory for cycle-pointed species, we introduce the terminology of c-symmetry and rooted c-symmetry. Given a cycle-pointed species P, a c-symmetry on P is a pair (P, σ) where P = (A, c) is a cycle-pointed structure in P and σ is a c-automorphism of P. A rooted c-symmetry is a triple (P, σ, v), where (P, σ) is a c-symmetry, and v is one of the atoms of the marked cycle of P ; this atom v is called the root of the rooted c-symmetry. The species of rooted c-symmetries of P is denoted by RSym(P).
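The weight-monomial of a rooted c-symmetry (P, σ, v) of size n is defined as
$$w_{(P,\sigma,v)} := \frac{1}{n!}\, t_\ell \prod_{i=1}^{n} s_i^{n_i(\sigma)},$$
where $t_\ell$ and the $s_i$'s are formal variables, ℓ is the length of the marked cycle, and for i ∈ [1..n], $n_i(\sigma)$ is the number of unmarked cycles of σ of length i.
For simplicity, in the following we will write $n_i$ for $n_i(\sigma)$ if the corresponding automorphism is clear from the context. We define the pointed cycle index sum of P, denoted by $\overline{Z}_{\mathcal{P}}(s_1, t_1; s_2, t_2; \ldots)$ or shortly $\overline{Z}_{\mathcal{P}}$, as the sum of the weight-monomials over all rooted c-symmetries of P,
$$\overline{Z}_{\mathcal{P}}(s_1, t_1; s_2, t_2; \ldots) := \sum_{(P,\sigma,v) \in \mathrm{RSym}(\mathcal{P})} w_{(P,\sigma,v)}.$$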
The following lemma is the counterpart of Lemma 1 for cycle-pointed species; it shows that pointed cycle index sums refine ordinary generating functions. Recall that for a species P, the set $\tilde{\mathcal{P}}$ denotes the unlabeled P-structures, see Section 2.2.
Lemma 4. Let P be a cycle-pointed species. For n ≥ 0, each unlabeled structure $\tilde{P} \in \tilde{\mathcal{P}}_n$ gives rise to exactly n! rooted c-symmetries, i.e., there are n! rooted c-symmetries (P, σ, v) such that $P \in \tilde{P}$. As a consequence, $\tilde{\mathcal{P}}(x) = \overline{Z}_{\mathcal{P}}(x, x; x^2, x^2; \ldots)$.
Proof. Lemma 1 implies that $\tilde{P}$ gives rise to n! symmetries. Now we establish a bijection between these symmetries and the rooted c-symmetries from $\tilde{P}$. Fix a c-symmetry $(P_0, \sigma_0)$ where $P_0$ is from $\tilde{P}$ (by definition of cycle-pointed species, such a c-symmetry exists). Now, let $(P_1, \sigma_1)$ be a symmetry of P where $P_1$ is also from $\tilde{P}$. The marked cycle $(v_1, \ldots, v_\ell)$ of $P_1$ is preserved by $\sigma_1$, so that all its elements are shifted by the same value r ∈ [1..ℓ] modulo ℓ, i.e., $\sigma_1$ maps $v_i$ to $v_{(i+r) \bmod \ell}$. Since both $P_0$ and $P_1$ are from $\tilde{P}$, there is an isomorphism σ from $P_1$ to $P_0$. Moreover, the permutation $\tau := \sigma^{-1} \sigma_0 \sigma$ is a c-automorphism of $P_1$. Observe that the permutation $\tau^{-r+1} \sigma_1$ is an automorphism of $P_1$ that moves an atom of the marked cycle r steps forward (because of $\sigma_1$) and then r - 1 steps backward (because of $\tau^{-r+1}$). Hence, $\tau^{-r+1} \sigma_1$ is a c-automorphism of $P_1$. The desired bijection maps $(P_1, \sigma_1)$ to the rooted c-symmetry $(P_1, \tau^{-r+1} \sigma_1, v)$ where v is the atom of the marked cycle having the r-th smallest label. This correspondence can be inverted easily, and hence we have found a bijection between the symmetries and the rooted c-symmetries for structures having $\tilde{P}$ as unlabeled structure.
Observe that a rooted c-symmetry of A • is obtained from a symmetry (A, σ) of A by choosing an atom v of A and marking the cycle of σ containing v. Therefore, each symmetry (A, σ) of size n of the species A yields n rooted c-symmetries on A • , and hence
Further, a rooted c-symmetry of A• can equivalently be obtained by marking a cycle of atoms that corresponds to a cycle of σ and choosing an atom of the cycle as the root of the rooted c-symmetry. Together, these observations yield the equality
\[ \bar Z_{A^\bullet} = \sum_{\ell \ge 1} \ell\, t_\ell\, \frac{\partial}{\partial s_\ell} Z_A. \qquad (19) \]
For ℓ = 1, which corresponds to structures of A with a unique distinguished vertex (the root ), we recover the well-known equation relating the cycle index sum of a species and of the associated rooted species; see [2, Sec. 1.4.] and [16].
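As a small worked check of the derivation rule, assume it has the form \bar Z_{A^\bullet} = \sum_\ell \ell t_\ell \partial Z_A / \partial s_\ell (the form suggested by the operator h in Definition 3.3), and take A = Set^{[2]}, the species with exactly one structure on every 2-element set. The computation below is only an illustration under that assumption.

```latex
\begin{align*}
Z_{\mathrm{Set}^{[2]}} &= \tfrac12\,(s_1^2 + s_2),\\
\bar Z_{(\mathrm{Set}^{[2]})^\bullet}
  &= t_1\,\frac{\partial}{\partial s_1} Z_{\mathrm{Set}^{[2]}}
   + 2\,t_2\,\frac{\partial}{\partial s_2} Z_{\mathrm{Set}^{[2]}}
   = s_1 t_1 + t_2,\\
\bar Z_{(\mathrm{Set}^{[2]})^\bullet}(x,x;x^2,x^2;\ldots)
  &= x\cdot x + x^2 = 2x^2 = x\,\frac{d}{dx}\,x^2 .
\end{align*}
```

The monomial s_1 t_1 corresponds to a pair with one marked atom (marked cycle of length 1), and t_2 to the pair whose two atoms form the marked 2-cycle, so the single unlabeled structure of size 2 yields two unlabeled cycle-pointed structures.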
As stated below and illustrated in Figure 2, pointing a cycle of symmetry instead of a single atom (as the classical pointing operator does) yields an unbiased pointing operator in the unlabeled setting.
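Theorem 5 (unbiased pointing). Let A be a species. Then, for n ≥ 0, each unlabeled structure of Ã_n gives rise to exactly n unlabeled structures in Ã•_n; that is, the ordinary generating functions satisfy
\[ \widetilde{A^\bullet}(x) = x\,\frac{d}{dx}\,\widetilde{A}(x). \qquad (20) \]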
Proof. Given Ã ∈ Ã_n, let S be the set of unlabeled pointed structures of A• whose underlying unpointed structure is Ã. The proof of the theorem reduces to proving that S has cardinality n. Let Sym(Ã) be the set of symmetries for the structure Ã, and let RSym(S) be the set of rooted c-symmetries for structures from S. By Lemma 1, |Sym(Ã)| = n!, and by Lemma 4, |RSym(S)| = n! · |S|. Since each symmetry of size n yields n rooted c-symmetries, we also have |RSym(S)| = n · |Sym(Ã)| = n · n!, and hence |S| = n.

[Figure 3 (upper part). Basic cycle-pointed species: A•, A⊛, where A ∈ {0, 1, X, Seq^[k], Set^[k], Cyc^[k], Seq, Set, Cyc}, together with their pointed cycle index sums.]

Remark 3.1. Theorem 5 is equivalent to a result known as Parker's lemma [7, Section 2.8]. For a subgroup G of the symmetric group S_n, say that a cycle c in g ∈ G is equivalent to a cycle c′ in g′ ∈ G if there exists h ∈ G that maps the elements of c to the elements of c′ and preserves the cyclic order, i.e., for each element x ∈ c, the successor of x in c is mapped by h to the successor of h(x) in c′. Let a_k be the number of inequivalent cycles of length k. Then Parker's lemma states that Σ_{k=1}^{n} a_k = n. If this lemma is applied to the automorphism group of a fixed structure A of size n, then a_k is the number of unlabeled cycle-pointed structures arising from A and such that the marked cycle has length k. So Parker's lemma states that there are n unlabeled cycle-pointed structures arising from A, i.e., it implies Theorem 5 (conversely, each permutation group is the automorphism group of a structure, so Parker's lemma can be deduced from Theorem 5).
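Parker's lemma can be checked by brute force on small permutation groups. The following is a minimal sketch, not taken from the article: permutations are tuples over {0, …, n−1}, the helper names (`generate_group`, `inequivalent_cycles`, …) are hypothetical, and the group is obtained by naive closure, which is only reasonable for small examples.

```python
from collections import Counter


def compose(p, q):
    """Return the permutation 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))


def generate_group(generators):
    """Close the generators under composition (naive, for small groups only)."""
    identity = tuple(range(len(generators[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def cycles_of(perm):
    """List the cycles of a permutation, each as a tuple in cyclic order."""
    seen, out = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        out.append(tuple(cycle))
    return out


def canonical(cycle):
    """Rotate a cycle so that its smallest element comes first."""
    i = cycle.index(min(cycle))
    return cycle[i:] + cycle[:i]


def inequivalent_cycles(generators):
    """Return {k: a_k}, the number of inequivalent cycles of each length k."""
    group = generate_group(generators)
    remaining = {canonical(c) for g in group for c in cycles_of(g)}
    counts = Counter()
    while remaining:
        c = next(iter(remaining))
        # h maps the cycle (c_0, c_1, ...) to (h(c_0), h(c_1), ...),
        # which preserves the cyclic order.
        orbit = {canonical(tuple(h[x] for x in c)) for h in group}
        remaining -= orbit
        counts[len(c)] += 1
    return dict(counts)
```

For the dihedral group of the square, generated by `(1, 2, 3, 0)` and `(0, 3, 2, 1)`, the sketch reports one class of 1-cycles, two classes of 2-cycles (diagonals and sides) and one class of 4-cycles, which sums to n = 4.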
Remark 3.2. The classical pointing operator, which selects a single atom in a structure, yields an equation similar to (20) for exponential generating series, which is useful for labeled enumeration. Given a species A, let A• be the species of structures from A where an atom is distinguished. Then the exponential generating series satisfy
\[ A^\bullet(x) = x\,\frac{d}{dx}\,A(x). \]
An important contribution of this article is to define a pointing operator A → A • that yields the same equation, Equation (20), in the unlabeled case.
3.3. Basic Cycle-pointed Species and Constructions.
Basic cycle-pointed species. Species of the form A • or A ⊛ where A is a basic species as introduced in Subsection 2.4 will be called basic cycle-pointed species. The derivation rule in Equation ( 19) allows us to compute the pointed cycle index sum of basic cycle-pointed species, as indicated in Figure 3 (upper part).
It is clear that the disjoint union of two cycle-pointed species as defined in Section 2.4.2 is again a cycle-pointed species. We now adapt the constructions of product and substitution to obtain a pointed product and a pointed substitution operation that produce cycle-pointed species.
Pointed product. Let A, B be species, and let P ⊆ A• be a cycle-pointed species. Then the pointed product P ⋆ B of P and B is the subspecies of (A · B)• of all those structures ((A, B), c) in (A · B)• where the pointed cycle c is from A, and (A, c) ∈ P.
We call (A, c′) ∈ A• the core-structure of E.
We can now define a substitution construction for cycle-pointed species. Let P ⊆ A• and let B be a species such that B[∅] = ∅. Then P ⊚ B is defined as the subspecies of structures from (A ∘ B)• whose core-structure is in P.
As in the previous section, we make the convention that in expressions involving several of the symbols +, ·, ⋆, ∘, ⊚, the symbols ∘ and ⊚ bind stronger than the symbols · and ⋆, and the symbols · and ⋆ bind stronger than the symbol +.
In a similar way to the labeled framework [2], our pointing operator behaves well with the three constructions +, ·, and ∘:
Proposition 6. The cycle-pointing operator obeys the following rules:
\[ (A + B)^\bullet = A^\bullet + B^\bullet, \qquad (A \cdot B)^\bullet = A^\bullet \star B + B^\bullet \star A, \qquad (A \circ B)^\bullet = A^\bullet \circledcirc B. \]
Proof. It is easy to see that (A + B)• is isomorphic to A• + B•. For the product, note that the pointed cycle c of a structure from (A · B)• has to be entirely on A or entirely on B. The species that contains all structures ((A, B), c) where c is on A is isomorphic to A• ⋆ B, and the species that contains all structures ((A, B), c) where c is on B is isomorphic to B• ⋆ A.
For the expression for the substitution operation, we in fact have not only isomorphism, but even equality of species. This is clear from the fact that all core-structures of structures from (A ∘ B)• are in A•.
As in the unpointed case, there are explicit rules to compute the pointed cycle index sums for each basic species and for each construction. To this end we need the following notion of composition for power series.
Definition 3.3. Let f and g be two power series of the form f := f(x_1, y_1; x_2, y_2; …) and g := g(x_1, x_2, …) such that g(0, 0, …) = 0. Then the pointed plethystic composition of f with g is the power series
\[ f \circledcirc g := f(g_1, h_1; g_2, h_2; \ldots), \]
with g_k = g(x_k, x_{2k}, x_{3k}, …) and h_k = h(x_k, y_k; x_{2k}, y_{2k}; …) for
\[ h := \sum_{\ell \ge 1} \ell\, y_\ell\, \frac{\partial}{\partial x_\ell}\, g. \]
The following proposition is the counterpart of Proposition 2 for cycle-pointed species.
Proposition 7 (computation rules for pointed cycle index sums). For each basic species A, the pointed cycle index sum of the pointed species A • and A ⊛ is given by the explicit expression given in Figure 3 (upper part) in terms of the cycle index sum of A. For each of the fundamental pointed constructions +, ⋆, and ⊚ there is an explicit rule, given in Figure 3 (lower part), to compute the pointed cycle index sum of the resulting species.
Proof. The statement is clear for the cycle-pointed atomic species and the pointed union. Let A, B be species, and let P ⊆ A • be a cycle-pointed species. For the cycle-pointed product Q = P ⋆ B notice that, similarly as for a partitional product for species, a rooted c-symmetry on Q decomposes into a rooted c-symmetry on P and a symmetry on B, since the automorphism has to act separately on the two component structures. Therefore, RSym(Q) can be considered as a partitional product of RSym(P) and Sym(B), which yields ZQ = ZP • Z B .
For the substitution construction, the proof is similar. Let P ⊆ A • be a cycle-pointed species and let B be a species such that
As we have seen in Section 2.6, the core structure A is endowed with an induced automorphism σ′. In addition, the automorphism is naturally rooted at the atom v′ ∈ A where the B•-component that contains the root v is substituted. We denote the cycle of σ′ that contains v′ by c′. Now we have that (A, c′) is cycle-pointed and the automorphism σ′ rooted at v′ is a rooted c-symmetry on P = (A, c′). In addition, the components substituted at each atom of a cycle c = (u_1, …, u_k) of σ′ are isomorphic copies of the same symmetry on B. The components that are substituted at the atoms of the marked cycle c′ are naturally rooted at the isomorphic representative of v. Finally, this decomposition is reversible: one can go back to the original composed symmetry using the composition of cycles operation.
To express these observations in an equation, we define the type of the rooted c-symmetry (P, ρ, v) with P = (A, c ′ ) to be the sequence (ℓ; n 1 , n 2 , . . . , n k ) where ℓ is the length of c ′ , and n i is the number of unmarked cycles of length i in ρ. Note that the size of
The core type of a rooted c-symmetry on P ⊚ B is defined as the type of the rooted c-symmetry induced on the core structure.
Let Z(t) R be the pointed cycle index sum of P restricted to the rooted c-symmetries with core-type t = (ℓ; n 1 , n 2 , . . . , n n ). From the above discussion, we have
Summing over all possible types t of rooted c-symmetries, we obtain Z̄_R = Z̄_P(b_1, q_1; b_2, q_2; …), where b_i := Z_B(s_i, s_{2i}, …) and q_ℓ := Z̄_{B•}(s_ℓ, t_ℓ; s_{2ℓ}, t_{2ℓ}; …).
In the following we introduce recursive specifications that involve pointed constructions. Cycle-pointed specifications are like standard recursive specifications (Definition 2.5), but with two sorts of variables (where one is reserved for cycle-pointed species) and where we are allowed to use additionally the pointed constructions.
Definition 3.4. A recursive cycle-pointed specification with variables x_1, …, x_m, y_1, …, y_{m′} over the species A_1, …, A_ℓ and over cycle-pointed species B_1, …, B_k is a system Ψ of equations
and each f i is
To define the species X_1, …, X_m, Y_1, …, Y_{m′} that are given by a recursive cycle-pointed specification Ψ with variables x_1, …, x_m, y_1, …, y_{m′} over the species A_1, …, A_ℓ, B_1, …, B_k where B_1, …, B_k are pointed, we again (as in Section 2.5) consider sequences of species X_j^{(i)} and Y_j^{(i)}, whose terms are obtained by evaluating the corresponding expressions for x_j and y_j, respectively (as in Section 2.5). We say that Ψ is admissible if in expressions of the form a ∘ b or a ⊚ b the species substituted for b never contain structures of size 0.
Note that the new pointed constructions are also monotone, so in case that for each n the sets ⋃_{i≥1} X^{(i)}[n] and ⋃_{i≥1} Y^{(i)}[n] are finite it is straightforward (and analogous to Definition 2.6) to define the species X_1, …, X_m, Y_1, …, Y_{m′} specified by admissible recursive specifications Ψ over A_1, …, A_ℓ, B_1, …, B_k.
Definition 3.5. Let A be a class of species. The class of species that is cycle-pointing decomposable over A is the smallest class of species B that contains A, contains all species that can be specified by cycle-pointed recursive specifications over pointed and unpointed species from B, and contains all species obtained from species of the form A• in B by applying the unpointing operation.
Plenty of examples of species that are decomposable over simple basic species can be found in Section 4.
Proposition 8. If a species A is decomposable (in the sense of Definition 2.7), then the pointed species A • is cycle-pointing-decomposable.
Proof. Follows directly from Proposition 6.
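Remark 3.3. The ordinary generating series inherit simple computation rules from the ones for pointed cycle index sums. As expected, for the sum and product constructions, one gets
\[ \widetilde{(P + Q)}(x) = \widetilde{P}(x) + \widetilde{Q}(x), \qquad \widetilde{(P \star B)}(x) = \widetilde{P}(x) \cdot \widetilde{B}(x). \]
For the substitution construction, the computation rule is:
\[ \widetilde{(P \circledcirc B)}(x) = \bar Z_{P}\big(\widetilde{B}(x), \widetilde{B^\bullet}(x);\ \widetilde{B}(x^2), \widetilde{B^\bullet}(x^2);\ \ldots\big), \]
where, by Theorem 5, \widetilde{B^\bullet}(x) = x\,\widetilde{B}'(x).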
Hence, to compute the ordinary generating series of a decomposable cycle-pointed species, the only place where the cycle index sum or pointed cycle index sum is needed (as a refinement of ordinary generating series) is for the species that is the first argument of a substitution or pointed substitution construction.
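To illustrate the substitution rule on a case used later for trees, take P = Set⊛ and an arbitrary species B without structures of size 0. The computation assumes the derivation rule \bar Z_{A^\bullet} = \sum_\ell \ell t_\ell \partial Z_A/\partial s_\ell restricted to ℓ ≥ 2 for Set⊛, and the substitutions s_i ← B̃(x^i), t_ℓ ← x^ℓ B̃′(x^ℓ) for the ordinary generating series; it is a sketch under these assumptions.

```latex
\begin{align*}
Z_{\mathrm{Set}} &= \exp\Big(\sum_{i\ge 1} \frac{s_i}{i}\Big),\\
\bar Z_{\mathrm{Set}^{\circledast}}
  &= \sum_{\ell\ge 2} \ell\, t_\ell\, \frac{\partial}{\partial s_\ell} Z_{\mathrm{Set}}
   = \Big(\sum_{\ell\ge 2} t_\ell\Big) \exp\Big(\sum_{i\ge 1} \frac{s_i}{i}\Big),\\
\widetilde{(\mathrm{Set}^{\circledast} \circledcirc B)}(x)
  &= \Big(\sum_{\ell\ge 2} x^\ell\, \widetilde{B}'(x^\ell)\Big)
     \exp\Big(\sum_{i\ge 1} \frac{\widetilde{B}(x^i)}{i}\Big).
\end{align*}
```

Only the first argument of the substitution (here Set⊛) requires its pointed cycle index sum; for B the ordinary generating series suffices.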
Remark 3.4. As an exercise, the reader can check just by standard algebraic manipulations that the computation rules for cycle-index sums are consistent with Proposition 6. For instance, proving Z̄_{(A∘B)•} = Z̄_{A•⊚B} is equivalent (by the computation rules) to proving the equality Z̄_{(A∘B)•} = Z̄_{A•} ⊚ Z_B, which reduces to checking the following identity on power series:
\[ \Delta(f \circ g) = (\Delta f) \circledcirc g, \]
where f ∘ g denotes the plethystic composition and Δ is the operator that associates to a power series f(x_1, x_2, x_3, …) the power series
\[ \Delta f(x_1, y_1; x_2, y_2; \ldots) := \sum_{\ell \ge 1} \ell\, y_\ell\, \frac{\partial}{\partial x_\ell}\, f. \]
Similarly, to prove Z(A+B)
In this section we demonstrate that cycle-pointing provides a new way of counting many classes of combinatorial structures in the unlabeled setting. Typically, species satisfying a "tree-like" decomposition are amenable to our method. This includes of course species of trees, but also species of graphs (provided that the species is closed under taking 2-connected components, and that the sub-species of 2-connected graphs is tractable), and species of planar maps.
The general scheme to enumerate unlabeled structures of a species A, i.e., to obtain the coefficients |Ã_n|, is as follows. First, we observe that the task is equivalent to the task to enumerate unlabeled cycle-pointed structures from A•, since |Ã•_n| = n |Ã_n| by Theorem 5.
Enumeration for A • turns out to be easier since the marked cycle usually provides a starting point for a recursive decomposition.
For a cycle-pointed species of trees (and more generally for species satisfying tree-like decompositions), the first step of the decomposition scheme is to distinguish whether the marked cycle has length 1 or greater than 1. The general equation is
\[ A^\bullet = X^\bullet \star A' + A^\circledast, \]
where A′ is the derived species of A, consisting of structures from A where one atom is marked with a special label, say *, as defined in [2] (see footnote 1). Then each of the two species A′ and A⊛ has to be decomposed. For derived structures (the species A′) we follow the classical root decomposition.
For symmetric cycle-pointed structures (the species A ⊛ ) our decomposition strategy is different, and leads us to introduce the notion of center of symmetry.
Trees. We first illustrate our decomposition method for trees, which are defined as connected acyclic graphs (i.e., unless mentioned otherwise, trees are unrooted ), and we start with the formal definition of the center of symmetry. Let T be a symmetric cycle-pointed tree. A path of T connecting two consecutive atoms of the marked cycle is called a connecting path (thus the number of connecting paths is the size of the marked cycle).
Claim 9 (center of symmetry). Given a symmetric cycle-pointed tree T , all connecting paths of T share the same middle v c , called the central point for the marked cycle of T . The central point v c is the middle of an edge e if these paths have odd length, and is a vertex v if these paths have even length. In the first (second) case, the edge e (the vertex v, resp.) is called the center of symmetry of T .
Proof. We prove here that all connecting paths share the same middle. Let U be the subgraph of T formed by the union of all connecting paths. Observe that U is connected, so U is a subtree of T . In addition, U is globally fixed by any c-automorphism of T (indeed, the property of being on a connecting path is invariant under the action of a c-automorphism), and it contains the atoms of the marked cycle of T . Hence U is the underlying structure of a cycle-pointed tree (U, c).
Consider the classical center of U, obtained by pruning the leaves (at each step, all leaves are simultaneously deleted) until the resulting tree is reduced to an edge or a vertex [2]. The central point v_c of U is defined as follows: if the center of U is a vertex v, then v_c := v; if the center of U is an edge e, then v_c is the middle of e. Let σ be a c-automorphism of (U, c) and let ⟨σ⟩ be the group of automorphisms generated by σ. It is well known that the central point is fixed by any automorphism of the tree, hence v_c is equidistant from all atoms of the marked cycle, as the group ⟨σ⟩ acts transitively on the vertices of the marked cycle. In addition, v_c is on at least one connecting path of T (because U is the union of these connecting paths). Hence, v_c has to be on all connecting paths, as the group ⟨σ⟩ acts transitively on the connecting paths. Thus, v_c has to be the middle of all connecting paths simultaneously.
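The leaf-pruning procedure used in this proof is easy to make concrete. The following sketch (the function name `tree_center` and the adjacency-dictionary input format are illustrative choices, not part of the article) removes all current leaves in rounds until one vertex or one edge remains.

```python
def tree_center(adj):
    """Return the classical center of a tree by simultaneous leaf pruning.

    `adj` maps each vertex to the list of its neighbours. The result is a
    sorted list with one vertex (central vertex) or two adjacent vertices
    (central edge, whose middle is then the central point).
    """
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    remaining = set(adj)
    leaves = [v for v in adj if degree[v] <= 1]
    while len(remaining) > 2:
        next_leaves = []
        for leaf in leaves:
            remaining.discard(leaf)
            for w in adj[leaf]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] == 1:
                        next_leaves.append(w)
        leaves = next_leaves
    return sorted(remaining)


# A path on 4 vertices has a central edge, a path on 5 vertices a central vertex.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(tree_center(path4), tree_center(path5))
```

In the proof the procedure is applied to the subtree U spanned by the connecting paths, not to T itself, which is why the center of symmetry can differ from the classical center of T.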
Remark 4.1. Notice that the center of symmetry might not coincide with the classical center of the tree, as shown in Figure 5. However, in the case of plane trees, the two notions of center coincide.
4.1.1. Nonplane trees. Let F be the species of free trees, i.e., unrooted nonplane trees (equivalently, acyclic connected graphs), where the vertices are taken as atoms. Let F ′ be the derived species of F (also the species of derived nonplane trees). Rooted nonplane trees can be decomposed at the root. Since the root does not count as an atom and since the children of the root node are unordered, we classically have
In contrast, the decomposition of symmetric cycle-pointed trees does not start at atoms of the marked cycle, but at the center of symmetry, which is either an edge or a vertex (see Figure 5 for an illustration of the decomposition). In order to write down the decomposition, we introduce the Figure 5. Decomposition of a nonplane tree at its center of symmetry (in case the center of symmetry is a vertex).
species L consisting of a single one-edge graph. Note that L ≃ Set [2] , and that L ⊛ consists of the link graph carrying a marked cycle of length 2 that exchanges the two extremities of the edge.
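Claim 10. The species F⊛ of symmetric cycle-pointed free trees satisfies
\[ F^\circledast = L^\circledast \circledcirc R + X \cdot \mathrm{Set}^\circledast \circledcirc R, \]
where R := X · F′ is the species of all pointed trees.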
Proof. Consider a tree produced from the species L⊛ ⊚ R + X · Set⊛ ⊚ R (for an example of a tree produced from X · Set⊛ ⊚ R, see the transition between the right and the left drawing in Figure 5). Clearly, such a tree is free and cycle-pointed, and it is symmetric because the marked cycle of the core-structure (an edge e in the first case, a cycle-pointed set attached to a vertex v in the second case) already has length greater than 1. Hence
\[ L^\circledast \circledcirc R + X \cdot \mathrm{Set}^\circledast \circledcirc R \subseteq F^\circledast. \]
Notice also that in the first (second) case, e (v, respectively) is the center of symmetry of the resulting tree. Indeed each connecting path connects vertices on two different subtrees attached at the center of symmetry, which, by symmetry, stands in the middle of such a path. Conversely, for each symmetric cycle-pointed free tree T , we color blue its center of symmetry, which plays the role of a core-structure for T . Partition F ⊛ as F ⊛ v + F ⊛ e , where F ⊛ v (F ⊛ e , respectively) gathers the trees in F ⊛ whose center of symmetry is a vertex (an edge, respectively). Define also M v (M e ) as the species of free trees with a distinguished vertex (edge, resp.) that is colored blue. Clearly
where the blue vertex (edge, resp.) is the center of symmetry. It is clear that the structures of X • ⋆ (Set • R) have their marked cycle of length 1, so they are not in
⊚ R, the atoms of the marked cycle are on a same subtree attached at the blue vertex, so that the blue vertex is not the center of symmetry. Hence the structures of
Therefore we obtain the second inclusion
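Proposition 11 (decomposing and counting free trees). The species F• of cycle-pointed free trees has the following cycle-pointed recursive specification over the species Set, L⊛, X:
\[
\begin{aligned}
F^\bullet &= X^\bullet \star (\mathrm{Set} \circ R) + F^\circledast,\\
R &= X \cdot (\mathrm{Set} \circ R),\\
F^\circledast &= L^\circledast \circledcirc R + X \cdot (\mathrm{Set}^\circledast \circledcirc R),\\
R^\bullet &= X^\bullet \star (\mathrm{Set} \circ R) + X \cdot (\mathrm{Set}^\bullet \circledcirc R).
\end{aligned}
\qquad (34)
\]
The ordinary generating function f(x) := F̃(x) of free trees satisfies the equation
\[ x f'(x) = r(x) + x^2 r'(x^2) + \Big(\sum_{\ell \ge 2} x^\ell r'(x^\ell)\Big)\, r(x), \qquad (35) \]
where r(x) is specified by r(x) = x exp(Σ_{i≥1} r(x^i)/i).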
Proof. The first three lines of the grammar are Equations (31), (32), and (33), respectively. The fourth line (see footnote 2) is obtained from the second line (i.e., R = X · (Set ∘ R)) using the derivation rules of Proposition 6.
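Concerning the OGSs, let r(x) := R̃(x) be the OGS of the species R. Note that F̃•(x) = x f′(x) and R̃•(x) = x r′(x) by Theorem 5. By the computation rules for OGSs (Remark 2.1 and Remark 3.3), the second line of the grammar, i.e., R = X · (Set ∘ R), yields r(x) = x exp(Σ_{i≥1} r(x^i)/i); and the third line of the grammar yields
\[ \widetilde{F^\circledast}(x) = \bar Z_{L^\circledast}\big(r(x), x r'(x); r(x^2), x^2 r'(x^2); \ldots\big) + x \cdot \bar Z_{\mathrm{Set}^\circledast}\big(r(x), x r'(x); r(x^2), x^2 r'(x^2); \ldots\big). \]
Applying the derivation rule (19) to L = Set^[2] and Set, we get the expressions Z̄_{L⊛} = t_2 and
\[ \bar Z_{\mathrm{Set}^\circledast} = \Big(\sum_{\ell \ge 2} t_\ell\Big) \exp\Big(\sum_{i \ge 1} \frac{s_i}{i}\Big). \]
Finally, the first line of the grammar yields
\[ x f'(x) = r(x) + x^2 r'(x^2) + \Big(\sum_{\ell \ge 2} x^\ell r'(x^\ell)\Big)\, r(x). \]
Equation (35) clearly agrees with Otter's formula [26]:
\[ f(x) = r(x) - \tfrac12 \big(r(x)^2 - r(x^2)\big), \]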
which can be obtained either from Otter's dissimilarity equation or from the dissymmetry theorem [2]. The new result of our method is to yield an expression for xf ′ (x) -Equation (35) -that has only positive signs, as it reflects a positive decomposition grammar. This is crucial to obtain random generators without rejection in Section 5.
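The agreement between the positive expression for x f′(x) and Otter's formula can be checked numerically. The sketch below assumes that Equation (35) reads x f′(x) = r(x) + x² r′(x²) + (Σ_{ℓ≥2} x^ℓ r′(x^ℓ)) r(x), which is what the grammar gives under the computation rules of Remark 3.3; the function names are hypothetical. Rooted trees are counted with the classical recurrence obtained by logarithmic differentiation of r(x) = x exp(Σ r(x^i)/i).

```python
def rooted_tree_counts(n_max):
    """r[n] = number of unlabeled rooted nonplane trees with n vertices."""
    r = [0] * (n_max + 1)
    r[1] = 1
    for n in range(1, n_max):
        total = 0
        for k in range(1, n + 1):
            weight = sum(d * r[d] for d in range(1, k + 1) if k % d == 0)
            total += weight * r[n - k + 1]
        r[n + 1] = total // n
    return r


def free_trees_by_cycle_pointing(n_max):
    """f[n] from n*f[n] = [x^n] (r + x^2 r'(x^2) + (sum_{l>=2} x^l r'(x^l)) r)."""
    r = rooted_tree_counts(n_max)
    # s[N] = [x^N] sum_{l>=2} x^l r'(x^l): sum of d*r[d] over proper divisors d of N.
    s = [0] * (n_max + 1)
    for N in range(2, n_max + 1):
        s[N] = sum(d * r[d] for d in range(1, N) if N % d == 0)
    f = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        total = r[n]                                   # marked cycle of length 1
        if n % 2 == 0:
            total += (n // 2) * r[n // 2]              # center of symmetry is an edge
        total += sum(s[k] * r[n - k] for k in range(2, n))  # center is a vertex
        assert total % n == 0
        f[n] = total // n
    return f


def free_trees_by_otter(n_max):
    """f[n] from Otter's formula f = r - (r^2 - r(x^2)) / 2."""
    r = rooted_tree_counts(n_max)
    f = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        square = sum(r[i] * r[n - i] for i in range(1, n))
        diagonal = r[n // 2] if n % 2 == 0 else 0
        f[n] = r[n] - (square - diagonal) // 2
    return f


print(free_trees_by_cycle_pointing(10)[1:])
```

Every term added in `free_trees_by_cycle_pointing` is nonnegative, whereas `free_trees_by_otter` needs a subtraction; this is the feature that the text singles out as important for rejection-free random generation.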
All the arguments we have used for free trees can be adapted to decompose and enumerate species F_Ω of trees where the degrees of the vertices lie in a finite integer set Ω that contains 1. It is helpful to define the auxiliary species R_Ω that consists of trees from F_Ω rooted at a leaf that does not count as an atom. By decomposing trees at the root, we note that R_Ω has the recursive specification
\[ R_\Omega = X \cdot (\mathrm{Set}_{\Omega-1} \circ R_\Omega), \qquad \text{where } \mathrm{Set}_{\Omega-1} := \bigcup_{k \in \Omega} \mathrm{Set}^{[k-1]}. \]
The species R Ω serves as elementary rooted species to express the pointed species arising from F Ω .
Proposition 12 (decomposing and counting degree-constrained trees). For any finite set Ω of positive integers containing 1, let F_Ω be the species of nonplane trees whose vertex degrees are in Ω. Then the species F•_Ω has the following cycle-pointed recursive specification, where Set_Ω := ∪_{k∈Ω} Set^[k] and Set_{Ω−1} := ∪_{k∈Ω} Set^[k−1]:
(38)

Footnote 2: The fourth line of Equation (34) is not needed for enumeration, but it is necessary to make the grammar completely recursive, and, as such, will be necessary for writing down a random generator in Section 5.
The ordinary generating function f Ω (x) := F Ω (x) satisfies the equation
where r_Ω(x) is specified by r_Ω(x) = x · Z_{Set_{Ω−1}}(r_Ω(x), r_Ω(x²), r_Ω(x³), …). The power series Z_{Set_{Ω−1}}, Z_{Set_Ω} and Z̄_{Set⊛_Ω} appearing in the equation are polynomials that can be computed explicitly.
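Example 1. Unrooted nonplane binary trees. Trees whose vertex degrees are in Ω := {1, 3} are called unrooted nonplane binary trees (note that rooting such a tree at a leaf, one obtains a rooted nonplane binary tree, i.e., each internal node has two unordered children). In that case, the elementary cycle index sums required in Equation (39) are
\[ Z_{\mathrm{Set}_{\Omega-1}} = 1 + \tfrac12 (s_1^2 + s_2), \qquad Z_{\mathrm{Set}_{\Omega}} = s_1 + \tfrac16 (s_1^3 + 3 s_1 s_2 + 2 s_3), \qquad \bar Z_{\mathrm{Set}^\circledast_{\Omega}} = s_1 t_2 + t_3. \]
Let f(x) be the OGS of unrooted nonplane binary trees and r(x) the OGS of rooted nonplane binary trees (rooted at a leaf that does not count as an atom). Firstly, from the expression of Z_{Set_{Ω−1}} one obtains
\[ r(x) = x \Big( 1 + \tfrac12 \big( r(x)^2 + r(x^2) \big) \Big). \]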
Then, Equation (39) yields
From this equation one can extract the counting coefficients of unrooted nonplane binary trees with respect to the number of vertices (after extracting first the coefficients of r(x)):
Hence the first counting coefficients with respect to the number of internal nodes (starting with 0 internal nodes) are 1, 1, 1, 1, 2, 2, 4, 6. Pushing further one gets 1, 1, 1, 1, 2, 2, 4, 6, 11, 18, 37, 66, 135, 265, 552, 1132, which coincides with Sequence A000672 in [32] (the number of trivalent trees with n nodes). . .
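The coefficients listed above can be reproduced with a few lines of exact integer arithmetic. The sketch below is an illustration, not the article's computation: it assumes that the grammar for F•_Ω mirrors the one for free trees, namely F•_Ω = X• ⋆ (Set_Ω ∘ R_Ω) + L⊛ ⊚ R_Ω + X · (Set⊛_Ω ⊚ R_Ω), and it uses the cycle index sums of Set^[1], Set^[2], Set^[3] for Ω = {1, 3}. All helper names are hypothetical.

```python
def mul(a, b):
    """Product of two truncated power series given as equal-length lists."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        if ai:
            for j in range(n - i):
                c[i + j] += ai * b[j]
    return c


def dilate(a, k):
    """a(x) -> a(x^k), truncated to the same length."""
    c = [0] * len(a)
    for i in range((len(a) - 1) // k + 1):
        c[k * i] = a[i]
    return c


def shift(a):
    """a(x) -> x * a(x), truncated to the same length."""
    return [0] + a[:-1]


def add(*series):
    """Coefficient-wise sum of several truncated series."""
    return [sum(coeffs) for coeffs in zip(*series)]


def binary_tree_counts(max_internal):
    """Unrooted nonplane binary trees with 0, 1, ..., max_internal internal nodes."""
    N = 2 * max_internal + 2          # a tree with n internal nodes has 2n + 2 vertices
    # r(x) = x * (1 + (r(x)^2 + r(x^2)) / 2), counted by number of vertices.
    r = [0] * (N + 1)
    r[1] = 1
    for n in range(2, N + 1):
        square = sum(r[i] * r[n - 1 - i] for i in range(1, n - 1))
        dilated = r[(n - 1) // 2] if (n - 1) % 2 == 0 else 0
        r[n] = (square + dilated) // 2
    r2, r3 = dilate(r, 2), dilate(r, 3)
    p = [i * c for i, c in enumerate(r)]               # x r'(x)
    p2, p3 = dilate(p, 2), dilate(p, 3)                # x^2 r'(x^2), x^3 r'(x^3)
    # Z_{Set_Omega}(r(x), r(x^2), r(x^3)) = r + (r^3 + 3 r r(x^2) + 2 r(x^3)) / 6
    six_set3 = add(mul(mul(r, r), r), [3 * c for c in mul(r, r2)], [2 * c for c in r3])
    set_omega = [ri + c // 6 for ri, c in zip(r, six_set3)]
    # x f'(x) = x Z_{Set_Omega}(...) + x^2 r'(x^2) + x (x^2 r'(x^2) r(x) + x^3 r'(x^3))
    rhs = add(shift(set_omega), p2, shift(add(mul(p2, r), p3)))
    f = [0] + [rhs[n] // n for n in range(1, N + 1)]
    return [f[2 * k + 2] for k in range(max_internal + 1)]


print(binary_tree_counts(7))
```

As in the free-tree case, the right-hand side only involves additions and products of series with nonnegative coefficients; the divisions by 6 and by n are exact.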
A plane tree is a tree endowed with an explicit embedding in the plane. Hence, a plane tree is a tree where the cyclic order around each vertex matters. Let E be the species of plane trees, where again the atoms are the vertices. As usual the strategy to count plane trees is to decompose E•, distinguishing whether the marked cycle has length 1 or larger than 1:
\[ E^\bullet = X^\bullet \star E' + E^\circledast. \]
The species E′ is decomposed with the help of another species of plane trees: denote by A the species of plane trees rooted at a leaf which does not count as an atom. Decomposing A at the root, we get
\[ A = X \cdot (\mathrm{Seq} \circ A). \]
Again the species A serves as elementary rooted species to express species of pointed plane trees:
Proposition 13 (decomposing and counting plane trees). The species E • of cycle-pointed plane trees has the following cycle-pointed recursive specification.
The ordinary generating function E(x) of plane trees satisfies the equation:
where a(x) is the series of Catalan numbers: a(x) = x/(1 − a(x)), that is, a(x) = (1 − √(1 − 4x))/2.
By coefficient extraction, one gets the following formula for the number e n of plane trees with n + 1 vertices (entry A002995 in [32]):
Proof. The grammar is obtained by arguments similar to those used to derive the grammar (34) for free trees. The only difference is that the cyclic order of the neighbors around each vertex matters, so a Set construction in the grammar for free trees typically has to be replaced by a Cyc construction in the grammar for plane trees.
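The replacement of Set by Cyc can be tested numerically against entry A002995. The sketch below rests on our reading of the grammar, which is an assumption: E′ = 1 + Cyc ∘ A and E⊛ = L⊛ ⊚ A + X · (Cyc⊛ ⊚ A), with Z_Cyc = Σ_k (φ(k)/k) log(1/(1 − s_k)), so that the pointed cycle index sum of Cyc⊛ is Σ_{ℓ≥2} φ(ℓ) t_ℓ /(1 − s_ℓ). The function names are hypothetical, and the output is indexed by the number of vertices.

```python
from math import gcd


def phi(k):
    """Euler's totient function (naive, for small k)."""
    return sum(1 for j in range(1, k + 1) if gcd(j, k) == 1)


def plane_tree_counts(n_max):
    """Number of unrooted plane trees with 1, 2, ..., n_max vertices."""
    # a(x) = x / (1 - a(x)), i.e. a = x + a^2 (Catalan numbers).
    a = [0] * (n_max + 1)
    if n_max >= 1:
        a[1] = 1
    for n in range(2, n_max + 1):
        a[n] = sum(a[i] * a[n - i] for i in range(1, n))
    # inv(x) = 1 / (1 - a(x)), from inv = 1 + a * inv.
    inv = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        inv[n] = sum(a[i] * inv[n - i] for i in range(1, n + 1))
    # g(x) = x a'(x) / (1 - a(x)) = x d/dx log(1 / (1 - a(x))).
    g = [sum(i * a[i] * inv[n - i] for i in range(1, n + 1)) for n in range(n_max + 1)]
    counts = []
    for n in range(1, n_max + 1):
        N = n - 1                       # atoms carried by the subtrees around the root
        if N == 0:
            total = 1                   # single-vertex tree, marked cycle of length 1
        else:
            divisors = [k for k in range(1, N + 1) if N % k == 0]
            cyc = sum(phi(k) * g[N // k] for k in divisors)
            assert cyc % N == 0
            total = cyc // N            # [x^N] Z_Cyc(a(x), a(x^2), ...)
            total += sum(phi(k) * g[N // k] for k in divisors if k >= 2)  # Cyc-pointed part
        if n % 2 == 0:
            total += (n // 2) * a[n // 2]   # center of symmetry is an edge
        assert total % n == 0
        counts.append(total // n)
    return counts


print(plane_tree_counts(10))
```

The three contributions correspond, in order, to a marked cycle of length 1, a vertex as center of symmetry, and an edge as center of symmetry.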
All the arguments apply similarly for species of plane trees where the degrees of vertices are constrained. As a counterpart to Proposition 12, we obtain:
Proposition 14 (decomposing and counting degree-constrained plane trees). For any finite set Ω of positive integers containing 1, let E_Ω be the species of plane trees where the degrees of vertices are constrained to lie in Ω. Then the cycle-pointed species E•_Ω is decomposable; it satisfies the following decomposition grammar, where Cyc_Ω := ∪_{k∈Ω} Cyc^[k] and Seq_{Ω−1} := ∪_{k∈Ω} Seq^[k−1]:
The ordinary generating function e_Ω(x) := Ẽ_Ω(x) satisfies the equation (see footnote 3):
where a_Ω(x) is specified by a_Ω(x) = x Σ_{k∈Ω} a_Ω(x)^{k−1}.
Example 2. d-regular plane trees. For d ≥ 3, a d-regular plane tree is a plane tree such that each internal node has degree d, which corresponds to the case Ω = {1, d} in Proposition 14. It is easily shown that such a tree with n internal nodes has m = n(d -2) + 2 leaves. Let E [d] be the species of d-regular plane trees, where the atoms are the leaves (it proves here more convenient to take leaves as atoms and to write the counting coefficients according to the number of internal vertices). Let
′ be the corresponding derived species, which satisfies A
Hence, the OGS e
The coefficients of each of the summand series (such as a
) have a closed formula, which can be found for instance using the univariate Lagrange inversion formula. From these formulas, we obtain the following expression for the number e n,[d] of d-regular plane trees with n internal nodes:
Footnote 3: To obtain this equation we use the formula
where
One can extend this formula to any degree distribution on vertices, by adding variables marking the degree of each vertex and applying the multivariate Lagrange inversion formula. A general enumeration formula is given in [6].

4.2. Graphs. We extend here the decomposition principles which we have developed for trees to the more general case of a species of connected graphs, by taking advantage of a well-known "tree-like" decomposition of a connected graph into 2-connected components. (A 2-connected graph is a graph that has at least two vertices and has no separating vertex.) Given a connected graph G, a maximal 2-connected subgraph of G is called a block of G. The set of vertices of G is denoted V(G) and its set of blocks is denoted B(G). The Bv-tree of G is the bicolored graph with vertex-set V(G) ∪ B(G) and edges corresponding to the adjacencies between the blocks and the vertices of G, see Figure 6. It can be shown that the Bv-tree of G is indeed a tree, see [16, p.10] and [24] for details.
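The block decomposition and the Bv-tree can be computed with a standard depth-first search that keeps a stack of edges (the classical biconnected-components algorithm, used here as an illustration and not taken from the article). The sketch assumes a connected simple graph given as an adjacency dictionary; the names `blocks`, `bv_tree` and `is_tree` are hypothetical, and the recursion is only meant for small examples.

```python
def blocks(adj):
    """Return the blocks of a connected simple graph as frozensets of vertices."""
    index, low = {}, {}
    edge_stack, result = [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)
        for w in adj[u]:
            if w not in index:
                edge_stack.append((u, w))
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= index[u]:
                    # u separates the subtree of w: pop one complete block.
                    component = set()
                    while True:
                        edge = edge_stack.pop()
                        component.update(edge)
                        if edge == (u, w):
                            break
                    result.append(frozenset(component))
            elif w != parent and index[w] < index[u]:
                edge_stack.append((u, w))        # back edge to an ancestor
                low[u] = min(low[u], index[w])

    dfs(next(iter(adj)), None)
    return result


def bv_tree(adj):
    """Return (nodes, edges) of the Bv-tree: vertices and blocks, joined by incidence."""
    bl = blocks(adj)
    nodes = [("v", v) for v in adj] + [("b", i) for i in range(len(bl))]
    edges = [(("v", v), ("b", i)) for i, block in enumerate(bl) for v in block]
    return nodes, edges


def is_tree(nodes, edges):
    """Check that the graph (nodes, edges) is connected with |nodes| - 1 edges."""
    if len(edges) != len(nodes) - 1:
        return False
    neighbours = {x: [] for x in nodes}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    seen, todo = {nodes[0]}, [nodes[0]]
    while todo:
        x = todo.pop()
        for y in neighbours[x]:
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return len(seen) == len(nodes)


# A triangle {0,1,2}, a bridge {2,3} and a square {3,4,5,6} glued at cut vertices.
example = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 6], 4: [3, 5], 5: [4, 6], 6: [5, 3]}
print(sorted(sorted(b) for b in blocks(example)), is_tree(*bv_tree(example)))
```

In the example the Bv-tree has 7 vertex-nodes and 3 block-nodes joined by 3 + 2 + 4 = 9 incidences, which is one less than the number of nodes, as expected for a tree.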
Proposition 15. Let G be a species of connected graphs that satisfy the following stability property: "a connected graph is in G iff all its blocks are in G". Let B be the subspecies of graphs in G that are 2-connected. Then G admits a decomposition grammar from the species of 2-connected structures B ′ , B ⊛ , and
Hence, if the species of 2-connected structures B ⊛ and B ′ are decomposable (the latter implies that (B ′ ) • is decomposable), then the cycle-pointed species G • is decomposable as well. More generally, if Z B ′ and Z B ⊛ are both solutions of an equation system involving the operations {+, ⋆, •, ⊚} and basic cycle-index sums, then Z G • is also a solution of such an equation system.
Proof. The first line of the grammar is obtained as usual by distinguishing whether the marked cycle has length 1 or greater than 1. The second line easily follows from the block decomposition, as shown for instance in [14]. To wit, the marked vertex of a graph in G ′ is incident to a collection of blocks, and a connected graph is possibly attached at each non-marked vertex of these blocks.
Let us prove the third line in a similar way as for free trees (Claim 10). Consider a graph G in G⊛, and let T be the Bv-tree of G. Clearly the Bv-tree of a graph has less structure than the graph itself, so any automorphism of G induces an automorphism on T. In particular T is a symmetric cycle-pointed tree, hence it has a center of symmetry that either corresponds to a block or to a vertex of G. The species of graphs in G⊛ whose center of symmetry in the Bv-tree is a vertex (a block) is denoted (G⊛)_v ((G⊛)_B, resp.). Let G_v (G_B) be the species of graphs in G with a marked vertex (block, resp.) that is colored blue. Then clearly
Hence, following the notations introduced in the grammar,
) such that the center of symmetry of the associated Bv-tree is the blue vertex (block, resp.). It is easy to check, in a similar way as for free trees, that this property holds only for the graphs of G v ⊛ that are in X • Set ⊛ ⊚ K and only for the graphs of G B ⊛ that are in B ⊛ ⊚ H. Finally, the 4th line, which is necessary to have only species of 2-connected structures as terminal species, is obtained from the second line by applying the derivation rules (Proposition 6).
Remark 4.4. Trees are exactly connected graphs where each block is an edge. In other words, the species F of free trees is the species G of connected graphs formed from the species B = L (the one-element species that consists of the link graph). One easily checks that, in that case, the grammar (48) for G is equivalent to the grammar (34) for free trees.

4.2.1. Cacti graphs. Cacti graphs form an important class of graphs that have several algorithmic applications. They consist of cycles attached together in a tree-like fashion; in other words, the species of cacti graphs arises from the species of 2-connected structures as B = L + P where L is the species of the link graph and P is the species of polygons with at least 3 edges (i.e., B is the species of polygons where one allows the degenerated 2-sided polygon).
Thanks to the grammar (48), the unlabeled enumeration of connected cacti graphs reduces to the calculation of the cycle-index sums for the species of 2-connected structures B ′ and B ⊛ (the cycle-index sum Z (B ′ ) • is also required, but it can directly be deduced from Z B ′ by differentiation). Since the 2-connected cacti graphs are polygons, the possible automorphisms are from the dihedral group. In addition, the presence of a marked vertex (for B ′ ) or cycle (for B ⊛ ) restricts the symmetries. For instance, if a structure in B ′ has a marked (unlabeled) vertex v, then the automorphisms have to fix v; there are only two such symmetries for each polygon, the identity and the unique reflection whose axis passes by v. Accordingly, we have two terms in the expression of Z B ′ below, the first one for the identity, and the second one for reflections (where one distinguishes whether the polygon has odd or even length).
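The claim that a marked vertex leaves exactly two symmetries of a polygon, and the odd/even distinction for the reflection, can be checked by listing the dihedral group explicitly. This is an illustrative sketch with hypothetical function names; vertices of the n-gon are numbered 0, …, n−1 and the marked vertex is excluded from the reported cycle type.

```python
from collections import Counter


def dihedral_group(n):
    """All 2n symmetries of the n-gon (n >= 3), as tuples i -> image of i."""
    rotations = [tuple((i + k) % n for i in range(n)) for k in range(n)]
    reflections = [tuple((k - i) % n for i in range(n)) for k in range(n)]
    return rotations + reflections


def cycle_type(perm, skip=()):
    """Return {cycle length: multiplicity} of perm, ignoring the vertices in skip."""
    seen = set(skip)
    lengths = Counter()
    for start in range(len(perm)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = perm[x]
            length += 1
        lengths[length] += 1
    return dict(lengths)


def vertex_stabilizer_types(n, v=0):
    """Cycle types (on the other vertices) of the symmetries of the n-gon fixing v."""
    return [cycle_type(g, skip=(v,)) for g in dihedral_group(n) if g[v] == v]


print(vertex_stabilizer_types(5))  # identity, then the reflection through vertex 0
print(vertex_stabilizer_types(6))
```

For odd n the reflection pairs up all other vertices, while for even n it fixes the vertex opposite to the marked one, which is why the reflection term of Z_{B′} depends on the parity of the polygon length.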
For B ⊛ , all symmetries must be nontrivial and have to respect the marked cycle. These symmetries are of two types: rotation and reflection, which yields the two main terms in the expression of Z B ⊛ below.
The expressions for Z B ′ and Z B ⊛ can be used to enumerate unlabeled cacti graphs. We just have to translate (using the computation rules in Remark 2.1 and Remark 3.3) the grammar (48) -applied to the species of cacti graphs -into an equation system satisfied by the corresponding ordinary generating functions.
Proposition 16. The ordinary generating function Σ_n c_n x^n of unlabeled cacti graphs counted with respect to the number of vertices satisfies
from which one can extract (see footnote 4) the counting coefficients c_n (after firstly extracting the coefficients of H(x)):
Outerplanar graphs are graphs that can be drawn in the plane so that all vertices are incident to the outer face. They form a fundamental subspecies of the species of planar graphs, which already captures some difficulties of the species of all planar graphs; for example, the convergence rate of sampling procedures using the Markov chain approach is not known. However, outerplanar graphs are easier to tackle with the decomposition approach. For enumeration, we use the well-known property that 2-connected outerplanar graphs, except for the one-edge graph, have a unique Hamiltonian cycle. Hence, the species B of 2-connected outerplanar graphs can be identified with the species of dissections of a polygon (allowing a degenerated 2-sided dissection). This time, to obtain the cycle index sums Z_{B′} and Z̄_{B⊛}, we have to count not polygons (as for cacti graphs) but dissections of a polygon under the action of the dihedral group. We only sketch the method here (the principles for counting such dissections are well known, going back to earlier articles of Read [30], see also [3] for more detailed calculations).
For each type of symmetry (rotation or reflection), one considers the "quotient dissection", as shown in Figure 7. Notice that a dissection fixed by a rotation has either a central edge (only for the rotation of order two) or a central face. In case of a central edge e, it turns out to be more convenient to "double" e, so as to always have a central face before taking the quotient. Thus, the quotient dissection has a marked face (the quotient of the central face) that might have degree one (only for rotations of order at least three) or two (only for rotations of order at least two). Concerning quotient dissections under a reflection, there are two special vertices v 1 and v 2 on the boundary (the intersections of the original polygon with the reflection-axis), and there might be some other special vertices, all of degree three, on the boundary path from v 1 to v 2 ; see Figure 7.
The second ingredient is to take the dual of such quotient dissections in order to obtain plane trees, which are easier to decompose and to count. Notice that if the rotation is the identity rotation, then the associated plane tree is in the species F of plane trees with no vertex of degree two. Notice also that each leaf of the tree corresponds to a vertex of the dissection; see Figure 8 for an example. The generating function of F with respect to the number of leaves satisfies
To calculate Z B ′ and Z B ⊛ , one computes separately the contributions of rotations and reflections to Z B ′ and to Z B ⊛ . In each case, using duality, the contribution is easily expressed in terms of the series F (x). All calculations done, one finds:
where
As for cacti graphs, the expressions for $Z_{B'}$ and $Z_{B^{\circledast}}$ make it possible to enumerate unlabeled connected outerplanar graphs. Translating the grammar (48) into an equation system on the corresponding generating functions, we obtain the following.
Proposition 17 (Enumeration of unlabeled connected outerplanar graphs). The ordinary generating function $o(x) = \sum_n o_n x^n$ of unlabeled connected outerplanar graphs counted with respect to the number of vertices satisfies the system:
where the series F, G, P, Q, R, S are defined above. One extracts from this system (first extracting the coefficients in F, Q, R, S, then in H and K, then in I) the counting coefficients $o_n$:
A map is a planar graph embedded on a sphere up to isotopic deformation, i.e., it is a planar graph together with a cyclic order of the neighbors around each vertex. There is a huge literature on maps since the pioneering work of Tutte [33]. As we show next, the decomposition grammar (48) for maps is actually simpler than for graphs, and it allows us to enumerate (unrooted unlabeled) 2-connected maps in terms of not necessarily connected maps.
To write down the grammar, it turns out to be more convenient to take half-edges as atoms instead of vertices. Denote by M the species of maps, so that $R := Z \cdot M'$ is the species of rooted maps (maps with a marked half-edge), and by B the species of 2-connected maps (the loop-map is considered as 2-connected).
Proposition 18. The species of rooted maps and symmetric cycle-pointed maps (in each length ℓ ≥ 2 of the marked cycle) have the following recursive specification over the corresponding species of rooted and cycle-pointed 2-connected maps.
Proof. The arguments are similar to the proof for graphs (Proposition 15). The only difference is that one takes the embedding into account, hence corners (which are in one-to-one correspondence with half-edges for a given map) play for maps a similar role as vertices do for graphs, and a Set construction typically becomes a Cyc construction here. Let us comment here on the decomposition for symmetric cycle-pointed maps (the one for rooted maps is well known, see [33]).
One has
where the first (second) term takes account of the maps whose center of symmetry -for the associated block-decomposition tree -is a block (vertex, respectively). Further simplification is possible, since a rooted map has only the identity as automorphism. Hence, the species of rooted maps H and K satisfy
. Thus, Equation (51) can be "sliced" into a collection of equations, one for each length ℓ ≥ 2 of the marked cycle.
An important property of any map automorphism, shown by Liskovets [22], is that all its cycles have the same length ℓ, which is also the order of the automorphism. Hence, the number of half-edges of a cycle-pointed map with a marked cycle of length ℓ is divisible by ℓ. For ℓ ≥ 1, denote by $M^{(\ell)}(x)$ (respectively $B^{(\ell)}(y)$) the series counting unlabeled cycle-pointed maps (respectively 2-connected maps) according to the number of half-edges divided by ℓ. In particular, $R(x) := M^{(1)}(x)$ and $S(x) := B^{(1)}(x)$ are the series counting rooted maps and rooted 2-connected maps, respectively. We clearly have
Given this simplification, the grammar (50) is translated into the following system relating the series counting species of maps and species of 2-connected maps:
In the case of maps, the decomposition grammar is used in the other direction, i.e., one obtains the enumeration of (unrooted) 2-connected maps from maps. Indeed, unconstrained maps are easier to count, by a method of quotient [22] similar to the one we have used for counting dissections in Section 4.2.2.
Let us first review (from Tutte [33]) how one obtains an expression for the series S(y) counting rooted 2-connected maps from an expression for the series R(x) counting rooted maps. One starts from the following expression of R(x):
Next, notice that the change of variable y = H(x) = x(1 + R(x)) between rooted maps and rooted 2-connected maps is such that y 2 is also rational in β:
Equivalently:
where $\eta := \beta/(1 - 3\beta)$, so the dependence between η and β is invertible: $\beta = \eta/(1 + 3\eta)$. Notice also from (52) that
Replacing β by η/(1 + 3η), one gets
which can also be written as $S(y) = \widehat{S}(y^2)$, with $\widehat{S}(y) = \eta(2 - 3\eta)$, and $\eta := \eta(y)$ specified by $\eta = y/(1 - \eta)^2$. (55)
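As a sanity check of this parametrization, the following Python sketch expands η(2 − 3η) as a power series, reading the relation for η as η = y/(1 − η)^2 and counting by edges. The coefficients of η come from Lagrange inversion. The closed form used for comparison, 2(3n − 3)!/(n!(2n − 1)!), is Tutte's classical count of rooted 2-connected maps with n edges; it is not stated in this section and is brought in here only as an external check. All function names are illustrative.

```python
from math import comb, factorial

def eta_coefficients(n_max):
    """Coefficients [y^n] eta for eta = y / (1 - eta)^2, for n = 0..n_max.

    Lagrange inversion with phi(u) = (1 - u)^(-2) gives
    [y^n] eta = (1/n) [u^(n-1)] (1 - u)^(-2n) = C(3n - 2, n - 1) / n.
    """
    return [0] + [comb(3 * n - 2, n - 1) // n for n in range(1, n_max + 1)]

def rooted_2connected_by_edges(n_max):
    """Coefficients of eta * (2 - 3 * eta), indexed by the number n of edges."""
    eta = eta_coefficients(n_max)
    coeffs = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        # Coefficient of y^n in eta^2 (eta has no constant term).
        square = sum(eta[i] * eta[n - i] for i in range(1, n))
        coeffs[n] = 2 * eta[n] - 3 * square
    return coeffs

def tutte_closed_form(n):
    """Tutte's formula for rooted 2-connected maps with n >= 1 edges."""
    return 2 * factorial(3 * n - 3) // (factorial(n) * factorial(2 * n - 1))

if __name__ == "__main__":
    print(rooted_2connected_by_edges(8)[1:])
    print([tutte_closed_form(n) for n in range(1, 9)])
```

The two printed lists should agree, which supports reading the garbled relation as η = y/(1 − η)^2.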
In a similar way, if two series f (x) = g(y) are related by the change of variables y = H(x) and if f (x) is rational in β(x), then g(y) is rational in η(y) (replacing β by η/(1 + 3η)). For instance, for ℓ ≥ 3, it has been shown by Liskovets using the quotient method (see [12] for the reformulation on series) that
.
Since H(x)/x = 1 + R(x) and K(x) = R(x)/(1 + R(x)) are rational in β, as well as xH ′ (x) and
, one finds from (53) a rational expression in β for the series B (ℓ) (H(x)). Replacing β by η/(1 + 3η) in that expression, one finally gets:
which can also be written as
In a similar way, starting from the expression (given in [12])
, one obtains the following expression for B (2) (y):
which can also be written as
Proposition 19 (counting unrooted 2-connected maps, recovering [23]). The number $t_n$ of (unrooted unlabeled) 2-connected maps with n edges satisfies:
where
Proof. Cycle-pointing ensures that the generating function
Extracting the coefficient [y 2n ] in this equation yields
where
if n is even and y (n-1)/2 Q(y) if n is odd. Notice that the series S(y), G(y), P(y), and Q(y) are rational in the simple series $\eta(y) = y/(1 - \eta(y))^2$. Hence the Lagrange inversion formula [2, Section 3.1] allows us to extract exact formulas for the coefficients $s_n$, $v_n$, and $u_n$. Substituting these exact expressions in (59), one obtains the announced formula for $t_n$.
The enumeration of unrooted 2-connected maps was first carried out by Liskovets and Walsh [23], using the quotient method in a quite involved way. More recently, the counting formula was recovered in [12] using a method of extraction at a center of symmetry on quadrangulations. What we do here is equivalent to [12], but the cycle-pointed framework allows us to write the equations on functions in a more systematic way.

4.4. Asymptotic enumeration. Cycle-pointing makes it possible to easily obtain an asymptotic estimate for the coefficients counting the number of unlabeled structures from a species, provided that the singular behavior of the OGS counting the associated species of rooted unlabeled structures is known.
We illustrate the method on free trees. Let R(x) be the OGS of rooted unlabeled nonplane trees, which is specified by $R(x) = x \exp\big(\sum_{i \ge 1} R(x^i)/i\big)$. It is well known that R(x) has a dominant singularity ρ < 1 of the square-root type [10, VII.5]. That is, in the slit complex neighborhood
∈ R + and |x -ρ| < ǫ} we have the expansion
which yields, using the transfer theorems of analytic combinatorics [10, VI], the asymptotic estimate
for the number of rooted unlabeled nonplane trees with n vertices.
To obtain a similar estimate for free trees, we consider the OGS P (x) of cycle-pointed nonplane trees and start from the expression of P (x) obtained in Proposition 11:
Notice that, since ρ < 1, the series $A := x^2 R'(x^2)$ and $B := 1 + \sum_{\ell \ge 2} x^{\ell} R'(x^{\ell})$ are analytic at x = ρ, and the value at x = ρ of B is the positive constant
Therefore, from the singular expansion (60) of R(x), we obtain
Let us simplify further the positive constant b = B(ρ). First, by differentiating the equation that specifies R(x), one obtains
By differentiating the singular expansion of R(x), one obtains
Proposition 20 (asymptotic enumeration of free trees). The number F n of unlabeled free trees with n vertices satisfies
where c is the constant and $\rho^{-1}$ is the growth ratio in the estimate (61) for rooted nonplane trees ($R_n \sim c\, n^{-3/2} \rho^{-n}$).
Proof. From the singular expansion (63) of P (x) we obtain (again by the transfer theorems of singularity analysis)
Using $c = a/(2\sqrt{\pi})$ and $b = a^2/2$, we have $ab/(2\sqrt{\pi}) = 2\pi c^3$. Finally, Theorem 5 (unbiased pointing) yields
It is also possible to get the estimate of $F_n$ from Otter's dissimilarity equation (or from the dissymmetry theorem). However, we find that cycle-pointing provides a more transparent explanation of why the asymptotic estimate of the coefficients $F_n$ counting unlabeled structures from an unrooted "tree-like" species F is of the universal type $c\, n^{-5/2} \rho^{-n}$. The argument is very simple:
• The OGS P(x) of the cycle-pointed species $F^{\bullet}$ is positively expressed in terms of the OGS of the rooted species, which has a square-root dominant singularity. Therefore, P(x) inherits the same singularity and singularity type (square-root).
• Transfer theorems of singularity analysis ensure that a square-root dominant singularity yields an asymptotic estimate of the form $c\, n^{-3/2} \rho^{-n}$ for the coefficients $[x^n]P(x)$. Since $F_n = \frac{1}{n}[x^n]P(x)$, one gets $F_n \sim c\, n^{-5/2} \rho^{-n}$.

This strategy applies to all species of trees encountered in this section, as well as to cacti graphs and connected outerplanar graphs (in all cases one starts from the singular expansion of the OGS counting the corresponding rooted species).
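The relation between the two estimates can be explored numerically. The sketch below computes the numbers R_n of rooted nonplane trees with a classical recurrence equivalent to R(x) = x exp(Σ R(x^i)/i), and the numbers F_n of free trees with Otter's dissimilarity equation F(x) = R(x) − (R(x)^2 − R(x^2))/2. This deliberately uses Otter's route mentioned above rather than the cycle-pointed expression of Proposition 11, so it serves as an independent check and not as the method of this section. Function names are illustrative.

```python
def rooted_tree_counts(n_max):
    """r[n] = number of unlabeled rooted nonplane trees with n vertices.

    Uses the recurrence n * r[n+1] = sum_{k=1..n} (sum_{d | k} d * r[d]) * r[n-k+1].
    """
    r = [0] * (n_max + 1)
    if n_max >= 1:
        r[1] = 1
    for n in range(1, n_max):
        total = 0
        for k in range(1, n + 1):
            divisor_sum = sum(d * r[d] for d in range(1, k + 1) if k % d == 0)
            total += divisor_sum * r[n - k + 1]
        r[n + 1] = total // n  # the division is exact
    return r

def free_tree_counts(r):
    """f[n] = number of unlabeled free trees, from Otter's dissimilarity equation."""
    n_max = len(r) - 1
    f = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        convolution = sum(r[i] * r[n - i] for i in range(1, n))  # [x^n] R(x)^2
        diagonal = r[n // 2] if n % 2 == 0 else 0                # [x^n] R(x^2)
        f[n] = r[n] - (convolution - diagonal) // 2
    return f

if __name__ == "__main__":
    r = rooted_tree_counts(200)
    f = free_tree_counts(r)
    for n in (50, 100, 200):
        # Compare n * F_n / R_n with the constant predicted by the two estimates.
        print(n, n * f[n] / r[n])
```

If R_n ∼ c n^{-3/2} ρ^{-n} and F_n ∼ 2πc^3 n^{-5/2} ρ^{-n}, as the proof above indicates, the printed ratio n·F_n/R_n should slowly approach 2πc^2.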
Boltzmann samplers were recently introduced by Duchon et al. [9] as a general method to generate, uniformly at random and efficiently (typically in linear time), combinatorial structures that admit a decomposition. In contrast to the more costly recursive method of sampling [25], which is based on counting coefficients of the recursive decomposition, Boltzmann samplers are primarily based on generating functions. Until now, Boltzmann samplers have been developed in the labeled setting [9] and partially in the unlabeled setting [13].
In this section we provide a more complete method in the unlabeled setting. In order to deal with the substitution construction and the cycle-pointing operator (which are not covered in [13]), we have to describe samplers not solely based on ordinary generating functions, but on cycle index sums, also known as Pólya operators. Therefore we call these random generators Pólya-Boltzmann samplers.
With these refined samplers we are able to design in a systematic way (via specific generation rules) a Pólya-Boltzmann sampler for species that admit a recursive decomposition, thereby allowing in the decomposition all operators that have been described in this article. When specialized suitably, a Pólya-Boltzmann sampler reduces to an ordinary Boltzmann sampler, hence it provides a uniform random sampler for species of unlabeled structures. In particular, we obtain highly efficient random generators for the species in Section 4: for trees, cacti graphs, outerplanar graphs, etc.

5.1. Ordinary Boltzmann Samplers. Let A be a species of structures, and let A(x) be the ordinary generating series for A. A real number x > 0 is said to be admissible if the sum defining A(x) converges (x within the disk of convergence of the series). Given a fixed admissible value x > 0, an ordinary Boltzmann sampler for unlabeled structures from A is a random generator ΓA(x) that draws each structure γ ∈ A with probability

$\Pr(\gamma) = x^{|\gamma|}/A(x).$
Notice that this distribution has the fundamental property of being uniform at each fixed size, i.e., any two unlabeled structures of the species with the same size have the same probability.
5.1.1. Automatic rules to design Boltzmann samplers. As described in [9], there are simple rules to assemble Boltzmann samplers for the two classical constructions Sum and Product (Bern(p) stands for a Bernoulli law, returning "true" with probability p and "false" with probability 1 − p).
These rules can be used recursively. For instance, the species T of rooted binary trees satisfies
which translates to the following Boltzmann sampler:
ΓT(x): if Bern(x/T(x)) then return leaf else return (ΓT(x), node, ΓT(x)).
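The sampler above can be sketched in Python as follows. Since the specification of T is not reproduced here, the sketch assumes T = X + T·T with leaves as atoms (so that T(x) = x + T(x)^2, which is consistent with the Bernoulli parameter x/T(x)); this choice, the preorder output format, and all names are assumptions made for illustration. The recursion is unrolled into a loop to avoid deep call stacks.

```python
import math
import random

def tree_gf(x):
    """Value of T(x) under the assumed equation T(x) = x + T(x)^2."""
    if not 0.0 < x <= 0.25:
        raise ValueError("x must lie in (0, 1/4] to be admissible")
    return (1.0 - math.sqrt(1.0 - 4.0 * x)) / 2.0

def gamma_T(x, rng):
    """Boltzmann sampler for binary trees, returned as a preorder list.

    Each entry is "leaf" or "node"; a "node" is followed by its two subtrees.
    """
    p_leaf = x / tree_gf(x)
    preorder = []
    pending = 1  # number of subtrees that still have to be generated
    while pending > 0:
        pending -= 1
        if rng.random() < p_leaf:   # Bern(x / T(x))
            preorder.append("leaf")
        else:
            preorder.append("node")
            pending += 2            # two independent recursive calls
    return preorder

if __name__ == "__main__":
    tree = gamma_T(0.2, random.Random(1))
    print(tree.count("leaf"), "leaves")
```

Under this assumption, a tree with n leaves is produced with probability x^n/T(x), independently of its shape, which is the uniformity property stated above.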
5.1.2. The complexity model. Typically, when F has a recursive specification over the species $A_1, \ldots, A_l$, our sampling procedure ΓF(x) requires that we can evaluate the ordinary generating functions of $A_1, \ldots, A_l$ at real values x. Indeed, for a species F defined as $A_1 + A_2$, each Bernoulli choice requires drawing a uniform value in [0, 1] and comparing it with a ratio of the form $A_1(x)/F(x)$. In the following we work in the complexity model where we assume that an oracle provides at unit cost the exact values of these generating functions at x, and that a random number in [0, 1] can be generated and compared with a fixed value such as A(x)/C(x) in constant time as well. We refer to this complexity model as the real-arithmetic complexity model.
The model is justified since in many applications we obtain expressions for the ordinary generating series that allow a rapid numeric evaluation of those series at given values, for example with the Newton method. Then, in practice, one works at a fixed precision, say N bits (typically N = 64, corresponding roughly to 20 decimal digits).
Let us mention that the Boltzmann samplers for the constructions Multiset and Cycle, as given in [13] and recovered in a more general framework here, typically require the values of the generating functions not only at x, but at all powers $x^i$. Since combinatorial species often have exponential growth rate (which is the case for all examples presented here), the dominant singularity ρ satisfies ρ < 1, hence x ≤ ρ < 1. Therefore the values $A(x^i)$ decrease exponentially fast with i. When working at a fixed precision of N bits, one can thus discard the powers greater than $k = N/\log_2(1/\rho)$ and assume that the oracle provides the evaluations of the generating functions at $x, x^2, \ldots, x^k$. For a more detailed study and implementation of the evaluation procedures we refer to the recent article by Pivoteau, Salvy, and Soria [27].
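For a rough feel of the truncation index k = N/log_2(1/ρ), here is a minimal Python sketch. Rounding up to an integer and the function name are choices made for this illustration.

```python
import math

def truncation_index(precision_bits, rho):
    """Return ceil(N / log2(1/rho)): beyond this index, rho**i is below 2**(-N)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return math.ceil(precision_bits / math.log2(1.0 / rho))

if __name__ == "__main__":
    # Example with N = 64 bits and a singularity near 0.338 (value chosen for illustration).
    print(truncation_index(64, 0.338))
```

With N = 64 and ρ < 1/2, fewer than 64 powers are needed, so the oracle only has to supply a short list of evaluations.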
Proposition 21 (Duchon et al. [9]). Let F be a species that can be decomposed recursively from {1, X} in terms of the constructions {+, •} (in a sense analogous to, but more restricted than, Definition 2.7). Then one can obtain in a systematic way (from the recursive specification) an ordinary Boltzmann sampler ΓF(x) for F. In addition, in the real-arithmetic complexity model, ΓF(x) operates in linear time in the size of the output.
This result was recently extended in [13] to other constructions, such as the Multiset and Cycle constructions (and their counterparts with a fixed number of components). In this section we extend this result to the substitution construction, and to the cycle-pointed constructions. It follows that any species that is cycle-pointed decomposable over species for which we already have a Pólya-Boltzmann sampler also has a Pólya-Boltzmann sampler.
Remark 5.1. Note that in the general results on sampling we consider species up to isomorphism. Theoretically, this is a necessary assumption, since isomorphisms between species might in artificial examples be non-effective. However, in all the presented examples and applications in this article, the isomorphisms between species are straightforward and efficiently computable. The effectiveness of isomorphisms between the output species of the sampling procedures and the actual species is actually more related to the question of how combinatorial structures are represented on a computer, and in particular it does not concern the complexity of the sampling task itself.

5.1.3. Targeting Boltzmann samplers. Boltzmann samplers often lead to very efficient exact-size and approximate-size random samplers. In order to draw unlabeled structures uniformly at random from a species A at (in case of exact-size sampling) or around (in case of approximate-size sampling) a target size n, one simply calls the Boltzmann sampler ΓA(x) repeatedly, with a suitably chosen value of x, until the size of the output is n (exact-size sampling) or lies in $[n(1 - \varepsilon), n(1 + \varepsilon)]$ (approximate-size sampling), where ε is a tolerance parameter fixed by the user. It turns out that for a wide class of species, which covers the species encountered in Section 4 (trees, cacti graphs, outerplanar graphs), this method works very well, as proved in [9].
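The rejection loop just described can be sketched as follows. To keep the block self-contained, it uses a toy stand-in for a Boltzmann sampler run at the singularity: a binary tree in which every node is a leaf with probability 1/2, with size counted in leaves. This toy sampler, the early-abort bound, and all names are assumptions made for illustration.

```python
import math
import random

def critical_binary_tree(max_size, rng):
    """Toy sampler: preorder list of a random binary tree, or None if aborted.

    Generation stops as soon as the final number of leaves is certain to
    exceed max_size (every pending subtree contributes at least one leaf).
    """
    preorder, leaves, pending = [], 0, 1
    while pending > 0:
        if leaves + pending > max_size:
            return None
        pending -= 1
        if rng.random() < 0.5:
            preorder.append("leaf")
            leaves += 1
        else:
            preorder.append("node")
            pending += 2
    return preorder

def sample_around(n, eps, sampler, rng):
    """Call the sampler until the size lies in [n(1 - eps), n(1 + eps)].

    With eps = 0 this is exact-size sampling.
    """
    low, high = n * (1 - eps), n * (1 + eps)
    while True:
        tree = sampler(math.floor(high), rng)
        if tree is not None and tree.count("leaf") >= low:
            return tree

if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_around(1000, 0.1, critical_binary_tree, rng).count("leaf"))
```

Because the underlying distribution is uniform at each fixed size, conditioning on the size window by rejection keeps the output uniform among structures of the same size.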
Proposition 22 (Duchon et al. [9]). Let F be a species such that asymptotically
for some positive constant c, where ρ is the radius of convergence of F(x), assuming that F(ρ) is convergent. Also suppose that there is a Boltzmann sampler ΓF(ρ) at x = ρ such that the cost of generating a structure is linearly bounded by the size of the structure all along the generation process. Then ΓF(ρ) yields an exact-size (approximate-size, resp.) sampler for unlabeled structures from F with expected complexity $O(n^2)$ ($O(n/\varepsilon)$, resp.), where n is a target size and ε is a tolerance ratio.
The exact-size and approximate-size samplers are obtained by running ΓF(ρ) until the size of the output is in the target domain $\Omega_n$ (that is, $\Omega_n = \{n\}$ for exact-size sampling, and $\Omega_n = [n(1 - \varepsilon), n(1 + \varepsilon)]$ for approximate-size sampling). To obtain the stated complexity, it is necessary that the generation of a too large structure is aborted as soon as the size of the generated object exceeds $\max(\Omega_n)$.

5.2. Pólya-Boltzmann Samplers for classical species. Let A be a species. Recall that a symmetry on a species A is a pair (A, σ) where A ∈ A and σ is an automorphism of A. A symmetry has a weight-monomial $w_{(A,\sigma)}$, as defined in (3); and the cycle index sum $Z_A(s_1, s_2, \ldots)$ is the sum of the weight-monomials over all the symmetries on A. As in the one-variable case, a vector $(s_i)_{i \ge 1}$ of nonnegative real values is said to be admissible if the sum defining $Z_A(s_1, s_2, \ldots)$ converges. Given an admissible vector $(s_i)_{i \ge 1}$, a Pólya-Boltzmann sampler is a procedure $\Gamma Z_A(s_1, s_2, \ldots)$ that randomly samples symmetries on A such that each symmetry (A, σ) is drawn with probability

$\Pr(A, \sigma) = w_{(A,\sigma)} / Z_A(s_1, s_2, \ldots),$
where the weight-monomial w (A,σ) is evaluated at (s 1 , s 2 , . . .). This probability distribution is called the Pólya-Boltzmann distribution for A at (s i ) i≥1 . The following simple lemma ensures that Pólya-Boltzmann samplers are a refinement of ordinary Boltzmann samplers, in the same way as cycle index sums are a refinement of ordinary generating functions.
Lemma 23 (Pólya-Boltzmann samplers extend ordinary Boltzmann samplers). Consider a species A having a Pólya-Boltzmann sampler ΓZ A (s i ) i≥1 . Then, for any value x admissible for A(x), the sampler ΓZ A (x, x 2 , x 3 , . . .) is an ordinary Boltzmann sampler for unlabeled structures from A at x.
Proof. The generator $\Gamma Z_A(x, x^2, \ldots)$ gives weight $x^n/(n!\, Z_A(x, x^2, \ldots))$ to each symmetry of size n. Since $Z_A(x, x^2, \ldots) = A(x)$ by Lemma 1, this weight simplifies to $x^n/(n!\, A(x))$. In addition, we have seen in Lemma 1 that each unlabeled structure $\gamma \in \widetilde{A}_n$ gives rise to n! symmetries. Hence, each unlabeled structure of size n has weight $x^n/A(x)$ when calling $\Gamma Z_A(x, x^2, \ldots)$, i.e., $\Gamma Z_A(x, x^2, \ldots)$ is an ordinary Boltzmann sampler for unlabeled structures from A.
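As a small worked instance of Lemma 23, consider the species Set, assuming its cycle index sum has the standard exponential form shown in the first line below (the expression the text attributes to Figure 1). Specializing $s_i = x^i$ recovers the ordinary generating series of unlabeled sets, which has exactly one structure in each size:

```latex
\begin{align*}
Z_{\mathrm{Set}}(s_1, s_2, \ldots) &= \exp\Big(\sum_{i \ge 1} \frac{s_i}{i}\Big), \\
Z_{\mathrm{Set}}(x, x^2, \ldots) &= \exp\Big(\sum_{i \ge 1} \frac{x^i}{i}\Big)
  = \exp\big(-\log(1 - x)\big) = \frac{1}{1 - x} = \sum_{n \ge 0} x^n .
\end{align*}
```

Under this specialization, the unlabeled set of size n is therefore drawn with probability $x^n (1 - x)$, in agreement with the ordinary Boltzmann distribution at x.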
In the next two sections, we describe Pólya-Boltzmann samplers for unlabeled structures from basic species and for the constructions {+, ·, ∘}. Note that the output of a Pólya-Boltzmann sampler for a species A consists of a structure from A[n] together with an automorphism of that structure. In all the random generators to be described (as well as in the procedures for cycle-pointed species), the resulting structure S is made well-labeled by applying a procedure DistributeLabels that substitutes the labels 1, ..., |S| for the atoms of S uniformly at random (i.e., according to a permutation of size |S| taken uniformly at random).
We also assume to have generators for classical distributions:
• Geom(p) returns an integer under the geometric law of parameter p ∈ [0, 1]: $\Pr(k) = p^{k-1}(1 - p)$;
• Pois(λ) returns an integer under the Poisson law of parameter λ: $\Pr(k) = e^{-\lambda} \lambda^k / k!$; and
• Loga(λ) returns an integer under the distribution $\Pr(k) = \lambda^k / \big(k \log(1/(1 - \lambda))\big)$; we call this distribution the Loga law of parameter λ.

The generators for those distributions (more generally, any generator for an explicit distribution on integers) can easily be obtained by the "inversion method" ([8, §2.1] and [21, §4.1]).
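A minimal Python sketch of the inversion method for these three laws is given below: draw a uniform number u and return the first integer at which the cumulative probability reaches u. The geometric law is read here as Pr(k) = p^(k-1)(1 − p) on k ≥ 1, which is an interpretation of the formula above; the helper `_invert` and the explicit `rng` argument are choices made for this illustration.

```python
import math
import random

def _invert(first_k, first_term, ratio, rng):
    """Inversion method: walk through k until the cumulative mass reaches u.

    `ratio(k)` must return Pr(k + 1) / Pr(k). The `term > 0.0` guard stops
    the walk if the terms underflow before the cumulative sum reaches u.
    """
    u = rng.random()
    k, term = first_k, first_term
    cumulative = term
    while cumulative < u and term > 0.0:
        term *= ratio(k)
        k += 1
        cumulative += term
    return k

def geom(p, rng):
    """Pr(k) = p^(k-1) (1 - p) for k >= 1, with 0 <= p < 1."""
    return _invert(1, 1.0 - p, lambda k: p, rng)

def pois(lam, rng):
    """Pr(k) = exp(-lam) lam^k / k! for k >= 0."""
    return _invert(0, math.exp(-lam), lambda k: lam / (k + 1), rng)

def loga(lam, rng):
    """Pr(k) = lam^k / (k log(1/(1 - lam))) for k >= 1, with 0 < lam < 1."""
    return _invert(1, lam / -math.log1p(-lam), lambda k: lam * k / (k + 1), rng)

if __name__ == "__main__":
    rng = random.Random(2024)
    print(geom(0.5, rng), pois(3.0, rng), loga(0.5, rng))
```

The expected number of loop iterations is of the order of the returned value, which matches the cost assumption made later for drawing integers under such distributions.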
Pólya-Boltzmann samplers for basic species. At first, let us describe Pólya-Boltzmann samplers for the basic species Seq, Set, Cyc, and their counterparts $Seq^{[k]}$, $Set^{[k]}$, $Cyc^{[k]}$. In each case, the design of the sampler is guided by the expressions of the cycle index sums for the basic species as given in Figure 1.
Proposition 24. The random generators shown in Figure 9 are Pólya-Boltzmann samplers for the corresponding species.
Proof. For Seq, the proof is easy. Since $Z_{Seq} = \sum_{k \ge 1} s_1^k$, the probability of a sequence to have size k must be $s_1^k / Z_{Seq}$, i.e., the size distribution is a geometric law of parameter $s_1$. For Set, observe that the sum of weight-monomials over all symmetries of type (n
Therefore, a Pólya-Boltzmann sampler has to draw a collection of cycles such that the number $n_i$ of cycles of length i follows a Poisson law of parameter $s_i/i$ for i ≥ 1, and the $n_i$'s are independent. This is precisely what the algorithm $\Gamma Z_{Set}$ in Figure 9 does, upon choosing a priori the size of the largest cycle to be drawn. For Cyc, the argument is similar. As we have seen in Section 2.6.1, the sum of the weight-monomials over all the symmetries of order r is
Therefore the order of the automorphism has to be chosen with probability $Z^{(r)}_{Cyc}/Z_{Cyc}$. In addition, for each fixed order r, the probability of the cycle having size r × k has to be $s_r^k / \big(k \log(1/(1 - s_r))\big)$, i.e., the size (divided by r) has to follow a Loga law of parameter $s_r$. Finally, for all automorphisms of order r and size r × k, all possible 'rotation angles' (there are φ(r) possibilities) have to be equiprobable. This is exactly what the generator $\Gamma Z_{Cyc}$ does.
The proof that the generators for the species $Seq^{[k]}$, $Set^{[k]}$, and $Cyc^{[k]}$ are Pólya-Boltzmann samplers follows similar arguments upon restricting to structures of size k.
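The three steps of the generator for Cyc (choose the order r, then the size divided by r, then the rotation) can be sketched in Python as follows. The sketch assumes the standard expression Z_Cyc = Σ_r (φ(r)/r) log(1/(1 − s_r)), so that the order r has weight (φ(r)/r) log(1/(1 − s_r)). It takes a finite list of parameters, treating s_r as 0 beyond the list (a truncation in the spirit of the fixed-precision discussion of Section 5.1.2), and draws b in [1..r] so that the case r = 1 is covered. The output encoding (length, shift, order) and all names are illustrative.

```python
import math
import random

def euler_phi(r):
    """Euler's totient: number of b in [1..r] that are relatively prime to r."""
    return sum(1 for b in range(1, r + 1) if math.gcd(b, r) == 1)

def loga(lam, rng):
    """Loga law by inversion: Pr(k) = lam^k / (k log(1/(1 - lam))), k >= 1."""
    u = rng.random()
    k, term = 1, lam / -math.log1p(-lam)
    cumulative = term
    while cumulative < u and term > 0.0:
        term *= lam * k / (k + 1)
        k += 1
        cumulative += term
    return k

def gamma_z_cyc(s, rng):
    """Sketch of the sampler for Cyc; s[r-1] holds s_r, each in (0, 1).

    Returns (length, shift, r): a cycle of `length` atoms whose automorphism
    maps each atom to the atom `shift` units further; it has order r.
    """
    weights = [euler_phi(r) / r * -math.log1p(-s_r)
               for r, s_r in enumerate(s, start=1)]
    # ReplicOrder: choose r with probability proportional to weights[r-1].
    u = rng.random() * sum(weights)
    r, acc = 1, weights[0]
    while acc < u and r < len(weights):
        acc += weights[r]
        r += 1
    j = loga(s[r - 1], rng)
    b = rng.choice([c for c in range(1, r + 1) if math.gcd(c, r) == 1])
    length = j * r
    return length, (j * b) % length, r

if __name__ == "__main__":
    x = 0.3
    print(gamma_z_cyc([x ** i for i in range(1, 41)], random.Random(7)))
```

Calling it with s_r = x^r, as in the usage line, corresponds to the specialization of Lemma 23.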
Pólya-Boltzmann samplers for combinatorial constructions. As shown in Figure 10, Pólya-Boltzmann samplers make it possible to have a simple sampling rule for each of the standard constructions; that is, sampling rules not only for sum and product, but also for the substitution construction.
Proposition 25. Let $C = A \wedge B$, with $\wedge \in \{+, \cdot, \circ\}$. When there are Pólya-Boltzmann samplers for A and B, then there is also a Pólya-Boltzmann sampler $\Gamma Z_C(s_1, s_2, \ldots)$ for C that can be constructed from the samplers for A and B, as given in Figure 10.
Proof. For C = A + B, the proof is easy. Note that Sym(A + B) = Sym(A) + Sym(B). As shown in [9] (for the standard weight $x^n/n!$), a disjoint union yields a Bernoulli switch on Boltzmann samplers, with probability given by the ratio of the series for A to the series for C (this argument works as well here, where we take the refined weight $s_1^{n_1} s_2^{n_2} \cdots s_k^{n_k}/n!$). Therefore the probability of the Bernoulli switch has to be $Z_A/Z_C$.
(1) Sequence.
Algorithm ΓZSeq(s1, s2, . . .), with s1 < 1:
k ← Geom(s1); return a sequence of k atoms (endowed with the identity-automorphism).
(2) Set. Define the probability distribution relative to (si) i≥1 :
Let Max Index(s1, s2, . . .) be a generator for this distribution.
Algorithm ΓZSet(s1, s2, . . .) : J ← Max Index(s1, s2, . . .);
{Poisson conditioned to output a strictly positive integer} return a collection of cycles of atoms where there are kj cycles in each length j > 0.
(3) Cycle. Given (si) ≥1 such that ZCyc(s1, s2, . . .) converges, consider the probability distribution
Let ReplicOrder(s1, s2, . . .) be a generator of this distribution.
Algorithm ΓZCyc(s1, s2, . . .):
r ← ReplicOrder(s1, s2, . . .);
j ← Loga(sr);
Draw an integer b ∈ [1..r] that is relatively prime to r uniformly at random;
return the cycle of length j × r endowed with the automorphism:
"each atom is mapped to the atom that is j × b units further on the cycle".
(1') Sequence of size k.
Algorithm ΓZ Seq [k] (s1, s2, . . .) : return a sequence of k atoms (endowed with the identity-automorphism).
(
return a collection of n1 cycles of length 1, n2 cycles of length 2, . . ., n k cycles of length k. "each atom is mapped to the atom that is kb/r units further on the cycle".
Figure 9. Pólya-Boltzmann samplers for basic species. In all these random samplers, the output structure is made well-labeled using the procedure DistributeLabels.
Figure 10. The rules to specify a Pólya-Boltzmann sampler for a species that has a recursive specification. In all these random samplers, the finally returned structure is made well-labeled using the procedure DistributeLabels.
For product, $C = A \cdot B$, Sym(C) is like a partitional product of Sym(A) and Sym(B). Therefore, a Boltzmann sampler classically consists of two independent calls to Boltzmann samplers, as shown in [9] (again, for the standard weight $x^n/n!$). All the arguments work the same way for the refined weight $s_1^{n_1} s_2^{n_2} \cdots s_k^{n_k}/n!$. Therefore, one has to call independently a Pólya-Boltzmann sampler for A and a Pólya-Boltzmann sampler for B.
For substitution, $C = A \circ B$, recall (from Section 2.6.2, Equation (13)) that for each partition sequence π, the sum of the weight-monomials over all the symmetries on C of type $\pi = (n_1, \ldots, n_n)$ satisfies the expression
Hence, a Pólya-Boltzmann sampler for C must draw the core structure following the Pólya-Boltzmann distribution for A with parameters (b 1 , b 2 , . . .). In addition, as discussed in Section 2.6.2, once the type π of the core symmetry is fixed, the structures substituted at the cycles of the core-automorphism form a partitional product of the form
Recall that a partitional product yields independent Boltzmann samplers. Hence, once the coreautomorphism (A, σ A ) is drawn, the symmetries in B that are substituted at each cycle of σ A must be independent calls of a Pólya-Boltzmann sampler for B, and the parameters of the sampler must closed under taking 2-connected components -is decomposable in terms of the subspecies B ′ of rooted 2-connected graphs:
As we have explained in Sections 4.2.1 and 4.2.2, there is a decomposition strategy for rooted 2-connected cacti graphs (rooted polygons) and rooted 2-connected outerplanar graphs (rooted dissections of a polygon). The linear Pólya-Boltzmann sampler $\Gamma Z_{B'}$ yields in turn a linear Pólya-Boltzmann sampler for $G'$ (using the specification of $G'$ in terms of $B'$ stated above). Hence, each of the rooted species stated above has a linear Pólya-Boltzmann sampler, which becomes a linear ordinary Boltzmann sampler when specialized to $s_i = x^i$. Moreover, all these rooted species obey the universal asymptotic form $c\, \rho^{-n} n^{-3/2}$, as shown in [26] for trees, in [31] for cacti graphs, and in [3] for outerplanar graphs. Hence, by Proposition 22, a Boltzmann sampler run at the dominant singularity yields an exact-size (approximate-size, respectively) sampler with expected complexity $O(n^2)$ ($O(n/\varepsilon)$, respectively).

5.3. Pólya-Boltzmann samplers for cycle-pointed species.
Definition. Given a cycle-pointed species P, a vector $(s_i, t_i)_{i \ge 1}$ of nonnegative real values is said to be admissible if the sum of weight-monomials defining $Z_P$ converges when evaluated at this vector. Given a fixed admissible vector $(s_i, t_i)_{i \ge 1}$, a Pólya-Boltzmann sampler is a procedure $\Gamma Z_P(s_i, t_i)_{i \ge 1}$ that generates a rooted c-symmetry on P at random such that each rooted c-symmetry (P, σ, v) of R(P) is drawn with probability

$\Pr(P, \sigma, v) = w_{(P,\sigma,v)} / Z_P(s_1, t_1; s_2, t_2; \ldots),$
with $w_{(P,\sigma,v)}$ as defined in (15). This probability distribution is called the Pólya-Boltzmann distribution for P at $(s_i, t_i)_{i \ge 1}$. As for classical species, the procedure of calling $\Gamma Z_P(x^i, x^i)_{i \ge 1}$ (where x is admissible for the ordinary generating function P(x)) and then returning the underlying unlabeled structure yields an ordinary Boltzmann sampler ΓP(x). The following sampling rules make it possible to systematically assemble Pólya-Boltzmann samplers for cycle-pointed species.

Proof. The arguments are very similar to the ones in the proof of Proposition 24. Observe that $Seq^{\bullet} = (X^{\bullet} \star Seq) \star Seq$. The marked atom (the atom bearing the marked cycle, which has length 1 here) must be preceded by a sequence of $k_1$ atoms and followed by a sequence of $k_2$ atoms such that $k_1$ and $k_2$ follow independently a geometric law of parameter $s_1$. Next, we have $Set^{\bullet}_{(\ell)} = C_{(\ell)} \star Set$ where $C_{(\ell)}$ is the cycle-pointed species of cycles of ℓ atoms (the cycle being marked), which explains the samplers $\Gamma Z_{Set^{\bullet}}$ and $\Gamma Z_{Set^{\circledast}}$ in Figure 11.
A cycle-pointed structure in $Cyc^{\bullet}_{(\ell)}$ consists of ℓ isomorphic copies (attached cyclically in a chain) of an object in $X^{\bullet} \star Seq$. Additionally, one needs to specify the shift of the automorphism; if the cycle has length nℓ, the possible shifts are n · i where i ∈ [1..ℓ] is relatively prime to ℓ, hence there are φ(ℓ) possibilities for the shift.
The proofs for the samplers with k components follow similar arguments.

Proposition 29. Let R be a species with a recursive specification over other species having a Pólya-Boltzmann sampler. Then the random sampler $\Gamma Z_R(s_1, t_1; s_2, t_2; \ldots)$, as given in Figure 12, is a Pólya-Boltzmann sampler for R.
Proof. The arguments are very similar to the ones in the proof of Proposition 25. For the cycle-pointed sum, R = P + Q, we have RSym(R) = RSym(P) + RSym(Q). Therefore the Pólya-Boltzmann sampler has to be a Bernoulli switch with probability $Z_P/Z_R$ followed by a call to the Pólya-Boltzmann sampler of either P or Q (depending on the Bernoulli output to be "true" or
(1) Cycle-pointed sequence.
Algorithm ΓZ Seq • (s 1 , s 2 , . . .) : k 1 ← Geom(s 1 ); k 2 ← Geom(s 1 ); {indep. calls} return a sequence of k 1 + k 2 + 1 atoms (endowed with the identity-automorphism)
where the atom at position k 1 + 1 is marked.
(2) Cycle-pointed (symmetric cycle-pointed, resp.) set.
Given $(s_i, t_i)_{i \ge 1}$ such that $\sum_{i \ge 1} t_i$ converges, define the distribution:
(3) Cycle-pointed (symmetric cycle-pointed, resp.) cycle.
Given $(s_i, t_i)_{i \ge 1}$ such that $Z := Z_{Cyc^{\bullet}}(s_1, t_1; s_2, t_2; \ldots)$ ($Z := Z_{Cyc^{\circledast}}(s_1, t_1; s_2, t_2; \ldots)$, resp.) converges, consider the probability distribution $\Pr(R = r) = \frac{1}{Z}\, \varphi(r)\, \frac{t_r}{1 - s_r}$ for r ≥ 1 (r ≥ 2, resp.).
Let ReplicOrder(s 1 , t 1 ; s 2 , t 2 ; . . .) be a generator of this distribution.
Algorithm ΓZ Cyc • (s 1 , t 1 ; s 2 , t 2 ; . . .) ( ΓZ Cyc ⊛ (s 1 , t 1 ; s 2 , t 2 ; . . .), resp.) :
r ←-ReplicOrder(s 1 , t 1 ; s 2 , t 2 ; . . .); j ←-1 + Geom (sr); Draw an integer b ∈ [1..r -1] that is relatively prime to r uniformly at random ; return the cycle of length j × r with a marked atom, and endowed with the automorphism:
"each atom is mapped to the atom that is j × b units further on the cycle" (the marked cycle is the automorphism-cycle containing the marked atom).
(1') Cycle-pointed sequence of size k; denote E := Seq[k].
Algorithm ΓZ E^•(s_1, t_1; s_2, t_2; ...):
  return a sequence of k atoms (with the identity automorphism) where one atom taken u.a.r. is marked.
(2') Cycle-pointed (symmetric cycle-pointed, resp.) set of size k; denote S := Set[k].
A marked (marked symmetric, resp.) partition sequence of order k is a sequence π = (ℓ, n_1, n_2, ..., n_k) such that ℓ ≥ 1 (ℓ ≥ 2, resp.) and ℓ + ∑_i i n_i = k (one block is marked). The corresponding coefficient [t_ℓ s_1^{n_1} ... s_k^{n_k}] is denoted coef_π(Z_{S^•}) (coef_π(Z_{S^⊛}), resp.).
Algorithm ΓZ S^•(s_1, t_1; s_2, t_2; ...) (ΓZ S^⊛(s_1, t_1; s_2, t_2; ...), resp.):
  Draw a partition sequence π of order k with probability:
Draw an integer b ∈ [1..r − 1] that is relatively prime to r uniformly at random; return the cycle of length k with a marked atom, endowed with the automorphism "each atom is mapped to the atom that is kb/r units further on the cycle".
Figure 11. Pólya-Boltzmann sampler for basic cycle-pointed species. In all these random samplers, the finally returned structure is made well-labeled using the procedure DistributeLabels.
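To make rule (3) of Figure 11 concrete, here is a minimal Python sketch of the cycle-pointed cycle sampler under stated assumptions. The parameters are passed as functions s(i) and t(i); the oracle for Z is replaced by normalizing the weights truncated at a hypothetical bound r_max; atoms are represented by positions 0..length−1 instead of labels produced by DistributeLabels; and for r = 1, where the range [1..r − 1] is empty, the sketch uses the identity automorphism. All function names are illustrative.

```python
import random
from math import gcd

def euler_phi(r):
    """Euler totient of r."""
    return sum(1 for i in range(1, r + 1) if gcd(i, r) == 1)

def geom(p, rng):
    """Geometric law of parameter p: Pr(k) = (1 - p) * p**k for k >= 0."""
    k = 0
    while rng.random() < p:
        k += 1
    return k

def replic_order(s, t, rng, r_min=1, r_max=60):
    """Draw r with Pr(R = r) proportional to phi(r) * t(r) / (1 - s(r)).
    Assumption: weights beyond r_max are negligible (truncation stands in for the oracle)."""
    orders = list(range(r_min, r_max + 1))
    weights = [euler_phi(r) * t(r) / (1.0 - s(r)) for r in orders]
    return rng.choices(orders, weights=weights)[0]

def sample_cyc_pointed(s, t, rng, symmetric=False):
    """Return (length, automorphism, marked_cycle) for a cycle-pointed cycle."""
    r = replic_order(s, t, rng, r_min=2 if symmetric else 1)
    j = 1 + geom(s(r), rng)
    if r == 1:
        b = 0  # assumption: no valid b in [1..0], so take the identity
    else:
        b = rng.choice([i for i in range(1, r) if gcd(i, r) == 1])
    length = j * r
    shift = (j * b) % length
    # Each atom is mapped to the atom that is j*b units further on the cycle.
    automorphism = [(i + shift) % length for i in range(length)]
    # By rotational symmetry the marked atom is placed at position 0.
    marked_cycle = [(m * shift) % length for m in range(r)]
    return length, automorphism, marked_cycle

if __name__ == "__main__":
    x = 0.3  # ordinary Boltzmann specialization s_i = t_i = x**i
    power = lambda i: x ** i
    print(sample_cyc_pointed(power, power, random.Random(0)))
```

With s_i = t_i = x^i this corresponds to the specialization that yields an ordinary Boltzmann sampler; the marked cycle always has length r, the order drawn by replic_order.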
symmetries on R whose core has type π = (ℓ; n_1, n_2, ..., n_k) satisfies the expression
, where b_i = Z_B(s_i, s_2i, ...), q_ℓ = Z_{B^•}(s_ℓ, t_ℓ; s_2ℓ, t_2ℓ; ...). Hence, a Pólya-Boltzmann sampler for R must draw the core structure following the Pólya-Boltzmann distribution for P with parameters (b_1, q_1; b_2, q_2; ...). In addition, once the type π of the core symmetry is fixed, the structures substituted at the cycles of the core automorphism form a partitional product of the form

Recall that a partitional product yields independent Boltzmann samplers. Hence, once the core symmetry (P, σ) is drawn, the symmetries in B that are substituted at each cycle of σ A must be independent calls of a Pólya-Boltzmann sampler for B, except for the marked cycle, where we have to call a Pólya-Boltzmann sampler for B^•. In addition, for an unmarked (marked, resp.) cycle, the parameters of ΓZ_B (of ΓZ_{B^•}, resp.) must be (s_i, s_2i, ...) ((s_i, t_i; s_2i, t_2i; ...), resp.) if the cycle has length i, as indicated by the expression of Z^(π)_R given above. This is precisely what the generator ΓZ_R does.

5.3.4. Pólya-Boltzmann samplers for decomposable cycle-pointed species.

As for decomposable species, the random generation rules shown in Figure 11 (basic cycle-pointed species) and Figure 12 (cycle-pointed constructions) can be combined to design a Pólya-Boltzmann sampler for any species with a cycle-pointed recursive decomposition over basic species. We assume here again that an oracle provides the required evaluations of cycle-index sums and pointed cycle-index sums for the species appearing in the decomposition, and that drawing k under a specific integer distribution (such as ReplicOrder in ΓZ Cyc) has cost linear in k.
Theorem 30. Any species P with a cycle-pointed recursive specification (Definition 3.4) over species A_1, ..., A_l having a linear Pólya-Boltzmann sampler can be endowed with a linear Pólya-Boltzmann sampler ΓZ_P(s_1, t_1; s_2, t_2; ...).
Proof. Analogous to the proof of Theorem 26.
Consequently, the unrooted species we have encountered in Section 4 can be endowed with efficient random samplers.
Proposition 31. In the real-arithmetic complexity model (oracle assumption), the following unlabeled species admit an exact-size sampler and an approximate-size sampler of expected complexities O(n^2) and O(n/ε), respectively (n being the target size, ε the tolerance ratio): unrooted nonplane (plane, resp.) trees, unrooted nonplane (plane, resp.) trees whose node degrees are constrained to lie in a finite integer set Ω, unrooted cacti graphs, unrooted connected outerplanar graphs.
Proof. The crucial point is that cycle-pointing is unbiased, hence finding an exact-size (approximate-size, resp.) sampler for a species A is equivalent to finding one for the cycle-pointed species A • .
For each of the unrooted tree species listed above, we have shown in Section 4 that the corresponding cycle-pointed species is decomposable. If G is a species of connected graphs (closed under taking 2-connected components), the grammar (48) given in Proposition 15 ensures that the cycle-pointed species G^• is decomposed in terms of the 2-connected graph species B^•, B′, and (B′)^•. For cacti graphs and outerplanar graphs, there is a decomposition strategy for the 2-connected subspecies (polygons for cacti graphs, dissections of a polygon for outerplanar graphs), which easily yields linear Pólya-Boltzmann samplers for the species B^•, B′, and (B′)^•. Since G^• is specified over these three species, there is also a linear Pólya-Boltzmann sampler for G^•.
Hence, for each unrooted species A listed above, there is a linear Pólya-Boltzmann sampler for A^•, which becomes a linear ordinary Boltzmann sampler when specializing to (s_i = x^i, t_i = x^i). Moreover, the counting coefficients |A_n| obey the asymptotic form c ρ^{-n} n^{-5/2}, as shown in [26] for trees, [31] for cacti graphs, and [3] for outerplanar graphs. Since each structure of size n gives rise to n cycle-pointed structures, |A^•_n| = n |A_n|, and therefore the coefficients |A^•_n| obey the asymptotic form c ρ^{-n} n^{-3/2}.
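In the Boltzmann framework, approximate-size and exact-size samplers are typically obtained from a free-size sampler by rejection: draw repeatedly until the size falls in the target window. The Python sketch below illustrates this wrapper only. As a toy stand-in for a real species it uses the size of a cycle-pointed sequence (k_1 + k_2 + 1 with independent geometric k_1, k_2), and the tuning x = (n − 1)/(n + 1), which makes the expected size equal to n, is an assumption of this example rather than a statement from the paper. The names sample_in_window and pointed_sequence_size are hypothetical.

```python
import math
import random

def geom(p, rng):
    """Geometric law of parameter p: Pr(k) = (1 - p) * p**k for k >= 0 (by inversion)."""
    return int(math.log(1.0 - rng.random()) / math.log(p))

def pointed_sequence_size(x, rng):
    """Size of a cycle-pointed sequence: k1 + k2 + 1 with k1, k2 ~ Geom(x)."""
    return geom(x, rng) + geom(x, rng) + 1

def sample_in_window(draw_size, n, eps, rng, max_tries=1_000_000):
    """Repeat free-size draws until the size lies in [n(1 - eps), n(1 + eps)].
    eps = 0 gives an exact-size sampler. Returns (size, number_of_tries)."""
    lo, hi = n * (1 - eps), n * (1 + eps)
    for tries in range(1, max_tries + 1):
        size = draw_size(rng)
        if lo <= size <= hi:
            return size, tries
    raise RuntimeError("no sample in the target window")

if __name__ == "__main__":
    n = 100
    x = (n - 1) / (n + 1)  # assumption: tune x so that the expected size is n
    rng = random.Random(1)
    draw = lambda r: pointed_sequence_size(x, r)
    print(sample_in_window(draw, n, 0.1, rng))  # approximate size
    print(sample_in_window(draw, n, 0, rng))    # exact size
```

In a full sampler the rejected draws would be aborted as soon as their size exceeds the window, and the accepted structure, not only its size, would be returned.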
Note that the derived species A′ is not cycle-pointed. However, it can be identified with A^•_(1). Indeed, by adding a new label and a pointing loop on the marked atom, one obtains a bijective correspondence between A^•_(1) and X^• ⋆ A′.
The calculations have been done with the help of the computer algebra system Maple.
The asymptotic behaviour c ρ^{-n} n^{-3/2} is called universal [1], as it is widely encountered in combinatorics.