The BIN_COUNTS Constraint: Filtering and Applications

Roberto Rossi*1, Özgür Akgün2, Steven Prestwich3, and Armagan Tarim4

1 Business School, University of Edinburgh, Edinburgh, UK
2 Department of Computer Science, University of Saint Andrews, UK
3 Insight Centre for Data Analytics, University College Cork, Ireland
4 Department of Management, Cankaya University, Turkey

* Corresponding author. Address: 29 Buccleuch place, EH89JS, Edinburgh, UK. Email: roberto.rossi@ed.ac.uk

Abstract

We introduce the bin_counts constraint, which deals with the problem of counting the number of decision variables in a set which are assigned values that lie in given bins. We illustrate a decomposition and a filtering algorithm that achieves generalised arc consistency. We contrast the filtering power of these two approaches and we discuss a number of applications. We show that bin_counts can be employed to develop a decomposition for the $\chi^2$ test constraint, a new statistical constraint that we introduce in this work. We also show how this new constraint can be employed in the context of the Balanced Academic Curriculum Problem and of the Balanced Nursing Workload Problem. For both these problems we carry out numerical studies involving our reformulations. Finally, we present a further application of the $\chi^2$ test constraint in the context of confidence interval analysis.

Keywords: bin_counts; generalised arc consistency; decomposition; statistical constraint; $\chi^2$ test constraint.

1 Introduction

In Constraint Programming (CP) [34], counting constraints and value constraints represent two important constraint classes which can be used to model a wide range of practical problems in areas such as scheduling and rostering.
Given a list of numbers, counting the number of elements in the list whose values lie in successive bins of given widths represents a problem that is often faced in spreadsheet modeling and mathematical computation. Tools that deal with this problem are available in most modern spreadsheets and mathematical computation programs. In Excel™, this modeling tool takes the form of command FREQUENCY; in Mathematica™, of command BinCounts.

In this work we introduce the bin_counts constraint, which models the aforementioned problem in a declarative fashion. To the best of our knowledge, no comparable constraint exists in state-of-the-art CP solvers. We first introduce a decomposition for this constraint, as well as a polynomial filtering algorithm that achieves generalized arc consistency. We then present a number of applications for this new constraint. In our first application, we show how bin_counts can be used to develop a decomposition for a new statistical constraint [35]: the $\chi^2$ test constraint. In our second and third applications, we demonstrate the advantage of enforcing generalised arc consistency in the context of two standard CP benchmark problems: the balanced academic curriculum problem [6] and the balanced nursing workload problem [5]. Finally, we present a further application of a variant of the $\chi^2$ test constraint in the context of confidence interval analysis.

This work is structured as follows. In Section 2 we provide relevant notions on CP. In Section 3 we introduce the bin_counts constraint. In Section 4 we discuss a decomposition for this constraint and in Section 5 we introduce a polynomial filtering algorithm that achieves generalised arc consistency. In Section 6 we contrast performance of these two approaches. In Section 7 we discuss applications of the constraint. In Section 8 we survey related works in the literature and in Section 9 we draw conclusions.
2 Formal background

A Constraint Satisfaction Problem (CSP) is a triple $\langle V, C, D \rangle$, where $V$ is a set of decision variables, $D$ is a function mapping each element of $V$ to a domain of potential values, and $C$ is a set of constraints stating allowed combinations of values for subsets of variables in $V$ [34]. A solution to a CSP is an assignment of variables to values in their respective domains such that all of the constraints are satisfied.

There are different kinds of constraints used in CP: e.g. logic constraints, linear constraints, and global constraints [32]. A global constraint captures a relation among a non-fixed number of variables. Constraints typically embed dedicated filtering algorithms able to remove provably infeasible or suboptimal values from the domains of the decision variables that are constrained and, therefore, to enforce some degree of consistency, e.g. arc consistency (AC) [18], bounds consistency (BC) [7] or generalised arc consistency (GAC) [10]. A constraint is generalized arc-consistent if and only if, when a variable is assigned any of the values in its domain, there exist compatible values in the domains of all the other variables in the constraint. A classical example of an arc consistency algorithm is the one presented in [30] for the all_different constraint, which constrains a given number of decision variables to take values that are all different.

Filtering algorithms are repeatedly called until no more values are pruned; this process is called constraint propagation. In addition to constraints and filtering algorithms, constraint solvers also feature a heuristic search engine, e.g. a backtracking algorithm guided by dedicated variable and value selection heuristics.
During search, the constraint solver explores partial assignments and exploits filtering algorithms in order to proactively prune parts of the search space that cannot lead to a feasible or to an optimal solution.

3 The bin_counts constraint

Consider a list of $n$ integer values and a set of $m$ bins covering intervals $[b_j, b_{j+1})$, $j = 1, \ldots, m$; our aim is to count, for each bin, the number of elements in the list whose values lie in it.

Example. Consider the following list of $n = 10$ values $\{1, 1, 5, 3, 1, 2, 1, 1, 3, 1\}$ and a set of $m = 3$ bins covering intervals $[1, 3)$, $[3, 4)$, $[4, 6)$. By using command BinCounts in Mathematica™ we obtain the following counts for the 3 bins considered: $\{7, 2, 1\}$.

Without loss of generality and for ease of exposition, in what follows, we will keep referring to the case in which values and bin sizes are integer. However, it should be noted that the bin_counts constraint can be extended to the case in which both values and bin sizes are real values; the generalized arc consistency propagator in Section 5 seamlessly applies to this case.

Now, let $b_1, \ldots, b_{m+1}$ be scalar values, $x_i$ for $i = 1, \ldots, n$ be decision variables with domain $\mathrm{Dom}(x_i)$ and $c_j$ for $j = 1, \ldots, m$ be decision variables with domain $\mathrm{Dom}(c_j)$.

Definition 1. $\mathtt{bin\_counts}_{b_1,\ldots,b_{m+1}}(x_1, \ldots, x_n; c_1, \ldots, c_m)$ holds iff $c_j$ is equal to the count of values assigned to $x_1, \ldots, x_n$ which lie within interval $[b_j, b_{j+1})$.

4 A decomposition strategy

The $\mathtt{global\_cardinality}_{v_1,\ldots,v_m}(x_1, \ldots, x_n; c_1, \ldots, c_m)$ constraint [21] requires that, for each $j = 1, \ldots, m$, decision variable $c_j$ is equal to the number of variables $x_1, \ldots, x_n$ that are assigned scalar $v_j$. We decompose bin_counts by means of $b_{m+1} - b_1$ auxiliary variables, a global_cardinality constraint, and a set of linear equalities as shown in Fig. 1.
Essentially, we count occurrences $o_k$ of individual values $k \in \bigcup_{j=1}^{m} [b_j, b_{j+1})$ that appear in any of the bins (constraint 1). Then we sum all occurrences that belong to a given bin (constraint 2). Finally, we make sure that bin counts $c_j$ sum to $n$ (constraint 3). Note that this latter constraint can be formulated as an inequality ($\leq$) if we want to allow the $x_i$ to take values that fall outside the range of values covered by bins.

Constraints:
(1) $\mathtt{global\_cardinality}_{b_1,\ldots,b_{m+1}}(x; o)$
(2) $\sum_{k=b_j}^{b_{j+1}-1} o_k = c_j$, $j = 1, \ldots, m$
(3) $\sum_{j=1}^{m} c_j = n$
Parameters:
$b_1, \ldots, b_{m+1}$: bin boundaries
$n$: number of value variables
$m$: number of bins
Decision variables:
$x_i$: value variables
$c_j$: bin counts
$o_k$: occurrences of value $k$

Figure 1: bin_counts decomposition

Example. Consider the following numerical example with $n = 3$ variables $x_i$, such that $\mathrm{Dom}(x_1) = \{3, 4\}$, $\mathrm{Dom}(x_2) = \{1, 2, 4\}$, and $\mathrm{Dom}(x_3) = \{2, 3, 4\}$; and $m = 2$ bins ($b_1 = 1$, $b_2 = 3$, and $b_3 = 5$) where $\mathrm{Dom}(c_1) = \{1, 2, 3\}$ and $\mathrm{Dom}(c_2) = \{0, 1\}$. We apply constraint propagation to the above decomposition by enforcing GAC on each constraint until a fixed point is reached. After filtering, domains are reduced as follows: $\mathrm{Dom}(x_1) = \{3, 4\}$, $\mathrm{Dom}(x_2) = \{1, 2, 4\}$, $\mathrm{Dom}(x_3) = \{2, 3, 4\}$, $\mathrm{Dom}(c_1) = \{2, 3\}$, and $\mathrm{Dom}(c_2) = \{0, 1\}$. As we will show in the next section, these domains can be further reduced; therefore this decomposition does not achieve GAC. We shall next introduce GAC filtering for bin_counts.
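The statement that the decomposition falls short of GAC can be checked independently on this small example by exhaustive enumeration. The Python sketch below is only an illustration and is not the filtering algorithm of this work: it enumerates every assignment (exponential in $n$), and the function names `bin_counts` and `gac_by_enumeration` are hypothetical helpers introduced here. It assumes the semantics in which every $x_i$ must fall in some bin.

```python
from itertools import product

def bin_counts(values, bounds):
    """Count how many values fall in each half-open bin [bounds[j], bounds[j+1])."""
    counts = [0] * (len(bounds) - 1)
    for v in values:
        for j in range(len(bounds) - 1):
            if bounds[j] <= v < bounds[j + 1]:
                counts[j] += 1
                break
    return counts

def gac_by_enumeration(x_domains, c_domains, bounds):
    """Return, for each x_i and c_j, the values that occur in at least one solution."""
    x_support = [set() for _ in x_domains]
    c_support = [set() for _ in c_domains]
    for assignment in product(*x_domains):
        counts = bin_counts(assignment, bounds)
        # Assumption: every value must land in some bin (counts sum to n).
        if sum(counts) != len(assignment):
            continue
        # Each bin count must belong to the domain of its count variable.
        if any(c not in dom for c, dom in zip(counts, c_domains)):
            continue
        for i, v in enumerate(assignment):
            x_support[i].add(v)
        for j, c in enumerate(counts):
            c_support[j].add(c)
    return x_support, c_support

# List-of-values example from Section 3.
print(bin_counts([1, 1, 5, 3, 1, 2, 1, 1, 3, 1], [1, 3, 4, 6]))  # [7, 2, 1]

# Three-variable example from Section 4.
x_sup, c_sup = gac_by_enumeration(
    [{3, 4}, {1, 2, 4}, {2, 3, 4}], [{1, 2, 3}, {0, 1}], [1, 3, 5])
print(x_sup, c_sup)
```

On the Section 4 example the supported values are $\{3,4\}$, $\{1,2\}$, $\{2\}$ for the $x_i$ and $\{2\}$, $\{1\}$ for the $c_j$: since both values of $x_1$ fall in the second bin, $c_2$ must be 1 and $c_1$ must be 2, which the decomposition does not detect.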
Figure 2: Bipartite network flow graph for the numerical example, with nodes $s$, $v_{x_1}, v_{x_2}, v_{x_3}$, $v_{c_1}, v_{c_2}$, and $t$. Arc $(v_{x_i}, v_{c_j})$ labels are shown in curly brackets: $(v_{x_1}, v_{c_2})$: $\{3, 4\}$; $(v_{x_2}, v_{c_1})$: $\{1, 2\}$; $(v_{x_2}, v_{c_2})$: $\{4\}$; $(v_{x_3}, v_{c_1})$: $\{2\}$; $(v_{x_3}, v_{c_2})$: $\{3, 4\}$. Arc flows are shown in bold (1 on each arc leaving $s$; $\{1, 2, 3\}$ and $\{0, 1\}$ on the arcs entering $t$); we omit arc $(v_{x_i}, v_{c_j})$ capacities, which are all set to 1.

5 Enforcing generalized arc consistency

In this section we first illustrate theoretical properties and then present our filtering strategy that achieves GAC.

5.1 Theoretical properties

We reformulate bin_counts as a bipartite network flow problem [2]. We generate a bipartite graph with one set of vertexes $v_{x_1}, \ldots, v_{x_n}$ and another set of vertexes $v_{c_1}, \ldots, v_{c_m}$. An arc $(v_{x_i}, v_{c_j})$ exists between node $v_{x_i}$ and node $v_{c_j}$ if and only if there exists at least one value in the domain of $x_i$ which falls in bin $[b_j, b_{j+1})$. We label arc $(v_{x_i}, v_{c_j})$ with the set of relevant values from $\mathrm{Dom}(x_i)$ that fall within bin $[b_j, b_{j+1})$; note that these labels do not reflect arc capacities, which are all set to 1 for arcs $(v_{x_i}, v_{c_j})$. We add a source node $s$ linked to nodes $v_{x_1}, \ldots, v_{x_n}$ and a terminal node $t$ linked to nodes $v_{c_1}, \ldots, v_{c_m}$; arc flows from $s$ to $v_{x_i}$ are set to 1; flow $c_j$ passing through arc $(v_{c_j}, t)$ must take a value in $\mathrm{Dom}(c_j)$. We shall initially assume that all $\mathrm{Dom}(c_j)$ are compact; this assumption will be relaxed at the end of this section.

Example. Consider once more the example introduced in Section 4. The associated bipartite graph is shown in Fig. 2.

The feasible region of bin_counts can be expressed as a system of linear equations. In the aforementioned graph theoretical construct let $f_{ij}$ denote the flow from $v_{x_i}$ to $v_{c_j}$. The linear inequalities that define our problem are

$\sum_{j=1}^{m} f_{ij} = 1$, $i = 1, \ldots, n$ (1)
$\sum_{i=1}^{n} f_{ij} \leq \overline{c}_j$, $j = 1, \ldots, m$ (2)
$\sum_{i=1}^{n} f_{ij} \geq \underline{c}_j$, $j = 1, \ldots, m$ (3)

where $\overline{c}_j$ and $\underline{c}_j$ represent the upper and the lower bounds of the domain of $c_j$, respectively. Note that the fact that a decision variable $x_i$ is assigned a value $v$, where $v$ falls within bin $[b_j, b_{j+1})$, can be expressed by adding a constraint $f_{ij} = 1$. Similarly, if during search constraint propagation on $x_i$ filters all values listed in the label of arc $(v_{x_i}, v_{c_j})$ (footnote 1), this can be expressed by adding a constraint $f_{ij} = 0$. As we will discuss later, these observations can be exploited to develop incremental propagators.

Footnote 1: The removal of all values in the label of arc $(v_{x_i}, v_{c_j})$ does not necessarily lead to a wipeout of $\mathrm{Dom}(x_i)$ since some values of the domain may appear in the label of arcs $(i, k)$, where $k \neq j$.

Theorem 1. The feasible region associated with the system of linear equations stemming from the definition of bin_counts is an integral polyhedron.

Proof. We show that the constraint matrix is totally unimodular (TUM); this implies that the feasible region associated with the system of linear equations is an integral polyhedron [40]. In [14] the authors proved that, if $A$ is a matrix whose rows can be partitioned into two disjoint sets $B$ and $C$, the following four conditions together are sufficient for $A$ to be totally unimodular:

1. every column of $A$ contains at most two non-zero entries;
2. every entry in $A$ is 0, 1, or $-1$;
3. if two non-zero entries in a column of $A$ have the same sign, then the row of one is in $B$, and the other in $C$;
4. if two non-zero entries in a column of $A$ have opposite signs, then the rows of both are in $B$, or both in $C$.

We reformulate the above set of linear inequalities in standard form as follows

$\sum_{j=1}^{m} f_{ij} = 1$, $i = 1, \ldots, n$ (4)
$\sum_{i=1}^{n} f_{ij} + s_j = \overline{c}_j$, $j = 1, \ldots, m$ (5)
$\sum_{i=1}^{n} f_{ij} - e_j = \underline{c}_j$, $j = 1, \ldots, m$ (6)

where $s_j$ and $e_j$ are nonnegative. We then consider the coefficient matrix $A$ of this system of linear equations and show that it satisfies the four conditions above. It is possible to show that TUM matrices are closed under the following operations: adding a row or column with at most one non-zero entry, and repeating a row or a column [41, lemma 9.2.2]. We therefore remove from $A$ all columns with at most one non-zero entry (i.e. columns corresponding to slack variables $s_j$ and excess variables $e_j$), as well as repeated rows and columns: after removing matrix columns associated with slack/excess variables, matrix rows corresponding to constraints 5 and 6 become identical. The resulting constraint matrix is shown in Fig. 3. If we partition rows as shown in the figure, this will satisfy the four conditions.

\[
\begin{array}{cccc|cccc|c|cccc|l}
f_{11} & f_{12} & \cdots & f_{1m} & f_{21} & f_{22} & \cdots & f_{2m} & \cdots & f_{n1} & f_{n2} & \cdots & f_{nm} & \\ \hline
1 & 1 & \cdots & 1 & 0 & 0 & \cdots & 0 & \cdots & 0 & 0 & \cdots & 0 & \\
0 & 0 & \cdots & 0 & 1 & 1 & \cdots & 1 & \cdots & 0 & 0 & \cdots & 0 & B \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & & \vdots & \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & \cdots & 1 & 1 & \cdots & 1 & \\ \hline
1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & \cdots & 1 & 0 & \cdots & 0 & \\
0 & 1 & \cdots & 0 & 0 & 1 & \cdots & 0 & \cdots & 0 & 1 & \cdots & 0 & C \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & & \vdots & \\
0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 1 & \cdots & 0 & 0 & \cdots & 1 &
\end{array}
\]

Figure 3: TUM constraint matrix $A$; the first $n$ rows form set $B$ and the last $m$ rows form set $C$.

Proposition 1. bin_counts admits a feasible assignment iff the above system of linear equations admits a solution.

Proof. Follows by construction and from Theorem 1.

Proposition 2. BC and GAC are equivalent for bin_counts.

Proof. The feasible region is a convex integral polyhedron (Theorem 1). More intuitively, holes in $\mathrm{Dom}(c_j)$ do not lead to filtering since each arc $(v_{x_i}, v_{c_j})$ contributes at most one unit of flow; for this reason, we can now relax the assumption that $\mathrm{Dom}(c_j)$ is compact. Domain contraction on $x_i$ leads to filtering only when $x_i$ is instantiated to a value (i.e. $f_{ij} = 1$) or when we observe a label wipeout on arc $(v_{x_i}, v_{c_j})$ (i.e. $f_{ij} = 0$).
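As a sanity check on Theorem 1, total unimodularity of the matrix in Fig. 3 can be verified numerically for small $n$ and $m$ by testing that every square submatrix has determinant in $\{-1, 0, 1\}$. The Python sketch below is a brute-force illustration under that definition, not part of the proof; the helper names `constraint_matrix`, `det`, and `is_totally_unimodular` are hypothetical, and the enumeration is only practical for tiny matrices.

```python
from itertools import combinations

def constraint_matrix(n, m):
    """Rows B (one per x_i) followed by rows C (one per bin); columns are f_ij, row-major."""
    rows_b = [[1 if col // m == i else 0 for col in range(n * m)] for i in range(n)]
    rows_c = [[1 if col % m == j else 0 for col in range(n * m)] for j in range(m)]
    return rows_b + rows_c

def det(mat):
    """Integer determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(mat) == 1:
        return mat[0][0]
    total = 0
    for col, entry in enumerate(mat[0]):
        if entry == 0:
            continue
        minor = [row[:col] + row[col + 1:] for row in mat[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

def is_totally_unimodular(mat):
    """Check that every square submatrix has determinant -1, 0, or 1."""
    n_rows, n_cols = len(mat), len(mat[0])
    for k in range(1, min(n_rows, n_cols) + 1):
        for rows in combinations(range(n_rows), k):
            for cols in combinations(range(n_cols), k):
                sub = [[mat[r][c] for c in cols] for r in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True

print(is_totally_unimodular(constraint_matrix(2, 3)))            # True
print(is_totally_unimodular([[1, 1, 0], [0, 1, 1], [1, 0, 1]]))  # False (determinant is 2)
```

The second call uses a matrix unrelated to bin_counts, included only to show that the check can fail: it is the incidence matrix of an odd cycle, whose determinant is 2.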
5.2 Filtering $f_{ij}$ and $c_j$ variables

GAC on $c_j$ and $x_i$ variables can be enforced by solving $2(m + nm)$ linear programs based upon the above system of linear equations; solving linear programs is in P [16]. For all $j = 1, \ldots, m$ a lower bound (resp. upper bound) for $\mathrm{Dom}(c_j)$ can be found by solving

min (resp. max) $c_j$ subject to Eq. (1), ..., (3) (7)

For all $i = 1, \ldots, n$ and $j = 1, \ldots, m$ a lower bound (resp. upper bound) on $f_{ij}$ can be found by solving

min (resp. max) $f_{ij}$ subject to Eq. (1), ..., (3) (8)

If $f_{ij} = 0$, all values in arc $(i, j)$ label should be removed from $\mathrm{Dom}(x_i)$; conversely, if $f_{ij} = 1$, all values that do not appear in arc $(i, j)$ label should be removed from $\mathrm{Dom}(x_i)$.

The GAC propagator pseudocode is provided in Algorithm 1. The algorithm takes as input the set $X \equiv \{x_1, \ldots, x_n\}$ of value decision variables, the set $C \equiv \{c_1, \ldots, c_m\}$ of bin counts decision variables, and the set of bin boundaries $B \equiv \{b_1, \ldots, b_{m+1}\}$. The first loop (line 1) tightens upper and lower bounds for the variables in $C$. The second loop (line 11) tightens upper and lower bounds for the variables in $X$. Both loops use the filtering logic introduced in the previous paragraphs of this section.

Algorithm 1: bin_counts GAC propagator
Input: Sets of value $X \equiv \{x_1, \ldots, x_n\}$ and bin counts $C \equiv \{c_1, \ldots, c_m\}$ decision variables; set of bin boundaries $B \equiv \{b_1, \ldots, b_{m+1}\}$
Output: Filtered domains for decision variables in $X$ and $C$
 1  for $j \leftarrow 1$ to $m$ do
 2      $c_j^{lb} \leftarrow \min c_j$ subject to Eq. (1), ..., (3)
 3      $c_j^{ub} \leftarrow \max c_j$ subject to Eq. (1), ..., (3)
 4      if the linear program is infeasible then
 5          $\mathrm{Dom}(c_j) \leftarrow \emptyset$
 6      else
 7          $\inf(\mathrm{Dom}(c_j)) \leftarrow c_j^{lb}$
 8          $\sup(\mathrm{Dom}(c_j)) \leftarrow c_j^{ub}$
 9      end
10  end
11  for $i \leftarrow 1$ to $n$ and $j \leftarrow 1$ to $m$ do
12      $f_{ij}^{lb} \leftarrow \min f_{ij}$ subject to Eq. (1), ..., (3)
13      $f_{ij}^{ub} \leftarrow \max f_{ij}$ subject to Eq. (1), ..., (3)
14      if the linear program is infeasible then
15          $\mathrm{Dom}(f_{ij}) \leftarrow \emptyset$
16      else if $f_{ij}^{lb} = 1$ then
17          for $l \leftarrow 1$ to $m$, $l \neq j$ do
18              $\mathrm{Dom}(x_i) \leftarrow \mathrm{Dom}(x_i) \setminus [b_l, b_{l+1})$
19          end
20      else if $f_{ij}^{ub} = 0$ then
21          $\mathrm{Dom}(x_i) \leftarrow \mathrm{Dom}(x_i) \setminus [b_j, b_{j+1})$
22      end
23  end

Pseudocode for incremental propagation triggered by a change in the domain of a variable $v \in X \cup C$ is shown in Algorithm 2. Incremental propagation requires a set of stored booleans $g_{ij}$, for $i = 1, \ldots, n$ and $j = 1, \ldots, m$; in CP solvers, stored booleans are boolean variables whose state is tracked during search and restored by backtracking. At the beginning of search, after Algorithm 1 has been applied to enforce GAC, each $g_{ij}$ takes value true if and only if $\mathrm{Dom}(x_i)$ contains at least one value that fits in $[b_j, b_{j+1})$. Algorithm 2 then takes as input the set $X \equiv \{x_1, \ldots, x_n\}$ of value decision variables, the set $C \equiv \{c_1, \ldots, c_m\}$ of bin counts decision variables, the set of bin boundaries $B \equiv \{b_1, \ldots, b_{m+1}\}$, and the stored booleans $g_{ij}$.

The propagation logic is divided into two procedures. Let $v$ be the decision variable for which a change in the domain triggered propagation. Procedure updateDomainsBin does not present any substantial difference from the first loop of Algorithm 1. Conversely, procedure updateDomainsValue presents a number of differences from the second loop of Algorithm 1. More specifically, if $v$ belongs to $X$ (line 2) and $i$ is the index of $v$ in $X$, we record in boolean variable $\hat{g}_j$ the current value of stored boolean $g_{ij}$ for all bins $j = 1, \ldots, m$. Then, we iterate over all combinations of $i$ and $j$ (line 8). For a given combination of $i$ and $j$, if $v$ belongs to $X$ and $\hat{g}_j$ and $g_{ij}$ are not both true (line 9), i.e. domains of variable $x_i$ and variable $v$ do not contain at least one value each that fits in $[b_j, b_{j+1})$, we move to the next combination of $i$ and $j$.
Otherwise, we update domains similarly to the second loop (line 11) of Algorithm 1 while also updating stored booleans $g_{ij}$.

Algorithm 2: bin_counts GAC incremental propagator
Input: Sets of value $X \equiv \{x_1, \ldots, x_n\}$ and bin counts $C \equiv \{c_1, \ldots, c_m\}$ decision variables; set of bin boundaries $B \equiv \{b_1, \ldots, b_{m+1}\}$; stored booleans $g_{ij}$

 1  Algorithm propagate($v$)
 2      updateDomainsBin()
 3      updateDomainsValue($v$)

 1  Procedure updateDomainsBin()
 2      for $j \leftarrow 1$ to $m$ do
 3          $c_j^{lb} \leftarrow \min c_j$ subject to Eq. (1), ..., (3)
 4          $c_j^{ub} \leftarrow \max c_j$ subject to Eq. (1), ..., (3)
 5          if the linear program is infeasible then
 6              $\mathrm{Dom}(c_j) \leftarrow \emptyset$
 7          else
 8              $\inf(\mathrm{Dom}(c_j)) \leftarrow c_j^{lb}$
 9              $\sup(\mathrm{Dom}(c_j)) \leftarrow c_j^{ub}$
10          end
11      end

 1  Procedure updateDomainsValue($v$)
 2      if $v \in X$ then
 3          Let $i$ be the index of $v$ in $X$
 4          for $j \leftarrow 1$ to $m$ do
 5              $\hat{g}_j \leftarrow g_{ij}$
 6          end
 7      end
 8      for $i \leftarrow 1$ to $n$ and $j \leftarrow 1$ to $m$ do
 9          if $v \in X$ and $\neg(\hat{g}_j \wedge g_{ij})$ then
10              continue
11          end
12          $f_{ij}^{lb} \leftarrow \min f_{ij}$ subject to Eq. (1), ..., (3)
13          $f_{ij}^{ub} \leftarrow \max f_{ij}$ subject to Eq. (1), ..., (3)
14          if the linear program is infeasible then
15              $\mathrm{Dom}(f_{ij}) \leftarrow \emptyset$
16          else if $f_{ij}^{lb} = 1$ then
17              for $l \leftarrow 1$ to $m$, $l \neq j$ do
18                  $\mathrm{Dom}(x_i) \leftarrow \mathrm{Dom}(x_i) \setminus [b_l, b_{l+1})$
19                  $g_{il} \leftarrow$ false
20              end
21          else if $f_{ij}^{ub} = 0$ then
22              $\mathrm{Dom}(x_i) \leftarrow \mathrm{Dom}(x_i) \setminus [b_j, b_{j+1})$
23              $g_{ij} \leftarrow$ false
24          end
25      end

In the context of an incremental propagator, instead of rebuilding the linear programs from scratch at every propagation round, one may store the current state of the constraint matrix and enforce additional constraints when some $f_{ij}$ are ground. Dual simplex can then be used to find a new feasible and optimal solution. However, in practice the overhead associated with rebuilding the linear programs during propagation is minimal.

Example. For the example introduced in Section 4 filtered domains are $\mathrm{Dom}(x_1) = \{3, 4\}$, $\mathrm{Dom}(x_2) = \{1, 2\}$, $\mathrm{Dom}(x_3) = \{2\}$, $\mathrm{Dom}(c_1) = \{2\}$, and $\mathrm{Dom}(c_2) = \{1\}$.
These domains are smaller than those obtained via the decomposition discussed in Section 4.

It is finally worth mentioning that the filtering algorithm described requires all variables $x_i$ to be assigned a value that can be mapped to one of the available bins. Alternatively, one may want to implement a semantics that simply disregards variables $x_i$ that are assigned a value that cannot be mapped to one of the available bins. This extension of the above propagator is relatively straightforward: one simply needs to add to the graph theoretical construct a "hidden" bin to which all values that cannot be mapped to any other bin will be mapped.

6 Computational study

In this section we contrast performance of the decomposition presented in Section 4 and of the GAC filtering presented in Section 5. All experiments in this paper are carried out on a 2.2GHz Intel Core i7 Macbook Air fitted with 8GB of RAM.

We implemented the decomposition (Dec) discussed in Section 4 by using two systems featuring implementation of global_cardinality: JaCoP (footnote 2) [17], which features a BC implementation based on [15]; and Choco 3.3.0 (footnote 3) [28]. We developed a bin_counts global constraint in Choco implementing the GAC filtering in Section 5; the linear programming library used in the filtering algorithm is oj! Algorithms (footnote 4).

We consider a set of 50 randomly generated instances. Each instance features $n = 15$ variables $v_i$ and $m = 9$ bins such that $b_i = 5(i - 1)$ for $i = 1, \ldots, m + 1$. $\mathrm{Dom}(v_i)$ comprises up to 10 integer values uniformly distributed in $[0, 60)$; if a value is generated more than once, the domain will contain fewer than 10 values. Recall that $c_j$ is a decision variable representing the count of bin $j$; $\mathrm{Dom}(c_j)$ comprises values $\{0, \ldots, U_j\}$, where $U_j$ is a random integer uniformly distributed in $[0, n + 1)$. We set as a goal the instantiation of a given fraction $f \in \{0, 0.2, 0.4, 0.6, 0.8\}$ of variables $v$, i.e. $v_1, \ldots, v_{\lfloor n f \rfloor}$, under a min domain/min value search strategy. All instances generated do admit a solution.

In Fig. 4 we contrast filtering power for these two approaches. In Fig. 5 we contrast search time to achieve the goal. In both figures, we report the 5th and the 95th percentiles, as well as the mean. GAC filters more, but it is slower than the decomposition both in its basic and incremental variants. The decomposition in Choco features a stronger filtering compared to the one in JaCoP.

7 Applications

In this section we present a number of applications of bin_counts. In our first application we introduce a new statistical constraint [35], the $\chi^2$ test constraint, and we develop a decomposition for it which relies on bin_counts; in our second and third applications, we discuss variants of the Balanced Academic Curriculum Problem [6] and of the Balanced Nurse Workload Problem [5] which make use of this new statistical constraint; finally, in our fourth application, we employ a variant of the $\chi^2$ test constraint in the context of a well-known problem from the literature on confidence interval analysis. Since all models presented feature a mix of continuous and real variables, we rely on the extensions discussed in [9], which make it possible to model real variables and constraints within a Choco model and delegate associated reasoning during search to Ibex (footnote 5), a library for constraint processing over real numbers.

Footnote 2: http://jacop.osolpro.com/
Footnote 3: http://www.choco-solver.org/
Footnote 4: http://ojalgo.org/
Footnote 5: http://www.ibex-lib.org/

Figure 4: Relative performance of the decomposition presented in Section 4 and of the GAC propagator presented in Section 5 in terms of filtering power. (Horizontal axis: percentage of variables assigned; vertical axis: percentage of values filtered; series: Dec JaCoP, Dec Choco, GAC.)
Figure 5: Relative performance of the decomposition (Dec) presented in Section 4 and of the GAC (basic and incremental) propagator presented in Section 5 in terms of search time (sec). (Horizontal axis: percentage of variables assigned; vertical axis: search time in seconds, logarithmic scale; series: Dec JaCoP, Dec Choco, GAC, GAC Incremental.)

Constraints:
(1) $\mathtt{bin\_counts}_{b_1,\ldots,b_{m+1}}(v_1, \ldots, v_n; c_1, \ldots, c_m)$
(2) $\sum_{i=1}^{m} (c_i - t_i)^2 / t_i \leq F^{-1}_{\chi^2_{m-1}}(1 - \alpha)$
Parameters:
$b_1, \ldots, b_{m+1}$: bin boundaries
$F^{-1}_{\chi^2_{m-1}}$: inverse $\chi^2$ distribution with $m - 1$ degrees of freedom
$\alpha$: target significance for the $\chi^2$ test
Decision variables:
$v_1, \ldots, v_n$: observed values
$c_1, \ldots, c_m$: bin counts
$t_1, \ldots, t_m$: target counts for each bin

Figure 6: $\chi^2$ test statistical constraint decomposition

7.1 Pearson's $\chi^2$ test statistical constraint

Statistical constraints were originally discussed in [35]. A statistical constraint is a constraint that embeds a parametric or a non-parametric statistical model and a statistical test with significance level $\alpha$ (footnote 6) that is used to determine which assignments satisfy the constraint. Recent applications include [22, 33].

Existing statistical constraints include the Student's $t$ test constraint and the Kolmogorov-Smirnov constraint, both of which rely on the associated statistical tests. Although no implementation exists for the Student's $t$ test constraint, if one exploits Choco extensions for modeling real variables, a decomposition is immediate since the associated test statistic can be easily modelled as an algebraic expression and forced to be less than or equal to a given critical value.

A statistical constraint not yet discussed in the literature is the $\chi^2$ test constraint. This constraint relies on Pearson's $\chi^2$ test of goodness of fit [23], which establishes whether an observed frequency distribution differs from a theoretical distribution.
In the $\chi^2$ test statistical constraint

$\chi^2\text{-test}^{\alpha}_{b_1,\ldots,b_{m+1}}(v_1, \ldots, v_n; c_1, \ldots, c_m; t_1, \ldots, t_m)$

$v_i$ is a decision variable that represents a random variate; $c_j$ is a decision variable that represents the number of variables $v_i$ which take a value in $[b_j, b_{j+1})$; $t_j$ is a decision variable that represents the theoretical reference count for bin $[b_j, b_{j+1})$. An assignment satisfies the $\chi^2$ test statistical constraint iff a $\chi^2$ test at significance level $\alpha$ fails to reject the null hypothesis that the observed difference between the theoretical reference counts and the observed counts arose by chance.

A decomposition for the $\chi^2$ test statistical constraint is shown in Fig. 6: constraint (1) deals with the computation of the counts, constraint (2) restricts the $\chi^2$ test statistic. For the $\chi^2$ test statistical constraint it is worth observing that an $\alpha$ close to one restricts the solution space to those assignments for which the observed difference between the theoretical reference counts and the observed counts is small. As $\alpha$ decreases, higher fluctuations from the theoretical reference counts will be tolerated.

Example. We consider a problem with $n = 24$ variables $v_i$ and $m = 6$ bins such that $b_i = 5(i - 1)$ for $i = 1, \ldots, m + 1$. $\mathrm{Dom}(v_i)$ comprises values $\{0, \ldots, U_{x_i}\}$, where $U_{x_i}$ is a random integer number uniformly distributed in $[0, 30)$. $\mathrm{Dom}(c_j)$ comprises values $\{0, \ldots, n\}$. We implemented the model and set as a goal the instantiation of variables $v_1, \ldots, v_n$. Theoretical reference counts $t_j$ for the $m$ bins are $t = \{2, 4, 10, 4, 2, 2\}$. The bin counts for two solutions obtained for significance levels $\alpha \in \{0.95, 0.99\}$ are shown in Fig. 7 and contrasted against theoretical reference counts.

Footnote 6: The significance level is the probability of rejecting a null hypothesis, given that it is true.
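To make constraint (2) of Fig. 6 concrete, the Python sketch below evaluates the $\chi^2$ statistic for a vector of bin counts against the theoretical reference counts $t = \{2, 4, 10, 4, 2, 2\}$ of the example. It is a minimal illustration rather than the Choco model used in this work: the function names are hypothetical, the counts vector is invented for illustration, and the critical values 1.14 and 0.55 are simply the values of $F^{-1}_{\chi^2_{5}}(1 - \alpha)$ reported for $\alpha = 0.95$ and $\alpha = 0.99$ in Fig. 7.

```python
def chi_square_statistic(counts, targets):
    """Pearson's statistic: sum over bins of (c_i - t_i)^2 / t_i."""
    return sum((c - t) ** 2 / t for c, t in zip(counts, targets))

def satisfies_chi_square_test(counts, targets, critical_value):
    """Constraint (2): the statistic must not exceed the critical value."""
    return chi_square_statistic(counts, targets) <= critical_value

targets = [2, 4, 10, 4, 2, 2]

# Critical values F^{-1}(1 - alpha) with m - 1 = 5 degrees of freedom,
# copied from the values reported for Fig. 7 (not recomputed here).
critical = {0.95: 1.14, 0.99: 0.55}

# Hypothetical counts: one observation moved from the third bin to the first.
counts = [3, 4, 9, 4, 2, 2]

statistic = chi_square_statistic(counts, targets)  # 1/2 + 1/10
for alpha, threshold in critical.items():
    print(alpha, satisfies_chi_square_test(counts, targets, threshold))
```

For these hypothetical counts the statistic is $1/2 + 1/10 = 0.6$, which is accepted at $\alpha = 0.95$ but rejected at $\alpha = 0.99$, in line with the observation that a larger $\alpha$ tolerates smaller deviations from the reference counts.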
Figure 7: Sample bin counts obtained in the context of our numerical example for different values of $\alpha$: when $\alpha = 0.95$, $\chi^2 = 1.10 \leq F^{-1}_{\chi^2_{m-1}}(1 - \alpha) = 1.14$; when $\alpha = 0.99$, $\chi^2 = 0.35 \leq F^{-1}_{\chi^2_{m-1}}(1 - \alpha) = 0.55$. (Horizontal axis: bins 0-5, 5-10, 10-15, 15-20, 20-25, 25-30; vertical axis: bin counts; series: bin counts for $\alpha = 0.95$, bin counts for $\alpha = 0.99$, target bin counts.)

7.2 Balanced academic curriculum problem

CSPLib [11] problem 30, the Balanced Academic Curriculum Problem (BACP) [6], asks to assign courses to semesters in such a way as to balance academic load (the sum of credits from courses in a semester) among semesters. In addition, there are constraints on minimum and maximum number of courses per semester, and some courses are prerequisite to others, e.g. course B must be assigned to a semester following the one in which course A is assigned. The objective that is optimised in order to achieve a balanced curriculum varies from work to work: the original formulation seeks to minimise the maximum load in any given semester; other formulations, see e.g. [24], employ the $L_2$-deviation to measure balance. In this section we discuss a different strategy, based on the $\chi^2$ test statistic, to measure curriculum compliance with a target credit load distribution among semesters.

Constraints:
(1) $\mathtt{global\_cardinality}_{1,\ldots,S}(s; c)$
(2) $\mathtt{bin\_packing}_{w}(s; l)$
(3) $s_i < s_j$ (course prerequisites)
(4) $\mathtt{bin\_counts}_{b_1,\ldots,b_{m+1}}(l; o)$
(5) $\sum_{k=1}^{m} (o_k - t_k)^2 / t_k \leq F^{-1}_{\chi^2_{m-1}}(1 - \alpha)$
Parameters:
$b_1, \ldots, b_{m+1}$: bin boundaries
$S$: number of semesters
$w_i$: course $i$ credits
$t_k$: target occurrences in bin $k$
Decision variables:
$s_i$: course $i$ semester
$l_j$: semester $j$ load
$c_j$: number of courses in semester $j$
$o_k$: load occurrences in bin $k$

Figure 8: A CP formulation for the BACP

As discussed in [24], a CP formulation for the BACP (Fig. 8) includes one decision variable $s_i$ per course $i$, which indicates to which semester the course is assigned; one decision variable $l_j$ per semester $j$ recording its academic load; and one decision variable $c_j$ per semester recording the number of courses allocated to it. On these variables we enforce a global_cardinality constraint (1) that links $s_i$ and $c_j$ to record the number of courses in each semester; a bin_packing constraint (2) that links $s_i$ and $l_j$ to capture course credits in each semester; and binary inequalities (3) between pairs of $s_i$ variables to record course prerequisites. In addition to the standard constructs above, our formulation includes a list of scalar bin boundaries $b_1, \ldots, b_k, \ldots, b_{m+1}$ and associated scalar target occurrences $t_k$, for $k = 1, \ldots, m$, which are employed to capture the ideal semester load distribution we aim for. The load occurrences in bin $k$ are recorded by one decision variable $o_k$ per bin via a bin_counts constraint (4) that links variables $l_j$ and $o_k$. Finally, the associated $\chi^2$ test statistic is forced to be less than or equal to a relevant critical value (5). Our formulation leads to a constraint satisfaction and not to a constraint optimisation problem. Alternatively, one may decide to model target occurrences $t_k$ as decision variables and minimise or maximise a given measure associated with the load distribution, e.g. mean, variance, loss, etc.; therefore our study can be seen as a generalisation of existing studies that investigated the optimisation of specific moments of a distribution.

The traditional smallest-domain-first (variable) and semester with the smallest academic load (value) criteria [36] are not suitable for our problem, since our goal is not to minimise the maximum load or the deviation from the mean load.
Our aim is chiefly to demonstrate the effectiveness of GAC propagation; we therefore adopted a simple min domain/min value search strategy on variables $s_i$. We also implemented the symmetry breaking strategy in [20]. We considered the 28 instances in [11]. All instances feature 50 courses, 10 semesters, a minimum of 2 credits and a maximum of 100 credits per semester, and a minimum of 2 courses and a maximum of 10 courses per semester. Let $L_{ub}$ be the upper bound on the load per period; we consider bin bounds $\{0, 15, 20, 30, 35, L_{ub}+1\}$ and associated target occurrences $\{1, 2, 4, 2, 1\}$; significance level $\alpha$ is set to 0.99.

Example. We consider the instance "bacp-1" from the CSPLib testbed. A feasible schedule is shown in Fig. 9; semester loads are shown in Table 1.

Semester    1   2   3   4   5   6   7   8   9  10
Load       50  34  29  17   7  33  23  29  24  17

Table 1: Semester loads of the feasible course schedule (Fig. 9) for instance "bacp-1" from the CSPLib testbed.

Fig. 10 shows the maximum solution time and the maximum number of nodes observed for any given number of instances solved. Because of the high value of $\alpha$, for all instances solved the observed $\chi^2$ statistic is zero, i.e. there is no deviation from target occurrences; in the next sections we will investigate cases in which a lower value of $\alpha$, and thus larger deviations, are both relevant and desirable. For the GAC model the average solution time is 11.6s; half of the instances are solved in less than 2.5s; the average number of nodes explored is 662; three instances (bacp-22, bacp-25, bacp-27) could not be solved within the given time limit of 60s. The Dec model could only solve 7 instances within the given time limit; the average solution time is 49.1s; the average number of nodes explored is 1466725. GAC therefore brings orders of magnitude improvements in both nodes explored and search time.
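The "bacp-1" example can be checked numerically. The following is a minimal Python sketch, not the propagator or the solver model used in the study: the helper names `bin_counts` and `chi2_statistic` are illustrative choices of ours, bins are assumed to be half-open intervals $[b_k, b_{k+1})$, and $L_{ub} = 100$ is assumed from the maximum of 100 credits per semester stated for these instances. It bins the semester loads of Table 1 and evaluates the left-hand side of constraint (5).

```python
from bisect import bisect_right


def bin_counts(values, bounds):
    """Count the values falling in each half-open bin [bounds[k], bounds[k+1])."""
    counts = [0] * (len(bounds) - 1)
    for v in values:
        if not bounds[0] <= v < bounds[-1]:
            raise ValueError(f"value {v} lies outside all bins")
        # bisect_right returns the index of the first bound strictly greater than v
        counts[bisect_right(bounds, v) - 1] += 1
    return counts


def chi2_statistic(observed, target):
    """Pearson-style statistic: sum over bins of (o_k - t_k)^2 / t_k."""
    return sum((o - t) ** 2 / t for o, t in zip(observed, target))


L_ub = 100  # assumed upper bound on the load per semester
loads = [50, 34, 29, 17, 7, 33, 23, 29, 24, 17]  # Table 1
bounds = [0, 15, 20, 30, 35, L_ub + 1]
targets = [1, 2, 4, 2, 1]

occurrences = bin_counts(loads, bounds)
statistic = chi2_statistic(occurrences, targets)
print(occurrences, statistic)  # [1, 2, 4, 2, 1] 0.0
```

The observed occurrences coincide with the targets, so the statistic is zero, in line with the remark that no deviation from target occurrences is observed when $\alpha = 0.99$.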
[Figure 9: Feasible course schedule for instance "bacp-1" from the CSPLib testbed. Axis: course load.]

7.3 Balanced nursing workload problem

We further investigate the application of bin_counts in the context of CSPLib [11] problem 69, the Balanced Nursing Workload Problem (BNWP), which was originally introduced in [5]. The aim in this problem is to design a balanced workload for nurses caring for newborn patients requiring different amounts of care (acuity). Patients belong to a zone and a nurse can only work in a single zone. There are lower and upper limits on the number of patients a nurse can handle and on the associated workload expressed in terms of total acuity. The authors in [5] proposed a Mixed Integer Programming approach. CP approaches based on the spread constraint were discussed in [39, 37], where the authors proposed a decomposition strategy that pre-computes the number of nurses for each zone and then solves each zone separately. An alternative CP approach based on the dispersion constraint was discussed in [24] and more recently in [25], where nurse dependent workloads are modelled.

It is out of the scope of this section to provide a comprehensive discussion on the BNWP. Our key concern here is to provide an alternative reformulation for the problem based on the bin_counts constraint. While most previous works on this problem focused on minimising the $L_2$-norm, we suggest an alternative approach in which the decision maker assigns a certain number of "slots" to each nurse and then sets an "ideal" workload distribution over these slots. For instance, each nurse should be dealing with up to 6 patients, of which ideally 2 should have acuity in [0,30), 2 in [30,60), and 2 in [60,100). Fluctuations from this ideal patient distribution are accepted, but should be minimised for the nurse population. While modelling our variant of the problem we deviate from the standard CP formulation discussed in [24].
More specifically, in our model (Fig. 11) there are $N$ nurses, $P$ patients, and $S$ patient slots per nurse. The acuity of patient $p$ is $a_p$. The model features $S \cdot N$ decision variables $g^n_s$, each of which represents the patient allocated to slot $s$ of nurse $n$. The acuity condition of $g^n_s$ is represented by decision variable $c^n_s$. Acuity occurrences in bin $k$ of nurse $n$ are represented by decision variable $o^n_k$. There are $b_1, \ldots, b_{m+1}$ acuity bin boundaries and, as mentioned, the decision maker must set a target number of patients $t_k$ in bin $k$ for a nurse. Note that it is possible to express nurse dependent target occurrence distributions; to keep the discussion simple, we here assume that all nurses share the same target occurrence distribution. On these variables and parameters we enforce the following constraints: an all_different constraint (2) on variables $g^n_s$, to ensure all patients are assigned to different nurses' slots (note that, to ensure that the number of patients equals $S \cdot N$, it is possible to insert a number of dummy patients with zero acuity); an element constraint (3) to ensure that variable $c^n_s$ represents the acuity of patient $g^n_s$; a bin_counts constraint (4) for each nurse to relate variables $c^n_s$ with acuity occurrences $o^n_k$; and a linear inequality (5) that ensures the normalised deviation from target occurrences (effectively a $\chi^2$ statistic) remains below $K$ for all nurses. $K$ is finally minimised in the objective (1).

[Figure 10: Solution times for the BACP test bed. Horizontal axis: number of instances solved; vertical axes: maximum solution time (s) and nodes; series: Time (GAC), Time (Dec), Nodes (GAC), Nodes (Dec).]
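To illustrate how constraints (2) to (5) of the BNWP model interact, the sketch below evaluates a fixed allocation rather than searching for one. It is a hedged illustration in Python, not the CP model itself: the function names `bin_counts` and `evaluate_allocation` are hypothetical, bins are assumed half-open, and the data are those reported for zone 1 of instance "2zones0" (Tables 2 and 3), including the zero-acuity dummy patient #18. The code checks that the slots hold distinct patients, looks up each patient's acuity, counts acuities per bin for each nurse, and returns the largest per-nurse statistic, which is the value that $K$ must bound.

```python
def bin_counts(values, bounds):
    """Count the values falling in each half-open bin [bounds[k], bounds[k+1])."""
    counts = [0] * (len(bounds) - 1)
    for v in values:
        for k in range(len(bounds) - 1):
            if bounds[k] <= v < bounds[k + 1]:
                counts[k] += 1
                break
        else:
            raise ValueError(f"value {v} lies outside all bins")
    return counts


def evaluate_allocation(allocation, acuity, bounds, targets):
    """Return per-nurse (occurrences, statistic) and the smallest feasible K."""
    patients = [p for slots in allocation.values() for p in slots]
    # all_different (2): no patient may occupy two slots
    if len(set(patients)) != len(patients):
        raise ValueError("a patient is assigned to more than one slot")
    per_nurse = {}
    for nurse, slots in allocation.items():
        c = [acuity[p - 1] for p in slots]  # element (3): patients are 1-indexed
        o = bin_counts(c, bounds)           # bin_counts (4)
        stat = sum((ok - tk) ** 2 / tk for ok, tk in zip(o, targets))  # lhs of (5)
        per_nurse[nurse] = (o, stat)
    K = max(stat for _, stat in per_nurse.values())
    return per_nurse, K


# Zone 1 of instance "2zones0"; the last entry is the dummy patient #18.
acuity = [59, 57, 50, 44, 42, 40, 39, 39, 33, 33, 32, 27, 26, 22, 20, 17, 11, 0]
allocation = {
    1: [1, 4, 7, 11, 13, 18],
    2: [2, 5, 8, 12, 14, 17],
    3: [3, 6, 9, 10, 15, 16],
}
bounds = [0, 30, 60, 100]
targets = [2, 2, 2]

per_nurse, K = evaluate_allocation(allocation, acuity, bounds, targets)
print(per_nurse, K)
```

This only verifies the reported occurrence distributions and the value of $K$ for a given plan; establishing that no plan attains a smaller $K$ still requires the search described in this section.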
Objective function:
(1) $\min K$
Constraints:
(2) all_different$(g)$
(3) element$(c^n_s, a, g^n_s)$ for all $n$ and $s$
(4) bin_counts$_{b_1,\ldots,b_{m+1}}(c^n; o^n)$ for all $n$
(5) $\sum_{k=1}^{m} (o^n_k - t_k)^2 / t_k \leq K$ for all $n$
Parameters:
$N$: number of nurses, index $n$
$P$: number of patients, index $p$
$S$: number of patient slots per nurse, index $s$
$a_p$: acuity of patient $p$
$b_1, \ldots, b_{m+1}$: acuity bin boundaries
$t_k$: target number of patients in bin $k$ for a nurse
Decision variables:
$g^n_s$: patient allocated to slot $s$ of nurse $n$
$c^n_s$: acuity condition of $g^n_s$
$o^n_k$: acuity occurrences in bin $k$ of nurse $n$

Figure 11: A CP formulation for the BNWP

           Nurse 1               Nurse 2               Nurse 3
$s$        1  2  3  4  5  6      1  2  3  4  5  6      1  2  3  4  5  6
$g^n_s$    1  4  7 11 13 18      2  5  8 12 14 17      3  6  9 10 15 16
$c^n_s$   59 44 39 32 26  0     57 42 39 27 22 11     50 40 33 33 20 17

Table 2: Optimal nurse-patient allocation for instance "2zones0"

Bins       [0,30)   [30,60)   [60,100)   $K$ ($\chi^2$ statistic)
Nurse 1      2         4         0          4
Nurse 2      3         3         0          3
Nurse 3      2         4         0          4
Target       2         2         2

Table 3: Observed patient acuity occurrence distributions for instance "2zones0"

By using the model introduced, we solved slightly modified versions of the test instances originally proposed in [39] and available on CSPLib. More specifically, we adopted a decomposition approach and thus solved each zone separately. Most importantly, we set the number of slots per nurse to 6 and adopted the previously mentioned "ideal" workload distribution, which assigns to each nurse 2 patients in each of the 3 bins identified by bin boundaries $b_1 = 0$, $b_2 = 30$, $b_3 = 60$, $b_4 = 100$. We precomputed the number of nurses needed for patients in any given zone, i.e. $N = \lceil P/S \rceil$, and we extended the list of patients to include $P'$ patients, with $Z = P' - P$ zero-acuity patients to ensure that $P' = N \cdot S$.
We ignored the min/max number of patients per nurse as well as the maximum workload, since these instance parameters become irrelevant once an "ideal" workload distribution is defined for the nurses.

Example. We consider instance "2zones0" available on CSPLib. In this instance, we have two zones and 11 nurses. To keep the example simple, we decompose the problem and focus solely on zone 1. There are $P = 17$ patients with acuities $a = \{59, 57, 50, 44, 42, 40, 39, 39, 33, 33, 32, 27, 26, 22, 20, 17, 11\}$. We assume each nurse should care for $S = 6$ patients in total; we can therefore cover the zone by using $N = 3$ nurses in total. A dummy patient (#18) with zero acuity must then be added to ensure that $P' = N \cdot S$. The allocation plan that minimises $K$ is shown in Table 2; the observed patient acuity occurrence distributions are shown in Table 3. The minimised maximum $\chi^2$ statistic value of the optimal plan is 4. It is clear that balancing allocations in such a way as to minimise fluctuations from an ideal workload distribution is not a trivial task even for an instance as small as this one.

Once more, we adopted a simple min domain/min value search strategy in which our goal is the instantiation of variables $g^n_s$. We also implemented a naive symmetry breaking strategy that forces $g^n_s < g^n_{s+1}$ for all $s$ and $n$, and also $g^n_1 < g^{n+1}_1$ for all $n$. Results of our computational study are shown in Fig. 12, which as before illustrates the maximum solution time and maximum number of nodes observed for any given number of instances solved. Because of the zone-based decomposition we adopted, we eventually solved a total of 91 instances. Similarly to what we have done for the BACP, we here investigate differences between the GAC and the decomposition approach for the bin_counts constraint. Surprisingly, the decomposition approach is generally one order of magnitude faster for this problem.
However, by observing the number of nodes explored, it is clear that the GAC approach still leads to more effective pruning.

[Figure 12: Solution times for the BNWP test bed. Horizontal axis: number of instances solved; vertical axes: maximum solution time (s) and nodes; series: Time (GAC), Time (Dec), Nodes (GAC), Nodes (Dec).]

7.4 Determining confidence intervals for the multinomial distribution

In the previous two sections the $\chi^2$ test statistic has essentially been employed as a least-squares measure of discrepancy from a desired distributional form. In the context of these applications, $\alpha$ therefore represents a constraint "softening" coefficient, rather than a significance level. In this section, we introduce an application of a variant of the $\chi^2$ test statistical constraint in the context of the well-known problem of determining simultaneous confidence intervals for the multinomial distribution. In the context of this application, $\alpha$ retains its original nature of statistical significance level.

In statistics, a multivariate generalisation of the $\chi^2$ test is the so-called "score test," which can be used to carry out hypothesis testing on multivariate distributions (see [19], chap. 5). In this section we shall concentrate on the multinomial distribution (see [42], section 3). Consider a multinomial distribution with event probabilities $p_1, \ldots, p_k$, where $k$ is the number of categories, and $N$ trials. Let $x_1, \ldots, x_n$ be $n$ i.i.d. random variates and $c_1, \ldots, c_k$ be the associated observed cell counts in a sample of size $N = \sum_{i=1}^{k} c_i$. The problem of determining simultaneous confidence intervals for $p_1, \ldots, p_k$ was studied in the 1960s by [12, 13, 29]. The maximum likelihood estimators of $p_j$ are $\hat{p}_j = c_j / N$, $j = 1, \ldots, k$. The random vector $\hat{p} \equiv (\hat{p}_1, \ldots, \hat{p}_k)$ is asymptotically distributed according to a multivariate normal distribution with mean vector $p \equiv (p_1, \ldots, p_k)$ and covariance matrix $\Sigma / N$ with elements $\sigma_{jj} = p_j(1 - p_j)$ and $\sigma_{ij} = -p_i p_j$ for $i \neq j$. In what follows, we shall concentrate on the work of [29], who discuss confidence intervals based on the quadratic form

$$N (\hat{p} - p)' \Sigma^{-1} (\hat{p} - p)$$

which is asymptotically distributed as a $\chi^2$ distribution with $k - 1$ degrees of freedom. Let $1 - \alpha$ be the desired confidence level; confidence intervals are obtained, for $j = 1, \ldots, k$, as the two solutions of the equation

$$\frac{N (\hat{p}_j - p_j)^2}{p_j (1 - p_j)} = F^{-1}_{\chi^2_{k-1}}(1 - \alpha).$$

The very same intervals can be easily computed via a simple variant of the model originally presented in Fig. 6, in which Pearson's $\chi^2$ statistic is replaced by Quesenberry and Hurst's statistic. The revised model is shown in Fig. 13; constraint (3) ensures this model computes confidence interval lower bounds. In [13] the author discussed a tighter version of Quesenberry and Hurst's intervals. This and other variants, such as the one in [12], can be modelled by simply modifying the original statistic in constraint (2).

Example. We consider $N = 10$ i.i.d. observations drawn from a multinomial with event probability vector $p = \{0.3, 0.3, 0.4\}$. The ten observations are $x = \{1, 1, 2, 0, 1, 1, 1, 0, 2\}$; the associated cell counts are $c = \{3, 5, 2\}$. We set the target significance for the score test to $\alpha = 0.1$ (i.e. a confidence level $1 - \alpha = 0.9$); in Table 4 we compare confidence intervals obtained using Quesenberry and Hurst's closed form expressions and intervals obtained as solutions of our model (Fig. 13) based on the score test statistical constraint.

Constraints:
(1) bin_counts$_{1,\ldots,k+1}(x_1, \ldots, x_n; c_1, \ldots, c_k)$
(2) $\frac{N (c_j / N - p_j)^2}{p_j (1 - p_j)} = F^{-1}_{\chi^2_{k-1}}(1 - \alpha)$, $j = 1, \ldots, k$
(3) $p_j \leq c_j / N$, $j = 1, \ldots, k$
Parameters:
$F^{-1}_{\chi^2_{k-1}}$: inverse $\chi^2$ distribution with $k - 1$ degrees of freedom
$\alpha$: target significance for the score test
Decision variables:
$x_1, \ldots, x_n$: random variates
$c_1, \ldots, c_k$: observed cell counts
$p_1, \ldots, p_k$: lower bounds for multinomial event probabilities

Figure 13: A score test statistical constraint decomposition to compute lower bounds of Quesenberry and Hurst's confidence intervals. To compute the respective confidence interval upper bounds, constraint (3) should be replaced by $p_j \geq c_j / N$.

                          Quesenberry and Hurst's   Score test decomposition
$(p^{lb}_1, p^{ub}_1)$    (0.0981, 0.6280)          (0.0981, 0.6280)
$(p^{lb}_2, p^{ub}_2)$    (0.2192, 0.7808)          (0.2192, 0.7808)
$(p^{lb}_3, p^{ub}_3)$    (0.0509, 0.5383)          (0.0509, 0.5383)

Table 4: Confidence intervals for our numerical example

In the example presented here the ten observations and the associated cell counts were scalar values. However, the constraint program in Fig. 13 models random variates and bin counts as decision variables. This opens up opportunities for declarative modelling with applications in multiple domains [35, 27, 22, 33].

8 Related works

Constraint categories that are related to bin_counts include counting constraints and value constraints. Given a set of decision variables, the count constraint can be exploited to constrain the number of variables which take a given scalar value; among [3] constrains the number of decision variables which take values contained within a given set of scalar values; essentially this constraint can be seen as a bin_counts over a single bin. interval_and_count [8] and assign_and_counts [4] deal with the allocation of tasks to bins.
However, in both cases the semantics involve properties of the task, such as being assigned a given colour, and determine a single common bound on the number of items that are allocated to a bin, rather than counting, for each bin, the number of elements allocated to it. The cumulative constraint [1] enforces that at each point in time, the cumulated height of the set of tasks that overlap that point does not exceed a given limit; the interval_and_sum constraint, derived from the previous one, fixes the origins of a collection of tasks in such a way that, for all the tasks that are allocated to the same interval, the sum of the heights does not exceed a given capacity. In both these constraints all intervals have the same size and, once more, these constraints do not count, for each interval, the number of elements allocated to it; they enforce instead a common capacity limit.

The global_cardinality$_{v_1,\ldots,v_m}(x_1, \ldots, x_n; c_1, \ldots, c_m)$ constraint [21] requires that, for each $j = 1, \ldots, m$, decision variable $c_j$ is equal to the number of variables $x_1, \ldots, x_n$ that are assigned scalar $v_j$. The bin_counts constraint represents a generalisation of global_cardinality in which scalar values $v_1, \ldots, v_m$ are replaced by intervals representing bins. A GAC algorithm for the global_cardinality constraint, which builds upon and generalises the results in [30], was discussed in [31]. Our GAC approach for bin_counts generalises this latter discussion, since the approach in [31] does not reduce the domains of the count variables $c_j$.

Finally, bin_counts can be used to express generalisations of constraints such as spread [26] and deviation [38]. These generalisations can be used to model aspects of a distribution that go beyond moments such as mean and standard deviation.
As we have shown with the $\chi^2$ test constraint, this paves the way to a range of applications in the context of statistical constraints.

9 Conclusions

We discussed the bin_counts constraint, which deals with the problem of counting the number of decision variables in a set which are assigned values that lie in given bins. We presented a decomposition and a GAC propagation strategy, as well as a decomposition for a new statistical constraint, the $\chi^2$ test constraint, based on bin_counts. We discussed three applications of the $\chi^2$ test constraint: reformulations for the BACP and the BNWP, as well as an application in confidence interval analysis. In our computational study we illustrated the enhanced filtering achieved by our GAC propagation strategy over a constraint decomposition in the context of a set of randomly generated instances. This enhanced filtering led to orders of magnitude improvements in search performance, both in terms of computational time and number of nodes explored, in the context of an existing CSPLib test bed for the BACP. A decomposition based on the global_cardinality constraint led to superior performance in terms of computational time in the context of an existing CSPLib test bed for the BNWP; however, also in this case, a GAC propagation strategy led to stronger filtering. Finally, we presented an application of the $\chi^2$ test constraint in the context of a well-known problem from the confidence interval analysis literature: the problem of determining simultaneous confidence intervals for the multinomial distribution. Although this problem is well-known in the literature, to the best of our knowledge a declarative approach based on statistical constraints has never been presented before.

References

[1] A. Aggoun and N. Beldiceanu. Extending CHIP in order to solve complex scheduling and placement problems. Math. Comput. Model., 17(7):57–73, April 1993. ISSN 0895-7177.
[2] Ravindra K. Ahuja, James B. Orlin, Clifford Stein, and Robert E. Tarjan. Improved algorithms for bipartite network flow. SIAM Journal on Computing, 23(5):906–933, October 1994.
[3] N. Beldiceanu and E. Contejean. Introducing global constraints in CHIP. Mathematical and Computer Modelling, 20(12):97–123, December 1994. ISSN 08957177.
[4] N. Beldiceanu, M. Carlsson, S. Demassey, and T. Petit. Global constraint catalogue: Past, present and future. Constraints, 12(1):21–62, March 2007. ISSN 1383-7133.
[5] C. Mullinax and M. Lawley. Assigning patients to nurses in neonatal intensive care. The Journal of the Operational Research Society, 53(1):25–35, 2002.
[6] C. Castro and S. Manzano. Variable and value ordering when solving balanced academic curriculum problems. CoRR, cs.PL/0110007, 2001. URL 0110007.
[7] C.W. Choi, W. Harvey, J.H.M. Lee, and P.J. Stuckey. Finite domain bounds consistency revisited. In A. Sattar and B. Kang, editors, AI 2006: Advances in Artificial Intelligence, volume 4304 of LNCS, pages 49–58. Springer, 2006. ISBN 978-3-540-49787-5.
[8] X. Cousin. Application of Constraint Logic Programming on Timetable Problem. PhD thesis, Rennes I University, France, 1993.
[9] J.-G. Fages, G. Chabert, and C. Prud'Homme. Combining finite and continuous solvers. In Proceedings of the 4th international workshop on Techniques foR Implementing Constraint programming Systems (TRICS), 2013.
[10] E. C. Freuder. A sufficient condition for backtrack-free search. J. ACM, 29(1):24–32, January 1982. ISSN 0004-5411.
[11] I. P. Gent and T. Walsh. CSPlib: A benchmark library for constraints. In J. Jaffar, editor, Principles and Practice of Constraint Programming - CP'99, volume 1713 of Lecture Notes in Computer Science, pages 480–481. Springer Berlin Heidelberg, 1999.
[12] R. Z. Gold. Tests auxiliary to χ² tests in a Markov chain. The Annals of Mathematical Statistics, 34(1):56–74, March 1963.
[13] L. A. Goodman. On simultaneous confidence intervals for multinomial proportions. Technometrics, 7(2):247–254, 1965.
[14] I. Heller and C. B. Tompkins. An extension of a theorem of Dantzig's, pages 247–254. Princeton University Press, 1956.
[15] I. Katriel and S. Thiel. Fast bound consistency for the global cardinality constraint. In Francesca Rossi, editor, Principles and Practice of Constraint Programming - CP 2003, volume 2833 of Lecture Notes in Computer Science, pages 437–451. Springer Berlin Heidelberg, 2003.
[16] L. G. Khachiyan. Polynomial algorithms in linear programming. USSR Computational Mathematics and Mathematical Physics, 20(1):53–72, January 1980.
[17] K. Kuchcinski and R. Szymanek. JaCoP Library. User's Guide. http://www.jacop.eu, 2016.
[18] A. K. Mackworth. Consistency in networks of relations. Artificial Intelligence, 8(1):99–118, February 1977.
[19] R. G. Miller. Simultaneous Statistical Inference. Springer New York, New York, NY, 1981.
[20] J.-N. Monette, P. Schaus, S. Zampelli, Y. Deville, and P. Dupont. A CP approach to the balanced academic curriculum problem. In Proceedings of the Seventh International Workshop on Symmetry and Constraint Satisfaction Problems (Symcon'07), 2007.
[21] A. Oplobedu, J. Marcovitch, and Y. Tourbier. Charme: Un langage industriel de programmation par contraintes, illustré par une application chez Renault. In Proceedings of the Ninth International Workshop on Expert Systems and their Applications: General Conference, pages 55–70, 1989.
[22] F. Pachet, P. Roy, A. Papadopoulos, and J. Sakellariou. Generating 1/f noise sequences as constraint satisfaction: The Voss constraint. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, pages 2482–2488. AAAI Press, 2015. ISBN 978-1-57735-738-4.
[23] K. Pearson. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine Series 5, 50(302):157–175, 1900.
[24] G. Pesant. Achieving domain consistency and counting solutions for dispersion constraints. INFORMS Journal on Computing, 27(4):690–703, November 2015.
[25] G. Pesant. Balancing nursing workload by constraint programming. In C.-G. Quimper, editor, Integration of AI and OR Techniques in Constraint Programming: 13th International Conference, CPAIOR 2016, Banff, AB, Canada, May 29 - June 1, 2016, Proceedings, pages 294–302. Springer International Publishing, 2016.
[26] G. Pesant and J.-C. Régin. SPREAD: A balancing constraint based on statistics. In P. van Beek, editor, Principles and Practice of Constraint Programming - CP 2005, volume 3709 of Lecture Notes in Computer Science, pages 460–474. Springer Berlin Heidelberg, 2005.
[27] S. D. Prestwich, R. Rossi, and S. A. Tarim. Randomness as a constraint. In Gilles Pesant, editor, Principles and Practice of Constraint Programming, volume 9255 of Lecture Notes in Computer Science, pages 351–366. Springer International Publishing, 2015.
[28] Charles Prud'homme, Jean-Guillaume Fages, and Xavier Lorca. Choco Documentation. TASC, INRIA Rennes, LINA CNRS UMR 6241, COSLING S.A.S., 2016. URL http://www.choco-solver.org.
[29] C. P. Quesenberry and D. C. Hurst. Large sample simultaneous confidence intervals for multinomial proportions. Technometrics, 6(2):191–195, May 1964.
[30] J.-C. Régin. A filtering algorithm for constraints of difference in CSPs. In Proceedings of the Twelfth AAAI National Conference on Artificial Intelligence, AAAI'94, pages 362–367. AAAI Press, 1994.
[31] J.-C. Régin. Generalized arc consistency for global cardinality constraint. In Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 1, AAAI'96, pages 209–215. AAAI Press, 1996. ISBN 0-262-51091-X.
[32] J.-C. Régin. Global Constraints and Filtering Algorithms. In Constraints and Integer Programming Combined, Kluwer, M. Milano editor, 2003.
[33] S. Rivaud, F. Pachet, and P. Roy. Sampling Markov models under binary equality constraints is hard. Technical report, Sony CSL, Paris, 2016.
[34] F. Rossi, P. van Beek, and T. Walsh. Handbook of Constraint Programming. Elsevier Science Inc., New York, NY, USA, 2006. ISBN 0444527265.
[35] R. Rossi, S. Prestwich, and S. A. Tarim. Statistical constraints. In Proceedings of the 21st biennial European Conference on Artificial Intelligence, ECAI 2014, volume 263 of Frontiers in Artificial Intelligence and Applications, pages 777–782. IOS Press, 2014.
[36] P. Schaus. Solving balancing and bin-packing problems with constraint programming. PhD thesis, Université catholique de Louvain, Belgium, 2009.
[37] P. Schaus and J.-C. Régin. Bound-consistent spread constraint. EURO Journal on Computational Optimization, 2(3):123–146, 2014.
[38] P. Schaus, Y. Deville, P. Dupont, and J.-C. Régin. The deviation constraint. In P. Van Hentenryck and L. Wolsey, editors, Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, volume 4510 of Lecture Notes in Computer Science, pages 260–274. Springer Berlin Heidelberg, 2007.
[39] P. Schaus, P. van Hentenryck, and J.-C. Régin. Scalable load balancing in nurse to patient assignment problems. In Proceedings of the 6th International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, CPAIOR '09, pages 248–262, Berlin, Heidelberg, 2009. Springer-Verlag. ISBN 978-3-642-01928-9.
[40] A. Schrijver. Combinatorial Optimization (3 volume, A,B, & C). Springer, 1 edition, February 2003.
[41] K. Truemper. Matroid Decomposition. Academic Press Inc, 1992. ISBN 0127012257.
[42] H. Wang. Exact confidence coefficients of simultaneous confidence intervals for multinomial proportions. Journal of Multivariate Analysis, 99(5):896–911, May 2008.
