Mixture-Model Preference Learning for Many-Objective Bayesian Optimization
Manisha Dubey*1,2  Sebastiaan De Peuter†3,4  Wanrong Wang1  Samuel Kaski1,3,5
1 Department of Computer Science, University of Manchester, Manchester, UK
2 Centre for AI in Assistive Autonomy, University of Edinburgh, Scotland, UK
3 Department of Computer Science, Aalto University, Espoo, Finland
4 Informatics Institute, University of Amsterdam, Amsterdam, Netherlands
5 ELLIS Institute Finland, Helsinki, Finland

Abstract

Preference-based many-objective optimization faces two obstacles: an expanding space of trade-offs and heterogeneous, context-dependent human value structures. To address these, we propose a Bayesian framework that learns a small set of latent preference archetypes rather than assuming a single fixed utility function, modelling them as components of a Dirichlet-process mixture with uncertainty over both archetypes and their weights. To query efficiently, we design hybrid queries that target information about (i) mode identity and (ii) within-mode trade-offs. Under mild assumptions, we provide a simple regret guarantee for the resulting mixture-aware Bayesian optimization procedure. Empirically, our method outperforms standard baselines on synthetic and real-world many-objective benchmarks, and mixture-aware diagnostics reveal structure that regret alone fails to capture.

1 INTRODUCTION

Real-world problems involving the design of materials and additive manufacturing Hastings et al. (2025); Myung et al. (2025), power systems (involving economic-emission dispatch) Ah King et al. (2005), and robotics and autonomous driving Li et al. (2023); Shen et al. (2025) often require optimizing several objectives, a setting known as multi-objective optimization. Multi-objective optimization Deb et al. (2016) addresses problems in which several performance criteria must be optimized simultaneously.
* This work was carried out while the author was at University of Manchester.
† This work was carried out while the author was at Aalto University.

In realistic applications these criteria are often mutually conflicting, so a single design cannot satisfy all goals at once. The aim therefore shifts from finding one "best" solution to characterizing and selecting among trade-offs on the Pareto frontier: those designs for which improving one objective would necessarily degrade at least one other. For many applications the number of objectives grows into the many-objective regime (often L ≥ 4), and the problem changes qualitatively. For instance, urban design frameworks optimize performance objectives such as operational energy, comfort, daylight, cost, and carbon Liu et al. (2023). Similarly, multi-objective recommenders optimize user utility, diversity, fairness, revenue, and long-term value Jannach and Abdollahpouri (2023). It is well established that the complexity of computing the Pareto front increases dramatically when the number of objectives grows to four or more Ishibuchi et al. (2008); Binois et al. (2020). Dominance relations weaken and many candidates are mutually non-dominating, so selection pressure diminishes and exploration requires substantially more evaluations to achieve comparable coverage. Summarization is also harder: beyond a few objectives, visualizing trade-offs, communicating them to stakeholders, and choosing a manageable set of representatives become nontrivial. Established indicators (e.g., hypervolume Zhang and Golovin (2020)) are costly to compute and less discriminative in higher dimensions, further complicating algorithm design and evaluation. The burden shifts to both cognition and computation.
For decision makers, comparing high-dimensional outcome vectors quickly exceeds human working-memory limits; judgments become noisy and inconsistent, especially when objectives have incommensurate units or span different risk horizons. In practice, people often default to heuristics, attending to a few salient objectives or applying non-compensatory rules. Hence, preferences expressed on the full objective set can be unstable and context dependent.

Recent many-objective BO methods often collapse decisions into a single compromise solution or reduce dimensionality by removing redundant objectives Binois et al. (2020); Lin et al. (2024); Martín and Garrido-Merchán (2021). These approaches assume calibrated objectives and a single global trade-off. In practice, however, decision making frequently exhibits multiple regimes (e.g., safety-first, cost-first, balanced), implying piecewise or multi-modal utilities rather than a single smooth scalarization. Many objectives also induce non-compensatory or conditional priorities that fixed scalarizations misrepresent. These limitations motivate methods that actively elicit preferences, model multiple trade-off modes, and focus sampling on decision-relevant regions of objective space.

We build on this observation and propose an interactive, preference-based approach that treats heterogeneity as structure rather than noise. Instead of collapsing everything to one utility, we represent preferences as a mixture of archetypal trade-off modes. Each mode captures a distinct way of valuing objectives and is explicitly context-aware. A mode is represented by a weight vector w_k ∈ Δ^{L−1} that defines a Chebyshev utility over outcomes y = f(x). Under this view, preference elicitation naturally decomposes into two questions: which mode is currently active (identity) and how that mode trades off objectives (shape). This decomposition directly informs query design.
We introduce inter-mode queries that maximize mutual information about the active mode, intra-mode queries that refine the trade-off within a selected mode, and a hybrid policy that balances the two. Coupled with standard GP surrogates for the objectives, the procedure concentrates sampling where the decision-maker (DM) cares, learns a calibrated posterior over modes, and remains robust when users persist in a given style or switch across contexts. We show that treating user preferences as a mixture of archetypes, and asking questions that are explicitly informative about that mixture, yields faster and better-calibrated preference-driven many-objective Bayesian optimization, especially when users exhibit persistent modes, as real decision makers typically do. Our contributions can be summarized as follows:

• We formulate preference-based many-objective Bayesian optimization with a Dirichlet-process mixture over Chebyshev weights, abandoning the single-utility assumption and enabling multiple, distinct trade-off archetypes to co-exist.
• We propose information-theoretic cluster-aware methods for active query selection in interactive preference learning. We derive inter-, intra-, and hybrid query rules that explicitly target information about mode identity versus within-mode shape, turning preference elicitation into learning both the identity and the shape of archetypes.
• Beyond simple regret, we introduce diagnostics for mixture-aware evaluation, such as mode coverage and calibration of mixture weights, which regret alone can hide.
• Our experiments on simulated and real-world datasets show the effectiveness of the proposed method with respect to the proposed mixture-aware diagnostics.

2 PROBLEM FORMULATION

We consider many-objective Bayesian optimization (MaO-BO) with L ≥ 4 objectives. Let X ⊂ R^d be a design space and f : X → R^L an unknown vector-valued objective.
Querying a design x ∈ X returns a (possibly noisy) observation y = f(x) + ε, where ε ∼ N(0, σ²I), and evaluations are assumed expensive. Many-objective problems typically admit a large set of Pareto-optimal solutions. In human-in-the-loop settings, however, the goal is not to recover the entire Pareto front, but to identify solutions that are most desirable to a decision maker (DM). Direct evaluations of the DM's utility are unavailable. Instead, the learner may query pairwise preferences between previously observed outcomes. Given two outcomes y_i and y′_i, the decision maker returns a binary comparison indicating whether y_i ≻ y′_i. Further, we assume that the DM evaluates outcomes through an unknown utility function. To model heterogeneous trade-offs, we posit that the DM's utility is governed by one of K latent preference modes. Each mode k ∈ [K] is parameterized by a weight vector w_k ∈ Δ^{L−1}, and modes occur with unknown mixture weights η ∈ Δ^{K−1}. This formulation captures either a single DM with context-dependent trade-offs or a heterogeneous population of DMs. We denote the scalar utility induced by mode k as U(f(x); w_k). Our objective is to identify designs that maximize the DM's mixture-expected utility,

x⋆ ∈ arg max_{x ∈ X} Σ_{k=1}^{K} η_k E[U(f(x); w_k)],   (1)

where the expectation is taken over observation noise. The learner must therefore infer the latent mixture parameters {η_k, w_k}_{k=1}^{K} from pairwise feedback while sequentially selecting designs to evaluate.

3 PROPOSED METHODOLOGY

In this section, we introduce a preference-aware multi-objective Bayesian optimization framework that jointly models uncertainty over objective functions and heterogeneous decision-maker trade-offs. The approach combines independent Gaussian process surrogates with a latent mixture model over preference scalarizations.
We describe posterior inference for the mixture parameters and present acquisition strategies that select both a new design point and an informative pairwise comparison, balancing objective improvement with reduction in preference uncertainty. We summarize the procedure in Algorithm 1.

3.1 OBJECTIVE SURROGATE MODEL

We model each objective with an independent Gaussian process (GP). Let f : X → R^L denote the L objectives. Each objective function f_ℓ is modeled independently with a Gaussian process: f_ℓ ∼ GP(m_ℓ, k_ℓ), ℓ = 1, …, L. Conditioning on observed data yields predictive means and variances: given observations D_f = {(x_i, y_i)}, we obtain posterior predictive distributions f_ℓ(x) | D_f ∼ N(µ_ℓ(x), σ²_ℓ(x)). Independence across objectives keeps inference and acquisition scalable in L Rasmussen and Nickisch (2010); Williams and Rasmussen (2006). One could also employ a multi-output Gaussian process Alvarez and Lawrence (2008).

3.2 LATENT MIXTURE PREFERENCE MODEL

We assume the decision maker's (DM's) latent utility is not captured by a single scalarization Ozaki et al. (2023); Astudillo and Frazier (2020) but by a mixture over K preference modes. We model heterogeneous trade-offs through a finite mixture of K latent preference modes. Each mode k is associated with a weight vector w_k ∈ Δ^{L−1}. For a predicted outcome y, the utility under mode k is

U(y; w_k) = − min_{ℓ=1,…,L} y_ℓ / w_kℓ,  w_kℓ > 0,   (2)

corresponding to Chebyshev scalarization for minimization. Utilities are evaluated using posterior predictive means of the GP model. We model preferences as arising from a latent mixture of trade-off modes. The mixture proportions over modes are drawn using a truncated stick-breaking construction Blei and Jordan (2006), which provides a flexible and regularized alternative to fixing the exact number of archetypes a priori.
Specifically, for k = 1, …, K,

v_k ∼ Beta(1, α),   η_k = v_k Π_{j<k} (1 − v_j),

with α > 0 a concentration parameter controlling the expected number of active modes. This yields mixture weights η ∈ Δ^{K−1}. For each mode k, a preference weight vector is drawn w_k ∼ Dir(β), defining a distinct trade-off over the L objectives. For each observed comparison i between outcomes (y_i, y′_i), a latent mode assignment is first sampled z_i ∼ Categorical(η), indicating which trade-off mode governs that comparison. Conditioned on z_i = k, the observed preference is generated according to a probit model,

P(y_i ≻ y′_i | z_i = k) = Φ( (U(y_i; w_k) − U(y′_i; w_k)) / (√2 σ_u) ).   (3)

Then we marginalize latent assignments as

p(D_pref | η, w_{1:K}) = Π_{i=1}^{N} Σ_{k=1}^{K} η_k Φ( (U(y_i; w_k) − U(y′_i; w_k)) / (√2 σ_u) ).   (4)

Posterior inference over (η, w_{1:K}) is performed using stochastic variational inference, optimizing an evidence lower bound on the marginal likelihood.

3.3 INFERENCE

Let D_pref = {(y_i, y′_i)}_{i=1}^{N} denote the set of observed preference comparisons. Under the generative model, the joint distribution is

p(D_pref, z, η, w_{1:K}, α) = p(α) Π_{k=1}^{K} p(v_k | α) p(w_k) Π_{i=1}^{N} p(z_i | η) p(y_i ≻ y′_i | z_i, w_{1:K}).   (5)

Our goal is to compute the posterior p(η, w_{1:K}, z, α | D_pref), which is analytically intractable due to the mixture structure and the non-conjugate probit likelihood. We therefore introduce a mean-field variational distribution

q(η, w_{1:K}, z, α) = q(α) Π_{k=1}^{K} q(w_k) q(v_k) Π_{i=1}^{N} q(z_i)   (6)

and approximate the true posterior by minimizing the KL divergence. Equivalently, we maximize the evidence lower bound (ELBO)

L(q) = E_q[log p(D_pref, z, η, w_{1:K}, α)] − E_q[log q(η, w_{1:K}, z, α)].

Optimization is performed using stochastic variational inference.
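The generative model above (truncated stick-breaking over modes, Chebyshev utilities, and the mixture probit likelihood of Eqs. 2-4) is compact enough to sketch directly. The following numpy sketch is illustrative only, not our SVI implementation, and all numerical values are hypothetical:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def stick_breaking(alpha, K, rng):
    """Truncated stick-breaking draw of mixture weights eta (Sec. 3.2)."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # close the simplex at the truncation level
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def chebyshev_utility(y, w):
    """U(y; w) = -min_l y_l / w_l  (Eq. 2, minimization convention)."""
    return -np.min(np.asarray(y) / np.asarray(w))

def preference_likelihood(y, y_prime, eta, W, sigma_u=0.1):
    """Marginal mixture probit probability P(y > y') (Eqs. 3-4)."""
    diffs = np.array([chebyshev_utility(y, w) - chebyshev_utility(y_prime, w)
                      for w in W])
    return float(eta @ norm.cdf(diffs / (np.sqrt(2) * sigma_u)))

# Toy draw: K = 3 modes over L = 4 objectives (illustrative values only)
K, L = 3, 4
eta = stick_breaking(alpha=1.0, K=K, rng=rng)
W = rng.dirichlet(np.ones(L), size=K)  # w_k ~ Dir(beta) with beta = 1
p = preference_likelihood([0.2, 0.5, 0.3, 0.4], [0.6, 0.4, 0.5, 0.2], eta, W)
```

Setting the last stick to one is the usual way to close the truncated simplex, so the sampled η sums to one exactly at the truncation level K.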
3.4 ACQUISITION FUNCTION

At iteration t, the learner maintains a posterior over both the objective functions and the latent preference parameters. Let {y_i}_{i=1}^{t} be the observed objective vectors. Let θ = (η, w_{1:K}) denote the latent mixture parameters, and let p(θ | D_t) be their posterior under the preference model. The GP surrogate induces a posterior p(f(x) | D_t) over objective values at any candidate x. For a given outcome y and parameters θ, the mixture utility is defined as U_mix(y; θ) = Σ_{k=1}^{K} η_k U(y; w_k), which represents the expected utility under the latent mixture of trade-off modes. To select the next design, we employ a mixture expected improvement (mixture-EI) criterion:

α_EI-mix(x) = E_{f(x), θ}[max{U_mix(f(x); θ) − U_best, 0}],   (7)

where U_best is the current best mixture utility, U_best = max_{i ≤ t} E_{θ ∼ p(θ|D_t)}[U_mix(y_i; θ)], reflecting the best design identified so far under posterior uncertainty about DM preferences. Note that the expectation is taken jointly over the GP posterior p(f(x) | D_t), which captures uncertainty about objective values, and the preference posterior p(θ | D_t), which captures uncertainty about DM trade-offs. We approximate the criterion using Monte Carlo sampling, where samples of f(x) are drawn from the GP posterior and samples of θ are drawn from the variational posterior.

3.5 PREFERENCE QUERY ACQUISITION

In addition to evaluating new designs, the learner may select a pairwise comparison between two previously observed outcomes (y_i, y_j). Let r ∈ {0, 1} denote the random response of the decision maker (DM), where r = 1 indicates y_i ≻ y_j.

Algorithm 1 MIXTURE-BASED MAOBO
Require: f : X → R^L, truncation K, query mode
1: Initialize dataset D_f and preference set D_pref
2: for t = 1, 2, … do
3:   Fit L independent GPs on D_f
4:   Update DP-mixture posterior via SVI to obtain (η̂, {ŵ_k}_{k=1}^{K})
5:   x_t ← arg max_{x ∈ X} EI_mix(x; η̂, {ŵ_k}) (using Eq. 7)
6:   Evaluate y_t = f(x_t) and update D_f
7:   (y_a, y_b) ← SELECTPAIR({y_i}, η̂, {ŵ_k}, mode) (using Section 3.5)
8:   Query the DM for the preferred outcome y_pref ≻ y_rej
9:   Set D_pref ← D_pref ∪ {(y_pref, y_rej)}
10: end for

Under the current posterior over latent preference parameters θ = (η, w_{1:K}), the predictive probability of the response is p(r = 1 | y_i, y_j, D_t) = ∫ p(r = 1 | y_i, y_j, θ) p(θ | D_t) dθ. The DM's preference r updates the posterior p(θ | D_t), which in turn influences future design decisions. The value of a preference query can therefore be interpreted as its expected reduction in uncertainty about θ, and hence its expected improvement in downstream decision quality. Computing the full value of information exactly is intractable, as it would require recomputing the posterior and re-solving the design optimization for each possible response. Inspired by Ozaki et al. (2023), we follow the Bayesian Active Learning by Disagreement (BALD) framework Houlsby et al. (2011). We adopt a BALD-style criterion because it directly targets the expected reduction in posterior uncertainty over latent preference parameters. Other information-theoretic criteria (e.g., predictive entropy) could be substituted within the same framework. We approximate the value of a query by the mutual information between the response r and the latent parameters θ:

α_pref(i, j) = I(r; θ | y_i, y_j, D_t).   (8)

Using the entropy decomposition of mutual information, this can be written as

I(r; θ) = H[r] − E_{θ ∼ p(θ|D_t)}[H[r | θ]],   (9)

where H[·] denotes Bernoulli entropy.
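The decomposition in Eq. (9) is straightforward to estimate from posterior samples of θ. A minimal numpy sketch, with hypothetical sample arrays standing in for draws from the variational posterior (not our implementation):

```python
import numpy as np
from scipy.stats import norm

def bernoulli_entropy(p):
    """H[p] in nats, numerically safe near p in {0, 1}."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log1p(-p))

def bald_score(util_i, util_j, eta_samples, sigma_u=0.1):
    """Monte Carlo estimate of I(r; theta) (Eq. 9) for a pair (y_i, y_j).

    util_i, util_j: (S, K) arrays of U(y; w_k) under S posterior samples
    of the mode weights; eta_samples: (S, K) posterior mixture weights.
    """
    # p(r = 1 | theta_s): mixture probit under each posterior sample (Eq. 3)
    z = (util_i - util_j) / (np.sqrt(2) * sigma_u)
    p_given_theta = np.sum(eta_samples * norm.cdf(z), axis=1)  # shape (S,)
    p_marginal = p_given_theta.mean()                          # MC marginal
    # H[r] - E_theta[H[r | theta]]
    return bernoulli_entropy(p_marginal) - bernoulli_entropy(p_given_theta).mean()

# Illustrative call with random stand-ins for posterior samples
rng = np.random.default_rng(1)
S, K = 256, 3
score = bald_score(rng.normal(size=(S, K)), rng.normal(size=(S, K)),
                   rng.dirichlet(np.ones(K), size=S))
```

By concavity of the Bernoulli entropy, the estimate is non-negative (up to Monte Carlo noise), and it is large exactly when the marginal prediction is uncertain while each sampled θ is individually confident.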
The first term measures overall predictive uncertainty, while the second term measures expected uncertainty under a fixed parameter setting. Their difference quantifies how much the response would reduce posterior uncertainty. The conditional preference probability given θ is

p(r = 1 | θ) = Σ_{k=1}^{K} η_k Φ( (U(y_i; w_k) − U(y_j; w_k)) / (√2 σ_u) ).

The marginal predictive probability is obtained by integrating over the posterior p(θ | D_t), which we approximate via Monte Carlo sampling. In our mixture model, the queries naturally split into MI about the mode identity (inter-cluster), MI about the within-mode weights (intra-cluster), and a convex combination that gives the hybrid policy.

Inter-cluster query In the mixture preference model, uncertainty arises both from which trade-off mode governs the DM and from the weights within each mode. The inter-cluster acquisition specifically targets uncertainty about the latent mode identity. Let K denote the discrete latent mode variable and r ∈ {0, 1} the response to a comparison (y_i, y_j). We define the inter-cluster acquisition as the mutual information between the response and the mode identity, which can be decomposed as

I(r; K) = H[r] − E_K[H[r | K]].   (10)

In entropy form,

I(r; K) = H[p_mix] − Σ_{k=1}^{K} η_k H[p_k],   (11)

where the mixture predictive is p_mix = Σ_{k=1}^{K} η_k p_k. This quantity is large when the mixture predictive is uncertain (high H[p_mix]) but each individual cluster prediction is confident.

Intra-mode query While the inter-mode criterion targets uncertainty over the active mode, the intra-mode criterion targets uncertainty over the trade-off weights within a particular mode.
For a candidate comparison between two previously observed outcomes (y_i, y_j) with response r ∈ {0, 1} and target mode c ∈ [K], we define the intra-mode acquisition as the mutual information between r and the mode-c weight vector:

A^(c)_intra(i, j) = I(r; w_c | K = c, y_i, y_j, D_t).   (12)

Using the BALD entropy decomposition, we obtain

A^(c)_intra(i, j) = H(p̄_c) − E_{w ∼ q_t(w_c)}[H(p_w)],

where p̄_c = E_{w ∼ q_t(w_c)}[p_w] and q_t(w_c) denotes the current approximate posterior (variational factor) over w_c after observing D_t.

Hybrid query Here we combine two goals: exploring between clusters, which helps identify the preference cluster (archetype) the user belongs to, and exploring within clusters, which helps refine an archetype's weight vector. We therefore consider a convex combination:

A_hybrid = λ A_inter + (1 − λ) A_intra,  λ ∈ [0, 1].   (13)

The hyperparameter λ controls the exploration trade-off between mode disambiguation and weight refinement.

4 THEORETICAL ANALYSIS

Our analysis builds on standard high-probability concentration bounds for GP regression Srinivas et al. (2010); Kandasamy et al. (2016); AV et al. (2022), which hold uniformly over any sequence of evaluation points and are therefore algorithm-agnostic. We combine these bounds with Lipschitz properties of the Chebyshev utility to translate objective-level uncertainty into mixture-utility error. Since our acquisition differs from GP-UCB, the result is a decomposition-based simple regret bound rather than a classical UCB-style regret guarantee.

Theorem 1 (Simple regret decomposition). Let X be compact and let f = (f_1, …, f_L) : X → R^L be an L-objective function. Assume:
(A1) Each objective f_ℓ lies in an RKHS H_ℓ with kernel k_ℓ and ∥f_ℓ∥_{H_ℓ} ≤ B_f.
(A2) For all x ∈ X, ∥f(x)∥_∞ ≤ B_y.
(A3) Preference weights satisfy w_kℓ ≥ c_w > 0 for all modes k and objectives ℓ.
(A4) Observations are corrupted by independent σ²-sub-Gaussian noise.

Define the Chebyshev utility for minimization U(y; w) = − min_{1 ≤ ℓ ≤ L} y_ℓ / w_ℓ, and the mixture utility

U_mix(y; η, {w_k}) = Σ_{k=1}^{K} η_k U(y; w_k).

Let (η⋆, w⋆_k) denote the true mixture parameters and define U⋆ = sup_{x ∈ X} U_mix(f(x); η⋆, {w⋆_k}). After T evaluations x_1, …, x_T, define the simple regret

R_T = U⋆ − max_{t ≤ T} U_mix(f(x_t); η⋆, {w⋆_k}).

Then, with probability at least 1 − δ,

R_T ≤ C_1 √(β_T γ_T / T) + C_2 Δ_w(T) + C_3 Δ_η(T) + ε_T,

where γ_T is the maximal GP information gain, β_T is the standard GP confidence parameter, Δ_w(T) = max_k ∥ŵ_k − w⋆_k∥_1, Δ_η(T) = ∥η̂ − η⋆∥_1, ε_T is the optimization error in maximizing the acquisition, and C_1, C_2, C_3 depend only on B_y and c_w. The detailed proof is provided in the supplementary material.

5 RELATED WORK

While there are several recent efforts on multi-objective Bayesian optimization Li et al. (2025); Haddadnia et al. (2025); Hung et al. (2025), a line of work integrates humans in the loop for Bayesian optimization Xu et al. (2024); AV et al. (2022). Towards interactive multi-objective optimization, Ozaki et al. (2024) propose a framework that optimizes multiple objectives by actively querying the decision-maker in order to steer Bayesian optimization towards the user's preferred trade-offs. Efforts in the direction of many-objective optimization remain limited. Classical many-objective Bayesian optimization increasingly avoids recovering large Pareto sets and instead targets a principled single compromise.
The Kalai-Smorodinsky (KS) solution equalizes benefit ratios from a disagreement point to utopia; subsequent work introduces a copula-invariant variant (CKS) to remove sensitivity to monotone rescalings, together with a GP-based stepwise-uncertainty-reduction (SUR) scheme that scales to 4-9 objectives Binois et al. (2020). These methods elegantly collapse decision making to a point solution, but they assume access to full objective vectors (rather than comparisons) and a single underlying notion of utility. A complementary line seeks a small set of solutions that collaboratively "cover" many objectives. Recent work proposes Tchebycheff-set (TCH-Set) scalarization and a smooth variant (STCH-Set) that mitigates the non-smooth max operator, with theory and empirical studies showing that a handful of solutions can handle tens of objectives effectively Lin et al. (2024). Such approaches reduce downstream decision complexity, yet still optimize from full objective feedback and typically posit a single, fixed scalarization rather than heterogeneous preferences. Orthogonal to both is objective reduction, which detects redundant objectives via similarity between GP predictive distributions and drops them to save evaluations. This can preserve solution quality on toy, synthetic, and real setups while cutting cost Martín and Garrido-Merchán (2021), but it maintains a single user model and does not reason about preference heterogeneity. Relative to single-compromise and few-solution paradigms Binois et al. (2020); Lin et al. (2024), our framework operates directly on preferences and embraces heterogeneity as signal, not noise. Relative to objective reduction Martín and Garrido-Merchán (2021), we keep all objectives but learn which trade-off modes matter and when, delivering interpretable decisions under many objectives.

6 EXPERIMENTS

6.1 DATASETS

We evaluate on standard multi-objective benchmarks, DTLZ2 Deb et al.
(2005) and WFG9 Huband et al. (2005). DTLZ2 uses L = 6 objectives and d = 7 decision variables with search space [0, 1]^7. WFG9 uses L = 8 objectives and d = 34 variables, with position parameter k = 2(L − 1) = 14 and distance parameter l = 20. All variables are scaled to [0, 1] and objectives are minimized. WFG9 introduces bias and mixed separability, providing a challenging testbed for query policies. To study heterogeneous preferences, we simulate K latent trade-off modes. Each mode k has weights w_k ∈ Δ^{L−1} defining a Chebyshev utility, and pairwise comparisons are generated via a probit likelihood with noise σ_u. For DTLZ2 (L = 6), objectives are partitioned into three groups; for WFG9 (L = 8), into four groups. Each archetype assigns 80% weight to its dominant group and distributes the remaining 20% across the other objectives, yielding distinct but overlapping trade-off profiles. At each comparison t, a latent mode Z_t governs preferences. Under the i.i.d. regime, Z_t ∼ Categorical(η⋆). Under the persistent regime, Z_t follows a sticky process with persistence ρ, remaining unchanged with probability ρ and resampled otherwise.

Figure 1: Panels (a) DTLZ2, (b) WFG, (c) Chemistry (top); (d) Clusterless, (e) Intra, (f) Inter, (g) Hybrid (bottom). (Top) Mixture-aware policies reduce regret faster and achieve lower final regret than unimodal and random-scalarization baselines, with Hybrid performing best overall. This is visible from the steeper early decline and lower terminal curves of Inter and Hybrid across DTLZ2, WFG, and PET, while Clusterless and scalarization methods plateau higher with greater variability. Curves show mean simple regret over three independent runs; vertical bars denote ± one standard error. (Bottom) Hybrid most accurately and stably recovers true archetypes, showing the largest early drop and lowest final error with the narrowest band; Inter drops quickly but plateaus, Intra refines slowly with higher variance, and Clusterless remains flat. We report mean aligned L1 error per outer iteration (band = per-true-archetype min-max; lower is better). Dataset: DTLZ2, persistent context.

We further evaluate the proposed framework on a real-world, chemistry-based multi-objective process design benchmark for polyethylene terephthalate (PET) production Wang et al. (2022). The dataset comprises 10,000 simulated process designs, each defined by 12 normalized decision variables and evaluated on L = 7 objectives: Return on Investment (ROI) and six Life-Cycle Assessment (LCA) indicators, namely global warming potential (GWP), terrestrial acidification, water consumption, fossil depletion, surplus ore, and human toxicity. The dataset was generated using a detailed process simulation and LCA workflow and is treated here as a fixed black-box multi-objective benchmark for evaluating preference-aware optimization. To model heterogeneous stakeholder preferences, we construct K = 3 archetypes reflecting economic, environmental, and health-oriented priorities, assigning 80% weight to the dominant objective category in each case. True mixture weights are set to η = [0.40, 0.35, 0.25], and under the persistent setting the latent archetype evolves with stickiness parameter ρ = 0.8. Additional experiment details are provided in the Supplementary.

6.2 LEARNING AND POLICIES

We adopt Chebyshev scalarization for its ability to represent non-convex trade-offs and its standard use in many-objective optimization, though the framework can accommodate alternative scalarizations. Each objective is modeled with an independent GP, and preference modes are learned via a truncated stick-breaking mixture using SVI with priors v_k ∼ Beta(1, α) and w_k ∼ Dirichlet(β).
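The simulated decision maker of Section 6.1 (grouped archetype weights, a sticky latent mode, and probit comparisons over Chebyshev utilities) can be sketched as follows; names and numerical values are illustrative stand-ins, not the exact experimental configuration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def make_archetypes(groups, L, dominant_share=0.8):
    """One archetype per objective group: 80% weight on the dominant
    group, the remaining 20% spread over the other objectives (Sec. 6.1)."""
    W = []
    for g in groups:
        w = np.full(L, (1.0 - dominant_share) / (L - len(g)))
        w[list(g)] = dominant_share / len(g)
        W.append(w)
    return np.array(W)

def next_mode(z, eta, rho, rng):
    """Sticky latent-mode process: keep the mode w.p. rho, else resample."""
    return z if rng.random() < rho else rng.choice(len(eta), p=eta)

def simulate_comparison(y, y_prime, z, W, rng, sigma_u=0.1):
    """Probit preference feedback from the currently active mode z."""
    w = W[z]
    du = -np.min(y / w) - (-np.min(y_prime / w))  # Chebyshev utility gap
    return rng.random() < norm.cdf(du / (np.sqrt(2) * sigma_u))

# Illustrative run: L = 6 objectives in three groups, persistent user
L = 6
W = make_archetypes([(0, 1), (2, 3), (4, 5)], L)
eta, rho, z = np.array([0.4, 0.35, 0.25]), 0.8, 0
for _ in range(5):
    z = next_mode(z, eta, rho, rng)
    r = simulate_comparison(rng.random(L), rng.random(L), z, W, rng)
```

Each archetype's weights sum to one by construction, and setting ρ = 0 recovers the i.i.d. regime as a special case.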
Although we fix a truncation level K, the stick-breaking prior shrinks redundant components toward negligible mass, so effective modes correspond to components with non-trivial posterior weight. The hybrid parameter λ is treated as a hyperparameter and selected via validation.

Baselines We compare against: 1) Multi-Attr-EI Astudillo and Frazier (2020), which learns a single preference vector via MAP; 2) MOBO-RS, random scalarization MOBO Paria et al. (2020); 3) EI-FP, EI with a fixed known preference; and 4) MaOBO-WS, our method with weighted-sum scalarization instead of Chebyshev. In terms of query policies, we compare five pair-selection criteria: 1) Random: selects a uniformly random pair among previously evaluated outcomes. 2) Clusterless: treats the preference model as unimodal using the mixture-mean weight ŵ = Σ_k η_k w̄_k and scores H(p̂), where p̂ = Φ((U(y; ŵ) − U(y′; ŵ)) / (√2 σ_u)). 3) Inter (mode identification): scores the mutual information between the comparison label and the latent mode, selecting queries via Eq. (10). 4) Intra (within-mode refinement): selects queries by fixing c = arg max_k η_k and scoring H(E_{w ∼ q(w_c)}[p_w]) − E_{w ∼ q(w_c)}[H(p_w)] (Eq. 12). 5) Hybrid: uses the convex combination λ Inter + (1 − λ) Intra (Eq. 13). Existing preferential BO methods typically assume a single latent utility function. Random scalarization methods sample trade-off weights but do not infer heterogeneous preference structure from data. Our setting explicitly models multi-modal, switching trade-offs, which is not addressed by standard unimodal preferential BO or scalarization-based MOBO approaches. This structural difference motivates our mixture-based formulation.

Figure 2: Mixture-weight trajectories for the PET production process: (a) Clusterless collapses to a dominant component, while (b) Hybrid quickly identifies and stabilizes near the correct mode proportions.

Metrics We report four views of performance: 1) Simple regret over outer iterations, r_t = U⋆ − max_{i ≤ t} U(f(x_i); w_true); lower is better. 2) Mixture-aware recovery of true archetypes: at each iteration, we align inferred components to ground-truth archetypes w_true via Hungarian matching (minimizing L1) and plot the mean of the per-true-archetype L1 errors with a min-max band. 3) Mixture-weight trajectory: we plot the posterior mean mixture weights η̂_k(t) over outer iterations t. Good behavior corresponds to (i) η̂_k(t) converging to η⋆_k (after alignment) and (ii) reduced drift/oscillation over t. 4) L1 error trajectories per inferred component: ∥w̄_k − w_ref∥_1. When mixture ground truth is available, we report errors after Hungarian alignment to true archetypes. This view reveals mode collapse, label switching, and whether all components actually learn.

7 RESULTS AND ANALYSIS

Figure 1 (Top) compares simple regret under the true mixture utility across outer iterations. Across benchmarks, the mixture-aware methods achieve faster regret reduction and lower final regret than unimodal and random-scalarization baselines. On DTLZ2, all methods improve rapidly due to the smooth landscape, but Inter and Hybrid converge more consistently to near-zero regret with lower variance, while Clusterless and fixed-preference baselines lag slightly, indicating consistent gains from explicit mode identification even in well-behaved settings. On the more irregular WFG benchmark, the separation is clearer: Hybrid exhibits the steepest early decline and the lowest final regret, whereas Clusterless and random scalarization plateau at higher levels, suggesting that preference averaging fails to resolve complex trade-offs.
In the PET process design task, the gap is most pronounced: Hybrid converges faster and stabilizes at lower regret, while unimodal and scalarization baselines converge more slowly and with higher variance, highlighting the benefit of modeling heterogeneous archetypes in structured, multimodal trade-off landscapes.

Figure 1 (bottom) shows archetype recovery for a persistent user on DTLZ2. Clusterless exhibits a flat curve with a wide band, indicating mode averaging and poor archetype separation. Intra queries reduce error within active modes but produce unstable overall alignment, reflected in persistent variability. Inter queries yield a sharp early drop by rapidly disambiguating archetypes, followed by slower refinement. Hybrid combines both effects: rapid initial mode identification followed by steady within-mode calibration, achieving the lowest final aligned error and the narrowest uncertainty band.

Figure 2 considers persistent gating with true mixture weights (0.40, 0.35, 0.25). For Clusterless, $\hat{\eta}(t)$ drifts and concentrates mass on a single component (e.g., the dominant blue curve), deviating from the true proportions; this indicates mode averaging and weak mode identification. In contrast, under the Hybrid query policy (right), $\hat{\eta}(t)$ recovers three stable preference modes: mass is reallocated rapidly in early iterations (mode identification), then stabilizes near the true mixture weights. The smallest mode is slightly under-estimated, reflecting the difficulty of learning minority archetypes under finite preference queries. This suggests the query policy actively resolves mode uncertainty and yields better-calibrated mixture weights. Importantly, no spurious dominant cluster emerges, and the weight trajectories stabilize after approximately 20 iterations.

Figure 3 shows error trajectories per inferred component for a persistent user on the DTLZ2 dataset under the Clusterless and Inter query modes.
In the Clusterless plot, mode collapse is clearly visible: one dominant curve improves somewhat, while the other two sit high and flat. In the Inter plot, cross-mode learning happens first, followed by non-monotone adjustments as the algorithm probes identity boundaries.

Figure 3: $L_1$ error trajectories per inferred component for a persistent user on DTLZ2, for (a) Clusterless and (b) Inter. After Hungarian alignment (lower is better), Inter queries prevent mode collapse and rapidly reduce error for at least one archetype, while Clusterless largely collapses, with only one component improving and the others remaining high.

Figure 4 shows results for an i.i.d. user on the DTLZ2 suite: (a) errors of inferred preferences for intra-query and (b) the $L_1$ error trajectory per inferred component.

Figure 4: Inferred preference errors under an i.i.d. user (DTLZ2); intra-query refines the dominant archetype with small fluctuations from mode switches, while inter-query rapidly reduces error for the dominant cluster but learns minority modes more slowly.

In the left plot, the three curves representing the mixture-mean, MAP, and expected errors tend to be close to each other, because intra-query keeps asking questions that refine the currently most likely archetype. Since this is an i.i.d. context, small bumps occur whenever the active mode switches. The right plot shows the inter-query aligned cluster error versus the true archetypes. Here the dominant group drops quickly, but the others decline slowly because they are queried less often in the i.i.d. context. Compared to this, the persistent inter-query plot in Figure 3 has step-like plateaus and occasional bumps from identity flips in the alignment.

While simple regret measures performance under a fixed scalar utility, it does not reveal how the optimization process explores the multi-objective trade-off surface.
In mixture-preference settings, the underlying objective is not unimodal: different archetypes emphasize different regions of the Pareto front. Consequently, an effective algorithm should not only minimize regret but also explore multiple trade-off regions before concentrating near the mixture-preferred optimum. To examine this behavior, we visualize the distribution of evaluated points in objective space relative to a reference Pareto front in Figure 5.

Figure 5: Pareto coverage in objective space, for (a) ROI vs. GWP and (b) ROI vs. toxicity. Grey points: reference Pareto front; colored points: evaluated solutions; stars: final best-utility solution under the true mixture preference. Efficient trade-offs lie along the grey frontier. Hybrid achieves broader frontier coverage and identifies final solutions closest to the mixture-preferred Pareto region.

Across all projections, Hybrid samples along the efficient frontier and identifies final solutions closest to the Pareto boundary under the true mixture utility. While all methods generate reasonable trade-offs, mixture modeling results in slightly better localization in the preferred region of objective space, particularly along the ROI-GWP and ROI-toxicity trade-offs.

Limitations and future work: We evaluate the proposed framework in simulated settings and on a real-world process design dataset, but do not include human-subject experiments. Due to time and resource constraints, validation with real decision makers was beyond the scope of the current study and remains an important direction for future work. We also plan to test the method on a broader range of real-world multi-objective datasets to assess generalization across domains. Finally, extending the framework to more adaptive mixture structures could further improve flexibility and practical applicability.
8 CONCLUSION

We address a central challenge in preference-based many-objective optimization: large trade-off spaces combined with heterogeneous, context-dependent human priorities. Rather than assuming a single scalarization, we model preferences as a small set of latent archetypes with a Dirichlet-process mixture over their weights. This mixture formulation enables information-efficient querying through inter-mode queries that identify the active archetype and intra-mode queries that refine within-mode trade-offs. Under mild assumptions, we establish a simple-regret guarantee for the resulting mixture-aware Bayesian optimization procedure. Across synthetic and real-world benchmarks, the method consistently reduces regret faster than cluster-agnostic or single-utility baselines, while mixture-aware diagnostics expose failure modes such as mode collapse and miscalibration that are not visible from regret alone. These results support mixture-aware preference learning as a scalable and interpretable approach to many-objective Bayesian optimization.

References

Robert T. F. Ah King, Harry C. S. Rughooputh, and Kalyanmoy Deb. Evolutionary multi-objective environmental/economic dispatch: Stochastic versus deterministic approaches. In International Conference on Evolutionary Multi-Criterion Optimization, pages 677–691. Springer, 2005.

Mauricio Alvarez and Neil Lawrence. Sparse convolved Gaussian processes for multi-output regression. Advances in Neural Information Processing Systems, 21, 2008.

Raul Astudillo and Peter Frazier. Multi-attribute Bayesian optimization with interactive preference learning. In International Conference on Artificial Intelligence and Statistics, pages 4496–4507. PMLR, 2020.

Arun Kumar A V, Santu Rana, Alistair Shilton, and Svetha Venkatesh. Human-AI collaborative Bayesian optimisation. Advances in Neural Information Processing Systems, 35:16233–16245, 2022.
Mickaël Binois, Victor Picheny, Patrick Taillandier, and Abderrahmane Habbal. The Kalai-Smorodinsky solution for many-objective Bayesian optimization. Journal of Machine Learning Research, 21(150):1–42, 2020.

David M. Blei and Michael I. Jordan. Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1), 2006.

Kalyanmoy Deb, Lothar Thiele, Marco Laumanns, and Eckart Zitzler. Scalable test problems for evolutionary multiobjective optimization. In Evolutionary Multiobjective Optimization: Theoretical Advances and Applications, pages 105–145. Springer, 2005.

Kalyanmoy Deb, Karthik Sindhya, and Jussi Hakanen. Multi-objective optimization. In Decision Sciences, pages 161–200. CRC Press, 2016.

Mohammad Haddadnia, Leonie Grashoff, and Felix Strieth-Kalthoff. BoTier: multi-objective Bayesian optimization with tiered objective structures. Digital Discovery, 2025.

Trevor Hastings, Mrinalini Mulukutla, Danial Khatamsaz, Daniel Salas, Wenle Xu, Daniel Lewis, Nicole Person, Matthew Skokan, Braden Miller, James Paramore, et al. Accelerated multi-objective alloy discovery through efficient Bayesian methods: Application to the FCC high entropy alloy space. Acta Materialia, page 121173, 2025.

Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. Bayesian active learning for classification and preference learning. arXiv preprint arXiv:1112.5745, 2011.

Simon Huband, Luigi Barone, Lyndon While, and Phil Hingston. A scalable multi-objective test problem toolkit. In International Conference on Evolutionary Multi-Criterion Optimization, pages 280–295. Springer, 2005.

Yu-Heng Hung, Kai-Jie Lin, Yu-Heng Lin, Chien-Yi Wang, Cheng Sun, and Ping-Chun Hsieh. BOFormer: Learning to solve multi-objective Bayesian optimization via non-Markovian RL. arXiv preprint arXiv:2505.21974, 2025.

Hisao Ishibuchi, Noritaka Tsukamoto, and Yusuke Nojima. Evolutionary many-objective optimization: A short review.
In 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), pages 2419–2426. IEEE, 2008.

Dietmar Jannach and Himan Abdollahpouri. A survey on multi-objective recommender systems. Frontiers in Big Data, 6:1157899, 2023.

Kirthevasan Kandasamy, Gautam Dasarathy, Junier B. Oliva, Jeff Schneider, and Barnabás Póczos. Gaussian process bandit optimisation with multi-fidelity evaluations. Advances in Neural Information Processing Systems, 29, 2016.

Bingdong Li, Zixiang Di, Yongfan Lu, Hong Qian, Feng Wang, Peng Yang, Ke Tang, and Aimin Zhou. Expensive multi-objective Bayesian optimization based on diffusion models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 27063–27071, 2025.

Guoqiang Li, Hongliang Guo, Zhenpo Wang, and Meng Wang. Online trajectory optimization for safe autonomous overtaking with active obstacle avoidance. Robotics and Autonomous Systems, 169:104528, 2023.

Xi Lin, Yilu Liu, Xiaoyuan Zhang, Fei Liu, Zhenkun Wang, and Qingfu Zhang. Few for many: Tchebycheff set scalarization for many-objective optimization. arXiv preprint arXiv:2405.19650, 2024.

Ke Liu, Xiaodong Xu, Wenxin Huang, Ran Zhang, Lingyu Kong, and Xi Wang. A multi-objective optimization framework for designing urban block forms considering daylight, energy consumption, and photovoltaic energy potential. Building and Environment, 242:110585, 2023.

Lucia Asencio Martín and Eduardo C. Garrido-Merchán. Many objective Bayesian optimization. arXiv preprint arXiv:2107.04126, 2021.

Jay I. Myung, James R. Deneault, Jorge Chang, Inhan Kang, Benji Maruyama, and Mark A. Pitt. Multi-objective Bayesian optimization: a case study in material extrusion. Digital Discovery, 4(2):464–476, 2025.

Ryota Ozaki, Kazuki Ishikawa, Youhei Kanzaki, Shinya Suzuki, Shion Takeno, Ichiro Takeuchi, and Masayuki Karasuyama. Multi-objective Bayesian optimization with active preference learning.
arXiv preprint arXiv:2311.13460, 2023.

Ryota Ozaki, Kazuki Ishikawa, Youhei Kanzaki, Shion Takeno, Ichiro Takeuchi, and Masayuki Karasuyama. Multi-objective Bayesian optimization with active preference learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 14490–14498, 2024.

Biswajit Paria, Kirthevasan Kandasamy, and Barnabás Póczos. A flexible framework for multi-objective Bayesian optimization using random scalarizations. In Uncertainty in Artificial Intelligence, pages 766–776. PMLR, 2020.

Carl Edward Rasmussen and Hannes Nickisch. Gaussian processes for machine learning (GPML) toolbox. The Journal of Machine Learning Research, 11:3011–3015, 2010.

NanYan Shen, Hua You, Jing Li, and Ping Song. Real-time trajectory planning for collaborative robots using incremental multi-objective optimization. Intelligent Service Robotics, 18(1):43–59, 2025.

Niranjan Srinivas, Andreas Krause, Sham M. Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In ICML, 2010.

Zhiyuan Wang, Jie Li, Gade Pandu Rangaiah, and Zhe Wu. Machine learning aided multi-objective optimization and multi-criteria decision making: Framework and two applications in chemical engineering. Computers & Chemical Engineering, 165:107945, 2022.

Christopher K. I. Williams and Carl Edward Rasmussen. Gaussian Processes for Machine Learning, volume 2. MIT Press, Cambridge, MA, 2006.

Wenjie Xu, Masaki Adachi, Colin N. Jones, and Michael A. Osborne. Principled Bayesian optimization in collaboration with human experts. Advances in Neural Information Processing Systems, 37:104091–104137, 2024.

Richard Zhang and Daniel Golovin. Random hypervolume scalarizations for provable multi-objective black box optimization. In International Conference on Machine Learning, pages 11096–11105. PMLR, 2020.
SUPPLEMENTARY MATERIAL

A THEORETICAL PROOFS

In this section, we detail the proof of the theorem stated in the main paper. We first state the model setup and assumptions, then prove Lipschitz properties of the utility, then discuss the surrogate mismatch bound, and finally combine the pieces into the regret decomposition theorem.

MODEL ASSUMPTIONS

We consider $m$ objectives to be minimized. For any outcome vector $y \in \mathbb{R}^m$ and weights $w \in \Delta^{m-1}$, define the Chebyshev utility

$$U(y, w) := -\min_{1 \le j \le m} \frac{y_j}{w_j}. \tag{14}$$

Thus larger utility is preferred. We assume all weights are bounded away from the simplex boundary: there exists $c_w > 0$ such that $w_j \ge c_w$ for all $j$, for every true archetype $w^\star_k$ and every estimated archetype $\hat{w}_{k,t}$.

Let the true latent preferences be a mixture of $K^\star$ archetypes with weights $\eta^\star \in \Delta^{K^\star - 1}$ and archetypes $\{w^\star_k\}_{k=1}^{K^\star} \subset \Delta^{m-1}$. Define the true mixture utility at design $x \in \mathcal{X}$ by

$$U^\star(x) := \sum_{k=1}^{K^\star} \eta^\star_k \, U\big(f(x), w^\star_k\big), \qquad U^\star := \sup_{x \in \mathcal{X}} U^\star(x). \tag{15}$$

The algorithm fits one GP per objective $f_j$ and maintains posterior mean $\mu_{t-1,j}(x)$ and standard deviation $\sigma_{t-1,j}(x)$ after $t-1$ evaluations. Let $\mu_{t-1}(x) \in \mathbb{R}^m$ be the vector of means. At round $t$, the preference model outputs estimates $\hat{\eta}_t \in \Delta^{K-1}$ and $\{\hat{w}_{k,t}\}_{k=1}^K$. Define the surrogate mixture utility used for acquisition

$$\widehat{U}_t(x) := \sum_{k=1}^{K} \hat{\eta}_{k,t} \, U\big(\mu_{t-1}(x), \hat{w}_{k,t}\big). \tag{16}$$

Assume the selected point $x_t$ is $\varepsilon_t$-optimal for the surrogate:

$$\widehat{U}_t(x_t) \ge \sup_{x \in \mathcal{X}} \widehat{U}_t(x) - \varepsilon_t. \tag{17}$$

Assumption 1 (Bounded objectives). $\|f(x)\|_\infty \le B_y$ for all $x \in \mathcal{X}$.

Alignment and estimation errors. Let $\pi_t$ be a permutation (e.g., from Hungarian matching) aligning estimated archetypes to true archetypes; pad $\eta^\star$ with zeros if $K > K^\star$. Define

$$\Delta^w_t := \sum_{k=1}^{K^\star} \eta^\star_k \, \|\hat{w}_{\pi_t(k),t} - w^\star_k\|_1, \qquad \Delta^\eta_t := \|\hat{\eta}_t - \eta^\star\|_1. \tag{18}$$

Lemma 2 (Lipschitzness of the Chebyshev utility).
Let $U$ be as in (14) and assume $w_j, w'_j \ge c_w > 0$ for all $j$. Then for any $y, y' \in \mathbb{R}^m$ and any such $w, w' \in \Delta^{m-1}$,

$$\big|U(y, w) - U(y', w)\big| \le \frac{1}{c_w} \|y - y'\|_\infty, \tag{19}$$

$$\big|U(y, w) - U(y, w')\big| \le \frac{\|y\|_\infty}{c_w^2} \|w - w'\|_1. \tag{20}$$

Proof. For (19), fix $w$ and define $g_j(y) := -y_j / w_j$. Each $g_j$ is $(1/c_w)$-Lipschitz under $\|\cdot\|_\infty$ because $|g_j(y) - g_j(y')| = |y_j - y'_j| / w_j \le \|y - y'\|_\infty / c_w$. Since $U(y, w) = \min_j g_j(y)$ is the pointwise minimum of $(1/c_w)$-Lipschitz functions, it is itself $(1/c_w)$-Lipschitz, proving (19).

For (20), fix $y$ and write $U(y, w) = \min_j h_j(w)$ with $h_j(w) := -y_j / w_j$. For any $j$,

$$|h_j(w) - h_j(w')| = |y_j| \left| \frac{1}{w_j} - \frac{1}{w'_j} \right| = |y_j| \, \frac{|w_j - w'_j|}{w_j w'_j} \le \frac{\|y\|_\infty}{c_w^2} |w_j - w'_j|.$$

Therefore

$$\big|U(y, w) - U(y, w')\big| \le \max_j |h_j(w) - h_j(w')| \le \frac{\|y\|_\infty}{c_w^2} \max_j |w_j - w'_j| \le \frac{\|y\|_\infty}{c_w^2} \|w - w'\|_1,$$

which proves (20). ∎

GP CONFIDENCE

We use a standard high-probability confidence bound for each objective GP. Assume sub-Gaussian observation noise and conditions under which the standard GP-UCB confidence event holds for each objective (e.g., Srinivas et al. (2010)).

Assumption 2 (Per-objective GP confidence event). There exists a nondecreasing sequence $\{\beta_t\}$ such that with probability at least $1 - \delta$, simultaneously for all $t \ge 1$, all $x \in \mathcal{X}$, and all $j \in \{1, \dots, m\}$,

$$|f_j(x) - \mu_{t-1,j}(x)| \le \sqrt{\beta_t} \, \sigma_{t-1,j}(x). \tag{21}$$

Under (21), we have for all $x$,

$$\|f(x) - \mu_{t-1}(x)\|_\infty \le \sqrt{\beta_t} \, \|\sigma_{t-1}(x)\|_\infty. \tag{22}$$

MAIN RESULT: SIMPLE REGRET DECOMPOSITION

Define the simple regret under the true mixture utility:

$$R_T := U^\star - \max_{t \le T} U^\star(x_t). \tag{23}$$

Theorem 3 (Simple regret decomposition).
On the GP confidence event in Assumption 2, for any $T \ge 1$,

$$R_T \le \underbrace{\frac{2}{c_w} \min_{t \le T} \|f(x_t) - \mu_{t-1}(x_t)\|_\infty}_{\text{objective GP term}} + \underbrace{\frac{2 B_y}{c_w^2} \min_{t \le T} \Delta^w_t}_{\text{archetype error}} + \underbrace{\frac{2 B_y}{c_w} \min_{t \le T} \Delta^\eta_t}_{\text{mixture-weight error}} + \underbrace{\max_{t \le T} \varepsilon_t}_{\text{surrogate maximization}}. \tag{24}$$

Moreover, still on the same event, the objective GP term admits an existence-type rate:

$$\min_{t \le T} \|f(x_t) - \mu_{t-1}(x_t)\|_\infty \le \frac{1}{T} \sum_{t=1}^{T} \|f(x_t) - \mu_{t-1}(x_t)\|_\infty \lesssim \sqrt{\frac{\beta_T \gamma_T}{T}}, \tag{25}$$

where $\gamma_T$ is a (max) information-gain term for the objective GPs (taking the maximum over objectives if kernels differ), and $\lesssim$ hides constants depending only on the kernels and noise.

Proof. Let $x^\star \in \arg\max_{x \in \mathcal{X}} U^\star(x)$. Fix any round $t \ge 1$. Add and subtract the surrogate utility (16) at $x^\star$ and $x_t$:

$$U^\star(x^\star) - U^\star(x_t) = \big[U^\star(x^\star) - \widehat{U}_t(x^\star)\big] + \big[\widehat{U}_t(x^\star) - \widehat{U}_t(x_t)\big] + \big[\widehat{U}_t(x_t) - U^\star(x_t)\big]. \tag{26}$$

By $\varepsilon_t$-optimality (17), the middle term is at most $\varepsilon_t$. Therefore,

$$U^\star(x^\star) - U^\star(x_t) \le \big|U^\star(x^\star) - \widehat{U}_t(x^\star)\big| + \big|U^\star(x_t) - \widehat{U}_t(x_t)\big| + \varepsilon_t. \tag{27}$$

It remains to bound $|U^\star(x) - \widehat{U}_t(x)|$ for a generic $x$.

Step 1 (Decomposition). Using (15) and (16),

$$U^\star(x) - \widehat{U}_t(x) = \sum_{k=1}^{K^\star} \eta^\star_k \, U(f(x), w^\star_k) - \sum_{k=1}^{K} \hat{\eta}_{k,t} \, U(\mu_{t-1}(x), \hat{w}_{k,t}).$$

Insert and subtract $\sum_{k=1}^{K^\star} \eta^\star_k U(\mu_{t-1}(x), w^\star_k)$ and $\sum_{k=1}^{K^\star} \eta^\star_k U(\mu_{t-1}(x), \hat{w}_{\pi_t(k),t})$, then apply the triangle inequality:

$$\big|U^\star(x) - \widehat{U}_t(x)\big| \le \sum_{k=1}^{K^\star} \eta^\star_k \big|U(f(x), w^\star_k) - U(\mu_{t-1}(x), w^\star_k)\big| + \sum_{k=1}^{K^\star} \eta^\star_k \big|U(\mu_{t-1}(x), w^\star_k) - U(\mu_{t-1}(x), \hat{w}_{\pi_t(k),t})\big| + \Big|\sum_{k=1}^{K} (\eta^\star_k - \hat{\eta}_{k,t}) \, U(\mu_{t-1}(x), \hat{w}_{k,t})\Big|. \tag{28}$$

Step 2 (Bounding terms using Lemma 2). For the first line, apply Lipschitzness in $y$ (19):

$$\big|U(f(x), w^\star_k) - U(\mu_{t-1}(x), w^\star_k)\big| \le \frac{1}{c_w} \|f(x) - \mu_{t-1}(x)\|_\infty.$$
Summing over $k$ with weights $\eta^\star_k$ gives

$$\sum_{k=1}^{K^\star} \eta^\star_k \big|U(f(x), w^\star_k) - U(\mu_{t-1}(x), w^\star_k)\big| \le \frac{1}{c_w} \|f(x) - \mu_{t-1}(x)\|_\infty. \tag{29}$$

For the second line, apply Lipschitzness in $w$ (20) with $\|\mu_{t-1}(x)\|_\infty \le B_y$:

$$\big|U(\mu_{t-1}(x), w^\star_k) - U(\mu_{t-1}(x), \hat{w}_{\pi_t(k),t})\big| \le \frac{B_y}{c_w^2} \|\hat{w}_{\pi_t(k),t} - w^\star_k\|_1.$$

Thus the second line is bounded by $(B_y / c_w^2) \, \Delta^w_t$. For the third line, use $|U(\mu_{t-1}(x), \hat{w}_{k,t})| \le \|\mu_{t-1}(x)\|_\infty / c_w \le B_y / c_w$ to obtain

$$\Big|\sum_{k=1}^{K} (\eta^\star_k - \hat{\eta}_{k,t}) \, U(\mu_{t-1}(x), \hat{w}_{k,t})\Big| \le \frac{B_y}{c_w} \|\hat{\eta}_t - \eta^\star\|_1 = \frac{B_y}{c_w} \Delta^\eta_t. \tag{30}$$

Combining (28), (29), and (30), we conclude that for all $x$,

$$\big|U^\star(x) - \widehat{U}_t(x)\big| \le \frac{1}{c_w} \|f(x) - \mu_{t-1}(x)\|_\infty + \frac{B_y}{c_w^2} \Delta^w_t + \frac{B_y}{c_w} \Delta^\eta_t. \tag{31}$$

Step 3 (Convert to simple regret). Apply (31) at $x = x^\star$ and $x = x_t$ in (27):

$$U^\star(x^\star) - U^\star(x_t) \le \frac{1}{c_w} \Big( \|f(x^\star) - \mu_{t-1}(x^\star)\|_\infty + \|f(x_t) - \mu_{t-1}(x_t)\|_\infty \Big) + \frac{2 B_y}{c_w^2} \Delta^w_t + \frac{2 B_y}{c_w} \Delta^\eta_t + \varepsilon_t. \tag{32}$$

Now take the minimum over $t \le T$ on the right-hand side. Since $R_T = \min_{t \le T} \big( U^\star(x^\star) - U^\star(x_t) \big)$ and $\max_{t \le T} \varepsilon_t$ upper-bounds $\varepsilon_t$, we obtain (24). On the GP confidence event (21), the bound $\|f(x) - \mu_{t-1}(x)\|_\infty \le \sqrt{\beta_t} \, \|\sigma_{t-1}(x)\|_\infty$ holds for all $x \in \mathcal{X}$, including $x^\star$.

Step 4 (Existence-type GP rate). On the confidence event (21),

$$\|f(x_t) - \mu_{t-1}(x_t)\|_\infty \le \sqrt{\beta_t} \, \|\sigma_{t-1}(x_t)\|_\infty \le \sqrt{\beta_T} \, \|\sigma_{t-1}(x_t)\|_\infty.$$

Under standard variance-sum bounds for GP regression (e.g., Lemma X),

$$\sum_{t=1}^{T} \|\sigma_{t-1}(x_t)\|_\infty^2 \le C \gamma_T,$$

which implies

$$\min_{t \le T} \|\sigma_{t-1}(x_t)\|_\infty \le \frac{1}{T} \sum_{t=1}^{T} \|\sigma_{t-1}(x_t)\|_\infty \le \sqrt{\frac{C \gamma_T}{T}}.$$

B ADDITIONAL EXPERIMENTAL DETAILS

B.1 SIMULATED DATASETS

We use well-known MOO benchmarks for our experimental setup: the DTLZ suite Deb et al. (2005) and the WFG suite Huband et al. (2005).
We evaluate on DTLZ2 with $L = 6$ objectives and $d = 7$ decision variables (minimization), with search space $[0, 1]^d$. At each design $x \in [0, 1]^7$ we observe $y = f(x)$. We also use WFG9 with $L = 8$ objectives and $d = 34$ decision variables. Following common practice, we set the WFG position parameter $k = 2(L - 1) = 14$ and distance parameter $l = 20$, so $d = k + l$. Variables are scaled to $[0, 1]$ and all objectives are minimized. A design $x \in [0, 1]^d$ evaluates to an objective vector $y = f(x) \in \mathbb{R}^8$. WFG9 induces challenging biases and mixed separability, making it a stronger testbed for query policies.

To construct distinct but overlapping preference archetypes, we partition the objectives into disjoint groups and assign structured weight vectors. For the DTLZ suite ($L = 6$) we define three groups $G_1 = \{0, 1\}$, $G_2 = \{2, 3\}$, $G_3 = \{4, 5\}$, while for the WFG suite ($L = 8$) we define four groups $G_1 = \{0, 1\}$, $G_2 = \{2, 3\}$, $G_3 = \{4, 5\}$, $G_4 = \{6, 7\}$. Each mode allocates 80% of its total mass uniformly across its dominant group $G_k$ and distributes the remaining 20% uniformly over the other objectives. The resulting vector is renormalized to ensure $w_k \in \Delta^{L-1}$. This construction yields clearly separated trade-off profiles while maintaining nonzero sensitivity to all objectives.

Let $Z_t \in \{1, \dots, K\}$ denote the active preference mode at query $t$. Under an i.i.d. context regime, each query is generated independently according to a fixed mixture $Z_t \sim \mathrm{Categorical}(\eta^\star)$, with $\eta^\star = (0.5, 0.3, 0.2)$ for DTLZ and $\eta^\star = (0.50, 0.25, 0.15, 0.10)$ for WFG. Under a persistent context regime, the mode evolves according to a stay-or-resample process with persistence parameter $\rho \in [0, 1)$.
The initial mode is drawn as $Z_1 \sim \mathrm{Categorical}(\eta^\star)$, and at each subsequent query the mode remains unchanged with probability $\rho$ and is resampled from $\mathrm{Categorical}(\eta^\star)$ with probability $1 - \rho$. The expected run length within a single mode is $1/(1 - \rho)$, and the expected number of switches over $T$ queries is $(1 - \rho)(T - 1)$.

B.2 REAL-WORLD: PET PROCESS DESIGN

We evaluate the proposed framework on a real-world multi-objective process design problem for polyethylene terephthalate (PET) production. The dataset comprises 10,000 previously evaluated process designs. Each design is represented by a 12-dimensional continuous decision vector $x \in \mathbb{R}^{12}$ corresponding to operating and design parameters such as temperatures, pressures, residence times, and feed ratios. Each variable is bounded within physically meaningful limits and normalized to $[0, 1]$ for modeling purposes. For each design, we consider $L = 7$ objectives: return on investment (ROI) and life-cycle assessment (LCA) indicators, including global warming potential (GWP), terrestrial acidification, water consumption, fossil depletion, surplus ore, and human toxicity.

To model heterogeneous chemist preferences, we construct $K = 3$ archetypal weight vectors $w_k \in \Delta^6$ that parameterize the Chebyshev scalarization. Objectives are grouped semantically: Type A (economic) emphasizes ROI; Type B (environmental) cares about the environmental LCA indicators; Type C (health-focused) stresses the toxicity-related indicators. Each archetype allocates 80% of its mass to its dominant objective group and distributes the remaining 20% across the other objectives, followed by renormalization. This construction yields clearly distinct yet overlapping trade-off profiles that reflect plausible real-world stakeholder types. At each comparison step $t$, a latent mode variable $Z_t \in \{1, 2, 3\}$ determines which archetype governs the user's decision.
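The simulated-user construction described above (80/20 archetype weights plus stay-or-resample gating) can be sketched as below; function names are ours, and we assume the 20% residual mass is spread uniformly over the non-dominant objectives:

```python
import numpy as np

def make_archetypes(groups, L, dominant=0.8):
    # One weight vector per mode: `dominant` mass uniform on its group,
    # the remainder uniform on the other objectives, then renormalized.
    W = []
    for G in groups:
        w = np.full(L, (1.0 - dominant) / (L - len(G)))
        w[list(G)] = dominant / len(G)
        W.append(w / w.sum())
    return np.array(W)

def sticky_modes(eta, rho, T, rng):
    # Stay-or-resample gating: keep the current mode w.p. rho, else resample.
    z = np.empty(T, dtype=int)
    z[0] = rng.choice(len(eta), p=eta)
    for t in range(1, T):
        z[t] = z[t - 1] if rng.random() < rho else rng.choice(len(eta), p=eta)
    return z
```

The marginal distribution of the gated mode remains Categorical(η), so long-run mode frequencies should match η; with ρ = 0.8 the expected run length is 1/(1 − ρ) = 5.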
Modes are sampled from a categorical distribution with mixture weights $\eta = [0.40, 0.35, 0.25]$, representing the population-level prevalence of economic, environmental, and health-oriented chemists. The parameter $\eta$ therefore controls the global heterogeneity across users. To capture temporal consistency, we adopt a sticky gating process with persistence parameter $\rho \in [0, 1)$. The initial mode is drawn from $\mathrm{Categorical}(\eta)$, and at each subsequent step the mode remains unchanged with probability $\rho$ and is resampled from $\eta$ with probability $1 - \rho$. In our experiments we set $\rho = 0.8$, implying strong short-term consistency in preferences. The preferences are generated using the same method as in the simulated settings.

B.3 INTERACTION CONTEXTS

We evaluate preference learning under the following two feedback regimes.

Persistent context. A latent archetype $z_t \in \{1, \dots, K^\star\}$ evolves with temporal inertia. At round $t$,

$$z_t = \begin{cases} z_{t-1}, & \text{with probability } \rho, \\ \sim \mathrm{Categorical}(\eta^\star), & \text{with probability } 1 - \rho, \end{cases}$$

where $\eta^\star \in \Delta^{K^\star - 1}$ are the true mixture weights and $\rho \in [0, 1)$ controls stickiness. The initial archetype $z_1$ is sampled from $\mathrm{Categorical}(\eta^\star)$. Conditional on $z_t = k$, pairwise feedback is generated using the utility $U\big(f(x), w^\star_k\big)$ under a probit likelihood. This regime induces temporally coherent feedback and facilitates identification and refinement of active archetypes.

I.i.d. context. At each round, $z_t \sim \mathrm{Categorical}(\eta^\star)$ independently across $t$. Each comparison is therefore generated from an independently sampled archetype, removing temporal continuity. This regime is more challenging for mode identification, as archetype switches occur randomly and minority modes are observed less frequently in expectation. All methods are evaluated under both regimes using identical seeds and evaluation budgets.

B.4 IMPLEMENTATION DETAILS

We conduct our experiments on two simulated suites (WFG and DTLZ) and one real-world dataset.
We set the stickiness parameter to 0.5 for both the DTLZ and WFG datasets. When simulating clusters, we use $\eta^\star = (0.5, 0.3, 0.2)$ for DTLZ, $\eta^\star = (0.50, 0.25, 0.15, 0.10)$ for the WFG suite, and $\eta^\star = (0.40, 0.35, 0.25)$ for the chemistry process design task.

Table 1: Notation summary.

Symbol    Meaning
L         number of objectives
K         truncation level (max modes)
ℓ         objective index
k         preference mode index
w_k       weight vector for mode k
η_k       mixture weight of mode k
r_i       latent mode for comparison i
σ_u       preference noise parameter

We use Python 3.12 with JAX/jaxlib, NumPyro, Optuna, and GPy. Each objective is modeled with an independent GP (RBF kernel); for the synthetic WFG/DTLZ experiments, GPs are refit every 3 iterations with bounded noise and data-adaptive bounds on variance and lengthscale, whereas for the chemistry experiments, pretrained GP surrogates are loaded and queried. Preferences follow a probit likelihood with noise $\sigma_u$ set to 0.02 for the WFG/DTLZ datasets and 0.1 for the PET process design dataset. Heterogeneous preferences are modeled via a truncated stick-breaking DP mixture with a fixed truncation level $K$ and variational inference using SVI (Adam, lr = 1e-3, 500 updates per iteration). Preference queries are selected from pairs of previously observed outcomes using entropy/MI-based criteria (modes: clusterless/inter/intra/hybrid). Candidate points are proposed by MC expected improvement with 12 samples and 10 L-BFGS-B restarts. All randomness (initial seeds, EI samples, query subsampling, sticky archetype gating, and variational initialization) is controlled by a single run seed. All methods share an identical initial seed dataset injected into Optuna. Simple regret uses a common reference $U^*$ estimated by random search with 500 samples (saved and reused). We report mean ± std across 3 runs (different run seeds).
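The truncated stick-breaking construction behind the DP mixture prior mentioned above can be sketched as follows (our own minimal version; in the actual implementation the sticks $v_k$ would be variational parameters):

```python
import numpy as np

def stick_breaking(v):
    # eta_k = v_k * prod_{j<k} (1 - v_j), with the last stick forced to 1
    # so that the K truncated weights sum to exactly 1.
    v = np.append(np.asarray(v, dtype=float), 1.0)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining
```

For example, sticks (0.5, 0.5) at truncation K = 3 yield weights (0.5, 0.25, 0.25); sticks near 1 early on shrink later components toward negligible mass, which is the shrinkage behavior the main text relies on when fixing K.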
B.5 ADDITIONAL METRICS

1) Mixture-weight calibration. We compare the posterior mean mixture weights $\hat{\eta}$ to the ground-truth weights $\eta^\star$ projected via the same alignment (bar plot), and track $\mathrm{KL}(\eta^\star \| \hat{\eta})$ over iterations (lower is better). Blue-only mass indicates spurious modes; orange-only mass indicates missed modes.

2) Errors of inferred preferences. Errors of inferred preferences relative to a single reference $w_{\mathrm{true}}$: mixture-mean $\|w_{\mathrm{true}} - \sum_k \eta_k \bar{w}_k\|_1$, MAP-cluster $\|w_{\mathrm{true}} - \bar{w}_{k^*}\|_1$ with $k^* = \arg\max_k \eta_k$, and expected $\sum_k \eta_k \|w_{\mathrm{true}} - \bar{w}_k\|_1$. Lower is better.

C RESULTS AND ANALYSIS

Figure 6 shows mixture-weight calibration on the DTLZ2 dataset. In the random plot, blue mass is concentrated on one index, and a large orange bar has almost no blue in several panels, indicating mode collapse. Intra focuses on one cluster's shape, since it does not try to calibrate $\eta$. In the inter-query plot, we see separate modes, with more blue bars lining up with the orange bars (the orange bars indicate the projected true $\eta^*$).

Figure 7 shows errors of inferred preferences for a persistent user on the DTLZ2 dataset under the clusterless and inter-query modes. For clusterless, the mixture-mean and expected errors creep up and MAP worsens, consistent with uncertainty sampling under a collapsed scalarizer: since the clusterless mode ignores disagreement between archetypes, it chases high entropy and can drift away from the true mode. In the inter-query plot, the expected and mixture-mean errors decline clearly, while MAP improves more slowly. This shows how inter queries resolve identity first and cut mixture confusion early.

Figure 8 shows the posterior probability mass assigned to each cluster $k$, i.e., how strongly the model believes each archetype contributes to the overall mixture over a series of outer iterations, for the WFG and DTLZ datasets.
The query mode is hybrid in both cases. For the WFG dataset, cluster 0 dominates, rising to ≈ 0.7 and remaining stable; cluster 1 sits around ≈ 0.2, clusters 2 and 3 around 0.1, and cluster 4 around 0. This roughly reflects the true weights of the WFG dataset ($\eta^\star = (0.50, 0.25, 0.15, 0.10)$). Stabilization of the weights happens roughly after 20 iterations. For the DTLZ dataset, cluster 0 dominates and rises to ≈ 0.7, cluster 1 sits around 0.2, cluster 2 around 0.1, and clusters 3 and 4 around 0. This is likewise close to the DTLZ $\eta^\star = (0.5, 0.3, 0.2)$.

Figure 6: Mixture-weight calibration for a persistent user on the DTLZ dataset, for (a) Random, (b) Intra, and (c) Inter. Inter queries best match the true mixture, while random collapses to one mode and intra largely ignores $\eta$. Blue bars are the inferred $\hat{\eta}$; orange bars are the projected true $\eta^*$ via Hungarian alignment. (a) Random: blue mass on one index and orange-only bars, indicating mode collapse. (b) Intra: blue focuses on one component, indicating poor allocation across modes. (c) Inter: clear blue-orange alignment, indicating the best identification and calibration across modes.

Figure 7: Errors of inferred preferences for a persistent user on the DTLZ dataset, for (a) Clusterless and (b) Inter. Inter queries quickly reduce the mixture-mean and expected $L_1$ errors, while MAP improves more slowly; in contrast, clusterless drifts upward, reflecting entropy-only sampling under an averaged scalarizer and mode confusion. Curves: blue = mixture-mean $\|w_{\mathrm{true}} - \sum_k \eta_k \bar{w}_k\|_1$; orange = MAP-cluster $\|w_{\mathrm{true}} - \bar{w}_{k^*}\|_1$ with $k^* = \arg\max_k \eta_k$; green = expected $\sum_k \eta_k \|w_{\mathrm{true}} - \bar{w}_k\|_1$. Lower is better.

Figure 8: Evolution of the posterior mixture weights $\eta_k$ over outer iterations for (a) the WFG dataset and (b) the DTLZ dataset.

Figure 9: Comparison of archetype-aware querying strategies: (a) random, (b) clusterless, (c) intra, (d) inter, (e) hybrid. Empirical gating frequencies versus true mixture weights for the different selection modes. Hybrid scheduling achieves the closest alignment to the true mixture, demonstrating effective exploration across archetypes.

The gating-frequency panels (a-e) of Figure 9 show how each policy allocates queries among latent preference archetypes, while the regret plot (f) shows the impact of that allocation on optimization performance. Random and clusterless ignore mixture structure; intra collapses to one mode; inter balances modes but lacks fine exploitation; hybrid balances exploration across archetypes and exploitation within them, achieving both accurate mixture recovery and the lowest regret.
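Finally, the mixture-weight calibration score $\mathrm{KL}(\eta^\star \| \hat{\eta})$ tracked in these diagnostics (Section B.5) can be computed as below; the ε-smoothing guard against zero inferred weights is our own detail:

```python
import numpy as np

def kl_calibration(eta_star, eta_hat, eps=1e-8):
    # KL(eta_star || eta_hat) after alignment; lower means better calibration.
    p = np.asarray(eta_star, dtype=float)
    q = np.clip(np.asarray(eta_hat, dtype=float), eps, None)
    q = q / q.sum()                  # renormalize after clipping
    mask = p > 0                     # 0 * log(0/q) = 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

The score is zero exactly when the aligned inferred weights match the true mixture, and grows as mass is misallocated (e.g., under mode collapse).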