From Asymptotic to Finite-Sample Minimax Robust Hypothesis Testing
Gökhan Gül
Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz
Clinical Epidemiology and Systems Medicine, Center for Thrombosis and Hemostasis, University Medical Center Johannes Gutenberg University Mainz
German Center for Cardiovascular Research (DZHK), Partner Site Rhine Main, University Medical Center of the Johannes Gutenberg University Mainz
Langenbeckstraße 1, 55131 Mainz, Germany
Email: goekhan.guel@unimedizin-mainz.de

Abstract

This paper establishes a formal connection between finite-sample and asymptotically minimax robust hypothesis testing under distributional uncertainty. It is shown that, whenever a finite-sample minimax robust test exists, it coincides with the solution of the corresponding asymptotic minimax problem. This result enables the analytical derivation of finite-sample minimax robust tests using asymptotic theory, bypassing the need for heuristic constructions. The total variation distance and band model are examined as representative uncertainty classes. For each, the least favorable distributions and corresponding robust likelihood ratio functions are derived in parametric form. In the total variation case, the new derivation generalizes earlier results by allowing unequal robustness parameters. The theory also explains and systematizes previously heuristic designs. Simulations are provided to illustrate the theoretical results.

Index Terms: Hypothesis testing, event detection, robustness, least favorable distributions, minimax optimization.

February 24, 2026 DRAFT

I. INTRODUCTION

In simple binary hypothesis testing, the design of optimum tests requires complete statistical knowledge of the underlying data distributions [1]. However, this assumption is often too restrictive and seldom holds in practice [2].
A pragmatic alternative is to adopt either parametric [3], [4] or non-parametric robust approaches [5]. Parametric models, including those based on M-estimators [6], implicitly assume that the general form of the distributions remains known, whereas non-parametric methods, such as the sign and Wilcoxon tests, make only mild assumptions about the underlying distributions and are therefore regarded as more conservative approaches [7], [8].

Minimax robust hypothesis testing (MRHT) offers an intermediate framework between parametric and non-parametric methods. In MRHT, the true distribution of the observed data, $G_j$, is assumed to belong to an uncertainty class $\mathcal{G}_j$, whose size is determined by a robustness parameter $\epsilon_j$. By adjusting these parameters, one can balance robustness against detection power. The choice of uncertainty classes is typically application-dependent, with common formulations being model-based (e.g., $\epsilon$-contamination models) or distance-based (e.g., sets defined through the KL-divergence) [9]. The designer's objective is then to determine a decision rule $\hat{\delta}$ that minimizes a predefined risk function evaluated under the least favorable distributions (LFDs)¹. Under mild regularity conditions, such tests are optimal in the minimax sense, guaranteeing the best possible detection performance under the assumed model uncertainties.

Robustness can also be characterized in terms of sample size. Depending on the underlying uncertainty model, minimax robust tests may exist either for finite samples or only in the asymptotic regime. For example, in the $\epsilon$-contamination model and total variation neighborhoods, minimax robust tests exist for any finite sample size [10], [11], whereas for formulations based on divergence measures, only asymptotically minimax robust solutions are guaranteed [12], [13]. This distinction motivates the development of a unified framework that systematically links finite- and large-sample robustness.
¹For brevity, no notational distinction is made between least favorable distributions and their corresponding densities unless confusion may arise.

A. Related work

The foundations of robust hypothesis testing trace back to the seminal works of P. J. Huber, who in 1965 introduced a robust version of the likelihood ratio test for the $\epsilon$-contamination and total variation classes of probability distributions [10]. Huber derived the corresponding least favorable distributions and showed that the clipped likelihood ratio test constitutes the minimax robust solution for both uncertainty classes. This line of research was later extended by Huber and Strassen [14] and further generalized to the framework of 2-alternating capacities [15], establishing the theoretical foundation of minimax robust detection. Nevertheless, it was later shown that minimax robust tests do not always exist, for instance when uncertainty classes are defined using the KL-divergence [16].

One natural extension of the basic uncertainty formulations is to construct classes that incorporate partial prior knowledge about the true distributions, such as their approximate shapes, supports, or low-order moments. Among these, moment classes specify the uncertainty in terms of finitely many statistical moments of the distributions [13], allowing direct control of mean and variance deviations while maintaining analytical tractability. The $p$-point classes [4], [17] provide an alternative description by partitioning the probability domain into $p$ disjoint regions with prescribed probability masses, which enables flexible modeling of multimodal or asymmetric deviations. Band models, originally introduced by Kassam [18] and later refined by Fauß et al. [19], instead define upper and lower bounds on the admissible densities, thereby encoding approximate shape and location information.
These formulations offer complementary ways of representing partial distributional knowledge and serve as a foundation for many subsequent developments in robust detection and estimation.

Several studies have applied these uncertainty formulations to practical detection problems and extended them toward data-driven settings. Early works such as [20], [21], [22] investigated clipped-likelihood and mixture-based detectors under Gaussian and contaminated noise models. Variants of moment and $p$-point uncertainty sets have been used across disciplines, including robust decision-making in finance [23], admission control [24], and queueing systems [25]. More recent lines of research have connected these formulations with distributionally robust optimization [26], enabling tractable convex approximations under Wasserstein [27] and Sinkhorn-type metrics [28]. Related extensions include kernel-based uncertainty sets [29], [30], empirical distribution approaches [31], and adversarially robust testing frameworks [32]. Together, these developments highlight a continuous progression towards more comprehensive and versatile frameworks for robust decision-making under distributional uncertainty.

B. Summary of the paper and its contributions

In our earlier work, we showed that asymptotically minimax robust tests can be systematically designed by solving the optimization problem
$$\min_{u \in (0,1)} \max_{(G_0, G_1) \in \mathcal{G}_0 \times \mathcal{G}_1} D_u(G_0, G_1),$$
where
$$D_u(G_0, G_1) = \int_\Omega g_1^u g_0^{1-u} \, d\mu$$
is the so-called $u$-affinity [33]. In this paper, it is demonstrated that asymptotic minimax theory not only provides a principled framework in its own right, but also yields a direct analytical route to derive finite-sample minimax robust tests, whenever such tests exist. This connection enables the exact computation of LFDs without requiring heuristic constructions.
Moreover, the same theoretical foundation also allows for numerical computation of LFDs in cases where analytical forms are not available; moment and $p$-point uncertainty classes are treated as two representative examples. The main contributions of this work are as follows:

1) It is proven that whenever a finite-sample minimax robust test exists, it coincides with the test derived from asymptotic minimax theory (see Theorem II.7). This unifies the treatment of finite and infinite sample regimes.

2) Based on this equivalence, finite-sample minimax robust tests are analytically derived for two uncertainty classes: the total variation distance and the band model. In each case, the least favorable distributions and robust likelihood ratio functions (LRFs) are obtained in parametric form.

3) For the total variation case, the derived LFDs generalize Huber's results [10] by allowing asymmetric robustness levels, i.e., different contamination radii $\epsilon_0 \neq \epsilon_1$ for the two hypotheses $H_0$ and $H_1$ (see Theorem III.1).

4) It is shown that two special cases of the band model yield distinct versions of the $\epsilon$-contamination neighborhood, each of which admits a clipped likelihood ratio test (CLRT) as the minimax robust solution (see Theorems III.5 and III.7). One of these cases was previously studied by Huber [10]; the other is newly derived in this work. It is also proven that both are single-sample minimax robust (see Theorem III.8), and their intersection corresponds to the general form of the band model.

5) In the symmetric total variation case, our formulation reveals that the clipping thresholds necessarily satisfy $t_l t_u = 1$, implying that the family of least favorable distributions is intrinsically one-dimensional. This observation clarifies and sharpens the classical construction of Huber, in which the symmetry-induced parameter reduction is not made explicit.
6) Taken together, these results provide a theoretical foundation for several previously heuristic constructions, such as the designs proposed by Huber [10] and Kassam [18], by showing how they emerge naturally from the asymptotic theory.

C. Outline of the paper

The rest of the paper is organized as follows. In Section II, single-sample, finite-sample and asymptotic minimax robustness are defined, the connection between asymptotic and finite-sample minimax robustness is established, and the problem statement is given. In Section III, least favorable distributions and the corresponding robust likelihood ratio functions are derived in closed parametric form for the total variation neighborhood and the band model. The parameters can be obtained by solving a system of non-linear equations, leading to the asymptotic minimax robust tests. In Section IV, the formulation of asymptotic minimax robustness is extended to LFDs that cannot be obtained in closed form, so that numerical methods are required. Moment classes and $p$-point classes are presented as the two examples. In Section V, simulations are performed to evaluate and exemplify the theoretical derivations. Finally, in Section VI, the paper is concluded.

D. Notations

The following notations are applied throughout the paper. Upper case symbols are used for probability distributions and random variables, and the corresponding lower case symbols denote the density functions and observations, respectively. Boldface symbols are used for sequences of random variables, sequences of observations, or joint functions. The hypotheses $H_0$ and $H_1$ are associated with the nominal probability measures $F_0$ and $F_1$, whereas the corresponding actual distributions are denoted by $G_0$ and $G_1$. The sets of probability distributions are denoted by $\mathcal{G}_0$ and $\mathcal{G}_1$. Every probability measure, e.g. $G[\cdot]$, is associated with its distribution function $G(\cdot)$ and the density function $g(\cdot)$.
The symbol $\mathcal{M}$ denotes the set of all distribution functions on $\Omega$. The notation $\hat{(\cdot)}$ indicates the least favorable distributions $\hat{G}_j \in \mathcal{G}_j$, the corresponding densities $\hat{g}_j$, or the robust likelihood ratio test $\hat{\delta}$. The expected value of a random variable $Y \sim G_j$ is denoted by $E_{G_j}[Y]$. The argument (value on the domain) of the subsequent operation is denoted by $\arg$.

II. LINKING ASYMPTOTIC MINIMAX THEORY TO FINITE-SAMPLE TESTS

Let $\mathcal{M}$ denote the set of all probability measures on $\Omega$, and let $\mathcal{G}_0 \subset \mathcal{M}$ and $\mathcal{G}_1 \subset \mathcal{M}$ be two uncertainty classes associated with the hypotheses $H_0$ and $H_1$, respectively. Consider a sequence of $n$ independent and identically distributed (i.i.d.) random variables $\mathbf{Y} = (Y_1, \ldots, Y_n)$, each taking values in $\Omega$. Under the hypothesis $H_j$, the distribution of $Y_k$ is denoted by $G_j \in \mathcal{G}_j$, and the binary decision problem is to determine which of the hypotheses
$$H_0: Y_k \sim G_0, \quad G_0 \in \mathcal{G}_0, \qquad H_1: Y_k \sim G_1, \quad G_1 \in \mathcal{G}_1, \tag{1}$$
is true for $k \in \{1, \ldots, n\}$. Let $\delta: \mathbf{Y} \mapsto \{0, 1\}$ denote a decision rule, with false-alarm probability $P_F = G_0[\delta(\mathbf{Y}) = 1]$ and miss-detection probability $P_M = G_1[\delta(\mathbf{Y}) = 0]$. For a given prior probability $\pi_0 = P(H_0)$, the error probability is defined as
$$P_E(\delta, G_0, G_1) = \pi_0 P_F(\delta, G_0) + (1 - \pi_0) P_M(\delta, G_1). \tag{2}$$
The minimax formulation seeks a decision rule $\hat{\delta}$ together with least favorable distributions $(\hat{G}_0, \hat{G}_1)$ that solve
$$\min_\delta \max_{(G_0, G_1) \in \mathcal{G}_0 \times \mathcal{G}_1} P_E(\delta, G_0, G_1). \tag{3}$$
Let $\hat{l} = \hat{g}_1 / \hat{g}_0$ denote the likelihood ratio induced by the least favorable distributions, define $X_k = \log \hat{l}(Y_k)$, and let the empirical mean be
$$S_n(\mathbf{X}) = \frac{1}{n} \sum_{k=1}^{n} X_k.$$
Since the minimax problem (3) is solved by a likelihood ratio test based on $(\hat{G}_0, \hat{G}_1)$, the associated decision rule takes the form
$$\hat{\delta}(\mathbf{X}) = \begin{cases} 0, & S_n(\mathbf{X}) < t, \\ 1, & S_n(\mathbf{X}) \geq t, \end{cases} \tag{4}$$
where $t \in \mathbb{R}$ is a threshold chosen to minimize the worst-case error probability.

In the remainder of this section the asymptotic setting $n \to \infty$ is considered. The behaviour of $S_n(\mathbf{X})$ under $G_0$ and $G_1$, together with its associated large-deviation properties, provides the foundation for connecting asymptotic minimax robustness with finite-sample minimax robustness in the subsequent subsections.

A. Minimax Robustness Concepts

1) Single-sample minimax robustness (SMR): Single-sample minimax robustness characterizes optimality against worst-case distributional perturbations when only a single observation is available.

Definition II.1 (Single-sample minimax robustness). Let $n = 1$ and let $Y = Y_1$. Suppose there exist distributions $\hat{G}_0 \in \mathcal{G}_0$ and $\hat{G}_1 \in \mathcal{G}_1$ such that the likelihood ratio $\hat{l}(Y) = \hat{g}_1(Y)/\hat{g}_0(Y)$ satisfies
$$G_0\big[\hat{l}(Y) < t\big] \geq \hat{G}_0\big[\hat{l}(Y) < t\big], \qquad G_1\big[\hat{l}(Y) < t\big] \leq \hat{G}_1\big[\hat{l}(Y) < t\big], \tag{5}$$
for all $t \in \mathbb{R}$ and all $(G_0, G_1) \in \mathcal{G}_0 \times \mathcal{G}_1$. Then the likelihood-ratio test induced by $(\hat{G}_0, \hat{G}_1)$, which solves the single-sample minimax problem (3), is called single-sample minimax robust (SMR) [10, p. 1754].

The existence of LFDs depends on the choice of uncertainty classes $\mathcal{G}_0$ and $\mathcal{G}_1$. For instance, SMR tests exist for $\epsilon$-contamination and total variation classes [10], whereas they fail to exist for KL-divergence-based neighborhoods [16]. Even if no minimax solution exists within the class of deterministic decision rules, a unique minimax rule can still be obtained over randomized decision rules [16]. However, the information contained in the randomization does not carry over to products of likelihood ratios, and therefore cannot be directly extended to multiple samples.
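As a concrete illustration of the decision rule (4), the following minimal sketch evaluates the empirical mean $S_n$ of the log-likelihood ratios and compares it with the threshold $t$. The clipped Gaussian likelihood ratio, the nominal means, the clipping thresholds, and the threshold $t = 0$ are hypothetical illustrative choices, not values derived in the paper (the clipped form itself is established only later, in Section III).

```python
import math

def clipped_log_lr(y, t_l=0.5, t_u=2.0, mu0=0.0, mu1=1.0, sigma=1.0):
    """Nominal Gaussian likelihood ratio l(y) = f1(y)/f0(y), clipped to
    [t_l, t_u]; the thresholds are placeholders for the values that the
    paper obtains from the robustness parameters."""
    l = math.exp((mu1 - mu0) * (y - (mu0 + mu1) / 2.0) / sigma**2)
    return math.log(min(max(l, t_l), t_u))

def robust_test(samples, threshold=0.0):
    """Decision rule (4): compare S_n, the empirical mean of
    X_k = log l_hat(Y_k), with the threshold t; 1 decides H1, 0 decides H0."""
    s_n = sum(clipped_log_lr(y) for y in samples) / len(samples)
    return 1 if s_n >= threshold else 0

# Hypothetical observations concentrated near mu1 = 1 favor H1:
print(robust_test([0.9, 1.2, 1.1, 0.8]))  # prints 1
```

The clipping bounds the contribution of any single observation to $S_n$, which is exactly the mechanism that limits the influence of outliers in the minimax robust test.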
2) Finite-sample minimax robustness (FMR): A test is called finite-sample minimax robust (FMR) if it is minimax robust for every finite sample size $n < \infty$; equivalently, the single-sample stochastic ordering conditions (5) hold for the joint likelihood ratio based on $\mathbf{Y}$.

3) Asymptotic minimax robustness (AMR): In the asymptotic regime, the behavior of $S_n$ is characterized through large-deviation principles. The minimax criterion compares the exponential decay rates of the error probabilities under all $(G_0, G_1) \in \mathcal{G}_0 \times \mathcal{G}_1$ with those achieved under a candidate pair of least favorable distributions. The following regularity assumptions ensure that the comparison of error exponents is meaningful for all distributions in the uncertainty classes.

Assumption II.1 (Uniform exponential moments). There exists $\varepsilon > 0$ such that the random variable $X_k = \log(\hat{l}(Y_k))$ has a finite moment generating function (MGF),
$$E_{G_j}\big[e^{u X_k}\big] < \infty, \quad \text{for all } |u| < \varepsilon, \; j \in \{0, 1\}, \tag{6}$$
for every $(G_0, G_1) \in \mathcal{G}_0 \times \mathcal{G}_1$. This ensures that the corresponding Cramér rate functions are finite and well-defined.

Assumption II.2 (Threshold separation). There exists a real number $t$ such that
$$E_{G_0}[X_k] < t < E_{G_1}[X_k], \tag{7}$$
for all $(G_0, G_1) \in \mathcal{G}_0 \times \mathcal{G}_1$. This guarantees that both false-alarm and miss-detection exponents are positive.

Corollary II.3. Any of the following conditions is sufficient for (7) to hold:
1) There exist LFDs $(\hat{G}_0, \hat{G}_1) \in \mathcal{G}_0 \times \mathcal{G}_1$ satisfying single-sample minimax robustness.
2) $E_{\hat{G}_1}[X_k] < E_{G_1}[X_k]$ and $E_{\hat{G}_0}[X_k] > E_{G_0}[X_k]$ for all $(G_0, G_1) \in \mathcal{G}_0 \times \mathcal{G}_1$.
3) $\min E_{G_1}[X_k] > \max E_{G_0}[X_k]$ over all $(G_0, G_1) \in \mathcal{G}_0 \times \mathcal{G}_1$.
Moreover, $1 \Rightarrow 2 \Rightarrow 3$, and neither implication is reversible in general.

Definition II.2 (Asymptotic minimax robustness).
A test is called asymptotically minimax robust if, for a threshold $t$ satisfying Assumption II.2 and under Assumption II.1, the inequalities
$$\lim_{n \to \infty} \frac{1}{n} \log G_0[S_n(\mathbf{X}) > t] \leq \lim_{n \to \infty} \frac{1}{n} \log \hat{G}_0[S_n(\mathbf{X}) > t], \tag{8}$$
and
$$\lim_{n \to \infty} \frac{1}{n} \log G_1[S_n(\mathbf{X}) \leq t] \leq \lim_{n \to \infty} \frac{1}{n} \log \hat{G}_1[S_n(\mathbf{X}) \leq t], \tag{9}$$
hold for all $(G_0, G_1) \in \mathcal{G}_0 \times \mathcal{G}_1$.

B. Relations Between Finite-Sample and Asymptotic Minimax Robustness

This section develops the connection between classical single-sample minimax robustness (SMR), its finite-sample extension (FMR), and asymptotic minimax robustness (AMR). We show that SMR and FMR coincide, that FMR always implies AMR, and that AMR implies FMR under a mild uniqueness condition.

Theorem II.4. If a single-sample minimax robust test exists, then a minimax robust test exists for every finite sample size $n < \infty$, with the same least favorable distributions and for the same uncertainty classes. Hence,
$$\text{SMR} \iff \text{FMR}. \tag{10}$$
Proof: See [10, Section 4].

The next result recalls a fundamental equivalence between the stochastic ordering condition (5) and the minimization of $f$-divergences.

Theorem II.5 (Equivalence of SMR and $f$-divergence minimization). Let $G_0$ and $G_1$ be distributions absolutely continuous with respect to a common measure $\mu$ on $\Omega$. For the $f$-divergence
$$D_f(G_0, G_1) = \int_\Omega f\!\left(\frac{g_0}{g_1}\right) g_1 \, d\mu, \tag{11}$$
where $f: \mathbb{R}_{\geq 0} \to \mathbb{R}$ is convex and satisfies $f(1) = 0$, the following equivalence holds:
$$(\hat{G}_0, \hat{G}_1) \text{ satisfies (5)} \iff (\hat{G}_0, \hat{G}_1) \text{ minimizes } D_f \tag{12}$$
over all $(G_0, G_1) \in \mathcal{G}_0 \times \mathcal{G}_1$, for every twice continuously differentiable convex $f$.
Proof: See Appendix A.

Theorem II.6 (FMR implies AMR). Let $(\mathcal{G}_0, \mathcal{G}_1)$ be uncertainty classes for which a finite-sample minimax robust (FMR) test exists with LFDs $(\hat{G}_0, \hat{G}_1)$. Then an asymptotically minimax robust (AMR) test exists for the same uncertainty classes and with the same LFDs. In particular,
$$\text{FMR} \implies \text{AMR}. \tag{13}$$
Proof: Let $(\hat{G}_0, \hat{G}_1)$ be the LFDs generating the FMR test. By Definition II.1 and Theorem II.4, these distributions satisfy the stochastic ordering condition (5). From Theorem II.5, the same pair minimizes every twice continuously differentiable convex $f$-divergence $D_f$, and in particular minimizes $-D_u$ (equivalently, maximizes $D_u$) for every $u \in (0, 1)$, as well as $D_{\mathrm{KL}}$. The AMR definition requires two conditions: (i) the threshold separation condition (7), and (ii) LFDs that maximize $D_u$ for the minimizing $u$. Condition (i) holds by Corollary II.3, applied to the LFDs $(\hat{G}_0, \hat{G}_1)$. Condition (ii) holds because $(\hat{G}_0, \hat{G}_1)$ maximize $D_u$ for all $u \in (0, 1)$ by Theorem II.5. Hence the same LFDs satisfy the AMR requirements, completing the proof.

Remark II.1. The AMR formulation requires the finite-MGF condition of Assumption II.1. If the uncertainty classes $\mathcal{G}_0, \mathcal{G}_1$ contain pathological elements (e.g., point masses located at zeros of the nominal densities) that produce infinite MGFs, these distributions are never least favorable: they yield infinite $f$-divergences or zero Chernoff-type exponents, and therefore cannot minimize any convex $f$-divergence or maximize $D_u$. Such elements may be removed from the uncertainty classes without affecting the LFDs, so the effective uncertainty classes (i.e., those that contain all potential minimizers of the robust problem) automatically satisfy Assumption II.1. Thus the LFDs $(\hat{G}_0, \hat{G}_1)$ obtained from the FMR problem also satisfy the AMR regularity requirements. An analogous implicit restriction is already present in Huber's SMR inequalities: any pair of distributions for which $g_0$ vanishes on a set where $G_1$ places positive mass leads to nonintegrable likelihood ratios or divergent integrals, and therefore cannot be least favorable.

Theorem II.7 (AMR implies FMR conditionally). Let $(\mathcal{G}_0, \mathcal{G}_1)$ be uncertainty classes for which an AMR solution exists.
Let $(G_0^*, G_1^*)$ be the AMR LFDs corresponding to the minimizing parameter $u^* \in (0, 1)$, i.e., they maximize $D_{u^*}$ over $\mathcal{G}_0 \times \mathcal{G}_1$. Assume that for $u^*$ the maximization of $D_{u^*}$ admits a unique solution (up to $\mu$-a.e. equality). If a finite-sample minimax robust (FMR) test exists for the same uncertainty classes, then its LFDs must coincide a.e. with $(G_0^*, G_1^*)$. Thus, whenever FMR exists,
$$\text{AMR} \implies \text{FMR}. \tag{14}$$
Proof: Let $(\hat{G}_0, \hat{G}_1)$ denote the LFDs of an FMR test. By Theorem II.5, they maximize $D_{u^*}$, since FMR implies minimization of all $f$-divergences. Because the $D_{u^*}$-maximizer is unique by assumption, any two maximizers must coincide $\mu$-a.e. Thus $(\hat{G}_0, \hat{G}_1) = (G_0^*, G_1^*)$ almost everywhere.

Remark II.2. If the maximizer of $D_{u^*}$ over $\mathcal{G}_0 \times \mathcal{G}_1$ is not unique, the above argument shows only that any FMR LFD pair must belong to the set of $D_{u^*}$-maximizers. In particular, FMR LFDs coincide (up to $\mu$-a.e. equality) with some AMR LFDs, but not necessarily with a distinguished pair $(G_0^*, G_1^*)$.

The uniqueness assumption in Theorem II.7 is in fact mild; under natural convexity and compactness conditions, $D_u$ admits a unique maximizer, as formalized in the following lemma.

Lemma II.8 (Uniqueness of the $D_u$-maximizer). Let $(\Omega, \mathcal{F}, \mu)$ be a $\sigma$-finite measure space. Let $\mathcal{G}_0$ and $\mathcal{G}_1$ be sets of probability measures $G_0$ and $G_1$ absolutely continuous with respect to $\mu$, with densities $g_0 = dG_0/d\mu$ and $g_1 = dG_1/d\mu$, respectively. Assume $\mathcal{G}_0 \times \mathcal{G}_1$ is convex and compact in some topology. For $u \in (0, 1)$, recall that
$$D_u(G_0, G_1) = \int_\Omega g_1^u g_0^{1-u} \, d\mu. \tag{15}$$
If $\mathcal{G}_0 \cap \mathcal{G}_1 = \emptyset$ and
$$\max_{(G_0, G_1) \in \mathcal{G}_0 \times \mathcal{G}_1} D_u(G_0, G_1) > 0, \tag{16}$$
then $D_u$ admits a unique maximizer $(G_0^*, G_1^*)$ in $\mathcal{G}_0 \times \mathcal{G}_1$, up to $\mu$-null sets.
Proof: See Appendix B.

Corollary II.9 (Uniqueness transfer from AMR to FMR).
If for the minimizing parameter $u^*$ the maximizer of $D_{u^*}$ is unique, then any existing FMR LFDs must coincide $\mu$-a.e. with the AMR LFDs. Thus AMR uniqueness implies uniqueness of all finite-sample LFDs, whenever they exist.
Proof: Immediate from Theorem II.7.

C. Problem Formulation

The design of asymptotically minimax robust tests is based on the optimization problem
$$\min_{u \in (0,1)} \max_{(G_0, G_1) \in \mathcal{G}_0 \times \mathcal{G}_1} D_u(G_0, G_1). \tag{17}$$
As shown in our earlier work [33], this problem admits a saddle value under mild assumptions on the uncertainty classes, with associated least favorable distributions $(\hat{G}_0, \hat{G}_1)$ and minimizing parameter $\hat{u}$. The precise saddle-point characterization and existence conditions are given in [33]. Given the existence of $(\hat{G}_0, \hat{G}_1, \hat{u})$, the least favorable distributions can be obtained by solving the coupled minimax optimization problem

Maximization:
$$\hat{g}_0 = \arg\sup_{G_0 \in \mathcal{G}_0} D_u(G_0, G_1) \quad \text{s.t.} \quad g_0 > 0, \;\; \Upsilon(G_0) = \int_\Omega g_0 \, d\mu = 1,$$
$$\hat{g}_1 = \arg\sup_{G_1 \in \mathcal{G}_1} D_u(G_0, G_1) \quad \text{s.t.} \quad g_1 > 0, \;\; \Upsilon(G_1) = \int_\Omega g_1 \, d\mu = 1,$$
Minimization:
$$\hat{u} = \arg\min_{u \in (0,1)} D_u(\hat{G}_0, \hat{G}_1). \tag{18}$$

III. FINITE-SAMPLE MINIMAX ROBUST TESTS VIA ASYMPTOTIC THEORY

In this section, LFDs and the asymptotically minimax robust tests are derived for various uncertainty classes, considering the minimax optimization problem given by (18). The analysis includes the uncertainty classes based on the total variation distance as well as the band model. The derivations are intentionally presented in full generality so that the same analytic strategy can be applied by other researchers to new uncertainty classes. To keep the exposition focused, detailed proofs are placed in the appendix.
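To make the outer minimization in (17)–(18) concrete, the following minimal sketch evaluates the $u$-affinity $D_u$ for fixed discrete densities on a hypothetical three-point sample space and minimizes over $u$ by grid search; the densities are illustrative, and the inner maximization over the uncertainty classes is omitted.

```python
import math

def u_affinity(g0, g1, u):
    """D_u(G0, G1) = sum_y g1(y)^u * g0(y)^(1-u); the counting-measure
    analogue of (15) on a finite sample space."""
    return sum(b**u * a**(1.0 - u) for a, b in zip(g0, g1))

def minimizing_u(g0, g1, grid=1001):
    """Outer minimization of (18) over u in (0, 1) by grid search; a crude
    stand-in for a proper one-dimensional convex minimization."""
    us = [k / grid for k in range(1, grid)]
    return min(us, key=lambda u: u_affinity(g0, g1, u))

# Hypothetical discrete densities; g1 is the reversal of g0, so by symmetry
# the minimizing u is 1/2 and D_{1/2} = 0.2 + 2*sqrt(0.07).
g0 = [0.7, 0.2, 0.1]
g1 = [0.1, 0.2, 0.7]
u_hat = minimizing_u(g0, g1)
```

Since each term of $D_u$ is of the form $a\,e^{u \log(b/a)}$, the map $u \mapsto D_u$ is convex, so the grid minimum approximates the unique minimizer $\hat{u}$.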
By analytically solving the coupled Lagrangian optimality conditions, classical results are recovered, new ones are obtained (e.g., asymmetric total variation neighborhoods and one specialization of the band model), and the structural consequences of the theory are revealed, in particular the clipped, piecewise-defined form and uniqueness of the robust likelihood ratio, as well as the fact that its parameters are independent of the choice of $u$.

A. Total Variation Neighborhood

The total variation neighborhood is defined as
$$\mathcal{G}_j = \{G_j : D_{\mathrm{TV}}(G_j, F_j) \leq \epsilon_j\}, \quad j \in \{0, 1\},$$
where
$$D_{\mathrm{TV}}(G_j, F_j) = \frac{1}{2} \int_\Omega |g_j - f_j| \, d\mu.$$
The LFDs and the corresponding minimax robust test for the uncertainty classes created by the total variation neighborhood were found earlier by Huber [10]. However, that design approach is heuristic, many of the parameter and function choices are left unexplained, and the test is obtained under the assumption that the robustness parameters are equal, $\epsilon_0 = \epsilon_1$. Since asymptotic minimax robustness is a necessary condition for finite-sample minimax robustness, the minimax robust test resulting from the total variation neighborhood can also be analytically derived following the same design procedure as before. The following theorem substantiates this claim.

Theorem III.1 (Least Favorable Distributions and Robust LRF under Total Variation). Under total variation neighborhoods with radii $\epsilon_0$ and $\epsilon_1$, the robust likelihood ratio function is given by
$$\hat{l} = \begin{cases} t_l, & l < t_l, \\ l, & t_l \leq l \leq t_u, \\ t_u, & l > t_u, \end{cases} \tag{19}$$
where $0 < t_l < t_u < \infty$. The least favorable distributions corresponding to the robust LRF take the form
$$\hat{g}_0 = \begin{cases} (1 - \beta t_l) f_0 + \beta f_1, & l < t_l, \\ f_0, & t_l \leq l \leq t_u, \\ (1 - \sigma t_u) f_0 + \sigma f_1, & l > t_u, \end{cases} \qquad \hat{g}_1 = \begin{cases} t_l \, \hat{g}_0, & l < t_l, \\ f_1, & t_l \leq l \leq t_u, \\ t_u \, \hat{g}_0, & l > t_u, \end{cases} \tag{20}$$
where the coefficients
$$\beta(t_l, t_u) = \frac{\epsilon_0}{\int_{l < t_l} (t_l f_0 - f_1) \, d\mu}, \qquad \sigma(t_l, t_u) = \frac{\epsilon_0}{\int_{l > t_u} (f_1 - t_u f_0) \, d\mu} \tag{21}$$
are explicit functions of the clipping thresholds $t_l$ and $t_u$.

Proof. Lemma III.2 establishes that, under total variation neighborhoods, the robust likelihood ratio function admits the clipped form with thresholds $0 < t_l < t_u < \infty$. Given this robust likelihood ratio, Lemma III.3 shows that the corresponding least favorable distributions must coincide with the nominal densities on the central region $\{t_l \leq l \leq t_u\}$ and are linear combinations of $f_0$ and $f_1$ on the lower and upper clipping regions, respectively. Imposing continuity at the region boundaries yields the explicit parametric expressions stated in the theorem. The remaining dependence of the coefficients $\beta$ and $\sigma$ on the clipping thresholds $t_l$ and $t_u$ is determined by the normalization and total variation constraints, and is made explicit in Proposition III.4.

Lemma III.2 (Parametric form of the robust LRF). For the total variation neighborhood, the robust likelihood ratio function admits a piecewise parametric representation of the form
$$\frac{\hat{g}_1}{\hat{g}_0} = \begin{cases} t_l, & f_1/f_0 < t_l, \\ f_1/f_0, & t_l \leq f_1/f_0 \leq t_u, \\ t_u, & f_1/f_0 > t_u, \end{cases} \tag{22}$$
where $t_l < t_u$ are determined by the Lagrange multipliers. Moreover, the parameter $u \in (0, 1)$ is uniquely determined by the Lagrange multipliers as
$$u = \frac{\log \dfrac{\lambda_0 + 2\mu_0}{-\lambda_0 + 2\mu_0}}{\log \dfrac{\lambda_0 + 2\mu_0}{-\lambda_0 + 2\mu_0} + \log \dfrac{-\lambda_1 + 2\mu_1}{\lambda_1 + 2\mu_1}}. \tag{23}$$

Proof. The proof proceeds by analyzing the pointwise optimality conditions of the Lagrangian formulation associated with the total variation constraints. For fixed $u \in (0, 1)$, consider the Lagrangians
$$\mathcal{L}_0(g_0, g_1; \lambda_0, \mu_0) = D_u(G_0, G_1) + \lambda_0 \big(D_{\mathrm{TV}}(G_0, F_0) - \epsilon_0\big) + \mu_0 \big(\Upsilon(G_0) - 1\big),$$
$$\mathcal{L}_1(g_0, g_1; \lambda_1, \mu_1) = D_u(G_0, G_1) + \lambda_1 \big(D_{\mathrm{TV}}(G_1, F_1) - \epsilon_1\big) + \mu_1 \big(\Upsilon(G_1) - 1\big), \tag{24}$$
with Lagrange multipliers $\lambda_j \geq 0$ and $\mu_j \in \mathbb{R}$.
Taking pointwise first variations with respect to $g_0$ and $g_1$ yields the first-order stationarity conditions, for $\mu$-almost every $y \in \Omega$,
$$(1 - u)\left(\frac{g_1}{g_0}\right)^{u} + \mu_0 + \frac{\lambda_0}{2}\,\mathrm{sgn}(g_0 - f_0) = 0, \tag{25}$$
$$u\left(\frac{g_1}{g_0}\right)^{u-1} + \mu_1 + \frac{\lambda_1}{2}\,\mathrm{sgn}(g_1 - f_1) = 0. \tag{26}$$
Equations (25)–(26) show that the ratio $g_1/g_0$ can take only finitely many constant values, depending on the signs of $g_0 - f_0$ and $g_1 - f_1$. There are nine possible sign combinations arising from
$$g_0 \lessgtr f_0 \text{ or } g_0 = f_0, \qquad g_1 \lessgtr f_1 \text{ or } g_1 = f_1. \tag{27}$$
These cases are examined next.

Case 1: $g_0 = f_0$ and $g_1 = f_1$. Since both constraints are inactive, the optimality conditions imply
$$\frac{g_1}{g_0} = \frac{f_1}{f_0}, \tag{28}$$
which defines the central region.

Case 2: $g_0 < f_0$ and $g_1 > f_1$. We have $\mathrm{sgn}(g_0 - f_0) = -1$ and $\mathrm{sgn}(g_1 - f_1) = +1$. Solving (25)–(26) gives
$$\frac{g_1}{g_0} = \frac{u(-\lambda_0 + 2\mu_0)}{(1 - u)(\lambda_1 + 2\mu_1)} =: t_l. \tag{29}$$
Moreover, since $g_0 < f_0$ and $g_1 > f_1$, dividing both inequalities pointwise yields $g_1/g_0 > f_1/f_0$.

Case 3: $g_0 > f_0$ and $g_1 < f_1$. Here $\mathrm{sgn}(g_0 - f_0) = +1$ and $\mathrm{sgn}(g_1 - f_1) = -1$, and similarly
$$\frac{g_1}{g_0} = \frac{u(2\mu_0 + \lambda_0)}{(1 - u)(2\mu_1 - \lambda_1)} =: t_u, \tag{30}$$
where in this case we also have $g_1/g_0 < f_1/f_0$.

Remaining cases. All other sign combinations either reduce to one of the above three cases by dividing the defining inequalities by $g_0$ or $f_0$, or lead to infeasible solutions (e.g., negative constants for $g_1/g_0$), which are ruled out by nonnegativity of densities. Thus, no additional values of $g_1/g_0$ arise.

Collecting the admissible cases, the likelihood ratio function $\hat{g}_1/\hat{g}_0$ admits the piecewise representation given in Lemma III.2. The four equations, a pair from each of Cases 2 and 3, must be compatible with a single $u \in (0, 1)$. Eliminating $t_l$ and $t_u$ between the two pairs yields a consistency condition that determines $u$ uniquely. After straightforward algebra one obtains $u$ as given in (23). This completes the proof.

Lemma III.3.
Suppose the robust LRF $\hat{l}$ admits the clipped form characterized in Lemma III.2, with thresholds $0 < t_l < t_u < \infty$ and a central region on which $\hat{l} = f_1/f_0$. Then there exist densities $(\hat{g}_0, \hat{g}_1)$ such that
• $\hat{g}_0 = f_0$ and $\hat{g}_1 = f_1$ on the region $\{t_l \leq f_1/f_0 \leq t_u\}$,
• $\hat{g}_0$ and $\hat{g}_1$ are linear combinations of $f_0$ and $f_1$ on the clipping regions $\{f_1/f_0 < t_l\}$ and $\{f_1/f_0 > t_u\}$,
and the resulting pair $(\hat{g}_0, \hat{g}_1)$ is given by the expressions stated in Theorem III.1.

Proof. Suppose the robust likelihood ratio function is given by Lemma III.2. On a region where $\hat{g}_1/\hat{g}_0$ is constant, the pointwise stationarity conditions imply that both $\hat{g}_0$ and $\hat{g}_1$ belong to the linear span of the nominal densities $f_0$ and $f_1$. Consequently, on the lower clipping region $\{f_1/f_0 < t_l\}$ there exist parameters $\alpha$, $\beta$ such that
$$\hat{g}_0 = \alpha f_0 + \beta f_1, \qquad \hat{g}_1 = t_l(\alpha f_0 + \beta f_1), \tag{31}$$
and on the upper clipping region $\{f_1/f_0 > t_u\}$ there exist parameters $\gamma$, $\sigma$ such that
$$\hat{g}_0 = \gamma f_0 + \sigma f_1, \qquad \hat{g}_1 = t_u(\gamma f_0 + \sigma f_1). \tag{32}$$
On the middle region $\{t_l \leq f_1/f_0 \leq t_u\}$, the likelihood ratio coincides with the nominal likelihood ratio, and the pointwise stationarity conditions are satisfied by the nominal densities, yielding
$$\hat{g}_0 = f_0, \qquad \hat{g}_1 = f_1. \tag{33}$$
It remains to determine the parameters in these linear representations. The parameters in (31) and (32) are further restricted by continuity of $\hat{g}_0$ and $\hat{g}_1$ at the boundaries of the three regions. Let $y_l$ be such that $f_1(y_l)/f_0(y_l) = t_l$. Continuity at $y_l$ requires
$$\alpha f_0(y_l) + \beta f_1(y_l) = f_0(y_l), \tag{34}$$
which implies $\alpha + \beta t_l = 1$. Similarly, letting $y_u$ satisfy $f_1(y_u)/f_0(y_u) = t_u$, continuity at $y_u$ yields $\gamma + \sigma t_u = 1$. Substituting these relations into (31)–(32) together with (33) results in the piecewise expressions stated in Theorem III.1. This completes the proof.
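To illustrate the piecewise structure established above, the following sketch assembles the LFD densities of the form (20) for hypothetical unit-variance Gaussian nominals $f_0 = \mathcal{N}(0,1)$ and $f_1 = \mathcal{N}(1,1)$. The thresholds $t_l, t_u$ and coefficients $\beta, \sigma$ are supplied as free inputs for illustration, whereas in the paper they are fixed by the total variation and normalization constraints.

```python
import math

def gaussian_pdf(y, mu, sd=1.0):
    return math.exp(-(y - mu) ** 2 / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

def lfd_pair(y, t_l, t_u, beta, sigma, mu0=0.0, mu1=1.0):
    """Piecewise least favorable densities of the form (20); beta and sigma
    stand in for beta(t_l, t_u) and sigma(t_l, t_u), supplied directly here
    instead of being solved for from the constraints."""
    f0 = gaussian_pdf(y, mu0)
    f1 = gaussian_pdf(y, mu1)
    l = f1 / f0                       # nominal likelihood ratio
    if l < t_l:                       # lower clipping region
        g0 = (1.0 - beta * t_l) * f0 + beta * f1
        g1 = t_l * g0
    elif l <= t_u:                    # central region: nominals are least favorable
        g0, g1 = f0, f1
    else:                             # upper clipping region
        g0 = (1.0 - sigma * t_u) * f0 + sigma * f1
        g1 = t_u * g0
    return g0, g1

# The induced ratio g1/g0 reproduces the clipped LRF (19) by construction:
g0, g1 = lfd_pair(3.0, t_l=0.5, t_u=2.0, beta=0.1, sigma=0.1)
```

Evaluating the pair across the three regions confirms that $\hat{g}_1/\hat{g}_0$ equals $t_l$, $f_1/f_0$, and $t_u$, respectively, as the clipped form requires.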
Since the least favorable distributions constructed above are independent of the parameter u, they maximize D_u(G_0, G_1) simultaneously for all u ∈ (0, 1).

Proposition III.4 (Identification of the Clipping Thresholds). Let ĝ_0 and ĝ_1 be the least favorable distributions under total variation neighborhoods with radii ϵ_0 and ϵ_1, given in Theorem III.1. Then the clipping thresholds t_l and t_u are uniquely determined by the system

∫_{l > t_u} (f_1 − t_u f_0) dµ − ϵ_0 t_u = ϵ_1,
∫_{l < t_l} (t_l f_0 − f_1) dµ − ϵ_1 = ϵ_0 t_l.   (35)

Proof. The least favorable distributions ĝ_0 and ĝ_1 are subject to the four defining constraints

D_TV(G_0, F_0) = ϵ_0,  D_TV(G_1, F_1) = ϵ_1,  Υ(G_0) = 1,  Υ(G_1) = 1.   (36)

Substituting the parametric forms of ĝ_0 and ĝ_1 given in Theorem III.1 into these constraints yields four scalar equations involving the unknown parameters β, σ, t_l, and t_u. The unit-mass constraint for ĝ_0 together with the total variation constraint for ĝ_0 uniquely determine the coefficients β and σ as functions of the clipping thresholds t_l and t_u. In particular, these coefficients can be eliminated from the remaining constraints by expressing them in terms of t_l and t_u. Inserting these expressions into the unit-mass constraint for ĝ_1 and the total variation constraint for ĝ_1, and simplifying using the piecewise structure induced by the clipping regions { l < t_l } and { l > t_u }, the resulting conditions reduce to the pair of equations given in (35).

Remark III.1 (Symmetric total variation neighborhoods). When ϵ_0 = ϵ_1, the minimax problem is invariant under interchange of the hypotheses f_0 and f_1. Since the least favorable likelihood ratio is unique, its clipped form must be invariant under the transformation l ↦ 1/l. This invariance maps the lower clipping region { l < t_l } onto the upper clipping region { l > t_u }, which is possible if and only if t_l t_u = 1.
Consequently, the symmetric total variation case reduces to a one-parameter family of least favorable distributions. Under this symmetry constraint, the coefficients in the least favorable distributions satisfy 1 − β t_l = β and 1 − σ t_u = σ. Substituting these expressions into the general parametric form recovers Huber's classical results [10]. The present formulation makes explicit that, in the symmetric case, the apparent two-parameter representation of Huber reduces intrinsically to a single parameter.

Remark III.2 (Tilting structure of the least favorable distributions). The least favorable distributions admit a simple tilting interpretation. In the clipping regions, they can be written as linear perturbations of the nominal distributions. In particular, on the upper clipping region,

ĝ_0 = f_0 + ϵ_0 (f_1 − t_u f_0) / ∫_{l > t_u} (f_1 − t_u f_0) dµ,  l > t_u,   (37)

with an analogous expression on the lower clipping region and for ĝ_1 through the relation ĝ_1 = l̂ ĝ_0. Since f_1 − t_l f_0 < 0 on { l < t_l } and f_1 − t_u f_0 > 0 on { l > t_u }, the perturbation reduces probability mass in the lower likelihood-ratio region and reallocates the same amount of mass to the upper likelihood-ratio region, while leaving the central region t_l ≤ l ≤ t_u unchanged. This tilting structure makes explicit how total variation robustness redistributes probability mass in an adversarial yet controlled manner, concentrating it on observations that are most favorable to the competing hypothesis.

B. Band Model

So far, the nominal distributions have been assumed to be known or to be reasonably well approximated prior to the construction of the uncertainty classes. However, in many settings such precise knowledge is unavailable, and the true distributions are only known to lie within prescribed pointwise bounds. This motivates the use of band-type uncertainty models [18], which capture distributional uncertainty through lower and upper bounding functions on the densities.
The band model is given by the uncertainty classes

G_j = { G_j ∈ M : g^L_j ≤ g_j ≤ g^U_j },   (38)

where M is the set of all distribution functions on Ω, and g^L_j and g^U_j are non-negative lower and upper bounding functions such that G_0 and G_1 are nonempty sets. This implies

∫_Ω g^L_j dµ ≤ 1 ≤ ∫_Ω g^U_j dµ,  j ∈ {0, 1}.

Moreover, g^L_j and g^U_j should be chosen such that g_0 and g_1 are distinct density functions; otherwise G_0 ∩ G_1 ≠ ∅ and minimax hypothesis testing is not possible.

Band models differ fundamentally from distance-based uncertainty classes. While total variation and f-divergence neighborhoods constrain distributions only in an integral sense and therefore permit highly concentrated local deviations, band models enforce pointwise bounds on the densities and exclude such behavior by construction. Consequently, band models are not equivalent to, nor recoverable from, distance-based uncertainty classes except in degenerate cases.

From a theoretical standpoint, band models constitute capacity-type uncertainty classes. However, it has long been unclear whether they satisfy the structural properties required for the direct application of Huber's minimax robustness theory, such as alternation [17]. As a result, general minimax robustness guarantees could not be established for band models within Huber's framework [15]. From a practical perspective, band models naturally arise in applications where density estimates are accompanied by pointwise confidence bounds, making lower and upper bounding functions an appropriate representation of uncertainty [18].

Following the same approach as in the previous sections, the asymptotically minimax robust test and least favorable distributions for the band model can be derived as follows.
Consider the Lagrangians

L_0(g_0, g_1, λ_0, ν_0, µ_0) = D_u(G_0, G_1) + λ_0 (g_0 − g^L_0) + ν_0 (g^U_0 − g_0) + µ_0 (Υ(G_0) − 1),
L_1(g_0, g_1, λ_1, ν_1, µ_1) = D_u(G_0, G_1) + λ_1 (g_1 − g^L_1) + ν_1 (g^U_1 − g_1) + µ_1 (Υ(G_1) − 1),

where the µ_j are scalar, and the λ_j and ν_j are functional Lagrangian multipliers. Taking the Gateaux derivatives of the Lagrangians in the direction of unit-area integrable functions ψ_0 and ψ_1, respectively, leads to the first-order stationarity conditions

∂L_0/∂g_0 = ∫ [ (1 − u)(g_1/g_0)^u + λ_0 − ν_0 + µ_0 ] ψ_0 dµ = 0,
∂L_1/∂g_1 = ∫ [ u (g_1/g_0)^(u−1) + λ_1 − ν_1 + µ_1 ] ψ_1 dµ = 0.   (39)

The conditions in (39) are accompanied by complementary slackness for the band constraints, which for j ∈ {0, 1} read

λ_j(y) (g_j(y) − g^L_j(y)) = 0,  ν_j(y) (g^U_j(y) − g_j(y)) = 0,   (40)

with λ_j(y) ≥ 0 and ν_j(y) ≥ 0 almost everywhere. Consequently, on regions where a bound is inactive, the corresponding Lagrange multiplier vanishes almost everywhere, and the stationarity conditions reduce to pointwise equations involving only the remaining multipliers. Depending on which bounds are active, three distinct cases arise, which are analyzed separately below.

Case 1. g^U_0 = ∞ and g^U_1 = ∞ (no upper bounding functions): In this case, letting g^L_j = (1 − ϵ_j) f_j, the band model can equivalently be written as the lower ϵ-contamination model

G^{ϵ−}_j = { G_j : G_j = (1 − ϵ_j) F_j + ϵ_j H, H ∈ M },

where the f_j are the nominal densities and 0 ≤ ϵ_j < 1 [18]. Since we have ν_0 = 0 and ν_1 = 0 everywhere, and hence no constraints regarding the upper bounding functions are in effect, there are four conditions regarding the Lagrangians:

L_0:  g_0 = g^L_0 on A_0 and g_0 > g^L_0 on Ω \ A_0,
L_1:  g_1 = g^L_1 on A_1 and g_1 > g^L_1 on Ω \ A_1.

The integrals in (39) are defined on the regions where g_0 > g^L_0 and g_1 > g^L_1, respectively.
On these regions, the lower band constraints are inactive and, by complementary slackness, the corresponding multipliers satisfy λ_0 = λ_1 = 0 almost everywhere. Since ν_0 = 0 and ν_1 = 0 everywhere in this case, the stationarity conditions reduce to

g_1/g_0 = 1/k_2 on Ā_0 = Ω \ A_0 = { y : g_0 > g^L_0 },
g_1/g_0 = k_1 on Ā_1 = Ω \ A_1 = { y : g_1 > g^L_1 },   (41)

where k_1 and k_2 are positive constants determined by the normalization constraints.

Theorem III.5. From (41), it follows that the LFDs and the corresponding likelihood ratio function are unique and given by

ĝ_0 = { g^L_0, y ∈ A_0;  k_2 g^L_1, y ∈ Ā_0 },  ĝ_1 = { g^L_1, y ∈ A_1;  k_1 g^L_0, y ∈ Ā_1 },   (42)

and

ĝ_1/ĝ_0 = { 1/k_2, y ∈ Ā_0 ∩ A_1;  g^L_1/g^L_0, y ∈ A_0 ∩ A_1;  k_1, y ∈ A_0 ∩ Ā_1 }.

Proof: The claim follows from the conditions:
1. The sets A_0, A_1, Ā_0 and Ā_1 are all non-empty.
2. The set Ā_0 ∩ Ā_1 is empty.
3. On Ā_0 and Ā_1, respectively, we have ĝ_0 = k_2 g^L_1 and ĝ_1 = k_1 g^L_0.
A detailed proof of each of these conditions is given in Appendix C.

Corollary III.6. The parameters must satisfy k_1 < 1/k_2; hence, A_0 ∩ A_1 = { k_1 ≤ g^L_1/g^L_0 ≤ 1/k_2 }. Moreover,

A_0 = { g^L_1/g^L_0 ≤ 1/k_2 },  A_1 = { g^L_1/g^L_0 ≥ k_1 }.

Proof: A proof of Corollary III.6 is given in Appendix D.

Remark III.3. Let t_u = 1/k_2, t_l = k_1 and l = g^L_1/g^L_0. Then the LFDs and the robust LRF can be rewritten as

ĝ_0 = { g^L_0, l ≤ t_u;  (1/t_u) g^L_1, l > t_u },  ĝ_1 = { g^L_1, l ≥ t_l;  t_l g^L_0, l < t_l },   (43)

and

ĝ_1/ĝ_0 = { t_u, l > t_u;  l, t_l ≤ l ≤ t_u;  t_l, l < t_l }.   (44)

The lower bounding function constraints are satisfied automatically: on { l ≤ t_u } and { l ≥ t_l }, ĝ_j ≥ g^L_j holds with equality, and on { l > t_u } and { l < t_l } we necessarily have ĝ_0 = (1/t_u) g^L_1 ≥ g^L_0 and ĝ_1 = t_l g^L_0 ≥ g^L_1, respectively, since l = g^L_1/g^L_0.
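For Gaussian nominals the unit-mass conditions that determine t_l and t_u (stated next) reduce to scalar root-finding problems. A minimal bisection sketch, assuming g^L_j = (1 − ϵ) f_N(·; 2j − 1, 1) with ϵ = 0.1; the values are illustrative, and the paper's experiments use a damped Newton method instead:

```python
import math

EPS = 0.1  # contamination ratio, illustrative

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

# With g_L_j = (1 - EPS) * N(2j - 1, 1), the ratio l(y) = exp(2y) is increasing,
# so {l <= t} is the half line {y <= log(t) / 2}.
def upper_condition(t_u):
    """Unit-mass condition for g0_hat in the lower epsilon-contamination case."""
    c = math.log(t_u) / 2.0
    return (1.0 - EPS) * (Phi(c + 1.0) + (1.0 - Phi(c - 1.0)) / t_u) - 1.0

def lower_condition(t_l):
    """Unit-mass condition for g1_hat."""
    c = math.log(t_l) / 2.0
    return (1.0 - EPS) * ((1.0 - Phi(c - 1.0)) + t_l * Phi(c + 1.0)) - 1.0

def bisect(f, lo, hi, iters=200):
    """Plain bisection; f must change sign on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

t_u = bisect(upper_condition, 1.0, 1e4)
t_l = bisect(lower_condition, 1e-6, 1.0)
```

For symmetric nominals and equal contamination ratios the two solutions satisfy t_l t_u = 1, so in practice only one of the two equations needs to be solved.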
The density function constraints are satisfied by solving

∫_{l ≤ t_u} g^L_0 dµ + (1/t_u) ∫_{l > t_u} g^L_1 dµ = 1,
∫_{l ≥ t_l} g^L_1 dµ + t_l ∫_{l < t_l} g^L_0 dµ = 1.   (45)

Case 2. g^L_0 = 0 and g^L_1 = 0 (no lower bounding functions): By this condition, we have λ_0 = 0 and λ_1 = 0 everywhere. Similarly, the positivity constraints are not imposed explicitly because, as will be seen below, the density functions automatically satisfy them. In this case, there are four conditions regarding the Lagrangians:

L_0:  g_0 = g^U_0 on A_0 and g_0 < g^U_0 on Ω \ A_0,
L_1:  g_1 = g^U_1 on A_1 and g_1 < g^U_1 on Ω \ A_1.   (47)

The integrals in (39) are defined on the regions where g_0 < g^U_0 and g_1 < g^U_1, respectively. On these regions, the upper band constraints are inactive and, by complementary slackness, the corresponding multipliers satisfy ν_0 = ν_1 = 0 almost everywhere. Since λ_0 = 0 and λ_1 = 0 everywhere in this case, the stationarity conditions reduce to

g_1/g_0 = 1/k_2 on Ā_0 = Ω \ A_0 = { y : g_0 < g^U_0 },
g_1/g_0 = k_1 on Ā_1 = Ω \ A_1 = { y : g_1 < g^U_1 },   (48)

where k_1 and k_2 are positive constants determined by the normalization constraints.

Theorem III.7. Let t_l = 1/k_2, t_u = k_1 and l = g^U_1/g^U_0. It follows that the LFDs and the corresponding LRF are unique and given by

ĝ_0 = { g^U_0, l ≥ t_l;  (1/t_l) g^U_1, l < t_l },  ĝ_1 = { g^U_1, l ≤ t_u;  t_u g^U_0, l > t_u },   (49)

and

ĝ_1/ĝ_0 = { t_l, l < t_l;  l, t_l ≤ l ≤ t_u;  t_u, l > t_u }.   (50)

Moreover, all the Lagrangian constraints are satisfied and in particular the LFDs are obtained by solving

∫_{l ≥ t_l} g^U_0 dµ + (1/t_l) ∫_{l < t_l} g^U_1 dµ = 1,
∫_{l ≤ t_u} g^U_1 dµ + t_u ∫_{l > t_u} g^U_0 dµ = 1.

Proof: A proof of Theorem III.7 is given in Appendix E.

Theorem III.8. The LFDs in Theorem III.7 are single-sample minimax robust, i.e.,

G_0[ l̂ < t ] ≥ Ĝ_0[ l̂ < t ],  G_1[ l̂ < t ] ≤ Ĝ_1[ l̂ < t ]   (51)

for all t ∈ R_{≥0} and (G_0, G_1) ∈ G_0 × G_1.

Proof: A proof of Theorem III.8 is given in Appendix F.

Case 3.
g^L_j < g_j < g^U_j (the general case): The uncertainty classes for the general case are obtained by the intersection of lower and upper ϵ-contamination neighborhoods, G_j = G^{ϵ−}_j ∩ G^{ϵ+}_j. There are six conditions regarding the Lagrangians:

L_0:  g_0 = g^L_0 on A_0,  g_0 = g^U_0 on A_1,  and g^L_0 < g_0 < g^U_0 on A_2,
L_1:  g_1 = g^L_1 on A_3,  g_1 = g^U_1 on A_4,  and g^L_1 < g_1 < g^U_1 on A_5.   (52)

On regions where g^L_0 < g_0 < g^U_0 and g^L_1 < g_1 < g^U_1, the functional multipliers λ_j and ν_j vanish pointwise by complementary slackness, and the stationarity conditions reduce to

g_1/g_0 = k_2 on A_2 = { y : g^L_0 < g_0 < g^U_0 },
g_1/g_0 = k_1 on A_5 = { y : g^L_1 < g_1 < g^U_1 },   (53)

where k_1 and k_2 are some positive constants.

Theorem III.9. There are three different asymptotically minimax robust likelihood ratio functions,

Type-A:
ĝ_1/ĝ_0 = g^U_1/g^L_0  if g^U_1/g^L_0 ≤ k_2,
         = k_2          if g^U_1/g^L_0 > k_2 > g^U_1/g^U_0,
         = g^U_1/g^U_0  if k_2 ≤ g^U_1/g^U_0 ≤ k_1,
         = k_1          if g^U_1/g^U_0 > k_1 > g^L_1/g^U_0,
         = g^L_1/g^U_0  if g^L_1/g^U_0 ≥ k_1,

Type-B:
ĝ_1/ĝ_0 = g^U_1/g^L_0  if g^U_1/g^L_0 ≤ k_1,
         = k_1          if g^U_1/g^L_0 > k_1 > g^L_1/g^U_0,
         = g^L_1/g^U_0  if g^L_1/g^U_0 ≥ k_1,

Type-C:
ĝ_1/ĝ_0 = g^U_1/g^L_0  if g^U_1/g^L_0 ≤ k_1,
         = k_1          if g^U_1/g^L_0 > k_1 > g^L_1/g^L_0,
         = g^L_1/g^L_0  if k_1 ≤ g^L_1/g^L_0 ≤ k_2,
         = k_2          if g^L_1/g^L_0 > k_2 > g^L_1/g^U_0,
         = g^L_1/g^U_0  if g^L_1/g^U_0 ≥ k_2,

with the corresponding pairs of LFDs, respectively,

Type-A:
ĝ_0 = g^L_0          if g^U_1/g^L_0 ≤ k_2,
    = (1/k_2) g^U_1  if g^U_1/g^L_0 > k_2 > g^U_1/g^U_0,
    = g^U_0          if g^U_1/g^U_0 ≥ k_2,
ĝ_1 = g^L_1          if g^L_1/g^U_0 ≥ k_1,
    = k_1 g^U_0      if g^U_1/g^U_0 > k_1 > g^L_1/g^U_0,
    = g^U_1          if g^U_1/g^U_0 ≤ k_1,

Type-B:
ĝ_0 = g^L_0                    if g^U_1/g^L_0 ≤ k_1,
    = k_2 (g^L_0 + h_1)        if g^U_1/g^L_0 > k_1 ≥ g^L_1/g^L_0,
    = (k_2/k_1) (g^L_1 + h_2)  if g^L_1/g^L_0 > k_1 > g^L_1/g^U_0,
    = g^U_0                    if g^L_1/g^U_0 ≥ k_1,
ĝ_1 = g^U_1                    if g^U_1/g^L_0 ≤ k_1,
    = k_1 k_2 (g^L_0 + h_1)    if g^U_1/g^L_0 > k_1 ≥ g^L_1/g^L_0,
    = k_2 (g^L_1 + h_2)        if g^L_1/g^L_0 > k_1 > g^L_1/g^U_0,
    = g^L_1                    if g^L_1/g^U_0 ≥ k_1,

Type-C:
ĝ_0 = g^L_0          if g^L_1/g^L_0 ≤ k_2,
    = (1/k_2) g^L_1  if g^L_1/g^L_0 > k_2 > g^L_1/g^U_0,
    = g^U_0          if g^L_1/g^U_0 ≥ k_2,
ĝ_1 = g^L_1          if g^L_1/g^L_0 ≥ k_1,
    = k_1 g^L_0      if g^U_1/g^L_0 > k_1 > g^L_1/g^L_0,
    = g^U_1          if g^U_1/g^L_0 ≤ k_1.

Moreover, LRFs of Type-A and Type-C tend to clipped likelihood ratio functions, e.g., as given by (44) with the corresponding LFDs defined by (43).

Proof: A proof of Theorem III.9 is given in Appendix G.

In practice, two situations may arise. If the structural type of the robust likelihood ratio function is known a priori, the parameters k_1 and k_2 can be determined by enforcing the unit-mass constraints on the least favorable densities. Otherwise, the minimax problem formulated in Section II-C may be solved numerically as a convex optimization problem over the model (38). In this case, the densities can be discretized by sampling, and numerical integration may be carried out using standard techniques such as the trapezoidal rule.

IV. NUMERICAL DERIVATION OF MINIMAX ROBUST TESTS VIA CONVEX OPTIMIZATION

When analytic characterizations of the least favorable distributions are unavailable or intractable, the asymptotic minimax design problem can be cast as a convex optimization problem after discretizing the domain Ω. This numerical formulation applies to a broad class of convex uncertainty sets. In this work, moment classes and p-point classes are treated as representative examples.

Let Ω be discretized into n grid points {x_1, ..., x_n} with uniform spacing Δx = (x_max − x_min)/(n − 1). Each density g is represented as a nonnegative vector g = (g_1, ..., g_n) satisfying the normalization constraint

∑_{i=1}^n g_i Δx = 1.
(54)

The u-affinity functional D_u(g_0, g_1) is then approximated using the trapezoidal integration rule as

D̂_u(g_0, g_1) ≈ ∑_{i=1}^n g^u_{1,i} g^{1−u}_{0,i} Δx.   (55)

In the asymptotic minimax robust formulation, the design of LFDs involves solving the minimax problem

min_{u ∈ (0,1)} max_{(g_0, g_1) ∈ G_0 × G_1} D̂_u(g_0, g_1),   (56)

where G_0 and G_1 are convex sets describing the uncertainty classes. The inner maximization over (g_0, g_1) is convex for fixed u, and the outer minimization over u can be performed via a scalar search. Based on Theorem II.6, whenever a finite-sample minimax robust (FMR) test exists for a given uncertainty class, the same least favorable distributions maximize D_u for all u ∈ (0, 1). In such cases, the minimization over u can be skipped entirely, and u can be set to any convenient value (e.g., u = 0.5). For the moment and p-point classes considered here, both the counterexample provided in [34] and our own numerical experiments indicate that they are not finite-sample minimax robust. Nevertheless, asymptotically minimax robust tests do exist for these classes, and their numerical derivation will be presented in the simulations section.

A. Moment Classes

Moment classes, originally introduced in [13], model partial information about the distribution through bounds on generalized moments,

G_j = { G_j ∈ M : a^k_j ≤ E_{G_j}[h^k_j(Y)] ≤ b^k_j, k ∈ {1, ..., K} },   (57)

where the h^k_j are real-valued continuous functions, (a^k_j, b^k_j) are given bounds, and K is the number of constraints. The bounds should be chosen so that G_0 ∩ G_1 = ∅. Upon discretization, each moment constraint becomes a linear inequality in g_j,

a^k_j ≤ ∑_{i=1}^n h^k_j(x_i) g_{j,i} Δx ≤ b^k_j.   (58)

The resulting finite-dimensional minimax program reads

min_{u ∈ (0,1)} max_{g_0, g_1} ∑_{i=1}^n g^u_{1,i} g^{1−u}_{0,i} Δx
s.t.
1⊤ g_j Δx = 1,  g_j ≥ 0,  j = 0, 1,
a^k_j ≤ (h^k_j)⊤ g_j Δx ≤ b^k_j,  k = 1, ..., K.

This is a convex optimization problem in (g_0, g_1) for every fixed u and can be solved using general-purpose convex solvers; the minimization over u can be performed, for example, via a simple grid search. An example of LFDs for a specific set of moment bounds is given in Section V.

B. P-point Classes

The p-point class models partial information in the form of probability masses assigned to disjoint subsets of Ω. A generalized definition is

G_j = { G_j ∈ M : a^k_j ≤ G_j(A^k_j) ≤ b^k_j, k ∈ {1, ..., K} },   (59)

where the A^k_j ∈ A are disjoint measurable subsets and (a^k_j, b^k_j) specify allowable mass intervals. This generalizes the models in [4], [35]. When Ω is discretized, each A^k_j becomes a known index set I^k_j ⊂ {1, ..., n}, and (59) reduces to the linear constraints

a^k_j ≤ ∑_{i ∈ I^k_j} g_{j,i} Δx ≤ b^k_j.   (60)

The resulting optimization has the same structure as for the moment classes. An example of LFDs under p-point constraints is provided in Section V.

C. Hybrid Models and Generality of the Method

The same discretization-optimization framework can handle hybrid models that combine different convex constraints, such as simultaneous moment and p-point bounds. More generally, any convex uncertainty class whose constraints admit a convex finite-dimensional representation after discretization can be handled by this method. The choice of n and Δx determines the trade-off between numerical accuracy and computational cost; typical values of n ∈ {100, 200} yield sufficient accuracy for the smooth densities considered here.

D. Summary

For moment classes, p-point classes, and their hybrids, the LFDs are obtained by discretizing the domain and solving a finite-dimensional convex program for each fixed u, combined with an outer minimization over u (which can be implemented via line search or bisection).
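The outer scalar search can be illustrated in isolation by evaluating the discretized affinity (55) for a fixed pair of densities. A minimal sketch on a Gaussian grid (grid size and densities are illustrative choices); for N(−1, 1) versus N(1, 1) the affinity equals exp(−2u(1 − u)), so the search should settle at u = 0.5:

```python
import math

def gaussian_pdf(y, mu, sigma=1.0):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Discretize the domain as in (54).
n, y_min, y_max = 801, -8.0, 8.0
dy = (y_max - y_min) / (n - 1)
grid = [y_min + i * dy for i in range(n)]
g0 = [gaussian_pdf(y, -1.0) for y in grid]
g1 = [gaussian_pdf(y, 1.0) for y in grid]

def affinity(g0, g1, u, dy):
    """Discretized u-affinity, cf. (55)."""
    return sum(b ** u * a ** (1.0 - u) for a, b in zip(g0, g1)) * dy

# Outer minimization over u via a coarse grid search.
candidates = [0.05 * k for k in range(1, 20)]
u_star = min(candidates, key=lambda u: affinity(g0, g1, u, dy))
```

In the full design problem each affinity evaluation is replaced by the inner convex maximization over (g_0, g_1); the structure of the outer search is unchanged.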
The overall procedure is summarized in Algorithm 1. This framework extends the analytic solutions available for divergence-based classes and the band model, providing a unified computational method for robust hypothesis testing across a wide range of convex uncertainty classes.

Algorithm 1: Numerical LFD Design via Convex Optimization
Require: Uncertainty classes G_0, G_1 with convex constraints; domain Ω = [x_min, x_max]; number of grid points n; optional fixed u_0 ∈ (0, 1) if u-minimization is skipped.
Ensure: Least favorable densities (ĝ_0, ĝ_1).
1: Discretize Ω into n grid points {x_1, ..., x_n} with spacing Δx.
2: Express the constraints of G_0, G_1 as inequalities in g_0, g_1.
3: if u-minimization is required then
4:   Initialize a search set U ⊂ (0, 1).
5:   for each u ∈ U do
6:     Solve the convex program: maximize ∑_{i=1}^n g^u_{1,i} g^{1−u}_{0,i} Δx subject to 1⊤ g_j Δx = 1, g_j ≥ 0, j = 0, 1, and the class constraints for G_0, G_1.
7:     Record the objective value and the corresponding (g_0, g_1).
8:   end for
9:   Select u* and (ĝ_0, ĝ_1) achieving the minimum over u.
10: else
11:   Solve the convex program above for u = u_0.
12: end if
13: return (ĝ_0, ĝ_1).

V. SIMULATIONS

This section illustrates and evaluates the theoretical findings through representative examples. The notation N(µ, σ²) denotes a Gaussian distribution with mean µ and variance σ², and f_N denotes the corresponding density function. All systems of equations are solved using the damped Newton method [36], and convex optimization problems are handled by interior-point methods [37].
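As a concrete instance of the root-finding step, the system (35) for the total variation neighborhoods with the nominal densities f_j = N(2j − 1, 1) used below reduces to two scalar equations. A sketch with ϵ_0 = ϵ_1 = 0.1, as in Figure 1; bisection is shown instead of damped Newton for simplicity:

```python
import math

EPS0, EPS1 = 0.1, 0.1  # robustness parameters, as in Figure 1

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

# For f_j = N(2j - 1, 1), l(y) = f1(y) / f0(y) = exp(2y), so {l > t} = {y > log(t) / 2}.
def upper_eq(t):
    """First equation of (35): integral of (f1 - t*f0) over {l > t}, minus EPS0*t, equals EPS1."""
    c = math.log(t) / 2.0
    return (1.0 - Phi(c - 1.0)) - t * (1.0 - Phi(c + 1.0)) - EPS0 * t - EPS1

def lower_eq(t):
    """Companion equation of (35): integral of (t*f0 - f1) over {l < t}, minus EPS1, equals EPS0*t."""
    c = math.log(t) / 2.0
    return t * Phi(c + 1.0) - Phi(c - 1.0) - EPS1 - EPS0 * t

def bisect(f, lo, hi, iters=200):
    """Plain bisection; f must change sign on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

t_u = bisect(upper_eq, 1.0, 100.0)
t_l = bisect(lower_eq, 1e-6, 1.0)
```

With equal radii the two solutions satisfy t_l t_u = 1, consistent with Remark III.1.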
Fig. 1. Least favorable densities (top) and robust likelihood ratio functions (bottom) for (left) total variation distance based uncertainty classes and (right) the ϵ-contamination model.

A. Total Variation and ϵ-contamination Neighborhoods

In this subsection, the least favorable densities (LFDs) and the corresponding likelihood ratio functions (LRFs) resulting from the total variation neighborhood are illustrated and compared with those of the classical ϵ-contamination model. Unit-variance mean-shifted Gaussian distributions f_j ~ N(2j − 1, 1) are considered as the nominal distributions. For various configurations, the parameters of the least favorable densities given in Theorem III.1 and Remark III.3 have been obtained by solving the pairs of equations given by (35) and (45), respectively. Figure 1 illustrates the resulting LFDs and the corresponding LRFs together with the nominal distributions. From this example the following observations can be made:

1) Both models lead to clipped likelihood ratio tests; however, their LFDs are not identical even when their LRFs coincide. This can be deduced by examining the region where the robust and nominal LRFs are equal (i.e., for ϵ_0 = ϵ_1 = 0.1 and y ∈ [−0.7, 0.7]).
In this region, the LFDs of the total variation neighborhood coincide with the nominal densities, while those of the ϵ-contamination model do not. Hence, the two models differ in their least favorable density structures, despite having exactly the same LRFs.

2) Identical clipping thresholds (t_l, t_u) can be obtained by appropriately adjusting the robustness parameters. In particular, the total variation model with ϵ_0 = ϵ_1 ≈ 0.08875 produces the same upper and lower thresholds as the ϵ-contamination model with ϵ_0 = ϵ_1 = 0.1. This indicates that, in terms of likelihood-ratio compression, the total variation neighborhood induces stronger robustness effects for the same robustness parameters.

3) The effects of unequal robustness parameters are similar at the lower and upper clipping regions, and different in the middle region. For both models, increasing either ϵ_0 or ϵ_1 decreases the upper clipping threshold and increases the lower one (a vertical compression), while the horizontal displacement (rightward for larger ϵ_1 and leftward for larger ϵ_0) is observed only in the ϵ-contamination model.

4) From a computational standpoint, both models require two equations, each in a single variable, to be solved. While the two equations of the total variation based uncertainty model are coupled in ϵ_0 and ϵ_1, those of the ϵ-contamination model are not. This explains why three pairs of LFDs are needed for the total variation based uncertainty model and only two pairs for the ϵ-contamination model in Figure 1.

5) The overall deformation of the LRFs with changing robustness parameters therefore differs fundamentally between the two models.

These comparisons highlight an important distinction: while both uncertainty models yield clipped likelihood ratio tests, their least favorable densities exhibit fundamentally different behaviors, particularly when ϵ_0 = ϵ_1.
The new analytical formulation for the total variation neighborhood thus enables robust test design under unequal robustness parameters, a capability not available in the previous literature.

B. Band Model

Asymptotically minimax robust tests arising from the band model can similarly be simulated. Consider the lower bounding functions

g^L_0(y) = (1 − ϵ) f_N(y; −1, 4),  g^L_1(y) = (1 − ϵ) f_N(y; 1, 4),

where the contamination ratio is chosen to be ϵ = 0.2. Furthermore, let the upper bounding functions be

g^U_0(y) = (1 + ε) f_N(y; −1, 4),  g^U_1(y) = (1 + ε) f_N(y; 1, 4),

with the parameters ε = 0.2 (Type-A), ε = 0.5 (Type-B), ε = 1.5 (Type-C) or ε = 19 (Type-C), simulating three different types of robust LRFs resulting from the band model, cf. Section III-B. For this setup, and excluding ε = 19 for the sake of clarity, Figure 2 illustrates the corresponding LFDs together with the lower bounding functions, and the upper bounding functions for ε = 0.2. For ε = 0.5, the LFDs overlap around y = 0, leading to l̂ = 1. This type of overlap has previously been reported in [38] for single-sample minimax robust tests obtained from the KL-divergence neighborhood. However, the test in [38] is not minimax robust unless a well-defined randomized decision rule is used. In Figure 3, the corresponding robust likelihood ratio functions are illustrated. Increasing ε transforms the corresponding robust LRF from Type-A to Type-B and then to Type-C. Further increasing ε, i.e., when ε = 19, the robust LRF tends to a clipped likelihood ratio test, which is the limiting LRF stated in Section III-B. The robust LRFs can take different shapes depending on the bounding functions. Similar patterns were stated in [18] and also observed in [19]. In the second example, the variance of the Type-B upper bounding functions is changed from 4 to 9 and the rest of the setup is kept the same as before.
Figure 4 illustrates the LFDs together with the upper and lower bounding functions, and the corresponding robust LRF. This example shows both an asymmetric and a degenerate case, where the latter is due to the fact that the region g^U_1/g^L_0 ≤ k_1 does not exist, as the optimum solution satisfies g^U_1/g^L_0 > k_1. Also, the LFDs in the middle two regions, where the corresponding LRF is constant, were determined as a result of numerical optimization, see Section IV, as the LFDs in these regions depend on two general functions h_1 and h_2, which cannot be recovered as linear combinations of the upper and lower bounding functions.

Fig. 2. Three different pairs of LFDs arising from the band model together with the bounding functions for ε ∈ {0.2, 0.5, 1.5}.

Fig. 3. Three different types of robust LRFs arising from the band model for ε ∈ {0.2, 0.5, 1.5, 19} together with the nominal LRF.

C. Moment Classes

The LFDs and robust LRFs arising from the moment classes can be exemplified by solving the convex optimization problem numerically using Algorithm 1, where the class constraints

−2 ≤ E_{G_0}[Y] ≤ −0.5,  0.5 ≤ E_{G_1}[Y] ≤ 2,
0 ≤ E_{G_0}[Y²] ≤ 2,  2 ≤ E_{G_1}[Y²] ≤ 4,

are defined over the first and second moments of the probability density functions. Figure 5 illustrates the LFDs and the corresponding robust LRF, respectively.

Fig. 4.
Degenerate Type-B least favorable densities (top) and likelihood ratio functions (bottom) for the band model with asymmetric variances.

Fig. 5. Least favorable densities (top) and the likelihood ratio function (bottom) for moment-constrained uncertainty classes in the given example.

Fig. 6. Least favorable densities (top) and the likelihood ratio function (bottom) for p-point uncertainty classes in the given example.

D. P-point Classes

Similarly, an example of the asymptotically minimax robust test arising from the p-point classes can be given. Consider the p-point classes defined by the constraints

∫_{−5}^{3} g_0(y) dy ≤ 0.3,  ∫_{0}^{3} g_1(y) dy ≥ 0.8.

Figure 6 illustrates the LFDs and the corresponding robust LRF, respectively. The property stated in Lemma II.7 can be applied here to check whether the obtained AMR tests are also FMR. For both the moment classes and the p-point classes, the LFDs vary with u. This implies that the derived tests are only AMR and not FMR.

VI. CONCLUSION

This paper has established a formal equivalence between finite-sample and asymptotically minimax robust hypothesis testing. Specifically, it was shown that when a finite-sample minimax robust test exists, it coincides with the test derived via asymptotic minimax theory. This result provides a unifying perspective and enables the analytical derivation of minimax robust tests without relying on heuristic constructions.

As a demonstration, the total variation and band model uncertainty classes were analyzed. In both cases, least favorable distributions and the associated robust likelihood ratio functions were derived in closed parametric forms.
The results generalized earlier work, notably extending Huber's design to allow for unequal robustness parameters and offering an analytical foundation for designs previously constructed heuristically. Beyond these specific models, the findings of this paper suggest that asymptotic theory offers a broadly applicable and efficient framework for designing finite-sample minimax robust tests whenever they exist. This closes a long-standing gap in the literature and provides new tools for robust decision-making under distributional uncertainty.

APPENDIX A
PROOF OF THEOREM II.5

The equivalence stated by (12) was proven in [15, Section 6]² for a version of D_f,

D_{f*}(G_0, G_1) = ∫_Ω f*( g_0/(g_0 + g_1) ) (g_0 + g_1) dµ,

where f* : (0, 1) → R is a twice continuously differentiable and convex function. Let f**(y) = f*(y) − f*(1/2), which results in f**(1/2) = 0. By [39, Equations 7-8], it is known that D_f = D_{f**} using the transformation f**(t) = t f((1 − t)/t) for t ∈ (0, 1). Hence, minimizing D_f and minimizing D_{f**} are equivalent over all twice continuously differentiable and convex f.

APPENDIX B
PROOF OF LEMMA II.8

Existence: Since D_u is upper semicontinuous on the compact set G_0 × G_1, it attains its maximum. Let M = max D_u > 0 by assumption.

Uniqueness: Suppose (G*_0, G*_1) and (G′_0, G′_1) are two distinct maximizers achieving D_u = M > 0. Since G_0 ∩ G_1 = ∅, we cannot have G*_0 = G*_1 or G′_0 = G′_1. Thus, strict concavity of (g_0, g_1) ↦ g_1^u g_0^{1−u} implies that for t ∈ (0, 1),

D_u(t G*_0 + (1 − t) G′_0, t G*_1 + (1 − t) G′_1) > t D_u(G*_0, G*_1) + (1 − t) D_u(G′_0, G′_1) = M,   (61)

a contradiction. Hence the maximizer is unique up to µ-null sets.

² In [15], Ω is defined to be a complete separable metrizable space.
Furthermore, if all ( G 0 , G 1 ) ∈ G 0 × G 1 are absolutely continuous with respect to a fixed measure µ , Ω may need to be finite. February 24, 2026 DRAFT 35 A P P E N D I X C P RO O F O F T H E O R E M I I I . 5 1. The sets ¯ A 0 and ¯ A 1 are trivially non-empty . If not, we ha ve R Ω ˆ g 0 = R Ω g L 0 dµ < 1 and R Ω ˆ g 1 dµ = R Ω g L 1 dµ < 1 , which are contradictions with the fact that ˆ g 0 and ˆ g 1 are density functions. The set A 0 is also non-empty and this can be sho wn again with contradiction. Assume that A 0 is empty . In this case, A 1 can either be empty or non-empty . Assume that A 1 is also empty . Then, by (41), we necessarily hav e ˆ g 0 = ˆ g 1 a.e., which is excluded by a suitable choice of g L 0 and g L 1 . Therefore, A 1 is non-empty . If A 1 is non-empty , then we must have ˆ g 0 = k 2 g L 1 on ¯ A 0 . If not, ˆ g 1 / ˆ g 0 will not be a constant function on ¯ A 0 ∩ A 1 , which is non-empty since Ω = ¯ A 0 . This again yields a contradiction with (41). Since ˆ g 0 = k 2 g L 1 is defined on Ω , in order to satisfy (41), ˆ g 1 must also be g L 1 on ¯ A 1 . Hence, we hav e ˆ g 1 = g L 1 a.e. which is again a contradiction with the fact that R Ω ˆ g 1 = 1 . Therefore, A 0 is non-empty . A similar analysis shows that A 1 is also non-empty . 2. The set ¯ A 0 ∩ ¯ A 1 is empty . If not, from (42) and (41) we hav e ˆ g 1 ˆ g 0 = k 1 k 2 g L 0 g L 1 = k 1 = 1 k 2 . (62) This implies ( ¯ A 0 ∩ A 1 ) ∪ ( A 0 ∩ ¯ A 1 ) = Ω , hence, both ¯ A 0 ∩ ¯ A 1 and A 0 ∩ A 1 are empty sets. Since, A 0 ∩ A 1 is non-empty , we have a contradiction, hence, ¯ A 0 ∩ ¯ A 1 must be empty . 3. The set A 0 ∩ A 1 is non-empty . If not, A 0 and A 1 are disjoint sets. This implies at least non- empty ¯ A 0 ∩ A 1 and A 0 ∩ ¯ A 1 and at most additionally non-empty ¯ A 0 ∩ ¯ A 1 . Non-empty ¯ A 0 ∩ ¯ A 1 implies ˆ g 1 / ˆ g 0 = k a.e on Ω , see (62), and this is impossible, unless k = 1 . If only ¯ A 0 ∩ A 1 and A 0 ∩ ¯ A 1 are non-empty , i.e. 
if $\bar A_0 \cap \bar A_1$ and $A_0 \cap A_1$ are empty, hence $(\bar A_0 \cap A_1) \cup (A_0 \cap \bar A_1) = \Omega$, we have $A_0 = \bar A_1$ and $A_1 = \bar A_0$ together with $A_0 \cup A_1 = \Omega$. This is possible if and only if $k = k_1 = 1/k_2$, because

$$\bar A_0 \cap A_1 = \{g_0 > g_0^L,\ g_1 = g_1^L\} = \{1/k_2 = g_1/g_0 < g_1^L/g_0^L\},$$
$$A_0 \cap \bar A_1 = \{g_1 > g_1^L,\ g_0 = g_0^L\} = \{k_1 = g_1/g_0 > g_1^L/g_0^L\}.$$

The condition $k = k_1 = 1/k_2$ also implies $\hat g_1/\hat g_0 = 1$ a.e. on $\Omega$, which is avoided by suitable choices of $g_0^L$ and $g_1^L$. Hence, $A_0 \cap A_1$ cannot be empty.

4. The sets $\bar A_0 \cap A_1$ and $A_0 \cap \bar A_1$ are both non-empty. Since $A_0 \cap A_1 \neq \emptyset$, there are four cases: $A_0 \subset A_1$, $A_1 \subset A_0$, $A_0 = A_1$, or $A_0 \setminus A_1$ and $A_1 \setminus A_0$ are both non-empty. The first three cases imply either a non-empty $\bar A_0 \cap \bar A_1$, or $A_0 = \Omega$, or $A_1 = \Omega$, or both. The first possibility contradicts (62), and the other three imply $\hat g_j = g_j^L$ on $\Omega$, which is impossible, see (42). Therefore, we have a non-empty $A_0 \cap A_1$ together with non-empty $A_0 \setminus A_1$ and $A_1 \setminus A_0$. This eventually implies non-empty $\bar A_0 \cap A_1$ and $A_0 \cap \bar A_1$. It is known that $\hat g_1 = g_1^L$ on $A_1$, and on $\bar A_0 \cap A_1$ we have $\hat g_1/\hat g_0 = 1/k_2$. Hence, on $\bar A_0$ we must have $\hat g_0 = k_2 g_1^L$. Similarly, on $\bar A_1$ we have $\hat g_1 = k_1 g_0^L$.

APPENDIX D
PROOF OF COROLLARY III.6

$k_1 = 1/k_2$ implies an empty $A_0 \cap A_1$, which is impossible, and $k_1 > 1/k_2$ implies a non-empty $(\bar A_0 \cap A_1) \cap (A_0 \cap \bar A_1)$, which in turn implies $k_1 = 1/k_2$, another contradiction. Therefore, we have $k_1 < 1/k_2$. Accordingly, the sets $A_0$ and $A_1$ can be written as

$$A_0 = (A_0 \cap A_1) \cup (A_0 \cap \bar A_1) = \{k_1 \le g_1^L/g_0^L \le 1/k_2\} \cup \{k_1 > g_1^L/g_0^L\} = \{g_1^L/g_0^L \le 1/k_2\},$$
$$A_1 = (A_0 \cap A_1) \cup (\bar A_0 \cap A_1) = \{k_1 \le g_1^L/g_0^L \le 1/k_2\} \cup \{1/k_2 < g_1^L/g_0^L\} = \{g_1^L/g_0^L \ge k_1\}.$$

APPENDIX E
PROOF OF THEOREM III.7

The definition of the sets $A_j$, their intersections, their relation to $l$, $k_1$ and $k_2$, and the fact that $k_1 > 1/k_2$ trivially follow from the same line of arguments used in Theorem III.5 and Corollary III.6 by considering (47) and (48). The lower bounding function constraints are automatically satisfied since $\hat g_0$ and $\hat g_1$ are non-negative functions. The upper bounding function constraints are satisfied in the same way as explained in Case 1. The LFDs are obtained from the unit density function constraints.

APPENDIX F
PROOF OF THEOREM III.8

For any $g_j \in \mathcal{G}_j$, if $t > t_u$, the event $A = [\hat l < t]$ has full probability, and if $t \le t_l$, it has null probability. Therefore, (51) holds trivially in these cases. For $t_l < t \le t_u$, we have

$$G_1(A) = (1+\epsilon_1) F_1(A) - \epsilon_1 h(A) \le (1+\epsilon_1) F_1(A) = \hat G_1(A),$$
$$G_0(A) = (1+\epsilon_0) F_0(A) - \epsilon_0 h(A) \ge (1+\epsilon_0) F_0(A) - \epsilon_0 = 1 - (1+\epsilon_0)(1 - F_0(A)) = 1 - (1+\epsilon_0) F_0(\bar A) = 1 - G_0^U(\bar A) = 1 - \hat G_0(\bar A) = \hat G_0(A).$$

Hence, $\hat g_0$ and $\hat g_1$ are single-sample minimax robust. Moreover, by virtue of Theorem II.5, single-sample minimax robust LFDs minimize all $f$-divergences; accordingly, they also maximize all $u$-affinities.

APPENDIX G
PROOF OF THEOREM III.9

From (52) and (53), the LFDs can be written as

$$\hat g_0 = \begin{cases} g_0^L, & A_0 \\ \tfrac{1}{k_2} g_1^L \ \text{or} \ \tfrac{1}{k_2} g_1^U, & A_2 \\ g_0^U, & A_1 \end{cases} \qquad \hat g_1 = \begin{cases} g_1^L, & A_3 \\ k_1 g_0^L \ \text{or} \ k_1 g_0^U, & A_5 \\ g_1^U, & A_4 \end{cases} \tag{63}$$

Let $\hat g_0 = \tfrac{1}{k_2} g_1^L$ on $A_2$ and $\hat g_1 = k_1 g_0^L$ on $A_5$. Then,

$$A_1 \cap A_5, \quad A_2 \cap A_4, \quad A_2 \cap A_5 \tag{64}$$

are all empty sets, because their existence would contradict (53). Accordingly, the robust LRF can implicitly be written as

$$\frac{\hat g_1}{\hat g_0} = \begin{cases} g_1^U/g_0^L, & A_0 \cap A_4 \\ k_1, & A_0 \cap A_5 \\ g_1^L/g_0^L, & A_0 \cap A_3 \\ g_1^U/g_0^U, & A_1 \cap A_4 \\ k_2, & A_2 \cap A_3 \\ g_1^L/g_0^U, & A_1 \cap A_3 \end{cases}$$
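The two inequalities in the proof of Theorem III.8 above, $G_1(A) \le \hat G_1(A)$ and $G_0(A) \ge \hat G_0(A)$, rely only on $0 \le h(A) \le 1$, and can be checked on a discrete toy example. All numbers below are hypothetical choices made for illustration, not quantities from the paper:

```python
# Discrete sanity check of the contamination bounds used in Appendix F.
# Model: G_j = (1 + eps_j) * F_j - eps_j * H, with F_j and H probability vectors.
eps0, eps1 = 0.1, 0.2                # hypothetical robustness parameters
F0 = [0.5, 0.3, 0.2]                 # hypothetical nominal distribution under H0
F1 = [0.2, 0.3, 0.5]                 # hypothetical nominal distribution under H1
H  = [0.3, 0.4, 0.3]                 # hypothetical contamination measure

G0 = [(1 + eps0) * f - eps0 * h for f, h in zip(F0, H)]
G1 = [(1 + eps1) * f - eps1 * h for f, h in zip(F1, H)]
assert all(g >= 0 for g in G0 + G1)  # valid densities for this choice of H
assert abs(sum(G0) - 1) < 1e-9 and abs(sum(G1) - 1) < 1e-9

A = [0, 2]                           # an arbitrary event (subset of the support)
PA = lambda p: sum(p[i] for i in A)

# Upper bound under H1: G1(A) <= (1 + eps1) * F1(A), since h(A) >= 0
assert PA(G1) <= (1 + eps1) * PA(F1)
# Lower bound under H0: G0(A) >= 1 - (1 + eps0) * (1 - F0(A)), since h(A) <= 1
assert PA(G0) >= 1 - (1 + eps0) * (1 - PA(F0))
print("both contamination bounds hold")
```

The same two assertions pass for every event $A$ and every admissible $H$, which is exactly why (51) holds uniformly over the classes.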
Furthermore, from (52) and (53) we have

$$A_0 \cap A_5 = \{g_1^L < g_1 < g_1^U,\ g_0 = g_0^L\} = \{g_1^L/g_0^L < k_1 = g_1/g_0 < g_1^U/g_0^L\},$$
$$A_2 \cap A_3 = \{g_0^L < g_0 < g_0^U,\ g_1 = g_1^L\} = \{g_1^L/g_0^U < k_2 = g_1/g_0 < g_1^L/g_0^L\}. \tag{65}$$

The empty sets in (64) imply $A_2 \subset A_3$ and $A_5 \subset A_0$, which in turn imply $A_5 = A_0 \cap A_5$ and $A_2 = A_2 \cap A_3$. Accordingly, $A_2$ and $A_5$ can also be made explicit in (63). The sets $A_0$, $A_1$ and $A_2$ are disjoint, as are the sets $A_3$, $A_4$ and $A_5$. On $A_2$ we have $g_1^L/k_2 < g_0^U$, and due to continuity $\tfrac{1}{k_2} g_1^L = g_0^U$ on at least a single point. It also holds on at most a single point; if not, $A_1$ and $A_2$ are not disjoint. For $A_1$, the only choice left is then $A_1 = \{g_1^L/k_2 \ge g_0^U\}$. Similarly, i.e., considering $g_0^L < g_1^L/k_2$ on $A_2$ etc., we have $A_0 = \{g_0^L \ge g_1^L/k_2\}$. Performing the same analysis over $A_2 \cap A_3$ leads to the explicit definition of the sets $A_3$, $A_4$ and $A_5$. This implies that $A_1 \cap A_4$ is an empty set. Hence, $\hat g_0$, $\hat g_1$ and $\hat g_1/\hat g_0$ follow as defined by Theorem III.9, Type-C.

Following the same line of arguments for the cases $\hat g_0 = \tfrac{1}{k_2} g_1^U$ on $A_2$ and $\hat g_1 = k_1 g_0^U$ on $A_5$, we have

$$A_1 \cap A_5 = \{g_1^L < g_1 < g_1^U,\ g_0 = g_0^U\} = \{g_1^L/g_0^U < k_1 = g_1/g_0 < g_1^U/g_0^U\},$$
$$A_2 \cap A_4 = \{g_0^L < g_0 < g_0^U,\ g_1 = g_1^U\} = \{g_1^U/g_0^U < k_2 = g_1/g_0 < g_1^U/g_0^L\},$$

in the places of $A_0 \cap A_5$ and $A_2 \cap A_3$, respectively, an empty $A_0 \cap A_3$, and the explicit definition of the sets $A_j$, which leads to the LRF of Type-A and the corresponding LFDs. The LRF of Type-B is a special case arising from merging the middle three regions of the LRFs of Type-A and Type-C as $k_2 \to k_1$. Moreover, the LRFs of Type-A and Type-C tend to clipped likelihood ratio functions for $k_1$ small enough and $k_2$ large enough, and for $k_1$ large enough and $k_2$ small enough, respectively. This implies empty $A_0 \cap A_4$ and $A_1 \cap A_3$.
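The clipped limiting form mentioned above is Huber's classical censored likelihood ratio: inside a central region the nominal likelihood ratio is used, and outside it the ratio is held constant at the clipping levels. A minimal sketch, using a hypothetical Gaussian-shift pair $N(0,1)$ vs. $N(1,1)$ and arbitrary clipping levels (not the paper's derived $k_1$, $k_2$):

```python
from math import exp

def nominal_lr(y):
    """Likelihood ratio f1/f0 for the hypothetical pair N(1,1) vs. N(0,1)."""
    return exp(y - 0.5)  # f1(y)/f0(y) = exp(y - 1/2)

def clipped_lr(y, c_lo, c_hi):
    """Censored (clipped) likelihood ratio: constant outside [c_lo, c_hi]."""
    return min(max(nominal_lr(y), c_lo), c_hi)

# The clipped LRF coincides with the nominal LR in the middle region and
# saturates at the clipping levels in the tails.
for y in (-3.0, 0.5, 4.0):
    print(y, round(clipped_lr(y, c_lo=0.25, c_hi=4.0), 4))
```

Saturating the tails is what makes the test insensitive to outliers: a single grossly contaminated sample can shift the log-likelihood statistic by at most $\log c_{hi}$ or $\log c_{lo}$.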
REFERENCES

[1] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume 2: Detection Theory. Prentice Hall PTR, Jan. 1998.
[2] B. C. Levy, Principles of Signal Detection and Parameter Estimation, 1st ed. Springer Publishing Company, Incorporated, 2008.
[3] S. Kassam, G. Moustakides, and J. Shin, "Robust detection of known signals in asymmetric noise," IEEE Transactions on Information Theory, vol. 28, no. 1, pp. 84–91, Jan. 1982.
[4] A. El-Sawy and V. VandeLinde, "Robust detection of known signals," IEEE Transactions on Information Theory, vol. 23, no. 6, pp. 722–727, Nov. 1977.
[5] J. D. Gibson and J. L. Melsa, Introduction to Nonparametric Detection with Applications, ser. Mathematics in Science and Engineering. New York, San Francisco, London: Academic Press, 1975.
[6] P. J. Huber and E. M. Ronchetti, Robust Statistics, 2nd ed., ser. Wiley Series in Probability and Statistics. Hoboken, NJ: Wiley, 2009.
[7] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.
[8] G. Gül, Robust and Distributed Hypothesis Testing, ser. Lecture Notes in Electrical Engineering. Springer, 2017, vol. 414.
[9] B. C. Levy, "Robust hypothesis testing with a relative entropy tolerance," IEEE Transactions on Information Theory, vol. 55, no. 1, pp. 413–421, 2009.
[10] P. J. Huber, "A robust version of the probability ratio test," Ann. Math. Statist., vol. 36, pp. 1753–1758, 1965.
[11] P. J. Huber and V. Strassen, "Minimax tests and the Neyman–Pearson lemma for capacities," Annals of Statistics, vol. 1, no. 2, pp. 251–263, 1973.
[12] A. G. Dabak and D. H. Johnson, "Geometrically based robust detection," in Proceedings of the Conference on Information Sciences and Systems, Johns Hopkins University, Baltimore, MD, May 1994, pp. 73–77.
[13] C. Pandit and S. Meyn, "Worst-case large-deviation asymptotics with application to queueing and information theory," Stochastic Processes and their Applications, vol. 116, no. 5, pp. 724–756, 2006.
[14] P. J. Huber and V. Strassen, "Robust confidence limits," Z. Wahrscheinlichkeitstheorie verw. Gebiete, vol. 10, pp. 269–278, 1968.
[15] ——, "Minimax tests and the Neyman–Pearson lemma for capacities," Ann. Statistics, vol. 1, pp. 251–263, 1973.
[16] G. Gül, "Minimax robust decentralized hypothesis testing for parallel sensor networks," IEEE Transactions on Information Theory, vol. 67, no. 1, pp. 538–548, 2021.
[17] K. Vastola and H. Poor, "On the p-point uncertainty class (Corresp.)," IEEE Transactions on Information Theory, vol. 30, no. 2, pp. 374–376, Mar. 1984.
[18] S. Kassam, "Robust hypothesis testing for bounded classes of probability densities (Corresp.)," IEEE Transactions on Information Theory, vol. 27, no. 2, pp. 242–247, Mar. 1981.
[19] M. Fauß and A. M. Zoubir, "Old bands, new tracks-revisiting the band model for robust hypothesis testing," IEEE Transactions on Signal Processing, vol. 64, no. 22, pp. 5875–5886, Nov. 2016.
[20] R. Martin and S. Schwartz, "Robust detection of a known signal in nearly Gaussian noise," IEEE Transactions on Information Theory, vol. 17, no. 1, pp. 50–56, Jan. 1971.
[21] S. Kassam and J. Thomas, "Asymptotically robust detection of a known signal in contaminated non-Gaussian noise," IEEE Transactions on Information Theory, vol. 22, no. 1, pp. 22–26, Jan. 1976.
[22] R. Martin and C. McGath, "Robust detection of stochastic signals (Corresp.)," IEEE Transactions on Information Theory, vol. 20, no. 4, pp. 537–541, Jul. 1974.
[23] J. E. Smith, "Generalized Chebyshev inequalities: Theory and applications in decision analysis," Oper. Res., vol. 43, no. 5, pp. 807–825, Oct. 1995.
[24] F. Brichet and A. Simonian, "Conservative Gaussian models applied to measurement-based admission control," in Sixth International Workshop on Quality of Service (IWQoS 98), May 1998, pp. 68–71.
[25] M. A. Johnson and M. R. Taaffe, "An investigation of phase-distribution moment-matching algorithms for use in queueing models," Queueing Systems, vol. 8, no. 1, pp. 129–147, Dec. 1991.
[26] H. Rahimian and S. Mehrotra, "Distributionally robust optimization: A review," Optimization Online, 2019.
[27] R. Gao, Y. Xie, L. Xie, and H. Xu, "Robust hypothesis testing using Wasserstein uncertainty sets," in Advances in Neural Information Processing Systems (NeurIPS), 2018.
[28] J. Wang, R. Gao, and Y. Xie, "Non-convex robust hypothesis testing using Sinkhorn uncertainty sets," 2024.
[29] Z. Sun and S. Zou, "Kernel robust hypothesis testing," arXiv preprint arXiv:2205.01755, 2022.
[30] A. Schrab and I. Kim, "Robust kernel hypothesis testing under data corruption," arXiv preprint, 2024.
[31] J. Wang, R. Gao, and Y. Xie, "A data-driven approach to robust hypothesis testing," arXiv preprint arXiv:2201.02657, 2022.
[32] M. Puranik, U. Madhow, and R. Pedarsani, "Generalized likelihood ratio test for adversarially robust hypothesis testing," arXiv:2105.00182, 2021.
[33] G. Gül, "Asymptotically minimax robust likelihood ratio test," 2026. [Online]. Available: https://arxiv.org/abs/2602.08174
[34] A. Magesh, Z. Sun, V. V. Veeravalli, and S. Zou, "Robust multi-hypothesis testing with moment-constrained uncertainty sets," in 2024 IEEE International Symposium on Information Theory (ISIT), 2024, pp. 849–854.
[35] A. El-Sawy and V. VandeLinde, "Robust sequential detection of signals in noise," IEEE Transactions on Information Theory, vol. 25, no. 3, pp. 346–353, May 1979.
[36] D. Ralph, "Global convergence of damped Newton's method for nonsmooth equations via the path search," Math. Oper. Res., vol. 19, no. 2, pp. 352–389, 1994.
[37] F. A. Potra and S. J. Wright, "Interior-point methods," J. Comput. Appl. Math., vol. 124, no. 1–2, pp. 281–302, Dec. 2000.
[38] G. Gül and A. M. Zoubir, "Minimax robust hypothesis testing," IEEE Transactions on Information Theory, vol. 63, no. 9, pp. 5572–5587, 2017.
[39] F. Österreicher and I. Vajda, "Statistical information and discrimination," IEEE Transactions on Information Theory, vol. 39, no. 3, pp. 1036–1039, May 1993.