Universal and Composite Hypothesis Testing via Mismatched Divergence


For the universal hypothesis testing problem, where the goal is to decide between the known null hypothesis distribution and some other unknown distribution, Hoeffding proposed a universal test in the 1960s. Hoeffding's universal test statistic can be written in terms of the Kullback-Leibler (KL) divergence between the empirical distribution of the observations and the null hypothesis distribution. In this paper a modification of Hoeffding's test is considered, based on a relaxation of the KL divergence test statistic referred to as the mismatched divergence. The resulting mismatched test is shown to be a generalized likelihood-ratio test (GLRT) for the case where the alternate distribution lies in a parametric family of distributions characterized by a finite-dimensional parameter, i.e., it is a solution to the corresponding composite hypothesis testing problem. For certain choices of the alternate distribution, it is shown that both the Hoeffding test and the mismatched test have the same asymptotic performance in terms of error exponents. A consequence of this result is that the GLRT is optimal in differentiating a particular distribution from others in an exponential family. It is also shown that the mismatched test has a significant advantage over the Hoeffding test in terms of finite sample size performance. This advantage is due to the difference in the asymptotic variances of the two test statistics under the null hypothesis. In particular, the variance of the KL divergence grows linearly with the alphabet size, making the test impractical for applications involving large-alphabet distributions. The variance of the mismatched divergence, on the other hand, grows linearly with the dimension of the parameter space, and can hence be controlled through a prudent choice of the function class defining the mismatched divergence.
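The variance comparison above can be illustrated numerically. Under the null hypothesis, $2n$ times the KL statistic is asymptotically chi-squared with $|\mathcal{X}|-1$ degrees of freedom, so its mean and variance both grow with the alphabet size. A quick Monte Carlo sketch (uniform null distribution and the specific alphabet sizes here are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_scaled_kl(alphabet_size, n=2000, trials=200):
    """Monte Carlo mean of 2n * D(P_hat || Q) under H0, with Q uniform.

    Asymptotically this is the mean of a chi-squared random variable
    with (alphabet_size - 1) degrees of freedom.
    """
    q = np.full(alphabet_size, 1.0 / alphabet_size)
    vals = []
    for _ in range(trials):
        counts = rng.multinomial(n, q)      # one sample of size n from Q
        p_hat = counts / n                  # empirical distribution
        mask = p_hat > 0                    # 0 * log 0 = 0 by convention
        vals.append(2 * n * np.sum(p_hat[mask] * np.log(p_hat[mask] / q[mask])))
    return float(np.mean(vals))

print(mean_scaled_kl(5))    # close to 4  (= |X| - 1)
print(mean_scaled_kl(50))   # close to 49
```

The statistic's typical size under $H_0$ scales with the alphabet, which is exactly why a fixed rejection threshold becomes unreliable for large alphabets at moderate sample sizes.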


💡 Research Summary

The paper revisits the classic universal hypothesis-testing problem, where one must decide between a fully known null distribution $Q$ and an unknown alternative. Hoeffding's seminal test from the 1960s uses the Kullback-Leibler (KL) divergence between the empirical distribution $\hat{P}_n$ of the observations and the null $Q$ as its statistic. While optimal in the sense of achieving the best error exponent, the KL-based statistic suffers from a variance that grows linearly with the alphabet size $|\mathcal{X}|$. This makes the test impractical for high-dimensional or large-alphabet applications such as text, genomics, or network traffic analysis, where the number of possible symbols can be in the thousands or more.
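Hoeffding's statistic is straightforward to compute for a finite alphabet. The following is a minimal sketch, not the paper's implementation; the alphabet and the distribution $Q$ are illustrative assumptions:

```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence D(p || q) between discrete distributions (arrays summing to 1)."""
    mask = p > 0                            # 0 * log 0 = 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def hoeffding_statistic(samples, q):
    """Hoeffding test statistic D(P_hat_n || Q) over the alphabet {0, ..., len(q)-1}."""
    counts = np.bincount(samples, minlength=len(q))
    p_hat = counts / counts.sum()           # empirical distribution of the observations
    return kl_divergence(p_hat, q)

# Illustration: the test rejects H0 when the statistic exceeds a threshold.
q = np.array([0.5, 0.3, 0.2])               # hypothetical null distribution
rng = np.random.default_rng(0)
samples = rng.choice(3, size=1000, p=q)     # data drawn from Q itself
print(hoeffding_statistic(samples, q))      # small when H0 holds
```

When the data are drawn from $Q$ the statistic concentrates near zero, while data from any fixed alternative drive it toward the (positive) KL divergence between that alternative and $Q$.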

To address this limitation, the authors introduce a relaxed divergence called the mismatched divergence $D_{\mathcal{F}}(\hat{P}_n \| Q)$. The divergence is defined with respect to a user-chosen function class $\mathcal{F} = \{f_\theta : \theta \in \Theta\}$ of finite dimension $d = \dim(\Theta)$:

$$D_{\mathcal{F}}(\hat{P}_n \| Q) = \sup_{\theta \in \Theta} \Big( \mathbb{E}_{\hat{P}_n}[f_\theta] - \log \mathbb{E}_{Q}\big[e^{f_\theta}\big] \Big).$$

This is the variational (dual) representation of KL divergence with the supremum restricted from all functions to the class $\mathcal{F}$; consequently $D_{\mathcal{F}}(\hat{P}_n \| Q) \le D(\hat{P}_n \| Q)$, with equality when $\mathcal{F}$ is rich enough.
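The mismatched divergence (a supremum of $\mathbb{E}_{\hat{P}_n}[f_\theta] - \log \mathbb{E}_{Q}[e^{f_\theta}]$ over the function class) can be computed by a smooth concave maximization over $\theta$. A minimal numerical sketch, assuming a linear function class $f_\theta(x) = \theta^\top \psi(x)$; the basis matrix `psi` and the distributions are illustrative choices, not taken from the paper:

```python
import numpy as np
from scipy.optimize import minimize

def mismatched_divergence(p_hat, q, psi):
    """D_F(p_hat || Q) = sup_theta  E_{p_hat}[f_theta] - log E_Q[exp(f_theta)]
    for the linear family f_theta(x) = theta . psi(x), where psi is a
    (|X|, d) matrix of basis-function values on the alphabet."""
    def neg_objective(theta):
        f = psi @ theta                     # f_theta evaluated on each symbol
        return -(p_hat @ f - np.log(q @ np.exp(f)))
    result = minimize(neg_objective, np.zeros(psi.shape[1]))
    return float(-result.fun)

q = np.array([0.5, 0.3, 0.2])               # hypothetical null distribution
p_hat = np.array([0.6, 0.25, 0.15])         # hypothetical empirical distribution
psi = np.array([[1.0], [0.0], [0.0]])       # d = 1: a single indicator feature
print(mismatched_divergence(p_hat, q, psi))
```

With a rank-sufficient basis (here any two independent functions on a three-letter alphabet, since constants do not affect the objective), the supremum recovers the full KL divergence; with a low-dimensional class it gives a lower bound whose statistical fluctuations scale with $d$ rather than $|\mathcal{X}|$.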

