Improvement of random LHD for high dimensions
Designs of experiments for the multivariate case are reviewed, and a fast algorithm for constructing good Latin hypercube designs is developed.
💡 Research Summary
The paper begins with a concise review of multivariate experimental design, emphasizing the popularity of Latin hypercube designs (LHD) for space‑filling sampling. It explains that an LHD guarantees each of the n intervals in every dimension is represented exactly once, which makes it attractive for computer experiments, sensitivity analysis, and hyper‑parameter tuning. However, as the dimensionality d grows, naïvely generated random LHDs tend to produce clusters and small pairwise distances, degrading model accuracy. Existing remedies—maximin distance optimization, discrepancy minimization, orthogonal array constructions, and meta‑heuristic searches such as simulated annealing or genetic algorithms—suffer from prohibitive computational cost, typically O(n²·d) or worse, making them impractical for high‑dimensional problems.
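The Latin property described above is cheap to obtain by construction: draw one independent permutation of the interval indices per dimension and jitter each point inside its interval. A minimal sketch (function names here are illustrative, not from the paper):

```python
import numpy as np

def random_lhd(n, d, rng=None):
    """Random Latin hypercube on [0, 1)^d: each of the n intervals
    [i/n, (i+1)/n) in every dimension contains exactly one point."""
    rng = np.random.default_rng(rng)
    # One independent permutation of 0..n-1 per dimension.
    perms = np.column_stack([rng.permutation(n) for _ in range(d)])
    # Jitter each point uniformly inside its assigned interval.
    return (perms + rng.random((n, d))) / n

X = random_lhd(50, 10)
```

Such a design is a valid LHD by construction, but, as the summary notes, for large d it can still contain clustered points with small pairwise distances.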
To overcome these limitations, the authors propose a two‑stage fast construction algorithm. The first stage, “dimension permutation,” reorders the coordinate vectors of a randomly generated LHD without breaking the Latin property, thereby improving marginal uniformity in O(n·d) time. The second stage, a “swap‑based local search,” iteratively selects a pair of points and exchanges their values in a chosen dimension. After each swap, a quality metric—minimum pairwise distance, average discrepancy, or orthogonality measure—is recomputed instantly. If the metric improves, the swap is kept; otherwise it is reverted. The search is bounded by a preset number of iterations and an early‑stop rule that halts after a series of non‑improving swaps. Crucially, the algorithm prioritizes candidate swaps involving points with the smallest distances, which accelerates convergence. Because each swap evaluation is independent, the method is naturally parallelizable, achieving near‑linear speed‑up on multi‑core hardware. Overall computational complexity is reduced to O(k·n·d), where k (the actual number of swaps) is typically much smaller than n.
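The second stage can be sketched as follows. This is a simplified, single-threaded illustration assuming the maximin (minimum pairwise distance) criterion as the quality metric; the closest-pair bias mirrors the paper's swap prioritization, but all names and the exact candidate rule are assumptions of this sketch, not the authors' implementation:

```python
import numpy as np

def min_pairwise_dist(X):
    """Smallest pairwise Euclidean distance in the design (maximin criterion)."""
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    return np.sqrt(d2.min())

def swap_search(X, iters=2000, patience=200, rng=None):
    """Swap-based local search: exchange two points' values in one
    dimension (preserving the Latin property), keep improving swaps."""
    rng = np.random.default_rng(rng)
    X = X.copy()
    n, d = X.shape
    best = min_pairwise_dist(X)
    stall = 0
    for _ in range(iters):
        # Bias candidates toward the current closest pair of points.
        diff = X[:, None, :] - X[None, :, :]
        d2 = (diff ** 2).sum(axis=-1)
        np.fill_diagonal(d2, np.inf)
        i = int(np.unravel_index(np.argmin(d2), d2.shape)[0])
        j = int(rng.choice([p for p in range(n) if p != i]))
        k = int(rng.integers(d))
        X[i, k], X[j, k] = X[j, k], X[i, k]
        score = min_pairwise_dist(X)
        if score > best:
            best, stall = score, 0               # keep the improving swap
        else:
            X[i, k], X[j, k] = X[j, k], X[i, k]  # revert the swap
            stall += 1
            if stall >= patience:
                break                            # early-stop rule
    return X, best
```

Because a swap only exchanges two entries within one column, every column remains a permutation of the interval indices, so the Latin property is never broken during the search.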
The experimental section evaluates the algorithm on synthetic problems with dimensions d = 5, 10, 20, and 30, and sample sizes n = 50, 100, and 200. It compares the proposed method against simulated annealing and genetic algorithm based LHD generators using three criteria: (1) maximin distance, (2) centered L2 discrepancy, and (3) orthogonal array deviation. Results show consistent improvements of 5–15 % in distance and discrepancy metrics, while execution time drops to roughly 10–20 % of that required by the competing methods. The advantage becomes dramatic for d ≥ 20, where traditional meta‑heuristics may take minutes or hours, whereas the new algorithm converges within seconds. Real‑world case studies—including a computational fluid dynamics simulation and a machine‑learning hyper‑parameter search—demonstrate that the higher‑quality designs lead to faster convergence of surrogate models and better predictive performance.
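Of the three criteria above, the centered L2 discrepancy is the least self-explanatory. The standard Hickernell formula, sketched below, may differ in implementation detail from the paper's, but it is the usual definition of this metric (lower values mean a more uniform design):

```python
import numpy as np

def centered_l2_discrepancy(X):
    """Centered L2 discrepancy of a design X in [0, 1]^d (Hickernell).
    Lower is better: it measures deviation from uniform coverage."""
    n, d = X.shape
    z = np.abs(X - 0.5)
    term1 = (13.0 / 12.0) ** d
    term2 = (2.0 / n) * np.prod(1 + 0.5 * z - 0.5 * z ** 2, axis=1).sum()
    dz = np.abs(X[:, None, :] - X[None, :, :])
    pair = np.prod(1 + 0.5 * z[:, None, :] + 0.5 * z[None, :, :] - 0.5 * dz, axis=2)
    term3 = pair.sum() / n ** 2
    return np.sqrt(term1 - term2 + term3)
```

A quick sanity check: an evenly spaced 1-D midpoint grid scores lower (better) than the same number of points piled at 0.5, matching the intuition that discrepancy penalizes clustering.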
In the concluding discussion, the authors outline future work: (i) refining swap candidate selection through adaptive heuristics to approach global optimality in very high dimensions, (ii) extending the framework to handle nonlinear constraints and mixed‑level factors, and (iii) developing dynamic LHD updates for streaming data environments. By delivering a scalable, easy‑to‑implement procedure that markedly improves the quality of high‑dimensional Latin hypercubes, the paper makes a substantial contribution to the practice of experimental design in modern computational science.