Billions-Scale Forecast Reconciliation
The problem of combining multiple forecasts of related quantities that obey expected equality and additivity constraints, often referred to as hierarchical forecast reconciliation, is naturally stated as a simple optimization problem. In this paper we explore optimization-based point forecast reconciliation at scales faced by large retailers. We implement and benchmark several algorithms to solve the forecast reconciliation problem, showing efficacy when the dimension of the problem exceeds four billion forecasted values. To the best of our knowledge, this is the largest forecast reconciliation problem, and perhaps on par with the largest constrained least-squares problem, ever solved. We also make several theoretical contributions. We show that for a restricted class of problems, and when the loss function is weighted appropriately, least-squares forecast reconciliation is equivalent to share-based forecast reconciliation. This formalizes how the optimization-based approach can be thought of as a generalization of share-based reconciliation, applicable to multiple, overlapping data hierarchies.
💡 Research Summary
The paper addresses the practical challenge of reconciling large‑scale hierarchical forecasts at a major retailer, where billions of time‑series values must satisfy additive aggregation constraints across multiple overlapping hierarchies (temporal, product, and sort‑type). The authors formulate the problem as a weighted least‑squares projection with linear equality constraints \(Ay = 0\) and a non‑negativity constraint \(y \ge 0\). Here \(A\) is a huge sparse matrix (47,213 rows, 4,185,173,500 columns) encoding the aggregation rules, and the weight matrix \(W\) is diagonal, with entries typically set to the reciprocals of the original forecasts to normalize across scales.
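As a concrete illustration of this formulation (toy, made‑up numbers): dropping the non‑negativity constraint for a moment, the weighted least‑squares projection onto \(Ay = 0\) has a closed form, sketched here for a single parent/two‑child constraint:

```python
import numpy as np

# Toy instance: one parent forecast and two child forecasts that should
# satisfy parent - child1 - child2 = 0, i.e. A y = 0.
y_hat = np.array([100.0, 30.0, 80.0])  # incoherent base forecasts (30 + 80 != 100)
A = np.array([[1.0, -1.0, -1.0]])      # single aggregation constraint

# Diagonal weights w_i = 1 / y_hat_i, as described above; W = diag(w).
W_inv = np.diag(y_hat)                 # inverse of W = diag(1 / y_hat)

# Equality-constrained weighted least-squares projection (closed form):
#   y = y_hat - W^{-1} A^T (A W^{-1} A^T)^{-1} A y_hat
K = A @ W_inv @ A.T
y = y_hat - W_inv @ A.T @ np.linalg.solve(K, A @ y_hat)

print(y)      # reconciled forecasts, approx. [104.76, 28.57, 76.19]
print(A @ y)  # approx. [0.]: the aggregation constraint now holds
```

In this toy case the projection already lands in the non‑negative orthant, so no further correction is needed; at the paper's scale, \(A W^{-1} A^\top\) would of course be handled with sparse linear algebra rather than dense solves.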
Because the problem size far exceeds the capacity of off‑the‑shelf convex solvers (interior‑point methods such as ECOS, Clarabel, or commercial packages like Gurobi), the authors develop three custom algorithms that exploit the problem’s structure and the sparsity of (A):
- Alternating Projections (AP) – iteratively projects onto the affine subspace defined by \(Ay=0\) and onto the non‑negative orthant. The projection onto the affine subspace reduces to a sparse linear solve involving \(W^{-1}A^\top (AW^{-1}A^\top)^{-1}\). Although AP is only guaranteed to converge to some point in the intersection of the sets, not to the true projection, it quickly yields a high‑quality approximate solution in practice.
- Dykstra’s Algorithm – a refinement of AP that introduces auxiliary variables to enforce exact projection onto the intersection of convex sets. This method guarantees convergence to the true projection even when one of the sets (the non‑negative orthant) is not a linear subspace.
- Alternating Direction Method of Multipliers (ADMM) – the authors split the variables into \(y\) (subject to the equality constraints) and \(z\) (subject to non‑negativity). The ADMM updates consist of (i) solving an equality‑constrained quadratic program via the KKT system \(\begin{bmatrix}H & A^\top \\ A & 0\end{bmatrix}\), where \(H = 2W + \rho I\) can be pre‑computed, (ii) a simple element‑wise projection \(z = \max(y + u, 0)\), and (iii) a dual variable update. Residual‑based stopping criteria ensure that both primal feasibility and dual optimality are monitored.
The authors benchmark these algorithms on a real‑world dataset comprising five distinct forecast collections (weekly SKU‑level, daily product‑group, monthly product‑family, monthly sort‑type, and daily sort‑type) over an 18‑month horizon. The combined forecast vector contains roughly 4.2 billion entries, and the aggregation matrix encodes 47 k constraints that capture relationships such as “daily sort‑type forecasts sum to monthly sort‑type forecasts” and “weekly SKU forecasts must agree with day‑to‑week aggregations.”
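The aggregation matrix itself is just a sparse pattern of ±1 entries. A sketch (with invented series counts, a simplified 30‑day month, and an assumed block layout of the stacked forecast vector) of encoding one such family of constraints, daily values summing to a monthly value, might look like:

```python
import numpy as np
from scipy import sparse

# Hypothetical layout: the stacked forecast vector is [monthly | daily];
# each constraint row puts +1 on a monthly entry and -1 on its 30 daily entries.
n_series, days_per_month = 4, 30
n_monthly = n_series
n_daily = n_series * days_per_month

rows, cols, vals = [], [], []
for s in range(n_series):
    rows.append(s); cols.append(s); vals.append(1.0)   # +1 on the monthly entry
    for d in range(days_per_month):                    # -1 on each daily entry
        rows.append(s)
        cols.append(n_monthly + s * days_per_month + d)
        vals.append(-1.0)

A = sparse.csr_matrix((vals, (rows, cols)), shape=(n_series, n_monthly + n_daily))

# A coherent vector (each month equal to the sum of its days) lies in A's null space.
y = np.concatenate([np.full(n_monthly, 30.0), np.ones(n_daily)])
print(np.abs(A @ y).max())  # 0.0 for this coherent y
```

Each of the paper's 47 k constraints is a row of this kind, so \(A\) stays extremely sparse even with billions of columns.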
Experiments run on an AWS EC2 u7i‑8tb.112xlarge instance (8 TB RAM, 96 vCPUs) using Python, Pandas, and SciPy’s sparse linear algebra. Results show:
- ADMM converges in about 20–40 iterations, taking roughly 42 minutes to achieve a primal/dual residual below \(10^{-5}\). It provides the most reliable convergence and the highest solution accuracy.
- Dykstra needs slightly more iterations (≈35) and about 58 minutes, but its final solution is within 0.001 % of the ADMM optimum.
- Alternating Projections reaches a coarse solution in only 15 iterations and 22 minutes, with a relative error of about 0.02 %—acceptable for many operational settings but lacking a formal optimality guarantee.
A key theoretical contribution is the demonstration that, when weights are set to \(\omega_i = 1/\hat y_i\) and the rows of \(A\) have disjoint supports (i.e., a simple two‑level hierarchy), the unconstrained weighted least‑squares solution is automatically non‑negative. This eliminates the need for the explicit non‑negativity constraint in such cases, reducing both memory and computation. Moreover, the authors prove that for general tree‑structured hierarchies, “top‑heavy” weighting (large weights on higher‑level nodes) and “bottom‑heavy” weighting (large weights on leaf nodes) produce limiting solutions that coincide with traditional share‑based (top‑down) and aggregate‑based (bottom‑up) reconciliation, respectively. Thus, the optimization framework subsumes classic heuristic methods as special cases.
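This limiting behavior is easy to check numerically on a toy two‑level hierarchy (illustrative numbers; the non‑negativity constraint is omitted since it is inactive here):

```python
import numpy as np

def reconcile(y_hat, A, w):
    """Equality-constrained weighted least squares (non-negativity omitted)."""
    W_inv = np.diag(1.0 / w)
    K = A @ W_inv @ A.T
    return y_hat - W_inv @ A.T @ np.linalg.solve(K, A @ y_hat)

y_hat = np.array([100.0, 30.0, 80.0])   # parent, child1, child2 (30 + 80 != 100)
A = np.array([[1.0, -1.0, -1.0]])       # parent - child1 - child2 = 0

# Top-heavy: a huge weight pins the parent; the children are rescaled by
# their shares 30/110 and 80/110 of the parent total (share-based top-down).
top = reconcile(y_hat, A, np.array([1e9, 1/30, 1/80]))
print(top)     # approx. [100.0, 27.27, 72.73]

# Bottom-heavy: huge weights pin the leaves; the parent becomes their sum
# (bottom-up aggregation).
bottom = reconcile(y_hat, A, np.array([1/100, 1e9, 1e9]))
print(bottom)  # approx. [110.0, 30.0, 80.0]
```

The shares 27.27 and 72.73 are exactly what a share‑based top‑down split of the parent forecast would produce, matching the equivalence claimed by the theory.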
The paper also discusses practical considerations: the choice of the ADMM penalty parameter \(\rho\) significantly influences convergence speed; a modest amount of empirical tuning yields a value that works well across similar reconciliation tasks. The authors note that while their implementation runs on a single high‑memory node, the underlying algorithms (especially the sparse linear solves) could be parallelized or distributed for even larger scales.
In summary, this work pushes the frontier of hierarchical forecast reconciliation from millions or tens of millions of series to the multi‑billion regime, demonstrating that with careful algorithmic design—alternating projections, Dykstra’s algorithm, and especially ADMM—such problems become tractable on commodity cloud hardware. The theoretical insights linking weighted least‑squares to share‑based reconciliation provide a unifying perspective that bridges operational practice and optimization theory. The results are directly applicable to large retailers, e‑commerce platforms, and any organization that must produce coherent forecasts across deeply nested, overlapping product and time hierarchies at massive scale.