Prediction with Expert Advice under Discounted Loss
We study prediction with expert advice in the setting where the losses are accumulated with some discounting—the impact of old losses may gradually vanish. We generalize the Aggregating Algorithm and the Aggregating Algorithm for Regression to this case, propose a suitable new variant of the exponential weights algorithm, and prove respective loss bounds.
💡 Research Summary
The paper addresses the classic problem of prediction with expert advice in an online setting, but introduces a novel twist: the accumulated loss is discounted over time so that older errors gradually lose influence. This reflects many real‑world scenarios—such as financial trading, traffic forecasting, or recommendation systems—where the relevance of past mistakes decays. The authors formalize the discounted loss by a factor γ∈(0,1], defining the cumulative discounted loss at round t as L_t = ∑_{s=1}^t γ^{t‑s}ℓ_s, where ℓ_s is the instantaneous loss. When γ=1 the model collapses to the standard undiscounted case; when γ<1 the contribution of earlier rounds shrinks geometrically.
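As a small sanity check (function and variable names here are illustrative, not from the paper), the discounted cumulative loss can be computed either directly from the definition or via the equivalent recursion L_t = γ·L_{t−1} + ℓ_t:

```python
# Illustrative sketch: cumulative discounted loss L_t = sum_{s<=t} gamma^(t-s) * l_s,
# computed via the equivalent recursion L_t = gamma * L_{t-1} + l_t.

def discounted_loss(losses, gamma):
    """Discounted cumulative loss after the final round."""
    total = 0.0
    for l in losses:
        total = gamma * total + l  # every old loss shrinks by one factor of gamma
    return total

losses, gamma = [1.0, 0.5, 2.0], 0.9
# Direct evaluation of the definition, for comparison:
direct = sum(gamma ** (len(losses) - 1 - s) * l for s, l in enumerate(losses))
assert abs(discounted_loss(losses, gamma) - direct) < 1e-12
# gamma = 1 recovers the standard undiscounted sum:
assert discounted_loss(losses, 1.0) == sum(losses)
```

The recursive form makes the geometric decay explicit: each round, all previously accumulated loss is multiplied once by γ before the new instantaneous loss is added.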
The first major contribution is a systematic generalisation of the Aggregating Algorithm (AA) and its regression variant (AA‑R) to this discounted setting. In the original AA, expert i’s weight is updated as w_i^{(t+1)}∝exp(−η·S_i^{(t)}), where S_i^{(t)} is the total loss up to round t. The authors replace S_i^{(t)} with the discounted sum D_i^{(t)} = ∑_{s=1}^t γ^{t‑s}ℓ_i^{(s)} and derive the update rule
w_i^{(t+1)} = (w_i^{(t)})^{γ}·exp(−η·ℓ_i^{(t)}).
Raising the previous weight to the power γ implements the recursion D_i^{(t)} = γ·D_i^{(t−1)} + ℓ_i^{(t)}, so the weight of expert i stays proportional to exp(−η·D_i^{(t)}) at every round. For the regression case, the same principle is applied to squared‑error losses, leading to a weighted least‑squares update that respects the discount factor. The authors also discuss how to choose the learning rate η as a function of γ, the loss range, and the horizon T to obtain optimal theoretical guarantees.
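One round of the discounted update can be sketched as follows (names and numbers are my own, with uniform initial weights assumed); the key invariant is that the unnormalised weight of expert i stays proportional to exp(−η·D_i^{(t)}), with D_i^{(t)} the discounted sum defined above:

```python
import math

def aa_discounted_step(weights, losses, gamma, eta):
    """One round of the discounted weight update: shrink past evidence by
    raising to the power gamma, then apply the current loss multiplicatively."""
    return [w ** gamma * math.exp(-eta * l) for w, l in zip(weights, losses)]

# Two experts, three rounds of made-up losses:
expert_losses = [[0.2, 0.8], [0.1, 0.4], [0.9, 0.3]]
gamma, eta = 0.9, 0.5

w = [1.0, 1.0]
D = [0.0, 0.0]  # discounted loss via D_i^(t) = gamma * D_i^(t-1) + l_i^(t)
for losses in expert_losses:
    w = aa_discounted_step(w, losses, gamma, eta)
    D = [gamma * d + l for d, l in zip(D, losses)]

# Invariant: w_i == exp(-eta * D_i) up to rounding, since weights started at 1.
for wi, di in zip(w, D):
    assert abs(wi - math.exp(-eta * di)) < 1e-12
```

Exponentiating the old weight by γ discounts every past loss by exactly one step, which is what keeps the update consistent with the definition of D_i^{(t)}.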
The second contribution is a new exponential‑weights scheme designed specifically for discounted losses, termed Exponential Weights with Discounting (EWD). EWD modifies the classic Hedge algorithm by discounting the accumulated weights before each multiplicative update:
w_i^{(t+1)} = (w_i^{(t)})^{γ}·exp(−η·ℓ_i^{(t)}),
followed by normalisation to obtain a probability distribution over experts. The paper proves two complementary regret bounds for EWD. In expectation, the discounted regret—defined as the difference between the algorithm’s discounted loss and that of the best expert in hindsight—is bounded by O(√(T·log N)/(1‑γ)), where N is the number of experts and T the total number of rounds. A high‑probability bound of the same order is also established using concentration inequalities. These results smoothly interpolate between the undiscounted case (γ≈1) and the heavily discounted regime (γ≪1).
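Putting the pieces together, a compact sketch of this scheme (function and variable names are illustrative, not from the paper) maintains a normalised distribution over experts:

```python
import math

def ewd(expert_losses, gamma, eta):
    """Exponential weights with discounting: returns the final probability
    distribution over experts after a sequence of per-round loss vectors."""
    n = len(expert_losses[0])
    w = [1.0 / n] * n  # uniform prior over the N experts
    for losses in expert_losses:
        # Discount old evidence, then penalise the current losses.
        w = [wi ** gamma * math.exp(-eta * li) for wi, li in zip(w, losses)]
        z = sum(w)
        w = [wi / z for wi in w]  # renormalise to a probability distribution
    return w

# An expert that always suffers zero loss should come to dominate:
p = ewd([[0.0, 1.0]] * 10, gamma=0.9, eta=0.5)
assert p[0] > 0.9 and abs(sum(p) - 1.0) < 1e-12
```

Because normalisation only rescales all weights by a common factor, it does not affect the weight ratios that the discounted update maintains, so the distribution can be renormalised at every round without changing the algorithm's predictions.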
The theoretical analysis relies on a potential‑function argument. The authors define Φ^{(t)} = ∑_i w_i^{(t)} and examine the evolution of log Φ^{(t)} under the discounted update. By applying the convexity of the exponential function and bounding the contribution of each loss term with its discount factor, they derive the regret inequalities. The analysis also covers both bounded losses in