The on-line shortest path problem under partial monitoring


The on-line shortest path problem is considered under various models of partial monitoring. Given a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, a decision maker has to choose in each round of a game a path between two distinguished vertices such that the loss of the chosen path (defined as the sum of the weights of its composing edges) be as small as possible. In a setting generalizing the multi-armed bandit problem, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this problem, an algorithm is given whose average cumulative loss in n rounds exceeds that of the best path, matched off-line to the entire sequence of the edge weights, by a quantity that is proportional to 1/√n and depends only polynomially on the number of edges of the graph. The algorithm can be implemented with linear complexity in the number of rounds n and in the number of edges. An extension to the so-called label efficient setting is also given, in which the decision maker is informed about the weights of the edges corresponding to the chosen path at a total of m ≪ n time instances. Another extension is shown where the decision maker competes against a time-varying path, a generalization of the problem of tracking the best expert. A version of the multi-armed bandit setting for shortest path is also discussed where the decision maker learns only the total weight of the chosen path but not the weights of the individual edges on the path. Applications to routing in packet switched networks along with simulation results are also presented.


💡 Research Summary

The paper tackles the online shortest‑path problem on a weighted directed acyclic graph (DAG) where edge weights may change arbitrarily from round to round, possibly in an adversarial manner. In each round a decision maker must select a path from a designated source to a target; the loss of the chosen path is the sum of the weights of its constituent edges. The central difficulty is that the decision maker does not observe the full weight vector after each round but only a partial view, which leads to several distinct monitoring models.
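The loss definition above is straightforward to make concrete. A minimal sketch follows; the tiny graph, vertex names, and weight values are invented for illustration and are not taken from the paper.

```python
# Toy DAG from source "s" to target "t"; edge weights lie in [0, 1]
# (illustrative values only, chosen to be exactly representable).
edges = {("s", "a"): 0.25, ("s", "b"): 0.5,
         ("a", "t"): 0.25, ("b", "t"): 0.125}

def path_loss(path, weights):
    """Loss of a path = sum of the weights of its constituent edges."""
    return sum(weights[(u, v)] for u, v in zip(path, path[1:]))

print(path_loss(["s", "a", "t"], edges))  # 0.5
print(path_loss(["s", "b", "t"], edges))  # 0.625
```

In the full-information version of the game the learner would see every entry of `edges` after each round; the monitoring models below restrict this feedback.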

Partial‑monitoring models

  1. Edge‑level bandit – after playing a path the learner sees the individual weights of the edges that belong to that path. This is a natural generalization of the classic multi‑armed bandit problem to a combinatorial setting.
  2. Label‑efficient – the learner is allowed to query edge weights only at a limited number m ≪ n of the n total rounds. The query times are chosen in advance (or randomly) and the algorithm must still guarantee low regret.
  3. Tracking (time‑varying comparator) – instead of competing against a single best static path, the learner competes against a sequence of paths that may change a bounded number of times (the “switches”). This extends the problem to the well‑studied “tracking the best expert” scenario.
  4. Total‑loss bandit – the learner observes only the total loss of the selected path, not the individual edge weights. This is the most restrictive feedback model.

Algorithmic framework
The authors propose an EXP3‑style algorithm adapted to the combinatorial structure of paths. Each edge e maintains a weight estimate ŵ_t(e) and a selection probability p_t(e). In round t a path is sampled according to the product of edge probabilities (equivalently, by running a randomized shortest‑path routine with the current edge scores). After the path is played, the observed edge weights are importance‑weighted: for every edge e on the chosen path the unbiased estimator w̃_t(e) = w_t(e)/p_t(e) is formed, while unobserved edges receive a zero estimate. The edge weights are then updated multiplicatively in the exponential‑weights fashion, ŵ_{t+1}(e) = ŵ_t(e)·exp(−η·w̃_t(e)), where η > 0 is the learning rate.
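The sampling and importance-weighting scheme described above can be sketched in a few lines. This is an illustration only, not the paper's efficient implementation: the two s→t paths of a toy graph are enumerated explicitly rather than handled by dynamic programming, and the learning rate, edge names, and weights are invented for the example.

```python
import math
import random

# All s->t paths of a toy DAG, each given as a tuple of edge labels
# (illustrative; the paper avoids explicit path enumeration).
paths = [("s-a", "a-t"), ("s-b", "b-t")]
edges = sorted({e for p in paths for e in p})

eta = 0.1                            # learning rate (illustrative value)
cum_est = {e: 0.0 for e in edges}    # cumulative importance-weighted loss estimates

def path_probs():
    """Path probabilities, exponential in the estimated cumulative path losses."""
    scores = [math.exp(-eta * sum(cum_est[e] for e in p)) for p in paths]
    z = sum(scores)
    return [s / z for s in scores]

def play_round(true_weights, rng):
    """Sample a path, observe only its own edges' weights, update the estimates."""
    probs = path_probs()
    chosen = rng.choices(range(len(paths)), weights=probs)[0]
    # p_t(e): probability that edge e lies on the sampled path
    p_edge = {e: sum(pr for p, pr in zip(paths, probs) if e in p) for e in edges}
    # Bandit feedback: importance-weight the observed edges to keep the
    # estimates unbiased; unobserved edges implicitly receive a zero estimate.
    for e in paths[chosen]:
        cum_est[e] += true_weights[e] / p_edge[e]

rng = random.Random(0)
weights = {"s-a": 0.9, "a-t": 0.9, "s-b": 0.1, "b-t": 0.1}  # fixed "adversary"
for _ in range(1000):
    play_round(weights, rng)

print(path_probs())  # mass concentrates on the cheaper path ("s-b", "b-t")
```

Maintaining one score per edge (rather than one per path) is what lets the actual algorithm stay polynomial in the number of edges even when the number of paths is exponential.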

