Distributed Online Submodular Maximization under Communication Delays: A Simultaneous Decision-Making Approach
Abstract: We provide a distributed online algorithm for multi-agent submodular maximization under communication delays. We are motivated by future distributed information-gathering tasks in unknown and dynamic environments, where utility functions naturally exhibit the diminishing-returns property, i.e., submodularity. Existing approaches for online submodular maximization either rely on sequential multi-hop communication, resulting in prohibitive delays and restrictive connectivity assumptions, or restrict each agent's coordination to its one-hop neighborhood only, thereby limiting the coordination performance. To address the issue, we provide the Distributed Online Greedy (DOG) algorithm, which integrates tools from adversarial bandit learning with delayed feedback to enable simultaneous decision-making across arbitrary network topologies. We provide the approximation performance of DOG against an optimal solution, capturing the suboptimality cost due to decentralization as a function of the network structure. Our analyses further reveal a trade-off between coordination performance and convergence time, determined by the magnitude of communication delays. By this trade-off, DOG spans the spectrum between the state-of-the-art fully centralized online coordination approach [1] and the fully decentralized one-hop coordination approach [2].
Authors: Zirui Xu, Vasileios Tzoumas
I. INTRODUCTION

Multi-agent systems of the future will increasingly rely on agent-to-agent communication to coordinate tasks such as target tracking [1], environmental mapping [3], and area monitoring [4]. These tasks are often modeled as maximization problems of the form

max_{a_{i,t} ∈ V_i, ∀i ∈ N} f_t({a_{i,t}}_{i∈N}), t = 1, 2, ..., (1)

across the robotics, control, and machine learning communities, where N denotes the set of agents, a_{i,t} denotes agent i's chosen action at time t, V_i denotes agent i's set of available actions, and f_t : 2^{∏_{i∈N} V_i} → R denotes the objective function that captures the task utility (global objective) [3]–[15]. In resource allocation and information-gathering applications, f_t is submodular [16], i.e., it satisfies a diminishing-returns property [5]. For example, in target monitoring with multiple reorientable cameras, N is the set of cameras, V_i represents the possible orientations of each camera, and f_t measures the number of distinct targets observed within the joint field of view.

The optimization problem in eq. (1) is NP-hard [17], but polynomial-time algorithms with provable approximation guarantees exist when f_t is submodular. A classical example is the Sequential Greedy (SG) algorithm [16], which guarantees a 1/2-approximation ratio. Many multi-agent tasks, including target tracking, collaborative mapping, and monitoring, can be cast as submodular coordination problems. Consequently, SG and its variants have been widely adopted in the controls, machine learning, and robotics literature [3]–[8], [10], [11], [13]–[15], [18]–[20].

† Department of Aerospace Engineering, University of Michigan, Ann Arbor, MI 48109 USA; {ziruixu,vtzoumas}@umich.edu. This work was supported by NSF CAREER Award No. 2337412 and ARO Early Career Program Award W911NF-25-1-0280.

In this paper, we focus on applications where the environment is unpredictable and partially observable, and where the agents must solve the optimization problem in eq. (1) via agent-to-agent communication, e.g., via mesh networks.
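For concreteness, the Sequential Greedy pass described above can be sketched on a toy coverage instance of the form in eq. (1). The two-camera instance and its target sets below are illustrative assumptions, not from the paper:

```python
# Sequential Greedy (SG) sketch for one time step of eq. (1): agents choose
# in a fixed order, each maximizing its marginal gain given the choices
# already made. Toy objective: f(S) = number of distinct targets covered,
# which is normalized, non-decreasing, and submodular.

def coverage(joint_actions):
    """f({a_i}): number of distinct targets seen by the chosen orientations."""
    covered = set()
    for targets in joint_actions:
        covered |= set(targets)
    return len(covered)

def sequential_greedy(action_sets):
    """One SG pass: agent i picks the action with the largest marginal gain."""
    chosen = []
    for V_i in action_sets:  # fixed agent order 1, 2, ..., |N|
        best = max(V_i, key=lambda a: coverage(chosen + [a]) - coverage(chosen))
        chosen.append(best)
    return chosen

# Two cameras; each action is the set of targets one orientation covers
# (hypothetical instance).
V1 = [frozenset({1, 2}), frozenset({3})]
V2 = [frozenset({1, 2}), frozenset({2, 4})]
choice = sequential_greedy([V1, V2])
print(coverage(choice))  # camera 1 covers {1, 2}, camera 2 adds {4} -> 3
```

Note the sequential structure: camera 2 avoids the redundant orientation {1, 2} only because camera 1's choice is already known, which is exactly the multi-hop information flow that causes the delays discussed next.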
Such optimization settings are challenging since, respectively: (i) f_t(·) is unknown a priori, necessitating online optimization approaches where the agents jointly plan actions using only retrospective feedback (bandit feedback) [1]; and (ii) state-of-the-art agent-to-agent communication speeds are slow compared to wired connections or 5G [21], necessitating novel decentralized optimization paradigms that rigorously sacrifice near-optimality for scalability [15]. In such challenging optimization settings, SG and its variants offer no performance guarantees [1].

In more detail, the related work on submodular maximization in unpredictable and partially observable environments, and in the decentralized optimization context of limited communication speeds, is as follows:

a) Online submodular optimization: In unpredictable settings, such as target tracking with maneuvering targets whose intentions are unknown [22], drones cannot forecast the future to evaluate f_t in advance. Instead, they must coordinate actions online using retrospective feedback. The challenge deepens under partial observability: with limited sensing (e.g., drones tracking targets within a restricted field of view), drones can evaluate only the reward of executed actions but not the alternative rewards of unselected ones. This bandit feedback [23] prevents agents from fully exploiting past information, thus hindering the design of near-optimal coordination strategies in such environments. To this end, sequential coordination algorithms leveraging online feedback have been proposed in [1], [24], which provide guaranteed suboptimality against the robots' optimal time-varying actions in hindsight. These algorithms extend SG to the bandit setting, adapting tools from the literature on tracking the best expert (e.g., the EXP3-SIX algorithm [25]) to the multi-agent setting upon accounting for the submodular structure of the optimization problem.
b) Online submodular optimization under low communication speeds: But online sequential algorithms [1], [24], similar to their offline sequential greedy counterparts [3]–[8], [10], [11], [13], [14], [18]–[20], cannot scale under real-world communication conditions [15]. Since they perform sequential multi-hop communication over (strongly) connected networks to enable near-optimality, they can cause excessive communication delays between consecutive time steps t and t+1 [15]. In particular, for these state-of-the-art algorithms (offline and online variants), the communication delays increase quadratically or even cubically as the number of agents increases [15]. For example, the Bandit Sequential Greedy (BSG) algorithm in [1] requires: (i) a communication complexity that is cubic in the number of agents at each time step (decision round) over a worst-case directed network, such that all agents can obtain the feedback of their selected actions, and (ii) a number of decision rounds that is quadratic in the number of agents for the algorithm to converge. That is, it takes time up to quintic in the number of agents for BSG to achieve near-optimal coordination performance [2, Theorem 6].

To address the communication issues above, novel distributed optimization algorithms have been proposed that achieve linear time complexity in the number of agents. To this end, for example, they (i) require each agent to coordinate with one-hop neighbors only, and (ii) operate over arbitrary network topologies. Such an algorithm is the Resource-Aware distributed Greedy (RAG) algorithm [15]. While RAG matches the performance of Sequential Greedy in fully centralized networks, for arbitrary network topologies it suffers a suboptimality cost that is a function of the network topology. However, RAG applies to offline optimization only, where f_t(·) is known a priori, instead of online.
Another example is the ActSel subroutine proposed in [2], which extends RAG to the online setting. While ActSel enables simultaneous decisions by avoiding sequential communication and still provides asymptotically the same suboptimality guarantee as RAG, it leaves room for improvement: the reliance on strictly local information limits the coordination performance, as agents cannot access broader information from beyond their immediate neighbors. Therefore, the following research question arises:

Under communication delays, how does each agent coordinate with others beyond its immediate neighborhood to maximize action coordination performance without sacrificing decision speed?

Contributions. In this paper, we develop an online optimization algorithm that allows agents to exploit multi-hop communication in order to leverage information from beyond immediate neighbors, such that the performance gap between distributed and centralized coordination can be minimized. To address the inevitable multi-hop communication delays, we leverage techniques from bandit learning with delayed feedback [26], enabling agents to select asymptotically near-optimal actions despite outdated feedback. The algorithm has the following properties:

a) Approximation performance: The algorithm provides a suboptimality bound against an optimal solution to eq. (1). Particularly, the bound captures the suboptimality cost due to decentralization as a function of each agent's multi-hop coordination neighborhood. For example, as long as the network is connected (not necessarily fully connected), the algorithm's approximation bound is 1/2 because every agent can receive information from all others via multi-hop communication.
In particular, DOG's bound is lower than BSG's 1/(1+κ_f) due to decentralization, and better than ActSel's 1/(1+κ_f) − Σ_{i∈N} coin(N_i^{(1)}) since multi-hop communication enlarges the coordination neighborhood, where N_i^{(1)} denotes i's one-hop neighborhood (Section IV).

b) Convergence time: The algorithm enables the agents to simultaneously select actions every (2τ_f + τ_c d̄) time, where τ_f and τ_c are the times for one function evaluation and for transmitting one action over one-hop communication, respectively, among all i ∈ N. The convergence time of DOG is Õ((τ_f + τ_c d̄) |N|² max_{i∈N}(|V_i| + d_i) / ε), slower than the ActSel subroutine in [2] by a factor of d̄², while up to |N|³ faster than BSG [1] (Section V).

II. DISTRIBUTED ONLINE SUBMODULAR MAXIMIZATION UNDER COMMUNICATION DELAYS

We present the problem formulation, using the notation:
• V_N ≜ ∏_{i∈N} V_i is the cross product of the sets {V_i}_{i∈N};
• [T] ≜ {1, ..., T} for any positive integer T;
• f(a | A) ≜ f(A ∪ {a}) − f(A) is the marginal gain of a set function f : 2^V → R for adding a ∈ V to A ⊆ V;
• |A| is the cardinality of a discrete set A.

We also use the following framework for the agents' communication network and their global objective f.

Communication network. The distributed communication network G = {N, E} can be directed and even disconnected, where E is the set of communication channels. When G is fully connected (all agents receive information from all others), we call it fully centralized. In contrast, when G is fully disconnected (all agents are isolated, receiving information from no other agent), we call it fully decentralized.

Communication neighborhood. When a communication channel exists from agent j to agent i, i.e., (j → i) ∈ E, agent i can receive, store, and process information from j.
The set of all agents from which i can receive information, possibly through multi-hop communication, is denoted by N_i. Thus, N_i represents agent i's ∞-hop in-neighborhood. For simplicity, we refer to N_i as agent i's neighborhood and assume it remains constant over [T]. Information originating from different neighbors j ∈ N_i may take varying amounts of time to reach i, depending on the message size, communication data rate, and the number of hops from j to i.

Communication delay. The communication delay is determined by the radius of agent i's neighborhood, i.e., the number of edges from the furthest multi-hop neighbor to i. We denote this delay by d_i. In particular, d_i is also the delay for agent i to receive the reward of selecting action a_{i,t} at time t. That is, the value of r_{a_{i,t},t} will be available at t + d_i.

Definition 1 (Normalized and Non-Decreasing Submodular Set Function [16]). A set function f : 2^V → R is normalized and non-decreasing submodular if and only if
• (Normalization) f(∅) = 0;
• (Monotonicity) f(A) ≤ f(B), ∀A ⊆ B ⊆ V;
• (Submodularity) f(s | A) ≥ f(s | B), ∀A ⊆ B ⊆ V and s ∈ V.

Intuitively, if f(A) captures the number of targets tracked by a set A of sensors, then the more sensors are deployed, the more (or the same number of) targets are covered; this is the non-decreasing property. Also, the marginal gain in tracked targets from deploying a sensor s drops when more sensors are already deployed; this is the submodularity property.

Definition 2 (2nd-Order Submodular Set Function [27], [28]). f : 2^V → R is 2nd-order submodular if and only if

f(s | C) − f(s | A ∪ C) ≥ f(s | B ∪ C) − f(s | A ∪ B ∪ C), (2)

for any disjoint A, B, C ⊆ V (A ∩ B ∩ C = ∅) and s ∈ V.

Intuitively, if f(A) captures the number of targets tracked by a set A of sensors, then the marginal gain of the marginal gains drops when more sensors are already deployed.
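The coverage intuition behind Definitions 1 and 2 can be checked exhaustively on a small instance. A minimal sketch, where the sensor-to-target assignments are an assumed toy instance (not from the paper) and all subset inclusions are enumerated by brute force:

```python
from itertools import combinations

# Toy instance: each sensor covers a set of targets, and f(A) counts the
# distinct targets covered by A. Coverage functions satisfy Definitions 1-2.
COVER = {"s1": {1, 2}, "s2": {2, 3}, "s3": {3, 4}}

def f(A):
    """Coverage objective: number of distinct targets covered by A."""
    return len(set().union(*(COVER[s] for s in A))) if A else 0

def gain(s, A):
    """Marginal gain f(s | A) = f(A ∪ {s}) − f(A)."""
    return f(set(A) | {s}) - f(set(A))

def subsets(ground):
    xs = sorted(ground)
    return [set(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

V = set(COVER)
assert f(set()) == 0                         # normalization (Definition 1)
for A in subsets(V):
    for B in subsets(V):
        if A <= B:
            assert f(A) <= f(B)              # monotonicity
            for s in V:
                assert gain(s, A) >= gain(s, B)  # submodularity

# 2nd-order submodularity (Definition 2), checked on disjoint A, B, C:
for A in subsets(V):
    for B in subsets(V):
        for C in subsets(V):
            if A & B or A & C or B & C:
                continue
            for s in V:
                lhs = gain(s, C) - gain(s, A | C)
                rhs = gain(s, B | C) - gain(s, A | B | C)
                assert lhs >= rhs            # "gain of the gains" drops

print("Definitions 1 and 2 hold on this coverage instance")
```

The brute-force check is exponential in the ground set, so it is only a sanity tool for small examples; the paper's guarantees rely on these properties holding by construction for the objective class.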
Problem 1 (Distributed Online Submodular Maximization under Communication Delays). At each time step t ∈ [T], given the multi-hop neighborhood N_i, each agent i ∈ N needs to select an action a_{i,t} to jointly solve

max_{a_{i,t} ∈ V_i, ∀i ∈ N} Σ_{t=1}^T f_t({a_{i,t}}_{i∈N}), (3)

where f_t : 2^{V_N} → R is a normalized, non-decreasing submodular, and 2nd-order submodular set function, and each agent i can access the value of f_t(A) only after it has selected a_{i,t} at time t and received {a_{j,t}}_{j∈N_i} at time t + d_i, for all A ⊆ {a_{i,t}} ∪ {a_{j,t}}_{j∈N_i}.

Problem 1 is a generalization of the problem in [1], considering (i) the impact of communication delays, and (ii) an arbitrary rather than a connected communication network. Moreover, Problem 1 differs from [15] by (i) addressing unknown environments, and (ii) allowing for multi-hop instead of merely one-hop communication.

The action coordination performance in Problem 1 highly depends on the network {N_i}_{i∈N}: it will improve as the network becomes more centralized (from all agents coordinating with none to all agents coordinating with all). For example, consider the target monitoring scenario with multiple reorientable cameras: as the cameras become more centralized, each can coordinate with more of the others to avoid covering the same targets, thus improving the total number of covered targets. Therefore, in this paper, we propose to adopt multi-hop communication to maximize each agent's information access over the distributed communication network. To mitigate the influence of communication delays on the action coordination frequency, we leverage tools from bandit learning with delayed feedback, as shown in the next section.

III. DISTRIBUTED ONLINE GREEDY ALGORITHM (DOG)

We present the Distributed Online Greedy algorithm (DOG) for Problem 1.
Particularly, Problem 1 takes the form of an adversarial bandit problem with delayed feedback. Therefore, in the following, we first present the problem formulation of the adversarial bandit with delayed feedback (Section III-A) and then the main algorithm (Section III-B).

Algorithm 1: Distributed Online Greedy (DOG) for Agent i
Input: Number of time steps T, agent i's action set V_i, agent i's in-neighborhood N_i, communication delay d_i.
Output: Agent i's action a_{i,t}, ∀t ∈ [T].
1: η_i ← sqrt( log|V_i| / [(|V_i| + d_i) T] );
2: w_1 ← (w_{1,1}, ..., w_{|V_i|,1})^⊤ with w_{a,1} = 1, ∀a ∈ V_i;
3: for each time step t ∈ [T] do
4:   get distribution p_t ← w_t / ∥w_t∥_1;
5:   draw action a_{i,t} ∈ V_i from p_t;
6:   broadcast a_{i,t}, potentially via multi-hop communication;
7:   receive neighbors' actions {a_{j,s}}_{j∈N_i} for {s : s + d_i = t};
8:   r_{a_{i,s},s} ← f_s(a_{i,s} | {a_{j,s}}_{j∈N_i}), normalized to [0, 1];
9:   r̂_{a,s} ← 1 − [1(a_{i,s} = a) / p_{a,s}] (1 − r_{a_{i,s},s}), ∀a ∈ V_i;
10:  w_{a,t+1} ← w_{a,t} exp(η_i r̂_{a,s}), ∀a ∈ V_i;
11: end for

A. Adversarial Bandit with Delayed Feedback

The adversarial bandit with delayed feedback problem involves an agent selecting a sequence of actions to maximize the total reward over a given number of time steps [26]. The challenges are: (i) at each time step t, no action's reward is known to the agent a priori, and (ii) after an action is selected, only the selected action's reward becomes known, with a time delay d_t that is assumed to be known a priori. We present the problem using the notation:
• V denotes the available action set;
• v_t ∈ V denotes the agent's selected action at t;
• r_{v_t,t} ∈ [0, 1] denotes the reward of selecting v_t at t;
• d_t is the delay for the reward of selecting action v_t at t to be received. That is, the value of r_{v_t,t} will be known by the agent at t + d_t.
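This delayed-feedback protocol can be sketched as a minimal single-agent learner mirroring Algorithm 1's importance-weighted update (lines 1 and 4–10). The class name, simulation interface, and buffering of pending rewards are illustrative assumptions, not the paper's implementation:

```python
import math
import random

class DelayedBanditAgent:
    """Single-agent sketch of Algorithm 1's update rule: an EXP3-style
    learner whose bandit feedback arrives d steps after the action is played."""

    def __init__(self, n_actions, T, delay, seed=0):
        self.n = n_actions
        self.d = delay
        # learning rate as in Algorithm 1, line 1
        self.eta = math.sqrt(math.log(n_actions) / ((n_actions + delay) * T))
        self.w = [1.0] * n_actions
        self.pending = []  # (arrival time, played action, its probability, reward)
        self.rng = random.Random(seed)

    def act(self):
        """Sample an action from the normalized weights (lines 4-5)."""
        total = sum(self.w)
        p = [wi / total for wi in self.w]
        a = self.rng.choices(range(self.n), weights=p)[0]
        return a, p[a]

    def observe(self, t, a, p_a, reward):
        """The reward in [0, 1] of playing a at t becomes known at t + d (line 7)."""
        self.pending.append((t + self.d, a, p_a, reward))

    def update(self, t):
        """Apply the loss-based importance-weighted update (lines 9-10):
        r_hat_a = 1 - 1(a = played) * (1 - r) / p_a, for feedback arriving now."""
        arrived = [x for x in self.pending if x[0] == t]
        self.pending = [x for x in self.pending if x[0] != t]
        for _, played, p_a, r in arrived:
            for a in range(self.n):
                r_hat = 1.0 - ((1.0 - r) / p_a if a == played else 0.0)
                self.w[a] *= math.exp(self.eta * r_hat)

# Demo: feedback for the step-1 action is applied only at step 4 (delay 3).
agent = DelayedBanditAgent(n_actions=2, T=100, delay=3)
a, p_a = agent.act()
agent.observe(t=1, a=a, p_a=p_a, reward=0.9)
agent.update(t=4)
```

Up to a common normalization factor, the update shrinks only the played action's weight in proportion to its importance-weighted loss (1 − r)/p_a, which is why unplayed actions need no feedback at all.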
Problem 2 (Adversarial Bandit with Delayed Feedback [26]). Consider a horizon of T time steps. At each time step t ∈ [T], the agent needs to select an action v_t ∈ V such that the regret

Regret_T ≜ max_{v∈V} Σ_{t=1}^T r_{v,t} − Σ_{t=1}^T r_{v_t,t}, (4)

is minimized, where no actions' rewards are known a priori, and only the selected action's reward r_{v_t,t} ∈ [0, 1] becomes known at t + d_t.

The goal of solving Problem 2 is to achieve a sublinear Regret_T, i.e., Regret_T / T → 0 as T → ∞, since this implies that the agent asymptotically chooses optimal actions even though the rewards are unknown a priori [26].

B. DOG Algorithm

We introduce the Distributed Online Greedy (DOG) algorithm (Algorithm 1). DOG enables the agents to solve Problem 1 by simultaneously solving their own instances of Problem 2. To describe the algorithm, we use the notation:
• A_t ≜ {a_{i,t}}_{i∈N} is the set of all agents' actions at t;
• A^OPT ∈ arg max_{a_i ∈ V_i, ∀i∈N} Σ_{t=1}^T f_t({a_i}_{i∈N}) is an optimal set of actions for the agents N that solves Problem 1.

Intuitively, our goal is for each agent i at each time step t to efficiently select an action a_{i,t} that maximizes the marginal gain f_t(a | {a_{j,t}}_{j∈N_i}). That is, DOG aims to efficiently minimize the following quantity:

Definition 3 (Static Regret for Each Agent i). Suppose agent i has multi-hop coordination neighborhood N_i and, at each time step t, selects an action a_{i,t}. Then, the static regret of {a_{i,t}}_{t∈[T]} is defined as

Reg_T({a_{i,t}}_{t∈[T]}) ≜ max_{a∈V_i} Σ_{t=1}^T f_t(a | {a_{j,t}}_{j∈N_i}) − Σ_{t=1}^T f_t(a_{i,t} | {a_{j,t}}_{j∈N_i}). (5)

Ideally, the agents select actions simultaneously, unlike in offline algorithms such as SG [16]. But if the agents aim to select actions simultaneously, {a_{j,t}}_{j∈N_i} becomes known only after agent i has selected a_{i,t} and communicated with N_i.
Therefore, computing the marginal gain is possible only in hindsight, after all agents' decisions have been finalized for time step t. Moreover, after a_{i,t} is selected, the feedback {a_{j,t}}_{j∈N_i} cannot be transmitted to agent i until after a delay d_i, due to potentially multi-hop communication. Thus, Problem 1 aligns with the framework of Problem 2 at the single-agent level, where the reward of selecting a_{i,t} ∈ V_i at time t, i.e., r_{a_{i,t},t} ≜ f_t(a_{i,t} | {a_{j,t}}_{j∈N_i}), will not be known by agent i until time t + d_i.

DOG starts by initializing a learning rate η_i and a weight vector w_t over all available actions a ∈ V_i (Algorithm 1's lines 1–2). Then, at each t ∈ [T], it sequentially executes the following steps:
• Compute the probability distribution p_t using w_t (lines 3–4);
• Select action a_{i,t} ∈ V_i by sampling from p_t (line 5);
• Send a_{i,t} to out-neighbors and relay in-neighbors' actions if possible (line 6);
• Receive in-neighbors' past actions {a_{j,s}}_{j∈N_i} for {s : s + d_i = t}, where d_i is the time for all {a_{j,s}}_{j∈N_i} to reach i, i.e., the communication delay (line 7);
• Compute the reward r_{a_{i,s},s}, estimate the reward r̂_{a,s} of each a ∈ V_i, and update the weight w_{a,t+1} of each a ∈ V_i (lines 8–11).¹

IV. APPROXIMATION GUARANTEES

We present the suboptimality bound of DOG. The bound compares DOG's solution to the optimal solution of Problem 1.
Leveraging the concept of coin (Definition 4), which captures the suboptimality cost of decentralization, the bound covers the spectrum of DOG's approximation performance from when the network is fully centralized (all agents communicating with all) to fully decentralized (all agents communicating with none).

Definition 4 (Centralization of Information [15]). For each time step t ∈ [T], consider a function f_t : 2^{V_N} → R and a communication network {N_i}_{i∈N} where each agent i ∈ N has selected an action a_{i,t}. Then, at time t, agent i's centralization of information is defined as

coin_{f_t,i}(N_i) ≜ f_t(a_{i,t}) − f_t(a_{i,t} | {a_{j,t}}_{j∈N_i^c}). (6)

coin_{f_t,i} measures how much a_{i,t} can overlap with the actions of agent i's non-neighbors. In the best scenario, where a_{i,t} does not overlap with the other actions at all, i.e., f_t(a_{i,t} | {a_{j,t}}_{j∈N_i^c}) = f_t(a_{i,t}), we have coin_{f_t,i} = 0. In the worst case instead, where a_{i,t} is fully redundant, i.e., f_t(a_{i,t} | {a_{j,t}}_{j∈N_i^c}) = 0, we have coin_{f_t,i} = f_t(a_{i,t}).

We also need the following definition to present the approximation performance of DOG.

Definition 5 (Curvature [29]). The curvature of a normalized submodular function f : 2^V → R is defined as

κ_f ≜ 1 − min_{v∈V} [f(V) − f(V \ {v})] / f(v). (7)

κ_f measures how far f is from modularity. When κ_f = 0, we have f(V) − f(V \ {v}) = f(v) for all v ∈ V, i.e., the marginal contribution of each element is independent of the presence of the other elements, and thus f is modular.

¹ The coordination algorithms in [12]–[14] instruct the agents to select actions simultaneously at each time step, as DOG does, but they lift the coordination problem to the continuous domain and require each agent to know/estimate the gradient of the multilinear extension of f_t, which leads to a decision time at least one order higher than DOG's [15].
In contrast, κ_f = 1 in the extreme case where there exists some v ∈ V such that f(V) = f(V \ {v}), i.e., v has no contribution in the presence of V \ {v}.

Theorem 1 (Approximation Performance). Over t ∈ [T], given the communication network {N_i}_{i∈N}, DOG instructs each agent i ∈ N to select actions {a_{i,t}}_{t∈[T]} that guarantee

E[f_t(A_t)] ≥ 1/(1+κ_f) · E[f_t(A^OPT)] − κ_f/(1+κ_f) · Σ_{i∈N} E[coin_{f_t,i}(N_i)] − ψ(T), (8)

where ψ(T) ≜ Õ(|N| sqrt( max_{i∈N}(|V_i| + d_i) / T )), κ_f ≜ max_{t∈[T]} κ_{f_t}, d̄ ≜ max_{i∈N} d_i, the expectation is due to DOG's internal randomness, and Õ(·) hides logarithmic terms. In particular, when the network is fully centralized, i.e., N_i ≡ N \ {i},

E[f_t(A_t)] ≥ 1/(1+κ_f) · E[f_t(A^OPT)] − φ(T), (9)

and when the network is fully decentralized, i.e., N_i ≡ ∅,

E[f_t(A_t)] ≥ (1 − κ_f) · E[f_t(A^OPT)] − χ(T), (10)

where φ(T) and χ(T) are of the same order Õ(|N| sqrt( max_{i∈N}(|V_i| + d_i) / T )) as ψ(T).

In all, as T → ∞, DOG enables asymptotically near-optimal action coordination. Particularly, Theorem 1 quantifies both the convergence speed of DOG and the suboptimality of DOG due to decentralization:
• Convergence time: φ, χ, ψ in eqs. (8) to (10) capture the time needed for the action selection to converge to near-optimality and its impact on the suboptimality bound. These terms vanish as T → ∞, eventually having no impact on the suboptimality bound, and their vanishing speed captures how fast the agents converge to near-optimal actions.
• Decentralization: After ψ vanishes as T → ∞, the bound in eq. (8) depends on coin_{f_t,i}, capturing the suboptimality due to decentralization: the larger N_i is for each i ∈ N, the smaller coin_{f_t,i} is, and the higher DOG's approximation performance is. That is, eqs. (8) to (10) imply that DOG's suboptimality will improve if the agents have larger multi-hop coordination neighborhoods.
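The two structural quantities that shape the bound, the curvature κ_f (Definition 5) and coin (Definition 4), can be computed directly on a small example. The coverage instance below is an illustrative assumption, not from the paper:

```python
# Toy computation of the two quantities in Theorem 1's bound, on an
# assumed coverage instance: f(A) counts the distinct targets covered by A.
COVER = {"a": {1, 2}, "b": {2, 3}, "c": {4}}

def f(A):
    return len(set().union(*(COVER[x] for x in A))) if A else 0

def gain(x, A):
    """Marginal gain f(x | A) = f(A ∪ {x}) − f(A)."""
    return f(set(A) | {x}) - f(set(A))

V = set(COVER)

# Curvature (Definition 5): kappa_f = 1 - min_v [f(V) - f(V \ {v})] / f({v}).
kappa = 1.0 - min((f(V) - f(V - {v})) / f({v}) for v in V)
print(kappa)  # 0.5 here: "a" and "b" each lose half their standalone value

# Centralization of information (Definition 4): for agent i playing a_i,
# coin = f(a_i) - f(a_i | actions of i's non-neighbors).
def coin(a_i, nonneighbor_actions):
    return f({a_i}) - gain(a_i, nonneighbor_actions)

print(coin("a", {"b"}))   # partial overlap with a non-neighbor -> 1
print(coin("a", set()))   # fully centralized, no non-neighbors -> 0
```

Consistent with the decentralization bullet above, enlarging an agent's neighborhood moves actions out of the non-neighbor set, driving its coin term toward zero.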
Importantly, the 1/(1+κ_f) suboptimality bound with a fully connected network recovers the bound in [29] and is near-optimal, as the best possible bound for eq. (3) is 1 − κ_f/e [30].²

V. RUNTIME ANALYSIS

We present the runtime of DOG by analyzing its computation and communication complexity (accounting for message length). We use the following notation and observations:
• τ_f is the time required for one evaluation of f;
• τ_c is the time for transmitting the information about one action from an agent directly to another agent;
• ε is the convergence error after T iterations: T ≥ |N|²/ε is required for φ(T), χ(T), ψ(T) ≤ ε per eqs. (8) to (10).

Proposition 1 (Computational Complexity). At each t ∈ [T], DOG requires each agent i to execute 2 function evaluations of f_t and O(|V_i|) additions and multiplications.

Proof. At each t ∈ [T], DOG requires 2 function evaluations to compute the marginal gain (Algorithm 1's line 8), along with O(|V_i|) additions and multiplications (Algorithm 1's lines 4 and 9–10); thus, Proposition 1 holds.

Proposition 2 (Communication Complexity). At each t ∈ [T], DOG requires O(τ_c d̄) communication time such that each agent can transmit enough actions throughout its coordination neighborhood without information congestion.

Proof. Proposition 2 holds since, at each t ∈ [T], if the communication volume is less than O(d_i) actions for agent i, then newly selected actions will congest the network, leading to an increasing amount of feedback delay (Algorithm 1's lines 6–7).

Theorem 2 (Convergence Time). DOG achieves ε-convergence to near-optimal actions in Õ((τ_f + τ_c d̄) max_{i∈N}(|V_i| + d_i) |N|² / ε) time.

Proof. Theorem 2 holds by combining Propositions 1 and 2, along with the definition of ε above, upon ignoring the time needed for additions and multiplications.
Remark 1 (Trade-off Between Coordination Performance and Convergence Time). We observe from Theorems 1 and 2 a trade-off determined by the magnitude of the communication delays. Incorporating delayed information from larger multi-hop neighborhoods improves the approximation performance yet also increases the convergence time, while restricting communication to one-hop neighbors only accelerates convergence at the expense of lower coordination performance. In the fully centralized case, i.e., when the delay takes its largest value |N| − 1, DOG recovers BSG's approximation bound of 1/(1+κ_f) while converging faster than BSG by O(|N|). In the one-hop coordination case, DOG recovers ActSel's approximation bound with the same convergence time.

² The bounds 1/(1+κ_f) and 1 − κ_f/e become 1/2 and 1 − 1/e when, in the worst case, κ_f = 1.

VI. CONCLUSION

We presented the Distributed Online Greedy (DOG) algorithm for multi-agent submodular maximization under communication delays. Leveraging tools from the adversarial bandit with delayed feedback, DOG enables agents to make simultaneous online decisions while incorporating delayed feedback information from multi-hop neighbors, maximizing each agent's coordination neighborhood. We provided approximation guarantees that capture the suboptimality cost of network decentralization and showed that DOG enables agents to select asymptotically near-optimal actions. The analyses of the approximation bounds and convergence time revealed a trade-off between coordination performance and convergence time: as the communication delay decreases, DOG covers the spectrum between fully centralized coordination and one-hop coordination.

Future work. We will extend this work to enable the agents to actively address the trade-off between coordination performance and convergence rate by tuning their admissible communication delays.
REFERENCES

[1] Z. Xu, X. Lin, and V. Tzoumas, "Bandit submodular maximization for multi-robot coordination in unpredictable and partially observable environments," in Robotics: Science and Systems (RSS), 2023.
[2] Z. Xu and V. Tzoumas, "Self-configurable mesh networks for scalable distributed submodular bandit optimization," arXiv preprint:2602.19366, 2026.
[3] N. Atanasov, J. Le Ny, K. Daniilidis, and G. J. Pappas, "Decentralized active information acquisition: Theory and application to multi-robot SLAM," in IEEE International Conference on Robotics and Automation (ICRA), 2015, pp. 4775–4782.
[4] M. Corah and N. Michael, "Distributed submodular maximization on partition matroids for planning on large sensor networks," in IEEE Conference on Decision and Control (CDC), 2018, pp. 6792–6799.
[5] A. Krause, A. Singh, and C. Guestrin, "Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies," Journal of Machine Learning Research (JMLR), vol. 9, pp. 235–284, 2008.
[6] A. Singh, A. Krause, C. Guestrin, and W. J. Kaiser, "Efficient informative sensing using multiple robots," Journal of Artificial Intelligence Research (JAIR), vol. 34, pp. 707–755, 2009.
[7] P. Tokekar, V. Isler, and A. Franchi, "Multi-target visual tracking with aerial robots," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2014, pp. 3067–3072.
[8] B. Gharesifard and S. L. Smith, "Distributed submodular maximization with limited information," IEEE Transactions on Control of Network Systems (TCNS), vol. 5, no. 4, pp. 1635–1645, 2017.
[9] J. R. Marden, "The role of information in distributed resource allocation," IEEE Transactions on Control of Network Systems (TCNS), vol. 4, no. 3, pp. 654–664, 2017.
[10] D. Grimsman, M. S. Ali, J. P. Hespanha, and J. R. Marden, "The impact of information in distributed submodular maximization," IEEE Transactions on Control of Network Systems (TCNS), vol. 6, no. 4, pp. 1334–1343, 2019.
[11] B. Schlotfeldt, V. Tzoumas, and G. J. Pappas, "Resilient active information acquisition with teams of robots," IEEE Transactions on Robotics (TRO), vol. 38, no. 1, pp. 244–261, 2021.
[12] B. Du, K. Qian, C. Claudel, and D. Sun, "Jacobi-style iteration for distributed submodular maximization," IEEE Transactions on Automatic Control (TAC), vol. 67, no. 9, pp. 4687–4702, 2022.
[13] N. Rezazadeh and S. S. Kia, "Distributed strategy selection: A submodular set function maximization approach," Automatica, vol. 153, p. 111000, 2023.
[14] A. Robey, A. Adibi, B. Schlotfeldt, H. Hassani, and G. J. Pappas, "Optimal algorithms for submodular maximization with distributed constraints," in Learning for Dynamics and Control (L4DC), 2021, pp. 150–162.
[15] Z. Xu, S. S. Garimella, and V. Tzoumas, "Communication- and computation-efficient distributed submodular optimization in robot mesh networks," IEEE Transactions on Robotics (TRO), 2025.
[16] M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey, "An analysis of approximations for maximizing submodular set functions–II," in Polyhedral Combinatorics, 1978, pp. 73–87.
[17] U. Feige, "A threshold of ln(n) for approximating set cover," Journal of the ACM (JACM), vol. 45, no. 4, pp. 634–652, 1998.
[18] J. Liu, L. Zhou, P. Tokekar, and R. K. Williams, "Distributed resilient submodular action selection in adversarial environments," IEEE Robotics and Automation Letters (RAL), vol. 6, no. 3, pp. 5832–5839, 2021.
[19] R. Konda, D. Grimsman, and J. R. Marden, "Execution order matters in greedy algorithms with limited information," in American Control Conference (ACC), 2022, pp. 1305–1310.
[20] A. Krause and D. Golovin, "Submodular function maximization," Tractability: Practical Approaches to Hard Problems, vol. 3, 2012.
[21] B. M. Sadler, F. T. Dagefu, J. N. Twigg, G. Verma, P. Spasojevic, R. J. Kozick, and J. Kong, "Low frequency multi-robot networking," IEEE Access, vol. 12, pp. 21954–21984, 2024.
[22] M. Sun, M. E. Davies, I. Proudler, and J. R. Hopgood, "A Gaussian process based method for multiple model tracking," in Sensor Signal Processing for Defence Conference (SSPD), 2020, pp. 1–5.
[23] T. Lattimore and C. Szepesvári, Bandit Algorithms. Cambridge University Press, 2020.
[24] Z. Xu, H. Zhou, and V. Tzoumas, "Online submodular coordination with bounded tracking regret: Theory, algorithm, and applications to multi-robot coordination," IEEE Robotics and Automation Letters (RAL), vol. 8, no. 4, pp. 2261–2268, 2023.
[25] G. Neu, "Explore no more: Improved high-probability regret bounds for non-stochastic bandits," Advances in Neural Information Processing Systems (NeurIPS), vol. 28, 2015.
[26] T. S. Thune, N. Cesa-Bianchi, and Y. Seldin, "Nonstochastic multiarmed bandits with unrestricted delays," Advances in Neural Information Processing Systems (NeurIPS), vol. 32, 2019.
[27] Y. Crama, P. L. Hammer, and R. Holzman, "A characterization of a cone of pseudo-Boolean functions via supermodularity-type inequalities," in Quantitative Methoden in den Wirtschaftswissenschaften. Springer, 1989, pp. 53–55.
[28] S. Foldes and P. L. Hammer, "Submodularity, supermodularity, and higher-order monotonicities of pseudo-Boolean functions," Mathematics of Operations Research, vol. 30, no. 2, pp. 453–461, 2005.
[29] M. Conforti and G. Cornuéjols, "Submodular set functions, matroids and the greedy algorithm: Tight worst-case bounds and some generalizations of the Rado-Edmonds theorem," Discrete Applied Mathematics, vol. 7, no. 3, pp. 251–274, 1984.
[30] M. Sviridenko, J. Vondrák, and J. Ward, "Optimal approximation for submodular and supermodular optimization with bounded curvature," Mathematics of Operations Research, vol. 42, no. 4, pp. 1197–1218, 2017.
APPENDIX I
PROOF OF THEOREM 1

We prove the main result:
\begin{align}
\sum_{t=1}^{T} f_t\big(\mathcal{A}^{\mathrm{OPT}}\big)
&= \sum_{t=1}^{T} f_t\big(\mathcal{A}^{\mathrm{OPT}} \cup \mathcal{A}_t\big) - \sum_{t=1}^{T} \sum_{i \in \mathcal{N}} f_t\big(a_{i,t} \,\big|\, \mathcal{A}^{\mathrm{OPT}} \cup \{a_{j,t}\}_{j \in [i-1]}\big) \tag{11} \\
&\leq \sum_{t=1}^{T} f_t(\mathcal{A}_t) + \sum_{t=1}^{T} \sum_{i \in \mathcal{N}} f_t\big(a_i^{\mathrm{OPT}} \,\big|\, \mathcal{A}_t\big) - (1 - \kappa_f) \sum_{t=1}^{T} \sum_{i \in \mathcal{N}} f_t\big(a_{i,t} \,\big|\, \{a_{j,t}\}_{j \in \mathcal{N}_i}\big) \tag{12} \\
&\leq \sum_{t=1}^{T} f_t(\mathcal{A}_t) + \kappa_f \sum_{t=1}^{T} \sum_{i \in \mathcal{N}} f_t\big(a_{i,t} \,\big|\, \{a_{j,t}\}_{j \in \mathcal{N}_i}\big) + \sum_{i \in \mathcal{N}} \sum_{t=1}^{T} \Big[ f_t\big(a_i^{\mathrm{OPT}} \,\big|\, \{a_{j,t}\}_{j \in \mathcal{N}_i}\big) - f_t\big(a_{i,t} \,\big|\, \{a_{j,t}\}_{j \in \mathcal{N}_i}\big) \Big] \tag{13} \\
&\leq \sum_{t=1}^{T} f_t(\mathcal{A}_t) + \sum_{i \in \mathcal{N}} \mathrm{Reg}_T\big(\{a_{i,t}\}_{t \in [T]}\big) + \kappa_f \sum_{t=1}^{T} \sum_{i \in \mathcal{N}} f_t\big(a_{i,t} \,\big|\, \{a_{j,t}\}_{j \in \mathcal{N}_i}\big) \tag{14} \\
&= (1 + \kappa_f) \sum_{t=1}^{T} f_t(\mathcal{A}_t) + \sum_{i \in \mathcal{N}} \mathrm{Reg}_T\big(\{a_{i,t}\}_{t \in [T]}\big) + \kappa_f \sum_{t=1}^{T} \sum_{i \in \mathcal{N}} \Big[ f_t\big(a_{i,t} \,\big|\, \{a_{j,t}\}_{j \in \mathcal{N}_i}\big) - f_t\big(a_{i,t} \,\big|\, \{a_{j,t}\}_{j \in [i-1]}\big) \Big] \tag{15} \\
&\leq (1 + \kappa_f) \sum_{t=1}^{T} f_t(\mathcal{A}_t) + \sum_{i \in \mathcal{N}} \mathrm{Reg}_T\big(\{a_{i,t}\}_{t \in [T]}\big) + \kappa_f \sum_{t=1}^{T} \sum_{i \in \mathcal{N}} \Big[ f_t(a_{i,t}) - f_t\big(a_{i,t} \,\big|\, \{a_{j,t}\}_{j \in [i-1] \setminus \mathcal{N}_i}\big) \Big] \tag{16} \\
&\leq (1 + \kappa_f) \sum_{t=1}^{T} f_t(\mathcal{A}_t) + \sum_{i \in \mathcal{N}} \mathrm{Reg}_T\big(\{a_{i,t}\}_{t \in [T]}\big) + \kappa_f \sum_{t=1}^{T} \sum_{i \in \mathcal{N}} \underbrace{\Big[ f_t(a_{i,t}) - f_t\big(a_{i,t} \,\big|\, \{a_{j,t}\}_{j \in \mathcal{N}_i^c}\big) \Big]}_{\mathrm{coin}_{f_t,i}(\mathcal{N}_i)}, \tag{17}
\end{align}
where eq. (11) holds by telescoping the sum; eq. (12) holds since $f_t$ is submodular and since
\[
1 - \kappa_f \;\leq\; \frac{f_t\big(a_{i,t} \,\big|\, \{a_{j,t}\}_{j \in \mathcal{N} \setminus \{i\}}\big)}{f_t(a_{i,t})} \;\leq\; \frac{f_t\big(a_{i,t} \,\big|\, \mathcal{A}^{\mathrm{OPT}} \cup \{a_{j,t}\}_{j \in [i-1]}\big)}{f_t\big(a_{i,t} \,\big|\, \{a_{j,t}\}_{j \in \mathcal{N}_i}\big)}
\]
per Definition 5; eq. (13) holds from submodularity; eq. (14) holds from Definition 3; eq. (15) holds by again telescoping, i.e., $\sum_{i \in \mathcal{N}} f_t\big(a_{i,t} \,\big|\, \{a_{j,t}\}_{j \in [i-1]}\big) = f_t(\mathcal{A}_t)$; eq. (16) holds since $f_t$ is 2nd-order submodular; and eq. (17) holds from Definition 4.

Reorganizing eq. (17) and leveraging [26, Theorem 1], we prove eq. (8) as follows:
\[
\mathbb{E}\Big[f_t\big(\mathcal{A}^{\mathrm{OPT}}\big)\Big] = \frac{1}{T} \sum_{t=1}^{T} f_t\big(\mathcal{A}^{\mathrm{OPT}}\big) \leq (1 + \kappa_f)\, \mathbb{E}\big[f_t(\mathcal{A}_t)\big] + \kappa_f \sum_{i \in \mathcal{N}} \mathbb{E}\big[\mathrm{coin}_{f_t,i}(\mathcal{N}_i)\big] + \tilde{O}\Big( |\mathcal{N}| \sqrt{(|\mathcal{V}| + d)/T} \Big). \tag{18}
\]

In the fully centralized scenario, we have $\mathcal{N}_i = \mathcal{N} \setminus \{i\}$. Thus, $\mathrm{coin}_{f_t,i}(\mathcal{N}_i) = 0$, and eq. (9) is proved.
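As a numerical sanity check (not part of the paper), the two identities that drive the derivation, the telescoping step behind eq. (11) and the curvature bound of Definition 5, can be verified on a toy monotone submodular utility. The weighted-coverage function below, its action names, and its weights are all illustrative assumptions:

```python
# Toy monotone submodular utility: weighted coverage.
# Each "action" covers a set of targets; f(S) = total weight covered.
# All names and weights here are illustrative assumptions.
COVER = {
    "a1": {"t1", "t2"},
    "a2": {"t2", "t3"},
    "a3": {"t4", "t5"},
}
WEIGHT = {"t1": 1.0, "t2": 2.0, "t3": 1.5, "t4": 0.5, "t5": 1.0}

def f(actions):
    """Weighted coverage of a set of actions."""
    covered = set().union(*(COVER[a] for a in actions)) if actions else set()
    return sum(WEIGHT[t] for t in covered)

def marginal(a, S):
    """Marginal gain f(a | S) = f(S ∪ {a}) − f(S)."""
    return f(set(S) | {a}) - f(S)

# Telescoping identity behind eq. (11):
#   f(A) = sum_i f(a_i | {a_1, ..., a_{i-1}}).
A = ["a1", "a2", "a3"]
telescoped = sum(marginal(A[i], A[:i]) for i in range(len(A)))
assert abs(telescoped - f(A)) < 1e-9

# Total curvature: kappa_f = 1 − min_a f(a | V \ {a}) / f(a),
# so that f(a | S) >= (1 − kappa_f) f(a) for every a and S.
V = list(COVER)
kappa = 1 - min(marginal(a, [b for b in V if b != a]) / f([a]) for a in V)
print(f"f(A) = {f(A)}, kappa_f = {kappa:.3f}")  # f(A) = 6.0, kappa_f = 0.667
```

On this instance the telescoped marginal gains sum exactly to $f(\mathcal{A})$, and the curvature works out to $\kappa_f = 2/3$, i.e., every marginal gain retains at least a third of the stand-alone value.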
Finally, in the fully decentralized case where $\mathcal{N}_i = \emptyset$, per eq. (14),
\begin{align}
\mathbb{E}\big[f_t(\mathcal{A}_t)\big] &\geq \mathbb{E}\Big[f_t\big(\mathcal{A}^{\mathrm{OPT}}\big)\Big] - \kappa_f \sum_{i \in \mathcal{N}} \mathbb{E}\big[f_t(a_{i,t})\big] - \tilde{O}\Big( |\mathcal{N}| \sqrt{(|\mathcal{V}| + d)/T} \Big) \nonumber \\
&\geq \mathbb{E}\Big[f_t\big(\mathcal{A}^{\mathrm{OPT}}\big)\Big] - \frac{\kappa_f}{1 - \kappa_f} \sum_{i \in \mathcal{N}} \mathbb{E}\big[f_t(a_{i,t})\big] - \tilde{O}\Big( |\mathcal{N}| \sqrt{(|\mathcal{V}| + d)/T} \Big), \tag{19}
\end{align}
where the last inequality uses $\kappa_f \leq \kappa_f / (1 - \kappa_f)$ for $\kappa_f \in [0, 1)$, and thus eq. (10) is proved.
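The role of the $\mathrm{coin}_{f_t,i}(\mathcal{N}_i)$ term in the two extreme cases can likewise be illustrated numerically. The sketch below (an illustrative assumption, reusing the toy weighted-coverage utility rather than any utility from the paper) evaluates $\mathrm{coin}_{f,i}(\mathcal{N}_i) = f(a_i) - f(a_i \mid \{a_j\}_{j \in \mathcal{N}_i^c})$ as in eq. (17), where $\mathcal{N}_i^c$ denotes the non-neighbors of agent $i$:

```python
# Toy weighted-coverage utility (illustrative assumption, as before).
COVER = {"a1": {"t1", "t2"}, "a2": {"t2", "t3"}, "a3": {"t4", "t5"}}
WEIGHT = {"t1": 1.0, "t2": 2.0, "t3": 1.5, "t4": 0.5, "t5": 1.0}

def f(actions):
    covered = set().union(*(COVER[a] for a in actions)) if actions else set()
    return sum(WEIGHT[t] for t in covered)

def coin(i, neighbors):
    """coin_{f,i}(N_i) = f(a_i) - f(a_i | {a_j : j in N_i^c}),
    with N_i^c the agents other than i that are NOT neighbors of i."""
    non_neighbors = [a for a in COVER if a != i and a not in neighbors]
    gain_conditioned = f(non_neighbors + [i]) - f(non_neighbors)
    return f([i]) - gain_conditioned

# Fully centralized: N_i = N \ {i}, so N_i^c is empty and coin vanishes,
# recovering the tighter bound of eq. (9).
assert all(coin(i, [a for a in COVER if a != i]) == 0 for i in COVER)

# Fully decentralized: N_i = {}, so coin measures the full overlap an
# agent cannot account for without any communication.
print({i: coin(i, []) for i in COVER})  # {'a1': 2.0, 'a2': 2.0, 'a3': 0.0}
```

With full communication every coin term is zero, while without communication agents a1 and a2 each pay for their shared coverage of target t2; only a3, whose coverage is disjoint from the others, incurs no penalty.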