On Online Control of Opinion Dynamics
Sheryl Paul¹, Leslie Cruz Juarez¹, Jyotirmoy V. Deshmukh¹, Ketan Savla¹

Abstract—Networked multi-agent dynamical systems have been used to model how individual opinions evolve over time due to the opinions of other agents in the network. In particular, such a model has been used to study how a planning agent can steer opinions in a desired direction through repeated, budgeted interventions. In this paper, we consider the problem where individuals' susceptibilities to external influences are unknown. We propose an online algorithm that alternates between estimating this susceptibility parameter and using the current estimate to drive the opinion to a desired target. We provide conditions that guarantee stability and convergence to the desired target opinion when the planning agent faces budgetary or temporal constraints. Our analysis shows that the key advantage of estimating the susceptibility parameter is that it helps achieve near-optimal convergence to the target opinion given a finite number of intervention rounds and, for a given intervention budget, quantifies how close the opinion can get to the desired target.

I. INTRODUCTION

In this paper, we study the problem of influencing the preferences of social agents through the lens of control design. A social agent is a unit of a networked multi-agent system, where the opinion or preference of each agent in the network evolves over time while being influenced by the opinions of the neighboring agents in the network. Additionally, we assume a designated planner agent that can perform interventions (or control actions), which percolate through the network and may guide agents to a specific desired opinion. The agent network is assumed to model the susceptibility of agents to the opinions of other agents as well as to the planner agent.
The objective of this paper is to study how the planner agent can estimate the susceptibility parameters of the agent network, and how the learned parameters can be used to design a resource allocation that moves opinions towards a desired value. There are many real-world applications of such a problem setting. For example, consider social settings where agents are individual humans in a social network, and a central planner may wish to encourage certain kinds of behaviors such as sustainable consumer behavior, public health awareness, or civic engagement. In these scenarios, influence can be applied through interventions like advertising, discounts, or targeted messaging. However, these deployments also come with a cost (e.g., time spent or monetary expense). Thus, we wish to optimize the cost of control actions while reaching the desired objective in the least amount of time.

This problem of allocating influencing interventions in opinion dynamics has been previously studied with various models. Classical frameworks like DeGroot explain consensus formation under fixed interaction rules [1]. Subsequent work (e.g., [2], [3]) studies the implications of Bayesian and non-Bayesian social learning models, highlighting when beliefs converge to the truth, and when network structure or agent behavior leads to polarization or clustering. The Friedkin–Johnsen model introduces agent stubbornness [4], and has been extended to study leadership design (tailored to network structure) [5], adversarial settings involving competing stubborn agents [6], and convergence behavior over time-varying or concatenated graphs [7]–[9]. Optimal control formulations target specific opinion outcomes within budget or effort constraints [10], though often under the assumption that the interaction parameters are known.

¹University of Southern California, Los Angeles, California, 90089, USA. {sherylpa, lcruzjua, jdeshmuk, ksavla}@usc.edu
Recent works emphasize parameter recovery and forecasting, e.g., sublinear or Expectation-Maximization-style estimators [11], [12] and sociologically informed neural predictors [13]. Such Bayesian estimation methods use regret minimization and bandit algorithms that update parameter estimates dynamically upon the arrival of new data. In contrast to the regret-minimization focus of online-learning and bandit methods [14]–[16] and regret-optimal estimation and control methods [17], we treat opinion shaping as a networked control problem. Unlike [13], which forecasts opinions with sociologically informed neural nets, we aim to control an opinion dynamics model to a prescribed target under budget and time constraints. We show that by alternating parameter identification with an analytically derived control law, we can drive the system to a prescribed target opinion, while providing explicit convergence-rate and cost bounds.

Our proposed method combines opinion dynamics inspired by the Friedkin–Johnsen model with online control, where we assume that the 'susceptibility' parameters for the network are unknown and must be estimated. The algorithm alternates between exploration (to maximize parameter accuracy) and exploitation (using an optimal control law to drive opinions towards a desired target under the current estimates). In contrast to [18], our control law is derived analytically, and we provide proofs of convergence for state and parameter errors over our two-phase algorithm. We also address a feasibility problem: given a time horizon and a budget constraint, determining whether the system can be steered to within a user-specified error tolerance of the desired target opinion.

Contributions. (1) We analyze the state invariance, equilibria, and convergence of the opinion dynamics model, derive analytic control update rules, and prove convergence to the target when the parameters satisfy defined constraints.
We further formulate a finite-horizon feasibility problem under error (distance of the state to the target) and budget constraints. (2) For unknown susceptibility parameters, we propose an online control algorithm that alternates between parameter estimation and control, ensuring parameter identifiability and performance. We prove that, under this algorithm, the combined parameter and state error converges, and the system state is driven to the desired target. (3) We compare our work through simulations against recent models that perform budget-constrained optimization and those that perform online control via gradient descent. Additionally, we benchmark the cost of parameter estimation by comparison with the cost of control under known parameters.

The paper is organized as follows: Section II introduces the system dynamics and formulates the control and parameter-estimation problems. In Section III we analyze system equilibria and stability under static and time-varying inputs, and perform feasibility analysis via convergence rate and control cost. Section IV presents the parameter identification algorithm with sufficient conditions for correctness, while Section V describes the adaptive control algorithm with theoretical convergence guarantees. In Section VI we report simulation results, and we discuss limitations and future directions in Section VII.

II. PRELIMINARIES

Friedkin-Johnsen Model for Opinion Dynamics. The opinion dynamics model in our work is inspired by the Friedkin-Johnsen (FJ) model [19], shown in Equation (1):

x(t) = Λ(I − L) x(t−1) + (I − Λ) d,   (1)

where x(t) ∈ R^N represents the system state at time t, and x_i(t) ∈ R denotes the opinion of agent i. The desired opinion vector is denoted by d = d·1 ∈ [0,1]^N, with the scalar d ∈ [0,1] representing the target opinion.
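As a quick numerical sanity check of (1), the hedged sketch below iterates the FJ update for a hypothetical 2-agent complete graph (the susceptibility matrix Λ and Laplacian L it uses are introduced formally just below; all numbers are illustrative, not from our experiments). Since L1 = 0, the uniform prejudice vector d·1 is itself the fixed point x* = (ΛL + I − Λ)^{−1}(I − Λ)d, and the iteration recovers it:

```python
# Toy check of the FJ dynamics (1) on a hypothetical 2-agent complete graph.
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

lam = [0.5, 0.8]                 # susceptibilities lambda_ii (entries of Lambda)
L = [[1.0, -1.0], [-1.0, 1.0]]   # graph Laplacian of the 2-node graph
d = 0.6                          # scalar target opinion, prejudice vector d*1

# A = Lambda (I - L), b = (I - Lambda) d
A = [[lam[i] * ((1.0 if i == j else 0.0) - L[i][j]) for j in range(2)]
     for i in range(2)]
b = [(1.0 - lam[i]) * d for i in range(2)]

x = [0.1, 0.9]                   # arbitrary initial opinions in [0, 1]
for _ in range(200):             # iterate x(t) = A x(t-1) + b
    x = [ai + bi for ai, bi in zip(matvec(A, x), b)]

# Because L 1 = 0, we have Lambda(I - L) 1 + (I - Lambda) 1 = 1,
# so d*1 is the unique fixed point here.
print(x)  # both entries approach 0.6
```

The same cancellation (L1 = 0) is what makes the matrix V defined from this fixed point row-stochastic, a property used repeatedly below.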
Λ is a diagonal matrix representing agents' susceptibility to interpersonal influence, where λ_ii represents the susceptibility of agent i, and L is the graph Laplacian of the underlying social network graph. This model has the unique fixed point x* = (ΛL + I − Λ)^{−1}(I − Λ)d. The quantity ΛL encodes how much each agent averages neighbors' opinions, and I − Λ encodes attachment or bias to one's prejudice.

We note that the FJ model does not include a planner agent that can alter the opinion dynamics. To allow for such an agent, the authors of [18] present a modified two-timescale model; in our paper, we consider only the discrete-time version of this model, shown in Equation (2):

x(t) = V [ (I − H U(t)) x(t−1) + H U(t) d ].   (2)

In the above formulation, H = diag(h_1, ..., h_N) ≻ 0 is the susceptibility matrix, where h_i represents the susceptibility of agent i to the planner's influence. The non-negative control vector u(t) ∈ R^N_{≥0} represents the influence that the planner exerts on the system, with u_i(t) representing the influence exerted on agent i at time t, and U(t) = diag(u(t)). The matrix V = (ΛL + I − Λ)^{−1}(I − Λ), V ∈ R^{N×N}, is defined by the fixed point of the FJ model, where v_ij represents the weight agent i places on agent j's opinion. In [18], the authors argue that the matrix −(ΛL + I − Λ) is Metzler and Hurwitz, and therefore invertible, under the assumption of a strongly connected graph and at least one non-stubborn agent, i.e., ∃i s.t. λ_ii < 1.¹

The interpretation of the system dynamics in (2) is that agents form a convex blend of their prior opinions x(t−1) and the target d, weighted by H U(t). (We assume H U(t) ∈ [0,1] elementwise and will justify why this restriction is necessary in the next section.) After this, the social mixing matrix V propagates these updates across the network.²
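A minimal sketch of one run of (2), with a hypothetical 2-agent row-stochastic V and hand-picked h_i, u_i (illustrative values only, not from our experiments), shows both the convex-blend interpretation and the invariance of [0,1]^n established in the next section:

```python
# Hedged sketch: iterate x(t) = V[(I - HU)x(t-1) + HU d] for a
# hypothetical 2-agent system; V, h, u, d are illustrative choices.
V = [[0.7, 0.3], [0.4, 0.6]]   # row-stochastic social mixing matrix
h = [0.5, 0.8]                  # susceptibilities to the planner, H = diag(h)
u = [1.0, 0.5]                  # constant control, so h_i*u_i = 0.5, 0.4 in (0,1)
d = 0.9                         # scalar target opinion
x = [0.2, 0.1]                  # initial opinions in [0, 1]

for t in range(100):
    # convex blend of prior opinion and target, weighted by h_i*u_i
    blend = [(1 - h[i] * u[i]) * x[i] + h[i] * u[i] * d for i in range(2)]
    # social mixing: x(t) = V * blend
    x = [sum(V[i][j] * blend[j] for j in range(2)) for i in range(2)]
    assert all(0.0 <= xi <= 1.0 for xi in x)  # invariance of [0,1]^n

print(x)  # both opinions are driven toward the target d = 0.9
```

With 0 < h_i u_i < 1 the deviation from d contracts by at least 1 − min_i h_i u_i per step (here at least a factor 0.6), so the loop converges geometrically, previewing the stability result of Section III.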
Cost definitions and budget. We assume that the control strategy of the planner incurs some cost. Similar to [18], we assume that these costs are linear functions of the control input.³ W.l.o.g., we interpret u_i(t) as the actual per-agent effort or cost at time t. We define the cost at time instant t as c(t) = Σ_{i=1}^n u_i(t), and the cumulative cost over a horizon T as c_u(T) = Σ_{i=1}^n Σ_{t=1}^T u_i(t). A natural budget constraint is c_u(T) ≤ C_max for some C_max > 0.

Problem Statement. We describe the problems we wish to solve, in the cases of known and unknown parameters. Specifically, we assume that the susceptibility matrix H, which encodes each agent's responsiveness to planner interventions, is unknown.⁴ The problems we solve in this work are stated below, categorized into the known- and unknown-parameter settings:

Known susceptibility parameters. (1) Global asymptotic stability: We design a control sequence u(t) so that x = d is a globally asymptotically stable equilibrium of the system, i.e., x(t) → d as t → ∞. (2) Feasibility under error constraints and budget: Given a time horizon T, a target accuracy ϵ ≥ ∥x(t) − d∥, and a total cost budget C_max > 0, we wish to find u(t) that satisfies both the accuracy and budget constraints.

Unknown susceptibility parameters. (3) Online Control: We employ a parameter estimation algorithm to estimate the unknown parameter H, while simultaneously driving x to the target using adaptive control. We show that, under certain conditions, the combined error converges: the state error decreases to zero, while the parameter error remains bounded at a minimum level.

¹The matrix is Metzler because L is a Laplacian and Λ is diagonal with entries in [0,1], preserving the Metzler property. It is Hurwitz because the graph G is connected and has at least one non-stubborn agent, ensuring it is positive definite and invertible.
²Our model applies control at every discrete time step, whereas [18] uses a two-timescale scheme in which interventions ('campaigns') occur at discrete epochs and the system evolves in continuous time between campaigns.

³The cost in a social setting models the planner purchasing influence (ads, outreach campaigns, incentives). This, along with simpler comparison to related work, is our motivation for choosing a linear sum of costs over the smoother analysis afforded by quadratic costs.

⁴Susceptibility to influence is latent, and cannot typically be directly observed or reliably inferred through survey data, as individuals rarely estimate their own susceptibility accurately. In contrast, other parameters such as the interaction matrix V can be inferred from observed networks, and the opinion vector x(t) is observable through expressed behavior (clicks, ratings, purchases, or survey/sentiment responses).

III. ANALYSIS OF SYSTEM DYNAMICS

As a starting point, we examine the boundedness of the state trajectory and establish that, under the system dynamics, x(t) remains confined to an invariant subset of R^n for all time.

Theorem 1. If V ∈ R^{n×n} is a row-stochastic matrix, i.e., V1 = 1 and v_ij ≥ 0 ∀i,j, and the control vector u(t) satisfies 0 ≤ h_i u_i(t) ≤ 1 ∀i, ∀t, then, assuming that the opinion vector x(t−1) lies in the subset [0,1]^n with the desired target d ∈ [0,1]^n, under the dynamics equation (2) x(t) remains in the set [0,1]^n, i.e., the set [0,1]^n is positively invariant under the system dynamics.

Proof. Since each term h_i u_i(t) ∈ [0,1], we observe that for each i, the quantity (1 − h_i u_i(t)) x_i(t−1) + h_i u_i(t) d is a convex combination of x_i(t−1) ∈ [0,1] and d ∈ [0,1], and therefore

0 ≤ (1 − h_i u_i(t)) x_i(t−1) + h_i u_i(t) d ≤ 1, ∀i.   (3)

Therefore, the vector (I − H U(t)) x(t−1) + H U(t) d lies in [0,1]^n.
Now, since x(t) = V[(I − H U(t)) x(t−1) + H U(t) d], and each row (v_i1, ..., v_in) of V is a probability vector (i.e., v_ij ≥ 0 and Σ_j v_ij = 1), it follows that each coordinate x_i(t) is a convex combination of the entries (1 − h_j u_j(t)) x_j(t−1) + h_j u_j(t) d. From (3) it then follows that 0 ≤ x_i(t) ≤ 1, ∀i. Thus, the opinion vector x(t) ∈ [0,1]^n, ∀t.

We now address the problems in the known-susceptibility-parameters setting. First, we establish global asymptotic stability of the equilibrium (i.e., x(t) → d as t → ∞) under some bounds on the control input u(t). Second, we formulate and analyze a finite-horizon feasibility problem that certifies attainment of a prescribed error (distance from target) within a specified control budget.

A. Stability Analysis under Bounded Control Input

To establish global asymptotic stability of the equilibrium x = d, we analyze the existence, uniqueness, and stability of the equilibria under bounded control inputs.

Theorem 2 (Stability of the desired equilibrium for admissible initial conditions). Consider the dynamics in (2). If d ∈ [0,1]^n, and the control vector u(t) satisfies 0 < h_i u_i(t) < 1, ∀i,t, with η(t) := min_i h_i u_i(t) uniformly bounded below by some η_min > 0, then for all initial conditions x(0) ∈ [0,1]^n, the system converges exponentially to the equilibrium x* = d.

Proof. Let us define the deviation from equilibrium as x′(t) = x(t) − d. Since d is constant, d = V[(I − H U(t)) d + H U(t) d] = V d. Subtracting d from both sides of the system dynamics (2), we get x′(t) = V(I − H U(t)) x′(t−1). Let us define A(t) = V(I − H U(t)). Then the update becomes:

x′(t) = A(t−1) x′(t−1) = A(t−1) A(t−2) ··· A(0) x′(0).

Recalling η(t) = min_i h_i u_i(t), so that 0 < η(t) < 1, we have:

∥A(t)∥_∞ ≤ 1 − η(t) < 1,   ∥x′(t)∥_∞ ≤ ( ∏_{s=0}^{t−1} (1 − η(s)) ) ∥x′(0)∥_∞.   (4)
Since η(t) ∈ (0,1) and, in addition, η(t) is bounded away from zero (η(t) ≥ η_min > 0 for all t), we have ∏_{t=0}^{∞}(1 − η(t)) = 0 and therefore ∥x′(t)∥_∞ → 0. Thus, lim_{t→∞} x(t) = d. If u(t) = u is constant, then so is A = V(I − H U). By the assumption 0 < h_i u_i < 1, we have ∥I − H U∥_∞ = max_i |1 − h_i u_i| < 1, so ∥A∥_∞ ≤ ∥V∥_∞ ∥I − H U∥_∞ < 1, and we can similarly show that lim_{t→∞} x(t) = d.

Remark 3 (Joint spectral radius argument). We could also show this result using an argument based on the joint spectral radius of A(t). Define M = {V(I − H U) : U = diag(u), 0 < h_i u_i ≤ 1} and η = inf_t min_i h_i u_i(t) > 0. Then ∥A∥_∞ ≤ 1 − η for all A ∈ M, so ρ(M) ≤ 1 − η < 1. By standard results [20], for any sequence A(t) ∈ M, x′(t) = A(t−1) ··· A(0) x′(0) → 0 exponentially; hence x(t) → x*.

Remark 4 (System behavior without control input). If U = 0, the system reduces to the standard consensus dynamics [21]: x(t) = V x(t−1). If x(t−1) = d, then x(t) = d; that is, the system remains at equilibrium. However, if x(t−1) ≠ d, then the system converges to a weighted average of the initial state components instead, and d does not influence the convergence.

Having established convergence to the desired equilibrium for admissible initial conditions under bounds on the control input, we go on to characterize the convergence rate for such inputs and quantify the corresponding control cost.

B. Feasibility Analysis of Control Cost and Accuracy

Rate of Convergence. We consider a time-varying control input u(t) that enforces a uniform rate of convergence r(t) = a · b^t ∈ (0,1), with 0 < a, b < 1, for all agents, i.e., h_i u_i(t) = r(t). Thus, the individual control input for each agent can be set as

u_i(t) = r(t) / h_i.   (5)

From (4), ∥x′(t)∥_∞ ≤ ∥x′(0)∥_∞ · ∏_{s=0}^{t−1}(1 − a b^s).
Using the inequality 1 − y ≤ e^{−y} for y ∈ (0,1), we bound the product as:

∥x′(t)∥_∞ ≤ exp( −a(1 − b^t)/(1 − b) ) · ∥x′(0)∥_∞,

so lim_{t→∞} ∥x′(t)∥_∞ ≤ exp( −a/(1 − b) ) · ∥x′(0)∥_∞. The distance from equilibrium, ϵ, at time T can then be bounded by:

ϵ = ∥x′(T)∥_∞ ≤ exp( −a(1 − b^T)/(1 − b) ) · ∥x′(0)∥_∞.   (6)

We call the control non-uniform when each agent has a specified control input depending on its individual susceptibility parameter h_i, and uniform when the same control input (say u_c(t)) is applied to all agents, i.e., u_i(t) = u_c(t), ∀i.⁵

Cost of Control. We now compute the associated cost of control, making explicit the trade-off between convergence speed and control expenditure. The cost can be written as:

c_u(T) = Σ_{t=0}^{T−1} Σ_{i=1}^{n} r(t)/h_i = Σ_{t=0}^{T−1} Σ_{i=1}^{n} a b^t / h_i = ( a(1 − b^T)/(1 − b) ) S,

where S = Σ_{i=1}^{n} 1/h_i (non-uniform control) or S = n/h_max (uniform control).   (7)

In both cases, the cost depends on the same bilinear term a(1 − b^T)/(1 − b) that also governs convergence speed.

Minimum Control Cost for Finite-Time ϵ-Accuracy. First, to ensure that at time t the state x(t) is at most ϵ away from the equilibrium, from (6) we require:

a(1 − b^t) ≥ (1 − b) log( ∥x′(0)∥_∞ / ϵ ).

Second, to satisfy a budget constraint, i.e., c_u(t) ≤ C_max, from equation (7) we get:

a(1 − b^t) ≤ (1 − b) C_max / S.

These inequalities aid us in framing and solving the problem of analyzing the feasibility of achieving ϵ-accuracy under given time and budget constraints.

Feasibility Analysis. Given a finite horizon T, a minimum accuracy requirement ϵ > 0, an initial error ∥x′(0)∥_∞, a budget limit C_max, and the sum of inverse susceptibility parameters S = Σ_{i=1}^n 1/h_i, we seek to determine whether there exist a, b ∈ (0,1) such that the control schedule u_i(t) = a b^t / h_i satisfies both the accuracy and budget constraints.
This amounts to solving the constraint system:

C_max / S ≥ a(1 − b^T)/(1 − b) ≥ log( ∥x′(0)∥_∞ / ϵ ),   0 < a, b < 1.   (8)

A solution exists if:

• Condition 1. Relating the error ϵ to the budget C_max, we require: ϵ ≥ ∥x′(0)∥_∞ e^{−C_max/S}.

• Condition 2. We can rewrite (8) as:

log( ∥x′(0)∥_∞ / ϵ ) · (1 − b)/(1 − b^T) ≤ a ≤ (C_max/S) · (1 − b)/(1 − b^T).

Under Condition 1 the interval is nonempty; to also satisfy a < 1, it suffices that (C_max/S) · (1 − b)/(1 − b^T) ≤ 1. Obtaining the boundary values of b requires solving a polynomial equation:

a(1 − b^T) − (C_max/S)(1 − b) = 0   or   a(1 − b^T) − log( ∥x′(0)∥_∞ / ϵ )(1 − b) = 0.

Hence, feasibility reduces to verifying these conditions, which can be checked analytically or with standard nonlinear solvers.

Thus, we have established the convergence of the system dynamics and analyzed the feasibility problem in the case of known parameters. However, if the susceptibility parameter H is unknown, estimating it enables us to design u(t) accurately enough for faster convergence, while respecting the bounds 0 < h_i u_i < 1 and the budget constraints. Although convergence can be achieved without exact estimation of H, more accurate estimates allow tighter enforcement of the input bounds, thereby improving the convergence rate. This motivates the parameter identification algorithm that we introduce in the next section.

⁵In the special case of a constant control input u(t) = u, convergence can be regulated by fixing a rate r ∈ (0,1) and setting u_i = r/h_i, which yields ∥x′(t)∥_∞ ≤ (1 − r)^t ∥x′(0)∥_∞. In the case of uniform control, we impose bounds h_min ≤ h_i ≤ h_max and select u_c = r/h_max. This guarantees h_i u_c ∈ [r h_min/h_max, r] ⊂ (0,1) for all i, leading to the contraction bound ϵ = ∥x′(t)∥_∞ ≤ (1 − r h_min/h_max)^t ∥x′(0)∥_∞.
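The feasibility check in (8) is straightforward to mechanize. The sketch below (with illustrative parameter values, not from our experiments) tests Condition 1 and then grid-searches for a pair (a, b) satisfying the sandwich inequality; a root-finder on the polynomial equations above would serve equally well:

```python
import math

def feasible_schedule(T, eps, x0_err, C_max, S, grid=1000):
    """Search for a, b in (0,1) satisfying
    C_max/S >= a(1-b^T)/(1-b) >= log(x0_err/eps), i.e., Eq. (8)."""
    need = math.log(x0_err / eps)
    # Condition 1: the two outer bounds of (8) must be ordered.
    if eps < x0_err * math.exp(-C_max / S):
        return None
    for k in range(1, grid):
        b = k / grid
        g = (1 - b**T) / (1 - b)          # geometric-series factor
        lo = need / g                      # accuracy lower bound on a
        hi = min(1.0, (C_max / S) / g)     # budget bound, and a < 1
        if lo < hi:                        # Condition 2: nonempty interval
            return (lo + hi) / 2, b        # pick a in the interior
    return None

# Illustrative instance: h = (0.5, 0.8, 1.0), so S = 1/0.5 + 1/0.8 + 1/1.0
res = feasible_schedule(T=30, eps=0.05, x0_err=0.9, C_max=20.0, S=4.25)
print(res)  # a feasible (a, b) pair, or None if infeasible
```

Any returned pair can be plugged into the schedule u_i(t) = a b^t / h_i; returning the interval midpoint keeps both constraints satisfied with slack.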
IV. PARAMETER ESTIMATION

In this section, we adapt standard parameter-estimation techniques for discrete-time systems, following the framework presented in the literature [22] (cf. Chapter 4), to identify the susceptibility parameter H. In such techniques, unknown model parameters are updated online using error signals between the plant and a reference/prediction model. An adaptation law (e.g., gradient, Lyapunov-based, or least-squares) adjusts the estimates to maintain stability and desired performance. The canonical setup in this framework considers systems of the form:

x(t) = F̃_t(x(t−1), u(t−1)) + F_t(x(t−1), u(t−1)) θ,   (9)

where F̃_t and F_t are assumed to be known functions⁶, and θ ∈ R^n is an unknown parameter vector that appears linearly with the regressor matrix F_t. We reformulate our system to match the canonical form, to show how we can apply the adaptive control technique. We denote the parameter h_i as θ_i and H as Θ henceforth. We observe that the following substitutions allow us to rewrite (2) in the form of (9):

F̃_t := V x(t−1),   F_t := V diag( u(t) ∘ (d − x(t−1)) ),

where U(t) = diag(u(t)) and ∘ denotes elementwise multiplication. The parameter-estimation method prescribes computing a state prediction x̂(t), and updating the parameter estimate at time t, i.e., θ̂(t), as follows:

x̂(t) = F̃_t + F_t θ̂(t−1),
θ̂(t) = θ̂(t−1) + Ψ F_t^⊤ ( x(t) − x̂(t) ),   Ψ = ψ I, ψ > 0,   (10)

where Ψ is a gain matrix. Intuitively, this update corrects the estimate in the direction of the state prediction error, weighted by the regressor F_t. Let the parameter and state estimation errors be defined as: θ_err(t) := θ − θ̂(t), and x_err(t) := x(t) − x̂(t).
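A minimal sketch of the recursion (10) on a hypothetical 2-agent instance (all numbers illustrative) shows θ̂(t) approaching the true θ. To keep the regressor exciting, each round restarts from the same probe state x0, a stand-in for the exploration phase discussed below; the common term V x0 cancels in x(t) − x̂(t), leaving x_err = F θ_err:

```python
# Hedged sketch of the estimator update (10); V, theta, d, x0, u, psi
# are illustrative choices, with psi chosen to satisfy psi < 2/beta^2.
V = [[0.7, 0.3], [0.4, 0.6]]    # known social mixing matrix
theta = [0.5, 0.8]               # true (unknown) susceptibilities h_i
d, x0 = 0.9, [0.2, 0.1]          # target opinion and fixed probe state
u = [0.5, 0.5]                   # probe input
psi = 5.0                        # adaptation gain Psi = psi*I
theta_hat = [0.0, 0.0]           # initial parameter estimate

# Regressor F_t = V diag(u o (d - x0)) is constant under the fixed probe.
y = [u[j] * (d - x0[j]) for j in range(2)]
F = [[V[i][j] * y[j] for j in range(2)] for i in range(2)]

for _ in range(300):
    # observed vs. predicted one-step response (the V x0 term cancels)
    x_obs  = [sum(F[i][j] * theta[j]     for j in range(2)) for i in range(2)]
    x_pred = [sum(F[i][j] * theta_hat[j] for j in range(2)) for i in range(2)]
    err = [x_obs[i] - x_pred[i] for i in range(2)]
    # update (10): theta_hat <- theta_hat + psi * F^T (x - x_hat)
    theta_hat = [theta_hat[j] + psi * sum(F[i][j] * err[i] for i in range(2))
                 for j in range(2)]

print(theta_hat)  # converges toward the true theta = [0.5, 0.8]
```

Here θ_err obeys θ_err ← (I − ψ F^⊤F) θ_err, so the error contracts geometrically exactly as the analysis below predicts, and stalls if the probe stops exciting the regressor (y → 0).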
Substituting these error definitions into the dynamics yields:

x_err(t) = F_t θ_err(t−1),   θ_err(t) = (I − Ψ F_t^⊤ F_t) θ_err(t−1).   (11)

Thus, the updates in (10) are applied iteratively at each timestep, using the observed state to refine the parameter estimate. The mechanism works by adjusting θ̂(t) in the direction that reduces the observed state prediction error, effectively steering the parameters to better match the system dynamics over time. We now show, using Lyapunov analysis, that the parameter error under this method converges to zero.

⁶For convenience we denote them without their arguments as F̃_t and F_t henceforth.

Parameter Error Analysis. To show that the parameter error decreases geometrically, we begin by introducing a quadratic Lyapunov function:

R(t) := (1/2) θ_err(t)^⊤ Ψ^{−1} θ_err(t).

Using the error dynamics (11), we derive conditions under which R(t) decreases monotonically. The analysis shows that this requires bounds on ∥F_t∥ and an appropriate choice of the gain parameter ψ. By bounding the Lyapunov difference R(t) − R(t−1), we prove that the parameter error decreases at a geometric rate: ∥θ_err(t)∥ ≤ K (1 − κ)^{t/2}, where κ > 0 and K > 0 are constants.

Lemma 1 (Convergence of Parameter Estimation Error). Suppose the following conditions hold:

1) Boundedness: There exists β > 0 such that ∥F_t∥ ≤ β for all t, and the adaptation gain satisfies ψ < 2/β².⁷

2) Persistent Excitation: The regressor F_t is persistently exciting, i.e., there exists α > 0 such that F_t^⊤ F_t ⪰ α² I, ∀t.

Then the following hold:

1) For (11), the function R(t) := (1/2) θ_err(t)^⊤ Ψ^{−1} θ_err(t) is a discrete-time Lyapunov function.

2) The rate of decrease of θ_err satisfies:

∥θ_err(t)∥ ≤ √( (2/β²) R(0) ) · (1 − α²/β²)^{t/2}.

Proof. By definition of R(t) and using Eq.
(11), we can show:

R(t) − R(t−1) = − x_err(t)^⊤ M_t x_err(t),   M_t := I − (1/2) F_t Ψ F_t^⊤.   (12)

Note that R(t) is positive for all t. To establish that R(t) is a discrete-time Lyapunov function, we show that R(t) − R(t−1) is strictly negative for all t by showing that M_t ≻ 0. Since Ψ is a symmetric p.d. matrix, by the norm-eigenvalue inequality we have

λ_max( F_t Ψ F_t^⊤ ) ≤ λ_max(Ψ) ∥F_t∥².

Since λ_max(Ψ) = ψ and, from Condition 1, ∥F_t∥² ≤ β², we get λ_max( F_t Ψ F_t^⊤ ) ≤ ψβ². Applying Weyl's inequalities to bound the eigenvalues of I − (1/2) F_t Ψ F_t^⊤, we get:

λ_min(M_t) = 1 − (1/2) λ_max( F_t Ψ F_t^⊤ ) ≥ 1 − (1/2) ψβ².

Also by Condition 1, (1/2)ψβ² < 1, which means that the minimum eigenvalue of M_t is positive, proving that M_t ≻ 0. Next, we establish the convergence rate of ∥θ_err(t)∥. Using the norm-eigenvalue inequality, we have

x_err(t)^⊤ M_t x_err(t) ≥ λ_min(M_t) ∥x_err(t)∥².   (13)

From Eq. (11), x_err(t) = F_t θ_err(t−1), and from Condition 2, F_t^⊤ F_t ⪰ α² I, so

∥F_t θ_err(t−1)∥² ≥ α² ∥θ_err(t−1)∥²,

and therefore

x_err(t)^⊤ M_t x_err(t) ≥ ( 1 − (1/2)ψβ² ) α² ∥θ_err(t−1)∥².

Now, to bound θ_err(t−1), we consider the definition R(t−1) = (1/2) θ_err(t−1)^⊤ Ψ^{−1} θ_err(t−1). Since Ψ = ψI, R(t−1) = (1/(2ψ)) ∥θ_err(t−1)∥², and we have ∥θ_err(t−1)∥² = 2ψ R(t−1). Substituting in Eq. (13):

x_err(t)^⊤ M_t x_err(t) ≥ 2ψ ( 1 − (1/2)ψβ² ) α² R(t−1).

Substituting in (12) we get:

R(t) ≤ (1 − κ) R(t−1)   and   R(t) ≤ (1 − κ)^t R(0),   (14)

where κ := 2ψ ( 1 − (1/2)ψβ² ) α² > 0. Using the definition of R(t), we can simplify this to:

∥θ_err(t)∥ ≤ √(2ψ R(0)) (1 − κ)^{t/2}.   (15)

⁷Since V is fixed, x(t), d ∈ [0,1]^n, and 0 < h_i u_i(t) < 1, it is possible to bound ∥F_t∥ ≤ ∥V∥ ∥u(t) ∘ (d − x(t))∥_∞ ≤ β. The parameter ψ is a design choice and can be selected to ensure ψ < 2/β².
As ψ is a design parameter, we observe that setting ψ = 1/β² yields the smallest value of (1 − κ), which in turn yields the tightest upper bound on ∥θ_err(t)∥. Substituting ψ = 1/β² in (15) yields:

∥θ_err(t)∥ ≤ √( (2/β²) R(0) ) · (1 − α²/β²)^{t/2}.

We go on to discuss persistency of excitation (PE) (Condition 2) and the choice of u(t) that maintains it.

Persistency of Excitation Conditions. To enforce the PE condition λ_min( F_t^⊤ F_t ) ≥ α² at each time instant, we expand using F_t = V diag(y(t)) with y(t) = u(t) ∘ (d − x(t)), which gives

λ_min( F_t^⊤ F_t ) ≥ min_j |y_j(t)|² · λ_min( V^⊤ V ).

Thus, a sufficient condition to ensure PE is:

min_j |y_j(t)| ≥ α / λ_V,   where λ_V = √( λ_min( V^⊤ V ) ).

Since |y_j(t)| = u_j(t) |x_j(t) − d|, we must keep the deviation |x_j − d| bounded away from zero. So we enforce a margin:

|x_j(t) − d| ≥ δ(t), ∀j,   where δ(t) ∈ (0,1],

i.e., every component of the state x(t) must be at least δ(t) away from the target d to preserve the PE lower bound. We choose a δ(t) that guarantees this for all admissible u_j(t), using a conservative bound on θ obtained from its estimate and error. Defining θ_max(t) = θ̂(t) + θ_err(t), for a given α we can set:

δ(t) = α / ( λ_V u_max(t) ),   where u_max(t) = min_j 1/θ_max_j(t).   (16)

Thus PE holds by keeping x(t) outside a neighborhood of radius δ(t) around the target, with α and δ(t) chosen to respect the input bounds. We now present an online algorithm that estimates θ while steering x(t) → d.

V. ONLINE CONTROL ALGORITHM

Our algorithm alternates between "exploration" and "exploitation" phases indexed by m. Let the number of steps in the exploration and exploitation phases of cycle m be K_θ^m and K_x^m, respectively. Each cycle m is additionally characterized by a neighborhood radius δ_m and a PE level α_m: (i) Exploration: executed while |x_j(t_m) − d| > δ_m, ∀j, to guarantee PE and drive θ̂(t_m) → θ.
We develop a controller that ensures PE:

u_j(t) = clip_{[0, 1/θ_j]} ( α / ( λ_V |x_j(t) − d| ) ), ∀j.   (17)

Algorithm 1: Online Control Algorithm
1: Input: target d, social network matrix V
2: Initialize adaptation gain Ψ, α_0 and δ_0 (via (16)), shrink factor c_δ, parameter estimate θ̂(0)
3: Set m ← 0, t_m ← 0
4: Observe x(0)
5: while ∥x(t_m) − d∥ has not converged do
6:   while |x_j(t_m) − d| > δ_m, ∀j and α_m > α_min do
7:     u_j(t_m) ← α_m / ( λ_V |x_j(t_m) − d| ), ∀j
8:     Predict x̂(t_m + 1); observe x(t_m + 1)
9:     Parameter update θ̂(t_m + 1) via (10)
10:    t_m ← t_m + 1
11:  end while
12:  Update parameter bound: θ_max ← θ̂(t_m) + θ_err(t_m)
13:  while |x_j(t_m) − d| ≥ γδ_m, ∀j do
14:    Set contraction rate: r(t_m) = a b^{t_m}
15:    u_j(t_m) ← r(t_m) / θ_max_j, ∀j
16:    Observe x(t_m + 1); t_m ← t_m + 1
17:  end while
18:  Update schedule δ_{m+1}, α_{m+1} as in (18)
19:  m ← m + 1
20: end while

(ii) Exploitation: executed once ∃j s.t. |x_j(t_m) − d| ≤ δ_m, using the analytic control input (Eq. (5)). To prevent over-contraction in this phase and preserve a nontrivial δ_{m+1} for the next cycle's exploration phase, we impose: |x_j(t_m) − d| ≥ γδ_m, ∀j, with γ ∈ (0,1). For a chosen α_0, we initialize δ_0 using Eq. (16). At the end of each cycle m (after the exploitation phase), to guarantee re-entry into the exploration phase in the next cycle, we shrink δ_m by some factor and set:

δ_{m+1} = c_δ (γδ_m)   and   α_{m+1} = λ_V u_max(t_m^end) δ_{m+1},   (18)

where c_δ ∈ (0, 1/2] is a shrinking factor, and t_m^end denotes the time at the end of cycle m. This rule ensures |x_j(t_{m+1}) − d| > δ_{m+1}, ∀j, so the next cycle begins in exploration with a PE level that remains feasible under the input bounds. We carry out the exploration phase only while α_m > α_min, where α_min ∈ (0,1). Keeping α_m > α_min forces |x_j(t_m) − d| > δ_m, ∀j, hindering the state convergence to the target, i.e.,
∥x(t) − d∥ → 0. Upon reaching α_min, we stop the exploration phases and define the parameter error at this point as θ_err^min. Let m* = min{m : α_m ≤ α_min}, and define

θ_err^min = θ_err(t_{m*}^end),   R_min = (1/2) (θ_err^min)^⊤ Ψ^{−1} θ_err^min.   (19)

We continue exploitation, letting δ_m → 0 to drive x(t) → d. We can now present our online control Algorithm 1 and describe it as follows. Lines 1–4 specify the problem inputs and perform initialization: the current state is observed, and the parameter estimate and other hyperparameters are set. Lines 6–10 execute the exploration phase while |x_j(t_m) − d| > δ_m, ∀j, and α_m > α_min. The control input ensuring the PE condition is applied in line 7, and the parameter estimate is updated via line 9. Lines 13–16 perform exploitation within the bounds γδ_m ≤ |x_j(t_m) − d| ≤ δ_m, ∀j, using the analytic control with the conservative bound θ_max (lines 12–15). At the end of the cycle, the neighborhood radius δ_m and PE level α_m are updated in line 18 using Eq. (18), and the cycle counter advances (line 19). Once α_m ≤ α_min, the exploration phase is no longer carried out in each cycle. However, the exploitation phase (lines 13–16) keeps repeating, and as δ_m → 0, the state error ∥x(t_m) − d∥ → 0, since our control input (line 15) satisfies the conditions stated in Theorem 2.

Combined State and Parameter Errors. Let t_m denote the start of cycle m. Define the combined error:

E_m := ν_θ R(t_m) + ν_x ∥x(t_m) − d∥²,   (20)

with ν_θ > 0 weighting the parameter error defined by the Lyapunov function R(t_m), and ν_x > 0 weighting the squared distance of the state to the target, ∥x(t_m) − d∥². We now go on to show the convergence of this combined error under Algorithm 1.

Proposition 1 (Convergence of combined error). Let t_m and t_{m+1} be the start of cycles m and m+1, respectively.
Under the definitions of combined error (20) and minimum parameter error (19), the state converges to the target, ∥x(t_m) − d∥ → 0 as m → ∞, while the parameter error reaches a minimum, R(t_m) → R_min; hence E_m → ν_θ R_min.

Proof. During exploration (∥x(t_m) − d∥_∞ > δ_m), the control input designed in Eq. (17) fulfills the conditions of Lemma 1. Hence the per-step contraction is R(t_m + 1) ≤ (1 − κ_m) R(t_m) (from Eq. (14)). After K_θ^(m) exploration steps, and noting that θ̂ is not updated during exploitation (so R does not increase),

    R(t_{m+1}) ≤ (1 − κ_m)^{K_θ^(m)} R(t_m).    (21)

The state update satisfies the contraction condition (as in Theorem 2), which implies ∥x(t_m + 1) − d∥_∞ ≤ ∥x(t_m) − d∥_∞. Hence, once the trajectory enters the δ_m-neighborhood, ∥x(t_{m+1}) − d∥_∞ ≤ δ_m, and using the relation between the ℓ_∞ and ℓ_2 norms,

    ∥x(t_{m+1}) − d∥² ≤ n δ_m².    (22)

Multiplying (21) by ν_θ and (22) by ν_x and adding gives E_{m+1} ≤ ν_θ (1 − κ_m)^{K_θ^(m)} R(t_m) + ν_x n δ_m². Using δ_m = c_δ ∥x(t_m) − d∥_∞ ≤ c_δ ∥x(t_m) − d∥ yields:

    E_{m+1} ≤ (1 − κ_m)^{K_θ^(m)} ν_θ R(t_m) + n c_δ² ν_x ∥x(t_m) − d∥².    (23)

Finally, the updates of α_{m+1} and δ_{m+1} in Eq. (18) guarantee feasibility of the inputs and re-entry into exploration, thereby maintaining PE in the next cycle. Now recall the definitions of R_min and θ_err^min from Eq. (19). For m ≥ m⋆, parameter updates stop, so R(t_m) = R_min, while the state update remains contractive with δ_m → 0, implying ∥x(t_m) − d∥ → 0. Consequently, E_m = ν_θ R_min + ν_x ∥x(t_m) − d∥² → ν_θ R_min. ∎

Fig. 1: Final error ϵ(T) vs. control cost c_u(T) under known θ vs. adaptive control using θ̂(t).
Fig. 2: Final error ϵ across costs, comparing the IODSFC baseline and our method.

Thus, we have shown that our algorithm sufficiently estimates the parameter θ while driving the state to the target.

Remark 5.
By Theorem 2, estimating θ is not required to drive x(t) → d; it suffices to choose u(t) so that the condition θ_i u_i(t) ∈ (0, 1), ∀i, t, is met. However, as the estimation error shrinks, we can tune u(t) with greater accuracy, which tightens the contraction factor and improves the rate of convergence. We demonstrate this through simulations in the next section.

VI. SIMULATIONS

Cost of Parameter Estimation. We now quantify the cost of learning the parameter θ. As noted earlier, knowing θ, although not necessary to drive x(t) → d, improves the rate of convergence. Given the same budget C_max for a fixed time horizon T, we can reach a lower ∥x(T) − d∥ if θ is known a priori than when it must be estimated online; these results can be seen in Fig. 1. In the Known Parameters case, the control can be set optimally as u_j(t) = r/θ_j, ∀j. In the Adaptive Control case, the control is set to u_j(t) = r/θ̂_j(t), ∀j, while θ̂ is updated via parameter identification. The gap between the two curves quantifies the cost of learning: with exact θ we can achieve a lower error ∥x(T) − d∥ for the same control cost.

Comparisons with Related Work. (I) Our setting closely aligns with that of [18] (Influencing Opinion Dynamics to promote Sustainable Food Choices, IODSFC). Under a fixed budget, the IODSFC method computes the optimal control via numerical optimization; in contrast, we derive a closed-form update for u(t), avoiding optimization. For a fair comparison, since [18] assumes known parameters, we likewise assume that the parameter θ is known for both methods in our simulation. Across varying budgets (Fig. 2), our approach attains performance close to the optimum, with the gap narrowing as the budget increases. Notably, unlike IODSFC, we also provide analytic convergence conditions (ref.
Theorem 2), which enable this control design rather than relying on optimization.

(II) The work of [23] (Network-aware Recommender System via Online Feedback Optimization, NRS-OFO) proposes a projected-gradient control framework for recommender systems. For a fair comparison, we assume in both methods that the state x(t) and the system dynamics are known, and we implement their 'Gradient Estimation & Optimization' procedure (Section 3: Level III). We adopt the loss function L(x(t)) = (1/2) ∥x(t) − d∥²_2 and minimize it with respect to the control input U, which requires the steady-state Jacobian ∇_U x*. The NRS-OFO method updates the control input in intervals and lets the system converge to a steady state between updates (the control input is constant within each interval). In our model, the only steady state (under bounded u(t)) is x* = d, though in other frameworks the steady state may be slow to reach or misaligned with the target. To represent this accurately in our simulation, we compute U(t) via gradient descent and keep it constant until convergence, or until the cumulative control cost c_u(T) reaches the budget C_max. As shown in Fig. 3a, for a given budget C_max = 15, our method converges faster and attains a lower error. In Fig. 3b, we show the steady-state error, i.e., ∥x* − d∥, across budgets C_max ∈ {10, 15, ..., 50} for both methods. Our method achieves lower error for the same cost because it designs U(t) to satisfy a budget-constrained error guarantee, whereas NRS-OFO does not explicitly incorporate the budget. Finally, in Fig. 3c we present an ablation between two online controllers that both use parameter estimation (as in Section IV) but differ in their control law: (i) gradient-descent updates to U, and (ii) our analytic control u_j = r(t)/θ̂_j, ∀j.
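To make this ablation concrete, the following sketch contrasts the two controllers on a simple stand-in opinion model. The dynamics x(t+1) = V x(t) + (θ ⊙ u(t)) ⊙ (d − x(t)) with row-stochastic V, the normalized one-step estimator, the decaying rate r(t), and the sensitivity used in the gradient step are all illustrative assumptions of this sketch, not the paper's exact Eqs. (10) and (17).

```python
import numpy as np

# Sketch of the Fig. 3c ablation: both controllers estimate theta online;
# (i) updates U by gradient descent on L(x) = 0.5 * ||x - d||^2, while
# (ii) applies the analytic input u_j = r(t) / theta_hat_j.
# Assumed (illustrative) model: x(t+1) = V x(t) + (theta * u) * (d - x),
# with V row-stochastic, so theta_j * u_j in (0, 1) contracts |x_j - d|.

rng = np.random.default_rng(0)
n, T, d = 5, 60, 0.7
V = rng.random((n, n)); V /= V.sum(axis=1, keepdims=True)  # row-stochastic
theta = rng.uniform(0.5, 1.5, n)       # true, unknown susceptibilities
x0 = rng.random(n)                     # initial opinions

def run(analytic):
    x, theta_hat = x0.copy(), np.ones(n)
    U, r = np.full(n, 0.2), 0.4
    for _ in range(T):
        u = (r / theta_hat) if analytic else U
        x_next = V @ x + theta * u * (d - x)
        phi = u * (d - x)              # regressor for the estimator
        # normalized one-step estimator (stand-in for the paper's Eq. (10))
        theta_hat += phi * (x_next - V @ x - theta_hat * phi) / (1e-9 + phi**2)
        if analytic:
            r *= 0.98                  # decaying contraction rate r(t)
        else:
            # gradient step on L via the local sensitivity dx/dU ~ theta_hat*(d - x)
            grad = (x_next - d) * theta_hat * (d - x_next)
            U = np.clip(U - 0.5 * grad, 0.0, 1.0)   # projection onto input bounds
        x = x_next
    return np.linalg.norm(x - d)

err_gd, err_analytic = run(False), run(True)
print(err_gd, err_analytic)            # final errors of the two controllers
```

In this toy setting both controllers drive the error down; the experiments reported in the text find the analytic controller converging faster and reaching a lower steady-state error.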
Both estimate θ̂ online while steering opinions toward d, but our method converges faster and achieves a lower steady-state error. Therefore, our approach outperforms NRS-OFO in achieving lower error for a given cost, while NRS-OFO's advantage is that it can operate without explicit knowledge of the system dynamics, which our method requires.

Fig. 3: Comparison of our method with the NRS-OFO method under different experimental setups. (a) Step-wise gradient-descent implementation of NRS-OFO. (b) Steady-state errors under fixed budgets. (c) Parameter estimation of θ using (10) with gradient descent versus with our proposed analytic control.

VII. CONCLUSIONS

We presented an online control framework for opinion dynamics with unknown influence parameters, establishing theoretical guarantees on convergence of both the opinion trajectory and the parameter estimates. Our analysis gives explicit contraction conditions for target convergence. In benchmarks against recent methods, our approach performs near-optimally while providing formal guarantees for the analytic control design, and does not incur the computational overhead of optimization. Moreover, by focusing on a specific opinion-dynamics model, we achieve faster convergence to the target and direct estimation of the underlying parameters for a specified budget, rather than treating them implicitly within model-agnostic approaches. Future work includes extending estimation beyond the susceptibility parameter to jointly infer the interaction graph and its related parameters. We will also generalize the control and analysis to heterogeneous targets, i.e., d ≠ d·1 for any scalar d, and design budget-aware controllers that ensure progress under non-uniform objectives.

REFERENCES

[1] M. H.
DeGroot, "Reaching a consensus," Journal of the American Statistical Association, vol. 69, no. 345, pp. 118–121, 1974. [Online]. Available: http://www.jstor.org/stable/2285509
[2] D. Acemoglu and A. Ozdaglar, "Opinion dynamics and learning in social networks," Dynamic Games and Applications, vol. 1, no. 1, pp. 3–49, 2011.
[3] A. Peralta, A. Pluchino, A. E. Biondo et al., "Opinion dynamics: a multidisciplinary review and perspective," arXiv preprint arXiv:2201.01322, 2022.
[4] N. E. Friedkin and E. C. Johnsen, Choice Shift and Group Polarization, ser. Structural Analysis in the Social Sciences. Cambridge University Press, 2011, pp. 211–232.
[5] L. Wang, Y. Xing, Y. Yi, M. Cao, and K. H. Johansson, "Adding links wisely: how an influencer seeks for leadership in opinion dynamics?" Jun. 2025, arXiv:2506.12463 [eess]. [Online]. Available: http://arxiv.org/abs/2506.12463
[6] A. Shrinate and T. Tripathy, "Leveraging network topology in a two-way competition for influence in the Friedkin-Johnsen model," Apr. 2025, arXiv:2504.03397 [eess]. [Online]. Available: http://arxiv.org/abs/2504.03397
[7] A. V. Proskurnikov, R. Tempo, M. Cao, and N. E. Friedkin, "Opinion evolution in time-varying social influence networks with prejudiced agents," vol. 50, no. 1, pp. 11896–11901. [Online]. Available: http://arxiv.org/abs/1704.06900
[8] L. Wang, C. Bernardo, Y. Hong, F. Vasca, G. Shi, and C. Altafini, "Achieving consensus in spite of stubbornness: time-varying concatenated Friedkin-Johnsen models," in 2021 60th IEEE Conference on Decision and Control (CDC). IEEE, pp. 4964–4969. [Online]. Available: https://ieeexplore.ieee.org/document/9683466/
[9] T.-C. Lee, J.-K. Huang, and Y. Su, "A unified framework for convergence analysis in social networks," in 2024 IEEE 63rd Conference on Decision and Control (CDC), pp. 2940–2945, ISSN: 2576-2370. [Online]. Available: https://ieeexplore.ieee.org/document/10886608/
[10] I.
Kozitsin, "Optimal control of opinion dynamics: Bounded-confidence and dissimilative influence models," arXiv preprint arXiv:2207.01300, 2022.
[11] S. Neumann, Y. Dong, and P. Peng, "Sublinear-time opinion estimation in the Friedkin–Johnsen model," in Proceedings of the ACM Web Conference 2024, 2024, pp. 2563–2571.
[12] C. Monti, G. De Francisci Morales, and F. Bonchi, "Learning opinion dynamics from social traces," in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 764–773.
[13] M. Okawa and T. Iwata, "Predicting opinion dynamics via sociologically-informed neural networks," in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 1306–1316.
[14] Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári, "Improved algorithms for linear stochastic bandits," in Advances in Neural Information Processing Systems, vol. 24, 2011.
[15] H. Bastani and M. Bayati, "Online decision-making with high-dimensional covariates," Operations Research, vol. 68, no. 1, pp. 276–294, 2020.
[16] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge University Press, 2006.
[17] G. Goel and B. Hassibi, "Regret-optimal estimation and control," IEEE Transactions on Automatic Control, vol. 68, no. 5, pp. 3041–3053, 2023.
[18] A. Fontan, P. E. Colombo, R. Green, and K. H. Johansson, "Influencing opinion dynamics to promote sustainable food choices," IFAC-PapersOnLine, vol. 58, no. 30, pp. 169–174, 2024, 5th IFAC Workshop on Cyber-Physical Human Systems. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2405896325001752
[19] A. V. Proskurnikov and R. Tempo, "A tutorial on modeling and analysis of dynamic social networks. Part II," Annual Reviews in Control, vol. 45, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1367578818300142
[20] R. M.
Jungers, The Joint Spectral Radius: Theory and Applications, ser. Lecture Notes in Control and Information Sciences. Springer, 2009, vol. 385.
[21] M. H. DeGroot, "Reaching a consensus," Journal of the American Statistical Association, vol. 69, no. 345, pp. 118–121, 1974. [Online]. Available: http://www.jstor.org/stable/2285509
[22] P. Ioannou and B. Fidan, Adaptive Control Tutorial. SIAM, 2006, vol. 11.
[23] S. Chandrasekaran, G. D. Pasquale, G. Belgioioso, and F. Dörfler, "Network-aware recommender system via online feedback optimization." [Online]. Available: http://arxiv.org/abs/2408.16899