A Unified Control Architecture Combining Reinforcement Learning-Based Control with a Disturbance Observer and an Event-Triggered Mechanism

Reading time: 5 minutes

📝 Abstract

This work proposes a unified control architecture that couples a Reinforcement Learning (RL)-driven controller with a disturbance-rejection Extended State Observer (ESO), complemented by an Event-Triggered Mechanism (ETM) to limit unnecessary computations. The ESO is utilized to estimate the system states and the lumped disturbance in real time, forming the foundation for effective disturbance compensation. To obtain near-optimal behavior without an accurate system description, a value-iteration-based Adaptive Dynamic Programming (ADP) method is adopted for policy approximation. The inclusion of the ETM ensures that parameter updates of the learning module are executed only when the state deviation surpasses a predefined bound, thereby preventing excessive learning activity and substantially reducing computational load. A Lyapunov-oriented analysis is used to characterize the stability properties of the resulting closed-loop system. Numerical experiments further confirm that the developed approach maintains strong control performance and disturbance tolerance, while achieving a significant reduction in sampling and processing effort compared with standard time-triggered ADP schemes.


📄 Content

Learning-based methods have become a fundamental paradigm in modern engineering systems, enabling algorithms to improve performance through data-driven adaptation without relying solely on explicit mathematical models. Over the past decade, advances in machine learning, particularly in function approximation, optimization, and representation learning, have significantly expanded the capability of intelligent systems operating under uncertainty, compared to traditional analytical methods (Qin et al. (2023); Zhang et al. (2024); Hu et al. (2025)). These approaches have been increasingly adopted in control, robotics, and even generative language models (Lu et al. (2020); Zhao et al. (2024); Tang et al. (2025); Yao et al. (2025)). However, conventional model-based techniques may be limited in their ability to handle nonlinearities, unknown disturbances, or incomplete system knowledge.

Reinforcement Learning (RL) has gained attention for complex decision-making and control in uncertain, dynamic environments (Tang et al. (2024d)). In control engineering, RL-based methods offer a data-driven alternative to classical model-based designs, which is useful when accurate system models are difficult to obtain. Among these methods, Adaptive Dynamic Programming (ADP) integrates RL with optimal control theory: it facilitates near-optimal control of nonlinear systems by approximating value functions and control policies through function approximators, eliminating the need to explicitly solve the Hamilton-Jacobi-Bellman (HJB) equation (Lewis and Vrabie (2009)). However, conventional ADP frameworks often rely on continuous or periodic updates of neural network parameters, which impose significant computational burdens and may lead to overfitting to transient disturbances or noise.
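As a minimal illustration of ADP-style value-function iteration, consider the linear-quadratic special case, where the quadratic value matrix P plays the role of the function approximator and the Bellman backup reduces to a Riccati-like recursion. The dynamics, weights, and iteration count below are hypothetical choices, not the paper's design:

```python
import numpy as np

# Value iteration for a discrete-time LQR problem: for linear dynamics
# x_{k+1} = A x_k + B u_k with stage cost x'Qx + u'Ru, the quadratic value
# function V(x) = x'P x admits an exact Bellman backup; for general
# nonlinear systems a neural approximator would replace P.
A = np.array([[1.0, 0.1], [0.0, 1.0]])  # double-integrator dynamics
B = np.array([[0.005], [0.1]])
Q = np.eye(2)                           # state cost weight
R = np.array([[0.1]])                   # input cost weight

P = np.zeros((2, 2))                    # initial value estimate V_0(x) = 0
for _ in range(500):                    # value-iteration loop
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # greedy policy
    P = Q + A.T @ P @ (A - B @ K)                      # Bellman backup

u = lambda x: -K @ x                    # near-optimal feedback law
print(np.round(K, 3))
```

For this linear case the iteration converges to the algebraic Riccati solution; the point of ADP is that the same backup can be carried out approximately when the dynamics are nonlinear and unknown.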

Event-triggered strategies have been widely adopted in diverse control applications, including networked and embedded systems, multi-agent coordination, and resource-constrained robotic platforms (Onuoha et al. (2024b,a)). Meanwhile, the Event-Triggered Mechanism (ETM) has been widely employed in both control and ADP frameworks to reduce computational load (Han et al. (2024); Heemels et al. (2012); Tabuada (2007)). Unlike time-driven schemes, ETMs update only when a state- or error-based condition is met, typically when the state deviation or estimation error exceeds a threshold. This approach reduces redundant updates and preserves closed-loop stability (Dong et al. (2017); Xue et al. (2020); Onuoha et al. (2024b)). By limiting updates to key events, event-triggered ADP boosts efficiency and yields policies less sensitive to disturbances. Despite these advantages, engineers must still ensure robustness against external disturbances and modeling uncertainties. In practice, disturbances arise from environmental perturbations, unmodeled dynamics, nonlinear couplings, and parameter uncertainties. Many robust control approaches employ feedback to attenuate perturbations rather than explicitly use feedforward compensation (Tang (2019); Tang et al. (2016, 2024a)). In this context, an Extended State Observer (ESO) estimates the system states and the lumped disturbance in real time, allowing proactive compensation of parameter mismatches, unmodeled dynamics, and external perturbations in nonlinear systems (Luo et al. (2020); Tang et al. (2024c); Han (2009); Chen et al. (2016); Ran et al. (2021); Pu et al. (2015); Tang et al. (2019)).
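The triggering rule described above (update only when the state has deviated sufficiently from its value at the last event) can be sketched as follows; the threshold value and the closure-based bookkeeping are illustrative assumptions, not the paper's specific triggering condition:

```python
import numpy as np

# Minimal sketch of an event-triggered update rule: the (expensive)
# learning/control update fires only when the gap between the current
# state and the state at the last trigger instant exceeds a threshold.
def make_trigger(threshold):
    last_x = {"value": None}            # state at the most recent event
    def should_update(x):
        if last_x["value"] is None or np.linalg.norm(x - last_x["value"]) > threshold:
            last_x["value"] = np.array(x, dtype=float)
            return True                 # event: run the update
        return False                    # skip: hold the current weights/control
    return should_update

trigger = make_trigger(threshold=0.5)
states = [np.array([0.0]), np.array([0.2]), np.array([0.7]), np.array([0.9])]
events = [trigger(x) for x in states]
print(events)  # → [True, False, True, False]
```

The first sample always triggers (no reference state yet); afterwards, updates occur only when the deviation since the last event crosses the bound, which is exactly how redundant updates are skipped.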

Recent work combines ESO-based disturbance rejection with RL for uncertain nonlinear systems (Ran et al. (2022); Tang et al. (2024b)). However, these ESO-RL schemes primarily operate in a time-driven manner: both the controller and learning updates run continuously or periodically, lacking an event-triggered learning mechanism. Many continuous-time ADP designs also impose restrictive Persistence of Excitation (PE) conditions for parameter convergence (Kamalapurkar et al. (2016)), making them hard to verify and enforce in practice.
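A minimal sketch of the linear ESO idea these works build on: the lumped disturbance is appended as an extended state and estimated through output-error injection. The second-order plant, the sinusoidal disturbance, and the bandwidth parameterization of the gains below are illustrative assumptions, not the specific observer of the cited papers:

```python
import numpy as np

# Linear Extended State Observer (ESO) for a second-order plant
# y'' = f_lumped + b*u, treating the lumped term as an extended state x3.
# Gains use the common bandwidth parameterization [3w, 3w^2, w^3].
def simulate_eso(w=50.0, b=1.0, dt=1e-3, T=5.0):
    x = np.array([0.0, 0.0])            # true plant state [pos, vel]
    xh = np.zeros(3)                    # observer state [pos, vel, disturbance]
    L = np.array([3*w, 3*w**2, w**3])   # observer gains (bandwidth w)
    for k in range(int(T / dt)):
        t = k * dt
        d = 0.5 * np.sin(2.0 * t)       # unknown lumped disturbance
        u = 0.0                         # open loop, to isolate the observer
        # true plant: x1' = x2, x2' = d + b*u
        x = x + dt * np.array([x[1], d + b * u])
        # ESO: model copy plus output-error injection
        e = x[0] - xh[0]
        xh = xh + dt * (np.array([xh[1], xh[2] + b * u, 0.0]) + L * e)
    return d, xh[2]                     # final true vs. estimated disturbance

d_true, d_hat = simulate_eso()
print(round(d_true, 3), round(d_hat, 3))
```

After the transient, the extended state tracks the slowly varying disturbance closely, which is what enables the feedforward-style compensation discussed above.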

Inspired by these observations, we develop a composite control framework for output-feedback control of uncertain nonlinear systems with lumped disturbances. The main contributions are summarized as follows:

(1) A unified control structure incorporating an ETM is proposed, combining ESO-based disturbance estimation and compensation with value-iteration-based ADP learning.

In this paper, we consider the control of a class of uncertain affine nonlinear systems described by

ż = f_z(x, z, η)
ẋ = A x + B [ f(x, z, η) + g(x, z, η) u ]
y = C x

where x = [x_1, ..., x_n]^T ∈ R^n denotes the state of the measured subsystem with relative degree n; z ∈ R^p represents the zero-dynamics state; η ∈ R denotes an external disturbance or uncertain parameter; u ∈ R is the control input; f_z : R^n × R^p × R → R^p is a smooth nonlinear mapping describing the evolution of the zero dynamics; f, g : R^n × R^p × R → R are uncertain nonlinear functions characterizing the drift dynamics and the input gain of the x-subsystem; and A ∈ R^{n×n}, B ∈ R^{n×1}, and C ∈ R^{1×n} are the standard companion matrices defining a nominal chain-of-integrators structure for the output dynamics.
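For concreteness, the nominal companion-form matrices and one integration step of the x-subsystem can be written out as below (shown for n = 3; the placeholder f, g, and all numerical choices are illustrative, not from the paper):

```python
import numpy as np

# Companion (chain-of-integrators) form for the nominal output dynamics,
# x' = A x + B (f(x) + g(x) u), y = C x, shown for n = 3.
n = 3
A = np.eye(n, k=1)                       # ones on the superdiagonal
B = np.zeros((n, 1)); B[-1, 0] = 1.0     # input enters the last integrator
C = np.zeros((1, n)); C[0, 0] = 1.0      # output y = x1

def step(x, u, f, g, dt=1e-3):
    """One Euler step of x' = A x + B (f(x) + g(x) u)."""
    return x + dt * (A @ x + B.flatten() * (f(x) + g(x) * u))

x = np.array([1.0, 0.0, 0.0])
f = lambda x: -x[0]                      # illustrative drift term
g = lambda x: 1.0                        # illustrative input gain
for _ in range(1000):                    # simulate 1 s open loop
    x = step(x, u=0.0, f=f, g=g)
y = C @ x                                # measured output y = x1
print(np.round(A, 1))
```

The companion structure is what lets the ESO be built as a model copy of this chain with one extra integrator for the lumped term f.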

To enable subsequent observer and controller design, we impose the following standard assumptions.

This content is AI-processed based on ArXiv data.
