Detect and Act: Automated Dynamic Optimizer through Meta-Black-Box Optimization
Dynamic Optimization Problems (DOPs) are challenging to address due to their complex nature, i.e., dynamic environment variation. Evolutionary Computation methods are generally well suited to DOPs since they resemble dynamic biological evolution. However, existing evolutionary dynamic optimization methods rely heavily on human-crafted adaptive strategies to detect environment variation in DOPs and then adapt the search strategy accordingly. These hand-crafted strategies may perform poorly in out-of-distribution scenarios. In this paper, we propose a reinforcement learning-assisted approach that enables automated variation detection and self-adaptation in evolutionary algorithms, borrowing the bi-level learning-to-optimize idea from recent Meta-Black-Box Optimization works. We use a deep Q-network as an optimization dynamics detector and search strategy adapter: it takes the current-step optimization state as input and dictates the desired control parameters of the underlying evolutionary algorithm for the next optimization step. The learning objective is to maximize the expected performance gain across a problem distribution. Once trained, our approach can generalize to unseen DOPs with automated environment variation detection and self-adaptation. To facilitate comprehensive validation, we further construct an easy-to-difficult DOP testbed with diverse synthetic instances. Extensive benchmark results demonstrate the flexible search behavior and superior performance of our approach in solving DOPs, compared with state-of-the-art baselines.
💡 Research Summary
Dynamic Optimization Problems (DOPs) pose a fundamental challenge because the objective landscape changes over time, invalidating accumulated search information and causing loss of diversity in evolutionary algorithms. Traditional approaches address this with hand‑crafted change detectors (e.g., re‑evaluation of archived points, fitness monitoring) followed by manually designed response mechanisms such as re‑initialization, particle calibration, or diversity boosting. While effective in narrowly defined settings, these pipelines suffer from limited generalization and require substantial expert effort.
The paper introduces Meta‑DO, a novel end‑to‑end reinforcement‑learning (RL) framework that eliminates the detect‑then‑act paradigm. Meta‑DO adopts a bi‑level architecture inspired by recent Meta‑Black‑Box Optimization (Meta‑BBO) work. The low‑level optimizer is a niching Particle Swarm Optimizer (NBNC‑PSO) that clusters individuals via a nearest‑better‑neighbour (NBNC) scheme, maintains multiple niches, and stores the best individuals of the last five generations in an elite archive. The archive not only preserves useful historical information but also provides an implicit signal of environmental drift.
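The nearest-better idea behind the niching scheme can be illustrated with a short sketch. The following is a minimal, assumed implementation of nearest-better clustering (each particle links to the closest particle with better fitness, and chains of links define niches); the paper's exact NBNC rule may differ, e.g., real nearest-better clustering typically cuts unusually long links to form multiple niches, which is omitted here for brevity.

```python
import numpy as np

def nearest_better_clustering(positions, fitness):
    """Illustrative nearest-better clustering (minimization assumed).

    Each particle is linked to its nearest strictly-better neighbour;
    a particle with no better neighbour is its own niche seed. Real NBC
    variants additionally cut abnormally long links to split niches;
    that pruning step is omitted in this sketch.
    """
    n = len(fitness)
    parent = np.arange(n)
    for i in range(n):
        better = np.where(fitness < fitness[i])[0]
        if better.size:
            dists = np.linalg.norm(positions[better] - positions[i], axis=1)
            parent[i] = better[np.argmin(dists)]

    def seed(i):
        # Follow parent links up to the niche seed (its own parent).
        while parent[i] != i:
            i = parent[i]
        return i

    return np.array([seed(i) for i in range(n)])
```

Because links always point to strictly better fitness, the chains are acyclic and every particle resolves to a seed.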
The high‑level meta‑agent is a Deep Q‑Network (DQN) that treats the dynamic optimization process as a Markov Decision Process (MDP). At each iteration the agent observes a 10‑dimensional feature vector for every particle, comprising: (1) an environmental variation perception feature derived from the log‑scaled ratio of archived fitness values, (2) global and local normalized fitness, (3) progress and stagnation indicators (remaining function‑evaluations, generations without improvement of personal‑best or global‑best), (4) spatial topology features (distances to global‑best, local‑best, and personal‑best normalized by the search‑space diameter), and (5) a directional correlation feature measuring cosine similarity between the cognitive and social search directions. These features give the agent a rich, yet compact, view of both the population state and the underlying landscape dynamics without requiring an explicit change detector.
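A few of these per-particle features can be sketched concretely. The function below is an illustrative assumption of how such a state vector might be computed (normalized fitness, remaining budget, normalized distances, and the cognitive/social direction cosine); the names and exact formulas are not taken from the authors' code, and the archive-based variation feature and stagnation counters are omitted.

```python
import numpy as np

def particle_features(x, pbest, gbest, lbest, fit, fmin, fmax,
                      fes_used, max_fes, diameter):
    """Sketch of a per-particle state vector in the spirit of the
    paper's 10-D features; formulas here are illustrative assumptions."""
    eps = 1e-12
    cog = pbest - x                    # cognitive search direction
    soc = gbest - x                    # social search direction
    cos = float(cog @ soc /
                (np.linalg.norm(cog) * np.linalg.norm(soc) + eps))
    return np.array([
        (fit - fmin) / (fmax - fmin + eps),    # globally normalized fitness
        1.0 - fes_used / max_fes,              # remaining evaluation budget
        np.linalg.norm(gbest - x) / diameter,  # distance to global best
        np.linalg.norm(lbest - x) / diameter,  # distance to niche-local best
        np.linalg.norm(pbest - x) / diameter,  # distance to personal best
        cos,                                   # cognitive/social alignment
    ])
```

Normalizing distances by the search-space diameter and fitness by the observed range keeps every entry roughly in [-1, 1], which is what lets a single trained network transfer across problems of different scales.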
The action space is continuous: the DQN outputs three PSO hyper‑parameters for each particle, namely the inertia weight (w), cognitive coefficient (c1), and social coefficient (c2). This joint, fine‑grained control enables the optimizer to adapt its exploration‑exploitation balance on the fly. The reward is a log‑scaled performance gain: the reduction in offline error (the cumulative difference between the best found fitness and the true moving optimum) between successive steps, normalized by the environmental scaling factor derived from the archive. This design stabilizes learning across a wide range of change severities.
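The control loop this describes can be sketched in a few lines: a standard PSO velocity update where w, c1, and c2 are per-particle arrays supplied by the meta-agent, plus a log-scaled gain reward. Both functions are assumed forms for illustration (in particular, the exact normalization of the reward in the paper derives its scale from the archive, which is a plain argument here).

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w, c1, c2, rng):
    """One PSO update with per-particle coefficients (illustrative).

    x, v, pbest: (n, d) arrays; gbest: (d,); w, c1, c2: (n, 1) arrays
    chosen by the meta-agent for each particle individually.
    """
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v

def log_scaled_reward(err_prev, err_curr, scale):
    """Assumed reward form: log-scaled reduction in offline error,
    normalized by an environment scale factor; negative gains give 0."""
    return float(np.log1p(max(err_prev - err_curr, 0.0) / scale))
```

The log scaling compresses large error reductions right after an environment change, so early easy gains do not dominate the return over later fine-tuning steps.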
Meta‑DO is trained on a diverse suite of 32 synthetic DOP instances that span easy to hard difficulty, including linear additive noise, moving peaks, and hybrid transformations. After training, the policy is evaluated zero‑shot on unseen synthetic problems and on a real‑world unmanned surface vehicle (USV) navigation task. Baselines include classic change‑detection methods, recent diversity‑maintaining dynamic EAs, and state‑of‑the‑art Meta‑BBO approaches (RL‑based, supervised‑learning, neuroevolution). Across all benchmarks, Meta‑DO achieves 15–30 % lower offline error, converges faster after abrupt changes, and maintains higher solution quality throughout the optimization horizon. Ablation studies confirm that (i) the environmental variation feature is crucial for detecting drift, (ii) the log‑scaled reward improves stability, and (iii) continuous action output outperforms discretized alternatives.
In summary, the contributions of the paper are: (1) an end‑to‑end RL framework that replaces handcrafted detect‑then‑act pipelines, (2) a comprehensive state representation that implicitly captures landscape dynamics, (3) a log‑scaled reward scheme and joint hyper‑parameter control that ensure stable learning in non‑stationary environments, and (4) extensive empirical validation demonstrating superior performance and cross‑domain generalization. The authors suggest future extensions to multi‑objective DOPs, richer meta‑architectures (e.g., transformer‑based encoders), and deployment in industrial settings such as smart grids and robotics, where online adaptability and safety are paramount.