LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Designing state encoders for reinforcement learning (RL) with multiple information sources – such as sensor measurements, time-series signals, image observations, and textual instructions – remains underexplored and often requires manual design. We formalize this challenge as a problem of composite neural architecture search (NAS), where multiple source-specific modules and a fusion module are jointly optimized. Existing NAS methods overlook useful side information from the intermediate outputs of these modules – such as their representation quality – limiting sample efficiency in multi-source RL settings. To address this, we propose an LLM-driven NAS pipeline in which the LLM serves as a neural architecture design agent, leveraging language-model priors and intermediate-output signals to guide sample-efficient search for high-performing composite state encoders. On a mixed-autonomy traffic control task, our approach discovers higher-performing architectures with fewer candidate evaluations than traditional NAS baselines and the LLM-based GENIUS framework.


💡 Research Summary

The paper tackles the under-explored problem of designing state encoders for reinforcement learning (RL) agents that must process heterogeneous observations such as sensor vectors, time-series signals, images, and textual instructions. Rather than hand-crafting a monolithic encoder, the authors formalize the task as a composite neural architecture search (NAS) problem: for each of the M input sources a dedicated encoder module f_θi is selected from a source-specific search space, and a fusion module g_φ combines the module outputs into a compact latent state s = g_φ(f_θ1(x1), …, f_θM(xM)). The overall design space is the Cartesian product of all module-level choices, which makes traditional NAS approaches (originally built for single-modality supervised tasks) inefficient, because evaluating each candidate in RL requires costly environment interactions.
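The composite structure s = g_φ(f_θ1(x1), …, f_θM(xM)) can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the paper's implementation: `make_composite_encoder`, the toy identity modules, and the mean-pooling "fusion" are all assumptions used only to show the encode-then-fuse data flow.

```python
# Hypothetical sketch of the composite state encoder: one module per input
# source, then a fusion module over the concatenated per-source latents.
from typing import Callable, List, Sequence

Encoder = Callable[[Sequence[float]], List[float]]

def make_composite_encoder(
    modules: List[Encoder],
    fusion: Callable[[List[float]], List[float]],
) -> Callable[[List[Sequence[float]]], List[float]]:
    """Return s = g_phi(f_theta1(x1), ..., f_thetaM(xM))."""
    def encode(sources: List[Sequence[float]]) -> List[float]:
        assert len(sources) == len(modules), "one input per source-specific module"
        parts = [f(x) for f, x in zip(modules, sources)]  # per-source latents
        concat = [v for p in parts for v in p]            # simple concatenation
        return fusion(concat)                             # fusion module g_phi
    return encode

# Toy instantiation: identity encoders and mean-pooling as the "fusion" FFN.
enc = make_composite_encoder(
    modules=[lambda x: list(x), lambda x: list(x)],
    fusion=lambda z: [sum(z) / len(z)],
)
print(enc([[1.0, 3.0], [5.0]]))  # → [3.0]
```

In a real system each lambda would be a trainable network (e.g. a transformer or FFN), but the interface, per-source encoding followed by a learned fusion, is the same.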

To overcome this, the authors introduce LA‑CER (LLM‑driven Architecture Search for Composite State Encoders in RL), a closed‑loop pipeline that uses a large language model (LLM) as a neural‑architecture design agent. The LLM is prompted with a concise textual summary of the current candidate architecture, its RL performance (average return, task‑specific metric), and side‑information derived from intermediate module outputs (e.g., representation diversity, activation statistics, clustering scores). Using its pretrained linguistic and reasoning abilities, the LLM generates new candidate configurations—either one at a time (LA‑CER‑1) or in batches of five (LA‑CER‑5). These candidates are instantiated, trained end‑to‑end with a fixed RL algorithm (PPO) for a predetermined number of interaction steps, evaluated, and their richer feedback is fed back to the LLM for the next iteration. The loop continues until a predefined evaluation budget (50 total candidates) is exhausted.
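The closed loop above can be mocked end to end. In this sketch `propose` stands in for the LLM design agent and `evaluate` stands in for PPO training plus rollout; both are stubs (their bodies, and the search-space choices, are assumptions), but the control flow, batched proposals, richer feedback accumulated into a history, and a 50-candidate budget, mirrors the pipeline described above.

```python
# Runnable mock of the LA-CER-style search loop; propose() and evaluate()
# are stand-ins for the LLM agent and PPO training, respectively.
import random

SEARCH_SPACE = {"depth": [1, 2, 4], "hidden": [64, 128, 256], "heads": [2, 4]}

def propose(history, batch_size):
    """Stand-in for the LLM agent (batch_size=1 for LA-CER-1, 5 for LA-CER-5)."""
    return [{k: random.choice(v) for k, v in SEARCH_SPACE.items()}
            for _ in range(batch_size)]

def evaluate(arch):
    """Stand-in for PPO training + evaluation; returns (return, side_info)."""
    score = 0.1 * arch["depth"] + arch["hidden"] / 256 + 0.01 * random.random()
    side_info = {"repr_diversity": random.random()}  # fed back to the agent
    return score, side_info

def search(budget=50, batch_size=5, seed=0):
    random.seed(seed)
    history, best = [], None
    while len(history) < budget:
        for arch in propose(history, min(batch_size, budget - len(history))):
            score, side = evaluate(arch)
            history.append((arch, score, side))  # richer feedback for next round
            if best is None or score > best[1]:
                best = (arch, score)
    return best, history

best, history = search()
print(len(history), best[0])
```

In the real pipeline, `propose` would serialize the history (architectures, returns, task metrics, and side-information) into a prompt and parse the LLM's reply into candidate configurations.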

The experimental domain is a mixed‑autonomy traffic‑control simulator where each timestep provides three distinct observation streams: (i) a time‑series of key traffic metrics, (ii) a fixed‑dimensional vector of lane‑level densities, speeds, and autonomous‑vehicle penetration, and (iii) a history of vehicle‑sequence actions. The authors allocate a transformer‑based encoder for each time‑series input, a feed‑forward network (FFN) for the vector input, and another FFN for the fusion stage. Search spaces for each module include choices of depth, hidden dimension, number of attention heads, etc., following standard transformer‑NAS taxonomies.
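The composite design space being a Cartesian product of module-level choices is easy to make concrete. The specific option grids below are illustrative assumptions (the paper's exact choices of depths, widths, and head counts are not reproduced here); the point is how quickly per-module spaces multiply.

```python
# Illustrative per-module search spaces: a transformer encoder for the
# sequence inputs, an FFN for the vector input, and an FFN for fusion.
TS_ENCODER = {"layers": [2, 4, 6], "d_model": [64, 128], "heads": [2, 4, 8]}
VEC_FFN    = {"layers": [1, 2, 3], "hidden": [64, 128, 256]}
FUSION_FFN = {"layers": [1, 2],    "hidden": [128, 256]}

def space_size(space):
    """Number of configurations in one module's search space."""
    n = 1
    for choices in space.values():
        n *= len(choices)
    return n

# Composite space = Cartesian product of the module-level spaces.
total = space_size(TS_ENCODER) * space_size(VEC_FFN) * space_size(FUSION_FFN)
print(total)  # 18 * 9 * 4 = 648 composite architectures
```

Even these small toy grids yield hundreds of composite candidates, which is why a 50-evaluation budget makes sample-efficient search essential.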

Baselines comprise (1) an expert‑designed encoder, (2) traditional NAS methods (DARTS, ENAS, PEPNAS) each generating five candidates per iteration, and (3) the LLM‑based GENIUS framework (GPT‑4) which proposes a single candidate per iteration. All methods are allocated the same total number of evaluated architectures (50). Results show that both LA‑CER variants achieve substantially higher average traffic speed than any baseline, converging to superior performance with far fewer evaluations. Ablation studies reveal that providing the LLM with representation‑quality signals (the side‑information) yields the biggest boost, while merely supplying the task metric or reward alone yields modest gains.

Key insights include:

  • Composite NAS formulation enables systematic joint optimization of source‑specific encoders and fusion, addressing the multi‑modal nature of many RL problems.
  • LLM as a design agent leverages prior knowledge about neural components and can reason over textual summaries of performance, dramatically reducing the number of expensive RL evaluations needed.
  • Side‑information from intermediate module outputs acts as a valuable auxiliary signal, allowing the LLM to prefer architectures whose sub‑modules produce high‑quality representations even before the final RL reward is observed.
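The side-information idea above can be sketched as cheap statistics over a batch of intermediate latents. The concrete metrics here (activation mean/std, and average pairwise distance as a diversity proxy) are illustrative assumptions, not the paper's exact definitions of representation quality.

```python
# Hedged sketch of side-information extracted from intermediate module
# outputs, computable before the final RL reward is observed.
import math
from statistics import mean, pstdev

def activation_stats(latents):
    """Mean/std over all activation values in a batch of latent vectors."""
    flat = [v for z in latents for v in z]
    return {"act_mean": mean(flat), "act_std": pstdev(flat)}

def repr_diversity(latents):
    """Average pairwise Euclidean distance; higher = less collapsed latents."""
    n, total = len(latents), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.dist(latents[i], latents[j])
    pairs = n * (n - 1) / 2
    return total / pairs if pairs else 0.0

batch = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(activation_stats(batch), repr_diversity(batch))
```

Summaries like these are compact enough to serialize into an LLM prompt alongside the task reward, which is how the agent can rank candidates whose sub-modules already produce well-spread representations.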

The authors conclude that LA‑CER demonstrates a viable path toward sample‑efficient, automated architecture design for multi‑source RL. Future work will extend the approach to more complex domains such as goal‑oriented navigation, robotic manipulation with visual, tactile, and language inputs, and will explore meta‑learning techniques to continuously improve the LLM’s design policy.

