Autonomous Question Formation for Large Language Model-Driven AI Systems
Large language model (LLM)-driven AI systems are increasingly important for autonomous decision-making in dynamic and open environments. However, most existing systems rely on predefined tasks and fixed prompts, limiting their ability to autonomously identify what problems should be solved when environmental conditions change. In this paper, we propose a human-simulation-based framework that enables AI systems to autonomously form questions and set tasks by reasoning over their internal states, environmental observations, and interactions with other AI systems. The proposed method treats question formation as a first-class decision process preceding task selection and execution, and integrates internal-driven, environment-aware, and inter-agent-aware prompting scopes to progressively expand cognitive coverage. In addition, the framework supports learning the question-formation process from experience, allowing the system to improve its adaptability and decision quality over time. Experimental results in a multi-agent simulation environment show that environment-aware prompting significantly reduces no-eat events compared with the internal-driven baseline, and inter-agent-aware prompting further reduces cumulative no-eat events by more than 60% over a 20-day simulation, with statistically significant improvements (p < 0.05).
💡 Research Summary
The paper addresses a fundamental limitation of current large‑language‑model (LLM)‑driven autonomous agents: they assume that the task to be solved is already known and focus on improving solution quality for that predefined problem. In dynamic, open‑world settings, the more critical capability is the ability to recognize what problem should be addressed in the first place. To fill this gap, the authors propose a human‑simulation framework that treats question formation as a first‑class decision process that precedes task selection and execution.
Core Concept
The system’s state at any time t is defined as a triple Cₜ = (Sₜ, Eₜ, Aₜ), where Sₜ is the internal condition of the AI (e.g., sensor readings, internal goals), Eₜ is the set of environmental observations, and Aₜ captures interaction context with other agents. A three‑stage prompting pipeline is introduced:
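The context triple Cₜ = (Sₜ, Eₜ, Aₜ) can be sketched as a small data structure; a minimal illustration in Python (the field names and example values are assumptions for illustration, not taken from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """Context triple C_t = (S_t, E_t, A_t) at time step t."""
    S: dict                                 # internal state, e.g. {"energy": 0.4}
    E: dict                                 # environmental observations, e.g. {"food_density": 0.2}
    A: list = field(default_factory=list)   # interaction records with other agents
```

Each prompting stage below then consumes a progressively larger slice of this triple: Sₜ alone, then (Sₜ, Eₜ), then the full (Sₜ, Eₜ, Aₜ).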
- Internal‑driven prompting – The LLM receives a prompt containing only the internal state. Anomalies are detected by measuring deviation from a learned normal operating range (‖xₜ‑x_norm‖ > δ). When such deviations occur, the LLM is asked to generate questions about risk, repair, or replanning. Experience tuples (Sₜ, Qₜ, τₜ, Rₜ) are stored in a memory buffer M, allowing the system to learn, over time, which internal conditions merit questioning even when no immediate threat is present.
- Environment‑aware prompting – The prompt is enriched with the current environmental factors Eₜ. The importance of each factor eᵢₜ is estimated by an importance function I(eᵢₜ|Cₜ) = f(eᵢₜ, G, Sₜ), where G is the high‑level goal (“live better”). Two heuristics are used to surface salient factors: (a) temporal differences Δeᵢₜ = eᵢₜ − eᵢₜ₋₁, and (b) deviation from learned normal values (|eᵢₜ − ēᵢ| > εᵢ). Large changes trigger the LLM to ask questions about causes, risks, or opportunities associated with those changes.
- Inter‑agent‑aware prompting – Information about other agents’ actions and states (Aₜ) is converted into natural‑language context and appended to the prompt. This enables the LLM to generate meta‑questions concerning social consequences, cooperation opportunities, or indirect environmental effects (e.g., how another agent’s resource consumption may affect the system’s long‑term utility).
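The two trigger heuristics above can be sketched concretely. A minimal Python version, assuming internal state is a numeric vector and environmental factors are a name-to-value mapping (the function names and threshold structure are illustrative assumptions, not the paper's implementation):

```python
import math

def internal_anomaly(x_t, x_norm, delta):
    """Flag an internal anomaly when ||x_t - x_norm|| > delta (Euclidean norm),
    i.e. the internal state leaves its learned normal operating range."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_t, x_norm)))
    return dist > delta

def salient_env_factors(E_t, E_prev, E_mean, eps):
    """Return factor names whose (a) temporal change |e_it - e_it-1| or
    (b) deviation from the learned normal value |e_it - mean_i| exceeds
    the per-factor threshold eps[name]."""
    salient = []
    for name, value in E_t.items():
        delta_e = abs(value - E_prev.get(name, value))     # heuristic (a)
        deviation = abs(value - E_mean.get(name, value))   # heuristic (b)
        threshold = eps.get(name, float("inf"))
        if delta_e > threshold or deviation > threshold:
            salient.append(name)
    return salient
```

In the framework's terms, the names returned by `salient_env_factors` are the factors that would be rendered into natural language and appended to the prompt, prompting the LLM to ask about their causes, risks, or opportunities.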
The pipeline can be expressed as:
Cₜ →π_Q→ Qₜ →π_T→ τₜ →π→ (Rₜ, Uₜ)
where π_Q is the question‑formation policy, π_T the task‑selection policy, and π the execution policy. The objective is to maximize discounted long‑term utility U(Sₜ) over a planning horizon, i.e., to “live better.”
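One pass of this pipeline is a straightforward composition of the three policies; a minimal sketch, where the policy call signatures are assumptions (the paper specifies only what each policy maps between):

```python
def step(C_t, pi_Q, pi_T, pi):
    """One pass of the pipeline: C_t -> Q_t -> tau_t -> (R_t, U_t)."""
    Q_t = pi_Q(C_t)              # question-formation policy: context -> question
    tau_t = pi_T(C_t, Q_t)       # task-selection policy, conditioned on the question
    R_t, U_t = pi(C_t, tau_t)    # execution policy: returns reward and utility
    return Q_t, tau_t, R_t, U_t
```

The key structural point is that Qₜ is produced before, and conditions, τₜ: the agent commits to a question about its situation before choosing what to do about it.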
Learning Mechanism
Question‑formation policies are not static. The system continuously records (Sₜ, Qₜ, τₜ, Rₜ) tuples and applies reinforcement or meta‑learning to update π_Q, thereby reducing reliance on handcrafted rules. Over repeated episodes the agent discovers which internal anomalies, environmental changes, or social interactions are most predictive of future utility loss, and it learns to ask the corresponding questions proactively.
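As a toy illustration of how π_Q can improve from recorded experience, the sketch below keeps a running mean reward per question template and selects ε-greedily. This is a deliberately simple bandit-style stand-in for the paper's reinforcement/meta-learning update, and all names here are illustrative assumptions:

```python
import random
from collections import defaultdict

class QuestionPolicy:
    """Toy learnable question-formation policy pi_Q: tracks a running mean
    reward per question template and prefers high-value templates."""

    def __init__(self, templates, epsilon=0.1, seed=0):
        self.templates = templates
        self.epsilon = epsilon                # exploration rate
        self.rng = random.Random(seed)
        self.value = defaultdict(float)       # running mean reward per template
        self.count = defaultdict(int)

    def select(self, C_t):
        """Pick a question template for context C_t (epsilon-greedy)."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.templates)
        return max(self.templates, key=lambda q: self.value[q])

    def update(self, Q_t, reward):
        """Record experience (Q_t, R_t) and update the running mean reward."""
        self.count[Q_t] += 1
        self.value[Q_t] += (reward - self.value[Q_t]) / self.count[Q_t]
```

In the full framework the stored tuples also include Sₜ and τₜ, so the learned policy can condition on which internal conditions and task outcomes made a question worth asking, rather than scoring templates in isolation as this sketch does.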
Experimental Evaluation
A multi‑agent simulation was built to emulate resource consumption, environmental regeneration, and social interaction. Three configurations were compared: (1) internal‑driven baseline, (2) internal + environment‑aware, and (3) internal + environment + inter‑agent‑aware. The primary metric was the number of “no‑eat” events—situations where an agent could not obtain food (or analogous resources) and therefore halted activity, reflecting a failure to anticipate critical problems.
Results showed that adding environment‑aware prompting significantly reduced no‑eat events compared with the baseline. Incorporating inter‑agent awareness further reduced cumulative no‑eat events by more than 60% over a 20‑day simulated period, with statistical significance (p < 0.05). Correspondingly, overall survival rates and cumulative utility improved markedly.
Contributions and Implications
- Problem Reframing – The paper reframes autonomous behavior as a two‑step process: first discover the right question, then solve it. This mirrors human cognition and addresses a gap in current LLM‑driven agents.
- Human‑Simulation Framework – By explicitly modeling internal, environmental, and social contexts and feeding them to an LLM through progressively richer prompts, the framework enables semantic, high‑level reasoning about why a situation matters, without pre‑defined reward functions.
- Learnable Question Formation – The system learns from its own experience, gradually internalizing effective question patterns and reducing dependence on exhaustive rule sets.
Limitations and Future Work
The current validation is confined to a simulated environment with abstracted resource dynamics. Real‑world deployment would need to handle noisy sensor streams, real‑time latency of LLM calls, and safety constraints. Moreover, the paper does not explore how human operators might intervene or provide feedback on generated questions, an area ripe for future research. Extending the framework to embodied robotics, complex multi‑modal perception, and human‑AI collaborative settings would test its scalability and robustness.
In summary, the work introduces a novel cognitive layer—autonomous question formation—into LLM‑based AI agents, demonstrating that richer prompting scopes dramatically improve adaptability and long‑term performance in open, dynamic environments. This contribution opens new avenues for building truly self‑directed AI systems that can not only act but also decide what to act upon.