Reading time: 44 minutes
...

๐Ÿ“ Original Info

  • Title:
  • ArXiv ID: 2512.19799
  • Date:
  • Authors: Unknown

๐Ÿ“ Abstract

Advances in AI have produced agents whose knowledge and operational capabilities are comparable to those of human scientists, revealing their potential to assist, accelerate, and automate scientific research, thereby fundamentally reshaping the paradigm of scientific discovery. However, prior works either evaluate models on well-defined benchmarks, such as Olympiad problems (e.g., IMO, IPhO) or factual question answering, or focus on general-purpose tasks such as literature retrieval and information integration, rather than end-to-end problem solving in open-ended scientific scenarios. As a result, AI remains an auxiliary, subordinate participant in real-world scientific research. In particular, in physics, a fundamental yet abstract, inherently complex, and intellectually demanding domain, research often requires both intensive analytical reasoning and code-based numerical computation, a dual capability largely absent in previous agents. To meet the demands of physics research, we propose PHYSMASTER, an LLM-based agent designed to operate as an autonomous theoretical and computational physicist. PHYSMASTER integrates theoretical reasoning and numerical computation techniques, and is further equipped with LANDAU, the Layered Academic DAta Universe, which preserves precisely retrieved papers, manually curated prior knowledge, and validated methodology traces for reuse to enhance decision reliability and stability. Further, an adaptive exploration methodology is applied to balance exploration with efficiency and to adapt to ultra-long-horizon tasks. Spanning scales from the cosmos to elementary particles, PHYSMASTER shows reliable capabilities across various subfields of theoretical physics, including high-energy theory, condensed matter theory, cosmology & astrophysics, and quantum information, as validated by:

• Two typical acceleration cases, in which PHYSMASTER compresses the labor-intensive engineering parts of genuine physics research into less than 6 hours, work that typically requires 1-3 months of a senior Ph.D. student's effort;

• Two automation cases, in which PHYSMASTER is provided with a human-specified hypothesis or selected methods and automatically executes the exploration loop, conducting experiments and validating hypotheses, thereby compressing the end-to-end loop into 1 day, whereas the human timeline is unpredictable and can take months;

• One autonomous discovery case, in which the agent conducts fully independent exploration of a scientific problem that remains open and proposes an innovative method, marking the transition from AI co-scientist to autonomous AI scientist.

From acceleration and automation to autonomous discovery, our work reveals the potential of AI in fundamental science and contributes to further AI-driven scientific discovery.

📄 Full Content

The advancement of Large Language Models (LLMs) has profoundly reshaped both the ways we live and work, marking a new era in Artificial Intelligence (AI). From early conversational systems such as ChatGPT to more recent models Guo et al. (2025); Jaech et al. (2024); OpenAI (2024b), LLMs have exhibited substantial gains in abstract reasoning, long-horizon planning, and multi-step problem-solving. When augmented with tool-use and action-taking capabilities Schmidgall et al. (2025), these systems increasingly blur the boundary between passive language understanding and active task execution, raising the prospect that AI systems may approach, or in narrowly defined domains surpass, human expert-level performance.

This aligns with the staged framework for Artificial General Intelligence (AGI) articulated by OpenAI OpenAI (2024a), which conceptualizes AI capability as five levels:

• Level 1 - Chatbots: AI with conversational language.

• Level 2 - Reasoners: human-level problem-solving.

• Level 3 - Agents: systems that can take actions.

• Level 4 - Innovators: AI that can aid in invention.

• Level 5 - Organizations: AI that can do the work of an organization.

Current frontier systems, augmented with tool-use, planning, and execution capabilities, are believed to be at Level 3 (Agents). Meanwhile, the transition from agentic execution (Level 3) to genuine innovation (Level 4) is accelerating, as AI systems begin to autonomously generate novel hypotheses, conduct verification, and yield novel discoveries. Consequently, the human-AI collaboration paradigm will be fundamentally reshaped: from human-directed AI assistance, towards AI-orchestrated task execution with human oversight, and ultimately to greater autonomy.

It is widely recognized that AI holds the potential to transform scientific research paradigms and accelerate discoveries. In earlier stages, AI primarily functioned as a powerful, tool-like enabler of science, supporting tasks such as prediction, simulation, and data-driven inference. Scientific foundation models have emerged as powerful capability engines, with recent domain-specific models setting or approaching frontiers in key sub-tasks. For instance, AlphaFold3 advances biomolecular structure and interaction prediction Abramson et al. (2024), while GNoME enables large-scale discovery of stable inorganic crystals Merchant et al. (2023). GraphCast demonstrates that learned surrogates can rival numerical simulators for complex dynamical systems with significant speedups Lam et al. (2023), and Uni-Mol provides a universal 3D molecular representation framework unifying diverse downstream tasks Zhou et al. (2023). These breakthroughs furnish strong engines for inference, generation, and simulation, essential components of scientific cycles, yet they underscore that models alone address only fragments of scientific work and become transformative when integrated into larger systems.

As AI general capabilities advance, AI is evolving from a mere scientific tool to an active research labor force, enabling the rise of AI scientists that can autonomously participate in research and facilitate discovery, though within engineered environments. The AI co-scientist from Google/DeepMind, a multi-agent Gemini-based system, employs generate-debate-evolve cycles to propose biomedical hypotheses, some validated experimentally, while amplifying human scientists in prioritization and leaving final judgment to them Natarajan et al. (2025). In parallel, The AI Scientist and AI Scientist-v2 from Sakana AI automate the research loop in machine learning, from ideation to experiment design and paper writing, achieving peer-reviewed acceptance of fully AI-generated papers Lu et al. (2024); Yamada et al. (2025). Beyond these, Robin offers a multi-agent framework automating literature research, hypothesis generation, experiment planning, and data analysis, applied in lab-in-the-loop settings to discover therapeutic candidates Ghareeb et al. (2025). Further, the recent Kosmos sustains long-horizon cycles of literature search, data analysis, and hypothesis generation to produce traceable scientific reports and cross-domain discoveries Mitchener et al. (2025). Denario presents a modular multi-agent research assistant that covers idea generation, literature review, planning, and code execution across multiple scientific domains; however, the authors also report frequent computational mistakes and occasional unsupported claims, underscoring the continued need for rigorous mathematical/numerical verification Villaescusa-Navarro et al. (2025). Despite these impressive demonstrations, current AI scientist systems are still largely optimized for text-centric domains, with limited ability to manipulate rigorous mathematical formalisms and to conduct robust numerical computation for verification, while also lacking the adaptability to long-horizon workflows.

Physics is a uniquely fundamental and comprehensive enterprise: it seeks universal principles spanning scales from the cosmos (Peebles, 2020) to the micro quantum world where Quantum Chromodynamics (QCD) governs quarks and gluons (Marciano and Pagels, 1978). Its unification comes from structures such as symmetry and gauge principles (Noether, 1983; Yang and Mills, 1954), effective field theory (Weinberg, 1979), and renormalization-group flow (Wilson, 1975), which systematically connect microscopic laws to macroscopic universality. At the research frontier, these ideas are expressed in highly abstract formalism: for instance, Lie groups and representation theory (Georgi, 2000), the differential geometry of gauge fields (Wu and Yang, 1975), and topology/anomaly constraints (Adler, 1969) rigidly delimit what theories are possible. Meanwhile, state-of-the-art numerical methods (e.g., Quantum Monte Carlo, DMRG) are essential for quantitatively validating the proposed theories. Physics therefore stands as a monument to human intellect, shaped by generations of talented physicists who devoted their passion and wisdom, from Newton and Maxwell to Einstein, Landau, T.-D. Lee and C.-N. Yang. If AI can be integrated into physics research, it will liberate the talented from tedious engineering tasks and repetitive computations, accelerate the emergence of inspiration and its validation, and thereby help establish a new human-machine collaborative research paradigm.

Existing AI-for-physics efforts have achieved striking results, but largely within isolated slices of the research loop. AI Feynman demonstrates physics-inspired symbolic regression that can recover compact closed-form relations from data Udrescu and Tegmark (2020), yet it does not autonomously perform the broader workflow of choosing formalisms, managing approximations, and closing the loop with numerical verification and uncertainty control. Physics-Informed Neural Networks (PINNs) incorporate PDE constraints into learning and can serve as flexible surrogates Raissi et al. (2019), but they typically presuppose a fixed modeling setup and often require expert tuning to remain reliable in stiff, multiscale, or long-time regimes, again falling short of end-to-end theory-computation iteration. In quantum many-body physics, neural quantum states provide expressive variational wavefunction representations Carleo and Troyer (2017), but they still rely on expert-specified Hamiltonians and sampling/optimization protocols, rather than performing an end-to-end research loop.

To meet the demands of physics research, we propose PHYSMASTER, an LLM-based agent designed to operate as an autonomous theoretical and computational physicist. PHYSMASTER integrates theoretical reasoning and numerical computation techniques, and is further equipped with a Layered Academic DAta Universe (LANDAU) that preserves precisely retrieved papers, manually curated prior knowledge, and validated methodology traces for future reuse, enhancing both efficiency and decision stability. Further, an adaptive multi-trajectory exploration methodology is applied to balance exploration with efficiency and to adapt to ultra-long-horizon tasks. While mainstream LLMs primarily excel at text generation and are prone to unverified derivations or computational hallucinations, PHYSMASTER provides:

  1. Ultra-long-horizon workflows via MCTS exploration with traceable trajectories.
  2. Executable coding environments for feedback and self-evolution.
  3. LANDAU, which preserves manually curated prior knowledge and validated methodology traces for reuse and retrieval-augmented generation (RAG) to enhance reliability.

In contrast to prior AI scientist prototypes, which operate effectively in stage-specific or domain-specific pipelines (e.g., literature-driven hypothesis generation or ML research automation), PHYSMASTER is tailored to physics research by:

  1. Adopting a physicist mindset co-designed with domain experts;
  2. Possessing the dual capability of rigorous theoretical reasoning and executable coding;
  3. Leveraging an evolving physics-oriented knowledge base;
  4. Autonomously executing an end-to-end, ultra-long-horizon research loop rather than being confined to a single stage.

By bridging theoretical insight with computational prowess, PHYSMASTER not only automates routine tasks but also fosters novel discoveries, paving the way for AI-led revolutions in physics research.

We establish a phased multi-agent system (MAS) that solves problems via sequential collaboration; the workflow is divided into three major phases: Pre-Task, Task Execution, and Post-Task.

Figure 2 | The MAS Architecture and Workflow of PHYSMASTER. Our system adopts the mindset of physicists and integrates theoretical reasoning with code execution, thereby closing the loop from natural-language queries to research reports with scientific rigor.

As the initial stage of the workflow, a Clarifier transforms the original natural-language query into a structured task by extracting essential information and decomposing it into subtasks.

The primary goal of this stage is to refine natural-language questions that tend to be information-heavy, lack semantic hierarchy, or contain ambiguities. This prevents structural redundancy and semantic ambiguity from inflating the token budget, obscuring key details, or even causing misunderstandings that lead the reasoning trajectory away from the intended problem. This process effectively improves both efficiency and efficacy, while sparing users from the burden of manually reorganizing queries into detailed, rigid formats. Conceptually, this stage also emulates how physicists in real research construct physical intuition, that is, by extracting simplified, intuitive, and abstract physical processes.

During this phase, the following information is extracted from the query (a minimal schema sketch is given after the list below):

• Basic information: topic, domain, task description, input and output formats, etc.

• Task type:

  - Engineering Computation: well-established models and mature methods requiring high-precision solutions, such as first-principles calculations or many-body numerics.
  - Hypothesis Testing: semi-open tasks that follow human-provided ideas or selected methodologies to attempt solving a problem or testing a hypothesis, involving a certain degree of innovation.
  - Open-ended Exploration: fully open tasks lacking a predefined framework or mechanism, requiring autonomous hypothesis formation and validation.
  - Phenomenological Analysis: bridging theory and experiment by extracting parameters from data, constructing effective parametrizations, fitting models, etc.

• Physical constraints: symmetries, conservation laws, dimensional analysis, spacetime and energy scales, etc.

• Relevant knowledge for literature retrieval and knowledge-base construction.

• Sequence of subtasks for dynamic scheduling during Task Execution.
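
To make the extracted structure concrete, the following is a minimal, hypothetical sketch of what such a structured task record could look like in Python. The class name, field names, and the TASK_TYPES values are illustrative assumptions, not the actual schema used by the Clarifier.

    from dataclasses import dataclass, field
    from typing import List

    # Illustrative task classes mirroring the four types listed above.
    TASK_TYPES = {"engineering_computation", "hypothesis_testing",
                  "open_ended_exploration", "phenomenological_analysis"}

    @dataclass
    class StructuredTask:
        """Hypothetical record a Clarifier-like stage could emit."""
        topic: str
        domain: str
        description: str
        io_format: str
        task_type: str                                           # one of TASK_TYPES
        constraints: List[str] = field(default_factory=list)     # symmetries, conservation laws, scales
        retrieval_queries: List[str] = field(default_factory=list)  # seeds for literature retrieval
        subtasks: List[str] = field(default_factory=list)        # ordered units for dynamic scheduling

        def __post_init__(self):
            if self.task_type not in TASK_TYPES:
                raise ValueError(f"unknown task type: {self.task_type}")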

Building upon our previous work PAPER SEARCH MASTER, a comprehensive academic paper search agent, we develop a physics-oriented literature retrieval module that constructs a task-specific local library L_Local. To ensure coverage, depth, and reliability, the retrieval process is jointly driven by two complementary agents:

• Quick Thinker: operates based on intuition-driven expansion. It searches directly from queries or extends from retrieved papers using multiple retrieval tools, rapidly forming a broad candidate pool.

• Reasoner: utilizes strong reasoning capabilities to understand the problem context deeply and filter out papers with high semantic relevance to the query.

For each piece of relevant knowledge extracted during clarification, a precise retrieval round is executed. From each highly relevant paper, we extract two categories of knowledge:

• Qualitative Knowledge: physical principles, mechanisms, and conceptual structures (e.g., the existence of a phase transition, dominant competing effects). These guide the task-execution agent to grasp the essential physics and avoid foundational errors.

• Quantitative Knowledge: analytical expressions, precise numerical results, or calibrated parameters that support model construction and numerical computation, forming the basis for reliable criticism.

This task-specific local library L_Local serves as the foundation for retrieval-augmented generation (RAG) in the Task Execution stage. The integration of qualitative and quantitative knowledge enables the system to overcome hallucination and shift from language generation to evidence-grounded scientific reasoning. The benefits of L_Local are summarized as follows (a minimal retrieval sketch is given after the list):

• Traceability: each RAG decision is interpretable and verifiable via an evidence chain.

• Generalization: task-specific construction of L_Local enables rapid adaptation to new research directions (e.g., quantum many-body physics, condensed matter, high-energy physics) without additional fine-tuning.

• Long-Context Integration: physics research often spans multiple papers, decades of theoretical development, and methodological comparisons. Through retrieval-filtering-reconstruction, RAG allows LLMs to operate on knowledge far exceeding their native context window in a structured and efficient manner.
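
As a toy illustration of the traceability idea, the sketch below ranks passages from a task-specific library by a crude word-overlap score and returns each passage together with its source identifier, so every downstream claim can point back to evidence. It is only a schematic stand-in: the function name and scoring rule are assumptions, not the system's actual retriever.

    from collections import Counter
    from typing import List, Tuple

    def retrieve_with_evidence(query: str,
                               library: List[Tuple[str, str]],   # (paper_id, passage) pairs
                               k: int = 3) -> List[Tuple[str, str, float]]:
        """Rank passages by word overlap with the query and keep the source id,
        so each retrieved snippet carries its evidence chain."""
        q = Counter(query.lower().split())
        scored = []
        for paper_id, passage in library:
            p = Counter(passage.lower().split())
            overlap = sum(min(q[w], p[w]) for w in q)
            scored.append((paper_id, passage, overlap / (1 + sum(q.values()))))
        return sorted(scored, key=lambda t: t[2], reverse=True)[:k]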

Solving real scientific problems in theoretical or computational physics is typically an ultra-long-horizon test-time scaling task, often requiring proper task decomposition, flexible subtask scheduling, and a large number of draft-evaluate-revise iterations that exceed the context window of any single model. Therefore, context management becomes the core systems challenge: the agent must preserve progress and reuse experience without accumulating an ever-growing prompt.

Inspired by ML-Master Liu et al. (2025), we adopt Monte Carlo Tree Search (MCTS) to balance exploration and efficiency when dealing with such tasks. In essence, each MCTS node represents an attempt to search for or refine a partial solution. Through multi-trajectory expansion, the process ultimately yields an optimal trajectory that serves as the complete solution. As mentioned previously, the raw query is decomposed into executable subtasks by the Clarifier before task execution, and these subtasks are individually assigned to MCTS nodes. Along with each subtask, the assigned MCTS node is provided with (a) a concise summary of relevant prior exploration results (primarily inherited from the parent node), and (b) RAG-based background knowledge required for the current subtask, ensuring the completeness and comprehensiveness of the available information. Task decomposition and flexible scheduling enable ultra-long-horizon tasks to be completed through iterative execution, while selectively scoped information, rather than the raw full history, effectively addresses the challenges of context management.

More concretely, two agents collaborate hierarchically during task execution:

• Supervisor: operates independently of individual nodes and is responsible for flexible scheduling and progress management. It performs:

  - Scheduling / Progress Management: decides the subtask to be assigned to an MCTS node, provides the current progress, and supplies the RAG-based background knowledge needed for the current subtask.
  - Evaluation / Summarizing: invokes the knowledge base to rigorously evaluate and summarize the Theoretician's output.

• Theoretician: for each assigned subtask, the Theoretician constructs theoretical models, performs analytical reasoning, or translates models into executable code for numerical computation.

A decisive factor for successful long-horizon iteration is obtaining correct feedback signals, since guidance for refinement and expansion is essential. In the MCTS exploration of PHYSMASTER, RAG-based factual feedback is produced by the Supervisor acting as a critic: it (i) assigns scalar rewards, (ii) determines the types of subsequent nodes, (iii) summarizes the current exploration state to preserve the minimal information needed, and (iv) provides actionable critique to guide the next step. This information is fed back into the tree policy to guide subsequent selection and expansion.

During tree search, nodes are selected using UCT (Upper Confidence bounds applied to Trees) scores, and the structured search tree is expanded iteratively. The UCT value for a node v is calculated as

UCT(v) = Q_v / N_v + C √( ln N_parent / N_v ),

where Q_v is the total reward accumulated for the node, N_v is the number of visits to the node, N_parent is the number of visits to its parent, and C is a constant that controls the trade-off between exploration and exploitation.

MCTS provides a principled mechanism to (i) exploit high-reward partial solutions, while (ii) continuing to explore under-investigated alternatives to avoid premature convergence. This tree-structured exploration is especially beneficial in long-horizon settings because the search tree itself serves as an externalized record of diverse trajectories. By prioritizing under-explored paths through UCT and supporting parallel multi-trajectory expansion, MCTS enhances efficiency and scalability in navigating the vast solution spaces typical of physics problems.
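
The following is a compact Python sketch of UCT-guided tree search consistent with the formula above. The expand and evaluate callables are placeholders standing in for the Theoretician and the Supervisor's critic; nothing here is PHYSMASTER's actual implementation.

    import math, random

    class Node:
        def __init__(self, subtask, parent=None):
            self.subtask, self.parent = subtask, parent
            self.children, self.Q, self.N = [], 0.0, 0   # total reward and visit count

    def uct(node, c=1.4):
        # Mean reward plus an exploration bonus that shrinks as the node is revisited.
        if node.N == 0:
            return float("inf")
        return node.Q / node.N + c * math.sqrt(math.log(node.parent.N) / node.N)

    def select(root):
        node = root
        while node.children:
            node = max(node.children, key=uct)
        return node

    def backpropagate(node, reward):
        while node is not None:
            node.N += 1
            node.Q += reward
            node = node.parent

    def search(root, expand, evaluate, iterations=100):
        # expand(node) -> list of child Nodes; evaluate(node) -> scalar reward from the critic.
        for _ in range(iterations):
            leaf = select(root)
            for child in expand(leaf):
                leaf.children.append(child)
            target = random.choice(leaf.children) if leaf.children else leaf
            backpropagate(target, evaluate(target))
        return max(root.children, key=lambda n: n.N) if root.children else root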

In honor of the eminent universal physicist Lev Landau, we propose LANDAU, the Layered Academic DAta Universe affiliated with PHYSMASTER, which is designed to navigate PHYSMASTER and consists of three layers:

namely the library L, the priors P, and the methodology M.

We define these layers as follows:

• Library L: knowledge extracted from the precisely retrieved papers.

• Methodology M: validated, effective reasoning paths, either manually curated or extracted from each successful task, enabling efficient reuse in familiar domains.

• Priors P: manually curated high-confidence knowledge, including concise verified conclusions or distilled textbook/authoritative papers. Such priors are essential for enhancing the reliability of the critic and preventing potential fundamental errors.

The library L evolves via the accumulation of local libraries L_Local. After each task, the task-specific library L_Local built via literature retrieval is integrated into L, i.e., L ← L ∪ L_Local.

Similarly, for the methodology component M, upon successful task completion, validated effective reasoning trajectories and technical details are summarized, structured, and archived as a methodology M_i for future reuse, i.e., M ← M ∪ {M_i}.
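
A minimal sketch of these two update rules, with a toy three-layer store mirroring the Library / Priors / Methodology split (class and method names are illustrative only):

    class Landau:
        """Toy three-layer store; the updates express L <- L ∪ L_Local and M <- M ∪ {M_i}."""
        def __init__(self):
            self.library = set()        # L: knowledge units from retrieved papers
            self.priors = set()         # P: manually curated, high-confidence facts
            self.methodology = []       # M: validated reasoning traces

        def absorb_local_library(self, local_library):
            self.library |= set(local_library)      # L <- L ∪ L_Local

        def archive_methodology(self, trace):
            self.methodology.append(trace)           # M <- M ∪ {M_i}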

Such continual accumulation enriches long-term knowledge and enables the autonomous evolution of LANDAU towards a comprehensive domain-specific knowledge base.

In physics, a foundational discipline characterized by high abstraction, intrinsic complexity, and substantial intellectual demand, research inevitably faces several persistent challenges:

• Time dominated by repetitive engineering. A nontrivial fraction of research hours is spent on labor-intensive yet low-level tasks, such as repeatedly implementing numerical solvers and conducting long cycles of manual hyperparameter tuning.

• An exceptionally long path from ideas to verifiable conclusions. Translating a concept into a defensible result typically requires a prolonged pipeline (idea → derivation → repeated numerical experiments → rigorous comparison with prior literature), often spanning months before a final judgment can be made.

• High abstraction and experience-dependent insight. Theoretical progress is frequently driven by tacit expertise, intuition, and hard-earned scientific taste, making discovery uneven and difficult to scale.

Correspondingly, we distill three representative problem classes from real-world physics research workflows, ordered by increasing degrees of autonomy:

  1. Acceleration: the labor-intensive engineering components of physics research (e.g., coding, debugging, and standard numerical routines), where methods are basically well-established and outcomes are relatively predictable. We expect PHYSMASTER to eliminate the barrier to mastering trivial techniques and liberate talented researchers from tedious, repetitive engineering tasks.

  2. Automation: semi-open problems with moderate novelty. Given a human-specified hypothesis, plan, and a selected class of methods, we expect PHYSMASTER to automatically execute the exploration loop, running experiments, validating hypotheses, and iterating efficiently, thereby substantially compressing the end-to-end research cycle.

  3. Autonomous discovery: starting from empirical phenomena or scientific challenges, conduct fully independent exploration: propose hypotheses, design and run validation protocols, and iterate towards a convincing result, realizing the leap from an AI co-scientist to an AI auto-scientist.


Lattice QCD (LQCD) provides a first-principles, nonperturbative framework for computing hadronic observables by evaluating the QCD path integral on a discretized Euclidean spacetime using Monte Carlo techniques Gattringer and Lang (2010); Wilson (1974). Within this framework, transverse-momentum-dependent observables (TMDs) and their rapidity evolution are characterized by the Collins-Soper (CS) kernel K(b_⊥, μ) Collins (2011), whose nonperturbative behavior at large b_⊥ can be accessed through lattice methods.

The present study addresses the stage of the LQCD workflow that transforms Euclidean correlators into continuum-like quasi observables and, subsequently, into the CS kernel. This transformation relies on a controlled sequence of renormalization, large-momentum analysis, and Fourier transformation steps consistent with the LaMET framework Ji (2013, 2014); Ji et al. (2021), and is used here as a representative application of the automated analysis pipeline.

In this section we summarize the full lattice-QCD workflow executed by PHYSMASTER to extract the Collins-Soper (CS) kernel from quasi-TMD wave functions of the pion. The system performs an end-to-end chain of operations, starting from raw Euclidean two-point correlators and Wilson-loop data provided by Ref. Tan et al. (2025).

The first processing step removes the dominant time dependence related to the hadron energy.

To isolate the single-hadron matrix element, PHYSMASTER performs either a one-state or a two-state fit to the correlators, with the fit range selected automatically using log-effective-plateau diagnostics combined with covariance-aware χ²/dof ranking. The removal of the linear divergence leads to significantly improved large-z behavior compared to the bare results.

Fig. 4 presents a comparison between the results obtained using Ref. Tan et al. (2025) and those from PHYSMASTER for extracting the matrix element at P_z = 1.47 GeV, z = {0, 2}a, and b = 3a. The raw data and the corresponding fitted results from both approaches agree within uncertainties, indicating that PHYSMASTER can reliably handle large-scale and otherwise tedious fitting tasks while automatically determining a reasonable fit window. This yields the full coordinate-space distribution of the bare quasi-TMDWF matrix element for all integer z ∈ [-12, 12]a. Fig. 5 displays the coordinate-space profile of the bare matrix elements at P_z = 1.47 GeV and b = 3a, as obtained from both Ref. Tan et al. (2025) and PHYSMASTER. The real and imaginary components extracted from the two methods exhibit mutually compatible behavior within statistical uncertainties.
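
As an illustration of the automated fit-window selection described above, the sketch below scans contiguous windows, performs a one-state fit with scipy, and ranks candidates by how close χ²/dof is to one. It is a simplified stand-in: the real pipeline also uses two-state fits, log-effective-plateau diagnostics, and full covariance, and the function and parameter names here are assumptions.

    import numpy as np
    from scipy.optimize import curve_fit

    def one_state(t, A, E):
        return A * np.exp(-E * t)

    def pick_fit_window(t, C, dC, min_len=4):
        """Scan contiguous windows, fit a one-state form, and keep the window whose
        chi^2/dof is closest to 1 (uncorrelated errors only, for simplicity)."""
        best = None
        for lo in range(len(t) - min_len):
            for hi in range(lo + min_len, len(t)):
                sl = slice(lo, hi)
                try:
                    popt, _ = curve_fit(one_state, t[sl], C[sl], sigma=dC[sl],
                                        p0=(C[lo], 0.5), absolute_sigma=True, maxfev=5000)
                except RuntimeError:
                    continue
                resid = (C[sl] - one_state(t[sl], *popt)) / dC[sl]
                chi2_dof = np.sum(resid**2) / max(hi - lo - 2, 1)
                if best is None or abs(chi2_dof - 1.0) < abs(best[0] - 1.0):
                    best = (chi2_dof, (lo, hi), popt)
        return best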

To remove the linear divergence of the staple-shaped gauge link, the computation uses the nonperturbative renormalization prescription Dotsenko and Vergeles (1980); Ebert et al. (2019a); Ji et al. (2018)

Φ_ren(z, b) = Φ_bare(z, b) / √Z_E(z + 2L, b),

where Z_E is supplied as the expectation value of Wilson loops with longitudinal length z + 2L at fixed transverse separation b = 3a. Thus, the results shown in Fig. 6 indicate that PHYSMASTER identifies the correct z + 2L entry for each matrix element and performs the renormalization pointwise.
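
A minimal sketch of the pointwise renormalization step, assuming the square-root Wilson-loop prescription written above and a simple dictionary lookup keyed by (loop length, transverse index); the data layout is hypothetical.

    import numpy as np

    def renormalize_pointwise(phi_bare, wilson_loops, z_values, L, b_index):
        """Divide each bare matrix element by sqrt(Z_E) at longitudinal length |z| + 2L
        and fixed transverse separation b.  wilson_loops[(length, b_index)] -> <W>."""
        phi_ren = np.empty_like(phi_bare)
        for i, z in enumerate(z_values):
            zE = wilson_loops[(abs(z) + 2 * L, b_index)]   # pick the matching loop entry
            phi_ren[i] = phi_bare[i] / np.sqrt(zE)
        return phi_ren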

At large λ the lattice signal suffers exponential noise degradation. PHYSMASTER stabilizes the correlator tail through a physics-motivated joint fit to the expected LaMET asymptotic form Alexandrou et al. (2020); Izubuchi et al. (2018); Ji (2013).

A simultaneous fit to the real and imaginary parts is carried out over a sliding window at large λ, with Bayesian priors enforcing smoothness and positivity constraints on λ_0.

The fitted asymptotic tail replaces the noisy data beyond the breakdown point, producing a smoothly continued coordinate-space matrix element Φ(λ, b) valid over an extended domain. Fig. 7 presents the extrapolated coordinate-space matrix elements at b = 3a and P_z = 1.47 GeV obtained from both Ref. Tan et al. (2025) and PHYSMASTER. As is evident from the comparison, the extrapolation performed in Ref. Tan et al. (2025) is noticeably more conservative. In contrast, PHYSMASTER is capable of enforcing a prescribed asymptotic behavior while performing the continuation. Although the two approaches differ in the oscillatory region, both extrapolated curves consistently approach zero at large separations.

The quasi-TMD wave function in momentum space is defined as Lin et al. (2018); Radyushkin (2017)

ψ̃(x, b_⊥) = ∫ (dλ / 2π) e^{i(x − 1/2)λ} Φ(λ, b_⊥).   (5)

After replacing the unstable large-λ signals with the extrapolated values and discretizing the above formula, PHYSMASTER produced the results shown in the left panel of Fig. 8. The comparison indicates that the real parts obtained from Ref. Tan et al. (2025) and PHYSMASTER exhibit noticeable differences within the physical region x ∈ [0, 1], while the imaginary parts remain largely consistent. This discrepancy in the real part can be attributed to the different extrapolation schemes applied at large λ, and it is typically accounted for as a source of systematic uncertainty. For each x ∈ (0, 1) the kernel is computed with full covariance propagation. PHYSMASTER then removes the residual x dependence by (see the sketch after this list):

• fitting K(x) to a constant plateau in the central x region,

• performing a correlated average weighted by the inverse covariance.
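
The correlated average in the second bullet is the textbook generalized-least-squares fit of a constant; a minimal sketch (function name is illustrative):

    import numpy as np

    def correlated_constant_average(y, cov):
        """Inverse-covariance-weighted constant fit over the plateau region:
        k = (1^T C^-1 y) / (1^T C^-1 1), with variance 1 / (1^T C^-1 1)."""
        cinv = np.linalg.inv(cov)
        ones = np.ones_like(y)
        w = ones @ cinv
        k = (w @ y) / (w @ ones)
        err = np.sqrt(1.0 / (w @ ones))
        return k, err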

The right panel of Fig. 8 presents a comparison between the Collins-Soper kernel extracted using Ref. Tan et al. (2025) and PHYSMASTER. The two central values of K(b_⊥ = 3a, μ) are consistent within uncertainties. However, the statistical errors from PHYSMASTER are noticeably smaller, which may be due to the neglect of systematic uncertainties arising from the extrapolation procedure.

The results above illustrate that PHYSMASTER provides an efficient and robust framework for automating complex lattice-QCD analyses. Starting from raw two-point correlators and Wilson-loop inputs, the pipeline executes all essential stages, from correlator fitting and renormalization to large-λ stabilization, Fourier transformation, and LaMET matching, in a consistent and reproducible manner, substantially reducing manual intervention.

A key strength of PHYSMASTER lies in its scalability. As the dimensionality of the problem increases, for example through additional momenta, lattice ensembles, or operator structures, the framework can process the resulting datasets efficiently while maintaining statistical stability and reproducibility. Automated selections of fit ranges and continuation parameters further limit human bias, leading to reduced statistical fluctuations relative to traditional manual workflows.

The CS-kernel analysis presented here constitutes only a limited demonstration of the capabilities of PHYSMASTER. The primary objective is not the CS kernel itself, but the acceleration and standardization of the entire lattice QCD analysis workflow. Future developments will focus on automated treatments of systematic uncertainties and extensions to broader classes of observables, multiple lattice spacings, and continuum extrapolations. In this sense, the present study serves as a proof of principle, and readers are invited to anticipate more comprehensive applications of PHYSMASTER to large-scale lattice QCD computations.

The variational method is a cornerstone of quantum mechanics, providing a rigorous upper bound to the ground-state energy of many-body systems, as described in standard texts by Ref. Griffiths and Schroeter (2018) and Ref. Levine (2013). While modern quantum chemistry relies heavily on established software packages (e.g., the Gaussian suite of Frisch et al. (2016) and the PySCF framework of Sun et al. (2018)) using Gaussian-type orbitals (GTOs) as pioneered by Boys (1950), constructing a solver ab initio, starting from the analytical evaluation of integrals and basis-set design, remains a fundamental exercise in theoretical physics. Crucially, this evaluation precludes access to external global knowledge bases or literature searches, thereby isolating the agent's intrinsic ability to reason through the Hamiltonian's structure, justify approximations, and ensure numerical robustness purely from first principles.

The specific scientific problem addressed here is the calculation of the first electronic excitation energy of the neutral lithium atom (Li, Z = 3), corresponding to the transition 1s² 2s¹ → 1s² 2p¹. This requires solving the Schrödinger equation for a three-electron system under the Born-Oppenheimer approximation. Unlike routine computations, this task imposes a strict constraint: the solver must be built from scratch using only the Julia standard library, forcing the autonomous agent to derive, implement, and optimize the necessary theoretical components, including basis-set construction, angular momentum algebra, and numerical integration schemes, without black-box dependencies.

Solving this problem autonomously involves overcoming significant theoretical and algorithmic barriers:

• Theoretical Derivation: the agent must correctly derive the energy functionals for open-shell doublet states. This involves handling two-electron Coulomb (J) and exchange (K) integrals, where specific angular momentum coupling coefficients (e.g., the factor of 1/3 in the s-p exchange interaction) must be rigorously determined using the coupling schemes of Ref. Condon and Shortley (1935).

• Numerical Singularity and Tails: the radial integrals involve evaluating wave functions from the nucleus (r → 0, where 1/r potentials diverge) to the asymptotic tail (r → ∞).

Naive integration schemes often fail to capture both regimes efficiently.

• Basis Set Design: without access to pre-tabulated basis sets (like the STO-3G set defined by Ref. Hehre et al. (1969)), the agent must construct a minimal, physically motivated basis of Slater-type orbitals and implement an optimization strategy to find the variational parameters that minimize the energy.

Manually deriving these formulas and writing a bug-free, optimized numerical integrator typically requires days of effort for a graduate student.

The workflow proceeded in three distinct, self-correcting stages: Theoretical Construction, Algorithmic Implementation, and Critical Verification.

a. Theoretical Construction and Basis Design. First, the physical model was established by selecting Slater-type orbitals (STOs), R_{n,l}(r) ∝ r^{n−1} e^{−ζr}, as the basis functions. These were chosen for their correct cusp behavior at the nucleus, which offers superior convergence compared to GTOs for minimal-basis calculations, as analyzed by Ref. Slater (1930). Energy expressions for the doublet configurations were then derived in terms of the one-electron, Coulomb (J), and exchange (K) integrals.

Crucially, for the s-p exchange integral K_sp, the contribution was identified as arising solely from the k = 1 term in the multipole expansion, with the prefactor 1/3 derived from the integration over spherical harmonics.

b. Numerical Implementation. To handle the semi-infinite integration domain [0, ∞), a non-uniform grid mapping was implemented.

This transformation maps the infinite range to a finite interval while naturally concentrating grid points near the nucleus (s → 0), thereby improving the accuracy for the nuclear potential V ∝ −Z/r.
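
The exact mapping used in the run is not preserved in the text above, so the sketch below uses one common choice, r = r0·s/(1 − s) on s ∈ [0, 1), purely to illustrate how such a transformation handles the semi-infinite domain while clustering points near the nucleus; the quick check integrates r·e^(−2r), whose exact value is 1/4.

    import numpy as np

    def radial_grid(n=4001, r0=1.0):
        # Illustrative mapping (an assumption, not the agent's actual choice):
        # s in [0, 1) -> r = r0 * s / (1 - s) in [0, inf), clustering points near r = 0.
        s = np.linspace(0.0, 1.0, n, endpoint=False)
        r = r0 * s / (1.0 - s)
        dr_ds = r0 / (1.0 - s) ** 2          # Jacobian dr/ds for integrating in s
        return s, r, dr_ds

    # Quick check: the integral of r * exp(-2 r) dr over [0, inf) equals 1/4 exactly.
    s, r, dr_ds = radial_grid()
    g = r * np.exp(-2.0 * r) * dr_ds
    val = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(s))   # trapezoid rule in s
    print(round(val, 4))                                 # ≈ 0.25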

For the two-electron integrals, which formally scale as O(N²), an efficient O(N) algorithm was implemented utilizing the split-domain identity for the radial multipole expansion found in Ref. Szabo and Ostlund (1996), where B and C are cumulative integrals of the density g(r). Additionally, a Gram-Schmidt orthogonalization routine was implemented to ensure that the 1s and 2s orbitals remained orthogonal (⟨1s|2s⟩ = 0) despite being defined by distinct variational exponents ζ.
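
The split-domain trick can be illustrated for the k = 0 multipole: with cumulative integrals B(r) = ∫₀^r g dr′ and C(r) = ∫_r^∞ g(r′)/r′ dr′, the double integral over max(r1, r2) collapses to a single pass over the grid. The sketch below is a generic illustration, not the agent's code; the check uses the hydrogen 1s radial density, whose Coulomb self-repulsion is the textbook value 5/8 Ha.

    import numpy as np

    def coulomb_like_integral(r, g):
        """O(N) evaluation of V = ∫∫ g(r1) g(r2) / max(r1, r2) dr1 dr2 via
        V = ∫ g(r) [ B(r)/r + C(r) ] dr  (k = 0 multipole only).
        Assumes g ∝ r^2 |R(r)|^2, so g/r and B/r vanish at the origin."""
        dr = np.diff(r)

        def cumtrapz0(f):                   # cumulative trapezoid from the first grid point
            inc = 0.5 * (f[1:] + f[:-1]) * dr
            return np.concatenate(([0.0], np.cumsum(inc)))

        safe_r = np.where(r > 0, r, 1.0)
        B = cumtrapz0(g)
        C_full = cumtrapz0(np.where(r > 0, g / safe_r, 0.0))
        C = C_full[-1] - C_full             # tail integral from r to the end of the grid
        integrand = g * (np.where(r > 0, B / safe_r, 0.0) + C)
        return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dr)

    r = np.linspace(0.0, 30.0, 6001)
    g = 4.0 * r**2 * np.exp(-2.0 * r)       # normalized hydrogen 1s radial density
    print(round(coulomb_like_integral(r, g), 3))   # ≈ 0.625 = 5/8 Ha, the textbook J(1s,1s)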

c. Optimization and Verification. A nested grid-search algorithm was employed to optimize the exponents. The ground-state parameters (ζ_1, ζ_2) were optimized first, followed by the optimization of the excited-state parameter ζ_p with ζ_1 fixed at its ground-state value. Following the implementation, the theoretical framework and workflow underwent a rigorous verification process. Analytical normalization constants were verified, the absence of the k = 2 term in the direct s-p interaction was confirmed to be theoretically consistent, and the stability of the integration grid (N = 4001 points) was validated.

The agent successfully constructed a full-stack variational solver and computed the following energies:

• Ground state (1s² 2s): E_g = -7.4178 Ha (optimized ζ_1s = 2.68, ζ_2s = 0.63).

• Excited state (1s² 2p): E_e = -7.3504 Ha (optimized ζ_2p = 0.52).

The resulting first excitation energy, ΔE = E_e − E_g, is in remarkable agreement with the experimental value of approximately 0.0679 Ha (∼ 1.85 eV) from the database of Ref. Kramida et al. (2023), with a deviation of only 0.0004 Ha. This high accuracy, achieved with a minimal basis set, demonstrates the agent's ability not only to write code but to perform high-level physical reasoning: correctly handling angular momentum algebra, designing stable numerical schemes, and rigorously optimizing variational parameters. The work highlights the potential of AI agents to accelerate the "engineering" aspects of theoretical physics, transforming abstract problem statements into high-precision numerical results in a fully autonomous loop, consistent with the general theoretical framework detailed by Sachdev (2011).

While this problem itself is not particularly challenging academically, its significance lies in demonstrating the feasibility of long-horizon, high-rigor, and fully verifiable production-grade scientific computing workflows, an essential prerequisite for closing the loop from physical reasoning and algorithmic implementation to autonomous research at scale.

Quantum Monte Carlo (QMC)

PHYSMASTER addresses one of the most challenging problems in quantum many-body physics: the precise determination of the quantum phase transition point in the Union Jack Bose-Hubbard Model (BHM). The BHM serves as the canonical theoretical framework for describing the quantum dynamics of interacting bosons on optical lattices, capturing the competition between kinetic delocalization and interaction-induced localization that drives the zero-temperature Superfluid (SF) to Mott Insulator (MI) quantum phase transition Fisher et al. (1989); Greiner et al. (2002). The establishment of this framework laid the theoretical groundwork for the research leading to the 2001 Nobel Prize in Physics on Bose-Einstein condensation. While the square lattice has been exhaustively characterized, the isotropic Union Jack lattice, which is constructed by augmenting a square lattice with diagonal next-nearest-neighbor (NNN) hopping, presents a topologically distinct, non-bipartite geometry with a high coordination number of z = 8.

Determining the critical ratio (t/U)_c on such highly connected architectures is fundamental for validating universality classes in complex geometries Łącki et al. (2016). The Hamiltonian governing the system is given by

H = −t Σ_{⟨i,j⟩} ( b†_i b_j + h.c. ) + (U/2) Σ_i n_i (n_i − 1) − μ Σ_i n_i,

where the sum runs over all nearest-neighbor (NN) and diagonal bonds of the Union Jack lattice, the hopping amplitude t is uniform for both bond types, b†_i (b_i) creates (annihilates) a boson on site i, and n_i = b†_i b_i is the number operator.

The enhanced connectivity (z = 8) drastically lowers the energy cost for particle delocalization compared to the square lattice (z = 4), theoretically suppressing the critical point well below the standard square-lattice value of (t/U)_c^sq ≈ 0.0597. Precise numerical benchmarks are required to quantify this suppression and confirm the robustness of the (2+1)D XY universality class in the presence of frustration-free triangular loops, consistent with the general theoretical framework detailed by Ref. Sachdev (2011).

This work is not merely code implementation or parameter scanning; it constitutes a full-stack, algorithmically driven, Ph.D.-thesis-level research effort. The primary obstacle lies in overcoming the severe algorithmic opacity inherent in high-precision Quantum Monte Carlo (QMC), utilizing techniques such as those reviewed by Ref. Sandvik (2010). The obstacle lies not in the derivation of equations, but in the stochastic engineering required to implement a valid Stochastic Series Expansion (SSE) with directed-loop updates, building upon the original worm algorithm of Ref. Prokof'ev et al. (1998) and the directed-loop formulation of Ref. Syljuåsen and Sandvik (2002). PHYSMASTER was required to autonomously resolve topological subtleties specific to the Union Jack lattice, particularly the non-trivial accounting of winding-number topology and the challenge of critical slowing down.

PHYSMASTER had to autonomously implement a highly optimized Robbins-Monro stochastic root-finding scheme, following the method of Ref. Robbins and Monro (1951), to tune the chemical potential μ against a vanishing compressibility (κ → 0) near the critical point, a task where naive bisection fails. Furthermore, extracting the thermodynamic limit requires physical intuition to distinguish genuine finite-size scaling behavior from corrections to scaling. Remarkably, the agent completes this entire research task without any access to global knowledge bases or literature search. This proves its ability to derive every algorithmic detail, from the SSE update weights to the topological constraints, directly from the first principles of the Hamiltonian, demonstrating a capacity for independent scientific discovery and rigorous theoretical reasoning.

The learning curve to master QMC is steep for a Ph.D. student, and a researcher typically needs more than one year to reach a senior level of expertise and apply it effectively. Furthermore, the independent completion of this project, which involves the construction, debugging, and optimization of high-precision QMC code, would conventionally necessitate one to three months of full-time effort. However, the deployment of PHYSMASTER profoundly accelerates this research cycle, drastically reducing the time commitment required.

PHYSMASTER employs the Stochastic Series Expansion (SSE) QMC algorithm with directed-loop updates to sample the partition function Z = Tr[e^{−βĤ}] without discretization error Wenzel and Janke (2009); Xu et al. (2019). Simulations are conducted on L × L tori with periodic boundary conditions for system sizes L ∈ {8, 12, 16, 20}.

To strictly target the tip of the n = 1 Mott lobe, PHYSMASTER operated in the grand-canonical ensemble. The chemical potential μ was dynamically tuned using a compressibility-driven Robbins-Monro recursion. At the k-th update step, μ was adjusted as

μ_{k+1} = μ_k − α_k (⟨n⟩_k − 1) / κ_k,

where α_k is a decaying step-size sequence. This adaptive method utilizes the on-the-fly measurement of the compressibility κ_k to stabilize convergence, achieving a density tolerance of |⟨n⟩ − 1| ≤ 5 × 10⁻⁴ across all parameter sets.
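
A schematic of the compressibility-driven Robbins-Monro loop, consistent with the recursion written above. run_qmc is a placeholder for a short SSE run returning the measured density and compressibility, and all constants are illustrative.

    def tune_mu(run_qmc, mu0, n_target=1.0, a0=0.5, steps=50, tol=5e-4):
        """Stochastic-approximation tuning of mu toward <n> = n_target.
        run_qmc(mu) -> (mean density, compressibility) from a short simulation."""
        mu = mu0
        for k in range(1, steps + 1):
            n_k, kappa_k = run_qmc(mu)
            if abs(n_k - n_target) <= tol:
                break
            a_k = a0 / k                                        # decaying step size
            mu -= a_k * (n_k - n_target) / max(kappa_k, 1e-6)   # push density toward the target
        return mu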

To probe the quantum critical point (dynamic exponent z = 1), PHYSMASTER fixed the aspect ratio β = 1.5L (in units of 1/U). The critical point was identified via the scaling of the superfluid stiffness ρ_s ∝ W²/β, where W² is the squared winding number. Crossing points t*(L_1, L_2) of the scale-invariant observable W² are determined for successive system sizes: PHYSMASTER employs single-histogram reweighting of the kinetic operator count K to interpolate W²(t) continuously between the simulated grid points (t ∈ [0.027, 0.033]), allowing a precise determination of the crossing location significantly below the simulation grid spacing.
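
The crossing-point determination can be sketched as root-finding on the difference of two interpolated W² curves; here simple linear interpolation stands in for histogram reweighting, and the sketch assumes the curves cross exactly once inside the scanned interval.

    import numpy as np
    from scipy.optimize import brentq

    def crossing_point(t_grid, w2_L1, w2_L2):
        """Locate t* where W^2 for two system sizes crosses, by root-finding on the
        interpolated difference (requires opposite signs at the interval ends)."""
        diff = lambda t: np.interp(t, t_grid, w2_L1) - np.interp(t, t_grid, w2_L2)
        return brentq(diff, t_grid[0], t_grid[-1])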

The thermodynamic limit was extracted by PHYSMASTER by autonomously extrapolating the finite-size crossing points t*(L_1, L_2) against the scaling variable 1/√(L_1 L_2) to the L → ∞ limit.
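
The extrapolation step can likewise be sketched as a linear fit of the crossings t*(L_1, L_2) in the scaling variable x = 1/√(L_1 L_2), with the intercept estimating t_c. Treating the correction as linear in this variable is an assumption of the sketch, not a statement about the actual correction exponent used.

    import numpy as np

    def extrapolate_tc(pairs, crossings):
        """Linear fit of t*(L1, L2) vs 1/sqrt(L1*L2); the intercept is the t_c estimate."""
        x = np.array([1.0 / np.sqrt(L1 * L2) for (L1, L2) in pairs])
        slope, intercept = np.polyfit(x, np.array(crossings), 1)
        return intercept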

PHYSMASTER's achievement is its ability to autonomously execute the entire scientific discovery loop. From physical modeling and high-precision algorithm optimization to complex finite-size scaling extrapolation, it can accomplish the whole task without any human intervention. Such autonomous work establishes the most rigorous numerical benchmark for the Bose-Hubbard model on the isotropic Union Jack lattice. Through the automated integration of SSE directed-loop updates, stochastic μ-tuning, and histogram reweighting, PHYSMASTER determines the critical point to be

(t/U)_c = 0.02992 ± 0.00020.

The small statistical uncertainty (∼ 0.7%) confirms the efficacy of the methodology. Physically, this value represents a ≈ 50% reduction compared to the square-lattice critical point (0.0597), quantitatively demonstrating the strong stabilization of the superfluid phase driven by the high coordination number (z = 8). The results are fully consistent with (2+1)D XY universality-class predictions.

Most significantly, this result was achieved through an entirely autonomous AI workflow. The agent successfully navigated the full scientific loop: from implementing the correct winding-number topology for diagonal bonds to executing high-precision finite-size scaling analysis. This demonstrates an emerging capability for AI systems to independently drive discovery in computational many-body physics, moving from simple code generation to complex, physically motivated algorithmic reasoning. This project serves as a powerful demonstration of how integrated AI systems can dramatically enhance the efficiency and accessibility of computational physics research, setting a new standard for automated scientific discovery. According to the finite-size scaling (FSS) hypothesis, the curves for different L intersect precisely at the quantum critical point t_c. The estimated critical value is t_c/U = 0.02992 ± 0.00020, marked by the dashed line and the shaded grey region, which indicates the uncertainty of the intersection point.

Tidal Disruption Events (TDEs) occur when a star passes near a supermassive black hole (SMBH; ∼ 10⁶ M_⊙), leading to its complete tidal destruction and the formation of an elongated debris stream, roughly half of which falls back toward the black hole. During fallback, the stream accelerates and thins, and near pericenter fluid elements collide at high relative speeds, dissipating energy in a process known as the nozzle shock. General relativistic effects, including apsidal and nodal precession, further modify stream trajectories and dissipation near rapidly spinning SMBHs. TDEs are central to the frontier of black hole physics, building upon the foundational work recognized by the 2020 Nobel Prize in Physics for confirming the existence of supermassive black holes and validating General Relativity.

Previous studies have largely considered single-energy fallback, ignoring the star's finite size. In reality, different stellar regions experience varying tidal potentials, producing a spread of debris energies and slightly differing geodesics. These multi-energy streams intersect at non-zero angles at the nozzle, combining transverse compression with longitudinal velocity projections, potentially enhancing dissipation. PHYSMASTER refers to this mechanism, arising from the tidal energy spread, as differential precession.

Since the establishment of the standard TDE framework in the 1980s (Carter and Luminet, 1983), theoretical and simulation studies have evolved rapidly, refining the initial models. Traditional theory posits that nozzle shocks near pericenter dissipate significant kinetic energy, aiding accretion disk formation (Evans and Kochanek, 1989; Shiokawa et al., 2015). Yet, fully simulating a TDE, from stellar approach and disruption to stream formation, collisions, disk circularization, and radiation, requires comprehensive GR magnetohydrodynamics (MHD) with radiative transfer, exceeding current computational capabilities (Curd and Narayan, 2019; Dai et al., 2018). The precise mechanisms of energy dissipation and disk formation remain unresolved.

Post-2021, the community has recognized that nozzle-shock dissipation rates may be overestimated by 2-3 orders of magnitude (Bonnerot et al., 2016; Guillochon and Ramirez-Ruiz, 2015), though full simulations were lacking. In an October 2025 arXiv preprint, researchers employed adaptive particle refinement in 3D Newtonian smoothed particle hydrodynamics (SPH), boosting resolution by ∼ 2¹⁶ times, confirming the overestimation in Newtonian single-energy frameworks and challenging explanations for rapid disk formation (Hu et al., 2025a). This study aims to test a novel hypothesis: can GR-induced differential precession significantly enhance dissipation rates?

The hypothesis rests on the physics that a finite stellar radius imparts a tidal energy spread, yielding a multi-energy debris bundle in Kerr spacetime. Varying energies lead to distinct semi-major axes and eccentricities, inducing differential relativistic precessions (apsidal and Lense-Thirring nodal). Upon stream reconvergence, this causes non-zero crossing angles, adding longitudinal velocity dissipation. Meanwhile, another October 2025 arXiv paper reported the first Newtonian MHD TDE simulation of a magnetized star, suggesting MHD instabilities might elevate dissipation (Abolmasov et al., 2025). UC Berkeley's Wenbin Lu has articulated this potential enhancement mechanism in TDE nozzle shocks (Bonnerot and Lu, 2022).

Tidal Disruption Event (TDE) simulations face significant theoretical and computational challenges. In general relativity (GR), calculations in Kerr spacetime are highly complex, requiring careful handling of coordinate singularities, Christoffel symbols, and frame transformations. At the same time, TDE fluid dynamics involves multi-scale, high-resolution simulations where capturing shocks and fine structures without artificial dissipation is extremely demanding. AI systems aiming to perform these tasks must combine rigorous algebraic manipulation with physical intuition and autonomous algorithm selection.

PHYSMASTER must autonomously solve coupled geodesic equations in Kerr spacetime and numerically distinguish genuine physical effects from artifacts related to coordinate singularities or Christoffel symbols. The AI must independently select and optimize the kernel functions, artificial viscosity parameters, and Balsara limiter in the SPH scheme to accurately capture the oblique shock without introducing non-physical dissipation, a task usually requiring months of expert tuning.

For nearly parabolic orbits (e ≃ 1), PHYSMASTER evaluates the 1PN Schwarzschild apsidal advance per orbit and the corresponding differential advance across the debris energy spread. Including the SMBH spin a_•, the 1.5PN spin-orbit contribution yields a differential |Δ(Δω_SO)| ≈ 0.376° per orbit, and the Lense-Thirring nodal precession is evaluated in the same way. Tracking two representative debris energies up to the first nozzle crossing near pericenter r_p ≃ r_t gives the accumulated misalignment angle, from which the relative velocity in the oblique collision follows.

The resulting specific dissipation is of order v²_rel ≃ 2.37 × 10¹⁴ J kg⁻¹, compared with ε_diss,base = 5.93 × 10¹³ J kg⁻¹ without differential precession. For a fallback rate Ṁ_fb = 3.31 × 10²² kg s⁻¹, this gives E_PN = Ṁ_fb ε_diss = 7.86 × 10³⁶ W (7.86 × 10⁴³ erg s⁻¹) and |ΔE/E| = 1.5 × 10⁻³, about two orders of magnitude enhancement over the baseline value according to Hu et al. (2025b).
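
A quick consistency check of the quoted power using the rounded numbers above (the small difference from the quoted 7.86 × 10³⁶ W reflects rounding of the inputs):

    mdot_fb = 3.31e22           # kg s^-1, fallback rate quoted above
    eps_diss = 2.37e14          # J kg^-1, specific dissipation quoted above
    power = mdot_fb * eps_diss  # ≈ 7.8e36 W, i.e. ≈ 7.8e43 erg s^-1
    print(f"{power:.2e} W = {power * 1e7:.2e} erg/s")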

Thermalization efficiency is calibrated with lightweight SPH simulations of a 2D oblique ribbon-ribbon collision at θ = 20.3°. PHYSMASTER employs a pairwise-symmetric SPH scheme with a cubic-spline kernel, Balsara viscosity limiter, time-centered leapfrog integration, and adaptive timestep Δt = CFL · h/v_max (CFL = 0.005). Parameters are γ = 5/3, ρ_0 = 1.0, u_0 = 0.05, v_0 = 1.0, resolution dx = 0.6, and smoothing length h = 1.2 dx. The run stops at shock saturation when dU/dt < 0.05 (dU/dt)_peak. The thermalization fraction is again ∼ 2 orders of magnitude larger than in the baseline case.

Semi-analytic PN/Kerr estimates predict moderate misalignment (θ ≈ 20°) at r_p ≈ 47 r_g, with oblique shocks thermalizing ∼ 50% of the relative kinetic energy in SPH proxies, consistent with strong shocks dissipating the normal velocity component. Differential precession enhances nozzle dissipation by a factor of ∼ 4 over single-energy baselines, agreeing in order of magnitude across methods.

The traceable workflow yields interpretable results matching initial estimates within 0.5 orders of magnitude. Outputs reproduce expected semi-analytic values under typical parameters, with refined numerical integration and SPH confirming predictions.

Scientifically, the output of PHYSMASTER validates the hypothesized physics: while differential precession opens an extra dissipation channel, the enhancement is less than one order of magnitude over typical parameter space. This negative result excludes a competitive hypothesis for resolving TDE energy crises, directing future theory.

Looking forward, scaling this framework to a broad parameter sweep over black hole mass, spin, orbital inclination, and stellar structure would transform the study of TDE circularization from isolated case analyses into a systematic, semi-automated exploration of dissipation channels across phase space, enabling rapid falsification or validation of competing disk-formation scenarios.

The autonomous derivation, coding, and analysis performed by PHYSMASTER are comparable to the work of a senior Ph.D. student in astrophysics, which highlights AI's potential in frontier exploration.

Semi-leptonic decays of charmed mesons provide a clean environment for studying weak interactions and non-perturbative QCD dynamics. At the quark level, these processes are driven by the charged-current transition c → q ℓ⁺ ν_ℓ (q = d, s), governed by the CKM matrix elements V_cd and V_cs. Because the charged lepton and neutrino interact only weakly, the hadronic and leptonic components of the amplitude factorize. As a result, semi-leptonic channels such as D → πℓν and D → Kℓν are ideal probes for extracting heavy-to-light form factors and testing flavor-symmetry patterns.

Therefore, constructing the low-energy effective Hamiltonian at the hadronic level, and deriving the amplitudes of these semi-leptonic decays, is essential for understanding symmetry patterns, estimating form-factor relations, and providing controlled predictions relevant for phenomenology and lattice QCD studies.

Although the specific problem studied here has not yet been systematically investigated, closely related studies on the weak decays of singly and doubly charmed baryons have been extensively explored in the literature Lü et al. (2016); Shi et al. (2018); Wang et al. (2017), employing similar symmetry-based and effective-Hamiltonian approaches. Consequently, this well-motivated and timely problem provides a suitable and nontrivial testbed for evaluating the capability of PHYSMASTER to autonomously explore and solve open problems in theoretical high-energy physics.

From the perspective of flavor SU(3) symmetry, the charmed mesons D = (c q̄) form a 3 representation, while the light pseudo-scalar mesons reside in the octet 8. The weak current q̄ γ^μ (1 − γ_5) c transforms as another 3.

These transformation properties constrain the structure of the effective hadronic Hamiltonian and lead to predictive decay amplitudes in terms of the SU(3) representations.

At energies below the $W$-boson mass, PHYSMASTER constructs the appropriate charged-current interaction, captured by a four-fermion operator of the current-current form with $q = d, s$ (a representative form is sketched below). Only the hadronic current participates in SU(3) flavor transformations.

• The initial charmed meson is represented as a flavor triplet $D_i \in \mathbf{3}$ (a conventional component assignment is sketched after this list).

• The weak current is represented as $O_i = [\,0,\; V_{cd},\; V_{cs}\,] \in \mathbf{3}$.

• The final pseudo-scalar meson belongs to the octet $P^{i}_{\ j} \in \mathbf{8}$ (the standard matrix form is sketched after this list).
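A conventional component assignment for these representations (an illustrative choice; the paper's own ordering and treatment of the $\eta$ are not reproduced above) is:

```latex
D_i = \left(D^0,\; D^+,\; D_s^+\right), \qquad
O_i = \left[\,0,\; V_{cd},\; V_{cs}\,\right], \qquad
P^{i}_{\ j} =
\begin{pmatrix}
  \frac{\pi^0}{\sqrt{2}} + \frac{\eta_8}{\sqrt{6}} & \pi^+ & K^+ \\[2pt]
  \pi^- & -\frac{\pi^0}{\sqrt{2}} + \frac{\eta_8}{\sqrt{6}} & K^0 \\[2pt]
  K^- & \bar{K}^0 & -\frac{2\eta_8}{\sqrt{6}}
\end{pmatrix}.
```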

Since the octet field is traceless, the most general SU(3)-invariant hadronic Hamiltonian coupling these representations contains only one independent reduced structure, whose coefficient $a$ is the SU(3) irreducible amplitude encoding the non-perturbative QCD dynamics; a schematic form is sketched below.

For the pseudo-scalar-to-pseudo-scalar transition $D(p) \to P(p')$, only the hadronic matrix element of the vector current contributes (a standard decomposition is sketched below); the axial current vanishes between two pseudo-scalars, $\langle P\,|\,\bar{q}\,\gamma^{\mu}\gamma_5\,c\,|\,D\rangle = 0$.
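A standard Lorentz decomposition of this vector-current matrix element in terms of two form factors (the paper may use an equivalent $f_+,\, f_0$ basis) is:

```latex
\langle P(p')\,|\,\bar{q}\,\gamma^{\mu} c\,|\,D(p)\rangle
  = f_{+}(q^2)\,(p + p')^{\mu} + f_{-}(q^2)\,(p - p')^{\mu},
  \qquad q^{\mu} \equiv (p - p')^{\mu}.
```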

Based on the hadronic Hamiltonian, the full semi-leptonic decay amplitude $\mathcal{M}$ of the charmed meson $D$ factorizes into a hadronic matrix element and a leptonic current; a standard form is sketched below.
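Under the factorization noted earlier, a representative form of the amplitude (a sketch assuming the conventional normalization of the effective Hamiltonian above) is:

```latex
\mathcal{M}\!\left(D \to P\,\ell^{+}\nu_{\ell}\right)
  = \frac{G_F}{\sqrt{2}}\, V_{cq}\,
    \langle P\,|\,\bar{q}\,\gamma^{\mu}(1-\gamma_5)\,c\,|\,D\rangle\;
    \bar{u}_{\nu}\,\gamma_{\mu}(1-\gamma_5)\, v_{\ell}.
```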

From this, the decay amplitudes of all allowed channels can be predicted by PHYSMASTER, as summarized in Table 1.

Table 1: Decay channels and their amplitudes (in units of the reduced amplitude $a$).
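As a structural check (illustrative only, not a reproduction of the paper's Table 1), the invariant above necessarily yields the familiar Cabibbo pattern, e.g.

```latex
\mathcal{A}\!\left(D^{0} \to K^{-}\ell^{+}\nu_{\ell}\right) \propto a\, V_{cs}
  \quad(\text{Cabibbo-favored}),
\qquad
\mathcal{A}\!\left(D^{0} \to \pi^{-}\ell^{+}\nu_{\ell}\right) \propto a\, V_{cd}
  \quad(\text{Cabibbo-suppressed}).
```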

In this work, we demonstrate how PHYSMASTER can automatically construct the semi-leptonic effective Hamiltonian for charmed meson decays, using both the quark-level weak interaction and the SU(3) flavor structure of the hadronic states.

As a result, PHYSMASTER successfully determines the correct hadronic-level effective Hamiltonian for the semi-leptonic decays of charmed mesons and provides physical predictions for the amplitudes of all decay channels. PHYSMASTER automates a derivation that traditionally requires substantial expertise in weak interactions and flavor symmetry.

Furthermore, this work shows that PHYSMASTER is capable of constructing effective theoretical models based on physical assumptions and deriving testable physical predictions from them. This highlights PHYSMASTER's potential to autonomously explore open problems in physics and to obtain correct physical predictions through self-evaluation.

Looking ahead, extending this autonomous framework to a broader set of heavy-flavor systems and symmetry-breaking patterns would shift such analyses from isolated, hand-crafted derivations to a systematic and reproducible exploration of effective theories, enabling rapid comparison across channels, symmetries, and dynamical assumptions.

Current AI scientist systems are still largely optimized for text-centric domains, with limited ability to manipulate rigorous mathematical formalisms and to conduct robust numerical computation, while also lacking adaptability to long-horizon workflows.

More specifically, in physics, a fundamental yet abstract, inherently complex, and intellectually demanding domain, research often requires both intensive analytical reasoning and code-based numerical computation, a dual capability largely absent in previous agents.

To meet the demands of physics research, we propose PHYSMASTER, an LLM-based agent that operates as an autonomous theoretical and computational physicist, aiming to liberate talented physicists from tedious engineering tasks and repetitive computations and to accelerate the emergence of inspiration and its validation. PHYSMASTER integrates theoretical reasoning and numerical computation techniques. Meanwhile, the exploration methodology of Monte Carlo Tree Search (MCTS) is integrated with hierarchical agent collaboration to balance exploration with efficiency and to adapt to ultra-long-horizon tasks.
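The text does not specify PHYSMASTER's tree policy or reward definition, so the following is only a generic, minimal UCT sketch in Python, illustrating how exploration of candidate research actions can be balanced against exploitation of promising ones; the `Node` structure and the exploration constant are assumptions.

```python
# Generic UCT (Monte Carlo Tree Search) selection and backpropagation.
from __future__ import annotations

import math
from dataclasses import dataclass, field

@dataclass
class Node:
    state: str                          # e.g. a partial derivation or experiment plan
    parent: Node | None = None
    children: list[Node] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0                  # accumulated reward from evaluations/critiques

def uct_score(child: Node, c: float = 1.4) -> float:
    """Upper-confidence bound: exploit high mean value, explore rarely visited nodes."""
    if child.visits == 0:
        return float("inf")             # always try unvisited children first
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(child.parent.visits) / child.visits)
    return exploit + explore

def select(root: Node) -> Node:
    """Descend the tree, always following the child with the best UCT score."""
    node = root
    while node.children:
        node = max(node.children, key=uct_score)
    return node

def backpropagate(leaf: Node, reward: float) -> None:
    """Propagate the reward of a completed rollout back to the root."""
    node: Node | None = leaf
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
```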

PHYSMASTER is further equipped with LANDAU, an evolving multi-layered knowledge base constructed to meet the information demands of real scientific research scenarios, enabling more efficient and robust scientific investigation with a strong emphasis on functionality. Information extracted from retrieved papers, reusable validated reasoning paths, and reliable priors together constitute a representative paradigm for an AI scientist's knowledge infrastructure. In contrast, other work on scientific knowledge bases places greater emphasis on the systematic organization and structural completeness of knowledge, achieving superior coverage but differing in design philosophy from the current LANDAU framework. These approaches are complementary in nature, and their integration represents a promising direction for future exploration.
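As an illustration of the layered design described above, here is a hypothetical Python interface for a three-layer store (retrieved-paper extracts, curated priors, validated methodology traces); the class and method names are assumptions, not LANDAU's actual API.

```python
# Hypothetical three-layer knowledge store in the spirit of LANDAU.
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class LayeredKnowledgeBase:
    paper_extracts: dict[str, str] = field(default_factory=dict)  # from retrieval
    curated_priors: dict[str, str] = field(default_factory=dict)  # manually vetted
    method_traces: dict[str, str] = field(default_factory=dict)   # validated workflows

    def lookup(self, key: str) -> tuple[str, str] | None:
        """Return (layer, entry), preferring validated traces, then priors, then papers."""
        for name, layer in (
            ("method_traces", self.method_traces),
            ("curated_priors", self.curated_priors),
            ("paper_extracts", self.paper_extracts),
        ):
            if key in layer:
                return name, layer[key]
        return None

    def record_trace(self, key: str, trace: str) -> None:
        """Persist a reasoning/computation path once it has been validated, for reuse."""
        self.method_traces[key] = trace

# Example usage
if __name__ == "__main__":
    kb = LayeredKnowledgeBase(paper_extracts={"sph_viscosity": "Balsara limiter notes"})
    kb.record_trace("sph_viscosity", "validated calibration workflow")
    print(kb.lookup("sph_viscosity"))
```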

Across scales ranging from the cosmos to fundamental particles, and from elementary interaction laws to the emergence of diverse quantum phases of matter, our PHYSMASTER has demonstrated reliable capabilities and a decent degree of scientific autonomy in the following validation cases:

• Acceleration: In two typical cases, PHYSMASTER compresses the labor-intensive engineering parts of genuine physics research into less than 6 hours, where such work typically requires 1-3 months. We expect PHYSMASTER to eliminate the barrier to mastering trivial techniques and to liberate talented researchers from tedious, repetitive engineering tasks.

While PHYSMASTER achieves remarkable autonomy in theoretical and computational physics, it inherits certain limitations from its underlying LLMs. The agent's performance is bounded by the LLM's knowledge cutoff and reasoning depth, particularly in highly abstract areas such as string theory or formal quantum field theory, where pure symbolic manipulation remains more challenging than code-intensive tasks. Additionally, residual hallucinations in the LLM-based critic can occur during evaluation of novel concepts, potentially introducing biases or overlooking viable paths in open-ended exploration. The reliance on retrieved knowledge also assumes high-quality literature access, which may vary in emerging or niche subfields.

To overcome these challenges and propel the development of reliable autonomous scientific agents, upcoming efforts will emphasize several strategies. We intend to embed sophisticated debugging and performance-analysis utilities, allowing the system to iteratively refine its computational implementations for optimal resource usage and accuracy. Moreover, an enhanced error-checking framework, with automated cross-verification against established benchmarks and anomaly-detection algorithms, will fortify the system's dependability and reduce the impact of inherent model inaccuracies. To broaden its analytical scope, we aim to fuse PHYSMASTER with dedicated theorem-proving tools and symbolic-computation platforms, augmenting its proficiency in intricate theoretical proofs and derivations while leveraging its current expertise in empirical modeling. From a broader perspective, our vision includes transforming the architecture into an expansive collaborative network of agents adept at tackling multifaceted, cross-domain research endeavors, incorporating mechanisms for ongoing knowledge updates from external data sources and seamless integration with experimental apparatus. These advancements are geared toward establishing comprehensive AI-orchestrated research pipelines, heralding a new era in AI4Science and expediting progress in core scientific fields.

Looking ahead, a central vision of our work is to transform PHYSMASTER into a practical and powerful AI-scientist product for use by physicists, which we regard as our highest-priority objective. At the same time, we place equal attention on extending the boundaries of system capabilities, particularly through optimization of LANDAU and its adaptation to a broader range of scientific computing scenarios, so that the system can be rigorously evaluated in increasingly complex scientific settings.

References
