Talking with the Latents -- how to convert your LLM into an astronomer
Recent advances in Large Language Models (LLMs) offer unique opportunities for scientific tasks, yet their ability to reason over complex numerical data remains largely unexplored. We propose a simple mechanism to introduce domain-specific physical knowledge into LLMs by fusing pre-trained latent physical features with a pre-trained language model. Our method employs a teacher-student knowledge distillation framework where a large LLM (teacher) generates synthetic question-answer supervision to transfer physical reasoning to a smaller LLM (student). The student is conditioned on latent physical features and trained via a lightweight adapter and Low-Rank Adaptation (LoRA). We demonstrate that this approach, applied to models with 1B, 8B, and 32B parameters, enables effective reasoning over real scientific data. Our models substantially outperform strong baselines, such as Gemini 3 Pro, across multiple downstream tasks without task-specific fine-tuning. We show that the model combines latent information with general physical understanding to predict complex properties and can be “steered” by identifying physically meaningful directions in the latent space. This allows for explicit physical manipulation and natural language interpretation of latent structures. While our experiments focus on astrophysics, the framework is domain-agnostic and applicable to various scientific fields. Our main contribution is a general framework for using LLMs as interpretable interfaces to scientific latent spaces, enabling a single model to perform diverse tasks through natural language guidance. This work marks a step toward developing scientifically capable and useful LLMs.
💡 Research Summary
The paper introduces a general framework for turning large language models (LLMs) into scientifically capable agents by fusing them with domain‑specific latent representations. The authors focus on astrophysics, using two pre‑trained stellar spectra encoders – a Conformer‑based Spectral Conformer (SC) and a Vision‑Transformer‑based Spectral ViT (SViT) – to extract 2048‑dimensional latent vectors from millions of LAMOST spectra. These vectors are projected into the token embedding space of a pre‑trained LLM via a lightweight Adapter Network (AN), which creates eight “latent tokens” that are prepended to the textual prompt. This early‑fusion design allows the language model to process both natural language and high‑dimensional scientific data in a single forward pass.
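The early-fusion step described above can be sketched as a single linear projection. This is a minimal NumPy illustration, not the paper's implementation: the Adapter Network's actual architecture, the LLM embedding width (`EMBED_DIM`), the initialization, and all variable names are assumptions; only the 2048-dimensional latent and the eight latent tokens come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 2048      # dimension of the spectral encoder's latent vector (from the text)
N_LATENT_TOKENS = 8    # number of latent tokens prepended to the prompt (from the text)
EMBED_DIM = 4096       # LLM token-embedding width -- model-dependent, assumed here

# Hypothetical adapter: one linear map from the latent vector to eight
# vectors living in the LLM's token-embedding space.
W_adapter = rng.normal(0.0, 0.02, size=(LATENT_DIM, N_LATENT_TOKENS * EMBED_DIM))

def adapt(latent):
    """Project one spectral latent vector into 8 'latent tokens'."""
    tokens = latent @ W_adapter                      # shape (8 * EMBED_DIM,)
    return tokens.reshape(N_LATENT_TOKENS, EMBED_DIM)

latent = rng.normal(size=LATENT_DIM)                 # one encoded spectrum
prompt_embeds = rng.normal(size=(12, EMBED_DIM))     # stand-in text-token embeddings

# Early fusion: latent tokens are prepended to the textual prompt,
# so the LLM sees one combined sequence in a single forward pass.
fused = np.concatenate([adapt(latent), prompt_embeds], axis=0)
print(fused.shape)   # (20, 4096): 8 latent tokens + 12 text tokens
```

In a real system the adapter would likely be a small trained MLP and the text embeddings would come from the LLM's own embedding table; the point of the sketch is only the shape bookkeeping of prepending latent tokens.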
Training proceeds in two stages. First, a teacher‑student knowledge‑distillation pipeline generates synthetic question‑answer (Q‑A) pairs. A large LLM (the “teacher”, e.g., Gemini 3 Pro) is prompted with stellar parameters to produce concise, expert‑style descriptions, which become the target answers. The student model – a smaller LLM (Llama‑1B, Llama‑8B, or Qwen‑32B) – receives the latent tokens and the textual prompt and learns to reproduce the teacher’s answers. In the second stage, the Adapter Network is frozen and the LLM is fine‑tuned using Low‑Rank Adaptation (LoRA), enabling efficient parameter updates while preserving the pre‑trained knowledge.
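The LoRA step in the second stage can be illustrated in a few lines. This is a minimal NumPy sketch of the standard low-rank reparameterization W + (alpha/r)·BA; the layer sizes, rank, and scaling factor are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

d_out, d_in, rank = 512, 512, 8              # illustrative sizes; rank is an assumption

W = rng.normal(size=(d_out, d_in))           # frozen pre-trained weight (not updated)
A = rng.normal(0.0, 0.02, size=(rank, d_in)) # trainable low-rank factor
B = np.zeros((d_out, rank))                  # zero-init: training starts exactly at W

def lora_forward(x, alpha=16.0):
    """Frozen base path plus the scaled low-rank update (alpha/rank) * B @ A @ x."""
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B = 0, the adapted layer reproduces the frozen layer exactly,
# which is why pre-trained knowledge is preserved at the start of fine-tuning.
print(np.allclose(lora_forward(x), W @ x))   # True
```

Only A and B (2·512·8 parameters here) would receive gradients, versus 512·512 for full fine-tuning, which is what makes the update lightweight.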
The authors evaluate three research questions: (1) extraction of local physical concepts (effective temperature T_eff, surface gravity log g, metallicity [Fe/H])