FIRE: Multi-fidelity Regression with Distribution-conditioned In-context Learning using Tabular Foundation Models
Multi-fidelity (MF) regression often operates in regimes of extreme data imbalance, where the commonly used Gaussian-process (GP) surrogates struggle with cubic scaling costs and overfit to sparse high-fidelity observations, limiting efficiency and generalization in real-world applications. We introduce FIRE, a training-free MF framework that couples tabular foundation models (TFMs) to perform zero-shot in-context Bayesian inference via a high-fidelity correction model conditioned on the low-fidelity model’s posterior predictive distributions. This cross-fidelity information transfer via distributional summaries captures heteroscedastic errors, enabling robust residual learning without model retraining. Across 31 benchmark problems spanning synthetic and real-world tasks (e.g., DrivAerNet, LCBench), FIRE delivers a stronger performance-time trade-off than seven state-of-the-art GP-based or deep-learning MF regression methods, ranking highest in accuracy and uncertainty quantification with runtime advantages. Limitations include context-window constraints and dependence on the quality of the pre-trained TFM.
💡 Research Summary
FIRE (Fidelity‑aware In‑context REgression) presents a novel, training‑free framework for multi‑fidelity (MF) regression that leverages pre‑trained tabular foundation models (TFMs) to overcome the scalability and statistical challenges of traditional Gaussian‑process (GP)‑based approaches. The authors begin by highlighting two core difficulties in MF settings: (1) extreme data imbalance, where high‑fidelity (HF) observations constitute a tiny fraction of the total dataset, and (2) non‑nested input locations, which break the assumptions of many GP autoregressive couplings. Conventional GP surrogates suffer from cubic computational cost, overfitting on scarce HF data, and an inability to capture heteroscedastic discrepancies when they condition HF corrections only on low‑fidelity (LF) mean predictions.
FIRE addresses these issues in three stages. First, all available low‑fidelity (LF) sources are aggregated into a single dataset D_LF. A frozen TFM—specifically TabPFN v2.5, a transformer trained on millions of synthetic causal regression tasks—is used in an in‑context learning (ICL) fashion to produce a posterior predictive distribution for any HF input x_HF conditioned on D_LF. The distribution is summarized by its mean μ_θ(x), variance σ²_θ(x), and a set of quantiles Q = {0.1, 0.2, …, 0.9}. These summaries encode both point‑wise predictions and epistemic uncertainty, thereby exposing input‑dependent heteroscedasticity.
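The first stage can be sketched in plain Python. The frozen TFM itself is not reproduced here; a hypothetical `lf_posterior_samples` function stands in for querying its posterior predictive distribution, and only the summarization step (mean, variance, deciles) mirrors what the paper describes:

```python
# Sketch of FIRE's first stage: summarizing the LF model's posterior
# predictive distribution into (mean, variance, deciles). The frozen
# TFM is replaced by a hypothetical `lf_posterior_samples` stand-in;
# a real pipeline would draw these from TabPFN's predictive output.
import random
import statistics

def lf_posterior_samples(x, n=200):
    # Hypothetical stand-in posterior: centered on x**2 with a spread
    # that grows with |x|, mimicking heteroscedastic uncertainty.
    rng = random.Random(0)
    return [x ** 2 + rng.gauss(0.0, 0.1 + 0.05 * abs(x)) for _ in range(n)]

def summarize_posterior(samples):
    # Distributional summary used as cross-fidelity features:
    # mean mu, variance sigma^2, and the nine deciles Q = {0.1,...,0.9}.
    mu = statistics.fmean(samples)
    var = statistics.variance(samples)
    deciles = statistics.quantiles(samples, n=10)  # 9 interior cut points
    return mu, var, deciles

mu, var, deciles = summarize_posterior(lf_posterior_samples(2.0))
```

Because the deciles vary with the input, downstream models see how the LF uncertainty changes across the domain rather than a single mean prediction.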
Second, the HF residual r = y_HF – μ_θ(x_HF) is modeled. An augmented feature vector z_aug = [x_HF, μ_θ(x_HF), σ²_θ(x_HF), Q(x_HF)]—the input concatenated with the LF distributional summaries—is formed, and a second in‑context call to the frozen TFM, conditioned on the sparse set of HF residuals, predicts the residual at new inputs. The final HF prediction then combines the LF mean with this learned correction, ŷ_HF(x) = μ_θ(x) + r̂(x), again without any gradient-based training.
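The second stage can be illustrated with a minimal sketch. Here the frozen TFM's in-context residual inference is replaced by a hypothetical nearest-neighbor lookup in z_aug space, just to make the data flow concrete; all names and numbers are illustrative, not from the paper:

```python
# Sketch of FIRE's second stage: the HF correction model receives the
# augmented vector z_aug = [x, mu, var, deciles...] and predicts the
# residual r = y_HF - mu(x). A hypothetical 1-NN over z_aug stands in
# for the frozen TFM's in-context inference.

def build_z_aug(x, mu, var, deciles):
    # Concatenate the input with the LF distributional summaries.
    return [x, mu, var, *deciles]

def predict_hf(x, lf_summary, context):
    # context: list of (z_aug, residual) pairs from the sparse HF set.
    mu, var, deciles = lf_summary
    z = build_z_aug(x, mu, var, deciles)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Nearest-neighbor residual stands in for TFM in-context prediction.
    _, r = min(context, key=lambda pair: sq_dist(pair[0], z))
    return mu + r  # final prediction: LF mean plus learned correction

# Toy HF context: residuals observed at two HF points (made-up values).
context = [
    (build_z_aug(1.0, 1.0, 0.1, [0.9 + 0.025 * i for i in range(9)]), 0.5),
    (build_z_aug(3.0, 9.0, 0.3, [8.8 + 0.05 * i for i in range(9)]), 1.5),
]
# Query near the first HF point: LF mean 1.1, nearest residual 0.5.
y_hat = predict_hf(1.1, (1.1, 0.1, [1.0 + 0.025 * i for i in range(9)]), context)
```

The key design choice this mirrors is that the correction model conditions on the full distributional summary rather than the LF mean alone, so inputs where the LF model is uncertain can receive different corrections than inputs where it is confident.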