NeurIDA: Dynamic Modeling for Effective In-Database Analytics
Relational Database Management Systems (RDBMS) manage complex, interrelated data and support a broad spectrum of analytical tasks. With the growing demand for predictive analytics, the deep integration of machine learning (ML) into RDBMS has become critical. However, a fundamental challenge hinders this evolution: conventional ML models are static and task-specific, whereas RDBMS environments are dynamic and must support diverse analytical queries. Each analytical task entails constructing a bespoke pipeline from scratch, which incurs significant development overhead and hence limits the wide adoption of ML in analytics. We present NeurIDA, an autonomous end-to-end system for in-database analytics that dynamically “tweaks” the best available base model to better serve a given analytical task. In particular, we propose a novel paradigm of dynamic in-database modeling to pre-train a composable base model architecture over the relational data. Upon receiving a task, NeurIDA formulates the task and data profile to dynamically select and configure relevant components from the pool of base models and shared model components for prediction. For a friendly user experience, NeurIDA supports natural language queries; it interprets user intent to construct structured task profiles and generates analytical reports with dedicated LLM agents. By design, NeurIDA enables easy-to-use yet effective and efficient in-database AI analytics. An extensive experimental study shows that NeurIDA consistently delivers up to a 12% improvement in AUC-ROC and a 25% relative reduction in MAE across ten tasks on five real-world datasets. The source code is available at https://github.com/Zrealshadow/NeurIDA
💡 Research Summary
NeurIDA presents an autonomous, end-to-end system designed to bridge the fundamental gap between static machine learning models and the dynamic, query-rich environment of Relational Database Management Systems (RDBMS). The core challenge it addresses is the inherent inflexibility of conventional ML models, which are trained for specific tasks on static data snapshots, making them ill-suited for the diverse and evolving analytical queries native to databases. This mismatch forces analysts to build custom ML pipelines from scratch for each new task, incurring significant overhead and hindering the widespread adoption of ML in database analytics.
The system operates through a streamlined workflow orchestrated by four key components. First, the Query Intent Analyzer serves as the unified user interface. It accepts a Natural Language Query (NLQ) and, leveraging LLM-based agents guided by the database catalog, automatically parses the user’s intent. This process generates two structured outputs: a Task Profile (specifying the prediction target and the task type, e.g., classification or regression) and a Data Profile (identifying the target table, related tables, join conditions, and filter predicates required for the task).
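To make the two structured outputs concrete, here is a minimal sketch of what such profiles could look like as plain data structures. The field names (`target`, `task_type`, `join_conditions`, etc.) and the ICU example values are illustrative assumptions, not NeurIDA’s actual API.

```python
from dataclasses import dataclass, field

# Hypothetical shapes for the Query Intent Analyzer's two outputs;
# field names are illustrative, not taken from the NeurIDA codebase.
@dataclass
class TaskProfile:
    target: str      # prediction target column
    task_type: str   # "classification" or "regression"

@dataclass
class DataProfile:
    target_table: str
    related_tables: list = field(default_factory=list)
    join_conditions: list = field(default_factory=list)
    filter_predicates: list = field(default_factory=list)

# Toy example: "Will this patient be admitted to the ICU?"
task = TaskProfile(target="icu_admission", task_type="classification")
data = DataProfile(
    target_table="patients",
    related_tables=["admissions", "labs"],
    join_conditions=["patients.id = admissions.patient_id"],
    filter_predicates=["admissions.year >= 2020"],
)
print(task.task_type, data.target_table)
```

Downstream components can then consume these profiles without ever re-parsing the natural-language query.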
Second, the Conditional Model Dispatcher optimizes efficiency. Given the profiles, it selects the most suitable pre-trained base model from a pool using Zero-Cost Proxy techniques from Neural Architecture Search. Crucially, it then decides whether to deploy this base model directly or to invoke advanced augmentation. This decision is made by comparing the model’s estimated performance on the current task against its historical performance (Exponential Moving Average). Augmentation is triggered only when a performance gap is detected, ensuring computational resources are used judiciously.
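The dispatcher’s gating rule can be sketched as a simple comparison against the model’s running EMA score. This is a hedged illustration only: the margin, the EMA smoothing factor, and the function names are assumptions, and the paper’s actual zero-cost estimate and trigger condition may differ.

```python
def should_augment(estimated_score: float, ema_score: float,
                   margin: float = 0.02) -> bool:
    """Trigger dynamic augmentation only when the zero-cost estimate for
    the current task falls below the model's historical (EMA) performance
    by more than a margin. The margin value here is illustrative."""
    return estimated_score < ema_score - margin

def update_ema(ema_score: float, new_score: float,
               alpha: float = 0.1) -> float:
    # Standard exponential moving average update of historical performance.
    return alpha * new_score + (1 - alpha) * ema_score

# A clear performance gap triggers augmentation; parity does not.
print(should_augment(0.70, 0.80))  # gap of 0.10 > margin
print(should_augment(0.80, 0.80))  # no gap
```

The appeal of this design is that the expensive dynamic-modeling path is taken only when the cheap estimate suggests the base model underperforms its own track record.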
Third, the heart of the system is the Dynamic In-database Modeling Engine (DIME). It retrieves the focused data slice from the database as defined by the Data Profile. If augmentation is invoked, DIME executes a novel dynamic modeling paradigm. Instead of using a fixed model, DIME dynamically constructs a bespoke model at query time based on a Composable Base Model Architecture. This architecture comprises a pool of base models and shared model components (e.g., embedding layers, relation modules). The dynamic construction is a three-stage process conditioned on the task: (1) Base Table Embedding generates tuple-level embeddings. (2) Dynamic Relation Modeling builds a task-specific relational graph to enrich embeddings with inter-table structural information. (3) Dynamic Model Fusion integrates these embeddings into a unified representation for the final prediction. This allows the system to adaptively tailor the model’s structure and computations to the specific semantics of each analytical query.
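The three-stage construction can be outlined as a pipeline of composable steps. The function bodies below are deliberately toy stand-ins (row normalization, neighbor averaging, mean pooling) chosen only to show the data flow; NeurIDA’s actual embedding, relation, and fusion components are learned neural modules.

```python
def base_table_embedding(rows):
    # Stage 1 (toy): produce a tuple-level embedding per row
    # by normalizing each numeric row.
    return [[v / (sum(r) or 1) for v in r] for r in rows]

def dynamic_relation_modeling(embeddings, edges):
    # Stage 2 (toy): enrich each embedding with structural information
    # by averaging it with its neighbors in a task-specific graph.
    out = []
    for i, e in enumerate(embeddings):
        nbrs = [embeddings[j] for a, j in edges if a == i]
        if not nbrs:
            out.append(e)
            continue
        agg = [sum(vals) / len(nbrs) for vals in zip(*nbrs)]
        out.append([(a + b) / 2 for a, b in zip(e, agg)])
    return out

def dynamic_model_fusion(embeddings):
    # Stage 3 (toy): fuse enriched embeddings into one unified
    # representation for the final prediction head.
    return [sum(vals) / len(embeddings) for vals in zip(*embeddings)]

rows = [[1.0, 3.0], [2.0, 2.0], [4.0, 0.0]]
edges = [(0, 1), (1, 0)]  # illustrative task-specific relational graph
fused = dynamic_model_fusion(
    dynamic_relation_modeling(base_table_embedding(rows), edges))
print(fused)
```

The key point the sketch captures is that the graph (`edges`) and the composition of stages are chosen at query time from the Data Profile, so different analytical queries yield different computation paths over the same shared components.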
Finally, the Analytical Report Synthesizer interprets the prediction results. Using dedicated LLM agents, it synthesizes the numerical outputs, key drivers, and contextual insights into a comprehensive, human-readable analytical report delivered back to the user.
The paper validates NeurIDA through extensive experiments on five real-world relational datasets across ten diverse analytical tasks (e.g., predicting ICU admission, customer churn). The results demonstrate that NeurIDA’s dynamic modeling approach consistently enhances performance over using the selected base models alone, achieving up to a 12% improvement in AUC-ROC for classification and a 10-25% relative reduction in Mean Absolute Error (MAE) for regression tasks. By introducing dynamic in-database modeling, NeurIDA effectively reconciles the rigidity of ML with the dynamism of RDBMS, enabling efficient, accurate, and user-friendly AI analytics directly within the database system without manual pipeline construction.