Grables: Tabular Learning Beyond Independent Rows

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Tabular learning is still dominated by row-wise predictors that score each row independently, which fits i.i.d. benchmarks but fails on transactional, temporal, and relational tables where labels depend on other rows. We show that row-wise prediction rules out natural targets driven by global counts, overlaps, and relational patterns. To make “using structure” precise across architectures, we introduce grables: a modular interface that separates how a table is lifted to a graph (constructor) from how predictions are computed on that graph (node predictor), pinpointing where expressive power comes from. Experiments on synthetic tasks, transaction data, and a RelBench clinical-trials dataset confirm the predicted separations: message passing captures inter-row dependencies that row-local models miss, and hybrid approaches that explicitly extract inter-row structure and feed it to strong tabular learners yield consistent gains.


💡 Research Summary

The paper addresses a fundamental limitation of modern tabular machine‑learning methods: most state‑of‑the‑art models, such as gradient‑boosted decision trees, random forests, and even recent deep tabular networks, treat each row of a table as an independent sample. This “row‑local” assumption aligns well with i.i.d. benchmark datasets but fails on many real‑world tables where the target for a row depends on other rows—e.g., transactional logs, temporal records, or relational clinical‑trial datasets where global counts, overlaps, or shared values matter.

To make the notion of “using structure” precise, the authors introduce grables, a modular abstraction that separates (a) graph construction (the constructor γ) from (b) prediction on the graph (the node predictor npred). A constructor maps a schema C and a table T to a graph G = (V, E₁…Eₘ, ρ) that always contains a distinguished node vᵣ for each row r∈T. The constructor may also add auxiliary nodes (value nodes, column identifiers, join keys) and typed edges that encode relationships among rows. The node predictor then computes a label for each node; the paper focuses on k‑layer Message‑Passing Neural Networks (MPNNs) as the predictor class.
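The two-part interface described above can be sketched in a few lines of Python; the class and alias names here (`Graph`, `Constructor`, `NodePredictor`) are our own illustration of the abstraction, not code from the paper, which defines grables mathematically.

```python
# Illustrative sketch of the grables interface: a constructor lifts a
# (schema, table) pair to a graph with one distinguished node per row;
# a node predictor then scores nodes of that graph.
from dataclasses import dataclass
from typing import Callable, Hashable

@dataclass
class Graph:
    nodes: set       # includes a distinguished ("row", r) node per table row
    edges: dict      # edge-type name -> set of (u, v) pairs (typed edges)
    features: dict   # node -> feature vector (the labelling rho)

# A constructor gamma maps (schema, table) to a Graph.
Constructor = Callable[[list[str], list[dict]], Graph]

# A node predictor maps (graph, node) to a label; the paper
# instantiates this class with k-layer MPNNs.
NodePredictor = Callable[[Graph, Hashable], float]
```

Keeping the constructor and predictor as separate components is what lets the paper attribute expressive power to one or the other.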

The authors formalize Grabular expressibility: a row‑level predictor rp is (Γ, P)‑expressible if there exists a constructor γ∈Γ and a node predictor npred∈P such that for every table and every row r, rp(C,T)(r) = npred(Gγ(C,T))(vᵣ). This definition lets one compare different families of constructors (Γ) and predictor classes (P) by set inclusion of the resulting expressible row functions.

Two concrete constructors are examined.

  1. Trivial constructor (γ_triv) creates an edgeless graph where each row node stores its own features. Because there are no edges, any MPNN reduces to a per‑row feed‑forward network; the prediction depends only on the row’s own columns. The authors prove (Proposition 3.1) that on γ_triv, k‑layer MPNNs are exactly row‑local, matching the behavior of XGBoost, LightGBM, or MLPs.
  2. Incidence constructor (γ_inc) builds a bipartite graph of row nodes and value nodes, connecting a row node to a value node whenever the row contains that column‑value pair. This construction makes “shared value” relationships explicit, allowing information to flow between rows through common value nodes.
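Both constructors are simple enough to sketch directly. The following is a minimal illustration, assuming the table is given as a list of row dicts; the function names mirror the paper's γ_triv and γ_inc, but the graph encoding (plain sets of nodes and edges) is our own.

```python
# Sketches of the two constructors on a table given as a list of dicts.
def gamma_triv(table):
    """Edgeless graph: one node per row, no edges (row-local view)."""
    return {"nodes": {("row", i) for i in range(len(table))},
            "edges": set()}

def gamma_inc(table):
    """Bipartite incidence graph: row nodes and (column, value) nodes,
    with an edge whenever a row contains that column-value pair."""
    nodes, edges = set(), set()
    for i, row in enumerate(table):
        nodes.add(("row", i))
        for col, val in row.items():
            value_node = ("value", col, val)
            nodes.add(value_node)
            edges.add((("row", i), value_node))
    return {"nodes": nodes, "edges": edges}
```

Two rows that agree on a column meet at the same `("value", col, val)` node, which is exactly the path along which inter-row information can flow.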

Four synthetic tasks illustrate the expressive gap:

  • UNIQUE – a value appears only in the target row.
  • COUNT(N) – at least N other rows share a designated value with the target row.
  • DOUBLE – another row shares a value with the target row and also a second value with a third row.
  • DIAMOND – two distinct shared values create a diamond‑shaped subgraph linking two rows via two intermediate rows.

On the incidence graph, a 1‑layer MPNN can implement all four tasks because they correspond to simple graded‑modal logic (GML) formulas of depth ≤1. The authors connect this to existing theory (Barceló et al., 2020) showing that k‑layer MPNNs are exactly the class of node predicates definable in GML(≤k). Thus, without edges (γ_triv) the GML formulas collapse to Boolean combinations of unary predicates on the row itself, while with γ_inc the graded modalities capture counts of shared value nodes, enabling true relational reasoning.
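To make the COUNT(N) task concrete, here is a toy sketch (our own, not the paper's construction) of how shared-value counts flow through the incidence graph: value nodes aggregate their incident rows, and each row node then thresholds the count at its designated value node.

```python
# Toy aggregation over the incidence graph for the COUNT(N) task:
# a row is positive iff at least N other rows share its value in the
# designated column. The two steps mimic message passing through the
# bipartite row/value structure.
from collections import defaultdict

def count_n_labels(table, column, n):
    # Step 1: each (column, value) node counts its incident row nodes.
    value_degree = defaultdict(int)
    for row in table:
        value_degree[row[column]] += 1
    # Step 2: each row node reads the count back from its value node,
    # subtracts itself, and applies the threshold.
    return [value_degree[row[column]] - 1 >= n for row in table]
```

On the edgeless γ_triv graph no such aggregation step exists, which is exactly why row-local models cannot express this target.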

Empirical validation proceeds on three fronts.
Synthetic data: For each of the four tasks, models using γ_triv+MLP achieve near‑random accuracy, whereas γ_inc+1‑layer MPNN reaches >95% accuracy. A hybrid pipeline called Neighbourhood Feature Aggregation (NFA) first aggregates 1‑hop neighbor features on the incidence graph, then feeds the resulting enriched row vectors to a strong tabular learner (e.g., LightGBM). NFA matches the MPNN performance while retaining the efficiency of traditional tabular models.
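The NFA idea can be sketched as a plain feature-engineering step. This is a minimal illustration under our own assumptions: rows are dicts, neighbors are rows sharing a value in a chosen key column, and the specific aggregates (neighbor count, mean of one numeric column) stand in for whatever statistics the pipeline extracts before handing off to a tabular learner.

```python
# Minimal Neighbourhood Feature Aggregation (NFA) sketch: enrich each
# row with aggregates over its 1-hop shared-value neighbourhood on the
# incidence graph, then train any tabular learner on the result.
from collections import defaultdict

def nfa_features(table, key_col, num_col):
    groups = defaultdict(list)              # value -> indices of rows sharing it
    for i, row in enumerate(table):
        groups[row[key_col]].append(i)
    enriched = []
    for i, row in enumerate(table):
        neigh = [j for j in groups[row[key_col]] if j != i]
        mean = (sum(table[j][num_col] for j in neigh) / len(neigh)
                if neigh else 0.0)
        enriched.append({**row,
                         "n_shared": len(neigh),   # COUNT-style signal
                         "neigh_mean": mean})      # smoothed neighbour feature
    return enriched  # feed to LightGBM/XGBoost/an MLP as usual
```

Because the relational signal is materialized as ordinary columns, the downstream learner stays a standard row-wise model, which is what preserves the efficiency the summary mentions.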

Transactional data: Predicting whether a customer will purchase a product depends on how many other customers have bought the same item. γ_inc+MPNN attains an AUC of 0.84, while a row‑local XGBoost model stalls at 0.71.

RelBench clinical‑trial dataset: Targets involve overlapping patient cohorts and shared biomarkers. Row‑local baselines miss many signals, achieving an F1 of ~0.62. γ_inc+2‑layer MPNN improves to 0.78, and the NFA+LightGBM hybrid reaches 0.81, outperforming even transformer‑style tabular foundation models (e.g., TabPFN) that rely on implicit attention graphs.

The study draws several key conclusions. First, the choice of graph constructor is the decisive factor: the same MPNN architecture can be either row‑local or relationally expressive depending on whether the underlying graph encodes inter‑row edges. Second, “using structure” is not merely a matter of adding capacity; it is about providing the model with the right relational view of the data. Third, graph‑based and tabular approaches are complementary: extracting relational features via a constructor and then applying a powerful tabular learner yields consistent gains without sacrificing interpretability or scalability.

Future work suggested includes automated learning of constructors (meta‑learning the best graph view), extending the framework to multi‑table relational schemas, handling dynamic temporal tables, and developing richer logical characterizations of what can be expressed with deeper message‑passing or attention‑based predictors.

In summary, the paper introduces a unifying formalism—grables—that clarifies the expressive limits of row‑local tabular models and demonstrates, both theoretically and empirically, that explicit inter‑row graph structures enable genuine relational reasoning, leading to measurable performance improvements on tasks where targets depend on global counts, overlaps, or shared witnesses. This work paves the way for principled hybrid tabular‑graph learning pipelines in real‑world applications.

