Weighted Sum-of-Trees Model for Clustered Data


Clustered data, which arise when observations are nested within groups, are common in clinical, educational, and social science research. Traditionally, a linear mixed model, which includes random effects to account for within-group correlation, would be used to model the observed data and make predictions on unseen data. Some work has been done to extend the mixed model approach beyond linear regression into more complex and non-parametric models, such as decision trees and random forests. However, existing methods are limited to using the global fixed effects for prediction on data from out-of-sample groups, effectively assuming that all clusters share a common outcome model. We propose a lightweight sum-of-trees model in which we learn a decision tree for each sample group. We combine the predictions from these trees using weights so that out-of-sample group predictions are more closely aligned with the most similar groups in the training data. This strategy also allows for inference on the similarity across groups in the outcome prediction model, as the unique tree structures and variable importances for each group can be directly compared. We show that our model outperforms traditional decision trees and random forests in a variety of simulation settings. Finally, we showcase our method on real-world data from the sarcoma cohort of The Cancer Genome Atlas, where patient samples are grouped by sarcoma subtype.


💡 Research Summary

The paper introduces a novel “Weighted Sum‑of‑Trees” (WSoT) framework designed to improve prediction for clustered (grouped) data while also providing insight into inter‑group similarity. Traditional approaches for such data, namely (generalized) linear mixed models (GLMMs) and recent tree‑based mixed‑effects extensions (e.g., MERT, mixedBART), rely on a linear additive decomposition of fixed and random effects. Consequently, they can only use the fixed‑effect component when making predictions for groups that were not present during training, which limits flexibility and often leads to underperformance on complex, nonlinear relationships.

The proposed method proceeds in two stages. In the first stage a multi‑class classifier is trained on the full training set using the covariates X and the known group labels C. For each test observation the classifier outputs a probability vector w = (w₁,…,w_J), where w_j is the estimated probability that the observation belongs to training group j. These probabilities are interpreted as similarity weights in the feature space. In the second stage a separate decision tree (or random forest) is fitted for each training group j, yielding group‑specific predictors T_j(·). The final prediction for a test point X_t is a weighted linear combination: ŷ_t = Σ_{j=1}^J w_j · T_j(X_t). Thus, an unseen group is represented as a mixture of the known groups, with the mixture coefficients derived from the first‑stage classifier.
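The two-stage procedure described above can be sketched in a few lines of scikit-learn. This is an illustrative reconstruction, not the authors' code: the simulated data, model hyperparameters, and the choice of a random forest as the stage-one classifier are assumptions for demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor

# Toy clustered data (illustrative, not the paper's data):
# X covariates, y outcomes, `groups` holds integer group labels 0..J-1.
rng = np.random.default_rng(0)
n, p, J = 300, 5, 3
X = rng.normal(size=(n, p))
groups = rng.integers(0, J, size=n)
y = np.sin(X[:, 0]) + 0.5 * groups + rng.normal(scale=0.1, size=n)

# Stage 1: multi-class classifier mapping covariates to
# group-membership probabilities, used as similarity weights w_j.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, groups)

# Stage 2: one regression tree T_j fitted per training group.
trees = {
    j: DecisionTreeRegressor(max_depth=4, random_state=0).fit(
        X[groups == j], y[groups == j]
    )
    for j in range(J)
}

def predict_wsot(X_new):
    """Weighted sum of group-specific trees: y_hat = sum_j w_j * T_j(x)."""
    W = clf.predict_proba(X_new)  # (m, J) similarity weights, rows sum to 1
    T = np.column_stack([trees[j].predict(X_new) for j in range(J)])  # (m, J)
    return (W * T).sum(axis=1)

y_hat = predict_wsot(X[:10])
```

Because the weights come from `predict_proba`, a test point from an unseen group is automatically expressed as a probability mixture over the training groups, which is exactly how the method extrapolates beyond the groups it was trained on.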

Key technical contributions include: (1) learning group‑specific non‑linear models that capture unique interactions and complex patterns within each cluster, bypassing the linear random‑effect assumption; (2) using classifier‑derived probabilities as data‑driven similarity weights, enabling out‑of‑sample group prediction without requiring explicit random‑effect estimates; (3) providing a natural metric of inter‑group similarity that can be examined through tree structures and variable‑importance profiles, facilitating substantive interpretation of how groups differ.
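Contribution (3), comparing groups through their variable-importance profiles, admits a simple sketch. The two-group setup and the cosine-similarity summary below are illustrative assumptions; the paper's own comparison may use different statistics.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Two hypothetical groups whose outcomes are driven by different covariates.
rng = np.random.default_rng(1)
n, p = 200, 4
X1, X2 = rng.normal(size=(n, p)), rng.normal(size=(n, p))
y1 = 2.0 * X1[:, 0] + rng.normal(scale=0.1, size=n)  # group 1: feature 0
y2 = 2.0 * X2[:, 2] + rng.normal(scale=0.1, size=n)  # group 2: feature 2

t1 = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X1, y1)
t2 = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X2, y2)

# Cosine similarity between importance profiles: near 1 for groups that
# rely on the same covariates, near 0 for groups with disjoint drivers.
v1, v2 = t1.feature_importances_, t2.feature_importances_
sim = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12))
```

Because each group gets its own tree, such comparisons are direct; a single pooled model would not expose per-group importance profiles at all.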

The authors evaluate the approach through extensive simulations and a real‑world application. Three simulation scenarios are considered: (i) nonlinear fixed effects with independent linear random effects, (ii) nonlinear fixed effects with correlated random effects (generated via an inverse‑Wishart covariance), and (iii) a setting where each group follows a different combination of basis functions (mimicking heterogeneous underlying mechanisms). Across varying numbers of groups (K), observations per group (n), and random‑effect variance levels (σ²_α), the WSoT method consistently achieves lower mean‑squared error (MSE) than standard decision trees, random forests, and GLMMs. The advantage is especially pronounced when group‑level correlation structures are strong, highlighting the benefit of the similarity‑based weighting.
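Scenario (iii), where each group follows its own combination of basis functions, can be mimicked with a short generator. The specific bases, coefficients, and sizes below are assumptions for illustration, not the paper's exact generating process.

```python
import numpy as np

# Illustrative data generator for heterogeneous group mechanisms:
# group k mixes two basis functions with its own coefficients.
rng = np.random.default_rng(2)
bases = [np.sin, np.cos, lambda x: x**2, np.tanh]
K, n_per = 4, 50  # number of groups, observations per group

X_parts, y_parts, g_parts = [], [], []
for k in range(K):
    xk = rng.uniform(-2, 2, size=(n_per, 1))
    f1, f2 = bases[k % 4], bases[(k + 1) % 4]
    yk = 1.5 * f1(xk[:, 0]) + 0.5 * f2(xk[:, 0]) + rng.normal(scale=0.1, size=n_per)
    X_parts.append(xk)
    y_parts.append(yk)
    g_parts.append(np.full(n_per, k))

X = np.vstack(X_parts)
y = np.concatenate(y_parts)
groups = np.concatenate(g_parts)
```

Under a design like this no shared fixed-effect surface exists, which is precisely the regime where per-group trees with similarity weighting should beat a single pooled model.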

For the real data analysis, the method is applied to the sarcoma cohort of The Cancer Genome Atlas (TCGA). Patients are grouped by sarcoma subtype, and the outcome of interest is the abundance of tumor‑infiltrating T‑cells. The authors train the model on a subset of subtypes and test on subtypes that were completely omitted from training. The weighted sum‑of‑trees approach outperforms a plain random forest and a GLMM in predicting T‑cell abundance for these unseen subtypes. Moreover, by inspecting the individual trees and their variable‑importance rankings, the authors uncover subtype‑specific predictive patterns, offering biological insight into how immune infiltration varies across sarcoma histologies.

Limitations are acknowledged. The quality of the similarity weights depends on the first‑stage classifier; mis‑classification can propagate errors into the final prediction. Additionally, fitting a separate tree for each group can become computationally intensive when the number of groups is very large. The authors suggest future work on Bayesian weight estimation and on sharing tree structures across groups to mitigate computational costs.

In summary, the Weighted Sum‑of‑Trees model provides a flexible, interpretable, and empirically superior alternative to existing mixed‑effects and tree‑based methods for clustered data. It simultaneously delivers accurate out‑of‑sample predictions for new groups and a principled way to assess and interpret inter‑group heterogeneity, making it a valuable tool for biomedical, educational, and social‑science applications where hierarchical data are the norm.

