
📝 Original Info

  • ArXiv ID: 2512.22398

📝 Abstract

Foundation models for knowledge graphs (KGs) achieve strong cohort-level performance in link prediction, yet fail to capture individual user preferences, revealing a key disconnect between general relational reasoning and personalized ranking. We propose GatedBias, a lightweight inference-time personalization framework that adapts frozen KG embeddings to individual user contexts without retraining or compromising global accuracy. Our approach introduces structure-gated adaptation: profile-specific features combine with graph-derived binary gates to produce interpretable, per-entity biases, requiring only ∼300 trainable parameters. We evaluate GatedBias on two benchmark datasets (Amazon-Book and Last-FM), demonstrating statistically significant improvements in alignment metrics while preserving cohort performance. Counterfactual perturbation experiments validate causal responsiveness: entities benefiting from specific preference signals show 6-30× greater rank improvements when those signals are boosted. These results show that personalized adaptation of foundation models can be both parameter-efficient and causally verifiable, bridging general knowledge representations with individual user needs.

📄 Full Content

Foundation models for knowledge graphs (KGs) and relational data achieve remarkable cohort-level accuracy in link prediction. Yet, practical deployments increasingly demand profile-conditioned behavior: the same candidate entity should rank differently for different users, contexts, or patients. Existing approaches typically fine-tune the backbone model or attach trainable adapters to handle each profile, but such strategies introduce substantial overhead (such as new gradient updates, hyperparameter tuning, and retraining cost) while risking degradation of the original cohort performance.

We propose a lightweight, inference-time personalization framework for frozen KG embeddings that adds interpretable, structure-aware biases instead of modifying the backbone. The key idea is gate personalization through graph structure: entity attributes extracted from the training KG serve as binary gates, while profile features act as conditioning signals over these gates. This separation yields the following contributions:

  1. We introduce a post-hoc personalization mechanism that operates entirely at inference time on frozen KG embeddings, requiring no backbone updates.
  2. We propose structure-gated adaptation, an interpretable way to condition candidate rankings on profile-specific features via graph-derived gates.
  3. We introduce two evaluation measures, Alignment@k and Counterfactual Responsiveness, to quantify alignment and causal responsiveness of personalized predictions.

Our findings suggest that simple, structure-aware bias adaptation can serve as a general plug-in for personalized ranking in any pretrained KG or relational foundation model, bridging the gap between cohort-level embeddings and individualized predictions.

Joint KG-User Learning. RippleNet (Wang et al. 2018), KPRN (Wang et al. 2019b), and KGAT (Wang et al. 2019a) jointly embed users and entities, propagating preferences over multi-hop neighborhoods. They are effective but deployment-heavy: they retrain per population and alter the embedding space. Our method is post-hoc and keeps the backbone frozen.

Parameter-Efficient KGE Adaptation. Adapter/LoRA-style methods (e.g., IncLoRA, FastKGE (Liu et al. 2024)) insert trainable modules to specialize embeddings, yet still require backprop through the backbone and tuning. We train small, independent bias heads without touching backbone gradients.

Calibration & Post-hoc Adjustment. Platt scaling and isotonic regression (Tabacof and Costabello 2019; Safavi, Koutra, and Meij 2020; Nascimento et al. 2024) recalibrate scores after training, but apply a single global adjustment rather than conditioning on a profile.

Our goal is profile-conditioned re-ranking: for a given profile p, the final score is

s_p(h, r, t) = s_θ(h, r, t) + b_p(t),

where b_p(t) is a tail-specific bias that depends on both the structure of t in the training graph and the profile's features. Because adding a constant bias to all tails does not change their rank, b_p(·) must vary with t to meaningfully alter rankings. Our method makes this dependence explicit and interpretable through structure-gated personalization.
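The rank-invariance point can be checked directly: shifting every candidate's score by the same constant leaves the ordering untouched, while a tail-specific bias can reorder candidates. A minimal sketch with made-up scores:

```python
# Hypothetical frozen-backbone scores s_theta(h, r, ?) for four candidate tails.
base_scores = {"t1": 2.0, "t2": 1.5, "t3": 1.2, "t4": 0.4}

def ranking(scores):
    """Return tails ordered best-first (higher score = better)."""
    return sorted(scores, key=scores.get, reverse=True)

# A constant bias shifts every score equally, so the ranking is unchanged.
constant = {t: s + 0.7 for t, s in base_scores.items()}
assert ranking(constant) == ranking(base_scores)

# A tail-specific bias b_p(t) can reorder candidates.
b_p = {"t1": 0.0, "t2": 0.0, "t3": 1.0, "t4": 0.0}
personalized = {t: s + b_p[t] for t, s in base_scores.items()}
print(ranking(base_scores))   # ['t1', 't2', 't3', 't4']
print(ranking(personalized))  # ['t3', 't1', 't2', 't4']
```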

We partition relations into K semantic groups {R_k}_{k=1}^K (typically K = 2 for interpretability). For each group k, we define an attribute universe U_k: the set of attributes connected to entities through relations in R_k.

Each candidate tail t is associated with a binary gate vector g_k(t) ∈ {0,1}^{|U_k|}, indicating which attributes in U_k connect to t. Each profile p provides corresponding feature vectors f_k ∈ R^{|U_k|} that quantify preferences or relevance weights over the same attribute universes. The structure-gated bias is then:

b_p(t) = Σ_{k=1}^{K} α_k ⟨w_k, g_k(t) ⊙ f_k⟩,

where ⊙ denotes elementwise product, w_k are small learnable weights, and α_k are scalar gates. This ensures that personalization affects only attributes the entity actually has, providing both interpretability and rank variance.
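A minimal sketch of the structure-gated bias; the universe sizes, weights, gates, and features below are illustrative toy values, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: two relation groups (K = 2) with attribute
# universes of size 8 and 5.
universe_sizes = [8, 5]

# Per-profile trainable parameters: small weight vectors w_k and scalar gates alpha_k.
w = [rng.normal(scale=0.1, size=n) for n in universe_sizes]
alpha = [1.0, 0.5]

def gated_bias(gates, feats):
    """b_p(t) = sum_k alpha_k * <w_k, g_k(t) ⊙ f_k>.

    gates[k] is the binary gate vector g_k(t) for tail t (from the training KG);
    feats[k] is the profile feature vector f_k over the same attribute universe.
    """
    return sum(a * np.dot(wk, g * f)
               for a, wk, g, f in zip(alpha, w, gates, feats))

# Example: a tail that has attributes {0, 3} in group A and {1} in group B.
g = [np.zeros(8), np.zeros(5)]
g[0][[0, 3]] = 1.0
g[1][1] = 1.0
f = [rng.uniform(size=8), rng.uniform(size=5)]  # profile preference weights

# Attributes the entity lacks (gate = 0) contribute nothing, by construction.
print(gated_bias(g, f))
```

The masking by g_k(t) is what makes the bias interpretable: each nonzero term can be read off as "this profile weight, on this attribute the entity actually has."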

The profile feature vectors f_k encode how strongly a user values each attribute in U_k. For interaction datasets (Amazon-Book, Last-FM), we construct f_k via a three-stage process:

  1. Individual preference extraction: For each user u, compute attribute frequencies within their interaction history I_u.

  2. Population aggregation: Aggregate the per-user frequencies across users into attribute weights w(a_j).

  3. Scaling and capping: f_k[j] = clip(α · w(a_j), 0, τ), with scaling factor α and cap τ.

This yields dense, semantically meaningful profiles aligned to each relation group.
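The three stages above can be sketched as follows; the item attributes, interaction logs, and the mean aggregation in stage 2 are illustrative assumptions (the paper's exact aggregation is not reproduced in this summary):

```python
from collections import Counter

# Toy data: item -> set of group-A attributes (from the training KG),
# and per-user interaction logs.
item_attrs = {"b1": {"fantasy"}, "b2": {"fantasy", "series"}, "b3": {"history"}}
histories = {"u1": ["b1", "b2"], "u2": ["b2", "b3"]}

# Stage 1: per-user attribute frequencies over the interaction history I_u.
def user_freqs(items):
    counts = Counter(a for i in items for a in item_attrs[i])
    return {a: c / len(items) for a, c in counts.items()}

# Stage 2: aggregate across users (a simple mean is assumed here).
agg = Counter()
for u, items in histories.items():
    for a, v in user_freqs(items).items():
        agg[a] += v / len(histories)

# Stage 3: scale by alpha and cap at tau: f[a] = clip(alpha * w(a), 0, tau).
alpha, tau = 2.0, 1.0
f = {a: min(max(alpha * v, 0.0), tau) for a, v in agg.items()}
print(f)
```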

We optimize only {w_k, α_k} per profile while keeping θ frozen. For each batch B of positive and negative triples, we minimize a ranking loss with two regularizers,

where λ_1 and λ_2 control sparsity and scale stability. Only a few hundred parameters are trained per profile (≪ 0.1% of the backbone), making personalization ad hoc, lightweight, and reproducible. See Appendix for implementation details, hyperparameters, and training configuration.
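A sketch of the per-profile optimization, assuming a logistic pairwise ranking loss with an L1 sparsity penalty (λ_1) and an L2 scale penalty (λ_2); the paper's exact loss is in its appendix, and the scalar gate α is held fixed here for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6                      # one relation group with |U_k| = 6, for brevity
w = np.zeros(n)            # trainable per-profile weights
alpha = 1.0                # scalar gate (held fixed in this sketch)
lam1, lam2, lr = 1e-3, 1e-3, 0.1

f = rng.uniform(size=n)    # profile feature vector (toy values)
# Toy batch: (frozen positive score, positive gates, frozen negative score, negative gates).
batch = [(0.2, (rng.uniform(size=n) > 0.5).astype(float),
          0.3, (rng.uniform(size=n) > 0.5).astype(float)) for _ in range(32)]

def bias(g):
    return alpha * np.dot(w, g * f)

def loss():
    # -log sigma(margin) per pair, plus sparsity and scale penalties.
    data = np.mean([np.log1p(np.exp(-((sp + bias(gp)) - (sn + bias(gn)))))
                    for sp, gp, sn, gn in batch])
    return data + lam1 * np.abs(w).sum() + lam2 * np.dot(w, w)

loss_before = loss()
for _ in range(200):
    grad = np.zeros(n)
    for sp, gp, sn, gn in batch:
        margin = (sp + bias(gp)) - (sn + bias(gn))
        # d/dw of -log sigma(margin) = -sigma(-margin) * d(margin)/dw
        grad += -(1.0 / (1.0 + np.exp(margin))) * alpha * (gp - gn) * f
    grad = grad / len(batch) + lam1 * np.sign(w) + 2 * lam2 * w
    w -= lr * grad

print(loss_before, loss())
```

Note that the backbone scores enter only as fixed constants (sp, sn), so no gradient ever flows through θ.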

Profile and gate construction. For both datasets, we partition relations into two semantic groups and construct binary gate vectors g_A(t), g_B(t) from training triples only (no test leakage), indicating which group-specific attributes connect to each tail entity t. User preference profiles f_A, f_B are derived from interaction logs: we aggregate user-level preferences (e.g., tag/genre affinities, play-count patterns) into per-attribute scores, propagate these to attributes connected to preferred items in training, and obtain feature vectors by summing contributions over matched attributes. Both feature vectors are computed on CPU in one pass.

We report both standard ranking metrics and personalization metrics. Our objective is to ensure that standard ranking performance remains stable with the introduction of personalization, while the personalization metrics capture individual-level improvements.

Standard Ranking Metrics. We follow the standard filtered evaluation protocol used in link prediction. For each query (h, r, ?), candidate tails t already seen in training or validation are filtered out when ranking the true tail. We report Mean Reciprocal Rank (MRR) (Voorhees and Tice 2000), Hits@k (Bordes et al. 2013), and Normalized Discounted Cumulative Gain (NDCG@k) (Järvelin and Kekäläinen 2002) (definitions provided in Appendix). These metrics evaluate overall link prediction performance and ensure cohort-level quality is preserved after personalization.
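The filtered protocol and the three metrics can be sketched as follows (toy scores; with a single relevant tail per query, NDCG@k reduces to a discounted hit):

```python
import math

def filtered_rank(scores, true_tail, known_tails):
    """Rank of the true tail after filtering out competing tails already
    seen in training/validation (standard filtered protocol)."""
    s_true = scores[true_tail]
    competitors = [t for t in scores if t != true_tail and t not in known_tails]
    return 1 + sum(scores[t] > s_true for t in competitors)

def metrics(ranks, k=10):
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(r <= k for r in ranks) / len(ranks)
    # One relevant tail per query: NDCG@k = 1/log2(rank+1) if ranked in top k.
    ndcg = sum(1.0 / math.log2(r + 1) if r <= k else 0.0 for r in ranks) / len(ranks)
    return mrr, hits, ndcg

scores = {"t1": 0.9, "t2": 0.8, "t3": 0.7, "t4": 0.6}
# t2 was a known answer for this query, so it is filtered out when ranking t3.
print(filtered_rank(scores, "t3", known_tails={"t2"}))  # 2
```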

Personalization Metrics. To quantify profile-conditioned effects, we introduce and compute two complementary measures:

Alignment@k. Measures whether top-k predictions favor entities whose attributes match the profile’s preferences. For each entity t, we compute group contributions c_A(t) = α_A ⟨w_A, g_A(t) ⊙ f_A⟩ and c_B(t) analogously.

We define the aligned set as entities receiving strong, directional personalization:

Aligned = { t : max(c_A(t), c_B(t)) > 0 and m(t) ≥ τ },

where m(t) = |c_A(t) − c_B(t)| captures the strength of differential influence between the two feature groups, and threshold τ is the P-th percentile of margins among entities with at least one positive contribution (P ∈ {60, 70, 80}; higher P = stricter selection). Intuitively, aligned entities are those where (1) personalization provides a net positive push from at least one group, and (2) the two groups have strongly divergent effects, indicating clear directional preference.
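A sketch of the aligned-set construction under one plausible reading of the definition above (toy contribution values; P = 60; Alignment@k taken as the fraction of top-k predictions falling in the aligned set):

```python
import numpy as np

# Per-entity group contributions c_A(t), c_B(t) (toy values).
c_A = {"t1": 0.8, "t2": 0.1, "t3": -0.2, "t4": 0.05}
c_B = {"t1": 0.1, "t2": 0.1, "t3": 0.9, "t4": 0.0}

entities = list(c_A)
margins = {t: abs(c_A[t] - c_B[t]) for t in entities}

# Threshold tau: P-th percentile of margins among entities with at least
# one positive contribution.
pos = [t for t in entities if max(c_A[t], c_B[t]) > 0]
tau = np.percentile([margins[t] for t in pos], 60)

aligned = {t for t in pos if margins[t] >= tau}

def alignment_at_k(ranked_tails, k):
    """Fraction of the top-k predictions that fall in the aligned set."""
    return sum(t in aligned for t in ranked_tails[:k]) / k

print(sorted(aligned))
print(alignment_at_k(["t3", "t1", "t2", "t4"], 2))
```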

Counterfactual Responsiveness (CR). To validate that personalization responds causally to features, we test whether entities currently benefiting from a feature group improve more when that group is boosted. For each entity t, let c_A(t) = α_A ⟨w_A, g_A(t) ⊙ f_A⟩ denote the contribution of group A to its bias (analogously for c_B(t), with total bias b_p(t) = c_A(t) + c_B(t)). Define A+ = {t : c_A(t) > 0} as entities whose scores are currently increased by group A features.

For each test query (h, r, t*), we measure the rank change of the ground-truth tail t* and categorize the query based on whether t* ∈ A+. We apply a targeted perturbation by scaling group A features (f_A ← (1 + ϵ) f_A, e.g., ϵ = 0.1), recompute all ranks, and measure

CR = E[Δrank(t*) | t* ∈ A+] − E[Δrank(t*) | t* ∉ A+],

where ∆rank is the change in rank for the ground-truth tail. Since lower rank numbers are better (rank 1 is best), negative CR indicates correct responsiveness: test queries whose true answers are in A + show greater rank improvements (move to lower ranks) than others when we boost the features that already favor them.
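A sketch of the perturbation procedure with toy scores and gates; the CR computed here is the difference in mean rank change between A+ queries and the rest, which matches the interpretation above but is an assumption about the exact formula:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
alpha_A, w_A = 1.0, rng.uniform(0.5, 1.0, size=n)
f_A = rng.uniform(size=n)

# Toy candidates: frozen backbone scores and group-A gates per tail.
tails = ["t1", "t2", "t3", "t4"]
base = {"t1": 1.0, "t2": 0.9, "t3": 0.8, "t4": 0.7}
gates = {t: (rng.uniform(size=n) > 0.5).astype(float) for t in tails}

def c_A(t, f):
    return alpha_A * np.dot(w_A, gates[t] * f)

def rank_of(t_star, f):
    s = {t: base[t] + c_A(t, f) for t in tails}
    return 1 + sum(s[t] > s[t_star] for t in tails if t != t_star)

# Targeted perturbation: boost group-A features and record rank changes.
eps = 0.1
delta = {t: rank_of(t, (1 + eps) * f_A) - rank_of(t, f_A) for t in tails}

# Queries whose true tail currently benefits from group A.
A_plus = [t for t in tails if c_A(t, f_A) > 0]
others = [t for t in tails if t not in A_plus]

# Negative CR: A+ tails improve (move to lower ranks) more than the rest.
cr_plus = np.mean([delta[t] for t in A_plus]) if A_plus else 0.0
cr_other = np.mean([delta[t] for t in others]) if others else 0.0
print(cr_plus - cr_other)
```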

We compare our structure-gated method to the following baselines:

Frozen Backbone. The pretrained DistMult scorer with no adaptation: s(h, r, t) = s_θ(h, r, t). This establishes cohort-level performance without personalization.

PatientNode. A profile-agnostic ablation that learns fixed entity biases with a lightweight MLP, b(t) = MLP_ϕ(E_t) (same loss), ignoring f_A, f_B and g_A, g_B, thus applying the same boost to all users (similar to always ranking pizza higher because it is a popular dish, regardless of whether the user is a meat-lover or vegan).

| Model | Dataset | MRR | H@1 | H@3 | H@10 | NDCG@10 |
|---|---|---|---|---|---|---|
| DistMult (Base) | Amazon-Book | 0.649±0.002 | 0.622±0.001 | 0.661±0.002 | 0.675±0.002 | 0.650±0.001 |
| DistMult (Base) | Last-FM | 0.831±0.002 | 0.760±0.002 | 0.888±0.001 | 0.949±0.001 | 0.832±0.001 |
| + PatientNode | Amazon-Book | 0.653±0.001 | 0.624±0.002 | 0.662±0.002 | 0.690±0.001 | 0.656±0.001 |
| + PatientNode | Last-FM | 0.828±0.002 | 0.756±0.002 | 0.884±0.001 | 0.947±0.001 | 0.829±0.001 |
| + GatedBias | Amazon-Book | 0.649±0.001 | 0.622±0.002 | 0.661±0.001 | 0.675±0.002 | 0.650±0.001 |
| + GatedBias | Last-FM | 0.831±0.002 | 0.759±0.002 | 0.888±0.001 | 0.950±0.001 | 0.831±0.001 |

Parameter counts: DistMult (400K), +PatientNode (400K+800), +GatedBias (400K+292)

Cohort Performance. Table 2 evaluates whether adding personalization degrades the backbone’s link prediction quality. GatedBias preserves link-prediction performance on both datasets, while the PatientNode ablation shows inconsistent effects (+0.004 on Amazon-Book but −0.003 on Last-FM). Our approach requires only 292 additional parameters versus PatientNode’s 800, establishing parameter-efficient personalization with no cohort-performance trade-off.

Personalization Signal and Causal Validation. Table 3 shows that our mechanism successfully reorders candidates toward profile-aligned entities and validates the causal pathway. Amazon-Book demonstrates strong, validated personalization: alignment increases +14% relative (+0.9pp absolute, from 6.4% to 7.3%, p = 0.021). Counterfactual perturbations confirm causality: boosting preference features improves in-group entities by CR = −6.11 rank positions (lower is better; more negative indicates stronger improvement), with 28.5% of queries showing measurable improvement. On Amazon-Book, preference gates (GENRE, THEME, etc.) dominate metadata gates (CR = −6.11 vs. −0.97; 28.5% vs. 16.0% improved).

Last-FM shows weaker but significant effects: +1.3% relative alignment lift (+1.1pp, p = 0.0025) and minimal causal responsiveness. Using the same convention (lower is better), metadata gates outperform preference gates on Last-FM: CR = −0.73 with 4% improved versus preference CR = −0.20 with 3% improved, inverting the Amazon-Book pattern. The stark CR gap between datasets (about 6.11 vs. 0.20 in magnitude) highlights domain constraints. Last-FM’s ∼83% baseline alignment indicates the frozen backbone already captures user-item affinity well, leaving minimal headroom for personalization. Incremental gains (e.g., from 83% to 84%) face ceiling effects unless signals are extremely precise. Moreover, music choices are inherently noisy (mood, context, serendipity), so collaborative signals already capture much of the actionable variance; explicit preference features add little. By contrast, product catalogs are sparser and more structured, so semantic attributes (e.g., GENRE, INFLUENCED BY) provide clearer leverage for personalization.

Placebo Validation. Table 4 provides the critical validity check: if our alignment metric genuinely measures feature-grounded personalization, shuffling profile features while freezing the alignment mask should collapse effects to near zero. With randomly shuffled features, alignment gains drop dramatically. Real features produce 21.4× stronger effects than noise on Amazon-Book and 8.6× on Last-FM.

Our method achieves statistically significant personalization effects while maintaining cohort performance. Effect sizes are modest in absolute terms, but placebo validation confirms they represent genuine feature-driven signal rather than artifacts, with real features yielding order-of-magnitude stronger effects than random noise across both datasets. Results reveal clear domain dependencies that inform deployment strategies. Future work should explore better feature engineering and extensions to head/relation conditioning to boost effect sizes.

Distinctions. Frozen embeddings; no backbone gradients; per-profile (not global) score shifts.

We use the Amazon-Book (Ni, Li, and McAuley 2019) and Last-FM (Het 2011) datasets as processed by KGAT (Wang et al. 2019a); details are in Table 1.

Amazon-Book (content preference vs. metadata). Entities represent books and associated attributes. We define two relation groups: content preference (GENRE, THEME, SUBJECT, SERIES) and metadata (AUTHOR, PUBLISHER, YEAR, LANGUAGE).

Last-FM (musical preference vs. metadata). Entities correspond to artists, tracks, and related descriptors. We define two relation groups: musical preference (GENRE, TAG, STYLE, MOOD, TEMPO, SIMILAR, INFLUENCED BY, RELATED, SOUND) and metadata (LABEL, YEAR, COUNTRY, LANGUAGE, FORMAT, RELEASE DATE, DURATION, ALBUM TYPE).



This content is AI-processed based on open access ArXiv data.
