EXACT: Explicit Attribute-Guided Decoding-Time Personalization

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Achieving personalized alignment requires adapting large language models to each user’s evolving context. While decoding-time personalization offers a scalable alternative to training-time methods, existing approaches largely rely on implicit, less interpretable preference representations and impose a rigid, context-agnostic user representation, failing to account for how preferences shift across prompts. We introduce EXACT, a new decoding-time personalization framework that aligns generation with limited pairwise preference feedback using a predefined set of interpretable attributes. EXACT first identifies user-specific attribute subsets by maximizing the likelihood of preferred responses in an offline stage. Then, during online inference, EXACT retrieves the most semantically relevant attributes for an incoming prompt and injects them into the context to steer generation. We establish theoretical approximation guarantees for the proposed algorithm under mild assumptions, and show provably that our similarity-based retrieval mechanism mitigates contextual preference shifts, adapting to disparate tasks without pooling conflicting preferences. Extensive experiments on human-annotated preference datasets demonstrate that EXACT consistently outperforms strong baselines in both preference modeling accuracy and personalized generation quality.


💡 Research Summary

The paper introduces EXACT (Explicit Attribute‑Guided Decoding‑Time Personalization), a novel framework for personalizing large language models (LLMs) at inference time without updating model parameters. Existing decoding‑time personalization methods fall into two categories: prompt‑based approaches that embed user preferences implicitly, and logit‑steering methods that adjust token probabilities but assume a single static user profile. Both overlook the fact that individual users often exhibit “contextual preference shifts,” i.e., their preferred style or tone changes dramatically across different prompts.

EXACT addresses this by defining a fixed library of K interpretable attributes (e.g., Formal, Concise, Direct, Empathetic, Analytic, Code, Principled, Utilitarian) grouped into four coarse categories (style, tone, expertise, values). For each (prompt, preferred response, dispreferred response) triple, the method seeks a subset of attributes A that maximizes the likelihood of the observed preference under the Bradley‑Terry model, following the Direct Preference Optimization (DPO) formulation. Concretely, it appends an “Attributes: …” block to the original prompt, forming an attribute‑augmented prompt x_A, and evaluates the log‑likelihood difference log π(y_w|x_A) − log π(y_l|x_A).
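The scoring above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `log_prob(x, y)` interface and the exact "Attributes: …" formatting are assumptions for the example.

```python
import math

def augment_prompt(prompt, attributes):
    # Append an "Attributes: ..." block to the prompt (hypothetical format).
    return f"{prompt}\nAttributes: {', '.join(attributes)}"

def preference_margin(log_prob, prompt, attributes, y_w, y_l):
    # DPO-style margin: log pi(y_w | x_A) - log pi(y_l | x_A),
    # where x_A is the attribute-augmented prompt.
    x_a = augment_prompt(prompt, attributes)
    return log_prob(x_a, y_w) - log_prob(x_a, y_l)

def bt_likelihood(margin):
    # Bradley-Terry probability that y_w is preferred over y_l
    # given the log-likelihood margin (a sigmoid of the margin).
    return 1.0 / (1.0 + math.exp(-margin))
```

In practice `log_prob` would sum token log-probabilities of the response under the base LLM; attribute subsets with a larger margin explain the observed preference better.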

Because exhaustively searching all 2^K subsets is infeasible, the authors propose a greedy k‑budget algorithm: iteratively add the attribute that yields the largest increase in the objective, stopping after k attributes are selected. This reduces the evaluation cost to O(K·k) and, under a mild submodularity assumption, enjoys a (1 − 1/e) approximation guarantee. The resulting attribute set is stored as a per‑prompt entry in an index.

At inference time, given a new user prompt x, EXACT encodes all historical prompts into normalized embeddings using a separate LLM encoder, computes cosine similarity, and retrieves the most semantically similar past prompt x_i*. The attribute subset A_i* associated with x_i* is then injected into x, producing the final prompt x_{A_i*}. This retrieval‑based adaptation allows the system to dynamically select attributes that match the current context, thereby mitigating contextual preference shifts without maintaining multiple static user profiles. The authors also provide a theoretical analysis showing that the retrieved attribute set maximizes the conditional probability of the preferred response, effectively aligning with the Bradley‑Terry preference model.
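The retrieval step reduces to nearest-neighbor search over normalized prompt embeddings. A minimal sketch, assuming embeddings are plain lists of floats and the index stores (embedding, attribute subset) pairs built in the offline stage:

```python
import math

def normalize(v):
    # L2-normalize an embedding so the dot product equals cosine similarity.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def retrieve_attributes(query_emb, index):
    # index: list of (embedding, attribute_subset) pairs from the offline stage.
    # Returns the attribute subset of the most similar historical prompt.
    q = normalize(query_emb)
    best_score, best_attrs = -1.0, None
    for emb, attrs in index:
        e = normalize(emb)
        score = sum(a * b for a, b in zip(q, e))  # cosine similarity
        if score > best_score:
            best_score, best_attrs = score, attrs
    return best_attrs
```

A real deployment would encode prompts with an LLM encoder and use an approximate nearest-neighbor index, but the selection logic is the same.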

Empirical evaluation is conducted on several human‑annotated preference datasets, including PRISM and Summarize‑from‑Human‑Feedback. Metrics cover attribute selection accuracy, preference modeling accuracy, and human‑rated generation quality. EXACT consistently outperforms strong baselines such as prompt‑based methods, P‑AD, and Drift, achieving 5–12 % absolute gains, especially in scenarios where prompts vary widely in topic or style. Importantly, the approach requires only a single base LLM at inference, incurring negligible additional computational overhead compared to methods that need auxiliary models or per‑user fine‑tuning.

In summary, EXACT offers a theoretically grounded, interpretable, and computationally efficient solution for decoding‑time personalization. By learning explicit attribute subsets offline and retrieving context‑relevant attributes online, it captures the dynamic nature of user preferences, provides clear explanations of what drives a personalized output, and sets a new benchmark for LLM personalization without the cost of model re‑training.

