VEXA: Evidence-Grounded and Persona-Adaptive Explanations for Scam Risk Sensemaking


Online scams across email, short message services, and social media increasingly challenge everyday risk assessment, particularly as generative AI enables more fluent and context-aware deception. Although transformer-based detectors achieve strong predictive performance, their explanations are often opaque to non-experts or misaligned with the model's actual decisions. We propose VEXA, an evidence-grounded and persona-adaptive framework for generating learner-facing scam explanations by integrating GradientSHAP-based attribution with theory-informed vulnerability personas. Evaluation across multi-channel datasets shows that grounding explanations in detector-derived evidence improves semantic reliability without increasing linguistic complexity, while persona conditioning introduces interpretable stylistic variation without disrupting evidential alignment. These results reveal a key design insight: evidential grounding governs semantic correctness, whereas persona-based adaptation operates at the level of presentation under constraints of faithfulness. Overall, VEXA demonstrates the feasibility of persona-adaptive, evidence-grounded explanations and provides design guidance for trustworthy, learner-facing security explanations in non-formal contexts.


💡 Research Summary

The paper introduces VEXA, a novel framework that generates learner‑facing explanations for scam detection by grounding natural‑language output in model‑derived evidence and adapting the style to psychologically informed vulnerability personas. VEXA’s pipeline consists of four stages: (1) a BERT‑based transformer classifier is trained to detect scam messages across email, SMS, and social‑media channels and then frozen to provide stable predictions; (2) GradientSHAP is applied in the embedding space to compute token‑level contribution scores, which are aggregated and filtered to produce a compact set of evidence tokens (e.g., urgency cues, reward language, shortened URLs); (3) a persona selection module maps high‑vulnerability and low‑vulnerability profiles—derived from Big Five personality traits and security‑psychology literature—onto stylistic instructions that affect tone, granularity, and framing but not the underlying evidence; (4) a large language model (LLM) receives a prompt containing the original message, the extracted evidence, and the persona‑specific instruction, and generates a natural‑language explanation that must reference the evidence, thereby ensuring faithfulness.
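Stages (2)–(4) of this pipeline can be sketched in a few lines. The sketch below is illustrative only: the function and variable names (`aggregate_evidence`, `PERSONA_STYLES`, `build_prompt`) and the persona wording are assumptions, not the paper's actual implementation, and the attribution scores would in practice come from GradientSHAP over the frozen detector's embeddings.

```python
# Hypothetical sketch of VEXA's evidence extraction and prompt assembly.
# Scores stand in for GradientSHAP token attributions (stage 2 of the paper).

def aggregate_evidence(tokens, scores, top_k=5):
    """Merge BERT-style subword pieces into words, sum their attribution
    scores, and keep the top_k highest-scoring words as evidence tokens."""
    words, word_scores = [], []
    for tok, s in zip(tokens, scores):
        if tok.startswith("##") and words:   # subword continuation
            words[-1] += tok[2:]
            word_scores[-1] += s
        else:
            words.append(tok)
            word_scores.append(s)
    ranked = sorted(zip(words, word_scores), key=lambda p: p[1], reverse=True)
    return [w for w, _ in ranked[:top_k]]

# Illustrative persona-to-style mapping (stage 3): style changes,
# the evidence set does not.
PERSONA_STYLES = {
    "high_vulnerability": "Use a calm, reassuring tone with everyday context.",
    "low_vulnerability": "Be concise and analytical; focus on the scam mechanism.",
}

def build_prompt(message, evidence, persona):
    """Assemble the LLM prompt (stage 4): message + evidence + persona
    instruction, requiring the explanation to reference the evidence."""
    return (
        f"Message: {message}\n"
        f"Evidence tokens (from the detector): {', '.join(evidence)}\n"
        f"Style: {PERSONA_STYLES[persona]}\n"
        "Explain why this message is likely a scam, citing each evidence token."
    )

tokens = ["urgent", "##ly", "verify", "your", "account", "via", "bit", "##.ly"]
scores = [0.40, 0.15, 0.30, 0.02, 0.10, 0.01, 0.25, 0.20]
evidence = aggregate_evidence(tokens, scores, top_k=3)
print(evidence)  # → ['urgently', 'bit.ly', 'verify']
```

The key property this structure enforces is the one the paper emphasizes: persona selection touches only the `Style:` line of the prompt, so stylistic adaptation cannot alter which evidence the explanation must cite.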

The authors assembled a multi‑channel corpus comprising publicly available email, SMS, and SNS datasets, balanced to contain equal numbers of scam and benign messages (35 k per channel). Several candidate detectors were evaluated, with DeBERTa‑v3‑base achieving the highest macro‑F1 (0.93875) and being selected for all downstream experiments. Four explanation conditions were compared on a stratified 10 % subset of scam messages per channel: (i) Pure LLM (no evidence), (ii) LLM + XAI (evidence‑grounded, neutral style), (iii) LLM + XAI + high‑vulnerability persona, and (iv) LLM + XAI + low‑vulnerability persona. Automatic metrics (ROUGE, BERTScore) and human judgments assessed semantic reliability, lexical diversity, sentiment tone, and evidential alignment. Results show that evidence‑grounded explanations significantly improve semantic reliability over pure LLM outputs without increasing linguistic complexity. Persona conditioning introduces clear stylistic variation: high‑vulnerability explanations are calmer and more contextual, while low‑vulnerability explanations are concise and analytically framed, yet neither degrades evidential alignment.
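To make one of the reported automatic metrics concrete, the sketch below computes ROUGE‑1 F1 (unigram overlap between a generated explanation and a reference) from scratch. This is a simplified stdlib illustration, not the official `rouge-score` package the authors would likely have used, and it omits stemming and ROUGE‑2/L variants.

```python
# Simplified ROUGE-1 F1: unigram-overlap F-measure between a candidate
# explanation and a reference, with overlap counts clipped per word.
from collections import Counter

def rouge1_f1(candidate, reference):
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Counter intersection clips each word's overlap at its reference count.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

ref = "the message uses urgency and a shortened url to pressure the reader"
cand = "this scam uses urgency and a shortened url"
print(round(rouge1_f1(cand, ref), 3))  # → 0.6
```

BERTScore, the paper's second automatic metric, replaces this exact-match overlap with cosine similarity between contextual token embeddings, which is why the two metrics together capture both lexical and semantic agreement.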

The key design insight is a functional separation: evidential grounding governs semantic correctness, whereas persona‑based adaptation operates at the presentation layer under the constraint of faithfulness. This separation allows VEXA to provide trustworthy, user‑adapted explanations without requiring personal data or explicit user modeling. The authors argue that VEXA can be integrated into informal security education tools and real‑time risk‑awareness systems, and suggest future work on finer‑grained persona spectra, interactive feedback loops, and visual evidence displays to further enhance learning outcomes.

