Can Large Language Models Automate Phishing Warning Explanations? A Controlled Experiment on Effectiveness and User Perception


Phishing has become a prominent risk in modern cybersecurity, often used to bypass technological defences by exploiting predictable human behaviour. Warning dialogues are a standard mitigation measure, but their static content and lack of explanatory clarity limit their effectiveness. In this paper, we report on our research to assess the capacity of Large Language Models (LLMs) to generate clear, concise, and scalable explanations for phishing warnings. We carried out a large-scale between-subjects user study (N = 750) to compare the influence of warning dialogues supplemented with manually generated explanations against those generated by two LLMs, Claude 3.5 Sonnet and Llama 3.3 70B. We investigated two explanatory styles (feature-based and counterfactual) for their effects on behavioural metrics (click-through rate) and perceptual outcomes (e.g., trust, risk, clarity). The results provide empirical evidence that LLM-generated explanations achieve a level of protection statistically comparable to expert-crafted messages, effectively automating a high-cost task. While Claude 3.5 Sonnet showed a trend towards reducing click-through rates compared to the manual baseline, Llama 3.3, despite being perceived as clearer, did not yield the same behavioural benefits. Feature-based explanations were more effective against genuine phishing attempts, whereas counterfactual explanations reduced false-positive rates. Other variables, such as workload, gender, and prior familiarity with warning dialogues, significantly moderated the effectiveness of warnings. These results indicate that LLMs can be used to automatically build explanations for warning users against phishing, and that such solutions are scalable, adaptive, and consistent with human-centred values.


💡 Research Summary

The research paper titled “Can Large Language Models Automate Phishing Warning Explanations? A Controlled Experiment on Effectiveness and User Perception” addresses a critical gap in modern cybersecurity: the inefficiency of static, non-explanatory phishing warnings. As phishing attacks increasingly exploit human psychology to bypass technical defenses, the authors propose using Large Language Models (LLMs) to generate dynamic, clear, and scalable explanations that can enhance user awareness and decision-making.

To evaluate this, the researchers conducted a large-scale, between-subjects user study involving 750 participants. The study compared three types of warning dialogue: a baseline with manually crafted expert explanations, and two conditions with explanations generated by state-of-the-art LLMs, Claude 3.5 Sonnet and Llama 3.3 70B. The researchers specifically investigated two distinct explanatory styles: “feature-based” explanations, which highlight the specific indicators that mark a site as phishing, and “counterfactual” explanations, which describe what would need to change for the site to be considered safe.
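
The paper’s actual prompts are not reproduced in this summary, but the distinction between the two styles can be pictured as a pair of prompt templates fed with detector output. The sketch below is purely illustrative: the template wording, the `build_prompt` helper, and the detected-feature format are assumptions, not the authors’ materials.

```python
# Illustrative sketch only: the paper does not publish its prompts.
# Template wording, helper names, and the feature format are assumptions.

FEATURE_BASED_TEMPLATE = (
    "A phishing detector flagged the page at {url}. "
    "Write a short warning explanation (max 3 sentences) that points the user "
    "to the concrete suspicious indicators: {features}."
)

COUNTERFACTUAL_TEMPLATE = (
    "A phishing detector flagged the page at {url}. "
    "Write a short warning explanation (max 3 sentences) describing what would "
    "have to be different about the page for it to be considered safe, "
    "given these suspicious indicators: {features}."
)

def build_prompt(style: str, url: str, features: list[str]) -> str:
    """Fill the chosen explanation-style template with detector output."""
    template = FEATURE_BASED_TEMPLATE if style == "feature" else COUNTERFACTUAL_TEMPLATE
    return template.format(url=url, features="; ".join(features))

if __name__ == "__main__":
    print(build_prompt(
        "feature",
        url="http://paypa1-secure.example.com/login",
        features=["look-alike domain (paypa1 vs paypal)", "no HTTPS", "urgent language"],
    ))
```

A feature-based prompt asks the model to surface the evidence itself, while the counterfactual variant asks it to describe the nearest “safe” alternative to the flagged page.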

The empirical results demonstrate that LLM-generated explanations are statistically comparable to expert-crafted messages in terms of defensive effectiveness. A significant finding was that Claude 3.5 Sonnet showed a notable trend towards reducing click-through rates (CTR) for phishing attempts, suggesting its ability to drive protective behavior. In contrast, while Llama 3.3 70B was perceived by users as clearer and more understandable, it did not yield the same reduction in CTR.
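
As a rough sketch of how such a behavioral comparison can be run, the snippet below computes a per-condition CTR and applies a chi-square test of independence across conditions. The counts are invented purely for illustration; they are not the study’s data.

```python
# Hypothetical sketch of a condition-level CTR comparison.
# The counts below are invented for illustration; they are NOT the paper's data.
from scipy.stats import chi2_contingency

# rows: warning condition; columns: [clicked through, heeded the warning]
conditions = ["manual", "claude-3.5-sonnet", "llama-3.3-70b"]
table = [
    [40, 210],   # manual baseline (hypothetical counts)
    [28, 222],   # Claude 3.5 Sonnet
    [41, 209],   # Llama 3.3 70B
]

for name, (clicked, heeded) in zip(conditions, table):
    ctr = clicked / (clicked + heeded)
    print(f"{name}: CTR = {ctr:.1%}")

# Test whether click-through behavior is independent of condition.
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")
```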

The study also revealed the nuanced utility of different explanation styles. Feature-based explanations proved highly effective in mitigating actual phishing attacks by providing concrete evidence of risk. Conversely, counterfactual explanations were particularly useful in reducing false-positive rates, as they helped users understand the boundary between safe and malicious content, thereby reducing unnecessary alarm.

Furthermore, the research identified several moderating variables that influence the efficacy of these warnings, including user workload, gender, and prior familiarity with warning dialogues. These findings suggest that the effectiveness of an automated warning system depends not only on the quality of the LLM’s text but also on the context of the user. In conclusion, the paper provides strong evidence that LLMs can automate the high-cost task of creating security explanations, offering a scalable, adaptive, and human-centered approach to modern phishing defense.
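
Moderation effects of this kind are commonly probed with a logistic regression of click-through on warning condition plus user-level covariates. The sketch below, using `statsmodels`, is illustrative only: the variable names, scales (e.g., a NASA-TLX-style workload score), and simulated data are assumptions, not the authors’ actual analysis.

```python
# Illustrative moderation analysis, not the authors' actual model or data.
# Logistic regression of click-through on warning condition plus
# user-level covariates (workload, gender, prior familiarity).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 750  # matches the study's sample size; the data here are simulated
df = pd.DataFrame({
    "clicked": rng.integers(0, 2, n),                        # 1 = clicked through the warning
    "condition": rng.choice(["manual", "claude", "llama"], n),
    "workload": rng.normal(50, 15, n),                       # assumed NASA-TLX-style score
    "gender": rng.choice(["f", "m"], n),
    "familiarity": rng.integers(1, 6, n),                    # 1-5 prior familiarity with warnings
})

model = smf.logit(
    "clicked ~ C(condition) + workload + C(gender) + familiarity",
    data=df,
).fit(disp=False)
print(model.summary())
```

Significant coefficients on the covariates (rather than only on the condition terms) would be the signature of the moderation effects the paper reports.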

