Large Empirical Case Study: Go-Explore adapted for AI Red Team Testing


📝 Original Info

  • Title: Large Empirical Case Study: Go-Explore adapted for AI Red Team Testing
  • ArXiv ID: 2601.00042
  • Date: 2025-12-31
  • Authors: Manish Bhatt, Adrian Wood, Idan Habler, Ammar Al-Kahfah

📝 Abstract

Production LLM agents with tool-using capabilities require security testing despite their safety training. We adapted Go-Explore to test GPT-4o-mini across 28 experimental runs examining 6 research questions. Key findings: (1) Random seed variance dominates algorithmic parameters (8× outcome spread; single-seed comparisons are unreliable, and multi-seed averaging materially reduces variance in our setup). (2) Reward shaping consistently harms performance (94% exploration collapse, or 18 false positives with zero verified attacks). (3) Simple state signatures outperform complex ones in our environment. (4) For comprehensive security testing, ensembles provide attack-type diversity while single agents optimize within-type coverage. These results suggest that seed variance and targeted domain knowledge can outweigh algorithmic sophistication when testing safety-trained models.
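To make finding (3) concrete, the following is a minimal, hypothetical sketch of a Go-Explore-style loop with a deliberately simple state signature: the archive is keyed by a coarse tuple of observable features, a stored cell is selected at random ("return"), and one exploratory step is taken from it. This is not the paper's implementation; the environment, the `signature` function, and all names here are illustrative assumptions.

```python
import random

def signature(state):
    # Simple state signature: a coarse key over observable features.
    # Finding (3) suggests coarse keys like this can beat richer encodings.
    return (state["tool"], state["depth"])

def go_explore(env_step, initial_state, iterations=200, seed=0):
    """Minimal Go-Explore-style loop: maintain an archive of cells keyed
    by a state signature, return to a stored cell, explore from it, and
    archive any state that maps to a previously unseen signature."""
    rng = random.Random(seed)  # explicit seed: finding (1) says seed variance dominates
    archive = {signature(initial_state): initial_state}
    for _ in range(iterations):
        # "Return" step: pick a stored cell uniformly at random.
        cell = rng.choice(list(archive.values()))
        # "Explore" step: take one randomized action from that cell.
        new_state = env_step(cell, rng)
        key = signature(new_state)
        if key not in archive:  # novel signature -> new cell in the archive
            archive[key] = new_state
    return archive

# Toy stand-in for an agent environment (purely illustrative):
# each step picks a tool and deepens the interaction, capped at depth 3.
def toy_step(state, rng):
    return {"tool": rng.choice(["search", "exec", "mail"]),
            "depth": min(state["depth"] + 1, 3)}

archive = go_explore(toy_step, {"tool": "none", "depth": 0})
```

With the coarse signature above, the archive can hold at most 10 distinct cells in this toy environment (the initial state plus 3 tools × 3 depths), which illustrates how the signature's granularity directly bounds what "coverage" means for the explorer.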

📄 Full Content

...(The full text is omitted here for length; please see the original site for the complete article.)
