Out-of-the-box: Black-box Causal Attacks on Object Detectors

📝 Original Info

  • Title: Out-of-the-box: Black-box Causal Attacks on Object Detectors
  • ArXiv ID: 2512.03730
  • Date: 2025-12-03
  • Authors: Melane Navaratnarajah, David A. Kelly, Hana Chockler

📝 Abstract

Adversarial perturbations are a useful way to expose vulnerabilities in object detectors. Existing perturbation methods are frequently white-box and architecture specific. More importantly, while they are often successful, it is rarely clear why they work. Insights into the mechanism of this success would allow developers to understand and analyze these attacks, as well as fine-tune the model to prevent them. This paper presents BlackCAtt, a black-box algorithm and a tool, which uses minimal, causally sufficient pixel sets to construct explainable, imperceptible, reproducible, architecture-agnostic attacks on object detectors. BlackCAtt combines causal pixels with bounding boxes produced by object detectors to create adversarial attacks that lead to the loss, modification or addition of a bounding box. BlackCAtt works across different object detectors of different sizes and architectures, treating the detector as a black box. We compare the performance of BlackCAtt with other black-box attack methods and show that identification of causal pixels leads to more precisely targeted and less perceptible attacks. On the COCO test dataset, our approach is 2.7 times better than the baseline in removing a detection, 3.86 times better in changing a detection, and 5.75 times better in triggering new, spurious, detections. The attacks generated by BlackCAtt are very close to the original image, and hence imperceptible, demonstrating the power of causal pixels.
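The three attack outcomes named in the abstract (loss, modification, or addition of a bounding box) can be told apart by comparing detector output before and after a perturbation. The following is only a minimal sketch under stated assumptions: the `Detection` type, the `classify_outcome` helper, and the IoU threshold of 0.5 are illustrative choices, not definitions taken from the paper, and "modification" is read here narrowly as a label change at roughly the same location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one detector output."""
    label: str
    box: tuple  # (x1, y1, x2, y2) in pixels


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def classify_outcome(target, before, after, iou_thresh=0.5):
    """Return the set of outcomes observed for one attacked image.

    'removed': no detection in `after` overlaps the target box.
    'changed': overlapping detections exist, but none keeps the target label.
    'added':   some detection in `after` overlaps nothing in `before`.
    The threshold and these definitions are assumptions for illustration.
    """
    outcomes = set()
    matches = [d for d in after if iou(d.box, target.box) >= iou_thresh]
    if not matches:
        outcomes.add("removed")
    elif all(d.label != target.label for d in matches):
        outcomes.add("changed")
    for d in after:
        if all(iou(d.box, b.box) < iou_thresh for b in before):
            outcomes.add("added")
            break
    return outcomes


cat = Detection("cat", (10, 10, 50, 50))
removed = classify_outcome(cat, [cat], [])
changed = classify_outcome(cat, [cat], [Detection("dog", (12, 10, 50, 50))])
added = classify_outcome(cat, [cat], [cat, Detection("bird", (100, 100, 120, 120))])
```

Only the detector's output boxes and labels are used, which is consistent with the black-box setting the abstract describes.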

📄 Full Content

Out-of-the-box: Black-box Causal Attacks on Object Detectors

Melane Navaratnarajah*, King’s College London, Department of Informatics, melane.navaratnarajah@kcl.ac.uk
David A. Kelly*, King’s College London, Department of Informatics, david.a.kelly@kcl.ac.uk
Hana Chockler, King’s College London, Department of Informatics, hana.chockler@kcl.ac.uk

*These authors contributed equally to this work.

arXiv:2512.03730v1 [cs.CV] 3 Dec 2025

1. Introduction

Picture yourself in a self-driving car when you suddenly see a dog in the road in front of you. The car has detected it as well, via its object detector. Then, for no apparent reason, the dog is no longer detected: its bounding box has vanished. The dog is still there, you can see it, but to the car it has become invisible. You have to intervene and apply the brakes to avoid an accident. Why did the dog vanish?

Object detectors (OD) are known to be vulnerable to both accidental and adversarial perturbations [33]. In fact, image classification models in general are quite easy to attack [31, 44]. What is harder to explain is what causes the attack to work. Generic attacks, such as global Gaussian noise, are reproducible and demonstrate vulnerability of the models, but do not reveal the causal relationship between pixel-level perturbations and failed object detections.

There is growing interest in exploring adversarial attacks on object detectors using eXplainable AI (XAI) techniques [47, 48]. These approaches are mostly white-box methods: they need access to OD hidden layers, which is an unnatural, and generous, attack model. Moreover, saliency maps produced from the hidden layers are well known to be noisy, sensitive to input perturbations and not naturally interpretable [40, 49].

In this paper, we present BlackCAtt (Black-box Causal Attacks), a black-box, causal approach to generating adversarial attacks on object detectors. BlackCAtt discovers minimal, sufficient pixel sets (MSPSs) for a detected object using ReX [7]. These pixels, by themselves, are enough to cause the required detection [6, 15]. BlackCAtt uses these pixels to generate low-distortion attacks that remove, alter, or introduce detections. BlackCAtt attacks the causes of the classification, not random pixels nor the entire image.

One might expect that all MSPSs would be contained within the bounding box. One of the most surprising results in this paper is that MSPSs are, in fact, frequently either fully outside, or not fully contained within the OD bounding box. We exploit this phenomenon, showing that perturbing causal pixels outside the box often makes the box disappear (Section 6). This happens across different detector architectures (single-stage, two-stage, and transformer-based).

We also compare the accuracy and precision of BlackCAtt’s native MSPSs by applying our extraction and perturbation techniques to another popular black-box saliency tool, DRISE, and quantify attack success with a number of measures, including perceptual distortion [50].

Figure 1. (a) Cat and its bounding box. (b) Bounding box and MSPS. (c) No cat 1: blur. (d) No cat 2: black. The MSPS for cat (Figure 1b) reveals a dependency on the surrounding context. BlackCAtt starts with causal pixels outside of the bounding box and works inwards in order to maximize imperceptibility. In both Figures 1c and 1d the cat is still clearly present and complete, but YOLO no longer detects the cat. The attack works because BlackCAtt changes part of the cause of the detec

…(Full text truncated)…
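The outside-in strategy described in the introduction and in the Figure 1 caption can be illustrated with a toy example. This is a hedged sketch, not the authors' implementation: the causal mask is hand-made rather than computed by ReX, `toy_detects` is a stand-in for a real black-box detector, and ordering pixels by distance from the box, the `step` size, and the black fill value are all assumptions made for illustration.

```python
import numpy as np


def order_outside_in(mask, box):
    """Order masked pixels from farthest outside the box to inside it."""
    x1, y1, x2, y2 = box  # x2, y2 are exclusive
    ys, xs = np.nonzero(mask)
    # Per-axis distance to the box; 0 for pixels inside it.
    dx = np.maximum(np.maximum(x1 - xs, xs - (x2 - 1)), 0)
    dy = np.maximum(np.maximum(y1 - ys, ys - (y2 - 1)), 0)
    dist = np.hypot(dx, dy)
    order = np.argsort(-dist, kind="stable")
    return list(zip(ys[order], xs[order]))


def outside_in_attack(image, mask, box, detects, fill=0.0, step=4):
    """Black out causal pixels in small batches until detection fails.

    `detects` is queried as a black box: image in, boolean out.
    Returns (attacked_image, l2_distance), or (None, None) on failure.
    """
    attacked = image.copy()
    coords = order_outside_in(mask, box)
    for start in range(0, len(coords), step):
        for y, x in coords[start:start + step]:
            attacked[y, x] = fill
        if not detects(attacked):
            return attacked, float(np.linalg.norm(attacked - image))
    return None, None


# Toy scene: a bright object in the box plus a bright context strip below it.
box = (4, 4, 12, 12)
image = np.full((16, 16), 0.2)
image[4:12, 4:12] = 1.0
image[14, :] = 1.0


def toy_detects(img):
    """Stand-in detector that depends on both the object and its context."""
    return bool(img[4:12, 4:12].mean() > 0.5 and img[14, :].mean() > 0.6)


# Hand-made stand-in for an MSPS: the context strip plus a few object pixels.
mask = np.zeros((16, 16), dtype=bool)
mask[14, :] = True
mask[6:8, 6:8] = True

attacked, l2 = outside_in_attack(image, mask, box, toy_detects)
```

In this toy scene the detection disappears before any pixel inside the box is touched, which mirrors the behaviour reported for Figures 1c and 1d. With a real detector, `toy_detects` would be replaced by a call that checks whether the original bounding box is still returned.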
