Who Can See Through You? Adversarial Shielding Against VLM-Based Attribute Inference Attacks

Reading time: 5 minutes
...

📝 Original Info

  • Title: Who Can See Through You? Adversarial Shielding Against VLM-Based Attribute Inference Attacks
  • ArXiv ID: 2512.18264
  • Date: 2025-12-20
  • Authors: Yucheng Fan¹, Jiawei Chen¹,², Yu Tian³, Zhaoxia Yin¹* (✉️ corresponding author: zxyin@cee.ecnu.edu.cn)
  • Affiliations: ¹ East China Normal University, Shanghai, China; ² Zhongguancun Academy, Beijing, China; ³ Dept. of Computer Science and Technology, Institute for AI, Tsinghua University, Beijing, China

📝 Abstract

As vision-language models (VLMs) become widely adopted, VLM-based attribute inference attacks have emerged as a serious privacy concern, enabling adversaries to infer private attributes from images shared on social media. This escalating threat calls for dedicated protection methods to safeguard user privacy. However, existing methods often degrade the visual quality of images or interfere with vision-based functions on social media, thereby failing to achieve a desirable balance between privacy protection and user experience. To address this challenge, we propose a novel protection method that jointly optimizes privacy suppression and utility preservation under a visual consistency constraint. While our method is conceptually effective, fair comparisons between methods remain challenging due to the lack of publicly available evaluation datasets. To fill this gap, we introduce VPI-COCO, a publicly available benchmark comprising 522 images with hierarchically structured privacy questions and corresponding non-private counterparts, enabling fine-grained and joint evaluation of protection methods in terms of privacy preservation and user experience. Building upon this benchmark, experiments on multiple VLMs demonstrate that our method effectively reduces PAR below 25%, keeps NPAR above 88%, maintains high visual consistency, and generalizes well to unseen and paraphrased privacy questions, demonstrating its strong practical applicability for real-world VLM deployments.
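The abstract summarizes protection quality with two rates, PAR and NPAR, whose exact computation is not given in this excerpt. The sketch below shows one plausible scoring loop over a VPI-COCO-style benchmark, assuming PAR counts privacy questions for which the VLM still reveals the private attribute and NPAR counts non-private questions it still answers usefully; `query_vlm`, `reveals_attribute`, and `answers_question` are hypothetical placeholders, not functions from the paper.

```python
from typing import Callable, Dict, Iterable, List, Tuple

def answer_rates(
    protected_images: Iterable[Tuple[str, object]],   # (image_id, protected image)
    privacy_questions: Dict[str, List[str]],           # image_id -> privacy questions
    non_privacy_questions: Dict[str, List[str]],       # image_id -> non-private counterparts
    query_vlm: Callable,            # hypothetical: (image, question) -> answer string
    reveals_attribute: Callable,    # hypothetical: (question, answer) -> bool (attribute leaked)
    answers_question: Callable,     # hypothetical: (question, answer) -> bool (useful answer given)
) -> Tuple[float, float]:
    """Compute PAR (privacy answer rate, lower is better) and NPAR
    (non-privacy answer rate, higher is better) over protected images.
    These definitions are an assumption based on the abstract, not the
    paper's exact evaluation protocol."""
    leaked, n_priv = 0, 0
    answered, n_nonpriv = 0, 0
    for img_id, image in protected_images:
        for q in privacy_questions.get(img_id, []):
            leaked += int(reveals_attribute(q, query_vlm(image, q)))
            n_priv += 1
        for q in non_privacy_questions.get(img_id, []):
            answered += int(answers_question(q, query_vlm(image, q)))
            n_nonpriv += 1
    par = leaked / max(n_priv, 1)
    npar = answered / max(n_nonpriv, 1)
    return par, npar
```

Under this reading, the reported figures correspond to PAR below 0.25 and NPAR above 0.88 on protected images across the evaluated VLMs.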

💡 Deep Analysis

📄 Full Content

1. Introduction

In recent years, the rapid progress of Vision-Language Models (VLMs) has emerged as one of the most remarkable advances in Artificial Intelligence. By integrating visual and linguistic modalities, VLMs achieve powerful cross-modal understanding and generalization across real-world scenarios. However, their growing deployment also raises increasingly diverse and complex concerns about privacy and regulatory compliance [5, 20, 29, 31].

[Figure 1. Illustration of a VLM-based attribute inference attack. Users often share daily-life photos on social media. Attackers can exploit VLMs to infer personal privacy attributes from visual cues, even when such attributes are never explicitly disclosed.]

In particular, VLM-based attribute inference attacks have attracted increasing attention. Specifically, [6, 8, 14, 16] demonstrated that VLMs can infer an image's geographic location from subtle visual cues without geo-annotations, whereas [13, 22, 28] further showed that they can uncover private attributes from ordinary images through semantic cues. An illustrative example of such an attack is shown in Fig. 1: users on social platforms share volumes of daily-life images. By automatically collecting these images, attackers can leverage VLMs to analyze visual content and uncover latent private attributes about users. Such attacks are highly stealthy, low-cost, and scalable, posing an emerging threat to online privacy and security.
To address these concerns, prior efforts have focused on enhancing model-level safety alignment and content filtering [18]. However, these safeguards can be easily circumvented by simple evasion techniques [2, 15], rendering them ineffective against these attacks. This calls for an input-level privacy protection mechanism to defend against such threats, which poses two key challenges:

(1) Trade-off between privacy protection and user experience. An effective privacy protection method is expected to induce the model to refuse privacy questions, thereby preventing potential privacy leakage. Meanwhile, it should also minimize unintended refusals of non-privacy questions that arise from the protection process to ensure that VLM-powered functions on social platforms, such as tagging and content recommendation, remain responsive for a satisfactory user experience. Existing input-level image privacy protection methods mainly fall into two categories: anonymization-based protection [17, 19, 23, 27] and encryption-based protection [4, 21, 30, 32]. The former anonymizes images by masking or replacing privacy-sensitive regions, while the latter enhances privacy by adding noise or applying image transformations to the input. However, these approaches often cause significant degradation in visual quality and overlook their potential negative impact on non-privacy questions, thereby failing to achieve a trade-off between privacy protection and user experience.

(2) Lack of publicly available targeted evaluation datasets. Existing relate
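The abstract frames the defense as jointly optimizing privacy suppression and utility preservation under a visual consistency constraint, which speaks directly to challenge (1) above. The following is a minimal sketch of one way such an objective could be set up, assuming a PGD-style perturbation with an L-infinity budget standing in for the visual consistency constraint and hypothetical differentiable surrogates `privacy_loss` and `utility_loss`; the names, weights, and hyperparameters here are illustrative assumptions, not the paper's actual algorithm.

```python
import torch

def adversarial_shield(
    image: torch.Tensor,          # (C, H, W), pixel values in [0, 1]
    privacy_loss,                 # hypothetical: perturbed image -> scalar leakage loss (minimize)
    utility_loss,                 # hypothetical: perturbed image -> scalar utility-drop loss (minimize)
    epsilon: float = 8 / 255,     # visual-consistency budget (L_inf), an assumed value
    step_size: float = 1 / 255,   # assumed step size
    steps: int = 100,             # assumed number of optimization steps
    lam: float = 1.0,             # assumed weight balancing privacy vs. utility
) -> torch.Tensor:
    """PGD-style sketch: find a small perturbation delta that suppresses
    private-attribute leakage while preserving answers to non-private
    questions, subject to ||delta||_inf <= epsilon."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        perturbed = (image + delta).clamp(0, 1)
        loss = privacy_loss(perturbed) + lam * utility_loss(perturbed)
        loss.backward()
        with torch.no_grad():
            # gradient descent on the joint objective, then project back into the budget
            delta -= step_size * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
        delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()
```

A perceptual constraint such as an LPIPS bound could replace the L-infinity budget; the abstract does not specify which form the visual consistency constraint actually takes.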

Reference

This content is AI-processed based on open access ArXiv data.
