Neuro-Inspired Topological Regularization Strengthens the Privacy Defenses of Multi-Modal Vision-Language Models

Reading time: 7 minutes

📝 Abstract

In the age of agentic AI, the growing deployment of multi-modal models (MMs) has introduced new attack vectors that can expose sensitive training data, causing privacy leakage. This paper investigates a black-box privacy attack, i.e., the membership inference attack (MIA), on multi-modal vision-language models (VLMs). State-of-the-art research analyzes privacy attacks primarily on unimodal AI/ML systems, while recent studies indicate MMs can also be vulnerable to privacy attacks. Researchers have demonstrated that biologically inspired neural network representations can improve unimodal models' resilience against adversarial attacks, but it remains unexplored whether neuro-inspired MMs are resilient against privacy attacks. In this work, we introduce a systematic neuroscience-inspired topological regularization (i.e., τ-regularized) framework to analyze MM VLMs' resilience against image-text-based inference privacy attacks. We examine this phenomenon using three VLMs: BLIP, PaliGemma 2, and ViT-GPT2, across three benchmark datasets: COCO, CC3M, and NoCaps. Our experiments compare the resilience of baseline and neuro VLMs (with topological regularization), where the τ > 0 configuration defines the NEURO variant of each VLM under varying values of the topological coefficient τ (0–3). We show how τ-regularization affects MIA attack success, offering a quantitative perspective on privacy-utility tradeoffs. Our results on the BLIP model using the COCO dataset show that MIA attack success against NEURO VLMs drops by ∼24% mean ROC-AUC while achieving similar model utility (similarity between generated and reference captions) in terms of MPNet and ROUGE-2 metrics. This shows that neuro VLMs are comparatively more resilient against privacy attacks without significantly compromising model utility. Our extensive evaluation with the PaliGemma 2 and ViT-GPT2 models on two additional datasets, CC3M and NoCaps, further validates the consistency of these findings.
This work contributes to the growing understanding of privacy risks in MMs and provides empirical evidence of neuro VLMs' privacy threat resilience.
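The abstract measures attack success as mean ROC-AUC over the attack's member/non-member scores. As a minimal, hypothetical sketch (the scores and function name below are illustrative, not the paper's implementation), ROC-AUC can be computed directly from two score lists via the Mann-Whitney U statistic:

```python
def attack_auc(member_scores, nonmember_scores):
    """ROC-AUC of an MIA score via the Mann-Whitney U statistic:
    the probability that a randomly chosen member outscores a
    randomly chosen non-member (ties count as half)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))

# Hypothetical attack scores (e.g., caption-similarity values):
members = [0.9, 0.8, 0.7, 0.85]
nonmembers = [0.4, 0.5, 0.6, 0.3]
print(attack_auc(members, nonmembers))  # 1.0: perfect separation
```

An AUC near 0.5 means the attacker does no better than guessing, so a ∼24% drop in mean ROC-AUC moves the attack substantially toward chance.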


📄 Content

In the age of agentic AI, multi-modal models (MMs) built on multiple data modalities and sources are becoming more popular across application areas, including autonomous robotics [1], [2], healthcare [3], [4], education [5], biometric authentication [6], [7], and content creation [8], [9]. These MMs often combine different modalities, such as image and text. In the last few years, Large Language Models (LLMs) like ChatGPT have improved substantially and can now solve a wide range of problems, making them more popular than ever [10], [11]. These models are known for their language understanding, built on the foundation of Natural Language Processing (NLP) [11]. VLMs benefit from advances in LLM research and belong to the cluster of AI models that leverage NLP and computer vision to create a more complete understanding of visual and textual information [12]. With this power, VLMs can identify objects in an image and provide a natural-language description of what they identify [12], [13], [14]. VLMs are integrated into many applications and tools, from basic to sophisticated tasks, e.g., extracting text from an image, explaining a diagram, or counting objects [15], [14], [12], [16]. They can further be incorporated into wearable computing [17], trustworthy AI/ML computing [18], health tracking [19], mission-critical agentic AI solutions [20], and user authentication [21].

Despite their capabilities, VLMs, like unimodal models, are susceptible to vulnerabilities such as adversarial and privacy attacks, especially due to their multi-modality [16], [15], [22], [23], [24], [25], [26], [27]. Among these is a type of attack in which image and text pairs are subtly altered to mislead the model into producing harmful outputs [16], [23], [28], [29], [30]. Another common attack on VLMs is the membership inference attack (MIA), in which attackers (also known as adversaries) try to deduce whether the model used some specific data in its training, which poses a privacy risk [31], [32], [33], [34], [35], [36].

Shokri et al. [36] implemented and tested the first black-box membership inference attacks (MIAs) on machine learning models. Later studies [37], [38] confirmed the success of these attacks on different machine learning models, including classification and generative models, convolutional neural networks (CNNs), and multilayer perceptrons (MLPs). Some of these attacks also target ML models trained on multiple, sensitive image and text datasets [39]. Recently, work on MIAs has expanded to LLM-based models, such as generative text LLMs. Some black-box, text-only MIA methods for LLMs, such as the Repeat and Brainwash techniques, check for membership by examining patterns in the text the model produces [40], [41]. These methods resemble MIAs on VLMs, like our work, in that they rely on measuring the semantic similarity or lexical overlap between generated captions and ground-truth data; however, the LLM attacks target text data rather than text-image pairs. More recently, emerging work has addressed MIAs on VLMs, first by [42], [43], who explored MIAs on multimodal image-captioning systems using metric-based and feature-based similarity methods. Hu et al. [31] highlighted the need for more attention to MIAs on VLMs, as most work on VLMs has focused on improving the performance of these models' interactions [44]. In their work, they proposed the first temperature-based MIA methods targeting instruction-tuning data in VLMs.
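To illustrate the similarity signal these caption-based MIAs exploit (the helper names and the 0.5 threshold below are illustrative assumptions, not any cited method's parameters), a bigram-overlap score in the spirit of ROUGE-2 can be thresholded to guess membership:

```python
def bigrams(text):
    """Lowercased word bigrams of a caption."""
    toks = text.lower().split()
    return [tuple(toks[i:i + 2]) for i in range(len(toks) - 1)]

def rouge2_recall(generated, reference):
    """Fraction of reference bigrams that also appear in the
    generated caption (a ROUGE-2-recall-style overlap score)."""
    ref = bigrams(reference)
    if not ref:
        return 0.0
    gen = set(bigrams(generated))
    return sum(1 for b in ref if b in gen) / len(ref)

def infer_membership(generated, reference, threshold=0.5):
    # High lexical overlap with the ground-truth caption suggests
    # the model may have memorized this training pair.
    return rouge2_recall(generated, reference) >= threshold
```

The intuition: a model tends to reproduce training captions more faithfully than captions it never saw, so unusually high overlap is a membership signal.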

Although researchers have started focusing on privacy vulnerabilities in VLMs, the work has generally centered on detecting MIAs on regular VLMs using metric- or similarity-based signals [45], [43], which exploit the proximity of the model-generated caption to the ground-truth training caption. These efforts consider regular VLMs' vulnerabilities, and such models do not appear to be resilient against MIAs. Studies, on the other hand, have pointed out that neuroscience-inspired, biologically regularized unimodal models can be more resilient to privacy threats than regular neural networks, owing to their ability to capture subtle changes in samples [46], [47]. However, it has not yet been explored how this holds for complex multimodal VLMs. Therefore, we focus on developing a biologically inspired, regularization-based framework to assess MIA vulnerabilities in MM VLMs. We explore how neuroscience-inspired, model-level structural regularization, specifically τ-regularization (topology-inducing training) [44], can affect VLMs' resilience. Instead of merely detecting data leakage through output metrics, we study an internal framework that makes models more resilient against MIAs without compromising performance (utility). More importantly, to the best of our knowledge, no research has examined how neuro-inspired vision-language models (i.e., those trained with topographic regularization) respond to privacy attacks such as MIAs.
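To make the idea of topology-inducing training concrete, here is a minimal NumPy sketch of one plausible shape a τ-regularized objective could take; the quadratic penalty, the exponential proximity weighting, and the function names are our illustrative assumptions, not the paper's actual loss:

```python
import numpy as np

def topographic_penalty(activations, positions):
    """Encourage units that are close on a cortical-sheet-like grid
    to respond similarly: each pair contributes its squared activation
    difference, weighted by spatial proximity (hypothetical form)."""
    n = activations.shape[0]
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(positions[i] - positions[j])
            proximity = np.exp(-dist)  # nearby pairs weigh more
            penalty += proximity * (activations[i] - activations[j]) ** 2
    return penalty

def regularized_loss(task_loss, activations, positions, tau=1.0):
    # Total objective: task loss plus a tau-weighted topographic term.
    # tau = 0 recovers the baseline model; tau > 0 gives the NEURO variant.
    return task_loss + tau * topographic_penalty(activations, positions)
```

Under this reading, sweeping τ from 0 to 3 trades captioning utility (the task loss) against how strongly the learned representation is smoothed across the grid, which is the privacy-utility tradeoff the paper quantifies.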

This content is AI-processed based on ArXiv data.
