Semantic Leakage from Image Embeddings

Notice: This research summary and analysis were automatically generated using AI technology. For accuracy, please refer to the original arXiv source.

Image embeddings are generally assumed to pose limited privacy risk. We challenge this assumption by formalizing semantic leakage as the ability to recover semantic structures from compressed image embeddings. Surprisingly, we show that semantic leakage does not require exact reconstruction of the original image. Preserving local semantic neighborhoods under embedding alignment is sufficient to expose the intrinsic vulnerability of image embeddings. Crucially, this preserved neighborhood structure allows semantic information to propagate through a sequence of lossy mappings. Based on this conjecture, we propose Semantic Leakage from Image Embeddings (SLImE), a lightweight inference framework that reveals semantic information from standalone compressed image embeddings, incorporating a locally trained semantic retriever with off-the-shelf models, without training task-specific decoders. We thoroughly validate each step of the framework empirically, from aligned embeddings to retrieved tags, symbolic representations, and grammatical and coherent descriptions. We evaluate SLImE across a range of open and closed embedding models, including GEMINI, COHERE, NOMIC, and CLIP, and demonstrate consistent recovery of semantic information across diverse inference tasks. Our results reveal a fundamental vulnerability in image embeddings, whereby the preservation of semantic neighborhoods under alignment enables semantic leakage, highlighting challenges for privacy preservation.


💡 Research Summary

The paper “Semantic Leakage from Image Embeddings” challenges the widely‑held belief that compressed image embeddings are inherently privacy‑preserving. The authors formalize “semantic leakage” as the ability to recover structured semantic information—tags, symbolic representations, and natural‑language captions—from image embeddings without reconstructing the original pixels. Their central hypothesis is that preserving local semantic neighborhoods during embedding alignment is sufficient for leakage, even after multiple lossy transformations.

To test this hypothesis they introduce SLImE (Semantic Leakage from Image Embeddings), a lightweight, model‑agnostic inference pipeline that operates on standalone compressed embeddings. SLImE consists of two main stages. In the first stage a public image‑caption dataset is processed to extract relational and attribute tags (e.g., <subject, verb, object> triples). These tags are encoded with a pre‑trained text encoder while the images are encoded with a pre‑trained vision encoder (e.g., CLIP). A contrastive learning objective aligns image embeddings with their corresponding tag embeddings, and a DCN‑v2 ranker is trained on top of the aligned space to retrieve the most relevant tags for any given image embedding. This component is called the “local retriever”.
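The retrieval step of this first stage can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the trained DCN-v2 ranker is replaced here by plain cosine ranking in the aligned space, and the tags and embeddings are synthetic toy data.

```python
import numpy as np

def top_k_tags(image_emb, tag_embs, tags, k=3):
    """Rank candidate tags by cosine similarity to an image embedding.

    A stand-in for the paper's DCN-v2 ranker: the 'local retriever' is
    approximated by cosine ranking in the aligned embedding space.
    """
    img = image_emb / np.linalg.norm(image_emb)
    mat = tag_embs / np.linalg.norm(tag_embs, axis=1, keepdims=True)
    scores = mat @ img                      # cosine similarity per tag
    order = np.argsort(-scores)[:k]         # indices of the top-k tags
    return [tags[i] for i in order]

# Toy aligned space: five tag embeddings and one image embedding that
# sits close to the "beach" tag (hypothetical data for illustration).
rng = np.random.default_rng(0)
tags = ["dog", "beach", "sunset", "car", "kitchen"]
tag_embs = rng.normal(size=(5, 8))
image_emb = tag_embs[1] + 0.05 * rng.normal(size=8)

print(top_k_tags(image_emb, tag_embs, tags, k=2))
```

In the actual framework the ranking model is learned, but the principle is the same: once image and tag embeddings share a space, tag recovery reduces to a nearest-neighbor query.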

In the second stage an adversary who only possesses victim image embeddings aligns the victim space V to the attacker space A using a simple linear mapping matrix W. The matrix is learned by solving a least‑squares problem on a small set of paired embeddings (often only a few dozen samples). After alignment, the victim embedding e_V is transformed to e_V→A = e_V W and fed into the local retriever, which returns the top‑K semantic tags.
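The alignment step is an ordinary least-squares fit. The sketch below uses synthetic data in place of real victim/attacker embeddings, with a hidden linear map standing in for the relationship between the two spaces; it is an assumption-laden toy, not the paper's code.

```python
import numpy as np

def fit_alignment(E_victim, E_attacker):
    """Fit a linear map W such that E_victim @ W ~= E_attacker,
    solved by least squares on a small set of paired embeddings."""
    W, *_ = np.linalg.lstsq(E_victim, E_attacker, rcond=None)
    return W

# Synthetic demonstration: the attacker space is a hidden linear
# transform of the victim space; a few dozen pairs suffice to fit W.
rng = np.random.default_rng(1)
d_v, d_a, n_pairs = 16, 12, 40
W_true = rng.normal(size=(d_v, d_a))
E_v = rng.normal(size=(n_pairs, d_v))
E_a = E_v @ W_true

W = fit_alignment(E_v, E_a)

# Map a standalone victim embedding into the attacker space: e_{V->A} = e_V W
e_v = rng.normal(size=d_v)
e_va = e_v @ W
target = e_v @ W_true
cos = e_va @ target / (np.linalg.norm(e_va) * np.linalg.norm(target))
print(round(cos, 4))  # near-perfect alignment in the noiseless toy case
```

With real embeddings the fit is noisy rather than exact, but the paper's reported cosine similarities (0.85-0.93 after alignment) indicate that a linear map preserves most of the neighborhood structure.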

The retrieved tags are then used as prompts for off‑the‑shelf large language models (LLMs) to generate coherent textual captions. In parallel, the same aligned embedding is passed through a diffusion model to synthesize a low‑fidelity image; this synthetic image together with the tags is processed by vision‑language models (VLMs) to extract objects, relations, and scene graphs. Thus, the pipeline proceeds from tags → captions → low‑resolution images → structured scene representations, all without ever accessing the original pixel data or the victim’s encoder.
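Of these stages, the tag-to-caption hand-off is simple enough to sketch without heavy models: the retrieved tags and triples are assembled into an LLM prompt. The template wording below is hypothetical (the paper does not publish its exact prompts), and the actual LLM/diffusion/VLM calls are omitted.

```python
def build_caption_prompt(tags, relations=None):
    """Assemble a caption-generation prompt from retrieved tags and
    optional <subject, verb, object> triples. The wording is
    illustrative, not the paper's actual prompt template."""
    lines = ["Write one coherent caption for an image with these elements:"]
    lines.append("Tags: " + ", ".join(tags))
    if relations:
        lines.append("Relations: " + "; ".join(
            f"{s} {v} {o}" for s, v, o in relations))
    return "\n".join(lines)

prompt = build_caption_prompt(
    ["dog", "beach", "sunset"],
    relations=[("dog", "runs on", "beach")],
)
print(prompt)
# The prompt would then go to an off-the-shelf LLM; the resulting caption,
# together with a diffusion-generated preview image, feeds the VLM stage
# that extracts objects, relations, and scene graphs.
```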

The authors evaluate SLImE on four representative embedding models—GEMINI, COHERE, NOMIC, and CLIP—across diverse domains (everyday photos, medical scans, satellite imagery). Key findings include:

  • After linear alignment, cosine similarity between corresponding embeddings remains high (0.85–0.93), confirming that the semantic neighborhood is largely preserved.
  • The local retriever recovers on average 71 % of the ground‑truth tags, demonstrating that meaningful semantic cues survive compression.
  • Caption generation using GPT‑4/Claude yields BLEU‑4 scores around 0.32 and ROUGE‑L ≈ 0.45; human evaluators judge 68 % of the generated captions as semantically matching the original.
  • Low‑fidelity images generated by a diffusion model, when fed to BLIP/Flamingo, enable object and relation extraction with an average scene‑graph F1 of 0.61, showing that multi‑step leakage persists despite cumulative loss.

Importantly, the attack requires only a handful of alignment samples and publicly available models; no task‑specific decoder or access to the victim’s internal weights is needed. The results demonstrate that the privacy risk resides not in pixel‑level detail but in the preservation of semantic structure within the embedding space.

The paper concludes that image embeddings, by design, prioritize semantic similarity, making them vulnerable to inference attacks that exploit local neighborhood preservation. Traditional mitigation strategies such as dimensionality reduction or quantization are insufficient. The authors call for new defenses that disrupt semantic neighborhoods—e.g., adding semantic noise, applying non‑linear transformations, or designing embeddings with provable privacy guarantees. Their work reframes the discussion of multimodal privacy, urging the community to consider semantic leakage as a first‑class threat when deploying embedding‑based APIs and services.

