When Ads Become Profiles: Uncovering the Invisible Risk of Web Advertising at Scale with LLMs

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Regulatory limits on explicit targeting have not eliminated algorithmic profiling on the Web, as optimisation systems still adapt ad delivery to users’ private attributes. The widespread availability of powerful zero-shot multimodal Large Language Models (LLMs) has dramatically lowered the barrier for exploiting these latent signals for adversarial inference. We investigate this emerging societal risk, specifically how adversaries can now exploit these signals to reverse-engineer private attributes from ad exposure alone. We introduce a novel pipeline that leverages LLMs as adversarial inference engines to perform natural language profiling. Applying this method to a longitudinal dataset comprising over 435,000 Facebook ad impressions collected from 891 users, we conducted a large-scale study to assess the feasibility and precision of inferring private attributes from passive online ad observations. Our results demonstrate that off-the-shelf LLMs can accurately reconstruct complex user private attributes, including party preference, employment status, and education level, consistently outperforming strong census-based priors and matching or exceeding human social perception at only a fraction of the cost (223x lower) and time (52x faster) required by humans. Critically, actionable profiling is feasible even within short observation windows, indicating that prolonged tracking is not a prerequisite for a successful attack. These findings provide the first empirical evidence that ad streams serve as a high-fidelity digital footprint, enabling off-platform profiling that inherently bypasses current platform safeguards, highlighting a systemic vulnerability in the ad ecosystem and the urgent need for responsible web AI governance in the generative AI era. The code is available at https://github.com/Breezelled/when-ads-become-profiles.


💡 Research Summary

The paper investigates a newly emerging privacy threat in which off‑the‑shelf multimodal Large Language Models (LLMs) can infer sensitive user attributes solely from the stream of advertisements shown to a user. Although recent regulations have removed explicit targeting options for sensitive categories (e.g., political affiliation, health, sexual orientation), the underlying ad‑delivery algorithms continue to adapt to users’ private traits, turning the ad stream into a high‑fidelity digital footprint.

To demonstrate the feasibility of this threat, the authors collected a longitudinal dataset from 891 Australian Facebook users, comprising over 435,000 ad impressions across more than 63,000 browsing sessions. They propose a three‑stage pipeline: (1) multimodal feature extraction, (2) session‑level inference, and (3) longitudinal user profiling. For feature extraction they employ Gemini 2.0 Flash, a state‑of‑the‑art multimodal LLM, with a custom “extraction prompt” that converts each ad’s image and text into a structured set of attributes: a caption, free‑form descriptive categories, IAB taxonomy labels, and affective indicators. This step requires no additional training data and operates in a zero‑shot fashion, allowing the processing of hundreds of thousands of ads at modest cost.
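The extraction stage's structured output can be sketched as follows. This is a minimal illustration, not the authors' code: the dataclass fields mirror the attribute set described above, and the prompt wording is an assumption standing in for the paper's custom extraction prompt (the image would be attached separately in a real multimodal API call).

```python
from dataclasses import dataclass, field

@dataclass
class AdFeatures:
    """Structured attributes extracted from one ad creative (schema per the paper)."""
    caption: str                                      # short description of the ad image
    categories: list = field(default_factory=list)    # free-form descriptive categories
    iab_labels: list = field(default_factory=list)    # IAB content-taxonomy labels
    affect: list = field(default_factory=list)        # affective indicators

def build_extraction_prompt(ad_text: str) -> str:
    """Zero-shot prompt requesting the structured fields above as JSON.

    The exact wording here is illustrative; the paper's prompt is not reproduced.
    """
    return (
        "You are shown an advertisement (image attached) with this text:\n"
        f"{ad_text}\n\n"
        "Return JSON with keys: caption, categories, iab_labels, affect."
    )

prompt = build_extraction_prompt("50% off running shoes - today only!")
```

Because the step is zero-shot, the same prompt template can be applied uniformly across hundreds of thousands of ads without any labeled training data.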

In the session‑level inference stage, the sequence of extracted features is fed back to the same LLM together with a natural‑language query (e.g., “What political party is the user likely to support?”). The model leverages its internal world knowledge and reasoning abilities to produce probabilistic predictions for each private attribute. Finally, predictions from multiple sessions are aggregated using a Bayesian updating scheme to produce a comprehensive user profile.
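The aggregation step can be illustrated with a simple Bayesian update: starting from a prior over an attribute's values (e.g. a census-based prior), each session's predicted probabilities act as a likelihood that multiplicatively updates the posterior. The sketch below shows the general scheme under those assumptions; it is not the authors' exact implementation.

```python
def bayesian_aggregate(prior, session_predictions):
    """Combine per-session probability estimates for one attribute.

    prior: dict mapping attribute value -> prior probability
    session_predictions: list of dicts, each mapping value -> P(value | session)
    Returns the normalized posterior after all updates.
    """
    posterior = dict(prior)
    for probs in session_predictions:
        for value in posterior:
            posterior[value] *= probs.get(value, 1e-9)  # missing value ~ near-zero
        total = sum(posterior.values())
        posterior = {v: p / total for v, p in posterior.items()}  # renormalize
    return posterior

# Example: two sessions whose predictions both lean toward "employed"
prior = {"employed": 0.6, "unemployed": 0.4}
sessions = [{"employed": 0.7, "unemployed": 0.3},
            {"employed": 0.8, "unemployed": 0.2}]
posterior = bayesian_aggregate(prior, sessions)  # posterior concentrates on "employed"
```

The multiplicative form means that consistent weak signals across many sessions compound into a confident profile, which is one reason longitudinal ad streams are so informative.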

Empirical results show that the LLM‑based approach consistently outperforms strong census‑based priors and matches or exceeds human annotators. For key attributes such as party preference, employment status, and education level, the method achieves F1 scores around 0.78–0.81, a 7–12 percentage‑point gain over baselines. Moreover, the cost per inference is roughly 0.03 USD, making the attack 223× cheaper and 52× faster than manual analysis. Crucially, the authors demonstrate that even short observation windows (as little as 5–10 minutes of ad exposure) yield high‑accuracy predictions, indicating that prolonged tracking is unnecessary.

The threat model assumes an adversary with minimal expertise who can harvest ad creatives via a benign‑looking browser extension (e.g., an ad blocker or coupon finder) that has permission to read page content. The extension silently collects images and text, sends them to an LLM API, and receives attribute predictions in real time. This attack bypasses platform safeguards because it exploits the platform’s own optimization logic rather than the explicit targeting tools that regulators have limited.
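The harvest-and-infer loop in this threat model can be sketched as below. Everything here is hypothetical scaffolding: `llm_predict` is a placeholder for a real LLM API call, and the payload shape is assumed for illustration only.

```python
def profile_user(harvested_ads, llm_predict):
    """Adversary-side loop: forward each harvested ad creative to an LLM API
    and accumulate the returned attribute predictions.

    `llm_predict` is a stand-in for the remote inference call; the paper
    does not publish attack code in this form.
    """
    predictions = []
    for ad in harvested_ads:
        predictions.append(llm_predict(ad))
    return predictions

# Usage with a stub standing in for the real API:
stub = lambda ad: {"source_text": ad["text"], "party_preference": "unknown"}
result = profile_user([{"text": "Vote for change"}], stub)
```

The point of the sketch is the asymmetry it exposes: the client-side collection is trivial, and all of the analytical sophistication lives in a commodity API the adversary merely calls.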

The paper highlights several implications: (1) ad streams are a potent, previously under‑appreciated source of personal data; (2) the democratization of powerful LLMs lowers the barrier for large‑scale, low‑cost privacy attacks; (3) existing regulatory frameworks that focus on “explicit targeting” may be insufficient, as they do not address inference from passive ad exposure. The authors call for new policy measures—such as treating ad exposure as personal data processing, requiring transparency around LLM‑based inference, and implementing technical mitigations like privacy‑preserving rendering of ads.

In summary, the study provides the first large‑scale empirical evidence that off‑platform profiling via multimodal LLMs is both feasible and efficient, exposing a systemic vulnerability in the modern advertising ecosystem that demands urgent attention from researchers, platform operators, and regulators.

