Invisible Trails? An Identity Alignment Scheme based on Online Tracking


Many tracking companies collect user data and sell it to data markets and advertisers. While they claim to protect user privacy by anonymizing the data, our research reveals that significant privacy risks persist even with anonymized data. Attackers can exploit this data to identify users’ accounts on other websites and perform targeted identity alignment. In this paper, we propose an effective identity alignment scheme for accurately identifying targeted users. We develop a data collector to obtain the necessary datasets, an algorithm for identity alignment, and, based on this, construct two types of de-anonymization attacks: the passive attack, which analyzes tracker data to align identities, and the active attack, which induces users to interact online, leading to higher success rates. Furthermore, we introduce, for the first time, a novel evaluation framework for online tracking-based identity alignment. We investigate the key factors influencing the effectiveness of identity alignment. Additionally, we provide an independent assessment of our generated dataset and present a fully functional system prototype applied to a cryptocurrency use case.


💡 Research Summary

The paper “Invisible Trails? An Identity Alignment Scheme based on Online Tracking” investigates the privacy risks inherent in data collected by third‑party web trackers, even when that data is claimed to be anonymized. The authors demonstrate that an adversary who possesses a user’s account on a known “source” website (SiteA) and can obtain anonymized behavioral logs from a tracking service (SiteT) can reliably infer the user’s account on an unknown “target” website (SiteB).

The work is organized around six technical challenges (TC‑I to TC‑VI). TC‑I and TC‑II concern the scarcity and heterogeneity of publicly available datasets, which makes it difficult to collect dynamic, cross‑domain user behavior. TC‑III highlights the unrealistic assumption that users always provide truthful profile information. TC‑IV points out the weak cross‑border social graph connections that impede traditional friend‑based matching. TC‑V notes the low accuracy of existing probabilistic alignment methods, and TC‑VI observes the lack of a standardized evaluation framework.

To address these challenges, the authors design three core modules:

  1. Data Collector – a hybrid system combining a web crawler (for public profiles, posts, friend lists) and a tracker component (for anonymous timestamps, domain visits, page‑view sequences). The collector normalizes raw logs using Algorithm 1, fills missing entries, and produces a unified dataset that contains both anonymous behavior traces and identified public attributes.

  2. Identity Alignment Scheme – a multi‑modal matching algorithm (Algorithms 2 and 3). Each user is represented by a feature vector that fuses (a) behavioral characteristics (time‑of‑day distribution, inter‑site transition patterns, browsing‑to‑posting ratios) and (b) content characteristics (lexical n‑grams, sentiment scores, topic‑model embeddings). A weighted similarity score is computed, followed by a graph‑based reinforcement step that aligns social‑graph edges across sites. The scheme can handle single‑account, multi‑account, cross‑border, and cross‑device scenarios, and it explicitly mitigates the impact of false or fabricated profile data.

  3. Attack Methods – two de‑anonymization strategies are built on top of the alignment scheme.

    • Passive Attack uses only the collected logs and public data to generate a minimal candidate set of target accounts.
    • Active Attack augments the passive approach by deliberately engaging the target (e.g., posting tailored content that matches the target’s interests) to increase the volume and diversity of the victim’s behavioral traces. Experiments show that the active attack improves identification success by roughly 18 percentage points and reduces the time‑to‑identify by a factor of 2.3 compared with the passive attack.
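
The weighted similarity step of the alignment scheme can be illustrated with a small sketch. This is not the paper's Algorithm 2 or 3, which the summary does not reproduce. It assumes one behavioral feature (a 24-bin time-of-day histogram compared by cosine similarity) and one content feature (word bigrams compared by Jaccard overlap), fused with a single weight. The function names, the dictionary layout of a user record, and the default weight `w_behavior=0.5` are all hypothetical.

```python
import math


def hour_histogram(timestamps):
    """Normalized 24-bin time-of-day distribution from Unix timestamps (UTC)."""
    counts = [0.0] * 24
    for ts in timestamps:
        counts[(int(ts) // 3600) % 24] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def word_bigrams(text):
    """Set of lowercase word bigrams, a stand-in for lexical n-gram features."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def alignment_score(src, cand, w_behavior=0.5):
    """Weighted fusion of behavioral and content similarity (assumed weighting)."""
    behavior = cosine(hour_histogram(src["timestamps"]),
                      hour_histogram(cand["timestamps"]))
    content = jaccard(word_bigrams(src["text"]), word_bigrams(cand["text"]))
    return w_behavior * behavior + (1.0 - w_behavior) * content


def rank_candidates(src, candidates, top_k=3):
    """Return the names of the top_k candidate accounts, best score first."""
    scored = [(alignment_score(src, c), name) for name, c in candidates.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]
```

In this sketch, `rank_candidates` plays the role of the passive attack's candidate set: it only narrows the target accounts down to a short list. The graph-based reinforcement step described above is not modeled here.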

A novel evaluation framework is introduced to overcome TC‑VI. Three bespoke metrics are defined: Alignment Success Rate (the proportion of correctly matched target accounts), F1‑Alignment (harmonic mean of precision and recall specific to alignment tasks), and Time‑to‑Identify (average computational and data‑collection latency). Using a large‑scale dataset of over two million records from five social networks and three tracking platforms, the authors conduct extensive experiments across multiple scenarios. Key findings include: (i) a ten‑fold increase in data volume yields a 12 % improvement in success rate; (ii) combining behavioral and content features outperforms either modality alone by 18 %; (iii) the active attack consistently outperforms the passive attack in both accuracy and speed.
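
As a rough illustration of how the three metrics could be computed, the sketch below assumes each source account has exactly one true target account and that the attacker may decline to emit a match (`None`). Under that assumption, precision is taken over emitted matches and recall over all source accounts, so recall coincides with the Alignment Success Rate. These definitions and the function names are assumptions made for this example; the summary does not give the paper's exact formulas.

```python
def alignment_metrics(predicted, truth):
    """Compute success rate, precision, recall, and F1 for an alignment run.

    predicted: dict mapping source account -> predicted target account,
               or None when no match was emitted.
    truth:     dict mapping source account -> true target account.
    """
    correct = sum(1 for s, t in truth.items() if predicted.get(s) == t)
    emitted = sum(1 for s in truth if predicted.get(s) is not None)

    success_rate = correct / len(truth) if truth else 0.0
    precision = correct / emitted if emitted else 0.0
    recall = success_rate  # one true target per source account (assumed)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"success_rate": success_rate, "precision": precision,
            "recall": recall, "f1": f1}


def mean_time_to_identify(latencies_seconds):
    """Average latency over successful identifications (0.0 if there are none)."""
    if not latencies_seconds:
        return 0.0
    return sum(latencies_seconds) / len(latencies_seconds)
```

For example, with four source accounts, three emitted matches, and two of them correct, this gives a success rate of 0.5, a precision of 2/3, and an F1 of 4/7.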

Two practical applications are demonstrated. First, a cryptocurrency‑crime tracing system links suspicious blockchain addresses to real‑world identities by aligning anonymous tracker logs with identified accounts on domestic platforms, thereby exposing the physical location of crypto offenders. Second, a Tor‑network de‑anonymization prototype shows that even users employing onion routing can be profiled by correlating their timing and site‑access patterns captured by trackers. Both prototypes achieve over 92 % alignment success in real‑world tests.
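
The summary does not say how timing patterns are correlated in these prototypes. One simple way to picture the idea, offered only as an assumption, is to measure how often an identified account's public events (such as posts) fall close in time to visits recorded in an anonymous tracker trace. The function name `timing_overlap` and the 60-second default window are hypothetical.

```python
from bisect import bisect_left


def timing_overlap(tracker_visits, account_events, window=60.0):
    """Fraction of account events with a tracker visit within `window` seconds.

    tracker_visits: timestamps (seconds) from one anonymous tracker trace.
    account_events: timestamps (seconds) of public events on one known account.
    """
    visits = sorted(tracker_visits)
    if not visits or not account_events:
        return 0.0
    hits = 0
    for t in account_events:
        i = bisect_left(visits, t)
        # Only the visits immediately before and after t can be the nearest ones.
        neighbors = visits[max(i - 1, 0): i + 1]
        if any(abs(t - v) <= window for v in neighbors):
            hits += 1
    return hits / len(account_events)
```

A score near 1.0 means almost every public event has a nearby tracker visit. A realistic system would also need to account for clock skew, busy periods in which many users are active at once, and the chance level of overlap, none of which this sketch handles.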

In conclusion, the paper provides a comprehensive end‑to‑end framework (data collection, alignment algorithm, attack strategies, and evaluation) and uses it to show that anonymized tracking data can be weaponized for high‑precision identity alignment. The findings call into question the adequacy of current anonymization practices (generalization, differential privacy) and suggest that stronger data‑minimization, transparency, and regulatory safeguards are required to protect user privacy in the era of pervasive online tracking.

