XSS-FP: Browser Fingerprinting using HTML Parser Quirks


There are many scenarios in which inferring the type of a client browser is desirable, for instance to fight against session stealing. This is known as browser fingerprinting. This paper presents and evaluates a novel fingerprinting technique to determine the exact nature (browser type and version, e.g., Firefox 15) of a web browser, exploiting HTML parser quirks exercised through XSS. Our experiments show that the exact version of a web browser can be determined with 71% accuracy, and that only 6 tests are sufficient to quickly determine the exact family a web browser belongs to.


💡 Research Summary

The paper addresses the problem of accurately identifying a client’s web browser—a task commonly referred to as browser fingerprinting—by exploiting subtle differences in how browsers parse HTML. Traditional fingerprinting methods rely on easily spoofed data such as the User‑Agent string, cookie attributes, or JavaScript‑exposed APIs (e.g., navigator). The authors propose a novel approach that leverages “parser quirks,” which are non‑standard behaviors that arise from the implementation details of each browser’s HTML parsing engine.

The core idea is to embed specially crafted XSS (cross‑site scripting) payloads that contain malformed or edge‑case HTML markup. When a browser renders the page, its parser will handle the malformed markup in a way that is characteristic of its specific version and engine. By executing a small amount of JavaScript after the page loads, the attacker can observe the resulting DOM structure, innerHTML values, CSS application, event‑handler behavior, and other side‑effects. These observations are encoded as a binary feature vector.
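As a rough illustration of the encoding step described above, the sketch below maps per-probe observations to a fixed-order binary vector. The probe names and the `encode_observations` helper are hypothetical; the paper's actual test identifiers and encoding details are not given in this summary.

```python
from typing import Dict, List

# Hypothetical probe names, invented for illustration only.
PROBE_ORDER: List[str] = [
    "unclosed_tag_executes",
    "nested_form_survives",
    "entity_without_semicolon_decoded",
    "null_byte_in_tag_ignored",
]


def encode_observations(
    observed: Dict[str, bool], probe_order: List[str] = PROBE_ORDER
) -> List[int]:
    """Map per-probe outcomes to a fixed-order 0/1 vector.

    A probe with no recorded outcome is treated as 0 (quirk not observed).
    """
    return [1 if observed.get(name, False) else 0 for name in probe_order]


# Example: two of the four hypothetical quirks were observed.
example = {
    "unclosed_tag_executes": True,
    "entity_without_semicolon_decoded": True,
}
vector = encode_observations(example)
print(vector)  # [1, 0, 1, 0]
```

Keeping the probe order fixed is what makes vectors from different clients comparable position by position.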

To evaluate the method, the authors assembled a test suite of 120 distinct XSS‑based HTML snippets. Each snippet combines 1–2 JavaScript statements with 3–5 lines of intentionally non‑conforming HTML (e.g., improperly nested tags, missing closing tags, ambiguous entity encodings, CSS selector conflicts). The suite was run against a corpus of 30 modern browsers covering major families (Chrome, Firefox, Safari, Edge, Opera) and a range of versions, yielding a total of 120 × 30 = 3600 individual runs. The resulting feature vectors were fed into standard multi‑class classifiers—logistic regression and random forest—trained to predict both the browser family and the exact version.
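To make the classification step concrete, here is a minimal sketch that matches an observed vector to the closest known signature by Hamming distance. This is a deliberately simpler stand-in for the logistic regression and random forest classifiers mentioned above, not the authors' pipeline; the signature bits and the `classify` helper are invented for illustration.

```python
from typing import Dict, List, Tuple


def hamming(a: List[int], b: List[int]) -> int:
    """Count the positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Invented signatures: the bit patterns are NOT measured data.
SIGNATURES: Dict[str, List[int]] = {
    "Firefox 15": [1, 0, 1, 0, 1, 1],
    "Chrome (some version)": [0, 1, 1, 0, 0, 1],
    "Opera (some version)": [1, 1, 0, 1, 0, 0],
}


def classify(
    vector: List[int], signatures: Dict[str, List[int]] = SIGNATURES
) -> Tuple[str, int]:
    """Return the label of the nearest signature and its distance."""
    return min(
        ((label, hamming(vector, sig)) for label, sig in signatures.items()),
        key=lambda pair: pair[1],
    )


label, distance = classify([1, 0, 1, 0, 0, 1])
print(label, distance)  # Firefox 15 1
```

A nearest-signature match tolerates a few flipped bits, which is useful when a probe behaves inconsistently; a trained classifier would instead learn which probes matter most.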

When all 120 tests were used, the system achieved a family‑level identification accuracy of 99.3 % and an exact‑version accuracy of 71 %. Recognizing that real‑world deployments cannot afford to execute a large number of tests, the authors applied an information‑theoretic selection process (maximizing entropy reduction) to identify the most discriminative subset of tests. Remarkably, a set of just six carefully chosen XSS snippets retained a family‑level accuracy of 98.7 % and only modestly reduced version accuracy to 65 %. This demonstrates that a very small number of well‑designed parser‑quirk probes can provide near‑perfect discrimination of browser families while still offering useful version granularity.
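One plausible reading of "maximizing entropy reduction" is a greedy search that repeatedly adds the test leaving the least uncertainty about the browser label. The sketch below implements that reading on a tiny invented data set; the function names, the greedy strategy, and the data are assumptions, since the summary does not specify the exact selection procedure.

```python
import math
from collections import Counter, defaultdict
from typing import List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(labels).values()
    )


def conditional_entropy(
    rows: Sequence[Sequence[int]], labels: Sequence[str], tests: Sequence[int]
) -> float:
    """H(label | outcomes of the given tests)."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[tuple(row[t] for t in tests)].append(label)
    total = len(labels)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def select_tests(
    rows: Sequence[Sequence[int]], labels: Sequence[str], budget: int
) -> List[int]:
    """Greedily pick up to `budget` test indices that minimize remaining entropy."""
    chosen: List[int] = []
    remaining = set(range(len(rows[0])))
    while remaining and len(chosen) < budget:
        # Ties are broken in favor of the lowest test index.
        best = min(
            sorted(remaining),
            key=lambda t: conditional_entropy(rows, labels, chosen + [t]),
        )
        chosen.append(best)
        remaining.remove(best)
        if conditional_entropy(rows, labels, chosen) == 0.0:
            break  # every label is already uniquely identified
    return chosen


# Invented data: one row per browser, one column per test outcome.
rows = [[0, 0, 1, 0], [0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 1, 0]]
labels = ["A", "B", "C", "D"]
print(select_tests(rows, labels, budget=3))  # [0, 1]
```

In this toy example two tests already separate all four labels, so the search stops before using its full budget; test 2 is never picked because it returns the same outcome for every browser.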

The paper also discusses security implications. From an attacker’s perspective, knowing the exact browser version enables the crafting of highly targeted XSS payloads that exploit version‑specific bugs or bypass existing defenses. Conversely, defenders can turn the technique around: a web‑application firewall (WAF) or client‑side monitoring agent could issue the same low‑overhead probes as a “challenge‑response” mechanism, flagging any client whose parsing behavior does not match what its claimed browser should produce as potentially malicious or as an automated scanner.
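The defensive use can be sketched as a consistency check between the browser family a client claims and the family its parsing behavior suggests. Everything below is hypothetical: the expected vectors are invented, and the substring match on the User-Agent header is a naive placeholder for real header parsing.

```python
from typing import Dict, List, Optional

# Invented expected probe outcomes per family (not measured data).
EXPECTED: Dict[str, List[int]] = {
    "firefox": [1, 0, 1],
    "chrome": [0, 1, 1],
}


def claimed_family(user_agent: str) -> Optional[str]:
    """Naively extract the claimed family from a User-Agent string."""
    ua = user_agent.lower()
    for family in EXPECTED:
        if family in ua:
            return family
    return None


def is_suspicious(user_agent: str, observed: List[int]) -> bool:
    """Flag clients whose observed quirks contradict their claimed family."""
    family = claimed_family(user_agent)
    if family is None:
        return True  # unknown claim: treat as suspicious in this sketch
    return observed != EXPECTED[family]


ua = "Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0"
print(is_suspicious(ua, [1, 0, 1]))  # False
print(is_suspicious(ua, [0, 1, 1]))  # True
```

A production check would likely tolerate small deviations rather than require an exact match, for example by thresholding a distance to the expected vector.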

Limitations are acknowledged. The method depends on the persistence of parser quirks; as browsers converge toward stricter standards compliance or as new rendering engines replace legacy ones, the discriminative power of existing tests may diminish. The study focused primarily on desktop browsers; mobile browsers, embedded WebViews, and older legacy versions were not extensively evaluated. Moreover, while 71 % version accuracy is a substantial improvement over the roughly 30‑40 % achievable with User‑Agent analysis, it is still insufficient for scenarios that require precise patch‑level identification.

Future work suggested includes expanding the test suite to cover a broader spectrum of devices (smartphones, tablets, IoT browsers), incorporating additional side‑channels such as WebAssembly compilation differences, and developing lightweight client‑side scripts that can perform the six‑test probe with minimal performance impact. The authors also propose integrating the technique into adaptive security frameworks that dynamically adjust content sanitization policies based on the identified browser’s capabilities.

In summary, the paper introduces a practical, high‑accuracy method for browser fingerprinting that exploits HTML parser quirks via XSS payloads. By demonstrating that only a handful of carefully designed tests are sufficient to distinguish browser families with near‑perfect reliability and to infer exact versions with respectable accuracy, the work opens a new avenue for both offensive reconnaissance and defensive hardening in web security.

