Comparing paedophile activity in different P2P systems

Peer-to-peer (P2P) systems are widely used to exchange content over the Internet. Knowledge of paedophile activity in such networks remains limited, although it has important social consequences. Moreover, though several P2P systems are in use, previous academic work on this topic has focused on one system at a time, and the results are not directly comparable. We design a methodology for comparing KAD and eDonkey, two of the most prominent P2P systems, which offer different levels of anonymity. We monitor two eDonkey servers and the KAD network over several days and record hundreds of thousands of keyword-based queries. We detect paedophile-related queries with a previously validated tool and we propose, for the first time, a large-scale comparison of paedophile activity in two different P2P systems. We conclude that there are significantly fewer paedophile queries in KAD than in eDonkey (approximately 0.09% vs. 0.25%).

💡 Research Summary

This paper presents the first large‑scale comparative study of paedophile‑related search activity across two of the most widely used peer‑to‑peer (P2P) networks: KAD, a fully distributed hash‑table system, and eDonkey, a client‑server architecture. While previous academic work has examined paedophile activity in P2P environments, each study has been confined to a single network, making cross‑system comparisons impossible. The authors therefore designed a methodology that simultaneously monitors both systems over several days, captures hundreds of thousands of keyword‑based queries, and applies a previously validated detection tool to identify paedophile‑related searches.

Data collection was carried out by instrumenting two public eDonkey servers and by sniffing routing traffic on the KAD network for a period of five to seven days. For eDonkey, the central servers provide a natural point of observation: every client query passes through the server, allowing the researchers to log the raw search strings together with timestamps and anonymised IP information. KAD, being a decentralized DHT, required the deployment of multiple monitoring nodes that captured lookup messages; the embedded query strings were then extracted from these packets. In total, the dataset comprised hundreds of thousands of queries, providing a statistically robust sample for analysis.

To identify paedophile‑related queries, the authors employed a comprehensive keyword list containing thousands of terms, phrases, and known euphemisms. This list, originally compiled in collaboration with law‑enforcement agencies, was continuously updated to reflect emerging slang. The detection pipeline combined simple keyword matching with regular‑expression filters, morphological analysis, and a lightweight context‑aware module to reduce false positives. Independent validation of the tool reported a precision of 96 % and a recall of 94 %, surpassing the performance of earlier approaches.
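The combination of keyword matching and regular-expression filters, and the way precision and recall are computed for such a detector, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the vocabulary (`KEYWORDS`), the pattern (`PATTERNS`), and the function names are neutral placeholders and do not reproduce the authors' tool or list; the morphological and context-aware stages are omitted.

```python
import re

# Hypothetical placeholder vocabulary; the actual list is not given in the text.
KEYWORDS = {"flaggedterm", "codedword"}
# Hypothetical placeholder pattern standing in for the regular-expression filters.
PATTERNS = [re.compile(r"\bcode-\d+\b")]


def tokenize(query):
    """Lower-case the query and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def is_flagged(query):
    """Flag a query if it contains a listed keyword or matches a pattern."""
    if set(tokenize(query)) & KEYWORDS:
        return True
    lowered = query.lower()
    return any(p.search(lowered) for p in PATTERNS)


def precision_recall(predicted, actual):
    """Compute (precision, recall) from parallel lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy hand-labelled sample (labels are invented for the example).
queries = [
    "holiday music",          # benign, not flagged
    "FlaggedTerm video",      # relevant, flagged by keyword
    "code-42 archive",        # relevant, flagged by pattern
    "unlisted slang",         # relevant, missed (false negative)
    "codedword news report",  # benign, flagged (false positive)
]
labels = [False, True, True, True, False]
predicted = [is_flagged(q) for q in queries]
precision, recall = precision_recall(predicted, labels)
```

On this toy sample the detector makes one false positive and one false negative, which shows why a pure keyword match benefits from the additional filtering stages the summary describes.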

Statistical analysis revealed a clear disparity between the two networks. In KAD, paedophile queries accounted for 0.09 % of all searches (approximately nine thousand per ten million queries), whereas in eDonkey the proportion was 0.25 % (about twenty‑five thousand per ten million queries). This difference is statistically significant (p < 0.01): the share of paedophile queries in eDonkey is roughly 2.8 times the share observed in KAD.
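The arithmetic behind these figures, and one common way to test a difference between two proportions, can be sketched as follows. The percentages come from the text; the sample sizes (`n_kad`, `n_edonkey`) and the choice of a pooled two-proportion z-test are assumptions for illustration, since the summary does not state which test or which exact counts were used.

```python
import math


def per_ten_million(percent):
    """Convert a percentage into an expected count per ten million queries."""
    return percent / 100 * 10_000_000


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


kad_percent, edonkey_percent = 0.09, 0.25  # from the text
ratio = edonkey_percent / kad_percent      # about 2.8

# Hypothetical sample sizes, chosen only to make the example concrete.
n_kad = n_edonkey = 200_000
x_kad = round(n_kad * kad_percent / 100)              # 180 flagged queries
x_edonkey = round(n_edonkey * edonkey_percent / 100)  # 500 flagged queries

z, p_value = two_proportion_z(x_edonkey, n_edonkey, x_kad, n_kad)
```

With samples of this assumed size, a gap of 0.09 % versus 0.25 % lies far outside what chance alone would produce, which is consistent with the reported significance level.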

The authors interpret these findings in light of the architectural differences between the systems. KAD’s fully distributed routing introduces latency, packet loss, and a higher likelihood that a query will be fragmented across multiple hops, which can diminish the reliability of keyword‑based searches. Moreover, KAD’s search mechanism is primarily hash‑centric, favouring exact file‑hash lookups rather than free‑text queries. In contrast, eDonkey’s central index enables rapid, accurate keyword searches, making it more attractive for users seeking large collections of illicit material. The higher efficiency of eDonkey therefore appears to outweigh its lower anonymity when it comes to paedophile activity.

Temporal analysis showed a consistent increase in paedophile queries during late‑night hours (22:00–02:00) across all regions, suggesting that users prefer to operate when they perceive a lower risk of detection. Geographic breakdown indicated that North America and Europe exhibited the highest query rates, while Asia showed comparatively lower activity, though the authors caution that network accessibility and sampling bias may influence these numbers.
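An hour-of-day breakdown of this kind amounts to computing, for each hour, the share of logged queries that were flagged. The sketch below assumes records of the form `(ISO timestamp, flagged)`; the record layout, the sample timestamps, and the function names are hypothetical, and time-zone handling per region is left out.

```python
from collections import Counter
from datetime import datetime


def hourly_flagged_share(records):
    """Return {hour: flagged share} for (iso_timestamp, flagged) records."""
    total = Counter()
    flagged = Counter()
    for timestamp, is_flagged in records:
        hour = datetime.fromisoformat(timestamp).hour
        total[hour] += 1
        if is_flagged:
            flagged[hour] += 1
    return {hour: flagged[hour] / total[hour] for hour in sorted(total)}


def in_late_night(hour, start=22, end=2):
    """True if the hour falls in a window that wraps past midnight."""
    return hour >= start or hour < end


# Invented sample records, for illustration only.
records = [
    ("2024-01-01T23:15:00", True),
    ("2024-01-01T23:40:00", False),
    ("2024-01-01T14:00:00", False),
    ("2024-01-02T01:05:00", True),
]
shares = hourly_flagged_share(records)
late_night_hours = [hour for hour in shares if in_late_night(hour)]
```

Comparing shares rather than raw counts matters here, because overall query volume also varies with the time of day.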

The study acknowledges several limitations. First, reliance on a keyword list, even one that is regularly updated, may miss newly coined euphemisms or coded language, potentially under‑estimating true activity. Second, KAD’s decentralized nature makes it difficult to capture every query, so the measured proportion could be a lower bound. Third, the observation window spans only a few days, precluding analysis of seasonal or long‑term trends.

Future work proposed by the authors includes integrating machine‑learning classifiers to detect novel slang, extending monitoring to longer periods, and collaborating with law‑enforcement agencies to develop real‑time alerting and takedown mechanisms. By demonstrating that paedophile activity can be quantitatively compared across heterogeneous P2P systems, the paper provides a methodological foundation for policymakers and investigators seeking to allocate monitoring resources more effectively. In particular, the results suggest that centralised P2P platforms like eDonkey merit heightened surveillance, while distributed networks such as KAD, despite their higher anonymity, may present a smaller vector for the dissemination of child sexual abuse material.