Tracking and Characterizing Botnets Using Automatically Generated Domains
Modern botnets rely on domain-generation algorithms (DGAs) to build resilient command-and-control infrastructures. Recent works focus on recognizing automatically generated domains (AGDs) from DNS traffic, which could make it possible to identify previously unknown AGDs and thereby hinder or disrupt the botnets' communication capabilities. State-of-the-art approaches require the deployment of low-level DNS sensors to access data whose collection poses practical and privacy issues, which makes their adoption problematic. We propose a mechanism that overcomes these limitations by analyzing DNS traffic data through a combination of linguistic and IP-based features of suspicious domains. In this way, we are able to identify AGD names, characterize their DGAs, and isolate logical groups of domains that represent the respective botnets. Moreover, our system enriches these groups with new, previously unknown AGD names and produces novel knowledge about the evolving behavior of each tracked botnet. We used our system in real-world settings to help researchers who requested intelligence on suspicious domains, and we were able to label those domains automatically as belonging to the correct botnet. Additionally, we ran an evaluation on 1,153,516 domains, including AGDs from both modern (e.g., Bamital) and traditional (e.g., Conficker, Torpig) botnets. Our approach correctly isolated families of AGDs that belonged to distinct DGAs and separated automatically generated from non-automatically generated domains in 94.8 percent of the cases.
💡 Research Summary
The paper introduces Phoenix, a novel framework for detecting, characterizing, and tracking botnets that use domain-generation algorithms (DGAs) without requiring low-level DNS sensors or access to client IP addresses. Modern botnets often employ "domain flux": each bot runs the same DGA to generate a large, time-dependent list of candidate domain names, only a few of which are actually registered as command-and-control (C&C) servers. Detecting these automatically generated domains (AGDs) is challenging because they appear random, and prior work has relied on privileged DNS data (e.g., queries from infected hosts) that raises privacy concerns, is difficult to deploy at scale, and hampers reproducibility.
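To make domain flux concrete, here is a toy DGA written purely for illustration. It is not the algorithm of Bamital, Conficker, Torpig, or any real family; the function name `toy_dga`, the seed string, and the SHA-256-over-date construction are all assumptions. It does, however, exhibit the property described above: every bot sharing the seed derives the same candidate list for a given day, and the list changes from day to day.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 10, length: int = 12, tld: str = ".com") -> list[str]:
    """Hypothetical DGA: derive `count` pseudo-random domains from a shared seed and a date."""
    domains = []
    for i in range(count):
        # Every bot hashes the same (seed, date, index) triple, so all bots compute identical lists.
        material = f"{seed}|{day.isoformat()}|{i}".encode()
        digest = hashlib.sha256(material).digest()  # 32 bytes, so `length` must be <= 32
        # Map digest bytes onto lowercase letters to build the second-level label.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga("example-seed", date(2014, 1, 1), count=3):
        print(name)
```

A botmaster who knows the seed needs to register only one of the day's names, while a defender must anticipate all of them, which is why recognizing AGDs from the names themselves is attractive.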
Phoenix operates on publicly available passive DNS data, which consists of domain‑to‑IP mappings and DNS response codes collected above the recursive resolver level. The system is composed of three main modules:
- DGA Discovery Module: takes as input a stream of domains known to be malicious (e.g., from blacklists) and the corresponding passive DNS traffic. It builds a linguistic model of benign (human-generated) domains using features such as character frequency, n-gram distribution, vowel-consonant ratios, and length. Domains that deviate significantly from this model are flagged as AGD candidates. The module also constructs a bipartite graph of domain-IP relations and clusters domains that share IP addresses, producing initial DGA families.
- AGD Detection Module: processes live DNS traffic and, using the models learned in the discovery phase, decides whether a newly observed domain is automatically generated. For each AGD it computes the similarity to a set of DGA "fingerprints", compact representations of the generation algorithm derived from linguistic patterns and IP-distribution characteristics. The domain is then labeled with the most likely DGA type. This step works on a per-domain basis, avoiding the arbitrary group selection used in earlier approaches.
- Intelligence and Insights Module: aggregates the labeling results, tracks the evolution of each DGA family over time, and extracts actionable intelligence such as botnet migrations across autonomous systems, the emergence of new C&C domains, and changes in the volume of AGDs. The module can automatically enrich existing blacklists with newly discovered AGDs and provide alerts to analysts.
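The discovery module's grouping of domains that share IP addresses can be approximated, under simplifying assumptions, as finding connected components of the domain-IP bipartite graph: two domains fall into the same group if a chain of shared IPs links them. This is a minimal union-find sketch, not Phoenix's actual clustering algorithm; the function name `cluster_by_shared_ips` and the input format (a mapping from each domain to the set of IPs it resolved to in passive DNS) are hypothetical.

```python
from collections import defaultdict


def cluster_by_shared_ips(domain_ips: dict[str, set[str]]) -> list[set[str]]:
    """Hypothetical helper: group domains into connected components of the domain-IP graph."""
    parent = {d: d for d in domain_ips}

    def find(d: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Invert the mapping: for each IP, collect the domains that resolved to it.
    ip_to_domains: dict[str, list[str]] = defaultdict(list)
    for domain, ips in domain_ips.items():
        for ip in ips:
            ip_to_domains[ip].append(domain)

    # Any two domains sharing an IP are merged into the same component.
    for domains in ip_to_domains.values():
        for other in domains[1:]:
            union(domains[0], other)

    groups: dict[str, set[str]] = defaultdict(set)
    for d in domain_ips:
        groups[find(d)].add(d)
    return list(groups.values())
```

Plain connected components are fragile: a single shared-hosting or sinkhole IP can merge unrelated families, so a practical system would likely weight or threshold the shared-IP relation rather than accept any overlap.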
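Per-domain labeling against DGA fingerprints can be sketched as a nearest-centroid classifier. Here each fingerprint is assumed to be the mean linguistic feature vector of a family plus the set of IPs its domains resolved to; the `Fingerprint` class, `label_domain`, the 0.5 discount for shared infrastructure, and `max_distance` are illustrative choices, not parameters reported for Phoenix. A domain that is far from every fingerprint is left unlabeled (`None`) instead of being forced into a family.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fingerprint:
    """Hypothetical compact summary of one DGA family."""
    family: str
    centroid: tuple[float, ...]  # mean linguistic feature vector of the family's AGDs
    ips: frozenset[str]          # IP addresses the family's domains resolved to


def label_domain(features: tuple[float, ...], ips: set[str],
                 fingerprints: list[Fingerprint], max_distance: float = 1.0) -> Optional[str]:
    """Return the family whose fingerprint is closest to the domain, or None if none is close enough."""
    best_family, best_score = None, math.inf
    for fp in fingerprints:
        dist = math.dist(features, fp.centroid)  # Euclidean distance in feature space
        if ips & fp.ips:
            dist *= 0.5  # arbitrary assumed bonus for sharing infrastructure with the family
        if dist < best_score:
            best_family, best_score = fp.family, dist
    return best_family if best_score <= max_distance else None
```

In practice the feature dimensions would need normalizing first, since unscaled Euclidean distance lets a large-range feature such as length dominate small-range ratios.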
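Tracking botnet migrations across autonomous systems can be illustrated by reporting the first day a family appears in an AS it had not used before. The input format, `(day, family, asn)` tuples with ISO-formatted date strings so that sorting is chronological, and the function name `detect_as_migrations` are hypothetical; the sketch assumes IPs have already been mapped to AS numbers with an external data source.

```python
from collections import defaultdict


def detect_as_migrations(observations: list[tuple[str, str, int]]) -> list[tuple[str, str, int]]:
    """Hypothetical helper: return (day, family, asn) for each move of a family into a new AS."""
    seen: dict[str, set[int]] = defaultdict(set)
    alerts = []
    # ISO dates sort lexicographically in chronological order.
    for day, family, asn in sorted(observations):
        if asn not in seen[family]:
            if seen[family]:  # only alert once the family already has a hosting history
                alerts.append((day, family, asn))
            seen[family].add(asn)
    return alerts
```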
The authors address three major research gaps: (a) the reliance on low‑level DNS data that is hard to obtain and raises privacy issues; (b) the difficulty of grouping AGDs without biasing the analysis; and (c) the lack of up‑to‑date ground truth for DGAs. By using only passive DNS data, Phoenix preserves privacy, is easy to deploy at large scale, and enables repeatable experiments. Its linguistic features are designed to work on single domains, eliminating the need for random group sampling.
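Because such features operate on a single name, they can be computed without first assembling a group of domains. This is a minimal sketch assuming a bigram-frequency "normality" score learned from a corpus of benign labels, plus length, vowel ratio, and digit ratio; the function names, the toy corpus in the usage example, and the `min_normality` threshold are hypothetical and not taken from the paper.

```python
from collections import Counter


def bigrams(label: str) -> list[str]:
    """Return the overlapping character bigrams of a label."""
    return [label[i:i + 2] for i in range(len(label) - 1)]


def build_bigram_model(benign_labels: list[str]) -> Counter:
    """Count how often each bigram occurs in a corpus of benign (human-chosen) labels."""
    model: Counter = Counter()
    for label in benign_labels:
        model.update(bigrams(label.lower()))
    return model


def linguistic_features(label: str, model: Counter) -> dict[str, float]:
    """Compute features for one second-level label; no grouping of domains is required."""
    label = label.lower()
    grams = bigrams(label)
    # Mean corpus frequency of the label's bigrams: low values mean unusual character sequences.
    normality = sum(model[g] for g in grams) / len(grams) if grams else 0.0
    letters = [c for c in label if c.isalpha()]
    vowels = sum(c in "aeiou" for c in letters)
    return {
        "length": float(len(label)),
        "bigram_normality": normality,
        "vowel_ratio": vowels / len(letters) if letters else 0.0,
        "digit_ratio": sum(c.isdigit() for c in label) / len(label) if label else 0.0,
    }


def looks_generated(label: str, model: Counter, min_normality: float = 1.0) -> bool:
    """Hypothetical decision rule: flag labels whose bigrams are rare in the benign corpus."""
    return linguistic_features(label, model)["bigram_normality"] < min_normality


if __name__ == "__main__":
    corpus = ["google", "facebook", "wikipedia", "amazon", "weather", "football", "newspaper"]
    model = build_bigram_model(corpus)
    for name in ["bookface", "xkqzvbwj"]:
        print(name, looks_generated(name, model))
```

A realistic model would be trained on a large list of popular benign domains and would likely combine several n-gram orders, but the principle is the same: a label built from character sequences that rarely occur in human-chosen names receives a low score.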
Evaluation was performed on 1,153,516 domains, covering both modern DGAs (e.g., Bamital) and classic families (e.g., Conficker, Torpig). Phoenix distinguished AGDs from non-AGDs with 94.8% accuracy and correctly isolated distinct DGA families. Importantly, it labeled a set of previously unknown AGDs as belonging to Conficker, a classification that subsequent manual investigation confirmed. Because it does not depend on client IP information, the system is also robust to IP sharing (NAT) and IP reuse (DHCP).
Key contributions include:
- A privacy‑preserving, low‑overhead method for AGD detection that works with publicly available DNS feeds.
- Automatic generation of DGA fingerprints that allow rapid labeling of new malicious domains without reverse‑engineering the underlying algorithm.
- An intelligence layer that provides longitudinal insights into botnet behavior, facilitating proactive mitigation (e.g., sinkholing, blacklist updates).
In summary, Phoenix advances the state of the art by offering a scalable, repeatable, and privacy‑respectful solution for tracking DGA‑based botnets. Its ability to automatically discover and characterize new DGAs, enrich threat intelligence feeds, and monitor botnet evolution makes it a valuable tool for both academic researchers and operational security teams.