An In-depth Analysis of Spam and Spammers

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Electronic mail services have become an important source of communication for millions of people all over the world. Due to this tremendous growth, there has been a significant increase in spam traffic. Spam clutters users' inboxes, consumes network resources, and spreads worms and viruses. In this paper we study the characteristics of spam and the technology used by spammers. Because spammers change their mode of operation to counter anti-spam technology, continuous evaluation of the characteristics of spam and of spammers' technology has become mandatory. These evaluations help us enhance existing anti-spam technology and thereby combat spam effectively. To characterize spam, we collected four hundred thousand spam mails from a corporate mail server over a period of 14 months, from January 2006 to February 2007. For analysis, we classified spam based on attachments and contents. We observed that spammers use software tools to send spam with attachments. The main features of this software are hiding the sender's identity, randomly selecting text messages, identifying open relay machines, mass-mailing capability, and defining the spamming duration. Spammers do not use spam software to send spam without attachments. Our study shows that the four-year-old email accounts of heavy users attract more spam than the four-year-old accounts of light users. Relatively new email accounts, which are 14 months old, do not receive spam. However, in some special cases such as DDoS attacks, we found that new email accounts receive spam, and that the 14-month-old accounts of heavy users attracted more spam than the 14-month-old accounts of light users. We believe that this analysis could be useful for developing more efficient anti-spam techniques.


💡 Research Summary

The paper presents a comprehensive study of spam characteristics and spammers’ technologies based on a large‑scale data set collected from a corporate mail server over a fourteen‑month period (January 2006 – February 2007). A dedicated spam trap captured more than 400,000 spam messages, which were separated from legitimate traffic using a suite of conventional anti‑spam mechanisms: Bayesian filtering, DNS‑based blacklists (DNSBL, SURBL), Sender Policy Framework (SPF), greylisting, reverse DNS checks, and a content filter trained on a 50,000‑word dictionary.
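Greylisting, one of the filters listed above, checks delivery behaviour rather than content: the first attempt from an unknown (client IP, sender, recipient) triplet is temporarily rejected, and a later retry is accepted, on the premise that legitimate mail servers queue and retry while many bulk-mailing tools do not. The following is a minimal sketch of that general technique, assuming an in-memory store and a 300-second retry delay; the `Greylist` class, its `check` method and the delay value are hypothetical illustrations, not the configuration used in the study.

```python
import time
from typing import Callable, Dict, Tuple

# Greylisting key: (client IP, envelope sender, envelope recipient).
Triplet = Tuple[str, str, str]


class Greylist:
    """Hypothetical in-memory greylist, for illustration only."""

    def __init__(self, min_delay: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.min_delay = min_delay          # assumed retry delay in seconds
        self.clock = clock                  # injectable so it can be tested
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, ip: str, sender: str, recipient: str) -> int:
        """Return an SMTP reply code: 451 (try again later) or 250 (accept)."""
        key = (ip, sender.lower(), recipient.lower())
        now = self.clock()
        first = self.first_seen.get(key)
        if first is None:
            # First attempt from this triplet: record it and defer delivery.
            self.first_seen[key] = now
            return 451
        if now - first < self.min_delay:
            # Retried before the delay elapsed: keep deferring.
            return 451
        # The sender retried after the delay, as a queueing MTA would: accept.
        return 250


if __name__ == "__main__":
    fake_now = [0.0]
    gl = Greylist(min_delay=300, clock=lambda: fake_now[0])
    print(gl.check("192.0.2.1", "a@example.com", "b@example.org"))  # 451
    fake_now[0] = 400.0
    print(gl.check("192.0.2.1", "a@example.com", "b@example.org"))  # 250
```

Production greylisting daemons also expire old entries and whitelist known senders; this sketch omits both.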

The authors first classify spam into two broad categories: (1) spam with attachments and (2) spam without attachments. This binary split reveals that the two groups are generated by distinct infrastructures and exhibit different traffic patterns. Spam with attachments accounts for roughly half of the total volume and is further divided into four sub‑categories: (a) image (GIF/JPG) plus plain text, (b) image plus text plus URL, (c) image plus URL only, and (d) executable files (.exe). Image attachments range from 5 KB to 45 KB, while executable attachments vary between 35 KB and 180 KB. The executable‑laden spam often carries viruses, worms, or trojans and is used to launch distributed denial‑of‑service (DDoS) attacks by turning infected hosts into mail‑bomb generators.
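The four attachment sub-categories described above can be assigned mechanically from a message's MIME structure. The sketch below shows one hypothetical way to do it with Python's standard `email` package; the function name `classify_attachment_spam`, the crude URL regular expression, the GIF/JPEG type set and the rule that an executable is recognised by its `.exe` filename are all assumptions, not the authors' classifier.

```python
import re
from email.message import EmailMessage
from typing import Optional

# Crude URL pattern (assumption): catches http(s):// and www. links only.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
IMAGE_TYPES = {"image/gif", "image/jpeg"}


def classify_attachment_spam(msg: EmailMessage) -> Optional[str]:
    """Map a message to an attachment sub-category, or None if it carries
    no image or executable attachment (hypothetical helper)."""
    has_image = has_exe = False
    body_parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers hold no content themselves
        ctype = part.get_content_type()
        filename = (part.get_filename() or "").lower()
        if filename.endswith(".exe"):
            has_exe = True
        elif ctype in IMAGE_TYPES:
            has_image = True
        elif ctype == "text/plain" and part.get_content_disposition() != "attachment":
            body_parts.append(part.get_content())
    if has_exe:
        return "executable"
    if not has_image:
        return None
    body = "".join(body_parts)
    has_url = URL_RE.search(body) is not None
    # Any non-whitespace text left after removing URLs counts as "text".
    has_text = bool(URL_RE.sub("", body).strip())
    if has_url:
        return "image+text+url" if has_text else "image+url"
    return "image+text" if has_text else "image-only"
```

A message with an image attachment and no body text at all falls outside the four categories and is reported as `image-only`.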

Spam without attachments is split into pure text messages and text with a clickable URL/link. These messages are typically 2–3 KB in size. Pure‑text spam is largely scam‑oriented, originates from fabricated or non‑existent sender addresses, and often targets users in specific regions (e.g., African senders using Japanese domains). Text‑plus‑URL spam, which includes pharmacy, financial, and advertising content, appears at roughly twice the frequency of pure‑text spam.
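For attachment-free spam, the split between pure text and text plus URL (and the roughly 2:1 ratio reported above) can be tallied directly from raw messages. This is a hedged sketch assuming RFC 5322 message bytes as input; `tally_plain_spam`, `url_to_text_ratio` and the URL pattern are hypothetical, and regex-based URL detection will miss obfuscated links.

```python
import re
from collections import Counter
from email import message_from_bytes, policy
from typing import Iterable

# Crude URL pattern (assumption): catches http(s):// and www. links only.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def tally_plain_spam(raw_messages: Iterable[bytes]) -> Counter:
    """Count attachment-free spam as 'text' or 'text+url' (hypothetical)."""
    counts: Counter = Counter()
    for raw in raw_messages:
        msg = message_from_bytes(raw, policy=policy.default)
        if any(part.is_attachment() for part in msg.walk()):
            counts["has-attachment"] += 1  # belongs to the attachment group
            continue
        body = msg.get_body(preferencelist=("plain",))
        text = body.get_content() if body is not None else ""
        counts["text+url" if URL_RE.search(text) else "text"] += 1
    return counts


def url_to_text_ratio(counts: Counter) -> float:
    """Ratio of text+URL spam to pure-text spam (inf if no pure-text spam)."""
    if counts["text"] == 0:
        return float("inf")
    return counts["text+url"] / counts["text"]
```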

A key contribution of the study is the analysis of the software tools employed by spammers to generate attachment‑based spam. The authors identify five core capabilities of these tools: (i) hiding the sender’s identity through IP spoofing and open‑relay discovery, (ii) randomizing body text and URLs to evade content filters, (iii) probing for open relays to use as forwarding nodes, (iv) mass‑mailing capacity via parallel connections, and (v) defining a spamming duration to sustain campaigns. The paper notes that these tools are used to send attachment‑laden spam, whereas spammers do not use such software for attachment‑free spam, which is instead dispatched via simpler scripts or manual methods.
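Randomized body text, capability (ii), defeats filters that fingerprint a message with an exact hash, because any changed word produces a different digest. A common countermeasure, not one the paper describes, is near-duplicate detection with word shingles and Jaccard similarity. The sketch below assumes 3-word shingles and a 0.5 similarity threshold; both values and all function names are hypothetical.

```python
import hashlib
from typing import Set, Tuple


def exact_fingerprint(text: str) -> str:
    """SHA-256 of the raw text: changes if even one word changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Set of contiguous k-word sequences, case-folded."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set, b: Set) -> float:
    """len(A & B) / len(A | B), defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate(a: str, b: str, threshold: float = 0.5, k: int = 3) -> bool:
    """Hypothetical near-duplicate test with an assumed threshold."""
    return jaccard(shingles(a, k), shingles(b, k)) >= threshold


if __name__ == "__main__":
    template = "limited offer buy cheap watches online today at our store with free shipping"
    v1 = template + " alpha bravo"    # stand-ins for randomly chosen filler words
    v2 = template + " charlie delta"
    print(exact_fingerprint(v1) == exact_fingerprint(v2))  # False
    print(near_duplicate(v1, v2))                          # True
```

The two variants share 11 of their 15 distinct shingles (similarity of about 0.73), so they are flagged as near-duplicates even though their exact digests differ.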

The investigation also examines how user account characteristics influence spam reception. Accounts that have been active for four years and exhibit high sending/receiving volume (“heavy users”) receive on average two to three times more spam than “light users” of the same age. Conversely, relatively new accounts (14 months old) receive negligible spam under normal conditions, but during DDoS incidents they can become targets, and heavy‑user accounts of the same age still attract significantly more spam than their light‑user counterparts. This suggests that spammers prioritize long‑standing, high‑activity addresses—likely because such addresses have been harvested over time and are known to be responsive.
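The heavy-versus-light comparison reduces to a ratio of group means for each account age. A minimal sketch, assuming each account is already labelled heavy or light (the summary does not give the cutoff the authors used); the `Account` record and both helper functions are hypothetical, and any numbers fed to them here are illustrative only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Account:
    """Hypothetical per-account record; `heavy` is assumed pre-labelled."""
    age_months: int
    heavy: bool
    spam_received: int


def mean_spam_by_group(accounts: Iterable[Account]) -> Dict[Tuple[int, str], float]:
    """Mean spam count per (account age, 'heavy' or 'light') group."""
    groups: Dict[Tuple[int, str], List[int]] = {}
    for acct in accounts:
        key = (acct.age_months, "heavy" if acct.heavy else "light")
        groups.setdefault(key, []).append(acct.spam_received)
    return {key: mean(values) for key, values in groups.items()}


def heavy_to_light_ratio(means: Dict[Tuple[int, str], float],
                         age_months: int) -> Optional[float]:
    """Heavy/light mean-spam ratio for one age, or None if undefined."""
    heavy = means.get((age_months, "heavy"))
    light = means.get((age_months, "light"))
    if heavy is None or not light:
        return None  # a group is missing, or light users received no spam
    return heavy / light
```

With invented toy input (two heavy 48-month accounts receiving 300 and 500 spam messages, two light ones receiving 150 and 170), the ratio is 2.5; for an age group where light users receive no spam, the ratio is reported as undefined rather than infinite.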

Statistical analysis of daily traffic shows that spam volume is independent of legitimate mail flow. Over a two‑week observation window, legitimate mail averaged 906 messages per day (range 720–7,253), while spam averaged 4,736 messages per day (range 1,701–8,615). Spam containing viruses as attachments averaged 403 per day (range 209–541). Separate daily averages for attachment‑based spam (1,872) and attachment‑free spam (2,722) further illustrate distinct generation sources.

Based on these findings, the authors argue that anti‑spam defenses must evolve beyond static blacklists. They recommend a multi‑layered approach: (1) OCR‑based analysis of image‑based spam to extract embedded text, (2) sandbox execution of executable attachments to detect malicious behavior before delivery, (3) stricter authentication and rate‑limiting for accounts identified as heavy users, and (4) continuous monitoring of spammers’ toolkits to quickly adapt to new evasion techniques. The paper concludes that a dynamic, content‑aware, and user‑profile‑aware defense strategy is essential for effectively mitigating the evolving spam threat landscape.
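Recommendation (3) mentions rate-limiting without specifying an algorithm. A token bucket is one widely used option: each key may send a short burst, after which it is held to an average rate. The sketch below keeps one bucket per account key and takes an injectable clock so it can be tested; `AccountRateLimiter`, its capacity and its refill rate are hypothetical and not taken from the paper.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float     # tokens currently available
    last: float       # clock reading at the last update


class AccountRateLimiter:
    """Hypothetical per-account token-bucket limiter.

    Each account may burst up to `capacity` messages, then is limited to
    `refill_per_sec` messages per second on average.
    """

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.buckets: Dict[str, _Bucket] = {}

    def allow(self, account: str) -> bool:
        """Consume one token for `account`; False means throttle the message."""
        now = self.clock()
        bucket = self.buckets.get(account)
        if bucket is None:
            # New accounts start with a full bucket.
            bucket = self.buckets[account] = _Bucket(self.capacity, now)
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - bucket.last
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.refill_per_sec)
        bucket.last = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False
```

The limiter only enforces a rate; which accounts count as heavy users, and whether inbound or outbound mail is limited, is a policy decision the summary leaves open.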

