Forensic Analysis of WhatsApp Messenger on Android Smartphones
We present the forensic analysis of the artifacts left on Android devices by WhatsApp Messenger, the client of the WhatsApp instant messaging system. We provide a complete description of all the artifacts generated by WhatsApp Messenger, we discuss the decoding and the interpretation of each one of them, and we show how they can be correlated together to infer various types of information that cannot be obtained by considering each one of them in isolation. By using the results discussed in this paper, an analyst will be able to reconstruct the list of contacts and the chronology of the messages that have been exchanged by users. Furthermore, thanks to the correlation of multiple artifacts, (s)he will be able to infer information like when a specific contact has been added, to recover deleted contacts and their time of deletion, to determine which messages have been deleted, when these messages have been exchanged, and the users that exchanged them.
💡 Research Summary
The paper presents a comprehensive forensic examination of the artifacts left on Android smartphones by WhatsApp Messenger, the client application for the widely used WhatsApp instant‑messaging service. The authors aim to go beyond prior work that focused mainly on the chat database (msgstore.db) and to provide a full inventory, decoding, and correlation of every data item generated by the application.
First, the authors review related literature, noting that earlier studies either targeted iOS platforms, examined only a subset of artifacts, or concentrated on identifying the encryption algorithm. They argue that Android accounts for the majority of WhatsApp users, and that a complete artifact set (contacts database, avatar images, log files, media files, backup files, and user‑preference files) offers a richer evidential base.
The methodology relies on the YouWave virtualization platform to emulate Android 4.0.4 devices. WhatsApp version 2.11 is installed on multiple virtual machines, each assigned a role (sender, receiver, group leader, etc.) in controlled experiments covering one‑to‑one chats, group chats, multimedia exchanges, and location‑based messages. After each scenario the virtual machine is suspended, its virtual disk (VDI) is mounted with FTK Imager, and the internal files are extracted. SQLiteMan is used to parse the SQLite databases, while Notepad++ handles textual logs. This approach eliminates the risk of contaminating physical memory, ensures repeatability across different hardware, and reduces costs. The authors validate the emulated environment by comparing extracted artifacts with those reported in prior real‑device studies, confirming identical behavior.
The artifact inventory (Table 1 in the paper) lists nine categories:
- contacts database (wa.db) – SQLite, stores WhatsApp IDs (jid), profile names, status strings, avatar timestamps, and a flag indicating whether the entry corresponds to a real WhatsApp user.
- chat database (msgstore.db) – SQLite, contains every sent/received message, timestamps, delivery status, media type, and remote JID.
- encrypted backups of the chat database (msgstore.db.crypt) – stored on the SD card, requiring extraction of the 256‑bit key for AES‑CBC decryption.
- avatar image files – JPEGs named after the contact’s JID, located in internal app storage and on the external SD card.
- log files (whatsapp.log) – plain‑text logs of application events, including message deletions and connection attempts.
- received media files – stored under /WhatsApp/Media on the external storage.
- sent media files – stored under /WhatsApp/Media/Sent.
- user settings and preferences – various XML/INI‑style files in the app’s private directory.
- additional miscellaneous files (e.g., temporary caches).
Contact Analysis
The contacts database’s wa_contacts table is dissected in detail. Each row contains the JID (phone number@s.whatsapp.net), a Boolean is_whatsapp_user flag, thumb_ts (time when the user set a new avatar), photo_id_timestamp (time the avatar was downloaded locally), and profile fields (wa_name, status). By correlating thumb_ts with the file‑system timestamps of the corresponding avatar JPEG, investigators can determine precisely when a user changed their picture. The rowid and the SQLite‑generated sequence number provide a chronological ordering of contact insertions, enabling reconstruction of the moment a contact was added to the phone. Deleted contacts disappear from wa.db, but remnants may be recovered from log entries or from older backup files, allowing analysts to infer deletion times.
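As a rough illustration of the insertion-order reconstruction described above, the following sketch queries a toy in-memory stand-in for wa.db. The table and column names follow the summary; the sample rows are invented, and a real wa.db has more columns than the subset assumed here.

```python
import sqlite3


def contacts_in_insertion_order(conn):
    """Return (rowid, jid, wa_name) for WhatsApp users, oldest insertion first."""
    query = (
        "SELECT rowid, jid, wa_name FROM wa_contacts "
        "WHERE is_whatsapp_user = 1 ORDER BY rowid"
    )
    return conn.execute(query).fetchall()


# Toy stand-in for wa.db (assumption: only a subset of the real columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wa_contacts "
    "(jid TEXT, is_whatsapp_user INTEGER, wa_name TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO wa_contacts VALUES (?, ?, ?, ?)",
    [
        ("391110000001@s.whatsapp.net", 1, "Alice", "Available"),
        ("391110000002@s.whatsapp.net", 0, None, None),  # not a WhatsApp user
        ("391110000003@s.whatsapp.net", 1, "Carol", "Busy"),
    ],
)

for rowid, jid, name in contacts_in_insertion_order(conn):
    print(rowid, jid, name)
```

Note that rowid reflects insertion order only as long as SQLite has not reused identifiers, which it may do for tables declared without AUTOINCREMENT, so the ordering is best treated as one indicator among several.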
Message Analysis
The msgstore.db schema includes key_from_me (direction flag), key_remote_jid (recipient or group identifier), data (message body), timestamp (Unix epoch in milliseconds), status (0 = sent, 1 = delivered, 2 = read), and media_wa_type (0 = text, 1 = image, 2 = audio, 3 = video, etc.). By examining the status field together with timestamps, the paper demonstrates how to verify whether a message was successfully delivered or merely sent. Media messages are linked to actual files on the external storage; the presence or absence of those files, combined with the database entry, indicates whether the media was later deleted by the user.
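The decoding steps above can be sketched as follows. This is a minimal example on an invented in-memory database: the table name `messages` is an assumption, the column names are those listed in the summary, and the code tables simply mirror the values quoted above, so they should be checked against the WhatsApp version under examination.

```python
import sqlite3
from datetime import datetime, timezone

# Code tables as listed in the summary above; unknown values are kept numeric.
STATUS = {0: "sent", 1: "delivered", 2: "read"}
MEDIA_TYPE = {0: "text", 1: "image", 2: "audio", 3: "video"}


def message_timeline(conn):
    """Return decoded messages ordered by timestamp (milliseconds since epoch)."""
    rows = conn.execute(
        "SELECT key_remote_jid, key_from_me, data, timestamp, status, "
        "media_wa_type FROM messages ORDER BY timestamp"
    )
    timeline = []
    for jid, from_me, data, ts_ms, status, media in rows:
        when = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
        timeline.append({
            "when_utc": when.isoformat(),
            "direction": "outgoing" if from_me else "incoming",
            "peer": jid,
            "kind": MEDIA_TYPE.get(media, f"unknown({media})"),
            "status": STATUS.get(status, f"unknown({status})"),
            "body": data,
        })
    return timeline


# Toy stand-in for msgstore.db (assumption: table name and sample rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (key_remote_jid TEXT, key_from_me INTEGER, "
    "data TEXT, timestamp INTEGER, status INTEGER, media_wa_type INTEGER)"
)
peer = "391110000001@s.whatsapp.net"
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (peer, 0, "Hi back", 1400000060000, 0, 0),
        (peer, 1, "Hello", 1400000000000, 1, 0),
    ],
)

timeline = message_timeline(conn)
for entry in timeline:
    print(entry["when_utc"], entry["direction"], entry["status"], entry["body"])
```

Converting to timezone-aware UTC values keeps the chronology independent of the analyst's workstation settings; any local-time presentation can be applied afterwards.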
The authors also describe how to detect message deletions that are not reflected in the database. The log file records “msg deleted” events with timestamps, and the encrypted backup (msgstore.db.crypt) can be decrypted using the key stored in the preferences file. Comparing the decrypted backup with the current msgstore.db reveals messages that existed in a prior backup but are missing now, implying deletion.
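The backup comparison can be illustrated with a set difference. The sketch below assumes the backup has already been decrypted into a regular SQLite file, that messages live in a table named `messages` (an assumption), and that a tuple of the columns listed earlier is enough to identify a message; none of these choices is claimed to be the authors' exact procedure.

```python
import sqlite3

# Columns used as a composite message identity (an assumption of this sketch).
IDENTITY = "key_remote_jid, key_from_me, timestamp, data"


def load_messages(conn):
    """Load every message as a hashable tuple of its identity columns."""
    return set(conn.execute(f"SELECT {IDENTITY} FROM messages"))


def deleted_since_backup(backup_conn, current_conn):
    """Messages present in the decrypted backup but absent from the live DB."""
    missing = load_messages(backup_conn) - load_messages(current_conn)
    return sorted(missing, key=lambda row: row[2])  # order by timestamp


def make_db(rows):
    """Build a toy in-memory database holding the given message rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (key_remote_jid TEXT, key_from_me INTEGER, "
        "timestamp INTEGER, data TEXT)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", rows)
    return conn


peer = "391110000001@s.whatsapp.net"
backup = make_db([
    (peer, 1, 1400000000000, "Hello"),
    (peer, 0, 1400000030000, "delete me"),
    (peer, 1, 1400000060000, "Bye"),
])
current = make_db([
    (peer, 1, 1400000000000, "Hello"),
    (peer, 1, 1400000060000, "Bye"),
    (peer, 0, 1400000090000, "sent after the backup"),
])

deleted = deleted_since_backup(backup, current)
print(deleted)
```

A message found this way must have been removed after the backup was written, so the backup's creation time gives a lower bound on the deletion time.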
Group Chat Reconstruction
Group chats are identified by a remote JID ending with “@g.us”. The participants table (or the “group participants” field within msgstore.db) lists all members at the time of each message. By tracking additions and removals of JIDs in successive messages, the analyst can build a timeline of group membership changes, including when a user joined or left a group.
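A minimal sketch of the first step, separating group conversations from one-to-one conversations by JID suffix, is shown below. The sample JIDs and message bodies are invented, and the helper names are hypothetical; reconstructing membership changes would build on top of this split.

```python
from collections import defaultdict

GROUP_SUFFIX = "@g.us"


def is_group_jid(jid):
    """True when the remote JID denotes a group chat."""
    return jid.endswith(GROUP_SUFFIX)


def split_by_chat_type(messages):
    """Bucket (remote_jid, body) pairs into group and one-to-one chats."""
    chats = {"group": defaultdict(list), "one_to_one": defaultdict(list)}
    for remote_jid, body in messages:
        kind = "group" if is_group_jid(remote_jid) else "one_to_one"
        chats[kind][remote_jid].append(body)
    return chats


# Invented sample data for illustration only.
sample = [
    ("391110000001@s.whatsapp.net", "Hello"),
    ("391110000001-1400000000@g.us", "Welcome to the group"),
    ("391110000001-1400000000@g.us", "Thanks"),
]

chats = split_by_chat_type(sample)
for kind, conversations in chats.items():
    for jid, bodies in conversations.items():
        print(kind, jid, len(bodies))
```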
Media and Avatar Correlation
Media files are named using a combination of timestamps and hash values. The paper shows how to match a media entry in msgstore.db with its corresponding file on the SD card, verifying file integrity via SHA‑256 hashes. Avatar images are stored both internally and on the external storage; their timestamps (thumb_ts and file creation time) are cross‑checked to confirm the exact moment a user updated their picture, which can be crucial for linking a WhatsApp account to a real‑world identity.
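The hash check mentioned above might look like the following sketch, which streams a file through SHA-256 and compares the result with a reference digest. Where the reference digest comes from (for example, one recorded at acquisition time) is an assumption here, and the temporary file merely stands in for a media file.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(path, expected_hex):
    """Compare a file's SHA-256 digest with a reference hex string."""
    return sha256_of_file(path) == expected_hex.lower()


# Demo on a temporary file standing in for a media file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".jpg") as tmp:
    tmp.write(b"abc")

digest = sha256_of_file(tmp.name)
intact = matches_reference(tmp.name, digest.upper())
os.remove(tmp.name)
print(digest, intact)
```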
Settings and Preferences
The preferences directory contains XML files that record the last successful login, auto‑download settings for images, videos, and documents, and the schedule of automatic backups. By parsing these files, investigators can infer the user’s typical activity windows, whether they had enabled automatic media download (affecting the presence of media files), and when the last backup was performed—information that helps estimate the freshness of the collected evidence.
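Android applications commonly store such preferences in the SharedPreferences XML layout: a `<map>` root with typed child elements. Assuming the files in question follow that layout, a small parser could look like the sketch below. The key names in the sample (`last_backup_time`, `auto_download_images`, `push_name`) are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def parse_shared_prefs(xml_text):
    """Parse an Android SharedPreferences XML document into a dict."""
    prefs = {}
    for node in ET.fromstring(xml_text):
        name = node.get("name")
        if node.tag == "string":
            prefs[name] = node.text or ""
        elif node.tag == "boolean":
            prefs[name] = node.get("value") == "true"
        elif node.tag in ("int", "long"):
            prefs[name] = int(node.get("value"))
        elif node.tag == "float":
            prefs[name] = float(node.get("value"))
    return prefs


# Hypothetical sample; the key names are illustrative only.
SAMPLE = (
    "<map>"
    '<string name="push_name">Alice</string>'
    '<boolean name="auto_download_images" value="true" />'
    '<long name="last_backup_time" value="1400000000000" />'
    "</map>"
)

prefs = parse_shared_prefs(SAMPLE)
# Assumption: the stored value is milliseconds since the Unix epoch.
last_backup = datetime.fromtimestamp(
    prefs["last_backup_time"] / 1000, tz=timezone.utc
)
print(prefs)
print(last_backup.isoformat())
```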
Conclusions and Future Work
The study demonstrates that isolated analysis of a single artifact yields only partial insight, whereas correlating contacts, messages, logs, media, and preferences provides a holistic reconstruction of user activity. The authors emphasize that their virtual‑machine‑based acquisition method preserves data integrity and is reproducible across different research groups. Future directions include extending the methodology to newer WhatsApp versions that employ end‑to‑end encryption, investigating cloud‑based backups (e.g., Google Drive), and automating the correlation process through a dedicated forensic toolkit.
Overall, the paper offers a detailed, step‑by‑step guide for forensic practitioners to extract, decode, and interrelate all WhatsApp‑related artifacts on Android, enabling the reconstruction of contact lists, message timelines, group dynamics, media exchanges, and user settings with a high degree of confidence.