A Case of Collusion: A Study of the Interface Between Ad Libraries and their Apps
A growing concern with advertisement libraries on Android is their ability to exfiltrate personal information from their host applications. While previous work has looked at the libraries’ abilities to measure private information on their own, advertising libraries also include APIs through which a host application can deliberately leak private information about the user. This study considers a corpus of 114,000 apps. We reconstruct the APIs for 103 ad libraries used in the corpus, and study how the privacy leaking APIs from the top 20 ad libraries are used by the applications. Notably, we have found that app popularity correlates with privacy leakage; the marginal increase in advertising revenue, multiplied over a larger user base, seems to incentivize these app vendors to violate their users’ privacy.
💡 Research Summary
The paper investigates how Android advertising libraries can act as conduits for the deliberate transmission of personal user data from host applications. Using a corpus of 114,000 free apps downloaded from Google Play in early 2013, the authors first identified 103 distinct ad and analytics libraries by manually locating their package names. All apps were then disassembled with the Dedexer Dalvik disassembler, and every method call from an app to any of its ad libraries was extracted. By aggregating these calls the researchers reconstructed the “working API” for each library, capturing method names, parameter types, and call frequencies across all versions of the libraries present in the dataset.
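The aggregation step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' tooling: it assumes the disassembly step has already produced call records of the form (app id, library package, method name, parameter types), and the package name `com.example.ads` and the records themselves are made up.

```python
from collections import Counter, defaultdict


def reconstruct_api(calls):
    """Aggregate app-to-library call records into a per-library "working API".

    `calls` is an iterable of (app_id, library, method, param_types) tuples.
    This record format is an assumption for illustration only.
    Returns {library: Counter({(method, param_types): call_count})}.
    """
    api = defaultdict(Counter)
    for _app_id, library, method, param_types in calls:
        # A signature is the method name plus its parameter types.
        api[library][(method, tuple(param_types))] += 1
    return api


# Hypothetical call records extracted from three disassembled apps.
calls = [
    ("app1", "com.example.ads", "setGender", ["java.lang.String"]),
    ("app2", "com.example.ads", "setGender", ["java.lang.String"]),
    ("app2", "com.example.ads", "loadAd", []),
    ("app3", "com.example.ads", "loadAd", []),
]
api = reconstruct_api(calls)

for signature, count in api["com.example.ads"].most_common():
    print(signature, count)
```

Counting by full signature rather than by method name alone keeps overloads from different library versions separate, which matches the summary's note that the reconstruction spans all versions found in the dataset.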
From the 103 libraries, the top 20—accounting for 84 % of total installs—were selected for deeper analysis. The authors manually inspected each API call, using method signatures and publicly available documentation to decide whether a call could constitute a privacy risk. Calls that passed demographic or personally identifying information (e.g., age, gender, location, postal code, income) were flagged, as were generic key‑value map calls that could carry arbitrary user data. The resulting privacy‑related categories included Arbitrary Data, Keywords, Gender, Location, Age, Multiple Factors, Postal Code, Enable Location, and Income.
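The authors classified calls by manual inspection. As a rough illustration of how a first-pass triage might look before that manual review, the sketch below maps method signatures to some of the categories listed above using name patterns. The patterns, the example method names, and the rule that a `java.util.Map` parameter indicates "Arbitrary Data" are all assumptions, and the "Multiple Factors" category is left out because it needs human judgment.

```python
import re

# Ordered name heuristics (assumed, not from the paper); first match wins.
# "Enable Location" must be checked before the broader "Location" rule.
NAME_RULES = [
    ("Enable Location", re.compile(r"enable.*location|location.*enabl", re.I)),
    ("Postal Code", re.compile(r"postal|zip", re.I)),
    ("Gender", re.compile(r"gender", re.I)),
    ("Income", re.compile(r"income", re.I)),
    ("Keywords", re.compile(r"keyword", re.I)),
    ("Location", re.compile(r"location|latitude|longitude", re.I)),
    ("Age", re.compile(r"(?:^|set)age$|birth", re.I)),
]


def classify(method, param_types):
    """Return a candidate privacy category for a signature, or None."""
    for category, pattern in NAME_RULES:
        if pattern.search(method):
            return category
    # Generic key-value maps can carry arbitrary user data.
    if "java.util.Map" in param_types:
        return "Arbitrary Data"
    return None


# Hypothetical signatures for illustration.
print(classify("setGender", ["java.lang.String"]))
print(classify("setLocationEnabled", ["boolean"]))
print(classify("setExtras", ["java.util.Map"]))
print(classify("setImage", ["android.graphics.Bitmap"]))
```

A heuristic like this only produces candidates; checking each flagged call against the library's documentation, as the authors did, is still needed to confirm what the parameter actually carries.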
The study quantified how often apps invoke these privacy‑related APIs. Overall, 11 of the top 20 libraries expose at least one such call. The most common categories were “Keywords” (2.5 % of apps), “Gender” (2.03 %), and “Location” (1.64 %). Arbitrary Data calls appeared in 3.06 % of apps, reflecting the ability of developers to send any data to the ad network. By counting the number of calls per app and grouping apps by install count, the authors discovered a clear positive correlation between app popularity and privacy‑leaking behavior: apps with over one million installs make on average 0.34 privacy‑related calls, whereas low‑install apps (<10 K) make only about 0.07 calls. This suggests that the potential for higher advertising revenue incentivizes developers of popular apps to share more user data with ad networks.
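The popularity comparison amounts to bucketing apps by install count and averaging the number of privacy-related calls in each bucket. A minimal sketch follows, assuming each app is reduced to an (installs, privacy-related call count) pair. The bucket boundaries follow the two thresholds mentioned above, while the middle bucket and the sample data are made up for illustration and do not reproduce the paper's figures.

```python
from collections import defaultdict


def bucket(installs):
    """Map an install count to a coarse popularity bucket (assumed edges)."""
    if installs < 10_000:
        return "<10K"
    if installs <= 1_000_000:
        return "10K-1M"
    return ">1M"


def mean_calls_by_bucket(apps):
    """Average privacy-related calls per app within each install bucket.

    `apps` is an iterable of (installs, privacy_call_count) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [call sum, app count]
    for installs, n_calls in apps:
        entry = totals[bucket(installs)]
        entry[0] += n_calls
        entry[1] += 1
    return {label: total / count for label, (total, count) in totals.items()}


# Hypothetical data: (installs, number of privacy-related calls).
apps = [(500, 0), (5_000, 0), (50_000, 1), (2_000_000, 1), (5_000_000, 0)]
means = mean_calls_by_bucket(apps)
print(means)
```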
The paper also examines non‑code interfaces. Some libraries allow developers to embed demographic parameters directly in static XML layout files; for example, the Jumptap library permits age, gender, income, and postal code to be specified in the layout. Although static, these values can still be transmitted to the ad server without runtime user interaction. Additionally, the “Enable Location” API does not pass location data itself; it switches on the library's own location collection, so the app decides whether the library gathers that data.
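Static layout parameters of this kind could be detected by scanning decoded layout XML for demographic attributes. The sketch below assumes the layout has already been decoded to text XML (layouts inside an APK are stored in a binary form). The attribute names, the `ads` namespace, and the `com.example.ads.AdView` element are hypothetical and are not Jumptap's actual attribute names.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute names; real libraries define their own.
DEMOGRAPHIC_ATTRS = {"age", "gender", "income", "postalCode"}


def find_static_demographics(layout_xml):
    """Return (element tag, attribute, value) for demographic attributes."""
    root = ET.fromstring(layout_xml)
    hits = []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            # ElementTree expands namespaced names to "{uri}local".
            local = name.rsplit("}", 1)[-1]
            if local in DEMOGRAPHIC_ATTRS:
                hits.append((elem.tag, local, value))
    return hits


# Made-up layout for illustration.
layout = """
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
              xmlns:ads="http://example.com/ads">
  <com.example.ads.AdView android:layout_width="match_parent"
                          ads:gender="m" ads:age="30"/>
</LinearLayout>
"""
hits = find_static_demographics(layout)
print(hits)
```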
Methodological limitations are acknowledged. Certain libraries employ package‑name obfuscation (e.g., AirPush) or method‑name mangling, which can hide API calls; the authors estimate that roughly 5 % of calls may be missed due to such techniques. Moreover, the analysis only captures API calls; other communication channels such as shared memory, class‑field manipulation, or callbacks from the library to the app are not examined, meaning the reported figures represent a lower bound on privacy leakage.
In conclusion, the research demonstrates that ad libraries and host apps frequently collaborate to transmit user‑provided personal data that the libraries could not obtain via Android permissions alone. This “collusion” expands the data profile that ad networks can build, potentially enabling re‑identification of users when combined with device identifiers and system‑level data. The findings highlight a gap in Android’s permission model, which focuses on system‑level access but does not regulate what data developers voluntarily pass to third‑party SDKs. The authors suggest that platform operators, app‑store reviewers, and policymakers should consider mechanisms to audit and restrict such API‑based data flows, and that developers should be made aware of the privacy implications of using ad SDKs that expose privacy‑sensitive APIs.