CovAgent: Overcoming the 30% Curse of Mobile Application Coverage with Agentic AI and Dynamic Instrumentation


Automated GUI testing is crucial for ensuring the quality and reliability of Android apps. However, the efficacy of existing UI testing techniques is often limited, especially in terms of coverage: recent approaches, including the state of the art, struggle to achieve more than 30% activity coverage on real-world apps. This limited coverage stems from a combination of factors, such as failure to generate complex user inputs, unsatisfied activation conditions involving device configurations and external resources, and hard-to-reach code paths that are not easily accessible through the GUI. To overcome these limitations, we propose CovAgent, a novel agentic AI-powered approach to enhancing Android app UI testing. Our fuzzer-agnostic framework comprises an AI agent that inspects the app’s decompiled Smali code and component transition graph and reasons about the unsatisfied activation conditions in the app’s code logic that keep standard, widely adopted GUI fuzzers from reaching certain activities. A second agent then generates dynamic instrumentation scripts that satisfy the activation conditions required for successful transitions to those activities. We found that augmenting existing fuzzing approaches with our framework achieves a significant improvement in test coverage over the state of the art, LLMDroid, and over other baselines such as Fastbot and APE (e.g., 101.1%, 116.3%, and 179.7% higher activity coverage, respectively). CovAgent also outperforms all baselines on other metrics, including class, method, and line coverage. We further investigate CovAgent’s individual components to reveal insights into the efficacy of agentic AI for automated app testing, such as the accuracy of agentic activation-condition inference and the agentic activity-launching success rate.


💡 Research Summary

The paper “CovAgent: Overcoming the 30% Curse of Mobile Application Coverage with Agentic AI and Dynamic Instrumentation” tackles a persistent problem in automated Android GUI testing: most state‑of‑the‑art tools plateau at roughly 30 % activity coverage on real‑world apps. The authors attribute this ceiling to nine concrete causes identified by prior work, including guard conditions that depend on external servers, hardware states (e.g., SD‑card presence), environment variables, and complex data dependencies that are invisible to pure GUI fuzzers.

CovAgent is a fuzzer‑agnostic augmentation framework that introduces two cooperating LLM‑driven agents. The first agent performs static analysis: it decompiles the APK, extracts the component transition graph (CTG), and parses Smali code. Using the Model‑Context Protocol (MCP) to interface the LLM with analysis tools, the agent runs a chain‑of‑thought (CoT) prompting routine that forward‑ and backward‑analyzes each unreachable activity. The output is a structured description of the activation conditions (guard predicates, required API calls, expected return values, hardware states, etc.) that must be satisfied before the activity can be launched.
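The summary does not specify the schema of the first agent's structured output, but a minimal sketch helps make it concrete. The field names and the `ActivationCondition` type below are illustrative assumptions, not CovAgent's actual format:

```python
from dataclasses import dataclass

@dataclass
class ActivationCondition:
    """One inferred prerequisite for launching a guarded activity.
    All field names here are illustrative, not CovAgent's real schema."""
    activity: str         # fully qualified name of the unreachable activity
    guard_kind: str       # e.g. "intent_extra", "device_state", "api_return"
    target_method: str    # Smali signature of the method containing the guard
    expected_value: str   # value the guard must observe for the launch to succeed

# Example: an activity guarded by a hardware-state check, i.e. it only
# launches when the app believes an SD card is mounted (one of the
# guard categories the paper cites).
cond = ActivationCondition(
    activity="com.example.app.BackupActivity",
    guard_kind="device_state",
    target_method="Landroid/os/Environment;->getExternalStorageState()Ljava/lang/String;",
    expected_value="mounted",
)
print(cond.guard_kind)  # device_state
```

A record like this gives the second agent everything it needs to decide what to hook and what value to spoof.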

The second agent consumes these activation conditions and automatically generates Frida instrumentation scripts. Frida hooks the target app’s process in memory, allowing the agent to modify method implementations, inject Intent‑Bundle payloads, or spoof device‑state queries (e.g., make the app believe an SD card is inserted). Scripts are validated in an Android emulator; any runtime exception is fed back to the LLM for iterative refinement, improving robustness and reducing false positives.

The evaluation addresses four research questions (RQs). RQ1 asks whether CovAgent can break the 30 % ceiling: paired with the APE fuzzer, CovAgent‑APE achieves an average activity coverage of 49.5 % versus 17.7 % for APE alone, a 2.8× improvement. RQ2 compares CovAgent‑Fastbot against the LLM‑enhanced baseline LLMDroid‑Fastbot; the former reaches 34.6 % activity coverage, more than double the latter's 17.2 % (a 101 % relative gain). RQ3 measures activity‑launching success against Scenedroid, the current best intent‑based launcher; CovAgent's dynamic instrumentation attains a 54.8 % success rate, about 3.5× Scenedroid's 15.8 %. RQ4 evaluates the accuracy of activation‑condition inference against the manually curated reasons from Akinotcho et al.; CovAgent's recall far exceeds a random baseline, demonstrating that an LLM can reliably rediscover the same guard conditions that human analysts previously identified.

Key contributions are: (1) a novel hybrid static‑dynamic, agentic AI framework that automatically discovers and satisfies hidden activation constraints; (2) extensive empirical evidence that this approach substantially raises activity, class, method, and line coverage over both traditional fuzzers and recent LLM‑augmented tools; (3) an open‑source implementation, datasets, and reproducibility package.

The paper also acknowledges limitations. LLM inference can produce incorrect or incomplete conditions, leading to malformed instrumentation scripts. Frida hooking may be unstable in heavily multithreaded or obfuscated apps, and the current static analysis operates at the Smali level, which may struggle with future Android runtime changes or aggressive code obfuscation. Future work is outlined: improving prompt engineering and multi‑agent coordination for higher inference fidelity, extending dynamic instrumentation to handle asynchronous and multi‑threaded contexts, and scaling the evaluation to real devices and cloud‑based testing farms.

Overall, CovAgent demonstrates that integrating agentic AI with dynamic instrumentation can effectively “break the ceiling” that has limited Android GUI testing for years, opening a promising research direction for AI‑driven software testing and analysis.

