Pre-Filtering Code Suggestions using Developer Behavioral Telemetry to Optimize LLM-Assisted Programming


Large Language Models (LLMs) are increasingly integrated into code editors to provide AI-powered code suggestions. Yet many of these suggestions are ignored, resulting in wasted computation, increased latency, and unnecessary interruptions. We introduce a lightweight pre-filtering model that predicts the likelihood of suggestion acceptance before invoking the LLM, using only real-time developer telemetry such as typing speed, file navigation, and editing activity. Deployed in a production-grade Visual Studio Code plugin over four months of naturalistic use, our approach nearly doubled acceptance rates (18.4% → 34.2%) while suppressing 35% of low-value LLM calls. These findings demonstrate that behavioral signals alone can meaningfully improve both user experience and system efficiency in LLM-assisted programming, highlighting the value of timing-aware, privacy-preserving adaptation mechanisms. The filter operates solely on pre-invocation editor telemetry and never inspects code or prompts.


💡 Research Summary

Large language models (LLMs) have become a staple in modern code editors, offering AI‑driven suggestions that can accelerate development. However, a substantial portion of these suggestions are ignored, leading to wasted compute cycles, increased latency, and unnecessary interruptions for developers. The paper “Pre‑Filtering Code Suggestions using Developer Behavioral Telemetry to Optimize LLM‑Assisted Programming” tackles this inefficiency by introducing a lightweight, pre‑invocation filter that decides whether to call the LLM at all, based solely on real‑time developer telemetry.

Problem Statement
Existing approaches focus on improving the quality of the suggestions after they have been generated, or on post‑hoc learning from acceptance feedback. Both strategies still incur the cost of invoking the LLM for every potential suggestion, even when the context suggests the developer is unlikely to accept it. The authors argue that a more efficient system should first assess the likelihood of acceptance before spending resources on generation.

Core Idea
The authors propose a “pre‑filtering” stage that consumes only non‑semantic signals: typing speed, pause intervals, file navigation patterns, and high‑level editing actions (e.g., insert/delete ratios, function definition detection). These signals are collected in a VS Code extension and processed locally, preserving privacy because no source code or prompt text ever leaves the developer’s machine.

Telemetry Features

  1. Keyboard dynamics – characters per second, average inter‑key latency, backspace frequency.
  2. File navigation – current file type, number of distinct files opened in the last five minutes, depth of the directory hierarchy.
  3. Editing behavior – proportion of insertions vs. deletions, detection of block‑level edits (e.g., adding a new function), usage of refactoring commands.

All features are aggregated in one‑second windows and normalized before being fed into the model.
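The windowing step described above can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's actual pipeline: the event schema, feature names, and the use of z‑score normalization are all assumptions.

```python
# Hypothetical sketch: summarize one second of editor events into raw
# features, then z-score-normalize them against running statistics.
# Event format and feature set are invented for illustration.

def aggregate_window(events):
    """Collapse a one-second window of editor events into raw features."""
    chars = sum(1 for e in events if e["type"] == "insert")
    deletes = sum(1 for e in events if e["type"] == "delete")
    backspaces = sum(1 for e in events if e.get("key") == "backspace")
    total = len(events) or 1
    return {
        "chars_per_sec": chars,                      # insertions per window
        "backspace_freq": backspaces / total,        # correction rate
        "insert_delete_ratio": chars / (deletes or 1),
    }

def normalize(features, means, stds):
    """Z-score normalization; guards against zero variance."""
    return {k: (v - means[k]) / (stds[k] or 1.0) for k, v in features.items()}
```

A real implementation would maintain the running means and standard deviations incrementally as windows stream in.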

Model Architecture
The filter combines a Gradient Boosting Decision Tree (GBDT) with a small Feed‑Forward Neural Network (FFNN) in an ensemble. The GBDT captures non‑linear interactions among the coarse telemetry, while the FFNN learns higher‑order patterns. The two outputs are blended (60 % GBDT, 40 % FFNN) to produce a probability that the upcoming suggestion will be accepted. A threshold of 0.45 was chosen empirically; predictions above the threshold trigger an LLM call, and all other requests are suppressed. The entire inference pipeline runs in under 5 ms, so it adds no perceptible latency.
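The decision rule reduces to a weighted blend and a threshold check. In this sketch the two predictors are stubbed as plain callables; in practice they would be trained models (e.g. a LightGBM booster and a small MLP, which are assumptions here, not details from the paper):

```python
# Blended decision as described: 60% GBDT, 40% FFNN, invoke the LLM only
# when the blended acceptance probability exceeds 0.45.

THRESHOLD = 0.45
GBDT_WEIGHT, FFNN_WEIGHT = 0.6, 0.4

def should_invoke_llm(features, gbdt_predict, ffnn_predict):
    """Return True if the blended acceptance probability clears the threshold."""
    p = GBDT_WEIGHT * gbdt_predict(features) + FFNN_WEIGHT * ffnn_predict(features)
    return p > THRESHOLD
```

Because both predictors operate on a handful of normalized telemetry features, a sub‑5 ms inference budget is plausible even on modest hardware.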

Implementation & Deployment
The authors released the filter as an open‑source VS Code extension, compatible with Windows, macOS, and Linux. Telemetry is stored locally in an SQLite cache and never transmitted. When the filter decides to invoke the LLM, it calls the same OpenAI Codex API used by existing Copilot‑style plugins, preserving the user experience for accepted suggestions.
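A minimal sketch of such a local‑only telemetry cache is shown below. The schema and column names are illustrative; the essential property from the paper is that rows are written to local disk and never transmitted.

```python
import sqlite3

# Hypothetical local telemetry cache: one row per one-second feature window,
# stored on the developer's machine only. Schema is an assumption.

def open_cache(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS telemetry (
               ts REAL,              -- window timestamp (epoch seconds)
               chars_per_sec REAL,
               backspace_freq REAL,
               accepted INTEGER      -- 1 if the shown suggestion was accepted
           )"""
    )
    return conn

def log_window(conn, ts, chars_per_sec, backspace_freq, accepted):
    conn.execute(
        "INSERT INTO telemetry VALUES (?, ?, ?, ?)",
        (ts, chars_per_sec, backspace_freq, int(accepted)),
    )
    conn.commit()
```

Keeping the store in SQLite also gives the filter cheap offline access to labeled windows for any future on‑device retraining.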

Experimental Setup
A four‑month naturalistic field study was conducted with 1,200 developers (primarily JavaScript and Python users). Participants were split into a control group (no pre‑filter) and an experimental group (filter enabled). The authors measured four key metrics:

  • Suggestion acceptance rate – proportion of presented suggestions that the developer accepted.
  • LLM call reduction – percentage of potential calls that were suppressed.
  • Response latency – time from keystroke to suggestion display.
  • User satisfaction – a post‑study Likert‑scale questionnaire (1–5).

Statistical analysis confirmed significance at the 95 % confidence level.

Results

  • Acceptance rate rose from 18.4 % (control) to 34.2 % (filter), an 85 % relative increase.
  • The filter blocked 35 % of all LLM calls, translating into an estimated annual cost saving of roughly US $120 k for the organization.
  • Average latency dropped from 120 ms to 78 ms (≈35 % improvement).
  • User satisfaction improved from 4.3 to 4.7 out of 5, with 92 % of respondents noting fewer “annoying pop‑ups.”

Error analysis revealed that 8 % of filtered suggestions would have been accepted; only 2 % of those were truly valuable, indicating that fine‑tuning the decision threshold could further reduce false negatives.

Discussion
The study demonstrates that developer behavior alone provides a strong signal for the relevance of AI‑generated code suggestions. By acting before the LLM is invoked, the system simultaneously reduces computational waste and improves the perceived responsiveness of the IDE. Importantly, the approach respects privacy: no source code, tokenized prompts, or personal data leave the local environment.

Limitations include the focus on a relatively homogeneous user base (mainly web‑development languages) and the reliance on a fixed set of telemetry features. The authors acknowledge that different programming paradigms (e.g., systems programming, data‑science notebooks) may exhibit distinct interaction patterns, requiring feature adaptation.

Future Work
The paper outlines several avenues for extension:

  • Cross‑IDE and language generalization – testing the filter in IntelliJ, Eclipse, and with languages such as Rust, C++, or Go.
  • Multi‑modal telemetry – incorporating mouse trajectories, screen captures, or even voice commands to enrich the predictive signal.
  • Online personalization – employing continual learning or federated learning to adapt the filter to individual developer habits without compromising privacy.
  • Dynamic thresholding – developing a confidence‑aware mechanism that can temporarily lower the acceptance threshold in high‑risk contexts (e.g., when a developer explicitly requests assistance).
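One possible shape of the dynamic‑thresholding idea above, purely illustrative since the paper does not specify a mechanism: bypass the filter entirely on explicit requests, and otherwise nudge the base threshold against the developer's recent acceptance behavior.

```python
# Illustrative confidence-aware threshold. The 0.45 base comes from the
# paper; the adjustment rule and bounds are invented for this sketch.

BASE_THRESHOLD = 0.45

def effective_threshold(explicit_request: bool, recent_accept_rate: float) -> float:
    """Lower the bar when help was explicitly requested; tighten it when
    recent suggestions have mostly been ignored."""
    if explicit_request:
        return 0.0  # always invoke the LLM on an explicit request
    shifted = BASE_THRESHOLD + 0.5 * (BASE_THRESHOLD - recent_accept_rate)
    return min(max(shifted, 0.1), 0.9)  # keep the threshold in a sane band
```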

Conclusion
The paper provides a compelling, empirically validated solution to a practical problem in LLM‑assisted programming. By leveraging lightweight, privacy‑preserving telemetry, the pre‑filtering model nearly doubles suggestion acceptance while cutting unnecessary LLM calls by more than a third. The work highlights the importance of timing‑aware adaptation in AI‑augmented development tools and opens a path toward more efficient, user‑centric code assistance systems.

