LSPFuzz: Hunting Bugs in Language Servers

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

The Language Server Protocol (LSP) has revolutionized the integration of code intelligence in modern software development. There are approximately 300 LSP server implementations for various languages and 50 editors offering LSP integration. However, the reliability of LSP servers is a growing concern, as crashes can disable all code intelligence features and significantly impact productivity, while vulnerabilities can put developers at risk even when editing untrusted source code. Despite the widespread adoption of LSP, no existing techniques specifically target LSP server testing. To bridge this gap, we present LSPFuzz, a grey-box hybrid fuzzer for systematic LSP server testing. Our key insight is that effective LSP server testing requires holistic mutation of source code and editor operations, as bugs often manifest from their combinations. To satisfy the sophisticated constraints of LSP and effectively explore the input space, we employ a two-stage mutation pipeline: syntax-aware mutations to source code, followed by context-aware dispatching of editor operations. We evaluated LSPFuzz on four widely used LSP servers. LSPFuzz demonstrated superior performance compared to baseline fuzzers, and uncovered previously unknown bugs in real-world LSP servers. Of the 51 bugs we reported, 42 have been confirmed, 26 have been fixed by developers, and two have been assigned CVE numbers. Our work advances the quality assurance of LSP servers, providing both a practical tool and foundational insights for future research in this domain.

💡 Research Summary

The paper addresses the growing reliability concerns of Language Server Protocol (LSP) implementations, which provide real‑time code intelligence (hover, completion, go‑to‑definition, etc.) across dozens of editors and hundreds of language servers. Crashes or memory corruptions in an LSP server can disable all intelligence features and, in the worst case, expose developers to security risks when editing untrusted code. Despite the widespread adoption of LSP, no prior work has focused on systematic testing of LSP servers.

To fill this gap, the authors introduce LSPFuzz, a grey‑box hybrid mutational fuzzer specifically designed for LSP server testing. The core insight is that bugs often arise from the interaction between the source code being analyzed and the editor operations sent to the server. Consequently, effective testing must mutate both components in a coordinated fashion while respecting the protocol’s combinatorial constraints (e.g., request positions must refer to valid locations in the current document).
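
To make the position constraint concrete, here is a minimal sketch of building one well-formed hover request. It assumes the standard LSP base protocol (a Content-Length header followed by a JSON-RPC body) and zero-based positions counted in UTF-16 code units, which is the protocol default. The helper names (frame_message, position_is_valid, hover_request) are illustrative and are not taken from LSPFuzz.

```python
import json


def frame_message(payload: dict) -> bytes:
    """Wrap a JSON-RPC payload in the LSP base-protocol framing."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def position_is_valid(text: str, line: int, character: int) -> bool:
    """Check that (line, character) points inside the document.

    Assumes "\n" line endings; characters are counted in UTF-16 code units.
    """
    lines = text.split("\n")
    if not 0 <= line < len(lines):
        return False
    utf16_units = len(lines[line].encode("utf-16-le")) // 2
    return 0 <= character <= utf16_units


def hover_request(request_id: int, uri: str, text: str,
                  line: int, character: int) -> bytes:
    """Build a framed textDocument/hover request, rejecting invalid positions."""
    if not position_is_valid(text, line, character):
        raise ValueError(f"position {line}:{character} is outside the document")
    return frame_message({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/hover",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
        },
    })
```

A fuzzer that mutates the request bytes blindly would mostly produce messages that fail this kind of check, which is one way to read the paper's argument for coordinating the two mutations.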

Two‑stage mutation pipeline

Source‑code mutation – Using the Tree‑sitter parsing library, LSPFuzz builds an abstract syntax tree (AST) for a seed program. It then performs random tree mutation: a non‑terminal node is selected and replaced by a newly generated subtree of the same non‑terminal type, preserving grammatical validity. This yields a diverse corpus that includes well‑formed programs, partially written fragments, and deliberately malformed snippets, thereby exercising the server’s incremental analysis paths.
Editor‑operation dispatch – After the source is mutated, LSPFuzz extracts location information (tokens, symbols, AST nodes) and automatically generates a sequence of LSP requests such as textDocument/hover, completion, definition, and formatting. The dispatcher is context‑aware: it selects positions that are syntactically meaningful (e.g., the identifier of a function call) and varies the order and combination of operations to maximize coverage of different server code paths.
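
The two stages can be sketched on a toy grammar. This is a simplified illustration, not the authors' implementation: a hand-written Node class stands in for a Tree‑sitter syntax tree, the grammar has only expressions, calls, identifiers, and numbers, and the dispatcher finds identifier positions with a regular expression instead of querying the tree. All names here are hypothetical.

```python
import random
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                   # "expr", "call", "ident", or "num"
    text: str = ""                              # set for leaves only
    children: list = field(default_factory=list)


def generate(kind, rng, depth=0):
    """Generate a random subtree of the requested kind (toy grammar)."""
    if kind == "ident":
        return Node("ident", rng.choice(["foo", "bar", "baz"]))
    if kind == "num":
        return Node("num", str(rng.randint(0, 9)))
    if kind == "call":
        return Node("call", children=[generate("ident", rng),
                                      generate("expr", rng, depth + 1)])
    # kind == "expr": only leaves once deep enough, so generation terminates
    options = ["ident", "num"] if depth >= 2 else ["ident", "num", "call"]
    return Node("expr", children=[generate(rng.choice(options), rng, depth + 1)])


def render(node):
    """Turn a tree back into source text."""
    if node.kind == "call":
        callee, arg = node.children
        return f"{render(callee)}({render(arg)})"
    if node.children:
        return "".join(render(child) for child in node.children)
    return node.text


def nonterminals(node):
    """Collect every node that has children."""
    found = [node] if node.children else []
    for child in node.children:
        found.extend(nonterminals(child))
    return found


def mutate_source(tree, rng):
    """Stage 1: replace one non-terminal with a fresh subtree of the same kind."""
    target = rng.choice(nonterminals(tree))
    target.children = generate(target.kind, rng).children
    return tree


METHODS = ["textDocument/hover", "textDocument/completion",
           "textDocument/definition"]


def dispatch_operations(source, rng, count=3):
    """Stage 2: aim editor requests at identifier positions in the new source."""
    anchors = [(line_no, match.start())
               for line_no, line in enumerate(source.split("\n"))
               for match in re.finditer(r"[A-Za-z_]\w*", line)]
    if not anchors:
        return []
    operations = []
    for _ in range(count):
        line_no, column = rng.choice(anchors)
        operations.append({"method": rng.choice(METHODS),
                           "line": line_no, "character": column})
    return operations
```

Because stage 2 reads positions from the source that stage 1 just produced, every generated request refers to a location that exists in the current document.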

The fuzzer runs each mutated test case against a target LSP server, monitors runtime behavior, and records any newly covered control‑flow edges. Crashes with unique stack traces are saved as potential bugs. The grey‑box aspect comes from using coverage feedback to guide future mutations, allowing the corpus to evolve toward more interesting regions of the input space.
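
The feedback loop described above can be sketched as follows. This is a generic coverage-guided loop under simplifying assumptions, not LSPFuzz's code: the target is a plain Python callable that returns the set of edges it covered, a crash is modeled as a raised exception, and crashes are deduplicated by hashing the function names and line numbers in the traceback. The names run_target and fuzz are hypothetical.

```python
import hashlib
import random
import traceback


def run_target(test_case, target):
    """Run one test case; return (covered_edges, crash_signature_or_None)."""
    try:
        return set(target(test_case)), None
    except Exception as exc:
        frames = traceback.extract_tb(exc.__traceback__)
        stack = "|".join(f"{frame.name}:{frame.lineno}" for frame in frames)
        stack += "|" + type(exc).__name__
        return set(), hashlib.sha1(stack.encode("utf-8")).hexdigest()


def fuzz(seeds, mutate, target, iterations, rng):
    """Keep inputs that reach new edges; keep one input per unique crash stack."""
    corpus = list(seeds)
    seen_edges = set()
    crashes = {}                      # crash signature -> first triggering input
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        edges, signature = run_target(candidate, target)
        if signature is not None:
            crashes.setdefault(signature, candidate)
            continue
        if edges - seen_edges:        # new coverage: the input is interesting
            seen_edges |= edges
            corpus.append(candidate)
    return corpus, seen_edges, crashes
```

In a real setup the target would be an instrumented server process and the signature would come from its crash report; the corpus-growth and deduplication logic keeps the same shape.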

Evaluation
LSPFuzz was evaluated on four popular LSP servers: clangd (C/C++), Sorbet (Ruby), Pyright (Python), and the TypeScript language server. For each server, ten independent runs were performed. Results show:

An average of 54.1 distinct crashes per server, far exceeding the baseline fuzzers (binary or grammar‑based), which found only a handful.
Code‑coverage improvements ranging from 2.45× to 142.9× over the baselines, demonstrating that the two‑stage pipeline reaches deep into server logic.
Ablation studies confirm that removing either stage dramatically reduces bug‑finding effectiveness, highlighting the necessity of both source‑code diversity and editor‑operation diversity.

In total, 51 previously unknown bugs were reported. Of these, 42 have been confirmed by the respective development teams, 26 have been fixed, and 2 have been assigned CVE identifiers (demonstrating security relevance). Notably, some bugs involved memory corruption that could lead to remote code execution when processing malicious files. The clangd team responded by disabling the server in untrusted VS Code workspaces, illustrating immediate practical impact.

Contributions

First systematic study of LSP server quality assurance.
Design and implementation of LSPFuzz (12,293 lines of Rust), featuring a novel two‑stage mutation pipeline.
Empirical evidence that LSPFuzz outperforms existing fuzzers on real‑world LSP servers and that both stages are essential for effectiveness.
Disclosure of 51 bugs, with substantial upstream adoption (fixes, CVEs, policy changes).
Open‑source release of the fuzzer and all experimental data to foster future research.

Overall, the work demonstrates that grey‑box, context‑aware fuzzing can effectively handle the combinatorial and interactive nature of LSP, substantially improving the robustness and security of the ecosystem that underpins modern developer tooling.
