Correctness isn't Efficiency: Runtime Memory Divergence in LLM-Generated Code

📝 Original Info

  • Title: Correctness isn't Efficiency: Runtime Memory Divergence in LLM-Generated Code
  • ArXiv ID: 2601.01215
  • Date: 2026-01-03
  • Authors: Prateek Rajput, Yewei Song, Abdoul Aziz Bonkoungou, Iyiola E. Olatunji, Abdoul Kader Kabore, Jacques Klein, Tegawendé F. Bissyandé

📝 Abstract

LLMs can produce functionally correct programs, yet correctness alone does not guarantee reliability. Two programs passing the same tests can exhibit drastically different runtime behavior, creating hidden risks such as performance bottlenecks and memory leaks. Despite this, the runtime consistency of LLM-generated code remains largely unexplored. In this work, we introduce a framework to systematically quantify execution-time memory stability across multiple correct generations for the same task. We propose a novel solution-level metric, DMPD (Dynamic Mean Pairwise Distance), which uses Dynamic Time Warping to compare the shapes of memory usage profiles. These profiles, which we term Monotonic Peak Profiles (MPPs), are transformed to suppress transient noise, enabling robust comparison. By aggregating these scores, we derive a model-level Model Instability Score (MIS). Across the BigOBench and CodeContests benchmarks, we find substantial runtime divergence among correct solutions, revealing that instability often increases with higher sampling temperatures even as pass@1 improves. We also uncover exploratory correlations between our stability met...
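
The abstract names the pipeline (memory trace, Monotonic Peak Profile, DTW-based pairwise distance, model-level aggregate) but does not give its formulas, so the following Python sketch is only one plausible reading, not the authors' implementation. Every function name is hypothetical. It assumes that an MPP is the running maximum of a memory trace scaled by its final peak, that the DTW local cost is the absolute difference with no path-length normalization, and that MIS is the plain mean of per-task DMPD values.

```python
from itertools import combinations
from typing import List, Sequence


def monotonic_peak_profile(trace: Sequence[float]) -> List[float]:
    """Running maximum of a memory trace, divided by its final peak.

    Assumption: the abstract only says MPPs suppress transient noise;
    the running-max and peak scaling used here are illustrative choices.
    """
    if not trace:
        raise ValueError("trace must contain at least one sample")
    profile: List[float] = []
    peak = float("-inf")
    for sample in trace:
        peak = max(peak, sample)
        profile.append(peak)
    top = profile[-1]
    return [p / top if top > 0 else 0.0 for p in profile]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) Dynamic Time Warping, absolute-difference cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the cumulative cost table
    for i in range(1, len(a) + 1):
        curr = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, step in both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def dmpd(traces: Sequence[Sequence[float]]) -> float:
    """Mean pairwise DTW distance between the MPPs of one task's correct solutions."""
    profiles = [monotonic_peak_profile(t) for t in traces]
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0  # fewer than two solutions: nothing to compare
    return sum(dtw_distance(p, q) for p, q in pairs) / len(pairs)


def model_instability_score(tasks: Sequence[Sequence[Sequence[float]]]) -> float:
    """Assumed aggregation: unweighted mean of per-task DMPD values."""
    scores = [dmpd(traces) for traces in tasks]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Made-up traces (in MB) for two correct solutions to the same task.
    gradual = [10, 12, 11, 20, 18, 40]
    upfront = [40, 40, 39, 40, 40, 40]
    print(dmpd([gradual, upfront]))
```

Under these assumptions, two solutions whose memory grows in the same shape score zero even when their traces differ in length, because DTW aligns the profiles before comparing them. A solution that allocates everything up front and one that grows gradually score above zero.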

📄 Full Content

...(The full text is too long and has been omitted here. Please see the original site for the complete article.)
