Programming Not Only by Example
In recent years, there has been tremendous progress in automated synthesis techniques that generate code based on some intent expressed by the programmer. A major challenge for the adoption of synthesis remains in having the programmer communicate their intent. When the expressed intent is coarse-grained (for example, a restriction on the expected type of an expression), the synthesizer often produces a long list of results for the programmer to choose from, shifting the heavy lifting to the user. An alternative approach, successfully used in end-user synthesis, is programming by example (PBE), where the user leverages examples to interactively and iteratively refine the intent. However, using only examples is not expressive enough for programmers, who can observe the generated program and refine the intent by directly relating to parts of the generated program. We present a novel approach to interacting with a synthesizer using a granular interaction model. Our approach employs a rich interaction model where (i) the synthesizer decorates a candidate program with debug information that assists in understanding the program and identifying good or bad parts, and (ii) the user is allowed to provide feedback not only on the expected output of a program, but also on the underlying program itself. That is, when the user identifies a program as (partially) correct or incorrect, they can also explicitly indicate the good or bad parts, to allow the synthesizer to accept or discard parts of the program instead of discarding the program as a whole. We show the value of our approach in a controlled user study. Our study shows that participants have a strong preference for using granular feedback instead of examples, and are able to provide granular feedback much faster.
💡 Research Summary
The paper “Programming Not Only by Example” tackles a central obstacle in program synthesis: how programmers convey their intent to the synthesizer. Traditional approaches rely either on coarse‑grained specifications such as type constraints, which generate many candidate programs that the user must sift through manually, or on Programming‑by‑Example (PBE), where the user iteratively supplies input‑output pairs. While PBE works well for end‑users, it is insufficient for programmers because structural constraints (e.g., “do not use min” or “always include this prefix”) cannot be expressed directly through examples. Consequently, users must encode these constraints indirectly through examples, which adds cognitive load and often requires many discriminating examples.
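To make the PBE loop concrete, here is a minimal sketch of example-based pruning. It is not the paper's synthesizer: the five-entry candidate table, its Scala-style names, and the `consistent` helper are all assumptions chosen for illustration. It shows how one example can leave several candidates alive, so the user has to think up a second, discriminating example.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[List[int], int]  # (input list, expected output)

# Hypothetical candidate space: a display name mapped to an executable program.
CANDIDATES: Dict[str, Callable[[List[int]], int]] = {
    "xs.max": max,
    "xs.min": min,
    "xs.head": lambda xs: xs[0],
    "xs.last": lambda xs: xs[-1],
    "xs.sum": sum,
}


def consistent(examples: List[Example]) -> List[str]:
    """Return the names of candidates that reproduce every example."""
    return [
        name
        for name, prog in CANDIDATES.items()
        if all(prog(inp) == out for inp, out in examples)
    ]


# One example is ambiguous: both "largest element" and "first element" fit.
first_round = consistent([([3, 1, 2], 3)])

# A second example, picked to separate the two, leaves a single candidate.
second_round = consistent([([3, 1, 2], 3), ([1, 3], 3)])

print(first_round)
print(second_round)
```

In this toy setting the second example has to be designed by the user specifically to rule out `xs.head`, which is the kind of indirect effort the summary describes.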
To address this, the authors introduce a Granular Interaction Model (GIM). GIM operates in two directions. First, the synthesizer augments each candidate program with debug information: after every operation it displays the intermediate value on the current examples. This makes the program’s behavior transparent and helps users pinpoint exactly which part is correct or incorrect. Second, users can give feedback not only by adding more examples but also by marking specific syntactic fragments of the candidate program as “keep”, “discard”, “prefix”, or “suffix”. For instance, a user can issue Discard(takeRight(2)) to forbid any future candidate from containing that operation, or Prefix(zip(input.tail)) to force all future candidates to start with that fragment. These granular predicates prune the search space dramatically—often by a factor equal to the size of the operation vocabulary—while preserving useful sub‑structures that the synthesizer can reuse.
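The pruning effect of these predicates can be sketched as a filter over candidate programs. The sketch below is an assumption-laden illustration, not the paper's implementation: candidates are modeled as left-to-right chains of operation names, the class names mirror the four feedback kinds mentioned above, and treating "contains" as a contiguous-subsequence match is a simplification of my own.

```python
from dataclasses import dataclass
from typing import List, Tuple

Program = Tuple[str, ...]  # a candidate as a left-to-right chain of operations


def contains(program: Program, fragment: Program) -> bool:
    """True if `fragment` occurs as a contiguous run inside `program`."""
    n = len(fragment)
    return any(program[i:i + n] == fragment for i in range(len(program) - n + 1))


@dataclass(frozen=True)
class Keep:
    ops: Program
    def accepts(self, p: Program) -> bool:
        return contains(p, self.ops)


@dataclass(frozen=True)
class Discard:
    ops: Program
    def accepts(self, p: Program) -> bool:
        return not contains(p, self.ops)


@dataclass(frozen=True)
class Prefix:
    ops: Program
    def accepts(self, p: Program) -> bool:
        return p[:len(self.ops)] == self.ops


@dataclass(frozen=True)
class Suffix:
    ops: Program
    def accepts(self, p: Program) -> bool:
        return p[len(p) - len(self.ops):] == self.ops


def prune(candidates: List[Program], feedback: list) -> List[Program]:
    """Keep only the candidates that satisfy every piece of feedback."""
    return [p for p in candidates if all(f.accepts(p) for f in feedback)]


# Hypothetical candidate pool.
pool: List[Program] = [
    ("zip(input.tail)", "takeRight(2)", "map(sum)"),
    ("zip(input.tail)", "map(sum)", "max"),
    ("reverse", "takeRight(2)"),
    ("zip(input.tail)", "filter(pos)", "takeRight(2)"),
]

feedback = [Discard(("takeRight(2)",)), Prefix(("zip(input.tail)",))]
survivors = prune(pool, feedback)
print(survivors)
```

Two pieces of feedback remove three of the four hypothetical candidates without the user supplying any new input-output pair. A real synthesizer would presumably push such predicates into the search itself rather than filter an enumerated list.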
The paper also provides a theoretical result showing that there exist intent specifications that cannot be captured by any finite set of input‑output examples. In other words, certain structural requirements are fundamentally inexpressible in the PBE model, which justifies the need for explicit syntactic feedback.
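One way to see why examples alone can fall short is to take two programs that compute the same function with different structure. The sketch below is only an illustration of that intuition and not the paper's proof; the two functions, their Scala-style source strings, and the random test harness are assumptions. Both programs return the smallest element of a non-empty list, so no input-output example can ever separate them, yet a syntactic "discard min" rule does so immediately.

```python
import random
from typing import Dict, List


def via_min(xs: List[int]) -> int:
    return min(xs)


def via_sort(xs: List[int]) -> int:
    return sorted(xs)[0]


# Hypothetical source text of each candidate, as a synthesizer might display it.
SOURCES: Dict[str, str] = {
    "via_min": "xs.min",
    "via_sort": "xs.sorted.head",
}

# Random testing merely illustrates the point: the two programs agree on
# every non-empty list by construction, so no example can tell them apart.
random.seed(0)
trials = [
    [random.randint(-50, 50) for _ in range(random.randint(1, 8))]
    for _ in range(1000)
]
distinguishing = [xs for xs in trials if via_min(xs) != via_sort(xs)]

# Structural feedback ("do not use min") separates them in one step.
survivors_after_discard = [
    name for name, src in SOURCES.items() if "min" not in src.split(".")
]

print(len(distinguishing), survivors_after_discard)
```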
A prototype synthesizer was built for Scala, targeting functional compositions such as zip, map, groupBy, and maxBy. The system presents candidates with inline comments showing intermediate results, and a UI that lets users tick boxes to keep or discard particular fragments.
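The inline debug annotations can be approximated as follows. This is a rough Python simulation under stated assumptions, not the prototype's actual output format: the `trace` helper, the `//` comment layout, and the sample candidate are all hypothetical, and each Scala-style label is paired with a Python function that mimics the operation.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]  # (label shown to the user, operation)


def trace(steps: List[Step], example_input: Any) -> List[str]:
    """Run the steps in order and annotate each with its intermediate value."""
    value = example_input
    lines = [f"input  // {value!r}"]
    for label, fn in steps:
        value = fn(value)
        lines.append(f"  .{label}  // {value!r}")
    return lines


# Hypothetical candidate: largest sum of two adjacent elements.
candidate: List[Step] = [
    ("zip(input.tail)", lambda xs: list(zip(xs, xs[1:]))),
    ("map(p => p._1 + p._2)", lambda ps: [a + b for a, b in ps]),
    ("max", max),
]

annotated = trace(candidate, [1, 5, 2])
print("\n".join(annotated))
```

Seeing the value after each operation lets a reader judge, for example, that the `zip` step is already doing the right thing even if a later step is wrong, which is the information needed to give fragment-level feedback.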
The authors evaluated GIM through a controlled user study with 32 developers from academia and industry. Participants interacted with three modes: pure PBE, pure GIM, and a hybrid of both. Results show that GIM reduces average interaction time by about 35 % compared to PBE (29 s vs. 45 s) and that 87 % of participants found GIM more intuitive. Qualitative feedback highlighted frustration when trying to eliminate unwanted operations using only examples (e.g., the min operator). However, when using GIM alone, some participants needed additional examples to fully verify the final program, suggesting that a hybrid approach is optimal.
The discussion acknowledges limitations: the current implementation assumes a relatively small, functional‑style operation set and may not directly apply to stateful or object‑oriented APIs. Moreover, overly fine‑grained feedback could increase cognitive load, so future work might explore automated suggestions for likely “keep” or “discard” operations.
In conclusion, the paper demonstrates that allowing programmers to give granular, code‑level feedback—combined with rich debug information—significantly improves the efficiency and usability of program synthesis. The authors argue that future synthesis tools should integrate both example‑based and granular interaction mechanisms to support a broader range of programming tasks.