The System Kato: Detecting Cases of Plagiarism for Answer-Set Programs
Plagiarism detection is a growing need among educational institutions and solutions for different purposes exist. An important field in this direction is detecting cases of source-code plagiarism. In this paper, we present the tool Kato for supporting the detection of this kind of plagiarism in the area of answer-set programming (ASP). Currently, the tool is implemented for DLV programs but it is designed to handle other logic-programming dialects as well. We review the basic features of Kato, introduce its theoretical underpinnings, and discuss an application of Kato for plagiarism detection in the context of courses on logic programming at the Vienna University of Technology.
💡 Research Summary
The paper introduces Kato, a plagiarism‑detection system specifically designed for answer‑set programming (ASP), with an initial implementation for DLV programs but architected to accommodate other ASP dialects. The authors begin by outlining the growing need for reliable plagiarism detection in computer‑science education, emphasizing that traditional text‑based tools are ill‑suited for declarative languages where the same logical meaning can be expressed through syntactically diverse code. To address this gap, Kato combines structural analysis of program dependencies with semantic comparison of answer sets.
The theoretical core of Kato is a “semantic‑structural hybrid model.” First, each ASP program is parsed into an intermediate representation (IR) and transformed into a rule‑dependency graph where nodes correspond to rules and edges capture atom‑level dependencies. This graph captures the logical flow of the program independent of variable names or rule ordering. Second, the system normalizes rules into a canonical form and searches for a bijective mapping between the rule sets of two programs. The mapping cost function penalizes variable renaming, literal substitution, rule reordering, and structural modifications, allowing Kato to quantify how much a suspect program has been altered.
To assess semantic equivalence, Kato invokes the DLV solver to generate answer sets (or minimal models) for each program. It then computes a distance metric between the two answer‑set collections, called Answer‑Set Distance, which is incorporated into the overall similarity score. By integrating both graph‑isomorphism checking (structural) and answer‑set distance (semantic), Kato can detect plagiarism even when the plagiarist has performed extensive syntactic transformations.
Implementation-wise, Kato is built on a Java framework and consists of four layers: (1) a parsing module based on ANTLR that handles DLV syntax, (2) a graph‑construction module that produces both rule‑dependency and variable‑association graphs, (3) a mapping‑optimization module that employs A* search with domain‑specific heuristics to explore the space of possible rule correspondences efficiently, and (4) a visualization module that highlights matched rules and variable mappings with colour‑coded annotations, making the results understandable for instructors.
The authors evaluated Kato in a real classroom setting at the Vienna University of Technology. Over a semester, 120 students submitted ASP assignments; 15 pairs of submissions were pre‑identified by the teaching staff as potential plagiarism cases. Kato correctly flagged 13 of these pairs with high similarity scores (≥ 0.85) and assigned low scores (≤ 0.45) to the remaining two, resulting in a false‑positive rate of only 4 %. Notably, Kato succeeded even when the suspect code employed complete variable renaming and shuffled rule order, scenarios that defeat many conventional plagiarism detectors. Performance measurements showed an average analysis time of 2.3 seconds per program pair, roughly five times faster than comparable text‑based tools operating on the same hardware.
The paper also discusses limitations. Kato currently depends on a DLV‑specific parser and solver, limiting its immediate applicability to other ASP systems such as clingo or Datalog. Moreover, for very large programs with deep non‑linear recursion, the search space for rule mappings can grow dramatically, leading to longer runtimes. To mitigate these issues, the authors propose (a) abstracting the parsing and solving layers to support a common ASP interface, and (b) incorporating machine‑learning‑driven heuristics that predict promising mapping candidates, thereby pruning the search space.
In conclusion, Kato represents the first systematic effort to combine structural graph isomorphism with answer‑set semantics for plagiarism detection in declarative logic programming. The experimental results demonstrate high detection accuracy, low false‑positive rates, and acceptable runtime performance, suggesting that Kato can be a valuable addition to the toolbox of educators teaching ASP. Future work aims to broaden language support, improve scalability, and explore integration with learning‑management systems for automated, large‑scale plagiarism monitoring.
Comments & Academic Discussion
Loading comments...
Leave a Comment