Copyright Detective: A Forensic System to Evidence LLMs Flickering Copyright Leakage Risks
We present Copyright Detective, the first interactive forensic system for detecting, analyzing, and visualizing potential copyright risks in LLM outputs. Because copyright law resists clean binary judgments, the system treats the question of infringement versus compliance as an evidence-discovery process rather than a static classification task. It integrates multiple detection paradigms, including content recall testing, paraphrase-level similarity analysis, persuasive jailbreak probing, and unlearning verification, within a unified and extensible framework. Through interactive prompting, response collection, and iterative workflows, our system enables systematic auditing of verbatim memorization and paraphrase-level leakage, supporting responsible deployment and transparent evaluation of LLM copyright risks even with only black-box access.
💡 Research Summary
The paper introduces “Copyright Detective,” an interactive forensic platform designed to uncover, analyze, and visualize potential copyright infringements in the outputs of large language models (LLMs). Unlike prior work that treats copyright compliance as a binary classification problem, the authors frame it as an evidence‑discovery process, acknowledging the nuanced legal standards that govern copyright law. The system integrates five forensic modules—Content Recall Detection, Persuasive Jailbreak Detection, Knowledge Memorization Detection, Unlearning Detection, and a Legal Cases Display—into a unified, extensible framework that works primarily with black‑box access but also supports white‑box analysis when model weights are available.
Content Recall Detection probes a model’s ability to reproduce verbatim text. It offers two operational modes: a snippet‑level “text memorization” mode for targeted passages and a document‑level “document memorization” mode that automatically chunks uploaded files and uses a rolling‑window approach. Users can select from predefined zero‑shot or few‑shot prompt templates (or craft custom prompts) and configure inference scaling, i.e., generating many independent samples per prompt to mitigate stochastic output variability. The module then computes a suite of similarity metrics—Jaccard, Levenshtein, and especially ROUGE‑L—alongside token‑match statistics, presenting the results through an interactive visualization panel.
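The three similarity metrics the module reports can be illustrated from scratch. The following is a minimal sketch (word-level tokenization, character-level Levenshtein, LCS-based ROUGE-L F1), not the system's actual implementation, which may normalize and tokenize differently:

```python
import re

def tokenize(text):
    # Simple lowercase word tokenizer; a stand-in for whatever the system uses.
    return re.findall(r"\w+", text.lower())

def jaccard(a, b):
    # Set overlap of word tokens: |A ∩ B| / |A ∪ B|.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def levenshtein(a, b):
    # Character-level edit distance via single-row dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def rouge_l(reference, candidate):
    # ROUGE-L F1 from the longest common subsequence of word tokens.
    r, c = tokenize(reference), tokenize(candidate)
    dp = [[0] * (len(c) + 1) for _ in range(len(r) + 1)]
    for i in range(1, len(r) + 1):
        for j in range(1, len(c) + 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if r[i - 1] == c[j - 1]
                        else max(dp[i - 1][j], dp[i][j - 1]))
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)
```

In practice a single prompt yields many samples under inference scaling, so these scores are computed per sample and then aggregated into the distributions shown in the visualization panel.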
Persuasive Jailbreak Detection addresses the reality that safety‑fine‑tuned LLMs often block direct requests for copyrighted material. The module rewrites such requests using rhetorical persuasion strategies (logos, ethos, pathos, alliance building) and automatically filters out mutations that drift from the original intent via an “Intention Preservation Judge.” Successful mutations are then executed multiple times, and the distribution of ROUGE‑L scores, success rates, and strategy‑wise effectiveness are displayed via box‑plots and histograms. Empirical results show that a “Pathos” style prompt can shift the leakage distribution from a ROUGE‑L of ~0.1 to ~0.7, dramatically increasing the risk of inadvertent disclosure.
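The sampling-and-scoring loop behind these score distributions might be sketched as follows. Here `generate` and `score` are hypothetical caller-supplied callables (any LLM query function and any similarity metric such as ROUGE-L), and the 0.5 leakage threshold is purely illustrative:

```python
import statistics

def probe_leakage(generate, prompt, reference, score, n_samples=100):
    """Run one (possibly persuasion-rewritten) prompt n_samples times and
    score each completion against the copyrighted reference text.
    Returns the full score distribution plus summary statistics."""
    scores = [score(reference, generate(prompt)) for _ in range(n_samples)]
    return {
        "scores": scores,                 # raw distribution (for box-plots/histograms)
        "mean": statistics.mean(scores),
        "max": max(scores),
        # Fraction of samples above an illustrative leakage threshold.
        "leak_rate": sum(s >= 0.5 for s in scores) / n_samples,
    }
```

Comparing the returned distributions for the original prompt versus each persuasion-rewritten mutation is what surfaces shifts like the ~0.1 to ~0.7 ROUGE-L jump reported for the Pathos strategy.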
Knowledge Memorization Detection moves beyond surface‑level string matching to assess whether factual knowledge from copyrighted works has been internalized. It generates open‑ended questions and multiple‑choice items automatically from the source text, leveraging auxiliary LLMs to produce ground‑truth answers and distractors. Responses are evaluated with an LLM‑based semantic evaluator, reporting Fact Recall F1 for open‑ended queries and accuracy for multiple‑choice items. High scores indicate that the model retains substantive information about the protected work, even if it does not reproduce exact wording.
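A token-overlap F1, as used in SQuAD-style answer evaluation, gives the flavor of the Fact Recall F1 metric; note the paper's evaluator is LLM-based and semantic, so this lexical version is only an approximation:

```python
from collections import Counter

def fact_recall_f1(predicted, gold):
    # Token-overlap F1 between a model answer and the ground-truth answer.
    # A lexical proxy for the paper's LLM-based semantic evaluator.
    p, g = predicted.lower().split(), gold.lower().split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    prec, rec = overlap / len(p), overlap / len(g)
    return 2 * prec * rec / (prec + rec)
```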
Unlearning Detection evaluates the effectiveness of post‑training removal (unlearning) of copyrighted content. For black‑box scenarios, the system computes token‑probability based signals such as Min‑K% Prob, normalized perplexity, and z‑lib scores, comparing the likelihood of “hard” tokens in copyrighted passages against a control set of unseen text. For white‑box scenarios, it extracts internal representations from both the original and the unlearned model, quantifying drift using PCA shift (centroid displacement), Centered Kernel Alignment (CKA), Fisher Information Matrix (FIM) similarity, and cosine similarity in PCA space. These metrics collectively indicate how much the model’s internal processing of the target content has changed, rather than providing a binary proof of erasure.
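Two of the black-box signals are simple enough to sketch directly. Min-K% Prob averages the log-probabilities of the hardest k fraction of tokens; the zlib score calibrates model likelihood against the text's own compressibility. Exact normalizations in the system may differ from this sketch:

```python
import zlib

def min_k_prob(token_logprobs, k=0.2):
    """Average log-probability of the k fraction of tokens the model finds
    hardest (lowest log-prob). Values closer to 0 suggest the passage was
    likely seen in training."""
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n

def zlib_score(text, total_nll):
    """Model negative log-likelihood of the passage divided by its
    zlib-compressed size in bits. Low scores flag text the model finds
    'too easy' relative to how compressible it is."""
    return total_nll / (8 * len(zlib.compress(text.encode("utf-8"))))
```

Both signals are compared against the same statistics computed on a control set of text the model has provably never seen, since absolute values are hard to interpret in isolation.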
Legal Cases Display curates a searchable repository of landmark copyright cases across music, visual arts, and literature, linking technical findings to real‑world legal outcomes. This module serves both as an educational resource and as contextual support for interpreting forensic evidence in a legal framework.
The entire platform is delivered as a Streamlit web application, with a top navigation bar for mode selection, a central configuration panel for prompt and inference settings, and a comprehensive evidence panel that visualizes matched sentences, similarity scores, and statistical distributions. The source code and a demonstration video are publicly available on GitHub, facilitating reproducibility and community extensions.
In extensive experiments, the authors evaluated roughly 20 copyrighted books across five LLM families (including GPT‑4o‑mini) and over 20 persuasion‑based jailbreak strategies. Key findings include: (1) copyright leakage is highly probabilistic, necessitating large‑scale inference scaling (e.g., 1,000 samples per prompt) to reliably surface rare memorization events; (2) persuasive jailbreaks dramatically increase leakage risk, shifting model behavior from deterministic refusal to probabilistic disclosure; (3) unlearning operations produce measurable representation drift, especially in deeper transformer layers, suggesting that fine‑tuning can mask but not always fully eliminate memorized content.
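Finding (1) follows from basic probability: if a single sample leaks with probability p, the chance of observing at least one leak across n independent samples is 1 - (1 - p)^n, which motivates the 1,000-sample budget. A small sketch, assuming independence across samples:

```python
import math

def detection_probability(p_leak, n_samples):
    # P(at least one leaking sample in n independent draws).
    return 1 - (1 - p_leak) ** n_samples

def samples_needed(p_leak, confidence=0.95):
    # Smallest n such that detection_probability(p_leak, n) >= confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_leak))
```

For a rare memorization event with p = 0.01, a hundred samples detect it only about two thirds of the time, while a thousand samples make detection near-certain, which is consistent with the authors' emphasis on large-scale inference scaling.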
Overall, Copyright Detective offers a comprehensive, modular, and extensible toolkit for forensic auditing of LLMs with respect to copyright. By combining statistical sampling, similarity analysis, adversarial prompting, and representation‑level diagnostics, it addresses the three practical challenges identified by the authors: output uncertainty, alignment suppression, and cross‑version fragility. The system is positioned to serve AI developers seeking pre‑deployment risk assessments, legal practitioners requiring reproducible evidence, and educators aiming to raise public awareness about the copyright implications of generative AI.