FAID: Fine-Grained AI-Generated Text Detection Using Multi-Task Auxiliary and Multi-Level Contrastive Learning
The growing collaboration between humans and AI models in generative tasks has introduced new challenges in distinguishing between human-written, LLM-generated, and human-LLM collaborative texts. In this work, we collect FAIDSet, a multilingual, multi-domain, multi-generator dataset. We further introduce FAID, a fine-grained detection framework that classifies text into these three categories and identifies the underlying LLM family of the generator. Unlike existing binary classifiers, FAID is built to capture both authorship and model-specific characteristics. Our method combines multi-level contrastive learning with multi-task auxiliary classification to learn subtle stylistic cues. By modeling LLM families as distinct stylistic entities, we incorporate an adaptation mechanism that addresses distributional shifts without retraining on unseen data. Our experimental results demonstrate that FAID outperforms several baselines, particularly improving generalization accuracy on unseen domains and new LLMs, and thus offers a potential solution for improving transparency and accountability in AI-assisted writing. Our data and code are available at https://github.com/mbzuai-nlp/FAID.
💡 Research Summary
The paper addresses the emerging need to attribute authorship in a world where large language models (LLMs) are increasingly used as co‑authors, editors, or even primary writers. Existing detection work largely treats the problem as a binary classification—human versus AI—and is typically limited to English. The authors propose a fine‑grained detection framework called FAID that simultaneously (1) classifies a text into three categories—human‑written, fully LLM‑generated, and human‑LLM collaborative—and (2) identifies the specific LLM family (e.g., GPT‑4, Gemini, Llama‑3, DeepSeek) that produced the AI component.
To enable this, the authors construct a new multilingual, multi-domain, multi-generator dataset named FAIDSet. It contains 83,350 examples drawn from two languages (English and Vietnamese) and two academic domains (student theses and paper abstracts). Alongside the human-written texts, they generate fully LLM-produced counterparts and three styles of human-LLM collaboration (polishing, continuation, paraphrasing) using four modern LLM families. Prompt diversity and manual quality control ensure that the generated texts are fluent, coherent, and factually plausible.
FAID’s architecture builds on a multilingual pretrained encoder (XLM‑RoBERTa). The core learning objective combines multi‑level contrastive learning with multi‑task auxiliary classification. In the contrastive component, texts are organized into five hierarchical distributions (P₁–P₅) representing (i) a specific LLM family, (ii) any LLM, (iii) collaborative texts tied to that family, (iv) collaborative texts from any LLM, and (v) pure human texts. The model is trained to maximize cosine similarity for pairs within the same distribution while minimizing similarity across higher‑level distributions, thereby structuring the embedding space so that stylistic proximity mirrors authorship proximity.
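The pairing logic above can be illustrated with a toy loss function. The following is a minimal NumPy sketch, not the paper's implementation: the label strings, the margin value, and the exhaustive pairwise loop are all illustrative assumptions; in practice this would be computed in batches on encoder embeddings.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def multilevel_contrastive_loss(embeddings, levels, margin=0.5):
    """Toy multi-level contrastive loss: pull together pairs drawn from
    the same distribution (e.g. both "P1"), push apart pairs from
    different distributions until their similarity falls below a margin.

    embeddings: list of 1-D numpy vectors
    levels:     list of distribution labels, e.g. "P1" ... "P5"
    """
    loss, n_pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            sim = cosine_sim(embeddings[i], embeddings[j])
            if levels[i] == levels[j]:
                loss += 1.0 - sim                  # maximize within-distribution similarity
            else:
                loss += max(0.0, sim - margin)     # penalize cross-distribution similarity
            n_pairs += 1
    return loss / n_pairs
```

With this structure, lowering the loss literally means making stylistic proximity in the embedding space mirror authorship proximity, which is the property the retrieval step later exploits.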
In parallel, two classification heads are attached: a three‑class head for source type and a multi‑class head for LLM family. Cross‑entropy losses from these heads are summed with the contrastive loss, forcing the encoder to capture both fine‑grained stylistic cues (via contrastive learning) and coarse‑grained label information (via auxiliary tasks).
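The combined objective can be sketched as a plain sum of terms. The weight parameters `w_src`, `w_fam`, and `w_con` below are hypothetical knobs added for illustration (the paper simply sums the losses, i.e. all weights equal 1); the logits and labels are toy inputs.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(logits, label):
    """Cross-entropy of one example against an integer class label."""
    return -float(np.log(softmax(logits)[label] + 1e-12))

def total_loss(source_logits, source_label, family_logits, family_label,
               contrastive_term, w_src=1.0, w_fam=1.0, w_con=1.0):
    """Sum the three-class source head, the LLM-family head, and the
    contrastive term into a single training objective."""
    return (w_src * cross_entropy(source_logits, source_label)
            + w_fam * cross_entropy(family_logits, family_label)
            + w_con * contrastive_term)
```

Because all three terms backpropagate through the same encoder, the auxiliary heads supply coarse label supervision while the contrastive term shapes the fine-grained geometry of the embedding space.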
Evaluation is performed under both in-domain settings (same language, domain, and LLM family) and out-of-domain settings (new language, unseen LLM family, or new domain). Baselines include recent binary detectors (SeqXGPT, DeTective), multi-task detectors (LLM-DetectAIve), and domain-adaptation methods (OUTFOX). FAID consistently outperforms them, achieving an overall accuracy of 92.3% on the three-class task, 5–9 percentage points higher than the strongest baseline. LLM-family identification reaches 88.7% accuracy, a substantial gain over prior work (~70%). Notably, on the Vietnamese test set, FAID improves F1 from ~62% (baseline) to >81%, and on unseen GPT-4o data it retains >84% F1, demonstrating robust generalization.
A practical advantage is the embedding‑based retrieval mechanism: once the encoder is trained, new texts can be embedded and compared against a stored database without retraining, enabling rapid adaptation to emerging models or domains.
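A minimal sketch of such retrieval, assuming a k-nearest-neighbor majority vote over cosine similarities (the exact matching rule used by FAID is an assumption here; the embeddings and labels are toy data):

```python
import numpy as np

def nearest_label(query_emb, db_embs, db_labels, k=3):
    """Label a new text's embedding by majority vote over its k most
    cosine-similar neighbors in a stored embedding database.
    Adapting to a new model or domain only requires appending labeled
    embeddings to the database; the encoder is never retrained."""
    db = np.asarray(db_embs, dtype=float)
    q = np.asarray(query_emb, dtype=float)
    sims = db @ q / (np.linalg.norm(db, axis=1) * np.linalg.norm(q) + 1e-12)
    top = np.argsort(sims)[::-1][:k]          # indices of the k nearest neighbors
    votes = [db_labels[i] for i in top]
    return max(set(votes), key=votes.count)   # majority label among neighbors
```

For example, embedding a handful of texts from a newly released LLM and storing them with a family label immediately makes that family retrievable, without any gradient updates.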
The authors acknowledge limitations: the dataset is generated under controlled prompts, which may not capture the full variability of “in‑the‑wild” LLM outputs, and the collaborative label collapses diverse interaction patterns into a single class, potentially obscuring finer distinctions. Future work is suggested to collect natural collaboration logs and to introduce finer‑grained collaborative categories.
In summary, FAID introduces a novel combination of multi‑level contrastive learning and multi‑task auxiliary classification to model LLM families as distinct “authors.” By doing so, it delivers state‑of‑the‑art performance on fine‑grained authorship detection across languages, domains, and unseen generators, offering a scalable tool for transparency, academic integrity, and responsible AI‑assisted writing.