UniMark: Artificial Intelligence Generated Content Identification Toolkit

Reading time: 5 minutes

📝 Abstract

The rapid proliferation of Artificial Intelligence Generated Content has precipitated a crisis of trust and urgent regulatory demands. However, existing identification tools suffer from fragmentation and a lack of support for visible compliance marking. To address these gaps, we introduce UniMark, an open-source, unified framework for multimodal content governance. Our system features a modular unified engine that abstracts complexities across text, image, audio, and video modalities. Crucially, we propose a novel dual-operation strategy, natively supporting both Hidden Watermarking for copyright protection and Visible Marking for regulatory compliance. Furthermore, we establish a standardized evaluation framework with three specialized benchmarks (Image/Video/Audio-Bench) to ensure rigorous performance assessment. This toolkit bridges the gap between advanced algorithms and engineering implementation, fostering a more transparent and secure digital ecosystem.


📄 Content

UniMark: Artificial Intelligence Generated Content Identification Toolkit

Meilin Li¹,², Ji He¹,³, Yi Yu¹, Jia Xu¹, Shanzhe Lei¹, Yan Teng¹, Yingchun Wang¹, Xuhong Wang¹*
¹Shanghai AI Laboratory, ²Shandong University, ³Shanghai Information Security Testing Evaluation and Certification Center
lml@mail.sdu.edu.cn, heji@shtec.org.cn, {yuyi, xujia, leishanzhe, tengyan, wangyingchun, wangxuhong}@pjlab.org.cn

Abstract

The rapid proliferation of Artificial Intelligence Generated Content has precipitated a crisis of trust and urgent regulatory demands. However, existing identification tools suffer from fragmentation and a lack of support for visible compliance marking. To address these gaps, we introduce UniMark, an open-source, unified framework for multimodal content governance. Our system features a modular unified engine that abstracts complexities across text, image, audio, and video modalities. Crucially, we propose a novel dual-operation strategy, natively supporting both Hidden Watermarking for copyright protection and Visible Marking for regulatory compliance. Furthermore, we establish a standardized evaluation framework with three specialized benchmarks (Image/Video/Audio-Bench) to ensure rigorous performance assessment. This toolkit bridges the gap between advanced algorithms and engineering implementation, fostering a more transparent and secure digital ecosystem. The code is available at https://github.com/AI-LAB-C-for-S-T-AI/AIGC-Identification-Toolkit.

1 Introduction

In recent years, Large Language Models (LLMs) and generative diffusion models have made remarkable progress. From text-domain models such as GPT-5 [10] and Qwen [1], to video generation models like Sora 2 [13], Wan 2.1 [16], and Nano Banana [8], Artificial Intelligence Generated Content (AIGC) is reshaping the production of digital content at an unprecedented pace. However, the widespread adoption of these technologies has also precipitated a profound crisis of trust.
The abuse of Deepfake technology, the viral spread of fake news, and increasingly severe copyright infringement issues are eroding the foundation of public trust in digital information. Facing this challenge, establishing a reliable mechanism for Content Identification and Tracing is no longer optional but an urgent necessity for the digital ecosystem. Governments worldwide have also attached great importance to this issue; for instance, China's national standard Information Security Technology – Method for Identifying Content Generated by Artificial Intelligence [15] and the European Union's AI Act [4] have both set forth clear specifications and requirements for the identification of AIGC.

Existing AIGC identification solutions fall primarily into two categories: In-processing (embedding during generation) and Post-processing (treatment after generation). In-processing methods embed watermarks directly during the model inference stage. Representative works include SynthID [3] and GumbelSoft [7] in the text domain; Stable Signature [5], Tree-Ring [17], and PRC [9] in the image domain; and Groot [12] in the audio domain. Although these methods theoretically offer high imperceptibility, they face significant limitations in practical application: they typically require access to internal model parameters (such as logits), making them inapplicable to black-box APIs like GPT-5 or Midjourney, and they cannot handle pre-existing generated data. Consequently, to build a universal and model-agnostic solution, Post-processing schemes have become a more practical choice. Related research, such as PostMark [2] for text, ZoDiac [18] for images, VideoSeal [6] for video, and AudioSeal [14] for audio, has demonstrated immense potential.

∗Corresponding author.
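Post-processing schemes operate directly on finished content, with no access to the generating model. As a toy illustration of the idea only (not any of the cited systems, which use learned encoders trained for robustness), the following sketch embeds a payload into the least-significant bits of raw pixel bytes:

```python
# Toy post-processing watermark: embed payload bits into the least-significant
# bits (LSBs) of raw grayscale pixel bytes. Illustrative only -- real systems
# such as VideoSeal or AudioSeal use learned encoders robust to compression,
# cropping, and re-encoding, which naive LSB embedding is not.

def embed_bits(pixels: bytes, bits: list[int]) -> bytearray:
    out = bytearray(pixels)
    for i, b in enumerate(bits):
        out[i] = (out[i] & 0xFE) | b   # overwrite the LSB with one payload bit
    return out

def extract_bits(pixels: bytes, n: int) -> list[int]:
    return [p & 1 for p in pixels[:n]]

pixels = bytes(range(64))              # stand-in for an 8x8 grayscale image
payload = [1, 0, 1, 1, 0, 1, 0, 0]
marked = embed_bits(pixels, payload)
assert extract_bits(marked, len(payload)) == payload
# Each pixel value changes by at most 1, so the mark is (weakly) imperceptible.
assert all(abs(a - b) <= 1 for a, b in zip(pixels, marked))
```

Because such a scheme touches only the output bytes, it is model-agnostic and applies equally to pre-existing content, which is exactly the property that motivates post-processing approaches.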
However, the current ecosystem of identification tools suffers from severe fragmentation: State-of-the-Art (SOTA) algorithms are scattered across different codebases with inconsistent interface standards, making integration difficult. More importantly, existing tools mostly focus solely on implicit technical watermarks, neglecting "Visible Marking", a critical dimension for regulatory compliance, and thus fail to meet the user right-to-know requirements mandated by regulations such as the EU AI Act. To address these gaps, we propose UniMark: an open-source, modular, and production-grade multimodal content identification framework.

First, the core of this framework is a unified engine architecture. We have highly abstracted the differences among the text, image, audio, and video modalities. Developers need not learn complex underlying libraries for different modalities; they can complete all operations through unified embed and extract APIs, significantly lowering the barrier to entry for multimodal application development.

Second, addressing the dual needs of copyright protection and regulatory compliance, we propose a unique dual-operation strategy.

arXiv:2512.12324v2 [cs.CR] 26 Dec 2025
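A minimal sketch of what such a unified, dual-operation embed/extract surface could look like; every class, method, and backend name below is a hypothetical illustration, not the toolkit's actual API:

```python
# Hypothetical sketch of a unified embed/extract engine with dual operations
# (hidden watermarking vs. visible marking). All names are assumptions made
# for illustration; the actual UniMark API may differ.
from dataclasses import dataclass
from typing import Any

@dataclass
class MarkResult:
    content: Any      # the marked content (text here; arrays for image/audio)
    modality: str     # "text" | "image" | "audio" | "video"
    operation: str    # "hidden" (copyright watermark) | "visible" (compliance)

class UniMarkEngine:
    """Routes each call to a (modality, operation)-specific backend."""

    def __init__(self) -> None:
        self._backends: dict[tuple[str, str], Any] = {}

    def register(self, modality: str, operation: str, backend: Any) -> None:
        self._backends[(modality, operation)] = backend

    def embed(self, content: Any, *, modality: str,
              operation: str = "hidden", payload: str = "") -> MarkResult:
        marked = self._backends[(modality, operation)].embed(content, payload)
        return MarkResult(marked, modality, operation)

    def extract(self, content: Any, *, modality: str,
                operation: str = "hidden") -> str:
        return self._backends[(modality, operation)].extract(content)

class ToyHiddenTextBackend:
    """Hides the payload behind a zero-width separator (toy example only)."""
    SEP = "\u200b"

    def embed(self, text: str, payload: str) -> str:
        return text + self.SEP + payload

    def extract(self, text: str) -> str:
        return text.rsplit(self.SEP, 1)[1] if self.SEP in text else ""

class ToyVisibleTextBackend:
    """Prepends a human-readable compliance label (toy example only)."""
    LABEL = "[AI-Generated] "

    def embed(self, text: str, payload: str) -> str:
        return self.LABEL + text

    def extract(self, text: str) -> str:
        return "visible" if text.startswith(self.LABEL) else ""

engine = UniMarkEngine()
engine.register("text", "hidden", ToyHiddenTextBackend())
engine.register("text", "visible", ToyVisibleTextBackend())

hidden = engine.embed("hello world", modality="text", payload="id42")
visible = engine.embed("hello world", modality="text", operation="visible")
assert engine.extract(hidden.content, modality="text") == "id42"
assert visible.content.startswith("[AI-Generated]")
```

The design point is that callers see one embed/extract surface while all modality- and operation-specific logic stays in swappable backends, which is what lets one framework serve both copyright watermarking and visible compliance marking.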

This content is AI-processed based on ArXiv data.
