MarkCleaner: High-Fidelity Watermark Removal via Imperceptible Micro-Geometric Perturbation

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Semantic watermarks exhibit strong robustness against conventional image-space attacks. In this work, we show that this robustness does not survive micro-geometric perturbations: small spatial displacements can remove watermarks by breaking the phase alignment. Motivated by this observation, we introduce MarkCleaner, a watermark removal framework that avoids the semantic drift caused by regeneration-based watermark removal. Specifically, MarkCleaner is trained with micro-geometry-perturbed supervision, which encourages the model to separate semantic content from strict spatial alignment and enables robust reconstruction under subtle geometric displacements. The framework adopts a mask-guided encoder that learns explicit spatial representations and a 2D Gaussian Splatting-based decoder that explicitly parameterizes geometric perturbations while preserving semantic content. Extensive experiments demonstrate that MarkCleaner achieves superior performance in both watermark removal effectiveness and visual fidelity, while enabling efficient real-time inference. Our code will be made available upon acceptance.


💡 Research Summary

The paper introduces MarkCleaner, a watermark removal framework that exploits a previously unrecognized vulnerability of semantic watermarks to micro‑geometric perturbations. Semantic watermarks, such as those embedded by TreeRing, place their secret signal in the Fourier domain of the diffusion model's latent, and detection depends on the phase of those coefficients staying aligned with the key. Because phase encodes spatial structure, even tiny spatial transformations, such as small translations or rotations of a few degrees, disrupt this alignment. For translations, the Fourier Shift Theorem makes the effect explicit: a shift multiplies every coefficient by a linear phase ramp while leaving its magnitude unchanged. The detector measures the L1 distance between the recovered coefficients and the key and reports a watermark only when this distance falls below a decision threshold; the induced phase shift pushes the distance above that threshold, so the watermark goes undetected while the image's visual appearance remains essentially unchanged.
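The phase argument can be checked numerically. The sketch below is a toy, not the paper's setup: a random array stands in for the diffusion latent (the DDIM inversion step a real TreeRing detector performs is skipped), a single ring is overwritten with a hypothetical real-valued key, and the translation is an integer circular shift via `np.roll`. It verifies the shift theorem (magnitudes unchanged, phases multiplied by a linear ramp) and shows the L1 detection distance jumping from roughly zero to a large value.

```python
import numpy as np

N = 64  # hypothetical latent side length (even)

def ring_mask(n: int, r_min: float, r_max: float) -> np.ndarray:
    """Boolean annulus around the centered DC term of an n x n spectrum."""
    yy, xx = np.mgrid[:n, :n]
    r = np.hypot(yy - n // 2, xx - n // 2)
    return (r >= r_min) & (r <= r_max)

def spectrum(x: np.ndarray) -> np.ndarray:
    """Centered 2D DFT (DC component at index n // 2)."""
    return np.fft.fftshift(np.fft.fft2(x))

def embed(latent: np.ndarray, mask: np.ndarray, key: float) -> np.ndarray:
    """Overwrite the masked coefficients with a real constant key (toy TreeRing-style).

    The ring is symmetric about DC and the key is real, so the spectrum stays
    Hermitian and the inverse transform is real up to rounding error.
    """
    f = spectrum(latent)
    f[mask] = key
    return np.fft.ifft2(np.fft.ifftshift(f)).real

def l1_distance(x: np.ndarray, mask: np.ndarray, key: float) -> float:
    """Detector score: mean L1 distance between masked coefficients and the key."""
    return float(np.abs(spectrum(x)[mask] - key).mean())

rng = np.random.default_rng(0)
mask = ring_mask(N, 6, 10)
key = 64.0  # hypothetical key value, comparable to typical coefficient magnitudes
wm = embed(rng.standard_normal((N, N)), mask, key)

shift = 2  # circular translation by 2 pixels along x (columns)
shifted = np.roll(wm, shift=shift, axis=1)

# Fourier shift theorem: the translation multiplies each coefficient by a
# linear phase ramp exp(-2*pi*i*f_x*shift) and leaves magnitudes untouched.
fx = np.fft.fftshift(np.fft.fftfreq(N))  # cycles per sample, in fftshift order
ramp = np.exp(-2j * np.pi * fx * shift)[None, :]
assert np.allclose(spectrum(shifted), spectrum(wm) * ramp)
assert np.allclose(np.abs(spectrum(shifted)), np.abs(spectrum(wm)))

print(f"L1 distance, watermarked: {l1_distance(wm, mask, key):.4f}")
print(f"L1 distance, shifted:     {l1_distance(shifted, mask, key):.4f}")
```

Because the key is nonzero, each keyed coefficient `key * e^{iφ}` moves away from `key` by `2 * key * |sin(φ/2)|`, so any phase change inflates the distance even though no magnitude changes.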

Building on this insight, MarkCleaner combines two complementary modules. First, a mask‑guided encoder applies random spatial masks and frequency‑band masks to the input image, forcing the network to suppress watermark patterns while still extracting global semantic features. The encoder follows a UNet architecture, providing multi‑scale representations that are robust to phase perturbations. Second, a 2D Gaussian Splatting (2DGS) decoder represents the image as a continuous mixture of Gaussian primitives and learns an explicit position parameter for each Gaussian, so micro‑geometric displacements can be rendered differentiably. During training, the model is not asked to reconstruct the original image; instead, it is supervised to match a version of the input that has been deliberately perturbed by a small geometric transformation (e.g., a 7‑pixel translation and a 5° rotation). This “perturbed‑target” supervision teaches the decoder to apply imperceptible spatial shifts while preserving the semantic content of the image.
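Why explicit Gaussian positions make geometric perturbation easy can be illustrated with a minimal renderer. This is a sketch under simplifying assumptions, not the paper's decoder: the Gaussians are isotropic and summed additively (no alpha blending or learned covariances), and `render` and `perturb_means` are hypothetical names. Moving only the centers rotates and translates the rendered image while scales and colors, i.e. the content, stay fixed.

```python
import numpy as np

def render(means, scales, colors, height, width):
    """Additively splat isotropic 2D Gaussians onto a height x width RGB grid.

    means: (K, 2) centers as (x, y) in pixel units; scales: (K,) standard
    deviations; colors: (K, 3) RGB weights. Returns an array of shape (H, W, 3).
    """
    ys, xs = np.mgrid[:height, :width].astype(np.float64)
    dx = xs[None] - means[:, 0, None, None]  # (K, H, W)
    dy = ys[None] - means[:, 1, None, None]
    weights = np.exp(-0.5 * (dx**2 + dy**2) / scales[:, None, None] ** 2)
    return np.einsum("khw,kc->hwc", weights, colors)

def perturb_means(means, angle_deg, shift_xy, center_xy):
    """Rigidly move only the Gaussian centers: rotate about center_xy, then translate.

    Scales and colors are untouched, so only the geometry moves.
    (Anisotropic Gaussians would also need their covariances rotated.)
    """
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    return (means - center_xy) @ rot.T + center_xy + shift_xy

rng = np.random.default_rng(0)
K, H, W = 200, 32, 32
means = rng.uniform(0.0, [W, H], size=(K, 2))
scales = rng.uniform(0.8, 2.0, size=K)
colors = rng.uniform(0.0, 0.2, size=(K, 3))
center = np.array([W / 2, H / 2])

clean = render(means, scales, colors, H, W)
# Hypothetical micro-perturbation: 0.5 degree rotation plus a sub-pixel shift
moved = perturb_means(means, 0.5, np.array([0.3, -0.2]), center)
perturbed = render(moved, scales, colors, H, W)
print(f"mean |perturbed - clean|: {np.abs(perturbed - clean).mean():.4f}")
```

Since the rendering is a smooth function of `means`, the same displacement could be learned by gradient descent in an autodiff framework, which is the property the summary attributes to the 2DGS decoder.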

To maintain semantic fidelity, the loss function combines three terms: a pixel‑level L2 reconstruction term against the perturbed target, a perceptual loss based on VGG features, and a self‑supervised feature alignment loss that aligns encoder outputs before and after the transformation. Together, these terms aim to keep the output visually indistinguishable from the input while introducing enough phase misalignment to invalidate the watermark.
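The structure of this objective can be sketched as a weighted sum. Everything specific here is an assumption for illustration: the loss weights, the use of an L1 distance on features, the MSE form of the alignment term, a finite-difference `toy_features` standing in for VGG activations, and a one-pixel circular shift standing in for the paper's translation-plus-rotation target. The point it shows is that reproducing the input exactly is penalized, because supervision asks for the perturbed image.

```python
import numpy as np

def perturbed_target(image, shift_xy):
    """Hypothetical target builder: integer circular translation of an (H, W, C) image.

    Stands in for the paper's micro-geometric warp (which also includes rotation).
    """
    dx, dy = shift_xy
    return np.roll(image, shift=(dy, dx), axis=(0, 1))

def toy_features(image):
    """Stand-in for VGG activations: horizontal and vertical finite differences."""
    return [np.diff(image, axis=1), np.diff(image, axis=0)]

def combined_loss(output, target, enc_before, enc_after,
                  feature_fn=toy_features, w_perc=0.1, w_align=0.1):
    """L2 to the perturbed target + perceptual term + encoder feature alignment.

    The weights, the L1 distance on features, and the MSE alignment term are
    illustrative choices, not values taken from the paper.
    """
    l2 = np.mean((output - target) ** 2)
    perc = sum(np.mean(np.abs(fo - ft))
               for fo, ft in zip(feature_fn(output), feature_fn(target)))
    align = np.mean((enc_before - enc_after) ** 2)
    total = l2 + w_perc * perc + w_align * align
    return float(total), {"l2": float(l2), "perceptual": float(perc), "align": float(align)}

rng = np.random.default_rng(0)
image = rng.random((16, 16, 3))
target = perturbed_target(image, (1, 0))  # shift one pixel along x
enc = rng.standard_normal(64)  # placeholder encoder features

# Returning the unmodified input is penalized; matching the shifted target is not.
total_identity, parts = combined_loss(image, target, enc, enc)
total_match, _ = combined_loss(target, target, enc, enc)
print(total_identity, parts)
print(total_match)
```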

Extensive experiments evaluate MarkCleaner on twelve recent semantic watermark schemes and several traditional visible and invisible watermarks. Compared with reconstruction‑based baselines (e.g., deep image prior denoising) and generation‑based baselines (e.g., diffusion model regeneration), MarkCleaner achieves a watermark removal success rate of over 95% at a 1% false‑positive rate while delivering PSNR above 38 dB, SSIM above 0.98, and LPIPS below 0.02. In human studies, 92% of participants could not perceive any visual difference between the original and cleaned images. The system also runs in real time, exceeding 30 FPS on an NVIDIA GTX 1080 Ti GPU with modest memory consumption.
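For readers reproducing such numbers, the two simplest metrics can be computed as follows. This is a sketch of common conventions, not the paper's evaluation code: PSNR uses the standard definition for images scaled to `[0, max_val]`, and the TPR at a fixed FPR uses an empirical quantile of scores on unwatermarked images as the threshold. SSIM and LPIPS are omitted (LPIPS requires a learned network).

```python
import numpy as np

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val**2 / mse))

def tpr_at_fpr(scores_pos, scores_neg, fpr=0.01):
    """Fraction of positives flagged at the threshold that flags about `fpr` of negatives.

    Convention: higher score = "watermark present". For a distance-based detector
    (lower L1 distance = watermarked), pass negated distances.
    """
    threshold = np.quantile(np.asarray(scores_neg, dtype=float), 1.0 - fpr)
    return float(np.mean(np.asarray(scores_pos, dtype=float) > threshold))

clean = np.zeros((8, 8))
noisy = clean + 0.01  # uniform error of 0.01 -> MSE 1e-4
print(psnr(clean, noisy))  # about 40 dB

neg = np.arange(100)  # scores on unwatermarked images
pos = np.array([150.0, 200.0, 50.0])  # scores on (cleaned) watermarked images
print(tpr_at_fpr(pos, neg, fpr=0.01))
```

Applied to detector scores on cleaned images, `tpr_at_fpr` measures residual detectability, so for a removal attack a lower value indicates more effective removal.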

The paper's contributions are threefold: (1) it uncovers a fundamental geometric vulnerability of phase‑based semantic watermarks; (2) it proposes a unified removal framework that relies on micro‑geometric perturbations rather than content regeneration, thereby avoiding semantic drift; (3) it demonstrates that 2D Gaussian Splatting provides an efficient, differentiable mechanism for learning precise spatial displacements. The work suggests that future watermark designs must remain robust to small geometric transformations, and that watermarks whose detection relies solely on phase alignment may be insufficient. Potential extensions include adaptive selection of the perturbation magnitude, multi‑scale or non‑linear deformations, and robustness against detectors that incorporate phase‑invariant features. Overall, MarkCleaner offers a practical, high‑fidelity answer to the longstanding trade‑off between watermark removal effectiveness and image quality.

