Membership Inference Attack Against Music Diffusion Models via Generative Manifold Perturbation

Notice: This research summary and analysis were generated automatically using AI technology. For full accuracy, please refer to the original arXiv source.

Membership inference attacks (MIAs) test whether a specific audio clip was used to train a model, making them a key tool for auditing generative music models for copyright compliance. However, loss-based signals (e.g., reconstruction error) are weakly aligned with human perception in practice, yielding poor separability at the low false-positive rates (FPRs) required for forensics. We propose the Latent Stability Adversarial Probe (LSA-Probe), a white-box method that measures a geometric property of the reverse diffusion: the minimal time-normalized perturbation budget needed to cross a fixed perceptual degradation threshold at an intermediate diffusion state. We show that training members, residing in more stable regions, exhibit a significantly higher degradation cost.


💡 Research Summary

This paper addresses the problem of determining whether a particular audio clip was part of the training set of a music generation diffusion model—a task known as membership inference attack (MIA). Traditional MIA approaches for generative models rely on endpoint signals such as reconstruction loss or likelihood, which work reasonably well for images but are poorly aligned with human perception in the audio domain. Consequently, they exhibit weak separability at the extremely low false‑positive rates (FPRs) required for forensic or copyright‑compliance applications.

The authors propose a novel white‑box attack called the Latent Stability Adversarial Probe (LSA‑Probe). The method exploits the reverse diffusion process of a diffusion model. For a given audio sample x₀, the forward diffusion is run to an intermediate timestep t, producing a latent xₜ. A time‑normalized perturbation δₜ = σₜ·δ̃ is added to xₜ, where σₜ = √(1‑ᾱₜ) matches the forward noise variance, making perturbation budgets comparable across timesteps. The perturbation δ̃ is constrained by an ℓₚ norm (p = 2 or ∞) with budget η.
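The forward noising and the time-normalized perturbation described above can be sketched as follows. This is an illustrative NumPy sketch under my own naming (`noise_to_timestep`, `perturb_latent` are not the paper's function names), using the standard closed-form DDPM forward process:

```python
import numpy as np

def noise_to_timestep(x0, t, alphas_cumprod, noise=None, rng=None):
    """Closed-form forward diffusion q(x_t | x_0): sqrt(a_bar_t) x0 + sqrt(1 - a_bar_t) eps."""
    a_bar = alphas_cumprod[t]
    if noise is None:
        noise = (rng or np.random.default_rng(0)).standard_normal(x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise

def perturb_latent(x_t, delta_tilde, t, alphas_cumprod, eta):
    """Add the time-normalized perturbation sigma_t * delta_tilde, where
    sigma_t = sqrt(1 - a_bar_t) and delta_tilde is projected onto the
    l2 ball of radius eta, making budgets comparable across timesteps."""
    sigma_t = np.sqrt(1.0 - alphas_cumprod[t])
    norm = np.linalg.norm(delta_tilde)
    if norm > eta:
        delta_tilde = delta_tilde * (eta / norm)
    return x_t + sigma_t * delta_tilde
```

Because the raw perturbation is bounded by η before the σₜ scaling, the effective perturbation magnitude at timestep t is at most σₜ·η.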

The core idea is that training samples (members) reside in smoother, more stable regions of the generative manifold, so a larger adversarial budget is required to push them past a perceptual degradation threshold. The degradation is measured by a differentiable perceptual distance D(·,·) computed on the reconstructed waveforms (e.g., CDP‑AM, multi‑resolution STFT). The attack solves a nested optimization: an inner projected‑gradient‑descent (PGD) loop finds the worst‑case δ̃ for a fixed η, while an outer binary‑search loop finds the minimal η such that D exceeds a pre‑registered threshold τ (set to the 95th percentile of a development non‑member set). This minimal η, denoted C_adv(x₀; t, τ), serves as the membership score; higher values indicate higher likelihood of membership.
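The outer binary search for the minimal sufficient budget can be sketched as below. Here `worst_case_degradation(eta)` stands in for the inner PGD loop (abstracted away; the name is illustrative, not from the paper), and the search is valid because the worst-case degradation is non-decreasing in η: a larger ℓ₂ ball contains every smaller one.

```python
def adv_cost(worst_case_degradation, tau, eta_max=0.8, tol=1e-4):
    """Minimal budget eta whose worst-case PGD perturbation pushes the
    perceptual distance D past tau; this plays the role of C_adv(x0; t, tau).

    worst_case_degradation(eta) is assumed to return
        max_{||delta||_2 <= eta} D(decode(x_t + sigma_t * delta), x_0),
    which is monotone non-decreasing in eta.
    """
    if worst_case_degradation(eta_max) < tau:
        # Censored sample: even the maximum budget cannot degrade past tau.
        return eta_max
    lo, hi = 0.0, eta_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if worst_case_degradation(mid) >= tau:
            hi = mid  # mid suffices; try a smaller budget
        else:
            lo = mid  # mid is insufficient; need more budget
    return hi
```

For example, with a toy linear degradation `lambda eta: 2.0 * eta` and τ = 1.0, the search converges to ≈ 0.5, the smallest budget at which the degradation reaches the threshold.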

The method is evaluated on two families of music diffusion models: (i) DiffWave, a waveform‑level DDPM, and (ii) MusicLDM, a latent‑diffusion model that operates in a VAE latent space. Each model is trained separately on two datasets: MAESTRO v3 (solo piano) and a subset of FMA‑Large (multi‑genre). Experiments use deterministic DDIM sampling, a timestep ratio of 0.6, the ℓ₂ norm, a maximum budget η_max = 0.8, and CDP‑AM as the primary distance metric (with MR‑STFT, log‑mel MSE, and waveform MSE reported for robustness).

To ensure fair comparison, the authors match compute across baselines (loss‑based, trajectory‑based, and SecMI) within ±5 % by accounting for UNet calls, FLOPs, and wall‑clock time. Baselines include reconstruction loss at various timesteps, the PIA/PIAN trajectory reconstruction attack, and the SecMI method.

Results (Table 1) show consistent improvements in the low‑FPR regime. For DiffWave on MAESTRO, true‑positive rate at 1 % FPR rises from 0.12 (best baseline) to 0.20 (+0.08), with AU‑ROC improving by 0.04. On FMA‑Large, TPR@1 % FPR improves from 0.11 to 0.18 (+0.07). MusicLDM also benefits: on MAESTRO, TPR@1 % FPR increases from 0.10 to 0.13 (+0.03) and on FMA‑Large from 0.08 to 0.14 (+0.06). AU‑ROC gains range from +0.03 to +0.06 across all settings.
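The low‑FPR metrics reported above can be computed from member/non‑member score lists in a few lines. This is a generic sketch of TPR at a fixed FPR and of AUROC via the Mann‑Whitney identity, not the authors' evaluation code:

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, fpr=0.01):
    """TPR at a fixed FPR: threshold at the (1 - fpr) quantile of the
    non-member scores, then count the fraction of members above it."""
    thresh = np.quantile(np.asarray(nonmember_scores, dtype=float), 1.0 - fpr)
    return float(np.mean(np.asarray(member_scores, dtype=float) > thresh))

def auroc(member_scores, nonmember_scores):
    """AUROC as the probability that a random member outscores a random
    non-member (Mann-Whitney U identity; ties count as 1/2)."""
    m = np.asarray(member_scores, dtype=float)[:, None]
    n = np.asarray(nonmember_scores, dtype=float)[None, :]
    return float(np.mean((m > n) + 0.5 * (m == n)))
```

For perfectly separated scores both metrics equal 1.0; the forensically relevant regime in the paper is the left tail, i.e. TPR at FPR ≤ 1 %.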

Ablation studies reveal that (a) mid‑trajectory timesteps (t_ratio ≈ 0.6) provide the strongest separability, supporting the intuition that the reverse process transitions from coarse global structure to fine details around this point; (b) larger perturbation budgets increase performance up to η ≈ 0.6‑0.8, after which gains saturate; (c) perceptual distances (CDP‑AM, MR‑STFT) outperform simple MSE‑based metrics at low FPR, confirming the importance of human‑aligned evaluation.

The authors connect their empirical findings to the “flat minima” hypothesis: members lie in flatter regions of the loss landscape, leading to higher local stability of the reverse diffusion map. A first‑order analysis shows that the gradient of the perceptual distance with respect to the latent perturbation is systematically smaller for members, which mathematically justifies the observed higher adversarial cost.
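In my own notation (g denotes the decoding from xₜ back to a waveform; these symbols are not taken verbatim from the paper), the first‑order argument can be made explicit:

```latex
D\!\big(g(x_t + \sigma_t \tilde\delta)\big)
  \;\approx\; D_0 + \sigma_t \,\nabla_{x_t} D \cdot \tilde\delta,
  \qquad D_0 := D\!\big(g(x_t)\big),
\\[4pt]
\max_{\|\tilde\delta\|_2 \le \eta} \sigma_t \,\nabla_{x_t} D \cdot \tilde\delta
  \;=\; \eta\,\sigma_t\,\big\|\nabla_{x_t} D\big\|_2 ,
\qquad\Longrightarrow\qquad
C_{\mathrm{adv}}(x_0;\,t,\tau)
  \;\approx\; \frac{\tau - D_0}{\sigma_t\,\big\|\nabla_{x_t} D\big\|_2}.
```

Under this approximation, a systematically smaller gradient norm for members directly implies a larger minimal budget, matching the observed higher adversarial cost.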

Limitations include the reliance on a white‑box threat model (full access to model parameters and gradients) and the need to pre‑register τ on a separate non‑member set. Real‑world deployment may face restricted access or varying data distributions, requiring adaptive thresholding. Moreover, the attack’s computational cost—multiple reverse passes per sample—could be prohibitive for large‑scale auditing. Future work is suggested on black‑box adaptations, meta‑learning cheap proxies for C_adv, and broader validation across diverse musical genres, lengths, and recording qualities.

In summary, the paper introduces LSA‑Probe, a principled, perceptually grounded white‑box MIA that leverages time‑normalized latent stability in music diffusion models. It delivers significant gains in the forensically critical low‑FPR region, demonstrates robustness across model families and datasets, and opens new avenues for privacy and security analysis of generative audio systems.

