Taming Identity Consistency and Prompt Diversity in Diffusion Models via Latent Concatenation and Masked Conditional Flow Matching
📝 Original Info
- Title: Taming Identity Consistency and Prompt Diversity in Diffusion Models via Latent Concatenation and Masked Conditional Flow Matching
- ArXiv ID: 2511.08061
- Date: 2025-11-11
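- Authors: Author information was not provided with the paper, so it cannot be listed accurately. Research of this kind is typically carried out by a computer vision or generative modeling team (for example, a university, research institute, or industry collaboration).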
📝 Abstract
Subject-driven image generation aims to synthesize novel depictions of a specific subject across diverse contexts while preserving its core identity features. Achieving both strong identity consistency and high prompt diversity presents a fundamental trade-off. We propose a LoRA fine-tuned diffusion model employing a latent concatenation strategy, which jointly processes reference and target images, combined with a masked Conditional Flow Matching (CFM) objective. This approach enables robust identity preservation without architectural modifications. To facilitate large-scale training, we introduce a two-stage Distilled Data Curation Framework: the first stage leverages data restoration and VLM-based filtering to create a compact, high-quality seed dataset from diverse sources; the second stage utilizes these curated examples for parameter-efficient fine-tuning, thus scaling the generation capability across various subjects and contexts. Finally, for filtering and quality assessment, we present CHARIS, a fine-grained evaluation framework that performs attribute-level comparisons along five key axes: identity consistency, prompt adherence, region-wise color fidelity, visual quality, and transformation diversity.
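The abstract names two mechanisms, latent concatenation and a masked CFM objective, without giving their exact form. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes that reference and target latents are joined along the width axis, that the probability path is the linear interpolation commonly used in flow matching, and that the mask restricts the loss to the target half. The names `concat_latents`, `target_mask`, `masked_cfm_loss`, and `velocity_fn` are hypothetical.

```python
import numpy as np

def concat_latents(z_ref, z_tgt):
    """Join reference and target latents side by side (assumed: width axis)."""
    return np.concatenate([z_ref, z_tgt], axis=-1)

def target_mask(z_ref, z_tgt):
    """Mask that is 0 over the reference half and 1 over the target half."""
    return np.concatenate([np.zeros_like(z_ref), np.ones_like(z_tgt)], axis=-1)

def masked_cfm_loss(velocity_fn, z_ref, z_tgt, t, noise):
    """Flow-matching loss restricted to the target region (assumed masking).

    velocity_fn(x_t, t) stands in for the fine-tuned network and must return
    an array with the same shape as x_t.
    """
    x1 = concat_latents(z_ref, z_tgt)        # data sample (joint latent)
    mask = target_mask(z_ref, z_tgt)
    x_t = (1.0 - t) * noise + t * x1         # linear path from noise to data
    u = x1 - noise                           # velocity of that linear path
    v = velocity_fn(x_t, t)                  # predicted velocity
    sq_err = mask * (v - u) ** 2             # errors on the reference half are ignored
    return sq_err.sum() / mask.sum()         # mean over masked (target) entries

# Toy usage with random latents and a placeholder "model" that predicts zeros.
rng = np.random.default_rng(0)
z_ref = rng.standard_normal((4, 8, 8))       # (channels, height, width)
z_tgt = rng.standard_normal((4, 8, 8))
noise = rng.standard_normal((4, 8, 16))      # matches the concatenated shape
zero_model = lambda x_t, t: np.zeros_like(x_t)
loss = masked_cfm_loss(zero_model, z_ref, z_tgt, t=0.3, noise=noise)
```

Under these assumptions the network sees both images in a single latent, so the reference can inform the target through the model's existing attention layers, which is consistent with the abstract's claim that no architectural modification is needed. Whether the reference half is also noised during training is not stated in the abstract; the sketch noises the whole latent for simplicity.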
Reference
This content is AI-processed based on open access ArXiv data.