Taming Identity Consistency and Prompt Diversity in Diffusion Models via Latent Concatenation and Masked Conditional Flow Matching

February 22, 2026

Reading time: 2 minutes

📝 Original Info

Title: Taming Identity Consistency and Prompt Diversity in Diffusion Models via Latent Concatenation and Masked Conditional Flow Matching
ArXiv ID: 2511.08061
Date: 2025-11-11
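Authors: Author information was not provided for this paper, so it cannot be stated accurately. Research of this kind is likely carried out by a team working in computer vision and generative modeling (for example, a university, research institute, or industry collaboration).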

📝 Abstract

Subject-driven image generation aims to synthesize novel depictions of a specific subject across diverse contexts while preserving its core identity features. Achieving both strong identity consistency and high prompt diversity presents a fundamental trade-off. We propose a LoRA fine-tuned diffusion model employing a latent concatenation strategy, which jointly processes reference and target images, combined with a masked Conditional Flow Matching (CFM) objective. This approach enables robust identity preservation without architectural modifications. To facilitate large-scale training, we introduce a two-stage Distilled Data Curation Framework: the first stage leverages data restoration and VLM-based filtering to create a compact, high-quality seed dataset from diverse sources; the second stage utilizes these curated examples for parameter-efficient fine-tuning, thus scaling the generation capability across various subjects and contexts. Finally, for filtering and quality assessment, we present CHARIS, a fine-grained evaluation framework that performs attribute-level comparisons along five key axes: identity consistency, prompt adherence, region-wise color fidelity, visual quality, and transformation diversity.
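The abstract names two mechanisms, latent concatenation and a masked Conditional Flow Matching objective, without spelling out their form. The following is only a minimal sketch of how such a loss could be computed, not the authors' implementation. It assumes a linear probability path x_t = (1 - t) * x_0 + t * x_1 with target velocity x_1 - x_0, a flat list of floats standing in for each latent, and a mask that restricts the loss to the target half of the concatenated latent. All function names here are hypothetical.

```python
import random


def concat_latents(reference, target):
    """Join reference and target latents into one sequence (assumed layout)."""
    return list(reference) + list(target)


def target_mask(n_reference, n_target):
    """Assumed mask: 0.0 on reference positions, 1.0 on target positions."""
    return [0.0] * n_reference + [1.0] * n_target


def interpolate(noise, data, t):
    """Linear path x_t = (1 - t) * x_0 + t * x_1 (a common CFM choice)."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]


def masked_cfm_loss(predicted_velocity, noise, data, mask):
    """Squared error against u = x_1 - x_0, averaged over masked-in positions."""
    total = 0.0
    for v, n, d, m in zip(predicted_velocity, noise, data, mask):
        total += m * (v - (d - n)) ** 2
    return total / max(sum(mask), 1.0)


# Toy usage: a 2-element reference latent and a 3-element target latent.
rng = random.Random(0)
reference = [0.5, -0.2]
target = [1.0, 0.3, -0.7]

x1 = concat_latents(reference, target)        # clean concatenated latent
x0 = [rng.gauss(0.0, 1.0) for _ in x1]        # Gaussian noise sample
mask = target_mask(len(reference), len(target))
x_t = interpolate(x0, x1, t=0.3)              # what a model would receive as input

# A stand-in "model" that predicts zero velocity everywhere.
zero_prediction = [0.0] * len(x1)
print(masked_cfm_loss(zero_prediction, x0, x1, mask))
```

In a real training loop the prediction would come from the LoRA fine-tuned network evaluated at `x_t` and `t`. With this assumed mask, errors on the reference positions do not contribute to the loss, so the reference acts purely as conditioning context.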
