AlcheMinT: Fine-grained Temporal Control for Multi-Reference Consistent Video Generation


📝 Original Info

  • Title: AlcheMinT: Fine-grained Temporal Control for Multi-Reference Consistent Video Generation
  • ArXiv ID: 2512.10943
  • Date: 2025-12-11
  • Authors: Sharath Girish, Viacheslav Ivanov, Tsai-Shien Chen, Hao Chen, Aliaksandr Siarohin, Sergey Tulyakov

📝 Figure 1

AlcheMinT: time-controlled subject-reference video generation. Given a subject reference with input timestamps, AlcheMinT generates a consistent video in which the subject naturally appears during the specified time interval. Yellow boxes highlight frames within the input interval, where the first subject reference is expected to appear; red boxes highlight frames where the second reference appears, if present.

📄 Full Content

Large-scale diffusion models [4,6,18,33,36,44,55,60] have demonstrated remarkable quality and fidelity in producing realistic videos directly from text or image inputs. These models can handle various forms of conditioning, such as poses, depth maps, and camera parameters [24]. More recently, conditioning on identities or subjects has become popular, providing fine-grained control and personalized generation for users. This has led to a large number of works targeting single- or multi-reference conditions consisting of people, faces, animals, objects, or backgrounds [8,9,12,14,15,20,22,23,30,32,34,35,48,50,52,58,61].
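The paper's body is truncated here, but the teaser caption describes the core control signal: each subject reference is paired with a time interval during which it should appear in the generated video. Below is a minimal sketch of one plausible encoding of that signal, assuming intervals given in seconds are rasterized into a binary per-frame presence mask that a downstream conditioning mechanism could consume. The function name `temporal_presence_mask`, the `fps` parameter, and the mask semantics are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def temporal_presence_mask(intervals, num_frames, fps=8):
    """Rasterize per-reference time intervals into a binary frame mask.

    intervals: list of (start_sec, end_sec) tuples, one per subject reference.
    Returns an array of shape (num_refs, num_frames) where mask[r, f] == 1.0
    means reference r is expected to be present at frame f.
    """
    mask = np.zeros((len(intervals), num_frames), dtype=np.float32)
    for r, (start, end) in enumerate(intervals):
        # Convert seconds to frame indices and clamp to the clip length.
        f0 = max(int(round(start * fps)), 0)
        f1 = min(int(round(end * fps)), num_frames)
        mask[r, f0:f1] = 1.0
    return mask

# Example: first reference active in seconds 0-2, second in seconds 1-3
# of a 4-second, 8-fps clip (32 frames), matching the two-subject setup
# illustrated in Figure 1.
mask = temporal_presence_mask([(0.0, 2.0), (1.0, 3.0)], num_frames=32)
print(mask.shape)  # (2, 32)
```

Such a mask could, for instance, gate per-frame attention to each reference, though the paper's actual injection mechanism is not recoverable from the excerpt above.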

…(Part of the content has been omitted due to length.)

Reference

This content is AI-processed from open-access ArXiv data.
