ReLayout: Versatile and Structure-Preserving Design Layout Editing via Relation-Aware Design Reconstruction
Automated redesign without manual adjustments marks a key step forward in the design workflow. In this work, we focus on a foundational redesign task termed design layout editing, which seeks to autonomously modify the geometric composition of a design based on user intent. To overcome the ambiguity of user needs expressed in natural language, we introduce four basic and important editing actions and standardize the format of editing operations. This underexplored task presents a unique challenge: satisfying the specified editing operations while simultaneously preserving the layout structure of unedited elements. In addition, the scarcity of triplet samples (original design, editing operation, edited design) poses a second formidable challenge. To this end, we present ReLayout, a novel framework for versatile and structure-preserving design layout editing that operates without triplet data. Specifically, ReLayout first introduces the relation graph, which encodes the position and size relationships among unedited elements, as the constraint for layout structure preservation. Then, relation-aware design reconstruction (RADR) is proposed to bypass the data challenge. By learning to reconstruct a design from its elements, a relation graph, and a synthesized editing operation, RADR effectively emulates the editing process in a self-supervised manner. A multi-modal large language model serves as the backbone for RADR, unifying multiple editing actions within a single model and thus achieving versatile editing after fine-tuning. Qualitative and quantitative results, together with user studies, show that ReLayout significantly outperforms the baseline models in terms of editing quality, accuracy, and layout structure preservation.
💡 Research Summary
ReLayout addresses the previously under‑explored task of design layout editing, where an existing graphic design must be modified according to a user‑specified operation while preserving the spatial relationships among the untouched elements. The authors first formalize the editing operation as a structured triple consisting of an action (add, delete, move, resize), a target element, and any required parameters, thereby removing the ambiguity inherent in natural‑language commands.
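The structured triple described above can be sketched as a small data structure. This is an illustrative sketch, not the paper's actual interface; the field names and parameter conventions are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum

class Action(Enum):
    """The four standardized editing actions."""
    ADD = "add"
    DELETE = "delete"
    MOVE = "move"
    RESIZE = "resize"

@dataclass
class EditOperation:
    """One editing operation: (action, target element, required parameters).

    `target_id` and the `params` keys are hypothetical names chosen for
    illustration; the paper only specifies that each operation carries an
    action, a target element, and any needed parameters.
    """
    action: Action
    target_id: int
    params: dict = field(default_factory=dict)  # e.g. new coordinates or dimensions

# A move command, free of natural-language ambiguity:
op = EditOperation(Action.MOVE, target_id=3, params={"x": 120, "y": 40})
```

Standardizing operations this way lets downstream components (graph-edge removal, prompt construction) treat all four actions uniformly.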
The core technical contribution is the introduction of a relation graph that explicitly encodes the layout structure of a design. Nodes represent all elements plus the canvas; edges capture two kinds of relationships: (1) size relationships derived from the area ratio between two elements, classified as "small", "equal", or "large" using a tolerance threshold, and (2) position relationships obtained by partitioning the target element's bounding box (or the canvas) into a 3 × 3 grid and labeling the source element's center point accordingly (nine possible labels, e.g. top-left, center, bottom-right). This graph serves as a hard constraint that guides any editing operation to keep the relative geometry of the unedited components intact.
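The two edge-labeling rules can be sketched as follows. The tolerance value and the exact grid-label strings are assumptions for illustration; the paper specifies only the three size classes, the area-ratio criterion, and the 3 × 3 grid.

```python
def size_relation(area_a: float, area_b: float, tol: float = 0.1) -> str:
    """Classify element a relative to element b by area ratio.

    `tol` is an assumed tolerance threshold; the paper's exact value
    is not given in this summary.
    """
    ratio = area_a / area_b
    if ratio < 1 - tol:
        return "small"
    if ratio > 1 + tol:
        return "large"
    return "equal"

def position_relation(src_center: tuple, tgt_box: tuple) -> str:
    """Label the source element's center within a 3x3 grid over the
    target bounding box (x, y, w, h). Centers outside the box are
    clamped to the nearest grid cell.
    """
    x, y, w, h = tgt_box
    cx, cy = src_center
    col = min(2, max(0, int((cx - x) / (w / 3))))
    row = min(2, max(0, int((cy - y) / (h / 3))))
    rows = ["top", "center", "bottom"]
    cols = ["left", "center", "right"]
    return f"{rows[row]}-{cols[col]}"  # e.g. "top-left", "center-center"
```

Applying these two functions over every element pair (plus the canvas) yields the labeled edges of the relation graph.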
To overcome the scarcity of triplet data (original design, editing operation, edited design), the paper proposes Relation‑Aware Design Reconstruction (RADR), a self‑supervised learning scheme. For each training iteration, a design D is sampled, its element contents C and attribute set A_in are extracted, and the relation graph G is built. An editing operation O is then synthesized by randomly selecting one of the four actions and a target element, with appropriate parameters (new coordinates or dimensions). The edges involving the target element are removed from G, reflecting the fact that those relationships will change after editing. The model is tasked with reconstructing the edited attribute set A_out from the triplet (C, G, O).
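The sample-synthesis loop described above can be sketched as below. `build_graph` and the `design` dictionary layout are hypothetical stand-ins for the paper's data pipeline; only the overall recipe (extract contents and attributes, build the graph, synthesize an operation, drop the target's edges) comes from the summary.

```python
import random

def synthesize_training_sample(design: dict, build_graph):
    """Turn one design D into a self-supervised sample (C, G, O) -> A_out.

    `design` is assumed to be {"elements": [{"content": ..., "attrs": ...}]}
    and `build_graph` to return edges as (src_idx, tgt_idx, relation) triples;
    both are illustrative assumptions.
    """
    contents = [e["content"] for e in design["elements"]]     # C
    attrs_out = [e["attrs"] for e in design["elements"]]      # A_out: the original layout
    graph = build_graph(design["elements"])                   # G: full relation graph

    # Synthesize an editing operation O: random action and target element,
    # with the element's original attributes as the operation parameters.
    action = random.choice(["add", "delete", "move", "resize"])
    target = random.randrange(len(design["elements"]))
    op = {"action": action, "target": target,
          "params": design["elements"][target]["attrs"]}

    # Remove edges involving the target: those relationships change after editing.
    graph = [(u, v, r) for (u, v, r) in graph if target not in (u, v)]
    return contents, graph, op, attrs_out
```

Because the reconstruction target is simply the original attribute set, no edited ground-truth design is ever needed.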
The reconstruction backbone is a multimodal large language model (MLLM). Visual features of image elements are encoded by a vision encoder and projector, while textual content and the editing operation are tokenized as text. All tokens are concatenated into a single sequence and fed to the LLM, which predicts a JSON-style list of element attributes (id, x, y, width, height, etc.). Training uses a negative log-likelihood loss on the predicted attribute tokens. Because the same model is fine-tuned on all four actions, ReLayout becomes a versatile editor capable of handling any of the defined operations without needing separate networks.
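A minimal sketch of the JSON-style target the LLM learns to emit, assuming the field names above (the exact schema is not given in this summary):

```python
import json

def serialize_attrs(elements: list) -> str:
    """Serialize element attributes as the JSON-style target string.

    The key names (id, x, y, width, height) follow the attribute list
    mentioned in the summary; any additional fields are omitted here.
    """
    return json.dumps(
        [{"id": e["id"], "x": e["x"], "y": e["y"],
          "width": e["w"], "height": e["h"]} for e in elements]
    )

target = serialize_attrs([{"id": 0, "x": 10, "y": 20, "w": 100, "h": 50}])
# Fine-tuning then minimizes the standard token-level negative
# log-likelihood of this target sequence given the input tokens.
```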
Experiments compare ReLayout against a GPT‑4o‑based baseline and several state‑of‑the‑art layout‑generation or design‑generation models (e.g., VLC, Graphist, FlexDM, LaDeCo). Evaluation metrics include editing accuracy (mean absolute error of coordinates/dimensions), structure preservation (graph edit distance between original and edited relation graphs), visual quality (FID), and human user studies (5‑point Likert scale). ReLayout consistently outperforms baselines, achieving a 15‑20 % improvement in structure‑preservation scores and higher user satisfaction, especially regarding the maintenance of the original design intent. The authors also demonstrate extensions to language‑guided editing (natural‑language commands) and composite editing (multiple actions applied simultaneously), showing the framework’s flexibility.
Key contributions are: (1) defining the design layout editing problem and a standardized operation format; (2) introducing a relation‑graph constraint to enforce layout structure; (3) devising a self‑supervised reconstruction objective that eliminates the need for costly triplet datasets; (4) leveraging a multimodal LLM to build a single model that handles multiple editing actions. Limitations include reliance on heuristic rules for relationship extraction, focus on only image and text modalities, and lack of support for more complex vector graphics or style attributes. Future work may replace heuristics with learned relationship detectors and broaden the attribute space (colors, effects, layers).
In summary, ReLayout presents a novel, data‑efficient solution for automated design layout editing. By coupling explicit structural constraints with a versatile multimodal language model, it achieves high‑quality, accurate edits while preserving the spatial coherence of the original design, opening new possibilities for rapid prototyping, UI/UX automation, and large‑scale graphic design pipelines.