Prompt-to-Parts: Generative AI for Physical Assembly and Scalable Instructions

Reading time: 5 minutes
...

📝 Original Info

  • Title: Prompt-to-Parts: Generative AI for Physical Assembly and Scalable Instructions
  • ArXiv ID: 2512.15743
  • Date: 2025-12-10
  • Authors: David Noever

📝 Abstract

We present a framework for generating physically realizable assembly instructions from natural language descriptions. Unlike unconstrained text-to-3D approaches, our method operates within a discrete parts vocabulary, enforcing geometric validity, connection constraints, and buildability ordering. Using LDraw as a text-rich intermediate representation, we demonstrate that large language models can be guided with tools to produce valid step-by-step construction sequences and assembly instructions for brick-based prototypes of more than 3,000 assembly parts. We introduce a Python library for programmatic model generation and evaluate buildable outputs across satellite, aircraft, and architectural domains. The approach aims for demonstrable scalability, modularity, and fidelity, bridging the gap between semantic design intent and manufacturable output: physical prototyping follows directly from natural language specifications. The work proposes a novel elemental lingua franca as the key piece missing from previous pixel-based diffusion methods and computer-aided design (CAD) models, which fail to support complex assembly instructions or component exchange. Across four original designs, this novel "bag of bricks" method thus functions as a physical API: a constrained vocabulary connecting precisely oriented brick locations to a "bag of words" through which arbitrary functional requirements compile into material reality. Such a consistent and repeatable AI representation opens new design options while guiding natural language implementations in manufacturing and engineering prototyping.

💡 Deep Analysis

Figure 1. Complex medieval castle automatically generated as an 860-part instruction kit and bill of materials.
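The abstract's buildability-ordering constraint can be made concrete with a short sketch. The check below is a minimal illustration in Python, assuming a simple grid model of brick placement; the Brick layout and is_buildable function are hypothetical and are not the paper's actual library:

```python
# Illustrative "buildability ordering" check: every brick in the
# sequence must rest on the ground or on studs of bricks that were
# already placed one level below. Coordinates are in stud units;
# this data layout is an assumption for the sketch, not the paper's API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Brick:
    x: int      # corner position, in studs
    z: int
    w: int      # footprint width, in studs
    d: int      # footprint depth, in studs
    level: int  # vertical layer index (0 = on the ground plane)

    def cells(self) -> set[tuple[int, int]]:
        """Grid cells (studs) covered by this brick's footprint."""
        return {(self.x + i, self.z + j)
                for i in range(self.w) for j in range(self.d)}

def is_buildable(sequence: list[Brick]) -> bool:
    """True if each brick is supported at the moment it is placed."""
    placed: list[Brick] = []
    for b in sequence:
        supported = b.level == 0 or any(
            p.level == b.level - 1 and p.cells() & b.cells()
            for p in placed)
        if not supported:
            return False
        placed.append(b)
    return True

# A bridge spanning two ground bricks is buildable only after its
# supports exist; placing the bridge first violates the ordering.
base = [Brick(0, 0, 2, 2, 0), Brick(2, 0, 2, 2, 0)]
bridge = Brick(0, 0, 4, 2, 1)
assert is_buildable(base + [bridge])
assert not is_buildable([bridge] + base)
```

A real validator would also check collisions and lateral stability, but even this one-pass ordering test shows why step sequencing, not just final geometry, is part of the generation problem.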

📄 Full Content

Prompt-to-Parts: Generative AI for Physical Assembly and Scalable Instructions

David Noever, PeopleTec, 4901-D Corporate Drive, Huntsville, AL 35805, USA, david.noever@peopletec.com


Keywords: large language models, VLM, 3D models, image generation, spatial reasoning, part assembly

1. INTRODUCTION

The conversion of natural language into functional physical prototypes represents an emerging frontier in computational design. While text-to-image and text-to-3D systems have advanced significantly [1-14], the challenge of generating physically buildable, materially constrained structures remains open [15-25]. LEGO bricks [1,26-32], with their standardized geometry, tolerances, and global availability, offer a uniquely well-bounded medium for exploring this problem.
LEGO systems are already widely adopted in engineering education [30-31], robotics experiments [19,33-34], and low-cost scientific instrumentation [35-36]. The largest LEGO set with instructions (the Eiffel Tower) totals around 10,000 bricks, a 58.5-inch height, and nearly 1,000 assembly steps (or manual pages). As illustrated in Figure 1, these composable examples highlight the potential for LEGO to serve as an accessible platform for rapid, low-cost experimentation in mechanical design, instrumentation, and physical sciences.

The underlying insight is that discretized construction systems—whether LEGO, modular satellites, or flat-pack furniture—share a common structure amenable to language-driven synthesis: a finite vocabulary of parts, a grammar of valid connections, and functional constraints that map to inventive principles (Figure 2). A key question arises: can large language models (LLMs) reliably generate accurate LEGO models and step-by-step assembly instructions from arbitrary text prompts? If successful, this capability would operationalize a form of text-to-prototype, allowing designers, students, and researchers to translate conceptual descriptions directly into testable physical artifacts.

The problem is nontrivial. It requires spatial reasoning [17-20], long-horizon planning [34], constraint satisfaction [10,14], and an implicit understanding of real-world forces [33-35] and geometric compatibility [1,3,10]. It also requires the model to produce instructions that a human can follow and that result in a stable assembly [23,29] with no ambiguous or impossible steps.
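Returning to the "finite vocabulary of parts" above, the idea can be made concrete with a short Python sketch: a generated design is accepted only if every part it references exists in a closed catalogue, and the kit's bill of materials falls out of the same list. The part numbers below are real LDraw identifiers, but the three-entry vocabulary and the helper function are illustrative assumptions, not the paper's 3,000+-part library:

```python
# Sketch of a closed parts vocabulary acting as a "physical API":
# out-of-vocabulary parts are rejected, and the bill of materials
# is tallied directly from the part list.
from collections import Counter

VOCABULARY = {
    "3001.dat": "Brick 2 x 4",   # real LDraw part numbers
    "3003.dat": "Brick 2 x 2",
    "3020.dat": "Plate 2 x 4",
}

def bill_of_materials(parts: list[str]) -> Counter:
    """Validate a part list against the vocabulary, then tally it."""
    unknown = [p for p in parts if p not in VOCABULARY]
    if unknown:
        raise ValueError(f"outside parts vocabulary: {unknown}")
    return Counter(VOCABULARY[p] for p in parts)

print(bill_of_materials(["3001.dat", "3001.dat", "3020.dat"]))
# Counter({'Brick 2 x 4': 2, 'Plate 2 x 4': 1})
```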
We hypothesize that the adoption of a compact, human-readable component language can substantially improve the reliability and scale of LLM-generated physical assemblies. The hypothesis draws from LLM methods that encode chess games with Forsyth-Edwards Notation (FEN) [37] or use natural language to produce structured query language (SQL) for databases [38]. Just as FEN encodes complete board states in a single line of text that both humans and machines can parse unambiguously, the LDraw format [39] encodes LEGO assemblies through standardized part identifiers, precise coordinates, and rotation matrices in a syntax that predates and is independent of any particular AI system.
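To make the LDraw analogy concrete, the sketch below emits a minimal .ldr model. The line syntax comes from the public LDraw specification: a line-type-1 record ("1 <colour> x y z a b c d e f g h i part.dat") places one part with a translation and a row-major 3x3 rotation matrix, and the "0 STEP" meta command marks an instruction-step boundary. The helper functions themselves are illustrative, not the paper's Python library:

```python
# Minimal programmatic LDraw (.ldr) emitter. LDraw facts used here:
# units are LDU (1 stud = 20 LDU, brick height = 24 LDU), -Y points
# up, colour 4 is red, and part 3001 is the classic 2x4 brick.
IDENTITY = "1 0 0 0 1 0 0 0 1"  # row-major 3x3 rotation matrix (unrotated)

def ldraw_part(colour: int, x: float, y: float, z: float,
               part: str, rotation: str = IDENTITY) -> str:
    """Format one line-type-1 sub-file reference (one placed part)."""
    return f"1 {colour} {x} {y} {z} {rotation} {part}"

def brick_tower(height: int) -> str:
    """Stack 2x4 bricks, one layer per assembly step."""
    lines = ["0 Simple tower generated programmatically"]
    for layer in range(height):
        lines.append(ldraw_part(4, 0, -24 * layer, 0, "3001.dat"))
        lines.append("0 STEP")  # step boundary for instruction rendering
    return "\n".join(lines)

with open("tower.ldr", "w") as f:
    f.write(brick_tower(height=5))
```

Because the representation is plain text, an LLM's output can be linted, diffed, and rendered with existing LDraw tooling before anything is physically built.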
We extend the success of the natural language choice—so-called "Bag of Words" (BOW) approaches—…


This content is AI-processed based on open access ArXiv data.
