Beyond Pixels: Vector-to-Graph Transformation for Reliable Schematic Auditing


Multimodal Large Language Models (MLLMs) have shown remarkable progress in visual understanding, yet they suffer from a critical limitation: structural blindness. Even state-of-the-art models fail to capture topology and symbolic logic in engineering schematics, as their pixel-driven paradigm discards the explicit vector-defined relations needed for reasoning. To overcome this, we propose a Vector-to-Graph (V2G) pipeline that converts CAD diagrams into property graphs where nodes represent components and edges encode connectivity, making structural dependencies explicit and machine-auditable. On a diagnostic benchmark of electrical compliance checks, V2G yields large accuracy gains across all error categories, while leading MLLMs remain near chance level. These results highlight the systemic inadequacy of pixel-based methods and demonstrate that structure-aware representations provide a reliable path toward practical deployment of multimodal AI in engineering domains. To facilitate further research, we release our benchmark and implementation at https://github.com/gm-embodied/V2G-Audit.

💡 Research Summary

The paper tackles a fundamental shortcoming of current multimodal large language models (MLLMs) when applied to engineering schematics: structural blindness. While modern vision‑language models excel at recognizing visual symbols, they fail to capture the underlying topology and logical constraints that are essential for tasks such as electrical compliance checking. The authors demonstrate this weakness through a targeted diagnostic probe consisting of ten representative compliance rules (grounding, wiring, labeling) applied to a set of real‑world CAD circuit diagrams. Six state‑of‑the‑art MLLMs (ChatGPT‑4o, ChatGPT‑5, GLM‑4.5V, Gemini 2.5 Pro, Qwen2.5‑VL‑72B, Claude Sonnet‑4) are evaluated in a zero‑shot setting, and all achieve near‑chance performance (often below 10 % accuracy), confirming that pixel‑based processing alone cannot reason about connectivity or relational constraints.

To overcome this limitation, the authors propose a Vector‑to‑Graph (V2G) pipeline. The first stage parses CAD files using the ezdxf library to extract low‑level primitives (lines, arcs, text, inserts). An LLM‑driven extraction module groups primitives into component nodes and infers electrical edges by combining geometric proximity (within a tolerance τ) with semantic cues supplied by the LLM. The result is a property graph G = (V, E, X) where V are component nodes, E are explicit connections, and X stores attributes such as terminal IDs, polarity, and grounding type.
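The geometric half of the edge-inference step can be illustrated with a minimal sketch. It assumes primitives have already been grouped into components with known terminal coordinates; the component records, the `build_property_graph` helper, and the value of the tolerance are hypothetical and are not taken from the paper's implementation, which additionally relies on ezdxf parsing and LLM-supplied semantic cues.

```python
import math
from itertools import combinations

# Hypothetical component records (assumption): in the pipeline these would
# come from CAD primitives grouped into components.
components = {
    "K1": {"type": "relay", "terminals": {"1": (0.0, 0.0), "2": (10.0, 0.0)}},
    "F1": {"type": "fuse", "terminals": {"1": (10.05, 0.0), "2": (20.0, 0.0)}},
    "PE": {"type": "ground", "terminals": {"1": (20.0, 0.02)}},
    "M1": {"type": "motor", "terminals": {"1": (50.0, 50.0)}},
}


def build_property_graph(components, tau=0.1):
    """Return (V, E, X): nodes, proximity-based edges, node attributes."""
    V = sorted(components)
    X = {name: {"type": comp["type"]} for name, comp in components.items()}
    E = []
    for a, b in combinations(V, 2):
        for term_a, pos_a in components[a]["terminals"].items():
            for term_b, pos_b in components[b]["terminals"].items():
                # Two terminals closer than tau are treated as connected.
                if math.dist(pos_a, pos_b) <= tau:
                    E.append((a, b, {"terminals": (term_a, term_b)}))
    return V, E, X


V, E, X = build_property_graph(components)
for a, b, attrs in E:
    print(a, "--", b, attrs)
```

Because edges depend only on distances between terminals, a representation of this kind is unaffected by rigid transformations of the drawing such as rotation and translation.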

The second stage introduces an MLLM planner that receives natural‑language compliance rules Φ. Through structured prompting, the planner translates each rule into a JSON command that specifies a target subgraph Rₜ ⊆ G and selects an appropriate verification function fₜ from a library F. This three‑step process (rule interpretation, region selection, function selection) bridges the language model’s reasoning ability with the graph representation.
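How a planner command might be consumed can be sketched as follows. The JSON schema (`rule`, `region`, `function`), the attribute-filter form of region selection, and the two checks in the function library are all illustrative assumptions; the summary does not specify the actual command format.

```python
import json

# Hypothetical graph (assumption): node attributes and undirected edges.
nodes = {
    "K1": {"type": "relay", "zone": "A"},
    "PE1": {"type": "ground", "zone": "A"},
    "PE2": {"type": "ground", "zone": "B"},
}
edges = [("K1", "PE1"), ("K1", "PE2")]


def single_ground(sub_nodes, sub_edges):
    """Pass when the subgraph contains exactly one ground-type node."""
    return sum(1 for a in sub_nodes.values() if a["type"] == "ground") == 1


def all_labeled(sub_nodes, sub_edges):
    """Pass when every node carries a non-empty 'label' attribute."""
    return all(a.get("label") for a in sub_nodes.values())


# Stand-in for the library F of verification functions.
FUNCTION_LIBRARY = {"single_ground": single_ground, "all_labeled": all_labeled}


def execute(command_json, nodes, edges):
    """Select the target subgraph and run the requested check on it."""
    cmd = json.loads(command_json)
    wanted = cmd["region"]  # attribute filter defining the subgraph
    sub_nodes = {
        n: a for n, a in nodes.items()
        if all(a.get(k) == v for k, v in wanted.items())
    }
    sub_edges = [(u, v) for u, v in edges if u in sub_nodes and v in sub_nodes]
    check = FUNCTION_LIBRARY[cmd["function"]]  # unknown names raise KeyError
    return {"rule": cmd["rule"], "passed": check(sub_nodes, sub_edges)}


command = '{"rule": "R1", "region": {"zone": "A"}, "function": "single_ground"}'
result = execute(command, nodes, edges)
print(result)
```

Restricting the planner to choosing among named functions keeps the language model out of the verification itself: it decides what to check and where, while the check is ordinary deterministic code.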

The third stage applies deterministic Graph Signal Processing (GSP) operators to the selected subgraph. The adjacency matrix Aₜ, degree matrix Dₜ, and Laplacian Lₜ = Dₜ − Aₜ are computed. Connectivity is assessed via the multiplicity of the zero eigenvalue of Lₜ, which equals the number of connected components c, so the subgraph is connected exactly when c = 1. Grounding uniqueness is checked by counting ground‑type nodes within the subgraph; polarity and phase consistency are enforced by attribute equality across edges; wiring loops and abnormal circuits are identified through spectral and topological features such as the cycle number β = |Eₜ| − |Vₜ| + c, which counts independent cycles and is zero for a loop‑free subgraph. Each verification function returns a binary outcome, providing a fully auditable and deterministic compliance decision.
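The connectivity and loop checks can be worked through on a small example. The sketch below is not the paper's code: it computes the multiplicity of the zero eigenvalue as the nullity of the Laplacian (the two coincide because the Laplacian is symmetric), using exact rational elimination from the standard library so that no floating-point threshold is needed. The function names and the example graph are assumptions; a numerical eigensolver such as `numpy.linalg.eigvalsh` would be the more usual choice at scale.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Build L = D - A for an undirected simple graph on nodes 0..n-1."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L


def rank(matrix):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def audit(n, edges):
    """Return component count c, cycle number beta, and a connectivity flag."""
    c = n - rank(laplacian(n, edges))  # nullity = multiplicity of eigenvalue 0
    beta = len(edges) - n + c          # number of independent cycles
    return {"components": c, "cycles": beta, "connected": c == 1}


# Hypothetical subgraph: a triangle (one loop) plus a separate two-node branch.
print(audit(5, [(0, 1), (1, 2), (2, 0), (3, 4)]))
# A simple chain is connected and loop-free.
print(audit(3, [(0, 1), (1, 2)]))
```

For the first graph, c = 2 and β = 4 − 5 + 2 = 1, so a rule requiring a single connected, loop-free circuit would fail on both counts.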

The final stage aggregates all rule outcomes into a structured JSON report and produces a concise natural‑language summary for human auditors.
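A report of this kind could be assembled along the following lines. The field names, the diagram identifier, and the wording of the summary are hypothetical; the summary here is produced from a fixed template rather than by a language model.

```python
import json


def build_report(diagram_id, outcomes):
    """Combine per-rule outcomes into a JSON report and a one-line summary."""
    failed = [o["rule"] for o in outcomes if not o["passed"]]
    report = {
        "diagram": diagram_id,
        "compliant": not failed,
        "results": outcomes,
    }
    if failed:
        summary = (
            f"{diagram_id}: {len(failed)} of {len(outcomes)} rules violated "
            f"({', '.join(failed)})."
        )
    else:
        summary = f"{diagram_id}: all {len(outcomes)} rules satisfied."
    return json.dumps(report, indent=2), summary


# Hypothetical outcomes from two verification functions.
outcomes = [{"rule": "R1", "passed": True}, {"rule": "R2", "passed": False}]
report_json, summary = build_report("D-001", outcomes)
print(report_json)
print(summary)
```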

Experimental evaluation uses a benchmark built from 60 real electrical schematics supplied by a regional power utility. Each base diagram is augmented with 10–20 transformations (rotation, translation, mild scaling, noise), yielding roughly 900 test instances. Evaluation metrics focus on accuracy (percentage of correctly identified compliant or violating cases) and statistical significance via McNemar tests.
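The two metrics can be sketched for a pair of systems evaluated on the same instances. The summary does not say which variant of the McNemar test the authors use; the example below assumes the exact two-sided binomial form, and the predictions and labels are made-up toy data, not results from the paper.

```python
from math import comb


def accuracy(predictions, labels):
    """Fraction of instances whose prediction matches the label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def mcnemar_exact(pred_a, pred_b, labels):
    """Two-sided exact McNemar p-value for paired predictions."""
    # b: A correct and B wrong; c: A wrong and B correct (discordant pairs).
    b = sum(pa == y and pb != y for pa, pb, y in zip(pred_a, pred_b, labels))
    c = sum(pa != y and pb == y for pa, pb, y in zip(pred_a, pred_b, labels))
    n = b + c
    if n == 0:
        return 1.0  # the two systems never disagree in correctness
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data (assumption): 10 instances, all labeled as violations (1).
labels = [1] * 10
baseline = [0] * 8 + [1] * 2   # correct on the last two instances only
with_v2g = [1] * 9 + [0]       # correct on the first nine instances

print(accuracy(baseline, labels), accuracy(with_v2g, labels))
print(mcnemar_exact(baseline, with_v2g, labels))
```

The test considers only the instances on which exactly one system is correct (here 1 versus 8), which makes it suited to comparing two configurations on an identical test set.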

Results show that the baseline pixel‑only models achieve an average accuracy of only 12 %, whereas the +V2G configuration raises the average to 47 % (an absolute gain of 35 percentage points). Category‑wise improvements are reported as 61 % for connection labeling, 27 % for grounding, and 20 % for wiring. All six MLLMs benefit, indicating that the V2G framework is model‑agnostic. Ablation studies reveal that removing the deterministic GSP verifier drops accuracy to ~28 %, omitting node attributes reduces it to ~34 %, and disabling planner‑guided subgraph selection lowers it to ~39 %, so each component contributes, with the verifier contributing the most. Moreover, V2G’s performance remains stable under geometric perturbations, unlike pixel‑based baselines whose predictions fluctuate with rotations.

The authors introduce an “Intropy” metric (intelligence = δS / R) to formalize the efficiency gain: δS represents meaningful discrepancy reduction (the amount of structural information recovered), while R denotes internal resistance such as uncertainty or representation mismatch. By converting vector graphics into explicit graphs, V2G dramatically reduces R and increases δS, yielding higher Intropy and more robust, interpretable reasoning.

In conclusion, the paper provides compelling evidence that structural blindness is a systemic flaw of current MLLMs and that explicit vector‑to‑graph conversion coupled with deterministic graph‑signal verification can substantially improve reliability in schematic auditing. The released benchmark and open‑source implementation invite further research, including scaling to larger datasets, comparing against specialized Graph Neural Networks, and integrating V2G into real‑time design verification pipelines.
