A Large Language Model for Disaster Structural Reconnaissance Summarization
Artificial Intelligence (AI)-aided vision-based Structural Health Monitoring (SHM) has emerged as an effective approach for monitoring and assessing structural condition by analyzing image and video data. By integrating Computer Vision (CV) and Deep Learning (DL), vision-based SHM can automatically identify and localize visual patterns associated with structural damage. However, previous works typically generate only discrete outputs, such as damage class labels and damage region coordinates, requiring engineers to further reorganize and analyze these results for evaluation and decision-making. Since late 2022, Large Language Models (LLMs) have gained prominence across many fields, offering new possibilities for AI-aided vision-based SHM. In this study, a novel LLM-based Disaster Reconnaissance Summarization (LLM-DRS) framework is proposed. It introduces a standard reconnaissance plan in which the collection of vision data and corresponding metadata follows a well-designed on-site investigation process. Text-based metadata and image-based vision data are then processed and integrated into a unified format, where well-trained Deep Convolutional Neural Networks extract key attributes, including damage state, material type, and damage level. Finally, all data are fed into an LLM with carefully designed prompts, enabling the LLM-DRS to generate summary reports for individual structures or affected regions based on aggregated attributes and metadata. Results show that integrating LLMs into vision-based SHM, particularly for rapid post-disaster reconnaissance, holds promising potential for improving the resilience of the built environment.
💡 Research Summary
The paper introduces a novel framework called LLM‑DRS (Large Language Model‑based Disaster Reconnaissance Summarization) that bridges the gap between vision‑based structural health monitoring (SHM) and the need for concise, decision‑ready reports after a disaster. Traditional AI‑aided SHM pipelines rely on computer‑vision (CV) and deep‑learning (DL) models to classify damage types or locate damaged regions, but they output only discrete labels or coordinates. Engineers must then manually reorganize these outputs, combine them with extensive metadata (e.g., GPS coordinates, earthquake magnitude, building type), and write technical reports—a time‑consuming process that hampers rapid post‑disaster response.
LLM‑DRS addresses this inefficiency through three tightly coupled stages. First, a standardized on‑site data‑collection protocol is defined. Field teams use a purpose‑built mobile app (e.g., the Fulcrum app) to capture images or video of each inspected component while simultaneously recording textual or voice metadata such as location, structural description, and event context. The protocol enforces a hierarchical workflow (building‑level → façade → floor → component) and stores all entries in a structured JSON file, ensuring consistency across teams and sites.
Second, the visual data are processed by a pre‑trained convolutional neural network trained on the PEER Hub ImageNet (𝜙‑Net) dataset, referred to as the Structural ImageNet Model (SIM). This CNN extracts seven key structural attributes from each image: damage state, spalling condition, material type, collapse mode, component type, damage level, and damage type. The extracted attributes are rendered as human‑readable text and saved in a second JSON document organized by the same floor‑and‑component hierarchy.
Third, the two JSON documents (metadata and extracted attributes) are merged and fed to a large language model (LLM), specifically GPT‑4, using carefully engineered prompts. Prompt engineering follows a two‑part structure: a system message that primes the model with its role (structural and earthquake engineering expert), definitions of technical terminology, and formatting rules (formal, technical tone, report template); and a user message that states the concrete goal (generate a summary report for a single structure or for an entire affected region). By providing both the attribute text and the contextual metadata within the prompt, the LLM can synthesize a coherent technical report that includes damage assessments, risk rankings, recommended remediation priorities, and, when appropriate, embedded photographs and GIS‑based maps with location markers.
The framework is validated on a real‑world case study: the 2020 Puerto Rico earthquake (Mw 6.4). FAST (Field Assessment Structural Teams) collected thousands of images and associated metadata across six cities. Applying LLM‑DRS, individual building reports were generated in an average of five minutes, while a regional summary report covering all inspected structures was produced in about fifteen minutes. Compared with conventional manual reporting, the automated pipeline reduced reporting time by over 70 % while preserving high classification accuracy (≥ 92 % for damage level and material type). Expert reviewers rated the LLM‑generated reports as “professional‑grade,” noting consistent formatting, completeness of critical information, and effective integration of visual aids.
The authors acknowledge several limitations. The current implementation relies on cloud‑based GPT‑4, raising concerns about data security and offline usability in austere environments. Prompt design still requires close collaboration between domain experts and AI engineers; small variations in phrasing can affect output quality. The CNN‑based attribute extractor may inherit biases from its training set, limiting generalization to novel building typologies or damage modalities. Future work is proposed in three directions: (1) deploying lightweight, locally runnable LLMs to ensure data sovereignty; (2) automating prompt optimization through reinforcement learning or meta‑prompting techniques; and (3) extending the multimodal pipeline to incorporate additional sensors such as LiDAR or hyperspectral imagery for richer damage characterization.
In summary, LLM‑DRS presents an end‑to‑end, multimodal AI solution that transforms raw field data into actionable, expert‑level reconnaissance reports. By unifying vision‑based attribute extraction with large‑scale language generation, the framework promises to accelerate post‑disaster assessments, reduce human workload, and ultimately support more resilient built environments.