Design and evaluation of an agentic workflow for crisis-related synthetic tweet datasets

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original arXiv source.

Twitter (now X) has become an important source of social media data for situational awareness during crises. Crisis informatics research has widely used tweets from Twitter to develop and evaluate artificial intelligence (AI) systems for various crisis-relevant tasks, such as extracting locations and estimating damage levels from tweets to support damage assessment. However, recent changes in Twitter’s data access policies have made it increasingly difficult to curate real-world tweet datasets related to crises. Moreover, existing curated tweet datasets are limited to past crisis events in specific contexts and are costly to annotate at scale. These limitations constrain the development and evaluation of AI systems used in crisis informatics. To address these limitations, we introduce an agentic workflow for generating crisis-related synthetic tweet datasets. The workflow iteratively generates synthetic tweets conditioned on prespecified target characteristics, evaluates them using predefined compliance checks, and incorporates structured feedback to refine them in subsequent iterations. As a case study, we apply the workflow to generate synthetic tweet datasets relevant to post-earthquake damage assessment. We show that the workflow can generate synthetic tweets that capture their target labels for location and damage level. We further demonstrate that the resulting synthetic tweet datasets can be used to evaluate AI systems on damage assessment tasks like geolocalization and damage level prediction. Our results indicate that the workflow offers a flexible and scalable alternative to real-world tweet data curation, enabling the systematic generation of synthetic social media data across diverse crisis events, societal contexts, and crisis informatics applications.


💡 Research Summary

The paper addresses a pressing problem in crisis informatics: the growing difficulty of acquiring real‑world Twitter data for training and evaluating AI systems due to recent policy changes and the high cost of annotating existing datasets. To overcome these constraints, the authors propose an “agentic workflow” that leverages large language models (LLMs) to generate synthetic tweets that are explicitly conditioned on target labels—specifically a geographic location (y_loc) and a damage level (y_dmg). The workflow consists of three interacting agents:

  1. Generator (g) – receives a prompt that embeds the target label vector and uses an LLM to produce a synthetic tweet.
  2. Evaluator (e) – applies three heuristic compliance checks: (a) location correctness, (b) damage‑level correctness, and (c) textual diversity (measured by Self‑BLEU against already accepted tweets). The evaluator returns a binary compliance vector c = {c_loc, c_dmg, c_div}.
  3. Augmenter (a) – translates any failed checks into human‑readable feedback messages, concatenates this feedback to the original prompt, and feeds it back to the generator for the next iteration.
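The generate–evaluate–augment loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `llm` stands in for any callable that maps a prompt to a tweet, and the compliance checks are deliberately trivial substring tests in place of the paper's heuristics.

```python
def evaluate(tweet, target, accepted):
    """Evaluator e: returns the binary compliance vector c (simplified checks)."""
    return {
        "c_loc": target["loc"].lower() in tweet.lower(),
        "c_dmg": target["dmg_cue"].lower() in tweet.lower(),
        "c_div": tweet not in accepted,   # stand-in for the Self-BLEU check
    }

def augment(prompt, c):
    """Augmenter a: appends human-readable feedback for each failed check."""
    msgs = {
        "c_loc": "Mention the target location explicitly.",
        "c_dmg": "Reflect the target damage level.",
        "c_div": "Rephrase to differ from earlier tweets.",
    }
    feedback = " ".join(msgs[k] for k, ok in c.items() if not ok)
    return f"{prompt}\nFeedback: {feedback}"

def run_workflow(llm, target, base_prompt, max_iters=7):
    """Iterate within a fixed budget, collecting tweets that pass all checks."""
    accepted, prompt = [], base_prompt
    for _ in range(max_iters):
        tweet = llm(prompt)               # generator g
        c = evaluate(tweet, target, accepted)
        if all(c.values()):
            accepted.append(tweet)
            prompt = base_prompt          # fresh prompt for the next tweet
        else:
            prompt = augment(prompt, c)   # retry with structured feedback
    return accepted
```

In this sketch the feedback messages accumulate on the prompt across failed rounds, mirroring how the augmenter concatenates feedback to the original prompt for the next iteration.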

The process iterates for a predefined number of rounds or until the synthetic dataset D_syn meets desired quality criteria. The authors apply this workflow to a post‑earthquake damage‑assessment scenario, using six real earthquake events (Napa 2014, Iquique 2014, Nepal 2015, Ridgecrest 2019, Fukushima 2021, Haiti 2021) as sources for target label distributions. Real tweets from these events were collected via the Twitter Search API, de‑duplicated, and automatically annotated: spaCy’s NER model extracted location entities, while Google’s gemma‑3‑27b‑it model inferred damage levels. These automatically derived label vectors formed the basis for synthetic tweet generation.
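The diversity check (c_div) rejects a candidate whose Self-BLEU against already accepted tweets is too high. As a rough illustration of the idea, here is a simplified pure-Python BLEU (geometric mean of modified n-gram precisions, no brevity penalty) rather than a faithful reimplementation of the paper's metric; the `threshold` value is an assumption for the sketch.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=2):
    """Simplified BLEU: geometric mean of clipped n-gram precisions."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        if not cand_counts:
            continue  # candidate too short for this order
        max_ref = Counter()
        for r in refs:
            for g, c in Counter(ngrams(r, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        precisions.append(clipped / sum(cand_counts.values()))
    if not precisions or 0 in precisions:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / len(precisions))

def diversity_check(candidate, accepted, threshold=0.5):
    """c_div passes when Self-BLEU against accepted tweets stays below threshold."""
    if not accepted:
        return True
    return bleu(candidate, accepted) < threshold
```

A tweet that closely echoes an accepted one scores near 1.0 and fails the check, prompting the augmenter to request a rephrasing.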

Experimental results show that after 5–7 iterations, over 95% of generated tweets satisfied both location and damage‑level checks, and the Self‑BLEU threshold ensured sufficient lexical diversity. The synthetic dataset was then used to evaluate two downstream AI tasks: (i) geolocation – predicting the referenced coordinates of a tweet, and (ii) damage‑level classification – assigning a severity label. Models trained or tested on the synthetic data achieved performance metrics (average distance error, F1‑score) comparable to those obtained with the original real‑world tweets, demonstrating that the synthetic corpus can serve as a reliable proxy for model benchmarking.
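The average-distance-error metric for the geolocation task can be computed from predicted and gold coordinates with the standard haversine formula; the sketch below is a generic implementation of that metric, not code from the paper.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def mean_distance_error_km(preds, truths):
    """Average distance error over pairs of predicted vs. gold coordinates."""
    errs = [haversine_km(*p, *t) for p, t in zip(preds, truths)]
    return sum(errs) / len(errs)
```

Lower values indicate that a geolocation model places tweets closer to their target coordinates on average.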

Key contributions include: (a) a novel agentic architecture that integrates LLM generation with automated compliance evaluation and feedback‑driven refinement; (b) a practical method for controlling label attributes in synthetic social‑media text; (c) empirical evidence that such synthetic data can faithfully reproduce critical characteristics of real crisis‑related tweets and support AI system evaluation.

The authors acknowledge limitations: heuristic checks cannot fully replace expert human validation; LLM‑inherent biases may propagate through the feedback loop; and the current implementation focuses solely on earthquake damage, requiring adaptation of label schemas and compliance rules for other disaster types (floods, wildfires, pandemics). Future work is suggested to explore multi‑agent collaboration for richer label sets (e.g., casualty counts, infrastructure status), incorporate human‑in‑the‑loop verification, and generalize the workflow across diverse crisis domains.

In summary, the proposed agentic workflow offers a scalable, flexible alternative to costly real‑world tweet collection, enabling the systematic generation of high‑quality synthetic social‑media datasets for crisis informatics research and operational AI development.

