DeepGreen: Effective LLM-Driven Greenwashing Monitoring System Designed for Empirical Testing -- Evidence from China

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Motivated by the emerging adoption of Large Language Models (LLMs) in economics and management research, this paper investigates whether LLMs can reliably identify corporate greenwashing narratives and, more importantly, whether and how the greenwashing signals extracted from textual disclosures can be used to empirically identify causal effects. To this end, the paper proposes DeepGreen, a dual-stage LLM-driven system for detecting potential corporate greenwashing in annual reports. Applied to 9,369 A-share annual reports published between 2021 and 2023, DeepGreen attains high reliability in random-sample validation at both stages. An ablation experiment shows that Retrieval-Augmented Generation (RAG) reduces hallucinations compared with simply lengthening the input window. Empirical tests indicate that the greenwashing measure captured by DeepGreen reveals a positive relationship between greenwashing and environmental penalties, and instrumental-variable (IV) estimation, propensity-score matching (PSM), and a placebo test strengthen the robustness and causal interpretation of this evidence. Further analysis suggests that the presence and number of green investors weaken the positive correlation between greenwashing and penalties. Heterogeneity analysis shows that the greenwashing-penalty relationship is less significant in large corporations and in corporations that have accumulated green assets, indicating that such assets may be exploited as a credibility shield for greenwashing. Our findings demonstrate that LLMs can standardize ESG oversight by providing early warnings and directing regulators' scarce attention toward the subsets of corporations where monitoring is most warranted.


💡 Research Summary

The paper introduces DeepGreen, a dual‑stage large‑language‑model (LLM) framework designed to detect corporate green‑washing in Chinese A‑share annual reports. Motivated by the growing use of LLMs in economics and management, the authors ask whether LLMs can reliably identify green‑washing narratives and whether the extracted signals can serve as causal proxies for environmental misconduct.

In the first stage, a retrieval‑augmented generation (RAG) pipeline searches the full text of each report for potential green‑related keywords using a pre‑compiled lexicon and an external knowledge base. RAG dynamically fetches relevant passages, reducing hallucinations compared with simply expanding the input window. In the second stage, the same LLM is prompted to assess whether each identified keyword is actually implemented, i.e., whether the firm’s disclosed actions correspond to measurable environmental performance. Random‑sample validation on 5 % of the corpus shows an F1 score of 0.89, indicating near‑human agreement.
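
The dual-stage flow can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: the tiny lexicon, the fixed-window "retrieval," and the digit-based heuristic standing in for the stage-two LLM judgement are all hypothetical stand-ins, not the paper's actual lexicon, RAG retriever, or prompts.

```python
import re

# Hypothetical mini-lexicon; the paper's pre-compiled lexicon is far larger.
GREEN_LEXICON = ["carbon neutrality", "emission reduction", "green energy"]

def retrieve_passages(report: str, window: int = 80):
    """Stage 1 (simplified): for each green keyword found in the report,
    collect the surrounding passage that would be handed to the LLM.
    The real system's RAG step also pulls from an external knowledge base."""
    hits = []
    lower = report.lower()
    for kw in GREEN_LEXICON:
        for m in re.finditer(re.escape(kw), lower):
            start = max(0, m.start() - window)
            end = min(len(report), m.end() + window)
            hits.append({"keyword": kw, "passage": report[start:end]})
    return hits

def is_implemented(hit: dict) -> bool:
    """Stage 2 stand-in: the real system prompts the LLM to judge whether a
    claim is backed by measurable environmental performance. Here a crude
    heuristic (any number in the passage) plays that role."""
    return bool(re.search(r"\d", hit["passage"]))

def greenwashing_score(report: str) -> float:
    """Share of detected green claims NOT backed by measurable evidence."""
    hits = retrieve_passages(report)
    if not hits:
        return 0.0
    unverified = sum(1 for h in hits if not is_implemented(h))
    return unverified / len(hits)
```

A claim with no quantitative backing ("We pledge carbon neutrality with no specifics.") scores 1.0 under this heuristic, while a report with no green keywords scores 0.0.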

The system is applied to 9,369 A‑share annual reports published between 2021 and 2023. Ablation experiments confirm that RAG cuts hallucination rates from 27 % to 8 %, demonstrating its technical advantage. The authors then construct a green‑washing index from the stage‑two outputs and regress it against subsequent environmental penalties (both frequency and monetary amount). Baseline OLS results reveal a positive relationship: a one‑unit increase in the index raises the probability of receiving a penalty by roughly 12 %.
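
The baseline estimation can be illustrated with a minimal univariate OLS in pure Python. This is a sketch only: the paper's specification includes controls and the index is regressed on penalty frequency and monetary amounts, whereas here a single closed-form slope (cov(x, y) / var(x)) stands in for the full model.

```python
def ols_slope(x, y):
    """Univariate OLS slope of y on x (with intercept):
    beta = cov(x, y) / var(x). In a linear probability model, beta is the
    change in penalty probability per one-unit change in the index."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var
```

With hypothetical data where each unit of the index adds two penalty points, the estimated slope is exactly 2.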

To address endogeneity, three robustness strategies are employed. First, an instrumental‑variable (IV) approach uses firm size and industry‑level ESG intensity as instruments, preserving the positive effect. Second, propensity‑score matching (PSM) pairs firms with similar observable characteristics, yielding consistent estimates. Third, a placebo test that shuffles the time order of penalties produces no effect, supporting causal interpretation.
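
The placebo logic in particular is easy to sketch: shuffling the outcomes destroys any genuine index-to-penalty link, so a real effect should be an outlier against the shuffled distribution. The function below is an illustrative permutation-style version, not the paper's exact time-shuffling procedure; all names and the simple slope estimator are assumptions for the sketch.

```python
import random

def _slope(x, y):
    # Univariate OLS slope: cov(x, y) / var(x).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def placebo_test(index, penalties, n_draws=500, seed=0):
    """Re-estimate the slope on repeatedly shuffled penalties and report
    (real_slope, share of placebo slopes at least as large in magnitude).
    A small share suggests the real estimate is not a statistical artifact."""
    rng = random.Random(seed)
    real = _slope(index, penalties)
    shuffled = list(penalties)
    extreme = 0
    for _ in range(n_draws):
        rng.shuffle(shuffled)
        if abs(_slope(index, shuffled)) >= abs(real):
            extreme += 1
    return real, extreme / n_draws
```

On toy data with a perfect index-penalty relationship, the real slope sits far outside the placebo distribution, mirroring the paper's finding that the shuffled specification produces no effect.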

Heterogeneity analyses show that the green‑washing‑penalty link is weaker for large firms and for firms holding substantial “green assets” (e.g., eco‑friendly facilities, patents). This suggests that such assets act as credibility shields, dampening regulatory scrutiny. Moreover, the presence and number of “green investors” (ESG‑focused funds and institutions) further attenuate the relationship, implying that market‑based monitoring can complement formal regulation.

The paper contributes technically by (1) demonstrating that a RAG‑enhanced, two‑step LLM pipeline can process large volumes of unstructured ESG text with high reliability, and (2) providing empirical evidence that LLM‑derived green‑washing signals are valid proxies for later environmental enforcement actions. Limitations include the need for periodic LLM updates to capture evolving regulations, the focus on a single national market which may limit external validity, and the inherent subjectivity in defining green‑washing. The authors suggest future work on multi‑country datasets, ensemble LLM approaches, and tighter integration with regulatory databases. Overall, DeepGreen offers a scalable, early‑warning tool that can help regulators allocate oversight resources more efficiently.

