📝 Original Info
- Title: An Ontology-Based Framework for the Automated Construction of ESG Metric Knowledge Graphs
- ArXiv ID: 2512.01289
- Date: 2025-12-01
- Authors: Mingqin Yu, Fethi Rabhi, Boming Xia, Zhengyi Yang, Felix Tan, Qinghua Lu
📝 Abstract
Environmental, Social, and Governance (ESG) metric knowledge is inherently structured, connecting industries, reporting frameworks, metric categories, metrics, and calculation models through compositional dependencies, yet in practice this structure remains embedded implicitly in regulatory documents such as SASB, TCFD, and IFRS S2 and rarely exists as an explicit, governed, or machine-actionable artefact. Existing ESG ontologies define formal schemas but do not address scalable population and governance from authoritative regulatory sources, while unconstrained large language model (LLM) extraction frequently produces semantically incorrect entities, hallucinated relationships, and structurally invalid graphs. OntoMetric is an ontology-guided framework for the automated construction and governance of ESG metric knowledge graphs from regulatory documents that operationalises the ESG Metric Knowledge Graph (ESGMKG) ontology as a first-class constraint embedded directly into the extraction and population process. The framework integrates structure-aware segmentation, ontology-constrained LLM extraction enriched with semantic fields and deterministic identifiers, and two-phase validation combining semantic type verification with rule-based schema checking, while preserving segment-level and page-level provenance to ensure traceability to regulatory source text. Evaluation on five ESG regulatory standards shows that ontology-guided extraction achieves 65-90% semantic accuracy and over 80% schema compliance, compared with 3-10% for unconstrained baseline extraction, and yields stable cost efficiency with a cost per validated entity of $0.01-$0.02 and a 48× efficiency improvement over baseline.
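The two validation phases named in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical, simplified schema: the type names, relation names (`reportsUnder`, `hasCategory`, `includesMetric`, `isCalculatedBy`) and required-field rules are illustrative, not the actual ESGMKG vocabulary. Phase 1 uses a plain required-field check as a stand-in for semantic type verification, whose exact mechanism is not described in this excerpt.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified schema loosely modelled on the ESGMKG concepts;
# type names, relation names, and required fields are illustrative only.
REQUIRED_FIELDS = {
    "Industry": set(),
    "ReportingFramework": set(),
    "MetricCategory": set(),
    "Metric": {"unit"},
    "CalculationModel": {"inputs"},
}
ALLOWED_RELATIONS = {
    ("Industry", "reportsUnder", "ReportingFramework"),
    ("ReportingFramework", "hasCategory", "MetricCategory"),
    ("MetricCategory", "includesMetric", "Metric"),
    ("Metric", "isCalculatedBy", "CalculationModel"),
}


@dataclass
class Entity:
    eid: str
    etype: str
    label: str
    attrs: dict = field(default_factory=dict)


def phase1_type_check(entities):
    """Phase 1 stand-in: accept entities with a known type and all required fields."""
    accepted, rejected = [], []
    for e in entities:
        required = REQUIRED_FIELDS.get(e.etype)
        if required is not None and required.issubset(e.attrs):
            accepted.append(e)
        else:
            rejected.append(e)
    return accepted, rejected


def phase2_schema_check(entities, relations):
    """Phase 2: accept (source, predicate, target) triples matching an allowed domain/range."""
    by_id = {e.eid: e for e in entities}
    accepted, rejected = [], []
    for src, pred, dst in relations:
        ok = (
            src in by_id
            and dst in by_id
            and (by_id[src].etype, pred, by_id[dst].etype) in ALLOWED_RELATIONS
        )
        (accepted if ok else rejected).append((src, pred, dst))
    return accepted, rejected


# Hypothetical extraction output.
entities = [
    Entity("c1", "MetricCategory", "Greenhouse Gas Emissions"),
    Entity("m1", "Metric", "Gross global Scope 1 emissions", {"unit": "tCO2e"}),
    Entity("m2", "Metric", "Emissions reduction target"),  # no unit: fails phase 1
]
relations = [("c1", "includesMetric", "m1"), ("m1", "hasCategory", "c1")]

typed, dropped = phase1_type_check(entities)
edges, bad_edges = phase2_schema_check(typed, relations)
print([e.eid for e in dropped], edges, bad_edges)
```

Running phase 1 before phase 2 means relations are only checked between entities that already passed type verification, so a rejected entity cannot anchor an edge in the graph.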
💡 Deep Analysis
Deep Dive into an Ontology-Based Framework for the Automated Construction of ESG Metric Knowledge Graphs.
📄 Full Content
OntoMetric: An Ontology-Driven LLM-Assisted Framework for Automated ESG Metric Knowledge Graph Generation
Mingqin Yu∗ (mingqin.yu@unsw.edu.au), School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Fethi Rabhi (f.rabhi@unsw.edu.au), School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Boming Xia (boming.xia@adelaide.edu.au), Faculty of Sciences, Engineering and Technology, The University of Adelaide, Adelaide, Australia
Zhengyi Yang (zhengyi.yang@unsw.edu.au), School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Felix Tan (f.tan@unsw.edu.au), School of Information Systems and Technology Management, University of New South Wales, Sydney, Australia
Qinghua Lu (qinghua.lu@data61.csiro.au), CSIRO’s Data61, Sydney, Australia
Keywords
Ontology-Guided LLM Extraction, ESG Knowledge Graphs, Two-Phase Validation, Provenance Preservation, Regulatory Knowledge Engineering, AI-Ready Knowledge Representation
∗Corresponding author.
1 Introduction
Environmental, Social, and Governance (ESG) metrics constitute a structured body of domain knowledge that specifies what must be measured, how values are computed, which units apply, and how individual indicators depend on one another. Beyond simple numerical values, ESG metric knowledge includes formal definitions, scope and boundary conditions, disaggregation rules, calculation models, and compositional dependencies between metrics and their input variables. Collectively, these elements form an implicit ESG metric knowledge graph that connects industries, reporting contexts, metric categories, metrics, and computational models into a coherent semantic structure.
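The compositional dependencies described above can be sketched as a small dependency graph in which each derived metric names its input variables and a calculation model. The metric names, formulas, units, and values below are hypothetical (a total-emissions sum feeding an emissions-intensity ratio) and are not taken from SASB, TCFD, or IFRS S2.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical calculation models: each derived metric lists its input
# variables and a function combining them. Units assumed: tCO2e and millions of USD.
MODELS: Dict[str, Tuple[List[str], Callable[..., float]]] = {
    "total_ghg_emissions": (["scope1_emissions", "scope2_emissions"], lambda s1, s2: s1 + s2),
    "ghg_intensity": (["total_ghg_emissions", "revenue_musd"], lambda e, r: e / r),
}


def evaluate(metric: str, inputs: Dict[str, float], cache: Optional[Dict[str, float]] = None) -> float:
    """Resolve a metric by recursively evaluating the metrics and variables it depends on."""
    if cache is None:
        cache = {}
    if metric in inputs:  # raw input variable reported directly
        return inputs[metric]
    if metric in cache:  # derived metric already computed
        return cache[metric]
    if metric not in MODELS:
        raise KeyError(f"no reported value or calculation model for {metric!r}")
    deps, model = MODELS[metric]
    value = model(*(evaluate(d, inputs, cache) for d in deps))
    cache[metric] = value
    return value


def dependency_edges(metric: str) -> List[Tuple[str, str]]:
    """List (metric, input) edges reachable from a metric, i.e. its compositional subgraph."""
    edges = []
    for dep in MODELS.get(metric, ([], None))[0]:
        edges.append((metric, dep))
        edges.extend(dependency_edges(dep))
    return edges


reported = {"scope1_emissions": 1200.0, "scope2_emissions": 300.0, "revenue_musd": 50.0}
print(evaluate("ghg_intensity", reported))
print(dependency_edges("ghg_intensity"))
```

Evaluating `ghg_intensity` first resolves `total_ghg_emissions`, mirroring how a metric node depends transitively on the input variables of its calculation model.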
In practice, however, this ESG metric knowledge graph does not exist as an explicit, governed, or machine-actionable artefact. Instead, metric definitions, calculation logic, and inter-metric dependencies are embedded implicitly in regulatory documents and reporting artefacts. Where ESG ontologies have been proposed, they typically provide only formal schemas or high-level concept hierarchies, without populated instances or traceable links to authoritative metric definitions. As a result, the construction and maintenance of ESG metric knowledge graphs remains a largely manual, expert-driven process.
Several efforts have sought to formalise ESG knowledge using ontologies and semantic models. Notably, we previously proposed the ESG Metric Knowledge Graph (ESGMKG), an ontology-driven architecture that defines the core entity types, relationships, and compositional structures required to represent ESG metric knowledge, including industries, reporting frameworks, metric categories, metrics, and calculation models [16]. However, ESGMKG and related ontologies focus primarily on schema definition rather than scalable population and governance. They do not address how ESG metric knowledge can be constructed automatically from regulatory sources or how provenance can be preserved at scale.
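The provenance question raised above can be made concrete with a minimal sketch that pairs deterministic identifiers (named in the abstract) with segment- and page-level provenance. The hashing scheme (SHA-256 over framework, entity type, and a normalised label), the `Provenance` fields, and the document name `sasb_standard.pdf` are assumptions for illustration; this excerpt does not specify how OntoMetric derives its identifiers.

```python
import hashlib
import re
from dataclasses import dataclass


def normalise(label: str) -> str:
    """Lower-case and collapse non-alphanumeric runs so trivial variants map together."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def deterministic_id(framework: str, etype: str, label: str) -> str:
    """Hypothetical scheme: stable ID from (framework, type, normalised label)."""
    key = f"{normalise(framework)}|{etype}|{normalise(label)}"
    return f"{etype.lower()}:{hashlib.sha256(key.encode('utf-8')).hexdigest()[:12]}"


@dataclass(frozen=True)
class Provenance:
    document: str
    segment_id: str
    page: int


def upsert(graph: dict, framework: str, etype: str, label: str, prov: Provenance) -> str:
    """Insert or merge an entity, accumulating every source location that mentions it."""
    eid = deterministic_id(framework, etype, label)
    node = graph.setdefault(eid, {"type": etype, "label": label, "provenance": []})
    if prov not in node["provenance"]:
        node["provenance"].append(prov)
    return eid


graph: dict = {}
a = upsert(graph, "SASB", "Metric", "Gross global Scope 1 emissions",
           Provenance("sasb_standard.pdf", "seg-12", 7))
b = upsert(graph, "SASB", "Metric", "gross global scope-1 emissions ",
           Provenance("sasb_standard.pdf", "seg-40", 19))
print(a == b, len(graph), graph[a]["provenance"])
```

Because the identifier depends only on normalised content, re-extracting the same metric from another segment merges into the existing node and appends a second provenance record instead of creating a duplicate.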
In real-world settings, ESG metric knowledge is derived from multiple sources, among which regulatory and quasi-regulatory
arXiv:2512.01289v2 [cs.AI] 26 Jan 2026
…(Full text truncated)…
This content is AI-processed based on ArXiv data.