ESGReveal: An LLM-based approach for extracting structured data from ESG reports

Zou, Yi; Shi, Mengying; Chen, Zhongjie; Deng, Zhu; Lei, Zongxiong; Zeng, Zihan; Yang, Shiming; Tong, Hongxiang; Xiao, Lei; Zhou, Wenwen

Publisher Website: 10.1016/j.jclepro.2024.144572
WOS: WOS:001409239500001

Citations:
- Web of Science: 0
Appears in Collections:
- Geography: Journal/Magazine Articles

Title	ESGReveal: An LLM-based approach for extracting structured data from ESG reports
Authors	Zou, Yi; Shi, Mengying; Chen, Zhongjie; Deng, Zhu; Lei, Zongxiong; Zeng, Zihan; Yang, Shiming; Tong, Hongxiang; Xiao, Lei; Zhou, Wenwen
Issue Date	15-Jan-2025
Publisher	Elsevier
Citation	Journal of Cleaner Production, 2025, v. 489. DOI: http://dx.doi.org/10.1016/j.jclepro.2024.144572
Abstract	ESG reports are an important source for disclosing a company's environmental, social, and governance (ESG) performance, and stock exchanges are gradually strengthening their requirements for listed companies to submit these reports periodically. However, these documents are often unstructured, making it difficult to evaluate a company's disclosure level and performance directly and quantitatively. In this study, we develop a quantitative framework, ESGReveal, for assessing corporate ESG performance based on large language model (LLM) techniques. Specifically, by integrating retrieval-augmented generation (RAG) technology with LLMs, we extract relevant performance data from complex corporate ESG reports. The ESGReveal framework consists of three primary modules: an ESG Metadata module for standardized queries, a Report Preprocessing module for database construction, and an LLM Agent module for data extraction. We evaluated the performance of various LLMs, including GPT-3.5, GPT-4, ChatGLM, and QWEN, and found that GPT-4 achieved 76.9% accuracy in data extraction and 83.7% accuracy in disclosure analysis, showing the best improvement over baseline models. We applied this ESGReveal model to 2249 ESG reports published by 166 companies across 12 industries listed on the Hong Kong Stock Exchange (HKEx), analyzing the disclosure and performance of key ESG indicators. Results show that for mandatory environmental and social indicators required by HKEx, the sample companies achieved disclosure rates of 69.5% and 57.2%, respectively. Different industries exhibited varying performance in key ESG indicators, such as the proportion of direct and indirect greenhouse gas emissions, highlighting key areas for future emission reduction efforts. These findings underscore the need to strengthen ESG practices across sectors and emphasize both general and sector-specific ESG initiatives.
In summary, by leveraging the capabilities of LLM and RAG technologies, ESGReveal offers a practical and efficient solution to the pressing need for consistent and accurate ESG information retrieval.
Persistent Identifier	http://hdl.handle.net/10722/358232
ISSN	0959-6526 (2023 Impact Factor: 9.7; 2023 SCImago Journal Rankings: 2.058)
ISI Accession Number ID	WOS:001409239500001
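
The three-module structure named in the abstract (standardized queries, report preprocessing, data extraction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the Indicator schema, the keyword-overlap retriever, and the regular-expression extractor are all assumptions made for illustration, and the extractor stands in for the step where a real system would prompt an LLM with the retrieved text. The sample report text and indicator names are invented. The disclosure rate is computed here simply as the share of indicators for which a value was found.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Indicator:
    """One standardized query (hypothetical stand-in for an ESG metadata entry)."""
    name: str
    keywords: tuple


def preprocess(report_text: str) -> list:
    """Split a report into sentence-level chunks (stand-in for database construction)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report_text) if s.strip()]


def retrieve(chunks: list, indicator: Indicator) -> Optional[str]:
    """Return the chunk matching the most indicator keywords, or None if none match."""
    def score(chunk: str) -> int:
        text = chunk.lower()
        return sum(kw in text for kw in indicator.keywords)

    best = max(chunks, key=score, default=None)
    return best if best is not None and score(best) > 0 else None


def extract_value(chunk: str) -> Optional[float]:
    """Toy extractor: take the first number in the chunk.

    Assumption: a real pipeline would send the chunk and the query to an LLM here.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", chunk)
    return float(match.group().replace(",", "")) if match else None


def run(report_text: str, indicators: list) -> dict:
    """Retrieve a chunk per indicator and extract a value from it (None if undisclosed)."""
    chunks = preprocess(report_text)
    results = {}
    for indicator in indicators:
        chunk = retrieve(chunks, indicator)
        results[indicator.name] = extract_value(chunk) if chunk else None
    return results


def disclosure_rate(results: dict) -> float:
    """Share of indicators for which a value was found."""
    if not results:
        return 0.0
    return sum(value is not None for value in results.values()) / len(results)


# Invented sample data, for illustration only.
report = (
    "Direct greenhouse gas emissions were 1,200 tonnes in the year. "
    "Total water consumption was 350.5 cubic metres. "
    "The board met regularly."
)
indicators = [
    Indicator("direct_ghg", ("direct", "greenhouse")),
    Indicator("water_use", ("water",)),
    Indicator("hazardous_waste", ("hazardous waste",)),
]
results = run(report, indicators)
rate = disclosure_rate(results)
```

In this sketch an indicator counts as disclosed only when a supporting passage is retrieved and a value is extracted from it, which keeps the disclosure check and the data extraction step separate, as the abstract reports separate accuracies for the two tasks.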

DC Field	Value	Language
dc.contributor.author	Zou, Yi	-
dc.contributor.author	Shi, Mengying	-
dc.contributor.author	Chen, Zhongjie	-
dc.contributor.author	Deng, Zhu	-
dc.contributor.author	Lei, Zongxiong	-
dc.contributor.author	Zeng, Zihan	-
dc.contributor.author	Yang, Shiming	-
dc.contributor.author	Tong, Hongxiang	-
dc.contributor.author	Xiao, Lei	-
dc.contributor.author	Zhou, Wenwen	-
dc.date.accessioned	2025-07-26T00:30:30Z	-
dc.date.available	2025-07-26T00:30:30Z	-
dc.date.issued	2025-01-15	-
dc.identifier.citation	Journal of Cleaner Production, 2025, v. 489	-
dc.identifier.issn	0959-6526	-
dc.identifier.uri	http://hdl.handle.net/10722/358232	-
dc.description.abstract	ESG reports are an important source for disclosing a company's environmental, social, and governance (ESG) performance, and stock exchanges are gradually strengthening their requirements for listed companies to submit these reports periodically. However, these documents are often unstructured, making it difficult to evaluate a company's disclosure level and performance directly and quantitatively. In this study, we develop a quantitative framework, ESGReveal, for assessing corporate ESG performance based on large language model (LLM) techniques. Specifically, by integrating retrieval-augmented generation (RAG) technology with LLMs, we extract relevant performance data from complex corporate ESG reports. The ESGReveal framework consists of three primary modules: an ESG Metadata module for standardized queries, a Report Preprocessing module for database construction, and an LLM Agent module for data extraction. We evaluated the performance of various LLMs, including GPT-3.5, GPT-4, ChatGLM, and QWEN, and found that GPT-4 achieved 76.9% accuracy in data extraction and 83.7% accuracy in disclosure analysis, showing the best improvement over baseline models. We applied this ESGReveal model to 2249 ESG reports published by 166 companies across 12 industries listed on the Hong Kong Stock Exchange (HKEx), analyzing the disclosure and performance of key ESG indicators. Results show that for mandatory environmental and social indicators required by HKEx, the sample companies achieved disclosure rates of 69.5% and 57.2%, respectively. Different industries exhibited varying performance in key ESG indicators, such as the proportion of direct and indirect greenhouse gas emissions, highlighting key areas for future emission reduction efforts. These findings underscore the need to strengthen ESG practices across sectors and emphasize both general and sector-specific ESG initiatives.
In summary, by leveraging the capabilities of LLM and RAG technologies, ESGReveal offers a practical and efficient solution to the pressing need for consistent and accurate ESG information retrieval.	-
dc.language	eng	-
dc.publisher	Elsevier	-
dc.relation.ispartof	Journal of Cleaner Production	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.title	ESGReveal: An LLM-based approach for extracting structured data from ESG reports	-
dc.type	Article	-
dc.identifier.doi	10.1016/j.jclepro.2024.144572	-
dc.identifier.volume	489	-
dc.identifier.eissn	1879-1786	-
dc.identifier.isi	WOS:001409239500001	-
dc.identifier.issnl	0959-6526	-
