Article: ESGReveal: An LLM-based approach for extracting structured data from ESG reports

Title: ESGReveal: An LLM-based approach for extracting structured data from ESG reports
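Authors: Zou, Yi; Shi, Mengying; Chen, Zhongjie; Deng, Zhu; Lei, Zongxiong; Zeng, Zihan; Yang, Shiming; Tong, Hongxiang; Xiao, Lei; Zhou, Wenwen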
Issue Date: 15-Jan-2025
Publisher: Elsevier
Citation: Journal of Cleaner Production, 2025, v. 489
Abstract

ESG reports are an important channel for disclosing a company's environmental, social, and governance (ESG) performance, and stock exchanges are gradually strengthening their requirements for listed companies to submit these reports periodically. However, these documents are often unstructured, making it difficult to quantitatively evaluate either a company's disclosure level or its performance. In this study, we develop a quantitative framework, ESGReveal, for assessing corporate ESG performance based on large language model (LLM) techniques. Specifically, by integrating retrieval-augmented generation (RAG) with LLMs, we extract relevant performance data from complex corporate ESG reports. The ESGReveal framework consists of three primary modules: an ESG Metadata module for standardized queries, a Report Preprocessing module for database construction, and an LLM Agent module for data extraction. We evaluated the performance of various LLMs, including GPT-3.5, GPT-4, ChatGLM, and QWEN, and found that GPT-4 achieved 76.9% accuracy in data extraction and 83.7% accuracy in disclosure analysis, showing the largest improvement over baseline models. We applied the ESGReveal model to 2249 ESG reports published by 166 companies across 12 industries listed on the Hong Kong Stock Exchange (HKEx), analyzing the disclosure and performance of key ESG indicators. Results show that for the mandatory environmental and social indicators required by HKEx, the sample companies achieved disclosure rates of 69.5% and 57.2%, respectively. Industries differed in their performance on key ESG indicators, such as the proportion of direct and indirect greenhouse gas emissions, highlighting key areas for future emission reduction efforts. These findings underscore the need to strengthen ESG practices across sectors and emphasize both general and sector-specific ESG initiatives.
In summary, by leveraging the capabilities of LLM and RAG technologies, ESGReveal offers a practical and efficient solution to the pressing need for consistent and accurate ESG information retrieval.
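
The three-module flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name below (`Indicator`, `build_index`, `retrieve`, `extract`, `disclosure_rate`, `stub_llm`) is hypothetical, the bag-of-words cosine retrieval is a simple stand-in for whatever retriever the paper uses, and `stub_llm` is a placeholder where a real model such as GPT-4 would be called.

```python
import re
from collections import Counter
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Optional


@dataclass
class Indicator:
    """ESG Metadata stand-in: one standardized query per indicator (hypothetical schema)."""
    code: str
    query: str


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(report_text: str) -> list[tuple[str, Counter]]:
    """Report Preprocessing stand-in: split into paragraphs and store word counts."""
    chunks = [c.strip() for c in report_text.split("\n\n") if c.strip()]
    return [(c, Counter(tokenize(c))) for c in chunks]


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query (assumed retrieval scheme)."""
    q = Counter(tokenize(query))
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def extract(index, indicator: Indicator, llm: Callable[[str], str]) -> Optional[str]:
    """LLM Agent stand-in: retrieve context, prompt the model, map NOT_DISCLOSED to None."""
    context = "\n".join(retrieve(index, indicator.query))
    prompt = (f"Context:\n{context}\n\nQuestion: {indicator.query}\n"
              "Answer with the value or NOT_DISCLOSED.")
    answer = llm(prompt).strip()
    return None if answer == "NOT_DISCLOSED" else answer


def disclosure_rate(results: dict[str, Optional[str]]) -> float:
    """Share of indicators for which a value was extracted."""
    return sum(v is not None for v in results.values()) / len(results)


def stub_llm(prompt: str) -> str:
    """Placeholder for a real LLM call: picks the first number followed by 'tonnes' in the context."""
    context = prompt.split("\n\nQuestion:")[0]
    match = re.search(r"([\d,]+(?:\.\d+)?)\s*tonnes", context)
    return match.group(1) if match else "NOT_DISCLOSED"


# Toy report and indicators, invented purely for illustration.
REPORT = (
    "Our Scope 1 direct greenhouse gas emissions were 1,200 tonnes of CO2 equivalent in 2023.\n\n"
    "The board met four times during the year to review governance matters."
)
INDICATORS = [
    Indicator("E1", "What are the Scope 1 direct greenhouse gas emissions?"),
    Indicator("S1", "What is the total number of employees?"),
]

index = build_index(REPORT)
results = {ind.code: extract(index, ind, stub_llm) for ind in INDICATORS}
print(results, disclosure_rate(results))
```

Keeping the LLM behind a plain callable means the same `extract` function could be pointed at different models for comparison, which mirrors the kind of model-by-model evaluation the abstract reports; the disclosure rate here is simply the fraction of queried indicators with an extracted value.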


Persistent Identifier: http://hdl.handle.net/10722/358232
ISSN: 0959-6526
2023 Impact Factor: 9.7
2023 SCImago Journal Rankings: 2.058
ISI Accession Number ID: WOS:001409239500001

 

DC Field | Value | Language
dc.contributor.author | Zou, Yi | -
dc.contributor.author | Shi, Mengying | -
dc.contributor.author | Chen, Zhongjie | -
dc.contributor.author | Deng, Zhu | -
dc.contributor.author | Lei, Zongxiong | -
dc.contributor.author | Zeng, Zihan | -
dc.contributor.author | Yang, Shiming | -
dc.contributor.author | Tong, Hongxiang | -
dc.contributor.author | Xiao, Lei | -
dc.contributor.author | Zhou, Wenwen | -
dc.date.accessioned | 2025-07-26T00:30:30Z | -
dc.date.available | 2025-07-26T00:30:30Z | -
dc.date.issued | 2025-01-15 | -
dc.identifier.citation | Journal of Cleaner Production, 2025, v. 489 | -
dc.identifier.issn | 0959-6526 | -
dc.identifier.uri | http://hdl.handle.net/10722/358232 | -
dc.language | eng | -
dc.publisher | Elsevier | -
dc.relation.ispartof | Journal of Cleaner Production | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.title | ESGReveal: An LLM-based approach for extracting structured data from ESG reports | -
dc.type | Article | -
dc.identifier.doi | 10.1016/j.jclepro.2024.144572 | -
dc.identifier.volume | 489 | -
dc.identifier.eissn | 1879-1786 | -
dc.identifier.isi | WOS:001409239500001 | -
dc.identifier.issnl | 0959-6526 | -
