
Article: Geocoding the past world: unearthing coordinates of early China from texts using generative AI

Authors: Chen, Yuqi; Shang, Wenyi; Wang, Hongsu; Zhang, Sophia; Bol, Peter K.
Keywords: early China; gazetteer enrichment; generative AI; historical toponym resolution; spatial humanities
Issue Date: 2025
Citation: International Journal of Geographical Information Science, 2025
Abstract: Extracting geographic information from historical texts presents unique challenges. To address them, this study uses generative large language models (LLMs) to extract historical toponyms and their corresponding location references from texts. A historical geocoder then assigns coordinates to the extracted toponyms and, based on the location references, calculates a maximum error distance for each one, indicating the degree of uncertainty. Both the extraction and geocoding processes are integrated into a new tool named ‘His-Geo’ (https://github.com/yukiyuqichen/His-Geo). To evaluate the results, this study also curates a manually annotated dataset, the Early China Historical Geographic Corpus (CHGC-Early). The dataset fills a gap left by the absence of geographic data for early China in existing gazetteers and provides a benchmark for training and evaluating methods that extract geographic information from premodern Chinese texts. In the evaluation, the GPT-4o model achieves an F1 score of 0.831, demonstrating the capability of generative LLMs to extract geographic information from lengthy, unstructured texts that contain diverse and sometimes conflicting views.
Persistent Identifier: http://hdl.handle.net/10722/365306
ISSN: 1365-8816
2023 Impact Factor: 4.3
2023 SCImago Journal Rankings: 1.436
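The 0.831 F1 score reported in the abstract is the harmonic mean of precision and recall. This record does not say how predicted toponyms were matched against the CHGC-Early annotations, so the following is only a minimal sketch assuming exact string matching over toponym mentions. The function name `prf1` and the toy place lists are hypothetical illustrations, not code or data from His-Geo or CHGC-Early.

```python
from collections import Counter


def prf1(predicted, gold):
    """Micro-averaged precision, recall and F1 over toponym mentions.

    Hypothetical matching rule (assumption, not the paper's stated method):
    a predicted toponym is correct only if the identical string occurs in
    the gold annotations, and each gold mention can be matched at most once.
    """
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)
    # True positives: size of the multiset intersection (minimum counts).
    tp = sum((pred_counts & gold_counts).values())
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example with made-up lists of extracted vs. annotated toponyms.
    predicted = ["Luoyang", "Chang'an", "Ye"]
    gold = ["Luoyang", "Chang'an", "Linzi", "Handan"]
    print(prf1(predicted, gold))
```

In the toy example, two of three predictions are correct and two of four gold toponyms are found, so precision is 2/3, recall is 1/2, and F1 is 4/7 (about 0.571). A real evaluation might instead use span offsets or normalized place names as the matching key, which would change the counts but not the formula.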

 

DC Field: Value
dc.contributor.author: Chen, Yuqi
dc.contributor.author: Shang, Wenyi
dc.contributor.author: Wang, Hongsu
dc.contributor.author: Zhang, Sophia
dc.contributor.author: Bol, Peter K.
dc.date.accessioned: 2025-11-04T09:40:09Z
dc.date.available: 2025-11-04T09:40:09Z
dc.date.issued: 2025
dc.identifier.citation: International Journal of Geographical Information Science, 2025
dc.identifier.issn: 1365-8816
dc.identifier.uri: http://hdl.handle.net/10722/365306
dc.language: eng
dc.relation.ispartof: International Journal of Geographical Information Science
dc.subject: early China
dc.subject: gazetteer enrichment
dc.subject: generative AI
dc.subject: Historical toponym resolution
dc.subject: spatial humanities
dc.title: Geocoding the past world: unearthing coordinates of early China from texts using generative AI
dc.type: Article
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1080/13658816.2025.2491711
dc.identifier.scopus: eid_2-s2.0-105003496118
dc.identifier.eissn: 1365-8824
