Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1080/13658816.2025.2491711
- Scopus: eid_2-s2.0-105003496118
Article: Geocoding the past world: unearthing coordinates of early China from texts using generative AI
| Title | Geocoding the past world: unearthing coordinates of early China from texts using generative AI |
|---|---|
| Authors | Chen, Yuqi; Shang, Wenyi; Wang, Hongsu; Zhang, Sophia; Bol, Peter K. |
| Keywords | early China; gazetteer enrichment; generative AI; Historical toponym resolution; spatial humanities |
| Issue Date | 2025 |
| Citation | International Journal of Geographical Information Science, 2025 |
| Abstract | Extracting geographic information from historical texts presents unique challenges. To address these challenges, this study leverages generative large language models (LLMs) to extract historical toponyms and their corresponding location references from texts. The coordinates of the extracted toponyms are then identified by a historical geocoder, which also calculates their maximum error distances based on the location references, indicating the degree of uncertainty. Both the extraction and geocoding processes are integrated into a novel tool named ‘His-Geo’ (https://github.com/yukiyuqichen/His-Geo). To evaluate the results, this study also curates a manually annotated dataset, the Early China Historical Geographic Corpus (CHGC-Early), filling the gap in the absence of geographic data for early China in existing gazetteers and providing a benchmark dataset for training and evaluating approaches for tasks related to geographic information extraction from premodern Chinese texts. The evaluation results show a satisfactory 0.831 F1 score for the GPT-4o model, demonstrating the remarkable capability of generative large language models in extracting geographic information from lengthy, unstructured texts that encompass diverse and sometimes conflicting views. |
| Persistent Identifier | http://hdl.handle.net/10722/365306 |
| ISSN | 1365-8816 (2023 Impact Factor: 4.3; 2023 SCImago Journal Rankings: 1.436) |
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Chen, Yuqi | - |
| dc.contributor.author | Shang, Wenyi | - |
| dc.contributor.author | Wang, Hongsu | - |
| dc.contributor.author | Zhang, Sophia | - |
| dc.contributor.author | Bol, Peter K. | - |
| dc.date.accessioned | 2025-11-04T09:40:09Z | - |
| dc.date.available | 2025-11-04T09:40:09Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.citation | International Journal of Geographical Information Science, 2025 | - |
| dc.identifier.issn | 1365-8816 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/365306 | - |
| dc.description.abstract | Extracting geographic information from historical texts presents unique challenges. To address these challenges, this study leverages generative large language models (LLMs) to extract historical toponyms and their corresponding location references from texts. The coordinates of the extracted toponyms are then identified by a historical geocoder, which also calculates their maximum error distances based on the location references, indicating the degree of uncertainty. Both the extraction and geocoding processes are integrated into a novel tool named ‘His-Geo’ (https://github.com/yukiyuqichen/His-Geo). To evaluate the results, this study also curates a manually annotated dataset, the Early China Historical Geographic Corpus (CHGC-Early), filling the gap in the absence of geographic data for early China in existing gazetteers and providing a benchmark dataset for training and evaluating approaches for tasks related to geographic information extraction from premodern Chinese texts. The evaluation results show a satisfactory 0.831 F1 score for the GPT-4o model, demonstrating the remarkable capability of generative large language models in extracting geographic information from lengthy, unstructured texts that encompass diverse and sometimes conflicting views. | - |
| dc.language | eng | - |
| dc.relation.ispartof | International Journal of Geographical Information Science | - |
| dc.subject | early China | - |
| dc.subject | gazetteer enrichment | - |
| dc.subject | generative AI | - |
| dc.subject | Historical toponym resolution | - |
| dc.subject | spatial humanities | - |
| dc.title | Geocoding the past world: unearthing coordinates of early China from texts using generative AI | - |
| dc.type | Article | - |
| dc.description.nature | link_to_subscribed_fulltext | - |
| dc.identifier.doi | 10.1080/13658816.2025.2491711 | - |
| dc.identifier.scopus | eid_2-s2.0-105003496118 | - |
| dc.identifier.eissn | 1365-8824 | - |
