Article: Vision language model (VLM)-enabled street view analytics: a systematic literature review

Title: Vision language model (VLM)-enabled street view analytics: a systematic literature review
Authors: Peng, Ziyu; Lu, Weisheng; An, Hongda; Xia, Xianhua; Zhang, Yi; Xue, Fan; Chen, Junjie
Keywords: Large language model; Multimodal learning; Street view analytics; Systematic literature review; Vision language model
Issue Date: 9-Dec-2025
Publisher: Emerald
Citation: Engineering, Construction and Architectural Management, 2025, p. 1-19
Abstract

Purpose

Street view analytics (SVA) is an emerging field focusing on the systematic analysis of street-level imagery to understand urban environments, which has rapidly advanced with the advent of vision language models (VLMs). Despite the significant advancements, a critical review of the applications of VLMs for SVA is lacking. This paper aims to fill this gap by providing a comprehensive literature review on VLM-enabled SVA.

Design/methodology/approach

This study adopts a Preferred Reporting Items for Systematic Reviews and Meta-Analyses-guided systematic literature review. After keyword retrieval, literature collection, thematic screening and a five-domain quality assessment (data representativeness, ground truth validity, model design and/or analytic rigor, validation and/or generalization and reporting and/or reproducibility), 69 VLM-enabled SVA studies (2020–2025) were selected. Five reviewers independently extracted and synthesized evidence, and inter-rater reliability was quantified to verify consistency.

Findings

The systematic analysis underscores the transformative potential of VLMs in SVA, emphasizing their multimodal data handling and open-domain knowledge integration. However, key challenges, while rooted in broader SVA limitations, manifest distinctly in VLM contexts: temporal dynamics, contextual reliance, annotation inconsistencies, computational demands and process transparency. Handling remains task-dependent, with future research focusing on city- and year-held-out temporal evaluation, robustness to street-level variability, retrieval-augmented generation for consistency, hybrid edge-cloud models and chain-of-thought prompting.

Originality/value

This study contributes to the field by synthesizing the latest developments in VLMs for SVA, identifying avenues for future research and proposing an integrated workflow for applying VLMs to SVA tasks.


Persistent Identifier: http://hdl.handle.net/10722/369161
ISSN: 0969-9988
2023 Impact Factor: 3.6
2023 SCImago Journal Rankings: 0.896

 

DC Field: Value
dc.contributor.author: Peng, Ziyu
dc.contributor.author: Lu, Weisheng
dc.contributor.author: An, Hongda
dc.contributor.author: Xia, Xianhua
dc.contributor.author: Zhang, Yi
dc.contributor.author: Xue, Fan
dc.contributor.author: Chen, Junjie
dc.date.accessioned: 2026-01-20T08:35:17Z
dc.date.available: 2026-01-20T08:35:17Z
dc.date.issued: 2025-12-09
dc.identifier.citation: Engineering, Construction and Architectural Management, 2025, p. 1-19
dc.identifier.issn: 0969-9988
dc.identifier.uri: http://hdl.handle.net/10722/369161
dc.language: eng
dc.publisher: Emerald
dc.relation.ispartof: Engineering, Construction and Architectural Management
dc.subject: Large language model
dc.subject: Multimodal learning
dc.subject: Street view analytics
dc.subject: Systematic literature review
dc.subject: Vision language model
dc.title: Vision language model (VLM)-enabled street view analytics: a systematic literature review
dc.type: Article
dc.identifier.doi: 10.1108/ECAM-07-2025-1133
dc.identifier.scopus: eid_2-s2.0-105025427009
dc.identifier.spage: 1
dc.identifier.epage: 19
dc.identifier.eissn: 1365-232X
dc.identifier.issnl: 0969-9988
