Article: GEM: Boost Simple Network for Glass Surface Segmentation via Vision Foundation Models

Title: GEM: Boost Simple Network for Glass Surface Segmentation via Vision Foundation Models
Authors: Hao, Jing; Liu, Moyun; Yang, Jinrong; Hung, Kuo Feng
Keywords: Data synthesis
glass segmentation
segment anything
transfer learning
vision foundation models
Issue Date: 28-Jan-2025
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Multimedia, 2025, v. 27
Abstract

Detecting glass regions is a challenging task due to the inherent ambiguity of their transparency and reflective characteristics. Current solutions in this field remain rooted in conventional deep learning paradigms, requiring the construction of annotated datasets and the design of network architectures. However, the evident drawback of these mainstream solutions lies in the time-consuming and labor-intensive process of curating datasets, alongside the increasing complexity of model structures. In this paper, we propose to address these issues by fully harnessing the capabilities of two existing vision foundation models (VFMs): Stable Diffusion and the Segment Anything Model (SAM). First, we construct a Synthetic but photorealistic large-scale Glass Surface Detection dataset, dubbed S-GSD, via Stable Diffusion and without any labeling cost. The dataset is provided at four different scales, totaling 168k images with precise masks. In addition, building on the powerful segmentation ability of SAM, we devise a simple Glass surface sEgMentor named GEM, which adopts a simple query-based encoder-decoder architecture. Comprehensive experiments are conducted on the large-scale glass segmentation dataset GSD-S. With the help of these two VFMs, GEM establishes new state-of-the-art performance, surpassing the best previously reported method, GlassSemNet, with an IoU improvement of 2.1%. Additionally, extensive experiments demonstrate that our synthetic dataset S-GSD yields remarkable performance in zero-shot and transfer learning settings.
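The abstract states that the S-GSD images are synthesized with Stable Diffusion at no labeling cost. As a rough illustration of that image-synthesis step only, the sketch below uses the Hugging Face diffusers library; the checkpoint id, prompt, and sampler settings are assumptions, and the abstract does not describe how the paired masks are produced, so that part is omitted.

```python
# Minimal sketch of Stable Diffusion image synthesis (illustrative only).
# The checkpoint id, prompt, and sampling settings are assumptions; the
# paper's actual S-GSD generation pipeline, including how precise masks
# are obtained, is not specified in the abstract.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint
    torch_dtype=torch.float16,
).to("cuda")

prompt = (
    "a photorealistic indoor scene with a large glass wall, "
    "visible reflections and transparency, high detail"
)  # hypothetical prompt
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("synthetic_glass_scene.png")
```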


Persistent Identifier: http://hdl.handle.net/10722/357469
ISSN: 1520-9210
2023 Impact Factor: 8.4
2023 SCImago Journal Rankings: 2.260
ISI Accession Number ID: WOS:001506811400008

 

DC Fields (Field: Value)
dc.contributor.author: Hao, Jing
dc.contributor.author: Liu, Moyun
dc.contributor.author: Yang, Jinrong
dc.contributor.author: Hung, Kuo Feng
dc.date.accessioned: 2025-07-22T03:12:56Z
dc.date.available: 2025-07-22T03:12:56Z
dc.date.issued: 2025-01-28
dc.identifier.citation: IEEE Transactions on Multimedia, 2025, v. 27
dc.identifier.issn: 1520-9210
dc.identifier.uri: http://hdl.handle.net/10722/357469
dc.description.abstract: Detecting glass regions is a challenging task due to the inherent ambiguity of their transparency and reflective characteristics. Current solutions in this field remain rooted in conventional deep learning paradigms, requiring the construction of annotated datasets and the design of network architectures. However, the evident drawback of these mainstream solutions lies in the time-consuming and labor-intensive process of curating datasets, alongside the increasing complexity of model structures. In this paper, we propose to address these issues by fully harnessing the capabilities of two existing vision foundation models (VFMs): Stable Diffusion and the Segment Anything Model (SAM). First, we construct a Synthetic but photorealistic large-scale Glass Surface Detection dataset, dubbed S-GSD, via Stable Diffusion and without any labeling cost. The dataset is provided at four different scales, totaling 168k images with precise masks. In addition, building on the powerful segmentation ability of SAM, we devise a simple Glass surface sEgMentor named GEM, which adopts a simple query-based encoder-decoder architecture. Comprehensive experiments are conducted on the large-scale glass segmentation dataset GSD-S. With the help of these two VFMs, GEM establishes new state-of-the-art performance, surpassing the best previously reported method, GlassSemNet, with an IoU improvement of 2.1%. Additionally, extensive experiments demonstrate that our synthetic dataset S-GSD yields remarkable performance in zero-shot and transfer learning settings.
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartof: IEEE Transactions on Multimedia
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: Data synthesis
dc.subject: glass segmentation
dc.subject: segment anything
dc.subject: transfer learning
dc.subject: vision foundation models
dc.title: GEM: Boost Simple Network for Glass Surface Segmentation via Vision Foundation Models
dc.type: Article
dc.identifier.doi: 10.1109/TMM.2025.3535404
dc.identifier.scopus: eid_2-s2.0-85216857064
dc.identifier.volume: 27
dc.identifier.eissn: 1941-0077
dc.identifier.isi: WOS:001506811400008
dc.identifier.issnl: 1520-9210
