File Download

There are no files associated with this item.

Article: Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks

Title: Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
Authors: Du, Baoxia; Du, Hongyang; Niyato, Dusit; Li, Ruidong
Keywords: large multimodal models (LMMs); resource allocation; semantic communication; user attention
Issue Date: 1-Jan-2025
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Mobile Computing, 2025
Abstract: Task-oriented semantic communication has emerged as a fundamental approach for enhancing performance in various communication scenarios. While recent advances in Generative Artificial Intelligence (GenAI), such as Large Language Models (LLMs), have been applied to semantic communication designs, the potential of Large Multimodal Models (LMMs) remains largely unexplored. In this paper, we investigate an LMM-based vehicle AI assistant using a Large Language and Vision Assistant (LLaVA) and propose a task-oriented semantic communication framework to facilitate efficient interaction between users and cloud servers. To reduce computational demands and shorten response time, we optimize LLaVA's image slicing to selectively focus on the areas of greatest interest to users. Additionally, we assess the importance of image patches by combining objective and subjective user attention, adjusting the energy used to transmit semantic information accordingly. This strategy optimizes resource utilization, ensuring precise transmission of critical information. We construct a Visual Question Answering (VQA) dataset for traffic scenarios to evaluate effectiveness. Experimental results show that our semantic communication framework significantly increases question-answering accuracy under the same channel conditions, performing particularly well in environments with poor Signal-to-Noise Ratio (SNR). Accuracy can be improved by 13.4% at an SNR of 12 dB and by 33.1% at 10 dB.
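The abstract describes weighting image patches by a blend of objective and subjective user attention, then adjusting transmit energy per patch. A minimal sketch of that idea, assuming a simple linear blend and proportional energy split (the function name, `alpha` blend weight, and score format are all illustrative, not the paper's actual method):

```python
# Hypothetical sketch: patch importance = blend of objective saliency and
# subjective (query-driven) attention; transmit energy is then split in
# proportion to importance under a fixed total budget.

def allocate_energy(objective_attn, subjective_attn, total_energy, alpha=0.5):
    """Return per-patch transmit energies summing to total_energy.

    objective_attn / subjective_attn: per-patch scores in [0, 1].
    alpha: assumed blend weight between the two attention sources.
    """
    importance = [alpha * o + (1 - alpha) * s
                  for o, s in zip(objective_attn, subjective_attn)]
    norm = sum(importance)
    if norm == 0:  # degenerate case: no attention anywhere, spread evenly
        n = len(importance)
        return [total_energy / n] * n
    return [total_energy * w / norm for w in importance]

# Example: four image patches, the second most relevant to the user query,
# so it receives the largest share of the energy budget.
energies = allocate_energy([0.2, 0.6, 0.1, 0.1], [0.1, 0.8, 0.05, 0.05], 10.0)
```

Under this allocation, patches that matter most to the question are transmitted with more energy, which is consistent with the reported accuracy gains at low SNR, where protecting critical semantic information matters most.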
Persistent Identifier: http://hdl.handle.net/10722/362104
ISSN: 1536-1233
2023 Impact Factor: 7.7
2023 SCImago Journal Rankings: 2.755

 

DC Field: Value
dc.contributor.author: Du, Baoxia
dc.contributor.author: Du, Hongyang
dc.contributor.author: Niyato, Dusit
dc.contributor.author: Li, Ruidong
dc.date.accessioned: 2025-09-19T00:32:02Z
dc.date.available: 2025-09-19T00:32:02Z
dc.date.issued: 2025-01-01
dc.identifier.citation: IEEE Transactions on Mobile Computing, 2025
dc.identifier.issn: 1536-1233
dc.identifier.uri: http://hdl.handle.net/10722/362104
dc.description.abstract: (same as Abstract above)
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartof: IEEE Transactions on Mobile Computing
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: large multimodal models (LMMs)
dc.subject: resource allocation
dc.subject: Semantic communication
dc.subject: user attention
dc.title: Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
dc.type: Article
dc.identifier.doi: 10.1109/TMC.2025.3564543
dc.identifier.scopus: eid_2-s2.0-105003692530
dc.identifier.eissn: 1558-0660
dc.identifier.issnl: 1536-1233
