Article: Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing

Title: Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing
Authors: Tong, Haonan; Li, Haopeng; Du, Hongyang; Yang, Zhaohui; Yin, Changchuan; Niyato, Dusit
Keywords: generative adversarial network; multimodal semantic communication; video generation
Issue Date: 2024
Citation: IEEE Wireless Communications Letters, 2024
Abstract: This paper studies an efficient multimodal data communication scheme for video conferencing. In the considered system, a speaker gives a talk to the audience, with talking-head video and audio being transmitted. Since the speaker does not frequently change posture and high-fidelity transmission of audio (speech and music) is required, the visual stream contains redundant data that can be removed by generating the video from the audio. To this end, we propose a wave-to-video (Wav2Vid) system, an efficient video transmission framework that reduces transmitted data by generating talking-head video from audio. In particular, full-duration audio and short-duration video data are synchronously transmitted over a wireless channel, with neural networks (NNs) extracting and encoding the audio and video semantics. The receiver then combines the decoded audio and video data and uses a generative adversarial network (GAN)-based model to generate the lip movements of the speaker. Simulation results show that the proposed Wav2Vid system can reduce the amount of transmitted data by up to 83% while maintaining the perceptual quality of the generated conferencing video.
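The scheme the abstract outlines (transmit full-duration audio plus only a short slice of video, and generate the remaining frames at the receiver) lends itself to a simple bandwidth-accounting sketch. The code below is illustrative only: the bit rates and the 10% video fraction are hypothetical placeholders chosen for the example, not figures from the paper.

```python
# Illustrative bandwidth accounting for a Wav2Vid-style scheme.
# All rates, durations, and fractions below are hypothetical placeholders.

def baseline_bits(duration_s: float, audio_kbps: float, video_kbps: float) -> float:
    """Bits sent when both streams are transmitted in full (conventional scheme)."""
    return duration_s * (audio_kbps + video_kbps) * 1000


def wav2vid_bits(duration_s: float, audio_kbps: float, video_kbps: float,
                 video_fraction: float) -> float:
    """Bits sent when full-duration audio plus only a fraction of the video
    is transmitted; the rest of the frames are generated at the receiver."""
    audio_bits = duration_s * audio_kbps * 1000
    video_bits = duration_s * video_fraction * video_kbps * 1000
    return audio_bits + video_bits


def data_reduction(duration_s: float = 60, audio_kbps: float = 64,
                   video_kbps: float = 500, video_fraction: float = 0.1) -> float:
    """Fractional reduction in transmitted data versus the full-stream baseline."""
    sent = wav2vid_bits(duration_s, audio_kbps, video_kbps, video_fraction)
    full = baseline_bits(duration_s, audio_kbps, video_kbps)
    return 1 - sent / full


if __name__ == "__main__":
    print(f"reduction: {data_reduction():.1%}")
```

With these placeholder numbers the saving comes out near 80%, i.e. in the same ballpark as the up-to-83% reduction the abstract reports; the actual figure depends on the codec rates and on how little video the GAN-based generator needs.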
Persistent Identifier: http://hdl.handle.net/10722/353228
ISSN: 2162-2337
2023 Impact Factor: 4.6
2023 SCImago Journal Rankings: 2.872
ISI Accession Number ID: WOS:001395714200025

 

DC Field | Value | Language
dc.contributor.author | Tong, Haonan | -
dc.contributor.author | Li, Haopeng | -
dc.contributor.author | Du, Hongyang | -
dc.contributor.author | Yang, Zhaohui | -
dc.contributor.author | Yin, Changchuan | -
dc.contributor.author | Niyato, Dusit | -
dc.date.accessioned | 2025-01-13T03:02:44Z | -
dc.date.available | 2025-01-13T03:02:44Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | IEEE Wireless Communications Letters, 2024 | -
dc.identifier.issn | 2162-2337 | -
dc.identifier.uri | http://hdl.handle.net/10722/353228 | -
dc.description.abstract | This paper studies an efficient multimodal data communication scheme for video conferencing. In the considered system, a speaker gives a talk to the audience, with talking-head video and audio being transmitted. Since the speaker does not frequently change posture and high-fidelity transmission of audio (speech and music) is required, the visual stream contains redundant data that can be removed by generating the video from the audio. To this end, we propose a wave-to-video (Wav2Vid) system, an efficient video transmission framework that reduces transmitted data by generating talking-head video from audio. In particular, full-duration audio and short-duration video data are synchronously transmitted over a wireless channel, with neural networks (NNs) extracting and encoding the audio and video semantics. The receiver then combines the decoded audio and video data and uses a generative adversarial network (GAN)-based model to generate the lip movements of the speaker. Simulation results show that the proposed Wav2Vid system can reduce the amount of transmitted data by up to 83% while maintaining the perceptual quality of the generated conferencing video. | -
dc.language | eng | -
dc.relation.ispartof | IEEE Wireless Communications Letters | -
dc.subject | generative adversarial network | -
dc.subject | Multimodal semantic communication | -
dc.subject | video generation | -
dc.title | Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing | -
dc.type | Article | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1109/LWC.2024.3488859 | -
dc.identifier.scopus | eid_2-s2.0-85208406401 | -
dc.identifier.eissn | 2162-2345 | -
dc.identifier.isi | WOS:001395714200025 | -
