Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1109/TMC.2024.3449645
- Scopus: eid_2-s2.0-85202773311
- WOS: WOS:001359244600283

Article: Cross-Modal Generative Semantic Communications for Mobile AIGC: Joint Semantic Encoding and Prompt Engineering
| Title | Cross-Modal Generative Semantic Communications for Mobile AIGC: Joint Semantic Encoding and Prompt Engineering |
|---|---|
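| Authors | Liu, Yinqiu; Du, Hongyang; Niyato, Dusit; Kang, Jiawen; Xiong, Zehui; Mao, Shiwen; Zhang, Ping; Shen, Xuemin |
| Keywords | Cross-Modal attention; diffusion; generative semantic communications; mobile AIGC |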
| Issue Date | 2024 |
| Citation | IEEE Transactions on Mobile Computing, 2024, v. 23, n. 12, p. 14871-14888 |
| Abstract | Employing massive Mobile AI-Generated Content (AIGC) Service Providers (MASPs) with powerful models, high-quality AIGC services become accessible for resource-constrained end users. However, this advancement, referred to as mobile AIGC, also introduces a significant challenge: users should download large AIGC outputs from the MASPs, leading to substantial bandwidth consumption and potential transmission failures. In this paper, we apply cross-modal Generative Semantic Communications (G-SemCom) in mobile AIGC to overcome wireless bandwidth constraints. Specifically, we utilize cross-modal attention maps to indicate the correlation between user prompts and each part of AIGC outputs. In this way, the MASP can analyze the prompt context and filter the most semantically important content efficiently. Only semantic information is transmitted, with which users can recover the entire AIGC output with high quality while saving mobile bandwidth. Since the transmitted information not only preserves the semantics but also prompts the recovery, we formulate a joint semantic encoding and prompt engineering problem to optimize the bandwidth allocation among users. Particularly, we present a human-perceptual metric named Joint Perceptual Similarity and Quality (JPSQ), which is fused by two learning-based measurements regarding semantic similarity and aesthetic quality, respectively. Furthermore, we develop the Attention-aware Deep Diffusion (ADD) algorithm, which learns attention maps and leverages the diffusion process to enhance the environment exploration ability of traditional deep reinforcement learning (DRL). Extensive experiments demonstrate that our proposal can reduce the bandwidth consumption of mobile users by 49.4% on average, with almost no perceptual difference in AIGC output quality. Moreover, the ADD algorithm shows superior performance over baseline DRL methods, with 1.74× higher overall reward. |
| Persistent Identifier | http://hdl.handle.net/10722/353211 |
| ISSN | 1536-1233 (2023 Impact Factor: 7.7; 2023 SCImago Journal Rankings: 2.755) |
| ISI Accession Number ID | WOS:001359244600283 |

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Liu, Yinqiu | - |
| dc.contributor.author | Du, Hongyang | - |
| dc.contributor.author | Niyato, Dusit | - |
| dc.contributor.author | Kang, Jiawen | - |
| dc.contributor.author | Xiong, Zehui | - |
| dc.contributor.author | Mao, Shiwen | - |
| dc.contributor.author | Zhang, Ping | - |
| dc.contributor.author | Shen, Xuemin | - |
| dc.date.accessioned | 2025-01-13T03:02:39Z | - |
| dc.date.available | 2025-01-13T03:02:39Z | - |
| dc.date.issued | 2024 | - |
| dc.identifier.citation | IEEE Transactions on Mobile Computing, 2024, v. 23, n. 12, p. 14871-14888 | - |
| dc.identifier.issn | 1536-1233 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/353211 | - |
| dc.language | eng | - |
| dc.relation.ispartof | IEEE Transactions on Mobile Computing | - |
| dc.subject | Cross-Modal attention | - |
| dc.subject | diffusion | - |
| dc.subject | generative semantic communications | - |
| dc.subject | mobile AIGC | - |
| dc.title | Cross-Modal Generative Semantic Communications for Mobile AIGC: Joint Semantic Encoding and Prompt Engineering | - |
| dc.type | Article | - |
| dc.description.nature | published_or_final_version | - |
| dc.identifier.doi | 10.1109/TMC.2024.3449645 | - |
| dc.identifier.scopus | eid_2-s2.0-85202773311 | - |
| dc.identifier.volume | 23 | - |
| dc.identifier.issue | 12 | - |
| dc.identifier.spage | 14871 | - |
| dc.identifier.epage | 14888 | - |
| dc.identifier.eissn | 1558-0660 | - |
| dc.identifier.isi | WOS:001359244600283 | - |
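
The abstract describes using cross-modal attention maps to pick out the most semantically important parts of an AIGC output, so that only those parts are transmitted. A minimal sketch of that selection step follows, under loud assumptions: the output is split into a grid of patches, each patch already has an attention score, and a fixed `keep_ratio` decides how many patches to send. The names `select_patches`, `bandwidth_saving`, and `keep_ratio` and the example scores are hypothetical illustrations; this record does not describe the paper's actual encoder, patch layout, or recovery procedure, and the user-side recovery is not modeled here.

```python
from typing import List, Tuple


def select_patches(attention: List[List[float]], keep_ratio: float) -> List[Tuple[int, int]]:
    """Return (row, col) indices of the highest-attention patches.

    `attention` is a hypothetical per-patch score grid; `keep_ratio` is the
    assumed fraction of patches that the available bandwidth allows.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    cells = [
        (score, r, c)
        for r, row in enumerate(attention)
        for c, score in enumerate(row)
    ]
    n_keep = max(1, round(keep_ratio * len(cells)))
    # Highest score first; ties are broken by position so the result is deterministic.
    cells.sort(key=lambda t: (-t[0], t[1], t[2]))
    return sorted((r, c) for _, r, c in cells[:n_keep])


def bandwidth_saving(n_total: int, n_kept: int) -> float:
    """Fraction of patches not transmitted (assumes equal-sized patches)."""
    return 1.0 - n_kept / n_total


# Illustrative 4x4 attention grid (made-up values, not data from the paper).
attention_map = [
    [0.02, 0.05, 0.01, 0.03],
    [0.04, 0.30, 0.25, 0.02],
    [0.03, 0.10, 0.08, 0.01],
    [0.01, 0.02, 0.02, 0.01],
]

kept = select_patches(attention_map, keep_ratio=0.5)
saving = bandwidth_saving(16, len(kept))
print(kept)
print(f"patches skipped: {saving:.0%}")
```

Ranking by score and keeping a fixed fraction is the simplest possible policy. The abstract instead frames the split of bandwidth among users as a joint optimization problem solved with the ADD algorithm, so a fixed `keep_ratio` should be read as a stand-in for whatever that optimization would output.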
