Links for fulltext (may require subscription):
- Publisher Website: 10.1145/3664647.3681498
- Scopus: eid_2-s2.0-85209780147
Citations:
- Scopus: 0

Appears in Collections:
Conference Paper: DiffMM: Multi-Modal Diffusion Model for Recommendation
| Title | DiffMM: Multi-Modal Diffusion Model for Recommendation |
|---|---|
| Authors | Jiang, Yangqin; Xia, Lianghao; Wei, Wei; Luo, Da; Lin, Kangyi; Huang, Chao |
| Keywords | diffusion model; multi-modal; recommendation |
| Issue Date | 2024 |
| Citation | MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, 2024, p. 7591-7599 |
| Abstract | The rise of online multi-modal sharing platforms like TikTok and YouTube has enabled personalized recommender systems to incorporate multiple modalities (such as visual, textual, and acoustic) into user representations. However, addressing the challenge of data sparsity in these systems remains a key issue. To address this limitation, recent research has introduced self-supervised learning techniques to enhance recommender systems. However, these methods often rely on simplistic random augmentation or intuitive cross-view information, which can introduce irrelevant noise and fail to accurately align the multi-modal context with user-item interaction modeling. To fill this research gap, we propose a novel multi-modal graph diffusion model for recommendation called DiffMM. The proposed framework integrates a modality-aware graph diffusion model with a cross-modal contrastive learning paradigm to improve modality-aware user representation learning, better aligning multi-modal feature information with collaborative relation modeling. Our approach leverages diffusion models' generative capabilities to automatically generate a user-item graph that is aware of different modalities, enabling the incorporation of useful multi-modal knowledge in modeling user-item interactions. We conduct extensive experiments on three public datasets, demonstrating the superiority of our DiffMM over various competitive baselines. |
| Persistent Identifier | http://hdl.handle.net/10722/355877 |
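The abstract above mentions a cross-modal contrastive learning paradigm for aligning modality-aware user representations with collaborative signals. As a rough illustration only (not the authors' DiffMM implementation), the sketch below shows a generic InfoNCE-style cross-modal contrastive loss of that kind; the function name, tensor shapes, and temperature value are assumptions made for this example.

```python
# Illustrative sketch only: a generic InfoNCE-style cross-modal contrastive loss,
# NOT the authors' DiffMM code. Names, shapes, and the temperature are assumed.
import torch
import torch.nn.functional as F

def cross_modal_infonce(user_emb_modal_a: torch.Tensor,
                        user_emb_modal_b: torch.Tensor,
                        temperature: float = 0.2) -> torch.Tensor:
    """Pull each user's two modality-aware views together while pushing apart
    views of different users (standard InfoNCE over a batch of users)."""
    a = F.normalize(user_emb_modal_a, dim=-1)   # (batch, dim)
    b = F.normalize(user_emb_modal_b, dim=-1)   # (batch, dim)
    logits = a @ b.t() / temperature            # pairwise cosine similarities
    targets = torch.arange(a.size(0), device=a.device)  # positives on the diagonal
    # Symmetric loss: view a -> view b and view b -> view a
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random "visual" and "textual" user views (assumed shapes).
if __name__ == "__main__":
    visual_view = torch.randn(128, 64)
    textual_view = torch.randn(128, 64)
    print(float(cross_modal_infonce(visual_view, textual_view)))
```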
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Jiang, Yangqin | - |
| dc.contributor.author | Xia, Lianghao | - |
| dc.contributor.author | Wei, Wei | - |
| dc.contributor.author | Luo, Da | - |
| dc.contributor.author | Lin, Kangyi | - |
| dc.contributor.author | Huang, Chao | - |
| dc.date.accessioned | 2025-05-19T05:46:21Z | - |
| dc.date.available | 2025-05-19T05:46:21Z | - |
| dc.date.issued | 2024 | - |
| dc.identifier.citation | MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, 2024, p. 7591-7599 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/355877 | - |
| dc.description.abstract | The rise of online multi-modal sharing platforms like TikTok and YouTube has enabled personalized recommender systems to incorporate multiple modalities (such as visual, textual, and acoustic) into user representations. However, addressing the challenge of data sparsity in these systems remains a key issue. To address this limitation, recent research has introduced self-supervised learning techniques to enhance recommender systems. However, these methods often rely on simplistic random augmentation or intuitive cross-view information, which can introduce irrelevant noise and fail to accurately align the multi-modal context with user-item interaction modeling. To fill this research gap, we propose a novel multi-modal graph diffusion model for recommendation called DiffMM. The proposed framework integrates a modality-aware graph diffusion model with a cross-modal contrastive learning paradigm to improve modality-aware user representation learning, better aligning multi-modal feature information with collaborative relation modeling. Our approach leverages diffusion models' generative capabilities to automatically generate a user-item graph that is aware of different modalities, enabling the incorporation of useful multi-modal knowledge in modeling user-item interactions. We conduct extensive experiments on three public datasets, demonstrating the superiority of our DiffMM over various competitive baselines. | - |
| dc.language | eng | - |
| dc.relation.ispartof | MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia | - |
| dc.subject | diffusion model | - |
| dc.subject | multi-modal | - |
| dc.subject | recommendation | - |
| dc.title | DiffMM: Multi-Modal Diffusion Model for Recommendation | - |
| dc.type | Conference_Paper | - |
| dc.description.nature | link_to_subscribed_fulltext | - |
| dc.identifier.doi | 10.1145/3664647.3681498 | - |
| dc.identifier.scopus | eid_2-s2.0-85209780147 | - |
| dc.identifier.spage | 7591 | - |
| dc.identifier.epage | 7599 | - |
