Lancet: Accelerating Mixture-of-Experts Training by Overlapping Weight Gradient Computation and All-to-All Communication

Title	Lancet: Accelerating Mixture-of-Experts Training by Overlapping Weight Gradient Computation and All-to-All Communication
Authors	Jiang, Chenyu; Tian, Ye; Jia, Zhen; Wu, Chuan; Wang, Yida; Zheng, Shuai
Issue Date	15-May-2024
Persistent Identifier	http://hdl.handle.net/10722/347385

DC Field	Value	Language
dc.contributor.author	Jiang, Chenyu	-
dc.contributor.author	Tian, Ye	-
dc.contributor.author	Jia, Zhen	-
dc.contributor.author	Wu, Chuan	-
dc.contributor.author	Wang, Yida	-
dc.contributor.author	Zheng, Shuai	-
dc.date.accessioned	2024-09-23T00:30:15Z	-
dc.date.available	2024-09-23T00:30:15Z	-
dc.date.issued	2024-05-15	-
dc.identifier.uri	http://hdl.handle.net/10722/347385	-
dc.language	eng	-
dc.relation.ispartof	The Seventh Conference on Machine Learning and Systems (MLSys) (13/05/2024-16/05/2024, Santa Clara)	-
dc.title	Lancet: Accelerating Mixture-of-Experts Training by Overlapping Weight Gradient Computation and All-to-All Communication	-
dc.type	Conference_Paper	-