Conference Paper: Evaluating MPI collective communication on the SP2, T3D, and Paragon multicomputers

Title: Evaluating MPI collective communication on the SP2, T3D, and Paragon multicomputers
Authors: Hwang, K; Wang, C; Wang, CL
Keywords: Collective communications
Multicomputers
Message passing
Startup latency
Aggregated bandwidth
Issue Date: 1997
Publisher: IEEE
Citation: The 3rd International Conference on High-Performance Computer Architecture Proceedings, San Antonio, TX., 1-5 February 1997, p. 106-115
Abstract: We evaluate the architectural support of collective communication operations on the IBM SP2, Cray T3D, and Intel Paragon. The MPI performance data are obtained from the STAP benchmark experiments jointly performed at USC and HKU. The T3D clearly demonstrated the best timing performance in almost all collective operations. This is attributed to the special hardware built into the T3D for fast messaging and block data transfer. With hardwired barriers, the T3D performs barrier synchronization in 3 μs, at least 30 times faster than the SP2 or Paragon. The startup latency of collective operations increases either linearly or logarithmically on all three multicomputers. For short messages, the SP2 outperforms the Paragon in the barrier, total exchange, scatter, and gather operations. Various collective operations with 64 KBytes per message over 64 nodes of the three machines can be completed in the time range (5.12 ms, 675 ms). The Paragon outperforms the SP2 in almost all collective operations with long messages. We have derived closed-form expressions to quantify the collective messaging times and aggregated bandwidth on all three machines. For total exchange with 64 nodes, the T3D, Paragon, and SP2 achieved an aggregated bandwidth of 1.745, 0.879, and 0.818 GBytes/s, respectively. These findings are useful to those who wish to predict MPP performance or to optimize parallel applications through trade-offs between divided computation and collective communication.
Persistent Identifier: http://hdl.handle.net/10722/45577
ISSN: 1530-0897
2020 SCImago Journal Rankings: 0.910
References
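The collective operations and metrics discussed in the abstract above (barrier synchronization, total exchange, startup latency, aggregated bandwidth) correspond to standard MPI calls, and the kind of measurement the paper reports can be sketched in a few lines of C. The sketch below is not the authors' STAP benchmark code; it simply times MPI_Barrier and MPI_Alltoall with 64-KByte messages and prints an aggregated-bandwidth figure, assuming aggregated bandwidth is defined as total bytes exchanged divided by the measured completion time. The message size, repetition count, and output format are illustrative choices, not values taken from the paper.

    /* Minimal timing sketch (assumption-based illustration, not the paper's code):
     * measures barrier latency and total-exchange (all-to-all) time, then reports
     * an aggregated-bandwidth figure. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MSG_BYTES (64 * 1024)   /* 64 KBytes per message, as in the paper's long-message runs */
    #define REPS 20                 /* repetitions to average out timer noise (illustrative choice) */

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Barrier (synchronization) timing. */
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++)
            MPI_Barrier(MPI_COMM_WORLD);
        double barrier_us = (MPI_Wtime() - t0) / REPS * 1e6;

        /* Total exchange: every node sends a MSG_BYTES block to every node. */
        char *sendbuf = malloc((size_t)nprocs * MSG_BYTES);
        char *recvbuf = malloc((size_t)nprocs * MSG_BYTES);
        memset(sendbuf, rank, (size_t)nprocs * MSG_BYTES);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++)
            MPI_Alltoall(sendbuf, MSG_BYTES, MPI_BYTE,
                         recvbuf, MSG_BYTES, MPI_BYTE, MPI_COMM_WORLD);
        double exch_s = (MPI_Wtime() - t0) / REPS;

        if (rank == 0) {
            /* Assumption: count only inter-node traffic, i.e. each of nprocs nodes
             * sends MSG_BYTES to each of the (nprocs - 1) other nodes. */
            double bytes = (double)nprocs * (nprocs - 1) * MSG_BYTES;
            printf("nodes=%d  barrier=%.2f us  alltoall=%.3f ms  aggregated bw=%.3f GB/s\n",
                   nprocs, barrier_us, exch_s * 1e3, bytes / exch_s / 1e9);
        }

        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }

With an MPI implementation that provides the usual mpicc/mpirun wrappers, a 64-node run would look like: mpicc -O2 alltoall_timing.c -o alltoall_timing && mpirun -np 64 ./alltoall_timing (the file name is a placeholder).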

 

DC Field | Value | Language
dc.contributor.author | Hwang, K | en_HK
dc.contributor.author | Wang, C | en_HK
dc.contributor.author | Wang, CL | en_HK
dc.date.accessioned | 2007-10-30T06:29:34Z | -
dc.date.available | 2007-10-30T06:29:34Z | -
dc.date.issued | 1997 | en_HK
dc.identifier.citation | The 3rd International Conference on High-Performance Computer Architecture Proceedings, San Antonio, TX., 1-5 February 1997, p. 106-115 | en_HK
dc.identifier.issn | 1530-0897 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/45577 | -
dc.description.abstract | We evaluate the architectural support of collective communication operations on the IBM SP2, Cray T3D, and Intel Paragon. The MPI performance data are obtained from the STAP benchmark experiments jointly performed at USC and HKU. The T3D clearly demonstrated the best timing performance in almost all collective operations. This is attributed to the special hardware built into the T3D for fast messaging and block data transfer. With hardwired barriers, the T3D performs barrier synchronization in 3 μs, at least 30 times faster than the SP2 or Paragon. The startup latency of collective operations increases either linearly or logarithmically on all three multicomputers. For short messages, the SP2 outperforms the Paragon in the barrier, total exchange, scatter, and gather operations. Various collective operations with 64 KBytes per message over 64 nodes of the three machines can be completed in the time range (5.12 ms, 675 ms). The Paragon outperforms the SP2 in almost all collective operations with long messages. We have derived closed-form expressions to quantify the collective messaging times and aggregated bandwidth on all three machines. For total exchange with 64 nodes, the T3D, Paragon, and SP2 achieved an aggregated bandwidth of 1.745, 0.879, and 0.818 GBytes/s, respectively. These findings are useful to those who wish to predict MPP performance or to optimize parallel applications through trade-offs between divided computation and collective communication. | en_HK
dc.format.extent | 962712 bytes | -
dc.format.extent | 6534 bytes | -
dc.format.extent | 2160 bytes | -
dc.format.mimetype | application/pdf | -
dc.format.mimetype | text/plain | -
dc.format.mimetype | text/plain | -
dc.language | eng | en_HK
dc.publisher | IEEE. | en_HK
dc.relation.ispartof | IEEE High-Performance Computer Architecture Symposium Proceedings | -
dc.rights | ©1997 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. | -
dc.subject | Collective communications | en_HK
dc.subject | Multicomputers | en_HK
dc.subject | Message passing | en_HK
dc.subject | Startup latency | en_HK
dc.subject | Aggregated bandwidth | en_HK
dc.title | Evaluating MPI collective communication on the SP2, T3D, and Paragon multicomputers | en_HK
dc.type | Conference_Paper | en_HK
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1530-0897&volume=&spage=106&epage=115&date=1997&atitle=Evaluating+MPI+collective+communication+on+the+SP2,+T3D,+and+Paragon+multicomputers | en_HK
dc.identifier.email | Wang, C: clwang@cs.hku.hk | -
dc.identifier.authority | Wang, C=rp00183 | -
dc.description.nature | published_or_final_version | en_HK
dc.identifier.doi | 10.1109/HPCA.1997.569646 | en_HK
dc.identifier.scopus | eid_2-s2.0-0030784082 | -
dc.identifier.hkuros | 26867 | -
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-0030784082&selection=ref&src=s&origin=recordpage | -
dc.identifier.spage | 106 | -
dc.identifier.epage | 115 | -
dc.identifier.scopusauthorid | Hwang, Kai=7402426691 | -
dc.identifier.scopusauthorid | Wang, Choming=7501630962 | -
dc.identifier.scopusauthorid | Wang, ChoLi=7501646188 | -
dc.customcontrol.immutable | sml 160111 - merged | -
dc.identifier.issnl | 1530-0897 | -
