File Download
Links for fulltext (may require subscription):
- Publisher Website: 10.1109/HPCA.1997.569646
- Scopus: eid_2-s2.0-0030784082
Citations:
- Scopus: 0
Conference Paper: Evaluating MPI collective communication on the SP2, T3D, and Paragon multicomputers
Title | Evaluating MPI collective communication on the SP2, T3D, and Paragon multicomputers |
---|---|
Authors | Hwang, K; Wang, C; Wang, CL |
Keywords | Collective communications; Multicomputers; Message passing; Startup latency; Aggregated bandwidth |
Issue Date | 1997 |
Publisher | IEEE. |
Citation | The 3rd International Conference on High-Performance Computer Architecture Proceedings, San Antonio, TX., 1-5 February 1997, p. 106-115 |
Abstract | We evaluate the architectural support of collective communication operations on the IBM SP2, Cray T3D, and Intel Paragon. The MPI performance data are obtained from the STAP benchmark experiments jointly performed at USC and HKU. The T3D demonstrated clearly the best timing performance in almost all collective operations. This is attributed to the special hardware built into the T3D for fast messaging and block data transfer. With hardwired barriers, the T3D performs the barrier synchronization in 3 μs, at least 30 times faster than the SP2 or Paragon. The startup latency of collective operations increases either linearly or logarithmically on the three multicomputers. For short messages, the SP2 outperforms the Paragon in the barrier, total exchange, scatter, and gather operations. Various collective operations with 64 KBytes per message over 64 nodes of the three machines can be completed in the time range (5.12 ms, 675 ms). The Paragon outperforms the SP2 in almost all collective operations with long messages. We have derived closed-form expressions to quantify the collective messaging times and aggregated bandwidth on all three machines. For total exchange with 64 nodes, the T3D, Paragon, and SP2 achieved an aggregated bandwidth of 1.745, 0.879, and 0.818 GBytes/s, respectively. These findings are useful to those who wish to predict MPP performance or to optimize parallel applications by trade-offs between divided computation and collective communication. |
Persistent Identifier | http://hdl.handle.net/10722/45577 |
ISSN | 1530-0897 (2020 SCImago Journal Rankings: 0.910) |
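The abstract above describes timing collective operations such as total exchange with 64 KB messages over 64 nodes and reporting an aggregated bandwidth derived from the measured completion time. As a rough illustration only, the MPI/C sketch below times MPI_Alltoall and computes an aggregated bandwidth as total bytes moved divided by completion time; the repetition count, buffer handling, and bandwidth formula are assumptions of this sketch, not the authors' STAP benchmark code.

```c
/* Minimal sketch of a total-exchange (MPI_Alltoall) timing loop, in the
 * spirit of the measurements described in the abstract. Message size,
 * repetition count, and the bandwidth formula are illustrative assumptions,
 * not the paper's STAP benchmark implementation. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const size_t msg_bytes = 64 * 1024;   /* 64 KB per destination, as in the paper */
    const int reps = 20;                  /* assumed repetition count */

    /* Each node holds one message slot per peer. */
    char *sendbuf = calloc((size_t)nprocs, msg_bytes);
    char *recvbuf = calloc((size_t)nprocs, msg_bytes);

    MPI_Barrier(MPI_COMM_WORLD);          /* synchronize before timing */
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        MPI_Alltoall(sendbuf, (int)msg_bytes, MPI_CHAR,
                     recvbuf, (int)msg_bytes, MPI_CHAR, MPI_COMM_WORLD);
    }
    double t = (MPI_Wtime() - t0) / reps; /* average time per total exchange */

    /* Aggregated bandwidth: total bytes moved across all nodes per unit time.
     * Each of the nprocs nodes sends msg_bytes to every other node. */
    double total_bytes = (double)nprocs * (nprocs - 1) * (double)msg_bytes;
    if (rank == 0)
        printf("total exchange: %.3f ms, aggregated bandwidth %.3f GB/s\n",
               t * 1e3, total_bytes / t / 1e9);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```

Run with, for example, `mpirun -np 64 ./alltoall_bench`; plugging the paper's figures into the same formula (64 nodes, 64 KB per message, roughly 0.15 s per exchange) gives an aggregated bandwidth on the order of the 1.745 GBytes/s reported for the T3D.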
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Hwang, K | en_HK |
dc.contributor.author | Wang, C | en_HK |
dc.contributor.author | Wang, CL | en_HK |
dc.date.accessioned | 2007-10-30T06:29:34Z | - |
dc.date.available | 2007-10-30T06:29:34Z | - |
dc.date.issued | 1997 | en_HK |
dc.identifier.citation | The 3rd International Conference on High-Performance Computer Architecture Proceedings, San Antonio, TX., 1-5 February 1997, p. 106-115 | en_HK |
dc.identifier.issn | 1530-0897 | en_HK |
dc.identifier.uri | http://hdl.handle.net/10722/45577 | - |
dc.description.abstract | We evaluate the architectural support of collective communication operations on the IBM SP2, Cray T3D, and Intel Paragon. The MPI performance data are obtained from the STAP benchmark experiments jointly performed at USC and HKU. The T3D demonstrated clearly the best timing performance in almost all collective operations. This is attributed to the special hardware built into the T3D for fast messaging and block data transfer. With hardwired barriers, the T3D performs the barrier synchronization in 3 μs, at least 30 times faster than the SP2 or Paragon. The startup latency of collective operations increases either linearly or logarithmically on the three multicomputers. For short messages, the SP2 outperforms the Paragon in the barrier, total exchange, scatter, and gather operations. Various collective operations with 64 KBytes per message over 64 nodes of the three machines can be completed in the time range (5.12 ms, 675 ms). The Paragon outperforms the SP2 in almost all collective operations with long messages. We have derived closed-form expressions to quantify the collective messaging times and aggregated bandwidth on all three machines. For total exchange with 64 nodes, the T3D, Paragon, and SP2 achieved an aggregated bandwidth of 1.745, 0.879, and 0.818 GBytes/s, respectively. These findings are useful to those who wish to predict MPP performance or to optimize parallel applications by trade-offs between divided computation and collective communication. | en_HK
dc.format.extent | 962712 bytes | - |
dc.format.extent | 6534 bytes | - |
dc.format.extent | 2160 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.format.mimetype | text/plain | - |
dc.format.mimetype | text/plain | - |
dc.language | eng | en_HK |
dc.publisher | IEEE. | en_HK |
dc.relation.ispartof | IEEE High-Performance Computer Architecture Symposium Proceedings | - |
dc.rights | ©1997 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. | - |
dc.subject | Collective communications | en_HK |
dc.subject | Multicomputers | en_HK |
dc.subject | Message passing | en_HK |
dc.subject | Startup latency | en_HK |
dc.subject | Aggregated bandwidth | en_HK |
dc.title | Evaluating MPI collective communication on the SP2, T3D, and Paragon multicomputers | en_HK |
dc.type | Conference_Paper | en_HK |
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1530-0897&volume=&spage=106&epage=115&date=1997&atitle=Evaluating+MPI+collective+communication+on+the+SP2,+T3D,+and+Paragon+multicomputers | en_HK |
dc.identifier.email | Wang, C: clwang@cs.hku.hk | - |
dc.identifier.authority | Wang, C=rp00183 | - |
dc.description.nature | published_or_final_version | en_HK |
dc.identifier.doi | 10.1109/HPCA.1997.569646 | en_HK |
dc.identifier.scopus | eid_2-s2.0-0030784082 | - |
dc.identifier.hkuros | 26867 | - |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-0030784082&selection=ref&src=s&origin=recordpage | - |
dc.identifier.spage | 106 | - |
dc.identifier.epage | 115 | - |
dc.identifier.scopusauthorid | Hwang, Kai=7402426691 | - |
dc.identifier.scopusauthorid | Wang, Choming=7501630962 | - |
dc.identifier.scopusauthorid | Wang, ChoLi=7501646188 | - |
dc.customcontrol.immutable | sml 160111 - merged | - |
dc.identifier.issnl | 1530-0897 | - |