Conference Paper: Evaluating MPI collective communication on the SP2, T3D, and Paragon multicomputers

Title: Evaluating MPI collective communication on the SP2, T3D, and Paragon multicomputers
Authors: Hwang, K; Wang, C; Wang, CL
Keywords: Collective communications
Multicomputers
Message passing
Startup latency
Aggregated bandwidth
Issue Date: 1997
Publisher: IEEE
Citation: The 3rd International Conference on High-Performance Computer Architecture Proceedings, San Antonio, TX., 1-5 February 1997, p. 106-115
Abstract: We evaluate the architectural support of collective communication operations on the IBM SP2, Cray T3D, and Intel Paragon. The MPI performance data are obtained from the STAP benchmark experiments jointly performed at USC and HKU. The T3D clearly demonstrated the best timing performance in almost all collective operations. This is attributed to the special hardware built into the T3D for fast messaging and block data transfer. With hardwired barriers, the T3D performs barrier synchronization in 3 μs, at least 30 times faster than the SP2 or Paragon. The startup latency of collective operations increases either linearly or logarithmically on all three multicomputers. For short messages, the SP2 outperforms the Paragon in the barrier, total exchange, scatter, and gather operations. Various collective operations with 64 KBytes per message over 64 nodes of the three machines can be completed in the time range (5.12 ms, 675 ms). The Paragon outperforms the SP2 in almost all collective operations with long messages. We have derived closed-form expressions to quantify the collective messaging times and aggregated bandwidth on all three machines. For total exchange with 64 nodes, the T3D, Paragon, and SP2 achieved an aggregated bandwidth of 1.745, 0.879, and 0.818 GBytes/s, respectively. These findings are useful to those who wish to predict MPP performance or to optimize parallel applications through trade-offs between divided computation and collective communication.
Persistent Identifier: http://hdl.handle.net/10722/45577
ISSN: 1530-0897
2020 SCImago Journal Rankings: 0.910
References
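The collective operations and metrics discussed in the abstract above (barrier synchronization, total exchange, startup latency, aggregated bandwidth) correspond to standard MPI calls, and the kind of measurement the paper reports can be sketched in a few lines of C. The sketch below is not the authors' STAP benchmark code; it simply times MPI_Barrier and MPI_Alltoall with 64-KByte messages and prints an aggregated-bandwidth figure, assuming aggregated bandwidth is defined as total bytes exchanged divided by the measured completion time. The message size, repetition count, and output format are illustrative choices, not values taken from the paper.

    /* Minimal timing sketch (assumption-based illustration, not the paper's code):
     * measures barrier latency and total-exchange (all-to-all) time, then reports
     * an aggregated-bandwidth figure. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MSG_BYTES (64 * 1024)   /* 64 KBytes per message, as in the paper's long-message runs */
    #define REPS 20                 /* repetitions to average out timer noise (illustrative choice) */

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Barrier (synchronization) timing. */
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++)
            MPI_Barrier(MPI_COMM_WORLD);
        double barrier_us = (MPI_Wtime() - t0) / REPS * 1e6;

        /* Total exchange: every node sends a MSG_BYTES block to every node. */
        char *sendbuf = malloc((size_t)nprocs * MSG_BYTES);
        char *recvbuf = malloc((size_t)nprocs * MSG_BYTES);
        memset(sendbuf, rank, (size_t)nprocs * MSG_BYTES);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++)
            MPI_Alltoall(sendbuf, MSG_BYTES, MPI_BYTE,
                         recvbuf, MSG_BYTES, MPI_BYTE, MPI_COMM_WORLD);
        double exch_s = (MPI_Wtime() - t0) / REPS;

        if (rank == 0) {
            /* Assumption: count only inter-node traffic, i.e. each of nprocs nodes
             * sends MSG_BYTES to each of the (nprocs - 1) other nodes. */
            double bytes = (double)nprocs * (nprocs - 1) * MSG_BYTES;
            printf("nodes=%d  barrier=%.2f us  alltoall=%.3f ms  aggregated bw=%.3f GB/s\n",
                   nprocs, barrier_us, exch_s * 1e3, bytes / exch_s / 1e9);
        }

        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }

With an MPI implementation that provides the usual mpicc/mpirun wrappers, a 64-node run would look like: mpicc -O2 alltoall_timing.c -o alltoall_timing && mpirun -np 64 ./alltoall_timing (the file name is a placeholder).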

 

DC Field | Value | Language
dc.contributor.author | Hwang, K | en_HK
dc.contributor.author | Wang, C | en_HK
dc.contributor.author | Wang, CL | en_HK
dc.date.accessioned | 2007-10-30T06:29:34Z | -
dc.date.available | 2007-10-30T06:29:34Z | -
dc.date.issued | 1997 | en_HK
dc.identifier.citation | The 3rd International Conference on High-Performance Computer Architecture Proceedings, San Antonio, TX., 1-5 February 1997, p. 106-115 | en_HK
dc.identifier.issn | 1530-0897 | en_HK
dc.identifier.uri | http://hdl.handle.net/10722/45577 | -
dc.description.abstract | We evaluate the architectural support of collective communication operations on the IBM SP2, Cray T3D, and Intel Paragon. The MPI performance data are obtained from the STAP benchmark experiments jointly performed at USC and HKU. The T3D clearly demonstrated the best timing performance in almost all collective operations. This is attributed to the special hardware built into the T3D for fast messaging and block data transfer. With hardwired barriers, the T3D performs barrier synchronization in 3 μs, at least 30 times faster than the SP2 or Paragon. The startup latency of collective operations increases either linearly or logarithmically on all three multicomputers. For short messages, the SP2 outperforms the Paragon in the barrier, total exchange, scatter, and gather operations. Various collective operations with 64 KBytes per message over 64 nodes of the three machines can be completed in the time range (5.12 ms, 675 ms). The Paragon outperforms the SP2 in almost all collective operations with long messages. We have derived closed-form expressions to quantify the collective messaging times and aggregated bandwidth on all three machines. For total exchange with 64 nodes, the T3D, Paragon, and SP2 achieved an aggregated bandwidth of 1.745, 0.879, and 0.818 GBytes/s, respectively. These findings are useful to those who wish to predict MPP performance or to optimize parallel applications through trade-offs between divided computation and collective communication. | en_HK
dc.format.extent | 962712 bytes | -
dc.format.extent | 6534 bytes | -
dc.format.extent | 2160 bytes | -
dc.format.mimetype | application/pdf | -
dc.format.mimetype | text/plain | -
dc.format.mimetype | text/plain | -
dc.language | eng | en_HK
dc.publisher | IEEE. | en_HK
dc.relation.ispartof | IEEE High-Performance Computer Architecture Symposium Proceedings | -
dc.rights | ©1997 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. | -
dc.subject | Collective communications | en_HK
dc.subject | Multicomputers | en_HK
dc.subject | Message passing | en_HK
dc.subject | Startup latency | en_HK
dc.subject | Aggregated bandwidth | en_HK
dc.title | Evaluating MPI collective communication on the SP2, T3D, and Paragon multicomputers | en_HK
dc.type | Conference_Paper | en_HK
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1530-0897&volume=&spage=106&epage=115&date=1997&atitle=Evaluating+MPI+collective+communication+on+the+SP2,+T3D,+and+Paragon+multicomputers | en_HK
dc.identifier.email | Wang, C: clwang@cs.hku.hk | -
dc.identifier.authority | Wang, C=rp00183 | -
dc.description.nature | published_or_final_version | en_HK
dc.identifier.doi | 10.1109/HPCA.1997.569646 | en_HK
dc.identifier.scopus | eid_2-s2.0-0030784082 | -
dc.identifier.hkuros | 26867 | -
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-0030784082&selection=ref&src=s&origin=recordpage | -
dc.identifier.spage | 106 | -
dc.identifier.epage | 115 | -
dc.identifier.scopusauthorid | Hwang, Kai=7402426691 | -
dc.identifier.scopusauthorid | Wang, Choming=7501630962 | -
dc.identifier.scopusauthorid | Wang, ChoLi=7501646188 | -
dc.customcontrol.immutable | sml 160111 - merged | -
dc.identifier.issnl | 1530-0897 | -
