Resource scaling effects on MPP performance: The STAP benchmark implications

Hwang, K; Wang, C; Wang, CL; Xu, Z

Publisher Website: 10.1109/71.770197
Scopus: eid_2-s2.0-0032638174
WOS: WOS:000080635600007

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Electrical & Electronic Engineering: Journal/Magazine Articles

Title	Resource scaling effects on MPP performance: The STAP benchmark implications
Authors	Hwang, K Wang, C Wang, CL Xu, Z
Issue Date	1999
Publisher	IEEE. The Journal's web site is located at http://www.computer.org/tpds
Citation	IEEE Transactions on Parallel and Distributed Systems, 1999, v. 10 n. 5, p. 509-527. DOI: http://dx.doi.org/10.1109/71.770197
Abstract	Presently, massively parallel processors (MPPs) are available in only a few commercial models. A sequence of three ASCI Teraflops MPPs appeared before the new millennium. This paper evaluates six MPP systems through STAP benchmark experiments. STAP is a radar signal-processing benchmark that exploits regularly structured SPMD data parallelism. We reveal the resource scaling effects on MPP performance along the orthogonal dimensions of machine size, processor speed, memory capacity, messaging latency, and network bandwidth. We show how to achieve balanced resource scaling against an enlarged workload (problem size). Among the three commercial MPPs, the IBM SP2 shows the highest speed and efficiency, attributed to its well-designed network with middleware support for a single system image. The Cray T3D demonstrates high network bandwidth with a good NUMA memory hierarchy. The Intel Paragon trails far behind due to its slow processors and excessive message-passing latency. Our analysis projects the lowest STAP speed for ASCI Red, compared with the projected speeds of the two ASCI Blue machines. This is attributed to the slow processors used in ASCI Red and the mismatch between its hardware and software. Blue Pacific shows the highest potential to deliver scalable performance up to thousands of nodes. Blue Mountain is designed to have the highest network bandwidth. Our results suggest a limit on the scalability of the distributed shared-memory (DSM) architecture adopted in Blue Mountain. The scaling model offers a quantitative method to match resource scaling with problem scaling to yield truly scalable performance. The model helps MPP designers optimize the processor, memory, network, and I/O subsystems of an MPP. For MPP users, the scaling results can be applied to partition a large workload for SPMD execution or to minimize the software overhead in collective communication or remote memory update operations. Finally, we assess how well our scaling model evaluates MPPs with benchmarks other than STAP.
Persistent Identifier	http://hdl.handle.net/10722/42822
ISSN	1045-9219 (2023 Impact Factor: 5.6; 2023 SCImago Journal Rankings: 2.340)
ISI Accession Number ID	WOS:000080635600007

DC Field	Value	Language
dc.contributor.author	Hwang, K	en_HK
dc.contributor.author	Wang, C	en_HK
dc.contributor.author	Wang, CL	en_HK
dc.contributor.author	Xu, Z	en_HK
dc.date.accessioned	2007-03-23T04:32:50Z	-
dc.date.available	2007-03-23T04:32:50Z	-
dc.date.issued	1999	en_HK
dc.identifier.citation	IEEE Transactions on Parallel and Distributed Systems, 1999, v. 10 n. 5, p. 509-527	en_HK
dc.identifier.issn	1045-9219	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/42822	-
dc.format.extent	995649 bytes	-
dc.format.extent	25600 bytes	-
dc.format.mimetype	application/pdf	-
dc.format.mimetype	application/msword	-
dc.language	eng	en_HK
dc.publisher	IEEE. The Journal's web site is located at http://www.computer.org/tpds	en_HK
dc.relation.ispartof	IEEE Transactions on Parallel and Distributed Systems	en_HK
dc.rights	©1999 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.	-
dc.title	Resource scaling effects on MPP performance: The STAP benchmark implications	en_HK
dc.type	Article	en_HK
dc.identifier.openurl	http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1045-9219&volume=10&issue=5&spage=509&epage=527&date=1999&atitle=Resource+scaling+effects+on+MPP+performance:+the+STAP+benchmark+implications	en_HK
dc.identifier.email	Wang, CL:clwang@cs.hku.hk	en_HK
dc.identifier.authority	Wang, CL=rp00183	en_HK
dc.description.nature	published_or_final_version	en_HK
dc.identifier.doi	10.1109/71.770197	en_HK
dc.identifier.scopus	eid_2-s2.0-0032638174	en_HK
dc.identifier.hkuros	46234	-
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-0032638174&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	10	en_HK
dc.identifier.issue	5	en_HK
dc.identifier.spage	509	en_HK
dc.identifier.epage	527	en_HK
dc.identifier.isi	WOS:000080635600007	-
dc.publisher.place	United States	en_HK
dc.identifier.scopusauthorid	Hwang, K=7402426691	en_HK
dc.identifier.scopusauthorid	Wang, C=7501630962	en_HK
dc.identifier.scopusauthorid	Wang, CL=7501646188	en_HK
dc.identifier.scopusauthorid	Xu, Z=7405426306	en_HK
dc.identifier.issnl	1045-9219	-