
Conference Paper: GPU-TLS: an efficient runtime for speculative loop parallelization on GPUs

Title: GPU-TLS: an efficient runtime for speculative loop parallelization on GPUs
Authors: Zhang, C; Han, G; Wang, CL
Keywords: GPGPU; Speculative loop parallelization; Thread-level speculation (TLS); GPU-TLS
Issue Date: 2013
Publisher: IEEE Computer Society. The Journal's website is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000093
Citation: The 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2013), Delft, Netherlands, 13-16 May 2013. In Conference Proceedings, 2013, p. 120-127
Abstract: Recently, GPUs have risen as an important parallel platform for general-purpose applications, in both HPC and cloud environments. Due to their special execution model, developing programs for GPUs is difficult even with the recent introduction of high-level languages like CUDA and OpenCL. To ease the programming effort, some research has proposed automatically generating parallel GPU code through complex compile-time techniques. However, this approach can only parallelize loops that are 100% free of inter-iteration dependencies (i.e., DOALL loops). To exploit runtime parallelism that cannot be proven by static analysis, we propose GPU-TLS, a runtime system that speculatively parallelizes possibly-parallel loops in sequential programs on GPUs. GPU-TLS parallelizes a possibly-parallel loop by chopping it into smaller sub-loops, each of which is executed in parallel by a GPU kernel, speculating that no inter-iteration dependencies exist. After dependency checking, the buffered writes of iterations without mis-speculations are copied to the master memory, while iterations that encountered mis-speculations are re-executed. GPU-TLS addresses several key problems of speculative loop parallelization on GPUs: (1) The higher mis-speculation rate caused by the larger number of threads is reduced by three approaches: the loop-chopping parallelization approach, the deferred memory update scheme, and the intra-warp value forwarding method. (2) The larger overhead of dependency checking is reduced by a hybrid scheme: eager intra-warp dependency checking combined with lazy inter-warp dependency checking. (3) The bottleneck of serial commit is alleviated by a parallel commit scheme, which allows different iterations to enter the commit phase out of order while still guaranteeing sequential semantics. Extensive evaluations using both microbenchmarks and real-life applications on two recent NVIDIA GPU cards show that speculative loop parallelization using GPU-TLS achieves speedups ranging from 5 to 160 for sequential programs with possibly-parallel loops. © 2013 IEEE.
Persistent Identifier: http://hdl.handle.net/10722/189638
ISBN: 978-0-7695-4996-5
ISI Accession Number ID: WOS:000325006300019
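The speculate / buffer / check / commit-or-re-execute cycle described in the abstract can be illustrated with a minimal CPU-side Python sketch. This is not GPU-TLS itself (which runs as CUDA kernels and adds intra-warp value forwarding plus a hybrid eager/lazy dependency-checking scheme); all names below are illustrative, and only the basic loop-chopping and lazy-checking ideas are modeled.

```python
# Minimal CPU-side sketch of thread-level speculation (TLS) for a loop:
# iterations run speculatively with buffered writes, then a dependency
# check commits clean iterations and serially re-executes the rest.
# Illustrative only; names are not from the GPU-TLS paper.

def speculative_loop(body, n_iters, memory, chunk=4):
    """Run body(i, read, write) for i in range(n_iters), preserving
    sequential semantics. memory is a dict mapping address -> value."""
    i = 0
    while i < n_iters:
        # "Loop chopping": one sub-loop of `chunk` iterations at a time.
        sub = range(i, min(i + chunk, n_iters))
        logs = []
        for it in sub:  # on a GPU these iterations would run concurrently
            reads, writes = set(), {}
            def read(addr, _r=reads, _w=writes):
                _r.add(addr)
                return _w.get(addr, memory[addr])  # forward own writes
            def write(addr, val, _w=writes):
                _w[addr] = val                     # buffered, not in place
            body(it, read, write)
            logs.append((it, reads, writes))
        # Lazy dependency check: iteration j mis-speculated if it read an
        # address written by an earlier iteration of the same sub-loop.
        written_earlier = set()
        for it, reads, writes in logs:
            if reads & written_earlier:
                # Mis-speculation: re-execute serially on committed memory.
                def read(addr):
                    return memory[addr]
                def write(addr, val):
                    memory[addr] = val
                body(it, read, write)
            else:
                memory.update(writes)              # commit buffered writes
            written_earlier |= set(writes)
        i += chunk
    return memory
```

For a DOALL loop every iteration commits its buffer; for a loop with a loop-carried dependency (e.g. `a[i] = a[i-1] + 1`) all but the first iteration of each sub-loop mis-speculate and fall back to serial re-execution, which is exactly the behavior the paper's deferred-update and forwarding schemes are designed to reduce.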

 

DC Field: Value (Language)
dc.contributor.author: Zhang, C (en_US)
dc.contributor.author: Han, G (en_US)
dc.contributor.author: Wang, CL (en_US)
dc.date.accessioned: 2013-09-17T14:50:33Z
dc.date.available: 2013-09-17T14:50:33Z
dc.date.issued: 2013 (en_US)
dc.identifier.citation: The 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2013), Delft, Netherlands, 13-16 May 2013. In Conference Proceedings, 2013, p. 120-127 (en_US)
dc.identifier.isbn: 978-0-7695-4996-5
dc.identifier.uri: http://hdl.handle.net/10722/189638
dc.description.abstract: (abstract as above)
dc.language: eng (en_US)
dc.publisher: IEEE Computer Society. The Journal's website is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000093
dc.relation.ispartof: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Proceedings (en_US)
dc.rights: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing Proceedings. Copyright © IEEE Computer Society.
dc.rights: ©2013 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
dc.rights: Creative Commons: Attribution 3.0 Hong Kong License
dc.subject: GPGPU
dc.subject: Speculative loop parallelization
dc.subject: Thread-level speculation (TLS)
dc.subject: GPU-TLS
dc.title: GPU-TLS: an efficient runtime for speculative loop parallelization on GPUs (en_US)
dc.type: Conference_Paper (en_US)
dc.identifier.email: Zhang, C: cgzhang@cs.hku.hk (en_US)
dc.identifier.email: Han, G: gdhan@cs.hku.hk
dc.identifier.email: Wang, CL: clwang@cs.hku.hk
dc.identifier.authority: Wang, CL=rp00183 (en_US)
dc.description.nature: published_or_final_version
dc.identifier.doi: 10.1109/CCGrid.2013.34
dc.identifier.scopus: eid_2-s2.0-84881269753
dc.identifier.hkuros: 223371 (en_US)
dc.identifier.spage: 120
dc.identifier.epage: 127
dc.identifier.isi: WOS:000325006300019
dc.publisher.place: United States (en_US)
dc.customcontrol.immutable: sml 131002
