
Conference Paper: Java with Auto-parallelization on Graphics Coprocessing Architecture

Title: Java with Auto-parallelization on Graphics Coprocessing Architecture
Authors: Han, G; Zhang, C; Lam, KT; Wang, CL
Issue Date: 2013
Publisher: IEEE Computer Society. The proceedings' web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000540
Citation: The 42nd International Conference on Parallel Processing (ICPP), Lyon, France, 1-4 October 2013. In International Conference on Parallel Processing, 2013, p. 504-509
Abstract: GPU-based many-core accelerators have gained a footing in supercomputing. Their widespread adoption, however, still hinges on better parallelization and load-scheduling techniques that make the hybrid system of CPU and GPU cores easy and efficient to use. This paper introduces a new user-friendly compiler framework and runtime system, dubbed Japonica, that helps Java applications harness the full power of a heterogeneous system. Japonica presents an all-round system design that unifies the programming style and language for transparent use of both CPU and GPU resources, automatically parallelizing all kinds of loops and scheduling workloads efficiently across the CPU-GPU border. Guided by simple user annotations, sequential Java source code is analyzed, translated, and compiled into a dual executable consisting of CUDA kernels and multiple Java threads running on GPU and CPU cores, respectively. Annotated loops are automatically split into loop chunks (tasks) that are scheduled to execute on all available GPU/CPU cores. Implementing a GPU-tailored thread-level speculation (TLS) model, Japonica supports speculative execution of loops with moderate dependency densities and privatization of loops having only false dependencies on the GPU side. The scheduler also supports task-stealing and task-sharing algorithms that allow swift load redistribution between GPU and CPU. Experimental results show that Japonica runs, on average, 10x, 2.5x, and 2.14x faster than the best serial (1-thread CPU), GPU-alone, and CPU-alone versions, respectively.
Persistent Identifier: http://hdl.handle.net/10722/189651
ISSN: 0190-3918
2020 SCImago Journal Rankings: 0.269
ISI Accession Number ID: WOS:000330046000052
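The abstract describes splitting annotated loops into chunks (tasks) that a work-stealing scheduler distributes across available cores. A minimal CPU-side sketch of that idea, using Java's stock ForkJoinPool work-stealing scheduler; all names here are illustrative, not Japonica's actual API, and the GPU side of the system is not modeled:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical sketch: an "annotated" loop over out[] is recursively
// split into chunk tasks; idle ForkJoinPool workers steal pending halves.
public class ChunkedLoop {
    static final int CHUNK = 1024;            // granularity of one task
    static final double[] out = new double[8192];

    // One loop chunk: iterations [lo, hi) of the loop body.
    static class ChunkTask extends RecursiveAction {
        final int lo, hi;
        ChunkTask(int lo, int hi) { this.lo = lo; this.hi = hi; }
        @Override protected void compute() {
            if (hi - lo <= CHUNK) {
                for (int i = lo; i < hi; i++) out[i] = i * 2.0;  // loop body
            } else {
                int mid = (lo + hi) >>> 1;    // split; either half may be stolen
                invokeAll(new ChunkTask(lo, mid), new ChunkTask(mid, hi));
            }
        }
    }

    public static void main(String[] args) {
        new ForkJoinPool().invoke(new ChunkTask(0, out.length));
        System.out.println(out[8191]);        // prints 16382.0
    }
}
```

The iterations here are independent, so chunks can run in any order; Japonica's TLS machinery exists precisely for loops where that independence is not guaranteed.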
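The abstract also mentions privatization of loops with only false dependencies. A small sketch of what that means, assuming a hypothetical loop whose scalar temporary would be a write-after-write hazard if shared; giving each iteration its own copy makes the loop safely parallel:

```java
import java.util.stream.IntStream;

// Hypothetical sketch of privatization: the temporary t is declared
// inside the loop body, so every parallel iteration gets a private
// copy and the false (WAW) dependency disappears.
public class Privatize {
    static double[] run(int n) {
        double[] a = new double[n], b = new double[n];
        for (int i = 0; i < n; i++) a[i] = i;

        IntStream.range(0, n).parallel().forEach(i -> {
            double t = a[i] * 2.0;   // t is private to this iteration
            b[i] = t + 1.0;
        });
        return b;
    }

    public static void main(String[] args) {
        System.out.println(run(1000)[10]);   // prints 21.0
    }
}
```

Had t been a single shared field written by every iteration, the loop would be sequential in appearance only; privatization is the standard transformation that recovers the parallelism.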


DC Field: Value (Language)
dc.contributor.author: Han, G (en_US)
dc.contributor.author: Zhang, C (en_US)
dc.contributor.author: Lam, KT (en_US)
dc.contributor.author: Wang, CL (en_US)
dc.date.accessioned: 2013-09-17T14:50:38Z
dc.date.available: 2013-09-17T14:50:38Z
dc.date.issued: 2013 (en_US)
dc.identifier.citation: The 42nd International Conference on Parallel Processing (ICPP), Lyon, France, 1-4 October 2013. In International Conference on Parallel Processing, 2013, p. 504-509 (en_US)
dc.identifier.issn: 0190-3918
dc.identifier.uri: http://hdl.handle.net/10722/189651
dc.description.abstract: GPU-based many-core accelerators have gained a footing in supercomputing. Their widespread adoption, however, still hinges on better parallelization and load-scheduling techniques that make the hybrid system of CPU and GPU cores easy and efficient to use. This paper introduces a new user-friendly compiler framework and runtime system, dubbed Japonica, that helps Java applications harness the full power of a heterogeneous system. Japonica presents an all-round system design that unifies the programming style and language for transparent use of both CPU and GPU resources, automatically parallelizing all kinds of loops and scheduling workloads efficiently across the CPU-GPU border. Guided by simple user annotations, sequential Java source code is analyzed, translated, and compiled into a dual executable consisting of CUDA kernels and multiple Java threads running on GPU and CPU cores, respectively. Annotated loops are automatically split into loop chunks (tasks) that are scheduled to execute on all available GPU/CPU cores. Implementing a GPU-tailored thread-level speculation (TLS) model, Japonica supports speculative execution of loops with moderate dependency densities and privatization of loops having only false dependencies on the GPU side. The scheduler also supports task-stealing and task-sharing algorithms that allow swift load redistribution between GPU and CPU. Experimental results show that Japonica runs, on average, 10x, 2.5x, and 2.14x faster than the best serial (1-thread CPU), GPU-alone, and CPU-alone versions, respectively.
dc.language: eng (en_US)
dc.publisher: IEEE Computer Society. The proceedings' web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000540
dc.relation.ispartof: International Conference on Parallel Processing (en_US)
dc.title: Java with Auto-parallelization on Graphics Coprocessing Architecture (en_US)
dc.type: Conference_Paper (en_US)
dc.identifier.email: Lam, KT: kingtin@hku.hk (en_US)
dc.identifier.email: Wang, CL: clwang@cs.hku.hk (en_US)
dc.identifier.authority: Wang, CL=rp00183 (en_US)
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/ICPP.2013.62
dc.identifier.scopus: eid_2-s2.0-84893224794
dc.identifier.hkuros: 225162 (en_US)
dc.identifier.spage: 504
dc.identifier.epage: 509
dc.identifier.isi: WOS:000330046000052
dc.publisher.place: United States
dc.identifier.issnl: 0190-3918
