
Conference Paper: Java with Auto-parallelization on Graphics Coprocessing Architecture

Title: Java with Auto-parallelization on Graphics Coprocessing Architecture
Authors: Han, G; Zhang, C; Lam, KT; Wang, CL
Issue Date: 2013
Publisher: IEEE Computer Society. The proceedings' web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000540
Citation: The 42nd International Conference on Parallel Processing (ICPP), Lyon, France, 1-4 October 2013. In International Conference on Parallel Processing, 2013, p. 504-509
Abstract: GPU-based many-core accelerators have gained a footing in supercomputing. Their widespread adoption, however, still hinges on better parallelization and load-scheduling techniques that make the hybrid system of CPU and GPU cores easy and efficient to use. This paper introduces a new user-friendly compiler framework and runtime system, dubbed Japonica, that helps Java applications harness the full power of a heterogeneous system. Japonica presents an all-round system design that unifies the programming style and language for transparent use of both CPU and GPU resources, automatically parallelizing all kinds of loops and scheduling workloads efficiently across the CPU-GPU border. Guided by simple user annotations, sequential Java source code is analyzed, translated, and compiled into a dual executable consisting of CUDA kernels and multiple Java threads running on GPU and CPU cores, respectively. Annotated loops are automatically split into loop chunks (tasks) that are scheduled to execute on all available GPU/CPU cores. Implementing a GPU-tailored thread-level speculation (TLS) model, Japonica supports speculative execution of loops with moderate dependency densities and privatization of loops having only false dependencies on the GPU side. The scheduler also supports task-stealing and task-sharing algorithms that allow swift load redistribution between GPU and CPU. Experimental results show that Japonica runs, on average, 10x, 2.5x, and 2.14x faster than the best serial (1-thread CPU), GPU-alone, and CPU-alone versions, respectively.
Persistent Identifier: http://hdl.handle.net/10722/189651
ISSN: 0190-3918
2020 SCImago Journal Rankings: 0.269
ISI Accession Number ID: WOS:000330046000052
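The abstract describes splitting annotated loops into chunks (tasks) that a work-stealing scheduler distributes across available cores. A minimal CPU-side sketch of that idea, using Java's stock ForkJoinPool work-stealing scheduler; all names here are illustrative, not Japonica's actual API, and the GPU side of the system is not modeled:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical sketch: an "annotated" loop over out[] is recursively
// split into chunk tasks; idle ForkJoinPool workers steal pending halves.
public class ChunkedLoop {
    static final int CHUNK = 1024;            // granularity of one task
    static final double[] out = new double[8192];

    // One loop chunk: iterations [lo, hi) of the loop body.
    static class ChunkTask extends RecursiveAction {
        final int lo, hi;
        ChunkTask(int lo, int hi) { this.lo = lo; this.hi = hi; }
        @Override protected void compute() {
            if (hi - lo <= CHUNK) {
                for (int i = lo; i < hi; i++) out[i] = i * 2.0;  // loop body
            } else {
                int mid = (lo + hi) >>> 1;    // split; either half may be stolen
                invokeAll(new ChunkTask(lo, mid), new ChunkTask(mid, hi));
            }
        }
    }

    public static void main(String[] args) {
        new ForkJoinPool().invoke(new ChunkTask(0, out.length));
        System.out.println(out[8191]);        // prints 16382.0
    }
}
```

The iterations here are independent, so chunks can run in any order; Japonica's TLS machinery exists precisely for loops where that independence is not guaranteed.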
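The abstract also mentions privatization of loops with only false dependencies. A small sketch of what that means, assuming a hypothetical loop whose scalar temporary would be a write-after-write hazard if shared; giving each iteration its own copy makes the loop safely parallel:

```java
import java.util.stream.IntStream;

// Hypothetical sketch of privatization: the temporary t is declared
// inside the loop body, so every parallel iteration gets a private
// copy and the false (WAW) dependency disappears.
public class Privatize {
    static double[] run(int n) {
        double[] a = new double[n], b = new double[n];
        for (int i = 0; i < n; i++) a[i] = i;

        IntStream.range(0, n).parallel().forEach(i -> {
            double t = a[i] * 2.0;   // t is private to this iteration
            b[i] = t + 1.0;
        });
        return b;
    }

    public static void main(String[] args) {
        System.out.println(run(1000)[10]);   // prints 21.0
    }
}
```

Had t been a single shared field written by every iteration, the loop would be sequential in appearance only; privatization is the standard transformation that recovers the parallelism.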


DC Field: Value (Language)
dc.contributor.author: Han, G (en_US)
dc.contributor.author: Zhang, C (en_US)
dc.contributor.author: Lam, KT (en_US)
dc.contributor.author: Wang, CL (en_US)
dc.date.accessioned: 2013-09-17T14:50:38Z
dc.date.available: 2013-09-17T14:50:38Z
dc.date.issued: 2013 (en_US)
dc.identifier.citation: The 42nd International Conference on Parallel Processing (ICPP), Lyon, France, 1-4 October 2013. In International Conference on Parallel Processing, 2013, p. 504-509 (en_US)
dc.identifier.issn: 0190-3918
dc.identifier.uri: http://hdl.handle.net/10722/189651
dc.description.abstract: GPU-based many-core accelerators have gained a footing in supercomputing. Their widespread adoption, however, still hinges on better parallelization and load-scheduling techniques that make the hybrid system of CPU and GPU cores easy and efficient to use. This paper introduces a new user-friendly compiler framework and runtime system, dubbed Japonica, that helps Java applications harness the full power of a heterogeneous system. Japonica presents an all-round system design that unifies the programming style and language for transparent use of both CPU and GPU resources, automatically parallelizing all kinds of loops and scheduling workloads efficiently across the CPU-GPU border. Guided by simple user annotations, sequential Java source code is analyzed, translated, and compiled into a dual executable consisting of CUDA kernels and multiple Java threads running on GPU and CPU cores, respectively. Annotated loops are automatically split into loop chunks (tasks) that are scheduled to execute on all available GPU/CPU cores. Implementing a GPU-tailored thread-level speculation (TLS) model, Japonica supports speculative execution of loops with moderate dependency densities and privatization of loops having only false dependencies on the GPU side. The scheduler also supports task-stealing and task-sharing algorithms that allow swift load redistribution between GPU and CPU. Experimental results show that Japonica runs, on average, 10x, 2.5x, and 2.14x faster than the best serial (1-thread CPU), GPU-alone, and CPU-alone versions, respectively.
dc.language: eng (en_US)
dc.publisher: IEEE Computer Society. The proceedings' web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000540
dc.relation.ispartof: International Conference on Parallel Processing (en_US)
dc.title: Java with Auto-parallelization on Graphics Coprocessing Architecture (en_US)
dc.type: Conference_Paper (en_US)
dc.identifier.email: Lam, KT: kingtin@hku.hk (en_US)
dc.identifier.email: Wang, CL: clwang@cs.hku.hk (en_US)
dc.identifier.authority: Wang, CL=rp00183 (en_US)
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/ICPP.2013.62
dc.identifier.scopus: eid_2-s2.0-84893224794
dc.identifier.hkuros: 225162 (en_US)
dc.identifier.spage: 504
dc.identifier.epage: 509
dc.identifier.isi: WOS:000330046000052
dc.publisher.place: United States
dc.identifier.issnl: 0190-3918
