Article: On-GPU thread-data remapping for nested branch divergence

Title: On-GPU thread-data remapping for nested branch divergence
Authors: LIN, H; Wang, CL
Keywords: GPGPU; Branch divergence; SIMD; Race condition
Issue Date: 2020
Publisher: Academic Press. The Journal's web site is located at http://www.elsevier.com/locate/jpdc
Citation: Journal of Parallel and Distributed Computing, 2020, v. 139, p. 75-86
Abstract: Nested branches are common in applications that use decision trees. The more layers a branch nest has, the larger the slowdown caused by nested branch divergence on GPUs. Since inner branches are impractical to evaluate on the host, thread-data remapping via GPU shared memory is so far the most suitable solution. However, the existing solution cannot handle inner branches directly, because the GPU barrier function has undefined behavior when executed within branch statements; race conditions must therefore be prevented without using a barrier. Targeting nested divergence, we propose NeX, a nested extension scheme featuring an inter-thread protocol that supports sub-workgroup synchronization. We further exploit the on-the-fly nature of the Head-or-Tail (HoT) algorithm and propose HoT2, which offers more flexible wavefront scheduling. Evaluated on four GPU models, including NVIDIA Volta and Turing, HoT2 proves to be more efficient. For benchmarks with branch nests up to five layers deep, NeX further boosts performance by up to 1.56x.
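The general idea of thread-data remapping can be illustrated with a small serial simulation. This is a sketch under stated assumptions, not the authors' NeX or HoT2 implementation: it only assumes that a Head-or-Tail style pass writes items that take a branch to the front of a buffer and the remaining items to the back, so that consecutive lanes (grouped here into hypothetical 4-lane warps) follow the same path. The names `head_or_tail`, `divergent_warps`, and `WARP_SIZE` are hypothetical. The serial loop also sidesteps the synchronization problem the paper addresses: on a GPU, threads would claim slots concurrently (for example through atomic counters in shared memory), and remapping inside an inner branch could not rely on a workgroup-wide barrier.

```python
from typing import Callable, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical warp width chosen for a readable example; NVIDIA warps have 32 lanes.
WARP_SIZE = 4


def divergent_warps(items: Sequence[T], path: Callable[[T], Hashable],
                    warp_size: int = WARP_SIZE) -> int:
    """Count warps whose lanes would follow more than one branch path."""
    count = 0
    for start in range(0, len(items), warp_size):
        paths = {path(x) for x in items[start:start + warp_size]}
        if len(paths) > 1:
            count += 1
    return count


def head_or_tail(items: Sequence[T], pred: Callable[[T], bool]) -> Tuple[List[T], int]:
    """Write items satisfying pred to the head of a new buffer and the rest to
    its tail; return the buffer and the index where the tail segment begins.

    Serial stand-in (assumption): on a GPU each thread would claim its slot
    concurrently, e.g. with atomic head/tail counters in shared memory.
    """
    buf: List[T] = [None] * len(items)  # type: ignore[list-item]
    head, tail = 0, len(items) - 1
    for x in items:
        if pred(x):
            buf[head] = x
            head += 1
        else:
            buf[tail] = x
            tail -= 1
    return buf, head


if __name__ == "__main__":
    data = list(range(16))
    # Outer branch: even vs. odd.
    outer, split = head_or_tail(data, lambda x: x % 2 == 0)
    # Inner branches are remapped separately within each outer segment.
    evens, _ = head_or_tail(outer[:split], lambda x: x % 4 == 0)
    odds, _ = head_or_tail(outer[split:], lambda x: x % 4 == 1)
    nested = evens + odds
    leaf = lambda x: x % 4  # identifies the leaf path each item takes
    print(divergent_warps(data, leaf), divergent_warps(nested, leaf))  # 4 0
```

Applying the same pass separately inside each outer segment mimics remapping one inner branch layer. In this example every segment happens to align with warp boundaries; in general, a segment boundary that falls inside a warp leaves that one warp divergent.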
Persistent Identifier: http://hdl.handle.net/10722/283321
ISSN: 0743-7315
2023 Impact Factor: 3.4
2023 SCImago Journal Rankings: 1.187
ISI Accession Number ID: WOS:000520948700007


DC Field | Value | Language
dc.contributor.author | LIN, H | -
dc.contributor.author | Wang, CL | -
dc.date.accessioned | 2020-06-22T02:55:00Z | -
dc.date.available | 2020-06-22T02:55:00Z | -
dc.date.issued | 2020 | -
dc.identifier.citation | Journal of Parallel and Distributed Computing, 2020, v. 139, p. 75-86 | -
dc.identifier.issn | 0743-7315 | -
dc.identifier.uri | http://hdl.handle.net/10722/283321 | -
dc.language | eng | -
dc.publisher | Academic Press. The Journal's web site is located at http://www.elsevier.com/locate/jpdc | -
dc.relation.ispartof | Journal of Parallel and Distributed Computing | -
dc.subject | GPGPU | -
dc.subject | Branch divergence | -
dc.subject | SIMD | -
dc.subject | Race condition | -
dc.title | On-GPU thread-data remapping for nested branch divergence | -
dc.type | Article | -
dc.identifier.email | Wang, CL: clwang@cs.hku.hk | -
dc.identifier.authority | Wang, CL=rp00183 | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1016/j.jpdc.2020.02.003 | -
dc.identifier.scopus | eid_2-s2.0-85079824306 | -
dc.identifier.hkuros | 310355 | -
dc.identifier.volume | 139 | -
dc.identifier.spage | 75 | -
dc.identifier.epage | 86 | -
dc.identifier.isi | WOS:000520948700007 | -
dc.publisher.place | United States | -
dc.identifier.issnl | 0743-7315 | -
