
postgraduate thesis: QuickDough : a rapid FPGA loop accelerator design framework using soft coarse-grained reconfigurable array overlay

Title: QuickDough : a rapid FPGA loop accelerator design framework using soft coarse-grained reconfigurable array overlay
Authors: Liu, Cheng (刘成)
Issue Date: 2015
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Liu, C. [刘成]. (2015). QuickDough : a rapid FPGA loop accelerator design framework using soft coarse-grained reconfigurable array overlay. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5760939
Abstract: Numerous researchers have demonstrated that FPGAs are effective accelerators for compute-intensive loops, meeting both performance and energy-efficiency requirements across many application domains. However, the design productivity of FPGA accelerator development remains much lower than that of a typical software development flow. Although high-level design tools partly alleviate this shortcoming, the lengthy low-level FPGA implementation process (synthesis, placement, and routing) and the complex design space exploration (DSE) dramatically limit the number of compile-debug-edit cycles per day and hinder the widespread adoption of FPGAs. To address this design productivity problem, this dissertation develops QuickDough, a framework that rapidly generates loop accelerators and their associated hardware-software interfaces. By using a soft coarse-grained reconfigurable array (SCGRA) overlay as an intermediate fabric built on top of off-the-shelf FPGAs, QuickDough partitions the complex accelerator development flow into two paths. Along the rapid, common path, it transforms the loop kernel into a data flow graph (DFG), schedules the DFG onto the overlay through rapid operation scheduling, and then generates the FPGA accelerator bitstream by rapidly integrating the scheduling result with a partially implemented overlay bitstream selected from a pre-built accelerator library. By employing different selection algorithms, QuickDough lets users trade off performance against compilation time. In the experiments, QuickDough produces accelerators on the order of seconds with a pre-built library while achieving up to 9X speedup over the same software running on a hard ARM processor.
QuickDough also includes a relatively slow but less frequently used path that pre-builds an overlay-based accelerator library targeting a group of applications. To expedite library generation, a representative set of accelerator configurations is chosen as the library and generated automatically using a template-based system. In addition, intensive application-specific customization is available as an option to produce accelerators with optimized performance or energy efficiency. By taking advantage of the regularity of overlay-based accelerators, the customization process can be two orders of magnitude faster than an exhaustive exploration while achieving similar performance. Finally, the underlying SCGRA overlay is the backbone of QuickDough and is critical to the performance of the resulting FPGA loop accelerators. To that end, a highly pipelined SCGRA overlay template is developed to run at a high clock frequency, which improves both performance and energy efficiency. The template is also simple, easy to extend, and scalable, so it can be customized for various compute kernels.
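The rapid path summarized in the abstract (loop kernel to DFG, then operation scheduling onto the overlay) can be illustrated with a minimal sketch. This is not QuickDough's scheduler: it is a generic greedy list scheduler, and it assumes unit-latency operations, identical processing elements (PEs), and no routing cost between PEs. The names `Op`, `build_dot_product_dfg`, and `list_schedule` are hypothetical and do not come from the thesis.

```python
from collections import namedtuple

# One DFG node: a unique name, an operation kind, and the names it depends on.
Op = namedtuple("Op", ["name", "kind", "deps"])


def build_dot_product_dfg(n):
    """Unroll sum(a[i] * b[i] for i in range(n)) into a data flow graph."""
    ops = [Op(f"mul{i}", "mul", ()) for i in range(n)]
    level = [op.name for op in ops]
    count = 0
    # Reduce the products pairwise with an adder tree.
    while len(level) > 1:
        next_level = []
        for j in range(0, len(level) - 1, 2):
            name = f"add{count}"
            count += 1
            ops.append(Op(name, "add", (level[j], level[j + 1])))
            next_level.append(name)
        if len(level) % 2:
            next_level.append(level[-1])  # odd element passes through
        level = next_level
    return ops


def list_schedule(ops, num_pes):
    """Greedy list scheduling: each PE issues at most one ready op per cycle.

    Assumes unit latency and free data movement between PEs (a simplification).
    Returns the schedule as (cycle, pe, op_name) tuples and the total cycle count.
    """
    finish = {}      # op name -> first cycle in which its result is usable
    schedule = []
    pending = list(ops)
    cycle = 0
    while pending:
        ready = [op for op in pending
                 if all(d in finish and finish[d] <= cycle for d in op.deps)]
        issued = ready[:num_pes]
        for pe, op in enumerate(issued):
            schedule.append((cycle, pe, op.name))
            finish[op.name] = cycle + 1
        issued_names = {op.name for op in issued}
        pending = [op for op in pending if op.name not in issued_names]
        cycle += 1
    return schedule, cycle


if __name__ == "__main__":
    dfg = build_dot_product_dfg(4)
    for pes in (1, 2, 4):
        _, cycles = list_schedule(dfg, pes)
        print(f"{pes} PE(s): {cycles} cycles")
```

Running the sketch with different PE counts shows the kind of trade-off the abstract refers to: a larger array shortens the schedule, at the cost of a larger overlay configuration to select from the library.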
Degree: Doctor of Philosophy
Subject: Field programmable gate arrays
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/226764

 

DC Field | Value | Language
dc.contributor.author | Liu, Cheng | -
dc.contributor.author | 刘成 | -
dc.date.accessioned | 2016-06-30T04:24:06Z | -
dc.date.available | 2016-06-30T04:24:06Z | -
dc.date.issued | 2015 | -
dc.identifier.citation | Liu, C. [刘成]. (2015). QuickDough : a rapid FPGA loop accelerator design framework using soft coarse-grained reconfigurable array overlay. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5760939 | -
dc.identifier.uri | http://hdl.handle.net/10722/226764 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.subject.lcsh | Field programmable gate arrays | -
dc.title | QuickDough : a rapid FPGA loop accelerator design framework using soft coarse-grained reconfigurable array overlay | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5760939 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Electrical and Electronic Engineering | -
dc.description.nature | published_or_final_version | -
