File Download
  Links for fulltext
     (May Require Subscription)
Supplementary

postgraduate thesis: A run-time hardware task execution framework for FPGA-accelerated heterogeneous cluster

TitleA run-time hardware task execution framework for FPGA-accelerated heterogeneous cluster
Authors
Advisors
Advisor(s):Lam, EYMSo, HKH
Issue Date2013
PublisherThe University of Hong Kong (Pokfulam, Hong Kong)
Citation
Choi, Y. [蔡育明]. (2013). A run-time hardware task execution framework for FPGA-accelerated heterogeneous cluster. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5270558
AbstractThe era of big data has led to problems of unprecedented scale and complexity that are challenging the computing capability of conventional computer systems. One way to address the computational and communication challenges of such demanding applications is to incorporate the use of non-conventional hardware accelerators such as FPGAs into existing systems. By providing a mix of FPGAs and conventional CPUs as computing resources in a heterogeneous cluster, a distributed computing environment can be achieved to address the need of both compute-intensive and data-intensive applications. However, utilizing heterogeneous clusters requires application developers’ comprehensive knowledge on both hardware and software. In order to assist programmers to take advantage of the synergy between hardware and software easily, an easy-to-use framework for virtualizing the underlying FPGA computing resources of the heterogeneous cluster is motivated. In this work, a heterogeneous cluster consisting of both FPGAs and CPUs was built and a framework for managing multiple FPGAs across the cluster was designed. The major contribution of the framework is to provide an abstraction layer between the application developer and the underlying FPGA computing resources, so as to improve the overall design productivity. An inter-FPGA communication system was implemented such that gateware executing on FPGAs can communicate with each other autonomously to the CPU. Furthermore, to demonstrate a real-life application on the heterogeneous cluster, a generic k-means clustering application was implemented, using the MapReduce programming model. The implementation of the k-means application on multiple FPGAs was compared with a software-only version that was run on a Hadoop multi-core computer cluster. The performance results show that the FPGA version outperforms the Hadoop version across various parameters. An in-depth study on the communication bottleneck presented in the system was also carried out. A number of experiments were specifically designed to benchmark the performance of each I/O channel. The study shows that the major source of I/O bottleneck lies at the communication between the host system and the FPGA. This gives insight into programming considerations of potential applications on the cluster as well as improvement to the framework. Moreover, the benefit of multiple FPGAs was investigated through a series of experiments. Compared with putting all mappers on a single FPGA, it was found that distributing the same amount of mappers across more FPGAs can provide a tradeoff between FPGA resources and I/O performance.
DegreeMaster of Philosophy
SubjectHigh performance computing
Field programmable gate arrays
Dept/ProgramElectrical and Electronic Engineering
Persistent Identifierhttp://hdl.handle.net/10722/206679
HKU Library Item IDb5270558

 

DC FieldValueLanguage
dc.contributor.advisorLam, EYM-
dc.contributor.advisorSo, HKH-
dc.contributor.authorChoi, Yuk-ming-
dc.contributor.author蔡育明-
dc.date.accessioned2014-11-25T03:53:17Z-
dc.date.available2014-11-25T03:53:17Z-
dc.date.issued2013-
dc.identifier.citationChoi, Y. [蔡育明]. (2013). A run-time hardware task execution framework for FPGA-accelerated heterogeneous cluster. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5270558-
dc.identifier.urihttp://hdl.handle.net/10722/206679-
dc.description.abstractThe era of big data has led to problems of unprecedented scale and complexity that are challenging the computing capability of conventional computer systems. One way to address the computational and communication challenges of such demanding applications is to incorporate the use of non-conventional hardware accelerators such as FPGAs into existing systems. By providing a mix of FPGAs and conventional CPUs as computing resources in a heterogeneous cluster, a distributed computing environment can be achieved to address the need of both compute-intensive and data-intensive applications. However, utilizing heterogeneous clusters requires application developers’ comprehensive knowledge on both hardware and software. In order to assist programmers to take advantage of the synergy between hardware and software easily, an easy-to-use framework for virtualizing the underlying FPGA computing resources of the heterogeneous cluster is motivated. In this work, a heterogeneous cluster consisting of both FPGAs and CPUs was built and a framework for managing multiple FPGAs across the cluster was designed. The major contribution of the framework is to provide an abstraction layer between the application developer and the underlying FPGA computing resources, so as to improve the overall design productivity. An inter-FPGA communication system was implemented such that gateware executing on FPGAs can communicate with each other autonomously to the CPU. Furthermore, to demonstrate a real-life application on the heterogeneous cluster, a generic k-means clustering application was implemented, using the MapReduce programming model. The implementation of the k-means application on multiple FPGAs was compared with a software-only version that was run on a Hadoop multi-core computer cluster. The performance results show that the FPGA version outperforms the Hadoop version across various parameters. An in-depth study on the communication bottleneck presented in the system was also carried out. A number of experiments were specifically designed to benchmark the performance of each I/O channel. The study shows that the major source of I/O bottleneck lies at the communication between the host system and the FPGA. This gives insight into programming considerations of potential applications on the cluster as well as improvement to the framework. Moreover, the benefit of multiple FPGAs was investigated through a series of experiments. Compared with putting all mappers on a single FPGA, it was found that distributing the same amount of mappers across more FPGAs can provide a tradeoff between FPGA resources and I/O performance.-
dc.languageeng-
dc.publisherThe University of Hong Kong (Pokfulam, Hong Kong)-
dc.relation.ispartofHKU Theses Online (HKUTO)-
dc.rightsThis work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.-
dc.rightsThe author retains all proprietary rights, (such as patent rights) and the right to use in future works.-
dc.subject.lcshHigh performance computing-
dc.subject.lcshField programmable gate arrays-
dc.titleA run-time hardware task execution framework for FPGA-accelerated heterogeneous cluster-
dc.typePG_Thesis-
dc.identifier.hkulb5270558-
dc.description.thesisnameMaster of Philosophy-
dc.description.thesislevelMaster-
dc.description.thesisdisciplineElectrical and Electronic Engineering-
dc.description.naturepublished_or_final_version-
dc.identifier.doi10.5353/th_b5270558-
dc.identifier.mmsid991038815139703414-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats