File Download
Supplementary
-
Citations:
- Appears in Collections:
postgraduate thesis: A soft processor overlay with tightly-coupled FPGA accelerator
Title | A soft processor overlay with tightly-coupled FPGA accelerator |
---|---|
Authors | |
Issue Date | 2015 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Ng, H. [吳浩彰]. (2015). A soft processor overlay with tightly-coupled FPGA accelerator. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5760970 |
Abstract | FPGA overlays have shown the potential to improve designers’ productivity through balancing flexibility and ease of configuration of the underlying fabric while maintaining considerable overall performance promised by FPGAs. To truly facilitate full application acceleration, it is often necessary to also include a highly efficient processor that integrates and collaborates with the accelerators while maintaining the benefits of being implemented within the same overlay framework.
This thesis presents an open-source soft processor that is tightly-coupled with FPGA accelerator as part of an overlay framework. RISC-V is chosen as the instruction set for its openness and simplicity, and the soft processor is designed as a 4-stage pipeline to balance resource consumption and performance when implemented on FPGAs. The processor is generically implemented so as to promote design portability and compatibility across different FPGA platforms.
Experiment shows that the integrated software-hardware applications using the proposed tightly-coupled architecture achieve comparable performance as hardware-only accelerators while the proposed architecture provides additional run-time flexibility. The processor can be synthesized to both low-end and high-performance FPGA families from different vendors, achieving the highest frequency of 268:67MHz on Virtex-7 device. Synthesized results of the soft processor also display improvement on FPGA resource consumption and efficiency when compared to existing RISC-V design.
In addition, this thesis also presents an FPGA-centric approach that allows gateware to directly access the virtual memory space as part of the executing process without involving the CPU. It allows efficient access to memory in heterogeneous systems and complements traditional software-centric approach by providing a simplified memory access model to improve designers’ productivity and high-level compilation tools portability.
In this approach, a caching address translation buffer was implemented alongside the user FPGA gateware to provide runtime mapping between virtual and physical memory addresses. It coordinates with the OS running on the CPU to update address translations and to maintain memory consistency. The system was implemented on a commercial off-the-shelf FPGA add-on card to demonstrate the viability of such approach in low-cost systems.
Experiment with a 2D stencil computing application implemented with this FPGA-centric approach results in reasonable performance improvement when compared to a typical software-centric implementation; while the number of context switches between FPGA and CPU in both kernel and user mode was significantly reduced, freeing the CPU for other concurrent user tasks. |
Degree | Master of Philosophy |
Subject | Field programmable gate arrays |
Dept/Program | Electrical and Electronic Engineering |
Persistent Identifier | http://hdl.handle.net/10722/226791 |
HKU Library Item ID | b5760970 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ng, Ho-cheung | - |
dc.contributor.author | 吳浩彰 | - |
dc.date.accessioned | 2016-06-30T04:24:11Z | - |
dc.date.available | 2016-06-30T04:24:11Z | - |
dc.date.issued | 2015 | - |
dc.identifier.citation | Ng, H. [吳浩彰]. (2015). A soft processor overlay with tightly-coupled FPGA accelerator. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5760970 | - |
dc.identifier.uri | http://hdl.handle.net/10722/226791 | - |
dc.description.abstract | FPGA overlays have shown the potential to improve designers’ productivity through balancing flexibility and ease of configuration of the underlying fabric while maintaining considerable overall performance promised by FPGAs. To truly facilitate full application acceleration, it is often necessary to also include a highly efficient processor that integrates and collaborates with the accelerators while maintaining the benefits of being implemented within the same overlay framework. This thesis presents an open-source soft processor that is tightly-coupled with FPGA accelerator as part of an overlay framework. RISC-V is chosen as the instruction set for its openness and simplicity, and the soft processor is designed as a 4-stage pipeline to balance resource consumption and performance when implemented on FPGAs. The processor is generically implemented so as to promote design portability and compatibility across different FPGA platforms. Experiment shows that the integrated software-hardware applications using the proposed tightly-coupled architecture achieve comparable performance as hardware-only accelerators while the proposed architecture provides additional run-time flexibility. The processor can be synthesized to both low-end and high-performance FPGA families from different vendors, achieving the highest frequency of 268:67MHz on Virtex-7 device. Synthesized results of the soft processor also display improvement on FPGA resource consumption and efficiency when compared to existing RISC-V design. In addition, this thesis also presents an FPGA-centric approach that allows gateware to directly access the virtual memory space as part of the executing process without involving the CPU. It allows efficient access to memory in heterogeneous systems and complements traditional software-centric approach by providing a simplified memory access model to improve designers’ productivity and high-level compilation tools portability. In this approach, a caching address translation buffer was implemented alongside the user FPGA gateware to provide runtime mapping between virtual and physical memory addresses. It coordinates with the OS running on the CPU to update address translations and to maintain memory consistency. The system was implemented on a commercial off-the-shelf FPGA add-on card to demonstrate the viability of such approach in low-cost systems. Experiment with a 2D stencil computing application implemented with this FPGA-centric approach results in reasonable performance improvement when compared to a typical software-centric implementation; while the number of context switches between FPGA and CPU in both kernel and user mode was significantly reduced, freeing the CPU for other concurrent user tasks. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Field programmable gate arrays | - |
dc.title | A soft processor overlay with tightly-coupled FPGA accelerator | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b5760970 | - |
dc.description.thesisname | Master of Philosophy | - |
dc.description.thesislevel | Master | - |
dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_b5760970 | - |
dc.identifier.mmsid | 991019898839703414 | - |