Postgraduate thesis: Efficient training and inference of deep neural networks

Title: Efficient training and inference of deep neural networks
Authors: Wang, Maolin (王茂林)
Advisors: So, HKH; Lam, EYM
Issue Date: 2020
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Wang, M. [王茂林]. (2020). Efficient training and inference of deep neural networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Deep Neural Networks (DNNs) are widely used in many fields due to their superior performance. However, their computational complexity in both training and inference makes them difficult to deploy. This thesis studies how to address these computational challenges from three different perspectives. The first part of the thesis focuses on reducing end-to-end DNN inference latency under hardware resource and model accuracy constraints. A deeply pipelined Convolutional Neural Network (CNN) inference architecture that operates on partial inputs is proposed. A series of designs are implemented on Field Programmable Gate Arrays (FPGAs), providing different trade-offs among inference latency, accuracy, and hardware resource usage. The second part of the thesis presents an efficient training framework that trains deep neural networks with integer-only arithmetic. The framework has three major innovations. First, all model parameters are stored directly as 8-bit signed integers. Second, a pseudo stochastic rounding scheme is proposed, which achieves the effect of commonly used stochastic rounding without the need for external random number generation. Third, a segmented approximation scheme for cross-entropy loss backpropagation with integer-only arithmetic is presented. Combining the above contributions, this thesis presents the world's first integer-only arithmetic training framework. The last part of the thesis uses ultrafast single-cell image classification as a concrete example to demonstrate how the methods proposed in the earlier parts can meet DNN deployment requirements. First, integer training is used to find low-precision alternatives to the floating-point model. Then a series of hardware designs for real-time inference are presented to explore the trade-off between hardware resource usage and classification latency. Finally, this thesis presents a real-time image-based single-cell detection and classification system with state-of-the-art inference latency.
Degree: Doctor of Philosophy
Subject: Neural networks (Computer science)
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/286786
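The pseudo stochastic rounding scheme mentioned in the abstract is not detailed in this record, but the conventional stochastic rounding it emulates is easy to illustrate. The sketch below is a minimal NumPy illustration of standard stochastic rounding to 8-bit signed integers, with a hypothetical quantization scale parameter; note it relies on an explicit random number generator, which is exactly what the thesis's pseudo-stochastic variant is designed to avoid. It is not the thesis's implementation.

    import numpy as np

    def stochastic_round_to_int8(x, scale, rng):
        # Map float values onto the integer grid defined by `scale`.
        scaled = x / scale
        floor = np.floor(scaled)
        # Round up with probability equal to the fractional part,
        # so the quantized value is unbiased in expectation.
        round_up = rng.random(x.shape) < (scaled - floor)
        return np.clip(floor + round_up, -128, 127).astype(np.int8)

    # Example: quantize a few random weights with an assumed scale of 1/64.
    rng = np.random.default_rng(0)
    weights = rng.standard_normal(5).astype(np.float32)
    print(stochastic_round_to_int8(weights, scale=1 / 64, rng=rng))

In low-precision training, this unbiasedness is what lets small gradient updates accumulate over many steps rather than being rounded away, which is why rounding schemes of this kind (or a generator-free variant like the thesis proposes) are central to integer-only training.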

 

DC Field: Value

dc.contributor.advisor: So, HKH
dc.contributor.advisor: Lam, EYM
dc.contributor.author: Wang, Maolin
dc.contributor.author: 王茂林
dc.date.accessioned: 2020-09-05T01:20:56Z
dc.date.available: 2020-09-05T01:20:56Z
dc.date.issued: 2020
dc.identifier.citation: Wang, M. [王茂林]. (2020). Efficient training and inference of deep neural networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/286786
dc.description.abstract: (see Abstract above)
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Neural networks (Computer science)
dc.title: Efficient training and inference of deep neural networks
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Electrical and Electronic Engineering
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2020
dc.identifier.mmsid: 991044268206503414
