Appears in Collections: postgraduate thesis: Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms
Title | Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms |
---|---|
Authors | Liu, Junjie [刘俊杰] |
Advisors | So, HKH; Wong Lui, KS |
Issue Date | 2020 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Liu, J. [刘俊杰]. (2020). Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Deep neural networks (DNNs) have achieved remarkable success in many challenging
tasks. However, the inference process of DNN models is highly memory-intensive
and computation-intensive due to the over-parameterization of deep neural networks,
which impedes the deployment of DNN models in resource-limited and latency-sensitive
scenarios. Model compression is considered a remedy that improves the storage and
computation efficiency of deep neural networks. Among the approaches to model
compression, network pruning can remove over 90% of the model parameters with little
loss of performance. Network pruning also helps to avoid over-fitting; therefore,
better generalization performance can be achieved with properly pruned neural networks.
Typical pruning methods adopt a three-stage pipeline: 1) training a dense,
over-parameterized model; 2) pruning a portion of the less important parameters in
the pre-trained dense model; and 3) fine-tuning the pruned sparse model to regain
performance.
However, this traditional three-stage pipeline has two critical problems. First,
the expensive pruning and fine-tuning iterations require many additional training
epochs. Second, the process of updating network parameters and the process of
finding the optimal sparse structure are decoupled, which fails to obtain the optimal
sparse subnetwork.
The contributions of this thesis consist of two parts. First, we extend the traditional
three-stage pipeline to recurrent neural networks, especially the long short-term
memory (LSTM) network, and propose hidden-state pruning, which achieves higher
compression and acceleration ratios. Second, we propose a novel sparse training
algorithm that seamlessly combines the training and pruning processes. Thus the
expensive pruning and fine-tuning iterations are circumvented, and the optimal sparse
structure can be revealed with the same budget as training a dense model. In addition, the
potential contribution of network pruning to network architecture design is studied with the proposed dynamic pruning methods. |
Degree | Master of Philosophy |
Subject | Neural networks (Computer science) |
Dept/Program | Electrical and Electronic Engineering |
Persistent Identifier | http://hdl.handle.net/10722/286784 |
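The three-stage pipeline summarized in the abstract (train a dense model, prune the least important parameters, fine-tune the sparse survivor) is the standard baseline the thesis builds on. As a minimal sketch of that generic pipeline, the snippet below uses PyTorch's `torch.nn.utils.prune` with global L1-magnitude pruning at 90% sparsity, matching the "over 90% of the model parameters" figure quoted above. The toy model, data, and epoch counts are placeholder assumptions; this is an illustration, not code from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Placeholder dense model and synthetic data; stand-ins, not thesis artifacts.
model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(128, 64), torch.randint(0, 10, (128,))

def train(epochs: int) -> None:
    for _ in range(epochs):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

# Stage 1: train the dense, over-parameterized model.
train(epochs=10)

# Stage 2: globally remove the 90% of weights with the smallest magnitude.
to_prune = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.9)

# Stage 3: fine-tune; the masks registered by torch.nn.utils.prune keep the
# removed weights at zero while the surviving weights continue to train.
train(epochs=10)
```

In practice stages 2 and 3 are often iterated several times at increasing sparsity, which is exactly the costly loop the abstract identifies as the pipeline's first problem.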
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | So, HKH | - |
dc.contributor.advisor | Wong Lui, KS | - |
dc.contributor.author | Liu, Junjie | - |
dc.contributor.author | 刘俊杰 | - |
dc.date.accessioned | 2020-09-05T01:20:55Z | - |
dc.date.available | 2020-09-05T01:20:55Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | Liu, J. [刘俊杰]. (2020). Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/286784 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Neural networks (Computer science) | - |
dc.title | Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Master of Philosophy | - |
dc.description.thesislevel | Master | - |
dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2020 | - |
dc.identifier.mmsid | 991044268205703414 | - |
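This record does not spell out the sparse training algorithm that forms the thesis's second contribution, so the following is only a generic point of reference: a drop-and-regrow sparse training loop in the style of dynamic sparse training methods such as SET or RigL, in which the sparse topology is updated continuously during training and no separate fine-tuning stage is needed. Every name, schedule, and hyperparameter here is an illustrative assumption, not the algorithm proposed in the thesis.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup, mirroring the pipeline sketch above.
model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(128, 64), torch.randint(0, 10, (128,))

sparsity = 0.9  # fraction of weights held at zero for the whole run

def random_mask(w: torch.Tensor) -> torch.Tensor:
    # Start from a random sparse topology; no dense pre-training phase.
    scores = torch.rand_like(w)
    n_active = int(w.numel() * (1 - sparsity))
    threshold = scores.flatten().kthvalue(w.numel() - n_active).values
    return (scores > threshold).float()

linears = [m for m in model if isinstance(m, nn.Linear)]
masks = {m: random_mask(m.weight) for m in linears}

for step in range(1000):
    with torch.no_grad():  # enforce the current sparse topology
        for m in linears:
            m.weight.mul_(masks[m])
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

    if step % 100 == 99:  # periodic drop-and-regrow mask update
        with torch.no_grad():
            for m in linears:
                w, mask = m.weight, masks[m]
                n_swap = max(1, int(mask.sum().item()) // 10)
                # Grow candidates: inactive positions with the largest gradients.
                grad_mag = torch.where(mask.bool(), torch.full_like(w, -1.0), w.grad.abs())
                grow = torch.topk(grad_mag.flatten(), n_swap).indices
                # Drop candidates: active positions with the smallest magnitudes.
                act_mag = torch.where(mask.bool(), w.abs(), torch.full_like(w, float("inf")))
                drop = torch.topk(act_mag.flatten(), n_swap, largest=False).indices
                mask.view(-1)[drop] = 0.0
                mask.view(-1)[grow] = 1.0
                w.view(-1)[grow] = 0.0  # regrown connections start from zero
```

Because the mask evolves while the weights train, the search for a sparse structure is coupled with the parameter updates, which is the property the abstract argues the three-stage pipeline lacks, and the total cost stays within a single training run.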