Appears in Collections: postgraduate thesis: Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms
Title | Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms |
---|---|
Authors | Liu, Junjie [刘俊杰] |
Advisors | So, HKH; Wong Lui, KS |
Issue Date | 2020 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Liu, J. [刘俊杰]. (2020). Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Deep neural networks (DNNs) have achieved remarkable success in many challenging
tasks. However, the inference process of DNN models is highly memory-intensive
and computation-intensive due to the over-parameterization of deep neural networks,
which impedes the deployment of DNN models in resource-limited and latency-sensitive
scenarios. Model compression is considered a remedy that improves the storage and
computation efficiency of deep neural networks. Among the approaches to model
compression, network pruning can remove over 90% of the model parameters with little
loss of performance. Network pruning also helps to avoid over-fitting; therefore,
better generalization performance can be achieved with properly pruned neural networks.
Typical pruning methods adopt a three-stage pipeline: 1) training a dense,
over-parameterized model; 2) pruning a portion of the less important parameters in
the pre-trained dense model; and 3) fine-tuning the pruned sparse model to regain
performance.
However, this traditional three-stage pipeline has two critical problems. First,
the expensive pruning and fine-tuning iterations require many additional training
epochs. Second, the process of updating network parameters and the process of
finding the optimal sparse structure are decoupled, which fails to obtain the optimal
sparse subnetwork.
The contributions of this thesis consist of two parts. First, we extend the traditional
three-stage pipeline to recurrent neural networks, especially the long short-term
memory (LSTM) network, and propose hidden-state pruning, which achieves higher
compression and acceleration ratios. Second, we propose a novel sparse training
algorithm that seamlessly combines the training and pruning processes. Thus the
expensive pruning and fine-tuning iterations are circumvented, and the optimal sparse
structure can be revealed with the same budget as training a dense model. In addition, the
potential contribution of network pruning to network architecture design is studied with the proposed dynamic pruning methods. |
Degree | Master of Philosophy |
Subject | Neural networks (Computer science) |
Dept/Program | Electrical and Electronic Engineering |
Persistent Identifier | http://hdl.handle.net/10722/286784 |
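The three-stage pipeline summarized in the abstract (train a dense model, prune the least important parameters, fine-tune the sparse survivor) is the standard baseline the thesis builds on. As a minimal sketch of that generic pipeline, the snippet below uses PyTorch's `torch.nn.utils.prune` with global L1-magnitude pruning at 90% sparsity, matching the "over 90% of the model parameters" figure quoted above. The toy model, data, and epoch counts are placeholder assumptions; this is an illustration, not code from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Placeholder dense model and synthetic data; stand-ins, not thesis artifacts.
model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(128, 64), torch.randint(0, 10, (128,))

def train(epochs: int) -> None:
    for _ in range(epochs):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

# Stage 1: train the dense, over-parameterized model.
train(epochs=10)

# Stage 2: globally remove the 90% of weights with the smallest magnitude.
to_prune = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.9)

# Stage 3: fine-tune; the masks registered by torch.nn.utils.prune keep the
# removed weights at zero while the surviving weights continue to train.
train(epochs=10)
```

In practice stages 2 and 3 are often iterated several times at increasing sparsity, which is exactly the costly loop the abstract identifies as the pipeline's first problem.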
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | So, HKH | - |
dc.contributor.advisor | Wong Lui, KS | - |
dc.contributor.author | Liu, Junjie | - |
dc.contributor.author | 刘俊杰 | - |
dc.date.accessioned | 2020-09-05T01:20:55Z | - |
dc.date.available | 2020-09-05T01:20:55Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | Liu, J. [刘俊杰]. (2020). Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/286784 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Neural networks (Computer science) | - |
dc.title | Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Master of Philosophy | - |
dc.description.thesislevel | Master | - |
dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2020 | - |
dc.identifier.mmsid | 991044268205703414 | - |
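This record does not spell out the sparse training algorithm that forms the thesis's second contribution, so the following is only a generic point of reference: a drop-and-regrow sparse training loop in the style of dynamic sparse training methods such as SET or RigL, in which the sparse topology is updated continuously during training and no separate fine-tuning stage is needed. Every name, schedule, and hyperparameter here is an illustrative assumption, not the algorithm proposed in the thesis.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup, mirroring the pipeline sketch above.
model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(128, 64), torch.randint(0, 10, (128,))

sparsity = 0.9  # fraction of weights held at zero for the whole run

def random_mask(w: torch.Tensor) -> torch.Tensor:
    # Start from a random sparse topology; no dense pre-training phase.
    scores = torch.rand_like(w)
    n_active = int(w.numel() * (1 - sparsity))
    threshold = scores.flatten().kthvalue(w.numel() - n_active).values
    return (scores > threshold).float()

linears = [m for m in model if isinstance(m, nn.Linear)]
masks = {m: random_mask(m.weight) for m in linears}

for step in range(1000):
    with torch.no_grad():  # enforce the current sparse topology
        for m in linears:
            m.weight.mul_(masks[m])
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

    if step % 100 == 99:  # periodic drop-and-regrow mask update
        with torch.no_grad():
            for m in linears:
                w, mask = m.weight, masks[m]
                n_swap = max(1, int(mask.sum().item()) // 10)
                # Grow candidates: inactive positions with the largest gradients.
                grad_mag = torch.where(mask.bool(), torch.full_like(w, -1.0), w.grad.abs())
                grow = torch.topk(grad_mag.flatten(), n_swap).indices
                # Drop candidates: active positions with the smallest magnitudes.
                act_mag = torch.where(mask.bool(), w.abs(), torch.full_like(w, float("inf")))
                drop = torch.topk(act_mag.flatten(), n_swap, largest=False).indices
                mask.view(-1)[drop] = 0.0
                mask.view(-1)[grow] = 1.0
                w.view(-1)[grow] = 0.0  # regrown connections start from zero
```

Because the mask evolves while the weights train, the search for a sparse structure is coupled with the parameter updates, which is the property the abstract argues the three-stage pipeline lacks, and the total cost stays within a single training run.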