Article: Vertical Layering of Quantized Neural Networks for Heterogeneous Inference

Title: Vertical Layering of Quantized Neural Networks for Heterogeneous Inference
Authors: Wu, Hai; He, Ruifei; Tan, Haoru; Qi, Xiaojuan; Huang, Kaibin
Keywords: bit-width scalable network; Computational modeling; Degradation; Hardware; layered coding; multi-objective optimization; Neural networks; Optimization; Quantization (signal); quantization-aware training; Training
Issue Date: 1-Dec-2023
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, v. 45, n. 12, p. 15964-15978
Abstract

Although considerable progress has been made in neural network quantization for efficient inference, existing methods are not scalable to heterogeneous devices: a dedicated model must be trained, transmitted, and stored for each specific hardware setting, incurring considerable costs in model training and maintenance. In this paper, we study a new vertical-layered representation of neural network weights that encapsulates all quantized models in a single one. It represents weights as a group of bits (i.e., vertical layers) organized from the most significant bit (also called the basic layer) to less significant bits (i.e., enhance layers). Hence, a neural network of arbitrary quantization precision can be obtained by adding the corresponding enhance layers to the basic layer. However, we empirically find that models obtained with existing quantization methods suffer severe performance degradation when adapted to the vertical-layered weight representation. To this end, we propose a simple once quantization-aware training (QAT) scheme for obtaining high-performance vertical-layered models. Our design incorporates a cascade downsampling mechanism with multi-objective optimization to train the shared source model weights so that they are updated simultaneously with the performance of all networks taken into account. After the model is trained, a vertical-layered network is constructed by taking the lowest bit-width quantized weights as the basic layer, with every bit dropped along the downsampling process acting as an enhance layer. Our design is extensively evaluated on the CIFAR-100 and ImageNet datasets. Experiments show that the proposed vertical-layered representation and the developed once QAT scheme are effective in embodying multiple quantized networks in a single model, allow one-time training, and deliver performance comparable to that of quantized models tailored to any specific bit-width.
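
The abstract describes a bit-level, layered encoding of quantized weights. As a rough illustration only (not the authors' implementation), the following NumPy sketch decomposes B-bit unsigned weight codes into bitplanes, with the most significant bit as the basic layer and the lower bits as enhance layers, and then rebuilds a lower-precision model by stacking the basic layer with a chosen number of enhance layers. The function names are made up for this sketch, and plain bit truncation stands in for the paper's cascade downsampling and once-QAT training, which are not reproduced here.

import numpy as np

def decompose_bitplanes(codes, num_bits):
    # Split unsigned integer weight codes into vertical layers (bitplanes),
    # ordered from the most significant bit (the basic layer) down to the
    # less significant bits (the enhance layers).
    return [(codes >> b) & 1 for b in range(num_bits - 1, -1, -1)]

def reconstruct(layers, bit_width):
    # Assemble a bit_width-bit model from the basic layer plus
    # (bit_width - 1) enhance layers; the remaining bits are simply dropped.
    codes = np.zeros_like(layers[0])
    for plane in layers[:bit_width]:
        codes = (codes << 1) | plane
    return codes

# Example: 8-bit weight codes served to a device that runs 4-bit inference.
rng = np.random.default_rng(0)
codes8 = rng.integers(0, 256, size=(3, 3), dtype=np.uint8)
layers = decompose_bitplanes(codes8, num_bits=8)
codes4 = reconstruct(layers, bit_width=4)     # basic layer + 3 enhance layers
assert np.array_equal(codes4, codes8 >> 4)    # truncation keeps the 4 most significant bits

In practice the reconstructed integer codes would still be dequantized with a scale matched to the active bit width, and it is the once QAT scheme described in the abstract that keeps every such reduced-bit-width reconstruction accurate.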


Persistent Identifier: http://hdl.handle.net/10722/338164
ISSN: 0162-8828
2023 Impact Factor: 20.8
2023 SCImago Journal Rankings: 6.158
ISI Accession Number ID: WOS:001104973300118

 

DC Field: Value
dc.contributor.author: Wu, Hai
dc.contributor.author: He, Ruifei
dc.contributor.author: Tan, Haoru
dc.contributor.author: Qi, Xiaojuan
dc.contributor.author: Huang, Kaibin
dc.date.accessioned: 2024-03-11T10:26:44Z
dc.date.available: 2024-03-11T10:26:44Z
dc.date.issued: 2023-12-01
dc.identifier.citation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, v. 45, n. 12, p. 15964-15978
dc.identifier.issn: 0162-8828
dc.identifier.uri: http://hdl.handle.net/10722/338164
dc.description.abstract: Although considerable progress has been made in neural network quantization for efficient inference, existing methods are not scalable to heterogeneous devices: a dedicated model must be trained, transmitted, and stored for each specific hardware setting, incurring considerable costs in model training and maintenance. In this paper, we study a new vertical-layered representation of neural network weights that encapsulates all quantized models in a single one. It represents weights as a group of bits (i.e., vertical layers) organized from the most significant bit (also called the basic layer) to less significant bits (i.e., enhance layers). Hence, a neural network of arbitrary quantization precision can be obtained by adding the corresponding enhance layers to the basic layer. However, we empirically find that models obtained with existing quantization methods suffer severe performance degradation when adapted to the vertical-layered weight representation. To this end, we propose a simple once quantization-aware training (QAT) scheme for obtaining high-performance vertical-layered models. Our design incorporates a cascade downsampling mechanism with multi-objective optimization to train the shared source model weights so that they are updated simultaneously with the performance of all networks taken into account. After the model is trained, a vertical-layered network is constructed by taking the lowest bit-width quantized weights as the basic layer, with every bit dropped along the downsampling process acting as an enhance layer. Our design is extensively evaluated on the CIFAR-100 and ImageNet datasets. Experiments show that the proposed vertical-layered representation and the developed once QAT scheme are effective in embodying multiple quantized networks in a single model, allow one-time training, and deliver performance comparable to that of quantized models tailored to any specific bit-width.
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartof: IEEE Transactions on Pattern Analysis and Machine Intelligence
dc.subject: bit-width scalable network
dc.subject: Computational modeling
dc.subject: Degradation
dc.subject: Hardware
dc.subject: layered coding
dc.subject: multi-objective optimization
dc.subject: Neural networks
dc.subject: Optimization
dc.subject: Quantization (signal)
dc.subject: quantization-aware training
dc.subject: Training
dc.title: Vertical Layering of Quantized Neural Networks for Heterogeneous Inference
dc.type: Article
dc.identifier.doi: 10.1109/TPAMI.2023.3319045
dc.identifier.scopus: eid_2-s2.0-85173008691
dc.identifier.volume: 45
dc.identifier.issue: 12
dc.identifier.spage: 15964
dc.identifier.epage: 15978
dc.identifier.eissn: 1939-3539
dc.identifier.isi: WOS:001104973300118
dc.identifier.issnl: 0162-8828
