Article: DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference

Title: DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference
Authors: Zhou, Jiajun; Wu, Jiajun; Gao, Yizhao; Ding, Yuhao; Tao, Chaofan; Li, Boyu; Tu, Fengbin; Cheng, Kwang Ting; So, Hayden Kwok Hay; Wong, Ngai
Keywords: Accelerator; deep neural networks (DNNs); FPGAs; machine learning; quantization
Issue Date: 13-Dec-2023
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024, v. 42, n. 5, p. 1613-1617
Abstract: To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched. A prominent challenge is to quantize DNN models into low-bitwidth numbers without significant accuracy degradation, especially at very low bitwidths (< 8 bits). This work targets an adaptive data representation with variable-length encoding called DyBit. DyBit can dynamically adjust the precision and range of separate bit-fields to adapt to the distribution of DNN weights/activations. We also propose a hardware-aware quantization framework with a mixed-precision accelerator to trade off inference accuracy and speedup. Experimental results demonstrate that ImageNet inference accuracy via DyBit is 1.97% higher than the state-of-the-art at 4-bit quantization, and the proposed framework can achieve up to 8.1× speedup compared with the original ResNet-50 model.
Persistent Identifier: http://hdl.handle.net/10722/366081
ISSN: 0278-0070
2023 Impact Factor: 2.7
2023 SCImago Journal Rankings: 0.957
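
The abstract describes DyBit as a representation whose range and precision fields adapt to the distribution of the weights and activations being quantized. The minimal Python sketch below illustrates that idea only: for a fixed 4-bit budget it tries every sign/exponent/mantissa split and keeps the one with the lowest quantization error on the given tensor. The toy minifloat format and all function names here are assumptions for illustration, not the paper's method.

    # Illustrative sketch only, NOT the paper's DyBit encoding: search over
    # exponent/mantissa splits of a fixed bit budget and keep the split that
    # minimizes quantization MSE on the given tensor.
    import numpy as np

    def minifloat_quantize(x, exp_bits, man_bits):
        """Quantize x to a toy sign/exponent/mantissa minifloat."""
        if exp_bits == 0:
            # Fixed-point fallback: uniform grid over the observed range.
            scale = np.max(np.abs(x)) / (2 ** man_bits - 1)
            return np.round(x / scale) * scale
        bias = 2 ** (exp_bits - 1) - 1
        sign = np.sign(x)
        mag = np.abs(x) + 1e-12                   # avoid log2(0)
        e = np.clip(np.floor(np.log2(mag)), -bias, bias)
        step = 2.0 ** (e - man_bits)              # mantissa step at exponent e
        largest = (2 - 2.0 ** -man_bits) * 2.0 ** bias
        return np.clip(sign * np.round(mag / step) * step, -largest, largest)

    def best_split(x, total_bits=4):
        """Try all 1(sign)+exp+man splits of the budget; return the MSE-best."""
        errors = {}
        for exp_bits in range(total_bits):        # 1 bit reserved for sign
            man_bits = total_bits - 1 - exp_bits
            q = minifloat_quantize(x, exp_bits, man_bits)
            errors[(exp_bits, man_bits)] = float(np.mean((x - q) ** 2))
        return min(errors, key=errors.get), errors

    # Heavy-tailed weights tend to favour more range (exponent) bits, while
    # near-uniform weights favour more precision (mantissa) bits.
    rng = np.random.default_rng(0)
    weights = rng.laplace(scale=0.05, size=10_000)
    split, errors = best_split(weights, total_bits=4)
    print("chosen (exp_bits, man_bits):", split)

In the paper, this range/precision trade-off is handled by the variable-length DyBit encoding itself rather than by a per-tensor search, and the mixed-precision accelerator additionally varies bitwidth across layers; the sketch only shows why matching field widths to the value distribution reduces quantization error.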

DC Field: Value
dc.contributor.author: Zhou, Jiajun
dc.contributor.author: Wu, Jiajun
dc.contributor.author: Gao, Yizhao
dc.contributor.author: Ding, Yuhao
dc.contributor.author: Tao, Chaofan
dc.contributor.author: Li, Boyu
dc.contributor.author: Tu, Fengbin
dc.contributor.author: Cheng, Kwang Ting
dc.contributor.author: So, Hayden Kwok Hay
dc.contributor.author: Wong, Ngai
dc.date.accessioned: 2025-11-15T00:35:25Z
dc.date.available: 2025-11-15T00:35:25Z
dc.date.issued: 2023-12-13
dc.identifier.citation: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024, v. 42, n. 5, p. 1613-1617
dc.identifier.issn: 0278-0070
dc.identifier.uri: http://hdl.handle.net/10722/366081
dc.description.abstract: To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched. A prominent challenge is to quantize DNN models into low-bitwidth numbers without significant accuracy degradation, especially at very low bitwidths (< 8 bits). This work targets an adaptive data representation with variable-length encoding called DyBit. DyBit can dynamically adjust the precision and range of separate bit-fields to adapt to the distribution of DNN weights/activations. We also propose a hardware-aware quantization framework with a mixed-precision accelerator to trade off inference accuracy and speedup. Experimental results demonstrate that ImageNet inference accuracy via DyBit is 1.97% higher than the state-of-the-art at 4-bit quantization, and the proposed framework can achieve up to 8.1× speedup compared with the original ResNet-50 model.
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartof: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: Accelerator
dc.subject: deep neural networks (DNNs)
dc.subject: FPGAs
dc.subject: machine learning
dc.subject: quantization
dc.title: DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference
dc.type: Article
dc.identifier.doi: 10.1109/TCAD.2023.3342730
dc.identifier.scopus: eid_2-s2.0-85179783284
dc.identifier.volume: 42
dc.identifier.issue: 5
dc.identifier.spage: 1613
dc.identifier.epage: 1617
dc.identifier.eissn: 1937-4151
dc.identifier.issnl: 0278-0070
