Article: A Reconfigurable Processing Element for Multiple-Precision Floating/Fixed-Point HPC
Title | A Reconfigurable Processing Element for Multiple-Precision Floating/Fixed-Point HPC |
---|---|
Authors | Li, Boyu; Li, Kai; Zhou, Jiajun; Ren, Yuan; Mao, Wei; Yu, Hao; Wong, Ngai |
Keywords | Artificial neural networks; Clocks; Deep learning; Energy efficiency; fixed-point; floating-point; Hardware; HPC; MAC; Multiple-precision; PE; Random access memory; Training |
Issue Date | 5-Oct-2023 |
Publisher | Institute of Electrical and Electronics Engineers |
Citation | IEEE Transactions on Circuits and Systems II: Express Briefs, 2023 |
Abstract | High-performance computing (HPC) can facilitate deep neural network (DNN) training and inference. Previous works have proposed multiple-precision floating- and fixed-point designs, but most can handle only one of the two independently. This brief proposes a novel reconfigurable processing element (PE) supporting both energy-efficient floating-point and fixed-point multiply-accumulate (MAC) operations. The PE can perform 9×BFloat16 (BF16), 4×half-precision (FP16), 4×TensorFloat-32 (TF32), and 1×single-precision (FP32) MAC operations with 100% multiplication hardware utilization in one clock cycle. It also supports 72×INT2, 36×INT4, and 9×INT8 dot products plus one 32-bit addend. The design is realized in a 28 nm process at a 1.471 GHz slow-corner clock frequency. Compared with state-of-the-art (SOTA) multiple-precision PEs, the proposed work exhibits the best energy efficiency of 834.35 GFLOPS/W at TF32 and 1761.41 GFLOPS/W at BF16, at least 10× and 4× improvements, respectively, for deep learning training. Meanwhile, the design supports energy-efficient fixed-point computing with a small hardware overhead for deep learning inference. (See the sketches after this table.) |
Persistent Identifier | http://hdl.handle.net/10722/339469 |
ISSN | 1549-7747 |
2023 Impact Factor | 4.0 |
2023 SCImago Journal Rankings | 1.523 |
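The 9-to-1 ratio between BF16 and FP32 MAC throughput quoted in the abstract is consistent with standard significand-multiplier tiling: a BF16 significand (hidden bit plus 7 mantissa bits) is 8 bits wide, while an FP32 significand is 24 bits, so one 24×24 multiplier array decomposes into a 3×3 grid of 8×8 sub-multipliers (and the 11-bit FP16/TF32 significands similarly fit 2×2 grids of 12-bit tiles). Whether the paper's PE uses exactly this partitioning is an assumption; the Python sketch below only demonstrates the decomposition identity itself.

```python
# A 24x24-bit multiply rebuilt from 8x8-bit sub-products (3x3 tiling).
# Illustrative only: the function name and tiling choice are assumptions,
# not details taken from the paper's PE datapath.

def tiled_multiply(a: int, b: int, width: int = 24, tile: int = 8) -> int:
    """Multiply two `width`-bit integers using only `tile`-bit sub-multipliers,
    recombining the partial products with shift-and-add."""
    n = width // tile                      # tiles per operand (3 for 24/8)
    mask = (1 << tile) - 1
    acc = 0
    for i in range(n):                     # low-to-high tiles of a
        for j in range(n):                 # low-to-high tiles of b
            pa = (a >> (i * tile)) & mask
            pb = (b >> (j * tile)) & mask
            acc += (pa * pb) << ((i + j) * tile)
    return acc

a, b = 0xABCDEF, 0x123456                  # two 24-bit "significands"
assert tiled_multiply(a, b) == a * b       # 9 sub-products reproduce the full product
```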
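The abstract's per-cycle operation counts, together with the 1.471 GHz slow-corner clock, also pin down the PE's peak arithmetic throughput in each mode. Below is a minimal back-of-envelope model assuming the common convention of 2 operations per MAC (one multiply plus one add); the names are illustrative, and the abstract's GFLOPS/W figures are energy efficiency, which this sketch does not model.

```python
# Peak-throughput model from the numbers quoted in the abstract.

CLOCK_HZ = 1.471e9  # slow-corner clock frequency from the abstract

# Parallel MAC (or dot-product term) count per clock cycle, per mode.
OPS_PER_CYCLE = {
    "FP32": 1,    # 1 x single-precision MAC
    "TF32": 4,    # 4 x TensorFloat-32 MACs
    "FP16": 4,    # 4 x half-precision MACs
    "BF16": 9,    # 9 x BFloat16 MACs
    "INT8": 9,    # 9 x INT8 products plus one 32-bit addend
    "INT4": 36,
    "INT2": 72,
}

def peak_gops(mode: str) -> float:
    """Peak throughput in G(FL)OPS: ops/cycle x 2 (mul + add) x clock."""
    return OPS_PER_CYCLE[mode] * 2 * CLOCK_HZ / 1e9

for mode, n in OPS_PER_CYCLE.items():
    print(f"{mode:4s}: {n:2d} ops/cycle -> {peak_gops(mode):6.1f} peak G(FL)OPS")
```

For example, BF16 gives 9 × 2 × 1.471 ≈ 26.5 GFLOPS peak for a single PE under these assumptions.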
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Li, Boyu | - |
dc.contributor.author | Li, Kai | - |
dc.contributor.author | Zhou, Jiajun | - |
dc.contributor.author | Ren, Yuan | - |
dc.contributor.author | Mao, Wei | - |
dc.contributor.author | Yu, Hao | - |
dc.contributor.author | Wong, Ngai | - |
dc.date.accessioned | 2024-03-11T10:36:53Z | - |
dc.date.available | 2024-03-11T10:36:53Z | - |
dc.date.issued | 2023-10-05 | - |
dc.identifier.citation | IEEE Transactions on Circuits and Systems II: Express Briefs, 2023 | - |
dc.identifier.issn | 1549-7747 | - |
dc.identifier.uri | http://hdl.handle.net/10722/339469 | - |
dc.description.abstract | High-performance computing (HPC) can facilitate deep neural network (DNN) training and inference. Previous works have proposed multiple-precision floating- and fixed-point designs, but most can handle only one of the two independently. This brief proposes a novel reconfigurable processing element (PE) supporting both energy-efficient floating-point and fixed-point multiply-accumulate (MAC) operations. The PE can perform 9×BFloat16 (BF16), 4×half-precision (FP16), 4×TensorFloat-32 (TF32), and 1×single-precision (FP32) MAC operations with 100% multiplication hardware utilization in one clock cycle. It also supports 72×INT2, 36×INT4, and 9×INT8 dot products plus one 32-bit addend. The design is realized in a 28 nm process at a 1.471 GHz slow-corner clock frequency. Compared with state-of-the-art (SOTA) multiple-precision PEs, the proposed work exhibits the best energy efficiency of 834.35 GFLOPS/W at TF32 and 1761.41 GFLOPS/W at BF16, at least 10× and 4× improvements, respectively, for deep learning training. Meanwhile, the design supports energy-efficient fixed-point computing with a small hardware overhead for deep learning inference. | -
dc.language | eng | - |
dc.publisher | Institute of Electrical and Electronics Engineers | - |
dc.relation.ispartof | IEEE Transactions on Circuits and Systems II: Express Briefs | - |
dc.subject | Artificial neural networks | - |
dc.subject | Clocks | - |
dc.subject | Deep learning | - |
dc.subject | Energy efficiency | - |
dc.subject | fixed-point | - |
dc.subject | floating-point | - |
dc.subject | Hardware | - |
dc.subject | HPC | - |
dc.subject | MAC | - |
dc.subject | Multiple-precision | - |
dc.subject | PE | - |
dc.subject | Random access memory | - |
dc.subject | Training | - |
dc.title | A Reconfigurable Processing Element for Multiple-Precision Floating/Fixed-Point HPC | - |
dc.type | Article | - |
dc.identifier.doi | 10.1109/TCSII.2023.3322259 | - |
dc.identifier.scopus | eid_2-s2.0-85174812317 | - |
dc.identifier.eissn | 1558-3791 | - |
dc.identifier.issnl | 1549-7747 | - |