Links for fulltext (may require subscription):
- Publisher Website (DOI): 10.1145/3392717.3392749
- Scopus: eid_2-s2.0-85088517313
Citations:
- Scopus: 0
Conference Paper: CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks
| Field | Value |
|---|---|
| Title | CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks |
| Authors | Shi, R; Dong, P; Geng, T; Ding, Y; Ma, X; So, HKH; Herbordt, M; Li, A; Wang, Y |
| Keywords | FPGA; RNN; structured pruning; workload balancing |
| Issue Date | 2020 |
| Publisher | Association for Computing Machinery (ACM) |
| Citation | Proceedings of the 34th ACM International Conference on Supercomputing (ICS 2020), Barcelona, Spain, 29 June - 2 July 2020, Article No. 24: 1-12 |
| Abstract | Recurrent neural networks (RNNs) have been widely adopted in temporal sequence analysis, where realtime performance is often in demand. However, RNNs suffer from a heavy computational workload as the model often comes with large weight matrices. Pruning (a model compression method) schemes have been proposed for RNNs to eliminate the redundant (close-to-zero) weight values. On one hand, non-structured pruning methods achieve a high pruning rate but introduce computation irregularity (random sparsity), which is unfriendly to parallel hardware. On the other hand, hardware-oriented structured pruning suffers from a low pruning rate due to restrictive constraints on the allowable pruning structure. This paper presents CSB-RNN, an optimized full-stack RNN framework with a novel compressed structured block (CSB) pruning technique. The CSB-pruned RNN model comes with both fine pruning granularity, which facilitates a high pruning rate, and a regular structure, which benefits hardware parallelism. To address the challenges in parallelizing the inference of a CSB-pruned model with fine-grained structural sparsity, we propose a novel hardware architecture with a dedicated compiler. Gaining from the architecture-compilation co-design, the hardware not only supports various RNN cell types, but is also able to address the challenging workload-imbalance issue and therefore significantly improves hardware efficiency (utilization). Compared to the vanilla design without optimizations, hardware utilization is enhanced by over 2X. In experiments on 10 RNN models from multiple application domains, CSB pruning demonstrates a 3.5X-25X lossless pruning rate, which is 1.6X to 3.9X over existing designs. With several other innovations applied, CSB-RNN inference achieves a faster-than-realtime latency of 0.79μs-6.58μs in an FPGA implementation, which contributes to 1.12X-12.57X lower latency and a 3.53X-58.89X improvement in power efficiency over the state of the art. |
| Description | Session 6: Architecture II |
| Persistent Identifier | http://hdl.handle.net/10722/288469 |
| ISBN | 9781450379830 |
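The exact CSB format is defined in the paper itself; as a rough, hedged illustration of the general idea described in the abstract above (fine pruning granularity inside each block, a regular structure across blocks that keeps the per-block workload balanced), the following Python sketch prunes a weight matrix block by block. The function name `csb_prune`, the 8x8 block size, the keep count of 4, and the magnitude-based selection criterion are all illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def csb_prune(weights, block_rows=8, block_cols=8, keep_per_block=4):
    """Illustrative block-structured pruning (NOT the paper's exact CSB scheme):
    partition the weight matrix into fixed-size blocks and keep only the
    largest-magnitude weights inside each block. Keeping the same nonzero
    count per block gives a regular structure and a balanced per-block
    workload, while the choice of which weights survive stays fine-grained."""
    rows, cols = weights.shape
    assert rows % block_rows == 0 and cols % block_cols == 0
    pruned = np.zeros_like(weights)
    for r in range(0, rows, block_rows):
        for c in range(0, cols, block_cols):
            block = weights[r:r+block_rows, c:c+block_cols]
            flat = np.abs(block).ravel()
            # indices of the keep_per_block largest-magnitude entries
            keep = np.argpartition(flat, -keep_per_block)[-keep_per_block:]
            mask = np.zeros(flat.shape, dtype=bool)
            mask[keep] = True
            pruned[r:r+block_rows, c:c+block_cols] = block * mask.reshape(block.shape)
    return pruned

# Example: prune a 64x64 weight matrix so each 8x8 block keeps 4 weights,
# i.e. a 16X pruning rate, within the 3.5X-25X range reported in the abstract.
w = np.random.randn(64, 64)
w_pruned = csb_prune(w)
print(np.count_nonzero(w_pruned))  # 64 blocks * 4 = 256 nonzeros
```

Because every block retains the same number of nonzeros, parallel compute lanes assigned one block each finish in the same number of cycles, which is one simple way to picture the workload-balancing benefit the abstract attributes to the architecture-compilation co-design.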
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Shi, R | - |
dc.contributor.author | Dong, P | - |
dc.contributor.author | Geng, T | - |
dc.contributor.author | Ding, Y | - |
dc.contributor.author | Ma, X | - |
dc.contributor.author | So, HKH | - |
dc.contributor.author | Herbordt, M | - |
dc.contributor.author | Li, A | - |
dc.contributor.author | Wang, Y | - |
dc.date.accessioned | 2020-10-05T12:13:22Z | - |
dc.date.available | 2020-10-05T12:13:22Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | Proceedings of the 34th ACM International Conference on Supercomputing (ICS 2020), Barcelona, Spain, 29 June - 2 July 2020, Article No. 24: 1-12 | -
dc.identifier.isbn | 9781450379830 | - |
dc.identifier.uri | http://hdl.handle.net/10722/288469 | - |
dc.description | Session 6: Architecture II | - |
dc.description.abstract | Recurrent neural networks (RNNs) have been widely adopted in temporal sequence analysis, where realtime performance is often in demand. However, RNNs suffer from a heavy computational workload as the model often comes with large weight matrices. Pruning (a model compression method) schemes have been proposed for RNNs to eliminate the redundant (close-to-zero) weight values. On one hand, non-structured pruning methods achieve a high pruning rate but introduce computation irregularity (random sparsity), which is unfriendly to parallel hardware. On the other hand, hardware-oriented structured pruning suffers from a low pruning rate due to restrictive constraints on the allowable pruning structure. This paper presents CSB-RNN, an optimized full-stack RNN framework with a novel compressed structured block (CSB) pruning technique. The CSB-pruned RNN model comes with both fine pruning granularity, which facilitates a high pruning rate, and a regular structure, which benefits hardware parallelism. To address the challenges in parallelizing the inference of a CSB-pruned model with fine-grained structural sparsity, we propose a novel hardware architecture with a dedicated compiler. Gaining from the architecture-compilation co-design, the hardware not only supports various RNN cell types, but is also able to address the challenging workload-imbalance issue and therefore significantly improves hardware efficiency (utilization). Compared to the vanilla design without optimizations, hardware utilization is enhanced by over 2X. In experiments on 10 RNN models from multiple application domains, CSB pruning demonstrates a 3.5X-25X lossless pruning rate, which is 1.6X to 3.9X over existing designs. With several other innovations applied, CSB-RNN inference achieves a faster-than-realtime latency of 0.79μs-6.58μs in an FPGA implementation, which contributes to 1.12X-12.57X lower latency and a 3.53X-58.89X improvement in power efficiency over the state of the art. | -
dc.language | eng | - |
dc.publisher | Association for Computing Machinery (ACM). | - |
dc.relation.ispartof | Proceedings of the 34th ACM International Conference on Supercomputing (ICS 2020) | - |
dc.subject | FPGA | - |
dc.subject | RNN | - |
dc.subject | structured pruning | - |
dc.subject | workload balancing | - |
dc.title | CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks | - |
dc.type | Conference_Paper | - |
dc.identifier.email | So, HKH: hso@eee.hku.hk | - |
dc.identifier.authority | So, HKH=rp00169 | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1145/3392717.3392749 | - |
dc.identifier.scopus | eid_2-s2.0-85088517313 | - |
dc.identifier.hkuros | 315356 | - |
dc.publisher.place | New York, NY | - |