
Conference Paper: Random Feature Attention

Title: Random Feature Attention
Authors: Peng, H; Pappas, N; Yogatama, D; Schwartz, R; Smith, N; Kong, L
Keywords: Attention, transformers, machine translation, language modeling
Issue Date: 2021
Citation: The 9th International Conference on Learning Representations (ICLR 2021), Virtual Event, Austria, 3-7 May 2021
Abstract: Transformers are state-of-the-art models for a variety of sequence modeling tasks. At their core is an attention function which models pairwise interactions between the inputs at every timestep. While attention is powerful, it does not scale efficiently to long sequences due to its quadratic time and space complexity in the sequence length. We propose RFA, a linear time and space attention that uses random feature methods to approximate the softmax function, and explore its application in transformers. RFA can be used as a drop-in replacement for conventional softmax attention and offers a straightforward way of learning with recency bias through an optional gating mechanism. Experiments on language modeling and machine translation demonstrate that RFA achieves similar or better performance compared to strong transformer baselines. In the machine translation experiment, RFA decodes twice as fast as a vanilla transformer. Compared to existing efficient transformer variants, RFA is competitive in terms of both accuracy and efficiency on three long text classification datasets. Our analysis shows that RFA’s efficiency gains are especially notable on long sequences, suggesting that RFA will be particularly useful in tasks that require working with large inputs, fast decoding speed, or low memory footprints.
Description: Spotlight Presentation
Persistent Identifier: http://hdl.handle.net/10722/304336
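The abstract describes replacing softmax attention with a random feature approximation that runs in linear time and space, plus an optional gate for recency bias. The following is a minimal pure-Python sketch of that idea under stated assumptions, not the authors' implementation: random projections are drawn as w_i ~ N(0, I) (that is, sigma = 1); queries and keys are l2-normalized, so exp(q . k) = e * exp(-||q - k||^2 / 2) and the constant e cancels between numerator and denominator; and the gate is a per-step scalar in (0, 1) passed in by the caller. The names `sample_projections`, `random_features`, `causal_rfa`, `causal_softmax_attention` and the `gates` argument are hypothetical. Causal attention is computed as a recurrence over a fixed-size state, which illustrates why the per-token decoding cost does not grow with the prefix length. An exact quadratic softmax reference is included for comparison.

```python
import math
import random


def sample_projections(d, num_features, seed=0):
    """Draw num_features random vectors w_i ~ N(0, I_d); this sketch fixes sigma = 1."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_features)]


def l2_normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def random_features(x, ws):
    """phi(x) = sqrt(1/D) [sin(w_1.x), ..., sin(w_D.x), cos(w_1.x), ..., cos(w_D.x)].

    With w_i ~ N(0, I), E[phi(x) . phi(y)] = exp(-||x - y||^2 / 2).
    """
    proj = [sum(wj * xj for wj, xj in zip(w, x)) for w in ws]
    scale = math.sqrt(1.0 / len(ws))
    return [scale * math.sin(p) for p in proj] + [scale * math.cos(p) for p in proj]


def causal_rfa(qs, ks, vs, ws, gates=None):
    """Causal random feature attention evaluated as a recurrence.

    The state (S, z) has a fixed size (2D x d_v and 2D) independent of the
    sequence length, so every step costs the same and a full pass is linear.
    gates: hypothetical per-step scalars in (0, 1); None means ungated sums.
    """
    two_d, d_v = 2 * len(ws), len(vs[0])
    S = [[0.0] * d_v for _ in range(two_d)]  # (optionally gated) sum of outer(phi(k_i), v_i)
    z = [0.0] * two_d                         # (optionally gated) sum of phi(k_i)
    outputs = []
    for t, (q, k, v) in enumerate(zip(qs, ks, vs)):
        # Ungated: S_t = S_{t-1} + phi(k_t) v_t^T.
        # Gated:   S_t = g_t S_{t-1} + (1 - g_t) phi(k_t) v_t^T (same update for z_t).
        decay, weight = (1.0, 1.0) if gates is None else (gates[t], 1.0 - gates[t])
        phi_k = random_features(l2_normalize(k), ws)
        for r in range(two_d):
            z[r] = decay * z[r] + weight * phi_k[r]
            for c in range(d_v):
                S[r][c] = decay * S[r][c] + weight * phi_k[r] * v[c]
        # Read out: phi(q_t)^T S_t / (phi(q_t) . z_t)
        phi_q = random_features(l2_normalize(q), ws)
        denom = sum(a * b for a, b in zip(phi_q, z))
        outputs.append([sum(phi_q[r] * S[r][c] for r in range(two_d)) / denom
                        for c in range(d_v)])
    return outputs


def causal_softmax_attention(qs, ks, vs):
    """Exact O(n^2) reference: softmax of q . k over unit-normalized q and k."""
    outputs = []
    for t in range(len(qs)):
        q = l2_normalize(qs[t])
        scores = [math.exp(sum(a * b for a, b in zip(q, l2_normalize(ks[i]))))
                  for i in range(t + 1)]
        total = sum(scores)
        outputs.append([sum(scores[i] * vs[i][c] for i in range(t + 1)) / total
                        for c in range(len(vs[0]))])
    return outputs


# Usage: compare the linear-time approximation with exact softmax attention.
qs = [[1.0, 0.2, -0.3], [0.1, 1.0, 0.4], [-0.5, 0.3, 1.0], [0.7, -0.2, 0.1]]
ks = [[0.9, -0.1, 0.2], [0.2, 0.8, -0.3], [0.3, 0.3, 0.9], [-0.4, 0.6, 0.2]]
vs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 2.0]]
ws = sample_projections(d=3, num_features=2000)
approx = causal_rfa(qs, ks, vs, ws)
exact = causal_softmax_attention(qs, ks, vs)
for a_row, e_row in zip(approx, exact):
    print([round(a, 3) for a in a_row], [round(e, 3) for e in e_row])
```

Because the same kernel estimate appears in the numerator and the denominator, scale constants cancel; increasing the number of random features tightens the approximation at the cost of a larger (but still length-independent) state.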

 

DC Field | Value | Language
dc.contributor.author | Peng, H | -
dc.contributor.author | Pappas, N | -
dc.contributor.author | Yogatama, D | -
dc.contributor.author | Schwartz, R | -
dc.contributor.author | Smith, N | -
dc.contributor.author | Kong, L | -
dc.date.accessioned | 2021-09-23T08:58:37Z | -
dc.date.available | 2021-09-23T08:58:37Z | -
dc.date.issued | 2021 | -
dc.identifier.citation | The 9th International Conference on Learning Representations (ICLR 2021), Virtual Event, Austria, 3-7 May 2021 | -
dc.identifier.uri | http://hdl.handle.net/10722/304336 | -
dc.description | Spotlight Presentation | -
dc.description.abstract | (identical to the Abstract above) | -
dc.language | eng | -
dc.relation.ispartof | International Conference on Learning Representations (ICLR 2021) | -
dc.subject | Attention | -
dc.subject | transformers | -
dc.subject | machine translation | -
dc.subject | language modeling | -
dc.title | Random Feature Attention | -
dc.type | Conference_Paper | -
dc.identifier.email | Kong, L: lpk@cs.hku.hk | -
dc.identifier.authority | Kong, L=rp02775 | -
dc.identifier.hkuros | 324952 | -
