ZipZap: Efficient Training of Language Models for Ethereum Fraud Detection

Appears in Collections: Conference Paper
Field | Value
---|---
Title | ZipZap: Efficient Training of Language Models for Ethereum Fraud Detection |
Authors | Hu, Sihao; Huang, Tiansheng; Chow, Ka-Ho; Wei, Wenqi; Wu, Yanzhao; Liu, Ling |
Issue Date | 13-May-2024 |
Abstract | Language models (LMs) have demonstrated superior performance in detecting fraudulent activities on blockchains. Nonetheless, the sheer volume of blockchain data incurs excessive memory and computational costs when training LMs from scratch, limiting their applicability to large-scale settings. In this paper, we present ZipZap, a framework tailored to achieve both parameter and computational efficiency when training LMs on large-scale transaction data. First, through frequency-aware compression, an LM can be compressed down to a mere 7.5% of its initial size with an imperceptible performance dip. This technique correlates the embedding dimension of an address with its occurrence frequency in the dataset, motivated by the observation that embeddings of low-frequency addresses are insufficiently trained, which negates the need for a uniformly large dimension for knowledge representation. Second, ZipZap accelerates training through an asymmetric training paradigm: it performs transaction dropping and cross-layer parameter sharing to expedite pre-training, while reverting to the standard training paradigm for fine-tuning to strike a balance between efficiency and efficacy; this asymmetry is motivated by the observation that the optimization goals of pre-training and fine-tuning are inconsistent. Evaluations on real-world, large-scale datasets demonstrate that ZipZap delivers notable parameter and computational efficiency improvements for training LMs. Our implementation is available at: https://github.com/git-disl/ZipZap. |
Persistent Identifier | http://hdl.handle.net/10722/347896 |
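
The frequency-aware compression described in the abstract lends itself to a short illustration. The sketch below is a hypothetical reading of the idea, not code from the ZipZap repository: the `FrequencyAwareEmbedding` class, the three-bucket split, and the per-bucket up-projections are all assumptions, and the paper's actual dimension-assignment scheme may differ. The core mechanism shown is that low-frequency addresses receive small embedding dimensions, which a linear layer projects up to the shared model width.

```python
import torch
import torch.nn as nn

class FrequencyAwareEmbedding(nn.Module):
    """Hypothetical sketch of frequency-aware compression: addresses are
    grouped into frequency buckets, rarer buckets receive smaller embedding
    dimensions, and each bucket is projected up to the shared model width."""

    def __init__(self, bucket_sizes, bucket_dims, model_dim):
        super().__init__()
        self.model_dim = model_dim
        # bucket_sizes[i]: number of addresses in frequency bucket i
        # bucket_dims[i]:  embedding width for bucket i (largest for the
        #                  most frequent addresses, smallest for the rarest)
        self.embeddings = nn.ModuleList(
            nn.Embedding(n, d) for n, d in zip(bucket_sizes, bucket_dims)
        )
        self.projections = nn.ModuleList(
            nn.Linear(d, model_dim, bias=False) for d in bucket_dims
        )
        # Bucket boundaries: global address ids are assumed to be assigned
        # in descending frequency order, so bucket membership is a range test.
        bounds = torch.tensor([0] + list(bucket_sizes)).cumsum(0)
        self.register_buffer("bounds", bounds)

    def forward(self, address_ids):
        # address_ids: (batch, seq) of global address ids
        out = torch.zeros(*address_ids.shape, self.model_dim,
                          device=address_ids.device)
        bucket_idx = torch.bucketize(address_ids, self.bounds, right=True) - 1
        for i, (emb, proj) in enumerate(zip(self.embeddings, self.projections)):
            mask = bucket_idx == i
            if mask.any():
                local_ids = address_ids[mask] - self.bounds[i]
                out[mask] = proj(emb(local_ids))
        return out

# Demo configuration (illustrative numbers, not the paper's):
emb = FrequencyAwareEmbedding(
    bucket_sizes=[50_000, 250_000, 700_000],  # hot, warm, cold addresses
    bucket_dims=[128, 16, 4],
    model_dim=128,
)
vecs = emb(torch.randint(0, 1_000_000, (2, 32)))  # -> (2, 32, 128)
```

With this demo configuration (one million addresses at widths 128/16/4, model width 128), the embedding tables hold about 13M parameters instead of the 128M a uniform 128-dimensional table would need, roughly a tenth of the size; the 7.5% figure reported in the paper depends on its own frequency statistics and dimension assignment.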
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Hu, Sihao | - |
dc.contributor.author | Huang, Tiansheng | - |
dc.contributor.author | Chow, Ka-Ho | - |
dc.contributor.author | Wei, Wenqi | - |
dc.contributor.author | Wu, Yanzhao | - |
dc.contributor.author | Liu, Ling | - |
dc.date.accessioned | 2024-10-02T06:25:17Z | - |
dc.date.available | 2024-10-02T06:25:17Z | - |
dc.date.issued | 2024-05-13 | - |
dc.identifier.uri | http://hdl.handle.net/10722/347896 | - |
dc.description.abstract | Language models (LMs) have demonstrated superior performance in detecting fraudulent activities on blockchains. Nonetheless, the sheer volume of blockchain data incurs excessive memory and computational costs when training LMs from scratch, limiting their applicability to large-scale settings. In this paper, we present ZipZap, a framework tailored to achieve both parameter and computational efficiency when training LMs on large-scale transaction data. First, through frequency-aware compression, an LM can be compressed down to a mere 7.5% of its initial size with an imperceptible performance dip. This technique correlates the embedding dimension of an address with its occurrence frequency in the dataset, motivated by the observation that embeddings of low-frequency addresses are insufficiently trained, which negates the need for a uniformly large dimension for knowledge representation. Second, ZipZap accelerates training through an asymmetric training paradigm: it performs transaction dropping and cross-layer parameter sharing to expedite pre-training, while reverting to the standard training paradigm for fine-tuning to strike a balance between efficiency and efficacy; this asymmetry is motivated by the observation that the optimization goals of pre-training and fine-tuning are inconsistent. Evaluations on real-world, large-scale datasets demonstrate that ZipZap delivers notable parameter and computational efficiency improvements for training LMs. Our implementation is available at: https://github.com/git-disl/ZipZap. | -
dc.language | eng | - |
dc.relation.ispartof | ACM Web Conference 2024 (13/05/2024-17/05/2024, Singapore) | - |
dc.title | ZipZap: Efficient Training of Language Models for Ethereum Fraud Detection | - |
dc.type | Conference_Paper | - |
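
The asymmetric training paradigm from the abstract can likewise be sketched as a toggle between a weight-shared, transaction-dropping pre-training mode and a standard fine-tuning mode. This too is a hypothetical sketch layered on the abstract's description rather than ZipZap's actual interface: the `AsymmetricEncoder` class, the `drop_rate` argument, and the `untie_for_finetuning` step (copying the shared layer into independent per-depth layers) are assumptions.

```python
import copy

import torch
import torch.nn as nn

class AsymmetricEncoder(nn.Module):
    """Hypothetical sketch: one transformer layer is reused at every depth
    during pre-training (cross-layer parameter sharing), and a fraction of
    transactions (sequence positions) is randomly dropped to shorten each
    sequence. Fine-tuning unties the layers and disables dropping."""

    def __init__(self, model_dim, n_heads, n_layers):
        super().__init__()
        self.n_layers = n_layers
        self.shared = nn.TransformerEncoderLayer(
            d_model=model_dim, nhead=n_heads, batch_first=True
        )
        self.untied = None  # populated when switching to fine-tuning

    def untie_for_finetuning(self):
        # Revert to the standard paradigm: give each depth its own copy of
        # the pre-trained shared layer, trained independently from here on.
        self.untied = nn.ModuleList(
            copy.deepcopy(self.shared) for _ in range(self.n_layers)
        )

    def forward(self, x, drop_rate=0.0):
        # x: (batch, seq, model_dim) transaction representations
        if self.training and self.untied is None and drop_rate > 0:
            # Transaction dropping: keep a random subset of positions.
            keep = torch.rand(x.size(1), device=x.device) >= drop_rate
            if keep.any():
                x = x[:, keep, :]
        layers = (self.untied if self.untied is not None
                  else [self.shared] * self.n_layers)
        for layer in layers:
            x = layer(x)
        return x

enc = AsymmetricEncoder(model_dim=64, n_heads=4, n_layers=6)
h = enc(torch.randn(2, 128, 64), drop_rate=0.5)  # pre-training step
enc.untie_for_finetuning()                       # switch to fine-tuning
```

Calling `untie_for_finetuning()` after pre-training initializes every depth from the pre-trained shared weights, so fine-tuning proceeds as standard full-capacity training with no transaction dropping, consistent with the abstract's point that the two phases have inconsistent optimization goals and so warrant different treatments.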